Using Excel for Statistical Analysis: Terms you Should Understand Before you Start

May 15, 2008, 07:31

Stephen L Nelson

Trying to make sense of Microsoft Excel's statistical analysis tools? Make sure you understand the statistical terms Excel uses, says bestselling computer book author Stephen L. Nelson. Such an understanding will both ease and expedite your analysis.

Excel provides an almost countless number of statistical tools you can use to analyze data and make meaningful statements about it. However, without understanding the definitions of the statistical terms that Excel uses, the statistical tools offer little help.

Accordingly, this article supplies some background information about Excel's statistical tools and also defines the important statistical terms used by Excel.

Distinguishing between types of data

The science of statistics makes a fundamental distinction between two types of data sets: population data and sample data.

A population is the set of all elements of interest, while a sample is a subset of that population, drawn to make inferences about the characteristics of the population.

For example, if you want to describe the average number of televisions in American households, you can’t possibly collect data for the entire population (all American households). Instead, you must draw a sample from the population and make an estimate about the whole population based on that sample.

Unless otherwise stated, the Excel functions make a critical assumption about the process used to select the sample: they assume that the sample was drawn at random. In this example, that means every household had the same likelihood (probability) of being selected.
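To make the idea of random selection concrete, here is a minimal Python sketch (not an Excel feature) that draws a simple random sample from a made-up population of household TV counts and compares the sample mean with the population mean. The population values, the seed, and the sample size of 500 are all assumptions for illustration.

```python
import random
import statistics

# Hypothetical population: number of TVs in each of 10,000 households.
# These values are generated for illustration, not real survey data.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.choice([0, 1, 1, 2, 2, 2, 3, 3, 4]) for _ in range(10_000)]

# Simple random sample: rng.sample draws 500 households without replacement,
# and every household has the same chance of ending up in the sample.
sample = rng.sample(population, k=500)

population_mean = statistics.mean(population)  # normally unknowable
sample_mean = statistics.mean(sample)           # the estimate you can compute

print(f"population mean: {population_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f}")
```

In real work, only the sample mean is available. Drawing the sample at random is what gives every household the same chance of selection, which is the assumption Excel's functions rely on.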

Tip: When making statements about a population, it is wise to verify the selection process used to form the sample. For example, a sample formed by randomly selecting entries from a phone book is not a random sample of households: it excludes households with unlisted numbers or no telephone, and it gives households with multiple phone book entries multiple chances of being selected. The households therefore don’t all have the same probability of being selected.
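The phone book problem can be quantified with a small Python sketch. Assuming a hypothetical list of five households and their phone book entry counts, it computes each household's chance of being picked when one entry is chosen at random. Under true random selection of households, each would have a probability of 1/5.

```python
from fractions import Fraction

# Hypothetical households and how many phone book entries each has.
# 0 = unlisted or no telephone; 2 = listed twice (e.g., two phone lines).
entries_per_household = {"A": 1, "B": 1, "C": 2, "D": 0, "E": 1}

total_entries = sum(entries_per_household.values())

# Picking one entry at random gives each ENTRY the same chance, so a
# household's chance of being picked is proportional to its entry count.
selection_probability = {
    household: Fraction(count, total_entries)
    for household, count in entries_per_household.items()
}

for household, p in selection_probability.items():
    print(household, p)
```

Household C is twice as likely to be picked as A, B, or E, and household D can never be picked, so the resulting sample is not random with respect to households.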

Elements versus Variables

When describing the data in a set, each member of the set is called an element. So if you’re describing customers, each customer is an element.

The characteristics of interest in the elements are called variables. So if you’re looking at annual income, age, and sales, these would be your variables.

In an experiment, the experimenter manipulates the independent variable and then measures the dependent variable to see whether the manipulation had any effect on it.

A random variable describes the outcome of an experiment numerically. It can take on different values or ranges of values, each with a certain probability.

The collective group of measurements obtained for a single element is called an observation.
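One way to picture these terms, assuming made-up customer data, is the following Python sketch: each object is an element, each field is a variable, and the tuple of one customer's values is an observation. In a worksheet, the same data would commonly be laid out with one row per element and one column per variable.

```python
from dataclasses import dataclass

# Each instance is one ELEMENT (a customer); each field below the name
# is a VARIABLE. The customers and values are hypothetical.
@dataclass
class Customer:
    name: str
    annual_income: float
    age: int
    sales: float

customers = [
    Customer("Customer 1", 52_000, 34, 1_250.00),
    Customer("Customer 2", 87_500, 51, 3_100.00),
    Customer("Customer 3", 41_000, 27, 640.00),
]

# An OBSERVATION is the collection of measurements for one element:
# here, the values of all three variables for a single customer.
first = customers[0]
observation = (first.annual_income, first.age, first.sales)
print(observation)  # (52000, 34, 1250.0)
```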

Probability Distributions

The term probability refers to the likelihood that an event will happen. Probabilities range from 0 (impossible) to 1 (inevitable).

A probability distribution describes how probabilities are distributed over the discrete values or ranges of values of a random variable, and it is often depicted as a graph.

Probability distributions can take on several shapes. For example, a uniform probability distribution is rectangular: it occurs when every value of the random variable has an equal probability.
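As a minimal sketch, a fair six-sided die (an assumed example, not from Excel) gives a discrete uniform distribution. The Python code below stores each value of the random variable with its probability, checks the two basic rules every probability distribution follows, and prints a crude text bar chart whose equal bars show the rectangular shape.

```python
from fractions import Fraction

# Uniform distribution for a fair six-sided die (an assumed example):
# every value of the random variable has the same probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Any probability distribution's probabilities must sum to 1,
# and each one must lie between 0 and 1.
assert sum(die.values()) == 1
assert all(0 <= p <= 1 for p in die.values())

# A crude text "graph": equal bars give the rectangular shape.
for face, p in die.items():
    print(face, "#" * int(p * 60))
```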

Another common probability distribution is the normal or bell curve. This occurs when there’s a relatively high probability of a random variable taking a certain value or range and a symmetrically diminishing probability as you move away from this value.
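The bell shape can be seen numerically with Python's standard statistics.NormalDist class. The mean of 100 and standard deviation of 15 below are arbitrary assumptions. The pdf method returns the height of the curve at a point, comparable to what Excel's NORMDIST worksheet function returns when its cumulative argument is FALSE.

```python
from statistics import NormalDist

# Hypothetical bell curve: mean 100, standard deviation 15.
bell = NormalDist(mu=100, sigma=15)

# The height of the curve is greatest at the mean and falls off
# symmetrically: x = 85 and x = 115 are equally far from 100.
for x in (70, 85, 100, 115, 130):
    print(x, round(bell.pdf(x), 5))
```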

Discrete versus Continuous Variables

A discrete variable is one that can take on only separate, distinct values rather than a value carried out to any number of digits. For example, the number of children in a family is a discrete variable; in this case, its values are non-negative integers.

A continuous variable, on the other hand, can take on a value with any number of digits. For example, you can theoretically measure the time it takes a person to run a mile down to the smallest fraction of a second. Because a continuous random variable can take on infinitely many values, the probability that it takes any one particular value is zero; probabilities are assigned to ranges of values instead. Note that statistics calculated from discrete variables, such as averages, need not be discrete themselves. So you can say that the average number of children in a family is, for example, 2.3, although no family could have 2.3 children.
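Both points can be checked with a short Python sketch. The children counts are hypothetical, and the mile times are assumed, purely for illustration, to follow a normal distribution with a mean of 8 minutes and a standard deviation of 1.5 minutes. As the window around exactly 7 minutes narrows, the probability of landing inside it heads toward zero.

```python
from statistics import NormalDist, mean

# Discrete variable: children per family (hypothetical counts for 10 families).
children = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]
print(mean(children))  # 2.3, although no family has 2.3 children

# Continuous variable: mile times in minutes, ASSUMED here to follow a
# normal distribution with mean 8 and standard deviation 1.5.
mile_times = NormalDist(mu=8, sigma=1.5)

# P(time within +/- h of exactly 7 minutes) shrinks toward 0 as h shrinks,
# which is why any single exact value has probability zero.
for h in (0.5, 0.05, 0.005, 0.0005):
    p = mile_times.cdf(7 + h) - mile_times.cdf(7 - h)
    print(f"h = {h}: probability = {p:.6f}")
```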

Events

An event is a collection of outcomes that share a condition. For example, the outcomes in which a project goes over budget form an event, as do the outcomes in which a lot of goods is rejected.
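Here is a minimal Python sketch of an event, using a made-up probability distribution for the number of defective items found in a lot. The event "the lot is rejected" is the set of outcomes meeting an assumed rejection rule of two or more defects, and its probability is the sum of those outcomes' probabilities.

```python
from fractions import Fraction

# Hypothetical probability distribution for the number of defective items
# found when inspecting a lot of goods (values made up for illustration).
defects_distribution = {
    0: Fraction(5, 10),
    1: Fraction(3, 10),
    2: Fraction(1, 10),
    3: Fraction(1, 20),
    4: Fraction(1, 20),
}

# The EVENT "the lot is rejected" is every outcome that meets the condition.
# The rejection rule (2 or more defects) is an assumption for this example.
rejected = {outcome for outcome in defects_distribution if outcome >= 2}

# The probability of an event is the sum of its outcomes' probabilities.
p_rejected = sum(defects_distribution[o] for o in rejected)
print(sorted(rejected), p_rejected)  # [2, 3, 4] 1/5
```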

An Excel-specific term: Logical Values

One final term that Excel uses in its statistical functions needs to be defined: the term "logical value."

In Excel, the term logical value refers to the value TRUE or FALSE that Excel returns when a cell contains a conditional formula, that is, a formula that returns a result based on whether a specified condition is met. For example, you can ask Excel to display TRUE if the value in a cell is greater than 100 or FALSE if it is less than or equal to 100. TRUE and FALSE are Excel's logical values, but you can use a conditional function to display your own labels in their place. For example, you can tell Excel to display the word PASS if a value in a cell is greater than or equal to 50 or FAIL if it is less than 50.
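The two formulas described above would be written in Excel as =A1>100 and =IF(A1>=50,"PASS","FAIL"). The Python functions below are hypothetical stand-ins, not part of Excel or any library, that show the difference: the first returns a true logical (Boolean) value, while the second returns text chosen according to a condition.

```python
# Hypothetical Python stand-ins for two Excel formulas; they are not part
# of Excel or of any library.

def greater_than_100(value: float) -> bool:
    """Mimics =A1>100, which returns the logical value TRUE or FALSE."""
    return value > 100

def pass_fail(value: float) -> str:
    """Mimics =IF(A1>=50,"PASS","FAIL"), which returns text."""
    return "PASS" if value >= 50 else "FAIL"

# Prints: 42 False FAIL / 50 False PASS / 120 True PASS
for cell_value in (42, 50, 120):
    print(cell_value, greater_than_100(cell_value), pass_fail(cell_value))
```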