Suppose the bodyweights of 100 people were surveyed and recorded in the table below. Here the bodyweight range is the value, and the count (the frequency) is the number of bodyweights that fall into each range. When the ranges are of equal length, each range is commonly called an interval.
Body Weight Range | Frequency |
95 - 114 | 17 |
115 - 134 | 13 |
135 - 154 | 16 |
155 - 174 | 12 |
175 - 194 | 16 |
195 - 214 | 12 |
215 - 234 | 14 |
Data presented in the form of a Frequency Distribution table makes analysis of the data much easier than studying a set of raw data.
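To make the tallying step concrete, here is a minimal Python sketch of how raw values might be counted into equal intervals. Python is used only for illustration and is not part of Advantage VISION:Excel. The function name and the sample weights are hypothetical (they are not the survey data), and whole-number weights are assumed; the start value, interval size, and number of intervals match the table above.

```python
from collections import Counter

def equal_interval_counts(values, start, size, number):
    """Tally values into `number` equal intervals of width `size` beginning at `start`."""
    counts = Counter()
    for v in values:
        index = (v - start) // size      # which interval the value falls into
        if 0 <= index < number:          # values outside every interval are skipped
            counts[index] += 1
    # One (low, high, count) row per interval; high is inclusive, as in the table
    return [(start + i * size, start + (i + 1) * size - 1, counts[i])
            for i in range(number)]

# Hypothetical raw weights, for illustration only
sample = [95, 101, 114, 115, 140, 154, 155, 199, 234]
table = equal_interval_counts(sample, start=95, size=20, number=7)
for low, high, count in table:
    print(f"{low} - {high} | {count} |")
```

Intervals that receive no values still appear in the result with a count of zero, so the table always has the same number of rows.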
There are a few variations of the Frequency Distribution table. The format of such tables differs according to the nature of the data and the type of analysis desired. The table below is a version of the table above with each interval's share of the total added. This new column is called the Relative Frequency Distribution. The Relative Frequency P is defined as P = F/N, where F is the frequency (count) associated with each interval, and N is the total number of items in the set of data. In this example N is equal to 100, so the first interval has P = 17/100 = .17.
Body Weight Range | Frequency | Relative Frequency |
95 - 114 | 17 | .17 |
115 - 134 | 13 | .13 |
135 - 154 | 16 | .16 |
155 - 174 | 12 | .12 |
175 - 194 | 16 | .16 |
195 - 214 | 12 | .12 |
215 - 234 | 14 | .14 |
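The Relative Frequency column can be reproduced from the Frequency column with P = F/N. The short Python sketch below does this for the counts in the table; the variable names are illustrative only.

```python
# Frequencies copied from the table above
frequencies = {
    "95 - 114": 17,
    "115 - 134": 13,
    "135 - 154": 16,
    "155 - 174": 12,
    "175 - 194": 16,
    "195 - 214": 12,
    "215 - 234": 14,
}

n = sum(frequencies.values())  # N, the total number of items

# P = F / N for each interval
relative = {rng: f / n for rng, f in frequencies.items()}

for rng, p in relative.items():
    print(f"{rng} | {frequencies[rng]} | {p:.2f} |")
```

The relative frequencies always sum to 1, which is a quick check that no interval was missed.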
There are also graphical tools used to analyze Frequency Distributions. The standard graphic forms include bar charts, frequency polygons, and histograms. The bodyweight distribution can also be shown as an Advantage VISION:Excel histogram.
In a histogram drawn with vertical bars, the vertical axis represents the frequency or count and the horizontal axis represents the ranges of bodyweights. The appearance of a histogram will vary depending on the nature of the data. One aspect of a distribution's shape is its Kurtosis, which describes how peaked or flat the distribution is when compared to a normal distribution. The variations of Kurtosis are called Leptokurtic (more peaked), Mesokurtic (similar to the normal distribution), and Platykurtic (flatter).
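As a rough illustration of how a count histogram is built from a frequency table, the following Python sketch prints one row of + characters per interval. Python is used only for illustration and is not part of Advantage VISION:Excel. The bars run horizontally here, so the ranges are listed down the side and the counts run across.

```python
# Counts per interval, taken from the bodyweight frequency table
counts = [17, 13, 16, 12, 16, 12, 14]

# Lower bound of each 20-pound interval: 95, 115, ..., 215
lows = range(95, 235, 20)

# One line per interval: the range label followed by one '+' per item
lines = [f"{low:3d} - {low + 19:3d} " + "+" * c for low, c in zip(lows, counts)]
print("\n".join(lines))
```

Because the seven counts are close to one another, the bars are of similar length, which indicates a relatively flat distribution.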
There are also descriptive measures used to explain the distribution of data. The mean, standard deviation, standard error of the mean and coefficient of variation can all be used to summarize and describe the distribution of data.
The Mean or average, assuming a set of numeric values, is defined as M = T/N, where M is the mean, T is the total sum of the values in the distribution, and N is the total number of items in the set of data.
The Standard Deviation is a measure of how spread out the values in a dataset are. It is computed from the squared differences between each value and the mean,
where S.D. is the standard deviation of the distribution, Xi is the value of the ith item, M is the mean of the distribution, and N is the number of items in the set of data.
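In symbols, one common form of this definition, written with the names used above, is sketched below. This is the population form, which divides by N. That choice is an assumption: the sample form divides by N - 1 instead, and this article does not state which form Advantage VISION:Excel uses.

```latex
\mathrm{S.D.} = \sqrt{\frac{\sum_{i=1}^{N} \left( X_i - M \right)^2}{N}}
```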
The Standard Error of the Mean gives an idea of how much the sample mean would be expected to vary if you took repeated samples from the same population. The Standard Error of the Mean is defined as the Standard Deviation divided by the square root of the number of items in the set of data: S.E.M. = S.D. / sqrt(N).
The Coefficient of Variation is a measure of variability in relation to the mean. The Coefficient of Variation is defined as the Standard Deviation divided by the Mean of the population: C.V. = S.D. / M.
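The four measures can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the Advantage VISION:Excel implementation: the function name describe and the five sample weights are hypothetical, and the standard deviation uses the population form (dividing by N).

```python
import math
import statistics

def describe(values):
    """Return the mean, standard deviation, standard error, and coefficient of variation."""
    n = len(values)
    mean = sum(values) / n               # M = T / N
    sd = statistics.pstdev(values)       # population S.D. (divides by N): an assumption
    sem = sd / math.sqrt(n)              # S.E.M. = S.D. / sqrt(N)
    cv = sd / mean                       # C.V. = S.D. / M
    return {"mean": mean, "sd": sd, "sem": sem, "cv": cv}

# Hypothetical weights, for illustration only
sample = [100, 120, 140, 160, 180]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name} = {value:.3f}")
```

To use the sample form instead, statistics.stdev can be substituted for statistics.pstdev; the other three formulas are unchanged.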
With Advantage VISION:Excel, frequency distribution tables, histograms, and the descriptive measures explained above can be produced with relative ease. The available types of Frequency Distribution in Advantage VISION:Excel are Equal Interval, Logarithmic, and Alphanumeric. Because the 100-bodyweight sample is tabulated in intervals of 20 pounds, the Equal Interval option is the best fit for this distribution. The following is a diagram of the syntax for an Advantage VISION:Excel Equal Interval Frequency distribution:
FREQ nn EQUAL
DISTFIELD dataname START nnnnnnnnnnn
INTERVALSIZE nnnnnnnnnnn INTERVALNO nn
[REPTITLE dataname] [STDREPT {YES | NO}]
[HISTOGRAM {NO | COUNT | VALUE | BOTH}]
Applying this syntax to the bodyweight example, below is the VISION:Excel code to produce an Equal Interval Frequency distribution table and histogram. This example uses a card input file that includes 100 randomly chosen values between 94 and 235. Following the code is the output including the frequency distribution table and the histogram.
With Advantage VISION:Excel, even very complex statistical methods become easy to use!
FILE CARDFILE F 80
BODYWEIGHT 3 NU
SORT CARDFILE USING BODYWEIGHT
FREQ 07 EQUAL
DISTFIELD BODYWEIGHT
HISTOGRAM COUNT
START 95
INTERVALSIZE 20
INTERVALNO 7
FREQUENCY 07
100 BODYWEIGHT SAMPLE
6/14/05                  FREQUENCY DISTRIBUTION REPORT                   PAGE 1
                                APPLICATION 07

--- R A N G E ----     COUNT    % OF     TOTAL    % OF          STANDARD    SQUARE ROOT
    L O W  H I G H  OF ITEMS   ITEMS  OF ITEMS   TOTAL   MEAN  DEVIATION  RANGE X COUNT
       95      114        17   17.00      1790   11.00    105          6             18
      115      134        13   13.00      1637   10.06    126          6             34
      135      154        16   16.00      2324   14.28    145          6             51
      155      174        12   12.00      1984   12.19    165          5             66
      175      194        16   16.00      2946   18.10    184          6             83
      195      214        12   12.00      2440   14.99    203          6             98
      215      234        14   14.00      3159   19.40    226          6            114
TOTAL                    100  100.00     16280  100.00

MEAN               =  163     STANDARD ERROR OF THE MEAN =  4.100
STANDARD DEVIATION =   41     COEFFICIENT OF VARIATION   =  0.252
MINIMUM VALUE      =   95     MAXIMUM VALUE              =  234
NUMBER OF ZERO ITEMS  0
COUNT HISTOGRAM
---------------------------------------------------
+++++++++++++++++
+++++++++++++
++++++++++++++++
++++++++++++
++++++++++++++++
++++++++++++
++++++++++++++
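As a cross-check, the summary figures in the report above can be recomputed from its own columns using the formulas given earlier. The Python sketch below is only an illustration: it takes the counts, totals, and the printed (rounded) standard deviation of 41 as inputs, so it confirms that the report is internally consistent rather than recomputing anything from the raw card file.

```python
import math

# COUNT OF ITEMS and TOTAL OF ITEMS columns from the report
counts = [17, 13, 16, 12, 16, 12, 14]
totals = [1790, 1637, 2324, 1984, 2946, 2440, 3159]

n = sum(counts)                  # total number of items
grand_total = sum(totals)        # sum of all bodyweights
mean = grand_total / n           # M = T / N

# Each interval's share of the grand total, and each interval's own mean
pct_of_total = [round(100 * t / grand_total, 2) for t in totals]
interval_means = [round(t / c) for t, c in zip(totals, counts)]

sd = 41                          # standard deviation as printed (rounded) in the report
sem = sd / math.sqrt(n)          # S.E.M. = S.D. / sqrt(N)
cv = sd / round(mean)            # C.V. = S.D. / M, using the rounded mean

print(f"mean = {mean:.1f}, sem = {sem:.3f}, cv = {cv:.3f}")
```

The recomputed values round to the mean of 163, the standard error of 4.100, and the coefficient of variation of 0.252 shown in the report.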