
Monday, July 22, 2013

Data Continued: Describing the Sales Data Using Measures of Central Tendency



Adequately Describing Sales Data Using Simple Statistics

Continuing on the topic of data, I would like to discuss three widely used descriptive statistics: the mean, the median and the mode. Collectively, these are known as measures of central tendency. We use measures of central tendency to get a sense of the typical, or central, value in the collected data. In practice, whether or not it is sound, these measures are often used to make decisions about inventory, to judge the success of a business, and even to establish the need for additional legislation.

Using the Average

The arithmetic average, referred to as the mean, is a single value that represents the set of values from which it is calculated. It is not without faults, but it is widely used.

For example, if you owned a car dealership, you would probably record the sale price of every car sold. You might also be interested in the average sale price of the vehicles sold at your dealership. To calculate it, you would add up all of the sale prices and divide that total by the number of vehicles sold. The result is the average (also referred to as the mean) amount spent by each customer.
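A minimal Python sketch of that calculation, using a short, hypothetical list of individual sale prices (not the dealership data in Table 1). Python's standard `statistics.mean` computes the same arithmetic mean.

```python
from statistics import mean

# Hypothetical sale prices, one entry per vehicle sold (not the Table 1 data).
sale_prices = [18_000, 18_000, 20_000, 22_000, 25_000]

# Mean = total of all sale prices divided by the number of vehicles sold.
average_price = sum(sale_prices) / len(sale_prices)

print(f"Average sale price: ${average_price:,.2f}")  # Average sale price: $20,600.00
print(mean(sale_prices))  # statistics.mean gives the same value
```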

An important issue to consider is that the mean is affected by extreme values. Additionally, the average may not tell a business owner which product is the best-selling or most popular among consumers.

Figure 1. The mean is significantly influenced by extreme values. 
The mode is the value that appears the most in a set of numbers. 
The median is the midpoint of the data set.

Extreme Values Influence The Mean

To illustrate how extreme values influence the average, I present two scenarios. In scenario one (Table 1), the cars ranged in price from $16,100 to $278,295, and the number sold at each price is listed. Multiplying the price of each vehicle by the number sold at that price yields the total sale amount for that model. Adding up the number of cars sold gives 133, and adding up the sale amounts for all models gives $3,076,665. To calculate the mean, divide $3,076,665 by 133, for a mean of $23,132.82.
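When the data are kept as a table of prices and counts, as in Table 1, the mean is a weighted average: each price is multiplied by its count before summing. A minimal sketch, assuming a small hypothetical table (Python 3.9+ for the `dict[int, int]` annotation); the function name is illustrative.

```python
# Hypothetical frequency table: sale price -> number of vehicles sold at that price.
# (Illustrative numbers only; Table 1 is not reproduced here.)
sales_by_price = {15_000: 4, 18_000: 10, 20_000: 5, 90_000: 1}

def mean_from_table(table: dict[int, int]) -> float:
    """Weighted mean: sum(price * count) / sum(count)."""
    total_sales = sum(price * count for price, count in table.items())
    vehicles_sold = sum(table.values())
    return total_sales / vehicles_sold

print(f"${mean_from_table(sales_by_price):,.2f}")  # $21,500.00
```

Note how the single hypothetical $90,000 sale already pulls the mean above every other price in the table except itself.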



Removing just the one vehicle that sold for $278,295 (scenario two, Table 2) produces a substantial change in the average sale price. The new average, now taken over 132 vehicles, is $21,199.77, almost $2,000 less than the first average.
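As a check on that arithmetic, here is a minimal Python sketch that starts only from the scenario-one figures given in the text (a mean of $23,132.82 over 133 vehicles), removes the single $278,295 sale, and re-averages over the remaining 132 vehicles. The variable names are illustrative; because the stated mean is rounded to the cent, the recovered total is approximate.

```python
# Figures stated in the text for scenario one.
mean_1 = 23_132.82       # mean sale price over all vehicles
n_1 = 133                # vehicles sold
extreme_price = 278_295  # the single highest-priced sale

# Recover the (approximate) total from the mean, drop the extreme sale, re-average.
total_1 = mean_1 * n_1
mean_2 = (total_1 - extreme_price) / (n_1 - 1)

print(f"Mean without the ${extreme_price:,} sale: ${mean_2:,.2f}")  # $21,199.77
print(f"Drop in the mean: ${mean_1 - mean_2:,.2f}")                 # $1,933.05
```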



Unfortunately, neither average equals an actual sale price. Both the $23,132.82 and the $21,199.77 averages fall between the vehicles priced at $20,940 and $27,670. If we relied on the average (mean), we might recommend increasing our inventory of vehicles priced between $20,940 and $27,670. It might seem reasonable for the dealership to increase its stock of vehicles at or near the average price; however, this recommendation may not increase actual sales. We should use other descriptive statistics to determine the most frequent sale price.

The Most Frequent Sales Price (the mode)

The mode, the most frequently observed data point, may give us a bit more insight. The mode is most often used when analyzing data that is not numerical but categorical. For example, you might want to determine which exterior color sold best. With color, you cannot add up values and compute a mean. Instead, you can count the number of vehicles sold in each color and report the color with the highest count; that color is the mode.
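Counting categories this way is a one-liner with Python's `collections.Counter`. A minimal sketch with a hypothetical list of colors; note that the mode is the color itself, not its count.

```python
from collections import Counter

# Hypothetical exterior colors, one entry per vehicle sold.
colors_sold = ["white", "black", "silver", "white", "red", "white", "black"]

color_counts = Counter(colors_sold)
# most_common(1) returns a list holding the (value, count) pair with the highest count.
mode_color, times_sold = color_counts.most_common(1)[0]
print(f"Mode: {mode_color} ({times_sold} sold)")  # Mode: white (3 sold)
```

If two colors tie, `most_common` lists them in the order they were first encountered, so it is worth checking the counts when ties are possible.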


Using the two scenarios from Tables 1 and 2, we can use the mode to help us make a recommendation about restocking vehicles based on sale price. Notice that in both scenarios, the vehicle priced at $18,995 was the most frequently sold, accounting for 21 of the 133 sales in scenario 1 (and 21 of the 132 sales in scenario 2). The mode is about $4,138 lower than the scenario 1 average and about $2,205 lower than the scenario 2 average.
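The same idea works on a price-count table like Table 1. A minimal sketch, again with a hypothetical table (Python 3.9+): the illustrative `modes_from_table` function returns every price tied for the highest count, because a data set can have more than one mode, and the last line compares the mean with the mode.

```python
# Hypothetical frequency table: sale price -> vehicles sold at that price.
sales_by_price = {15_000: 4, 18_000: 10, 20_000: 5, 90_000: 1}

def modes_from_table(table: dict[int, int]) -> list[int]:
    """Return every price whose count equals the highest count (handles ties)."""
    top_count = max(table.values())
    return sorted(price for price, count in table.items() if count == top_count)

modes = modes_from_table(sales_by_price)
mean_price = sum(p * c for p, c in sales_by_price.items()) / sum(sales_by_price.values())
print(modes)                  # [18000]
print(mean_price - modes[0])  # 3500.0: how far the mean sits above the mode
```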

At this point, we have calculated the mean and the mode using the given data from both scenarios. We have a clearer understanding of the purchases made at our dealership, but we have one more descriptive statistic that we might want to explore.

Using the Median (the Midpoint) and Percentiles

A percentile is not the same as a percentage. A percentile describes a position within a data set: the pth percentile is the value at or below which p percent of the values fall. You are probably familiar with test scores reported in percentiles; the result may have read something like "Your score: 87th percentile." This means that your score was equal to or greater than 87 percent of the scores on the same test. A percentage, by contrast, is what you calculate by dividing the number of questions answered correctly by the total number of questions.
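A minimal sketch contrasting the two, assuming a hypothetical 50-question test and ten hypothetical scores. The percentile rank uses the "at or below" definition from the text; some references count only scores strictly below, which gives slightly different ranks.

```python
# Hypothetical test: 50 questions, and the scores of 10 test takers.
correct, total_questions = 42, 50
all_scores = [30, 35, 38, 40, 41, 42, 44, 45, 47, 49]  # includes our score of 42

# Percentage: share of the questions answered correctly.
percentage = 100 * correct / total_questions

# Percentile rank: share of all scores that are at or below our score.
at_or_below = sum(1 for s in all_scores if s <= correct)
percentile_rank = 100 * at_or_below / len(all_scores)

print(f"Percentage correct: {percentage:.0f}%")        # 84%
print(f"Percentile rank:    {percentile_rank:.0f}th")  # 60th
```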

The median is the midpoint of the ordered data set. It is also the 50th percentile: half of the sales were at or below the median price, and half were at or above it.

To find the data point that represents the median:
  • When you have an odd number of data points (in our case, 133), add 1 to the number of data points and divide by 2 to find the position of the median: (133 + 1) / 2 = 67. The median is the 67th value.
  • If you have an even number of data points, the median is the average of the two data points in the middle (positions n/2 and n/2 + 1).
List the vehicles sold by price, starting at the lowest price, and count the cars sold until you reach the 67th. The sale price of the 67th car sold is $18,995, so the median is $18,995.
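The counting procedure above can be sketched in Python for a price-count table such as Table 1. The function name and the small table are hypothetical (Python 3.9+); for the 133 cars in the text, the single middle position is (133 + 1) // 2 = 67.

```python
def median_from_table(table: dict[int, int]) -> float:
    """Median of a price -> count table, counting up from the lowest price."""
    n = sum(table.values())
    # 1-based positions of the middle value(s): one position if n is odd, two if even.
    if n % 2 == 1:
        positions = [(n + 1) // 2]
    else:
        positions = [n // 2, n // 2 + 1]

    values = []
    running = 0
    for price in sorted(table):
        running += table[price]  # cumulative number of cars sold up to this price
        # Record this price for every middle position it covers.
        while len(values) < len(positions) and positions[len(values)] <= running:
            values.append(price)
    return sum(values) / len(values)

# Hypothetical table with 20 cars: the 10th and 11th cars both sold at $18,000.
sales_by_price = {15_000: 4, 18_000: 10, 20_000: 5, 90_000: 1}
print(median_from_table(sales_by_price))  # 18000.0
```

Unlike the mean, the result would not change if the $90,000 sale were replaced by a $900,000 one.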


Sometimes the median is easy to determine and sometimes it is not; when it is difficult, it is common practice to report other percentiles as well. Using the data sets in Table 1 and Table 2, we find that 61% of the 133 cars sold were at or below the price of $18,995. The 76th percentile for both scenarios is $20,940 (about 76% of sales were at or below that price), and the 46th percentile is $18,500. Recall that the averages for scenarios 1 and 2 were $23,132.82 and $21,199.77. Even the 76th percentile is lower than either average.
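Figures like "61% at or below $18,995" come from a cumulative count. A minimal sketch using the same "at or below" convention as the text, with a hypothetical table and an illustrative function name (Python 3.9+). Library routines such as NumPy's percentile functions use interpolation rules, so their results can differ slightly from this simple count.

```python
def share_at_or_below(table: dict[int, int], price: int) -> float:
    """Percentage of vehicles sold at or below `price` (a percentile rank)."""
    sold_at_or_below = sum(count for p, count in table.items() if p <= price)
    return 100 * sold_at_or_below / sum(table.values())

# Hypothetical frequency table: sale price -> vehicles sold at that price.
sales_by_price = {15_000: 4, 18_000: 10, 20_000: 5, 90_000: 1}
for price in sorted(sales_by_price):
    print(f"${price:,}: {share_at_or_below(sales_by_price, price):.0f}% at or below")
# $15,000: 20% at or below
# $18,000: 70% at or below
# $20,000: 95% at or below
# $90,000: 100% at or below
```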

What About The Entire Picture? The Range

Sometimes you might want to report the range of the observed data; here, that is the range in sale price. This information might be useful to highlight the large disparity in what customers spent, or even to justify keeping a few high-priced vehicles in stock. To calculate the range, subtract the lowest sale price from the highest sale price:
  • Using the data from Table 1: $278,295 - $16,100 = $262,195
On its own, this range of $262,195 could be misleading. It depends only on the single lowest and single highest sale, so without further information we cannot tell how the other sales are spread across this broad range.
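Because the range depends only on the extremes, it can be computed from any list that contains the lowest and highest prices. A minimal sketch using the prices named in the text (a subset of Table 1; the counts are not needed for the range):

```python
# Sale prices named in the text, including Table 1's lowest and highest prices.
prices_mentioned = [16_100, 18_500, 18_995, 20_940, 27_670, 278_295]

price_range = max(prices_mentioned) - min(prices_mentioned)
print(f"Range: ${price_range:,}")  # Range: $262,195
```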

To provide the best picture of the data, report the measures of central tendency along with the range. Reporting several of these measures, rather than relying on any single one, gives the best description of the data.

So What Should The Recommendation Be?

Based on the presented data, we should quickly realize that at least ¾ of the customers purchased vehicles priced at or under $20,940. The sales of a few higher-priced vehicles pulled the mean upward, making it less useful when deciding which cars to restock. If the salesperson's commission on a new vehicle does not depend on the vehicle's sale price, then it makes the most sense to stock the inventory with the cars that are most likely to sell. Using the mode and percentile data, we find that cars priced at $18,995, as well as those priced at or under $20,940, are popular with customers.

Personally, I would stock only cars priced at or under $20,940 and offer the higher-priced vehicles by order only. I would divide the inventory that would otherwise go to higher-priced vehicles equally among the $18,500, $18,995 and $20,940 price points.

Summary
We can often be misled by using only one statistic to describe data. In this example, the mean could be characterized as inflated: the mode was several thousand dollars below the mean, and more than ¾ of the vehicles sold were priced below the average. I find it interesting that most people are primarily concerned with the average.
  • The mean is affected by extreme values, but it is often considered the most representative measure because it includes every value in the data set.
  • The mode is the most frequently occurring data point and the only measure of central tendency that can be applied to categorical (non-numerical) data, such as vehicle color. In this case, we used the mode to determine the most frequent sale price.
  • The median, also known as the 50th percentile, is the value at or below which 50 percent of the observed data points fall.
There are several other instances that I can think of where I would like to have more than one measure of central tendency reported. How about you? 

Please share your thoughts in the comments below.