Sunday, November 4, 2007

Viewing sales data by using XY charts

An XY chart (scatter plot) can express more than a typical bar or line chart. Here's an example how you can utilize typical sales data and create two interesting XY charts that help your decision making processes.

Suppose you have a typical sales table:

CREATE TABLE salesdata (
product VARCHAR(1),
customerno INTEGER,
market_area INTEGER,
day DATE,
quantity INTEGER);

.. that contains row-level sales data like this (product names A-Z, market areas 1-12):


By using the data, you usually create charts like this:



.. and for sure this chart is informative in many ways.

Now suppose you are interested in knowing more about products and market areas. For example: what is the average quantity of items sold in this season (Y), and how many of the customers bought the product for the first time (X).

You can get the data by using the two following SQL queries (this season = 2007):

/* XY for products */
SELECT s.product,
AVG(s.quantity) "Average quantity of items sold",
100*SUM(1-SIGN(ABS(s.day-q.min_day)))/COUNT(*) "% new customers in this season",
SUM(s.quantity) "Total quantity of items sold"

FROM salesdata s,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
GROUP BY product, customerno) q
WHERE s.product = q.product
AND s.customerno = q.customerno
AND s.day >= '2007-01-01'
GROUP BY s.product;


/* XY for market areas */
SELECT s.market_area,
AVG(s.quantity) "Average quantity of items sold",
100*SUM(1-SIGN(ABS(s.day-q.min_day)))/COUNT(*) "% new customers in this season",

SUM(s.quantity) "Total quantity sold in area"
FROM salesdata s,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
GROUP BY product, customerno) q
WHERE s.product = q.product
AND s.customerno = q.customerno
AND s.day >= '2007-01-01'
GROUP BY s.market_area;


Now the charts:





Let's analyze:
  • Products clearly have different profiles. Product F is favored by old customers, while products L and T have recently found many new customers. In a single sales transaction, product U is sold by average nearly twice as much as D.
  • Symbol size is the total quantity sold. There are no big differences in the sales of different products. More difference exist in various market areas: area 1 is the most important, while areas 9, 10, and 11 are the smallest. (I created the data by using a random function.)
  • All market areas have almost the same "average quantity of items sold". The market areas differ more in X axis: in area 1 only 18% are new customers, while in areas 10 and 12 almost half of the customers are new. Perhaps the company just started marketing in areas 10 and 12?

All this is interesting -- no less so because the basic line chart cannot express this knowledge.

Nevertheless it must be said that these charts don't yet tell anything about the prospects in different market areas. Adding facts about market area size (for each product) would help your decision making where to target your marketing in the next season.

Therefore, don't be satisfied with line and bar charts, if you can improve your decision making by using XY...

3 comments:

Anonymous said...

Janne, I would say that we need both charts, since the scatterplot doesn't show how we got here. We need the time series, unless you can display it on the scatter plot (always a good challenge...)

Janne Pyykkö said...

Jorge, you are right. Having access to all of these charts, time series (line) and XY, is essential in decision making.

I think displaying time series in XY is possible (through animation or "trails" as in http://www.gapminder.org) but takes place more naturally by using a line chart.

Anonymous said...

I definitely agree. Many times the answer is - it depends. It really depends on what you are trying to extrapolate from the data. Like Jorge says, if you are looking to see what happened oover time, the line graph is better. If you want to compare products then xy is a great option. Nice work!