Janne Pyykkö's BI Blog

Sunday, April 6, 2008

Google's heat map and motion chart in comparison

In addition to the motion chart (reviewed here), Google announced a heat map gadget. It's quite easy to use just like the motion chart. That is, the most difficult part is to collect data for the chart in the right format.

The following heat maps display two variables, the median age (years) and the unemployment rate (%), of countries by using colors on the world map (data source: CIA's World Factbook).

How useful is the heat map concept? Is it a good infographic?

It depends: If your target is to visualize every single variable clearly, the concept works. If your target is to help your audience understand the relationship between variables, then an XY chart works better (such as Google's motion chart--here without the animation):

Let's analyze: By average, the lower the median age is in a country, the higher the unemployment rate.

UPDATE (Thursday, April 17, 2008):

Included below a more traditional chart to display the relationship between median age and unemployment rate. In this chart, countries are grouped in median age groups. An average unemployment rate is calculated for each group.

Monday, March 24, 2008

Motion Chart provided by Google--an outstanding BI tool for free

Google just announced a gadget gallery containing one gem for the business intelligence consultants. It's a motion chart, that is, an XY chart with a time dimension (feel free to play with it).

Explanation: The XY chart above displays ten countries according to how much they collect taxes and amount of government spending (bubble size = population, bubble color = unemployment percentage). When you play years between 1970 and 1999, you see countries changing locations. By analyzing the motion it's clear that since 1970 big European countries (UK, Germany, France, Italy, Spain) have adopted policies that make them more alike (bubbles moving near each other from starting positions) while Sweden all the time stands out with high taxes and government spending. Japan seems to be the other extreme with low taxes and government spending. Between 1993 and 1997 Spain turns red with a high unemployment rate (over 20%).

I'm really astonished for several reasons:

Motion chart as such is a very useful concept for tracking changes in time (it makes you understand the data, not just display it).
I was able to build the chart above and publish it here in two hours, that is, very quickly as a first-time-user (+one hour more for searching useful data).
Google motion chart works really well (try changing the chart above by using pop-up menus provided--it's a dynamic chart!).
No else BI provider offers this kind of visualization tool.
And it's available for free!

Of course, the motion chart concept is someting we have been waiting for ever since Google bought GapMinder. Now Google gives it for use for everybody. For example, you can load the motion chart with the products of your company and see their profiles changing over time.

Here are the steps (I made) to publish the motion chart...

1. I searched data from web with keywords "time series data". The macroeconomic data I found here. I decided to limit my test in ten countries. I collected the text data into spreadsheet. One hour.

2. The data was not in the right format for the motion chart (see example here). I used several methods to clear and pivot the data (not only in spreadsheet) until I was ready. One hour.

3. I copy-pasted the data into a Google spreadsheet document (yes, you must start using Google Docs to create and publish your motion chart). I inserted the motion chart gadget and it worked immediately. I still changed the format several times, because I needed (wanted) to learn how motion chart uses spreadsheet data. One hour.

NOTE: If you want to use quarters, months, or weeks in the time dimension, convert them into dates--such as 2008-Q3 saved as "7/1/2008" or 3rd week of 2008 saved as "1/17/2008" (Thursday, the middle day of week). Dates must be in the US format (MM/DD/YYYY) and decimal numbers with a dot as the decimal separator.

4. From the top-right gadget menu, I chose Publish Gadget, and it gave me html to put into a web page. One minute.

Final thoughts... I'm not completely satisfied with the fact that in order to use the motion chart Google will host my (perhaps confidential) data. Nevertheless I'm going to explore the motion chart concept with my customers in the future, I'm sure.

Sunday, February 24, 2008

Google Chart API--why it's not important in BI?

Just for curiosity I took a look at Google Chart API. Here are two sample charts I created with it.

Nice charts. As the data and axes are controlled separately, it's possible to compose:

A chart where axis labels are qualitative--not quantitative--which is sometimes a preferred situation in an XY chart (good vs. poor etc)
A line chart where the line changes it's value several times within a month--not easily done in Excel or other BI tools having a direct correspondence between data values and categories displayed in the X axis

As a BI consultant, should I be interested in Google Charts?

- No. Google Chart API is a programmer tool to build larger systems.

OK. When such a larger system appears, should I be interested then?

- No again. All Google Charts are by nature static pictures (PNG files), without any means to add user interaction.

For example, if you move the mouse cursor onto a symbol (or click the symbol) in the XY chart above (or onto the line chart), nothing happens. Simply said there's no mechanism to display what the symbols represent (or what is the exact value displayed in the line chart). It's not possible to program drilling-down either. And all these features are essential in business intelligence...

Anyway, I had fun going through the Google Chart API. If you have a mind of a programmer, it's worth studying. Here are the urls which I used to create the XY chart and the line chart (broken into several lines for clarity).

XY:
http: //chart.apis.google.com/chart?cht=s
&chd=s:984sttvuvkQIBLKNCAIi,DEJPgq0uov17zwopQODS,AFLPTXaflptx159gsDrn
&chxt=x,y
&chtt=Google+Chart+API+-+Sample+XY
&chxl=0:Good|Poor1:Small|Large
&chs=250x250
&chm=o,ff000033,1,1,25
&chg=50,50,2,5
Line:
http: //chart.apis.google.com/chart?cht=lc
&chd=s:rpnpnosztwurrmmllgbZQOOORVURRSQROQSRSTKCBIKIKMLIIJIKLJKKNLNVWakiljhfedcbdcZbbZ, UVVUVVUUUVVUSSVVVXXYadfhjlmpokllmmliigdbbZZXVVUUUTU
&chco=5555ff,ff9933
&chtt=Google+Chart+API+-+Sample+Line
&chls=2.0,1.0,0.0
&chxt=x,x,y
&chxl=0:||Oct||Nov||Dec||Jan||Feb||Mar||Apr||May||1:||2007||||||2008||||||||||2:60|70|80|90|100
&chs=320x175
&chg=101,25,2,5
&chf=c,ls,0,f4f4ff,0.125,ffffff,0.125
&chdl=First|Second

Sunday, November 25, 2007

Adding value to sales data: willingness to use service again

Among other things this blog is about saving advantageous solutions for future use. After writing the previous posting Viewing sales data by using XY charts, it came into my mind to compose a more general solution how to add value to your sales data.

Let's start with the following sales table (same as before, products A-Z, etc):

CREATE TABLE salesdata (
product VARCHAR(1),
customerno INTEGER,
market_area INTEGER,
day DATE,
quantity INTEGER);

By using the salesdata table, we create a new table salesdata_prod_cust_pair that summarizes sales for each product-customer pair within a season.

CREATE TABLE salesdata_prod_cust_pair (
product VARCHAR(1),
customerno INTEGER,
market_area INTEGER,
first_day_overall DATE,
first_day_season DATE,
next_day DATE,
count_season INTEGER,
quantity_season INTEGER,
new_customer_flag INTEGER,
use_again_flag INTEGER,
use_again_days_between INTEGER);

The following SQL populates the table:

/* Explore season 2007-Q1 */
INSERT INTO salesdata_prod_cust_pair
SELECT s.product, s.customerno, s.market_area,
MIN(overall.min_day) overall_first_day,
MIN(s.day) seasonal_first_day,
MIN(seasonal_next.next_day) next_day,
COUNT(*) count_season,
SUM(s.quantity) quantity_season,
1-SIGN(DATEDIFF(MIN(s.day),MIN(overall.min_day))) new_customer_flag,
SIGN(DATEDIFF(MIN(seasonal_next.next_day),MIN(s.day))) use_again_flag,
DATEDIFF(MIN(seasonal_next.next_day),MIN(s.day)) use_again_days_between
FROM salesdata s

LEFT OUTER JOIN(
SELECT nxt.product, nxt.customerno, MIN(nxt.day) next_day
FROM salesdata nxt,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
WHERE day >= '2007-01-01'
GROUP BY product, customerno) q
WHERE nxt.customerno = q.customerno
AND nxt.product = q.product
AND nxt.day > q.min_day
GROUP BY nxt.product, nxt.customerno) seasonal_next
ON s.product = seasonal_next.product
AND s.customerno = seasonal_next.customerno,

(SELECT product, customerno, MIN(day) min_day
FROM salesdata
GROUP BY product, customerno) overall

WHERE s.product = overall.product
AND s.customerno = overall.customerno
AND s.day BETWEEN '2007-01-01' AND '2007-03-31'
GROUP BY s.product, s.customerno, s.market_area;

Please note that you must change red texts according to your database. DATEDIFF() returns days between two dates. 2007-01-01 and 2007-03-31 specify the season to explore; in this case 2007-Q1.

Here's some data:

As you can see:

Product A was sold to Customer 4 first time in May 2006, so C-4 is not a new customer during season 2007-Q1, therefore new_customer_flag = 0.
Product A was sold to Customer 13 first time on January 5th, 2007, so C-13 is a new customer during season 2007-Q1, thus new_customer_flag = 1. This customer used the product/service again on March 8th, so use_again_flag = 1 and use_again_days_between = 62.

OK. This was only the first part. By using the new salesdata_prod_cust_pair table, you can write in SQL:

SELECT product,
SUM(quantity_season) "Total quantity of items sold",
100*SUM(new_customer_flag)/COUNT(*) "% new customers in this season",
100*SUM(use_again_flag*new_customer_flag)/SUM(new_customer_flag) "% willingness to use product/service again",
AVG(use_again_days_between/new_customer_flag) "Average days before using
product/service again"
FROM salesdata_prod_cust_pair
GROUP BY product;

.. and get the following results:

See the last two columns! For example the willingness to use product/service again is an excellent indicator to monitor how much your customers trust your products.

By using an XY chart you can visualize the data:

Let's analyze:

There must be some problem in products U and B. Marketing might be ok (42% new customers), but once used the first time, only 10 % of customers are willing to use the service again.
The product C is in the other extreme. When customers use it, they probably love to use it again. Perhaps the teams responsible for products U and B can learn something from the team C?

I think I'm done with the sales data for now. Next time, something else.

Sunday, November 4, 2007

Viewing sales data by using XY charts

An XY chart (scatter plot) can express more than a typical bar or line chart. Here's an example how you can utilize typical sales data and create two interesting XY charts that help your decision making processes.

Suppose you have a typical sales table:

CREATE TABLE salesdata (
product VARCHAR(1),
customerno INTEGER,
market_area INTEGER,
day DATE,
quantity INTEGER);

.. that contains row-level sales data like this (product names A-Z, market areas 1-12):

By using the data, you usually create charts like this:

.. and for sure this chart is informative in many ways.

Now suppose you are interested in knowing more about products and market areas. For example: what is the average quantity of items sold in this season (Y), and how many of the customers bought the product for the first time (X).

You can get the data by using the two following SQL queries (this season = 2007):

/* XY for products */
SELECT s.product,
AVG(s.quantity) "Average quantity of items sold",
100*SUM(1-SIGN(ABS(s.day-q.min_day)))/COUNT(*) "% new customers in this season",
SUM(s.quantity) "Total quantity of items sold"
FROM salesdata s,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
GROUP BY product, customerno) q
WHERE s.product = q.product
AND s.customerno = q.customerno
AND s.day >= '2007-01-01'
GROUP BY s.product;

/* XY for market areas */
SELECT s.market_area,
AVG(s.quantity) "Average quantity of items sold",
100*SUM(1-SIGN(ABS(s.day-q.min_day)))/COUNT(*) "% new customers in this season",
SUM(s.quantity) "Total quantity sold in area"
FROM salesdata s,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
GROUP BY product, customerno) q
WHERE s.product = q.product
AND s.customerno = q.customerno
AND s.day >= '2007-01-01'
GROUP BY s.market_area;

Now the charts:

Let's analyze:

Products clearly have different profiles. Product F is favored by old customers, while products L and T have recently found many new customers. In a single sales transaction, product U is sold by average nearly twice as much as D.
Symbol size is the total quantity sold. There are no big differences in the sales of different products. More difference exist in various market areas: area 1 is the most important, while areas 9, 10, and 11 are the smallest. (I created the data by using a random function.)
All market areas have almost the same "average quantity of items sold". The market areas differ more in X axis: in area 1 only 18% are new customers, while in areas 10 and 12 almost half of the customers are new. Perhaps the company just started marketing in areas 10 and 12?

All this is interesting -- no less so because the basic line chart cannot express this knowledge.

Nevertheless it must be said that these charts don't yet tell anything about the prospects in different market areas. Adding facts about market area size (for each product) would help your decision making where to target your marketing in the next season.

Therefore, don't be satisfied with line and bar charts, if you can improve your decision making by using XY...