The following heat maps display two variables, the median age (years) and the unemployment rate (%), of countries by using colors on the world map (data source: CIA's World Factbook).
Sunday, April 6, 2008
Google's heat map and motion chart in comparison
The following heat maps display two variables, the median age (years) and the unemployment rate (%), of countries by using colors on the world map (data source: CIA's World Factbook).
Monday, March 24, 2008
Motion Chart provided by Google--an outstanding BI tool for free
Explanation: The XY chart above displays ten countries according to how much they collect taxes and amount of government spending (bubble size = population, bubble color = unemployment percentage). When you play years between 1970 and 1999, you see countries changing locations. By analyzing the motion it's clear that since 1970 big European countries (UK, Germany, France, Italy, Spain) have adopted policies that make them more alike (bubbles moving near each other from starting positions) while Sweden all the time stands out with high taxes and government spending. Japan seems to be the other extreme with low taxes and government spending. Between 1993 and 1997 Spain turns red with a high unemployment rate (over 20%).
I'm really astonished for several reasons:
- Motion chart as such is a very useful concept for tracking changes in time (it makes you understand the data, not just display it).
- I was able to build the chart above and publish it here in two hours, that is, very quickly as a first-time-user (+one hour more for searching useful data).
- Google motion chart works really well (try changing the chart above by using pop-up menus provided--it's a dynamic chart!).
- No else BI provider offers this kind of visualization tool.
- And it's available for free!
Here are the steps (I made) to publish the motion chart...
1. I searched data from web with keywords "time series data". The macroeconomic data I found here. I decided to limit my test in ten countries. I collected the text data into spreadsheet. One hour.
2. The data was not in the right format for the motion chart (see example here). I used several methods to clear and pivot the data (not only in spreadsheet) until I was ready. One hour.
3. I copy-pasted the data into a Google spreadsheet document (yes, you must start using Google Docs to create and publish your motion chart). I inserted the motion chart gadget and it worked immediately. I still changed the format several times, because I needed (wanted) to learn how motion chart uses spreadsheet data. One hour.
NOTE: If you want to use quarters, months, or weeks in the time dimension, convert them into dates--such as 2008-Q3 saved as "7/1/2008" or 3rd week of 2008 saved as "1/17/2008" (Thursday, the middle day of week). Dates must be in the US format (MM/DD/YYYY) and decimal numbers with a dot as the decimal separator.
4. From the top-right gadget menu, I chose Publish Gadget, and it gave me html to put into a web page. One minute.
Final thoughts... I'm not completely satisfied with the fact that in order to use the motion chart Google will host my (perhaps confidential) data. Nevertheless I'm going to explore the motion chart concept with my customers in the future, I'm sure.
Sunday, February 24, 2008
Google Chart API--why it's not important in BI?
Nice charts. As the data and axes are controlled separately, it's possible to compose:
- A chart where axis labels are qualitative--not quantitative--which is sometimes a preferred situation in an XY chart (good vs. poor etc)
- A line chart where the line changes it's value several times within a month--not easily done in Excel or other BI tools having a direct correspondence between data values and categories displayed in the X axis
- No. Google Chart API is a programmer tool to build larger systems.
OK. When such a larger system appears, should I be interested then?
- No again. All Google Charts are by nature static pictures (PNG files), without any means to add user interaction.
For example, if you move the mouse cursor onto a symbol (or click the symbol) in the XY chart above (or onto the line chart), nothing happens. Simply said there's no mechanism to display what the symbols represent (or what is the exact value displayed in the line chart). It's not possible to program drilling-down either. And all these features are essential in business intelligence...
Anyway, I had fun going through the Google Chart API. If you have a mind of a programmer, it's worth studying. Here are the urls which I used to create the XY chart and the line chart (broken into several lines for clarity).
- XY:
http: //chart.apis.google.com/chart?cht=s
&chd=s:984sttvuvkQIBLKNCAIi,DEJPgq0uov17zwopQODS,AFLPTXaflptx159gsDrn
&chxt=x,y
&chtt=Google+Chart+API+-+Sample+XY
&chxl=0:Good|Poor1:Small|Large
&chs=250x250
&chm=o,ff000033,1,1,25
&chg=50,50,2,5 - Line:
http: //chart.apis.google.com/chart?cht=lc
&chd=s:rpnpnosztwurrmmllgbZQOOORVURRSQROQSRSTKCBIKIKMLIIJIKLJKKNLNVWakiljhfedcbdcZbbZ, UVVUVVUUUVVUSSVVVXXYadfhjlmpokllmmliigdbbZZXVVUUUTU
&chco=5555ff,ff9933
&chtt=Google+Chart+API+-+Sample+Line
&chls=2.0,1.0,0.0
&chxt=x,x,y
&chxl=0:||Oct||Nov||Dec||Jan||Feb||Mar||Apr||May||1:||2007||||||2008||||||||||2:60|70|80|90|100
&chs=320x175
&chg=101,25,2,5
&chf=c,ls,0,f4f4ff,0.125,ffffff,0.125
&chdl=First|Second
Sunday, November 25, 2007
Adding value to sales data: willingness to use service again
Let's start with the following sales table (same as before, products A-Z, etc):
CREATE TABLE salesdata (
product VARCHAR(1),
customerno INTEGER,
market_area INTEGER,
day DATE,
quantity INTEGER);
By using the salesdata table, we create a new table salesdata_prod_cust_pair that summarizes sales for each product-customer pair within a season.
CREATE TABLE salesdata_prod_cust_pair (
product VARCHAR(1),
customerno INTEGER,
market_area INTEGER,
first_day_overall DATE,
first_day_season DATE,
next_day DATE,
count_season INTEGER,
quantity_season INTEGER,
new_customer_flag INTEGER,
use_again_flag INTEGER,
use_again_days_between INTEGER);
The following SQL populates the table:
/* Explore season 2007-Q1 */
INSERT INTO salesdata_prod_cust_pair
SELECT s.product, s.customerno, s.market_area,
MIN(overall.min_day) overall_first_day,
MIN(s.day) seasonal_first_day,
MIN(seasonal_next.next_day) next_day,
COUNT(*) count_season,
SUM(s.quantity) quantity_season,
1-SIGN(DATEDIFF(MIN(s.day),MIN(overall.min_day))) new_customer_flag,
SIGN(DATEDIFF(MIN(seasonal_next.next_day),MIN(s.day))) use_again_flag,
DATEDIFF(MIN(seasonal_next.next_day),MIN(s.day)) use_again_days_between
FROM salesdata s
LEFT OUTER JOIN(
SELECT nxt.product, nxt.customerno, MIN(nxt.day) next_day
FROM salesdata nxt,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
WHERE day >= '2007-01-01'
GROUP BY product, customerno) q
WHERE nxt.customerno = q.customerno
AND nxt.product = q.product
AND nxt.day > q.min_day
GROUP BY nxt.product, nxt.customerno) seasonal_next
ON s.product = seasonal_next.product
AND s.customerno = seasonal_next.customerno,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
GROUP BY product, customerno) overall
WHERE s.product = overall.product
AND s.customerno = overall.customerno
AND s.day BETWEEN '2007-01-01' AND '2007-03-31'
GROUP BY s.product, s.customerno, s.market_area;
Please note that you must change red texts according to your database. DATEDIFF() returns days between two dates. 2007-01-01 and 2007-03-31 specify the season to explore; in this case 2007-Q1.
Here's some data:
As you can see:
- Product A was sold to Customer 4 first time in May 2006, so C-4 is not a new customer during season 2007-Q1, therefore new_customer_flag = 0.
- Product A was sold to Customer 13 first time on January 5th, 2007, so C-13 is a new customer during season 2007-Q1, thus new_customer_flag = 1. This customer used the product/service again on March 8th, so use_again_flag = 1 and use_again_days_between = 62.
OK. This was only the first part. By using the new salesdata_prod_cust_pair table, you can write in SQL:
SELECT product,
SUM(quantity_season) "Total quantity of items sold",
100*SUM(new_customer_flag)/COUNT(*) "% new customers in this season",
100*SUM(use_again_flag*new_customer_flag)/SUM(new_customer_flag) "% willingness to use product/service again",
AVG(use_again_days_between/new_customer_flag) "Average days before using
product/service again"
FROM salesdata_prod_cust_pair
GROUP BY product;
.. and get the following results:
See the last two columns! For example the willingness to use product/service again is an excellent indicator to monitor how much your customers trust your products.
By using an XY chart you can visualize the data:
Let's analyze:
- There must be some problem in products U and B. Marketing might be ok (42% new customers), but once used the first time, only 10 % of customers are willing to use the service again.
- The product C is in the other extreme. When customers use it, they probably love to use it again. Perhaps the teams responsible for products U and B can learn something from the team C?
I think I'm done with the sales data for now. Next time, something else.
Sunday, November 4, 2007
Viewing sales data by using XY charts
Suppose you have a typical sales table:
CREATE TABLE salesdata (
product VARCHAR(1),
customerno INTEGER,
market_area INTEGER,
day DATE,
quantity INTEGER);
.. that contains row-level sales data like this (product names A-Z, market areas 1-12):
By using the data, you usually create charts like this:
.. and for sure this chart is informative in many ways.
Now suppose you are interested in knowing more about products and market areas. For example: what is the average quantity of items sold in this season (Y), and how many of the customers bought the product for the first time (X).
You can get the data by using the two following SQL queries (this season = 2007):
/* XY for products */
SELECT s.product,
AVG(s.quantity) "Average quantity of items sold",
100*SUM(1-SIGN(ABS(s.day-q.min_day)))/COUNT(*) "% new customers in this season",
SUM(s.quantity) "Total quantity of items sold"
FROM salesdata s,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
GROUP BY product, customerno) q
WHERE s.product = q.product
AND s.customerno = q.customerno
AND s.day >= '2007-01-01'
GROUP BY s.product;
/* XY for market areas */
SELECT s.market_area,
AVG(s.quantity) "Average quantity of items sold",
100*SUM(1-SIGN(ABS(s.day-q.min_day)))/COUNT(*) "% new customers in this season",
SUM(s.quantity) "Total quantity sold in area"
FROM salesdata s,
(SELECT product, customerno, MIN(day) min_day
FROM salesdata
GROUP BY product, customerno) q
WHERE s.product = q.product
AND s.customerno = q.customerno
AND s.day >= '2007-01-01'
GROUP BY s.market_area;
Now the charts:
Let's analyze:
- Products clearly have different profiles. Product F is favored by old customers, while products L and T have recently found many new customers. In a single sales transaction, product U is sold by average nearly twice as much as D.
- Symbol size is the total quantity sold. There are no big differences in the sales of different products. More difference exist in various market areas: area 1 is the most important, while areas 9, 10, and 11 are the smallest. (I created the data by using a random function.)
- All market areas have almost the same "average quantity of items sold". The market areas differ more in X axis: in area 1 only 18% are new customers, while in areas 10 and 12 almost half of the customers are new. Perhaps the company just started marketing in areas 10 and 12?
All this is interesting -- no less so because the basic line chart cannot express this knowledge.
Nevertheless it must be said that these charts don't yet tell anything about the prospects in different market areas. Adding facts about market area size (for each product) would help your decision making where to target your marketing in the next season.
Therefore, don't be satisfied with line and bar charts, if you can improve your decision making by using XY...