Sunday, May 13, 2007

XY chart and graph theory

Do you know graph theory? To me it's inspiring as such. In business intelligence, a graph (in the meaning of graph theory) provides a method to visualize relationships between entities such as between people, companies, geographical locations, etc.



Hierarchical systems, where each element of the system (except for the top element) is subordinate to a single other element, are graphs too. Various organization charts and genealogy charts (both in biology and as family trees) are good examples of such graphs.

XY chart is a good way to visualize graphs. To do it clearly, at least XY must be able to:


  • Draw data points (entities)
  • Draw lines between data points to visualize existing relationships
  • Label data points (entities)
In my previous posting I already wrote about the first two aspects. The third aspect, labeling a data point, is necessary to understand the meaning of nodes of a graph.

The following step-by-step example shows a method how you can automatically create a graph by using XY and two tables graph_node (entities) and graph_link (relationships).

1. Create the graph_node and graph_link tables.

CREATE TABLE graph_node (node_id VARCHAR(3), node_name VARCHAR(40), node_order INTEGER, node_count INTEGER);
CREATE TABLE graph_link (node1_id VARCHAR(3), node2_id VARCHAR(3));

2. Populate tables with countries of Europe and border neighboring relationships (or anything you wish).



Note that I have included the node_order and node_count columns in the graph_node table. They are necessary to calculate (x, y) locations of data points later on. In a real situation, you should calculate node_order and node_count automatically by using SQL (or perhaps your OLAP system is able to calculate them on the fly).

By the way, there are 43 countries in Europe and 80 borders between them (not including Vatican and Turkey as nodes; including RUS-LTU, RUS-POL, and GBR-ESP relationships because of separate areas of Kaliningrad and Gibraltar).

3. Create a graph_link_view based on the two tables:

CREATE VIEW graph_link_view (link_id, datapoint_id, node_id)
AS
SELECT CONCAT(node1_id,node2_id), 0, node1_id FROM graph_link
UNION
SELECT CONCAT(node1_id,node2_id), 1, node2_id FROM graph_link
UNION
SELECT CONCAT(node1_id,node2_id), -1, NULL FROM graph_link
UNION
SELECT node_id, 0, node_id FROM graph_node
WHERE node_id not in (select node1_id from graph_link)
AND node_id not in (select node2_id from graph_link)
UNION
SELECT node_id, -1, NULL FROM graph_node
WHERE node_id not in (select node1_id from graph_link)
AND node_id not in (select node2_id from graph_link);

This view is necessary because of the way the XY chart draws data points and lines between them (see my previous posting). The following image shows some contents of the view:



Each row in graph_link table produces 3 rows in graph_link_view with datapoint_id values -1, 0, and 1 (0=start line, 1=finish line, -1=break line). Some nodes such as Iceland does not have border relationships with other country nodes, so it produces 2 rows into view (-1, 0).

4. Use the following query behind your report.

SELECT link.link_id, link.datapoint_id, node.node_id, node.node_name, node.node_order, node.node_count
FROM graph_link_view link
LEFT OUTER JOIN graph_node node ON link.node_id = node.node_id;

5. Create the measure dimension with the followin three calculations:

X = MIN(SIN(360*node.node_order/node.node_count/180*PI()))
Y = MIN(COS(360*node.node_order/node.node_count/180*PI()))
Label = MIN(node.node_id)

6. Create other dimensions as follows:

Outer category = link.link_id
Middle category = link.datapoint_id
Inner category = node.node_name (=serves as a tool tip label)

7. Create an XY chart and format it as follows:



The XY chart displays countries as data points in circle and border neighboring relationships between them as lines. Iceland and Malta aren't connected (they are islands).

A few notes:

I created this XY chart with the Voyant tool. It displays the data point label to the right of the data point. Additionally, when you move the mouse cursor over a data point, I specified the chart to display the inner category dimension as a tool tip (ISL data point displays "Iceland").

I explored some other OLAP charting tools whether or not it's possible to create the same XY. All of them seem to fall short in displaying data point labels, such as Excel, BusinessObjects Desktop Intelligence and Web Intelligence. Why such a useful feature is not supported by them, I don't know?

About the XY chart itself, it's actually somewhat hard to read, because the lines cross so often. Therefore I created another chart (below) in which I used longitudes and latitudes for (x, y) coordinates. Much better!



And of course, there's no meaning in displaying neighbor country relationships as a graph, because a simple map of Europe will do.

Nevertheless, the method above is ok to visualize connections found within customer relationship databases. For example, supposing your CRM database contains information about meetings (when, who, on which project), you can use this data to visualize which people were involved together in certain projects at certain time period. An XY graph visualizes this much better than a table.

I will now put an end to this posting, though I have two more issues in my mind:


  • What to do if your graph is very large?
  • Hierarchical graphs?
Later...

UPDATE (May 14, 2007): I've forgotten two borders, one between Germany and Czech Republic and another between Switzerland and Austria.

UPDATE (May 18, 2007): I recently learned that there's software for fully automated graph layout, Mathematica 6. If this functionality existed in some OLAP tool, it would be nice.

No comments: