The ComparisonMatrix component… or Playboy Playmates and the Economy

One of the challenges of information visualization is to present data in a way that allows the interesting patterns to rise above the noise.  The ComparisonMatrix allows users to quickly assess where these patterns are so they can make an informed decision about where to start exploring.  To do this, the ComparisonMatrix breaks a multidimensional data set into all of the possible pairs of attributes and evaluates the relationships between those pairs.  The relationships are rendered in a grid where the cell at the intersection of the column labeled “X” and the row labeled “Y” shows you the relationship between X and Y.  The cells are colored to provide a quick visual cue as to which relationships are most interesting.  This is similar to the rank-by-feature prism found in HCE.
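To make the pairing idea concrete, here is a rough sketch in Python rather than the component's own ActionScript.  It is not the ComparisonMatrix source: the function names (pearson, comparison_matrix) and the toy rows are made up for illustration, and it assumes the comparison function is symmetric, which is true of correlation but not of every measure you might plug in.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant column is reported as "no relationship".
        return 0.0
    return cov / sqrt(var_x * var_y)


def comparison_matrix(data_provider, fields, compare=pearson):
    """Evaluate compare() for every pair of fields.

    data_provider is a list of dicts (one per record) and fields is the
    list of attribute names to consider.  The result maps (row, column)
    to the comparison value; both orderings are stored because the
    comparison is assumed to be symmetric.
    """
    matrix = {}
    for a, b in combinations(fields, 2):
        xs = [row[a] for row in data_provider]
        ys = [row[b] for row in data_provider]
        value = compare(xs, ys)
        matrix[(a, b)] = value
        matrix[(b, a)] = value
    return matrix


# Toy records, invented purely for illustration.
rows = [
    {"x": 1, "y": 2, "z": 4},
    {"x": 2, "y": 4, "z": 3},
    {"x": 3, "y": 6, "z": 2},
    {"x": 4, "y": 8, "z": 1},
]
matrix = comparison_matrix(rows, ["x", "y", "z"])
```

With three fields there are three distinct pairs, so the grid has six off-diagonal cells.  Each cell value would then drive the color of that cell.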

How about an example?

In 2004 Terry F. Pettijohn and Brian J. Jungeberg looked at the relationships between the physical features of Playboy’s playmates of the year and the economy.  I grabbed the csv for the playmate data off of Flowing Data, added a few fields to it, and dropped it into a ComparisonMatrix (view source enabled).  Pink represents a positive correlation while yellow represents negative.  The deeper the color, the stronger the relationship.  You can click on a cell in the matrix to bring up a scatterplot of the data set based on the two selected attributes.

The strongest relationship in the playmate data set is actually between GDP and year, which makes it difficult to make any assertion about trends in the playmates’ physical features and the economy without making the same assertion about those physical features and time.  (The authors of the paper actually used something called the “General Hard Times Measure” to evaluate the economy, and it looks like GDP isn’t really equivalent.)  Despite this problem, the ComparisonMatrix still provides some interesting information about the playmates.

  • The playmates’ heights, weights, and waists are all correlated.  These women have roughly the same figures, just scaled up and out differently.
  • The playmates’ bust to waist ratios have decreased with time while their waist to hip ratios have increased.
  • BMI, though a function of height and weight, is more strongly correlated with hip size than with height.
  • There is no relationship between the age at which these women claimed their titles and their hip sizes.  That’s right, if you have nice hips, it might not be too late to be a playmate.

Using the ComparisonMatrix

Like all my stuff, the ComparisonMatrix is under the MIT license, so you can use and adapt it in whatever way you like.  The code is documented pretty thoroughly, and the source of the playmate example is pretty useful for getting started.

In terms of usage, all you need to do is supply the ComparisonMatrix with the data set you want to analyze (the dataProvider property) and the array of attributes it should consider (the fields property).  Given those two properties, the ComparisonMatrix will be able to render your data, though there are a bunch of other options available.  Take a look at the documentation in the code for details.

The default comparison function is the correlation coefficient.  As a result, the default behavior is to expect comparison values from -1 to 1.  The ComparisonMatrix allows you to plug in your own comparison function, but if that function returns values on a different scale, you’ll have to provide your own color and alpha functions as well.
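Here is a minimal sketch of what a color function and an alpha function for that -1 to 1 scale might look like, again in Python rather than ActionScript.  The names color_for and alpha_for, the two hex values, and the lo/hi parameters are my own assumptions for illustration, not the signatures the component expects.

```python
# Hypothetical hex values; the component's actual pink and yellow may differ.
PINK = 0xFF69B4
YELLOW = 0xFFD700


def color_for(value):
    """Pink for a positive relationship, yellow for a negative one."""
    return PINK if value >= 0 else YELLOW


def alpha_for(value, lo=-1.0, hi=1.0):
    """Map the strength of a relationship to an opacity in [0, 1].

    lo and hi describe the range the comparison function is expected to
    return.  The defaults match the correlation coefficient; a comparison
    function with a different scale would pass its own bounds.
    """
    largest = max(abs(lo), abs(hi))
    if largest == 0:
        return 0.0
    return min(1.0, abs(value) / largest)


# Default scale: a correlation of -0.5 gets a half-opaque yellow cell.
cell_color = color_for(-0.5)
cell_alpha = alpha_for(-0.5)

# A made-up comparison function that scores pairs from 0 to 100 would
# need its own bounds to get the same visual depth.
scaled_alpha = alpha_for(50, lo=0, hi=100)
```

Splitting sign (color) from magnitude (alpha) is what gives the “deeper color, stronger relationship” effect, and only the bounds need to change for a new scale.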

I made an effort to make the renderers (the actual cells in the grid) as flexible as possible so you can write your own without rewriting the ComparisonMatrix itself.  To do this, write a class that implements the IComparisonRenderer interface and set the comparisonRenderer property to be a factory for that class.

The next step

Statisticians and information visualization researchers have developed a slew of measures for determining interesting relationships between attributes in a data set.  Even though the ComparisonMatrix allows you to plug in whatever comparison function you want, I’ll probably build these in at some point for easy access.  As I make updates to this component, I’ll be committing them to my repository.  As a result, the example here will not always be the most up to date.  For the most up to date code, check here.

I have lofty dreams for this thing.  More later.

UPDATE (January 21, 2009)

I wasn’t kidding.  The ComparisonMatrix has changed a bit.  If you want the best matrix you can get, grab it from the repo, not the source of the swf.

This entry was posted in Flex, Information Visualization.

2 Comments

  1. Posted April 20, 2009 at 3:25 pm

    This is very interesting. How does this data structure compare to multiple OLAP queries? Have you done any work with creating the comparison values at runtime for the Matrix? Thanks.

  2. Posted April 21, 2009 at 4:22 am

    The data is represented as a simple array of objects, each representing a playmate. Whenever the data changes (which only happens once in this example), the ComparisonMatrix runs through the entire data set and re-evaluates the comparison values. For a relatively small data set, this can be done in the span of a single frame.