07 July 2022

Same graph, different narratives

People who make data visualizations often talk about “storytelling with data visualizations.” This is something that I think can be hard for people to wrap their heads around. Let me try an example.

Here is a run-of-the-mill scatterplot.

Scatterplot with generally increasing Y values as X values increase.

There are at least four different points you can make with that data. Probably more, but I’m just going to limit myself to the obvious ones.

First, you might be most interested in communicating the trend.

Scatterplot with generally increasing Y values as X values increase. A regression line shows the increase. Text in the graph reads, "Overall growth."

But it’s also possible that you want to make people aware of the variation. In that case you would want to remove or minimize the trendline. If this were a timeline or other continuous record, you might join the dots.

Scatterplot with generally increasing Y values as X values increase. The individual data points are joined by a line. Text in the graph reads, "Substantial fluctuation."

Or it might be that the key point of the graph is even more focused on a small number of data points. In many cases, the most extreme data points are of interest.

Scatterplot with generally increasing Y values as X values increase. Arrow points to largest Y value and text reads, "Record high."

Low extremes can also be interesting, and annotations help contextualize what the value means.

Scatterplot with generally increasing Y values as X values increase. Arrow points to smallest Y value and text reads, "Humble beginnings."
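
If you build graphs in code, each of those four stories comes down to a small calculation whose result you then draw or label. Here is a rough sketch in Python, assuming the data are the first set of Anscombe's quartet. The function names `trendline` and `narratives` are made up for illustration and do not come from any plotting library.

```python
from statistics import mean

# First dataset of Anscombe's quartet (assumed to be the data in the graph).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def trendline(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def narratives(xs, ys):
    """Map each annotation text to the piece of the data it highlights."""
    points = list(zip(xs, ys))
    return {
        # The regression line to draw over the points.
        "Overall growth": trendline(xs, ys),
        # Points ordered by x, ready to be joined with a line.
        "Substantial fluctuation": sorted(points),
        # The single point with the largest y value.
        "Record high": max(points, key=lambda p: p[1]),
        # The single point with the smallest y value.
        "Humble beginnings": min(points, key=lambda p: p[1]),
    }


stories = narratives(x, y)
```

The dots never change. Only the entry you choose to draw and label does, and that choice is the story.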

A graph is always intended to persuade, so why not make it easier for a viewer to see the same thing you see?

P.S. If the graph looks familiar, it’s because it’s the first in Anscombe’s quartet.

