03 May 2021

Never use a graph you can’t explain


Box plot on left, violin plot on right showing same data.

Irena Chelyseva asked if people preferred box plots or violin plots. She said, “I guess there is no right answer.”

I think there is, if not a right answer, a justifiable answer for why some people should use one or the other.

I can explain absolutely every element in my box plots. Usually, the center line is the median, the box is 50% of the data, the whiskers are 95% of the data, and I have symbols for the minimum and maximum.
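As a rough illustration, the elements listed above can be computed in a few lines. This is a minimal sketch, not the code behind the figure. It assumes that "50% of the data" means the 25th to 75th percentiles and that "95% of the data" means the 2.5th to 97.5th percentiles, and it uses linear interpolation between data points. Plotting packages may use other conventions. The function names `percentile` and `box_plot_summary` are made up for this example.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (p from 0 to 100) of a sorted list."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def box_plot_summary(values):
    """Return the numbers behind each element of the box plot described above."""
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    return {
        "minimum": data[0],                        # symbol below the lower whisker
        "whisker_low": percentile(data, 2.5),      # assumption: central 95% of the data
        "box_low": percentile(data, 25),           # assumption: central 50% of the data
        "median": percentile(data, 50),            # center line
        "box_high": percentile(data, 75),
        "whisker_high": percentile(data, 97.5),
        "maximum": data[-1],                       # symbol above the upper whisker
    }


# Example with the whole numbers 1 through 101.
summary = box_plot_summary(range(1, 102))
for name, value in summary.items():
    print(f"{name}: {value}")
```

Writing the definitions down like this is also a handy source for the figure legend, since every element in the plot maps to one named number.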

There is no accepted standard for box plot displays, however, particularly for the whiskers. People often fail to label their graphs well enough for me to interpret them.

So I frequently have to ask people, “What are the whiskers in your box plot showing?” (Or bar graph, but people are usually a bit better at labeling them.) And I am surprised by how often they can’t tell me. They’ve forgotten.

I recently asked, and the presenter said, “I think they’re quartiles.” But whiskers that showed the outer quartiles would cover the entire range of the data, and their plot had data points past the ends of the whiskers.

This makes for an uncomfortable moment when you’re presenting a poster.

Getting back to the original question, this is why there is at least one good justification for preferring one chart over the other. 

I can’t explain what a violin plot shows. I know in principle that the curve shows an estimate of the distribution. But I can’t tell you how the curve in a violin plot is calculated or derived. Because it’s an estimate, I suspect there are different methods of estimating a distribution, and I don’t know how they differ.
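To make that suspicion concrete, here is a sketch of one common approach to drawing this kind of curve, kernel density estimation with a Gaussian kernel. It is an assumption that any particular plotting package works this way, and the function name `make_density_estimate`, the bandwidth values, and the sample data are all invented for illustration. The only point is that a single setting, the bandwidth, changes the shape of the curve for the same data.

```python
import math


def make_density_estimate(values, bandwidth):
    """Return a function giving a Gaussian kernel density estimate at a point x."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    values = list(values)
    # Normalizing constant so the area under the curve is 1.
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Each data point contributes a small bell curve centered on itself.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density


# Made-up data with two clusters and a gap between them.
data = [1.0, 1.2, 1.9, 2.1, 5.0, 5.3]

narrow = make_density_estimate(data, bandwidth=0.2)  # follows the data closely
wide = make_density_estimate(data, bandwidth=2.0)    # smooths over the gap

# Compare the two curves in the gap between the clusters.
print(narrow(3.5), wide(3.5))
```

With the narrow bandwidth the curve drops to almost nothing in the gap, and with the wide bandwidth it stays well above zero there. Two violin plots of the same data could look quite different depending on a choice like this, which is exactly the kind of detail that needs to be known and labeled.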

I should use box plots instead of violin plots, regardless of the data, because I know how to explain what one shows but not the other.

It doesn’t matter if one graph shows something better in theory if I can’t communicate exactly what is being shown.

