Should either the area or the length of a square be proportional to the data that is being visualized?
I'm making a data visualization. Each datum is represented by a square. To make the underlying data intuitively legible, should the length of each square's side or the area of each square be proportional to the datum it represents?
More posts by @Gonzalez368
5 Comments
We're not as good at judging differences in area as we are in length. We use length as a proxy and therefore tend to underestimate differences in areas.
For this reason, a circle that actually has 2x the area of another appears too small, because our brain is comparing their radii, which differ only by a factor of about 1.4 (the square root of 2).
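To make that arithmetic concrete, here is a minimal Python sketch. The same square-root relationship applies to the side of a square as to the radius of a circle. The function name `side_for_area` and the `scale` parameter are illustrative only and do not come from any library.

```python
import math

def side_for_area(value, scale=1.0):
    """Return the side of a square whose area equals value * scale.

    Hypothetical helper: 'scale' converts data units to drawing units.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return math.sqrt(value * scale)

small = side_for_area(1)
large = side_for_area(2)

# The area doubles, but the side grows only by the square root of 2.
ratio = large / small
print(f"side ratio for a 2x area: {ratio:.3f}")
```

If the side itself were made proportional to the value instead, doubling the value would quadruple the area.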
There are interesting attempts at reconciling this phenomenon, such as Proportional Symbol Mapping in R, which proposes perceptual scaling of symbols to more closely align with how we judge lengths and areas.
Here is Fig. 2 from this paper
Personally I don't have any experience with this and avoid using areas if quantitative judgements are required.
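For readers who want to experiment with perceptual scaling, the following is a rough sketch rather than the paper's method. It assumes the commonly cited compensation for circles, usually attributed to Flannery, in which the radius grows with the value raised to roughly 0.57 instead of the geometric 0.5. The exponent, the function name `perceptual_radius`, and its parameters are all assumptions made for illustration.

```python
def perceptual_radius(value, ref_value, ref_radius, exponent=0.57):
    """Scale a circle radius relative to a reference symbol.

    exponent=0.5 gives true area proportionality; a slightly larger
    exponent (0.57 is assumed here) exaggerates bigger symbols to
    offset the tendency to underestimate areas.
    """
    if value < 0 or ref_value <= 0:
        raise ValueError("value must be >= 0 and ref_value must be > 0")
    return ref_radius * (value / ref_value) ** exponent

# A value twice the reference, with a reference radius of 10 units.
geometric = perceptual_radius(2, 1, 10, exponent=0.5)
compensated = perceptual_radius(2, 1, 10)
print(f"geometric: {geometric:.2f}, compensated: {compensated:.2f}")
```

Note that a compensated symbol is no longer strictly proportional in area, so this trades measured accuracy for perceived accuracy.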
An interesting tangent is the relationship between perception of volume and length. The difference in how we perceive these is even more striking. This can be illustrated in this video of star size comparisons.
By the time you get to the largest star, which is about 1,700x the diameter of the sun, you are left with the impression that it is much larger than 1,700x.
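The arithmetic behind that impression is easy to check, assuming both stars are treated as simple spheres and taking the 1,700x figure above at face value:

```python
diameter_ratio = 1700

# A disc seen face-on scales with the square of the diameter,
# and a sphere's volume scales with the cube.
area_ratio = diameter_ratio ** 2
volume_ratio = diameter_ratio ** 3

print(f"apparent disc area: {area_ratio:,}x")
print(f"volume: {volume_ratio:,}x")
```

So a viewer who reads the picture as area or volume really is looking at something millions or billions of times larger, not 1,700 times.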
For a more systematic look at our error in perceiving differences in areas and lengths, see Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design by Jeffrey Heer and Michael Bostock.
Tufte dealt with this extensively. See:
The Visual Display of Quantitative Information,
Envisioning Information and others.
Some principles of graphical integrity:
The representation of numbers, as physically measured on the surface of the graph itself, should be directly proportional to the numerical quantities represented.
Clear, detailed and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graph itself. Label important events in the data.
Show data variation, not design variation.
In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.
The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
Graphics must not quote data out of context.
In your case you have to ask yourself if the data is better represented by a 2D or 3D image or a line. A cube, a square, and a line are not the same. That is one of the reasons why 3D bar charts are so often misleading.
If you, the creator, are unsure, how will the reader know which it is?
Short answer: the value should be linked 1:1 to the amount of colour on the page. So in your example, it should be area. But there's more than that: you also need to avoid misleading cues that might make a reader read it incorrectly, and you need to know why you're using area instead of length (e.g. bar charts), because it has real pros and cons.
First, never have both length and width (i.e. area) of a shape change when actually the variable is only linked to the length of one side. If X is double Y but X has four times as much colour on the page, you're misleading your readers. This sort of distortion is sometimes referred to as a "lie factor", and is often assumed to be a deliberate attempt to mislead and exaggerate differences.
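As a rough illustration, the lie factor can be computed as the size of the effect shown in the graphic divided by the size of the effect in the data, where each effect is the relative change between two values. This follows Tufte's definition as I understand it. The function names below are made up for the example.

```python
def effect_size(first, second):
    """Relative change from first to second."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Effect shown in the graphic divided by effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)

# The value doubles (1 -> 2), but both sides of the square were doubled,
# so the inked area goes from 1 to 4.
distorted = lie_factor(1, 2, 1, 4)

# With area proportional to the value, the inked area goes from 1 to 2.
honest = lie_factor(1, 2, 1, 2)

print(distorted, honest)
```

A lie factor of 1 means the graphic shows the same relative change as the data. Anything well above 1 exaggerates it.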
If you use area as a measure I'd strongly recommend:
Knowing why you're using area. By using area instead of a linear dimension like length, you:
Sacrifice the ability to clearly see differences mathematically (you can't easily say "look, that's double the other one")
Invite your readers to view it in an intuitive, everyday, non-numerical way, the way people compare, for example, the sizes of pies in a shop. Less sophisticated, but more immediate. More gut, less head.
Small differences between very similar numbers become almost invisible.
When one variable is many many times smaller than another, the very small one doesn't disappear as badly as it would in a bar chart, which can allow more flexibility in layouts.
Consider using circles for area, not squares, centre aligned:
Circles because it doesn't invite confusion with bar charts and similar. Height and width are less to the fore: it looks less like you're inviting a height or width based comparison.
Centre-aligned because it doesn't invite people to compare heights
For example, above, it's hard not to see the square labelled "5" as being three quarters the height of the square labelled "10", so it's potentially misleading.
The circles don't invite this sort of comparison: it's more of a gut-level, instant "That blob is rather a lot bigger than the next blob".
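The trade-offs listed earlier can be put into numbers with a small sketch. It assumes squares whose areas are proportional to the values and compares the visible side ratio with the ratio a bar chart would show. The helper name `side_ratio` and the sample values are only for illustration.

```python
import math

def side_ratio(a, b):
    """Ratio of square sides when the areas are proportional to a and b."""
    if a < 0 or b <= 0:
        raise ValueError("a must be >= 0 and b must be > 0")
    return math.sqrt(a / b)

# Similar values: a 5% difference in a bar chart shrinks to about 2.5% in side length.
close_linear = 105 / 100
close_side = side_ratio(105, 100)

# Very different values: a 1000:1 bar ratio becomes roughly 32:1 in side length,
# so the small square stays visible.
wide_linear = 1000 / 1
wide_side = side_ratio(1000, 1)

# The "5" and "10" squares: the smaller one is about 71% as tall, not 50%.
half_side = side_ratio(5, 10)

print(f"{close_side:.4f} {wide_side:.2f} {half_side:.3f}")
```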
There's a variety of evidence from user testing to small-scale studies (will try to hunt some examples down later) that this sort of intuitive area-based comparison can be more engaging, can lower the barrier to entry to less engaged audiences, and can help keep the reader's focus on the subject matter rather than the cold minutiae of the numbers. But this comes at the cost of getting in the way of more numerically-minded analysis.
Don't choose between one-dimensional (length or distance) and two-dimensional (area) encodings for aesthetic reasons: choose between them based on your audience and message.
Which is more appropriate for the communication: instant gut-level comparisons at the level of "that's much bigger", or more considered numerical comparisons at the level of "that's about 80% of the other one"?
Or are there practical reasons why you need to use area?
Then, when you've chosen for practical reasons, apply aesthetics.
I'd say the area. Optically, a square with a side two times as long shows as an area 4 times as big. Casual observers will relate to the area, even without reading your legend.
A nice example is this legendary graph by xkcd's Randall Munroe:
(huge, legible version)
In my opinion the area (D), not each side (E).
If you scale the side instead, a side of length 2 gives an area 4 times as large, and you would have a very overlapped graph (E).
When you have a normal bar graph (A), the data is linear, and the width of the bar is just for aesthetics (B).
In those cases the area is again representative of the data, because the width of the bars is the same. You can have a 3D bar and still the volume of the bar is what represents the data (C).