Why should I ever use Unicode’s special characters for Roman numerals?

@Reiling762

Posted in: #BestPractice #Fonts #Typesetting

This is to answer a question which arose in the comments on this question on the Unicode characters for Roman numerals:


Why is this necessary or preferred over the usual way of typing ai, ai-ai, ai-ai-ai, vee-ai, etc.?


To start from the beginning: Unicode’s Number Forms block contains code points for Roman numerals (U+2160 – U+217F) that at first glance look very similar to standard Latin letters or combinations thereof. For example, U+2165 (Roman Numeral Six) looks very much like VI (Latin Capital Letter V followed by Latin Capital Letter I).

Thus, the question arises why one should not simply use the latter to represent those numerals and, e.g., type Louis VII instead of Louis Ⅶ. Obviously, avoiding special characters also avoids compatibility issues with fonts that do not support them. But even if I know that the text will be rendered with a font that does support these characters, why should I bother using them?
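To make the two spellings concrete, here is a small sketch using Python's standard `unicodedata` module. The variable names are only illustrative; the point is that the two strings look alike but are different sequences of code points, and that Unicode's compatibility normalization (NFKC) maps the single character back to plain letters.

```python
import unicodedata

latin = "Louis VII"        # V, I, I typed as three Latin capital letters
numeral = "Louis \u2166"   # U+2166 ROMAN NUMERAL SEVEN, a single character

print(unicodedata.name("\u2166"))   # ROMAN NUMERAL SEVEN
print(len(latin), len(numeral))     # 9 7
print(latin == numeral)             # False

# The compatibility decomposition maps the single character back to letters.
print(unicodedata.normalize("NFKC", numeral) == latin)  # True
```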


3 Comments


@Gail6891361

In terms of appearance there may not be much of a difference. So if you publish only printed material, it makes no difference, except in some fonts, as Wrzlprmft points out in his excellent answer.

Semantics are important

The semantic difference is huge. Using the Roman numeral characters makes it unambiguous that you mean the number 5 and not the letter V. Sure, they look the same, but they mean different things. With that information, a search engine would have a better chance of finding "XX mark V" when you search for "XX version 5".
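The difference is visible in the character properties that software can query. The following is a minimal sketch in Python (the helper name `numeric_value` is made up for this illustration): the Latin letter is an uppercase letter with no numeric value, while the Roman numeral character is classified as a letter-number and carries the value 5.

```python
import unicodedata

def numeric_value(ch: str):
    """Return the Unicode numeric value of `ch`, or None if it has none."""
    return unicodedata.numeric(ch, None)

for ch in ("V", "\u2164"):  # Latin capital V, then U+2164 ROMAN NUMERAL FIVE
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),  # general category: Lu (letter) vs. Nl (letter number)
        ch.isnumeric(),            # whether the character has a numeric value
        numeric_value(ch),         # that value, or None
    )
```

Whether a given search engine actually makes use of these properties is a separate question; the sketch only shows that the information is there to be used.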

In fact, some things work badly precisely because we do not embed semantic information. The world would indeed be a better place if we did. Using the semantically correct character is much like using styles in a word processor instead of formatting by hand: there is little difference for the human reader, but a lot of power for automation.

Fonts should draw Roman numerals differently

Font makers rarely give these characters special treatment because they are so seldom used. But if you use them, a font could draw Roman numerals with the slab-like bars that set them apart from the surrounding text. The feature is under-utilized because the usage is rare. Fonts don't implement everything, nor should they. But by using these characters, you benefit wherever such designs are present.

Conclusion

This is certainly a chicken-and-egg problem. If people do not use the special character ranges, no special allowances will be made for those ranges. Fonts won't support specially styled Roman numerals, because doing so would waste effort on a feature no one uses. The same applies to searching: if nobody uses the Roman numeral characters, no search engine will look for them, and the semantics are lost. Semantics suffer when the semantically correct characters are not adopted. The same certainly applies to a wider range of Unicode characters as well.

As for input complexity: yes, most users cannot type extended characters, but that's no excuse for a knowledgeable person to skip doing so where it makes sense. If nobody makes things better, no progress will ever be made. Hell, even Word has a mode for writing alpha by typing \alpha. So there's really no reason why there could not be an easy way of tagging numerals, or even of auto-suggesting them as such. Again, if nobody does this, it will never see more widespread adoption.


@Cooney243

TL;DR: The Unicode Consortium recommends using the Latin letters where possible and not the numeral characters, which were included for compatibility with East Asian typography.

The full story (with justification of the above assertion):

Unless you are doing East Asian typography, using the (non-archaic) Roman numeral characters from Unicode (U+2160 – U+217F) is a hack.

These characters were included for compatibility with pre-Unicode East Asian standards. They stay upright when East Asian text is typeset from top to bottom, whereas text in Latin characters (e.g. names) is usually rotated sideways in this context.

To quote the latest version of the Unicode Standard (v 7.0, chap. 22, p. 20):


Roman Numerals. For most purposes, it is preferable to compose the Roman numerals from sequences of the appropriate Latin letters. However, the uppercase and lowercase variants of the Roman numerals through 12, plus L, C, D, and M, have been encoded in the Number Forms block (U+2150..U+218F) for compatibility with East Asian standards. Unlike sequences of Latin letters, these symbols remain upright in vertical layout. Additionally, in certain locales, compact date formats use Roman numerals for the month, but may expect the use of a single character.
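The set of encoded values described in this passage can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper `describe` is a name made up for this illustration.

```python
import unicodedata

def describe(cp: int) -> tuple:
    """Return (code point, name, compatibility form, numeric value) for `cp`."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.normalize("NFKC", ch),  # the equivalent Latin letter sequence
        int(unicodedata.numeric(ch)),       # the value the character stands for
    )

upper = [describe(cp) for cp in range(0x2160, 0x2170)]  # uppercase forms
lower = [describe(cp) for cp in range(0x2170, 0x2180)]  # lowercase forms

for row in upper + lower:
    print(*row)
```

Each of the two runs covers the values 1 through 12 followed by 50, 100, 500 and 1000, matching "through 12, plus L, C, D, and M" in the quote.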


So, in theory, the distinction between Roman numerals and letters is a matter of rich text, like italics, a font change, or optional ligatures. That said, as @Wrzlprmft shows, some fonts use these characters to avoid a font change for each Roman numeral while maintaining good typography.

The existence of a character for XII but not for XIII implies that there are several different encodings of the same numeral, which leads to difficulties in text search. If you write about Louis XII and Louis XIII, you will probably write XIII as X+I+I+I, but will you write XII as a single character? Or as X+I+I, for a display consistent with XIII? There is no single good answer to this question when using the Roman numeral characters, and that’s why the Unicode Consortium recommends using the Latin letters when possible and not the numerals.
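The search problem is easy to reproduce. Below is a minimal sketch in Python, assuming a plain substring search; `fold_numerals` is a hypothetical helper, not part of any library. One common workaround is to apply compatibility normalization (NFKC) to both the text and the query before comparing.

```python
import unicodedata

def fold_numerals(text: str) -> str:
    """Fold compatibility characters (Roman numerals included) to plain letters."""
    return unicodedata.normalize("NFKC", text)

text = "Louis \u216B"   # XII written as the single character U+216B
query = "Louis XII"     # XII typed as X + I + I

print(query in text)                                # False: the encodings differ
print(fold_numerals(query) in fold_numerals(text))  # True after normalization
```

Note that NFKC also folds other compatibility characters (ligatures, full-width forms and so on), so this is a blunt tool, and it discards exactly the distinction the special characters were meant to carry.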

Edit: added the TL;DR assertion at the beginning


@Hamaas979

In many fonts you will indeed find hardly any difference between using the Unicode characters for Roman numerals and just composing them from standard Latin letters. For example, the following shows Louis VII (top) and Louis Ⅶ (bottom, using code points for Roman numerals) rendered with FreeSans:



Apart from a tiny difference in spacing, which was probably not intentional, the output is identical.

Here is the same text rendered with DejaVu Sans:



While the characters still look identical, there is a considerable difference in spacing. It may be a matter of taste whether the latter is preferable for Roman numerals, but it certainly wouldn’t be a good choice of kerning for regular all-caps.

Linux Libertine goes one step further:



Here the Roman numerals are slightly smaller than the capital letters, thus matching the font’s Arabic numerals. Most importantly, they are connected, reproducing a feature often found in hand-drawn Roman numerals.

Now, some may still argue that there aren’t any improvements in the above or that they aren’t worth the effort. So here is a case where not using the Unicode characters will produce horrible results:



(Note that the small size of the numerals reflects some actual historic typesetting.) Something similar may occur for script or calligraphic fonts.

Without specific Unicode code points for Roman numerals, solving the latter problem would only be possible by:


Using a complex OpenType feature (or similar) that tries to detect whether a sequence of capital letters is a Roman numeral. This will inevitably cause problems with words that would also be a valid Roman numeral.
Using a simple OpenType feature that needs to be manually activated for every Roman numeral.
Using Unicode’s Private-Use Area. Compatibility issues are likely to ensue even when switching between two fonts that both support Roman numerals.
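The problem with automatic detection in the first option can be illustrated outside of any font technology. The following Python sketch is not an OpenType feature; it only applies a common regular expression for well-formed Roman numerals (1 to 3999) to a few words, with `looks_like_roman_numeral` being a name chosen for this example.

```python
import re

# Thousands, hundreds, tens, ones; each group allows the subtractive forms.
ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)

def looks_like_roman_numeral(word: str) -> bool:
    """True if `word` is a non-empty, well-formed Roman numeral (1 to 3999)."""
    return bool(word) and ROMAN.fullmatch(word) is not None

for word in ("VII", "XIII", "MIX", "CD", "I", "LOUIS"):
    print(word, looks_like_roman_numeral(word))
```

"MIX", "CD" and "I" are all accepted, although in running text they are more likely an English word, an abbreviation and a pronoun. Any detection based purely on the letter sequence has to guess in such cases.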


From Unicode’s point of view, the huge semantic difference between capital Latin letters and Roman numerals should in itself have been sufficient reason to encode Roman numerals separately.
