What's the practical difference between a 'glyph' and a 'character'?

@Kaufman565

Posted in: #Terminology #Typography

I saw this question on the Typography site proposal and it bugged me that I didn't know the answer. I'd always treated 'glyph' and 'character' as interchangeable.



After reading an explanation on the Unicode Character Encoding Model page, my understanding is roughly this:


Characters are defined by their meaning in language, glyphs, by
their appearance. So, the ligature for aesthetically combining fi
is one glyph, but two characters.


So, my belief is (please correct me if I'm wrong) that the practical difference would be:


Text parsers that aren't interested in the aesthetics of text will read glyphs as their respective characters. So:

If you were to copy and paste text containing glyphs into a plain text editor, the glyphs would be converted to their respective characters (a fi ligature glyph would become f and i)
Any well-made automated system based on text parsing (e.g. search engine crawlers, screen readers, spell checkers) would interpret the glyphs as their respective characters.
One character can have many glyphs or glyph sets. I want to say one glyph can only have one character, but this clearly isn't right, as there's an example in the linked article of 3 glyphs and glyph sets that each seem to correspond to a character and a set of characters. I don't quite see how this could work: surely that means there will be inconsistency or ambiguity in how those glyphs are interpreted, varying by interpreter? (Or does it vary by language, or by font?)
While glyph browsers (e.g. the one in Illustrator) contain the full glyph set of a font, character maps (e.g. the Windows character map) only contain characters, not glyphs that are multiple characters like ligatures (something I'd not noticed before)



I feel like I'm nearly there but I've clearly misunderstood something somewhere along the line: not just the "One glyph multiple characters" thing, but also, copying and pasting behaviour with ligatures isn't quite what I expected:


Copy the ligature fi from Illustrator to this input box: pastes as fi (two characters) as expected.
Paste in the HTML code for it (&#64257;) - displays as the ligature when not in a code block (ﬁ - which in this font doesn't look much like a ligature, but you'll see is one if you try to select just half of it), and as the code when in a code block (&#64257;), as expected.
Copy and paste the rendered non-code-block ligature back into the input box: pastes as the ligature character, and renders as the ligature regardless of whether it's in a code block or not (ﬁ and ﬁ). Likewise words containing it: ﬁt misﬁts (ﬁt misﬁts) pastes as ﬁt misﬁts (ﬁt misﬁts). Maybe it depends on whether the place it's being pasted understands the encoding used?




How far wrong is my understanding of this? Can someone put me right: stating a clear definition of the difference between glyphs and characters (if mine is wrong or can be improved), and give clearer/more accurate examples than mine of what that means in practice?


4 Comments


@Merenda852

There are a couple of answers here that give good information about glyphs vs characters, but they don't really address the source of your confusion with respect to copying and pasting.

First of all, your understanding is fundamentally correct:


Characters are defined by their meaning in language, glyphs, by their
appearance. So, the ligature for aesthetically combining fi is one
glyph, but two characters.


It's worth emphasizing that the list of characters is defined by the Unicode standard, published by the Unicode Consortium, which is the authority on encoding text in a machine-readable format. The definition above is essentially the primary guideline that Consortium members use to decide whether a proposed addition to Unicode is a character, and thus worthy of inclusion, or a glyph, which should be handled by font renderers.

I mention this because the confusion you experienced above comes from the fact that Unicode contains several ligature characters (not glyphs). For instance, U+FB01 is the character for the ﬁ ligature: unicode.org/charts/PDF/UFB00.pdf

Having ligature characters in Unicode isn't really in the spirit of the above definition of what should be included in the Unicode standard as a character, since ligatures don't have a meaning independent of the two characters they combine. The Unicode people are naturally aware of this, and the Unicode FAQ on ligatures admits as much:


The existing ligatures exist basically for compatibility and round-tripping with non-Unicode character sets. Their use is discouraged.


The existence of this character is ultimately the source of your confusion.
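To make the "ligature character" concrete, here is a small Python sketch using only the standard library's unicodedata module. It assumes compatibility normalization (NFKC) is an acceptable way to fold U+FB01 back into f + i, which is one common approach for search and comparison, not something the Unicode FAQ quoted above prescribes.

```python
import unicodedata

ligature = "\ufb01"  # the single ligature *character*, not a font glyph

# It is one code point with its own Unicode name.
print(len(ligature), unicodedata.name(ligature))

# NFKC (compatibility) normalization folds it back into two characters.
folded = unicodedata.normalize("NFKC", ligature)
print(len(folded), folded)

# A naive string comparison treats the two spellings as different text...
word_with_ligature = "mis\ufb01ts"
print(word_with_ligature == "misfits")

# ...while comparing after normalization treats them as the same.
print(unicodedata.normalize("NFKC", word_with_ligature) == "misfits")
```

This is roughly why a search tool that doesn't normalize its input may fail to find "misfits" in text that was pasted with the ligature character in it.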

In correctly implemented software, copying text should always copy the characters that were specified, not the glyphs, and that's exactly what's going on in your three examples.

1) In the first example, you typed f and i into Illustrator, which rendered a single ligature glyph. When you selected and copied that rendered glyph, Illustrator correctly copied the f (U+0066) and i (U+0069) characters onto your clipboard.

2) In the second example, you typed the HTML code for the ligature character (&#64257;) into the input box, and correctly got the ligature glyph representing the ligature character (ﬁ). Since the underlying character is actually the obscure and relatively pointless ligature character I mentioned above, selecting and copying that glyph will copy the single character U+FB01.

3) In the third example, you're copying the rendered ligature character U+FB01 that was rendered in part 2, which will always paste as that character. Your main confusion seems to be regarding the difference between HTML entity codes and characters, particularly with regard to how they are rendered in and outside of code blocks.

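The HTML code &#64257; is a numeric character reference made up of 8 distinct characters. The HTML renderer of your web browser substitutes those 8 characters (U+0026 U+0023 U+0036 U+0034 U+0032 U+0035 U+0037 U+003B) with the single Unicode character U+FB01, which it then renders appropriately. Inside a code block, however, this substitution doesn't happen (most likely because the site escapes the ampersand when it generates the <code> markup, since a <code> tag by itself does not turn off character references), leaving those 8 characters as they are.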

When you copy off of rendered HTML, you copy the rendered characters (which are different from the rendered glyphs). Thus, when you copy your rendered HTML entity, the single U+FB01 character is copied to your clipboard.

When you paste the ﬁ (U+FB01) character back into the HTML, no substitution needs to take place, meaning the character is rendered as a ligature regardless of whether or not it falls within a <code> block.
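The difference between the 8-character code and the single character it stands for can be checked with Python's standard html module. This is only a sketch of the substitution step; it says nothing about how any particular browser or this site implements it, and the escaping at the end is just one assumed way a page could show the code literally.

```python
import html

reference = "&#64257;"

# The code as typed is 8 separate characters.
print(len(reference))
print([f"U+{ord(c):04X}" for c in reference])

# Decoding the reference yields exactly one character, U+FB01.
decoded = html.unescape(reference)
print(len(decoded), f"U+{ord(decoded):04X}")

# Assumption: a page that wants to display the code itself can escape
# the ampersand, so decoding gives back the 8 characters instead.
escaped = html.escape(reference)
print(escaped, "->", html.unescape(escaped))
```

Copying from the rendered page corresponds to copying `decoded`, which is why what lands on the clipboard is U+FB01 and not the 8-character code.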



 

@Merenda852

Characters are what is stored in text files, processed by applications, and moved around, while glyphs are their visual representation.

To get a clear picture, let's see what happens when an application tries to render a string of text on the screen (in a slightly simplified way):


The application first reads the text string, that is, the string of characters stored on disk or in memory.
It then sends it to a text layout engine, along with some other properties such as the desired font, the text language and so on:

The text layout engine basically opens the font file, asks it for the glyph(s) corresponding to each character, and does some glyph substitution (like replacing the glyphs for f and i with the fi ligature glyph) and positioning (like kerning).
At the end, the layout engine has a sequence of glyphs, their positions relative to each other, and a mapping between the input characters and the output glyphs. This character-to-glyph mapping records, for example, that the first two characters in the word "file" correspond to the first glyph (the fi ligature), the 3rd character to the 2nd glyph, and the 4th character to the 3rd glyph.

A graphics rendering library is then used to “draw” those glyphs on the screen using shapes from the font.
When the user selects “glyphs” on the screen, the application would then consult the glyph to text mapping provided by the layout engine to find what part of the input text corresponds to what the user is selecting and send that text to the clipboard when the user copies it.
The same happens when the user places the cursor in the middle of the text and starts typing: the mapping determines where in the input text to insert the new characters, and the updated text is sent to the layout engine to be processed and redrawn, and so on.
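The steps above can be sketched as a toy model in Python. Everything here is hypothetical and heavily simplified: the LIGATURES table, the glyph names, and the shape and text_for_selection functions are made up for illustration, and a real layout engine reads its substitution rules from the font and also handles positioning.

```python
# Toy model only: real fonts define their own substitution rules.
LIGATURES = {("f", "i"): "fi_ligature", ("f", "l"): "fl_ligature"}  # hypothetical glyph names


def shape(text):
    """Return a list of (glyph_name, start, end) tuples.

    start and end are character offsets into `text`, so every glyph
    remembers which input characters it stands for.
    """
    glyphs = []
    i = 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in LIGATURES:
            # Two characters are substituted by a single glyph.
            glyphs.append((LIGATURES[pair], i, i + 2))
            i += 2
        else:
            glyphs.append((text[i], i, i + 1))
            i += 1
    return glyphs


def text_for_selection(text, glyphs, first, last):
    """Map a selection of glyph indices first..last (inclusive) back to characters."""
    start = glyphs[first][1]
    end = glyphs[last][2]
    return text[start:end]


glyphs = shape("file")
print(glyphs)                                    # three glyphs for four characters
print(text_for_selection("file", glyphs, 0, 0))  # selecting only the ligature glyph
```

Selecting just the first glyph hands the two characters f and i to the clipboard, which matches the Illustrator behaviour described in the question.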



 

@Shelley591

Glyphs relate to how text is rendered, characters to how it's interpreted. When you copy and paste, the source application usually offers the clipboard contents in several formats. Plain text will decompose the fi ligature into f and i; HTML format may translate it to the character reference you quoted, or also decompose it into f and i.

In general the relation between characters and glyphs is n:m. In Indic languages some characters divide into two glyphs that are placed at different places of the word. In Latin the closest to that situation would be rendering é as two glyphs (e and ´). In Arabic each character has different glyphs depending on its position within a word: initial, middle, final or isolated.
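Glyphs live in fonts, so they can't be inspected from plain Python, but Unicode happens to contain some compatibility characters that mirror the same idea, and those can serve as a rough stand-in. The sketch below is an analogy under that assumption, not a demonstration of what a font actually does: it shows é as one code point or as e plus a combining accent, and the four positional presentation forms of the Arabic letter beh all folding back to the one underlying character.

```python
import unicodedata

# é can be a single code point or a base letter plus a combining accent.
precomposed = "\u00e9"
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(c):04X}" for c in decomposed])

# Arabic letter beh (U+0628) and its four positional presentation forms,
# which exist in Unicode for compatibility with older encodings.
beh = "\u0628"
forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]
for form in forms:
    # Each form normalizes back to the same underlying character.
    print(unicodedata.name(form), unicodedata.normalize("NFKC", form) == beh)
```

In normal text only U+0628 is stored, and the font picks the right positional glyph at render time.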

The translation from characters to glyphs is specific to each application and the typographic features it supports. For Latin text this translation used to be straightforward, but OpenType fonts introduced additional features like ligatures, swashes, alternate forms, small caps etc.

In practice, you only need to concern yourself with glyphs when you implement how an application renders text, when you design a font, or when you want to apply an OpenType feature that replaces some glyphs with others (e.g. ligatures). Otherwise Unicode code points are your friend.



 

@Rambettina927

I don't think your understanding is incorrect; you're just seeing systems that try to help the user by pasting what they think the user wants. Since some ligatures ('fi', 'fl') are fairly common outside of typesetting systems, software recognizes that the user probably didn't enter that glyph directly; rather, another app transformed their typed characters.

In short: Character refers to a linguistic unit. Glyph refers to a designed instance of that unit, whether it be uppercase, lowercase, small cap, historic, or stylistic variant.
