What is a String?
Unicode code points originally ranged from 0x0000 to 0xFFFF, and so they fit in 16 bits. That wasn't sufficient, though, so early on the range was extended to its current 0x000000 to 0x10FFFF (inclusive), which is effectively 21 significant bits. (For now; it could grow, though there's no plan for that at the moment.) To avoid forcing systems to use three or four bytes (for alignment) for every character, Unicode defines transformation formats that encode those 21 bits into one or more code units of smaller size. UTF-8 uses byte-sized (8-bit) code units that can represent a lot of Western text characters in one byte, but may require two, three, or even four bytes for other characters. (For instance, the winking emoji code point I used above takes four code units in UTF-8: 0xF0 0x9F 0x98 0x89.) UTF-16 uses 16-bit code units, so it may require one or two code units depending on the code point being represented. That same winking face is 0xD83D 0xDE09 in UTF-16. (There's a lot more to it than this, of course.)
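To check those encodings from JavaScript, here's a minimal sketch. It assumes a runtime that provides the standard TextEncoder global (modern browsers and Node.js do); the `hex` helper is just a hypothetical name used for illustration.

```javascript
// Hypothetical helper for this sketch: format a number as uppercase hex
const hex = n => "0x" + n.toString(16).toUpperCase();

const wink = "😉";

// The code point itself
const codePoint = hex(wink.codePointAt(0));

// UTF-8 code units (bytes); TextEncoder always encodes to UTF-8
const utf8 = Array.from(new TextEncoder().encode(wink), b => hex(b));

// UTF-16 code units, as stored in the JavaScript string
const utf16 = Array.from({ length: wink.length }, (_, i) => hex(wink.charCodeAt(i)));

console.log(codePoint);       // 0x1F609
console.log(utf8.join(" "));  // 0xF0 0x9F 0x98 0x89
console.log(utf16.join(" ")); // 0xD83D 0xDE09
```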
...a finite ordered sequence of zero or more 16-bit unsigned integer values... Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text.
const wink = "😉";
console.log(wink.length); // 2
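If you want a count of code points rather than code units, one option is to lean on string iteration, which walks a string by code point. A minimal sketch (the variable names are just for illustration):

```javascript
const text = "a😉b";

const codeUnits = text.length;       // counts UTF-16 code units
const codePoints = [...text].length; // string iteration goes by code point

console.log(codeUnits);  // 4
console.log(codePoints); // 3

// for-of uses the same iterator, so the emoji arrives in one piece
for (const ch of text) {
    console.log(ch, "U+" + ch.codePointAt(0).toString(16).toUpperCase());
}
```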
Either way, it means that you can't assume that one "character" in a string (that is, one code unit) is a complete character on its own. For instance, a naïve version of "reversing" a string often looks like this:
const reverse = str => str.split("").reverse().join("");
But that will mess up strings containing surrogate pairs (and other things):
const wink = "😉";
const reversedWink = reverse(wink);
console.log(reversedWink); // Outputs two "unknown character" glyphs
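One step better is to reverse by code point instead of by code unit, which at least keeps surrogate pairs together. Here's a sketch using a hypothetical `reverseCodePoints` name. It only addresses surrogate pairs, so treat it as an illustration rather than a complete solution.

```javascript
// Naïve version: splits on UTF-16 code units, breaking surrogate pairs apart
const reverse = str => str.split("").reverse().join("");

// Hypothetical improved version: Array.from iterates the string by code
// point, so each surrogate pair stays together as one array entry
const reverseCodePoints = str => Array.from(str).reverse().join("");

const text = "ab😉";
console.log(reverse(text) === "😉ba");           // false
console.log(reverseCodePoints(text) === "😉ba"); // true
```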
There's more to the story. Beyond surrogate pairs of code units that combine to create a code point, it can even take multiple Unicode code points to create a specific "character" (glyph). For example, in Devanagari, a writing system used in India and Nepal, vowel sounds are written as marks modifying the consonant glyph. Code point U+0928, न, is pronounced "na", but you can follow it with code point U+093F to produce नि ("ni"). (More details on that in Chapter 10 of the book.)
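To see that in code, here's a sketch that assumes a runtime supporting Intl.Segmenter (part of the ECMAScript Internationalization API, available in current browsers and recent Node.js versions). The `graphemes` and `reverseGraphemes` names are hypothetical helpers for illustration. What the segmenter reports is a grapheme cluster, which approximates a user-perceived character but may not match every reader's intuition in every script.

```javascript
const ni = "\u0928\u093F"; // न followed by the vowel sign, rendered as नि

console.log(ni.length);      // 2 code units
console.log([...ni].length); // 2 code points

// Hypothetical helper: split a string into grapheme clusters.
// Assumes Intl.Segmenter is available in the runtime.
const graphemes = str =>
    Array.from(
        new Intl.Segmenter(undefined, { granularity: "grapheme" }).segment(str),
        part => part.segment
    );

console.log(graphemes(ni).length); // 1 grapheme cluster

// Hypothetical helper: reverse by grapheme cluster rather than by code unit
const reverseGraphemes = str => graphemes(str).reverse().join("");

console.log(reverseGraphemes("a" + ni + "😉") === "😉" + ni + "a"); // true
```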