What is a String?
Originally, Unicode code points ranged from 0 through 0xFFFF and so they fit in 16 bits. That wasn't sufficient, though, so early on the range was extended to its current upper limit of 0x10FFFF (inclusive), which is effectively 21 significant bits. (For now; it could grow, though there's no plan for that at the moment.) To avoid forcing systems to use three or four bytes (for alignment) for every character, Unicode defines transformation formats that encode those 21 bits into one or more code units of smaller size. UTF-8 uses byte-sized (8-bit) code units that can represent a lot of western text characters in one byte, but may require two, three, or even four bytes for some other characters. (For instance, the winking emoji code point I used above takes four code units in UTF-8: 0xF0 0x9F 0x98 0x89.) UTF-16 uses 16-bit code units, so it may require one or two code units depending on the code point being represented. That same winking face is 0xD83D 0xDE09 in UTF-16. (There's a lot more to it than this, of course.)
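To make the UTF-16 arithmetic concrete, here's a minimal sketch of how a code point above 0xFFFF gets split into two 16-bit code units (a surrogate pair). The names `toUtf16CodeUnits` and `hex` are made up for this illustration, not built-ins, and the function does no validation of its input. The UTF-8 bytes come from the standard `TextEncoder`, assuming an environment that provides it (modern browsers and Node.js do).

```javascript
// Hypothetical helper (not a built-in): encode one code point as UTF-16 code units.
// No validation: it assumes a valid code point in the range 0 through 0x10FFFF.
function toUtf16CodeUnits(codePoint) {
    if (codePoint <= 0xFFFF) {
        // Fits in a single 16-bit code unit
        return [codePoint];
    }
    const offset = codePoint - 0x10000;     // Leaves a 20-bit value
    const high = 0xD800 + (offset >> 10);   // Top 10 bits -> high (leading) surrogate
    const low = 0xDC00 + (offset & 0x3FF);  // Bottom 10 bits -> low (trailing) surrogate
    return [high, low];
}

// Hypothetical formatting helper for the output below
const hex = n => "0x" + n.toString(16).toUpperCase();

// U+1F609 is the winking face
console.log(toUtf16CodeUnits(0x1F609).map(hex).join(" ")); // 0xD83D 0xDE09

// UTF-8 code units (bytes) for the same code point
const utf8 = Array.from(new TextEncoder().encode("\u{1F609}"));
console.log(utf8.map(hex).join(" ")); // 0xF0 0x9F 0x98 0x89
```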
...a finite ordered sequence of zero or more 16-bit unsigned integer values... Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text.
const wink = "😉";
console.log(wink.length); // 2
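If it helps to see where that 2 comes from, here's a small sketch that looks at the same string both ways, using only standard `String` methods: `charCodeAt` reads individual code units, while `codePointAt` and string iteration (used by spread) work in whole code points.

```javascript
const wink = "😉";

// Code units: what length and charCodeAt see
console.log(wink.length);                      // 2
console.log(wink.charCodeAt(0).toString(16));  // d83d
console.log(wink.charCodeAt(1).toString(16));  // de09

// Code points: what codePointAt and string iteration see
console.log(wink.codePointAt(0).toString(16)); // 1f609
console.log([...wink].length);                 // 1
```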
Either way, it means that you can't assume one "character" in a string (one code unit) is a character on its own. For instance, a naïve version of "reversing" a string often looks like this:
const reverse = str => str.split("").reverse().join("");
But that will mess up strings containing surrogate pairs (and other things):
const wink = "😉";
const reversedWink = reverse(wink);
console.log(reversedWink); // Outputs two "unknown character" glyphs
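One way to avoid breaking up surrogate pairs is to split the string by code point rather than by code unit, which is what string iteration does. This is only a sketch, and `reverseByCodePoint` is a name made up for this example. It keeps surrogate pairs together, but it is not a complete "reverse" for all text.

```javascript
// The naive version from above: splits between code units
const reverse = str => str.split("").reverse().join("");

// Hypothetical alternative: Array.from iterates the string by code point,
// so a surrogate pair stays together as one array element
const reverseByCodePoint = str => Array.from(str).reverse().join("");

const text = "ab😉";
console.log(reverseByCodePoint(text));           // 😉ba
console.log(reverseByCodePoint(text) === "😉ba"); // true
console.log(reverse(text) === "😉ba");            // false, the pair was swapped
```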
But it goes beyond just surrogate pairs. Unicode code points aren't necessarily "characters" on their own, even when you don't break up their surrogate pairs. Sometimes it takes more than one code point to make what someone looking at the text will think of as a "character." If you'd like to know more about it, continue reading with Splitting Strings in 2021.
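As a small illustration of that last point, here's a sketch using "e" followed by a combining acute accent: two code points that display as a single "é". It assumes the environment provides `Intl.Segmenter` (recent browsers and Node.js versions do; older ones may not), which can split text into what Unicode calls grapheme clusters.

```javascript
// "e" followed by U+0301 COMBINING ACUTE ACCENT, displayed as a single "é"
const text = "e\u0301";

console.log(text.length);      // 2 (code units)
console.log([...text].length); // 2 (code points)

// Assumes Intl.Segmenter is available in this environment
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = Array.from(segmenter.segment(text), s => s.segment);
console.log(graphemes.length); // 1 (what a reader would call one "character")
```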