The Daily Insight.

What does UTF-8 mean?

By Grace Evans

UTF-8 stands for “Unicode Transformation Format, 8-bit.” Unicode has other encoding forms as well, such as UTF-16 and UTF-32. UTF-8 is distinguished by its 8-bit (one-byte) code units: each character is encoded as a sequence of one to four bytes.
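As a rough illustration of how the code-unit size differs between encoding forms, the minimal Python sketch below encodes the same text as UTF-8, UTF-16, and UTF-32 and compares byte counts. The helper name `encoded_sizes` and the sample strings are assumptions made for this example only.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8, UTF-16, and UTF-32."""
    # The "-le" codec variants are used so that no byte order mark is added.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Plain ASCII text: one byte per character in UTF-8,
# two in UTF-16, four in UTF-32.
ascii_sizes = encoded_sizes("hello")

# A character outside the Basic Multilingual Plane (U+1F600):
# four bytes in every one of the three forms.
emoji_sizes = encoded_sizes("\U0001F600")

print(ascii_sizes)
print(emoji_sizes)
```

For mostly-ASCII text, UTF-8 is the most compact of the three; for characters beyond U+FFFF, all three forms need four bytes.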

Why is UTF-8 the default character encoding in XML?

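The World Wide Web Consortium recommends UTF-8 as the default encoding in XML and HTML, and recommends declaring it in metadata rather than merely using it, “even when all characters are in the ASCII range ... Using non-UTF-8 encodings can have unexpected results”. In XML itself, a document that carries no encoding declaration and no byte order mark is treated as UTF-8. Many other standards support only UTF-8. For example, JSON exchanged between systems that are not part of a closed ecosystem must be encoded as UTF-8 (RFC 8259).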
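The default can be observed with Python's standard-library XML parser. The sketch below is only an illustration under assumptions: the element name `name` and the sample text are made up. It parses UTF-8 bytes that carry no XML declaration, then writes the document back out with the encoding stated explicitly.

```python
import xml.etree.ElementTree as ET

# UTF-8 bytes for a tiny document with no XML declaration and no BOM.
# "caf\u00e9" contains one non-ASCII character (U+00E9).
raw = "<name>caf\u00e9</name>".encode("utf-8")

# With no encoding hint in the input, the parser reads the bytes as UTF-8.
root = ET.fromstring(raw)
print(root.text)

# Serialize again, this time stating the encoding in the XML declaration.
# (The xml_declaration argument requires Python 3.8 or later.)
declared = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(declared)
```

Writing the declaration costs a few bytes but removes any guesswork for consumers that do not assume UTF-8.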

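What is a high surrogate in UTF-8 encoding?

A high surrogate is a UTF-16 code unit in the range U+D800 to U+DBFF. It is the first half of a surrogate pair and must always be followed by a low surrogate (U+DC00 to U+DFFF); together the pair represents one character above U+FFFF. Surrogates are a UTF-16 mechanism, so a UTF-8 encoder only meets them when converting UTF-16 text. In .NET, for example, a UTF8Encoding instance asked to encode a character array that contains two high surrogates (U+D801 and U+D802) in a row is given an invalid character sequence, because neither high surrogate is followed by a low surrogate.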
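The same rule can be seen outside .NET. The following Python sketch is an analogous illustration, not the .NET example itself: it shows Python's strict UTF-8 encoder rejecting two high surrogates in a row, then decodes a well-formed pair. The low surrogate U+DC00 used for the valid pair is an arbitrary choice for this example.

```python
# Two high surrogates in a row (U+D801, U+D802): an invalid sequence.
invalid = "\ud801\ud802"

try:
    invalid.encode("utf-8")
    rejected = False
except UnicodeEncodeError:
    # Python's strict UTF-8 encoder refuses unpaired surrogates.
    rejected = True

# A well-formed pair: high surrogate U+D801 followed by low surrogate U+DC00,
# written here as UTF-16 little-endian code units.
pair_bytes = b"\x01\xd8\x00\xdc"
char = pair_bytes.decode("utf-16-le")

# The pair decodes to a single code point, which UTF-8 encodes in four bytes.
code_point = ord(char)
utf8_bytes = char.encode("utf-8")
print(rejected, hex(code_point), utf8_bytes)
```

Other runtimes may substitute a replacement character instead of raising an error, so the exact failure mode depends on the encoder's configuration.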

What are the most common character encodings?

The most commonly used encodings are UTF-8 and UTF-16. In UTF-8, a character is 1 to 4 bytes long, and the encoding can represent any character in the Unicode standard. UTF-8 is backwards compatible with ASCII: any ASCII text is already valid UTF-8, byte for byte. UTF-8 is the preferred encoding for e-mail and web pages.
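Both properties, the variable length of 1 to 4 bytes and ASCII compatibility, are easy to check. The short Python sketch below does so; the helper name `utf8_length` and the four sample characters are assumptions chosen for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))

# One sample character from each length class:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [utf8_length(ch) for ch in samples]
print(lengths)

# ASCII text yields identical bytes whether encoded as ASCII or as UTF-8.
ascii_text = "Hello, world!"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```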