Linux Ask!

Linux Ask! is a Q&A web site dedicated to Linux-related questions. Questions are collected, answered and audited by experienced Linux users.

Mar 04 2008

What is the maximum length of a valid UTF-8 character?

Answer:

A valid UTF-8 character takes up 1 to 4 bytes, so the maximum length is 4 bytes. The first 128 code points (the US-ASCII characters) are each encoded as a single byte with the same value as in ASCII, which keeps UTF-8 compatible with legacy ASCII systems.
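To make the 1 to 4 byte range concrete, here is a minimal Python 3 sketch that encodes one sample character from each length class. The helper name utf8_length and the sample characters are chosen only for illustration.

```python
def utf8_length(ch):
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


# One sample per length class (illustrative choices):
# U+0041 LATIN CAPITAL LETTER A, U+00E9 e with acute,
# U+20AC EURO SIGN, U+1F600 GRINNING FACE.
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print("U+%04X -> %d byte(s)" % (ord(ch), utf8_length(ch)))
```

The ASCII letter encodes to 1 byte and the character outside the BMP to 4 bytes; nothing that Python can encode to UTF-8 takes more than 4.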

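Characters in the Basic Multilingual Plane (BMP, code points U+0000 to U+FFFF) need at most 3 bytes in UTF-8. The BMP covers the most frequently used characters, and it is the range that UCS-2 (or a single 16-bit UTF-16 code unit) can represent. Characters outside the BMP need 4 bytes in UTF-8 and a surrogate pair in UTF-16.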
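The BMP boundary can be checked the same way. The following Python 3 sketch compares the UTF-8 byte count with the number of 16-bit UTF-16 code units on either side of U+FFFF. The function name encoded_sizes is hypothetical, and the code points tested are just convenient boundary values.

```python
def encoded_sizes(code_point):
    """Return (UTF-8 byte count, UTF-16 code unit count) for a code point."""
    ch = chr(code_point)
    utf8_bytes = len(ch.encode("utf-8"))
    # "utf-16-be" writes no byte order mark, so bytes // 2 = 16-bit code units.
    utf16_units = len(ch.encode("utf-16-be")) // 2
    return utf8_bytes, utf16_units


# Last code point of the BMP, first code point above it,
# and the highest code point Unicode allows.
for cp in (0xFFFF, 0x10000, 0x10FFFF):
    print("U+%04X -> %r" % (cp, encoded_sizes(cp)))
```

U+FFFF fits in 3 UTF-8 bytes and one UTF-16 code unit, while U+10000 and above take 4 UTF-8 bytes and two UTF-16 code units. Surrogate code points (U+D800 to U+DFFF) are not characters and raise UnicodeEncodeError if passed to this sketch.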

Reference: http://en.wikipedia.org/wiki/UTF-8