Mar 04 2008
What is the maximum length of a valid UTF-8 character?
Answer:
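A valid UTF-8 character takes 1 to 4 bytes. The first 128 code points (US-ASCII) are each encoded as a single byte identical to their ASCII value, and every byte of a multi-byte sequence has its high bit set, so UTF-8 stays compatible with legacy ASCII systems.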
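As a rough illustration of the 1-to-4-byte range, the sketch below encodes one sample character from each length class and reports its size. The helper name utf8_length and the sample characters are my own choices for illustration, not part of any standard API.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Hypothetical samples, one per length class:
# "A" (U+0041), e-acute (U+00E9), euro sign (U+20AC), an emoji (U+1F600).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s)")
```

Plain ASCII stays at 1 byte, which is why existing ASCII text is already valid UTF-8.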
Characters in the Basic Multilingual Plane (BMP, code points U+0000 to U+FFFF) take at most 3 bytes in UTF-8. The BMP covers most frequently used characters, and each BMP character fits in a single 16-bit code unit in UTF-16/UCS-2.
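The BMP relationship can be checked with a small sketch like the one below, which compares the UTF-8 byte count with the number of 16-bit UTF-16 code units for a character inside the BMP and one outside it. The helpers is_bmp and utf16_units are hypothetical names introduced here for illustration.

```python
def is_bmp(ch: str) -> bool:
    """True if the character's code point lies in U+0000..U+FFFF."""
    return ord(ch) <= 0xFFFF


def utf16_units(ch: str) -> int:
    """Count 16-bit code units; "utf-16-le" is used so no BOM is added."""
    return len(ch.encode("utf-16-le")) // 2


# Euro sign (inside the BMP) versus an emoji (outside the BMP).
for ch in ["\u20ac", "\U0001F600"]:
    print(
        f"U+{ord(ch):04X}: BMP={is_bmp(ch)}, "
        f"UTF-8 bytes={len(ch.encode('utf-8'))}, "
        f"UTF-16 units={utf16_units(ch)}"
    )
```

A character outside the BMP needs 4 bytes in UTF-8 and a surrogate pair (two code units) in UTF-16, so it cannot be represented in UCS-2 at all.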
Reference: http://en.wikipedia.org/wiki/UTF-8