Linux Ask!

Linux Ask! is a Q&A web site for Linux-related questions. Questions are collected, answered and audited by experienced Linux users.

Mar 04, 2008

What is the maximum length of a valid UTF-8 character?


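A valid UTF-8 character takes 1 to 4 bytes. The first 128 code points (US-ASCII) are encoded as single bytes with the same values as in ASCII, and every byte of a multi-byte sequence has its high bit set, so plain ASCII text is already valid UTF-8 and stays compatible with legacy systems.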
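The byte count depends only on the code point's value. Here is a minimal sketch in Python; `utf8_length` is a hypothetical helper name chosen for illustration, not a standard library function, and the ranges are the ones given in RFC 3629:

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs to encode one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point <= 0x7F:      # US-ASCII
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:    # rest of the Basic Multilingual Plane
        return 3
    return 4                    # supplementary planes


# Cross-check against Python's own encoder.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```

The loop at the end compares the helper with Python's built-in encoder for one sample code point from each range.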

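Characters in the Basic Multilingual Plane (BMP), code points U+0000 to U+FFFF, take at most 3 bytes in UTF-8. The BMP covers the most frequently used characters, and it is exactly the range that UCS-2 can represent, with each character fitting in a single 16-bit UTF-16 code unit. Characters outside the BMP take 4 bytes in UTF-8 and a surrogate pair in UTF-16.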
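To see the difference between a BMP character and a supplementary one, the sizes in both encodings can be compared. This is only an illustrative sketch; `encoded_sizes` is a hypothetical helper name, and the two sample characters are arbitrary choices:

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 code units) for a single character."""
    utf8_bytes = len(ch.encode("utf-8"))
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids adding a BOM.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


print(encoded_sizes("\u20ac"))      # EURO SIGN, inside the BMP: (3, 1)
print(encoded_sizes("\U0001F600"))  # U+1F600, outside the BMP: (4, 2)
```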

