Linux Ask!

Linux Ask! is a Q&A website specifically for Linux-related questions. Questions are collected, answered and audited by experienced Linux users.

What is the maximum length of a valid UTF-8 character?

Answer:

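A valid UTF-8 character takes up 1 to 4 bytes, so the maximum length is 4 bytes. (The original UTF-8 design allowed sequences of up to 6 bytes, but RFC 3629 restricts UTF-8 to code points U+0000 through U+10FFFF, none of which needs more than 4.) The first 128 characters (US-ASCII) are encoded as single bytes identical to their ASCII values, and every byte of a multi-byte sequence has its high bit set. As a result, byte values 0x00 to 0x7F never appear inside a multi-byte character, which is what keeps UTF-8 compatible with legacy ASCII-based systems.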
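The byte count depends only on which code point range a character falls in, and the first byte of a sequence already announces its length. The Python sketch below illustrates both rules. The helper names `utf8_length` and `sequence_length_from_lead_byte` are hypothetical, written for this answer; the results are cross-checked against Python's built-in `str.encode`.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed to encode one code point in UTF-8 (RFC 3629 ranges)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")
    if code_point <= 0x7F:
        return 1  # US-ASCII, encoded as the same single byte
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4      # U+10000..U+10FFFF: the maximum


def sequence_length_from_lead_byte(byte: int) -> int:
    """Length of a UTF-8 sequence judged by its first byte; 0 if it cannot start one."""
    if byte <= 0x7F:
        return 1  # 0xxxxxxx
    if 0xC2 <= byte <= 0xDF:
        return 2  # 110xxxxx (0xC0 and 0xC1 would only form overlong encodings)
    if 0xE0 <= byte <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= byte <= 0xF4:
        return 4  # 11110xxx, capped so the result stays <= U+10FFFF
    return 0      # continuation byte 10xxxxxx, or 0xC0, 0xC1, 0xF5..0xFF


# "A", "é", "€" and the emoji U+1F600, written as escapes.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    n = utf8_length(ord(ch))
    assert n == len(encoded) == sequence_length_from_lead_byte(encoded[0])
    # Every byte of a multi-byte sequence is >= 0x80, never an ASCII value.
    assert n == 1 or all(b >= 0x80 for b in encoded)
    print(f"U+{ord(ch):04X}: {n} byte(s) {encoded.hex(' ')}")
```

The four sample characters need 1, 2, 3 and 4 bytes respectively; nothing in the valid range needs more than 4.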

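Characters in the Basic Multilingual Plane (BMP, code points U+0000 to U+FFFF), which covers most frequently used characters, take at most 3 bytes in UTF-8. The BMP is not a separate encoding but a range of Unicode: it is exactly what UCS-2 can represent, and in UTF-16 each BMP character is a single 16-bit code unit. Only characters outside the BMP (U+10000 to U+10FFFF, such as many emoji) need the full 4 bytes in UTF-8, and a surrogate pair in UTF-16.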
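To see the BMP boundary in practice, the following Python sketch compares UTF-8 byte counts with UTF-16 code units using only the built-in codecs. The helper names `in_bmp` and `utf16_code_units` are hypothetical, chosen for illustration.

```python
def in_bmp(ch: str) -> bool:
    """True if the single character lies in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 needs for one character."""
    # "utf-16-le" is used so that no byte-order mark is added to the output.
    return len(ch.encode("utf-16-le")) // 2


# "A", "€", the CJK character U+4E2D and the emoji U+1F600, written as escapes.
for ch in ["A", "\u20ac", "\u4e2d", "\U0001F600"]:
    utf8_bytes = len(ch.encode("utf-8"))
    units = utf16_code_units(ch)
    if in_bmp(ch):
        assert utf8_bytes <= 3 and units == 1   # fits in UCS-2
    else:
        assert utf8_bytes == 4 and units == 2   # needs a surrogate pair
    print(f"U+{ord(ch):04X} BMP={in_bmp(ch)} UTF-8={utf8_bytes} byte(s) UTF-16={units} unit(s)")
```

Only the emoji lies outside the BMP, so it is the only one that needs 4 UTF-8 bytes and two UTF-16 code units.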

Reference: http://en.wikipedia.org/wiki/UTF-8

  1. Display non-printable character in vim
  2. Maximum Filename Length in EXT3?
  3. Character classes not working in grep command
  4. Wide character in print… warning in Perl
  5. How to grep for tab character(s) in Linux
