Linux Ask!

Linux Ask! is a Q&A website for Linux-related questions. Questions are collected, answered and audited by experienced Linux users.

How to remove BOM from UTF-8?

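Answer: use awk to strip the three-byte UTF-8 BOM (EF BB BF) from the start of the first line. The result goes to standard output, so redirect it to a new file if you want to keep it: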

# awk '{if(NR==1)sub(/^\xef\xbb\xbf/,"");print}' text.txt

Source: http://stackoverflow.com/questions/1068650/using-awk-to-remove-the-byte-order-mark
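To check that the awk command really removes the BOM, the sketch below builds a small file that starts with the bytes EF BB BF, runs the command and dumps the first bytes of the result with od. The file names are hypothetical. Running awk under LC_ALL=C, so that it matches raw bytes rather than multibyte characters, is an added precaution and not part of the original answer.

```shell
#!/bin/sh
# Build a test file whose first line starts with a UTF-8 BOM (octal 357 273 277 = hex EF BB BF).
# text.txt and text_nobom.txt are hypothetical file names.
printf '\357\273\277hello\nworld\n' > text.txt

# Strip the BOM from line 1 only; all other lines are printed unchanged.
# LC_ALL=C (an added precaution) makes awk treat the input as plain bytes.
LC_ALL=C awk '{if(NR==1)sub(/^\xef\xbb\xbf/,"");print}' text.txt > text_nobom.txt

# Dump the first three bytes: expect 68 65 6c ("hel"), i.e. no ef bb bf.
head -c 3 text_nobom.txt | od -An -tx1
```

The output file should be exactly three bytes shorter than the input.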

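Update (suggested by Van Overveldt Peter): you can also skip the first three bytes with tail. Unlike the awk command, this removes the first three bytes unconditionally, so only use it on files that are known to start with a BOM: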

# tail --bytes=+4 text.txt
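Because plain tail cuts three bytes whether or not they are a BOM, a guarded version can look at them first. The sketch below assumes a POSIX shell plus head -c and od, which are available on typical Linux systems; the helper name strip_bom and the file names are hypothetical. It uses tail -c +4, the short form of tail --bytes=+4.

```shell
#!/bin/sh
# strip_bom SRC DST: hypothetical helper that removes a leading UTF-8 BOM only if present.
strip_bom() {
    src=$1
    dst=$2
    # Read the first three bytes as a hex string, e.g. "efbbbf" when a BOM is present.
    first=$(head -c 3 "$src" | od -An -tx1 | tr -d ' \n')
    if [ "$first" = "efbbbf" ]; then
        tail -c +4 "$src" > "$dst"   # same as tail --bytes=+4: output starts at byte 4
    else
        cat "$src" > "$dst"          # no BOM: copy the file unchanged
    fi
}

# Example: one file with a BOM and one without (hypothetical file names).
printf '\357\273\277abc\n' > with_bom.txt
printf 'abc\n' > without_bom.txt
strip_bom with_bom.txt a.txt
strip_bom without_bom.txt b.txt
```

Both a.txt and b.txt end up containing just "abc" plus a newline; running the unguarded tail command on without_bom.txt would instead have cut off "abc".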


3 Responses to “How to remove BOM from UTF-8?”

  1. My preferred command to get rid of the BOM is:

    tail --bytes=+4 UTF8WithBom.txt > UTF8WithoutBom.txt

  2. Thanks for your suggestion; I will include your comment in the post soon.

    Thanks again.

  3. This little line of code saved my tail. I had a LaTeX source file that had picked up a BOM, and it refused to compile to a PDF because of this nasty little character.

    Thanks a bundle guys :)
