Monday, December 22, 2008

Converting UTF-16 to UTF-8

I was running grep on a large XML file and noticed that it did not match a pattern I expected to find in the file. I then ran "head" on the file to pick out a string that was definitely present in it. Grepping for that string also returned nothing, which told me that grep was somehow not working on this file.
"head" was also showing some junk characters at the beginning of the file (most likely a byte order mark). This indicated that the file was not in a plain single-byte text encoding, which was probably why grep was not working on it.
Reading the XML declaration at the top of the file confirmed it: the encoding was "UTF-16". In UTF-16 every English (ASCII) character takes two bytes, one of which is a zero byte, so the plain pattern I was grepping for never appears byte-for-byte in the file. I had to convert the file to UTF-8, which is much friendlier to text tools; if the content is only English, UTF-8 is byte-for-byte identical to plain ASCII.
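To reproduce the diagnosis without the original file, here is a minimal sketch assuming a POSIX shell with iconv, od and grep available; sample.xml is a hypothetical file name. It builds a tiny UTF-16 XML file, dumps its first two bytes (the byte order mark that shows up as junk in "head") and shows that a plain grep misses a word that is clearly in the file.

```shell
#!/bin/sh
# Build a small UTF-16 sample so the problem can be reproduced anywhere.
# sample.xml is a hypothetical file name used only for this illustration.
printf '<?xml version="1.0" encoding="UTF-16"?>\n<root>hello</root>\n' \
  | iconv -f UTF-8 -t UTF-16 > sample.xml

# The first two bytes are the byte order mark: "ff fe" (little-endian)
# or "fe ff" (big-endian), depending on the platform's iconv.
od -A n -t x1 -N 2 sample.xml

# A byte-oriented grep finds nothing: in UTF-16 each ASCII character
# is stored as two bytes, one of them a zero byte, so "hello" never
# appears as a contiguous run of bytes.
if grep -q 'hello' sample.xml; then
  echo "matched"
else
  echo "no match"
fi
```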

iconv --from-code UTF-16 --to-code UTF-8 input_file.xml > output_file.xml
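The conversion can be checked end to end with a small sketch, assuming a POSIX shell with iconv and sed; input_file.xml is generated here as a hypothetical sample rather than the original file. One extra step may be worth it if the result will be read by an XML parser: after conversion the declaration still says encoding="UTF-16", which no longer matches the bytes, so this version also rewrites that first line with sed.

```shell
#!/bin/sh
# Hypothetical sample input: a tiny XML document saved as UTF-16.
printf '<?xml version="1.0" encoding="UTF-16"?>\n<root>hello</root>\n' \
  | iconv -f UTF-8 -t UTF-16 > input_file.xml

# Same iconv conversion as in the post (-f/-t are the short forms of
# --from-code/--to-code), then fix the encoding named on line 1 so it
# matches the UTF-8 bytes that are actually written.
iconv --from-code UTF-16 --to-code UTF-8 input_file.xml \
  | sed '1s/encoding="UTF-16"/encoding="UTF-8"/' > output_file.xml

# Plain grep now matches, since ASCII characters are single bytes in UTF-8.
grep 'hello' output_file.xml
head -n 1 output_file.xml
```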

4 comments:

Anonymous said...

UTF-18, eh?

sandeep said...

>>> UTF-18, eh?
typos are a programmer's prerogative :D

Mozai said...

Thank you, very useful.

Anonymous said...

helped me, thank you.