Wednesday, December 8, 2010


UTF-8 encoding problems in Perl scripts

I have a script that reads a CSV file, does some processing and then generates a new CSV file on standard output.

Problem: if the source CSV contains extended characters in UTF-8 encoding (e.g. the German umlaut letters), I got the error: Can not parse: __content_of_the_csv_line__
Solution: tell Perl that what it is reading is a UTF-8 file, by changing the line open (CSV, '<:', $file) to:

open (CSV, '<:utf8', $file)
That made the parsing message go away, but the output was still wrong: extended characters were replaced by horrific sequences of characters and codes.

The new problem was that the output was being encoded in ISO-8859.
To force writes to standard output to be UTF-8, add this line at the beginning of the script:
binmode STDOUT, ":utf8";
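
Putting the two changes together, a minimal sketch might look like the following. It is not the original script: the helper name process_csv, the naive comma split and the uppercasing step are all assumptions made for illustration. The uppercasing is there only to show that, once the input is decoded, Perl treats "ü" as one character rather than two bytes.

```perl
use strict;
use warnings;

# process_csv is a hypothetical helper, not the author's script.
sub process_csv {
    my ($file, $out) = @_;

    # Encode everything written to $out as UTF-8.
    binmode $out, ':utf8';

    # Decode the input as UTF-8 while reading it.
    open(my $csv, '<:utf8', $file) or die "Cannot open $file: $!";
    while (my $line = <$csv>) {
        chomp $line;
        # Stand-in for the real processing: a naive comma split
        # (it does not handle quoted fields), then uppercase each field.
        my @fields = split /,/, $line, -1;
        print {$out} join(',', map { uc $_ } @fields), "\n";
    }
    close $csv;
}

# Run on the file named on the command line, writing to standard output.
process_csv($ARGV[0], \*STDOUT) if @ARGV;
```

Perl's documentation also describes a stricter layer, :encoding(UTF-8), which validates the input; it can be used in place of :utf8 in both calls.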