Reading AdSense CSV in Perl (UTF-16 Problems)

I was trying to write a Perl script for analyzing a CSV file. It was generated by Google AdSense and contained lots of statistics. Naturally, I started by reading the file:

open( CSV, '<', 'adsense-report.csv' ) or die "Cannot open adsense-report.csv: $!";
while ( <CSV> ) {
    # handle each line (available in $_)
}
close( CSV );

However, upon trying to match each line with a regular expression, I found that it was not possible to match several characters in a row. Only very simple regexps such as m/5/ worked! After some research, I found the problem:

$ file adsense-report.csv
adsense-report.csv: \012- Unicode text, UTF-16, little-endian
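To illustrate why only single-character patterns matched, here is a minimal sketch that builds the UTF-16LE bytes for a made-up value with the core Encode module and runs the same kind of regexps against the raw bytes and against the decoded text. The sample string "12345" is an assumption for illustration, not data from the report.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "12345" as it would be stored in a UTF-16LE file:
# every digit is followed by a zero byte.
my $raw = encode('UTF-16LE', '12345');

print length($raw), "\n";                              # 10 (bytes, not characters)
print $raw =~ /5/   ? "match" : "no match", "\n";      # match
print $raw =~ /345/ ? "match" : "no match", "\n";      # no match: zero bytes sit between the digits

# After decoding, the digits are adjacent characters again.
my $text = decode('UTF-16LE', $raw);
print $text =~ /345/ ? "match" : "no match", "\n";     # match
```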

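Perl reads files as raw bytes unless told otherwise, and in UTF-16 every ASCII character is followed by a zero byte, so the characters of a multi-character pattern are never adjacent. I changed the second argument of open() to declare the encoding: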

open( CSV, '<:encoding(utf-16)', 'adsense-report.csv' ) or die "Cannot open adsense-report.csv: $!";
while ( <CSV> ) {
    # handle each line (available in $_)
}
close( CSV );
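For anyone who wants to try the encoding layer without a real report at hand, the following is a rough, self-contained sketch. It writes a tiny UTF-16LE file with a byte order mark and reads it back through the same '<:encoding(UTF-16)' layer. The column names, the sample row, and the temporary file are made up for the example; a lexical filehandle is used instead of the bareword CSV purely as a matter of style.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small UTF-16LE sample with a byte order mark (made-up data).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;
open(my $out, '>:raw:encoding(UTF-16LE)', $path) or die "Cannot write $path: $!";
print {$out} "\x{FEFF}Date,Clicks\n2009-01-01,12345\n";
close $out or die "Cannot close $path: $!";

# Read it back; the UTF-16 layer uses the BOM to pick the byte order.
my @clicks;
open(my $csv, '<:encoding(UTF-16)', $path) or die "Cannot open $path: $!";
while (my $line = <$csv>) {
    chomp $line;
    # A multi-character pattern now works because $line holds decoded text.
    push @clicks, $1 if $line =~ /,(\d+)$/;
}
close $csv;

print "@clicks\n";   # 12345
```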

It now works well. Unfortunately, I still get two “UTF-16:Partial character” errors, which I have not been able to resolve.
