Reading AdSense CSV in Perl (UTF-16 Problems)
I was trying to write a Perl script for analyzing a CSV file. It was generated by Google AdSense and contained lots of statistics. Naturally, I started by reading the file:
    open( CSV, '<', 'adsense-report.csv' );
    while ( <CSV> ) {
        # handle each line
    }
    close( CSV );
However, upon trying to match each line with a regular expression, I found that it was not possible to match several characters in a row. Only very simple regexps such as m/5/ worked! After some research, I found the problem:
    $ file adsense-report.csv
    adsense-report.csv: \012- Unicode text, UTF-16, little-endian
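A likely explanation for the failed matches is that UTF-16 stores each ASCII character as two bytes, the second being a zero byte, so two characters that look adjacent are never adjacent in the raw data. The following is a minimal sketch of that effect, not taken from the original script; the sample string is made up, and the core Encode module is used to produce the bytes that an undecoded read would return.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample line; a real AdSense report will look different.
my $text  = "clicks,125";
my $bytes = encode('UTF-16LE', $text);    # what a raw, undecoded read yields

# In the raw bytes every ASCII character is followed by a NUL byte,
# so "1" is never directly followed by "2".
my $raw_multi  = $bytes =~ /12/ ? 1 : 0;
my $raw_single = $bytes =~ /5/  ? 1 : 0;

# After decoding, the string contains characters again and /12/ matches.
my $decoded       = decode('UTF-16LE', $bytes);
my $decoded_multi = $decoded =~ /12/ ? 1 : 0;

printf "raw /12/: %d, raw /5/: %d, decoded /12/: %d\n",
    $raw_multi, $raw_single, $decoded_multi;
# prints "raw /12/: 0, raw /5/: 1, decoded /12/: 1"
```

This matches the symptom described above: a single-character pattern such as m/5/ works on the raw data, while anything longer does not.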
Apparently, Perl assumed some other encoding. I changed the second argument of open():
    open( CSV, '<:encoding(utf-16)', 'adsense-report.csv' );
    while ( <CSV> ) {
        # handle each line
    }
    close( CSV );
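For comparison, here is a slightly more defensive variant of the same loop, offered as a sketch rather than as the script actually used. It assumes a lexical filehandle ($csv), an error check on open(), and an extra :raw layer so that a platform's CRLF translation cannot touch the 16-bit data before it is decoded. To stay self-contained it first writes a small made-up UTF-16 file with a byte order mark, and it splits on commas naively, which would not cope with quoted fields.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-16LE sample (with BOM) so the sketch runs on its own.
# The contents are invented and only stand in for a real report.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':raw:encoding(UTF-16LE)');
print {$tmp_fh} "\x{FEFF}Date,Clicks\n2009-01-01,125\n";
close($tmp_fh) or die "Cannot close $path: $!";

# Three-argument open with a lexical filehandle and an error check.
# ':encoding(UTF-16)' relies on the BOM to determine the byte order.
open(my $csv, '<:raw:encoding(UTF-16)', $path)
    or die "Cannot open $path: $!";

my @rows;
while (my $line = <$csv>) {
    $line =~ s/\r?\n\z//;                # strip LF or CRLF line endings
    push @rows, [ split /,/, $line ];    # naive split; no quoted-field support
}
close($csv);

print join('|', @$_), "\n" for @rows;
```

With the data decoded on the way in, ordinary multi-character patterns can be applied to each line or field.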
Matching now works well. Unfortunately, I still get two “UTF-16:Partial character” errors, which I have not been able to resolve.
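One thing that may be worth checking, although it is only a guess and not a confirmed cause of the errors here: UTF-16 data consists of two-byte units, so a file with an odd number of bytes cannot be decoded completely. The sketch below uses a hypothetical helper, has_odd_byte_count, and a made-up three-byte demonstration file to show the check.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns 1 if the file size is odd, meaning the file
# cannot consist solely of whole 16-bit code units, and 0 otherwise.
sub has_odd_byte_count {
    my ($path) = @_;
    my $size = -s $path;
    die "Cannot determine size of $path" unless defined $size;
    return $size % 2 ? 1 : 0;
}

# Demonstration file: "A" in UTF-16LE (two bytes) plus one stray byte.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);
print {$fh} "A\x00\x0A";
close($fh) or die "Cannot close $path: $!";

my $odd = has_odd_byte_count($path);
print "odd byte count: $odd\n";
```

If the real report has an even size, this particular explanation can be ruled out and the cause lies elsewhere.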
