Perl
Perl was the first language that I learnt, and it is still my favourite language for its beauty and obfuscation capabilities.
(back to Programming)
I am currently moving a WordPress blog from one server to another. In the process, I messed up the MySQL database transportation, resulting in some tables being created, and some rows being inserted. It would not be possible to simply run the SQL dump files again, since that would result in duplicate rows. This was easy to fix with some commandline and perl magic, though. The file containing all SQL statements is called wordpress-dump.sql (in this example).
First, find the names of all tables:
$ grep -E '^CREATE TABLE' wordpress-dump.sql
CREATE TABLE `cat_visibility` (
CREATE TABLE `comments` (
CREATE TABLE `links` (
CREATE TABLE `movie_ratings` (
CREATE TABLE `options` (
CREATE TABLE `photopress` (
CREATE TABLE `postmeta` (
CREATE TABLE `posts` (
CREATE TABLE `pp_cats` (
CREATE TABLE `sk2_logs` (
CREATE TABLE `sk2_spams` (
CREATE TABLE `term_relationships` (
CREATE TABLE `term_taxonomy` (
CREATE TABLE `terms` (
CREATE TABLE `usermeta` (
CREATE TABLE `users` (
Next, reformat those lines into DROP queries.
$ grep -E '^CREATE TABLE' wordpress-dump.sql | perl -pe 's/CREATE/DROP/; s/ *\($/;/;'
Now, you won’t have to change those by hand!
In the post about Project Euler 7, I described how to obtain a list of primes. These are not enough: the last one is 1,299,827, and we need all up to 2,000,000. The first 1,000,000 prime numbers can be found at the same site. Here is a ZIP file of the 1,000,000 first primes. Download the file, unzip it and format it as in problem 7.
Summing the first ones is fast and simple. This Perl script does it well:
1
2
3
4
5
6
7
8
9
| my $sum = 0;
open( PRIMES, '<', 'dat/primes-1000000-first.txt' ) or die $!;
while ( <PRIMES> ) {
chomp;
last if $_ >= 2000000;
$sum += $_;
}
close( PRIMES );
print $sum, "\n"; |
This mission can be solved by calculating lots of prime numbers. However, we do not want to put extra effort into our solutions, do we? After a bit of searching, I found a list of the 100000 first prime numbers (100008, actually). I saved this file. Unfortunately, it is not in an optimal format. I wrote a short Perl script that creates a new file, in which each prime number is on a separate line:
1
2
3
4
5
6
7
8
9
10
11
12
13
| # read the file from utm.edu
open( PRIMES, '<', 'dat/100000.txt' ) or die $!;
# this is the file that we will write to
open( TO, '>', 'dat/primes-100000-first.txt' ) or die $!;
while ( <PRIMES> ) { # for each line
for ( split / +/ ) { # split, separating by spaces, and for each piece
if ( m/^\d+$/ ) { # if it is a number
print TO "$_\n"; # print the number to the file
}
}
}
close( TO );
close( PRIMES ); |
Of course, you need to replace the filenames.
Now, it is a piece of cake to find the 10001st prime number. Just do:
head -10001 dat/primes-100000-first.txt | tail -1
Which finds the 10001 first primes and outputs the last one of them.
Or you could open it in notepad and start counting rows.
There is exactly one Pythagorean triplet (a^2+b^2=c^2) in which a+b+c=1000. Since all terms are positive, we can try all alternatives for 0 < a,b < 1000, which is less than 1000^2. This Perl script does it all:
1
2
3
4
5
6
7
8
9
10
11
12
| # first, generate pairs of a and b
for ( my $a = 1; $a < 1000; $a++ ) {
for ( my $b = $a; $b < 1000; $b++ ) {
# c is then calculated from these
my $c = sqrt( $a*$a + $b*$b );
if ( $a+$b+$c == 1000 ) {
# print and exit if we found the answer
printf "%d*%d*%d=%d\n", $a, $b, $c, $a*$b*$c;
exit;
}
}
} |
There is probably a faster solution without a square root, but this one is fast enough.
I was trying to write a Perl script for analyzing a CSV file. It was generated by Google AdSense and contained lots of statistics. Naturally, I started by reading the file:
open( CSV, '<', 'adsense-report.csv' );
while ( CSV ) {
# handle each line
}
close( CSV );
However, upon trying to match each line with a regular expression, I found that it was not possible to match several characters in a row. Only very simple regexps such as m/5/ worked! After some research, I found the problem:
$ file adsense-report.csv
adsense-report.csv: \012- Unicode text, UTF-16, little-endian
Apparently, Perl assumed some other encoding. I changed the second argument of open():
open( CSV, '<:encoding(utf-16)', 'adsense-report.csv' );
while ( CSV ) {
# handle each line
}
close( CSV );
It now works well. Unfortunately, I receive two errors of “UTF-16:Partial character”, which I cannot seem to solve.
A while ago, I started playing lots of scrabble. The scrabble game in the Ubuntu repository (by Brian White) records all games in the file ~/.scrabble-games. The following perl script analyzes the file and creates statistics for your games.
#!/usr/bin/perl
use strict;
my $mode = shift @ARGV;
unless ( $mode ) {
print "arg: (freq|gen|score|size)\n";
exit;
}
chdir;
open( LOG, '<scrabble-games' ) or die $!;
my $yourwords = 0;
my ( %wfreq, %wscore );
my $numgames = 0;
while ( <LOG> ) {
chomp;
if ( m/^level #(\d); computer rack: "([A-Z]*)"; player rack: "([A-Z]*)"$/ ) {
$numgames++;
}
if ( m/My Words/ ) {
$yourwords = 0;
} elsif ( m/Your Words/ ) {
$yourwords = 1;
} elsif ( $yourwords ) {
while ( m/[^a-z]([a-z]+)\((\d+)\)/g ) {
$wfreq{$1}++ if length $1 > 2;
$wscore{$1} = $2 if length $1 > 2;
}
}
}
close( LOG );
my @freq = sort { $wfreq{$a} <=> $wfreq{$b} } keys %wfreq;
my @size = sort { length $a <=> length $b } keys %wfreq;
my @score = sort { $wscore{$a} <=> $wscore{$b} } keys %wscore; # TODO: no hash needed; calculate score from word and ignore bonuses
if ( $mode eq 'gen' ) {
printf "No of games:\t%d\n", $numgames;
printf "Longest:\t%s\n", join( ', ', &liststart( reverse @size ) );
printf "Most used:\t%s\n", join( ', ', &liststart( reverse @freq ) );
printf "Highest score:\t%s\n", join( ', ', &liststart( reverse @score ) );
} elsif ( $mode eq 'score' ) {
for ( @score ) {
printf "%d\t%s\n", $wscore{$_}, $_;
}
} elsif ( $mode eq 'freq' ) {
for ( @freq ) {
printf "%d\t%s\n", $wfreq{$_}, $_;
}
} elsif ( $mode eq 'size' ) {
for ( @size ) {
printf "%d\t%s\n", $wfreq{$_}, $_;
}
}
sub liststart {
my @t;
for ( 1..5 ) {
push @t, shift;
}
return @t;
}
Call it with one of the arguments freq (for most frequently used words), score (for best-scoring words), size (for longest words) or gen (for a summary). Note that only your words are counted, not your opponent’s.
While being in the Alps skiing (Argentiere), my cellphone broke due to a 55kg-weight dropping on it. Since I used to set my cellphone’s alarm clock to make me wake up, I had to come up with a better solution. I wrote this perl script:
use strict;
my ( $t, $f );
if ( @ARGV == 1 ) {
$t = $ARGV[0];
$f = '~/wakeup.mp3';
} elsif ( @ARGV == 2 ) {
( $t, $f ) = @ARGV;
} else {
die 'args: sec, file';
}
while ( $t > 0 ) {
sleep 1;
$t--;
my $h = int( $t / 3600 );
my $m = int( $t / 60 - $h * 60 ) ;
my $s = int( $t - $h * 3600 - $m * 60 ) ;
print "$h:$m:$s\n";
}
print "BEEP BEEP BEEP\n";
`mplayer $f`;
The script counts down $t seconds. When it reaches zero, $f will start playing. Eight hours equal 28800 seconds.
Unfortunately, writing this script kept me from sleeping anyway.
A while ago, bot-depot.com maintained the handy MSN.pm module. Nowadays, though, the MSN protocol has evolved, and MSN.pm development has fallen behind. Trying to run the current module resulted in the error message “No expected reply recieved” (sic, not “No expected reply received”). Fortunately, there was an updated version to be found at botwork.com (MSN2.0.84.zip). It worked a bit better, at least:
$ perl echobot.pl
Can't locate Digest/SHA1.pm in @INC (@INC contains: ./lib /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at lib/MSN/Notification.pm line 21.
Install Digest::SHA1 with CPAN, and:
$ perl echobot.pl
MSN 2.0 (01/21/2005) Rev: 84 - Checksum: 60068-NS31944-SB22354
SERVER ERROR : Authentication Error: No response from Passport server
Searching the bot-depot forums, I found that the problem is that you need Crypt::SSLeay. Install it with CPAN, and the echobot.pl works fine, ready to be modified.
I just finished Playlog, my first Amarok plugin, written in Perl. I grew tired of last.fm having the sole properties of my listening-log, and such, this script everything that you listen to, along with the time that you listen to it, into two nice and handy MySQL tables. In the future, I plan to write some kind of analysis script for the data.
This was also my first piece of software to send in to kde-apps.org: Playlog.
Here is the Perl source code, in case you are interested. I release it under the GPL.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
| #!/usr/bin/perl
use strict;
use DBI;
my $dbh = DBI->connect( 'DBI:mysql:database=playlog;host=localhost',
'playlog', '',
{ 'RaiseError' => 1 }
);
$dbh->do( 'CREATE TABLE IF NOT EXISTS songs ( song_id INT(11) NOT NULL AUTO_INCREMENT, song_artist VARCHAR(255), song_album VARCHAR(255), song_title VARCHAR(255), PRIMARY KEY (song_id) )' );
$dbh->do( 'CREATE TABLE IF NOT EXISTS playlog ( play_time INT(11), song_id INT(11) )' );
sub cleanup {
$dbh->disconnect;
}
$SIG{'TERM'} = 'cleanup';
while ( <> ) {
if ( m/^trackChange$/ ) {
chomp( my $artist = `dcop amarok player artist` );
chomp( my $album = `dcop amarok player album` );
chomp( my $song = `dcop amarok player title` );
my $id;
my @s = &qsong( $artist, $album, $song );
if ( ! $s[0] ) {
$dbh->do( sprintf(
'INSERT INTO songs ( song_artist, song_album, song_title ) VALUES ( %s, %s, %s )',
$dbh->quote( $artist ),
$dbh->quote( $album ),
$dbh->quote( $song )
) ) or die $dbh->errstr;
@s = &qsong( $artist, $album, $song );
}
$dbh->do( sprintf(
'INSERT INTO playlog ( play_time, song_id ) VALUES ( %d, %d )',
time,
$s[0]
) );
}
}
sub qsong {
my ( $artist, $album, $song ) = @_;
my $sth = $dbh->prepare( sprintf(
'SELECT song_id FROM songs WHERE song_artist = %s AND song_album = %s AND song_title = %s',
$dbh->quote( $artist ),
$dbh->quote( $album ),
$dbh->quote( $song )
) ) or die $dbh->errstr;
$sth->execute();
return $sth->fetchrow_array();
} |
I’m trying currently my skills at Hack This Site, and found that I am supposed to crack an MD4 hash. I wrote the following Perl script to do it for me:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
| use Digest::MD4 md4_hex;
my $goal = '3184342944a094dd5dbe6fccaeb8dc96'; # hash to find
my @chars = ( a..z, 0..9 ); # characters to use
my $minlen = 1; # minimum length of string
my $maxlen = 8; # maximum length of string
for ( $minlen..$maxlen ) {
print 'Length: ', $_, "\n";
&checkStrings($_);
}
sub checkStrings {
my ( $n, $s ) = ( (shift) - 1, shift );
for ( @chars ) {
if ( $n ) {
checkStrings($n, $s . $_ );
} else {
print $s, $_, "\n" if md4_hex( $s . $_ ) eq $goal;
}
}
} |
&checkStrings is a recursive function that checks all strings built from @chars of a certain number of characters. The script can easily be changed to cater for other message digest or hashing algorithms; simply change the function in line 19 from md4_hex() to whatever function you want. It could even be something like reverse(), although I hope that you’ve got a better way for finding such a string.
The output of the above configuration is:
Length: 1
Length: 2
Length: 3
Length: 4
Length: 5
c6a18
Length: 6
Length: 7
Length: 8
Note that the script keeps going after a successful collision–there could be several hits.
Earlier Posts »