Code Snippets

Writing an MSN bot in Perl

A while ago, bot-depot.com maintained the handy MSN.pm module. Nowadays, though, the MSN protocol has evolved, and MSN.pm development has fallen behind. Trying to run the current module resulted in the error message “No expected reply recieved” (sic, not “No expected reply received”). Fortunately, there was an updated version to be found at botwork.com (MSN₂.0.84.zip). It worked a bit better, at least:

$ perl echobot.pl 
Can't locate Digest/SHA1.pm in @INC (@INC contains: ./lib /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at lib/MSN/Notification.pm line 21.

Install Digest::SHA1 with CPAN, and:

$ perl echobot.pl 
MSN 2.0 (01/21/2005) Rev: 84  - Checksum: 60068-NS31944-SB22354
SERVER ERROR : Authentication Error: No response from Passport server

Searching the bot-depot forums, I found that the problem is that you need Crypt::SSLeay. Install it with CPAN, and the echobot.pl works fine, ready to be modified.

Amarok Script: Playlog

I just finished Playlog, my first Amarok plugin, written in Perl. I grew tired of last.fm having the sole properties of my listening-log, and such, this script everything that you listen to, along with the time that you listen to it, into two nice and handy MySQL tables. In the future, I plan to write some kind of analysis script for the data.

This was also my first piece of software to send in to kde-apps.org: Playlog.

Here is the Perl source code, in case you are interested. I release it under the GPL.

#!/usr/bin/perl
 
use strict;
use DBI;
 
my $dbh = DBI->connect( 'DBI:mysql:database=playlog;host=localhost',
	'playlog', '',
	{ 'RaiseError' => 1 }
);
$dbh->do( 'CREATE TABLE IF NOT EXISTS songs ( song_id INT(11) NOT NULL AUTO_INCREMENT, song_artist VARCHAR(255), song_album VARCHAR(255), song_title VARCHAR(255), PRIMARY KEY (song_id) )' );
$dbh->do( 'CREATE TABLE IF NOT EXISTS playlog ( play_time INT(11), song_id INT(11) )' );
 
sub cleanup {
	$dbh->disconnect;
}
 
$SIG{'TERM'} = 'cleanup';
 
while ( <> ) {
	if ( m/^trackChange$/ ) {
		chomp( my $artist = `dcop amarok player artist` );
		chomp( my $album = `dcop amarok player album` );
		chomp( my $song = `dcop amarok player title` );
		my $id;
		my @s = &qsong( $artist, $album, $song );
		if ( ! $s[0] ) {
			$dbh->do( sprintf(
				'INSERT INTO songs ( song_artist, song_album, song_title ) VALUES ( %s, %s, %s )',
				$dbh->quote( $artist ),
				$dbh->quote( $album ),
				$dbh->quote( $song )
			) ) or die $dbh->errstr;
			@s = &qsong( $artist, $album, $song );
		}
		$dbh->do( sprintf(
			'INSERT INTO playlog ( play_time, song_id ) VALUES ( %d, %d )',
			time,
			$s[0]
		) );
	}
}
 
sub qsong {
	my ( $artist, $album, $song ) = @_;
	my $sth = $dbh->prepare( sprintf(
		'SELECT song_id FROM songs WHERE song_artist = %s AND song_album = %s AND song_title = %s',
		$dbh->quote( $artist ),
		$dbh->quote( $album ),
		$dbh->quote( $song )
	) ) or die $dbh->errstr;
	$sth->execute();
	return $sth->fetchrow_array();
}

PHP: Calculate FRES and SMOG of a Text

A few days ago, I set up a site where students can upload their essays for money (Uppsatslotto.se: Tjäna pengar på dina uppsatser), and wanted a nice way for analyzing texts. I found two in particular, FRES (Flesch Reading Ease Score) and SMOG (Simple Measure of Gobledygook). Slightly modified excerpts from the respective Wikipedia articles (licensed under the GNU Free Document License):

The Flesch/Flesch–Kincaid Readability Tests are readability tests designed to indicate how difficult a reading passage is to understand. There are two tests, the Flesch Reading Ease, and the Flesch–Kincaid Grade Level. Although they supposedly use the same measures, just placed into a different scale, the results of the two tests do not always correlate (a text with a better score on the Reading Ease test over another text may end up with a worse score on the Grade Level test). Both sytems were devised by Rudolf Flesch. In the Flesch Reading Ease test, higher scores indicate material that is easier to read; lower numbers mark harder-to-read passages. The formula for the Flesch Reading Ease Score (FRES) test is 206.835 – 1.015 * W/Se – 84.6 * Sy/W where W/Se is the average number of words per sentence and Sy/W is the average number of syllables per word.

SMOG (Simple Measure Of Gobbledygook) is a readability formula that estimates the years of education needed to understand a piece of writing. SMOG is widely used, particularly for checking health messages. The precise SMOG formula yields an outstandingly high 0.985 correlation with the grades of readers who had 100% comprehension of test materials. SMOG was published by G. Harry McLaughlin in 1969 as a more accurate and more easily calculated substitute for the Gunning-Fog Index. The SMOG of a text can be calculated by: 1.0430 * sqrt( 30 * Psy/Se ) + 3.1291.

The following code is a PHP implementation for calculating the required values (number of words, number of sentences, number of syllables, and number of polysyllabic words) and putting it all together using the above mentioned formulae.

// Number of words: number of space series or linebreaks + 1
$wc = preg_match_all( '/[ \r]/', preg_replace( '/ +/', ' ', $text ), $tmp );
// Number of syllables: vowels not followed by another vowel. Quite accurate approximation.
$syc = preg_match_all( '/[aeiouy][^aeiouy]/', $text, $tmp );
// Number of polysyllabic words (>=3 syllables): Vowel, non-spaces, vowel, non-spaces, vowel (or more non-spaces-vowel)
$psyc = preg_match_all( '/[aeiouy]([^ ]*[aeiouy]){2,}/', $text, $tmp );
// Number of sentences: Number of periods, exclamation marks, question marks and linebreaks
$sec = preg_match_all( '/[.!?\r]/', $essayf, $tmp );
// Flesch Reading Ease Score
$fres = 206.835 - 1.015 * ( $wc / $sec ) - 84.6 * ( $syc / $wc );
// Simple Measure of Gobbledygook
$smog = 1.043 * sqrt( $psyc * ( 30 / $sec ) ) + 3.1291;

MD4 Hash Collision Finder in Perl

I’m trying currently my skills at Hack This Site, and found that I am supposed to crack an MD4 hash. I wrote the following Perl script to do it for me:

use Digest::MD4 md4_hex;
 
my $goal = '3184342944a094dd5dbe6fccaeb8dc96'; # hash to find
my @chars = ( a..z, 0..9 ); # characters to use
my $minlen = 1; # minimum length of string
my $maxlen = 8; # maximum length of string
 
for ( $minlen..$maxlen ) {
	print 'Length: ', $_, "\n";
	&checkStrings($_);
}
 
sub checkStrings {
	my ( $n, $s ) = ( (shift) - 1, shift );
	for ( @chars ) {
		if ( $n ) {
			checkStrings($n, $s . $_ );
		} else {
			print $s, $_, "\n" if md4_hex( $s . $_ ) eq $goal;
		}
	}
}

&checkStrings is a recursive function that checks all strings built from @chars of a certain number of characters. The script can easily be changed to cater for other message digest or hashing algorithms; simply change the function in line 19 from md4_hex() to whatever function you want. It could even be something like reverse(), although I hope that you’ve got a better way for finding such a string.

The output of the above configuration is:

Length: 1
Length: 2
Length: 3
Length: 4
Length: 5
c6a18
Length: 6
Length: 7
Length: 8

Note that the script keeps going after a successful collision–there could be several hits.

Project Euler 8: Greatest Product of Consecutive Digits in String

This problem is one of the easiest ones at Project Euler. You can even solve it manually in less than a minute; just highlight the nines in your browser, and you will see the potential candidates. You’ll have problems proving it that way, though, but you can just try the solution and check if it works.

Here is a nice Perl script that checks every product. Being the lazy perl-programmer I am, I let Perl format the numbers instead of doing it by hand.

# put the numbers into an array
@n = split //, join '', qw/
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
/;
 
my $m; # this is the greatest product yet found
foreach my $i ( 0..(@n-5) ) {
	my $p = $n[$i]*$n[$i+1]*$n[$i+2]*$n[$i+3]*$n[$i+4]; # find current product
	$m = $p if $p > $m; # update $m if this product is greater
}
print $m; # print the result

There is really nothing interesting here, except perhaps the second line. Using the qw// operator, Perl first makes a list where every element is a row of the long number. join joins the rows together to one long string, and split splits them such that each element contains one number.

I thought about writing a more elaborate script that would move through the list, dividing by one element and multiplying by another. However, I decided that the existence of zeroes would make it too difficult to be worth the trouble.

Warning: The code looks ugly with all those pink numbers if you’re using a weird colour hilighting.

Writing a Factorial Subroutine in Perl

Perl seems to lack a native factorial function. Here is a simple subroutine that does the job.

sub factorial {
	my $n = 1;
	$n *= $_ for 2..shift;
	return $n;
}

The script is pretty straight-forward, except the third line. Expanding it to a more eligible block, we would get:

sub factorial {
	my $n = 1;
	my $max = shift; # multiply all numbers up to this
	for my $i (2..$max) { # for all numbers between 2 and $max,
		$n *= $i # multiply them together
	}
	return $n;
}

If you want, for example, 64! (64 factorial), you just call &factorial(64). This seems to work fine for all integers up to 171, which Perl equals to infinity.

Use Google as a Spellchecker with Perl

Try googling for “recieve”, and you will receive the following messages above the results:

Did you mean: receive

This feature can be used to check texts for commonly misspelled words, and doing so is especially easy since the correctly spelled version is always highlighted with B and I elements. The following perl code uses LWP to accomplish the task.

#!/usr/bin/perl
 
use strict;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent('Perl 5');
my $text = 'You will recieve good resullts.';
my @newtext;
foreach my $word ( split / /, $text ) {
    my $req = HTTP::Request->new( GET => 'http://google.com/search?q=' . $word );
    my $res = $ua->request( $req );
    if ( $res->content =~ m/<b><i>(.*?)<\/i><\/b>/i ) {
        push @newtext, $1;
    } else {
        push @newtext, $word;
    }
    sleep 1;
}
print "@newtext\n";

The output of the above script is: you will receive good results. Note, however, that this script is quite useless since abusing the Google servers like this is against their terms of service and there are better spell checkers available.

Generate Slugs from Strings with PHP

In these days of Web 2.0, no site can survive for long without pretty, mod_rewrite-modified permanent links. Everyone uses it. If you look at the URL for this post, for example, it is timjoh.com/generate-slugs-from-strings-with-php. As you might have guessed, I never typed that hyphenated string–WordPress did it all for me. If your PHP script is fed a string that you want a similar slug from, you can use this handy one-liner, with $string being the string that you want to slug.

1	$slug = str_replace( ' ', '-', preg_replace( '/[^a-z0-9- ]/', '', strtolower( $string ) ) );

Or you could use the neater version:

1
2
3

$slug = strtolower( $string ); // lower-case the string
$slug = preg_replace( '/[^a-z0-9- ]/', '', $slug ); // remove all non-alphanumeric characters except for spaces and hyphens
$slug = str_replace( ' ', '-', $slug ); // substitute the spaces with hyphens

This turns the string I'm an U_G_L_Y string! into the slug im-an-ugly-string. Note that it is often useful to shorten the slug; no-one wants a metre-wide URL. It is also important to check so that the slug doesn’t already exist. We can add to the code a loop that appends an ID number to the slug in case it already exists, making it unique.

$slug = strtolower( $string ); // lower-case the string
$slug = preg_replace( '/[^a-z0-9- ]/', '', $slug ); // remove all non-alphanumeric characters except for spaces and hyphens
$slug = str_replace( ' ', '-', $slug ); // substitute the spaces with hyphens
$i = ''; // start with no appended value
while ( slugExists( $slug . $i ) ) { // if the slug already exists..
    $i++; // increment the appended value
    # it is bad practice to increment a string
    # but in this case, it simplifies the code
}
$slug .= $i; // append the value to the real slug
 
function slugExists( $slug ) {
    # this is just an example function;
    # the real function depends on the layout of your script
    $query = sprintf( 'SELECT post_id FROM posts WHERE post_slug = \'%s\'', // write the MySQL query
        mysql_real_escape_string( $slug ) // escape the string correctly (make sure that magic_quotes_gpc is turned off)
    );
    $result = mysql_query( $query ); // assuming that we are connected to MySQL
    if ( mysql_num_rows( $result ) ) { // if a post with this slug was found..
        return TRUE; // ..this slug exists
    } else { // if not..
        return FALSE; // ..this slug does not exist
    }
}

Now you can slug whatever you want, without fearing collisions. This means that you can even use the slug column as an ID instead of post_id in this example.

Perl Script to Parse IRC Logs for Use in MegaHAL.trn

MegaHAL is an interesting AI engine that has no knowledge of the words; it only reassembles sentences and abuses the ability of the human mind to read order in chaos. If you are into MegaHAL IRC bots, the following Perl code might be handy:

#!/usr/bin/perl
while ( <> ) {
    s/<.*>//;
    s/^[^ ]*://;
    print unless m/^\*/;
}

This is for mIRC-style logs without timestamps, i.e:

<Tim> This is an example.
* Tim explains

Line 3 removes the nickname from each line. If the first word in a line is followed by a colon, the fourth line removes that word and the colon. This is useful since people often highlight each other in this style, and you do not want those nicknames. Line 5 prints the resulting fixed line, unless it starts with a *, i.e. CTCP ACTIONs (”/me”) and server messages are not printed. To use the perl script just pipe the output to the MegaHAL training file:

$ perl logfix.pl SomeNet-somechan.log > megahal.trn

Perl: Post automatically to WordPress blog with wp-poster.pl

I just wrote a Perl script that can automatically post to a WordPress blog. This could come very handy if you want to post a certain post at times. This could be anything, from a list of what you listen to every day, to your bookmarks, to your AdSense earnings, to the status of a server.

The code was tested on the latest version of WordPress, WP 2.2 (actually, it was WordPress MU 1.2.1), but I suspect that it will work fine for almost every version. Three HTTP requests are required; one to log in, one to get some security-check-values (_wpnonce and temp_ID), and one to make the actual post. You need the LWP (lib-www-perl) package, which you probably already do have, and if you don’t, you can get it freely at cpan.

#!/usr/bin/perl
 
#    Copyright 2007 Tim Johansson
#    /
 
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 
use strict;
use LWP::UserAgent;
 
# change this
my $burl = '/'; # don't forget the trailing slash
my ( $usr, $pwd, $uid ) = ( 'USERNAME', 'PASSWORD', 1 ); # uid=1 if you are the initial administrator
 
# ua
my $ua = LWP::UserAgent->new;
$ua->agent('wp-poster');
$ua->cookie_jar( {} );
 
# login
my $req = HTTP::Request->new( POST => $burl . 'wp-login.php' );
$req->content_type('application/x-www-form-urlencoded');
$req->content( sprintf('log=%s&pwd=%s&wp-submit=1&redirect_to=wp-admin/',$usr,$pwd) );
my $res = $ua->request( $req );
 
# get _wpnonce and temp_id
$req = HTTP::Request->new( POST => $burl . 'wp-admin/post-new.php' );
$res = $ua->request( $req );
if ( $res->is_success ) {
        if ( $res->content =~ m/"_wpnonce" value="([0-9a-f9]+)".*'temp_ID' value='(-?[0-9]+)'/s ) {
                # post
                $req = HTTP::Request->new( POST => $burl . 'wp-admin/post.php' );
                $req->content_type('application/x-www-form-urlencoded');
                $req->content( sprintf(
                        '_wpnonce=%s' .
                        '&user_ID=%d' .
                        '&action=post&originalaction=post&post_type=post' .
                        '&temp_ID=%s' .
                        '&advanced_view=1' .
                        #'&comment_status=open' .
                        #'&ping_status=open' .
                        #'&post_password=' .
                        #'&post_name=' . #slug
                        '&post_status=publish' .
                        #'&edit_date=1' .
                        '&post_title=%s' .
                        '&content=%s' .
                        '&post_pingback=1' .
                        '&prev_status=draft' .
                        '&publish=Publish' .
                        '&referredby=redo'
                        ,
                        $1, #nonce
                        $uid,
                        $2, #tempid
                        'test', #title
                        'Testar.' #content
                        )
                );
                $res = $ua->request( $req );
                &debug($res, 2);
        } else { &debug($res, 1); print $res->content }
} else { &debug($res, 0) }
 
sub debug {
        my ( $res, $id ) = @_;
        print $id, ': ', $res->status_line, "\n";
        foreach ( $res->header_field_names ) {
                print $_, ': ', $res->header($_), "\n";
        }
        return;
}

Please do not use this code for any malicious purpose. Spamming is evil.