PHP: Calculate FRES and SMOG of a Text

A few days ago, I set up a site where students can upload their essays for money (Uppsatslotto.se: Earn money from your essays), and I wanted a good way to analyze the texts. Two measures in particular caught my eye: FRES (Flesch Reading Ease Score) and SMOG (Simple Measure of Gobbledygook). Below are slightly modified excerpts from the respective Wikipedia articles (licensed under the GNU Free Documentation License):

The Flesch/Flesch–Kincaid readability tests are designed to indicate how difficult a reading passage is to understand. There are two tests, the Flesch Reading Ease and the Flesch–Kincaid Grade Level. Although they use the same core measures (word length and sentence length), just placed on different scales, the results of the two tests do not always correlate: a text that scores better than another on the Reading Ease test may end up with a worse score on the Grade Level test. Both systems were devised by Rudolf Flesch. In the Flesch Reading Ease test, higher scores indicate material that is easier to read; lower scores mark passages that are harder to read. The formula for the Flesch Reading Ease Score (FRES) is 206.835 - 1.015 * (W/Se) - 84.6 * (Sy/W), where W/Se is the average number of words per sentence and Sy/W is the average number of syllables per word.
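As a quick sanity check of the formula, here is a minimal PHP sketch that computes FRES from totals that have already been counted. The function name fres_from_counts is hypothetical, and the counts in the example are made up for illustration, not taken from a real essay.

```php
<?php
// Minimal sketch: Flesch Reading Ease Score from pre-computed counts.
// fres_from_counts() is a hypothetical helper, not a built-in PHP function.
function fres_from_counts(int $words, int $sentences, int $syllables): float
{
    if ($words <= 0 || $sentences <= 0) {
        throw new InvalidArgumentException('Need at least one word and one sentence.');
    }
    $wordsPerSentence = $words / $sentences;   // W/Se
    $syllablesPerWord = $syllables / $words;   // Sy/W
    return 206.835 - 1.015 * $wordsPerSentence - 84.6 * $syllablesPerWord;
}

// Made-up counts: 100 words, 5 sentences, 150 syllables.
// 206.835 - 1.015 * 20 - 84.6 * 1.5 = 59.635
printf("FRES = %.3f\n", fres_from_counts(100, 5, 150));
```

Shorter sentences or shorter words both push the score up, which matches "higher means easier to read".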

SMOG (Simple Measure Of Gobbledygook) is a readability formula that estimates the years of education needed to understand a piece of writing. SMOG is widely used, particularly for checking health messages. The precise SMOG formula yields a very high correlation (0.985) with the grades of readers who had 100% comprehension of test materials. SMOG was published by G. Harry McLaughlin in 1969 as a more accurate and more easily calculated substitute for the Gunning Fog Index. The SMOG grade of a text can be calculated as 1.0430 * sqrt(30 * Psy/Se) + 3.1291, where Psy/Se is the number of polysyllabic words (words with three or more syllables) per sentence.
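The same kind of sketch works for SMOG. The factor 30 / sentences rescales the polysyllable count to the 30-sentence sample the original SMOG procedure was built around. The helper name smog_from_counts is hypothetical, and the example counts are invented for illustration.

```php
<?php
// Minimal sketch: SMOG grade from pre-computed counts.
// smog_from_counts() is a hypothetical helper, not a built-in PHP function.
// $polysyllables = number of words with three or more syllables.
function smog_from_counts(int $polysyllables, int $sentences): float
{
    if ($sentences <= 0) {
        throw new InvalidArgumentException('Need at least one sentence.');
    }
    // Scale the polysyllable count to a 30-sentence sample, then apply the fit.
    return 1.0430 * sqrt($polysyllables * (30 / $sentences)) + 3.1291;
}

// Made-up counts: 30 polysyllabic words in 10 sentences.
// sqrt(30 * 3) = sqrt(90), so the grade is about 1.0430 * 9.487 + 3.1291
printf("SMOG = %.2f\n", smog_from_counts(30, 10));
```

Note that a text with no polysyllabic words still gets a grade of 3.1291, the constant term of the formula.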

The following PHP code calculates the required values (the number of words, sentences, syllables and polysyllabic words) for a string $text and combines them using the formulas above.

// $text holds the essay to analyze
// Number of words: runs of non-whitespace (works with \n, \r\n and \r line breaks)
$words = preg_split( '/\s+/', trim( $text ), -1, PREG_SPLIT_NO_EMPTY );
$wc = max( 1, count( $words ) );
// Number of syllables: vowels not followed by another vowel, case-insensitive. A rough approximation.
$syc = preg_match_all( '/[aeiouy](?![aeiouy])/i', $text );
// Number of polysyllabic words: words containing three or more such vowel groups
$psyc = 0;
foreach ( $words as $word ) {
	if ( preg_match_all( '/[aeiouy](?![aeiouy])/i', $word ) >= 3 ) {
		$psyc++;
	}
}
// Number of sentences: runs of periods, exclamation marks, question marks and line breaks
$sec = max( 1, preg_match_all( '/[.!?\r\n][\s.!?]*/', trim( $text ) ) );
// Flesch Reading Ease Score
$fres = 206.835 - 1.015 * ( $wc / $sec ) - 84.6 * ( $syc / $wc );
// Simple Measure of Gobbledygook
$smog = 1.043 * sqrt( $psyc * ( 30 / $sec ) ) + 3.1291;
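For reuse, the counting approach can be wrapped in a function. The sketch below is one way to do that, assuming a hypothetical function name readability_scores and a few choices of its own: vowels are matched case-insensitively, polysyllabic words are counted word by word, any line-break style (\n, \r\n or \r) ends a sentence, and the counts are clamped to at least 1 to avoid division by zero. Syllables are still estimated with the vowel-group heuristic, so all numbers are approximations. The sample sentence is made up.

```php
<?php
// Hypothetical wrapper: raw counts plus FRES and SMOG for an English text.
// Syllables are estimated by counting vowel groups, not looked up in a dictionary.
function readability_scores(string $text): array
{
    $text  = trim($text);
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $wc    = max(1, count($words));

    // One syllable per vowel (y included) that is not followed by another vowel.
    $vowelGroup = '/[aeiouy](?![aeiouy])/i';
    $syc = preg_match_all($vowelGroup, $text);

    // Polysyllabic words: three or more vowel groups.
    $psyc = 0;
    foreach ($words as $word) {
        if (preg_match_all($vowelGroup, $word) >= 3) {
            $psyc++;
        }
    }

    // Sentences: a terminator (. ! ? or line break) plus any trailing
    // whitespace or terminators, so "?!" or ".\n" count only once.
    $sec = max(1, preg_match_all('/[.!?\r\n][\s.!?]*/', $text));

    return [
        'words'         => $wc,
        'sentences'     => $sec,
        'syllables'     => $syc,
        'polysyllables' => $psyc,
        'fres'          => 206.835 - 1.015 * ($wc / $sec) - 84.6 * ($syc / $wc),
        'smog'          => 1.0430 * sqrt($psyc * (30 / $sec)) + 3.1291,
    ];
}

$r = readability_scores("The cat sat on the mat. Everybody was happy.");
printf("words=%d sentences=%d syllables=%d polysyllables=%d\n",
    $r['words'], $r['sentences'], $r['syllables'], $r['polysyllables']);
printf("FRES=%.2f SMOG=%.2f\n", $r['fres'], $r['smog']);
```

On this sample the heuristic finds five vowel groups in "Everybody", which has four spoken syllables, so the syllable total (and therefore FRES) is slightly off. A dictionary-based syllable counter would avoid that kind of error at the cost of more resources.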


7 Comments

  1. The FRES and SMOG analysing algorithms are interesting, since they almost always give a correct judgement of the text. Nice work!

    Comment by Somerunce — August 1, 2007 @ 10:15 am

  2. The FRES and SMOG analysing algorithms are interesting, since they calculate the value in very different ways and still arrive at approximately correct results. For Swedish texts, the LIX value is pretty great. Nice work!

    Comment by Somerunce — August 1, 2007 @ 10:19 am

  3. This implementation is so trivial that it is worthless. It ignores abbreviations. It does not consider a ; as a sentence delimiter. The syllable algorithm is not very accurate; take “aged”, for example. There are lots of web sites that have much more accurate implementations.

    Comment by Gil Carrick — August 1, 2007 @ 3:33 pm

  4. Gil, thanks for your comment. Despite its simplicity, I would not say that it is worthless. If your system has limited resources, it can be useful. In case you want exact values, you can always conduct a readability survey.

    Comment by Tim — August 1, 2007 @ 8:49 pm

  5. To: Gil Carrick

    Can you list other web sites that have the source code for readability? I don’t care if it is C#, Java, Smalltalk, etc.

    So far, my Google searches have only turned up this PHP implementation. Maybe I have been using the wrong keywords.

    Comment by Anonymous — September 12, 2007 @ 7:47 pm

  6. For the “Number of words” you may want to use either str_word_count() provided by PHP or a simpler preg_match:
    $wc = preg_match_all( '/\s+/', $text, $tmp );

    About your use of \r for line breaks: it appears that “\r” only matches Mac OS 9 line breaks. You may want to match “\r?\n|\r” instead.

    Comment by dAniel hAhler — November 16, 2007 @ 3:12 pm

  7. Tim, I like your style, man… Nice blog, nice Perl scripts and stuff. That is a skill I really admire. I want to learn Perl myself, but I am caught up trying to teach myself C right now; that is right, plain old K&R C, as powerful as anything God ever created! Anyways, I am thepheonixproject and it is nice to meet you. Keep up the nice work… If you ever want to chat, find me on irc.freeshell.org #sdf and make sure you enter under a visibly recognizable nick; I would suggest TimJ or TJohan or such… Maybe I’ll see you around… P.S. #SDF is a public Unix system chat room: a small, tight crowd with lots of knowledgeable programmers and coders in there… they all belong to a shell site.

    Comment by thepheonixproject — June 21, 2008 @ 4:20 pm
