Hudzilla.org - the homepage of Paul Hudson

16.7.1     Calculating similarity of words: similar_text()


int similar_text ( string first, string second [, float &percent] )

You can calculate how similar two words are by using the function similar_text(). It takes a minimum of two parameters (the words to compare) and returns an integer reporting the number of characters that match between the two. This is actually quite a smart function, as demonstrated by the following script:

<?php
    $word1 = "Connotation";
    $word2 = "Annotation";
    $match = similar_text($word1, $word2);
    echo "$match letters are the same between '$word1' and '$word2'\n";
?>

That outputs "9 letters are the same between 'Connotation' and 'Annotation'" - it sees that the two words are identical apart from the opening letter or two.
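To see where that count comes from, it helps to know roughly how the function works: it looks for the longest run of characters that both strings share, then repeats the search on whatever is left over to the left and to the right of that run, adding the lengths together. Here is a minimal sketch of that idea; the second pair of words ("World" and "Word") is my own choice for illustration and does not come from the scripts above.

```php
<?php
    // "nnotation" (9 characters) is the longest run shared by both words.
    // The leftovers "Co" and "A" have nothing in common, so the total is 9.
    $first = similar_text("Connotation", "Annotation");

    // "Wor" is shared (3 characters); in the leftovers "ld" and "d"
    // the function then finds "d" (1 more), giving a total of 4.
    $second = similar_text("World", "Word");

    echo "$first $second\n";
?>
```

That should print "9 4".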

For additional power you can pass a variable as a third parameter; it is passed by reference, and PHP will store a percentage score of the match in it. This makes our script look like this:

<?php
    $word1 = "Connotation";
    $word2 = "Annotation";
    $match = similar_text($word1, $word2, $percent);
    $percent = round($percent, 2);
    echo "$match letters are the same between '$word1' and '$word2': a $percent% match.\n";
?>

This time you should get "9 letters are the same between 'Connotation' and 'Annotation': a 85.71% match.". If you leave out the call to round(), the percentage is printed with a long string of decimal places!
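If you are wondering where 85.71 comes from, the percentage is twice the number of matched characters divided by the combined length of the two words, multiplied by 100. The sketch below works that out by hand next to the value PHP stores; the $manual variable is just a name I have made up for the comparison.

```php
<?php
    $word1 = "Connotation";
    $word2 = "Annotation";

    $match = similar_text($word1, $word2, $percent);

    // Hand-rolled version: (9 * 2) / (11 + 10) * 100
    $manual = ($match * 2) / (strlen($word1) + strlen($word2)) * 100;

    echo round($percent, 2), " ", round($manual, 2), "\n";
?>
```

Both numbers should come out as 85.71.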

So, with this new way to compare similarities of words, we can rewrite our sentence suggestion script so that it reports how alike each suggestion is to the original word: now we're finally getting to the stage of making cool scripts with this stuff!

<?php
    $pspell = pspell_new("en");
    $sentence = "The quik brown fox jumpd over the lazyyy dog";
    $words = explode(" ", $sentence);

    foreach ($words as $word) {
        if (pspell_check($pspell, $word)) {
            // this word is fine; print as-is
            echo $word, " ";
        } else {
            // this word is bad; look for suggestions
            $suggestions = pspell_suggest($pspell, $word);

            // !empty() also copes with pspell_suggest() returning false
            if (!empty($suggestions)) {
                // we have suggestions for this word; print them out
                echo "<SELECT>";

                foreach ($suggestions as $suggestion) {
                    // we only need the percentage here, not the returned count
                    similar_text($word, $suggestion, $percent);
                    $percent = round($percent, 2);

                    echo "<OPTION>$suggestion ($percent%)</OPTION>";
                }

                echo "</SELECT> ";
            } else {
                // no suggestions; just print the word
                echo $word, " ";
            }
        }
    }
?>


That new script is just the combination of the previous two, so there should be no surprises in there. Having said that, looking at the screenshot you should notice that the suggestions for "jumpd" aren't sorted by how similar they are to the original word. That's a relatively easy fix to make, so I'll leave it as a challenge to you!
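If you want a nudge in the right direction, here is one possible approach, offered as a sketch rather than the definitive answer: score every suggestion first, then sort the scores before printing. The suggestion list below is hypothetical and hard-coded so the script runs without pspell; in the real script it would come from pspell_suggest().

```php
<?php
    $word = "jumpd";

    // Hypothetical suggestions, standing in for pspell_suggest($pspell, $word)
    $suggestions = array("dump", "jumps", "jumped", "jump");

    // Score each suggestion once, keyed by the suggestion itself
    $scores = array();
    foreach ($suggestions as $suggestion) {
        similar_text($word, $suggestion, $percent);
        $scores[$suggestion] = round($percent, 2);
    }

    // Sort by score, highest first, keeping each key attached to its score
    arsort($scores);

    foreach ($scores as $suggestion => $percent) {
        echo "$suggestion ($percent%)\n";
    }
?>
```

I used arsort() here because it keeps each suggestion paired with its score; usort() with a comparison function would work just as well if you prefer to keep the suggestions in a plain list.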




