16.7.1 Calculating similarity of words: similar_text()
int similar_text ( string first, string second [, float &percent])
You can calculate how similar two words are by using the function similar_text(). It takes a minimum of two parameters (the words to check) and returns an integer reporting the number of characters matched between the two. The comparison is case-sensitive. Note that this is actually quite a smart function, as demonstrated by the following script:
<?php
    $word1 = "Connotation";
    $word2 = "Annotation";
    $match = similar_text($word1, $word2);
    echo "$match letters are the same between '$word1' and '$word2'\n";
?>
That outputs "9 letters are the same between 'Connotation' and 'Annotation'": similar_text() sees that the two words are identical apart from the opening letter or two, and counts the nine letters of "nnotation" that they share.
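If you're curious how the function manages that, here is a rough sketch of the idea as I understand it: find the longest run of characters the two strings share, count it, then do the same again for whatever is left on either side of that run. The function name sketch_similar() is made up for this example, and this is an illustration of the approach rather than PHP's own source code.

```php
<?php
// Hypothetical re-implementation for illustration only; not PHP's own source.
function sketch_similar(string $a, string $b): int
{
    $best = 0;
    $posA = 0;
    $posB = 0;
    $lenA = strlen($a);
    $lenB = strlen($b);

    // Find the first longest run of characters that both strings share.
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $k = 0;
            while ($i + $k < $lenA && $j + $k < $lenB && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }
            if ($k > $best) {
                $best = $k;
                $posA = $i;
                $posB = $j;
            }
        }
    }

    if ($best === 0) {
        return 0; // nothing in common, so nothing more to count
    }

    // Count the shared run, then repeat on the text to its left and to its right.
    return $best
        + sketch_similar(substr($a, 0, $posA), substr($b, 0, $posB))
        + sketch_similar(substr($a, $posA + $best), substr($b, $posB + $best));
}

echo sketch_similar("Connotation", "Annotation"), "\n";
```

For our two words the longest shared run is "nnotation", and the leftovers ("Co" and "A") have nothing in common, which is where the 9 comes from.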
For additional power you can pass a third parameter, a variable taken by reference, where PHP will store a percentage score of the match. This makes our script look like this:
<?php
    $word1 = "Connotation";
    $word2 = "Annotation";
    $match = similar_text($word1, $word2, $percent);
    $percent = round($percent, 2);
    echo "$match letters are the same between '$word1' and '$word2': a $percent% match.\n";
?>
This time you should get "9 letters are the same between 'Connotation' and 'Annotation': a 85.71% match.". If you don't put the call to round() in there the percentage is likely to be very long: something like 85.714285714286 with PHP's default float precision.
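Where does 85.71 come from? As far as I can tell, the percentage is the number of matched letters, doubled, divided by the combined length of the two words, then multiplied by 100. The short check below works that out by hand and compares it with the value similar_text() stores; the $manual variable is just a name picked for this example.

```php
<?php
$word1 = "Connotation";
$word2 = "Annotation";

// Let PHP work out the match count and the percentage.
$match = similar_text($word1, $word2, $percent);

// Work out the same percentage by hand: 9 * 2 / (11 + 10) * 100.
$manual = $match * 2 / (strlen($word1) + strlen($word2)) * 100;

printf("%.2f%% (similar_text) vs %.2f%% (by hand)\n", $percent, $manual);
```

Doubling the match count is what lets two identical words score 100%, because each matched letter appears once in each word.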
So, with this new way to compare words, we can rewrite our sentence suggestion script so that it reports how alike each suggestion is to the original word. Now we're finally getting to the stage of making cool scripts with this stuff!
<?php
    $pspell = pspell_new("en");
    $sentence = "The quik brown fox jumpd over the lazyyy dog";
    $words = explode(" ", $sentence);

    foreach($words as $word) {
        if (pspell_check($pspell, $word)) {
            // this word is fine; print as-is
            echo $word, " ";
        } else {
            // this word is bad; look for suggestions
            $suggestions = pspell_suggest($pspell, $word);

            if (count($suggestions)) {
                // we have suggestions for this word; print them out
                echo " <SELECT>";

                foreach($suggestions as $suggestion) {
                    // we only need the percentage here, not the match count
                    similar_text($word, $suggestion, $percent);
                    $percent = round($percent, 2);
                    echo "<OPTION>$suggestion ($percent%)</OPTION>";
                }

                echo "</SELECT> ";
            } else {
                // no suggestions; just print the word
                echo $word, " ";
            }
        }
    }
?>

That new script is just the combination of the previous two, so there should be no surprises in there. Having said that, looking at the screenshot you should notice that the suggestions for "jumpd" aren't sorted according to the absolute similarity with the word. That's a relatively easy fix to make, so I'll leave it as a challenge to you!
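If you want a nudge with that challenge, here is one possible approach, sketched without pspell so it can be run on its own: score every suggestion first, sort the scores, and only then print the options. The rank_suggestions() function is a name invented for this example, and the list of suggestions is made up rather than real pspell_suggest() output.

```php
<?php
// Hypothetical helper: rank_suggestions() is not part of PHP or pspell.
function rank_suggestions(string $word, array $suggestions): array
{
    $scores = [];

    foreach ($suggestions as $suggestion) {
        // store the percentage for each suggestion, keyed by the suggestion itself
        similar_text($word, $suggestion, $percent);
        $scores[$suggestion] = round($percent, 2);
    }

    // sort by score, highest first, keeping each suggestion as the key
    arsort($scores);
    return $scores;
}

// Made-up suggestion list for illustration; real pspell output may differ.
$ranked = rank_suggestions("jumpd", ["jump", "dumped", "jumped", "jumps"]);

foreach ($ranked as $suggestion => $percent) {
    echo "<OPTION>$suggestion ($percent%)</OPTION>\n";
}
```

In the full script you would pass the result of pspell_suggest() as the second argument and loop over the returned array in place of the inner foreach.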