In today’s Programming Praxis exercise, we have to implement Peter Norvig’s spelling corrector. Let’s get started, shall we?
First, we’re going to need a bunch of imports.
import Data.Char import Data.List import qualified Data.List.Key as K import qualified Data.Map as M import qualified Data.Set as S import qualified Data.Text as T
The first step is to create our dictionary of valid words. I’m using the Data.Text functions instead of the Prelude ones here for performance reasons, as the text file is fairly big at 6.2 MB.
nwords :: IO (M.Map String Int) nwords = fmap (M.fromListWith (+) . map (flip (,) 1 . T.unpack) . T.splitBy (not . isLetter) . T.toLower . T.pack) $ readFile "big.txt"
Creating all the words at edit distance 1 is fairly simple with some list comprehensions. The reason for using the Data.Set functions here is that nub from Data.List is horribly slow on large lists.
edits :: String -> [String] edits word = S.elems . S.fromList $ dels ++ trans ++ repls ++ ins where s = zip (inits word) (tails word) dels = [a ++ b | (a, _:b) <- s] trans = [a ++ c:b:d | (a, b:c:d) <- s] repls = [a ++ c:b | (a, _:b) <- s, c <- ['a'..'z']] ins = [a ++ c:b | (a, b) <- s, c <- ['a'..'z']]
The python version has separate functions for the words at edit distance 1 and 2, but thanks to lazy evaluation we can just write a one-size-fits all function.
known_edits :: M.Map String a -> String -> Int -> [String] known_edits dict word n = filter (`M.member` dict) $ iterate (edits =<<) [word] !! n
Correcting a word is then simply a matter of generating the alternatives and finding the most common one.
correct :: M.Map String Int -> String -> String correct dict word = maybe word (K.maximum (`M.lookup` dict)) . find (not . null) $ map (known_edits dict word) [0..2]
Let’s see if everything went correctly:
main :: IO () main = do dict <- nwords print $ correct dict "speling" print $ correct dict "korrecter"
Yup. One spelling korrector… uhh… corrector, all done.