In today’s Programming Praxis exercise we have to implement the soundex algorithm for encoding people’s last names. Let’s get started, shall we?
Some imports:
import Data.Char import Data.List
The algorithm itself is not all that complicated. My only mistake was that I failed to account for names where the second letter was equal to the first, such as Lloyd, since initially I only grouped after removing the first letter.
soundex :: String -> String soundex = f . map head . group . map toUpper where f [] = [] f (x:xs) = x : take 3 [toNum c | c <- xs ++ repeat '0', notElem c "AEHIOUWY"] toNum c = maybe '0' snd . find (elem c . fst) $ zip (words "BFPV CGJKQSXZ DT L MN R") ['1'..]
A test to see if everything is working correctly:
main :: IO () main = do test ["Euler", "Gauss", "Hilbert", "Knuth", "Lloyd", "Lukasiewicz"] test ["Ellery", "Ghosh", "Heilbronn", "Kant", "Ladd", "Lissajous"] where test xs = print $ map soundex xs == result result = ["E460", "G200", "H416", "K530", "L300", "L222"]
Yup, and at only about a third the size of the Scheme solution I’d say that’s not bad.
Tags: bonsai, code, Haskell, kata, last, name, praxis, programming, soundex
February 16, 2010 at 1:04 pm |
Lloyd: me too.
February 18, 2010 at 8:29 pm |
You are only eliminating duplicate letters and not duplicate numbers giving the incorrect value for the input “Ashcraft”
Consider the “findIndex” function from Data.List to make toNum more readable.
February 18, 2010 at 8:36 pm |
The US covernment site: archives.gov describes the soundex algorithm and links to an off-site calculator.
On this site Lloyd is coded L-300 and not L-430
February 18, 2010 at 9:12 pm |
In my code I always follow the same specifications as the Scheme solution, since this makes for easier comparison. Therefore, like the version on Programming Praxis, my algorithm produces A226 for Ashcraft and L300 for Lloyd. This is due the specific version of the algorithm used in the exercise. Wikipedia does indeed list a slightly different version that describes what to do when an h or w separates two consonants, as is the case in Ashcraft. Presumably, this is the same algorithm archives.gov uses. But since this was not part of the exercise. I didn’t implement it.