Programming Praxis – Soundex

In today’s Programming Praxis exercise we have to implement the soundex algorithm for encoding people’s last names. Let’s get started, shall we?

Some imports:

import Data.Char
import Data.List

The algorithm itself is not all that complicated. My only mistake was that I failed to account for names where the second letter was equal to the first, such as Lloyd, since initially I only grouped after removing the first letter.

soundex :: String -> String
soundex = f . map head . group . map toUpper where
    f []     = []
    f (x:xs) = x : take 3 [toNum c | c <- xs ++ repeat '0',
                                     notElem c "AEHIOUWY"]
    toNum c = maybe '0' snd . find (elem c . fst) $
              zip (words "BFPV CGJKQSXZ DT L MN R") ['1'..]

A test to see if everything is working correctly:

main :: IO ()
main = do test ["Euler",  "Gauss", "Hilbert",
                "Knuth",  "Lloyd", "Lukasiewicz"]
          test ["Ellery", "Ghosh", "Heilbronn",
                "Kant",   "Ladd",  "Lissajous"]
          where test xs = print $ map soundex xs == result
                result = ["E460", "G200", "H416",
                          "K530", "L300", "L222"]

Yup, and at only about a third the size of the Scheme solution I’d say that’s not bad.

About these ads

Tags: , , , , , , , ,

4 Responses to “Programming Praxis – Soundex”

  1. programmingpraxis Says:

    Lloyd: me too.

  2. glguy Says:

    You are only eliminating duplicate letters and not duplicate numbers giving the incorrect value for the input “Ashcraft”

    Consider the “findIndex” function from Data.List to make toNum more readable.

  3. glguy Says:

    The US covernment site: archives.gov describes the soundex algorithm and links to an off-site calculator.

    On this site Lloyd is coded L-300 and not L-430

  4. Remco Niemeijer Says:

    In my code I always follow the same specifications as the Scheme solution, since this makes for easier comparison. Therefore, like the version on Programming Praxis, my algorithm produces A226 for Ashcraft and L300 for Lloyd. This is due the specific version of the algorithm used in the exercise. Wikipedia does indeed list a slightly different version that describes what to do when an h or w separates two consonants, as is the case in Ashcraft. Presumably, this is the same algorithm archives.gov uses. But since this was not part of the exercise. I didn’t implement it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

Join 35 other followers

%d bloggers like this: