Programming Praxis – Word hypenation

Today’s Programming Praxis problem is about word hyphenation. Let’s see what we can come up with.

First some imports

import Data.Char
import Data.List
import Data.List.HT

We define the exceptions by making a lookup table that zips the plain words to their hyphenated form.

exceptions :: [(String, String)]
exceptions = zip (map (filter isLetter) ws) ws
    where ws = words "as-so-ciate as-so-ciates dec-li-na-tion \
                     \oblig-a-tory phil-an-thropic present presents \
                     \project projects reci-procity re-cog-ni-zance \
                     \ref-or-ma-tion ret-ri-bu-tion ta-ble"

The program consists of loading the patterns and testing two inputs.

main :: IO ()
main = do patterns <- fmap words $ readFile "patterns.txt"
          print $ hyphenate patterns "hyphenation"
          print $ hyphenate patterns "associate"

To hyphenate a word, we first check if it’s an exception, otherwise we do the actual hyphenation.

hyphenate :: [String] -> String -> String
hyphenate ps s = maybe (hyphenate' s ps) id $ lookup s exceptions

First we prepare the input and then we try to apply all the patterns we have. For example, the word hyphenation would result in .0h0y3p0h0e2n5a4t2i0o0n0. (see the Programming Praxis page for more detail). Then we replace all the odd numbers with hyphens and remove all the useless characters.

hyphenate' :: String -> [String] -> String
hyphenate' s = concat . intersperse "-" . map (filter isLetter) .
               chop (\c -> isDigit c && odd (digitToInt c)) .
               foldl (flip (tryPattern . format)) ("." ++ format s ++ ".")

Formatting makes sure that a string consists of alternating letters and numbers (e.g. abc2d becomes a0b0c2d)

format :: String -> String
format (x:y:xs) | all isLetter [x, y] = x : '0' : format (y:xs)
format (x:xs)   = x : format xs
format []       = []

Trying a pattern means overlaying it everywhere it matches.

tryPattern :: String -> String -> String
tryPattern _ [] = []
tryPattern p s  = x : tryPattern p xs
                  where (x:xs) = if match p s then overlay p s else s

When matching a pattern we ignore the numbers.

match :: String -> String -> Bool
match (x:xs) (y:ys) = (all isDigit [x, y] || x == y) && match xs ys
match xs     _      = null xs

When overlaying two numbers, we are keep the highest one.

overlay :: String -> String -> String
overlay p = zipWith max (p ++ repeat '0')

And there we go; a hyphenation algorithm in 40 lines. Not quite as short as I’d like, but it’s the best I could come up with.

Tags: , , ,

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: