Today’s Programming Praxis problem is about word hyphenation. Let’s see what we can come up with.
First some imports
import Data.Char import Data.List import Data.List.HT
We define the exceptions by making a lookup table that zips the plain words to their hyphenated form.
exceptions :: [(String, String)] exceptions = zip (map (filter isLetter) ws) ws where ws = words "as-so-ciate as-so-ciates dec-li-na-tion \ \oblig-a-tory phil-an-thropic present presents \ \project projects reci-procity re-cog-ni-zance \ \ref-or-ma-tion ret-ri-bu-tion ta-ble"
The program consists of loading the patterns and testing two inputs.
main :: IO () main = do patterns <- fmap words $ readFile "patterns.txt" print $ hyphenate patterns "hyphenation" print $ hyphenate patterns "associate"
To hyphenate a word, we first check if it’s an exception, otherwise we do the actual hyphenation.
hyphenate :: [String] -> String -> String hyphenate ps s = maybe (hyphenate' s ps) id $ lookup s exceptions
First we prepare the input and then we try to apply all the patterns we have. For example, the word hyphenation would result in .0h0y3p0h0e2n5a4t2i0o0n0. (see the Programming Praxis page for more detail). Then we replace all the odd numbers with hyphens and remove all the useless characters.
hyphenate' :: String -> [String] -> String hyphenate' s = concat . intersperse "-" . map (filter isLetter) . chop (\c -> isDigit c && odd (digitToInt c)) . foldl (flip (tryPattern . format)) ("." ++ format s ++ ".")
Formatting makes sure that a string consists of alternating letters and numbers (e.g. abc2d becomes a0b0c2d)
format :: String -> String format (x:y:xs) | all isLetter [x, y] = x : '0' : format (y:xs) format (x:xs) = x : format xs format [] = []
Trying a pattern means overlaying it everywhere it matches.
tryPattern :: String -> String -> String tryPattern _ [] = [] tryPattern p s = x : tryPattern p xs where (x:xs) = if match p s then overlay p s else s
When matching a pattern we ignore the numbers.
match :: String -> String -> Bool match (x:xs) (y:ys) = (all isDigit [x, y] || x == y) && match xs ys match xs _ = null xs
When overlaying two numbers, we are keep the highest one.
overlay :: String -> String -> String overlay p = zipWith max (p ++ repeat '0')
And there we go; a hyphenation algorithm in 40 lines. Not quite as short as I’d like, but it’s the best I could come up with.
Tags: hyphenation, kata, praxis, programming