Archive for December, 2009

Programming Praxis – A Statisticle Speling Korrecter

December 29, 2009

In today’s Programming Praxis exercise, we have to implement Peter Norvig’s spelling corrector. Let’s get started, shall we?

First, we’re going to need a bunch of imports.

import Data.Char
import Data.List
import qualified Data.List.Key as K
import qualified Data.Map as M
import qualified Data.Set as S
import qualified Data.Text as T

The first step is to create our dictionary of valid words. I’m using the Data.Text functions instead of the Prelude ones here for performance reasons, as the text file is fairly big at 6.2 MB.

nwords :: IO (M.Map String Int)
nwords = fmap (M.fromListWith (+) . map (flip (,) 1 . T.unpack) .
               T.splitBy (not . isLetter) . T.toLower . T.pack) $
         readFile "big.txt"

Creating all the words at edit distance 1 is fairly simple with some list comprehensions. The reason for using the Data.Set functions here is that nub from Data.List is horribly slow on large lists.

edits :: String -> [String]
edits word = S.elems . S.fromList $ dels ++ trans ++ repls ++ ins
    where s     = zip (inits word) (tails word)
          dels  = [a ++ b     | (a, _:b)   <- s]
          trans = [a ++ c:b:d | (a, b:c:d) <- s]
          repls = [a ++ c:b   | (a, _:b)   <- s, c <- ['a'..'z']]
          ins   = [a ++ c:b   | (a, b)     <- s, c <- ['a'..'z']]

The python version has separate functions for the words at edit distance 1 and 2, but thanks to lazy evaluation we can just write a one-size-fits all function.

known_edits :: M.Map String a -> String -> Int -> [String]
known_edits dict word n = filter (`M.member` dict) $
    iterate (edits =<<) [word] !! n

Correcting a word is then simply a matter of generating the alternatives and finding the most common one.

correct :: M.Map String Int -> String -> String
correct dict word = maybe word (K.maximum (`M.lookup` dict)) .
    find (not . null) $ map (known_edits dict word) [0..2]

Let’s see if everything went correctly:

main :: IO ()
main = do dict <- nwords
          print $ correct dict "speling"
          print $ correct dict "korrecter"

Yup. One spelling korrector… uhh… corrector, all done.


Programming Praxis – Permuted Index

December 22, 2009

In today’s Programming Praxis exercise we have to implement David Parnas’ permuted index system. Let’s get started.

We’re going to need some imports.

import Data.Char
import Data.List
import qualified Data.List.Key as K
import Text.Printf

We start by defining the list of stop words:

stopList :: [String]
stopList = words "a an and by for if in is of on the to"

Creating all the rotations of a line can be done with a simple list comprehension.

rot :: [String] -> [(String, String)]
rot xs = [(unwords a, unwords b) | (a, b) <- init $
          zip (inits xs) (tails xs), notElem (head b) stopList]

To print the index, we check how big the longest pre- and postfixes are and pad the others accordingly.

prettyPrint :: [(String, String)] -> IO ()
prettyPrint xs = mapM_ (\(a, b) -> printf "%*s   %-*s\n" l1 a l2 b) xs
    where l1 = maximum $ map (length . fst) xs
          l2 = maximum $ map (length . snd) xs

The only step left is to put all the rotations in the correct order.

permuteIndex :: String -> IO ()
permuteIndex = prettyPrint . K.sort (\(_, x) -> (map toLower x, x)) .
               concatMap (rot . words) . lines

A quick test shows that we get the correct output:

main :: IO ()
main = permuteIndex $ "All's well that ends well.\n" ++
                      "Nature abhors a vacuum.\n" ++
                      "Every man has a price.\n"

Making it work with files or console input is merely a matter of binding the result of readFile or getContents to permuteIndex.

Programming Praxis – Calculating Logarithms

December 18, 2009

In today’s Programming Praxis problem we have to implement Euler’s method for calculating logarithms (using Newton’s method for calculating square roots). Let’s get started.

First we define our desired accuracy.

eps :: Double
eps = 1e-7

Newton’s method is just repeating the same function until it’s close enough.

sqrt' :: Double -> Double
sqrt' n = head . filter (\x -> abs (x^2 - n) < eps) $
    iterate (\x -> x - (x^2 - n) / (2*x)) 1

In Euler’s algorithm we first find the smallest power of the base higher than n, and then keep calculating the geometric mean between the low and the high number, repeating for whichever of the two resulting intervals holds n until we’re close enough.

log' :: Double -> Double -> Double
log' b n = f 0 0 where
    f lo hi | b ** hi < n             = f lo (hi + 1)
            | abs (lo / hi - 1) < eps = arit
            | geom < n                = f arit hi
            | otherwise               = f lo arit
            where geom = sqrt' (b ** lo * b ** hi)
                  arit = (lo + hi) / 2

A quick test shows that everything is working correctly:

main = do print $ sqrt' 2
          print $ log' 10 612

Programming Praxis – Affine-Shift Cipher

December 15, 2009

In today’s Programming Praxis we have to implement the Affine-Shift Cipher. Let’s get going, shall we?

A quick import:

import Data.Char

Since both encoding an decoding have roughly the same structure, we’re going to abstract that out into a function.

convert :: (Int -> Int) -> String -> String
convert f = map (chr . (\i -> f (i - 65) `mod` 26 + 65) . ord . toUpper)

For decoding, we need to calculate the modular inverse of a number.

inverse :: Int -> Int -> Int
inverse x n = f (mod x n) 1 where
    f 0 _ = error "Numbers not coprime"
    f 1 a = a
    f y a = let q = - div n y in f (n + q * y) (mod (q * a) n)

Encoding and decoding is then simply a case of calling the convert function with the correct argument.

encode :: Int -> Int -> String -> String
encode a b = convert (\i -> a*i + b)

decode :: Int -> Int -> String -> String
decode a b = convert (\i -> inverse a 26 * (i-b))

All that’s left to do is test if everything works correctly.

main :: IO ()
main = do print $ encode 5 8 "BONSAICODE" == "NAVUIWSAXC"
          print $ decode 5 8 "NAVUIWSAXC" == "BONSAICODE"

All clear.

Programming Praxis – Word Count

December 8, 2009

In today’s Programming Praxis exercise, we have to implement the Unix wc command line utility. Let’s get started.

First, we need some imports.

import System.Environment
import Text.Printf

We handle the command line arguments with a bit of pattern matching.

parseOpts :: [String] -> ([Bool], [String])
parseOpts (('-':ps):args) = (map (`elem` ps) "lwc", args)
parseOpts args            = (replicate 3 True, args)

Next, we need to do the actual counting:

count :: [Bool] -> [(String, String)] -> [String]
count opts = map (\(name, text) -> concat
    [printf "%8s" $ if opt then show . length $ f text else "-"
    | (f, opt) <- zip [lines, words, map return] opts] ++ " " ++ name)

And finally, the program itself.

main :: IO ()
main = do args <- getArgs
          let (opts, files) = parseOpts args
          mapM_ putStrLn . count opts =<< if null files
              then fmap (\x -> [("", x)]) getContents
              else fmap (zip files) $ mapM readFile files

Programming Praxis – Autokey

December 4, 2009

In today’s Programming Praxis we have another cipher algorithm. Let’s get started.

A quick import:

import Data.Char

First of all, we need a function to add and subtract characters.

combine :: (Int -> Int -> Int) -> Char -> Char -> Char
combine f a b = chr $ mod (f (ord a) (ord b) - 2 * 65) 26 + 65

Encrypting or decrypting is just a matter of using this combine function on all the letter pairs of the key and the message.

cipher :: (Int -> Int -> Int) -> String -> String -> String
cipher f key msg = zipWith (combine f) (clean msg) (clean key)
                   where clean = map toUpper . filter isLetter

When encrypting, we can simply append the message to the key.

encrypt :: String -> String -> String
encrypt key msg = cipher (+) (key ++ msg) msg

When decrypting, we instead need to append the unencrypted message, which we do not have yet. Fortunately, thanks to lazy evaluation we can simply recursively call the decrypt function. This means it will produce a stack overflow when fed an empty key, but since that would be useless anyway we don’t really care.

decrypt :: String -> String -> String
decrypt key msg = cipher (-) (key ++ decrypt key msg) msg

All that’s left is to test our functions:

main :: IO ()
main = do print $ encrypt "BONSAI" "Pablo Picasso"
          print $ decrypt "BONSAI" "QOODOXXCBDGD"

That seems to work just fine. Another one down.

Programming Praxis – Wolves And Rabbits

December 1, 2009

The task in today’s Programming Praxis problem is to define a function that models the populations of wolves and rabbits. Let’s get started, shall we?

Like the provided solution, we use the second-order Runge-Kutta method to calculate the next time step. The resulting function is pretty straightforward:

pops :: Fractional a => a -> a -> a -> a -> a -> a -> [(a, a)]
pops r w rg wg rd wd = (r, w) : pops r' w' rg wg rd wd where
    dr x y = rg*x - rd*x*y
    dw x y = wg*x*y - wd*x
    rh     = r + dr r w / 2
    wh     = w + dw w r / 2
    r'     = r + dr rh wh
    w'     = w + dw wh rh

A quick test shows the same output as the provided solution:

main :: IO ()
main = mapM_ print . take 201 $ pops 40 15 0.1 0.005 0.01 0.1