Posts Tagged ‘search’

Programming Praxis – Binary Search Tree

March 5, 2010

In today’s Programming Praxis exercise we have to implement a Binary Search Tree. Let’s get started, shall we?

We need two imports:

import Control.Monad
import System.Random

The data structure is your run-of-the-mill binary tree.

data BTree k v = Node k v (BTree k v) (BTree k v) | Empty

Finding an element is pretty straightforward. Just keep taking the correct branch until we exhaust the tree or find what we want.

find :: (k -> k -> Ordering) -> k -> BTree k v -> Maybe v
find _   _ Empty          = Nothing
find cmp k (Node k' v' l r) = case cmp k k' of EQ -> Just v'
                                               LT -> find cmp k l
                                               GT -> find cmp k r

Inserting works the same way as find: move to the correct position and insert or replace the new value.

insert :: (k -> k -> Ordering) -> k -> v -> BTree k v -> BTree k v
insert _   k v Empty            = Node k v Empty Empty
insert cmp k v (Node k' v' l r) = case cmp k k' of
    EQ -> Node k v l r
    LT -> Node k' v' (insert cmp k v l) r
    GT -> Node k' v' l (insert cmp k v r)

Since the deletion algorithm calls for a random number, delete is an IO action. You can consider using unsafePerformIO to hide this (I did in my first draft), but I decided to stick with the honest, safer (though less convenient) version. Alternatively you could accept the occasional imbalance and just always start on the left.

delete :: (k -> k -> Ordering) -> k -> BTree k v -> IO (BTree k v)
delete _   _ Empty              = return Empty
delete cmp k t@(Node k' v' l r) = case cmp k k' of
    EQ -> fmap (flip deroot t . (== 0)) $ randomRIO (0,1 :: Int)
    LT -> fmap (flip (Node k' v') r) $ delete cmp k l
    GT -> fmap (      Node k' v'  l) $ delete cmp k r

For the deroot function we use a slightly different approach than the Scheme version. I’m not sure how that version deals with the case of one of the two branches being empty, but here they are explicitly included in the patterns. The rot-left and rot-right functions are rewritten as patterns.

deroot :: Bool -> BTree k v -> BTree k v
deroot _    Empty              = Empty
deroot _    (Node _ _ l Empty) = l
deroot _    (Node _ _ Empty r) = r
deroot True (Node k v l (Node rk rv rl rr)) =
    Node rk rv (deroot False $ Node k v l rl) rr
deroot _    (Node k v (Node lk lv ll lr) r) =
    Node lk lv ll (deroot True $ Node k v lr r)

Converting the search tree to a list is trivial.

toList :: BTree k v -> [(k, v)]
toList Empty          = []
toList (Node k v l r) = toList l ++ (k, v) : toList r

And, as always, a test to see if everything is working correctly:

main :: IO ()
main = do let t = foldr (uncurry $ insert compare) Empty $
                  [(n, n) | n <- [4,1,3,5,2]]
          print $ toList t
          print $ find compare 3 t
          print $ find compare 9 t
          print . toList =<< foldM (flip $ delete compare) t [4,2,3,5,1]

Programming Praxis – Grep

September 25, 2009

Before we begin, a small note: I’ll be on conference in Istanbul next week, so I won’t be here for the next two exercises.

In today’s Programming Praxis exercise we’re supposed to implement grep based on the regex matching functions we wrote in previous exercises. Let’s get started.

First, some imports.

import System.Environment
import Text.Printf

Obviously, we’re going to need the code we wrote in the previous two exercises.

--Code from the previous two exercises goes here

grepString shows all the lines of the input that (don’t) contain the regular expression, prefixed by the filename if one is provided.

grepString :: Bool -> Maybe String -> [Chunk] -> String -> IO ()
grepString b p cs = mapM_ (printf "%s%s\n" $ maybe "" (++ ": ") p) .
                    filter ((== b) . match cs) . lines

grep searches all the given files, or the standard input if none are provided, for the regular expression.

grep :: Bool -> [String] -> IO ()
grep _ []     = error "Not enough arguments provided"
grep b (r:ps) = either print (f ps) $ parseRegex r where
    f [] cs = grepString b Nothing cs =<< getLine
    f _  cs = mapM_ (\p -> grepString b (Just p) cs =<< readFile p) ps

Finally, main checks whether to display the lines that do or do not match.

main :: IO ()
main = f =<< getArgs where
           f ("-v":args) = grep False args
           f args        = grep True  args

And that’s it. See you guys in a week.

Programming Praxis – String Search: Rabin-Karp

September 1, 2009

Today’s Programming Praxis problem marks the end of the string search series. In this final entry, we have to implement the Rabin-Karp search algorithm. Let’s see what we can do.

First some imports:

import Data.Char
import Data.List

We need the ascii values of the characters in the string, and since the hash values we’re going to be using can get quite large, we’re going to use Integers.

asciis :: String -> [Integer]
asciis = map (fromIntegral . ord)

The hash function treats the list of ascii values as a base 256 number.

hash :: Num a => [a] -> a
hash = sum . zipWith (*) (iterate (* 256) 1) . reverse

For the algorithm, we need the hashes of all the pattern-length substrings of the search string.

hashes :: Int -> String -> [Integer]
hashes p xs = scanl (\s (f,t) -> 256*s - 256^p*f + t) (hash $ take p ascs) .
              zip ascs $ drop p ascs where ascs = asciis xs

With those helper functions defined, the search algorithm becomes fairly straightforward:

rabinKarp :: String -> Maybe Int -> String -> Maybe Int
rabinKarp p s = lookup (hash $ asciis p) . drop (maybe 0 id s) .
                flip zip [0..] . hashes (length p)

And as before, we run our algorithm through the test suite:

test :: (String -> Maybe Int -> String -> Maybe Int) -> IO ()
test f = do assertEqual (f ""   Nothing  "Hello World") (Just 0)
            assertEqual (f "He" Nothing  "Hello World") (Just 0)
            assertEqual (f "od" Nothing  "Bonsai Code") (Just 8)
            assertEqual (f "ef" Nothing  "Bonsai Code") (Nothing)
            assertEqual (f ""   (Just 1) "Hello World") (Just 1)
            assertEqual (f "He" (Just 1) "Hello World") (Nothing)
            assertEqual (f "od" (Just 1) "Bonsai Code") (Just 8)
            assertEqual (f "ef" (Just 1) "Bonsai Code") (Nothing)
         where assertEqual a b = print (a == b, a, b)

main :: IO ()
main = test rabinKarp

Everything’s working correctly. Not bad for 6 lines of code.

Programming Praxis – String Search: Boyer-Moore

August 29, 2009

In yesterday’s Programming Praxis problem we have to implement a more efficient string search algorithm than the brute force approach we did earlier, namely the Horspool variation of the Boyer-Moore algorithm. Let’s get started.

Our import:

import Data.Map (findWithDefault, fromList, (!))

The algorithm this time is a bit longer than the brute force one, but it’s nothing too bad. In lines 2-4 we cache some values to remove some duplication and possibly avoid recalculation. The last four lines are the actual algorithm and the first line just calls it with the proper initial arguments.

horspool :: Ord a => [a] -> Maybe Int -> [a] -> Maybe Int
horspool pat skip xs = f (lp - 1 + maybe 0 id skip) p' where
    (lp, lxs, p') = (length pat, length xs, reverse pat)
    t = fromList $ zip pat [lp - 1, lp - 2..]
    m = fromList $ zip [0..] xs
    f n []     = Just (n + 1)
    f n (p:ps) | n >= lxs   = Nothing
	       | p == m ! n = f (n - 1) ps
               | otherwise  = f (n + findWithDefault lp (m ! n) t) p'

When we test our algorithm with the same test suite as last time we can see everything is working correctly:

test :: (String -> Maybe Int -> String -> Maybe Int) -> IO ()
test f = do assertEqual (f ""   Nothing  "Hello World") (Just 0)
            assertEqual (f "He" Nothing  "Hello World") (Just 0)
            assertEqual (f "od" Nothing  "Bonsai Code") (Just 8)
            assertEqual (f "ef" Nothing  "Bonsai Code") (Nothing)
            assertEqual (f ""   (Just 1) "Hello World") (Just 1)
            assertEqual (f "He" (Just 1) "Hello World") (Nothing)
            assertEqual (f "od" (Just 1) "Bonsai Code") (Just 8)
            assertEqual (f "ef" (Just 1) "Bonsai Code") (Nothing)
         where assertEqual a b = print (a == b, a, b)

main :: IO ()
main = test horspool

Programming Praxis – Brute Force

August 21, 2009

Today’s Programming Praxis post is another easy one. We have to implement a brute-force string search. Let’s get started.

Our import:

import Data.List

Since it’s no extra effort, we’ll just make our search function generic.

bruteSearch :: Eq a => [a] -> Maybe Int -> [a] -> Maybe Int
bruteSearch n o = fmap fst . find (isPrefixOf n . snd) .
                  drop (maybe 0 id o) . zip [0..] . tails

With that out of the way, let’s define our test suite:

test :: (String -> Maybe Int -> String -> Maybe Int) -> IO ()
test f = do print $ f ""   Nothing  "Hello World" == Just 0
            print $ f "He" Nothing  "Hello World" == Just 0
            print $ f "od" Nothing  "Bonsai Code" == Just 8
            print $ f "ef" Nothing  "Bonsai Code" == Nothing
            print $ f ""   (Just 1) "Hello World" == Just 1
            print $ f "He" (Just 1) "Hello World" == Nothing
            print $ f "od" (Just 1) "Bonsai Code" == Just 8
            print $ f "ef" (Just 1) "Bonsai Code" == Nothing

And let’s run it to see if everything’s in order.

main :: IO ()
main = test bruteSearch

It works fine, but obviously it’s not the most efficient search algorithm. For that you’ll have to wait for the upcoming other search algorithms.

Programming Praxis – Ternary Search Tries

June 5, 2009

Today’s Programming Praxis problem is about Ternary search tries, which are basically hashmaps of strings to values, but which can be faster in some cases. We have to implement the data structure and provide functions to find, insert, update and delete items and a function to convert the trie to a list. In order to match the provided solution, we will test our code by putting all the words in the bible, as well as their frequencies, into a ternary trie. Our target is 52 lines (the size of the provided solution). Let’s go.

Our imports:

import Data.Char
import qualified Data.List.Key as K
import Prelude hiding (lookup)

The data structure has two small differences from the one in the provided solution: I use a Maybe instead of the bool+value approach, since it’s more idiomatic Haskell, and the split is a list instead of a single type. The reason for this is that it makes the ternary trie far more generic: the provided solution works only for strings, because in the insert function it uses the first ASCII character  as a default. By making the split a list, I can use an empty list as the default, which means I can also define a trie with e.g. lists of Int as the key. This would not be possible with a single item, since there is no way to get a default value for an arbitrary type.
Another thing you might notice are the strictness annotations on the three branches of a node. If you omit these (as I initially did) and try to cram the bible in a trie you’re going to go through stack space like a hot knife through butter, which means the program will be terribly slow. Note that in order to see the benefit you will have to compile the program: running it in ghci will still produce a stack overflow.

data TernaryTrie k v = Empty | Node { val :: Maybe v,
    split :: [k], lb :: !(TernaryTrie k v),
    eb :: !(TernaryTrie k v), gb :: !(TernaryTrie k v) }

For lookup we have two terminating cases (an empty node is reached or the key is found). The rest is just simple recursion.

lookup :: Ord k => [k] -> TernaryTrie k v -> Maybe v
lookup _      Empty = Nothing
lookup []     t     = val t
lookup (x:xs) t     = case compare [x] $ split t of
                           GT -> lookup (x:xs) $ gb t
                           LT -> lookup (x:xs) $ lb t
                           EQ -> lookup xs     $ eb t

Insert, update and delete all follow the same pattern. To get rid of the boring ‘take the correct action for each branch’ code I factored that out into the modify function to better comply with the DRY principle.

insert :: Ord k => [k] -> v -> TernaryTrie k v -> TernaryTrie k v
insert k  v Empty = insert k v $
                    Node Nothing (take 1 k) Empty Empty Empty
insert [] v t     = t { val = Just v }
insert k  v t     = modify (flip insert v) k t

update :: Ord k => [k] -> v -> (v -> v) ->
          TernaryTrie k v -> TernaryTrie k v
update k  v _ Empty = insert k v Empty
update [] v p t     = t { val = Just . maybe v p $ val t }
update k  v p t     = modify (\x -> update x v p) k t

delete :: Ord k => [k] -> TernaryTrie k v -> TernaryTrie k v
delete _  Empty = Empty
delete [] t     = t { val = Nothing }
delete k  t     = modify delete k t

modify :: Ord k => ([k] -> TernaryTrie k v -> TernaryTrie k v) ->
                   [k] -> TernaryTrie k v -> TernaryTrie k v
modify f k t = case compare (take 1 k) (split t) of
                    LT -> t { lb = f (drop 0 k) $ lb t }
                    EQ -> t { eb = f (drop 1 k) $ eb t }
                    GT -> t { gb = f (drop 0 k) $ gb t }

And more recursion for the enlist method.

enlist :: TernaryTrie k v -> [([k], v)]
enlist = enlist' [] where
    enlist' _ Empty = []
    enlist' k t     =
        maybe [] (\v -> [(k, v)]) (val t) ++ enlist' k (lb t) ++
        enlist' (k ++ split t) (eb t) ++ enlist' k (gb t)

And we test it by finding the 25 most common words in the bible.

main :: IO ()
main = print . take 25 . reverse . K.sort snd . enlist .
       foldl (\t k -> update k 1 succ t) Empty .
       map (map toLower . filter isAlpha) . words =<<
       readFile "bible.txt"

And there we go. With 35 lines it’s not a huge reduction (although the provided solution grows by about 15 lines if you reformat it to the 65-ish character limit I use on this blog), but that was to be expected since it’s mostly basic recursion, which doesn’t easily lend itself to alternative solutions. Still, it’ll do.

Programming Praxis – Word Search Solver

May 26, 2009

Today’s Programming Praxis problem is about word search solvers. The provided solution is 77 lines, so let’s see if we can improve on that.

Our imports:

import Data.List
import Data.Map (Map, fromList, member, keys, (!))
import Text.Printf

First let’s define the 8 directions that we can search in. The puzzle is going to be represented as a Map with a tuple of Ints as the key, so the directions are functions for transforming these keys.

dirs :: [(String, (Int, Int) -> (Int, Int))]
dirs = zip ["down right", "up right", "right", "down left",
            "up left", "left", "down", "up"] $
           [\(x,y) -> (x+h, y+v) | h <- [1,-1,0], v <- [1,-1,0]]

We’re going to enter the puzzle as a list of strings, but since that would make access an O(n2) operation we’re going to turn it into a Map instead, since that gives us O(log n2) access.

toGrid :: [[a]] -> Map (Int, Int) a
toGrid = fromList . concat .
         zipWith (\y -> zipWith (\x c -> ((x,y), c)) [1..]) [1..]

Next we need a function to check whether the search word appears at the given position in the given direction.

found :: (Eq a, Ord k) => k -> (k -> k) -> Map k a -> [a] -> Bool
found pos dir g w = isPrefixOf w . map (g !) .
                    takeWhile (flip member g) $ iterate dir pos

Finding the location and direction of a search word is then simply a matter of checking every direction for every position:

findWord :: Map (Int, Int) Char -> String -> String
findWord g w = head [printf "%s row %d column %d %s" w y x ds |
                     (x,y) <- keys g, (ds, dir) <- dirs,
                     found (x,y) dir g w]

That’s all we need, so let’s test if it works.

puzzle :: [String]
puzzle = ["FYYHNRD",
          "RLJCINU",
          "AAWAAHR",
          "NTKLPNE",
          "CILFSAP",
          "EOGOTPN",
          "HPOLAND"]

main :: IO ()
main = mapM_ (putStrLn . findWord (toGrid puzzle)) $ words
             "ITALY HOLLAND POLAND SPAIN FRANCE JAPAN TOGO PERU"

And indeed it does, using less than half the lines of the provided solution (almost a third if you ignore the lines required to define the puzzle). That will do nicely.