Posts Tagged ‘tries’

Programming Praxis – Ternary Search Tries

June 5, 2009

Today’s Programming Praxis problem is about Ternary search tries, which are basically hashmaps of strings to values, but which can be faster in some cases. We have to implement the data structure and provide functions to find, insert, update and delete items and a function to convert the trie to a list. In order to match the provided solution, we will test our code by putting all the words in the bible, as well as their frequencies, into a ternary trie. Our target is 52 lines (the size of the provided solution). Let’s go.

Our imports:

import Data.Char
import qualified Data.List.Key as K
import Prelude hiding (lookup)

The data structure has two small differences from the one in the provided solution: I use a Maybe instead of the bool+value approach, since it’s more idiomatic Haskell, and the split is a list instead of a single type. The reason for this is that it makes the ternary trie far more generic: the provided solution works only for strings, because in the insert function it uses the first ASCII character  as a default. By making the split a list, I can use an empty list as the default, which means I can also define a trie with e.g. lists of Int as the key. This would not be possible with a single item, since there is no way to get a default value for an arbitrary type.
Another thing you might notice are the strictness annotations on the three branches of a node. If you omit these (as I initially did) and try to cram the bible in a trie you’re going to go through stack space like a hot knife through butter, which means the program will be terribly slow. Note that in order to see the benefit you will have to compile the program: running it in ghci will still produce a stack overflow.

data TernaryTrie k v = Empty | Node { val :: Maybe v,
    split :: [k], lb :: !(TernaryTrie k v),
    eb :: !(TernaryTrie k v), gb :: !(TernaryTrie k v) }

For lookup we have two terminating cases (an empty node is reached or the key is found). The rest is just simple recursion.

lookup :: Ord k => [k] -> TernaryTrie k v -> Maybe v
lookup _      Empty = Nothing
lookup []     t     = val t
lookup (x:xs) t     = case compare [x] $ split t of
                           GT -> lookup (x:xs) $ gb t
                           LT -> lookup (x:xs) $ lb t
                           EQ -> lookup xs     $ eb t

Insert, update and delete all follow the same pattern. To get rid of the boring ‘take the correct action for each branch’ code I factored that out into the modify function to better comply with the DRY principle.

insert :: Ord k => [k] -> v -> TernaryTrie k v -> TernaryTrie k v
insert k  v Empty = insert k v $
                    Node Nothing (take 1 k) Empty Empty Empty
insert [] v t     = t { val = Just v }
insert k  v t     = modify (flip insert v) k t

update :: Ord k => [k] -> v -> (v -> v) ->
          TernaryTrie k v -> TernaryTrie k v
update k  v _ Empty = insert k v Empty
update [] v p t     = t { val = Just . maybe v p $ val t }
update k  v p t     = modify (\x -> update x v p) k t

delete :: Ord k => [k] -> TernaryTrie k v -> TernaryTrie k v
delete _  Empty = Empty
delete [] t     = t { val = Nothing }
delete k  t     = modify delete k t

modify :: Ord k => ([k] -> TernaryTrie k v -> TernaryTrie k v) ->
                   [k] -> TernaryTrie k v -> TernaryTrie k v
modify f k t = case compare (take 1 k) (split t) of
                    LT -> t { lb = f (drop 0 k) $ lb t }
                    EQ -> t { eb = f (drop 1 k) $ eb t }
                    GT -> t { gb = f (drop 0 k) $ gb t }

And more recursion for the enlist method.

enlist :: TernaryTrie k v -> [([k], v)]
enlist = enlist' [] where
    enlist' _ Empty = []
    enlist' k t     =
        maybe [] (\v -> [(k, v)]) (val t) ++ enlist' k (lb t) ++
        enlist' (k ++ split t) (eb t) ++ enlist' k (gb t)

And we test it by finding the 25 most common words in the bible.

main :: IO ()
main = print . take 25 . reverse . K.sort snd . enlist .
       foldl (\t k -> update k 1 succ t) Empty .
       map (map toLower . filter isAlpha) . words =<<
       readFile "bible.txt"

And there we go. With 35 lines it’s not a huge reduction (although the provided solution grows by about 15 lines if you reformat it to the 65-ish character limit I use on this blog), but that was to be expected since it’s mostly basic recursion, which doesn’t easily lend itself to alternative solutions. Still, it’ll do.