Posts Tagged ‘evaluation’

Programming Praxis – Streaming Knapsack

May 15, 2012

In today’s Programming Praxis exercise, our goal is to write that a function that gives the first possible knapsack, i.e. the first combination of k values that sum up to a given value n, from a lazy list. The list should not be evaluated further than necessary. Let’s get started, shall we?

Some imports:

import Data.List
import Debug.Trace
import System.IO

Rather than constructing a matrix we use a solution that probably isn’t as efficient, but significantly more compact and easy to understand. Just take all possible combinations of numbers and find the first one of length k that sums up to n.

knapsack :: Num a => Int -> a -> [a] -> Maybe [a]
knapsack k n = find (\s -> length s == k && sum s == n) . subsequences

To demonstrate that the function is indeed lazy, we apply it to a list that indicates when one of its elements is evaluated. The hFlush is to make sure the result of the first function is printed before the second list gets evaluated, since without it the two results will be printed at the end.

main :: IO ()
main = do print . knapsack 7 37 $ map debugEval [1..20]
          hFlush stdout
          print . knapsack 7 17 $ map debugEval [1..20]
    where debugEval n = trace ("evaluated: " ++ show n) n

Programming Praxis – Carl Hewitt’s Same-Fringe Problem

August 3, 2010

In today’s Programming Praxis exercise we have to implement Carl Hewitt’s same-fringe algorithm, which determines if two binary trees have the same leaves in the same order, regardless if their respective structures. The simple solution of flattening both trees and comparing them doesn’t work in most languages, since their strict evaluation requires that the entire flattened list be loaded into memory, which will fail on big trees. Haskell on the other hand is lazy by default, so we don’t need to do anything complicated to have things work correctly. Let’s get started, shall we?

Since we’ll be working with binary trees, we’ll need to define a data structure for them:

data Tree a = Node (Tree a) (Tree a) | Leaf a

Flattening a tree is trivial.

flatten :: Tree a -> [a]
flatten (Node l r) = flatten l ++ flatten r
flatten (Leaf x)   = [x]

As mentioned in the intro, all we need to do is check whether the two flattened lists are equal.

sameFringe :: Eq a => Tree a -> Tree a -> Bool
sameFringe a b = flatten a == flatten b

As always, a test to see if things are working correctly:

main :: IO ()
main = print $ sameFringe (Node (Leaf 1) (Node (Leaf 2) (Leaf 3)))
                          (Node (Node (Leaf 1) (Leaf 2)) (Leaf 3))

Looks like they are. But how do we prove this method doesn’t require a big memory overhead? Well, let’s just create some big trees and find out. A binary tree of depth 28 has 2^28 = over 268 million leaves. At the very least, that would result in a list of 1 GB, and that’s assuming 32 bits per integer and no overhead. Since Haskell’s default lists are linked lists, you should probably at least double that. However, the following program

main = print $ sameFringe (treeOfDepth 28 (Leaf 1))
                          (treeOfDepth 28 (Leaf 1))
       where treeOfDepth n t = iterate (\x -> Node x x) t !! (n - 1)

hums along at a constant memory use of less than 20 MB when interpreted and less than 3 MB when compiled, the compiled version finishing in just under 32 seconds. To quote Larry Kersten, “Hard work often pays off after time, but laziness always pays off now.” 🙂

Programming Praxis – Expression Evaluation

April 16, 2010

In today’s Programming Praxis exercise our task to write a parser for simple mathematical expressions. Let’s get started, shall we?

Some imports:

import Control.Applicative
import Text.Parsec hiding ((<|>))

We’re going to be using a somewhat different approach than the provided Scheme solution. Rather than doing everything ourselves, we will use the Parsec library, which is the go-to solution for writing parsers in Haskell. This, however, results in a small limitation. Parsec is a parser combinator library, and parser combinators cannot deal with left-recursive grammars. The grammar in the assignment is left-recursive, because if we were to enter the rule expr = expr + term (as I did in my first attempt in solving this), our program will enter an infinite loop. Fortunately, left-recursive grammars can be fairly easily rewritten using the chain functions in Parsec.

An expression is one or more terms, separated by addition and subtraction operators.

expr = chainl1 term ((+) <$ char '+' <|> (-) <$ char '-')

A term works just like an expression, but with multiplication and divison.

term = chainl1 fact ((*) <$ char '*' <|> div <$ char '/')

Factors need no change from the specified grammar: they’re either numbers or expressions in parentheses.

fact = read <$> many1 digit <|> char '(' *> expr <* char ')'

Evaluating an expression is trivial. The only extra step is to filter out all the spaces, since they have not been defined in the grammar but are present in the test cases.

eval :: String -> Int
eval = either (error . show) id . parse expr "" . filter (/= ' ')

A quick test shows that our rewritten grammar passes all of the test cases correctly:

main :: IO ()
main = mapM_ print [eval "6+2"            == 8
                   ,eval "6-2"            == 4
                   ,eval "6*2"            == 12
                   ,eval "6/2"            == 3
                   ,eval "6 * 2"          == 12
                   ,eval "2+3*4"          == 14
                   ,eval "2*3+4"          == 10
                   ,eval "2+3+4"          == 9
                   ,eval "2-3-4"          == -5
                   ,eval "2*3*4"          == 24
                   ,eval "(2+3)*4"        == 20
                   ,eval "(2*3)+4"        == 10
                   ,eval "2+(3*4)"        == 14
                   ,eval "2*(3+4)"        == 14
                   ,eval "12 * (34 + 56)" == 1080

Using a parser library and slightly rewriting the grammar reduces the amount of required lines from 48 to 4. That’s a good trade-off in my book.

Strictbench 0.1

June 7, 2009

In the post ‘Forcing evaluation in Haskell‘ , I described how to fully evaluate a value to get around Haskell’s lazy evaluation. Since then, I’ve found myself using the following snippet a lot:

import Control.Parallel.Strategies
import Test.BenchPress

bench 1 . print . rnf

This snippet fully evaluates a value and prints how long it took to do so. I regularly use it on the Programming Praxis problems to see where the bottleneck lies in my algorithm.  It has the minor annoyance, however, that it prints a lot of information (min, max, mean, median, percentiles) that is all identical, because I only run it once. The reason I only run it once is that I’m typically evaluating a pure value, which means that any subsequent attempts to benchmark the evaluation time will take no time at all, since it has already been evaluated.

To solve this, I decided to write a small library to make this process easier and only print the time taken once. The result is StrictBench. A short example:

import Test.StrictBench

main = bench [1..10000000 :: Integer]

This code would give

2890.625 ms

as output. For the rest of the documentation I refer you to the Hackage page. The source code is pretty simple:

module Test.StrictBench (bench, benchDesc, time) where

import Control.Parallel.Strategies
import Test.BenchPress hiding (bench)
import Text.Printf

bench :: NFData a => a -> IO ()
bench = (putStrLn . (++ " ms") . show =<<) . time

benchDesc :: NFData a => String -> a -> IO ()
benchDesc s = (putStrLn . printf "%s: %s ms" s . show =<<) . time

time :: NFData a => a -> IO Double
time = fmap (median . fst) . benchmark 1 (return ())
       (const $ return ()) . const . putStr . (`seq` "") . rnf

Nothing complicated, but a nice convenience library that I’ll be using from now on.

Forcing evaluation in Haskell

April 27, 2009

As you might know, Haskell is a lazy language. This means that nothing is evaluated until it is actually needed. This allows you to do nice things like

foo = take 10 [0..]

without taking infinity to evaluate the list. Until yesterday, however, I didn’t fully appreciate quite how literally Haskell takes this concept. I was working on a BMP importer and had the following data structure:

data Image = Image { width :: Int, height :: Int, pixels :: [[Pixel]] }

data Pixel = Pixel { red :: Int, green :: Int, blue :: Int, alpha :: Int }
             deriving Show

Pretty simple. Now let’s make a function that generates a bitmap:

makeImage :: Int -> Int -> Image
makeImage w h = Image w h . replicate h . replicate w $ Pixel 255 0 0 255

Let’s jump to ghci to test working with a reasonable sized bitmap.

> :set +s

This makes ghci report how long everything takes and how much memory it requires.

> width $ makeImage 800 600
(0.05 secs, 527520 bytes)

As the more perceptive of you may have noticed, we have 800 * 600 = 480000 pixels. Each pixel has four ints, which are 4 bytes each, so the full data structure should take 480000 * 16 = 7680000 bytes, or 7 MB at the very least. Our command only took 527 KB, so obviously it’s not evaluating the whole thing, which is logical. It doesn’t need the pixels, so it doesn’t evaluate them.

So how do we make it evaluate all the pixels? Until yesterday i thought getting the last pixel via (last . last) would work, figuring it should have to evaluate the whole thing to give me the last element. Let’s try that.

> last . last . pixels $ makeImage 800 600
Pixel {red = 255, green = 0, blue = 0, alpha = 255}
(0.00 secs, 522388 bytes)

Well, that was fast. But if we look at the memory used we see the obvious problem: it’s still not nearly enough. GHC is a clever compiler, so it skips doing the work for all the other pixels because it sees it doesn’t have to. Unfortunately it took me a whole lot longer to see this problem since I was testing a bigger section of code, which took a lot more memory, obscuring this problem.

Fortunately the kind people in the #haskell IRC channel pointed out this problem and also provided a way to force the evaluation of an object, namely the rnf function in Control.Parallel.Strategies. So let’s try that:

import Control.Parallel.Strategies

Go to ghci again and type

> rnf . pixels $ makeImage 800 600
    No instance for (NFData Pixel)
      arising from a use of `rnf' at :1:0-2
    Possible fix: add an instance declaration for (NFData Pixel)
    In the first argument of `(.)', namely `rnf'
    In the first argument of `($)', namely `rnf . pixels'
    In the expression: rnf . pixels $ makeImage 800 600

Hm. We’ll need to make Pixel an instance of NFData. Had this instead been a [[Int]] or another common data type it would have worked out of the box. Now we could just say

instance NFData Image

but this would tell only part of the story. If we look at the source code for Control.Parallel.Strategies we see that rnf does nothing by default. Using this version we would indeed evaluate the whole list (because the instance for lists has been correctly defined), but leave the pixels themselves unevaluated. In my test this resulted in a reported time of about 5 seconds instead of the 9-10 it actually took to fully load the image. Another look at the source code reveals that the correct implementation is very simple: just seq everything together.

instance NFData Pixel where
    rnf (Pixel r g b a) = rnf r `seq` rnf g `seq` rnf b `seq` rnf a

Let’s also make Image an instance of NFData for good measure.

instance NFData Image where
    rnf (Image w h ps) = rnf w `seq` rnf h `seq` rnf ps

Let’s try evaluating the image again:

> rnf $ makeImage 800 600
(1.78 secs, 92503000 bytes)

Quite a difference. Now we have a realistic memory use and the actual time it takes to fully evalutate the image. So if you’re ever unsure if something has been fully evaluated or if your compiler is just being sneaky, remember the rnf function. It might save you a few hours of confusion 🙂