Posts Tagged ‘rnf’

Strictbench 0.1

June 7, 2009

In the post ‘Forcing evaluation in Haskell‘ , I described how to fully evaluate a value to get around Haskell’s lazy evaluation. Since then, I’ve found myself using the following snippet a lot:

import Control.Parallel.Strategies
import Test.BenchPress

bench 1 . print . rnf

This snippet fully evaluates a value and prints how long it took to do so. I regularly use it on the Programming Praxis problems to see where the bottleneck lies in my algorithm.  It has the minor annoyance, however, that it prints a lot of information (min, max, mean, median, percentiles) that is all identical, because I only run it once. The reason I only run it once is that I’m typically evaluating a pure value, which means that any subsequent attempts to benchmark the evaluation time will take no time at all, since it has already been evaluated.

To solve this, I decided to write a small library to make this process easier and only print the time taken once. The result is StrictBench. A short example:

import Test.StrictBench

main = bench [1..10000000 :: Integer]

This code would give

2890.625 ms

as output. For the rest of the documentation I refer you to the Hackage page. The source code is pretty simple:

module Test.StrictBench (bench, benchDesc, time) where

import Control.Parallel.Strategies
import Test.BenchPress hiding (bench)
import Text.Printf

bench :: NFData a => a -> IO ()
bench = (putStrLn . (++ " ms") . show =<<) . time

benchDesc :: NFData a => String -> a -> IO ()
benchDesc s = (putStrLn . printf "%s: %s ms" s . show =<<) . time

time :: NFData a => a -> IO Double
time = fmap (median . fst) . benchmark 1 (return ())
       (const $ return ()) . const . putStr . (`seq` "") . rnf

Nothing complicated, but a nice convenience library that I’ll be using from now on.

Forcing evaluation in Haskell

April 27, 2009

As you might know, Haskell is a lazy language. This means that nothing is evaluated until it is actually needed. This allows you to do nice things like

foo = take 10 [0..]

without taking infinity to evaluate the list. Until yesterday, however, I didn’t fully appreciate quite how literally Haskell takes this concept. I was working on a BMP importer and had the following data structure:

data Image = Image { width :: Int, height :: Int, pixels :: [[Pixel]] }

data Pixel = Pixel { red :: Int, green :: Int, blue :: Int, alpha :: Int }
             deriving Show

Pretty simple. Now let’s make a function that generates a bitmap:

makeImage :: Int -> Int -> Image
makeImage w h = Image w h . replicate h . replicate w $ Pixel 255 0 0 255

Let’s jump to ghci to test working with a reasonable sized bitmap.

> :set +s

This makes ghci report how long everything takes and how much memory it requires.

> width $ makeImage 800 600
(0.05 secs, 527520 bytes)

As the more perceptive of you may have noticed, we have 800 * 600 = 480000 pixels. Each pixel has four ints, which are 4 bytes each, so the full data structure should take 480000 * 16 = 7680000 bytes, or 7 MB at the very least. Our command only took 527 KB, so obviously it’s not evaluating the whole thing, which is logical. It doesn’t need the pixels, so it doesn’t evaluate them.

So how do we make it evaluate all the pixels? Until yesterday i thought getting the last pixel via (last . last) would work, figuring it should have to evaluate the whole thing to give me the last element. Let’s try that.

> last . last . pixels $ makeImage 800 600
Pixel {red = 255, green = 0, blue = 0, alpha = 255}
(0.00 secs, 522388 bytes)

Well, that was fast. But if we look at the memory used we see the obvious problem: it’s still not nearly enough. GHC is a clever compiler, so it skips doing the work for all the other pixels because it sees it doesn’t have to. Unfortunately it took me a whole lot longer to see this problem since I was testing a bigger section of code, which took a lot more memory, obscuring this problem.

Fortunately the kind people in the #haskell IRC channel pointed out this problem and also provided a way to force the evaluation of an object, namely the rnf function in Control.Parallel.Strategies. So let’s try that:

import Control.Parallel.Strategies

Go to ghci again and type

> rnf . pixels $ makeImage 800 600
    No instance for (NFData Pixel)
      arising from a use of `rnf' at :1:0-2
    Possible fix: add an instance declaration for (NFData Pixel)
    In the first argument of `(.)', namely `rnf'
    In the first argument of `($)', namely `rnf . pixels'
    In the expression: rnf . pixels $ makeImage 800 600

Hm. We’ll need to make Pixel an instance of NFData. Had this instead been a [[Int]] or another common data type it would have worked out of the box. Now we could just say

instance NFData Image

but this would tell only part of the story. If we look at the source code for Control.Parallel.Strategies we see that rnf does nothing by default. Using this version we would indeed evaluate the whole list (because the instance for lists has been correctly defined), but leave the pixels themselves unevaluated. In my test this resulted in a reported time of about 5 seconds instead of the 9-10 it actually took to fully load the image. Another look at the source code reveals that the correct implementation is very simple: just seq everything together.

instance NFData Pixel where
    rnf (Pixel r g b a) = rnf r `seq` rnf g `seq` rnf b `seq` rnf a

Let’s also make Image an instance of NFData for good measure.

instance NFData Image where
    rnf (Image w h ps) = rnf w `seq` rnf h `seq` rnf ps

Let’s try evaluating the image again:

> rnf $ makeImage 800 600
(1.78 secs, 92503000 bytes)

Quite a difference. Now we have a realistic memory use and the actual time it takes to fully evalutate the image. So if you’re ever unsure if something has been fully evaluated or if your compiler is just being sneaky, remember the rnf function. It might save you a few hours of confusion 🙂