Archive for October 26th, 2010

Programming Praxis – Benford’s Law

October 26, 2010

In today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more likely to be the first digit of numbers found i large collections of data) holds for the littoral areas of lakes in Minnesota. Let’s get started, shall we?

Some imports:

import Data.List
import Text.Printf

The algorithm for calculating the distribution of leading digits given a group of numbers is virtually identical to the Scheme version, despite the fact that I didn’t look at the solution beforehand. It’s not too surprising though, since it’s simply the most obvious method. Note that the function argument is a list of floats. Initially I assumed all areas were integers, which resulted in incorrect results until I found that there were 10 floats hidden in the input (thank god for regular expressions).

firstDigits :: [Float] -> [(Char, Double)]
firstDigits xs = map (\ds -> (head ds, 100 * toEnum (length ds) /
    toEnum (length xs))) . group . sort $ map (head . show) xs

With that function out of the way, the problem becomes trivial: just call firstDigits on all the appropriate numbers.

shriram :: [[String]] -> [(Char, Double)]
shriram xs = firstDigits [n | [(n,_)] <- map (reads . (!! 3)) xs]

Of course we need to run the algorithm over the given data, using the parser from two exercises ago:

main :: IO ()
main = either print (mapM_ (uncurry $ printf "%c %f\n") . shriram)
       =<< readDB csv "csv.txt"

This produces the same list of percentages as the Scheme version. Looks like Benford’s Law holds in this case as well.