Programming Praxis – Benford’s Law

In today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more likely to be the first digit of numbers found i large collections of data) holds for the littoral areas of lakes in Minnesota. Let’s get started, shall we?

Some imports:

import Data.List
import Text.Printf

The algorithm for calculating the distribution of leading digits given a group of numbers is virtually identical to the Scheme version, despite the fact that I didn’t look at the solution beforehand. It’s not too surprising though, since it’s simply the most obvious method. Note that the function argument is a list of floats. Initially I assumed all areas were integers, which resulted in incorrect results until I found that there were 10 floats hidden in the input (thank god for regular expressions).

firstDigits :: [Float] -> [(Char, Double)]
firstDigits xs = map (\ds -> (head ds, 100 * toEnum (length ds) /
    toEnum (length xs))) . group . sort $ map (head . show) xs

With that function out of the way, the problem becomes trivial: just call firstDigits on all the appropriate numbers.

shriram :: [[String]] -> [(Char, Double)]
shriram xs = firstDigits [n | [(n,_)] <- map (reads . (!! 3)) xs]

Of course we need to run the algorithm over the given data, using the parser from two exercises ago:

main :: IO ()
main = either print (mapM_ (uncurry $ printf "%c %f\n") . shriram)
       =<< readDB csv "csv.txt"

This produces the same list of percentages as the Scheme version. Looks like Benford’s Law holds in this case as well.

Tags: , , , , , , ,

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: