Posts Tagged ‘mapreduce’

Programming Praxis – MapReduce

October 6, 2009

In today’s Programming Praxis exercise, we have to implement the famous MapReduce algorithm. Let’s get going, shall we?

First, some imports:

import Control.Arrow
import Data.Char
import Data.List
import qualified Data.Map as M

Since I wasn’t here for the Red-Black Tree exercise, I’ll just use Maps.

mapReduce :: Ord k => (a -> (k, v)) -> (v -> v -> v) ->
                      (k -> k -> Bool) -> [a] -> [(k, v)]
mapReduce m r lt = sortBy (\(a,_) (b,_) -> if lt a b then LT else GT) .
                   M.assocs . M.map (foldl1 r) .
                   M.fromListWith (++) . map (second return . m)

With that, the version that works on files is trivial.

mapReduceInput :: Ord k => (a -> (k, v)) -> (v -> v -> v) ->
    (k -> k -> Bool) -> (String -> [a]) -> FilePath -> IO [(k, v)]
mapReduceInput m r lt g = fmap (mapReduce m r lt . g) . readFile

In order to test our algorithm, let’s reproduce the tests from Programming Praxis:

anagrams = map snd . mapReduce (sort &&& id) (\a b -> a ++ " " ++ b) (<)

getWords = concat . zipWith (\i -> map (\w -> (w, [i])) . words) [1..] .
           map (map clean) . lines where
           clean c = if isAlphaNum c || isSpace c then c else ' '

xref = mapReduceInput id (flip union) (<) getWords

main = do print $ mapReduce (\x -> (x, 1)) (+) (<) "banana"
          print $ anagrams ["time", "stop", "pots", "cars", "emit"]
          print =<< xref "mapreduce.txt"

Everything’s working correctly.