
Programming Praxis – Natural Join

June 15, 2010

In today’s Programming Praxis exercise we have to implement a program that performs a natural join on two files containing tables. The provided Scheme solution is 29 lines, so let’s see if we can’t reduce that a little bit.

Some imports:

import Data.List
import Data.List.Split
import System.Environment

The tables are read from files with lines of tab-separated values.

loadFile :: FilePath -> IO [[String]]
loadFile = fmap (map (splitOn "\t") . lines) . readFile
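To check what loadFile produces without installing the split package or creating a file, here is a minimal sketch of the same parsing pipeline run on a string. The names splitTabs and parseTable are hypothetical. splitTabs is a base-only stand-in that should behave like splitOn "\t" for this input format, including keeping empty fields between adjacent tabs and after a trailing tab.

```haskell
-- Hypothetical base-only stand-in for splitOn "\t" from Data.List.Split.
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]                  -- no tab left: this is the last field
  (field, _ : rest) -> field : splitTabs rest   -- drop the tab and split the remainder

-- The loadFile pipeline minus the file IO: one row per line, one field per tab.
parseTable :: String -> [[String]]
parseTable = map splitTabs . lines

main :: IO ()
main = print (parseTable "key\tvalue\n1\tfoo\n2\t\n")
-- prints [["key","value"],["1","foo"],["2",""]]
```

With these helpers, loadFile could equivalently be written as fmap parseTable . readFile.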

Thanks to the assumption given in the exercise (the key is always the first field), the join algorithm can be pretty simple: it compares every row of the first table with every row of the second and keeps the pairs whose keys match. It doesn’t take advantage of the keys being sorted, so for tables of n and m rows it takes O(n·m) time instead of the O(n + m) a merge would need, but for the small test case we’re using this doesn’t matter much.

join :: Eq a => [[a]] -> [[a]] -> [[a]]
join (ks1:f1) (ks2:f2) = union ks1 ks2 :
    [k1 : v1++v2 | (k1:v1) <- f1, (k2:v2) <- f2, k1 == k2]
join _        _        = []
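For comparison, here is a minimal sketch of the sort-merge join that exploits sorted keys. The name mergeJoin is hypothetical, and it assumes that both tables (after the header row) are sorted ascending by their first field under Haskell’s ordering for the key type. For String keys that ordering is lexicographic, so "10" sorts before "9". It also assumes every row has at least one field. Rows that share a key are grouped with span, so duplicate keys still produce every matching pair, and the total work is O(n + m) plus the size of the output.

```haskell
import Data.List (union)

-- Hypothetical sort-merge join. Assumes both tables are sorted ascending
-- by their first field (after the header) and no row is empty.
mergeJoin :: Ord a => [[a]] -> [[a]] -> [[a]]
mergeJoin (h1:r1) (h2:r2) = union h1 h2 : go r1 r2
  where
    go xs@((k1:_):_) ys@((k2:_):_)
      | k1 < k2   = go (drop 1 xs) ys   -- key only in the left table: skip the row
      | k1 > k2   = go xs (drop 1 ys)   -- key only in the right table: skip the row
      | otherwise =
          -- equal keys: pair every left row with every right row in the group
          [k1 : v1 ++ v2 | (_:v1) <- g1, (_:v2) <- g2] ++ go xs' ys'
      where
        sameKey row = take 1 row == [k1]
        (g1, xs') = span sameKey xs
        (g2, ys') = span sameKey ys
    go _ _ = []  -- one side exhausted (or an empty row reached): stop
mergeJoin _ _ = []

-- Hypothetical sample tables, sorted by key.
left, right :: [[String]]
left  = [["id","name"], ["1","a"], ["2","b"], ["2","bb"], ["3","c"]]
right = [["id","score"], ["2","x"], ["3","y"], ["4","z"]]

main :: IO ()
main = mapM_ print (mergeJoin left right)
```

The tradeoff is the extra precondition: if either file is not sorted by key, this version silently misses matches, whereas the list comprehension above works for rows in any order.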

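Although the exercise only asks for two file arguments, restricting the program to exactly two would take more code than accepting any number of them, so we fold join over all the files given instead. A join needs at least two files; with a single file the table is simply printed back, and with none foldl1 fails with an empty-list error.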

main :: IO ()
main = mapM_ putStrLn . map (intercalate "\t") .
       foldl1 join =<< mapM loadFile =<< getArgs
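To illustrate what the fold does with more than two files, here is a minimal sketch that replaces the files with hypothetical in-memory tables (people, ages and cities are made-up sample data). foldl1 join [a, b, c] evaluates to join (join a b) c, so each additional table is joined onto the accumulated result.

```haskell
import Data.List (intercalate, union)

-- The post's join, repeated so this sketch is self-contained.
join :: Eq a => [[a]] -> [[a]] -> [[a]]
join (ks1:f1) (ks2:f2) = union ks1 ks2 :
    [k1 : v1++v2 | (k1:v1) <- f1, (k2:v2) <- f2, k1 == k2]
join _        _        = []

-- Hypothetical sample tables standing in for three input files.
people, ages, cities :: [[String]]
people = [["id","name"], ["1","alice"], ["2","bob"]]
ages   = [["id","age"],  ["1","30"],    ["2","25"]]
cities = [["id","city"], ["2","paris"]]

-- Same fold as in main: join (join people ages) cities.
joined :: [[String]]
joined = foldl1 join [people, ages, cities]

main :: IO ()
main = mapM_ (putStrLn . intercalate "\t") joined
```

Only key 2 appears in all three tables, so this prints a header line and a single row for bob. Note that union drops duplicate column names, so if two input tables share a non-key column name the header ends up shorter than the data rows.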

Six lines of code (not counting imports and type signatures), one of which, the catch-all join _ _ = [] case, is technically not needed for the test case. Not too shabby.