In today’s Programming Praxis exercise, our goal is to implement the Unix command line utility comm. Let’s get started, shall we?

Some imports:

import Control.Monad
import System.Environment
import Text.Printf
import System.IO
import qualified System.IO.Strict as SIO
import GHC.IO.Handle

Determining the common lines isn’t too difficult. We go through the two (sorted) lists element by element, putting each line in column 1, 2 or 3 as appropriate. Afterwards, we filter out the columns the user asked to suppress.

comm :: (Eq b, Num b, Ord a) => [b] -> [a] -> [a] -> [(a, b)]
comm flags zs = filter ((`notElem` flags) . snd) . f zs where
    f xs     []     = map (flip (,) 1) xs
    f []     ys     = map (flip (,) 2) ys
    f (x:xs) (y:ys) = case compare x y of 
        LT -> (x,1) : f xs     (y:ys)
        GT -> (y,2) : f (x:xs) ys
        EQ -> (x,3) : f xs     ys
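
To make the merge a bit more concrete, here is a small sketch that runs comm on two made-up, already sorted lists. The definition is repeated so the snippet stands alone, with an Eq b constraint spelled out on the assumption that a recent GHC is used; the names left, right, everything and shared are only there for illustration.

```haskell
-- Copy of comm so the example is self-contained. The Eq b constraint is
-- an assumption for recent GHC versions, where Num no longer implies Eq.
comm :: (Eq b, Num b, Ord a) => [b] -> [a] -> [a] -> [(a, b)]
comm flags zs = filter ((`notElem` flags) . snd) . f zs where
    f xs     []     = map (flip (,) 1) xs
    f []     ys     = map (flip (,) 2) ys
    f (x:xs) (y:ys) = case compare x y of
        LT -> (x,1) : f xs     (y:ys)
        GT -> (y,2) : f (x:xs) ys
        EQ -> (x,3) : f xs     ys

-- Hypothetical sample inputs; both lists must already be sorted.
left, right :: [String]
left  = ["apple", "banana", "cherry"]
right = ["banana", "cherry", "date"]

-- No columns suppressed: every line is tagged with its column number.
everything :: [(String, Int)]
everything = comm [] left right

-- Columns 1 and 2 suppressed: only the lines common to both lists remain.
shared :: [(String, Int)]
shared = comm [1, 2] left right
```

Here everything evaluates to [("apple",1),("banana",3),("cherry",3),("date",2)] and shared to [("banana",3),("cherry",3)].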

Displaying the results in columns can be achieved with printf.

columns :: [(String, Int)] -> IO ()
columns xs = let width = maximum (map (length . fst) xs) + 2 in
    mapM_ (\(s,c) -> printf "%*s%-*s\n" ((c - 1) * width) "" width s) xs
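
If the printf format string looks a bit cryptic, the same layout can be sketched as a pure function that builds the padded lines by hand. This is only an illustration of what the format does, not a replacement for the version above; the name layout is made up for the example.

```haskell
-- Pure sketch of the column layout: returns the lines instead of printing.
-- Each entry is indented by (column - 1) * width spaces and then padded on
-- the right to the column width, mirroring "%*s%-*s".
layout :: [(String, Int)] -> [String]
layout xs = map row xs
  where
    -- Widest entry plus two spaces of separation, as in columns.
    width = maximum (map (length . fst) xs) + 2
    row (s, c) = replicate ((c - 1) * width) ' '
              ++ s
              ++ replicate (width - length s) ' '
```

For example, layout [("apple",1),("banana",3),("date",2)] uses a width of 8, so "apple" starts at offset 0, "banana" at offset 16 and "date" at offset 8.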

Handling the arguments is fairly straightforward for the most part, with one exception: if the input for both files comes from stdin, the default getContents function will not work, for two reasons. First, since the handle gets closed after the first call, the second call to getContents will fail. The way to resolve this is to duplicate the handle to stdin. Second, since getContents is lazy by default, it will read the first file from stdin first, marking each line as unique to the first file, and then do the same thing for the second file. We therefore need to read both files strictly before comparing them. Both problems are resolved by the newStdIn function.
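
As a rough illustration of what reading strictly means, here is a sketch that uses only base instead of System.IO.Strict: it forces the whole string before returning, so the contents survive the handle being closed. The name readAll is hypothetical, and this is a stand-in for the idea rather than the function used in the solution.

```haskell
import System.IO

-- Sketch of a strict read using only base: hGetContents is lazy, so we
-- force the length of the result, which pulls in the entire input before
-- the action returns.
readAll :: Handle -> IO String
readAll h = do
    s <- hGetContents h
    length s `seq` return s
```

With the plain lazy hGetContents, closing the handle before the string is consumed would cut the input short; with readAll the data is already in memory by then.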

main :: IO ()
main = do args <- getArgs
          columns =<< case args of
              (('-':p:ps):fs) -> go (map (read . return) (p:ps)) fs
              fs              -> go [] fs
    where go args ~[f1, f2] = liftM2 (comm args) (file f1) (file f2)
          file src = fmap lines $ if src == "-" then newStdIn
                                                else readFile src
          newStdIn = catch (SIO.hGetContents =<< hDuplicate stdin)
                           (\_ -> return [])
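
The argument handling in main can also be sketched as a small pure function, which makes it easier to see what a flag such as -12 turns into. This is a variant rather than the code above: the name parseArgs is made up, and it uses digitToInt with an isDigit check instead of read, so a non-numeric flag is treated as a file name instead of causing an error.

```haskell
import Data.Char (digitToInt, isDigit)

-- Hypothetical helper: split the arguments into the columns to suppress
-- and the remaining file names. A lone "-" is not a flag (it means stdin),
-- which is why the pattern requires at least one character after the dash.
parseArgs :: [String] -> ([Int], [String])
parseArgs (('-':p:ps):fs)
    | all isDigit (p:ps) = (map digitToInt (p:ps), fs)
parseArgs fs             = ([], fs)
```

For instance, parseArgs ["-12", "a.txt", "b.txt"] gives ([1,2],["a.txt","b.txt"]), while parseArgs ["-", "b.txt"] gives ([],["-","b.txt"]).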

