Posts Tagged ‘lcs’

Programming Praxis – Diff

June 8, 2010

In today’s Programming Praxis exercise our task is to write a diff command line tool. Let’s get started, shall we?

First, we import a package that gives us a longest common subsequence function.

import Data.List.LCS.HuntSzymanski

There are three possible differences between files: deletions, additions and changes.

data Change = D | A | C

We define a few helper functions for the output: one to display the line numbers in the correct format, one for printing headers and one to show the lines that are different.

linenum :: (Int, Int) -> String
linenum (s, e) = if s == e then show s else show s ++ "," ++ show e

header :: (Int, Int) -> String -> (Int, Int) -> IO ()
header l op r = putStrLn $ linenum l ++ op ++ linenum r

section :: Char -> [String] -> IO ()
section c = mapM_ (\s -> putStrLn $ c:' ':s)

The diff function follows the same structure as that in the Scheme solution, but with some of the repetition abstracted out.

diff :: String -> String -> IO ()
diff xs ys = f 0 0 (lines xs) (lines ys) where
    f n1 n2 = g where
        g [] b  = change A [] b
        g a  [] = change D a []
        g a  b  = case lcs a b of
            []    -> change C a b
            (d:_) -> case (head a == d, head b == d) of
                (True, True) -> rec 1 1
                (True, _   ) -> change A q1 q2 >> rec len1 len2
                (_   , True) -> change D q1 q2 >> rec len1 len2
                _            -> change C q1 q2 >> rec len1 len2
                where [q1, q2] = map (takeWhile (/= d)) [a, b]
                      [len1, len2] = map length [q1, q2]
                      rec l r = f (n1+l) (n2+r) (drop l a) (drop r b)

        change D a _ = header (n1+1, n1+length a) "d" (n2, n2) >>
                       section '<' a
        change A _ b = header (n1, n1) "a" (n2+1, n2 + length b) >>
                       section '>' b
        change C a b = header (n1+1, n1+length a) "c" (n2+1, n2+length b) >>
                       section '<' a >> putStrLn "---" >> section '>' b

To test our function, just load two files and diff them.

main :: IO ()
main = do f1 <- readFile "file1.txt"
          f2 <- readFile "file2.txt"
          diff f1 f2

Sadly, this version isn’t all that much shorter than the Scheme version (about a third). There is still a small amount of duplicated code, but I don’t see a way of elegantly factoring it out.