Programming Praxis – Longest Common Subsequence

Today’s Programming Praxis problem is about finding the longest common subsequence of two sequences, and our target is 23 lines. Let’s go.

Our import:

import Data.Array

Hackage doesn’t seem to have an easily installable matrix package (hmatrix needs some C libraries), so we’re going to use an Array of Arrays to represent the grid of numbers. We’re going to define a little convenience function to access elements in the 2D array.

at :: Int -> Int -> Array Int (Array Int e) -> e
at x y a = a ! y ! x

And the longest common subsequence function itself.

lcs :: Eq a => [a] -> [a] -> [a]
lcs xs ys = reverse $ walk imax jmax where
    imax = length xs
    jmax = length ys
    ax = listArray (1, imax) xs
    ay = listArray (1, jmax) ys
    ls = listArray (0, jmax) [listArray (0, imax)
               [lcs' i j | i <- [0..imax]] | j <- [0..jmax]]

    lcs' 0 _ = 0
    lcs' _ 0 = 0
    lcs' i j | ax ! i == ay ! j = 1 + at (i-1) (j-1) ls
             | otherwise        = max (at (i-1) j ls) (at i (j-1) ls)

    walk 0 _ = []
    walk _ 0 = []
    walk i j | at (i-1) j ls == at i j ls = walk (i-1) j
             | at i (j-1) ls  < at i j ls = ax ! i : walk i (j-1)
             | otherwise                  = walk i (j-1)

And the usual test:

main :: IO ()
main = print $ lcs "programming" "praxis"

19 lines. Not much of an improvement. More importantly, the speed is far from optimal. The lcs in Data.List.LCS is considerably faster (though considerably longer). If anyone knows a better version, I’m all ears.

About these ads

Tags: , , , , , ,

3 Responses to “Programming Praxis – Longest Common Subsequence”

  1. programmingpraxis Says:

    How are arrays implemented in Haskell? Are arrays immutable?

    As I read your program, the ls array-of-arrays is built one element at a time, but no elements are ever mutated — they are assigned a value once, in the proper order, and referenced both by lcs as it builds succeeding values of ls and also by walk as it scans for the final result, but never mutated. Is that correct?

    I assume arrays are immutable — that’s how Haskell does everything else. So an array ‘update’ operation must somehow build one new array element and copy the unchanging array elements. Is there such a thing in Haskell? How is it implemented? I have to believe that no actual copying takes place.

    Thanks for your help.

    Phil

  2. Remco Niemeijer Says:

    “How are arrays implemented in Haskell? Are arrays immutable?”

    Haskell has both mutable and immutable arrays (the IArray and MArray modules on http://hackage.haskell.org/packages/archive/array/0.2.0.0/doc/html/index.html ). The default one (which I’m using) is immutable.

    “As I read your program, the ls array-of-arrays is built one element at a time, but no elements are ever mutated — they are assigned a value once, in the proper order, and referenced both by lcs as it builds succeeding values of ls and also by walk as it scans for the final result, but never mutated. Is that correct?”

    Correct. The concept is comparable to the popular implementation of Fibonacci numbers in Haskell:
    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

    “I assume arrays are immutable — that’s how Haskell does everything else. So an array ‘update’ operation must somehow build one new array element and copy the unchanging array elements. Is there such a thing in Haskell? How is it implemented? I have to believe that no actual copying takes place.”

    There’s no function in the library to change just one element in an immutable array. When you use amap or ixmap (changing values or keys respectively) you build a new array of the resulting values, which most likely does not do copying, since typically all values would change. If you want to modify individual values, you’ll have to use mutable arrays. I assume that will use copying, but I’m not entirely sure, as the code gets kind of hairy there, with lots of unboxed values and other optimization stuff. If you’re really interested you can try asking in the #haskell irc channel. Maybe someone in there will know.

  3. akovacs Says:

    I did a shorter version, although it’s 50-100 percent slower for long strings on my machine than your solution.

    https://gist.github.com/4419542

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

Join 35 other followers

%d bloggers like this: