Posts Tagged ‘algorithm’

Programming Praxis – Dutch National Flag

March 5, 2013

In today’s Programming Praxis exercise, our goal is to implement a sorting algorithm for lists with three different elements that works in linear time. Let’s get started, shall we?

import qualified Data.Vector as V

We’re only allowed to use index and swap operations. Swap isn’t defined in the Vector package, but is easy to express as a bulk update.

swap :: Int -> Int -> V.Vector a -> V.Vector a
swap i j a = a V.// [(i,a V.! j), (j,a V.! i)]
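
A quick sanity check in GHCi (converting back to a list for readability):

ghci> V.toList (swap 0 2 (V.fromList "RWB"))
"BWR"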

To sort the array, we keep track of where the next red and blue elements should be inserted, as well as the current index. When we encounter a red or blue element, we swap it to the correct location. If the element was blue, we have to test the element swapped into the current position again, since it could be anything. If it was red, the element that's swapped to the current location will always be white, so we can move on. Once we reach the start of the group of blue elements at the end, we can stop.

flag :: V.Vector Char -> V.Vector Char
flag xs = f (0, V.length xs - 1) xs 0 where
  f (r,b) a n = if n > b then a else case a V.! n of
    'R' -> f (r+1,b  ) (swap n r a) (n+1)
    'B' -> f (r,  b-1) (swap n b a) n
    _   -> f (r,  b  ) a            (n+1)
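
For example:

ghci> V.toList (flag (V.fromList "BWRWRB"))
"RRWWBB"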

Some tests to see if everything is working properly:

test :: String -> Bool
test x = flag (V.fromList x) == V.fromList
  (filter (== 'R') x ++ filter (== 'W') x ++ filter (== 'B') x)

main :: IO ()
main = do print $ test ""
          print $ test "W"
          print $ test "R"
          print $ test "B"
          print $ test "RWB"
          print $ test "BWR"
          print $ test "RWBR"
          print $ test "WRBRBRBWRWBWBRBWRBWRBWRBWBBBRBRWBRWB"

Programming Praxis – List Intersection And Union

November 16, 2012

In today’s Programming Praxis exercise, our goal is to write union and intersection functions for lists in three different time complexities: O(n^2), O(n log n) and O(n). We then need to show that our functions actually have the correct complexity by timing them. Let’s get started, shall we?

Some imports:

import qualified Data.IntSet as I
import qualified Data.Set as S
import System.CPUTime
import Text.Printf

Rather than trying to come up with entirely different algorithms, I’m going to use this opportunity to show the importance of choosing an appropriate data structure for your algorithms. We do this by using essentially the same algorithm for all three complexities, but each time with a different data structure.

The simplest algorithm for union is to take all the elements in the first list plus the ones from the second list that we don’t already have. Taking the intersection of two lists can be achieved by taking all the elements from the first list that also appear in the second list. The interesting operation in both algorithms is testing whether an element is present in a list, since it determines the overall time complexity. On a linked list, this operation is O(n). There are other data structures, however, that offer better performance. The generic algorithms, therefore, need to know how to convert a linked list to the desired data structure and how to find an element in that data structure.

genUnion, genIntersect :: ([a] -> b) -> (a -> b -> Bool) -> [a] -> [a] -> [a]
genUnion load get xs ys = xs ++ filter (not . flip get (load xs)) ys
genIntersect load get xs ys = filter (`get` (load ys)) xs

For the O(n^2) version, we use a linked list as our data structure, so there’s no conversion to be done.

union_n2, intersect_n2 :: Eq a => [a] -> [a] -> [a]
union_n2        = genUnion     id elem
intersect_n2    = genIntersect id elem
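
For example:

ghci> union_n2 [1,2,3] [2,3,4]
[1,2,3,4]
ghci> intersect_n2 [1,2,3] [2,3,4]
[2,3]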

By storing our list in a Set, we get O(log n) lookup rather than O(n), which reduces the total complexity to O(n log n).

union_nlogn, intersect_nlogn :: Ord a => [a] -> [a] -> [a]
union_nlogn     = genUnion     S.fromList S.member
intersect_nlogn = genIntersect S.fromList S.member

Since we’re working with Ints, we can store them in an IntSet, which further reduces the lookup time to O(1), resulting in O(n) total time.

union_n, intersect_n :: [Int] -> [Int] -> [Int]
union_n         = genUnion     I.fromList I.member
intersect_n     = genIntersect I.fromList I.member

Timing how long something takes in Haskell is a bit trickier than in most languages since Haskell uses lazy evaluation. If we don’t print the result or otherwise use it, Haskell will happily avoid doing any work and report that doing nothing took 0 ms. Since I’m only interested in the duration here and don’t want to print the result, I instead use the $! operator to evaluate the result to weak head normal form. Note that this only works if the result of the function you’re benchmarking is fully evaluated at that point; an Int, for example, works fine, but if the result is a list, only the outermost constructor will be evaluated, which will once again lead to a time of 0 ms.

time :: a -> IO Integer
time f = do start <- getCPUTime
            _     <- return $! f
            end   <- getCPUTime
            return (div (end - start) (10^9))
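
If you do want to benchmark a function that returns a list, one option (just a sketch, assuming the deepseq package, which this post doesn’t otherwise use) is to deeply force the result, so that the $! in time evaluates it completely:

import Control.DeepSeq (NFData, force)

-- Forcing 'force x' to weak head normal form evaluates x fully;
-- the work happens between time's two clock readings.
timeDeep :: NFData a => a -> IO Integer
timeDeep = time . force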

To benchmark a function we feed it a number of lists that double in size each time and print the results in a row. We use the length function to force evaluation of the list and to return an Int, which $! can evaluate fully.

benchmark :: ([Int] -> [Int] -> [Int]) -> [Int] -> IO ()
benchmark f ns = putStrLn . (printf "%7d" =<<) =<<
                 mapM (\n -> time (length $ f [1..n] [1..n])) ns

All that’s left to do is to first test whether our functions work correctly and then to benchmark them. I cut the O(n^2) version off at 80000 elements because I’m not that patient; adding the other four steps would take almost three hours.

main :: IO ()
main = do let a = [4,7,12,6,17,5,13]
          let b = [7,19,4,11,13,2,15]
          let testUnion f = print $ f a b == [4,7,12,6,17,5,13,19,11,2,15]
          let testIsect f = print $ f a b == [4,7,13]
          testUnion union_n2
          testIsect intersect_n2
          testUnion union_nlogn
          testIsect intersect_nlogn
          testUnion union_n
          testIsect intersect_n

          let ns = iterate (*2) 10000
          benchmark union_n2    $ take 4 ns
          benchmark union_nlogn $ take 8 ns
          benchmark union_n     $ take 8 ns

Here’s the resulting timing table (times in ms). As we can see, the O(n^2) version takes four times as long when the input length doubles, which shows that it is indeed O(n^2). The O(n log n) version grows slightly faster than linear, as expected, and the O(n) version appears to be linear. Looks like the default Haskell collection types perform like they should. The 0 and 15 ms timings at the beginning are a consequence of running this test on Windows, which has a timer resolution of 15 ms.

union_n2        546   2043   8252  32963
union_nlogn      15     15     46     93    187    436    858   1872
union_n           0      0     15     31     62    124    296    592

As we can see, even if we keep the algorithm the same, the choice of data structure can have a huge impact on performance.

Programming Praxis – The Evolution Of Flibs

July 13, 2012

In today’s Programming Praxis exercise, our goal is to implement a genetic algorithm to evolve finite state machines that predict repeating sequences. The solution is one of the longest ones in quite a while, so let’s get started, shall we?

Some imports:

import Control.Monad
import Data.List
import System.Random

First we need something to keep track of the allowable states and inputs. When I wrote it I didn’t quite know how many arguments were going to end up in it, so I made it a datatype rather than a tuple. A difference with the Scheme solution is that the characters used for the states are customisable. It doesn’t serve much purpose, but it allows things to be a bit more generic.

data Args = Args { _symbols :: String, _numSymbols :: Int
                 , _states  :: String, _numStates :: Int }

I started with the function to run a flib since I wanted to know if I understood them correctly. Initially I stored the table in a Map, but after a while I found that keeping them as strings would require less conversion and code when displaying, generating and mutating them.

runFlib :: Args -> (Char, String) -> Char -> ((Char, String), Char)
runFlib (Args smbs nsmbs sts _) (s, m) input = ((s',m), out) where
    (out:s':_) = drop (2 * (nsmbs * index s sts + index input smbs)) m
    index x    = head . elemIndices x
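
For instance, with symbols "01" and states "AB" (a tiny made-up example), a flib in state 'A' reading a '0' looks up the first output/state pair:

ghci> runFlib (Args "01" 2 "AB" 2) ('A', "1B0A0A1B") '0'
(('B',"1B0A0A1B"),'1')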

For the score function we cycle the input to the given length and check how many times the next item is predicted correctly.

score :: Int -> Args -> String -> String -> Int
score run args flib input = length . filter id . zipWith (==) (tail input') .
    snd . mapAccumL (runFlib args) (head $ _states args,flib) $ init input'
    where input' = take (run + 1) $ cycle input

Two generic functions we need later on: oneOf chooses a random element of a given list and replace replaces the element with the given index in a list with the new value.

oneOf :: [a] -> IO a
oneOf xs = fmap (xs !!) $ randomRIO (0, length xs - 1)

replace :: Int -> a -> [a] -> [a]
replace i v xs = take i xs ++ v : drop (i + 1) xs
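
A quick check of replace:

ghci> replace 1 'x' "abc"
"axc"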

To generate a random flib we simply concatenate the appropriate number of inputs and states.

randomFlib :: Args -> IO String
randomFlib (Args smbs nsmbs sts nsts) = fmap concat $
    replicateM (nsmbs * nsts) (sequence [oneOf smbs, oneOf sts])

To breed two flibs we take the beginning and/or end of one flib and insert the middle of the other.

crossover :: Args -> String -> String -> IO String
crossover (Args _ nsmbs _ nsts) a b = do
    start <- randomRIO (0,         2 * nsmbs * nsts - 2)
    end   <- randomRIO (start + 1, 2 * nsmbs * nsts - 1)
    return $ take start a ++ take (end - start) (drop start b) ++ drop end a

To mutate a flib we replace a random character with a new one of the correct type.

mutate :: Args -> String -> IO String
mutate (Args smbs nsmbs sts nsts) flib = do
    i <- randomRIO (0, 2 * nsmbs * nsts - 1)
    c <- oneOf $ if mod i 2 == 0 then smbs else sts
    return $ replace i c flib

Finally, we have the function that does the actual work of testing and evolving the successive generations. First we create a random population of the desired size. Each generation, we calculate all the scores, print the best one if it’s an improvement, potentially breed the best and worst flibs, mutate one random element and repeat the whole process until we have found a flib that can correctly predict the entire sequence.

evolve :: String -> Int -> Float -> Int -> String -> IO ()
evolve states popSize breedChance run input =
    nextGen (0, "") =<< replicateM popSize (randomFlib args) where
    args = Args (map head symbols) (length symbols)
                states  (length . group $ sort states)
                where symbols = group $ sort input
    nextGen (top,_) _ | top == run = return ()
    nextGen best pop = do
        let scored = sort $ map (\flib -> (score run args flib input, flib)) pop
        let top = last scored
        breed <- fmap (< breedChance) $ randomRIO (0, 1)
        mix <- crossover args (snd $ head scored) (snd top)
        let newPop = (if breed then replace 0 mix else id) (map snd scored)
        mutIndex <- randomRIO (0, popSize - 1)
        mutant <- mutate args (newPop !! mutIndex)
        when (fst top > fst best) (print top)
        nextGen (max best top) $ replace mutIndex mutant newPop

A test to see if everything is working properly:

main :: IO ()
main = evolve "ABCD" 10 0.3 100 "010011"

Programming Praxis – Two Bad Sorts

May 17, 2011

In today’s Programming Praxis exercise, our task is to implement two inefficient sorting algorithms. Let’s get started, shall we?

Some imports:

import Control.Arrow
import System.Random
import System.Random.Shuffle

Stoogesort is fairly bad at O(n^2.7).

stoogesort :: Ord a => [a] -> [a]
stoogesort []       = []
stoogesort xs@(h:t) = f $ if last xs < h then last xs : init t ++ [h] else xs
    where f = if length xs > 2 then s first 2 . s second 1 . s first 2 else id
          s p n = uncurry (++) . p stoogesort . splitAt (div (n * length xs) 3)

Bogosort is more interesting. It has the potential to sort a list in O(n). The chance of this happening, however, is pretty low. The resulting average performance is a terrible O(n*n!).

bogosort :: Ord a => [a] -> IO [a]
bogosort [] = return []
bogosort xs = if and $ zipWith (<=) xs (tail xs) then return xs
              else bogosort . shuffle' xs (length xs) =<< newStdGen

Some tests to see if everything is working properly:

main :: IO ()
main = do print . (== [1..5]) =<< bogosort [3,1,5,4,2]
          print $ stoogesort [3,1,5,4,2] == [1..5]

Seems like it is. Having said that, never use either of these in practice.

Programming Praxis – Dijkstra’s Algorithm

January 4, 2011

In today’s Programming Praxis, our task is to implement Dijkstra’s shortest path algorithm. Let’s get started, shall we?

Some imports:

import Data.List
import qualified Data.List.Key as K
import Data.Map ((!), fromList, fromListWith, adjust, keys, Map)

In order to make the rest of the algorithm simpler, we convert the list of edges to a map that lists all the neighbors of each vertex.

buildGraph :: Ord a => [(a, a, Float)] -> Map a [(a, Float)]
buildGraph g = fromListWith (++) $ g >>=
               \(a,b,d) -> [(a,[(b,d)]), (b,[(a,d)])]
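
For example (the order within each neighbor list is an artifact of fromListWith and doesn’t matter):

ghci> buildGraph [('a','b',1), ('b','c',2)]
fromList [('a',[('b',1.0)]),('b',[('c',2.0),('a',1.0)]),('c',[('b',2.0)])]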

The algorithm follows the usual steps, albeit in a functional rather than the typical procedural style: start by giving all non-source vertices an infinite distance, then go through all the vertices in order of their distance from the source, relaxing all their neighbors.

dijkstra :: Ord a => a -> Map a [(a, Float)] -> Map a (Float, Maybe a)
dijkstra source graph =
    f (fromList [(v, (if v == source then 0 else 1/0, Nothing)) 
                | v <- keys graph]) (keys graph) where
    f ds [] = ds
    f ds q  = f (foldr relax ds $ graph ! m) (delete m q) where
              m = K.minimum (fst . (ds !)) q
              relax (e,d) = adjust (min (fst (ds ! m) + d, Just m)) e
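
Applied to the graph g built in main below, it yields the distance from 'a' and the predecessor of each vertex:

ghci> dijkstra 'a' g
fromList [('a',(0.0,Nothing)),('b',(3.0,Just 'a')),('c',(2.0,Just 'a')),('d',(6.0,Just 'a')),('e',(7.0,Just 'c'))]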

Getting the shortest path is then simply a matter of tracing the path from the endpoint to the beginning.

shortestPath :: Ord a => a -> a -> Map a [(a, Float)] -> [a]
shortestPath from to graph = reverse $ f to where
    f x = x : maybe [] f (snd $ dijkstra from graph ! x)

A test to see if everything is working properly:

main :: IO ()
main = do let g = buildGraph [('a','c',2), ('a','d',6), ('b','a',3)
                             ,('b','d',8), ('c','d',7), ('c','e',5)
                             ,('d','e',10)]
          print $ shortestPath 'a' 'e' g == "ace"

As expected, the shortest route for the given graph is A-C-E.

Programming Praxis – Zeller’s Congruence

October 8, 2010

In today’s Programming Praxis exercise, our goal is to implement a function created by Christian Zeller to determine the day of the week for a given date. Let’s get started.

First, we need to import an existing date library to test Zeller’s algorithm against.

import Data.Time.Calendar
import Data.Time.Calendar.WeekDate

Next, a quick data type for the days of the week.

data Weekday = Su | Mo | Tu | We | Th | Fr | Sa deriving (Enum, Eq, Show)

The algorithm itself is virtually identical to the Scheme implementation.

zeller :: Int -> Int -> Int -> Weekday
zeller year month day = toEnum $ mod (day + div (13 * m - 1) 5 +
                        d + div d 4 + div c 4 - 2 * c) 7 where
    y = if month < 3 then year - 1 else year
    m = if month < 3 then month + 10 else month - 2
    (c, d) = divMod y 100
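
For example, for October 8, 2010 we have y = 2010 and m = 8, so (c, d) = (20, 10) and the expression inside the mod works out to 8 + 20 + 10 + 2 + 5 - 40 = 5; since mod 5 7 = 5, toEnum gives Fr.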

To test whether the algorithm works correctly, we write a convenience function that checks whether zeller produces the right day for a given date.

test :: Day -> Bool
test date = zeller (fromEnum y) m d == toEnum (mod w 7) where
    (y, m, d) = toGregorian date
    (_, _, w) = toWeekDate date

Like the given solution, we test whether today is indeed Friday and whether it produces the correct results for a thousand years starting from January 1st, 1753.

main :: IO ()
main = do print $ zeller 2010 10 8 == Fr
          print $ all test [fromGregorian 1753 1 1..fromGregorian 2753 1 1]

Indeed it does. Looks like Zeller’s algorithm works correctly.

Programming Praxis – Integer Logarithms

May 7, 2010

In today’s Programming Praxis exercise we have to write an algorithm to find the integer logarithm of a number, i.e. the largest exponent to which the base can be raised without exceeding the number. Let’s get started.

First, the O(n) solution, which works the same as the Scheme version.

ilog :: Integral a => a -> a -> Integer
ilog b n = if n == 0 then -1 else ilog b (div n b) + 1
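
A couple of quick examples:

ghci> ilog 2 8
3
ghci> ilog 10 12345
4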

For the O(log n) version, we use the until function to determine the bounds rather than using explicit recursion. Other than that, there’s not much to be had in the way of improvements.

ilognew :: (Ord a, Num a) => a -> a -> Integer
ilognew b n = f (div ubound 2) ubound where
    ubound = until (\e -> b ^ e >= n) (* 2) 1
    f lo hi | mid == lo   = if b ^ hi == n then hi else lo
            | b ^ mid < n = f mid hi
            | b ^ mid > n = f lo mid
            | otherwise   = mid
            where mid = div (lo + hi) 2

Like the Scheme solution, we check the equivalence of the two methods by testing a few different bases and the numbers 1 to a million.

main :: IO ()
main = print $ and [ilognew b n == ilog b n
                   | b <- [2,3,5,7], n <- [1..1000000]]

Only a 30% reduction in lines this time compared to the Scheme solution, since most of the code is checking conditions. Oh well, it’ll do.

Programming Praxis – Minimum Spanning Tree: Prim’s Algorithm

April 9, 2010

In today’s Programming Praxis exercise we have to implement an algorithm for finding the minimum spanning tree of a graph. The Scheme solution weighs in at 15 lines, so let’s see if we can do better.

As usual, some imports:

import Data.List
import qualified Data.List.Key as K

Both the Scheme solution and the pseudocode algorithm on Wikipedia take both a list of vertices and a list of edges as input, but since the list of vertices is implicitly defined in the edges there’s really no point in specifying it separately. To get the starting vertex, we just take the first vertex of the first edge. Other than that, we do pretty much what the pseudocode algorithm says: check if there’s an edge with one connected and one unconnected point. If multiple exist, add the shortest. Add the unconnected vertex to the list of connected ones. Stop if there are no more edges with unconnected vertices.

prim :: (Eq a, Ord b) => [(a, a, b)] -> [(a, a, b)]
prim es = f [(\(v,_,_) -> v) $ head es] [] where
    f vs t = if null r then t else f (union vs [x,y]) (m:t) where
        r = filter (\(a,b,_) -> elem a vs /= elem b vs) es
        m@(x,y,_) = K.minimum (\(_,_,c) -> c) r

A quick test shows we get the same result as the Scheme version:

main :: IO ()
main = print $ prim [('A', 'D',  5), ('A', 'B', 7), ('B', 'D', 9)
                    ,('B', 'C',  8), ('C', 'E', 5), ('B', 'E', 7)
                    ,('D', 'E', 15), ('D', 'F', 6), ('E', 'F', 8)
                    ,('F', 'G', 11), ('E', 'G', 9)]

And with that we’ve reduced a 15-line solution to four lines. Not bad.

Programming Praxis – Extending Pollard’s P-1 Factorization Algorithm

March 19, 2010

In today’s Programming Praxis exercise we need to write an improved version of a factorization algorithm. I was on vacation when the original exercise was posted, so let’s see what we can do with it.

As usual, some imports:

import Data.Bits
import Data.List

We need the same expm function we have used in several previous exercises. Alternatively we could use the expmod function from Codec.Encryption.RSA.NumberTheory, but it’s a lot slower than this version.

expm :: Integer -> Integer -> Integer -> Integer
expm b e m = foldl' (\r (b', _) -> mod (r * b') m) 1 .
             filter (flip testBit 0 . snd) .
             zip (iterate (flip mod m . (^ 2)) b) .
             takeWhile (> 0) $ iterate (`shiftR` 1) e
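
A quick check that it does what it should:

ghci> expm 2 10 1000
24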

The Scheme solution has some duplication in it: the start of the pollard1 and pollard2 functions is nearly identical. Since programmers hate repeating themselves, let’s factor that out into a separate function.

pollard :: (Integer -> t) -> (Integer -> t) -> Integer -> Integer -> t
pollard found notFound n b1 = f 2 2 where
    f a i | i < b1         = f (expm a i n) (i + 1)
          | 1 < d && d < n = found d
          | otherwise      = notFound a
          where d = gcd (a - 1) n

pollard1 then becomes very simple: if we don’t find anything we stop, otherwise we return the result.

pollard1 :: Integer -> Integer -> Maybe Integer
pollard1 = pollard Just (const Nothing)

pollard2 is a bit more involved, because we now have an extra step if we don’t find anything. The structure of this is very similar to the pollard function, but there are enough differences that it’s not worth the bother of abstracting it.

pollard2 :: Integer -> Integer -> Integer -> Maybe (String, Integer)
pollard2 n b1 b2 = pollard (Just . (,) "stage1") (f b1) n b1 where
    f j a | j == b2        = Nothing
          | 1 < d && d < n = Just ("stage2", d)
          | otherwise      = f (j + 1) a
          where d = gcd (expm a j n - 1) n

And of course the test to see if everything’s working correctly:

main :: IO ()
main = do print $ pollard1 15770708441 150
          print $ pollard1 15770708441 180
          print $ pollard2 15770708441 150 180

Programming Praxis – Traveling Salesman: Nearest Neighbor

March 16, 2010

In today’s Programming Praxis exercise we have to implement a significantly faster algorithm for the traveling salesman problem than in the previous exercise. Let’s get started, shall we?

As usual, some imports:

import Control.Monad
import Data.List
import qualified Data.List.Key as K
import System.Random

The functions for calculating distance and total distance are largely the same as in the previous exercise, though because I’ve switched to using lists of integers we now need an extra fromIntegral, and repeating the first point in order to complete the loop has been moved to the cost function.

dist :: (Int, Int) -> (Int, Int) -> Float
dist (x1, y1) (x2, y2) = sqrt (f (x1 - x2) ** 2 + f (y1 - y2) ** 2)
    where f = fromIntegral

cost :: [(Int, Int)] -> Float
cost xs = sum $ zipWith dist xs (tail xs ++ xs)
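
Note that zipWith pairs the last point back up with the first, closing the loop. A 3-4-5 triangle makes for an easy check:

ghci> cost [(0,0),(3,0),(3,4)]
12.0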

Generating a series of random points takes a bit more code than it otherwise would, because we have to make sure all the points are unique.

randomPoints :: Int -> IO [(Int, Int)]
randomPoints n = f [] where
    f ps = if length ps == n then return ps else
           do p <- liftM2 (,) rnd rnd
              if elem p ps then f ps else f (p:ps)
    rnd = randomRIO (0, 10 * n)

Determining the tour to take using the nearest neighbor algorithm is not that difficult. Again, we index the points for similarity to the Programming Praxis solution, not because it is needed.

nearTour :: [(Int, Int)] -> [(Integer, (Int, Int))]
nearTour = f . zip [0..] where
    f [] = []
    f [x] = [x]
    f ((i,p):ps) = (i,p) : f (nxt : delete nxt ps) where
        nxt = K.minimum (dist p . snd) ps

To test, we check both a random set of points, as well as the set from the Programming Praxis solution.

main :: IO ()
main = do rndTour <- fmap nearTour $ randomPoints 25
          print (cost $ map snd rndTour, rndTour)
          let praxisTour = nearTour
                [(139, 31),( 41,126),(108, 49),(112,193),(179,188),
                 (212, 24),(245, 50),(167,187),(159,236),(185, 78),
                 ( 27, 63),(101,188),(195,167),( 30, 10),(238,110),
                 (221, 60),( 27,231),(146, 67),(249,172),( 36, 71),
                 ( 37,203),(118, 38),(241,226),(197, 29),(220,186)]
          print (cost $ map snd praxisTour, praxisTour)

We get the same tour as the Programming Praxis solution (well, its reverse, to be exact, but that doesn’t matter, and I think starting with the first point is more logical), and at a third of the line count, so I think we can call this one done.