Archive for November, 2012

Programming Praxis – Amazon Interview Question

November 27, 2012

In today’s Programming Praxis exercise, our goal is to find the 100 points closest to (0, 0) out of a list of 1000000 random points in O(n). Let’s get started, shall we?

Some imports:

import Control.Applicative
import qualified Data.IntMap as I
import System.Random

The obvious solution is to simply sort the list and take the first 100 elements. Sorting, however, usually takes O(n log n), which is not allowed. Fortunately, by using the square of the distance rather than the distance we not only save a million square root operations, but it also makes the value we’re sorting by an integer, which allows us to use an IntMap that has O(1) insertion and thus O(n) sorting.

closest :: Int -> [(Int, Int)] -> [(Int, Int)]
closest n = take n . concat . I.elems . I.fromListWith (flip (++)) .
            map (\(x,y) -> (x*x+y*y, [(x,y)]))

To test, we need a million random points.

points :: IO [(Int, Int)]
points = take 1000000 <$> liftA2 zip (randomRs (-1000, 1000) <$> newStdGen)
                                     (randomRs (-1000, 1000) <$> newStdGen)

Finally we run the algorithm.

main :: IO ()
main = print . closest 100 =<< points

To see whether our algorithm is truly linear, let’s look at some timings:

1 million: 2.9 s
2 million: 5.7 s
4 million: 11.6 s
8 million: 23.8 s

Looks fairly linear to me.

Programming Praxis – List Intersection And Union

November 16, 2012

In today’s Programming Praxis exercise, our goal is to write union and intersection functions for lists in three different time complexities: O(n^2), O(n log n) and O(n). We then need to show that our functions actually have the correct complexity by timing them. Let’s get started, shall we?

Some imports:

import qualified Data.IntSet as I
import qualified Data.Set as S
import System.CPUTime
import Text.Printf

Rather than trying to come up with entirely different algorithms, I’m going to use this opportunity to show the importance of choosing an appropriate datastructure for your algorithms. We do this by using essentially the same algorithm for all three complexities, but each with a different data structure.

The simplest algorithm for union is to take all the elements in the first list plus the ones from the second list that we don’t already have. Taking the intersection of two list can be achieved by taking all the elements from the first list that also appear in the second list. The interesting operation in both algorithms is seeing whether an element is present in a list, since it determines the overall time complexity. On a linked list, this operation is O(n). There are different data structures, however, that offer better performance. The generic algorithms, therefor, need to know how to convert a linked list to the desired data structure and how to find an element in that data structure.

genUnion, genIntersect :: ([a] -> b) -> (a -> b -> Bool) -> [a] -> [a] -> [a]
genUnion load get xs ys = xs ++ filter (not . flip get (load xs)) ys
genIntersect load get xs ys = filter (`get` (load ys)) xs

For the O(n^2)  version, we use a linked list as our datastructure, so there’s no conversion to be done.

union_n2, intersect_n2 :: Eq a => [a] -> [a] -> [a]
union_n2        = genUnion     id elem
intersect_n2    = genIntersect id elem

By storing our list in a Set, we get O(log n) lookup rather than O(n), which reduces the total complexity to O(n log n).

union_nlogn, intersect_nlogn :: Ord a => [a] -> [a] -> [a]
union_nlogn     = genUnion     S.fromList S.member
intersect_nlogn = genIntersect S.fromList S.member

Since we’re working with Ints, we can store them in an IntSet, which further reduces the lookup time to O(1), resulting in O(n) total time.

union_n, intersect_n :: [Int] -> [Int] -> [Int]
union_n         = genUnion     I.fromList I.member
intersect_n     = genIntersect I.fromList I.member

Timing how long something takes in Haskell is a bit trickier than in most languages since Haskell uses lazy evaluation. If we don’t print the result or otherwise use it, Haskell will happily avoid doing any work and report that doing nothing took 0 ms. Since I’m only interested in the duration here I don’t want to print the result, I instead used the $! operator to evaluate the result to head normal form. Note that this only works if the result of the function you’re benchmarking is strict; an Int, for example, works fine, but if the result is a list only the first element will be evaluated, which will once again lead to a time of 0 ms.

time :: a -> IO Integer
time f = do start <- getCPUTime
            _     <- return $! f
            end   <- getCPUTime
            return (div (end - start) (10^9))

To benchmark a function we feed it a number of lists that double in size each time and print the results in a row. We use the length function to force evaluation of the list and return something that’s in head normal form.

benchmark :: ([Int] -> [Int] -> [Int]) -> [Int] -> IO ()
benchmark f ns = putStrLn . (printf "%7d" =<<) =<<
                 mapM (\n -> time (length $ f [1..n] [1..n])) ns

All that’s left to do is to first test whether our functions work correctly and then to benchmark them. I cut the O(n^2) version off at 80000 elements because I’m not that patient; adding the other four steps would take almost three hours.

main :: IO ()
main = do let a = [4,7,12,6,17,5,13]
          let b = [7,19,4,11,13,2,15]
          let testUnion f = print $ f a b == [4,7,12,6,17,5,13,19,11,2,15]
          let testIsect f = print $ f a b == [4,7,13]
          testUnion union_n2
          testIsect intersect_n2
          testUnion union_nlogn
          testIsect intersect_nlogn
          testUnion union_n
          testIsect intersect_n

          let ns = iterate (*2) 10000
          benchmark union_n2    $ take 4 ns
          benchmark union_nlogn $ take 8 ns
          benchmark union_n     $ take 8 ns

Here’s the resulting timing table. As we can see, the O(n^2) version takes four times as long when the input length doubles, which shows that it is indeed O(n^2). The O(n log n) grows a little faster than O(n) as expected, and the O(n) version appears to be linear. Looks like the default Haskell collection types perform like they should. The 0 and 15 ms timings at the beginning are a consequence of running this test on Windows, which has a timer resolution of 15 ms.

    546   2043   8252  32963
     15     15     46     93    187    436    858   1872
      0      0     15     31     62    124    296    592

As we can see, even if we keep the algorithm the same, the choice of data structure can have a huge impact on performance.

Programming Praxis – Taxicab Numbers

November 9, 2012

In today’s Programming Praxis exercise, our goal is to prove that 1729 is the smallest number that can be written as the sum of two cubes. Let’s get started, shall we?

A quick import:

import Text.Printf

To find the pairs of cubes we use an incremental search algorithm that covers al of the unique combinations.

cubesums :: Integer -> [(Integer, Integer)]
cubesums n = f 0 (round $ fromIntegral n ** (1/3)) where
    f x y = if y < x then [] else case compare (x^3 + y^3) n of
                EQ -> (x,y) : f (x + 1) (y - 1)
                LT -> f (x + 1) y
                GT -> f x (y - 1)

Once we have a way of determining the cube sums, generating the taxi cab numbers is trivial. By using a pattern match to get the cube sums we don’t have to specify that there must be two solutions as a separate condition.

taxicab :: [(Integer, (Integer, Integer), (Integer, Integer))]
taxicab = [(n,p,q) | n <- [1..99999], [p,q] <- [cubesums n]]

And finally we pretty-print the result.

main :: IO ()
main = mapM_ (\(n,p,q) -> printf "%d : %s %s\n" n (show p) (show q)) taxicab

Programming Praxis – Fix Something Annoying

November 6, 2012

In today’s Programming Praxis exercise, our goal is to fix something that annoys us in our programming environment. Let’s get started, shall we?

The first annoyance that came to mind for me is one I encountered in the previous exercise. While I love the Parsec library in general, it is not without its little niggles. Specifically, there were two points:

  • The default method of parsing numbers is to use the functions defined on GenTokenParser, which requires two additional import lines and requires an extra parameters after the integer and float parsers.
  • I normally use the operators from Control.Applicative for composing parsers, but they’re right-associative and have an impractical precedence, which results in lots of parentheses.

Today’s exercise was a good excuse to finally sit down and write a small library to fix these issues once and for all. And since I want to be able to easily use it in future projects and exercises, I’ve put it up on Hackage. Note: If you’re reading this shortly after I post it: documentation always takes a while to generate on Hackage, so you might need to wait a bit.

I won’t cover all the functions of the library here (that’s what the documentation is for), so instead we’ll see what the gas mileage program looks like when using this new library:

Some imports:

import Text.Parsec
import Text.Parsec.Utils
import Text.Printf

The showLog function is unchanged from last time.

showLog :: [(Float, Float)] -> [String]
showLog es = "Miles  Gals  Avg" : "------ ---- ----" : zipWith (\(m2,g) (m1,_) ->
    printf "%.0f %.1f %.1f" m2 g ((m2 - m1) / g)) (tail es) es

The parser is nice and simple now.

logParser = many $ (,) .: float -: space +: float -: newline

And the main function is also simpler than it would otherwise have been, since we don’t need to deal with parse errors anymore.

main = mapM_ putStrLn . showLog =<< parseFile logParser "input.txt"

Programming Praxis – Gasoline Mileage Log

November 2, 2012

In today’s Programming Praxis exercise, our goal is to calculate and show the gas mileage given an input file containing the total amount of miles and the amount of fuel bought at each fuel stop. Let’s get started, shall we?

Some imports:

import Text.Printf
import Text.XFormat.Read

Normally I’d use Parsec for something like this (and indeed in my first attempt), but the fact that parsing numbers requires token parsers makes the program less elegant. Instead, I used the XFormat package, which in this case produces cleaner code.

line :: String -> (Float, Float)
line s = (m,g) where Just (m,_,g) = readf (Float, Space, Float) s

Once we’ve read in the number, calculating the mileage is a simple matter of taking the difference between each pair of mile totals and dividing it by the fuel amount. We use printf to format everything.

showLog :: [(Float, Float)] -> [String]
showLog es = "Miles  Gals  Avg" : "------ ---- ----" : zipWith (\(m2,g) (m1,_) ->
    printf "%.0f %.1f %.1f" m2 g ((m2 - m1) / g)) (tail es) es

All that’s left to do is route the input through the functions above and output the result.

main :: IO ()
main = mapM_ putStrLn . showLog . map line . lines =<< readFile "input.txt"