Programming Praxis – Bit Hacks

August 9, 2013

In today’s Programming Praxis exercise, our goal is to write three functions that use bit twiddling, namely one to determine a numbers sign, one to determine of the signs of two numbers are equal and one to take the absolute value of a number without using branching. Let’s get started, shall we?

import Data.Bits
import Data.Composition

To determine whether or not a number is negative we can simply look at the highest bit.

negative :: Int -> Bool
negative n = testBit n (bitSize n - 1)

To check whether two numbers have the same sign we use an xor operation, which will produce a 0 in the highest bit when they are the same and a 1 when they’re not. We then test that bit to produce the result.

sameSign :: Int -> Int -> Bool
sameSign = (not . negative) .: xor

For the absolute function I used the provided algorithm. When testing I thought I’d found a mistake since abs(minBound) was not equal to maxBound. Turns out this is correct behaviour: minBound is equal to -2147483648, whereas maxBound is equal to 2147483647. Note the difference in the last number. Taking the absolute of minBound produces a value that cannot be expressed in 32 bits and thus loops right back around to minBound.

absolute :: Int -> Int
absolute n = xor (n + mask) mask where mask = shiftR n (bitSize n - 1)

Some tests to see if everything is working properly:

main :: IO ()
main = do print $       negative minBound
          print $       negative (-100)
          print $       negative   (-1)
          print $ not $ negative     0
          print $ not $ negative     1
          print $ not $ negative   100
          print $ not $ negative maxBound
          print $       sameSign minBound minBound
          print $       sameSign (-1) (-1)
          print $ not $ sameSign (-1)   1
          print $ not $ sameSign   1  (-1)
          print $       sameSign   1    1
          print $       sameSign maxBound maxBound
          print $ absolute minBound == minBound
          print $ absolute   (-100) ==      100
          print $ absolute       0  ==        0
          print $ absolute     100  ==      100
          print $ absolute maxBound == maxBound

Programming Praxis – J K Rowling

July 19, 2013

In today’s Programming Praxis exercise, our goal is to write a program to analyse whether two books were written by the same author. Let’s get started, shall we?

import Data.Char
import Data.List
import Data.List.Split
import qualified Data.List.Key as K
import qualified Data.Map as M

We record four pieces of information about a book: a list of the words used, the average sentence length, the average paragraph length and the amount of punctuation used.

data Info = Info { _words :: [String], _sentenceLength :: Float,
                   _paraLength :: Float, _puncPct :: Float }

Extracting the four facts of information from the text of a book is fairly self-explanatory.

avg :: (Fractional a, Integral a1) => [a1] -> a
avg xs = fromIntegral (sum xs) / fromIntegral (length xs)

sentenceLength :: String -> Float
sentenceLength = avg . map length . splitOneOf ".!?"

paragraphLength :: String -> Float
paragraphLength = avg . map (length . words . unlines) . splitOn [""] . lines

punctuationPct :: String -> Float
punctuationPct text = fromIntegral (length $ filter isPunctuation text) /
                      fromIntegral (length text) * 100

process :: String -> Info
process text = Info (words . filter (not . isPunctuation) $ map toLower text)
                    (sentenceLength text)
                    (paragraphLength text)
                    (punctuationPct text)

We use the words of a book to determine the top 100 most used ngrams, using the assumption that every writer has certain expressions he or she uses often.

topNgrams :: Int -> [String] -> [[String]]
topNgrams n ws = take 100 . map fst . K.sort (negate . snd) . M.assocs $
                 M.fromListWith (+) . map (flip (,) 1 . take n) $
                 foldr ($) (tails ws) $ replicate n init

To calculate the similarity of two books, we look at a weighted combination of the amount of shared n-grams of lengths 3, 4 and 5 minus the difference in sentence length, paragraph length and punctuation use. The higher the score, the more similar they are.

similarity :: Info -> Info -> Float
similarity (Info wsA slA plA puA) (Info wsB slB plB puB) =
  1 * fromIntegral (length $ intersect (topNgrams 3 wsA) (topNgrams 3 wsB)) +
  2 * fromIntegral (length $ intersect (topNgrams 4 wsA) (topNgrams 4 wsB)) +
  4 * fromIntegral (length $ intersect (topNgrams 5 wsA) (topNgrams 5 wsB)) -
  abs (slA - slB) - abs (plA - plB) - 10 * abs (puA - puB)

To test our algorithm, we compare a few groups of books.

main :: IO ()
main = do hamlet      <- fmap process $ readFile "F:/hamlet.txt"
          romeo       <- fmap process $ readFile "F:/romeo.txt"
          oliver      <- fmap process $ readFile "F:/oliver.txt"
          huckleberry <- fmap process $ readFile "F:/huckleberry.txt"
          twocities   <- fmap process $ readFile "F:/twocities.txt"
          crusoe      <- fmap process $ readFile "F:/crusoe.txt"
          island      <- fmap process $ readFile "F:/island.txt"
          mystery     <- fmap process $ readFile "F:/sawyer.txt"

          print $ similarity romeo hamlet
          print $ similarity romeo huckleberry
          print $ similarity romeo oliver
          putStrLn "---"
          print $ similarity oliver twocities
          print $ similarity oliver romeo
          print $ similarity oliver huckleberry
          putStrLn "---"
          print $ similarity mystery crusoe
          print $ similarity mystery twocities
          print $ similarity mystery island
          print $ similarity mystery huckleberry

The results are as follows:

-2.1613884
-70.873985
-52.829903
---
51.257236
-52.829903
-4.4081688e-2
---
-271.75955
11.982366
4.711507
22.518066

As we can see, Romeo & Juliet is most similar to Hamlet, Oliver Twist is most similar to The Tale of Two Cities and our mystery book is correctly identified as belonging to Mark Twain by virtue of being most similar to Huckleberry Finn.

Programming Praxis – A Programming Puzzle

June 28, 2013

In today’s Programming Praxis exercise, our goal is to write a function such that f (f n) = -n. Let’s get started, shall we?

My first instinct was to define f as a 90 degree rotation using the complex plane, but decided that that was against the spirit of the exercise. The input is plain integers, so the output shouldn’t consist of complex numbers. After that I mucked around for a good while with doubling and halving numbers, which worked on everything except multiples of four. The basic idea (reversing parity and/or signs) was correct, but halving numbers always produced cases where the same condition holds for n and f(n). About an hour and a half after starting on the exercise the solution finally hit me: simply swap each pair of adacent numbers and reverse the sign for the even ones. I replaced the first draft of the correct solution, which had the four conditions written out, with the following more elegant formula.

f :: Integer -> Integer
f n = n * (2 * mod n 2 - 1) + signum n

A test to see if everything is working properly:

main :: IO ()
main = print $ all (\n -> f (f n) == -n) [-1000..1000]

Programming Praxis – 3SUM

June 18, 2013

In today’s Programming Praxis exercise, our goal is to find all groups of three numbers in a given list that sum up to 0, and to do so in O(n2). Let’s get started, shall we?

import Data.List
import qualified Data.IntSet as I

The naive O(n3) version can be modified fairly easily to be more efficient. The first two loops can remain unchanged. In the final loop, we already know the number we’re looking for (the complement of the other two and all we need to know is whether it exists in the list. This can be done in O(1) using an IntSet. Unfortunately, this returns every triple thrice, so we sort the triples (O(1)) and remove the duplicates (I used the rather inefficient nub function here for the sake of brevity; in practive you’ll probably want to use a Set to reduce this part from O(k2) to O(k log k)).

sum3 :: [Int] -> [[Int]]
sum3 xs = nub [sort [a,b,-a-b] | (a:bs) <- tails xs, b <- bs, I.member (-a-b) s]
          where s = I.fromList xs

A test to see if everything is working properly:

main :: IO ()
main = print $ sum3 [8,-25,4,10,-10,-7,2,-3] == [[-10,2,8],[-7,-3,10]]

To check whether the function is indeed O(n2) I ran some timings by using list of consecutive numbers:

Input list
Time taken
1 to 8000 0.4s
1 to 16000 1.4s
1 to 32000 5.2s
1 to 64000 20.6s

As you can see, doubling the input size leads to a quadrupling of execution time, give or take a few tenths of a second, which means the algorithm is indeed O(n2).

Programming Praxis – Longest Substring Of Two Unique Characters

June 11, 2013

In today’s Programming Praxis exercise, our goal is to find the longest substring consisting of only two characters in a string. Let’s get started, shall we?

import Data.List

First, we group identical characters together and then take all the tails so that each tail starts with two unique groups of characters. This is to eliminate the need for special logic for cases where a substring starts with two identical characters. For each tail, we discard everything starting from the third unique letter. Of the remaining groups, we look for the longest one, giving preference to ones on the right.

lstuc :: Eq a => [a] -> [a]
lstuc xs = foldr (\x a -> if length x > length a then x else a) []
           [concat $ a:b:takeWhile (flip elem [head a, head b] . head) cs
           | (a:b:cs) <- tails $ group xs]

Some tests to see if everything is working properly:

main :: IO ()
main = do print $ lstuc "abcabcabcbcbc" == "bcbcbc"
          print $ lstuc "abababcabc"    == "ababab"
          print $ lstuc "abcacacabc"    == "cacaca"
          print $ lstuc "acacbdbd"      == "bdbd"
          print $ lstuc "aaccbdb"       == "aacc"
          print $ lstuc ""              == ""

Programming Praxis – Sets

June 7, 2013

In today’s Programming Praxis exercise, our goal is to implement a Set data structure. Let’s get started, shall we?

import Data.Hashable
import qualified Data.HashTable.IO as H
import Data.List (sort)

The data structure underlying our Set will be a hashtable. This does have the downside that all operations will be monadic, but has the advantage that Set elements do not need to implement Ord. Initially I used the HashTable from Data.HashTable.ST.Basic, but I decided that having everything operate in the IO monad would be more convenient when using it.

data Set a = Set (H.BasicHashTable a ())

new, member, adjoin and delete are thin wrappers around the existing hashtable functions. Since we only care about the keys in the hashtable, we simply insert Unit as values. Additionally, we make adjoin and delete return the modified set to make chaining operations easier.

new :: IO (Set a)
new = fmap Set H.new

member :: (Eq a, Hashable a) => a -> Set a -> IO Bool
member x (Set s) = fmap (maybe False $ const True) $ H.lookup s x

adjoin :: (Eq a, Hashable a) => a -> Set a -> IO (Set a)
adjoin x (Set s) = H.insert s x () >> return (Set s)

delete :: (Eq a, Hashable a) => a -> Set a -> IO (Set a)
delete x (Set s) = H.delete s x >> return (Set s)

fold is a convenience function that reorders the parameters of the existing fold on hashtables and ignores the values, which results in significantly cleaner code in some of the functions below.

fold :: (a -> b -> IO b) -> Set a -> b -> IO b
fold f (Set s) x = H.foldM (\a (k,_) -> f k a) x s

For a union, we simply insert all the keys of both sets in a new one. Thanks to the fold function we can chain everything together nice and neat.

union :: (Eq a, Hashable a) => Set a -> Set a -> IO (Set a)
union s1 s2 = fold adjoin s2 =<< fold adjoin s1 =<< new

Since intersect and minus are virtually identical, I’ve refactored the common code into a combine function.

combine :: (Eq a, Hashable a) => (Bool -> Bool) -> Set a -> Set a -> IO (Set a)
combine cond s1 s2 = fold (\k a -> member k s2 >>= \b ->
    if cond b then adjoin k a else return a) s1 =<< new 

The insersect function takes the elements from the first set that do exist in the other one…

intersect :: (Eq a, Hashable a) => Set a -> Set a -> IO (Set a)
intersect = combine id

and the minus functions takes the ones that don’t.

minus :: (Eq a, Hashable a) => Set a -> Set a -> IO (Set a)
minus = combine not

To convert the hashtable to a list we just cons all the elements together. Note that since the order is determined by the hashing algorithm, the resulting list is not guaranteed to be ordered. Hence you will see calls to sort in the tests when the results are checked.

toList :: Set a -> IO [a]
toList s = fold ((return .) . (:)) s []

We could calculate the size of the set with another fold, but this is shorter, more intuitive and works just as well.

size :: Set a -> IO Int
size = fmap length . toList

Some tests to see if everything is working correctly:

main :: IO ()
main = do s <- adjoin 1 =<< adjoin 2 =<< adjoin 3 =<< new
          t <- adjoin 3 =<< adjoin 4 =<< adjoin 5 =<< new
          print . (== [1..3]) . sort =<< toList s
          print . (== 3) =<< size s
          print . (== [3..5]) . sort =<< toList t
          print . (== 3) =<< size t
          print . (== [3 :: Int]) =<< toList =<< intersect s t
          print . (== [1..5]) . sort =<< toList =<< union s t
          print . (== [1..2]) . sort =<< toList =<< minus s t

Programming Praxis – Egyptian Fractions

June 4, 2013

In today’s Programming Praxis exercise, our goal is to convert a given fraction to a sum of fractions with numerator 1. Let’s get started, shall we?

import Data.Ratio

The implementation is fairly similar to the provided one. The main difference is that the ceiling of the fraction is performed via a div, eliminating potential problems with floating point inaccuracies.

egypt :: Integer -> Integer -> [Integer]
egypt 1 d = [d]
egypt n d = e : egypt (numerator r) (denominator r)
            where (e,r) = (div (d+n-1) n, n%d - 1%e)

Some tests to see if everything is working properly:

main :: IO ()
main = do print $ egypt 5 6 == [2,3]
          print $ egypt 7 15 == [3,8,120]
          print $ egypt 5 121 == [25,757,763309,873960180913
                                 ,1527612795642093418846225]

Programming Praxis – The Seven Bridges of Königsberg

May 31, 2013

In today’s Programming Praxis exercise, our goal is to write a function that determines whether a given graph is a eulerian circuit, path, or neither and if so, to produce that path. Let’s get started, shall we?

import Data.List
import qualified Data.Map as M

A graph that has 1 or more than 2 vertices with an odd amount of neighbours will never be a eulerian path. To determine whether a path is a circuit we simply check if it loops around. If possible, we start at a vertex with an odd amount of neighbours, since this is required for paths and optional for circuits.

check :: Ord a => M.Map a [a] -> Maybe (String, [a])
check graph | notElem (length . filter (odd . length) $ M.elems graph) [0,2] = Nothing
            | head path == last path = Just ("Circuit", path)
            | otherwise              = Just ("Path", path)
    where path  = walk [] graph start
          start = maybe (last $ M.keys graph) id $

To actually walk the graph we use the algorithm provided in the problem description.

walk :: Ord a => [(a, [a])] -> M.Map a [a] -> a -> [a]
walk stack g v = case (g M.! v, stack) of
    (n:_,_)        -> walk ((v, g' M.! v):stack) g' n where
                      g' = M.adjust (delete n) v $ M.adjust (delete v) n g
    ([] ,(s,_):ss) -> v : walk ss g s
    ([] ,[])       -> [v]

Some tests to see if everything is working properly:

main :: IO ()
main = do
    let square   = M.fromList [('A',"BC"), ('B',"AD"), ('C',"AD"), ('D',"BC")]
    let envelope = M.fromList [('A',"BCD"), ('B',"ACD"), ('C',"ABDE"), ('D',"ABCE"), ('E',"CD")]
    let seven    = M.fromList [('A',"BBC"), ('B',"AACDD"), ('C',"ABD"), ('D',"BBC")]
    let five     = M.fromList [('A',"BC"), ('B',"ACD"), ('C',"ABD"), ('D',"BC")]
    let star     = M.fromList [('A',"B"), ('B',"ACD"), ('C',"B"), ('D',"B")]
    print $ check square
    print $ check envelope
    print $ check star == Nothing
    print $ check seven == Nothing
    print $ check five

Programming Praxis – Coin Change, Part 1

May 17, 2013

In today’s Programming Praxis exercise, our goal is to list all of the ways in which a target amount can be reached given a list of coin denominations. Let’s get started, shall we?

import Data.List

We check the first remaining coin to see if it’s not bigger than the remaining target amount. If so, subtract it from the target amount and call the algorithm recursively. If not, delete it from the list of remaining coins and continue. When the remaining amount reaches 0, we have found a valid combination.

coins :: (Num a, Ord a) => [a] -> a -> [[a]]
coins _  0 = [[]]
coins xs n = [c:r | (c:cs) <- tails xs, c <= n, r <- coins (c:cs) (n-c)]

Since the logic for counting the total number of options and generating the options is nigh identical, we simply ask for the length of the resulting list.

count :: (Num a, Ord a) => [a] -> a -> Int
count xs = length . coins xs

Some tests to see if everything is working properly:

main :: IO ()
main = do print $ count [1,5,10,25] 40 == 31
          mapM_ print $ coins [1,5,10,25] 40

Programming Praxis – MindCipher

May 10, 2013

In today’s Programming Praxis exercise, our goal is to solve two exercises from the MindCipher website (technically three, but the third one can be solved without programming). Let’s get started, shall we?

import Control.Monad
import Data.List
import System.Random

The first exercise is to see which pattern, on average, takes longer to come up when flipping a coin: heads, tails, heads or heads, tails, tails. First, we define a function to simulate single series of flips, counting the number of flips before the desired pattern is produced. Heads and tails are represented as booleans for the sake of convenience.

flipUntil :: [Bool] -> IO Int
flipUntil pattern = fmap (length . takeWhile (not . isPrefixOf pattern) .
                    tails . randomRs (False, True)) newStdGen

Next, we simulate an entire day by repeating this process 10000 times and taking the average.

day :: [Bool] -> IO Double
day pattern = fmap (\cs -> fromIntegral (sum cs) / fromIntegral (length cs)) .
              replicateM 10000 $ flipUntil pattern

For the second exercise, we need to find the first year for which the sum of both groups of two digits is equal to the middle two digits. This is easily achieved via a simple brute-force search.

sumDay :: Maybe Integer
sumDay = find (\d -> div d 100 + mod d 100 == div (mod d 1000) 10) [1979..]

Running our two algorithms shows that monday’s pattern takes longer on average and that the first year that satisfies the criteria is 2307.

main :: IO ()
main = do print =<< liftM2 compare (day [False, True, False])
                                   (day [False, True, True])
          print sumDay