Archive for October, 2010

Programming Praxis – Fibonacci Primes

October 29, 2010

Today’s Programming Praxis exercise comes from the same challenge as the longest palindrome exercise from two weeks ago. The goal is to write a function that calculates the sum of the factors of 1 plus the first prime fibonacci number higher than the input number. Let’s get started, shall we?

To generate the fibonacci numbers, we use the well-known one-liner.

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Getting the factors of a number is done via simple trial division. Unlike the Scheme solution, we don’t repeat factors, since the original challenge says not to.

factors :: Integer -> [Integer]
factors = f 2 where
    f d n | n < d        = []
          | mod n d == 0 = d : f (d + 1) (div n d)
          | otherwise    = f (d + 1) n

We could write some fancy algorithm to check primality, but since we already have a function to calculate factors, why bother? Since the only factor of a prime number will be itself, we can just use that as a check. Initally I just used the isPrime function from Data.Numbers.Primes for this, but looking through the source code I realized I could replace it with this version. Obviously this won’t work too well on large numbers, but for our test case it’s fast enough.

isPrime :: Integer -> Bool
isPrime n = factors n == [n]

The top-level function does what it says on the tin: look through the fibonacci numbers to find one that’s greater than n and prime, add one and return the sum of the factors.

greplin :: Integer -> Integer
greplin n = head [sum $ factors (f + 1) | f <- fibs, f > n, isPrime f]

All that’s left is to apply it to the test number:

main :: IO ()
main = print $ greplin 227000

As expected, we get 352 as a result. And, in the spirit of the contest, which is to do these challenges quickly, I think this one took me about 20 minutes.

Programming Praxis – Benford’s Law

October 26, 2010

In today’s Programming Praxis exercise, our task is to see if Benford’s law (lower digits are more likely to be the first digit of numbers found i large collections of data) holds for the littoral areas of lakes in Minnesota. Let’s get started, shall we?

Some imports:

import Data.List
import Text.Printf

The algorithm for calculating the distribution of leading digits given a group of numbers is virtually identical to the Scheme version, despite the fact that I didn’t look at the solution beforehand. It’s not too surprising though, since it’s simply the most obvious method. Note that the function argument is a list of floats. Initially I assumed all areas were integers, which resulted in incorrect results until I found that there were 10 floats hidden in the input (thank god for regular expressions).

firstDigits :: [Float] -> [(Char, Double)]
firstDigits xs = map (\ds -> (head ds, 100 * toEnum (length ds) /
    toEnum (length xs))) . group . sort $ map (head . show) xs

With that function out of the way, the problem becomes trivial: just call firstDigits on all the appropriate numbers.

shriram :: [[String]] -> [(Char, Double)]
shriram xs = firstDigits [n | [(n,_)] <- map (reads . (!! 3)) xs]

Of course we need to run the algorithm over the given data, using the parser from two exercises ago:

main :: IO ()
main = either print (mapM_ (uncurry $ printf "%c %f\n") . shriram)
       =<< readDB csv "csv.txt"

This produces the same list of percentages as the Scheme version. Looks like Benford’s Law holds in this case as well.

Programming Praxis – Text File Databases: Part 2

October 22, 2010

In today’s Programming Praxis exercise, our task is to define functions to map, filter, fold and foreach over records in text file databases, for which we wrote parser’s in the previous exercise.

However, due to the way we wrote the functions last time, there really isn’t much point in doing so. Since the parsers already return a list of records (albeit wrapped in an Either and an IO), you can simply use the map, filter, foldl and mapM_ functions from the Prelude to process them. I suppose that in the Scheme solution it makes a little more sense, since there the parsers only return one record at a time, but even then I’d personally just write a function that returns all the records in a file and then process them like any other list, since it saves you from having to duplicate a lot of existing functions. Additionally, it makes function composition much easier, as the database-specific functions cannot be composed.

Of the four functions mentioned, the only one that warrants a function in Haskell is foreach (or in Haskell terminology, mapM_), since it requires doing something with the potential parse error:

dbMapM_ :: Monad m => (a -> m b) -> Either l [a] -> m ()
dbMapM_ = either (const $ return ()) . mapM_

The other three can just be fmapped over the result of readDB. I won’t bore you with the implementations for map, filter and foldl, since they would be largely identical to the ones found in the Prelude.

main :: IO ()
main = do db <- readDB (fixedLength [5,3,4]) "db_fl.txt"
          print $ map head <$> db
          print $ foldl (const . succ) 0 <$> db
          print $ filter (odd . length) <$> db
          dbMapM_ print db

Programming Praxis – Text File Databases: Part 1

October 19, 2010

In today’s Programming Praxis exercise our goal is to read data from four different types of text file databases. Let’s get started, shall we?

Some imports (the last one is only there to make the type signatures easier to read):

import Control.Applicative ((<*), (<*>), (*>), (<$>))
import Text.Parsec
import Text.Parsec.String

Whenever I need to read any kind of text-based data format, the Parsec library is my go-to tool. First, let’s define what constitutes the end of a line, since we need it in all four types.

eol :: Parser ()
eol = (char '\n' *> optional (char '\r')) <|>
      (char '\r' *> optional (char '\n')) <|> eof

The first type to handle are fixed-length records. All we do is create a parser for each field and concatenate their results. There is currently no special consideration for the header, as I can’t tell from the exercise text what we need to do with it and unfortunately there are no test cases for me to see the expected behaviour.

fixedLength :: [Int] -> Parser [String]
fixedLength fields = foldr (\n p -> (:) <$> count n anyChar <*> p)
                           (return []) fields <* eol

The parser for character-delimited records is fairly self-evident: records consist of fields and stop at the end of a line, fields consist of characters and stop at delimiters or the end of a line. The separator is itself a parser, so there’s plenty of flexibility.

charDelim :: Parser a -> Parser [String]
charDelim sep = manyTill field eol where
    field = manyTill anyChar ((sep *> return ()) <|> lookAhead eol)

Comma separated files aren’t much more difficult. Fields are separated by commas and are either plain text or quoted values.

csv :: Parser [String]
csv = sepBy field (char ',') <* eol where
    field = quoted <|> many (noneOf ",\n\r")
    quoted = between (char '"') (char '"') $
             many (try (char '"' <* char '"') <|> noneOf "\"")

For name-value records, just create a tuple of the name and the value, keep doing so until you find an empty line.

nameValue :: Parser a -> Parser [(String, String)]
nameValue sep = manyTill field eol where
    field = (,) <$> manyTill anyChar sep <*> manyTill anyChar eol

The four parsers above only parse a single record.  To read a file, we just keep reading records until we hit the end of the file.

readDB :: Parser a -> FilePath -> IO (Either ParseError [a])
readDB record = fmap (parse (manyTill record eof) "") . readFile

The lines below show some example usages:

main :: IO ()
main = do print =<< readDB (fixedLength [5,3,4]) "db_fl.txt"
          print =<< readDB (charDelim $ char '|') "db_cd.txt"
          print =<< readDB csv "db_csv.txt"
          print =<< readDB (nameValue $ char ':') "db_nv.txt"

Judging from my own limited test cases, everything seems to be working, and the code is significantly more compact than the provided solution. Yet another example of why I’m a fan of Parsec.

Programming Praxis – Find The Longest Palindrome In A String

October 15, 2010

In today’s Programming Praxis exercise, our goal is to write an alogrithm to find the longest palindrome in a string. Let’s get started, shall we?

Some imports:

import qualified Data.ByteString.Char8 as B
import qualified Data.List.Key as K

Since the exercise is originally part of a group of 3 that is supposed to be completed in 20 minutes to 2 hours, I’m going to assume I don’t have time to figure out a fancy but complicated suffix trie-based approach. Below is the version I wrote in a minute or two, with two modifications:

1. The list comprehensions was originally a filter and a concatMap. Same thing, but different syntax. I like this version better.
2. The original worked on plain strings and ran in 8 seconds or so. Switching to ByteStrings speeds things up quite a bit and is trivial to do, requiring only a few additions of “B.”.

The algorithm is pretty trivial: get every possible substring, check if it’s a palindrome and return the longest one.

longestPalindrome :: B.ByteString -> B.ByteString
longestPalindrome s = K.maximum B.length
    [p | p <- B.inits =<< B.tails s, p == B.reverse p]

We test the algorithm on the Gettysburg Address.

gettysburg :: B.ByteString
gettysburg = B.pack

main :: IO ()
main = B.putStrLn $ longestPalindrome gettysburg

As expected, we get ranynar as the answer. Sure, it’s an O(n3) algorithm, but since this is a fairly short text it doesn’t matter all that much, as evidenced by the half-second running time. If you’re working with longer inputs, use a different algorithm.

Programming Praxis – Rotate An Array

October 12, 2010

In today’s Programming Praxis exercise, our goal is to write a function to rotate an array a selected number of places. Let’s get started, shall we?

While the assignment mentions arrays specifically, we’re going to be using plain lists, for the following reasons:

1. In Haskell lists are a far more common data structure than arrays.
2. The algorithm used in the Scheme solution is O(n) anyway, so it doesn’t matter all that much in terms of performance.
3. This way we can show off an alternative method, which is more fun than just implementing the provided algorithm in another language.

Instead of the triple reversing approach, we use a more straightforward method: what we effectively do when rotating a list is splitting it into a left and a right part and exchanging their places. So let’s just do that, making sure to keep the splitting point within the bounds of the list:

rotate :: Int -> [a] -> [a]
rotate _ [] = []
rotate n xs = b ++ a where (a,b) = splitAt (mod n $ length xs) xs

To see if it works, we test all the possible corner cases mentioned in the assignment, including the empty list, which the provided solution fails on (which is another reminder of why you should always consider all possible edge cases). To be honest, so did mine until I wrote ‘all possible edge cases’ and figured I’d better check if that was actually the case 🙂

main :: IO ()
main = do print $ rotate 1 [] == ([] :: [Int])
          print $ rotate 3 [1..3] == [1..3]
          print $ rotate 3 [1..3] == [1..3]
          print $ rotate 2 [1..6] == [3,4,5,6,1,2]
          print $ rotate 8 [1..6] == rotate 2 [1..6]
          print $ rotate (-2) (rotate 2 [1..6]) == [1..6]

Programming Praxis – Zeller’s Congruence

October 8, 2010

In today’s Programming Praxis exercise, our goal is to implement a function created by Christian Zeller to determine the day of the week for a given date. Let’s get started.

First, we need to import an existing date library to test Zeller’s algorithm against.

import Data.Time.Calendar
import Data.Time.Calendar.WeekDate

Next, a quick data type for the days of week.

data Weekday = Su | Mo | Tu | We | Th | Fr | Sa deriving (Enum, Eq, Show)

The algorithm itself is virtually identical to the Scheme implementation.

zeller :: Int -> Int -> Int -> Weekday
zeller year month day = toEnum $ mod (day + div (13 * m - 1) 5 +
                        d + div d 4 + div c 4 - 2 * c) 7 where
    y = if month < 3 then year - 1 else year
    m = if month < 3 then month + 10 else month - 2
    (c, d) = divMod y 100

To test if the algorithm works correctly, we write a convenience function to test if a given date is correct.

test :: Day -> Bool
test date = zeller (fromEnum y) m d == toEnum (mod w 7) where
    (y, m, d) = toGregorian date
    (_, _, w) = toWeekDate date

Like the given solution, we test whether today is indeed Friday and whether it produces the correct results for a thousand years starting from January 1st, 1753.

main :: IO ()
main = do print $ zeller 2010 10 8 == Fr
          print $ all test [fromGregorian 1753 1 1..fromGregorian 2753 1 1]

Indeed it does. Looks like Zeller’s algorithm works correctly.

Programming Praxis – Oban Numbers

October 1, 2010

In today’s Programming Praxis exercise, our task is to print a list of all Oban numbers (numbers that don’t have an o if you write them in words). Let’s get started, shall we?

Since any number higher than 999 will have either the word “thousand” or one of the -illions in it, we only have to implement spelling numbers less than a thousand. Spelling the numbers is a simple recursive algorithm. After spelling, we just take the ones that don’t have an o.

obans :: [Int]
obans = filter (notElem 'o' . spell) [1..999] where
    spell n | n <  20 = ones !! n
            | n < 100 = tens !! div n 10 ++ spell (mod n 10)
            | True    = spell (div n 100) ++ "hundred" ++ spell (mod n 100)
    ones = "" : words "one two three four five six seven eight \
                      \nine ten eleven twelve thirteen fourteen \
                      \fifteen sixteen seventeen eighteen nineteen"
    tens = "" : "" : words "twenty thirty forty fifty sixty \
                           \seventy eighty ninety"

Printing the numbers is trivial.

main :: IO ()
main = mapM_ print obans

If we instead say print $ length obans we see that there are indeed 454 Oban numbers, as there should be.