Posts Tagged ‘subsets’

Programming Praxis – String Subsets

November 23, 2010

In today’s Programming Praxis exercise,we have to write functions to determine if one string is a subset of another as if we were in an interview. Let’s get started, shall we?

First, some imports.

import Data.List
import qualified Data.Map as M
import qualified Data.IntMap as I

My first attempt doesn’t actually work, since the intersect function only checks whether an element is a member of the second list. It doesn’t keep track of duplicates.

subsetOf1 :: Eq a => [a] -> [a] -> Bool
subsetOf1 xs ys = intersect xs ys == xs

Since a call to a library function won’t suffice, we’ll have to whip up something ourselves. The obvious way is to get a count of all the characters in both strings and check if the second string has an equal or higher count for all the characters in the first string. Of course this method is O(n * m), so it’s not very efficient.

subsetOf2 :: Ord a => [a] -> [a] -> Bool
subsetOf2 xs ys = all (\(c, n) -> maybe False (n <=) .
                       lookup c $ count ys) $ count xs
    where count = map (\x -> (head x, length x)) . group . sort

To improve the big O complexity, we’re going to switch to a data type that has a faster lookup. Like the previous version, counting the frequency of each letter is O(n log n), but using Maps the comparison can now be done in O(m + n), meaning the overall complexity remains at O(n log n).

subsetOf3 :: Ord a => [a] -> [a] -> Bool
subsetOf3 xs ys = M.null $ M.differenceWith
    (\x y -> if x <= y then Nothing else Just x) (f xs) (f ys)
    where f = M.fromListWith (+) . map (flip (,) 1)

Since the keys of the map are characters, there’s a further optimization we can make. By converting the characters to integers we can use an IntMap instead of a plain Map. An IntMap has a lookup of O(min(n,W)), with W being the amount of bits in an Int. For any non-trivial n, this results in O(1) lookup. Counting all the letters can now be done in O(n). Since the comparison still takes O(m + n), the resulting complexity is O(m + n). This is the minimum we can achieve, as we need to fully evaluate both strings to produce an answer, which is an O(m + n) operation.

subsetOf4 :: Enum a => [a] -> [a] -> Bool
subsetOf4 xs ys = I.null $ I.differenceWith
    (\x y -> if x <= y then Nothing else Just x) (f xs) (f ys)
    where f = I.fromListWith (+) . map (flip (,) 1 . fromEnum)

A quick test shows that the first function indeed fails, but the other ones succeed.

main :: IO ()
main = do let test f = print (f "da" "abcd", not $ f "dad" "abcd")
          test subsetOf1
          test subsetOf2
          test subsetOf3
          test subsetOf4

I must say I hope that I have internet access during the interview, though. If not, I would have had to come up with an alternative for differenceWith and I might not have remembered the existence of IntMap. In that case I’d probably have gone with something along the lines of the array-based solution from number 4 of the Scheme solutions.