Programming Praxis – String Subsets

In today’s Programming Praxis exercise,we have to write functions to determine if one string is a subset of another as if we were in an interview. Let’s get started, shall we?

First, some imports.

import Data.List
import qualified Data.Map as M
import qualified Data.IntMap as I

My first attempt doesn’t actually work, since the intersect function only checks whether an element is a member of the second list. It doesn’t keep track of duplicates.

subsetOf1 :: Eq a => [a] -> [a] -> Bool
subsetOf1 xs ys = intersect xs ys == xs

Since a call to a library function won’t suffice, we’ll have to whip up something ourselves. The obvious way is to get a count of all the characters in both strings and check if the second string has an equal or higher count for all the characters in the first string. Of course this method is O(n * m), so it’s not very efficient.

subsetOf2 :: Ord a => [a] -> [a] -> Bool
subsetOf2 xs ys = all (\(c, n) -> maybe False (n <=) .
                       lookup c $ count ys) $ count xs
    where count = map (\x -> (head x, length x)) . group . sort

To improve the big O complexity, we’re going to switch to a data type that has a faster lookup. Like the previous version, counting the frequency of each letter is O(n log n), but using Maps the comparison can now be done in O(m + n), meaning the overall complexity remains at O(n log n).

subsetOf3 :: Ord a => [a] -> [a] -> Bool
subsetOf3 xs ys = M.null $ M.differenceWith
    (\x y -> if x <= y then Nothing else Just x) (f xs) (f ys)
    where f = M.fromListWith (+) . map (flip (,) 1)

Since the keys of the map are characters, there’s a further optimization we can make. By converting the characters to integers we can use an IntMap instead of a plain Map. An IntMap has a lookup of O(min(n,W)), with W being the amount of bits in an Int. For any non-trivial n, this results in O(1) lookup. Counting all the letters can now be done in O(n). Since the comparison still takes O(m + n), the resulting complexity is O(m + n). This is the minimum we can achieve, as we need to fully evaluate both strings to produce an answer, which is an O(m + n) operation.

subsetOf4 :: Enum a => [a] -> [a] -> Bool
subsetOf4 xs ys = I.null $ I.differenceWith
    (\x y -> if x <= y then Nothing else Just x) (f xs) (f ys)
    where f = I.fromListWith (+) . map (flip (,) 1 . fromEnum)

A quick test shows that the first function indeed fails, but the other ones succeed.

main :: IO ()
main = do let test f = print (f "da" "abcd", not $ f "dad" "abcd")
          test subsetOf1
          test subsetOf2
          test subsetOf3
          test subsetOf4

I must say I hope that I have internet access during the interview, though. If not, I would have had to come up with an alternative for differenceWith and I might not have remembered the existence of IntMap. In that case I’d probably have gone with something along the lines of the array-based solution from number 4 of the Scheme solutions.

Tags: , , , , , , , ,

5 Responses to “Programming Praxis – String Subsets”

  1. Grégory LEOCADIE Says:

    Thanks for your functional approach.
    I did it with an imperative style (it seemed so simple) and I did not know where to begin in a functional style. I’ll look deeper in your code and try to understand the pattern (first look, it does not look that hard)

  2. programmingpraxis Says:

    I know Haskell has a multi-set library. Does the library have an isSubset predicate? If so, just convert both strings to multi-sets of characters and check if the second is a subset of the first.

  3. Remco Niemeijer Says:

    Hm. So it does. With that, the algorithm can be reduced to

    import qualified Data.IntMultiSet as I
    subsetOf5 :: Enum a => [a] -> [a] -> Bool
    subsetOf5 xs ys = I.isSubsetOf (toSet xs) (toSet ys) where
        toSet = I.fromList . map fromEnum

    I’ll have to remember that library, it might come in handy some day. Thanks.

  4. Marii Yonov Says:

    This problem reminds me an article in reddit I’ve read 3-4 days ago:

  5. Remco Niemeijer Says:

    Not surprising, since that same link is mentioned in the original exercise 🙂

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: