In today’s Programming Praxis exercise,we have to write functions to determine if one string is a subset of another as if we were in an interview. Let’s get started, shall we?

First, some imports.

import Data.List import qualified Data.Map as M import qualified Data.IntMap as I

My first attempt doesn’t actually work, since the intersect function only checks whether an element is a member of the second list. It doesn’t keep track of duplicates.

subsetOf1 :: Eq a => [a] -> [a] -> Bool subsetOf1 xs ys = intersect xs ys == xs

Since a call to a library function won’t suffice, we’ll have to whip up something ourselves. The obvious way is to get a count of all the characters in both strings and check if the second string has an equal or higher count for all the characters in the first string. Of course this method is O(n * m), so it’s not very efficient.

subsetOf2 :: Ord a => [a] -> [a] -> Bool subsetOf2 xs ys = all (\(c, n) -> maybe False (n <=) . lookup c $ count ys) $ count xs where count = map (\x -> (head x, length x)) . group . sort

To improve the big O complexity, we’re going to switch to a data type that has a faster lookup. Like the previous version, counting the frequency of each letter is O(n log n), but using Maps the comparison can now be done in O(m + n), meaning the overall complexity remains at O(n log n).

subsetOf3 :: Ord a => [a] -> [a] -> Bool subsetOf3 xs ys = M.null $ M.differenceWith (\x y -> if x <= y then Nothing else Just x) (f xs) (f ys) where f = M.fromListWith (+) . map (flip (,) 1)

Since the keys of the map are characters, there’s a further optimization we can make. By converting the characters to integers we can use an IntMap instead of a plain Map. An IntMap has a lookup of O(min(n,W)), with W being the amount of bits in an Int. For any non-trivial n, this results in O(1) lookup. Counting all the letters can now be done in O(n). Since the comparison still takes O(m + n), the resulting complexity is O(m + n). This is the minimum we can achieve, as we need to fully evaluate both strings to produce an answer, which is an O(m + n) operation.

subsetOf4 :: Enum a => [a] -> [a] -> Bool subsetOf4 xs ys = I.null $ I.differenceWith (\x y -> if x <= y then Nothing else Just x) (f xs) (f ys) where f = I.fromListWith (+) . map (flip (,) 1 . fromEnum)

A quick test shows that the first function indeed fails, but the other ones succeed.

main :: IO () main = do let test f = print (f "da" "abcd", not $ f "dad" "abcd") test subsetOf1 test subsetOf2 test subsetOf3 test subsetOf4

I must say I hope that I have internet access during the interview, though. If not, I would have had to come up with an alternative for differenceWith and I might not have remembered the existence of IntMap. In that case I’d probably have gone with something along the lines of the array-based solution from number 4 of the Scheme solutions.

Tags: bonsai, code, Haskell, interview, kata, praxis, programming, string, subsets

November 23, 2010 at 12:25 pm |

Thanks for your functional approach.

I did it with an imperative style (it seemed so simple) and I did not know where to begin in a functional style. I’ll look deeper in your code and try to understand the pattern (first look, it does not look that hard)

November 23, 2010 at 3:33 pm |

I know Haskell has a multi-set library. Does the library have an isSubset predicate? If so, just convert both strings to multi-sets of characters and check if the second is a subset of the first.

November 23, 2010 at 4:27 pm |

Hm. So it does. With that, the algorithm can be reduced to

I’ll have to remember that library, it might come in handy some day. Thanks.

November 23, 2010 at 8:16 pm |

This problem reminds me an article in reddit I’ve read 3-4 days ago: http://paultyma.blogspot.com/2010/11/google-interviewing-story.html

November 23, 2010 at 8:19 pm |

Not surprising, since that same link is mentioned in the original exercise :)