In today’s Programming Praxis problem our task is to write a parser for simple regular expressions. Since Haskell has a very good parser library called Parsec, we’re going to be using that. Let’s get started.
First, some imports:
import Control.Applicative ((<$>), (*>), (<*), (<*>)) import Data.Char import Text.Parsec import Text.Parsec.String
Next we define our data structure. There are seven constructs we have to implement, split into two groups based on whether or not they can be followed by a star or not.
data Elem = Lit Char | Esc Char | Any | Set Bool [Elem] deriving Show data Chunk = Elem Elem | BoL | EoL | Star Elem deriving Show
The parser itself is not too difficult if you know how the operators from Control.Applicative work. <$> means apply the function on the left to the result of the parser on the right. <*, *> and <*> take only the result on the left, right and both sides respectively.
regex :: Parser [Chunk] regex = (++) <$> bol <*> many chunk where bol = option  (const [BoL] <$> char '^') chunk = choice [Star <$> try (element <* char '*'), const EoL <$> try (char '$' <* eof), Elem <$> element] element = choice [esc <$> try (char '\\' *> anyChar), const Any <$> char '.', Set False . expandSet <$> set "[^", Set True . expandSet <$> set "[", Lit <$> noneOf "]"] esc c = if elem c "nt" then Esc c else Lit c set s = try (string s *> many1 element <* char ']') expandSet (Lit a:Lit '-':Lit b:xs) | validRange a b = map Lit [a..b] ++ expandSet xs expandSet (x:xs) = x : expandSet xs expandSet _ =  validRange a b = b > a && ((isLower a && isLower b) || (isUpper a && isUpper b) || (isDigit a && isDigit b))
With the parser written, the function to parse a string is trivial:
parseRegex :: String -> Either ParseError [Chunk] parseRegex = parse regex ""
Some tests to see if everything is working properly:
main :: IO () main = mapM_ print [parseRegex "[0-9][0-9]*", parseRegex "^..*$", parseRegex "hello", parseRegex "^ *hello *$", parseRegex "^[^x].*[0-9] *x$"]
Piece of cake. Next time we do the implementation.