## Posts Tagged ‘parsing’

### Programming Praxis – Regular Expressions, Part 1

September 15, 2009

In today’s Programming Praxis problem our task is to write a parser for simple regular expressions. Since Haskell has a very good parser library called Parsec, we’re going to be using that. Let’s get started.

First, some imports:

```import Control.Applicative ((<\$>), (*>), (<*), (<*>))
import Data.Char
import Text.Parsec
import Text.Parsec.String```

Next we define our data structure. There are seven constructs we have to implement, split into two groups based on whether or not they can be followed by a star or not.

```data Elem = Lit Char | Esc Char | Any | Set Bool [Elem] deriving Show
data Chunk = Elem Elem | BoL | EoL | Star Elem deriving Show```

The parser itself is not too difficult if you know how the operators from Control.Applicative work. <\$> means apply the function on the left to the result of the parser on the right. <*, *> and <*> take only the result on the left, right and both sides respectively.

```regex :: Parser [Chunk]
regex = (++) <\$> bol <*> many chunk where
bol = option [] (const [BoL] <\$> char '^')
chunk = choice [Star <\$> try (element <* char '*'),
const EoL <\$> try (char '\$' <* eof),
Elem <\$> element]
element = choice [esc <\$> try (char '\\' *> anyChar),
const Any <\$> char '.',
Set False . expandSet <\$> set "[^",
Set True . expandSet <\$> set "[",
Lit <\$> noneOf "]"]
esc c = if elem c "nt" then Esc c else Lit c
set s = try (string s *> many1 element <* char ']')
expandSet (Lit a:Lit '-':Lit b:xs)
| validRange a b = map Lit [a..b] ++ expandSet xs
expandSet (x:xs) = x : expandSet xs
expandSet _ = []
validRange a b = b > a && ((isLower a && isLower b) ||
(isUpper a && isUpper b) ||
(isDigit a && isDigit b))```

With the parser written, the function to parse a string is trivial:

```parseRegex :: String -> Either ParseError [Chunk]
parseRegex = parse regex ""```

Some tests to see if everything is working properly:

```main :: IO ()
main = mapM_ print [parseRegex "[0-9][0-9]*",
parseRegex "^..*\$",
parseRegex "hello",
parseRegex "^ *hello *\$",
parseRegex "^[^x].*[0-9] *x\$"]```

Piece of cake. Next time we do the implementation.