Parsing...
I’ve been working lately on a parser for Markdown to let me do some more manipulations of web content in our blog generation. It’s not going overly well. I went the other day to the Markdown source code to see how it parses, and much to my chagrin, it doesn’t. The canonical Markdown implementation is a sequence of regular expression substitutions, with hashing used to protect already-processed text from subsequent substitutions.
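To give a feel for what that style looks like, here is a minimal Python sketch of the substitute-and-hash idea. It is not the real Markdown code (which is Perl); the function name `render_inline`, the three rules, and the use of SHA-256 for the placeholder are all my own assumptions for illustration.

```python
import hashlib
import re


def render_inline(text):
    """Hypothetical toy converter: regex passes plus hash placeholders."""
    stash = {}  # placeholder hash -> finished HTML fragment

    def protect(html):
        # Swap finished HTML for a hex digest so later passes cannot touch it.
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        stash[key] = html
        return key

    # Pass 1: code spans. Their contents must survive the passes below.
    text = re.sub(
        r"`([^`]+)`",
        lambda m: protect("<code>" + m.group(1) + "</code>"),
        text,
    )
    # Pass 2: strong emphasis, before single asterisks are considered.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    # Pass 3: plain emphasis.
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)

    # Final step: put the protected fragments back.
    for key, html in stash.items():
        text = text.replace(key, html)
    return text


print(render_inline("use `a*b*c` for *emphasis*"))
```

Without the placeholder step, the emphasis pass would happily rewrite the asterisks inside the code span. Note that at no point does this build any kind of tree, which is exactly what makes it useless as a starting point for manipulating the document.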
This is not helpful. Not helpful at all. And it’s proving somewhat difficult to write a parser (granted, my brain hasn’t been working on it as hard as it could).
Part of me is thinking that I would be better off switching to reStructuredText, which is designed to be parsed and is specified as a tree. But reST, while being more flexible than Markdown, isn’t quite as pretty to read (IMO, of course). It would also require old posts to be either converted or run through compatibility mechanisms, or else require all new posts to be marked as reST posts. None of these options is overly appealing.
There’s also Textile, and a few others, but I don’t like those so much either.
I should probably just dig in and finish parsing Markdown (or a stricter variant thereof).