elehack.net

Parsing...

I’ve been working lately on a parser for Markdown to let me do some more manipulations of web content in our blog generation. It’s not going overly well. I went the other day to the Markdown source code to see how it parses, and much to my chagrin, it doesn’t. The canonical Markdown implementation is a sequence of regular expression substitutions, with hashing used to protect already-processed text from subsequent substitutions.
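For anyone who hasn't seen the substitute-and-hash trick, here is a minimal Python sketch of the general idea. It is not the actual Markdown code: the function name `render_inline`, the two toy rules (code spans and emphasis), and the use of MD5 digests as placeholder tokens are all assumptions made for illustration, and it skips HTML escaping entirely.

```python
import hashlib
import re


def render_inline(text):
    """Toy substitution-based renderer (hypothetical, not the real Markdown rules)."""
    protected = {}  # token -> finished HTML fragment

    def protect(html):
        # Swap a finished fragment for an opaque token so that later
        # regex passes cannot match anything inside it.
        token = hashlib.md5(html.encode("utf-8")).hexdigest()
        protected[token] = html
        return token

    # Pass 1: code spans. Their contents must survive later passes verbatim.
    text = re.sub(
        r"`([^`]+)`",
        lambda m: protect("<code>" + m.group(1) + "</code>"),
        text,
    )

    # Pass 2: emphasis, applied to whatever is still plain text.
    text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)

    # Final pass: put the protected fragments back.
    for token, html in protected.items():
        text = text.replace(token, html)
    return text


print(render_inline("use `a*b*c` for *emphasis*"))
```

Without the `protect` step, the emphasis pass would happily rewrite the asterisks inside the code span. Note that nothing here ever builds a tree of the document: the output order of the rules is the only structure there is, which is exactly what makes this style hard to reuse for other manipulations.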

This is not helpful. Not helpful at all. And it’s proving somewhat difficult to write a parser (granted, my brain hasn’t been working on it as hard as it could).

Part of me is thinking that I would be better off switching to reStructuredText, which is designed to be parsed and specified as a tree. But reST, while being more flexible than Markdown, isn’t quite as pretty to read (IMO, of course). It would also require old posts to be either converted or run through compatibility mechanisms. Or require all new posts to be marked as reST posts. Neither of these options is overly appealing.

There’s also Textile, and a few others, but I don’t like those so much either.

I should probably just dig in and finish parsing Markdown (or a stricter variant thereof).
