My ongoing work in making the site less gross under the hood is getting to a point where it needs to be able to handle bbcode. While I could adapt an existing library to do so, it would also be nice to be able to use the same code both client-side (for instance, to preview posts in your browser) and server-side (to render them for viewing). In addition, traditional bbcode parsers are extremely regex-heavy. The one currently powering the site (based on phpBB 2), for instance, makes one pass per tag that it understands. It also modifies the text you give it before storing it, which makes any other handling of that text more difficult.

So what have I done? I implemented a Rust library that can be used just about anywhere:

Not all of the tags that I want to eventually support are implemented right now, but the remaining ones should be quite easy.

To support a variety of uses (for instance, counting the words in a post as well as converting it to HTML), the core parser implements an API similar to SAX-style XML parsers: it emits start and end events for each tag, and text events wherever there is text. It makes a single pass through the text and doesn't backtrack, so it should be very fast. A disadvantage of this approach as implemented is that it may be difficult to adapt to bbcode dialects other than the one we use here, but for Cemetech-adjacent uses that's not a concern. Since bbcode isn't the most self-consistent language, some tags have special behavior coded in: especially lists, but also images and some kinds of links.

The one major departure from the status quo is in how unclosed tags are handled. To avoid any need for unbounded backtracking, the parser assumes that every valid open tag also has a valid close tag somewhere; if it reaches a point where a tag should have been closed but was not, it synthesizes a close tag. For instance, the markup:
[u][b]Hello, you[/u]
would be displayed by a traditional parser as ［b］Hello, you, underlined, with the unmatched bold tag left as literal text (illustrated with fullwidth brackets U+FF3B and U+FF3D to avoid ambiguity). This one instead renders Hello, you both bold and underlined, because the valid open bold tag must be closed when its parent underline is closed.
(Further supporting that choice: trying to illustrate the current behavior here exposes something very strange, where the first unmatched tag is closed by a much later one, so the bold span runs much longer than expected.)
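
The close-synthesis rule only needs a stack of currently open tags, so the decision can be made at each close tag without looking backward. The sketch below is hypothetical: the `Event`, `balance` and `to_html` names are illustrative, and the post doesn't say whether the real parser does this as a separate pass or inline. Two behaviors here are assumptions not described above: a close tag with nothing open to match is kept as literal text, and tags still open at the end of the post are closed there. It also assumes tag names map directly to HTML element names and does no HTML escaping.

```rust
#[derive(Debug, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Repair an event stream in one pass so every Start gets a matching End.
fn balance(events: Vec<Event>) -> Vec<Event> {
    let mut open: Vec<String> = Vec::new(); // stack of currently open tags
    let mut out = Vec::new();
    for ev in events {
        match ev {
            Event::Start(name) => {
                open.push(name.clone());
                out.push(Event::Start(name));
            }
            Event::End(name) => {
                if let Some(pos) = open.iter().rposition(|t| *t == name) {
                    // Close everything opened inside `name`, innermost first,
                    // synthesizing the close tags the author left out.
                    while open.len() > pos {
                        let t = open.pop().unwrap();
                        out.push(Event::End(t));
                    }
                } else {
                    // Assumption: a close tag with nothing to close stays literal.
                    out.push(Event::Text(format!("[/{}]", name)));
                }
            }
            text => out.push(text),
        }
    }
    // Assumption: anything still open at the end of the post is closed here.
    while let Some(t) = open.pop() {
        out.push(Event::End(t));
    }
    out
}

/// Naive renderer: tag names become HTML elements, text is not escaped.
fn to_html(events: &[Event]) -> String {
    let mut html = String::new();
    for ev in events {
        match ev {
            Event::Start(t) => html.push_str(&format!("<{}>", t)),
            Event::End(t) => html.push_str(&format!("</{}>", t)),
            Event::Text(s) => html.push_str(s),
        }
    }
    html
}

fn main() {
    // Events for the example markup: [u][b]Hello, you[/u]
    let events = vec![
        Event::Start("u".to_string()),
        Event::Start("b".to_string()),
        Event::Text("Hello, you".to_string()),
        Event::End("u".to_string()),
    ];
    println!("{}", to_html(&balance(events)));
}
```

Run on the example above, it prints `<u><b>Hello, you</b></u>`: the [/u] first closes the still-open bold tag, then the underline.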

Because it's implemented in Rust, it's easy to embed as a library. It currently has Python bindings (which I expect to use on the server) and a version that can be built to WebAssembly/JavaScript, which can easily be used in a browser.
This is pretty nice. Is there a compelling reason to keep using bbcode though, as opposed to something like reStructuredText? I guess I'm too far removed from site development to understand why rolling a custom parser makes sense.

Also your Rust code is amazing lol, I hope one day to be that good :P
At least for existing posts, we can't really translate them to some other markup system; we need to keep them as bbcode. I do want to support other kinds of markup on the site in the future (Markdown primarily, but I see no reason reST couldn't also be supported), but that doesn't free us from the need to be able to handle bbcode.
