parser and me sitting on a tree ...

24 April, 2025

I always thought parsers were pretty complicated, but I think I was underestimating them even then.

Rust recently turned 10 and the Rust tool I've been using (and loving) is the Zed code editor. I figured the lang deserves a gift.

When starting work on the better-comments extension for Zed, my task seemed way too simple. Given a comment (a String), flag it if it matches a known pattern such as // * Information!. Simple regex, right? Well, the VS Code extension actually does just that. At its core anyway; the rest is just JavaScript bread.

Well, it's not fair to reduce the JS logic to "bread", but I'll dive more into its details in the next post, when I explain how I actually did the syntax highlighting.
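To make the "simple regex" idea concrete, here is a minimal sketch in Python. It is only an illustration: the tag table, the labels, and the classify_comment name are my own assumptions, not the VS Code extension's actual code, and it only handles // line comments.

```python
import re

# Hypothetical tag table: marker character -> label. Illustrative only.
TAGS = {
    "*": "info",
    "!": "alert",
    "?": "question",
}

# Matches a `//` line comment whose first non-space character is a known marker.
PATTERN = re.compile(
    r"^\s*//\s*(?P<tag>[" + re.escape("".join(TAGS)) + r"])\s*(?P<text>.*)$"
)


def classify_comment(comment: str):
    """Return (label, text) if the comment starts with a known marker, else None."""
    match = PATTERN.match(comment)
    if match is None:
        return None
    return TAGS[match.group("tag")], match.group("text")


print(classify_comment("// * Information!"))          # ('info', 'Information!')
print(classify_comment("// just a regular comment"))  # None
```

Adding a new marker only means adding an entry to the table, which is roughly why the regex approach looks so tempting at first.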

Coming back to the matching logic, the VS Code extension works as a wrapper around the native editor, which means its logic for detecting and flagging a comment is independent of any language server. That's not the case in Zed.

The absolute pain of syntax highlighting in Zed.

Zed’s syntax-highlighting engine is not exposed to extensions. Everything that ends up with colour in the editor still flows through the built-in pipeline:

Tree-sitter grammar --> language-specific *.scm queries --> theme --> paint

The zed_extension_api that you can call from lib.rs has no function that feeds new highlight spans back into that pipeline; it only offers utilities for LSP, processes, downloads, etc. (sources: Docs.rs, Zed)

This design, robust as it is, has hurt the extension ecosystem time and time again, leading, for example, to the lack of LSP Semantic Tokens support, for which there is an open issue.

If and when this issue is closed, perhaps I could just feed cross-language comment markers back through LSP, but until then I'll have to sit on the tree.

This would mean writing a dedicated Lisp-like query file (*.scm) for each language that I intend to support (yes, that's right).

Injection?

My second idea was to use injections. Consider this Python snippet which executes a SQL query:

cursor = conn.cursor()
cursor.execute("SELECT * FROM table")
print(f"The first object is {cursor.fetchone()}")

Now the interesting thing here is that the syntax highlighter identifies "SELECT * FROM table" as a SQL command: it's not Python code, and it's not an ordinary string like the text in the print() call. The highlighter is therefore able to distinguish between Python and non-Python code. Carrying this analogy forward, my plan was to create a new language and then have the highlighter identify it inside comments.
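The injection idea can be pictured as a two-layer pass. What follows is a toy model in plain Python, not how Tree-sitter or Zed implement injections: the host "grammar" is just a naive regex that finds # comments (it would be fooled by a # inside a string literal), and every name in it is hypothetical.

```python
import re

# Toy "host grammar": finds the character range of each `#` line comment.
# Assumption: no `#` characters appear inside string literals.
HOST_COMMENT = re.compile(r"#(?P<body>[^\n]*)")

# Toy "injected grammar": only understands the text inside a comment.
INJECTED_TAG = re.compile(r"\s*(?P<tag>TODO|FIXME)\b")


def find_injections(source: str):
    """Yield (start, end, tag) for each comment body the injected grammar recognises."""
    for comment in HOST_COMMENT.finditer(source):
        # The host layer only decides *where* the embedded language lives...
        start, end = comment.span("body")
        # ...and the injected layer decides what that region means.
        tag = INJECTED_TAG.match(source, start, end)
        if tag:
            yield start, end, tag.group("tag")


source = "x = 1  # TODO: rename x\ny = 2  # plain note\n"
for start, end, tag in find_injections(source):
    print(tag, repr(source[start:end]))  # TODO ' TODO: rename x'
```

The point of the split is that the host layer never needs to know what TODO or FIXME mean; it just hands over the region.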

Benefits of the “Better-Comments-Lang” approach

| Benefit | Why it helps |
| --- | --- |
| Isolated logic | All the pattern matching (“TODO”, “FIXME”, “*”, etc.) lives in one grammar instead of being copy-pasted into every host’s highlights.scm. |
| Theme flexibility | Because the injected grammar can emit any of Tree-sitter’s standard capture names, or custom ones that your theme recognises, you’re not limited to the small set of comment highlight groups most themes provide. |
| One-time maintenance | When you add a new keyword or colour rule you only touch the Better-Comments grammar, not every host language. |

Practical constraints to keep in mind

- Per-language injection files are still required. Tree-sitter queries can’t be registered globally, so each host extension that owns the comments must bundle an injections.scm rule (or you publish a fork/PR of that extension).
- No “user-configured” injections yet. Zed’s extension API doesn’t expose dynamic, runtime injections; only the static files shipped in an extension are read.
- Performance overhead is negligible but real. A second parser runs over every comment, so massive codebases with very large block comments might see a small extra cost; in practice, real-world editors that do this (Pulsar, Neovim, etc.) report no visible slowdown (source: pulsar-edit.dev).
- Theme support required. If you invent new capture names (e.g. @comment.todo, @comment.warning), they have to be mapped in every theme you want to colour correctly; otherwise Zed will fall back to the generic comment scope.
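The theme fallback in the last constraint can be pictured as a lookup that trims dotted suffixes until the theme has a match. This is a sketch of the general idea only, assuming a theme is a plain mapping from capture name to colour; the resolve_style helper and the colour values are made up, and Zed's real matching logic may differ in detail.

```python
def resolve_style(capture: str, theme: dict):
    """Find the most specific style for a capture like 'comment.todo'.

    Tries the full name first, then drops trailing dotted segments
    ('comment.todo' -> 'comment') until the theme has an entry.
    """
    parts = capture.split(".")
    while parts:
        name = ".".join(parts)
        if name in theme:
            return theme[name]
        parts.pop()
    return None


# Hypothetical theme that only knows the generic comment scope.
plain_theme = {"comment": "#6a737d"}
# Hypothetical theme that also maps the custom capture.
aware_theme = {"comment": "#6a737d", "comment.todo": "#ff8c00"}

print(resolve_style("comment.todo", plain_theme))  # #6a737d (generic comment)
print(resolve_style("comment.todo", aware_theme))  # #ff8c00 (custom entry)
```

Under this model a custom capture never breaks a theme that doesn't know it; it just looks like any other comment.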

#match? made in heaven

When plan B fails, go back to plan A. Regex, but better.

Gratitude

Naturally I owe a big thanks to the wonderful dev team at Zed. But I also want to thank the Neovim community - they've done so much work (and had so much fruitful discussion) on Tree-sitter, which really helped me avoid a lot of potential pitfalls.