Markdown implementation

Zulip uses a special flavor of Markdown/CommonMark for its message formatting. Our flavor differs from the standard primarily to add important extensions, such as quote blocks and math blocks, and also to render previews and correct issues specific to the chat context. Beyond that, it has a number of minor variations resulting from its history: it predates CommonMark (so Zulip chose different solutions to some problems) and is based in part on Python-Markdown, which is proudly a classic Markdown implementation. We reduce these variations with every major Zulip release.

Zulip has two implementations of Markdown. The backend implementation at zerver/lib/markdown/ is based on Python-Markdown and is used to authoritatively render messages to HTML (it also implements slow, expensive, or complex features like querying the Twitter API to render tweets nicely). The frontend implementation is in JavaScript, based on marked.js (static/js/echo.js), and is used to preview and locally echo messages the moment the sender hits Enter, without waiting for a round trip to the server. Those frontend renderings are only shown to the sender of a message, and they are (ideally) identical to the backend rendering.

The JavaScript Markdown implementation has a function, markdown.contains_backend_only_syntax, that is used to check whether a message contains any syntax that needs to be rendered to HTML on the backend. If markdown.contains_backend_only_syntax returns true, the frontend simply won't echo the message for the sender until it receives the rendered HTML from the backend. If there is a bug where markdown.contains_backend_only_syntax returns false incorrectly, the frontend will discover this when the backend returns the newly sent message, and will update the HTML based on the authoritative backend rendering (which would cause a change in the rendering that is visible only to the sender shortly after a message is sent). As a result, we try to make sure that markdown.contains_backend_only_syntax is always correct.
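The decision flow described above can be sketched as follows. This is only an illustration in Python, not the actual frontend code (which is JavaScript): the pattern list, local_echo, and reconcile are hypothetical names, and treating tweet links as backend-only is an assumption based on the Twitter example mentioned earlier.

```python
import re

# Hypothetical patterns standing in for "backend-only" syntax. The real
# check lives in the JavaScript frontend; its actual rules are not shown here.
BACKEND_ONLY_PATTERNS = [
    re.compile(r"https?://twitter\.com/\S+"),  # assumed: tweet previews need the backend
]


def contains_backend_only_syntax(content):
    """Return True if any (assumed) backend-only pattern appears in the message."""
    return any(pattern.search(content) for pattern in BACKEND_ONLY_PATTERNS)


def local_echo(content, render_locally):
    """Return locally rendered HTML, or None when the sender must wait for the backend."""
    if contains_backend_only_syntax(content):
        return None
    return render_locally(content)


def reconcile(echoed_html, backend_html):
    """The backend rendering always wins; also report whether the echo was wrong."""
    return backend_html, echoed_html != backend_html
```

If the check wrongly returns false, reconcile reports a mismatch, which corresponds to the visible change in rendering that the sender would briefly see.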

Testing

The Python-Markdown implementation is tested by zerver/tests/test_markdown.py, and the marked.js implementation and markdown.contains_backend_only_syntax are tested by frontend_tests/node_tests/markdown.js.

A shared set of fixed test data ("test fixtures") is present in zerver/tests/fixtures/markdown_test_cases.json, and is automatically used by both test suites; as a result, it is the preferred place to add new tests for Zulip's Markdown system. Some important notes on reading this file:

If you're going to manually test some changes in the frontend Markdown implementation, the easiest way to do this is as follows:

  1. Log in to your development server.
  2. Stop your Zulip server with Ctrl-C, leaving the browser open.
  3. Compose and send the messages you'd like to test. They will be locally echoed using the frontend rendering.

This procedure prevents any server-side rendering. If you don't do this, the backend will likely render the Markdown you're testing and swap it in before you can see the frontend's rendering.

If you are working on a feature that breaks multiple test cases and want to debug them one by one, you can add "ignore": true to any test cases in markdown_test_cases.json that you want to skip. This is a workaround for JSON's lack of comment support. You can then run the frontend tests with tools/test-js-with-node markdown and the backend tests with tools/test-backend zerver.tests.test_markdown.MarkdownTest.test_markdown_fixtures. Revert your "ignore" changes before committing.
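As a rough illustration of how an "ignore" flag can be honored by a test runner, here is a minimal sketch. The fixture layout (a top-level "regular_tests" list with "name", "input", and "expected_output" fields) and the helper active_test_cases are assumptions made for this example; only the "ignore": true key comes from the text above.

```python
import json

# Inline sample standing in for markdown_test_cases.json. The layout and
# field names are assumed for illustration, apart from "ignore".
SAMPLE_FIXTURES = """
{
  "regular_tests": [
    {"name": "bold", "input": "**hi**",
     "expected_output": "<p><strong>hi</strong></p>"},
    {"name": "work_in_progress", "input": "...",
     "expected_output": "...", "ignore": true}
  ]
}
"""


def active_test_cases(raw_json, key="regular_tests"):
    """Return the test cases that are not marked with "ignore": true."""
    data = json.loads(raw_json)
    return [case for case in data[key] if not case.get("ignore", False)]


names = [case["name"] for case in active_test_cases(SAMPLE_FIXTURES)]
print(names)  # prints ['bold']
```

Because the flag is ordinary JSON data rather than a comment, it is easy to leave behind by accident, which is why it should be reverted before committing.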

Changing Zulip's Markdown processor

First, you will likely find these third-party resources helpful:

When changing Zulip's Markdown syntax, you need to update several places:

Important considerations for any changes are:

Per-realm features

Zulip's Markdown processor supports a number of features that depend on realm-specific or user-specific data. For example, the realm could have linkifiers or custom emoji configured, and Zulip supports mentions for streams, users, and user groups (which depend on data like users' names, IDs, etc.).

At a backend code level, these are controlled by the message_realm object and other arguments passed into do_convert (sent_by_bot, translate_emoticons, mention_data, etc.). Because Python-Markdown doesn't support directly passing arguments into the Markdown processor, our logic attaches these data to the Markdown processor object via e.g. _md_engine.zulip_db_data, and then individual Markdown rules can access the data from there.

For non-message contexts (e.g. an organization's profile, which is shown on the right-hand side of the login page; stream descriptions; or custom profile fields), one only needs to pass in a message_realm (see, for example, zulip_default_context for the organization profile code). But for messages, we also need to pass in attributes like sent_by_bot and translate_emoticons, which describe how the user sending the message is configured.
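The pattern of attaching per-realm data to the processor object, rather than passing it as an argument, can be sketched like this. Everything here is a simplified stand-in: Engine, DbData, mention_rule, and do_convert_sketch are hypothetical names, the HTML produced is not Zulip's real mention markup, and only the zulip_db_data attribute name and the sent_by_bot / translate_emoticons flags come from the text above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DbData:
    # Hypothetical container for the realm- and sender-specific inputs.
    realm_name: str
    sent_by_bot: bool = False
    translate_emoticons: bool = False
    user_ids_by_name: dict = field(default_factory=dict)


class Engine:
    """Stand-in for a Markdown processor whose convert() accepts only text."""

    def __init__(self):
        self.zulip_db_data: Optional[DbData] = None

    def convert(self, text):
        return mention_rule(self, text)


def mention_rule(engine, text):
    """An individual rule reads its data from the engine, not from arguments."""
    data = engine.zulip_db_data
    if data is None:
        return text  # no realm data attached: leave mentions untouched
    for name, user_id in data.user_ids_by_name.items():
        text = text.replace(
            f"@**{name}**",
            f'<span data-user-id="{user_id}">@{name}</span>',
        )
    return text


def do_convert_sketch(engine, text, db_data):
    """Attach the data, render, then detach so it cannot leak into the next call."""
    engine.zulip_db_data = db_data
    try:
        return engine.convert(text)
    finally:
        engine.zulip_db_data = None
```

In this sketch, a non-message context would attach a DbData carrying only the realm, while a message would also set flags such as sent_by_bot and translate_emoticons.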

Zulip's Markdown philosophy

Note that this discussion is based on a comparison with the original Markdown, not newer Markdown variants like CommonMark.

Markdown is great for group chat for the same reason it's been successful in products ranging from blogs to wikis to bug trackers: it's close enough to how people try to express themselves when writing plain text (e.g. emails) that it helps more than it gets in the way.

The main issue with using Markdown in instant messaging is that the standard Markdown syntax used in a lot of wikis and blogs has nontrivial error rates, where the author needs to go back and edit the post to fix the formatting after typing it the first time. While that's basically fine when writing a blog, it gets annoying very fast in a chat product; even though you can edit messages to fix formatting mistakes, you don't want to be doing that often. There are basically two types of error rates that are important for a product like Zulip:

Both of these are minor issues for most products using Markdown, but they are major problems in the instant messaging context, because one can't edit a message that has already been sent before others read it and users are generally writing quickly. Zulip's Markdown strategy is based on the principles of giving users the power they need to express complicated ideas in a chat context while minimizing those two error rates.

Zulip's changes to Markdown

Below, we document the changes that Zulip makes relative to stock Python-Markdown; some of the features we modify or disable may already be non-standard.

Note: This section has not been updated in a few years and is not accurate.

Basic syntax

Lists

Code

Other