Sending messages

While sending a message in a chat product might seem simple, there's a lot of underlying complexity required to make a professional-quality experience.

This document aims to explain conceptually what happens when a message is sent in Zulip, and why that is correct behavior. It assumes the reader is familiar with our real-time sync system for server-to-client communication and new application feature tutorial, and we generally don't repeat the content discussed there.

Message lists

This is just a bit of terminology: A "message list" is what Zulip calls the frontend concept of a (potentially narrowed) message feed. There are 3 related structures:

* A message_list_data just has the sequencing data of which message IDs go in what order.
* A message_list is built on top of message_list_data and additionally contains the data for a visible-to-the-user message list (e.g. where trailing bookends should appear, a selected message, etc.).
* A message_list_view is built on top of message_list and additionally contains rendering details like a window of up to 400 messages that is present in the DOM at the time, scroll position controls, etc.
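The layering of these three structures can be illustrated with a minimal sketch. It is written in Python for brevity and is not Zulip's frontend code: the class names, fields, and the windowing rule (centering the window on the selected message) are assumptions made for illustration. Only the 400-message window size comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional

RENDER_WINDOW = 400  # from the text: up to 400 messages are in the DOM at once


@dataclass
class MessageListData:
    """Sequencing only: which message IDs appear, and in what order."""
    message_ids: list[int] = field(default_factory=list)

    def add(self, message_id: int) -> None:
        if message_id not in self.message_ids:
            self.message_ids.append(message_id)
            self.message_ids.sort()


@dataclass
class MessageList:
    """Adds user-visible state on top of the data, such as a selected message."""
    data: MessageListData
    selected_id: Optional[int] = None

    def select(self, message_id: int) -> None:
        if message_id not in self.data.message_ids:
            raise ValueError("message is not in this list")
        self.selected_id = message_id


@dataclass
class MessageListView:
    """Adds rendering details: only a window of the list is rendered."""
    message_list: MessageList

    def rendered_ids(self) -> list[int]:
        ids = self.message_list.data.message_ids
        if self.message_list.selected_id is None:
            # Nothing selected: render the most recent messages.
            return ids[-RENDER_WINDOW:]
        # Assumed rule: center the window on the selected message.
        center = ids.index(self.message_list.selected_id)
        start = max(0, center - RENDER_WINDOW // 2)
        return ids[start:start + RENDER_WINDOW]
```

Each layer holds a reference to the one below it, so the same sequencing data can be reasoned about without any selection or rendering state.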

(This should later be expanded into a full article on message lists and narrowing).

Compose area

The compose box does a lot of fancy things that are out of scope for this article, but it also does a decent amount of client-side validation before sending a message off to the server, especially around mentions (e.g. checking that the stream name is a valid stream, or displaying a warning about the number of recipients before a user can use @**all** or mention a user who is not subscribed to the current stream).

Backend implementation

The backend flow for sending messages is similar in many ways to the process described in our new application feature tutorial. This section details the ways in which it is different:

Local echo

An essential feature for a good chat experience is local echo (i.e. having the message appear in the feed the moment the user hits send, before the network round trip to the server). This is essential both for freeing up the compose box (so the user can send more messages) and for making the experience feel snappy.

A sloppy local echo experience (like Google Chat had for over a decade for emoji) would just render the raw text the user entered in the browser, and then replace it with data from the server when it changes.

Zulip aims for a near-perfect local echo experience, which is why our Markdown system requires both an authoritative (backend) Markdown implementation and a secondary (frontend) Markdown implementation, the latter used only for the local echo feature. Read our Markdown documentation for all the tricky details on how that works and is tested.

The rest of this section details how Zulip manages locally echoed messages.

Local echo in message editing

Zulip also supports local echo in the message editing code path for edits to just the content of a message. The approach is analogous (using markdown.contains_backend_only_syntax, etc.), except we don't need any of the local_id tracking logic, because the message already has a permanent message ID; as a result, the whole implementation was under 150 lines of code.

Putting it all together

This section just has a brief review of the sequence of steps all in one place:

* User hits send in the compose box.
* Compose box validation runs; if it passes, the browser locally echoes the message and then sends a request to the POST /messages API endpoint.
* The Django URL routes and middleware run, and eventually call the send_message_backend view function in zerver/views/messages.py. (Alternatively, for an API request to send a message via Zulip's REST API, things start here).
* send_message_backend does some validation before triggering the check_message + do_send_messages backend flow.
* That backend flow saves the data to the database and triggers a message event in the notify_tornado queue (part of the events system).
* The events system processes that event and dispatches it to all clients subscribed to receive notifications for users who should receive the message (including the sender). As a side effect, it adds queue items to the email and push notification queues (which, in turn, may trigger those notifications).
* Other clients receive the event and display the new message.
* For the client that sent the message, it instead replaces its locally echoed message with the final message it received back from the server (it indicates this to the sender by adding a display timestamp to the message).
* The send_message_backend view function returns a 200 HTTP response; the client receives that response and mostly does nothing with it other than update some logging details. (This may happen before or after the client receives the event notifying it about the new message via its event queue.)
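The client-side half of this sequence can be modeled with a small sketch. It is written in Python for brevity and is not Zulip's frontend code: the class, the event field names, and the format of the temporary ID are all hypothetical. It illustrates one point from the steps above: the sending client reconciles its echoed message when the message event arrives, so the order in which the HTTP response and the event arrive does not matter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    content: str
    local_id: Optional[str] = None   # temporary ID assigned by the sending client
    id: Optional[int] = None         # permanent ID assigned by the server
    timestamp: Optional[int] = None  # displayed only after the server confirms


class SendingClient:
    """Hypothetical model of a client that locally echoes what it sends."""

    def __init__(self) -> None:
        self.feed: list[Message] = []
        self._pending: dict[str, Message] = {}
        self._next_local = 0

    def send(self, content: str) -> str:
        """Echo the message into the feed immediately; return its local_id."""
        self._next_local += 1
        local_id = f"local-{self._next_local}"
        message = Message(content=content, local_id=local_id)
        self.feed.append(message)
        self._pending[local_id] = message
        return local_id

    def on_message_event(self, event: dict) -> None:
        """Handle a message event delivered through the event queue."""
        echoed = self._pending.pop(event.get("local_id"), None)
        if echoed is None:
            # Not one of our pending echoes: display it as a new message.
            self.feed.append(
                Message(event["content"], id=event["id"], timestamp=event["timestamp"])
            )
            return
        # Our own message: replace the echo in place with the final data.
        echoed.content = event["content"]
        echoed.id = event["id"]
        echoed.timestamp = event["timestamp"]

    def on_http_response(self, status: int) -> None:
        """The 200 response changes nothing in the feed in this model."""
        return None
```

Because the feed is only updated by the event, calling on_http_response before or after on_message_event leaves the feed in the same final state.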

Message editing

Message editing uses a very similar principle to how sending messages works. A few details are worth mentioning:

Inline URL previews

Zulip's inline URL previews feature (zerver/lib/url_preview/) uses a variant of the message editing/local echo behavior. The reason is that for inline URL previews, the backend needs to fetch the content from the target URL, and for slow websites, this could result in a significant delay in rendering the message and delivering it to other users.

Soft deactivation

This section details a somewhat subtle issue: How Zulip uses a user-invisible technique called "soft deactivation" to handle scalability to communities with many thousands of inactive users.

For background, Zulip’s threading model requires tracking which individual messages each user has received and read (in other chat products, the system either doesn’t track what the user has read at all, or just needs to store a pointer for “how far the user has read” in each room, channel, or stream).

We track these data in the backend in the UserMessage table, storing rows (message_id, user_id, flags), where flags is 32 bits of space for boolean data like whether the user has read or starred the message. All the key queries needed for accessing message history, full-text search, and other key features can be done efficiently with the database indexes on this table (with joins to the Message table containing the actual message content where required).
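To make the flags column concrete, here is a minimal sketch of packing boolean state into a 32-bit integer. The bit positions and helper names are assumptions chosen for illustration and should not be read as Zulip's actual flag assignments.

```python
from enum import IntFlag


class Flags(IntFlag):
    # Hypothetical bit positions, for illustration only.
    READ = 1 << 0
    STARRED = 1 << 1


MASK_32 = 0xFFFFFFFF  # the text describes flags as 32 bits of space


def set_flag(flags: int, flag: Flags) -> int:
    return (flags | int(flag)) & MASK_32


def clear_flag(flags: int, flag: Flags) -> int:
    return flags & ~int(flag) & MASK_32


def has_flag(flags: int, flag: Flags) -> bool:
    return bool(flags & int(flag))


# One row of the table, modeled as a (message_id, user_id, flags) tuple.
row = (1001, 42, 0)
row = (row[0], row[1], set_flag(row[2], Flags.STARRED))
```

Storing each boolean as one bit keeps every row at three integers, no matter how many per-user, per-message booleans are tracked.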

The downside of this design is that when a new message is sent to a stream with N recipients, we need to write N rows to the UserMessage table to record those users receiving that message. Each row is just 3 integers in size, but even with modern databases and SSDs, writing thousands of rows to a database starts to take a few seconds.

This isn’t a problem for most Zulip servers, but is a major problem for communities like chat.zulip.org, where there might be tens of thousands of inactive users who only stopped by briefly to check out the product or ask a single question, but are subscribed to whatever the default streams in the organization are.

The total amount of work being done here was acceptable (a few seconds of total CPU work per message to large public streams), but the latency was unacceptable: The server backend was introducing a latency of about 1 second per 2000 users subscribed to receive the message. While these delays may not be immediately obvious to the sender (Zulip, like many other chat applications, locally echoes a message as soon as the user hits “Send”), latency beyond a second or two significantly impacts the feeling of interactivity in a chat experience (i.e. it feels like everyone takes a long time to reply to even simple questions).

A key insight for addressing this problem is that there isn’t much of a use case for long chat discussions among 1000s of users who are all continuously online and actively participating. Streams with a very large number of active users are likely to only be used for occasional announcements, where some latency before everyone sees the message is fine. Even in giant organizations, almost all messages are sent to smaller streams with dozens or hundreds of active users, representing some organizational unit within the community or company.

However, large, active streams are common in open source projects, standards bodies, professional development groups, and other large communities with the rough structure of the Zulip development community. These communities usually have thousands of user accounts subscribed to all the default streams, even if they only have dozens or hundreds of those users active in any given month. Many of the other accounts may be from people who signed up just to check the community out, or who signed up to ask a few questions and may never be seen again.

The key technical insight is that if we can make the latency scale with the number of users who actually participate in the community, not the total size of the community, then our database-write-limited send latency of 1 second per 2000 users is totally fine. But we need to do this in a way that doesn’t create problems if any of the thousands of “inactive” users come back (or one of the active users sends a private message to one of the inactive users), since it’s impossible for the software to know which users are eventually coming back or will eventually be interacted with by an existing user.

We solved this problem with a solution we call “soft deactivation”; users that are soft-deactivated consume fewer resources from Zulip in a way that is designed to be invisible both to other users and to the users themselves. If a user hasn’t logged into a given Zulip organization for a few weeks, they are tagged as soft-deactivated.

The way this works internally is:

The end result is the best of both worlds:

Empirically, we've found this technique completely resolved the "send latency" scaling problem. The latency of sending a message to a stream now scales only with the number of active subscribers, so one can send a message to a stream with 5K subscribers of which 500 are active, and it’ll arrive in the couple hundred milliseconds one would expect if the extra 4500 inactive subscribers didn’t exist.
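The arithmetic behind that example can be checked with a short sketch. It assumes latency is strictly linear in the number of rows written, at the rate of about 1 second per 2000 users quoted earlier in this section; the function name is hypothetical.

```python
USERS_PER_SECOND = 2000  # from the text: about 1 second per 2000 recipients


def estimated_send_latency(rows_written: int) -> float:
    """Estimated seconds spent writing one row per recipient (linear model)."""
    return rows_written / USERS_PER_SECOND


all_subscribers = estimated_send_latency(5000)  # rows for every subscriber
active_only = estimated_send_latency(500)       # rows for active subscribers only
```

Under this model, writing rows for all 5000 subscribers costs 2.5 seconds, while writing rows for only the 500 active subscribers costs 0.25 seconds.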

There are a few details that require special care with this system:

* Email and mobile push notifications. We need to make sure these are still correctly delivered to soft-deactivated users; making this work required careful work for those code paths that assumed a UserMessage row would always exist for a message that triggers a notification to a given user.
* Digest emails, which use the UserMessage table extensively to determine what has happened in streams the user can see. We can use the user's subscriptions to construct what messages they should have access to for this feature.
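The overall behavior described in this section can be modeled with a small in-memory sketch. It is an illustration inferred from the description above, not Zulip's implementation: the class and method names are hypothetical, and the choice to backfill rows from the stream's message history when a user returns is an assumption. As a simplification, it backfills every message in the stream, ignoring when the user subscribed.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    subscribers: set[int]
    message_ids: list[int] = field(default_factory=list)


@dataclass
class Server:
    stream: Stream
    soft_deactivated: set[int] = field(default_factory=set)
    # Each entry stands in for a UserMessage row: (message_id, user_id).
    user_messages: set[tuple[int, int]] = field(default_factory=set)
    next_message_id: int = 1

    def send_to_stream(self) -> int:
        """Record a new stream message; return how many rows were written."""
        message_id = self.next_message_id
        self.next_message_id += 1
        self.stream.message_ids.append(message_id)
        # Rows are written only for subscribers who are not soft-deactivated.
        recipients = self.stream.subscribers - self.soft_deactivated
        self.user_messages |= {(message_id, user_id) for user_id in recipients}
        return len(recipients)

    def reactivate(self, user_id: int) -> int:
        """Backfill rows for a returning user; return how many were added."""
        self.soft_deactivated.discard(user_id)
        if user_id not in self.stream.subscribers:
            return 0
        wanted = {(message_id, user_id) for message_id in self.stream.message_ids}
        missing = wanted - self.user_messages
        self.user_messages |= missing
        return len(missing)
```

In this model, the cost of sending depends only on the number of active subscribers, and the deferred work is paid once, per user, if and when that user comes back.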