Message Processing vs. Stream Processing
Picture this: you’re trying to move your monolith application into the cloud—or maybe you're all in on cloud development—and you’re rewriting your app from scratch. Everyone is telling you to design it so that it’s “loosely coupled.” What does that mean, and how would you go about doing it?
A loosely coupled architecture means, among other things, that your services don't directly communicate with each other. Using a message broker between two services allows them to be run independently, or decoupled from one another. Messages coming through the broker are processed in one of two ways:
as individual messages
as a stream
This article will cover the differences between the two processing methods. You’ll also see how the content of messages and events differs. Finally, you’ll see concrete examples of message processing and stream processing to see how Apache Pulsar can do both.
Use Cases for Message and Stream Processing
So what exactly does it mean to process messages as a stream? How is it different from processing messages from a queue? To illustrate the difference between the two, I’ll reference a stock price service.
Message Processing
An application that displays the most recent price for various stocks would be a good use case for message processing. Your service would receive the most recent quote as a message; each message can be thought of as a command to update the stock price. You can process the command by saving the sale price into a database, then delete the message.
Each message contains all the information required to process it. This is called stateless processing since it doesn’t require the context of any previous messages. Any message can be sent to any compute resource for processing.
If your service is only tracking ten stocks, you could get by with one server. But if you wanted to watch every stock on the exchange you’d need to add a lot more. That’s the power of message processing—you can scale horizontally to handle an increased load.
Stream Processing
Continuing the example, if you wanted to calculate a moving average of those same stock prices, you would use stateful processing. You’d need to use stateful processing because a moving average is based on the content and order of previous values.
In situations where you need the context of previous events, you should definitely be looking into stream processing.
Data Content
One way to remember the best uses for message processing vs. stream processing is to think about the intent of the content you need to send.
Events record something that happened at a specific point in time: a stock sale, a rain gauge reading, or how fast a car was moving. They should make it obvious what happened with no mention of what actions to take. They will be processed in a stateful manner by other services, which will decide for themselves what to do with the information in the context of previous messages.
A message is a command to do something immediately: update the stock sale price, set the current weather to rain, or report that a driver is speeding. When reading the message, it should be obvious what should happen when the message is received. The message should also contain all of the data required to process it since it will be processed in a stateless manner.
Data Transmission
A message queue is a common feature of messaging systems. Messages published to a queue stay there until they have been read by all of the intended recipients. This means that messages will stay in the queue until you have the compute resources available to process them.
If you want to speed up processing, you need to increase the number of consumers of the queue. Using horizontal scaling like this also means that you should only be using queues for message processing.
Pub-sub services provide additional capabilities and features beyond what a simple queue can offer. The most obvious advantage is that messages are sent out to subscribers instead of requiring resources to constantly poll a queue.
Another advantage of pub-sub is that you can have multiple subscribers to a single topic. You can send one message from your service to a pub-sub, then any service that needs to act on that event can subscribe to the topic. Services can be added on later, to act upon the events that you’re sending to the pub-sub topic.
How Pulsar Supports Both
There’s a lot of software available to implement your message or stream-based architecture. What happens if you need to mix and match? Apache Pulsar is a pub-sub service that allows you to run both messaging and stream processing, depending on how you subscribe to the service.
When you create subscriptions in Pulsar, you specify which subscription to use and what resource should receive messages from the subscription. The type determines whether you'll be using messaging, streaming, or a hybrid processing model.
Message Processing
You can run message processing on Pulsar by using a “shared” subscription type. The shared subscription type works by cycling through all of the connected consumers, passing one message to the consumer, and moving to the next one.
Stream Processing
There are two subscription types that you can use to implement stream processing in Pulsar.
An “exclusive” subscription. With this subscription type, you’re only allowed to attach a single consumer to the subscription. All messages that are sent to the topic will be sent to this single consumer, in order.
A “failover”. If you need a little more resiliency, you can use the other type of stream subscription. Failover will only pass messages to a single consumer at a time.
The difference between the two is that failover allows you to add backup consumers. When the primary consumer stops acknowledging messages, the subscription will pick one of the backup consumers to take over and will start sending all messages to that one.
Distributed Stream Processing
Pulsar has implemented a third, hybrid type of processing. It enables distributed processing, as with messages, but allows you to make sure certain messages always go to the same consumer so you can run stream processing on them. This hybrid subscription type is called Key_Shared
.
The Key_Shared
subscription allows you to attach multiple consumers. When a message comes in, the subscription checks to see if a key is defined in the message. If the key is new, then the subscription selects a consumer and sends the message. When new messages come in with the same key, they will all be sent to the same consumer.
Conclusion
In this article, you’ve seen that messages send commands, and events send the current state. Messages can be received in any order, but events should be handled in a stream so they can be processed in the context of previous messages.
You’ve seen the difference between message queues and pub-sub message brokers. You’ve also gotten a brief introduction to Apache Pulsar, which allows you to run message and stream processing off of a topic.