Maybe you’re working on an ETL job that needs massively parallel processing. Or maybe some aspects of your application need to be decoupled into asynchronous micro-services. RabbitMQ is a feature-packed, open-source AMQP broker that can help with these problems and many more. Written in Erlang (though you don’t need to know Erlang to use it) and built on TCP, it’s sure to meet the performance needs of your application. If you need proof, read about how Pivotal and Google collaborated to achieve one million messages per second using RabbitMQ.
While it’s pretty easy to get started with RabbitMQ, there are many details about how the broker is put together that I’ve learned over the years, and knowing them from the beginning would have saved me significant time and headaches. I’d like to convey those in this article, which assumes a basic knowledge of RabbitMQ such as creating connections, channels, exchanges, and queues, and publishing/subscribing.
- Separate Projects and Environments Using Vhosts
In the same way that you might create different databases within a single MySQL host for different purposes, RabbitMQ has a construct called ‘virtual hosts’ or ‘vhosts’ that allows you to bundle your channels, exchanges, and queues based on their purpose (or by project) without setting up multiple hosts. You could even duplicate your messaging fabric for different environments (production, development, staging, etc.).
You can create vhosts from the management interface, the API, or the rabbitmqctl command line tool.
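For example, with rabbitmqctl (the vhost and user names below are placeholders):

```
# One vhost per environment for the same project
rabbitmqctl add_vhost my_project_production
rabbitmqctl add_vhost my_project_staging

# Grant a user configure/write/read permissions on everything in a vhost
rabbitmqctl set_permissions -p my_project_production my_user ".*" ".*" ".*"

# Confirm
rabbitmqctl list_vhosts
```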
- Try a Framework
When I first started working with RabbitMQ, I was writing everything with the basic Ruby RabbitMQ client library called bunny. That worked fine, but I soon realized I was repeating work that had already been done in some of the fantastic, open-source messaging frameworks out there, such as Sneakers for Ruby (to which yours truly is a contributor) and Celery for Python. Check out this video about how Instagram uses Celery and RabbitMQ for high availability of their feed.
- Acknowledge Your Acknowledgment
Pay close attention to where in your consumer logic you’re acknowledging messages. For example, you may have some exception handling that catches errors in your worker and sends the error message to some queue or error log. You’d want to make sure the consumer returns to the main logic after the error handling and returns an acknowledgment to the broker; otherwise, the messages will be stuck in the queue without any sign that something is wrong. This seems obvious, but it’s a really easy thing to miss. Also, this is a good reason to use dead letter exchanges.
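Here is a sketch of that pattern with the Python pika client (the processing logic is hypothetical); note that the acknowledgment sits after the try/except, so it is reached on both the success path and the error path:

```python
import json

def process(body):
    # Hypothetical business logic; raises on a malformed payload.
    return json.loads(body)

def on_message(channel, method, properties, body):
    try:
        process(body)
    except Exception as exc:
        # Route the failure somewhere visible: an error queue, a log, etc.
        print("worker error:", exc)
    # Control returns here after the error handling, so the broker gets an
    # acknowledgment either way and the message is never stuck in the queue.
    channel.basic_ack(delivery_tag=method.delivery_tag)
```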
- How Not to Lose Messages
If you publish a message to a queue that hasn’t been declared (or to an exchange with a routing key that doesn’t match any existing binding), your message will go straight to /dev/null. The broker won’t return an error as you might expect; the message simply disappears. You can check whether a queue exists before publishing by declaring it with passive=true. You may also want to architect accordingly by having your producers declare the queues before publishing to them, or by writing a “fabric builder” task that sets up all the exchanges, queues, and bindings up front.
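A minimal sketch of the passive check with pika (the broad except here stands in for pika’s ChannelClosedByBroker; note the failed declare closes the channel, so run the check on a throwaway channel):

```python
def queue_exists(channel, name):
    # passive=True means "check, don't create": the broker raises a
    # NOT_FOUND (404) channel error if the queue is missing, and that
    # error closes the channel, so use a disposable channel for the check.
    try:
        channel.queue_declare(queue=name, passive=True)
        return True
    except Exception:
        return False
```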
- How to Really Not Lose Messages!
If the cost of losing messages is really too high to bear, you can have the broker confirm receipt of each message by putting the channel into confirm mode using confirm.select. In this mode the broker will return an ack for each message it takes responsibility for and a nack in the event of a failure. A confirmation handler can be built into the publisher to act accordingly (e.g., republishing failed messages). In this mode the broker assigns auto-incrementing sequence numbers to each message (per channel), so the publishing client can use a counter to reconcile each message.
Another option is to use AMQP transactions but this will decrease the throughput by a factor of 250 (according to the RabbitMQ docs).
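Here is one way a confirmation handler might look with pika, whose confirm_delivery() wraps confirm.select. In confirm mode pika’s BlockingChannel raises an exception (UnroutableError or NackError) instead of returning quietly, which this illustrative helper translates into a boolean so the caller can republish:

```python
def publish_confirmed(channel, exchange, routing_key, body):
    # Assumes channel.confirm_delivery() was called once on this channel.
    # A failed publish then surfaces as an exception (pika's UnroutableError
    # or NackError); we return False so the caller can republish or log it.
    try:
        channel.basic_publish(exchange=exchange, routing_key=routing_key,
                              body=body, mandatory=True)
        return True
    except Exception:
        return False
```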
- Surviving Broker Crash or Reboot (How to Really, Really Not Lose Messages!)
RabbitMQ allows for clustering and load balancing for availability, but I won’t go deeply into sysadmin issues here. However, if you have mission-critical messages that can’t be lost in the event of a broker reboot, make sure you do the following:
- Only publish to exchanges or queues that are declared as ‘durable’, and make sure the messages are marked ‘persistent’ by setting delivery-mode = 2. These messages will be written to disk, which will result in a performance loss (possibly as much as 60%).
- If you’re working with a cluster, the queues you declare are not mirrored across all nodes by default. Instead, each node holds pointers (containing basic metadata such as queue name, durability, etc.) to the queues on neighboring nodes. If a node containing a durable queue goes down, you can’t simply re-declare that queue on the next available node: the other nodes still hold a pointer to the durable queue on the original node and will reject the declaration as redundant. To withstand failures you must load balance an active node with a standby node (HAProxy has a ‘backup’ mode for this purpose), which could be on a separate host containing a durable queue of the same name. Once you restore the original node, the messages can be shoveled (see plugins) over from the backup node.
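Both halves of the durability advice, a durable queue plus persistent messages, look like this in pika (the queue name is an example, and the sketch assumes a broker running on localhost):

```python
PERSISTENT = 2  # AMQP delivery-mode: 2 = persistent, 1 = transient

def main():
    import pika  # pip install pika
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    # 1. The queue definition itself must survive a restart: durable=True.
    channel.queue_declare(queue="critical_jobs", durable=True)
    # 2. Each message must also be marked persistent: delivery-mode = 2.
    channel.basic_publish(
        exchange="",
        routing_key="critical_jobs",
        body=b"do not lose me",
        properties=pika.BasicProperties(delivery_mode=PERSISTENT),
    )
    connection.close()

if __name__ == "__main__":
    main()
```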
- RabbitMQ as a Micro-Services Protocol
Instead of using an HTTP API you can use RabbitMQ to connect services within your infrastructure. Essentially your client makes a request by publishing a message to the broker with a reply queue set in the ‘reply-to’ message header. The client will be simultaneously listening to that queue for a response (it can even do so on the same channel to which it’s publishing). The service consumes from the appropriate queue, processes the request and then publishes the message to the ‘reply-to’ queue specified in the message. The client can also set a correlationId that it can use to reconcile the response messages with the requests. See the RPC tutorial for more details.
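The client half of that request/reply pattern might be sketched like this with pika (the queue names are placeholders, and it assumes the client is already consuming replies from reply_queue on the same channel):

```python
import uuid

def new_correlation_id():
    # One unique id per request lets the client reconcile each reply
    # with the request that produced it.
    return str(uuid.uuid4())

def send_request(channel, service_queue, reply_queue, body):
    import pika  # pip install pika
    corr_id = new_correlation_id()
    channel.basic_publish(
        exchange="",
        routing_key=service_queue,
        body=body,
        properties=pika.BasicProperties(
            reply_to=reply_queue,    # where the service should answer
            correlation_id=corr_id,  # how the client matches the answer
        ),
    )
    return corr_id  # keep this; replies carrying it belong to this request
```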
- Shutdown a Consumer at the End of a Process
Do you want a consumer to stop listening at the end of some process? You can send a message containing a “quit” signal in the message body or headers. When the consumer finds this signal, it can send basic_cancel to the broker and unsubscribe.
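A sketch of that shutdown convention with pika (the b"quit" body is an application-level convention assumed here, not an AMQP feature):

```python
def is_quit_signal(body):
    # Application-level convention: a body of b"quit" means shut down.
    return body == b"quit"

def on_message(channel, method, properties, body):
    channel.basic_ack(delivery_tag=method.delivery_tag)
    if is_quit_signal(body):
        # basic_cancel unsubscribes this consumer from the queue.
        channel.basic_cancel(method.consumer_tag)
        return
    # ... normal message handling here ...
```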
- Flow Control
If your publishers are sending messages at a pace that your consumers can’t keep up with, you can implement flow control by setting a prefetch limit with basic.qos: the broker stops delivering new messages to a consumer once that many messages are outstanding and unacknowledged. (The broker will also automatically block over-eager publishers when its memory or disk alarms fire.)
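With pika this is a single basic.qos call per consuming channel (the prefetch value of 10 is just an example to tune):

```python
def enable_flow_control(channel, prefetch=10):
    # Cap the number of unacknowledged messages the broker will push to
    # this consumer; delivery pauses until some are acked, so a slow
    # consumer applies backpressure instead of being buried.
    channel.basic_qos(prefetch_count=prefetch)
```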
- Use the API
RabbitMQ has a very useful HTTP API that can be used to declare, delete, and monitor exchanges, queues, vhosts, and users. The root endpoint is http://localhost:15672/api by default, where you’ll also find rich documentation. You can even POST a JSON file describing your entire fabric (which you can export from the management console and check in to Git) to /api/definitions and it will spin up your arrangement for you!
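A stdlib-only sketch of talking to the management API (the default guest/guest credentials and port 15672 are assumptions about a stock local install):

```python
import base64
import json
import urllib.request

def basic_auth_header(user, password):
    # The management API uses HTTP Basic auth.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return "Basic " + token

def mgmt_get(path, user="guest", password="guest",
             base="http://localhost:15672/api"):
    request = urllib.request.Request(base + path)
    request.add_header("Authorization", basic_auth_header(user, password))
    with urllib.request.urlopen(request) as response:
        return json.load(response)

if __name__ == "__main__":
    # e.g. list every queue the user can see, across all vhosts.
    for queue in mgmt_get("/queues"):
        print(queue["name"], queue["durable"], queue.get("messages"))
```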
- Monitoring/Health Checks
You can use a framework like Nagios in combination with the RabbitMQ API and/or your own short applications to do health checks. For example, you could write a snippet of code that connects to the broker and creates a channel. Wrap that snippet in an exception handler and exit with code 2 in the event of an error. This will make sure that the broker is accepting TCP connections and responding to AMQP commands. You could also use the api/aliveness-test endpoint of the API, but this runs inside the broker itself and might not actually test a client’s ability to connect.
You can check for warning-level issues by checking the state and configuration of queues or exchanges. You can simply query api/queues and compare the returned configurations to the expected settings in a checked-in JSON file. So if a queue was accidentally re-declared as non-durable when it should be durable (which can happen, and can result in message loss), or if a memory threshold has been surpassed, you can trigger a warning. Another thing to look for is how many unacknowledged messages are in the queue. If the number is much higher than your average, it could indicate a failure of your consumers.
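A minimal Nagios-style check along those lines (exit code 2 is Nagios’s CRITICAL), plus a small helper for the config-drift comparison; pika and a reachable broker are assumed for the live check:

```python
import sys

def check_broker(host="localhost"):
    # Opening a connection and a channel exercises both the TCP listener
    # and the AMQP command path, not just the socket.
    try:
        import pika  # pip install pika
        connection = pika.BlockingConnection(pika.ConnectionParameters(host))
        connection.channel()
        connection.close()
        return 0            # OK
    except Exception as exc:
        print("CRITICAL:", exc)
        return 2            # CRITICAL

def config_drift(expected, actual):
    # Compare expected queue settings (from a checked-in JSON file)
    # against one entry returned by api/queues; report mismatched keys.
    return {key: actual.get(key)
            for key, value in expected.items()
            if actual.get(key) != value}

if __name__ == "__main__":
    sys.exit(check_broker())
```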
- Performance Tips
Non-critical consumers (e.g., a log parser) can subscribe with no-ack = true. This will speed up the broker by allowing it to “fire and forget” the messages. Also, if you leave the mandatory and immediate flags set to false on your messages, publishing will be faster because the broker can route them asynchronously without reporting back to the client. (Note that the immediate flag was removed in RabbitMQ 3.0.)
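In pika, no-ack is spelled auto_ack=True (the queue name and callback below are examples); use it only where a lost message is acceptable, since anything in flight disappears if the consumer dies:

```python
def consume_logs(channel):
    # auto_ack=True: the broker considers a message delivered the moment
    # it is sent, never waiting for an acknowledgment ("fire and forget").
    channel.basic_consume(
        queue="logs",
        on_message_callback=lambda ch, method, props, body: print(body),
        auto_ack=True,
    )
    channel.start_consuming()
```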
If you’re really optimizing, you should know that Topic exchanges are slower than Fanout and Direct exchanges because of the pattern matching that has to happen to route messages. They are still super fast so this probably won’t matter to you.
- Check Out the Plugins
The only plugin I’ve used is Shovel (which lets you move messages from one host to another), but there are plugins that can support different protocols (like STOMP), alternate authentication systems (LDAP, etc.), and new types of routing mechanisms.
You can install a plugin using
./rabbitmq-plugins enable <plugin-name>
If you’re ambitious you can even build your own plugin!