Anand Sambamoorthy

Solutions Architect, Engineer

Home

Messaging At Ease - Powered by Kafka!

Published Feb 23, 2020

I have exposure to traditional, enterprise-wide messaging systems (typically JMS based, with some AMQP based deployments at later stages). Kafka was experimental territory: a chance to see whether it was a fit-for-purpose tool for the use-cases at hand. This blog is part of a refresher and my reading through a newer messaging technology - Kafka. The idea is to implement a sample use-case and compare the features Kafka brings to the table against typical messaging technologies (JMS). The intent is not to list differences from the Kafka platform and architecture perspective - e.g. its distributed, scalable nature - but rather the features that matter from a Message-Producer/Consumer perspective.

Kafka

Kafka is a high-throughput distributed messaging platform, typically used as a transport mechanism. Note: it is not typically used as an application processing tool with business rules, except when combined with Kafka Streams, where you can apply application processing logic (it is claimed that Kafka Streams can in fact be an alternative to processing frameworks like Apache Spark).

Typical use-cases for Kafka

  • Messaging and decoupling systems
  • Activity gathering
  • Metrics gathering from different locations and systems
  • Application log gathering
  • Stream processing
  • Typically used in micro-services architectures and big-data integration alongside Spark (some replace this with the newly introduced Kafka Streams)

Application implementation

The sample application sources tweets from Twitter, publishes them to a Kafka cluster, sinks them to Elasticsearch, and visualizes them in Kibana. Application setup steps:

  • Install and configure Kafka with a tweets_topic
  • Source data from Twitter - a Java client connecting to the Twitter app with the required auth keys
  • Publish the sourced tweets to Kafka
  • Consume tweets from the Kafka topic with consumers/consumer-groups
  • Sink them to an Elasticsearch cluster
  • Set up visualization with Kibana

Note: The Elastic cluster and Kibana are set up on cloud-hosted bonsai.io
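The publish step above can be sketched as a minimal producer configuration. This is a sketch, not the application's actual code: the broker address and topic name are assumptions from the local setup described above, and the Twitter client wiring is omitted.

```java
import java.util.Properties;

public class TweetProducerConfig {
    // Producer setup for the tweets pipeline; the bootstrap address is an
    // assumption for a local single-broker setup.
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "127.0.0.1:9092");
        p.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return p;
    }
    // With kafka-clients on the classpath, each sourced tweet is then
    // published along the lines of:
    //   new KafkaProducer<String, String>(producerProps())
    //           .send(new ProducerRecord<>("tweets_topic", tweetJson));
}
```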

Application Design:

Features that caught my attention

The listing below covers features from the Message Producer and Consumer point of view. It does not talk about Kafka from the overall platform and architecture point of view - cluster configuration, brokers (bootstrap brokers), partitions, replication etc.

  • Idempotent Producer
    • An out-of-the-box feature offered by Kafka, where the messaging system (the broker) is intelligent enough to know that it has already seen a message and discards it as a duplicate. This happens when the message-producer re-sends a message for some reason (say the network failed while the broker was sending its acknowledgement, so the producer re-sends). This is not an out-of-the-box feature that JMS message-brokers have, at least not by default; the onus lies with the subscribing system to discard duplicates.


  • Safe Producer
    • A safe producer (retries, acks etc.) in Kafka is a configurable feature set up at the message-producer level rather than at a broker/destination level.
    • This is a good one, especially if we believe it is the producer that best knows the criticality of a message, rather than the broker/destination. E.g. on a notification topic, a producer sending purely informational notifications does not need acks from all the in-sync replicas.
Producer{
...
        // safe producer
        properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        properties.setProperty(ProducerConfig.ACKS_CONFIG, "all");
        properties.setProperty(ProducerConfig.RETRIES_CONFIG, "100");
        // ordering is guaranteed within a partition; with idempotence enabled,
        // up to 5 in-flight requests still preserve it (use "1" on older clients)
        properties.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");
...
}
  • Optimize for throughput - producer level
    • Kafka lets the message producer decide how best to optimize throughput by setting properties - e.g. the compression choice (say Google's Snappy rather than plain gzip) and batching of messages.
    • This is a nice feature, especially taking the same specialized approach as before: the message-producer knows its messages and can decide how best to batch and compress them to increase throughput.

Producer{
...
        // high-throughput settings
        properties.setProperty(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
        // wait up to 20 ms so sends can be batched together
        properties.setProperty(ProducerConfig.LINGER_MS_CONFIG, "20");
        // batch up to 32 KB per partition
        properties.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(32 * 1024));
...
}
  • Idempotent Consumer
    • The idempotent-consumer pattern removes the burden of ad-hoc duplicate-detection logic from applications. Each consumed record can be given a deterministic id (a business key, or a generic combination such as topic_partition_offset) that the downstream sink - Elasticsearch in this application - uses to detect duplicates and overwrite rather than store them twice. Note: typical JMS/AMQP implementations like ActiveMQ have duplicate-detection features as well, so this is not unique, but Kafka's stable per-partition offsets make the pattern easy to apply.
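A sketch of the id scheme, assuming the generic topic_partition_offset form mentioned above (a business key such as the tweet id works the same way). The Elasticsearch sink can use this string as the document id, so re-processing the same record overwrites the existing document instead of creating a duplicate:

```java
public class DedupIds {
    // Deterministic id per consumed record: the same record always yields
    // the same id, so an idempotent sink can upsert rather than append.
    static String recordId(String topic, int partition, long offset) {
        return topic + "_" + partition + "_" + offset;
    }
}
```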


  • Consumer in control
    • Kafka's polling model lets consumers configure how best to retrieve messages from the cluster. Consumers are empowered, by configuration, to poll for batches at defined intervals, or once the accumulated data exceeds a certain size, rather than receiving each message as it is published.
    • The idea is that the consumer is the best judge of how to consume, rather than the broker simply pushing messages as they are published.
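The fetch tuning described above is plain consumer configuration. A sketch with illustrative values (the keys are standard Kafka consumer config names; the numbers are assumptions, not the application's actual settings):

```java
import java.util.Properties;

public class PollTuning {
    static Properties pollProps() {
        Properties p = new Properties();
        // wait until at least ~1 KB has accumulated before answering a fetch
        p.setProperty("fetch.min.bytes", "1024");
        // ...but never make the consumer wait longer than 500 ms
        p.setProperty("fetch.max.wait.ms", "500");
        // cap how many records a single poll() hands back
        p.setProperty("max.poll.records", "100");
        return p;
    }
}
```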


  • Commits and the bulk requests
    • Message commits offer a choice - auto-commit, or the consumer deciding when to commit.
    • While this is not a feature unique to Kafka, it gives more choice through configurable behaviour such as committing only after a bulk request or batch has succeeded.
    • This feature also enables consumers to use Kafka as a persistence mechanism - retrieving data at will from a point of their choice. Kafka can re-play messages after an offset reset, and the consumer simply processes them again. This is not available in typical messaging systems, where messages are gone from the broker once delivered and acknowledged; the onus lies with the publishing system to re-create and publish them again, or the consumer obtains them through other channels with IT support intervention.
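A sketch of the commit-after-bulk pattern: disable auto-commit (enable.auto.commit is the standard config key), then commit only once the sink has accepted the batch. The loop outline is illustrative and assumes kafka-clients on the classpath:

```java
import java.util.Properties;

public class CommitControl {
    // Turn off auto-commit so offsets advance only after the sink has
    // accepted the batch.
    static Properties commitProps() {
        Properties p = new Properties();
        p.setProperty("enable.auto.commit", "false");
        return p;
    }
    // The consuming loop then looks roughly like:
    //   ConsumerRecords<String, String> records = consumer.poll(timeout);
    //   ... build one bulk request for the whole batch, send it to
    //       Elasticsearch, and only on success ...
    //   consumer.commitSync();
    // A replay is then consumer.seek(partition, earlierOffset) - or an
    // offset reset for the whole consumer group.
}
```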

In summary, Kafka seems a great choice for messaging: it empowers message producers and consumers to decide how and when they pub/sub messages, which is a bit different from typical messaging systems, which do not offer a great deal of flexibility - their QoS is usually dictated/configured at a broker/destination level.

Add to this the architecture, platform, monitoring, security and management features that Kafka as a whole offers, and it makes for a powerful messaging platform - and one at ease. [Why ease? I could set up the application described earlier on my laptop in a matter of a few hours, and it works!]