“Hello world” with Apache Kafka

So it’s time to send some data bits through Apache Kafka. But first, as usual, we need to install it.

Installing Kafka is so trivial that I’ll break my usual rule and actually explain the process. Here goes the manual:

  1. Install Java Development Kit (you probably have it already)
  2. Download Kafka tarball
  3. Uncompress it (tar -xzf kafka_2.11-0.10.1.0.tgz on *nix systems; see the copy-pasteable version below)
  4. Done. You installed Kafka.
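
In fact, steps 2 and 3 fit in three lines of shell. I’m taking the tarball from the Apache archive here; any mirror carrying the same file will do:

  wget https://archive.apache.org/dist/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz
  tar -xzf kafka_2.11-0.10.1.0.tgz
  cd kafka_2.11-0.10.1.0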

Seriously, it’s that simple. Before we go any further, let’s look around a bit:
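
A quick ls inside the unpacked folder shows what we’ve got (the exact listing may differ slightly between versions):

  $ ls
  LICENSE  NOTICE  bin  config  libs  site-docs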

The Kafka archive comes with a minimal set of folders, and we’ll deal with only two of them: bin, where all the shell scripts reside, and config, which holds the services’ configuration.

How to launch Kafka

As I mentioned last time, even a single instance of Apache Kafka is a cluster, so starting it differs from starting, say, a single RabbitMQ node. Unlike the rabbit, Kafka needs a helper service to coordinate its nodes within the cluster, and that service is ZooKeeper. Whenever a new topic is created, a new node is added, or an old one is destroyed, ZooKeeper is the guy who sorts things out. It decides where to put a new topic, what to do with a new node, and how to rebalance replicas when some of them are lost. It’s a monitor and an authority. No wonder it has to be started first.

Starting Apache ZooKeeper

The Kafka package already has all we need: the ZooKeeper starter script in bin and its configuration in config, so without further ado:
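
  # stock script and stock config, straight from the tarball
  bin/zookeeper-server-start.sh config/zookeeper.properties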

By the way, I’m running this from an openjdk container in Docker, and the steps I take will look the same on Mac and Linux. Happy Windows users, however, have to replace bin with bin\windows\, and the .sh extension with .bat.
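
If you want to replicate the container setup, something along these lines should do; mounting the unpacked Kafka folder into the container is my own convenience choice, not a requirement:

  # run a JDK container with the current directory (the Kafka folder) mounted inside
  docker run -it --rm -v "$PWD":/kafka -w /kafka openjdk:8 bash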

Starting Kafka server

Just one more shell script and config file:
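
  # starts a Kafka broker with the default config
  bin/kafka-server-start.sh config/server.properties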

It produces a lot of output, so it’s easy to miss that one of the first things Kafka does is connect to ZooKeeper at localhost:2181, the address defined in the server.properties file.
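
If you peek into that file, it’s a single standard property (zookeeper.connect), so repointing the broker at another ZooKeeper is a one-line change:

  # config/server.properties
  zookeeper.connect=localhost:2181

Now, with the server up and running, it’s time to do some data exchange.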

Producing and consuming data

Producing and consuming data in Kafka needs three more things: a topic to hold the data, a producer to create it, and a consumer to get it back. And there are shell scripts for all of that.

Create a topic
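
Here it is; kafka-topics.sh is the stock script from bin, and mytopic is the name we’ll reuse for producing and consuming:

  bin/kafka-topics.sh --create \
    --zookeeper localhost:2181 \
    --replication-factor 1 \
    --partitions 1 \
    --topic mytopic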

Creating a topic looks more complicated than expected, but it should, because there’s quite a bit of nontrivial magic happening there. The meanings of --create and --topic mytopic are pretty obvious, but the rest deserves some explanation.

  1. Firstly, we’re sending the command to ZooKeeper, not to a Kafka broker: --zookeeper localhost:2181. That might be surprising at first, but it makes total sense when you think about it. In a common scenario you won’t know how many nodes are in the cluster or which of them can hold one more topic, which, in fact, is more than just a name: there’s a whole storage associated with it. ZooKeeper, on the other hand, knows all about the cluster, so it makes sense to talk to him first.
  2. Secondly, we specified the replication factor: --replication-factor 1. “1” means there will be only one copy of the topic’s messages, so if the underlying Kafka node goes down, the whole topic goes with it. If we had two nodes and set the replication factor to two, each node would get its own copy of the topic.
  3. Finally, the number of partitions is set to 1, meaning the topic’s files and data will be located in one place, in one logical storage. However, if the topic is going to grow big and one broker is unlikely to handle the estimated amount of requests to it, we could choose two or more partitions, and ZooKeeper would put them on different nodes, if possible.
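
By the way, the same script can confirm that the topic really exists; --list is its standard switch:

  bin/kafka-topics.sh --list --zookeeper localhost:2181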

Produce messages

Just one more shell script:
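
This one is the console producer; port 9092 is Kafka’s default, and the topic is the one we just created:

  bin/kafka-console-producer.sh \
    --broker-list localhost:9092 \
    --topic mytopic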

The producer needs to know the location of at least one broker (you can specify more than one). It doesn’t have to be the exact broker that will end up storing the messages, just some broker that will help find the right one. The console producer keeps the prompt open, and whatever you type there becomes messages.

Consuming messages

You guessed it, one more shell script:
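
Same story, the console consumer this time; --from-beginning is my addition so it also replays messages sent before it started:

  bin/kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 \
    --topic mytopic \
    --from-beginning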

Like the producer, the consumer also needs some sort of entry point: --bootstrap-server. It also doesn’t exit immediately and keeps echoing messages until somebody stops it.

Voilà! We successfully started a single-node Kafka cluster and sent some messages through it.

Conclusion

Starting a Kafka server and sending data through it looks noticeably more difficult than doing the same in RabbitMQ or ZeroMQ (everything is more difficult than ZeroMQ). That’s the price to pay for a system that is cluster- and high-availability-ready. On the bright side, going from a single-node to a multi-node cluster with partitioned topics costs nothing extra. That’s what we’re going to make sure of next time.
