Apache Kafka 101: How to set up Kafka on your local machine
Welcome to a do-it-along-with-me session on how to set up Kafka on your local machine. This walkthrough is done on a Mac, and you will find a similar set of steps for Windows or Linux elsewhere. In this tutorial, we will do everything from scratch.
This article is written for smart noobs who want to set up Kafka locally and learn and implement the fundamentals.
In case you are interested in learning the theory, you can have a look at my article Apache Kafka 102.
When I was researching Kafka, one of the issues I ran into is that everyone primarily uses Docker images. There are Docker images by Bitnami and wurstmeister, but what if you want to build your own setup from scratch? Like, right from the .tar file. It took a little time and digging, but I was able to figure it out.
Download the Kafka Tar File
Go to Google and type out “Kafka tar file”, or go directly to the Apache Kafka downloads page.
Download the latest Kafka tar file to a folder of your choice.
Pre-requisite Software
Now, in case you do not understand any term in this section, please click on the underlined hyperlinks, where I have linked guides that explain it further or show you how to install it.
Java & Brew
First, go ahead and install Homebrew from the terminal. Next, we need to install Java to run Kafka.
We will start by searching for the available versions of the Java Development Kit (JDK).
brew search openjdk
For this installation, we will select Java 11, which we can install with Brew on a Mac via the openjdk@11 formula.
brew install openjdk@11
Next, we need to make the JDK we installed visible to the system, so we will symlink it into the standard Java location.
sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk
Now, we have successfully installed Java 11 on our system.
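To confirm the installation worked, you can check the Java version from the terminal. The exact output depends on the patch release Brew installed, but it should report version 11.
java -version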
Extract the tar file and explore the folder structure
Simply double-click on the downloaded file to extract and open the tar archive.
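If you prefer the terminal, you can extract it there instead. The exact file name depends on the Scala and Kafka versions you downloaded; the name below matches the version used later in this guide, so adjust it if yours differs.
tar -xzf kafka_2.12-3.4.0.tgz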
Don’t be daunted by the folder structure; we will slowly start to understand more. Following is the folder structure of Kafka:
- bin: This is where the shell scripts live. These are the scripts we will run to start Kafka and interact with it.
- config: This is where the properties files live. We can modify them to change the parameters of various functions in Kafka.
- libs: This is where we find the libraries and jar files that Kafka needs to run smoothly.
- licenses: This folder just contains the licenses that Kafka is distributed under.
Understand the purpose of the elements of Kafka
- Run Zookeeper: This is how Kafka keeps track of its metadata. It runs on port 2181 by default.
- Run the Kafka Server: This runs the Kafka broker, which handles all interactions between producers and consumers. It runs on port 9092 by default.
- Run a standalone or distributed server: This runs Kafka Connect, the component that moves data between Kafka and external systems, in either standalone or distributed mode. Its REST interface runs on port 8083 by default.
How Zookeeper, Kafka Server and Standalone or Distributed Server Work
Apache Kafka is a distributed event streaming platform that uses Zookeeper, a centralized service for managing configuration and synchronization in distributed systems. This part of the blog explains how they work together.
Zookeeper’s Role in Kafka
Zookeeper helps Kafka in several ways:
- Controller election: The controller is a broker that manages the leader-follower relationship of the partitions. If a broker shuts down, the controller assigns new leaders for its partitions. Zookeeper elects a new controller if the current one fails, using a znode (a data node) called /controller.
- Cluster membership: Zookeeper tracks the status and availability of the brokers in the cluster. It also notifies Kafka of any changes in the cluster membership. This information is stored in another znode called /brokers/ids.
- Topic configuration: Zookeeper stores the metadata of the topics, such as the number of partitions, the replication factor, the leader and the ISR (in-sync replica) for each partition, and any user-defined configuration overrides. This information is stored in another znode called /brokers/topics.
- Quotas: Zookeeper stores the quotas (limits) for each client-id or user-principal on producing and consuming data. This information is stored in another znode called /config/clients.
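Later, once Zookeeper is running, you can peek at these znodes yourself with the Zookeeper shell that ships in Kafka’s bin folder. It opens an interactive session against port 2181, inside which you can type commands such as ls /brokers/ids or get /controller; the exact output depends on your cluster state.
bin/zookeeper-shell.sh localhost:2181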
Kafka Server’s Role in Kafka
The Kafka server is the main component of Kafka that handles all the requests from producers and consumers. The Kafka server consists of several modules, such as:
- Socket Server: This module handles the network communication between the clients and the server.
- Request Handler: This module parses the requests from the socket server and calls the appropriate API to handle them. It also performs authentication, authorization, and quota checks for each request.
- Replica Manager: This module manages the log replicas for the partitions assigned to the broker.
- Log Manager: This module manages the log segments for each partition. It creates new segments, deletes old segments, indexes and flushes data to disk, and handles log compaction and deletion policies.
- Group Coordinator: This module coordinates the consumer groups that consume data from Kafka. It assigns partitions to consumers, handles rebalances, commits offsets, and responds to consumer heartbeats.
- Transaction Coordinator: This module coordinates the transactions that span multiple partitions. It assigns producer ids, tracks transaction states, handles aborts and commits, and ensures exactly-once semantics.
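Many of these modules are tuned through config/server.properties. As a rough sketch, here are a few of the standard keys and the values they carry in the sample file that ships with Kafka (check your own copy, as versions differ):
# Socket Server: threads handling network requests
num.network.threads=3
# Request Handler: threads processing requests, including disk I/O
num.io.threads=8
# Log Manager: where partition logs live and how long they are kept
log.dirs=/tmp/kafka-logs
log.retention.hours=168
# Group Coordinator: replication factor of the internal offsets topic
offsets.topic.replication.factor=1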
Standalone or Distributed Server’s Role in Kafka
Kafka Connect can be run in either standalone or distributed mode. In standalone mode, a single worker process runs all the connectors and tasks and keeps its offsets in a local file. This mode is suitable for testing and development purposes, but not for production. In distributed mode, multiple workers form a cluster, share the load of running connectors, and store offsets, configs, and status in Kafka topics. This mode is suitable for production as it provides scalability, fault tolerance, and high availability.
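The difference shows up in the two sample worker configs that ship with Kafka. As a rough sketch, these are the key properties from each sample file (values may differ in your copy):
# config/connect-standalone.properties: offsets kept in a local file
bootstrap.servers=localhost:9092
offset.storage.file.filename=/tmp/connect.offsets
# config/connect-distributed.properties: workers coordinate via a group id and store state in Kafka topics
bootstrap.servers=localhost:9092
group.id=connect-cluster
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status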
The commands
Move to the folder where you have downloaded and extracted Kafka. To get the path, you can hold down the “Option” key and right-click on the folder. Or you can go to the View menu in Finder and enable “Show Path Bar” to see the file path.
Next, copy the path by clicking on it.
Now, we will use the “change directory” command to move to that path. Open Spotlight search and search for the Terminal. In case you are unsure how to do any of this, I have hyperlinked guides in the words above; please feel free to click on them.
Now, to move to the Kafka folder in the terminal, use the following command.
cd <path to kafka>
For me, this looks like this:
cd /Users/anishmahapatra/Work/Kafka/kafka_2.12-3.4.0
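To sanity-check that you are in the right place, list the folder contents; you should see the bin, config, and libs folders described earlier.
ls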
Run Zookeeper
Now remember, the three main components we will need for the setup have to be run in this order: Zookeeper first, followed by the Kafka server, followed by the standalone or distributed server.
Kafka needs somewhere to store its metadata, and that is Zookeeper’s job. To run it, open a new tab, go to the Kafka folder (as shown above), and run this command.
bin/zookeeper-server-start.sh config/zookeeper.properties
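It can take a few seconds to come up. If you want to confirm Zookeeper is actually listening before moving on, you can probe port 2181 from another terminal with netcat, which ships with macOS; a “succeeded” message means it is up.
nc -vz localhost 2181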
Now, we have the Zookeeper server running. Next, we would like to run the Kafka server in a new terminal. Pro-tip, make sure that you are in the correct working directory when you open a new terminal.
Run Kafka Server
Now, we would like to run the Kafka server. For this, run the following command in a new terminal. Make sure you are in the right Kafka folder.
bin/kafka-server-start.sh config/server.properties
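To verify that the broker is reachable, you can ask it for the API versions it supports from another terminal. This script ships in Kafka’s bin folder, and the output is a long list of APIs and supported version ranges.
bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092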
Now, we have the Kafka server running. Open a new terminal for the next command. Just a quick reminder: in case you would like to “elegantly” shut down any of the servers, hit Ctrl + C in its terminal.
Run standalone server
Now, we have two options here: we can run the standalone server or the distributed server. For this blog, since I want to keep it simple, I will just run the standalone server. Please feel free to follow up and connect with me in case you enjoy learning.
https://www.linkedin.com/in/anishmahapatra/
bin/connect-standalone.sh config/connect-standalone.properties
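The command above starts a Connect worker using only the sample standalone worker config. If you want the worker to actually move data, you can also pass one or more connector property files on the same command line. For example, the sample file source connector config that ships with the Kafka download (config/connect-file-source.properties, assuming your version includes it) reads lines from a local file into a Kafka topic.
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties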
Create a topic anishmahapatra
Now, real-time data is exchanged between the producer and the consumer via a topic. We can create a topic using the servers that we have already started. A new topic can be made via the following command.
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic anishmahapatra
Now, we have created a new topic. Let’s see the list of topics then.
Check for the list of Topics
Let’s list down the topics that are present in Kafka. We should see the topic that we made.
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
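You can also inspect the topic we just created in more detail. The describe output shows each partition along with its leader and in-sync replicas, which ties back to the Zookeeper section above.
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic anishmahapatra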
Now that we have created and validated the topic, we will run the producer and consumer.
Run producer and consumer script
In an ideal world, the producer and consumer would be separate applications running on different systems, but for the scope of this article, we will limit it to the terminal. The aim is that once we run the producer and the consumer, a message we type into the producer terminal appears in the consumer terminal when we hit enter.
This is the code to run the producer in the terminal.
bin/kafka-console-producer.sh --topic anishmahapatra --bootstrap-server localhost:9092
This is the code to run the consumer in the terminal.
bin/kafka-console-consumer.sh --topic anishmahapatra --bootstrap-server localhost:9092
Now, when we type a message in the producer terminal and hit enter, we should see it appear in the consumer terminal.
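One thing to note: by default, the console consumer only shows messages produced after it starts. If you start the consumer late and want to replay everything already in the topic, the --from-beginning flag does that.
bin/kafka-console-consumer.sh --topic anishmahapatra --bootstrap-server localhost:9092 --from-beginning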
That fulfils the aim of the blog. We have now set up native, vanilla Kafka on our local machine and implemented a producer and a consumer.
Conclusion
In this article, we have learned how to set up Kafka on our local system using the tar file and the terminal. We installed the necessary software, such as Homebrew and Java, ran the bundled Zookeeper and Kafka servers, and looked at the Kafka properties files. We also tested our Kafka installation by creating a topic, producing messages, and consuming messages. We have seen how Kafka works as a distributed event streaming platform that can handle high volumes of data in real time.
Connect with me and follow me for more!
I hope you enjoyed this do-it-along-with-me session and learned something new. If you did, please give this article a clap and share it with your friends. Also, feel free to follow me on Medium and connect with me on LinkedIn at https://www.linkedin.com/in/anishmahapatra. I would love to hear your feedback and suggestions for future articles. Thank you for reading!