
Kafka Insight: Bridging Data and Applications

Kafka is an open-source, distributed event streaming platform designed for fault-tolerant, high-throughput data processing. It is used to build data pipelines and streaming applications, and it allows streams of records to be processed in a fault-tolerant manner.

We will keep this blog hands-on, with just enough theory to be easily digested by anyone. We are not going to compare traditional messaging systems with Kafka, as we are focusing on the distributed environment.

Kafka can serve a variety of purposes, especially large-scale messaging and message processing. The following are a few of its uses:

  • Real-time streaming
  • Messaging
  • Event sourcing
  • Real-time analytics
  • Data pipelines

Beyond the above, Kafka also helps in designing loosely coupled systems that provide the benefits that come with distributed systems, such as those listed below, and many more.

  • Low Latency
  • Fault Tolerant
  • High Throughput
  • Scalable
  • Durable

Let's take a deep dive into the Kafka world.

Kafka is used for a variety of messaging use cases, which we can consolidate into three categories:

  • Batch processing: ETL jobs
  • Real-time processing: transaction management (e.g. the Saga pattern)
  • Publish & subscribe processing: messaging systems
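To make the publish & subscribe category concrete, here is a minimal in-memory sketch of the pattern. This is an illustration only, not Kafka's API: the `MiniBus` class and its method names are made up for this example.

```python
from collections import defaultdict

class MiniBus:
    """Toy in-memory publish/subscribe bus (illustration only, not Kafka's API)."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives a copy of the message.
        for callback in self.subscribers[topic]:
            callback(message)

bus = MiniBus()
received = []
bus.subscribe("orders", received.append)
bus.publish("orders", {"id": 1, "item": "book"})
print(received)  # the subscriber saw the published message
```

The key property, which Kafka provides at scale, is that the publisher knows nothing about its subscribers: producers and consumers stay loosely coupled through the topic.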

In general, the following is the terminology of the Kafka system.

Kafka Terminology:

  • Broker: A running instance of Kafka is called a broker. Multiple brokers typically run together in a cluster.
  • Controller: In a Kafka cluster, one of the available brokers becomes the controller (ideally the first broker to join the cluster). The controller receives administrative requests and takes action on them.
  • Topic: A topic is a named entity where messages (data) reside; producers produce data to it and consumers consume from it.
  • Partition: To achieve parallelism, a single topic can be divided into multiple instances called partitions. The partition count can be increased per consumer requirements (Kafka allows increasing, but not decreasing, the number of partitions of a topic). Within a consumer group, each partition is consumed by a single consumer. Ordering is guaranteed within a single partition.
  • Offset: An offset is a locator that points to the position of a record within a partition. A record can be located through its partition and offset.
  • Message & ordering: Data that resides in a topic is called a message. Ordering of messages is guaranteed within a partition.
  • Leader: For each partition, one broker acts as the leader and handles all reads and writes for that partition.
  • Followers: The other replicas of a partition are followers; they replicate the leader's data and can take over if the leader fails.
  • Client APIs: These are the APIs through which clients interact with the Kafka cluster:
      • Producer: publishes messages to topics.
      • Consumer: reads messages from topics.
      • Streams: processes streams of records, transforming data between topics.
      • Source connector: imports data from external systems into Kafka (Kafka Connect).
      • Sink connector: exports data from Kafka to external systems (Kafka Connect).
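The partition and ordering terms above can be sketched in a few lines. This is a toy illustration, not Kafka's actual partitioner (which hashes keys with murmur2); the hash function and partition count here are assumptions for the example. The point is that messages with the same key always land in the same partition, so their relative order is preserved.

```python
NUM_PARTITIONS = 3  # assumed partition count for illustration

def partition_for(key: str) -> int:
    # Simple deterministic stand-in for Kafka's key hashing.
    return sum(key.encode()) % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("user-1", "login"), ("user-2", "login"),
                   ("user-1", "purchase"), ("user-1", "logout")]:
    partitions[partition_for(key)].append((key, value))

# All of user-1's events sit in one partition, in the order produced.
print(partitions[partition_for("user-1")])
```

Ordering is only guaranteed within a partition; events for different keys may land in different partitions and interleave arbitrarily across them.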

A Kafka cluster contains multiple Kafka brokers (nodes). Specifically, they are called message brokers, and they are responsible for establishing the interaction between systems. These systems are called producers and consumers. The health of the cluster is maintained by ZooKeeper.

Producers produce data and consumers consume it. This data is shipped to a specific topic, which is further divided into partitions. When a topic is created, the partitions associated with it are distributed across the brokers. Let's see what happens when a topic is created.

When topic creation is initiated, the request is received by ZooKeeper. ZooKeeper passes this request to the controller, which is an additional responsibility taken on by one of the brokers. The controller's job is to distribute ownership of the partitions among the available brokers. This distribution of partitions across brokers is called leader assignment. With this, the topic is created and its partitions are distributed.
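Leader assignment can be pictured as spreading partition leadership evenly across the brokers. The sketch below uses simple round-robin; the broker IDs and partition count are made up for illustration, and Kafka's real assignment also accounts for replicas and rack awareness.

```python
# Toy sketch of leader assignment: the controller spreads partition
# leadership across brokers round-robin.
brokers = [0, 1, 2]      # assumed broker IDs
num_partitions = 6       # assumed partition count

leaders = {p: brokers[p % len(brokers)] for p in range(num_partitions)}
print(leaders)  # {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}
```

Each broker ends up leading two partitions, so produce and consume traffic for the topic is balanced across the cluster.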

The topic is divided into partitions, and data is arranged sequentially within a partition. Kafka stores this data on the file system.
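A partition behaves like an append-only log: each appended record gets the next offset, and a consumer resumes reading from its last committed offset. The sketch below is a plain-Python illustration of that idea, not Kafka's storage format; the helper names are made up.

```python
# Toy append-only log: each appended record gets the next offset.
log = []

def append(record) -> int:
    log.append(record)
    return len(log) - 1  # the record's offset

offsets = [append(m) for m in ["a", "b", "c"]]
print(offsets)  # [0, 1, 2]

# A consumer that committed offset 1 resumes from offset 1 onward.
resume_from = 1
print(log[resume_from:])  # ['b', 'c']
```

Because records are only ever appended and read sequentially, Kafka can persist partitions to disk cheaply and let many consumers replay the same data from different offsets.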