🎯Apache Kafka

Introduction

Apache Kafka is a distributed event streaming platform used to build real-time data pipelines and streaming applications. It is designed to be fault-tolerant and horizontally scalable, and it delivers data with high throughput and low latency.

Core Concepts

Topic

A topic is a named stream of records: producers publish records to it and consumers read records from it. Each topic is split into one or more partitions.

Partition

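A partition is the unit of parallelism in Kafka. Each partition is an ordered, append-only sequence of records, and a topic's partitions can be spread across multiple brokers so that reads and writes are distributed. Kafka guarantees record order within a partition, but not across the partitions of a topic.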
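To make the topic/partition relationship concrete, here is a toy in-memory model. It is not Kafka's API or storage format (real partitions are stored as log segment files on disk); `Topic` and `append` are hypothetical names chosen for illustration. It shows that each partition is an independent ordered log, so ordering exists only within a partition.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy model: a named topic made of independent, append-only partition logs."""
    name: str
    num_partitions: int
    partitions: list[list[str]] = field(init=False)

    def __post_init__(self) -> None:
        # One ordered log per partition; nothing orders records across partitions.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, partition: int, value: str) -> None:
        # Records are only ever added at the end of a partition's log.
        self.partitions[partition].append(value)


orders = Topic("orders", num_partitions=3)
orders.append(0, "a")
orders.append(2, "b")
orders.append(0, "c")
print(orders.partitions)  # [['a', 'c'], [], ['b']]
```

Within partition 0, "a" is guaranteed to come before "c"; there is no defined order between "c" and "b".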

Offset

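An offset is a sequential integer that Kafka assigns to each record as it is appended to a partition. It identifies the record's position within that partition. Offsets are unique only within a partition, so a record is addressed by its topic, partition, and offset together.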
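The sketch below models a single partition as an append-only list in which a record's offset is simply its index. This is a simplification: in real Kafka the earliest available offset moves forward as old data is deleted by retention, and offsets can have gaps (for example after log compaction). `PartitionLog` and its methods are hypothetical.

```python
from __future__ import annotations


class PartitionLog:
    """Toy append-only log for one partition; an offset is a record's position."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def append(self, value: str) -> int:
        # The next offset is the current length: 0, 1, 2, ... (no gaps in this toy).
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read_from(self, offset: int, max_records: int = 10) -> list[tuple[int, str]]:
        # Return (offset, value) pairs starting at the requested offset.
        end = min(offset + max_records, len(self._records))
        return [(o, self._records[o]) for o in range(offset, end)]


log = PartitionLog()
for v in ["x", "y", "z"]:
    log.append(v)
print(log.read_from(1))  # [(1, 'y'), (2, 'z')]
```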

Producer

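A producer is a client application that publishes records to Kafka topics. For each record, the producer chooses a partition (by default from a hash of the record's key, if it has one) and sends the record to the broker that leads that partition, which appends it to the partition's log.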
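A producer's choice of partition for keyed records can be sketched as "hash the key, then take it modulo the partition count". Kafka's default partitioner uses the murmur2 hash; this sketch substitutes `zlib.crc32` from the Python standard library, so its partition numbers will not match a real cluster. `choose_partition` is a hypothetical helper.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition number in [0, num_partitions).

    Assumption: crc32 stands in for Kafka's murmur2 hash. The property being
    illustrated is the same: equal keys always map to the same partition.
    """
    return zlib.crc32(key) % num_partitions


for key in [b"user-1", b"user-2", b"user-1"]:
    print(key, "->", choose_partition(key, num_partitions=6))
```

Because the mapping depends only on the key and the partition count, all records for `user-1` land in the same partition and are read back in order. Adding partitions to a topic changes the mapping for existing keys.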

Consumer

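A consumer is a client application that reads records from Kafka topics. It either subscribes to topics or is assigned specific partitions, then repeatedly fetches (pulls) records from the brokers, tracking its current offset in each partition.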
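Consumers pull data rather than having it pushed to them. The toy consumer below (a hypothetical `ToyConsumer`, with a plain dict standing in for the brokers) keeps a per-partition position, meaning the next offset to read, and advances it on each `poll`. The real Java client's `KafkaConsumer.poll` follows the same pull model, but it fetches over the network and caps the whole batch with `max.poll.records` rather than capping each partition as this sketch does.

```python
from __future__ import annotations


class ToyConsumer:
    """Toy pull-based consumer that tracks its own position in each partition."""

    def __init__(self, partitions: dict[int, list[str]]) -> None:
        self.partitions = partitions                # stands in for the brokers' logs
        self.position = {p: 0 for p in partitions}  # next offset to read, per partition

    def poll(self, max_per_partition: int = 2) -> list[tuple[int, int, str]]:
        """Fetch up to max_per_partition records from each partition, from its position."""
        batch = []
        for p, records in self.partitions.items():
            start = self.position[p]
            end = min(start + max_per_partition, len(records))
            batch.extend((p, offset, records[offset]) for offset in range(start, end))
            self.position[p] = end  # advance past what was returned
        return batch


consumer = ToyConsumer({0: ["a", "b", "c"], 1: ["d"]})
print(consumer.poll())  # [(0, 0, 'a'), (0, 1, 'b'), (1, 0, 'd')]
print(consumer.poll())  # [(0, 2, 'c')]
```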

Broker

A broker is a Kafka server. It stores partition data, accepts records from producers, and serves fetch requests from consumers. Each broker hosts replicas of many partitions, which can belong to many different topics.

ZooKeeper

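ZooKeeper is a distributed coordination service that Kafka has traditionally used to store cluster metadata and configuration. Brokers register themselves in ZooKeeper, and ZooKeeper is used to elect a controller broker, which in turn elects the leader for each partition. Newer Kafka versions can run without ZooKeeper using the built-in KRaft consensus protocol, and Kafka 4.0 removes ZooKeeper support entirely.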

Consumer Group

A consumer group is a set of consumers that share a group ID and cooperate to consume one or more topics. Each partition is assigned to exactly one consumer in the group, so each consumer reads a subset of the partitions; if the group has more consumers than partitions, the extra consumers sit idle.
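One way to split partitions among group members is to hand each consumer a contiguous range. The function below is a simplified, single-topic sketch in the spirit of Kafka's `RangeAssignor`; `range_assign` is a hypothetical name, and the real assignor runs per topic as part of the group rebalance protocol.

```python
from __future__ import annotations


def range_assign(consumers: list[str], num_partitions: int) -> dict[str, list[int]]:
    """Split one topic's partitions into contiguous ranges, one range per consumer.

    Each partition goes to exactly one consumer; consumers beyond the partition
    count receive an empty list.
    """
    if not consumers:
        return {}
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers take one additional partition each.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(["c1", "c2"], 5))        # {'c1': [0, 1, 2], 'c2': [3, 4]}
print(range_assign(["c1", "c2", "c3"], 2))  # {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows why adding consumers beyond the partition count does not increase parallelism.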

Replication

Replication copies each partition to multiple brokers for fault tolerance; the number of copies is the topic's replication factor. One replica of each partition is the leader, which handles read and write requests. The others are followers, which fetch data from the leader to keep their copies up to date.
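Replica placement can be sketched as a round-robin over brokers, so that the copies of each partition land on distinct brokers and leadership is spread across the cluster. This is a simplified illustration, not Kafka's exact assignment algorithm (which also handles details such as rack awareness); `place_replicas` is hypothetical, while treating the first broker in each list as the preferred leader mirrors Kafka's convention.

```python
from __future__ import annotations


def place_replicas(num_partitions: int, brokers: list[int],
                   replication_factor: int) -> dict[int, list[int]]:
    """Assign each partition's replicas to distinct brokers, round-robin.

    The first broker in each list plays the role of the preferred leader.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


print(place_replicas(3, brokers=[1, 2, 3], replication_factor=2))
# {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```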

Leader Replica

The leader replica is the replica of a partition that handles its read and write requests. All producer writes go to the leader, and by default consumers also read from it (Kafka 2.4 and later can be configured to let consumers fetch from a nearby follower instead).
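The leader-only write path can be illustrated with a toy broker that rejects writes for partitions it does not lead. `ToyBroker` and the use of `LookupError` are hypothetical; a real broker answers with a protocol error code, after which the client refreshes its metadata and retries against the current leader.

```python
from __future__ import annotations


class ToyBroker:
    """Toy broker that appends records only for partitions it currently leads."""

    def __init__(self, broker_id: int, leader_for: set[int]) -> None:
        self.broker_id = broker_id
        self.leader_for = leader_for
        self.logs: dict[int, list[str]] = {p: [] for p in leader_for}

    def produce(self, partition: int, value: str) -> int:
        """Append a record and return its offset, or fail if not the leader."""
        if partition not in self.leader_for:
            # Hypothetical stand-in for Kafka's "not the leader" protocol error.
            raise LookupError(
                f"broker {self.broker_id} is not the leader for partition {partition}"
            )
        self.logs[partition].append(value)
        return len(self.logs[partition]) - 1


b1 = ToyBroker(1, leader_for={0})
print(b1.produce(0, "hello"))  # 0
try:
    b1.produce(1, "oops")
except LookupError as err:
    print(err)  # broker 1 is not the leader for partition 1
```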

Follower Replica

A follower replica copies data from the leader replica of its partition. It exists for fault tolerance: if the leader fails, an in-sync follower can be elected as the new leader.
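When a leader fails, a new leader is chosen from the surviving replicas that are still in sync. The sketch below picks the first replica, in assignment order, that is both alive and in the ISR, which resembles Kafka's clean leader election; `elect_leader` and its inputs are hypothetical simplifications of the controller's state.

```python
from __future__ import annotations


def elect_leader(assigned: list[int], live: set[int], isr: set[int]) -> int | None:
    """Return the first assigned replica that is alive and in sync, else None.

    None means a clean election has no candidate; Kafka only falls back to an
    out-of-sync replica when unclean leader election is enabled.
    """
    for broker in assigned:
        if broker in live and broker in isr:
            return broker
    return None


# Partition replicated on brokers 1, 2, 3; broker 1 (the old leader) has failed.
print(elect_leader([1, 2, 3], live={2, 3}, isr={1, 3}))  # 3
print(elect_leader([1, 2, 3], live={2}, isr={1, 3}))     # None
```

Broker 2 is alive but not in sync, so it is skipped even though it comes earlier in the assignment.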

ISR (In-Sync Replicas)

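The ISR is the set of replicas, including the leader, that are fully caught up with the leader. A message is considered committed only once every replica in the ISR has written it; producers using `acks=all` wait for this before receiving an acknowledgment. A follower that has not caught up with the leader within `replica.lag.time.max.ms` is removed from the ISR, and it rejoins once it catches up again.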
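Two ISR rules can be expressed in a few lines: the commit point (Kafka's high watermark) is the smallest log end offset among ISR members, and a follower that has not caught up within a lag threshold is dropped from the ISR. The function names are hypothetical, and the 30-second threshold in the usage example is an illustrative value for the broker setting `replica.lag.time.max.ms`.

```python
from __future__ import annotations


def high_watermark(log_end_offsets: dict[int, int], isr: set[int]) -> int:
    """Commit point: every offset below this value is stored on all ISR members."""
    return min(log_end_offsets[broker] for broker in isr)


def shrink_isr(isr: set[int], last_caught_up_ms: dict[int, int], now_ms: int,
               max_lag_ms: int, leader: int) -> set[int]:
    """Keep the leader plus every follower that caught up within max_lag_ms."""
    return {
        broker for broker in isr
        if broker == leader or now_ms - last_caught_up_ms[broker] <= max_lag_ms
    }


# Broker 1 leads; broker 3 is lagging (log end offset 7 vs. 10).
log_end_offsets = {1: 10, 2: 10, 3: 7}
isr = {1, 2, 3}
print(high_watermark(log_end_offsets, isr))  # 7, so offsets 0..6 are committed

isr = shrink_isr(isr, last_caught_up_ms={1: 40_000, 2: 39_500, 3: 5_000},
                 now_ms=40_000, max_lag_ms=30_000, leader=1)
print(sorted(isr))                           # [1, 2]
print(high_watermark(log_end_offsets, isr))  # 10
```

Shrinking the ISR lets the high watermark advance past offsets the slow replica has not copied yet, which is why `acks=all` producers are usually paired with the `min.insync.replicas` setting: writes are rejected when the ISR drops below that size.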
