Let's discuss the common terms used in Kafka and their roles in a distributed architecture.
Producer
- An application that sends messages to Kafka
Message
- A small- to medium-sized piece of data
Consumer
- An application that reads (consumes) messages from Kafka
Topic
- A topic is a named, logical stream of messages. Kafka topics are divided into several partitions. While the topic is a logical concept, a partition is the smallest storage unit and holds a subset of the topic's records. Each partition is an ordered, append-only log: records are only ever written to its end.
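To make the topic/partition split concrete, here is a minimal in-memory sketch in Python. It is not Kafka's storage code: the hypothetical `Topic` class simply holds a fixed number of `Partition` objects, and each partition is a plain list that only ever grows at the end. The index returned by `append` is the record's position in that partition's log.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Partition:
    """An append-only log: records are only ever added to the end."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        # The new record's position in this partition's log.
        self.records.append(record)
        return len(self.records) - 1

    def read(self, position: int) -> Any:
        return self.records[position]


@dataclass
class Topic:
    """A logical name that groups several independent partitions."""
    name: str
    num_partitions: int
    partitions: List[Partition] = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [Partition() for _ in range(self.num_partitions)]


orders = Topic("orders", num_partitions=3)
print(orders.partitions[0].append("order-1"))  # 0
print(orders.partitions[0].append("order-2"))  # 1
print(orders.partitions[1].append("order-3"))  # 0: partition 1 is a separate log
```

Because each partition is its own log, positions restart at 0 in every partition. Real Kafka also persists each partition's log to disk as a series of segment files, which this sketch ignores.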
Offset
- A sequential ID assigned to each message as it arrives in a partition; offsets are unique only within that partition
Globally unique identifier of a message?
- Topic name -> partition number -> offset
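Here is a small sketch of why all three parts are needed, using a hypothetical `MessageId` type. Offsets restart in every partition and partition numbers repeat across topics, so neither alone identifies a message, but the full (topic, partition, offset) triple does.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Topic name + partition number + offset pins down exactly one message."""
    topic: str
    partition: int
    offset: int


# Offsets are only unique *within* a partition: all three messages sit at offset 0.
a = MessageId("orders", partition=0, offset=0)
b = MessageId("orders", partition=1, offset=0)
c = MessageId("payments", partition=0, offset=0)

print(a == b)                          # False
print(len({a, b, c}))                  # 3: all three coordinates are needed
print(a == MessageId("orders", 0, 0))  # True
```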
Consumer Group
Can multiple Kafka consumers read the same message from a partition?
- It depends on the group ID. Suppose you have a topic with 12 partitions. If 2 Kafka consumers share the same group ID, each reads 6 partitions, meaning they read different sets of partitions and therefore different sets of messages. If 4 Kafka consumers share the same group ID, each reads 3 different partitions, and so on.
- When the group IDs differ, the situation changes. If two Kafka consumers have different group IDs, each reads all 12 partitions without interfering with the other, meaning both consumers independently read exactly the same set of messages. If four Kafka consumers all have different group IDs, each of them reads all partitions, and so on.
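The sketch below simulates this with a simple round-robin assignment. The function names are hypothetical, and real Kafka computes the assignment during a group rebalance using a configurable assignment strategy, so treat this as an illustration of the outcome rather than of Kafka's algorithm. In a real client, the group is chosen with the `group.id` consumer property.

```python
from typing import Dict, List


def assign_partitions(consumers: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Hand out partitions round-robin to the members of ONE consumer group.

    Every partition goes to exactly one member, so members never share a partition.
    """
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def assign_groups(groups: Dict[str, List[str]], num_partitions: int) -> Dict[str, Dict[str, List[int]]]:
    """Groups are assigned independently, so each group covers every partition."""
    return {g: assign_partitions(members, num_partitions) for g, members in groups.items()}


# Same group id: the 12 partitions are split among the members.
two = assign_partitions(["c1", "c2"], 12)
print({c: len(ps) for c, ps in two.items()})   # {'c1': 6, 'c2': 6}

four = assign_partitions(["c1", "c2", "c3", "c4"], 12)
print({c: len(ps) for c, ps in four.items()})  # {'c1': 3, 'c2': 3, 'c3': 3, 'c4': 3}

# Different group ids: each group independently gets all 12 partitions.
separate = assign_groups({"group-a": ["c1"], "group-b": ["c2"]}, 12)
covered = {g: sum(len(ps) for ps in a.values()) for g, a in separate.items()}
print(covered)  # {'group-a': 12, 'group-b': 12}
```

If a group has more consumers than the topic has partitions, the extra consumers get no partitions and sit idle; the sketch reproduces this, since 3 consumers and 2 partitions leave one consumer with an empty list.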
Within the same group: NO
- Two consumers (Consumer 1 and 2) in the same group (Group 1) CANNOT consume the same message from a partition (Partition 0), because each partition is assigned to only one consumer in a group at a time.
Across different groups: YES
- Two consumers in different groups (Consumer 1 from Group 1 and Consumer 1 from Group 2) CAN consume the same message from a partition (Partition 0), because each group tracks its own position in the partition.
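Why does the group matter? Each group keeps its own record of how far it has read in every partition; real Kafka stores these committed offsets per group in its internal `__consumer_offsets` topic. The minimal sketch below, with a hypothetical `poll` function and an in-memory log, models only that per-group position and ignores rebalancing and delivery guarantees.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple

# Hypothetical in-memory log: (topic, partition) -> messages in offset order.
log: Dict[Tuple[str, int], List[str]] = {("orders", 0): ["msg-0", "msg-1"]}

# Each consumer group keeps its OWN next offset per (topic, partition).
positions: DefaultDict[Tuple[str, str, int], int] = defaultdict(int)


def poll(group_id: str, topic: str, partition: int) -> Optional[str]:
    """Deliver the next message this group has not yet consumed, or None."""
    key = (group_id, topic, partition)
    messages = log[(topic, partition)]
    offset = positions[key]
    if offset >= len(messages):
        return None
    positions[key] = offset + 1  # only this group's position moves
    return messages[offset]


# Group 1's position is shared by all its members, so msg-0 is delivered once:
# whichever member reads Partition 0 next gets msg-1, never msg-0 again.
print(poll("group-1", "orders", 0))  # msg-0
print(poll("group-1", "orders", 0))  # msg-1

# Group 2 has its own position, so it receives msg-0 as well.
print(poll("group-2", "orders", 0))  # msg-0
```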