Apache Kafka
What is Apache Kafka and what are its use cases?
Kafka is an event streaming platform for building real-time data pipelines and streaming applications.
Use cases:
- Real-time analytics (website activity tracking): user click events are sent as logs to the Kafka cluster and consumed for analytics; the producer here can be a Spark, Apache Beam, or Java application.
- Log aggregation and monitoring: processing millions of log messages per second.
- Stream processing: applications that consume and transform message streams from Kafka topics.
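As a sketch of the website-activity use case, the snippet below builds a click event in the shape a producer would send; the topic name `page-views` and the event fields are illustrative assumptions, not part of the original notes.

```python
import json
import time

def build_click_event(user_id: str, page: str) -> tuple[str, bytes]:
    """Build a (key, value) pair for a click-tracking topic.

    Keying by user id means all events from one user land in the
    same partition, preserving per-user ordering. Field names are
    illustrative, not a fixed schema.
    """
    event = {"user_id": user_id, "page": page, "ts": int(time.time())}
    return user_id, json.dumps(event).encode("utf-8")

key, value = build_click_event("u42", "/pricing")
# With a real client this pair would be sent to a broker, e.g.
# (kafka-python style, broker address is a placeholder):
#   producer.send("page-views", key=key.encode(), value=value)
```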
Key components in Kafka
- Topics
- Partitions
- Brokers
- Producers
- Consumers
- Consumer groups
- ZooKeeper
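A minimal in-memory sketch (not the real client API) of how these components relate: a topic split into partitions, key-based routing, and a consumer group in which each partition is read by exactly one consumer. Real Kafka adds brokers, offsets, replication, and rebalancing on top of this.

```python
import zlib

class Topic:
    """Toy topic: a name plus a fixed list of partitions (lists)."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> int:
        # Stable key hash: the same key always lands in the same
        # partition, which preserves per-key ordering.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

def assign(num_partitions: int, num_consumers: int) -> dict:
    # Round-robin partition -> consumer assignment, a stand-in for
    # the group coordinator's rebalance protocol.
    return {p: p % num_consumers for p in range(num_partitions)}

topic = Topic("clicks", num_partitions=3)
p1 = topic.append("user-1", "viewed /home")
p2 = topic.append("user-1", "clicked buy")   # same key -> same partition
assignment = assign(num_partitions=3, num_consumers=2)
```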
How does Kafka ensure fault tolerance?
- Replication: each partition is copied to multiple brokers.
- Leader-follower model: one replica is the leader; followers replicate from it.
- ISR (in-sync replicas): the set of replicas fully caught up with the leader.
- Acknowledgements: configurable acknowledgement levels acks=0, acks=1, acks=all.
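The three acks levels can be sketched as producer config dicts in the confluent-kafka style (the `acks` key is real client config; the broker address is a placeholder assumption):

```python
# Producer durability knobs. Higher acks = stronger durability, higher latency.

fire_and_forget = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "acks": "0",    # no broker acknowledgement: fastest, messages may be lost
}

leader_ack = {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",    # leader persisted the write; lost if leader dies before replication
}

full_isr_ack = {
    "bootstrap.servers": "localhost:9092",
    "acks": "all",  # every in-sync replica has the write: strongest durability
}
```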
How does Kafka partitioning work?
- A topic is divided into partitions.
- The producer determines the target partition: key-based (a hash of the message key) when a key is set, round-robin otherwise.
- Each partition is processed by a single consumer within a consumer group.
- A rule of thumb for the partition count is P = max(T/M, C, R), where T is the required throughput (messages or bytes per second), M is the maximum throughput a single partition can sustain (based on producer/consumer performance), C is the number of consumer instances (or threads) needed for concurrent processing, and R is the replication factor (typically greater than 1 for fault tolerance).
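The sizing rule of thumb above can be computed directly; the input numbers below are made-up examples, not benchmarks.

```python
import math

def partition_count(T: float, M: float, C: int, R: int) -> int:
    """P = max(T/M, C, R), rounded up to a whole number of partitions.

    T: required throughput (messages/s or bytes/s)
    M: max throughput one partition sustains (same unit as T)
    C: consumer instances that must read in parallel
    R: replication factor
    """
    return math.ceil(max(T / M, C, R))

# Illustrative: 50,000 msg/s required, one partition handles 10,000 msg/s,
# 4 consumers, replication factor 3 -> throughput dominates, 5 partitions.
p = partition_count(T=50_000, M=10_000, C=4, R=3)
```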
What are Kafka delivery semantics?
- Exactly-once: messages are delivered to consumers exactly once, with no duplicates and no losses. Producers use idempotent writes so no duplicate data reaches the broker, and consumers commit their offsets atomically with processing (transactions).
- At-most-once: messages are delivered at most once, meaning no retries occur. The producer sends messages to the broker without waiting for an acknowledgement, and the consumer commits the offset before processing the message, so a crash after the commit loses it. Suitable for applications that can tolerate data loss.
- At-least-once: messages are delivered at least once; on failure a retry happens, which may produce duplicates. The producer waits for an acknowledgement from the broker, and the consumer commits the offset only after processing succeeds. Example: banking transactions and other applications that cannot tolerate data loss.
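The difference between the consumer-side semantics comes down to commit ordering, which the toy simulation below illustrates (this is not real client code): committing before processing can drop a message on crash (at-most-once), while processing before committing can redeliver one (at-least-once).

```python
def consume(messages, commit_first: bool, crash_at: int):
    """Simulate a consumer that crashes at offset `crash_at`.

    Returns (processed, redelivered): what got processed before the
    crash, and what a restarted consumer would read again from the
    last committed offset.
    """
    processed, committed = [], 0
    for offset, msg in enumerate(messages):
        if commit_first:
            committed = offset + 1       # commit, then process
            if offset == crash_at:
                break                    # crash: this message is lost
            processed.append(msg)
        else:
            processed.append(msg)        # process, then commit
            if offset == crash_at:
                break                    # crash before commit: redelivered later
            committed = offset + 1
    redelivered = messages[committed:]   # restart resumes from `committed`
    return processed, redelivered

msgs = ["a", "b", "c"]
amo, amo_redo = consume(msgs, commit_first=True, crash_at=1)   # "b" lost
alo, alo_redo = consume(msgs, commit_first=False, crash_at=1)  # "b" duplicated
```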