In this chapter we'll dive a little deeper into Kafka, a distributed message broker.
```
git clone https://github.com/haf/vagrant-kafka.git
cd vagrant-kafka
vagrant up
```
Each node runs a ZooKeeper start script, parameterised by its node number:

```
#!/bin/bash
# create myid file. see http://zookeeper.apache.org/doc/r3.1.1/zookeeperAdmin.html#sc_zkMulitServerSetup
if [ ! -d /tmp/zookeeper ]; then
  echo "creating zookeeper data dir..."
  mkdir /tmp/zookeeper
  echo "$1" > /tmp/zookeeper/myid
fi
$HOME/kafka_2.11-0.10.1.1/bin/zookeeper-server-start.sh /vagrant/config/zookeeper.properties > /tmp/zookeeper.log &
```
and a Kafka broker start script, which picks the per-node server properties file:

```
#!/bin/bash
$HOME/kafka_2.11-0.10.1.1/bin/kafka-server-start.sh "/vagrant/config/server$1.properties" &
```
Note how the Vagrantfile assigns specific IPs to each node.
Kafka comes with system tools that support operating the broker.
- kafka-acls – "Principal P is [Allowed/Denied] Operation O From Host H On Resource R" – KIP-11.
- kafka-replay-log-producer – Consume from one topic and replay those messages and produce to another topic.
- kafka-configs – Add/remove entity config for a topic, client, user, or broker, e.g. a topic's `delete.retention.ms` configuration.
- kafka-replica-verification – Validate that all replicas for a set of topics have the same data. Runs continuously. Also see Replication Tools.
- kafka-console-consumer – The console consumer is a tool that reads data from Kafka and outputs it to standard output.
- kafka-run-class – Used to invoke "classes" from the `kafka.tools` namespace. Most of these tools boil down to a "class call" like this.
- kafka-console-producer – A console producer. Call with `--broker-list` and `--topic`.
- kafka-server-start – Used by the systemd units.
- kafka-server-stop – Used by the systemd units.
- kafka-simple-consumer-shell – A low-level tool for fetching data directly from a particular replica.
- kafka-consumer-perf-test – A tool to check consumer performance of your cluster. Use together with a test setup that produces load.
- kafka-streams-application-reset – A tool that resets the position a stream processing node has.
- kafka-mirror-maker – Continuously copy data between two Kafka clusters.
- kafka-topics – Create, delete, describe, or change a topic. Can also set configuration for topics, like the above mentioned retention policy.
- kafka-preferred-replica-election – A tool that causes leadership for each partition to be transferred back to the 'preferred replica', it can be used to balance leadership among the servers.
- kafka-verifiable-consumer – consumes messages from a specific topic and emits consumer events (e.g. group rebalances, received messages, and offsets committed) as JSON objects to STDOUT.
- kafka-producer-perf-test – A tool to verify producer performance with.
- kafka-verifiable-producer – A tool that produces increasing integers to the specified topic and prints JSON metadata to stdout on each "send" request, making externally visible which messages have been acked and which have not.
- kafka-reassign-partitions – This command moves topic partitions between replicas.
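To see the "class call" pattern in practice, here is a hedged sketch (topic name and the ZooKeeper address are assumptions for the vagrant setup); the wrapper script and the explicit `kafka-run-class` invocation are equivalent:

```shell
# The console consumer wrapper...
$HOME/kafka_2.11-0.10.1.1/bin/kafka-console-consumer.sh \
  --zookeeper localhost:2181 --topic test-topic --from-beginning
# ...boils down to a "class call" on kafka.tools.ConsoleConsumer:
$HOME/kafka_2.11-0.10.1.1/bin/kafka-run-class.sh kafka.tools.ConsoleConsumer \
  --zookeeper localhost:2181 --topic test-topic --from-beginning
```

Both commands assume a running cluster, so run them from inside one of the vagrant nodes.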
Resize the number of partitions used by the F# producer-consumer example. Ensure the consumer can read from them all (and that the producer produces to them all).
- Can we make the number of partitions even larger?
- What downsides are there with resizing an existing topic?
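One possible way to do the resize with `kafka-topics` (the topic name and ZooKeeper address are assumptions; note that partition counts can only be increased, and keyed messages will hash to different partitions afterwards):

```shell
# Grow the topic to 6 partitions, then inspect the result.
$HOME/kafka_2.11-0.10.1.1/bin/kafka-topics.sh --zookeeper localhost:2181 \
  --alter --topic test-topic --partitions 6
$HOME/kafka_2.11-0.10.1.1/bin/kafka-topics.sh --zookeeper localhost:2181 \
  --describe --topic test-topic
```

Requires the vagrant cluster to be up; run from inside a node.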
Here at WebScale™ Technologies Inc., we want horizontal scale-out. Make it so that you have two consumers that share the incoming messages.
- How many consumers can we have competing?
- Can we have competing consumers with only a single partition?
- Which Kafka tool did you use for this?
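A sketch with two competing console consumers (the group id, topic name, and broker address are assumptions): both join the same consumer group, so the topic's partitions are split between them. Run each in its own terminal on a node:

```shell
# Terminal 1: first member of the group.
$HOME/kafka_2.11-0.10.1.1/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic test-topic \
  --consumer-property group.id=webscale
# Terminal 2: second member; triggers a rebalance that splits the partitions.
$HOME/kafka_2.11-0.10.1.1/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic test-topic \
  --consumer-property group.id=webscale
```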
Make your producer produce 1000 unique messages per second. Ensure your consumer group successfully reads these messages.
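A minimal shell sketch of the producing side, assuming the vagrant cluster and a hypothetical `test-topic`: a timestamp prefix keeps each second's batch of 1000 messages unique across batches.

```shell
# Generate one batch of 1000 unique messages ("<epoch>-<n>").
gen_batch() {
  seq 1000 | awk -v ts="$(date +%s)" '{ print ts "-" $1 }'
}

# With the cluster up, loop one batch per second into the console producer
# (broker address is an assumption):
# while true; do gen_batch; sleep 1; done |
#   $HOME/kafka_2.11-0.10.1.1/bin/kafka-console-producer.sh \
#     --broker-list localhost:9092 --topic test-topic

gen_batch | sort -u | wc -l   # 1000 unique lines per batch
```

The `sleep 1` loop is only approximately 1000 messages per second; a real load generator would pace itself against the clock.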
Kill one Kafka node. Your message rate should not drop.
Start the node again. What happens?
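To observe what the kill and restart do, you can watch leadership and the in-sync replica set (topic name and ZooKeeper address are assumptions):

```shell
# Shows Leader, Replicas and Isr per partition; run before the kill,
# while the node is down, and again after it rejoins.
$HOME/kafka_2.11-0.10.1.1/bin/kafka-topics.sh --zookeeper localhost:2181 \
  --describe --topic test-topic
# If leadership stays skewed after the restart, rebalance it:
$HOME/kafka_2.11-0.10.1.1/bin/kafka-preferred-replica-election.sh \
  --zookeeper localhost:2181
```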
Read Jepsen: Kafka and let's discuss.
- How many ISRs are needed to be 'safe'?
- What are the wake-me-up-at-night boundaries we should alert on?
- Under what circumstances would an unclean leader election be OK for your system?
- Make a Logary metric that gives the rate of sending.
- Extend your metric to log an alert when it drops below 800 messages per second.
What would it take to have idempotent producers?
Alternatives:
- Save a unique message id per message and store all ids in a B-tree.
- Number producers and number messages; keep a high-water-mark per producer under which all received messages are dropped. Store the producer id and high-water-mark either per topic-partition, per topic, or per producer.
- Number producers and assign each topic-partition to a single producer to produce into, similar to what consumer groups do with topic-partitions.
Discuss the pros and cons of each.
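The high-water-mark alternative can be sketched in a few lines of shell (input format and producer ids are hypothetical): each line carries a producer id and a sequence number, and a message is kept only if its sequence number is above the highest one seen so far for that producer.

```shell
# Input lines: "<producer-id> <seq-no> <payload>".
# awk keeps a per-producer high-water-mark and drops anything at or below it.
printf '%s\n' 'p1 1 a' 'p1 2 b' 'p1 2 b' 'p1 3 c' 'p2 1 x' 'p1 2 dup' |
  awk '{ if ($2 > hwm[$1]) { hwm[$1] = $2; print } }'
# → keeps: p1 1 a / p1 2 b / p1 3 c / p2 1 x
```

Note the trade-off this makes: duplicates are only detected if producers emit monotonically increasing sequence numbers, which is exactly why the state must be kept per producer.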
Let's make a mirror of your Kafka cluster to a single machine separate cluster.
- Is a mirror Kafka setup identical in all ways that matter?
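A hedged sketch of running `kafka-mirror-maker` for this (the config file names and the whitelist pattern are assumptions): the consumer config points at the source cluster, the producer config at the mirror.

```shell
# Continuously copy all topics matching the whitelist from the source
# cluster (consumer.config) into the mirror cluster (producer.config).
$HOME/kafka_2.11-0.10.1.1/bin/kafka-mirror-maker.sh \
  --consumer.config /vagrant/config/source-consumer.properties \
  --producer.config /vagrant/config/mirror-producer.properties \
  --whitelist 'test-.*'
```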
`dc1` is experiencing a netsplit from your country. You can still SSH via AWS in London. `dc2` has a mirror Kafka. What steps do you take to fail over? Can you do it without disrupting reads? Without disrupting writes?