The Forever Log: Why Tiered Storage Changes the Math (and the Risks)

Tiered Storage turns Kafka into a system of record with infinite retention. But with great power comes a new set of nightmares. Anatoly Zelenin and Bryan De Smaele discuss why replication is not backup and why a single deleted topic can wipe out terabytes of data.

This interview is an edited adaptation of the conversation on Cymo’s EDA Podcast. Please note that this is not a verbatim transcript; it has been condensed and structured to convey the most important architectural insights.

Bryan De Smaele (Cymo): Welcome, Anatoly. We often talk about Kafka as a tool for agility, but today I want to dig into the operator’s view. You work heavily with the finance sector. In my experience, banks were actually some of the early adopters of Kafka outside of the tech world. What drove that early adoption?

Anatoly Zelenin (DataFlow Academy): Thanks, Bryan. You are right. Banks were essentially forced to adopt it early because of a simple math problem.

They have these massive legacy core systems. Mainframes or giant Oracle monoliths. The problem is, every time you query a mainframe, IBM charges you MIPS. And on the Oracle side, those databases are often already running at capacity. If you hook a modern mobile app directly to them, you will either bankrupt yourself with MIPS charges or crash the database completely.

So, the driver was offloading. We extract data once from the core system into Kafka, and from there, we can fan it out to applications almost for free. But once that data is in Kafka, the question immediately becomes: Why delete it?

The Financial Shift: Tiered Storage

Bryan: That brings us to Tiered Storage. Before this, keeping data forever in Kafka was technically possible but financially insane.

About Bryan De Smaele: Bryan De Smaele is a technology entrepreneur and architect specializing in Event-Driven Architecture and Apache Kafka. As Co-Founder of Cymo & Kannika, he designs scalable data streaming solutions for complex digital transformations. A frequent international speaker and community organizer, Bryan bridges the gap between sophisticated software architecture and real-world business value.

Anatoly: Exactly. Before Tiered Storage, Kafka was an expensive, fast disk. Because storage and compute were coupled, if you wanted to store 100 TB of history, you had to buy the servers to support that storage, even if you didn’t need the CPU power.

Tiered Storage fundamentally changes the financial math. It decouples compute from storage. You keep your "hot" data (the last few hours or days) on expensive SSDs on the broker. Everything else, months or years of history, gets offloaded to S3 or an object store.
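As a rough sketch of what that split looks like in configuration (assuming Kafka 3.6+ with a remote log storage plugin, such as an S3 plugin, installed on the brokers; the values are illustrative, not recommendations):

```properties
# Broker prerequisite (server.properties): tiered storage must be
# enabled cluster-wide, with a remote storage plugin configured.
remote.log.storage.system.enable=true

# Topic-level settings: retain data "forever" in the log overall,
# but keep only a short hot window on the brokers' local disks.
remote.storage.enable=true
retention.ms=-1              # total retention: infinite
local.retention.ms=86400000  # ~1 day stays hot on broker SSDs
```

Everything older than the local retention window is served transparently from the object store when a consumer reads back that far.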


Before Tiered Storage, Kafka was a very expensive storage system… With Tiered Storage, you don’t pay the premium for high-performance SSDs, but just object storage prices.

Anatoly Zelenin
Founder, DataFlow Academy

Bryan: But I often hear people ask: "Why should I keep it all in Kafka? I can simply load it into my data platform and use it from there."

Anatoly: That is the common counter-argument. But Kafka can function as the system of record for both operational and analytical data, giving you a single point of access for old and new data alike.

This enables the Forever Log. Suddenly, Kafka isn’t just a pipe. It becomes a system of record. But this is where more challenges start.

The Backup Paradox: Replication ≠ Backup

Bryan: If Kafka is becoming a system of record, the reliability standards change. But many teams still operate under the assumption that "Kafka doesn’t need backups because it has replication."


It’s the constant story of High Availability against actual Disaster Recovery.

Bryan De Smaele
Co-Founder, Cymo

Anatoly: That is a dangerous fallacy. Saying "I have Replication Factor 3, so I don’t need backups" is like saying "I have RAID 5, so I don’t need backups." We learned in the 90s that RAID is not a backup.

Replication provides high availability. It protects you if a server crashes. It does not protect you from human error.

If someone runs a script that deletes a topic in Production instead of Dev, replication just ensures that the data is deleted from all three brokers instantly.

About Cymo: Cymo is a technology consultancy specializing in Event-Driven Architecture (EDA) and real-time data streaming. Based in Belgium, Cymo is also the creator of Kannika, the first production-grade backup solution for Apache Kafka. Cymo doesn’t just build streaming platforms – they ensure that enterprise data remains resilient and recoverable, no matter the scale.

With Tiered Storage, your backup strategy actually gets more manageable, but you have to be deliberate, and you should be aware that raw backups can be inflexible:

  • Cold Data: This is easy. It’s already in S3. You can use standard object-store replication to secure it.

  • Hot Data: This is your risk window. Data sits on the broker until the segment rolls over and is uploaded to S3. You are still vulnerable during that window.

  • Selective restores: Putting data back onto the exact topic it came from is manageable, but pinpointing specific records to restore is hard.
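One lever for shrinking the hot-data risk window is how often segments roll, since only closed segments are eligible for upload to the object store. A topic-level sketch (the value is illustrative, and small segments carry their own overhead in file handles and index memory):

```properties
# Roll the active segment at least every 15 minutes, so at most
# ~15 minutes of data sits in the not-yet-uploadable active segment.
segment.ms=900000
```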

The Hidden Trap: Schema IDs and Disaster Recovery

Bryan: Let’s talk about the metadata. You mentioned that even if you back up the logs, you might still be unable to read them because of the Schema Registry.

Anatoly: This is the specific nightmare that keeps me up at night. The Schema Registry assigns an ID to every schema. That ID is embedded in the Kafka message. The problem is that the ID has no semantic meaning. It’s just a counter.

Bryan: And what happens if that mapping is lost?

Anatoly: I can give you a real example. I had a customer where the _schemas topic was removed by accident.

It was a catastrophe. The data was still there. Terabytes of it sitting in the topics. But because the _schemas topic was gone, they lost the mapping. They had messages saying "I am Schema ID 5," but they had no idea what "ID 5" looked like.
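That coupling comes from the Confluent wire format: every serialized message starts with a magic byte (0x00) followed by the 4-byte big-endian schema ID, then the actual payload. A minimal sketch of pulling that ID out of a raw message value:

```python
import struct

def extract_schema_id(message_value: bytes) -> int:
    """Read the schema ID from a Confluent-framed Kafka message.

    Wire format: 1 magic byte (0x00) + 4-byte big-endian schema ID,
    followed by the Avro/Protobuf/JSON payload.
    """
    if len(message_value) < 5 or message_value[0] != 0:
        raise ValueError("not a Schema Registry framed message")
    (schema_id,) = struct.unpack(">I", message_value[1:5])
    return schema_id

# A message saying "I am schema ID 5". Without the _schemas topic,
# the ID is all you can recover -- the schema itself is gone.
raw = b"\x00" + struct.pack(">I", 5) + b"\x02\x08Anna"
print(extract_schema_id(raw))  # 5
```

The ID tells you nothing about the shape of the payload; the mapping from ID to schema lives only in the Registry, backed by the _schemas topic.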


The Schema IDs are just numbers that are incremented. There is no semantic meaning… If you lose your schema topic, you have little chance to ever restore the information.

Anatoly Zelenin
Founder, DataFlow Academy

Bryan: That is terrifying.

Anatoly: It is. You have to treat your _schemas topic as a very critical piece of data. If you are doing Disaster Recovery, you cannot just spin up a new Registry and expect it to work. The IDs will not match (ID 5 in the old cluster might be ID 1 in the new one).

If you don’t back up and restore that specific topic byte for byte, your terabytes of data might become binary garbage.
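A toy illustration of why the IDs drift (this is not the real Registry API, just a stand-in for its counter-based ID assignment): a fresh registry hands out IDs in registration order, so the same schema registered at a different point in history gets a different number.

```python
class ToyRegistry:
    """Minimal stand-in for a Schema Registry's ID assignment."""

    def __init__(self):
        self._next_id = 1
        self._ids = {}

    def register(self, schema: str) -> int:
        # IDs are just an incrementing counter -- no semantic meaning.
        if schema not in self._ids:
            self._ids[schema] = self._next_id
            self._next_id += 1
        return self._ids[schema]

# The old cluster registered five schemas over its lifetime.
old = ToyRegistry()
for s in ["orders-v1", "payments-v1", "customers-v1",
          "orders-v2", "customers-v2"]:
    old.register(s)
old_id = old.register("customers-v2")  # ID 5 in the old cluster

# A fresh DR registry sees that schema first.
new = ToyRegistry()
new_id = new.register("customers-v2")  # ID 1 in the new cluster

print(old_id, new_id)  # 5 1
```

Messages written against the old cluster carry ID 5 forever, so a rebuilt registry that calls the same schema ID 1 cannot decode them.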

Conclusion

Bryan: So, Tiered Storage opens the door to the forever-log, but operational maturity has to catch up?

Anatoly: Exactly. We are moving from "moving data" to "keeping data."

  1. Tiered Storage makes the economics work.

  2. Backups are mandatory if you care about your historical data.

  3. Schema Disaster Recovery is the hidden trap you need to plan for before the fire starts.

Bryan: Thanks for the reality check, Anatoly.

Anatoly: Anytime. See you at the next Kafka Summit. Or on LinkedIn (Bryan and Anatoly).

About Anatoly Zelenin
Hi, I’m Anatoly! I love to spark that twinkle in people’s eyes. As an Apache Kafka expert and book author, I’ve been bringing IT to life for over a decade—with passion instead of boredom, with real experiences instead of endless slides.
