The Best Thing About Apache Kafka? It's Boring!

How a system that simply works helps the German media landscape count millions of readers daily. An interview with Felix Sponholz.

By Anatoly Zelenin

You work for a company that virtually everyone in Germany indirectly deals with, but that hardly anyone really knows. What does INFOnline do?

We are a company from Bonn and we provide the online equivalent of audience measurement for television. We do this for websites, apps, and other portals in the German-speaking region. Our goal is to establish comparability in the online market.

So you’re something like Google Analytics?

We are absolutely not like Google Analytics. While we also offer reach measurement to companies, we adhere to all data protection regulations in Germany and Europe.

How do you measure reach in a data privacy-compliant way?

Without going into too much detail: We provide sensors, such as counting pixels, for our customers' websites. These report to us immediately whenever there is a request on the page, that is, whenever a user interacts with it. So we don't know who is browsing, but we do know for certain that a person is moving around on the website.
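As an illustration only, a counting-pixel endpoint can be sketched in a few lines: it serves a 1x1 transparent GIF and records nothing beyond the fact that a request happened. This is a hypothetical sketch, not INFOnline's actual sensor, and it omits everything a production system would need (TLS, consent handling, forwarding the event into a data pipeline):

```python
# Hypothetical counting-pixel endpoint (illustration only, not INFOnline's sensor).
# Serves a 1x1 transparent GIF and counts the request - nothing about who made it.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Smallest valid transparent 1x1 GIF (43 bytes).
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!"
         b"\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")

hits = 0  # in a real system, each hit would be produced to a pipeline instead

class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global hits
        hits += 1  # count the interaction; no cookies, no user identity
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

# To actually serve it:
# HTTPServer(("", 8080), PixelHandler).serve_forever()
```

The website embeds the pixel as an ordinary image tag, so every page view triggers one GET request to the endpoint.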


I have fun with Kafka. It just works - and I love things that just work.

Felix Sponholz
Software Developer, INFOnline GmbH

What do you need Kafka for?

We process very large amounts of data. We need to capture, process, expand, and adapt every interaction a user makes on a website. With Kafka, we can manage these data streams in real-time.

What kind of data volumes are we talking about?

About 15,000 data points per second. Everyone can calculate for themselves how many that is per hour and per day.
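The calculation the interviewee leaves to the reader works out like this:

```python
# Back-of-the-envelope calculation for the data volume mentioned above.
RATE_PER_SECOND = 15_000

per_hour = RATE_PER_SECOND * 60 * 60  # 54 million data points per hour
per_day = per_hour * 24               # roughly 1.3 billion data points per day

print(f"{per_hour:,} per hour")  # 54,000,000 per hour
print(f"{per_day:,} per day")    # 1,296,000,000 per day
```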

Felix Sponholz is an experienced software developer at INFOnline GmbH and is co-responsible for the operation and further development of ETL pipelines used for analysis and reporting of a significant part of the German-speaking online media landscape. His special expertise lies in the development of scalable data processing systems that enable efficient collection and analysis of large amounts of data.

When you process so much data, which insights do you gain?

Even though linear television has lost some of its importance, we see that 8:15 PM, the traditional prime-time slot, is still an ingrained habit. People go online heavily at that time. They inform themselves after work, look for inspiration for a movie night, or check what the weather will be like tomorrow. From the data, we can see that at 8:15 PM, Germans are still getting cozy.

To be more specific: Why do you use Kafka?

This may sound relatively simple, but it’s important: Kafka doesn’t give us headaches. Kafka works. We can safely bring large amounts of data at high speed from the request through the ETL process to our customers' dashboards. Kafka is the most robust system in our data pipeline.

This might be surprising: online, it's often said that Kafka is quite complex?

We also perceived it that way during our initial research. What we can say is: Kafka may be relatively complex in the setup, especially when integrating it into a large IT infrastructure. However, if this is done carefully, Kafka doesn’t cause problems down the line. You just have to be confident about the initial effort. We do have an advantage, though: We operate in a very static framework. If things were more volatile, if more spontaneous additions were needed, Kafka might be a bit too complex for some. In our area of application, however, Kafka is exactly right.

Why is real-time data so important for you?

There are two main reasons: First, it’s cool for our customers when they can see live in the dashboard what’s happening on their platforms. There’s movement in it, it feels lively. Second, I mentioned the data volumes. In our processing workflow, the information is aggregated at multiple stages. 15,000 data points per second are already a challenge for other systems. A lot happens simultaneously. But real-time processing creates a consistent data flow. If we could only process batches of data every five minutes, each chunk would be much bigger. Instead of a constant 15,000 data points per second, some 4.5 million would suddenly flow into the systems at once, often overloading them. Kafka accelerates and simplifies our process.
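The arithmetic behind that comparison is simple (the five-minute window stands for the hypothetical batch interval mentioned in the answer):

```python
# Streaming vs. five-minute batches, using the rate from the interview.
RATE_PER_SECOND = 15_000   # data points per second
BATCH_WINDOW = 5 * 60      # hypothetical batch interval, in seconds

batch_size = RATE_PER_SECOND * BATCH_WINDOW
print(f"{batch_size:,}")   # 4,500,000 data points arriving as one burst
```

A steady stream of 15,000 events per second is easier on every downstream system than a burst of 4.5 million events every five minutes.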

What’s the worst case in terms of real-time processing?

With Kafka, we’re talking about a 5 to 6 second delay. That’s a completely different value than in some of our other systems, where you can expect 10 minutes or more of delay.

What’s Kafka’s standing in the software landscape?

I can only repeat: We don’t have headaches with Kafka. Other consultants and software vendors have promised us the same - and they didn’t keep their word. Therefore, we were initially skeptical about Kafka (and still scarred), but unlike some other systems, it hasn’t disappointed us.

Couldn’t you then implement more or even all processes with Kafka?

We would like to use it in more places, but the migration is complex and not every customer pays for that. So our approach is to build new solutions on Kafka whenever possible - and modernize our existing systems gradually.

What’s actually more difficult: receiving or processing data?

In the overall context? Receiving! For us, this is due to data protection and the ad blockers being used. With Kafka itself, both are pain-free.

It almost sounds like: The biggest advantage of Kafka is that it’s boring.

I have fun with Kafka, I don’t find it boring. It just works - and I love things that just work.

INFOnline GmbH is the leading provider for Digital Audience Measurement in Germany. As the central contact for the online industry, the company offers standardized, data protection-compliant usage measurement of websites according to IVW guidelines. With 35 employees, INFOnline combines high expertise and customer orientation to provide reliable, GDPR & TTDSG-compliant performance values for the online market.

What feature would you wish for in Kafka?

For our use case, the solution is nearly perfect. But for working with the system, I would wish for a better user interface. There’s still room for improvement here.

We have been working together for a while now, so I’d like to ask for feedback: What do you gain from our collaboration?

In the end, you took away the fear we had read about online. The fundamentals training was important so we could use Kafka efficiently and not accidentally make the complex setup even more complex. Equally relevant: You were there for us afterward, helping us when there was a problem. The collaboration has made Kafka’s setup manageable for us.

Anatoly Zelenin teaches Apache Kafka to hundreds of participants in interactive training sessions. His clients from the DAX environment and German mid-sized companies have valued his expertise and inspiring approach for over a decade. In addition to being an IT consultant and trainer, he also explores our planet as an adventurer.