The Best Thing About Apache Kafka? It's Boring!

How a system that simply works helps the German media landscape count millions of readers daily. An interview with Felix Sponholz.

By Anatoly Zelenin

You work for a company that virtually everyone in Germany indirectly deals with, but that hardly anyone really knows. What does INFOnline do?

We are a company from Bonn and we provide the online equivalent of audience measurement for television. We do this for websites, apps, and other portals in the German-speaking region. Our goal is to establish comparability in the online market.

So you’re something like Google Analytics?

We are absolutely not like Google Analytics. While we also offer reach measurement to companies, we adhere to all data protection regulations in Germany and Europe.

How do you measure reach in a data privacy-compliant way?

Without going into too much detail: We provide sensors, such as counting pixels, for our customers' websites. These report to us immediately whenever there is a request on the page, that is, whenever a user interacts with it. So we don't know who is browsing, but we do know for certain that a person is moving around on the website.
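As an illustration only, a counting-pixel endpoint can be sketched in a few lines: it serves a 1x1 transparent GIF and records nothing beyond the fact that a request happened. This is a hypothetical sketch, not INFOnline's actual sensor, and it omits everything a production system would need (TLS, consent handling, forwarding the event into a data pipeline):

```python
# Hypothetical counting-pixel endpoint (illustration only, not INFOnline's sensor).
# Serves a 1x1 transparent GIF and counts the request - nothing about who made it.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Smallest valid transparent 1x1 GIF (43 bytes).
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!"
         b"\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")

hits = 0  # in a real system, each hit would be produced to a pipeline instead

class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global hits
        hits += 1  # count the interaction; no cookies, no user identity
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

# To actually serve it:
# HTTPServer(("", 8080), PixelHandler).serve_forever()
```

The website embeds the pixel as an ordinary image tag, so every page view triggers one GET request to the endpoint.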


I have fun with Kafka. It just works - and I love things that just work.

Felix Sponholz
Software Developer, INFOnline GmbH

What do you need Kafka for?

We process very large amounts of data. We need to capture, process, expand, and adapt every interaction a user makes on a website. With Kafka, we can manage these data streams in real-time.

What kind of data volumes are we talking about?

About 15,000 data points per second. Everyone can calculate for themselves how many that is per hour and per day.
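The calculation the interviewee leaves to the reader works out like this:

```python
# Back-of-the-envelope calculation for the data volume mentioned above.
RATE_PER_SECOND = 15_000

per_hour = RATE_PER_SECOND * 60 * 60  # 54 million data points per hour
per_day = per_hour * 24               # roughly 1.3 billion data points per day

print(f"{per_hour:,} per hour")  # 54,000,000 per hour
print(f"{per_day:,} per day")    # 1,296,000,000 per day
```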

Felix Sponholz is an experienced software developer at INFOnline GmbH and is co-responsible for the operation and further development of ETL pipelines used for analysis and reporting of a significant part of the German-speaking online media landscape. His special expertise lies in the development of scalable data processing systems that enable efficient collection and analysis of large amounts of data.

When you process so much data, which insights do you gain?

Even though linear television has lost some of its importance, we see that 8:15 PM, the traditional prime-time slot, is still an ingrained habit. People go online heavily at that time. They inform themselves after work, look for inspiration for a movie night, or check what the weather will be like tomorrow. From the data, we can see that at 8:15 PM, Germans are still getting cozy.

To be more specific: Why do you use Kafka?

This may sound relatively simple, but it’s important: Kafka doesn’t give us headaches. Kafka works. We can safely bring large amounts of data at high speed from the request through the ETL process to our customers' dashboards. Kafka is the most robust system in our data pipeline.

This might be surprising: online, it's often said that Kafka is quite complex?

We also perceived it that way during our initial research. What we can say is: Kafka may be relatively complex in the setup, especially when integrating it into a large IT infrastructure. However, if this is done carefully, Kafka doesn’t cause problems down the line. You just have to be confident about the initial effort. We do have an advantage, though: We operate in a very static framework. If things were more volatile, if more spontaneous additions were needed, Kafka might be a bit too complex for some. In our area of application, however, Kafka is exactly right.

Why is real-time data so important for you?

There are two main reasons: First, it’s cool for our customers when they can see live in the dashboard what’s happening on their platforms. There’s movement in it, it feels lively. Second, I mentioned the data volumes. In our processing workflow, the information is aggregated at multiple stages. 15,000 data points per second are already a challenge for other systems. A lot happens simultaneously. But real-time processing creates a consistent data flow. If we could only process batches of data every five minutes, each chunk would be much bigger. Instead of a constant 15,000 data points per second, some 4.5 million would suddenly flow into the systems at once, often overloading them. Kafka accelerates and simplifies our process.
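The arithmetic behind that comparison is simple (the five-minute window stands for the hypothetical batch interval mentioned in the answer):

```python
# Streaming vs. five-minute batches, using the rate from the interview.
RATE_PER_SECOND = 15_000   # data points per second
BATCH_WINDOW = 5 * 60      # hypothetical batch interval, in seconds

batch_size = RATE_PER_SECOND * BATCH_WINDOW
print(f"{batch_size:,}")   # 4,500,000 data points arriving as one burst
```

A steady stream of 15,000 events per second is easier on every downstream system than a burst of 4.5 million events every five minutes.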

What’s the worst case in terms of real-time processing?

With Kafka, we’re talking about a 5 to 6 second delay. That’s a completely different value than in some of our other systems, where you can expect 10 minutes or more of delay.

What’s Kafka’s standing in the software landscape?

I can only repeat: We don’t have headaches with Kafka. Other consultants and software vendors have promised us the same - and they didn’t keep their word. Therefore, we were initially skeptical about Kafka (and still scarred), but unlike some other systems, it hasn’t disappointed us.

Couldn’t you then implement more or even all processes with Kafka?

We would like to use it in more places, but the migration is complex and not every customer pays for that. So our approach is to build new solutions on Kafka whenever possible - and modernize our existing systems gradually.

What’s actually more difficult: receiving or processing data?

In the overall context? Receiving! For us, this is due to data protection and the ad blockers being used. With Kafka itself, both are pain-free.

It almost sounds like: The biggest advantage of Kafka is that it’s boring.

I have fun with Kafka, I don’t find it boring. It just works - and I love things that just work.

INFOnline GmbH is the leading provider for Digital Audience Measurement in Germany. As the central contact for the online industry, the company offers standardized, data protection-compliant usage measurement of websites according to IVW guidelines. With 35 employees, INFOnline combines high expertise and customer orientation to provide reliable, GDPR & TTDSG-compliant performance values for the online market.

What feature would you wish for in Kafka?

For our use case, the solution is nearly perfect. But for working with the system, I would wish for a better user interface. There’s still room for improvement here.

We have been working together for a while now, so I’d like to ask for feedback: What do you gain from our collaboration?

In the end, you took away the fear we had read about online. The fundamentals training was important so we could use Kafka efficiently and not accidentally make the complex setup even more complex. Equally relevant: You were there for us afterward, helping us when there was a problem. The collaboration has made Kafka’s setup manageable for us.

Anatoly Zelenin teaches Apache Kafka to hundreds of participants in interactive training sessions. His clients from the DAX environment and German mid-sized companies have valued his expertise and inspiring approach for over a decade. In addition to being an IT consultant and trainer, he also explores our planet as an adventurer.