April 18, 2024


OpsVerse Engineers Discuss Their Favorite Talks from KubeCon 2024


As a company built with open-source, community, and cloud-native technologies at its core, OpsVerse loves attending (and speaking at) CNCF events. And with KubeCon + CloudNativeCon Europe ending a couple of weeks ago, we figured now’s the perfect time to ask some of our engineers which talks they enjoyed the most across KubeCon and in the parallel co-located community days. Here’s what they said:

Shivtej Narake – Founding Engineer

Architecting Growth: Scaling Tactics for Prometheus Metrics Collection – Arthur Silva Sens & Nicolas Takashi, Coralogix (YouTube)

Both of my favorite talks were part of Observability Day. I picked them because I’ve been working on very similar problems as we’ve scaled ObserveNow over the last few years.

Finding Needles in the Observability Haystack – Ivan Nečas & Katya Gordeeva, Red Hat (YouTube)

The term “Observability Intelligence” really resonated with me here, and as I have been playing around with data science libraries, Obsinthe is a tool that I’ll definitely be exploring more going forward.

Aravind N – Founding Engineer

Demystifying Argo Workflows: An Architectural Deep Dive – Darko Janjić, Pipekit & Becky Pauley (YouTube)

This is a perfect talk for beginners that intuitively explains what Argo Workflows is: the architecture it is built on, its building blocks, and the functionality it provides. If anyone is looking to explore Argo Workflows, this talk paired with the project documentation is a really good starting point.

Mastering Argo Workflows at Scale: A Practical Guide to Scalability Excellence – Tim Collins, Pipekit & Alec Stansell, Fetch Analytics (YouTube)

We use Argo Workflows frequently within OpsVerse’s DeployNow to orchestrate complex workflows in Kubernetes, and we’ve also been experimenting with it as a general-purpose workflow engine. It lets users define a sequence of tasks and the dependencies between them, which makes it possible to automate multi-step processes. I liked this talk because it addresses scaling issues head-on and provides a practical checklist for scaling Argo Workflows while keeping the pipeline resilient.

The most interesting data point from the talk for me was when they spoke about scaling Argo Workflows from a single-pod deployment to 18,000+ pods per workflow, processing more than 500 billion data records!
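To make the "tasks and dependencies" idea concrete, here is a minimal sketch in plain Python. It is not Argo's API and not how DeployNow is implemented: the task names and the `execution_waves` helper are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real workflow engine. It only shows how a dependency graph determines which steps can run in parallel and which must wait.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "build": set(),
    "unit-test": {"build"},
    "scan-image": {"build"},
    "deploy-staging": {"unit-test", "scan-image"},
    "smoke-test": {"deploy-staging"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks in the same wave have no pending
    dependencies and could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock the next one
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(pipeline), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

A workflow engine adds the hard parts on top of this ordering (scheduling pods, retries, artifact passing), which is where the scaling concerns discussed in the talk come in.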

Amogh Prakash – Founding Engineer

Dynamically Tuning Pods: Leveraging Time Series ML Models with KubeFlow – Christopher Nuland (YouTube)

Cloud cost optimization is crucial to most organizations and has been a focus area for us at OpsVerse as well. I really enjoyed this talk because it presented an interesting approach to combining traditional statistical methods with machine learning models to dynamically predict and adjust Kubernetes resource usage. Like the speaker, I also believe that machine learning methods should be used to complement traditional methods rather than replace them outright. It’s a recipe for success when building new systems today.

Navin Pai – Founding Engineer

Real-World Sampling – Lessons Learned After Reducing ~80% of Our O11y Costs – Juraci Paixão Kröhling & Alexandre Magno Prado Machado (YouTube)

Sampling is a good example of a process that sounds quite straightforward in theory, but opens up a can of worms in practice – especially as you scale (which is, ironically enough, also when sampling makes the most sense). We’ve had our fair share of (mis)adventures when building out sampling as a feature for ObserveNow, so it was fun to hear folks talk about a lot of the same roadblocks we encountered and their approaches to solving them. In some cases, we ended up with similar solutions; in others, we chose other tradeoffs. But overall, it was a really good talk for folks interested in exploring sampling and the challenges it presents.
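For readers new to the topic, the "straightforward in theory" version of sampling looks roughly like the sketch below. This is an illustrative assumption, not the approach from the talk and not how ObserveNow implements sampling: the `keep_trace` function is hypothetical, and the 20% rate is an arbitrary example value. It shows deterministic, probabilistic sampling keyed on the trace ID, so that every service makes the same keep/drop decision for a given trace.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide whether to keep a trace, deterministically.

    Hashing the trace ID (instead of rolling a random number) means every
    service that sees the same trace ID reaches the same decision, so a
    sampled trace is kept whole rather than in fragments.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    threshold = int(sample_rate * 2**64)        # keep buckets below this
    return bucket < threshold


if __name__ == "__main__":
    trace_ids = [f"trace-{n}" for n in range(10_000)]
    kept = sum(keep_trace(t, 0.2) for t in trace_ids)
    print(f"kept {kept} of {len(trace_ids)} traces")
```

The can of worms starts right after this point: a fixed rate drops rare errors as readily as routine requests, and decisions that depend on the whole trace (latency, error status) cannot be made from the ID alone.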

AI-Assisted Runbooks – Instigating Precision and Efficiency in Kubernetes Operations – Vinothini Raju (YouTube)

Since we started building out Aiden, our North Star has been to leverage GenAI for human-in-the-loop workflows that augment the information SREs and DevOps teams have access to, and then to build a DevOps copilot that learns from the environment it operates in while developing a symbiotic connection with the engineering team. I chose this talk because it discussed similar GenAI workflows in the tangential domain of ITOps using a low-code platform (Paddle). It was nice to hear how others in the industry are approaching GenAI use cases where solution verification is of utmost importance.

If you’d like to learn more about the open-source-based, fully managed, cloud-agnostic DevOps tool stacks that OpsVerse offers, take a look at ObserveNow, DeployNow, and OpsVerse One. If you want a sneak peek at the exciting future of DevOps, check out our latest GenAI-based DevOps copilot, Aiden.


Written by Navin Pai