Talks & Writing

Selected talks, presentations, and written work on technology, product management, and cloud native innovation.

Featured Writing

Azure Monitor dashboards with Grafana in Azure Portal

AKS Engineering Blog

Announcing native Grafana dashboards in Azure Portal for AKS clusters at no additional cost. This integration removes the need to maintain separate visualization tools while delivering comprehensive cluster observability with Container Insights, Prometheus, and Azure Monitor metrics out of the box.

Topics: AKS, Grafana, Observability, Azure Monitor

Announcing the CLI Agent for AKS: Agentic AI-powered operations and diagnostics at your fingertips

AKS Engineering Blog

Introducing the CLI Agent for AKS, an AI-powered command-line experience for troubleshooting, optimizing, and operating AKS clusters. Built on open-source HolmesGPT (a CNCF Sandbox project) and the AKS Model Context Protocol server, this human-in-the-loop tool brings agentic workflows directly to your terminal with a focus on security and transparency.

Topics: AI, AKS, Troubleshooting, Open Source, HolmesGPT

HolmesGPT: Agentic troubleshooting built for the cloud native era

CNCF Blog

Introduction to HolmesGPT, an open-source agentic AI framework for root cause analysis in cloud native environments. Co-authored with the Robusta.dev team, this post explores how AI agents can revolutionize Kubernetes troubleshooting through extensible toolsets, natural language prompts, and intelligent diagnostics.

Topics: Cloud Native, AI, Troubleshooting, Open Source

Talks & Videos

KubeCon EU 2024 - Azure Day: AI-assisted Observability & Troubleshooting

KubeCon EU 2024 - Azure Day

A deep dive into the latest AKS advancements for improved troubleshooting, including new AI-based features. This session covers the AKS monitoring and troubleshooting stack, common monitoring challenges, and best practices, and includes live demos of Retina for network observability. Co-presented with Pavneet Ahluwalia and Neha Aggrawal.

Topics: AKS, AI, Observability, Troubleshooting, Retina, Network Observability

Enhancing AKS Cluster Troubleshooting

YouTube

A deep dive into AKS cluster troubleshooting techniques: using node saturation metrics for performance optimization, leveraging Kubernetes events as real-time cluster signals, and fine-tuning resource allocation with cluster autoscaler metrics.

Topics: AKS, Troubleshooting, Metrics, Performance, Autoscaling