Kubernetes: Unlocking Cost Efficiency and Maximizing Cluster Potential
Kubernetes is a powerful tool that delivers incredible benefits, including improved reliability, scalability, and significant cost savings. However, despite its widespread adoption—often through managed services—many users fail to fully grasp the responsibility it places on them. While reliability is always top of mind, achieving it in a cost-effective manner is less intuitive, especially when using shared clusters managed by other teams.
The question becomes: How can organizations identify key opportunities to optimize their Kubernetes configurations and save thousands of dollars with minimal impact on users?
Here is an example from Delivery Hero, a global food delivery giant:
Increasing teams’ cost visibility has contributed to a 10 percent decrease in Delivery Hero’s cloud costs over 48 days, which is an initial step in a larger FinOps goal of reducing overall cloud costs by 30 percent.

Kubernetes: A Quick Overview of Resource Management
Before diving into optimization strategies, let’s briefly review how Kubernetes handles resources. One key area to focus on is CPU requests and memory requests.
When you create a Pod, the Kubernetes scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled containers is less than the capacity of the node. Note that although actual memory or CPU resource usage on nodes is very low, the scheduler still refuses to place a Pod on a node if the capacity check fails. This protects against a resource shortage on a node when resource usage later increases, for example, during a daily peak in request rate.
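The scheduler's capacity check can be sketched in a few lines of Python. This is an illustrative simplification, not the actual scheduler code (the real scheduler weighs many more factors), but it captures the key point: only requests are compared against capacity, never actual usage.

```python
# Simplified sketch of the Kubernetes scheduler's capacity check.
# A Pod is admitted onto a node only if, for every resource type, the sum
# of the requests of Pods already scheduled there plus the new Pod's
# request stays within the node's capacity. Actual usage is never consulted.

def fits_on_node(node_capacity, scheduled_requests, pod_request):
    """Return True if the Pod's requests fit within the node's remaining capacity."""
    for resource, capacity in node_capacity.items():
        already_requested = sum(r.get(resource, 0) for r in scheduled_requests)
        if already_requested + pod_request.get(resource, 0) > capacity:
            return False
    return True

node = {"cpu_millicores": 4000, "memory_mib": 16384}
running = [{"cpu_millicores": 3000, "memory_mib": 8192}]  # requests, not usage

# A Pod requesting 1500m CPU is refused even if the node is mostly idle:
print(fits_on_node(node, running, {"cpu_millicores": 1500, "memory_mib": 1024}))  # False
print(fits_on_node(node, running, {"cpu_millicores": 1000, "memory_mib": 1024}))  # True
```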

In essence, Kubernetes relies on these requests to determine where to place Pods, regardless of actual usage. If you set CPU and memory requests too high compared to actual usage, you may end up underutilizing your nodes, leading to wasted resources and higher costs.
Think of it this way: If most users are packing items into oversized boxes, the carrier can’t optimize how many boxes it can transport. Similarly, if Kubernetes resource requests are too large, the cluster can’t function as efficiently as possible.
To avoid this, it's crucial to regularly review your resource settings. The goal is to ensure that CPU and memory requests closely align with actual usage—essentially, that your "box" fits your "object" perfectly.
The Truck Analogy
Imagine we have several people sending items using a delivery truck. Each person is asked to pack their item in a box that fits it just right—this helps us make the most of the available space in the truck.

But what happens if people start using oversized boxes for small items? This is the equivalent of setting a CPU request far higher than its actual usage.
From the outside, the truck looks full. But if we had x-ray vision, we’d see that the boxes are mostly empty—wasting valuable space.

The person loading the truck (in this case, Kubernetes) assumes it’s full because all the boxes are taking up room, even though there’s actually plenty of unused capacity inside.
This is exactly what happens when a pod’s resource requests (CPU and memory) are set much higher than what the pod actually uses. Kubernetes thinks the node is full, even though there’s still space left unused.
With a little effort spent resizing each individual box, we can improve resource utilization and make the most of the available capacity.
The Datadog Insight: Real-Time Usage vs. Requests
Datadog identified this challenge early on. The data below was gathered from over 2.4 billion containers, offering a comprehensive view of real-world container use.
65% of containers use less than half of requested CPU

Datadog’s platform provides a real-time comparison of live usage versus requested resources, enabling teams to track inefficiencies.

In addition, this information is highly granular: it can be grouped by any tag associated with the containers, such as team, service, or application. This is a great place to get started: identify the top outliers and begin advocating for better management of resource requests within Kubernetes. As a reminder, Delivery Hero managed to save 10% of its cloud bill in just 48 days.
This visibility is valuable, but it's not always urgent. Much like alert fatigue, cost optimization can become a long-term challenge when there is no immediate pressure. Additionally, Datadog does not offer native monitors for these metrics, and the available data is often limited to short intervals, overlooking application behaviors with longer cycles, such as weekly or monthly usage.
Going Beyond the Native Datadog Approach
Even though Datadog offers useful out-of-the-box insights, Kubernetes cost optimization requires more proactive management. That's where a good Datadog dashboard can help.
In the new community hub, we offer a custom Datadog dashboard that tracks Kubernetes resource usage over extended periods, such as weeks or months. These dashboards not only highlight usage trends, helping you make better decisions based on past behavior, but also let you set monitors that alert cluster owners when new Deployments or DaemonSets underconsume their requested resources. This approach enables teams to tackle inefficiencies proactively and in real time, preventing long-term waste.

How to Get Started
Getting started is easy:
⭐ Star our repository to support the project! And, if you like the dashboard, don't forget to upvote it on share.dataiker.com!
Then, simply use the Terraform module below to deploy the dashboards. By default, it assumes a price of $10 per CPU core per month to give you a rough estimate of savings.
```hcl
module "kubernetes_capacity_planning_dashboard" {
  source                      = "git::https://github.com/nxnarbais/datadog-share.git//assets/kubernetes_capacity_planning_dashboard?ref=0.11"
  teams                       = ["team:dataiker"]
  title_suffix                = " - Managed by Terraform"
  managed_by_terraform        = "managed_by:terraform"
  cost_per_cpu_core_per_month = 10
}
```
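The $10-per-core default translates into a back-of-the-envelope savings estimate. A minimal sketch (the function name and figures below are our own illustration, not part of the module):

```python
# Rough savings estimate, mirroring the dashboard's default assumption of
# $10 per requested-but-unused CPU core per month.

def monthly_cpu_savings(requested_cores, used_cores, cost_per_core=10.0):
    """Estimate dollars saved per month by shrinking requests down to usage."""
    return max(requested_cores - used_cores, 0) * cost_per_core

# A cluster requesting 400 cores while using only 250 wastes roughly:
print(monthly_cpu_savings(400, 250))  # 1500.0 dollars per month
```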
Conclusion: Empowering Teams with the Right Tools
Optimizing Kubernetes configurations for cost savings isn’t just about identifying waste—it's about empowering teams to make informed decisions and take action. With the right insights, tools, and ongoing support, organizations can save significantly while still maintaining the performance and reliability that Kubernetes offers.
At Dataiker, we make this process simple, offering visibility into Kubernetes resource usage and cost optimization potential without disrupting workflows. Whether you’re just starting with Kubernetes or looking to fine-tune your resource management, we’re here to help you unlock the full potential of your cloud infrastructure.