Taming Datadog Metrics: A Simple Way to Optimize Your Metrics Costs

Nicolas Narbais

13 Jan 2025 — 6 min read

In today’s cloud-native environments, observability is crucial. Services like Datadog give us the power to monitor and analyze a wide array of metrics, ensuring that we’re always on top of our systems' performance. However, as with any powerful tool, there's a cost. And for Datadog users, that cost is often tied to the cardinality of the metrics you're tracking. In this article, we review a feature that puts you back in control of that cost.

Teaser: Use Metrics without Limits (MwL) to optimize your custom metrics

Metrics in Datadog Can Be Wild and Need to Be Tamed

When you start using Datadog, you quickly realize that metrics can get out of control. Each new dimension you add through tags (whether it's the region, service, or container ID) often adds another layer of cardinality. While these extra details are invaluable for monitoring, each one can multiply the number of timeseries Datadog has to store and manage, leading to an ever-increasing bill.
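To make the growth concrete, here is a minimal sketch of the worst case, where every tag value can appear together with every other. The tag names and value counts are hypothetical, and real metrics usually sit below this upper bound because not every combination is actually reported.

```python
from math import prod

# Hypothetical number of distinct values per tag on a single metric.
tag_values = {"region": 4, "service": 25, "container_id": 300}


def max_timeseries(values_per_tag):
    """Upper bound on timeseries: the product of distinct values per tag."""
    return prod(values_per_tag.values())


# Same metric without the container_id tag.
without_container = {k: v for k, v in tag_values.items() if k != "container_id"}

print(max_timeseries(without_container))  # 4 * 25 = 100
print(max_timeseries(tag_values))         # 4 * 25 * 300 = 30000
```

Adding a single high-cardinality tag such as a container ID takes this metric from at most 100 timeseries to at most 30,000.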

However, there’s good news. By focusing on optimizing how you store and process metrics, you can significantly reduce these costs. And one of the most powerful mechanisms for doing this is MwL, or Metrics without Limits.

A Quick Recap on Cardinality

Before we dive into the optimization techniques, let’s first take a moment to review the concept of cardinality in the context of metrics.

Cardinality refers to the number of unique combinations of metric names, tags, and tag values that Datadog stores. The more unique combinations you have, the higher your cardinality, which can lead to higher costs. This is because Datadog charges based on the number of distinct timeseries it has to manage. As the number of metrics grows, so do your bills.
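As a rough illustration of that definition, the sketch below counts unique combinations of metric name and tag values in a handful of submitted points. The metric names, tags, and the `cardinality` helper are made up for the example; this is not how Datadog computes your bill, only the counting idea behind it.

```python
# Hypothetical submitted points: (metric name, tags). Values are irrelevant here.
points = [
    ("checkout.latency", {"region": "us-east-1", "service": "cart"}),
    ("checkout.latency", {"region": "us-east-1", "service": "cart"}),  # same series again
    ("checkout.latency", {"region": "eu-west-1", "service": "cart"}),
    ("checkout.latency", {"region": "eu-west-1", "service": "payment"}),
    ("checkout.errors", {"region": "us-east-1", "service": "cart"}),
]


def series_key(name, tags):
    """A timeseries is identified by its metric name plus its exact tag values."""
    return (name, frozenset(tags.items()))


def cardinality(points):
    """Number of distinct (metric name, tag values) combinations."""
    return len({series_key(name, tags) for name, tags in points})


print(cardinality(points))  # 4: the repeated point does not create a new series
```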

If this still feels a bit vague, refer to our previous article reviewing the core concepts of custom metrics with examples.

Now, let’s talk about taming that wild cardinality.

The Magic of MwL: Optimizing Cardinality

MwL (Metrics without Limits) is a strategy designed to help you optimize the overall usage of your metrics by playing with cardinality. The core idea behind MwL is simple: by reducing the cardinality of the metrics you store, you can cut your costs without sacrificing essential data.

In most cases, it's not the ingestion of data that breaks the bank. Datadog ingestion rates are often negligible compared to the massive costs of indexing and storing those metrics. That’s why we’ll focus primarily on the indexing cost in this post. For simplicity’s sake, we’ll assume that ingestion is not a major factor in most examples. Keep in mind, though, that ingestion can be a factor in specific use cases, even if it's rarely the primary concern.

So, how does MwL actually work?

At its core, MwL transforms your original source data into a distribution metric. Instead of tracking each individual metric value separately, you can aggregate the data into a distribution. This allows you to have more control over the types of tags you track and how they are stored. With distributions, you can represent a higher-level picture with fewer individual timeseries.

How MwL Works in Practice

The real power of MwL comes when you apply it to the tags that are used within your metrics. Datadog provides a unique tool to check which tags are actively used in your queries—whether it’s for monitors, dashboards, or notebooks.

This active tag usage insight gives you an extremely valuable guideline for optimizing your costs. By understanding which tags are actively queried and which are not, you can make informed decisions about which tags to keep and which to remove.

One caveat: your own ad hoc queries count as “active” usage, so exploring tags yourself can skew this data. With that in mind, trust the data! The real value here is that you now have a clear understanding of where you can cut unnecessary tags, which directly impacts your cardinality and, subsequently, your costs.

This gives you the power to make data-driven decisions on tag management and reduces unnecessary overhead. Instead of blindly storing every tag, you can focus on the tags that matter and actively query them, optimizing your cost-to-value ratio.

The Power of Tag Analysis

When you analyze which tags are actively queried, you're essentially auditing your tags for relevance. Tags that aren’t actively used can often be removed or consolidated into broader categories, helping reduce the overall cardinality. In turn, this reduces the number of timeseries Datadog needs to store, ultimately bringing down your indexing costs.
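The effect of dropping unused tags can be sketched as follows. The example assumes a single metric whose series carry three hypothetical tags, and that the usage data shows only `region` and `service` being queried; the helper names are illustrative, not part of any Datadog API.

```python
def project(tags, keep):
    """Keep only the tags in `keep`; everything else is aggregated away."""
    return frozenset((k, v) for k, v in tags.items() if k in keep)


def cardinality_after(series, keep):
    """Distinct series left once each one is reduced to the kept tags."""
    return len({project(tags, keep) for tags in series})


# Hypothetical metric: 2 regions x 2 services x 50 containers = 200 series.
series = [
    {"region": r, "service": s, "container_id": f"c{i}"}
    for r in ("us", "eu")
    for s in ("cart", "payment")
    for i in range(50)
]

all_tags = {"region", "service", "container_id"}
queried_tags = {"region", "service"}  # assumed to be the only actively queried tags

print(cardinality_after(series, all_tags))      # 200
print(cardinality_after(series, queried_tags))  # 4
```

In this toy case, removing the one tag nobody queries cuts the indexed series from 200 to 4 while every existing query keeps working.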

Remember that Datadog’s pricing model is based on the number of timeseries indexed. So, every unnecessary tag may add to your costs. By continuously monitoring the tags that are actively used, you can make more informed decisions about which ones to keep.

The Trade-Offs: Pros and Cons of MwL

While MwL is a powerful tool, it does come with its trade-offs.

The Pros:

Cost Reduction: By reducing the cardinality of your metrics, you can significantly lower your Datadog costs.
Greater Control: You have more control over your metrics and can choose exactly which data to track and store.
Clearer Insights: By focusing only on the tags that matter, you gain clearer insights into your infrastructure without the noise of unnecessary data.

The Cons:

Increased Complexity: MwL adds an extra layer of complexity to your metrics management. For some users, this added complexity might be a deterrent, especially if you're not familiar with the ins and outs of metric management.
Reactive Approach: MwL is more reactive than proactive. Instead of addressing cardinality issues at the source (such as by refining your service architecture), you're making adjustments after the fact. While this can work well, it doesn’t solve the root cause of high cardinality in your system.
Risk of Over-Optimization: There's always a risk that, in your attempt to reduce costs, you may over-optimize and remove tags that are actually useful for troubleshooting or monitoring. This is a balancing act that requires careful consideration.

One of the biggest challenges with MwL is making sure that a configuration actually pays off overall. After all, if the indexing savings from reducing cardinality aren’t greater than any extra cost the change introduces, such as ingestion, you might not be getting as much value as you think.
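A quick back-of-the-envelope check helps here. The sketch below assumes that a configured metric is billed for what it still indexes plus a smaller charge for everything it ingests; both prices are hypothetical placeholders, so substitute the rates from your own contract.

```python
def monthly_net_savings(indexed_before, indexed_after, ingested,
                        price_per_100_indexed, price_per_100_ingested):
    """Indexing savings minus the added ingestion charge (all inputs assumed)."""
    saved = (indexed_before - indexed_after) / 100 * price_per_100_indexed
    added = ingested / 100 * price_per_100_ingested
    return saved - added


# Hypothetical prices, in dollars per 100 timeseries per month.
INDEXED_PRICE = 5.0
INGESTED_PRICE = 0.25

# Large reduction: 30,000 series down to 2,000 indexed, all 30,000 still ingested.
print(monthly_net_savings(30_000, 2_000, 30_000, INDEXED_PRICE, INGESTED_PRICE))   # 1325.0

# Marginal reduction: only 1,000 series removed, so the change costs money.
print(monthly_net_savings(30_000, 29_000, 30_000, INDEXED_PRICE, INGESTED_PRICE))  # -25.0
```

A negative result means the tag configuration is not worth applying to that metric, at least under these assumed prices.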

Best Practices for Implementing MwL

To get the most out of MwL, there are a few best practices to keep in mind:

Focus on the top spenders: Prioritize the custom metrics that impact your bill the most. The additional complexity you may introduce is outweighed by the economic impact on your top spenders. Note that not every top custom metric needs MwL, but where it does apply, the impact will be high.
Prioritize Actively Used Tags: Focus on the tags that are actively used in your queries. By removing inactive tags, you can quickly reduce cardinality without losing critical insights.
Review Regularly: Tag usage can change over time. Regularly reviewing which tags are active will ensure that you're not storing obsolete data.
Monitor Impact: Always monitor the impact of any changes you make. Check your Datadog costs and performance before and after adjustments to ensure you're hitting the right balance.
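The first practice, focusing on the top spenders, can be sketched as a simple ranking. The metric names and series counts below are invented, and the 80% threshold is an arbitrary starting point rather than a recommendation from Datadog.

```python
def top_spenders(series_per_metric, share=0.8):
    """Smallest set of metrics, largest first, covering `share` of all series."""
    total = sum(series_per_metric.values())
    ranked = sorted(series_per_metric.items(), key=lambda kv: kv[1], reverse=True)
    picked, covered = [], 0
    for name, count in ranked:
        if covered >= share * total:
            break
        picked.append(name)
        covered += count
    return picked


# Hypothetical indexed timeseries per custom metric.
usage = {
    "checkout.latency": 30_000,
    "http.requests": 12_000,
    "queue.depth": 2_500,
    "cache.hits": 400,
    "jobs.duration": 100,
}

print(top_spenders(usage))  # ['checkout.latency', 'http.requests']
```

Here two metrics out of five account for more than 80% of the indexed series, so they are the ones worth reviewing first.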

Wrapping Up

Managing cardinality in Datadog doesn’t have to be overwhelming. By applying techniques like MwL, you can take control of your metrics costs and optimize your Datadog usage. While this approach requires careful thought and ongoing attention, the potential cost savings and operational benefits make it well worth the effort.

If you're ready to start optimizing your Datadog setup and reducing your costs, our SaaS solution, Dataiker, can help you implement these strategies quickly and efficiently. Stay tuned for more insights on how we’re simplifying Datadog management for teams like yours!