'5% utilization is a math fail': Millions of GPUs worth billions are mostly sitting idle, report finds
Date:
Tue, 21 Apr 2026 18:15:00 +0000
Description:
Most companies massively overprovision AI infrastructure, leaving GPUs and CPUs underutilized, while rising costs expose inefficiencies driven by fear and poor automation.
FULL STORY ======================================================================
Most AI GPUs run at shockingly low utilization across production systems. Companies are paying for twenty times more GPU capacity than needed. Overprovisioning is rising sharply instead of improving year after year.
Companies across the tech industry are racing to
buy massive amounts of AI infrastructure, but most of it does barely any useful work at all.
A report from Cast AI, based on tens of thousands of Kubernetes clusters across AWS, Azure, and GCP, found that average GPU utilization sits at just 5%. Many teams deploy sophisticated AI tools to manage their applications,
yet those same tools are not used to optimize the underlying infrastructure.
The numbers are getting worse, not better: organizations pay for roughly 20x more GPU capacity than their workloads actually use at any given moment.
The numbers come from direct measurements of production clusters and millions of compute resources before any optimization was applied.
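The 20x figure follows directly from the 5% utilization number. A minimal sketch of the arithmetic (the fleet size and hourly rate below are illustrative assumptions, not figures from the report):

```python
# The 20x overcapacity figure follows from 5% average utilization:
# if only 5% of purchased GPU capacity does useful work, you are
# paying for 1 / 0.05 = 20x what the workload actually consumes.

utilization = 0.05                   # 5% average GPU utilization (from the report)
capacity_multiple = 1 / utilization  # capacity paid for per unit of useful work
print(f"Paying for {capacity_multiple:.0f}x the capacity actually used")

# Illustrative waste estimate; GPU count and rate are assumptions:
gpus = 1000
hourly_rate = 3.00                   # assumed $/GPU-hour
annual_spend = gpus * hourly_rate * 24 * 365
wasted = annual_spend * (1 - utilization)
print(f"Of ${annual_spend:,.0f}/year, ~${wasted:,.0f} pays for idle time")
```

At these assumed rates, roughly 95 cents of every GPU dollar buys idle silicon.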
"This is the third year we've published this report. The numbers are worse," said Laurent Gil, co-founder and President of Cast AI. "CPU utilization fell to 8%, down from 10%. Memory dropped from 23% to 20%."
The report also measured something called overprovisioning, which is the gap between what workloads actually need and what teams allocate to them.
CPU overprovisioning rose from 40% to 69% year over year, while memory overprovisioning now stands at 79%.
This means organizations reserve nearly twice as many CPU resources and four times as much memory as their workloads actually consume.
In short, organizations pay for infrastructure that their workloads do not even request, and the trend is accelerating instead of improving.
The situation gets even more expensive when comparing CPU and GPU costs directly. A CPU core sitting idle costs only cents per hour, but a GPU
sitting idle costs dollars per hour.
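The gap between "cents per hour" and "dollars per hour" compounds quickly over a year. A rough comparison, where both hourly rates are assumed for illustration and only the utilization figures come from the report:

```python
# Rough idle-cost comparison. Both hourly rates are assumed figures
# for illustration, not prices quoted in the report.
HOURS_PER_YEAR = 24 * 365

cpu_core_rate = 0.04  # assumed $/core-hour
gpu_rate = 3.50       # assumed $/GPU-hour

idle_fraction_cpu = 1 - 0.08  # CPU utilization 8% (from the report)
idle_fraction_gpu = 1 - 0.05  # GPU utilization 5% (from the report)

cpu_idle_cost = cpu_core_rate * idle_fraction_cpu * HOURS_PER_YEAR
gpu_idle_cost = gpu_rate * idle_fraction_gpu * HOURS_PER_YEAR

print(f"Idle cost per CPU core per year: ${cpu_idle_cost:,.0f}")
print(f"Idle cost per GPU per year:      ${gpu_idle_cost:,.0f}")
print(f"One idle GPU wastes ~{gpu_idle_cost / cpu_idle_cost:.0f}x "
      f"more than one idle CPU core")
```

Under these assumptions, a single mostly-idle GPU burns roughly two orders of magnitude more money per year than a mostly-idle CPU core.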
For the first time since EC2 launched in 2006, GPU prices are rising instead of falling.
In January 2026, AWS raised H200 Capacity Block prices by 15%, citing supply and demand, which broke a two-decade precedent.
"At 5% utilization, the math doesn't work," the report states. The hoarding instinct makes sense because lead times are long, yet that same hoarding
feeds the scarcity loop that drives prices even higher.
Not every cluster performs this badly: one organization hit 49% utilization on H200s and 30% on H100s, well above the 5% average.
The difference comes down to automation rather than luck or better hardware. The tools to fix this already exist: automated rightsizing, GPU sharing via time slicing, and spot instance management.
However, most teams never get there because overprovisioning feels safer than running out of capacity, but that safety comes at a steep price.
The teams that closed the gap stopped treating resource efficiency as a manual, one-time task and started treating it as an automated, continuous process.
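A hypothetical sketch of what "automated, continuous" rightsizing means in principle. Every name and threshold below is invented for illustration; real autoscalers (Cast AI's platform, Kubernetes' Vertical Pod Autoscaler, and similar tools) run this kind of loop against live cluster metrics:

```python
# Hypothetical sketch of a continuous rightsizing loop. All names and
# thresholds are invented for illustration; production autoscalers
# work from live cluster metrics rather than this toy model.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    requested_cpu: float  # cores reserved for the workload
    observed_peak: float  # peak cores actually used

def rightsize(w: Workload, headroom: float = 1.2) -> float:
    """Return a new CPU request: observed peak plus a 20% safety margin."""
    return round(w.observed_peak * headroom, 2)

def reconcile(workloads: list[Workload]) -> dict[str, float]:
    """One pass of the loop: shrink any request more than 2x over need."""
    changes = {}
    for w in workloads:
        target = rightsize(w)
        if w.requested_cpu > 2 * target:  # only act on large gaps
            changes[w.name] = target
    return changes

fleet = [
    Workload("api", requested_cpu=16.0, observed_peak=1.1),
    Workload("batch", requested_cpu=4.0, observed_peak=3.2),
]
print(reconcile(fleet))  # only "api" is far enough over-requested to resize
```

The point of the loop structure is that it runs continuously: usage drifts, so any request set once by hand drifts back toward the overprovisioned state the report describes.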
But Cast AI data reveals that most companies seem willing to keep paying large fees rather than change their habits.
======================================================================
Link to news story:
https://www.techradar.com/pro/5-utilization-is-a-math-fail-millions-of-gpus-worth-billions-are-mostly-sitting-idle-report-finds
--- Mystic BBS v1.12 A49 (Linux/64)
* Origin: tqwNet Technology News (1337:1/100)