
How a One-Line Bug Cost Me $159 in a Single Night.
Andreas Schöngruber
June 23, 2022
I pushed a small change to my Spring Boot service running on Amazon ECS, went to bed, and woke up to 10 budget alert emails on my phone.
One bug. One night. $159.79 for 238 GB of completely useless logs.
Here's the full story and what I have in place now to prevent it.
How It Happened
My service was deployed on Amazon ECS as a Fargate Spot task with 0.5 vCPU. When I first set it up, the service wasn't starting correctly, so I enabled CloudWatch Logs in the task definition to help debug the startup problems.
The root cause of those startup problems turned out to be a health check grace period that was too short. The Application Load Balancer started running health checks almost immediately after deployment. The service was still initialising and couldn't respond with a healthy status, so the ALB marked it unhealthy. ECS then stopped the task and started a fresh deployment. Which also failed the health check. Over and over.
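The fix is a one-flag change. As a minimal sketch (the cluster and service names are placeholders, and the right duration depends on how long your service takes to start), the grace period can be set with the AWS CLI:

```shell
# Give the task 120 seconds after startup before ALB health check
# failures count against it. Cluster name, service name, and the
# exact duration are placeholders, not values from the incident.
aws ecs update-service \
  --cluster my-cluster \
  --service my-spring-boot-service \
  --health-check-grace-period-seconds 120
```

The same setting is available as `healthCheckGracePeriodSeconds` when creating the service in the first place.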
I fixed the grace period and the service stabilised. I left CloudWatch Logs enabled. The free tier gives you 5 GB of log ingestion per month, and I figured it might be useful for future debugging.
5 GB feels like a lot. Until it isn't.
The Bug That Broke the Budget
The change I deployed before going to bed had a one-line bug. It created an endless loop that caught an exception and logged it, over and over, as fast as the JVM could manage. No crash. No process exit. Just an infinite stream of log lines, quietly and efficiently filling up CloudWatch.
By morning: 238 GB of ingested logs. At $0.50 per GB above the free tier, that's a straightforward and painful calculation.
I logged in, stopped the service immediately, identified the bad line, deleted the logs, and deployed the fix. Total damage: $159.79 and a stressful morning.
With a beefier instance the loop would have been faster and the bill even higher.
What I Should Have Done
Don't ship and sleep. If I'd spent five minutes looking at the logs after deploying, I'd have seen the exception flood immediately and killed it at the source. This sounds obvious in hindsight. At the time, it was a small change and I was confident it would be fine.
You have automated tests? Great. Static analysis? Also great. Neither of those catches a runaway log loop at runtime. There's no substitute for a quick manual check after you deploy, especially when the change touches anything that runs in a loop or handles errors.
What I Have in Place Now
None of the measures below would have stopped the service from running. But they would have woken me up within minutes instead of hours.
AWS Budgets
AWS Budgets lets you set a threshold on your monthly spend and get an email or SNS notification when you're approaching or over it. I had budgets configured, and they did fire. I just got the alerts several hours in, by which point the damage was already done.
The lesson: set your budget threshold low. A $10 alert is more useful than a $100 one.
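For reference, a low-threshold budget like that can be created from the CLI as well. This is a sketch, not my exact configuration; the account ID and email address are placeholders:

```shell
# A $10 monthly cost budget that emails when actual spend
# crosses 80% of the limit. Account ID and email are placeholders.
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-10-usd",
    "BudgetLimit": {"Amount": "10", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "me@example.com"}]
  }]'
```

An `ACTUAL` notification fires on real spend; Budgets also supports `FORECASTED` notifications if you want an even earlier warning.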
AWS Cost Anomaly Detection
AWS Cost Anomaly Detection uses machine learning to spot unusual spending patterns and alert you. Unlike a static budget threshold, it adapts to your normal usage patterns and flags deviations. My cost monitor wasn't configured correctly at the time, and it would have caught this much faster if it had been.
Setting it up takes three steps: create a cost monitor, add an alert subscription with a dollar threshold, and wait for it to activate. AWS starts monitoring within 24 hours.
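Those steps map onto two CLI calls. The sketch below assumes a per-service monitor with daily email alerts; the monitor ARN, account ID, email, and threshold are all placeholders:

```shell
# Step 1: a monitor that tracks each AWS service's spend individually.
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "per-service-monitor",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'

# Step 2: subscribe to alerts. The monitor ARN comes from step 1's
# output; the threshold is the anomaly's dollar impact that triggers
# a notification. All values here are placeholders.
aws ce create-anomaly-subscription \
  --anomaly-subscription '{
    "SubscriptionName": "daily-email-alerts",
    "MonitorArnList": ["arn:aws:ce::123456789012:anomalymonitor/EXAMPLE"],
    "Subscribers": [{"Type": "EMAIL", "Address": "me@example.com"}],
    "Frequency": "DAILY",
    "Threshold": 10
  }'
```

Note that email subscribers get daily or weekly summaries; for truly immediate alerts, AWS requires an SNS topic as the subscriber instead.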
CloudWatch Billing Alarms
You can set up a billing alarm in CloudWatch that fires when your estimated charges cross a threshold. This one is particularly useful because it's fast. Billing alarms can catch a cost spike within an hour or two.
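As a sketch, a billing alarm looks like this. Two AWS quirks to know: billing metrics only exist in us-east-1 regardless of where your workloads run, and you first have to enable "Receive Billing Alerts" in the account's billing preferences. The SNS topic ARN and the $10 threshold below are placeholders:

```shell
# Alarm when estimated month-to-date charges exceed $10.
# Billing metrics live in us-east-1 only; the SNS topic ARN
# and threshold are placeholders.
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name estimated-charges-over-10-usd \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=Currency,Value=USD \
  --statistic Maximum \
  --period 21600 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts
```

Pointing `--alarm-actions` at an SNS topic with an SMS or push subscription is what turns this from an email you read in the morning into something that actually wakes you up.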
Combined with Cost Anomaly Detection, you get both a fast per-service alert and a broader account-level safety net.
Critical limitation: None of these tools stop your services. They notify you, and then you have to act manually. There is no hard cost cap in AWS that automatically kills a service when you hit a limit. If you want to stop spending, you have to do it yourself immediately.
What Happened Next
I reached out to AWS Support and explained what happened. On 29 June 2022, Amazon refunded the CloudWatch charges. The support experience was genuinely excellent. They understood the situation quickly and handled it without friction.
That said, I wouldn't count on a refund. AWS doesn't owe you one, and the right approach is to have the guardrails in place before you need them.