How One Simple Switch Slashed My GPT-4o API Costs by 93%

Author: dlom

Building apps with AI is amazing.

You get to create things that feel like magic. But then the first bill from OpenAI comes in, and the magic disappears. Suddenly, it feels a lot less like innovation and a lot more like a leaky faucet draining your bank account.

It can get really expensive, really fast.

You're not alone if you've felt that shock. Every developer working with large language models worries about the cost of API calls. You write some code, it works, and you move on. But in the background, every time a user clicks a button, you could be spending money. And that money adds up.

So, I decided to do a little experiment. I wanted to see the real, nitty-gritty cost of a single API call to OpenAI's latest and greatest model, GPT-4o. More importantly, I wanted to see if there was a simple way to make it cheaper.

What I found was a way to cut the cost by 93%. Yes, you read that right. Not 10%. Not 30%. Ninety-three percent.

Here’s how I did it, and how you can too.


Your AI API Bill Can Get Expensive, Fast

Before we dive into the experiment, let's talk about why this is such a huge problem. An individual API call might cost a fraction of a cent. It sounds like nothing.

But AI applications are not built on single calls. They are built on thousands, or even millions, of them.

Imagine you build a simple customer service chatbot. It gets 5,000 queries a day. Or maybe you have an internal tool that summarizes reports for your team of 100 people, and each person uses it 10 times a day. That's 1,000 calls right there.

That tiny fraction of a cent suddenly becomes a very real number on your monthly invoice. It’s the difference between a profitable project and one that bleeds cash.

The biggest issue is a lack of visibility. How can you control costs you can't even see in real time? You can't. You're flying blind, hoping for the best when the bill arrives. This is why I started working on a dashboard over at agentstower.com, to give developers like you and me the power to see exactly what we're spending, as we spend it.

To show you what I mean, I used that very dashboard to track my experiment.

A Real-World Look at a Single GPT-4o API Call

Okay, let's get to the fun part. I fired up my development environment to make a real API call and measure the cost down to the hundredth of a cent.

The task was simple: I was working on a tool to generate creative marketing ideas. I wanted the AI to act like a world-class marketing professional and give me some clever slogans.

I sent my request to the GPT-4o model. The AI did its thing and sent back a few decent ideas.

But the interesting part wasn't the ideas themselves. It was the numbers on my cost tracking dashboard after the call was complete.

Here’s the breakdown:

Input Tokens: 453

Output Tokens: 152

Total Tokens: 605

Total Cost: $0.00265 USD

Less than one cent. So who cares, right?

Wrong. Let's put that number in perspective. If your app made just 1,000 of these calls per day, that would be $2.65 per day. That's nearly $80 a month. For 10,000 calls a day, you're looking at $800 a month. For 100,000 calls, it's $8,000 a month.
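That projection is easy to sanity-check yourself. Here's a minimal sketch of the arithmetic (the function name and the 30-day month are my own simplifications):

```python
def monthly_cost(cost_per_call: float, calls_per_day: int, days: int = 30) -> float:
    """Project a monthly bill from one measured per-call cost."""
    return cost_per_call * calls_per_day * days

# The scenarios above, using the measured $0.00265 per call:
for calls in (1_000, 10_000, 100_000):
    print(f"{calls:>7,} calls/day -> ${monthly_cost(0.00265, calls):,.2f}/month")
```

Running this prints $79.50, $795.00, and $7,950.00 per month, which is where the "nearly $80", "$800", and "$8,000" figures come from.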

It adds up. And this was just one simple request. More complex tasks require way more tokens and cost even more.

But this brings up a huge question. I only asked a short question, so where did all those "input tokens" come from? Why did I send three times more information than I got back?

What are "Tokens" and Why Do They Cost So Much?

This is the secret behind all AI costs. You need to understand what you're actually paying for.

Think of tokens as the currency of AI models. OpenAI doesn't charge you per word or per character. It charges you per token. A token is roughly three-quarters of a word. The word "apple" is one token, but a more complex word like "extraordinary" might be broken into two or three tokens.
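For exact counts, OpenAI's `tiktoken` library tokenizes text the same way the model does. For a quick, dependency-free estimate, the three-quarters rule can be sketched like this (the function is my own rule-of-thumb heuristic, not an official formula):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~3/4-of-a-word rule of thumb."""
    words = len(text.split())
    return round(words / 0.75)  # roughly 4/3 tokens per word

# A 12-word sentence estimates to about 16 tokens.
sentence = "You pay for every token you send and every token you receive"
```

Use an estimate like this for budgeting only; bill against the exact token counts the API returns in each response.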

You pay for the tokens you send to the model (input) and the tokens the model sends back to you (output). And often, they have different prices.

So where did my 453 input tokens come from? It wasn't just my question. My input was made of three distinct parts:

1. The Instructions (System Prompt): This is where I told the AI how to behave. I gave it a role: "You are a marketing professional." This instruction is sent with every single API call to ensure the AI gives consistent, high-quality responses. It sets the stage for the entire conversation.

2. The Context (Examples): This is where most of the cost came from. I didn't just ask for ideas blindly. To get great results, you have to show the AI what "great" looks like. I fed it several examples of iconic, clever marketing campaigns from The Economist. This trains the model, on the spot, to generate ideas in that specific style. It's a powerful technique for improving quality, but it means you are sending a lot of text, and paying for a lot of tokens, just to provide context.

3. The User's Message: This was the actual question I wanted answered. Ironically, this was the shortest part of the entire package I sent to OpenAI.

All of that, combined, made up my 453 input tokens. The 152 output tokens were simply the AI's answer. As you can see, the setup required to get a good response is often much larger than the response itself.
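In code, those three parts map directly onto the `messages` array of a chat completions request. The sketch below is illustrative only: the slogan example is a placeholder, and the call itself is commented out because it needs an API key.

```python
messages = [
    # 1. The instructions (system prompt): sent with every call.
    {"role": "system", "content": "You are a marketing professional."},
    # 2. The context (examples): this is where most of the input tokens go.
    {"role": "user", "content": "Write a clever slogan for a newspaper."},
    {"role": "assistant", "content": "A placeholder example slogan goes here."},
    # 3. The user's message: ironically, the shortest part of the package.
    {"role": "user", "content": "Give me three clever slogans for my product."},
]

# With the openai package installed and an API key configured, the request is:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-4o", messages=messages)
```

Every item in that list is tokenized and billed as input on every call, which is why the setup dwarfs the question itself.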

This is the hidden cost of AI development. But what if there was a way to process the same tokens for less money?

The Simple Trick That Cut My Costs by 93%

OpenAI has more than just one model. They have a whole family of them, each with different strengths, speeds, and, most importantly, prices. GPT-4o is the powerful, expensive, top-of-the-line model. It's the sports car of AIs.

But sometimes, you don't need a sports car. Sometimes a reliable sedan will get the job done just fine. For OpenAI, that sedan is called GPT-4o-mini.

It's designed to be faster and much, much cheaper. So I decided to run my exact same experiment again. The only thing I changed was the model.

It was a one line code change. I swapped "gpt-4o" for "gpt-4o-mini" and ran the exact same request with the same instructions, context, and user message.
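The swap really is that small, and you can estimate the savings before making a single call. The per-million-token prices below are the published rates at the time of writing and may have changed since, so treat them as assumptions to verify against OpenAI's current pricing page:

```python
# Assumed published prices (USD per 1M tokens) -- verify current rates.
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price one API call from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

big = call_cost("gpt-4o", 453, 152)        # ~$0.00265, matching the dashboard
small = call_cost("gpt-4o-mini", 453, 155)
print(f"gpt-4o:      ${big:.5f}")
print(f"gpt-4o-mini: ${small:.5f}")
```

The computed figures land close to the dashboard numbers; any small difference comes from rounding and pricing updates.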

Here are the results:

Input Tokens: 453 (the same, because we sent the same text)

Output Tokens: 155 (slightly different response)

Total Tokens: 608

Total Cost: $0.00018 USD

Let's compare that. The first call with GPT-4o cost $0.00265. The second call with GPT-4o-mini cost $0.00018.

That is a 93% reduction in cost.

Let's go back to our earlier scenario. For an app making 10,000 calls a day, the monthly bill drops from $800 to just $54. That is a game changing amount of savings for any developer or business.

Now, what's the catch? The catch is performance. A cheaper model is generally not as "smart" as its more expensive sibling. For highly complex tasks that require deep reasoning, creativity, or nuanced understanding, you'll likely still need the power of GPT-4o. But for many common tasks like simple classification, summarization, or basic chatbot responses, GPT-4o-mini is more than capable.

The lesson here isn't to always use the cheapest model. The lesson is to use the *right* model for the job. And to know what's right, you need to be able to experiment and measure the impact.

Take Control of Your AI Spending

Flying blind is not a strategy. You can't optimize your AI costs if you don't have clear, real time data on what you're spending and where.

You shouldn't have to wait until the end of the month to find out if a new feature is financially viable. You should be able to see the cost of a single API call, test different models, and make informed decisions instantly.

That’s exactly why I'm building the cost tracking dashboard at agentstower.com. It's the tool I used for this experiment, and it gives you the power to see your token usage and costs as they happen. It helps you find those opportunities to switch to a cheaper model and save thousands of dollars without sacrificing quality where it matters.

If you are tired of guessing what your next OpenAI bill will be, you need to start tracking your costs. If you want to use the same dashboard I used to find these savings, I can help you get it set up.

Just go to agentstower.com/register and register for a free account. Take control of your AI API costs today.