Optimizing API Output for Use as Tools in Model Context Protocol (MCP)

Craig Walls

--


When I first started my journey into Generative AI, I was drawn to the technique of applying tools (more commonly called “functions” back then) to enable AI interactions to do more than simply produce responses from their training. It was tools that opened my eyes to how Generative AI was more useful than generating jokes or telling me the sky is blue. It could enable LLMs to work with application data and to take action and actually do things.

But then Retrieval Augmented Generation (RAG) became the big topic in GenAI circles. Just like tools, RAG could enable an LLM to answer questions outside of the boundaries of its training. And for a long while, tools took a backseat to RAG as the thing that the cool GenAI kids do.

Now, as the hot topic of Agentic AI is front and center, it’s becoming increasingly clear that tools are going to be critical to enabling agents to take actions and seek information on our behalf. Moreover, Model Context Protocol (MCP) provides a standardized way to create modules of tools and to share those modules across AI-enabled applications. And there’s definitely a place for MCP Servers that wrap existing APIs to bridge the chasm between GenAI and existing enterprise APIs.

MCP has been taking up a rather large space in my mind for the past couple of months. And now that Spring AI has support for developing both MCP Servers and MCP Clients to consume those servers, I’ve spent a lot of time building both. In doing so, I made a few mistakes, learned a few lessons, and now want to share one of those lessons with you so that you can make best use of MCP. I’ve made the mistakes so you don’t have to.

The Backstory

For a few years, I’ve been using a fun API called ThemeParks.wiki API. This API provides insight into various theme parks all over the world, giving information such as operating hours, attraction wait times, and show times. I’ve used it to develop an Alexa skill that I’ve used myself with the Alexa mobile application when I visit Disney World and Disneyland.

The ThemeParks.wiki API exposes a handful of endpoints:

  • /destinations — Returns a JSON list of all resort areas covered by the API. Each resort area entry has children that are the theme parks in that resort area. (E.g., Walt Disney World is a resort area; The Magic Kingdom, Epcot, Disney’s Hollywood Studios, and Disney’s Animal Kingdom are theme parks in that resort area.)
  • /entity/{entityId} — Returns high level details about an entity, which could be a resort area, theme park, show, attraction, restaurant, etc.
  • /entity/{entityId}/children — Returns a list of children belonging to an entity. This is most useful when the entity is a resort area or theme park. In the case of a theme park, the children are the attractions, shows, and restaurants that are in that park.
  • /entity/{entityId}/live — Returns live (e.g., near-realtime) information about an entity. In the case of a show, this might be a list of showtimes for the current day. For attractions, this will include the current wait time. For restaurants, this could include an estimated wait time for walkups, based on party size.
  • /entity/{entityId}/schedule — Returns operating hours for the entity. In the case of a theme park, this includes one full month of operating hours (starting with the current date), plus some additional information (such as purchased Lightning Lane entry availability times for attractions that support that).
  • /entity/{entityId}/schedule/{year}/{month} — The same as the previous schedule endpoint but focused on a specific month (starting with the first of that month).

What you may notice is that except for the destinations endpoint, all other endpoints require an entity ID. But where does that come from? For a theme park, it can be obtained from the destinations endpoint, although you’d need to dig through all of that JSON to find the ID for the specific park you’re asking about. For an attraction, you’d likely need to start with the theme park’s entity ID (obtained from destinations), then make a request to the children endpoint to get the entity ID for the specific attraction, and then use that to make a request to one of the other endpoints to get the information you want about the attraction.

As a developer, it’s not that hard to examine the API and then write some code to extract the information you need. In fact, I had to code up a little of that effort myself when creating the Alexa skill. It wasn’t too difficult, but I did need to explicitly code up a chain of operations to get what I needed. And caching the data that doesn’t change (such as entity IDs) kept me from making repeated trips to the API.
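That chain of lookups can be sketched in plain Java, with records standing in for the API’s JSON. (The field names here are illustrative assumptions for the sketch, not the API’s exact schema.)

```java
import java.util.List;
import java.util.Optional;

public class EntityIdLookup {

    // Simplified stand-ins for the JSON returned by /destinations and
    // /entity/{id}/children; the real payloads carry many more fields.
    record Destination(String id, String name, List<Park> parks) {}
    record Park(String id, String name) {}
    record Child(String id, String name) {}

    // Step 1: dig through the destinations data to find a park's entity ID.
    static Optional<String> findParkId(List<Destination> destinations, String parkName) {
        return destinations.stream()
                .flatMap(d -> d.parks().stream())
                .filter(p -> p.name().equalsIgnoreCase(parkName))
                .map(Park::id)
                .findFirst();
    }

    // Step 2: find an attraction's entity ID among a park's children
    // (this list would come from /entity/{parkId}/children).
    static Optional<String> findAttractionId(List<Child> children, String attractionName) {
        return children.stream()
                .filter(c -> c.name().equalsIgnoreCase(attractionName))
                .map(Child::id)
                .findFirst();
    }
}
```

Only after both lookups succeed do you have the entity ID needed to call the live or schedule endpoints — two round trips before the interesting request even starts.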

As I said, it’s a fun and useful API. I imagined it would be fun to build an MCP Server that exposes the API’s functionality as tools to be used by an LLM, enabling a user to ask questions such as “What time does Epcot open tomorrow?” or “How long is the wait for Space Mountain in Disneyland?”

So that’s what I did. And that’s when some interesting lessons were learned.

The Easy/Naive Attempt

My first attempt at creating an MCP Server from ThemeParks.wiki API’s endpoints involved simply exposing each endpoint as a tool. This one-to-one mapping from endpoints to tools was very straightforward: Use Spring’s RestClient to make HTTP GET requests to each endpoint in the context of a tool definition. Nothing fancy at all.

Using the new @Tool annotation from Spring AI (currently in snapshot builds, but soon to be available in the Milestone 6 release), I first created a service class that defined one @Tool method for each endpoint. It looked a little something like this:

@Service
public class SimpleThemeParkService {

  public static final String THEME_PARKS_API_URL = "https://api.themeparks.wiki/v1";

  private final RestClient restClient;

  public SimpleThemeParkService(RestClient.Builder restClientBuilder) {
    this.restClient = restClientBuilder
        .baseUrl(THEME_PARKS_API_URL)
        .build();
  }

  @Tool(name = "getDestinations",
      description = "Get list of resort destinations, including their entity ID, name, and a child list of theme parks")
  public String getDestinations() {
    return sendRequestTo("/destinations");
  }

  @Tool(name = "getEntity",
      description = "Get data for a park, attraction, or show given the entity ID")
  public String getEntity(String entityId) {
    return sendRequestTo("/entity/{entityId}", entityId);
  }

  @Tool(name = "getEntityChildren",
      description = "Get a list of attractions and shows in a park given the park's entity ID")
  public String getEntityChildren(String entityId) {
    return sendRequestTo("/entity/{entityId}/children", entityId);
  }

  @Tool(name = "getEntitySchedule",
      description = "Get a park's operating hours given the park's entity ID.")
  public String getEntitySchedule(String entityId) {
    return sendRequestTo("/entity/{entityId}/schedule", entityId);
  }

  @Tool(name = "getEntityScheduleForDate",
      description = "Get a park's operating hours given the park's entity ID and a specific year and month.")
  public String getEntitySchedule(String entityId, String year, String month) {
    return sendRequestTo("/entity/{entityId}/schedule/{year}/{month}",
        entityId, year, month);
  }

  @Tool(name = "getEntityLive",
      description = "Get an attraction's wait times or a show's show times given the attraction or show entity ID")
  public String getEntityLive(String entityId) {
    return sendRequestTo("/entity/{entityId}/live", entityId);
  }

  private String sendRequestTo(String path, Object... pathVariables) {
    return restClient
        .get()
        .uri(path, pathVariables)
        .retrieve()
        .body(String.class);
  }

}

And then, I exposed these methods as tools in an MCP Server with the following configuration:

@Configuration
public class McpServerConfig {

  private final SimpleThemeParkService themeParkService;

  public McpServerConfig(SimpleThemeParkService themeParkService) {
    this.themeParkService = themeParkService;
  }

  @Bean
  public StdioServerTransport stdioTransport() {
    return new StdioServerTransport();
  }

  @Bean
  public McpSyncServer mcpServer(ServerMcpTransport transport) {
    var capabilities = McpSchema.ServerCapabilities.builder()
        .tools(true)
        .build();

    ToolCallback[] toolCallbacks = MethodToolCallbackProvider
        .builder()
        .toolObjects(themeParkService) // <-- Sets the tools from the service
        .build()
        .getToolCallbacks();

    return McpServer.sync(transport)
        .serverInfo("Theme Park MCP Server", "1.0.0")
        .capabilities(capabilities)
        .tools(ToolHelper.toSyncToolRegistration(toolCallbacks))
        .build();
  }

}

And, it worked. Well…it sorta worked. Sometimes it worked. But not always.

When it worked, it was slow. And when it didn’t work, it often failed because I was exceeding a tokens-per-minute (TPM) rate limit from OpenAI. After turning up the logging and doing a bit of detective work, I realized that while all of the tools were made available to the LLM, including reasonably good descriptions, the LLM was struggling to find the entity IDs for some items. So it was making frequent calls to those tools, in a sort of trial-and-error approach, to find what it was looking for.

What’s more, given that the JSON returned from both the destinations and schedule endpoints is so large, the prompt context was being filled with mostly irrelevant data. The destinations endpoint, for example, returns every resort area and theme park supported by the API. And the schedule endpoints each return a full month’s worth of schedule data, even if you only need one day. As a result, each time I asked what time a park opens or closes, I was burning through a huge number of tokens. Not only was this wasteful, it fully explained why I kept getting smacked with TPM rate limits.

I decided to take a deeper look into just how many tokens I was using by inspecting the usage metrics being returned in the response. I asked “What time does Epcot open tomorrow?” three times. The following table shows how many tokens were spent each time.

Yikes! No wonder I was frequently exceeding the rate limits. I was using GPT-4o as my model, which has a 30,000 TPM limit.

Rate limits notwithstanding, tokens are the way that the bill for GenAI APIs is calculated. With GPT-4o, that experiment cost me nearly 42 cents. That may not be a lot, but it was only 3 interactions.

Of course, one choice I could (and did) make is to switch to GPT-4o-mini. This model is significantly less expensive and has a much higher TPM limit (200,000 tokens per minute). But that felt like sweeping the problem under the rug. I wanted to see what the core problems are and if they can be resolved.

After a bit of thinking, it became very clear what the issues are:

  • The data from the destinations endpoint is resort area-centric, but most questions asked are more theme park-focused. This made it more difficult for the LLM to find the entity ID for a given theme park.
  • The destinations endpoint and the schedule endpoints return enormous amounts of JSON, most of which is irrelevant to the questions being asked. If being asked about Epcot, you don’t really need to know the entity ID for Six Flags Over Texas. If asked about park hours for tomorrow, you don’t really need to know the park hours for any date except tomorrow’s.

These two issues taken together explain why so many tokens were being used for each question. The 23K-93K tokens weren’t sent in a single prompt — the JSON is big, but not that big. But the big JSON being sent as context in several prompts while the LLM was fumbling about trying to find an entity ID added up quickly.

A diagram of the interactions between an application, an MCP Server, and an API, showing that before optimizations, more interactions and large amounts of JSON are in the prompt context.
A one-to-one mapping of API endpoints to MCP Server tools may result in more interactions with the LLM as it seeks the information it needs and may increase the token count of each interaction.

Put simply, the ThemeParks.wiki API isn’t designed to be optimally used directly as the underpinnings of an MCP server.

It was suggested to me that I shouldn’t code the MCP Server myself, but rather take advantage of the fact that ThemeParks.wiki API has an OpenAPI (note the “P”) specification and use a prebuilt MCP Server that magically exposes OpenAPI endpoints as tools in the MCP Server. I tried that, but since the OpenAPI MCP Server is exposing one tool per endpoint with no filtering of the data returned, the core issues remain the same.

Is all lost? Can ThemeParks.wiki API not be used to create an efficient MCP Server? It can. But only with some adjustments to the results.

Optimizing the API

Ideally, the ThemeParks.wiki API would be optimized, offering endpoints that are more focused on specific theme parks, attractions, etc. But we shouldn’t be quick to blame the designer for the problems I’ve encountered. The API was designed before MCP (or even the current buzz around GenAI) existed. There’s no question that network bandwidth could be conserved with some adjustments to the API, but typical use of the API outside of the realm of GenAI isn’t impacted by how the API is currently designed. It’s only in a GenAI/MCP situation where the pain is felt.

Unfortunately, I do not have much control of the API’s design. While I could submit some pull requests to the API’s GitHub project, it would be up to the project owner to decide if those optimizations are worthwhile. There’s little that I can do to optimize the API itself.

But if I can’t optimize the API for use with MCP, then I can certainly optimize how it’s used by the MCP Server. So that’s what I did.

One seemingly obvious optimization is to cache the results from the destinations endpoint. This avoids unnecessarily hitting the API repeatedly to fetch the same data that doesn’t change often. But it does absolutely nothing to reduce token usage.
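In the actual service, Spring’s @Cacheable would be the idiomatic way to do this, but the idea can be shown in plain Java as a time-bounded memoizing supplier (a minimal sketch, not the code from my MCP Server):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Wraps an expensive supplier (e.g., the /destinations call) and caches its
// result for a fixed time-to-live, so repeated tool invocations don't
// re-fetch data that rarely changes.
public class CachingSupplier<T> implements Supplier<T> {

    private final Supplier<T> delegate;
    private final Duration ttl;
    private T cached;
    private Instant fetchedAt;

    public CachingSupplier(Supplier<T> delegate, Duration ttl) {
        this.delegate = delegate;
        this.ttl = ttl;
    }

    @Override
    public synchronized T get() {
        Instant now = Instant.now();
        // Fetch on first use or when the cached value has expired.
        if (cached == null || fetchedAt.plus(ttl).isBefore(now)) {
            cached = delegate.get();
            fetchedAt = now;
        }
        return cached;
    }
}
```

Wrapping the destinations call this way saves round trips to the API, but the cached JSON is just as large as the fresh JSON — which is why caching alone doesn’t touch the token problem.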

Next, I took the JSON coming from the destinations endpoint and inverted it such that instead of having resort areas with theme parks as children, the theme parks are the top level item with properties for the resort area’s entity ID and name. This makes it a bit easier for the LLM to sift through the data to find the entity ID for a given theme park.

I then wrote some code to filter the inverted data, seeking out a theme park entry by its name. This is less than perfect, of course. Although the filtering normalizes the name to lowercase and searches for substrings (so that “Animal kingdom” will match “Disney’s Animal Kingdom”), it doesn’t work as well if the requested park has a typo in its name (e.g., “Disbeyland” won’t match “Disneyland”). But for the most part, it works well. I’m already thinking of ways to overcome the shortcomings of this, but for now I’m pleased with how well it works.

All of that work ended up in a new getParksByName() method that replaced the getDestinations() method. The new method looks like this:

@Tool(name = "getParksByName",
    description = "Get a list of parks (including their name and entity ID) given a park name or resort name")
public List<Park> getParksByName(String parkName) throws JsonProcessingException {
  return getParkStream(
      park -> park.name().toLowerCase().contains(parkName.toLowerCase())
          || park.resortName().toLowerCase().contains(parkName.toLowerCase()))
      .collect(Collectors.toList());
}

private Stream<Park> getParkStream(Predicate<Park> filter) {
  DestinationList destinationList = restClient.get()
      .uri("/destinations")
      .retrieve()
      .body(DestinationList.class);

  return Objects.requireNonNull(destinationList).destinations().stream()
      .flatMap(destination -> destination.parks().stream()
          .map(park ->
              new Park(park.id(), park.name(), destination.id(), destination.name()))
          .filter(filter));
}

The next most obvious optimization was to narrow an entity’s schedule down to a single date. I started by removing the basic schedule endpoint from the tools and using only the endpoint that accepts a year and month as URL parameters. I changed my tool definition to accept a date in “yyyy-MM-dd” format, from which I extract the year and month to make the request to the API. I also use the date to filter the returned JSON down to a specific date, rather than a month’s worth of schedule information.

The new getEntitySchedule() method now looks like this:

@Tool(name = "getEntityScheduleForDate",
    description = "Get a park's operating hours given the park's " +
        "entity ID and a specific date (in yyyy-MM-dd format).")
public List<ScheduleEntry> getEntitySchedule(String entityId, String date) {
  String[] dateSplit = date.split("-");
  String year = dateSplit[0];
  String month = dateSplit[1];

  Schedule schedule = restClient.get()
      .uri("/entity/{entityId}/schedule/{year}/{month}",
          entityId, year, month)
      .retrieve()
      .body(Schedule.class);

  return schedule.schedule().stream()
      .filter(scheduleEntry -> scheduleEntry.date().equals(date))
      .toList();
}
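One fragile spot in that method is the manual date.split("-"), which happily accepts malformed input. A small hardening I’d suggest (not part of the original code) is to let java.time.LocalDate do the parsing and validation:

```java
import java.time.LocalDate;

public class ScheduleDate {

    // Parses a yyyy-MM-dd string and returns {year, zero-padded month},
    // throwing DateTimeParseException on malformed input instead of
    // silently mis-splitting it.
    static String[] yearAndMonth(String date) {
        LocalDate d = LocalDate.parse(date);
        return new String[] {
            String.valueOf(d.getYear()),
            String.format("%02d", d.getMonthValue())
        };
    }
}
```

Failing fast on a bad date means the tool returns a clear error to the LLM rather than making a doomed request to the API with garbage path variables.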

I also removed the getEntitySchedule() method that only accepts an entity ID as a parameter. Not only did it seem superfluous, I noticed that by leaving it in, the LLM was still occasionally requesting that it be invoked, which meant large amounts of unfiltered JSON being added to the prompt context.

Note that in both cases, the methods now return a List of Park or ScheduleEntry instead of a JSON-filled String. While String responses were sufficient for the initial work, domain types made it easier to do the filtering and manipulation required. In either case, the results end up as JSON in the prompt context, so the choice of String versus domain-specific types had negligible effect on the overall performance.

A diagram showing an optimized MCP Server that filters the JSON coming from the API and not doing a one-to-one mapping of tools to endpoints, resulting in fewer interactions and smaller token usage.
By filtering the responses from an API and not necessarily exposing all endpoints as tools, the LLM is able to find the data it needs in fewer prompts and the JSON carried in the prompt contexts is much smaller.

With these changes made to my MCP Server implementation, I tried everything again. And it worked brilliantly! I was getting good answers to all of my questions, typically faster than before, and was never hitting a rate limit!

But what is the bottom line? How many tokens are being used per question after these optimizations? The following table shows a marked reduction in token usage.

Wow! After adapting the results from the API to be more focused on typical use cases and reducing the amount of JSON passed around in the prompt context, the number of prompt tokens was reduced by roughly 93%–98%. Even at GPT-4o prices, all three attempts cost just over a penny. And with GPT-4o-mini, it was only fractions of a penny.

The Takeaway

There are definitely opportunities for further optimization to my MCP Server. I didn’t even touch the getEntity(), getEntityLive(), or getEntityChildren() methods. Optimizing those could certainly help reduce token spend for other use cases, such as asking for the current wait time for an attraction. But there’s no denying that the initial improvements that I’ve made had a significant and positive impact to token usage when asking for park hours.
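For instance, getEntityLive() could filter the live payload down to a single named attraction before it ever reaches the prompt context, just as the schedule was filtered to a single date. A rough sketch of that idea — the LiveEntry fields here are assumptions about the payload shape, not the API’s documented schema:

```java
import java.util.List;

public class LiveDataFilter {

    // Simplified stand-in for one entry of the /entity/{id}/live payload;
    // the real response carries considerably more detail.
    record LiveEntry(String name, String status, Integer waitMinutes) {}

    // Returns only the entries whose name contains the requested attraction,
    // so the prompt context carries one wait time instead of the whole park's.
    static List<LiveEntry> forAttraction(List<LiveEntry> live, String attractionName) {
        String needle = attractionName.toLowerCase();
        return live.stream()
                .filter(e -> e.name().toLowerCase().contains(needle))
                .toList();
    }
}
```

The same substring normalization used by getParksByName() applies here, with the same trade-off: typos in the attraction name won’t match.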

Although using a prebuilt OpenAPI MCP Server would’ve been an easy way to integrate with the API, there would have been no way to achieve these same results without going back to the API itself and applying the optimizations there.

The key takeaway is that as GenAI, Agentic AI, and MCP continue to be adopted and applied, there will be a desire to integrate with existing APIs to solve problems. It’s quite likely that those existing APIs will have been designed without consideration of how they may be used as tools made available to an LLM.

Providing an OpenAPI specification for your API is a good thing. And using an OpenAPI MCP Server to expose that API’s endpoints as tools is also tempting for its simplicity. But before you do that, it’s a good idea to consider what impact the data returned from your API may have on token usage, and to seek ways to either optimize the API itself or optimize how the API is used in a custom MCP Server implementation. If you don’t, you may find that your results — and your GenAI API bill — are less than ideal.

If you want to learn more about Spring AI, MCP, RAG, and other GenAI topics, be sure to check out my book, Spring AI in Action. It covers all of this and more with practical hands-on examples.

Written by Craig Walls, author of Spring in Action and Build Talking Apps for Alexa.