In my last post in this series, MCP Security Vulnerabilities Devs Should Know, I talked about security concerns for both MCP servers and their clients.
In today’s episode of my series on MCP for Curious Devs, I’m going to talk about an even bigger AI threat and how the Model Context Protocol can help.
That threat is… Lies.
Getting Comfortable with LLMs
If you’ve spent any time at all using LLM chat agents, you’ve probably already had one lie to you.
If you’ve ever used one for coding, it may have invented a package or method that doesn’t exist.
Maybe you heard about the recent incident where the Chicago Sun-Times published a summer reading list of completely made-up books (coverage by NPR).
There’s a disclaimer at the bottom of the (paid) pro-level Gemini interface that says “Gemini can make mistakes, so double-check it”, and similar statements can be found on all of the others. However, how many users are actually taking the time to “double check” the results of their prompts?
As more and more users want to depend on chat agents, we need to keep them from lying to us.
image by Gemini “an inaccurate calculator - vector art style”
Not like a calculator
One of the arguments I keep seeing online in favor of students using LLM chat agents for learning is that it’s just like giving them a calculator.
This is a terrible analogy, and please help me shut it down if you encounter it. Here’s why it’s problematic:
Calculators can’t lie to you about math.
The reason that calculators are so ubiquitous is because of their ACCURACY. The only way to get an incorrect answer from a calculator is to give it incorrect inputs. LLMs simply don’t work the same way. The “how many R’s in strawberry” example demonstrates this.
Generating vs Regurgitating
Many everyday users of LLMs are mistaking content generation for content regurgitation. When they ask for an “AI summary,” for example, they think the LLM is ingesting some content, filtering it, and then regurgitating the same words back in the summary that were in the original content. But that’s not how any of this works.
LLMs don’t regurgitate words and sentences from the context fed to them, they use the context fed to them to guess at generating brand new sentences.
In other words, they are literally designed and programmed to make shit up.
image by Gemini “infographic specifically focused on visually depicting the generating vs regurgitating section”
As a quick aside… I believe that most of our use cases actually require regurgitation, and for that we should all be using RAG, or Retrieval-Augmented Generation, which I talked about in post #2 in this series, Remote MCP for Docs vs URL Search in prompt.
How MCP can make LLMs more accurate
We can understand how and why LLMs lie to us and bemoan all AI usage as too fraught to have any benefit, or we can figure out how to make them more accurate and reliable. These are just a few of the ways that the Model Context Protocol can help.
Structured Data & Validation
Tool Output Schema
One of the challenges for LLM users is getting data output in an expected format. You can ask for JSON, but the “generator” might just generate it slightly differently for each request. If you’ve tried to do any kind of tool development, you may have already run into this issue.
One huge benefit of MCP is the ability to define tool output schemas. This can force an agent to structure its response in a particular way. For example, I use this to define required output fields when generating product pages.
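To make that concrete, here’s a minimal sketch of what a tool definition with an outputSchema can look like on the wire. The payload shapes follow the MCP spec; the tool name and fields are hypothetical, loosely based on my product-page example.

```typescript
// Sketch of a tool definition as it might appear in a tools/list response.
// The tool itself ("generate_product_page") and its fields are hypothetical.
const productPageTool = {
  name: "generate_product_page",
  description: "Generate structured copy for a product page",
  inputSchema: {
    type: "object",
    properties: {
      productName: { type: "string" },
    },
    required: ["productName"],
  },
  // The client validates the tool's structured result against this schema,
  // so the agent can't hand back a blob of prose where we need specific fields.
  outputSchema: {
    type: "object",
    properties: {
      title: { type: "string" },
      description: { type: "string" },
      bulletPoints: { type: "array", items: { type: "string" } },
    },
    required: ["title", "description", "bulletPoints"],
  },
};

// A conforming tool result returns the data under structuredContent...
const toolResult = {
  structuredContent: {
    title: "Acme Anvil 3000",
    description: "A drop-forged anvil for discerning coyotes.",
    bulletPoints: ["Drop-forged steel", "Ships flat-rate"],
  },
  // ...and still includes plain text content for clients that don't
  // understand structured output.
  content: [{ type: "text", text: "Generated product page copy." }],
};
```

With the schema in place, a client can check structuredContent against it and reject responses that don’t match, instead of hoping the model formatted things the same way it did last time.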
Elicitation Response Validation
Structured schemas can be enforced on prompt inputs as well as tool outputs. By implementing Elicitation, an MCP server can standardize the inputs required by its tools, requesting structured data from users and validating their responses against a JSON schema.
Diagram of the Elicitation flow from MCP docs:
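Here’s a minimal sketch of that exchange as JSON-RPC messages, following the shape of the spec’s elicitation/create request. The message text and field names are made up for illustration.

```typescript
// The server asks the client for missing input and attaches a JSON schema
// that the user's answer has to satisfy.
const elicitationRequest = {
  jsonrpc: "2.0",
  id: 7,
  method: "elicitation/create",
  params: {
    message: "Which product should the page be generated for?",
    requestedSchema: {
      type: "object",
      properties: {
        productName: { type: "string", description: "Exact product name" },
        includePricing: { type: "boolean" },
      },
      required: ["productName"],
    },
  },
};

// The client prompts the user, validates their answer against requestedSchema,
// and responds with an action (accept / decline / cancel) plus the structured data.
const elicitationResponse = {
  jsonrpc: "2.0",
  id: 7,
  result: {
    action: "accept",
    content: { productName: "Acme Anvil 3000", includePricing: true },
  },
};
```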
Protocol & Tool Execution Errors
In addition to input and output validation for tools, MCP also supports standard JSON-RPC error messages. Tool errors as well as client/server errors can be surfaced to agents and their users, and because errors follow a standard structure, clients can consume them consistently instead of guessing at what went wrong.
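As a rough sketch, the two kinds of errors look something like this on the wire; the codes and messages here are illustrative.

```typescript
// 1) A protocol-level error, e.g. a request with bad parameters, uses the
// standard JSON-RPC error object.
const protocolError = {
  jsonrpc: "2.0",
  id: 8,
  error: {
    code: -32602, // standard JSON-RPC "Invalid params" code
    message: "Invalid params: productName is required",
  },
};

// 2) A tool execution error: the request itself succeeded, so the failure is
// reported inside the result with isError: true, where the agent can see it
// and react instead of quietly making something up.
const toolExecutionError = {
  jsonrpc: "2.0",
  id: 9,
  result: {
    isError: true,
    content: [
      { type: "text", text: "Upstream catalog API returned 404 for that product." },
    ],
  },
};
```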
Keeping things honest
While standardized errors can’t directly prevent an LLM from lying to you, they can be used to ensure that MCP servers and clients are only taking expected actions on data that is appropriately structured.
MCP as a protocol can’t by itself solve the generation-vs-regurgitation issue, or the fact that LLMs are inherently designed to make shit up, but combined with strategies like RAG it can improve the overall accuracy and reliability of the responses we get from our chat agents.
Ready to get your hands dirty with MCP? Join me again next time for #5, Tools for Finding & Publishing MCP Servers!
Have feedback about these MCP posts? DM me on Bluesky @immber.bsky.social!