Use Case Example: Documentation
Recently I built a remote MCP server for ATprotocol’s documentation.
Why?
I wanted to make it easier for ATprotocol developers (including myself) to query the ATproto documentation.
But, why does anyone need a remote MCP server for that?
Since just about every LLM agent today has URL search ability and can reference a documentation site, you might be wondering why I would choose to build (and serve for free) a remote MCP server for documentation that is already available online.
Let me explain.
An overview of the two approaches
1. In-Prompt Search Approach
If you have ever asked a chat agent to include results from a specific site, then you have used an “in-prompt search” approach.
When you ask for references and pass a URL directly in your prompt, the model fetches and processes the URL’s content in real time.
Here's what typically happens to the search results text:
1. The retrieved web content gets tokenized (broken into tokens, e.g. ‘Hello world’ => [‘hello’, ‘world’])
2. Those tokens get converted to vector embeddings (numerical vectors that represent their meaning)
3. The LLM processes these embeddings along with the original conversation context
Only after this initial processing has occurred can the agent get to your original query.
The key take-away is that any text the LLM directly processes has to be both tokenized and embedded.
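To make that concrete, here is a toy sketch of those two steps in TypeScript. It is purely illustrative: real tokenizers split text into subword tokens, and real embeddings come from a trained model, not a character hash.

```typescript
// Toy illustration of the two steps above; real tokenizers split on subwords
// (BPE, SentencePiece, etc.) and real embeddings come from a model, not a hash.

// 1. "Tokenize" the fetched page text ('Hello world' => ['hello', 'world']).
function toyTokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// 2. Turn each token into a numeric vector. Here characters are just hashed
//    into a fixed-size array as a stand-in for a learned embedding.
function toyEmbed(token: string, dims = 8): number[] {
  const vector = new Array(dims).fill(0);
  for (let i = 0; i < token.length; i++) {
    vector[i % dims] += token.charCodeAt(i) / 1000;
  }
  return vector;
}

const pageText = "Hello world from the fetched documentation page";
const embeddings = toyTokenize(pageText).map((token) => toyEmbed(token));
console.log(`${embeddings.length} token vectors ready for the model to process`);
```

Every time you paste a URL into a prompt, some version of this work happens again from scratch.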
How it actually works depends on your LLM and agent
Sometimes the LLM processes your search results directly, just like any other input text: it tokenizes them, embeds them, and processes them together with your original question. This is straightforward, but it can hit context length limits with long search results.
Some agents have the LLM first summarize or extract key information from the search results, then use that processed information in the final response.
2. Remote MCP Server with Vectorized RAG Index Approach
First let’s understand the components of my example documentation MCP server:
1. A cron worker that crawls and scrapes the documentation sites, saving their HTML content. (This lets me include as many URL resources as I want, such as the atproto wiki & Bluesky docs; a rough sketch of this worker follows the list.)
2. A vectorized RAG index (where the documentation content gets pre-processed, i.e. tokenized & converted into vector embeddings)
3. The remote MCP server that publishes a “search_documentation” tool (mine is hosted on Cloudflare)
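Here is that rough sketch of the first component: a cron-triggered crawler written as a Cloudflare Worker. The URL list, the R2 bucket binding, and the types (from @cloudflare/workers-types) are illustrative assumptions, not the actual implementation behind my server.

```typescript
// Sketch of the crawl step as a scheduled Cloudflare Worker.
// DOC_URLS and the DOCS_BUCKET binding are illustrative placeholders.
const DOC_URLS = ["https://atproto.com/docs", "https://docs.bsky.app"];

export default {
  // Runs on whatever cron schedule the Worker is configured with (e.g. weekly).
  async scheduled(
    _controller: ScheduledController,
    env: { DOCS_BUCKET: R2Bucket },
    ctx: ExecutionContext
  ) {
    ctx.waitUntil(
      Promise.all(
        DOC_URLS.map(async (url) => {
          const res = await fetch(url);
          const html = await res.text();
          // Save the raw HTML for the later tokenize/embed pass, keyed by URL.
          await env.DOCS_BUCKET.put(encodeURIComponent(url), html);
        })
      )
    );
  },
};
```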
What is a Vectorized RAG?
Retrieval Augmented Generation (RAG) just means that some store of pre-computed embeddings is used to find the most relevant chunks of source material before feeding them to the LLM.
A “RAG Index” is just a list of those embeddings. Similar to how a database index makes it easier to query a table, a RAG Index makes it easier for an LLM to query a set of vector embeddings.
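In code, that index can be as small as an array of chunk/embedding pairs ranked by cosine similarity against the query’s embedding. A minimal sketch, with no particular vector database assumed:

```typescript
// The RAG index: pre-computed embeddings stored alongside the chunks they describe.
interface IndexedChunk {
  sourceUrl: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors (higher = more semantically similar).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "Querying the index": rank every pre-embedded chunk against the query vector
// and keep only the best k matches.
function topKChunks(queryEmbedding: number[], index: IndexedChunk[], k = 5): IndexedChunk[] {
  return [...index]
    .sort(
      (a, b) =>
        cosineSimilarity(queryEmbedding, b.embedding) -
        cosineSimilarity(queryEmbedding, a.embedding)
    )
    .slice(0, k);
}
```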
Pre-processed sources
By crawling, tokenizing, and embedding the documentation content in advance, my MCP server is essentially “pre-processing” it for querying and consumption by an LLM.
By doing this once and making it available remotely to any agent via the model context protocol, I’m essentially “caching” the embedding index for any LLM agent to consume.
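A sketch of that one-off pre-processing pass, using the same IndexedChunk shape as above. Here embedText is a hypothetical stand-in for whichever embedding model or API the server actually calls, and the chunking is deliberately naive:

```typescript
// One-time (or weekly) pre-processing: chunk the scraped docs and embed each chunk.
// embedText() is a hypothetical stand-in for a real embedding model or API call.
declare function embedText(text: string): Promise<number[]>;

interface IndexedChunk {
  sourceUrl: string;
  text: string;
  embedding: number[];
}

// Naive fixed-size chunking, purely for illustration.
function chunkText(text: string, size = 1000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

async function buildIndex(pages: { url: string; text: string }[]): Promise<IndexedChunk[]> {
  const index: IndexedChunk[] = [];
  for (const page of pages) {
    for (const piece of chunkText(page.text)) {
      // Paying the embedding cost once here is what lets every later query
      // skip re-tokenizing and re-embedding the documentation.
      index.push({ sourceUrl: page.url, text: piece, embedding: await embedText(piece) });
    }
  }
  return index;
}
```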
Remote MCP is better
By creating a single RAG index with the data already tokenized and embedded, anyone who wants to query these documentation sources can do so without having to re-tokenize and re-embed that data.
Computational efficiency
The primary benefits are reduced token cost and processing time.
Since most of us have concerns about LLM energy consumption, this is one example of how using pre-computed vectors instead of real-time fetching can reduce overall energy use.
Contextual understanding improvements
Combining data from multiple sources into a single vector index also enables multi-source synthesis. Instead of limiting context to a single URL’s fetch results, a vector index can in theory be quite large and contain significant amounts of data from various sources.
While increasing the scope and size of your source material might seem like it would make queries take longer, pre-processing the tokenization and embedding steps actually reduces noise in the results.
This is accomplished through focused retrieval via the RAG index, rather than the full-page processing of possibly irrelevant results you get with a URL prompt search, similar to how querying a SQL index or view is likely to be more efficient than scanning a large unindexed table.
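To make the contrast concrete, here is how the sketches above would combine at query time: the question is embedded once, matched against a pre-built index that spans several sources, and only a handful of focused chunks come back instead of a full page of HTML. (The helpers are the hypothetical ones sketched earlier, re-declared so this snippet stands alone.)

```typescript
// Reuses the shapes and helpers sketched earlier, declared here so the snippet stands alone.
interface IndexedChunk {
  sourceUrl: string;
  text: string;
  embedding: number[];
}
declare function embedText(text: string): Promise<number[]>;
declare function topKChunks(query: number[], index: IndexedChunk[], k?: number): IndexedChunk[];

// Focused retrieval: one small embedding call for the question, then a ranked
// lookup over chunks that may come from atproto.com, the wiki, and the Bluesky
// docs all at once.
async function searchDocumentation(query: string, index: IndexedChunk[]) {
  const queryEmbedding = await embedText(query);
  const hits = topKChunks(queryEmbedding, index, 5);
  return hits.map((hit) => ({ source: hit.sourceUrl, excerpt: hit.text }));
}
```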
Remote MCP is better for RAG Use Cases
Remember from “What devs should know about MCP” that a remote MCP server is simply one that is hosted in the cloud and accessible over the internet. Remote servers are not necessarily “official”; that depends on who is hosting and providing them.
While it would have been possible to make a local MCP server that users install and run locally to scrape, tokenize, and create vector embeddings for a list of documentation sites, that wouldn’t de-duplicate the processing. Each user would be running their own instance of the MCP server and would need to pre-process their own RAG index.
Universal compatibility
Model Context Protocol provides the standardization that makes my MCP server’s “search_documentation” tool work with any agent. By running the pre-processing task once a week on a cron schedule and hosting my remote MCP server on Cloudflare without requiring any authentication, I make those documentation sources queryable by anyone in the world who wants to use them.
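As a sketch of what publishing that tool can look like with the official MCP TypeScript SDK (exact calls may differ by SDK version, and runSearch is a hypothetical helper that queries the pre-built index):

```typescript
// Sketch of publishing a "search_documentation" tool via the MCP TypeScript SDK.
// SDK details may differ by version; runSearch() is a hypothetical helper that
// queries the pre-built RAG index described above.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

declare function runSearch(query: string): Promise<string>;

const server = new McpServer({ name: "atproto-docs", version: "1.0.0" });

server.tool(
  "search_documentation",
  { query: z.string().describe("What to look up in the ATproto/Bluesky docs") },
  async ({ query }) => {
    const results = await runSearch(query);
    // MCP tools return content blocks; plain text is the simplest kind.
    return { content: [{ type: "text" as const, text: results }] };
  }
);
```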
Options for Authentication
If my RAG index contained proprietary sources instead of public protocol documentation, it would be possible to require MCP server authentication and limit access to specific users.
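For instance, a Worker sitting in front of the MCP endpoint could check a bearer token before doing anything else. This is a simplified sketch (the API_TOKEN secret and handleMcpRequest helper are hypothetical; a real deployment would more likely use OAuth):

```typescript
// Simplified auth gate in front of a remote MCP endpoint.
// env.API_TOKEN and handleMcpRequest() are hypothetical placeholders.
declare function handleMcpRequest(request: Request, env: unknown): Promise<Response>;

export default {
  async fetch(request: Request, env: { API_TOKEN: string }): Promise<Response> {
    const auth = request.headers.get("Authorization");
    if (auth !== `Bearer ${env.API_TOKEN}`) {
      // Anyone without the expected token never reaches the RAG index.
      return new Response("Unauthorized", { status: 401 });
    }
    return handleMcpRequest(request, env);
  },
};
```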
This makes RAG an excellent use case for a remote MCP server solution.
Remote MCP wins this use case
TL;DR: instead of Claude, ChatGPT, or whatever agent each of us is using having to parse content from a URL source every time one of us wants to query it, a remote MCP server can do that processing once and then make an index of it available to everyone.
I think this efficiency thing might be kinda important. Maybe even as important as what’s coming up next time: “MCP security vulnerabilities devs should know”.
Have feedback about these MCP posts? DM me on Bluesky @immber.bsky.social!
Thanks for reading!