As Trongate v2 approaches, and as I've spent the last few months buried deep in the world of optimizing content for AI, a thought on Trongate's inherent advantage in a world of RAG pipelines:
Now that AI answer engines are increasingly becoming the "first reader," the old-school virtue of fast, server-rendered HTML is less nostalgic and more strategically practical.
1) Most AI crawlers don't execute JavaScript at all
- Vercel analysed 569 million GPTBot requests and found zero evidence of JavaScript execution
- ClaudeBot, Meta-ExternalAgent, Bytespider, PerplexityBot: none render JavaScript
They fetch HTML, parse it, and leave. Client-side rendered pages appear empty or low-signal to them
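As a rough sketch of what that means in practice (illustrative pages, not real crawler code): strip the markup from the raw HTML without executing anything, and see what text survives.

```javascript
// Rough sketch of what a non-JS crawler "sees": it strips markup
// from the raw HTML without ever executing scripts.
function visibleText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "") // scripts are fetched but never run
    .replace(/<[^>]+>/g, " ")                   // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

// A server-rendered page exposes its content immediately...
const serverRendered =
  "<html><body><h1>Pricing</h1><p>Plans from $9/mo.</p></body></html>";

// ...while a client-rendered shell looks empty to the crawler.
const clientRendered =
  '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>';
```

The server-rendered page yields its full text; the client-rendered shell yields nothing at all.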
2) Speed directly correlates with AI citations
The performance advantage is measurable:
- Pages with First Contentful Paint under 0.4 seconds average 6.7 citations
- Pages slower than 1.13 seconds? Just 2.1 citations
- Fast pages are 3x more likely to be cited across AI platforms
According to Sistrix, only 274,455 of the 18.4 million domains ranking in UK organic results appeared in Google AI Overviews. The selection is ruthlessly tight.
Citation patterns show AI systems favour:
- Clear HTML structure over length
- Fast responses (sub-second FCP is table stakes)
- Content-complete HTML (no waiting for JavaScript to reveal meaning)
So Trongate's architectural advantages:
- Zero JavaScript rendering overhead
- Content is complete in the initial response
- Sub-100ms response times
These are competitive advantages in an AI-first discovery world where:
1. Most AI crawlers can't see JavaScript-rendered content
2. Citation probability correlates directly with page speed
3. Content that loads fast and parses easily wins
Trongate - ahead of the curve as usual.
Speed is citation surface area
4 months ago
#1
4 months ago
#2
Indeed.
These are my observations:
1. Properly formatted and concise responses will be favoured.
- Claude seems to have some preference for JSON.
2. Semantically correct HTML is an advantage both for screen readers and for bots.
3. Page titles are more important than ever.
4. AI bots understand polling via meta refresh, but not JavaScript-based polling.
5. Correct use of response headers, like status codes and Content-Type, is an easy win.
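On points 2 and 3: a hedged sketch of the kind of lightweight signals a non-rendering bot can lift straight out of the raw HTML (the sample page is made up) — the title and the heading outline.

```javascript
// Extract the cheap structural signals a bot can read without rendering:
// the <title> and the heading hierarchy.
function extractSignals(html) {
  const title = (html.match(/<title>([\s\S]*?)<\/title>/i) || [])[1] || "";
  const headings = [...html.matchAll(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi)]
    .map(m => ({ level: Number(m[1]), text: m[2].trim() }));
  return { title: title.trim(), headings };
}

// Made-up sample page for illustration.
const page = `
<html><head><title>Trongate Docs - Routing</title></head>
<body><h1>Routing</h1><h2>Custom Routes</h2></body></html>`;
```

If the title is missing or the headings only appear after JavaScript runs, these signals simply come back empty.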
4 months ago
#3
You folks seem to be more versed on the subject than a geezer like myself.
So when you say that an AI is partial to fast-loading, highly structured HTML, with page titles and status codes, can I assume you are referring to the HTML pages from which the AI derives its information/knowledge base?
Apparently, if it can source information from well-structured, fast-loading HTML pages, and can determine the page title and confirm a successfully loaded page from the status code, it chooses those over slow-loading pages or those with JavaScript latencies.
Has anyone tested AI preferences in the matter of Trongate-MX usage, versus non-MX? Will it be confused by the lack of a page load, and does attaching a succinct status code help? Or do I not know what I'm talking about?
4 months ago
#4
Much the same rules apply as with Google page ranking or Lighthouse.
As for Trongate MX, we generally load a full HTML page and enhance it with MX, so the AI bot will read the initial contents.
If you have an anchor link pointing to another page (href="$url"), it will be followed; however, mx-get="$url" might get ignored.
If you have an inline JS script, or a JS file reference, that finds an element and attaches behaviour like `el.addEventListener('click', function () { window.location.href = "$url" });`, that likely won't be followed. This is much the same reason an SPA, like a React client app, may struggle to rank as highly on Google as a server-rendered equivalent.
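A hedged sketch of that progressive-enhancement pattern (attribute names as per Trongate MX; the URLs are placeholders):

```html
<!-- Crawler-friendly progressive enhancement: the real href is always
     followable; mx-get upgrades the click for browsers running MX. -->
<a href="/pricing" mx-get="/pricing" mx-target="#content">Pricing</a>

<!-- By contrast, an MX trigger on a non-anchor element leaves a
     non-JS crawler with no href to follow. -->
<button mx-get="/pricing" mx-target="#content">Pricing</button>
```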
My point on status codes and content-types is mostly aimed at APIs.
You'd be surprised how many return 200 (i.e. everything is OK) with a body like `{ "success": false, "error": "bla bla bla" }`. If that endpoint had simply returned status code 400 or 422, it would automatically score higher.
For Content-Type, I'm specifically thinking of endpoints that send the response header `Content-Type: "text/html"` but proceed to return a JSON payload. The AI will try to parse it as HTML and fail.
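A minimal sketch of the fix (the shape and field names are illustrative, not a Trongate API): map the outcome to a truthful status code and a Content-Type that matches the body.

```javascript
// Map a domain result to an honest HTTP response: real status codes
// for failures, and a Content-Type that matches the JSON body.
function toHttpResponse(result) {
  const ok = result.success === true;
  return {
    status: ok ? 200 : (result.validationError ? 422 : 400),
    headers: { "Content-Type": "application/json" }, // matches the JSON body
    body: JSON.stringify(ok ? result.data : { error: result.error })
  };
}
```

The anti-pattern is the same function with `status: 200` on every branch — the body says failure while the transport says success.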
4 months ago
#5
This is, to the best of my knowledge, what's happening in the background.
When you ask ChatGPT, Perplexity, or Google's AI Overviews a question, two things are happening simultaneously:
Training knowledge: The model has vast amounts of information baked in from when it was trained, but that knowledge is frozen in time.
Live retrieval: The system searches the web in real-time, fetches actual pages, extracts relevant chunks, and uses those to "ground" its answer.
This second part is called Retrieval-Augmented Generation (RAG), and it's why your page structure matters so much, because time is of the essence.
The RAG Pipeline
Query expansion: The AI generates multiple search queries (both simple keywords and semantic/intent-weighted variations).
Metadata filtering: From maybe 50 candidate pages, the system filters down to 10-20 based on titles, meta tags, schema JSON, and whatever it can see in search results.
Page fetching with timeout: Selected pages are fetched, typically with ~2 second timeouts. If your page hasn't delivered meaningful HTML by then, it's skipped.
Chunking: The HTML is parsed (often converted to markdown) and split into small chunks, typically 128 tokens (roughly 50-80 words).
Embedding and scoring: Each chunk is converted into a mathematical vector and scored for semantic similarity to the query using cosine similarity.
Selection for synthesis: The top-scoring chunks from 3-5 pages are sent to the final AI model, which synthesizes the answer and chooses citations.
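The chunking and scoring steps can be sketched like this (words stand in for tokens, and word counts stand in for learned embeddings; real systems differ, but the scoring shape is the same):

```javascript
// Split text into fixed-size chunks (words as a stand-in for tokens).
function chunk(text, size = 128) {
  const words = text.split(/\s+/).filter(Boolean);
  const out = [];
  for (let i = 0; i < words.length; i += size) out.push(words.slice(i, i + size).join(" "));
  return out;
}

// Crude bag-of-words "embedding": a word -> count map.
function bagOfWords(text) {
  const v = {};
  for (const w of text.toLowerCase().split(/\W+/).filter(Boolean)) v[w] = (v[w] || 0) + 1;
  return v;
}

// Cosine similarity between two sparse vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const k in a) { na += a[k] * a[k]; if (b[k]) dot += a[k] * b[k]; }
  for (const k in b) nb += b[k] * b[k];
  return dot ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

// Rank chunks by similarity to the query and keep the top k.
function topChunks(text, query, size = 128, k = 3) {
  const q = bagOfWords(query);
  return chunk(text, size)
    .map(c => ({ chunk: c, score: cosine(bagOfWords(c), q) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Note what the 128-token window implies: if a chunk boundary falls mid-thought because the page's structure is muddled, that chunk scores poorly no matter how relevant the page is.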
This is where server-side rendering becomes critical, because the crawlers aren't executing JavaScript, and when pages are split into 128-token chunks, well-structured pre-rendered content produces meaningful units.
RAG systems have tight latency budgets and can't wait around for JavaScript bundles to download, execute, fetch data from APIs, hydrate islands, and whatever new complexity the JS community have invented this week. They need content-complete HTML, immediately.
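That latency budget can be modelled as a hard deadline on any fetch-like promise (the ~2-second figure and the URL are illustrative):

```javascript
// Race a fetch-like promise against a hard deadline, mirroring the
// tight cutoff retrieval systems impose on page fetches.
function withTimeout(promise, ms) {
  return Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error("deadline exceeded")), ms))
  ]);
}

// Usage (assumed URL): withTimeout(fetch("https://example.com"), 2000)
//   .then(res => res.text())
//   .catch(() => { /* page skipped, never cited */ });
```

There is no retry and no grace period in this model: a page that misses the deadline simply never enters the candidate pool.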
3 months ago
#6
Hi codemonkey,
You’re pretty much on the money. What really counts is that an AI can grab the page quickly and understand it straight away, and that still comes down to clean, server-rendered HTML with clear signals. If the meaning is conveyed in the first response, and the page is semantically structured with a proper title and sensible headers, it gets read; if it’s hidden behind JavaScript or waiting on client-side tricks, it is often overlooked. From that angle, Trongate’s approach aligns well with how AI systems actually interpret the web.
The only thing worth dialling back a touch is the idea that speed and structure guarantee success. They don’t win the race on their own, but they decide whether you’re even on the starting line. Once the page is in the mix, relevance and clarity do the heavy lifting. Seen that way, your point stands: Trongate isn’t just quick for people, it’s quick in the way machines now expect content to be delivered.
3 months ago
#7
Yes, existing best practices still apply. But in traditional search, having site speed and accessibility concerns that were "not great, not terrible" was mostly fine.
Now, with AI overviews, it's much more abrupt. Too slow and you may as well be invisible. Rely on JavaScript rendering, you may as well be invisible.
And it's a loop. If your pages appear in AI overviews, you can effortlessly start to pick up links from AI-generated content, feeding back into the traditional search signals. It's surprising just how many websites don't remove utm_source=chatgpt.com from their content links before clicking 'publish'.