How we test and rank.
Every server on this site was installed, configured, and benchmarked by hand. No vendor paid for placement. No affiliate deal influenced the score. Our test scripts are public on GitHub so you can reproduce every result.
1. Discovery
We scan GitHub, npm, and the official MCP server registry daily. New repositories with mcp-server in the name or topic are flagged for review. We also monitor the official MCP servers repo for community additions.
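A minimal sketch of the name/topic filter, assuming repository metadata has already been fetched from GitHub, npm, and the registry into simple records. `RepoInfo`, `needs_review`, and `daily_scan` are hypothetical names for illustration, not our actual scanner, and the fetching step is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class RepoInfo:
    """Minimal repository metadata (hypothetical shape for this sketch)."""
    full_name: str                                   # e.g. "owner/repo"
    topics: list[str] = field(default_factory=list)  # GitHub-style topic tags


def needs_review(repo: RepoInfo, already_seen: set[str]) -> bool:
    """True if the repo is new and has "mcp-server" in its name or topics."""
    if repo.full_name in already_seen:
        return False
    name = repo.full_name.split("/")[-1].lower()
    topics = {t.lower() for t in repo.topics}
    return "mcp-server" in name or "mcp-server" in topics


def daily_scan(candidates: list[RepoInfo], already_seen: set[str]) -> list[str]:
    """Return newly flagged repos and record them so each is flagged only once."""
    flagged = [r.full_name for r in candidates if needs_review(r, already_seen)]
    already_seen.update(flagged)
    return flagged


if __name__ == "__main__":
    seen: set[str] = set()
    repos = [
        RepoInfo("acme/weather-mcp-server"),
        RepoInfo("acme/tools", ["mcp-server"]),
        RepoInfo("acme/unrelated"),
    ]
    print(daily_scan(repos, seen))  # ['acme/weather-mcp-server', 'acme/tools']
    print(daily_scan(repos, seen))  # []
```

Keeping a "seen" set is what makes a daily scan report only new repositories instead of re-flagging the whole ecosystem every day.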
2. Installation
Each server is installed via npx or pip in a clean environment. We follow the official README exactly: no custom patches, no privileged flags. If the install fails, the server is rejected. We pin the MCP SDK version the server actually works with (currently 1.0.0) rather than the latest release.
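The install-or-reject rule could look roughly like the sketch below. It assumes a throwaway working directory and a stripped-down environment are "clean enough" for illustration; a real harness would more likely use a fresh container or virtualenv. `install_in_clean_env` is a hypothetical helper, and the commands in the usage block are stand-ins for the README's npx or pip command.

```python
import os
import subprocess
import sys
import tempfile


def install_in_clean_env(cmd: list[str], timeout_s: float = 300.0) -> bool:
    """Run one install command in a throwaway directory with a stripped-down env.

    Any failure (non-zero exit, timeout, missing executable) returns False,
    which the harness treats as "rejected". No retries and no extra flags.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Only PATH survives from the host; HOME points at the empty temp dir so
        # no cached config or credentials from a previous install are visible.
        env = {"PATH": os.environ.get("PATH", ""), "HOME": workdir}
        try:
            result = subprocess.run(
                cmd, cwd=workdir, env=env, capture_output=True, timeout=timeout_s
            )
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return False
        return result.returncode == 0


if __name__ == "__main__":
    # Stand-in commands; a real run would pass the README's npx or pip command.
    print(install_in_clean_env([sys.executable, "-c", "pass"]))                # True
    print(install_in_clean_env([sys.executable, "-c", "raise SystemExit(1)"]))  # False
```

Treating every exception path as a plain `False` keeps the rule binary: the server either installs as documented or it is rejected.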
3. Production Tasks
Every server must complete five tasks on a deterministic test page. Each server gets arguments matching its actual tool schema: Playwright addresses elements by accessibility refs, Puppeteer by CSS selectors, and Firecrawl uses single-shot actions. Tasks are defined as user intents ("navigate", "click", "fill"), not as specific tool calls.
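One way to turn a single user intent into server-specific arguments is a small adapter per addressing style. This sketch covers only the element-targeting intents ("click", "fill"); the tool names (`click_by_ref`, `fill_by_selector`, and so on) and argument keys are hypothetical placeholders, not the real schemas of Playwright MCP or Puppeteer MCP, which a harness would read from each server's own tool list.

```python
from dataclasses import dataclass
from typing import Callable

# Tool names and argument keys below are hypothetical placeholders. A real
# harness would read each server's schema (for example via the MCP tools/list
# request) instead of hard-coding them.
ToolCall = tuple[str, dict[str, str]]


@dataclass(frozen=True)
class Target:
    """One element on the test page, addressable in more than one way."""
    css: str         # CSS selector, e.g. "#name"
    ref: str         # accessibility ref taken from an earlier snapshot
    value: str = ""  # text to enter for the "fill" intent


def ref_based(intent: str, t: Target) -> ToolCall:
    """Adapter for servers that address elements by accessibility ref."""
    args = {"ref": t.ref}
    if intent == "fill":
        args["text"] = t.value
    return (f"{intent}_by_ref", args)


def selector_based(intent: str, t: Target) -> ToolCall:
    """Adapter for servers that address elements by CSS selector."""
    args = {"selector": t.css}
    if intent == "fill":
        args["value"] = t.value
    return (f"{intent}_by_selector", args)


Adapter = Callable[[str, Target], ToolCall]
ADAPTERS: dict[str, Adapter] = {"ref": ref_based, "selector": selector_based}


def plan(intents: list[str], adapter: Adapter, target: Target) -> list[ToolCall]:
    """Translate the same user intents into one server's concrete tool calls."""
    return [adapter(i, target) for i in intents]
```

Because the task list is written as intents, the same `["click", "fill"]` plan runs against every server; only the adapter changes.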
- Browser Automation: navigate to test page, take screenshot, get DOM snapshot, click a button, fill a form field.
- Test page: A self-contained HTML page served locally on an ephemeral port. No external network dependencies. Same DOM every run.
- Validation: We check output.isError first, then assert on semantic outcomes ("navigated", "screenshot", "snapshot"). Never match generic substrings that could match error text.
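The test page and validation rules above can be sketched with the standard library alone. Binding to port 0 lets the OS pick a free ephemeral port, and the page bytes never change between runs. The result dictionaries mirror the MCP tool-result shape (`isError` plus a `content` list of `text` or `image` items); the specific semantic checks (URL appears in the navigate output, page heading appears in the snapshot) are assumptions for illustration, not our exact assertions.

```python
import http.server
import threading

# A fixed page: identical bytes on every run, no external resources.
TEST_PAGE = b"""<!doctype html>
<html><head><title>MCP test page</title></head>
<body>
  <h1>MCP test page</h1>
  <label for="name">Name</label> <input id="name">
  <button id="submit">Submit</button>
</body></html>"""


class PageHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(TEST_PAGE)))
        self.end_headers()
        self.wfile.write(TEST_PAGE)

    def log_message(self, *args):
        pass  # silence per-request logging


def serve_test_page():
    """Bind to port 0 so the OS picks a free ephemeral port; return (server, url)."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


def check_result(result: dict, intent: str, page_url: str) -> bool:
    """Check isError first, then a semantic outcome specific to the intent."""
    if result.get("isError", False):
        return False
    content = result.get("content", [])
    text = " ".join(c.get("text", "") for c in content if c.get("type") == "text")
    if intent == "navigate":
        return page_url in text  # the server reports the page it reached
    if intent == "screenshot":
        return any(c.get("type") == "image" for c in content)
    if intent == "snapshot":
        return "MCP test page" in text  # heading text known from TEST_PAGE
    raise ValueError(f"no semantic check defined for {intent!r}")
```

Checking `isError` before looking at any text is what stops an error message that happens to contain the URL or the word "screenshot" from counting as a pass.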
4. Scoring
We report raw metrics, not an opaque aggregate score. A trustworthy benchmark shows you the actual numbers so you can judge for yourself:
We do not weight these into a single "score." Different use cases care about different things. A CI pipeline cares about pass rate and consistency. A prototyping session cares about low connect time. We show all the numbers so you can weight them yourself.
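To make "raw metrics, weighted by you" concrete, here is a sketch that reports pass rate, consistency, and median connect time side by side, then lets two different readers order the same numbers. The `Run` record, the definition of consistency (1 minus the population standard deviation of per-run pass rates), and connect time as spawn-to-ready are all assumptions for this example.

```python
from dataclasses import dataclass
from statistics import median, pstdev


@dataclass
class Run:
    passed: int        # tasks that passed validation in this run
    total: int         # tasks attempted (five in the current suite)
    connect_ms: float  # assumed: spawn-to-ready time for the server


def raw_metrics(runs: list[Run]) -> dict[str, float]:
    """Numbers reported side by side; nothing here is collapsed into one score."""
    per_run = [r.passed / r.total for r in runs]
    return {
        "pass_rate": sum(r.passed for r in runs) / sum(r.total for r in runs),
        # Assumed definition: 1.0 means every run had the same pass rate.
        "consistency": 1.0 - pstdev(per_run),
        "median_connect_ms": median(r.connect_ms for r in runs),
    }


# Two readers, two orderings of the same raw numbers.
def rank_for_ci(m: dict[str, dict[str, float]]) -> list[str]:
    return sorted(m, key=lambda s: (-m[s]["pass_rate"], -m[s]["consistency"]))


def rank_for_prototyping(m: dict[str, dict[str, float]]) -> list[str]:
    return sorted(m, key=lambda s: m[s]["median_connect_ms"])
```

A server that always passes but connects slowly ranks first for CI and last for prototyping; a single blended score would hide that trade-off.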
5. Re-testing
We re-run benchmarks before publishing updated rankings. If a server breaks in a new release, its results drop and it moves down the list. If a previously failing server fixes its issues, we re-test it and update the results. The date on every category page shows when the last run completed.
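A re-test boils down to comparing a new run against the last published one. The sketch below assumes each run is summarized as a per-server pass rate and that lists are ordered by pass rate; both are simplifying assumptions, and `diff_runs` and `ordering` are hypothetical helpers.

```python
def diff_runs(previous: dict[str, float], current: dict[str, float],
              tol: float = 0.0) -> tuple[list[str], list[str]]:
    """Compare per-server pass rates between the last published run and a new one.

    Returns (regressed, improved). Servers without a previous result are skipped.
    """
    regressed, improved = [], []
    for server, rate in current.items():
        before = previous.get(server)
        if before is None:
            continue  # new server: no baseline to compare against
        if rate < before - tol:
            regressed.append(server)
        elif rate > before + tol:
            improved.append(server)
    return sorted(regressed), sorted(improved)


def ordering(current: dict[str, float]) -> list[str]:
    """Assumed list order: highest pass rate first, ties broken by name."""
    return sorted(current, key=lambda s: (-current[s], s))
```

The `tol` parameter allows small run-to-run noise to be ignored, so a server is only flagged when its pass rate moves by more than the chosen margin.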
6. Auth & Reproducibility
Some servers require API keys. We classify them into three categories:
- Free to Test: No API key required. Install via npx and it works. Playwright MCP and Puppeteer MCP fall here.
- Requires Auth: Needs a free API key. We test with the free tier when possible. Firecrawl MCP and Browserbase MCP fall here.
- Requires Paid: No free tier. We verify the package installs but do not run full tasks without a paid key. These servers are labeled "Paid. Evaluation pending."
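The three categories follow from two facts about a server: whether it needs an API key, and whether that key has a free tier. A minimal sketch, with hypothetical function names, of the classification and of what the harness runs for each tier:

```python
from enum import Enum


class AuthTier(Enum):
    FREE_TO_TEST = "Free to Test"
    REQUIRES_AUTH = "Requires Auth"
    REQUIRES_PAID = "Requires Paid"


def classify(needs_api_key: bool, has_free_tier: bool) -> AuthTier:
    """Map the two facts we record about a server onto the three tiers."""
    if not needs_api_key:
        return AuthTier.FREE_TO_TEST
    if has_free_tier:
        return AuthTier.REQUIRES_AUTH
    return AuthTier.REQUIRES_PAID


def evaluation_scope(tier: AuthTier) -> str:
    """Paid-only servers are install-checked only; everything else runs all tasks."""
    return "install-only" if tier is AuthTier.REQUIRES_PAID else "full-tasks"
```

For example, `classify(needs_api_key=True, has_free_tier=False)` yields `AuthTier.REQUIRES_PAID`, which maps to an install-only check and the "Paid. Evaluation pending." label.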
Key isolation: The test harness strips all environment variables that look like keys, tokens, or secrets before spawning each server. Only the declared API key for that specific server is injected. This prevents a buggy or malicious server from exfiltrating unrelated credentials.
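Key isolation can be sketched as building a fresh environment dictionary for each server process. The name pattern below is an assumed heuristic (it deliberately over-strips, for example any name containing "KEY"), and the secret store and variable names are hypothetical examples.

```python
import re
from typing import Optional

# Heuristic: names containing these words are treated as credentials. This list
# is an assumption for the sketch; it errs on the side of stripping too much.
SECRET_LIKE = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.IGNORECASE)


def build_server_env(host_env: dict[str, str], secret_store: dict[str, str],
                     declared_key: Optional[str]) -> dict[str, str]:
    """Environment for one server process: host vars minus anything secret-like,
    plus only the single key this server declared (if any)."""
    env = {k: v for k, v in host_env.items() if not SECRET_LIKE.search(k)}
    if declared_key is not None:
        env[declared_key] = secret_store[declared_key]
    return env
```

The result would be passed as the `env` argument of `subprocess.Popen` when spawning the server, so the child process never sees the host's other credentials, even if it dumps its entire environment.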
Reproduce our results
All test scripts are open-source. Clone the repo and run the same benchmarks yourself.