# Sybil Agentic Research

> Research dashboard ranking 27 prediction markets across 5 tiers (A–E) by what an autonomous AI agent can demonstrably do today. Four independent benchmarks — agent accessibility, CLI/MCP, skill, and framework — all run by autonomous coding agents. April 2026 snapshot.

Live at https://sybil.exchange/agentic-research. This file is a plain-markdown snapshot of the entire dashboard, regenerated on every build. For structured JSON, fetch `/agentic-research/data/summary.json`.

**27** PMs surveyed · **4** benchmarks · **20** graded for accessibility · **12** tested for agent tools · **5** tiers

---

## Tier rankings

### Tier A — Production-ready

_Round-trip trade works through a tested surface with no significant friction. Safe to point an agent at today._

- **Manifold** (play money · None · AA=B CLI=A) — Play-money MCP scores 17 of 18; the easiest place to start.
- **Baozi** (decentralized · Solana · AA=B CLI=A SKILL=D FW=N/A) [low liquidity] — 76-tool MCP server on Solana with a clean round-trip.
- **Myriad** (decentralized · Multi-EVM · AA=A CLI=A) — BNB-chain CLI nails the full trade with one rough edge.
- **Seer** (decentralized · Gnosis · AA=C FW=Production) [low liquidity] — Gnosis framework deploys real trading agents to Seer markets.
- **AIOmen / Presagio** (decentralized · Gnosis · AA=C FW=Production) [low liquidity] — 300+ daily active agents already trading via Olas.
- **Metaculus** (play money · None · AA=B FW=Usable) — Active forecasting framework with daily commits — submits probability estimates, not trades.

### Tier B — Workable with friction

_A real trade completes, but a major friction point keeps it from being production-ready: geofencing, hostile accessibility, or VPN-only access._

- **Polymarket** (decentralized · Polygon · AA=A CLI=C SKILL=B FW=Abandoned) — Most polished agent stack on the market; geoblocked in 33 jurisdictions including the US, UK, EU, and Australia.
- **Alpha Arcade** (decentralized · Algorand · AA=D CLI=B) [low liquidity] — MCP completes the trade, but the website is rated D for agent accessibility — agents must already know the tool exists.

### Tier C — Surface exists, doesn't function

_Has tested agent surfaces (CLI, MCP, or Skill) that fail to complete a real trade end-to-end._

- **Rain Protocol** (decentralized · Arbitrum · AA=B SKILL=C) — OpenClaw skill opens positions but cannot close them.
- **Limitless** (decentralized · Base · AA=B CLI=D) — Two tools published, both auth-walled, zero trading checks pass.
- **Context Markets** (decentralized · Base · AA=A CLI=D SKILL=D) [low liquidity] — CLI and Skill both exist; neither completes a trade end-to-end.
- **Sapience** (decentralized · Ethereal + Arbitrum · AA=A SKILL=D FW=Abandoned) [low liquidity] — ElizaOS plugin gets one download per week and hasn't shipped since August 2025.

### Tier D — Dev tools, no agentic layer

_These PMs ship developer APIs and SDKs, but no MCP, Skill, or Framework has been published or tested. They could be agent-ready with effort; we just don't have evidence yet._

- **Kalshi** (regulated/cefi · CeFi)
- **XO Market** (decentralized · Sovereign rollup · AA=D)
- **Opinion** (decentralized · BNB Chain · AA=C)
- **SX Bet** (decentralized · SX Network · AA=C)
- **predict.fun** (decentralized · BNB Chain · AA=C)
- **Probable** (decentralized · BNB Chain · AA=C)
- **Trueo** (decentralized · Base · AA=C)
- **Interactive Brokers (ForecastTrader)** (regulated/cefi · CeFi)
- **PredictIt** (regulated/cefi · CeFi)

### Tier E — Closed to agents

_No public dev surface beyond the consumer web UI. No documented API, no SDK, no agent layer._

- **Robinhood Prediction Markets** (regulated/cefi · CeFi)
- **OG (by Crypto.com)** (regulated/cefi · CeFi)
- **DraftKings Predictions** (regulated/cefi · CeFi)
- **FanDuel Predicts** (regulated/cefi · CeFi)
- **Overtime** (decentralized · Base · AA=C)
- **worm.wtf** (decentralized · Solana · AA=D)

## Findings

### 01. Geoblocking treats agents like they're human.

Polymarket, the most agent-tooled PM in the dataset, is geofenced in 33 jurisdictions including the US, UK, EU, and Australia. Agents run on cloud VPS instances in whichever region the operator chooses; that location reflects hosting decisions, not the operator's actual jurisdiction.

### 02. The biggest PMs are not the friendliest to agents.

Polymarket and Kalshi together account for the majority of total prediction-market volume. Neither is in Tier A. The top of the ranking belongs to play-money and mid-volume DeFi venues: Manifold, Baozi, Myriad.

### 03. Some "agent surfaces" exist nominally but don't function.

Limitless ships a CLI and an MCP server — both published at pinned versions, both auth-walled, both fail every trading check. Sapience's ElizaOS plugin gets ~1 download/week and targets a host-framework version that no longer exists. Polymarket Agents has 2.7K stars, but the execution path is commented out by default.

### 04. Most regulated PMs treat agents as a B2B integration channel, not as users.

Of seven regulated/CeFi PMs surveyed, only Kalshi has a developer surface that an agent could reasonably use today. Even Kalshi has not published an agent-specific layer — its API is built for institutional partners. Robinhood, OG, Interactive Brokers, PredictIt, DraftKings, and FanDuel have minimal or no public dev surface.

### 05. Dev surface and website live in two different worlds.

11 of the 20 PMs scored for agent accessibility sit at C or D, including PMs whose APIs we know are functional. PMs invest in developer documentation for partners who already know what they're looking for, while leaving their consumer website unparseable to a fetch-only agent. Discoverability is a separate problem from documentation.

### 06. There is no shared format for agent surfaces.

The five skills tested share no common format — SKILL.md files, OpenClaw script bundles, custom SDK guides, navigation-hub markdown. Frameworks vary just as widely. An agent built for one PM cannot transfer to another. There is no equivalent of OpenAPI for prediction markets.

## Landscape — all 27 prediction markets

| PM | Category | Chain | Volume | Dev tools | Agent tools |
|---|---|---|---|---|---|
| Polymarket | Decentralized | Polygon | $52.5B cumulative, ~$7B/month | API/SDK/WEBSOCKET | CLI/FRAMEWORK/SKILL |
| Kalshi | Regulated/CeFi | CeFi | $30B+ annual, ~$9B/month | API/SDK/WEBSOCKET | — |
| Robinhood Prediction Markets | Regulated/CeFi | CeFi | 12B+ contracts/yr | — | — |
| Opinion | Decentralized | BNB Chain | $8B/month peak | API/SDK/WEBSOCKET | — |
| Probable | Decentralized | BNB Chain | $3B cumulative | API/SDK | — |
| Limitless | Decentralized | Base | $1.5B cumulative | API/SDK/WEBSOCKET | CLI/MCP |
| SX Bet | Decentralized | SX Network | $1.1B cumulative | API/WEBSOCKET | — |
| Myriad | Decentralized | Multi-EVM | $385M cumulative | API/SDK | CLI |
| Rain Protocol | Decentralized | Arbitrum | ~$18M cumulative, ~$3.96M TVL, ~30K users | SDK/API | SKILL |
| Alpha Arcade | Decentralized | Algorand | $10M+ cumulative | API/SDK | MCP |
| predict.fun | Decentralized | BNB Chain | Early stage | API/SDK | — |
| Sapience | Decentralized | Ethereal + Arbitrum | ~18.6 USDe cumulative (very early) | API/SDK/WEBSOCKET | SKILL/FRAMEWORK |
| Baozi | Decentralized | Solana | Unknown | — | MCP/SKILL/FRAMEWORK |
| Context Markets | Decentralized | Base | Unknown | API/SDK | MCP/CLI/SKILL |
| Manifold | Play Money | None | Play money only | API/SDK | MCP |
| Metaculus | Play Money | None | N/A (reputation-based) | API | FRAMEWORK |
| worm.wtf | Decentralized | Solana | Unknown | — | — |
| Overtime | Decentralized | Base | Unknown | — | — |
| Seer | Decentralized | Gnosis | Low | — | FRAMEWORK |
| AIOmen / Presagio | Decentralized | Gnosis | Low | — | FRAMEWORK |
| Trueo | Decentralized | Base | Low/early | SDK | — |
| OG (by Crypto.com) | Regulated/CeFi | CeFi | Undisclosed | — | — |
| DraftKings Predictions | Regulated/CeFi | CeFi | Early stage | — | — |
| FanDuel Predicts | Regulated/CeFi | CeFi | Early stage | — | — |
| PredictIt | Regulated/CeFi | CeFi | Declining | API | — |
| XO Market | Decentralized | Sovereign rollup | Early stage | API/SDK/WEBSOCKET | — |
| Interactive Brokers (ForecastTrader) | Regulated/CeFi | CeFi | Unknown | API | — |

## Testing — benchmark grades

Grades for the 20 Decentralized and Play Money PMs graded for agent accessibility. `—` = not tested.

| PM | Category | Agent Access | CLI/MCP | Skill | Framework |
|---|---|---|---|---|---|
| Polymarket | Decentralized | A (20/23) | C | B | Abandoned |
| Limitless | Decentralized | B (15/23) | D | — | — |
| Opinion | Decentralized | C (13/23) | — | — | — |
| Myriad | Decentralized | A (19/23) | A | — | — |
| SX Bet | Decentralized | C (13/23) | — | — | — |
| Sapience | Decentralized | A (21/23) | — | D | Abandoned |
| Baozi | Decentralized | B (17/23) | A | D | N/A |
| Context Markets | Decentralized | A (22/23) | D | D | — |
| predict.fun | Decentralized | C (12/23) | — | — | — |
| Manifold | Play Money | B (18/23) | A | — | — |
| Metaculus | Play Money | B (14/23) | — | — | Usable |
| worm.wtf | Decentralized | D (6/23) | — | — | — |
| Overtime | Decentralized | C (11/23) | — | — | — |
| Seer | Decentralized | C (13/23) | — | — | Production |
| AIOmen / Presagio | Decentralized | C (12/23) | — | — | Production |
| Rain Protocol | Decentralized | B (15/23) | — | C | — |
| Probable | Decentralized | C (9/23) | — | — | — |
| Alpha Arcade | Decentralized | D (7/23) | B | — | — |
| Trueo | Decentralized | C (13/23) | — | — | — |
| XO Market | Decentralized | D (8/23) | — | — | — |

## Methodology

### Agent Accessibility

Can a fetch-only agent read this prediction market's site? 15 checks in three weight tiers, each weighted by how much it decides whether the site is usable to an agent at all.

**Weights:**

- **Critical** (3pt × 2)
  - Content without JS — A fetch-only agent needs real text in the HTML body. JS-rendered shells are opaque without a browser.
  - Programmatic surface documented — If the HTML is unparseable, a documented API is the agent's only way in.
- **Important** (2pt × 4)
  - /llms.txt — Standard file that helps LLMs understand what the site is and where to find docs. 700+ sites have adopted it.
  - robots.txt allows AI crawlers — robots.txt doesn't block ClaudeBot, GPTBot, PerplexityBot etc. Some sites explicitly block all AI crawlers.
  - No CAPTCHA — No Cloudflare challenge, hCaptcha, or reCAPTCHA blocking automated access to public pages.
  - Bot policy / agent docs — Site or docs have explicit bot policy, agent quickstart, or tool integration docs. Concrete — not just "AI" in marketing copy.
- **Standard** (1pt × 9)
  - Markdown negotiation — When an agent sends `Accept: text/markdown`, the server returns clean markdown instead of HTML. Best agent onboarding path.
  - Structured metadata — Meta description or JSON-LD that tells an agent what the PM does, without parsing the full page.
  - Semantic HTML — Uses `<nav>`, `<main>`, `<article>`, etc., so agents can understand page structure, not just a wall of `<div>`s.
  - Readable URLs — Market URLs like /markets/us-election vs opaque /markets/0x87c2a. Readable slugs let agents understand content from the URL alone.
  - Sitemap — XML sitemap helps agents discover all pages without crawling. Lists markets, docs, key content.
  - Proper 404s — Unknown URLs return HTTP 404, not 200 with an empty app shell. Agents need real status codes to navigate.
  - Developer docs exist — Any developer documentation at all — API guides, SDK references, integration docs.
  - Docs findable — An agent can find docs from the main site (linked in nav/footer). Fails if docs are only discoverable via Google.
  - Standard navigation — Homepage uses standard `<a href>` links, not JS-only routing. Agents can follow links without executing JavaScript.

**Grade adjustments:**

- Content without JS AND programmatic surface both fail → **max grade D**. No path for a fetch-only agent: the HTML is opaque and there is no documented API to fall back on.

**Grade buckets** (max 23):

- **A** (19–23) — Agent-ready
- **B** (14–18) — Agent-friendly
- **C** (9–13) — Agent-tolerable
- **D** (4–8) — Agent-hostile
- **F** (0–3) — Agent-opaque

_Why this formula:_ An autonomous agent has two ways to extract useful information from a site: parse the rendered HTML, or call a documented API. If neither path works, the site is closed to agents regardless of how clean the sitemap or URLs are. Content without JS and a documented programmatic surface carry 3 points each, and the grade is capped at D when both fail. The 2-point tier covers signals that determine how welcome an agent is once it gets in: llms.txt, AI-crawler permissions in robots.txt, the absence of public-page CAPTCHA, and a documented bot policy. The 1-point tier is hygiene that helps an agent navigate but does not gate access.
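The weighting, the critical-failure cap, and the buckets above compose into a small scoring function. A minimal sketch — the check names here are illustrative shorthand, not the dashboard's actual JSON field names:

```python
# Weighted accessibility scoring, mirroring the rubric above.
# Check names are hypothetical shorthand for the 15 checks.
CRITICAL = {"content_without_js", "programmatic_surface"}              # 3 pt each
IMPORTANT = {"llms_txt", "robots_ai_ok", "no_captcha", "bot_policy"}   # 2 pt each
STANDARD = {"markdown_negotiation", "structured_metadata", "semantic_html",
            "readable_urls", "sitemap", "proper_404", "dev_docs",
            "docs_findable", "standard_navigation"}                    # 1 pt each

def accessibility_grade(passed: set) -> tuple:
    score = (3 * len(passed & CRITICAL)
             + 2 * len(passed & IMPORTANT)
             + len(passed & STANDARD))                                 # max 23
    grade = next(g for lo, g in [(19, "A"), (14, "B"), (9, "C"), (4, "D"), (0, "F")]
                 if score >= lo)
    if not (passed & CRITICAL) and grade in "ABC":
        grade = "D"   # no JS-free content AND no documented API: capped at D
    return score, grade
```

Note the cap fires on the raw sets, not the score: a site can reach 17/23 on hygiene alone and still land at D if both critical paths are closed.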

### CLI / MCP Test

Can an autonomous agent install this CLI or MCP tool and use it to place a real buy-and-sell trade with ~$1–2 of test funds? 18 checks across 4 sections, with trading checks worth double.

**Weights:**

- **Trading** (2pt × 5) — 10 of 23 total points come from this section.
  - Preview / dry-run — Cost estimate before execution (shares, fees).
  - Limit order + verify — Order placed, confirmed open in order list.
  - Cancel + verify — Order cancelled, absent from open orders.
  - Market buy + position — ~$1 buy fills, position visible with shares.
  - Sell + balance delta — Position closed, balance delta ≤5%.
- **Setup** (1pt × 4)
  - Install + version — Tool installs and runs. Version number confirmed.
  - Help / commands — Available commands listed with descriptions. JSON output flag exists.
  - Auth — Authentication succeeds. Subsequent commands work.
  - Balance — Returns structured balance data showing available funds.
- **Discovery** (1pt × 5)
  - List markets — Structured list of active markets with IDs, titles, prices.
  - Market detail — Single market with outcomes, prices, volume.
  - Orderbook quality — 5+ levels/side, monotonic prices, bid < ask.
  - Search — Keyword search returns relevant results.
  - Schema consistency — 3 different markets return same required fields.
- **Errors** (1pt × 4)
  - Insufficient balance — 100x order returns structured error, not silent failure.
  - Invalid inputs — 3 boundary tests return parseable errors.
  - Resolved market — Clear error for closed/resolved market.
  - Error recovery — Tool recovers from bad request, no state corruption.

**Grade adjustments:**

- Neither buy nor sell completes → **max grade D**. Tool cannot perform any trade. This is the explicit floor for autonomous-trading evaluation.
- Only one of buy / sell completes → **max grade C**. Half-trading. Agent can enter or exit but not cycle through positions.
- Both buy AND sell complete → **no cap**. Eligible for B / A based on the weighted score.
- A critical trading action only completes via VPN → **final grade −1 tier**. VPN-only trading works, but introduces operational complexity. Single penalty regardless of how many critical checks need a VPN.

**Grade buckets** (max 23):

- **A** (20–23) — Production-ready trading
- **B** (16–19) — Trades with minor gaps
- **C** (11–15) — Half-trading or partial
- **D** (6–10) — Discovery only, no trading
- **F** (0–5) — Cannot install or auth

_Why this formula:_ This benchmark exists to determine whether an autonomous agent can use the tool to trade. A tool that lists markets and handles errors well but cannot place a real buy and sell order has not demonstrated that capability, regardless of how many other checks it passes. Trading checks are weighted 2× (10 of 23 total points) and the grade is capped at D for any tool that cannot complete both a buy and a sell. VPN-only completion of a critical check is treated as a one-tier penalty rather than a hard cap, since the trade is real but the operational friction is meaningful for an agent running unattended.
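Since the rubric says the VPN penalty applies to the *final* grade, the natural ordering is bucket, then trade cap, then penalty. A sketch under that reading (function and argument names are mine):

```python
TIERS = "FDCBA"  # worst to best, so a higher index is a better grade

def cli_mcp_grade(score: int, buy_ok: bool, sell_ok: bool,
                  vpn_required: bool = False) -> str:
    grade = next(g for lo, g in [(20, "A"), (16, "B"), (11, "C"), (6, "D"), (0, "F")]
                 if score >= lo)
    # Trade caps: both legs -> uncapped; one leg -> max C; neither -> max D.
    cap = "A" if (buy_ok and sell_ok) else ("C" if (buy_ok or sell_ok) else "D")
    if TIERS.index(grade) > TIERS.index(cap):
        grade = cap
    # Single one-tier penalty when any critical check only passes via VPN.
    if vpn_required:
        grade = TIERS[max(0, TIERS.index(grade) - 1)]
    return grade
```

For example, a tool that scores 18/23 with a clean round-trip but VPN-only execution drops from B to C.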

### Skill Test

Can an autonomous agent complete a full trade cycle given only a SKILL.md file and wallet credentials? 8 milestones, weighted by how directly each one proves the agent traded.

**Weights:**

- **Critical** (3pt × 2)
  - Buy order (~$1) — The entry trade. The first milestone that proves the skill's instructions can drive a real on-chain action.
  - Sell / close position — The exit trade. Without it, the agent opens positions it cannot close, which is not autonomous trading.
- **Important** (2pt × 1)
  - Authenticate — Auth is the gate for every later milestone. If the skill's auth instructions don't work, no downstream milestone can pass.
- **Standard** (1pt × 5)
  - Understand skill & auth — Agent reads skill, identifies capabilities, auth method, SDKs, and API endpoints.
  - List markets — Find 3+ active markets with IDs and titles.
  - Market detail + prices — Get outcomes, probabilities, orderbook for a liquid market.
  - Verify position — Confirm position exists with correct market and shares.
  - Balance delta — Compare final balance to pre-trade. Delta should be small (spread + fees).

**Grade adjustments:**

- Neither buy nor sell milestone passes → **max grade D**. Skill cannot drive a trade end-to-end. No autonomous trading possible.
- Only one of buy / sell milestones passes → **max grade C**. Half-cycle skill. Agent can enter or exit but not cycle.
- Both buy AND sell milestones pass → **no cap**. Eligible for B / A based on score.
- A critical milestone only passes via VPN → **final grade −1 tier**. Same VPN penalty as CLI/MCP. Single penalty regardless of how many critical milestones need a VPN.

**Grade buckets** (max 13):

- **A** (11–13) — Production-ready skill
- **B** (8–10) — Round-trip with gaps
- **C** (5–7) — Half-cycle or partial
- **D** (1–4) — Cannot trade
- **F** (0) — Skill unreadable / unusable

_Why this formula:_ A skill is only useful if it lets an agent execute a complete trade cycle. Earlier milestones (reading the skill, finding markets, authenticating) are preparation; the milestones that prove the skill works are placing a position and closing it. Both are weighted at 3 points, and a skill that fails either critical milestone is capped at C; failing both caps at D. Auth is weighted at 2 points because every subsequent milestone depends on it. The VPN penalty mirrors CLI/MCP: VPN-only completion of a critical milestone counts toward eligibility but drops the final grade by one tier.
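The milestone weights and caps can be sketched the same way as CLI/MCP, just over 8 milestones and a 13-point scale. Milestone keys are hypothetical shorthand:

```python
# Milestone weights from the rubric above; keys are illustrative shorthand.
MILESTONES = {"buy": 3, "sell": 3, "auth": 2, "understand": 1, "list_markets": 1,
              "market_detail": 1, "verify_position": 1, "balance_delta": 1}  # max 13

def skill_grade(passed: set, vpn_required: bool = False) -> str:
    score = sum(MILESTONES[m] for m in passed)
    grade = next(g for lo, g in [(11, "A"), (8, "B"), (5, "C"), (1, "D"), (0, "F")]
                 if score >= lo)
    # Caps: full cycle -> uncapped; half cycle -> max C; no trade -> max D.
    cap = ("A" if {"buy", "sell"} <= passed
           else "C" if passed & {"buy", "sell"} else "D")
    tiers = "FDCBA"
    if tiers.index(grade) > tiers.index(cap):
        grade = cap
    if vpn_required:                      # one-tier penalty, applied last
        grade = tiers[max(0, tiers.index(grade) - 1)]
    return grade
```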

### Framework Assessment

How mature is this agentic framework for building autonomous PM agents? Five 0–3 quality dimensions, plus a type qualifier that distinguishes trading frameworks from forecasting libraries, plugins, and platform features.

**Weights:**

- **Maintenance** (0–3) — 0 abandoned (>6 mo) · 1 stale (3–6 mo) · 2 recent (1–3 mo) · 3 actively maintained (<30 days).
- **Adoption** (0–3) — 0 nobody · 1 single-digit signals · 2 real downloads / agents · 3 production agents at scale.
- **Completeness** (0–3) — 0 single connector · 1 partial pipeline · 2 full pipeline, single PM · 3 full pipeline, multi-PM.
- **Documentation** (0–3) — 0 README only · 1 setup but no examples · 2 setup + examples + reference · 3 tutorials + working agents.
- **Safety** (0–3) — 0 no guardrails · 1 error handling only · 2 position limits OR loss caps · 3 comprehensive guardrails.

**Type qualifier (badge, not score):**

- `trading` — Direct trading framework. The primary case the rubric is designed for.
- `forecasting` — Produces probability estimates only (e.g. Metaculus tools). Not penalized for not trading.
- `plugin` — Bridges to a host framework (e.g. ElizaOS plugin). Inherits the host's capabilities.
- `platform-feature` — Not a framework at all (e.g. Baozi Agent Arena leaderboard). Receives N/A.

**Grade buckets** (max 15):

- **Production** (13–15) — Real agents trading on this in prod today
- **Usable** (9–12) — Works, but has gaps
- **Experimental** (5–8) — Proof of concept or single-agent quality
- **Abandoned** (0–4) — Dead repo, no users, no path forward
- **N/A** (—) — Type-qualifier override (not a framework)

_Why this formula:_ Frameworks are systems that make decisions and execute, not tools an agent calls directly. Their production-readiness depends on multiple independent dimensions, and a high score on one cannot compensate for failure on another: a project with clean architecture but no maintenance does not become useful by virtue of its architecture alone. The five dimensions (maintenance, adoption, completeness, documentation, safety) are each scored 0–3 for a total of 15. The type qualifier is a separate tag (trading, forecasting, plugin, platform-feature) that sets the context in which a framework should be evaluated, so a forecasting library is not graded against trading frameworks. The qualifier does not change the dimension totals.
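Unlike the other three benchmarks, this one is a plain sum with a label override rather than caps and penalties. A sketch (function name is mine; dimension names and buckets are from the rubric):

```python
DIMS = ("maintenance", "adoption", "completeness", "documentation", "safety")

def framework_grade(scores: dict, ftype: str) -> str:
    """scores maps each dimension to 0-3; ftype is the type-qualifier badge."""
    if ftype == "platform-feature":
        return "N/A"                       # qualifier override: not a framework
    total = sum(scores[d] for d in DIMS)   # max 15
    return next(label for lo, label in
                [(13, "Production"), (9, "Usable"),
                 (5, "Experimental"), (0, "Abandoned")]
                if total >= lo)
```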

---

## For agents

No auth, no CORS restrictions, no rate limits. Plain HTTP GET.

- `https://sybil.exchange/agentic-research/data/summary.json` — authoritative tier assignments (A–E, hand-curated) + 6 findings
- `https://sybil.exchange/agentic-research/data/pms.json` — 27 PM entries with category, chain, volume, tool links
- `https://sybil.exchange/agentic-research/data/accessibility.json` — per-PM agent-accessibility grades + per-check evidence
- `https://sybil.exchange/agentic-research/data/tests.json` — CLI/MCP, skill, framework benchmark results
- `https://sybil.exchange/agentic-research/methodology/agent-accessibility.md`
- `https://sybil.exchange/agentic-research/methodology/cli-mcp-test.md`
- `https://sybil.exchange/agentic-research/methodology/skill-test.md`
- `https://sybil.exchange/agentic-research/methodology/framework-assessment.md`
- `https://sybil.exchange/agentic-research/llms.txt` — LLM-friendly index
- `https://sybil.exchange/agentic-research/index.md` — this file
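The four data endpoints above can be wrapped in a few lines of stdlib Python. A minimal sketch — the URL paths are taken verbatim from the list; the helper names are mine:

```python
import json
from urllib.request import urlopen

BASE = "https://sybil.exchange/agentic-research"
ENDPOINTS = {
    "summary": f"{BASE}/data/summary.json",            # tier assignments + findings
    "pms": f"{BASE}/data/pms.json",                    # all 27 PM entries
    "accessibility": f"{BASE}/data/accessibility.json",
    "tests": f"{BASE}/data/tests.json",
}

def fetch(dataset: str) -> dict:
    """Plain GET; per the docs above, no auth or special headers are required."""
    with urlopen(ENDPOINTS[dataset], timeout=10) as resp:
        return json.load(resp)

# summary = fetch("summary")   # network call; uncomment when online
```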

---

Source: https://x.com/sybil_pm · Parent project: https://sybil.exchange
