io.github.cyanheads/evals-mcp-server

io.github.cyanheads/evals-mcp-server·v0.1.2·Other

Author verifiable eval records through a draft→review→revise→submit loop with enforced graders.

Trust verdict · v1 advisory · method

NOT YET SCREENED · no verdict on file

Verdict not yet evaluated for this tool. The semantic screen takes adversarial cases first; coverage rolls out as the corpus expands (15/150 labels to graduation). The deterministic conformance probe is built but has not yet run on the public corpus, so a recorded verdict here is REVIEW or UNVERIFIED, never a clearing ALLOW. Until a verdict is recorded, an agent should treat this tool as not-yet-cleared and fall back to its own checks. Method: the eval, four-state verdict, honest limits.


Environment variables

EVALS_DATA_DIR

Root folder for record JSON. The store manages drafts/, submitted/, and exports/ subdirs under it.

EVALS_REQUIRE_CONFIRMATION

When 'true', evals_submit_draft requests a human confirmation where the client supports elicitation.

EVALS_DEFAULT_LICENSE

Default metadata.license applied when a draft omits one.

EVALS_CAPTURE_DIR

Directory of framework-written tool-call captures; when set, captures EvalsIDs resolve to full dumps.

MCP_LOG_LEVEL

Sets the minimum log level for output (e.g., 'debug', 'info', 'warn').

MCP_HTTP_HOST

The hostname for the HTTP server.

MCP_HTTP_PORT

The port to run the HTTP server on.

MCP_HTTP_ENDPOINT_PATH

The endpoint path for the MCP server.

MCP_AUTH_MODE

Authentication mode to use: 'none', 'jwt', or 'oauth'.
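The variables above can be read and validated in one place before launching or wrapping the server. The following is a minimal sketch, not the server's own configuration code: the `Settings` class and `load_settings` function are hypothetical names, and the fallback values are copied from the install snippet on this page rather than from documented server defaults.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Mapping

# The three values listed for MCP_AUTH_MODE.
ALLOWED_AUTH_MODES = {"none", "jwt", "oauth"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    data_dir: Path
    require_confirmation: bool
    auth_mode: str
    http_host: str
    http_port: int
    endpoint_path: str

    @property
    def subdirs(self) -> Dict[str, Path]:
        # drafts/, submitted/, and exports/ are the subdirs the store manages.
        names = ("drafts", "submitted", "exports")
        return {name: self.data_dir / name for name in names}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fallbacks mirror the install snippet; they are assumptions,
    # not documented server defaults.
    auth_mode = env.get("MCP_AUTH_MODE", "none")
    if auth_mode not in ALLOWED_AUTH_MODES:
        raise ValueError(
            f"MCP_AUTH_MODE must be one of {sorted(ALLOWED_AUTH_MODES)}, "
            f"got {auth_mode!r}"
        )
    return Settings(
        data_dir=Path(env.get("EVALS_DATA_DIR", "./evals-data")),
        # Only the literal string 'true' enables confirmation.
        require_confirmation=env.get("EVALS_REQUIRE_CONFIRMATION", "false") == "true",
        auth_mode=auth_mode,
        http_host=env.get("MCP_HTTP_HOST", "127.0.0.1"),
        http_port=int(env.get("MCP_HTTP_PORT", "3010")),
        endpoint_path=env.get("MCP_HTTP_ENDPOINT_PATH", "/mcp"),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to check in isolation, and an unsupported auth mode fails early with a `ValueError` instead of surfacing later at connection time.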

MCP quality score · maturity, not trust · methodology

Dimensions: freshness, completeness, installability, documentation, stability

Alternatives in Other

agency.lona/trading

AI-powered trading strategy development: backtesting, market data, and portfolio analysis

ABMeter

ai.abmeter/abmeter

Feature flagging and A/B testing platform with AI-first experimentation workflows.

AdAdvisor MCP Server

ai.adadvisor/mcp-server

Query Meta Ads performance data — accounts, campaigns, ad sets, ads, metrics & settings.

Install

Claude Desktop (claude_desktop_config.json)

{
  "mcpServers": {
    "evals-mcp-server": {
      "command": "npx",
      "args": [
        "-y",
        "@cyanheads/evals-mcp-server"
      ],
      "env": {
        "EVALS_DATA_DIR": "./evals-data",
        "EVALS_REQUIRE_CONFIRMATION": "false",
        "EVALS_DEFAULT_LICENSE": "<evals_default_license>",
        "EVALS_CAPTURE_DIR": "<evals_capture_dir>",
        "MCP_LOG_LEVEL": "info",
        "MCP_HTTP_HOST": "127.0.0.1",
        "MCP_HTTP_PORT": "3010",
        "MCP_HTTP_ENDPOINT_PATH": "/mcp",
        "MCP_AUTH_MODE": "none"
      }
    }
  }
}

Cursor (.cursor/mcp.json)

{
  "mcpServers": {
    "evals-mcp-server": {
      "command": "npx",
      "args": [
        "-y",
        "@cyanheads/evals-mcp-server"
      ],
      "env": {
        "EVALS_DATA_DIR": "./evals-data",
        "EVALS_REQUIRE_CONFIRMATION": "false",
        "EVALS_DEFAULT_LICENSE": "<evals_default_license>",
        "EVALS_CAPTURE_DIR": "<evals_capture_dir>",
        "MCP_LOG_LEVEL": "info",
        "MCP_HTTP_HOST": "127.0.0.1",
        "MCP_HTTP_PORT": "3010",
        "MCP_HTTP_ENDPOINT_PATH": "/mcp",
        "MCP_AUTH_MODE": "none"
      }
    }
  }
}

Cline (cline_mcp_settings.json)

npx -y @cyanheads/evals-mcp-server
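The Claude Desktop and Cursor snippets share one shape, so the entry can be generated rather than hand-edited. The sketch below builds that shape and drops empty values and unfilled `<...>` placeholders. `build_server_entry` is a hypothetical helper. It is an assumption that Cline's settings file accepts the same `mcpServers` structure, and that the server tolerates omitted optional variables.

```python
import json
from typing import Dict, Optional

PACKAGE = "@cyanheads/evals-mcp-server"


def build_server_entry(env: Optional[Dict[str, str]] = None) -> dict:
    """Build an mcpServers entry matching the snippets on this page."""
    # Skip empty values and unfilled "<...>" placeholders.
    clean = {
        key: value
        for key, value in (env or {}).items()
        if value and not (value.startswith("<") and value.endswith(">"))
    }
    entry: dict = {"command": "npx", "args": ["-y", PACKAGE]}
    if clean:
        entry["env"] = clean
    return {"mcpServers": {"evals-mcp-server": entry}}


if __name__ == "__main__":
    config = build_server_entry({
        "EVALS_DATA_DIR": "./evals-data",
        "EVALS_DEFAULT_LICENSE": "<evals_default_license>",  # dropped
        "MCP_LOG_LEVEL": "info",
    })
    print(json.dumps(config, indent=2))
```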

Verdict API

curl -s mcpindex.ai/api/v1/trust/server/io-github-cyanheads-evals-mcp-server

Free-tier verdict as JSON: decision + dimensions + severity. Call it from your agent before it invokes a tool it just discovered.
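A pre-invocation check along these lines could look like the following sketch. It is illustrative only: the `https` scheme, the top-level `decision` key, and the function names `fetch_verdict` and `is_cleared` are assumptions, since this page states only that the response carries a decision, dimensions, and severity. The gate fails closed, so anything other than an explicit `ALLOW` (including `REVIEW`, `UNVERIFIED`, a missing verdict, or a network error) is treated as not cleared.

```python
import json
import urllib.request
from typing import Optional

# The https scheme is an assumption; the curl example omits the scheme.
VERDICT_URL = "https://mcpindex.ai/api/v1/trust/server/{slug}"


def fetch_verdict(slug: str, timeout: float = 5.0) -> Optional[dict]:
    """Return the parsed verdict JSON, or None on a network or parse error."""
    try:
        url = VERDICT_URL.format(slug=slug)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are OSError subclasses;
        # JSON decode failures are ValueError subclasses.
        return None
    return data if isinstance(data, dict) else None


def is_cleared(verdict: Optional[dict]) -> bool:
    """Fail closed: only an explicit ALLOW decision clears the tool."""
    # "decision" as the key name is assumed, not confirmed by this page.
    if not isinstance(verdict, dict):
        return False
    return verdict.get("decision") == "ALLOW"


if __name__ == "__main__":
    verdict = fetch_verdict("io-github-cyanheads-evals-mcp-server")
    if is_cleared(verdict):
        print("cleared: proceed with the tool call")
    else:
        print("not cleared: fall back to the agent's own checks")
```

Keeping `is_cleared` separate from the network call lets the policy be tested without touching the API.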

Details

version: v0.1.2
category: Other
quality: 80 / 100
operator: cyanheads

Provenance

Verdict history is anchored to Bitcoin via OpenTimestamps. The free tier returns the current verdict only; the anchored record is served on the authenticated tier.
