io.github.cyanheads/evals-mcp-server
Author verifiable eval records through a draft→review→revise→submit loop with enforced graders.
Verdict not yet evaluated for this tool. The semantic screen takes adversarial cases first; coverage rolls out as the corpus expands (15/150 labels to graduation). The deterministic conformance probe is built but has not yet run on the public corpus, so a recorded verdict here is REVIEW or UNVERIFIED, never a clearing ALLOW. Until a verdict is recorded, an agent should treat this tool as not-yet-cleared and fall back to its own checks. Method: the eval, four-state verdict, honest limits.
EVALS_DATA_DIR: Root folder for record JSON. The store manages drafts/, submitted/, and exports/ subdirs under it.
EVALS_REQUIRE_CONFIRMATION: When 'true', evals_submit_draft requests a human confirmation where the client supports elicitation.
EVALS_DEFAULT_LICENSE: Default metadata.license applied when a draft omits one.
EVALS_CAPTURE_DIR: Directory of framework-written tool-call captures; when set, captures EvalsIDs resolve to full dumps.
MCP_LOG_LEVEL: Sets the minimum log level for output (e.g., 'debug', 'info', 'warn').
MCP_HTTP_HOST: The hostname for the HTTP server.
MCP_HTTP_PORT: The port to run the HTTP server on.
MCP_HTTP_ENDPOINT_PATH: The endpoint path for the MCP server.
MCP_AUTH_MODE: Authentication mode to use: 'none', 'jwt', or 'oauth'.
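The settings above could be collected and sanity-checked before launching the server. The following is a minimal sketch, not the server's own loader: `ServerEnv` and `read_server_env` are hypothetical names introduced here, unset variables are left as `None` because the listing states no defaults, and treating only the exact string 'true' as enabling confirmation is an assumption based on the description.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Auth modes named in the listing; anything else is rejected in this sketch.
VALID_AUTH_MODES = {"none", "jwt", "oauth"}


@dataclass
class ServerEnv:
    """Hypothetical container for the documented environment variables."""
    data_dir: Optional[str]
    require_confirmation: bool
    default_license: Optional[str]
    capture_dir: Optional[str]
    log_level: Optional[str]
    http_host: Optional[str]
    http_port: Optional[int]
    endpoint_path: Optional[str]
    auth_mode: Optional[str]


def read_server_env(env: Mapping[str, str]) -> ServerEnv:
    """Read the documented variables from a mapping such as os.environ."""
    auth_mode = env.get("MCP_AUTH_MODE")
    if auth_mode is not None and auth_mode not in VALID_AUTH_MODES:
        raise ValueError(f"MCP_AUTH_MODE must be one of {sorted(VALID_AUTH_MODES)}")

    port = env.get("MCP_HTTP_PORT")
    return ServerEnv(
        data_dir=env.get("EVALS_DATA_DIR"),
        # Assumption: only the literal string 'true' enables confirmation.
        require_confirmation=env.get("EVALS_REQUIRE_CONFIRMATION") == "true",
        default_license=env.get("EVALS_DEFAULT_LICENSE"),
        capture_dir=env.get("EVALS_CAPTURE_DIR"),
        log_level=env.get("MCP_LOG_LEVEL"),
        http_host=env.get("MCP_HTTP_HOST"),
        # int() raises ValueError on a non-numeric port, which is intended here.
        http_port=int(port) if port is not None else None,
        endpoint_path=env.get("MCP_HTTP_ENDPOINT_PATH"),
        auth_mode=auth_mode,
    )
```

Passing `os.environ` to `read_server_env` checks the live environment; passing a plain dict keeps the check easy to test.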