← Index

io.github.cyanheads/evals-mcp-server

io.github.cyanheads/evals-mcp-server·v0.1.2·Other

Author verifiable eval records through a draft→review→revise→submit loop with enforced graders.

Trust verdict · v1 advisory · method
NOT YET SCREENEDno verdict on file

Verdict not yet evaluated for this tool. The semantic screen takes adversarial cases first; coverage rolls out as the corpus expands (15/150 labels to graduation). The deterministic conformance probe is built but has not yet run on the public corpus, so a recorded verdict here is REVIEW or UNVERIFIED, never a clearing ALLOW. Until a verdict is recorded, an agent should treat this tool as not-yet-cleared and fall back to its own checks. Method: the eval, four-state verdict, honest limits.

Own this server? Screen its description →

Environment variables
EVALS_DATA_DIR

Root folder for record JSON. The store manages drafts/, submitted/, and exports/ subdirs under it.

EVALS_REQUIRE_CONFIRMATION

When 'true', evals_submit_draft requests a human confirmation where the client supports elicitation.

EVALS_DEFAULT_LICENSE

Default metadata.license applied when a draft omits one.

EVALS_CAPTURE_DIR

Directory of framework-written tool-call captures; when set, captures EvalsIDs resolve to full dumps.

MCP_LOG_LEVEL

Sets the minimum log level for output (e.g., 'debug', 'info', 'warn').

MCP_HTTP_HOST

The hostname for the HTTP server.

MCP_HTTP_PORT

The port to run the HTTP server on.

MCP_HTTP_ENDPOINT_PATH

The endpoint path for the MCP server.

MCP_AUTH_MODE

Authentication mode to use: 'none', 'jwt', or 'oauth'.

EVALS_DATA_DIR

Root folder for record JSON. The store manages drafts/, submitted/, and exports/ subdirs under it.

EVALS_REQUIRE_CONFIRMATION

When 'true', evals_submit_draft requests a human confirmation where the client supports elicitation.

EVALS_DEFAULT_LICENSE

Default metadata.license applied when a draft omits one.

EVALS_CAPTURE_DIR

Directory of framework-written tool-call captures; when set, captures EvalsIDs resolve to full dumps.

MCP_LOG_LEVEL

Sets the minimum log level for output (e.g., 'debug', 'info', 'warn').

MCP quality score · maturity, not trust · methodology
freshness
25
completeness
10
installability
25
documentation
15
stability
5
Alternatives in Other