io.github.hidai25/evalview-mcp
Regression testing for AI agents. Golden baselines, CI/CD, LangGraph, CrewAI, OpenAI, Claude.
Verdict not yet evaluated for this tool. The semantic screen takes adversarial cases first; coverage rolls out as the corpus expands (15/150 labels to graduation). The deterministic conformance probe is built but has not yet run on the public corpus, so a recorded verdict here is REVIEW or UNVERIFIED, never a clearing ALLOW. Until a verdict is recorded, an agent should treat this tool as not-yet-cleared and fall back to its own checks.
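The gating rule above can be sketched as a small check an agent might run before trusting the tool. This is a minimal illustration, not the registry's API: the function names `is_cleared` and `should_run_own_checks` are hypothetical, and because only ALLOW, REVIEW and UNVERIFIED are named here, the sketch treats every value other than ALLOW (including no recorded verdict, represented as `None`) as not cleared.

```python
from typing import Optional

# Hypothetical helper names; only a recorded "ALLOW" verdict clears a tool.
# REVIEW and UNVERIFIED never clear it, and an unrecorded verdict (None)
# means the tool is not yet cleared.
CLEARING_VERDICT = "ALLOW"


def is_cleared(verdict: Optional[str]) -> bool:
    """Return True only when the recorded verdict is a clearing ALLOW."""
    return verdict == CLEARING_VERDICT


def should_run_own_checks(verdict: Optional[str]) -> bool:
    """The agent falls back to its own checks whenever the tool is not cleared."""
    return not is_cleared(verdict)


if __name__ == "__main__":
    for v in (None, "REVIEW", "UNVERIFIED", "ALLOW"):
        print(v, is_cleared(v), should_run_own_checks(v))
```

Defaulting to "not cleared" for any unrecognized value keeps the check fail-safe if further verdict states are recorded later.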
OPENAI_API_KEY: OpenAI API key for LLM-as-judge output-quality scoring. Optional: deterministic tool/sequence evaluation works without it.
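The optional key implies two evaluation tiers: deterministic checks that always run, and judge-based scoring that is enabled only when a key is present. A minimal sketch of that selection, assuming the server simply checks whether `OPENAI_API_KEY` is set and non-empty; `judge_scoring_enabled`, `evaluation_modes` and the mode labels are hypothetical names, not evalview's actual code.

```python
import os
from typing import List, Mapping, Optional


def judge_scoring_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a non-blank OPENAI_API_KEY is available (hypothetical helper)."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def evaluation_modes(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List the evaluation tiers that can run with the given environment."""
    modes = ["deterministic"]  # tool/sequence checks need no API key
    if judge_scoring_enabled(env):
        modes.append("llm_judge")  # output-quality scoring needs the key
    return modes


if __name__ == "__main__":
    print(evaluation_modes())
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the check easy to exercise without touching the real environment.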
Focused MCP server for OpenAI image/audio generation (v2.0.0). Wraps endpoints via HAPI CLI.
Public MCP server for the LLM Search Engine
Audit your brand's visibility across ChatGPT, Gemini, Claude, Perplexity + 6 more engines.