Show HN: Rhesis AI - Multimodal test cases for agentic evals

Hey HN, Nicolai here, co-founder of Rhesis AI.

Most eval frameworks were designed when LLM inputs were text strings. That assumption breaks fast once your AI agent handles boarding passes, invoices, audio recordings or support screenshots. Text-only test cases become workarounds. So we added multimodal support to Rhesis: attach a file to a test case, run it, evaluate the response. Simple on the surface. Two non-obvious problems underneath.
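To make "attach a file, run it, evaluate" concrete, here's a minimal sketch of what a multimodal test case might look like. The class and field names are illustrative assumptions, not the actual Rhesis API:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a multimodal test case -- names are
# illustrative, not the real Rhesis data model.
@dataclass
class Attachment:
    path: str       # local path or URL of the file under test
    mime_type: str  # e.g. "image/png", "audio/wav", "application/pdf"

@dataclass
class TestCase:
    prompt: str                                   # the text part of the input
    attachments: list = field(default_factory=list)
    expected_behavior: str = ""                   # what the evaluator checks for

case = TestCase(
    prompt="What gate is printed on this boarding pass?",
    attachments=[Attachment("boarding_pass.png", "image/png")],
    expected_behavior="Response states the gate number shown in the image.",
)
```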

Normalizing file delivery across endpoints: Rhesis sends test cases to application endpoints, not directly to LLM providers. Applications implement file handling very differently: base64, URLs, multipart form data, varying MIME type support. We built an abstraction layer that normalizes this without breaking existing integrations.
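As a rough sketch of what that normalization involves (function name, modes, and payload shapes are my assumptions, not the actual abstraction layer):

```python
import base64

# Sketch: convert one stored file into whatever delivery format the
# target endpoint accepts. Real endpoints vary far more than this.
def encode_for_endpoint(file_bytes: bytes, mime_type: str,
                        mode: str, url: str = ""):
    """Return a payload fragment matching the endpoint's file-handling style."""
    if mode == "base64":
        # inline the file directly in the JSON body
        return {"type": mime_type,
                "data": base64.b64encode(file_bytes).decode("ascii")}
    if mode == "url":
        # file is hosted somewhere the endpoint can fetch it from
        return {"type": mime_type, "url": url}
    if mode == "multipart":
        # (field name, bytes, content type) -- the shape libraries like
        # requests expect for multipart/form-data uploads
        return ("attachment", file_bytes, mime_type)
    raise ValueError(f"unsupported delivery mode: {mode}")
```

The point of centralizing this is that a test case stores the file once and each integration declares which mode it speaks, so adding a new endpoint style doesn't touch existing ones.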

Handling files across three platform contexts: A file attached to a test case needs to work in simulation, in the review UI, and in trace rendering. Each context fetches, stores, and renders files differently. Getting that seamless took more wiring than expected.
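One way to picture the wiring: a single stored file, resolved differently per context. The context names mirror the post; everything else here is a made-up example, not the real implementation:

```python
# Illustrative dispatcher: one file id, three context-specific views.
def render_reference(file_id: str, context: str) -> str:
    if context == "simulation":
        # simulation needs the raw bytes handed to the endpoint
        return f"blob://store/{file_id}"
    if context == "review_ui":
        # the review UI wants a browser-renderable (e.g. signed) URL
        return f"https://app.example/files/{file_id}?signed=1"
    if context == "trace":
        # traces keep a compact reference, resolved lazily on expand
        return f"trace-ref:{file_id}"
    raise ValueError(f"unknown context: {context}")
```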

One thing worth flagging for voice agent builders: full voice support introduces an extra evaluation layer that image and document evals don't have. Once you add STT or TTS to the pipeline, you're evaluating two things: the transcription layer and the agent response. Most eval frameworks collapse those into one score. We're still working out how to surface that separation cleanly. Curious if anyone here has dealt with it.
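To illustrate the separation, here's a sketch that scores the STT layer (via standard word error rate) independently of the agent response, instead of collapsing them. The function names and the choice of WER are my assumptions, not how Rhesis does it:

```python
# Score the STT layer separately from the agent layer.
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length (standard WER)."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def evaluate_voice_turn(spoken_text, transcript, agent_response, response_scorer):
    # Layer 1: did STT hear the user correctly?
    stt_score = 1.0 - word_error_rate(spoken_text, transcript)
    # Layer 2: given what the agent actually received, was the answer good?
    agent_score = response_scorer(transcript, agent_response)
    return {"stt": stt_score, "agent": agent_score}
```

The useful property is diagnostic: a bad turn with a high agent score but low STT score points at the transcription layer, not the agent.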

MIT licensed. You can try it at app.rhesis.ai or dig into the implementation on GitHub: https://github.com/rhesis-ai/rhesis. Short feature demo: https://youtu.be/odq3GW5qspY

3 points | by nicolaib 1 hour ago
