Most Lifelike Avatar
Best Humor Performance
1. Hume Octave TTS: 39%
2. Eleven Turbo v2.5: 23%
3. PlayAI Dialog 1.0: 14%
4. Google Chirp3: 10%
Most Cinematic VLM
Once a research question is prioritized, we design a framework in partnership with top subject matter experts and ML researchers.
From a pool of thousands, we select the best experts to rate model outputs.
Summary insights are published and regularly refreshed here. For more detailed insights, or to be included in one of our upcoming studies, reach out to [email protected].
We are a group of researchers, creatives, and technologists mostly in awe of AI, with some notes on its aesthetic taste. The best reasoning models can pass the bar and crack novel math problems, and yet we still don’t find them capable of truly good writing … or humor. We’re here to change that.
Humor is perhaps the most human feat. It’s incredibly contextual, depending on timing, emphasis, and emotional nuance, and it’s a strong signal of broader intelligence. AI voice is booming as teams adopt solutions for customer support, accessibility, media, marketing, education, and more. So we figured, let’s start there! Benchmarking speech humor feels both useful and fun. (Not to mention our team has deep roots in audio with our Spotify co-founder, so where better to start?)
If you are one of the model companies we included in our research, contact us from a company email and we’ll provide more detailed research from our evaluators. If you are a researcher or developer, reach out with a short note and we’ll share as much as we can.
We’re already scoping out new evaluations across text, audio, image, and video, focusing on real-world use cases such as creative writing, drone footage, and animation. If you feel passionately about a creative use case we should consider, or have any interest in collaborating, please reach out.
Absolutely. We’ve worked extensively with model developers on the data supply side, and would be happy to collaborate on your unique creative eval use cases. We can help streamline the process of working with human experts in your domain and share our expertise on framework and survey design options. Get in touch with us here.
Yes yes yes. Please follow us on LinkedIn, where we list evaluator roles.
Always! Reach out to us here.