AI Evals For Engineers & PMs

By Hamel Husain and Shreya Shankar
Stop guessing if your AI works. Build the feedback loops that make it better.
🚨 New: September 2026 cohort’s material is completely refreshed to cover advancements in the field 🚨
All students get:
♾️ Unlimited access to future cohorts & office hours: never worry about timing or missing new material.
🗄️ Lifetime access to all materials!
🤖 6 months of unlimited access to our new AI Eval Assistant (more info below).
🧑‍🏫 10+ hours of office hours to maximize the value of live interaction.
🏫 A Discord community with continuous access to instructors to get unstuck (even after the course!).
---
Do you catch yourself asking any of the following questions while building AI applications?
1. How do I test applications when the outputs require subjective judgements?
2. If I change the prompt, how do I know I’m not breaking something else?
3. Where should I focus my engineering efforts? Do I need to test everything?
4. What if I have no data or customers? Where do I start?
5. What metrics should I track? What tools should I use?
6. Can I automate testing and evaluation? If so, how do I trust it?
If so, this course is for you.
All sessions are live and recorded.
What you’ll learn
Learn proven approaches for quickly improving AI applications. Build AI that works better than the competition, regardless of the use case.
How To Collect Data For Evals
Understand instrumentation and observability for tracking system behavior.
Learn approaches for generating synthetic data to maximize error discovery and bootstrap product development (a minimal sketch follows this list).
Understand how to choose the right tools and vendors for you, with deep dives into the most popular solutions in the evals space.
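One common approach to synthetic data is to define the dimensions along which user inputs vary and sample combinations of them. A minimal Python sketch, assuming hypothetical dimensions for a customer-support bot (the personas, scenarios, and tones are illustrative placeholders, not course material):

```python
import itertools
import random

# Hypothetical dimensions for a customer-support bot; substitute your own.
personas = ["new user", "power user", "frustrated customer"]
scenarios = ["billing question", "bug report", "feature request"]
tones = ["terse", "verbose", "ambiguous"]

random.seed(42)  # reproducible sampling

# Enumerate every persona x scenario x tone combination, then sample a
# subset so the synthetic test set stays small but diverse.
combos = list(itertools.product(personas, scenarios, tones))
for persona, scenario, tone in random.sample(combos, k=10):
    # In practice, each combination would seed an LLM prompt that drafts
    # a realistic user query for your application.
    print(f"Write a {tone} message from a {persona} about a {scenario}.")
```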
Get immediate clarity and direction with Error Analysis
Apply data analysis techniques to rapidly find systematic issues in your product regardless of the use case.
Master the processes and tools to annotate and analyze data quickly and efficiently (a small tallying example follows this list).
Learn how to analyze agentic systems (tool calls, RAG, etc.) to quickly identify systematic patterns and errors.
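As a taste of that workflow, here is a minimal sketch of tallying failure modes across human-annotated traces so the most common issues surface first; the trace IDs and failure-mode labels are hypothetical:

```python
from collections import Counter

# Hypothetical annotations from reviewing a sample of traces; each trace
# gets zero or more failure-mode labels assigned by a human reviewer.
annotations = [
    {"trace_id": "t1", "failure_modes": ["hallucinated_policy"]},
    {"trace_id": "t2", "failure_modes": []},
    {"trace_id": "t3", "failure_modes": ["ignored_context", "wrong_tool_call"]},
    {"trace_id": "t4", "failure_modes": ["wrong_tool_call"]},
]

# Tally failure modes so the most common, systematic issues surface first.
counts = Counter(mode for a in annotations for mode in a["failure_modes"])
for mode, n in counts.most_common():
    print(f"{mode}: {n} of {len(annotations)} traces")
```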
Implement Effective Evaluations
Create evals that are customized to your product and provide immediate value, NOT generic off-the-shelf evals (which do not work).
Align evals with stakeholders & domain experts so you can scientifically trust the results.
Create high-quality LLM-as-a-judge and code-based evals with a systematic, iterative process.
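A minimal sketch of the LLM-as-a-judge pattern, assuming a hypothetical `call_llm` helper that sends a prompt to your model provider and returns the response text; the grading criterion and PASS/FAIL format are illustrative:

```python
# `call_llm` is a hypothetical helper: it sends a prompt to your model
# provider and returns the text of the response.
JUDGE_PROMPT = """You are grading an AI assistant's answer.

Question: {question}
Answer: {answer}

Does the answer directly address the question without fabricating facts?
Reply with exactly one word: PASS or FAIL."""

def judge(question: str, answer: str, call_llm) -> bool:
    verdict = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return verdict.strip().upper().startswith("PASS")

def judge_agreement(labeled_examples, call_llm) -> float:
    """labeled_examples: (question, answer, human_pass) tuples.

    Measures how often the judge matches human labels; validating this
    agreement is what lets you trust the judge at scale.
    """
    hits = sum(judge(q, a, call_llm) == human for q, a, human in labeled_examples)
    return hits / len(labeled_examples)
```

The `judge_agreement` step is the part that makes the judge trustworthy: measure agreement with human labels before relying on automated verdicts.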
Master Architecture-Specific Eval Strategies
Learn how to measure & debug RAG systems for retrieval relevance and factual accuracy (a retrieval-metric sketch follows this list).
Understand how to tame multi-step pipelines so you can trace error propagation and pinpoint root causes quickly.
Master techniques that apply to multi-modal settings, including text, image, and audio interactions.
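For example, retrieval relevance is often summarized with recall@k, the fraction of known-relevant documents that appear in the top k results. A minimal sketch with hypothetical document IDs:

```python
def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of known-relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return None  # undefined when there is nothing relevant to find
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# Hypothetical example: the retriever returned doc2 in its top 5, but both
# doc2 and doc9 actually answer the query.
print(recall_at_k(["doc7", "doc2", "doc1", "doc4", "doc8"], ["doc2", "doc9"]))
# -> 0.5: half of the relevant documents were retrieved.
```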
Run Evals In Production
Learn how to set up automated evaluation gates in CI/CD pipelines (a pytest sketch follows this list).
Understand methods for consistent comparison across experiments, including how to prepare and maintain datasets to prevent overfitting.
Implement safety and quality control guardrails.
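A minimal sketch of a CI gate using pytest, assuming a hypothetical `run_eval_suite` that returns per-case pass/fail results; the 90% threshold and stub results are illustrative:

```python
PASS_RATE_THRESHOLD = 0.90  # illustrative; tune to your product's bar

def run_eval_suite():
    # Placeholder: in a real pipeline this would run your evals against
    # the candidate prompt/model and return per-case pass/fail booleans.
    return [True, True, False, True, True, True, True, True, True, True]

def test_eval_pass_rate():
    # pytest picks this up automatically; the assert blocks the merge
    # whenever the pass rate dips below the threshold.
    results = run_eval_suite()
    pass_rate = sum(results) / len(results)
    assert pass_rate >= PASS_RATE_THRESHOLD, (
        f"Eval pass rate {pass_rate:.1%} is below {PASS_RATE_THRESHOLD:.0%}"
    )
```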
Ensure That Evals Lead To High ROI
Develop a strong intuition for when to write an eval, and when NOT to.
Learn how to design interfaces that remove friction from reviewing data and help you collect higher-quality data with less effort.
Learn how to avoid common pitfalls surrounding team organization, collaboration, responsibilities, tools, automation, and metrics.