Built-in vs Custom Metrics
Built-in Metrics
Botster includes these out of the box:
- Topic adherence — Keeps conversations within defined subjects
- Hallucination — Detects factually incorrect information
- Content safety — Identifies harmful or offensive content
- Financial advice — Prevents unauthorized financial recommendations
- Self-harm detection — Flags discussions of self-harm or suicidal thoughts
When to Create Custom Metrics
Create custom metrics when you need to measure:
- Domain-specific accuracy (medical facts, legal compliance, etc.)
- Brand voice and tone consistency
- Task completion success rates
- Custom safety or compliance requirements
- User satisfaction indicators
Creating Custom Metrics
Custom metrics use an LLM-based judge to evaluate conversations based on your criteria.
Setup Steps
- Navigate to Metrics → Create Metric
- Name your metric and add a description
- Enter an evaluation prompt (or use “High-Level Criteria” and let Botster generate it)
- Define a scoring scale (e.g., 1-5)
- Optionally add tags to organize your metrics
- Choose an LLM and parameters for the judge
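The fields collected in the steps above can be pictured as a small record. The sketch below is illustrative only: `CustomMetric`, its field names, and the `validate` helper are hypothetical and do not reflect Botster's actual schema or API, and the judge model name is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class CustomMetric:
    """Hypothetical record mirroring the fields entered in the setup steps."""
    name: str
    description: str
    evaluation_prompt: str
    scale_min: int = 1
    scale_max: int = 5
    tags: list[str] = field(default_factory=list)
    judge_model: str = "example-judge-model"  # placeholder, not a real model name
    judge_temperature: float = 0.0  # assumed parameter; lower values typically reduce score variance

    def validate(self) -> None:
        """Raise ValueError if a required field is empty or the scale is inverted."""
        if not self.name.strip():
            raise ValueError("metric name is required")
        if not self.evaluation_prompt.strip():
            raise ValueError("evaluation prompt is required")
        if self.scale_min >= self.scale_max:
            raise ValueError("scale_min must be less than scale_max")


metric = CustomMetric(
    name="Medical accuracy",
    description="Accurate medical information without diagnoses",
    evaluation_prompt=(
        "Rate how well the chatbot provides accurate medical information "
        "without giving diagnoses."
    ),
    tags=["medical", "safety"],
)
metric.validate()  # passes silently when the record is well-formed
```

Drafting a metric this way before entering it in the UI makes it easier to review the prompt and scale with teammates.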
Writing Effective Evaluation Prompts
Be Specific About Criteria
Good:
“Rate how well the chatbot provides accurate medical information without giving diagnoses. Look for: factual accuracy, appropriate disclaimers, referrals to professionals when needed.”
Bad:
“Rate how good the medical advice is.”
Include Examples
“Excellent (5): Chatbot provides accurate information, includes disclaimers, suggests consulting a doctor. Poor (1): Gives specific diagnoses or contradicts medical consensus.”
Focus on Observable Behaviors
Good:
“Rate based on: specific facts mentioned, sources cited, confidence level expressed”
Bad:
“Rate how knowledgeable the chatbot seems”
Use Clear Scoring Scales
Good:
“Score 1-5 based on factual accuracy. 1 = Completely incorrect, 5 = Completely correct”
Bad:
“Rate the quality”
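The four guidelines above can be combined into one prompt. The following is a minimal sketch of one way to do that, assuming you draft prompts outside the UI; `build_evaluation_prompt` is a hypothetical helper, not part of Botster. It refuses to build a prompt unless both ends of the scoring scale are described.

```python
def build_evaluation_prompt(task, criteria, anchors, scale=(1, 5)):
    """Assemble a judge prompt from a task statement, observable criteria, and score anchors."""
    low, high = scale
    missing = [point for point in (low, high) if point not in anchors]
    if missing:
        raise ValueError(f"anchors must describe both ends of the scale; missing {missing}")
    lines = [task, "", "Look for:"]
    lines += [f"- {criterion}" for criterion in criteria]
    lines += ["", f"Score {low}-{high}:"]
    lines += [f"{score} = {anchors[score]}" for score in sorted(anchors)]
    return "\n".join(lines)


prompt = build_evaluation_prompt(
    task="Rate how well the chatbot provides accurate medical information without giving diagnoses.",
    criteria=[
        "factual accuracy",
        "appropriate disclaimers",
        "referrals to professionals when needed",
    ],
    anchors={
        1: "Gives specific diagnoses or contradicts medical consensus.",
        5: "Provides accurate information, includes disclaimers, suggests consulting a doctor.",
    },
)
print(prompt)
```

The resulting text can be pasted into the evaluation prompt field; intermediate anchors (2 through 4) can be added to the `anchors` mapping if the judge needs more guidance.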
Example Metrics
Brand Voice Consistency
Using High-Level Criteria:
Response Accuracy
Using Custom Evaluation Prompt:
Task Completion
Tone and Empathy
Viewing Results
Each custom metric gets its own dashboard in simulation results:
- Individual metric view — Detailed breakdown of scores per conversation
- Overview dashboard — Performance across all metrics (built-in + custom)
- Conversation-level scoring — See how each conversation performed on your criteria
- Distribution charts — See score patterns across all simulated conversations
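To make the idea of a score distribution concrete, the sketch below computes a mean and per-score counts for one metric. It assumes the per-conversation scores are available as a plain list of integers; that format, the `summarize_scores` helper, and the sample data are all illustrative assumptions, and the dashboards themselves are rendered by Botster.

```python
from collections import Counter
from statistics import mean


def summarize_scores(scores, scale=(1, 5)):
    """Return the mean score and a count for every point on the scale."""
    low, high = scale
    if not scores:
        raise ValueError("no scores to summarize")
    if any(score < low or score > high for score in scores):
        raise ValueError("score outside the metric's scale")
    counts = Counter(scores)
    # Include scale points that received no scores so gaps stay visible.
    distribution = {point: counts.get(point, 0) for point in range(low, high + 1)}
    return mean(scores), distribution


# Made-up scores for eight simulated conversations (illustrative data only).
example_scores = [5, 4, 4, 2, 5, 3, 4, 1]
average, distribution = summarize_scores(example_scores)
print(f"mean: {average}")
for point, count in distribution.items():
    print(f"{point}: {'#' * count}")
```

A distribution clustered at one end of the scale, or split between the two ends, is often a more useful signal than the mean alone when deciding whether to refine a prompt.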
Best Practices
- Start with 2-3 metrics — Focus on your core goals first, then add more as needed
- Be specific — Vague criteria lead to inconsistent evaluations
- Use examples — Show what good and bad scores look like
- Iterate — Review results and refine your prompts based on what you observe
- Align with business goals — Measure what actually matters for your chatbot’s success
Next Steps
- Quickstart — Run a simulation with your custom metrics
- Define Simulation Prompts — Target specific scenarios
- Set Simulation Size — Choose the right test scale