Evaluate
The Evaluate node lets you validate outcomes, run safety checks, and add content moderation to your AI workflows. A single node can hold multiple evaluations, each with its own configuration, and the workflow branches based on pass/fail status.
Key Features
AI-powered content analysis and validation
Multiple evaluation types: Prompt, Expression, and Function
Pre-built presets for common safety and quality checks
Configurable fail/flag triggers for workflow routing
Support for multiple evaluations per node
How to Add an Evaluate Node
Open your workflow in the workflow builder
Click the "Add Action" button
Navigate to the "AI" category
Select "Evaluate"
The node will be added to your canvas
Evaluation Types
When you add an evaluation, you first select its type. Three types are available:
Prompt Evaluation
AI-powered content analysis using LLM classification.
Click "Add Evaluation"
Select "Prompt" as the type
Choose a preset or select "Custom" to write your own instruction (an example custom instruction follows these steps)
Specify the variables to evaluate
Configure the fail trigger (Fail or Flag)
Set additional configuration:
Select the model
Set the confidence threshold
Add a title and description
Click "Add Evaluation"
Expression Evaluation
Logic-based validation using expressions that evaluate to a boolean; an example expression follows the steps below.
Click "Add Evaluation"
Select "Expression" as the type
Enter your JSON expression
Configure the fail trigger (Fail or Flag)
Add a title and description
Click "Add Evaluation"
Function Evaluation
Custom code-based validation for complex logic; a sketch follows the steps below.
Click "Add Evaluation"
Select "Function" as the type
Write your evaluation code
Configure the fail trigger (Fail or Flag)
Add a title and description
Click "Add Evaluation"
Available Presets
When using Prompt evaluations, you can choose from the following presets:
Jailbreak Detection: Detects prompt injection and jailbreak attempts
NSFW Text: Detects sexual content, hate speech, violence, and other inappropriate material
Politeness: Checks that content maintains a professional, polite tone
Contains PII: Detects personally identifiable information (names, emails, SSNs, etc.)
Profanity Free: Checks that content is free of profanity and offensive language
Ground Truth: Validates the response against provided ground truth/context
Factual Accuracy: Checks for potential hallucinations or unsupported claims
Brand Safety: Detects content that could harm brand reputation
Toxicity: Detects toxic, harmful, or abusive language
Off-Topic: Checks whether the response stays on topic relative to the query
Sentiment: Analyzes sentiment (can be configured for positive/negative/neutral)
Language Detection: Ensures content is in the expected language(s)
Competitor Mention: Detects mentions of competitor brands/products
Legal Compliance: Flags potentially legally problematic content
Non-Empty Response: Validates that a response is not empty
Max/Min Length: Checks that response length is within the configured bounds
Valid JSON: Validates that the output is valid JSON
Valid URL: Validates that the output contains valid URLs
Valid Email: Validates that the output contains valid email addresses
Number Range: Checks that numeric values fall within a specified range
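Most presets rely on LLM classification, but the format-oriented checks at the end of the list (Non-Empty Response, Max/Min Length, Valid JSON, and so on) describe deterministic rules. If you need exact, reproducible behavior for one of them, the same rule can usually be written as a Function evaluation instead. A hypothetical Max/Min Length equivalent, reusing the illustrative signature from the sketch above:

```typescript
// Hypothetical length check; the bounds are examples, adjust as needed.
const MIN_LENGTH = 20;
const MAX_LENGTH = 2000;

function evaluate(variables: { output: string }): boolean {
  const length = variables.output.trim().length;
  return length >= MIN_LENGTH && length <= MAX_LENGTH;
}
```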
Fail Triggers
Each evaluation has a fail trigger that determines what happens when its check does not pass:
Fail: Routes the workflow to the fail edge (blocking)
Flag: Continues workflow execution but marks the evaluation as flagged (non-blocking)
How to Manage Evaluations
Once you've added evaluations to your node, you can:
Edit: Click on an evaluation to modify its configuration
Remove: Click the trash icon on an evaluation to delete it
Reorder: Drag evaluations to change their order
💡 You can add multiple evaluations to a single Evaluate node. All evaluations run in parallel.
Node Edges
The Evaluate node has two edges:
Pass edge (top): Continues workflow execution when all evaluations pass, or when only evaluations set to Flag trigger (non-blocking)
Fail edge (bottom): Routes execution when any evaluation set to Fail triggers
Activity View
In the Activity view, you can see the evaluation status for each workflow run:
Pass: ✓ (green). All evaluations passed.
Flag: ⚑ (amber). One or more evaluations flagged (non-blocking).
Fail: ✕ (red). One or more evaluations failed (blocking).
Best Practices
Use descriptive titles and descriptions for your evaluations to improve traceability
Start with presets for common use cases, then customize as needed
Use "Flag" for quality monitoring and "Fail" for critical safety checks
Combine multiple evaluations to create comprehensive validation pipelines
Review flagged items in the Activity view to identify patterns and improve your workflows