Evaluate
The Evaluate node lets you validate outcomes, run safety checks, and add content moderation to your AI workflows. A single node can hold multiple evaluations, each with its own configuration, and the workflow branches based on pass/fail status.
Key Features
AI-powered content analysis and validation
Multiple evaluation types: Prompt, Expression, and Function
Pre-built presets for common safety and quality checks
Configurable fail/flag triggers for workflow routing
Support for multiple evaluations per node
How to Add an Evaluate Node
Open your workflow in the workflow builder
Click the "Add Action" button
Navigate to the "AI" category
Select "Evaluate"
The node will be added to your canvas
Evaluation Types
When you add an evaluation, you first select its type. Three types are available:
Prompt Evaluation
AI-powered content analysis using LLM classification.
Click "Add Evaluation"
Select "Prompt" as the type
Choose a preset or select "Custom" to write your own instruction (an example custom instruction follows these steps)
Specify the variables to evaluate
Configure the fail trigger (Fail or Flag)
Set additional configuration:
Select the model
Set the confidence threshold
Add a title and description
Click "Add Evaluation"
Expression Evaluation
Logic-based validation using expressions that evaluate to a boolean; an example expression follows the steps below.
Click "Add Evaluation"
Select "Expression" as the type
Enter your JSON expression
Configure the fail trigger (Fail or Flag)
Add a title and description
Click "Add Evaluation"
Function Evaluation
Custom code-based validation for complex logic; a sketch follows the steps below.
Click "Add Evaluation"
Select "Function" as the type
Write your evaluation code
Configure the fail trigger (Fail or Flag)
Add a title and description
Click "Add Evaluation"
Available Presets
When using Prompt evaluations, you can choose from the following presets:
Jailbreak Detection: Detects prompt injection and jailbreak attempts
NSFW Text: Detects sexual content, hate speech, violence, and other inappropriate material
Politeness: Checks that content maintains a professional, polite tone
Contains PII: Detects personally identifiable information (names, emails, SSNs, etc.)
Profanity Free: Checks that content is free of profanity and offensive language
Ground Truth: Validates the response against provided ground truth/context
Factual Accuracy: Checks for potential hallucinations or unsupported claims
Brand Safety: Detects content that could harm brand reputation
Toxicity: Detects toxic, harmful, or abusive language
Off-Topic: Checks whether the response stays on topic relative to the query
Sentiment: Analyzes sentiment (can be configured for positive/negative/neutral)
Language Detection: Ensures content is in the expected language(s)
Competitor Mention: Detects mentions of competitor brands/products
Legal Compliance: Flags potentially legally problematic content
Non-Empty Response: Validates that a response is not empty
Max/Min Length: Checks that response length is within the configured bounds
Valid JSON: Validates that the output is valid JSON
Valid URL: Validates that the output contains valid URLs
Valid Email: Validates that the output contains valid email addresses
Number Range: Checks that numeric values fall within a specified range
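Most presets rely on LLM classification, but the format-oriented checks at the end of the list (Non-Empty Response, Max/Min Length, Valid JSON, and so on) describe deterministic rules. If you need exact, reproducible behavior for one of them, the same rule can usually be written as a Function evaluation instead. A hypothetical Max/Min Length equivalent, reusing the illustrative signature from the sketch above:

```typescript
// Hypothetical length check; the bounds are examples, adjust as needed.
const MIN_LENGTH = 20;
const MAX_LENGTH = 2000;

function evaluate(variables: { output: string }): boolean {
  const length = variables.output.trim().length;
  return length >= MIN_LENGTH && length <= MAX_LENGTH;
}
```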
Fail Triggers
Each evaluation has a fail trigger that determines what happens when its check does not pass:
Fail: Routes the workflow to the fail edge (blocking)
Flag: Continues workflow execution but marks the evaluation as flagged (non-blocking)
How to Manage Evaluations
Once you've added evaluations to your node, you can:
Edit: Click on an evaluation to modify its configuration
Remove: Click the trash icon on an evaluation to delete it
Reorder: Drag evaluations to change their order
💡 You can add multiple evaluations to a single Evaluate node. All evaluations run in parallel.
Node Edges
The Evaluate node has two edges:
Pass edge (top): Continues workflow execution when all evaluations pass, or when only evaluations set to Flag trigger (non-blocking)
Fail edge (bottom): Routes execution when any evaluation set to Fail triggers
Activity View
In the Activity view, you can see the evaluation status for each workflow run:
Pass: ✓ (green). All evaluations passed.
Flag: ⚑ (amber). One or more evaluations flagged (non-blocking).
Fail: ✕ (red). One or more evaluations failed (blocking).
Best Practices
Use descriptive titles and descriptions for your evaluations to improve traceability
Start with presets for common use cases, then customize as needed
Use "Flag" for quality monitoring and "Fail" for critical safety checks
Combine multiple evaluations to create comprehensive validation pipelines
Review flagged items in the Activity view to identify patterns and improve your workflows