Evaluate

The Evaluate node lets you validate outcomes, perform safety checks, and add content moderation to your AI workflows. You can add multiple evaluations to a single node, each with its own configuration, and branch the workflow based on pass/fail status.

Key Features

  • AI-powered content analysis and validation

  • Multiple evaluation types: Prompt, Expression, and Function

  • Pre-built presets for common safety and quality checks

  • Configurable fail/flag triggers for workflow routing

  • Support for multiple evaluations per node

How to Add an Evaluate Node

  1. Open your workflow in the workflow builder

  2. Click the "Add Action" button

  3. Navigate to the "AI" category

  4. Select "Evaluate"

  5. The node will be added to your canvas

Evaluation Types

When you add an evaluation, you first select the type. There are three types available:

Prompt Evaluation

AI-powered content analysis using LLM classification. An illustrative sketch of a configured prompt evaluation follows the steps below.

  1. Click "Add Evaluation"

  2. Select "Prompt" as the type

  3. Choose a preset or select "Custom" to write your own instruction

  4. Specify the variables to evaluate

  5. Configure the fail trigger (Fail or Flag)

  6. Set additional configuration:

    • Select the model

    • Set the confidence threshold

    • Add a title and description

  7. Click "Add Evaluation"
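The exact fields and defaults are set through the builder UI, but it can help to picture what a finished prompt evaluation holds. The sketch below is purely illustrative: the property names, variable placeholder, and model value are assumptions, not a documented API.

```typescript
// Illustrative only: these settings are configured through the UI, not code.
// Field names, the variable placeholder, and the model value are assumptions.
interface PromptEvaluation {
  type: "prompt";
  preset?: string;              // e.g. "Contains PII"; omit when writing a custom instruction
  instruction?: string;         // custom instruction used when no preset is selected
  variables: string[];          // the workflow variables to evaluate
  failTrigger: "fail" | "flag"; // Fail routes to the fail edge; Flag is non-blocking
  model: string;                // the model that performs the LLM classification
  confidenceThreshold: number;  // confidence threshold from the additional configuration
  title: string;
  description?: string;
}

const piiCheck: PromptEvaluation = {
  type: "prompt",
  preset: "Contains PII",
  variables: ["{{agent.response}}"], // hypothetical variable reference
  failTrigger: "flag",
  model: "your-chosen-model",
  confidenceThreshold: 0.8,
  title: "PII screen",
  description: "Flag responses that include personal data",
};
```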

Expression Evaluation

Logic-based validation using expressions that evaluate to a boolean. A conceptual example follows the steps below.

  1. Click "Add Evaluation"

  2. Select "Expression" as the type

  3. Enter your JSON expression

  4. Configure the fail trigger (Fail or Flag)

  5. Add a title and description

  6. Click "Add Evaluation"
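The exact JSON expression syntax is platform-specific and not documented here; whatever form it takes, it must resolve to true (pass) or false (fail). As a conceptual sketch, written in TypeScript rather than the node's JSON syntax, the logic might be equivalent to a check like this (the `response` variable is illustrative):

```typescript
// Conceptual equivalent of an expression evaluation, not the node's actual syntax:
// pass when the response is non-empty and shorter than 2000 characters.
function expressionLogic(response: string): boolean {
  return response.trim().length > 0 && response.length < 2000;
}
```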

Function Evaluation

Custom code-based validation for complex logic. A sketch of an evaluation function follows the steps below.

  1. Click "Add Evaluation"

  2. Select "Function" as the type

  3. Write your evaluation code

  4. Configure the fail trigger (Fail or Flag)

  5. Add a title and description

  6. Click "Add Evaluation"
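The function signature the node expects is not documented here, so treat the following as a hypothetical sketch: a function that receives the value(s) to check and returns a boolean, where `true` means the evaluation passes.

```typescript
// Hypothetical evaluation function: pass only when the response is non-empty,
// shorter than 2000 characters, and contains no raw email addresses.
// The input shape and return convention are assumptions.
function evaluate(input: { response: string }): boolean {
  const { response } = input;
  if (response.trim().length === 0) return false;
  if (response.length >= 2000) return false;
  const emailPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
  return !emailPattern.test(response);
}
```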

Available Presets

When using Prompt evaluations, you can choose from the following presets:

  • Jailbreak Detection: Detects prompt injection and jailbreak attempts

  • NSFW Text: Detects sexual content, hate speech, violence, and other inappropriate material

  • Politeness: Checks that content maintains a professional, polite tone

  • Contains PII: Detects personally identifiable information (names, emails, SSNs, etc.)

  • Profanity Free: Checks for profanity or offensive language

  • Ground Truth: Validates the response against provided ground truth/context

  • Factual Accuracy: Checks for potential hallucinations or unsupported claims

  • Brand Safety: Detects content that could harm brand reputation

  • Toxicity: Detects toxic, harmful, or abusive language

  • Off-Topic: Checks that the response stays on topic relative to the query

  • Sentiment: Analyzes sentiment (can be configured for positive/negative/neutral)

  • Language Detection: Ensures content is in the expected language(s)

  • Competitor Mention: Detects mentions of competitor brands/products

  • Legal Compliance: Flags potentially legally problematic content

  • Non-Empty Response: Validates that a response is not empty

  • Max/Min Length: Checks that the response length is within bounds

  • Valid JSON: Validates that the output is valid JSON

  • Valid URL: Validates that the output contains valid URLs

  • Valid Email: Validates that the output contains valid email addresses

  • Number Range: Checks that numeric values fall within a specified range
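Presets run as Prompt evaluations, i.e. LLM classification. For strictly deterministic checks such as Valid JSON, a Function evaluation can implement the same rule directly; a hypothetical sketch (the signature is an assumption):

```typescript
// Hypothetical function evaluation implementing a deterministic "Valid JSON" check.
function evaluate(input: { output: string }): boolean {
  try {
    JSON.parse(input.output);
    return true;
  } catch {
    return false;
  }
}
```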

Fail Triggers

Each evaluation has a fail trigger that determines what happens when the evaluation is triggered (i.e. it does not pass):

  • Fail: Routes the workflow to the fail edge (blocking)

  • Flag: Continues workflow execution but marks the evaluation as flagged (non-blocking)

How to Manage Evaluations

Once you've added evaluations to your node, you can:

  • Edit: Click on an evaluation to modify its configuration

  • Remove: Click the trash icon on an evaluation to delete it

  • Reorder: Drag evaluations to change their order

💡 You can add multiple evaluations to a single Evaluate node. All evaluations run in parallel.

Node Edges

The Evaluate node has two edges:

  • Pass edge (top): Taken when all evaluations pass, or when the only evaluations that trigger use the Flag action (non-blocking)

  • Fail edge (bottom): Taken when any evaluation with the Fail action triggers
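In other words, the fail edge is taken only when at least one evaluation configured with the Fail action triggers; flagged evaluations never block. A small sketch of that decision rule (the types are illustrative, not a platform API):

```typescript
type EvaluationResult = { passed: boolean; failTrigger: "fail" | "flag" };

// Returns which edge the Evaluate node takes, following the rules above.
function selectEdge(results: EvaluationResult[]): "pass" | "fail" {
  const blockingFailure = results.some(
    (r) => !r.passed && r.failTrigger === "fail"
  );
  return blockingFailure ? "fail" : "pass";
}
```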

Activity View

In the Activity view, you can see the evaluation status for each workflow run:

  • Pass (green ✓): All evaluations passed

  • Flag (amber ⚑): One or more evaluations flagged (non-blocking)

  • Fail (red ✕): One or more evaluations failed (blocking)

Best Practices

  • Use descriptive titles and descriptions for your evaluations to improve traceability

  • Start with presets for common use cases, then customise as needed

  • Use "Flag" for quality monitoring and "Fail" for critical safety checks

  • Combine multiple evaluations to create comprehensive validation pipelines

  • Review flagged items in the Activity view to identify patterns and improve your workflows
