Evaluator

Overview

The Evaluator class provides a framework for evaluating prompts using a given testset, metric, and global metric. It handles the evaluation process, including error handling, progress tracking, and result presentation.
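
A rough construction sketch is shown below; the import paths, metric classes, and constructor parameters are assumptions for illustration, not the library's confirmed API.

    # Hypothetical sketch: import paths, metric classes, and constructor
    # parameters below are assumptions for illustration only.
    from evaluator import Evaluator            # assumed module path
    from metrics import ExactMatch, MeanScore  # assumed metric classes

    testset = [...]  # a list of DatasetItem objects (inputs and expected outputs)

    evaluator = Evaluator(
        testset=testset,            # items the prompt is evaluated against
        metric=ExactMatch(),        # per-item metric producing a MetricResult
        global_metric=MeanScore(),  # aggregates per-item results into a GlobalMetricResult
    )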

Methods

__call__

Asynchronous method to evaluate a prompt using the configured testset and metrics.

Parameters:

  • prompt (Prompt): Prompt object to be evaluated.
  • testset (Optional[List[DatasetItem]]): Optional testset to override the configured one.
  • display_progress (Optional[bool]): Whether to display a progress bar.
  • display_table (Optional[Union[bool, int]]): Whether, and how, to display a results table.
  • max_errors (Optional[int]): Maximum number of errors allowed.
  • batch_size (Optional[int]): Number of evaluation tasks to run concurrently.
  • return_only_score (Optional[bool]): Whether to return only the final score.
  • **kwargs: Additional keyword arguments.

Returns: Union[float, Tuple[List[Union[Dict, str]], List[MetricResult], GlobalMetricResult]]: the evaluation results, either a single score (when return_only_score is True) or a tuple of per-item predictions, per-item metric results, and the global metric result.
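
A minimal usage sketch follows. Because __call__ is a coroutine it must be awaited; the evaluator and prompt objects are assumed to be configured as in the Overview, and the tuple order and max_errors behavior shown are inferences from the signatures above rather than documented guarantees.

    import asyncio

    async def main() -> None:
        # Detailed results: per-item predictions, per-item metric results,
        # and the aggregated global metric result (order inferred from the
        # return type above).
        preds, metric_results, global_result = await evaluator(
            prompt,
            display_progress=True,  # show a progress bar
            display_table=True,     # print a results table when finished
            batch_size=8,           # run up to 8 items concurrently
            max_errors=5,           # tolerate at most 5 failures (assumed semantics)
        )

        # Score-only mode: returns a single float.
        score = await evaluator(prompt, return_only_score=True)
        print(score)

    asyncio.run(main())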