Getting Started
Installation
pip install ape-core
Preparing the prompt template
Prepare your prompt template as a Prompt object. Brackets such as {var_1} can be used as placeholders for variables.
from ape.common import Prompt

student_prompt = Prompt(
    name="MathSolver",
    messages=[
        {"role": "system", "content": "Solve math problem."},
        {"role": "user", "content": "{problem}"},
    ],
    model="gpt-4o-mini",
    temperature=0.0,
    response_format=None,
)
Preparing the dataset
Prepare your dataset as a list of DatasetItems (DatasetItem is a TypedDict).
from ape.common import DatasetItem

dataset = [
    DatasetItem(inputs={"problem": "1+1"}, outputs={"answer": "2"}),
]
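Because DatasetItem is a TypedDict, plain dictionaries with the same inputs/outputs keys are also accepted at runtime; the extra item below is purely illustrative.

dataset = [
    {"inputs": {"problem": "1+1"}, "outputs": {"answer": "2"}},
    {"inputs": {"problem": "2+3"}, "outputs": {"answer": "5"}},
]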
Preparing the generator and metric
By default, Ape uses Generator under the hood, which uses LiteLLM to support multiple models. By subclassing BaseGenerator, you can implement your own generator to support custom use cases (e.g. LangChain, OpenAI Assistants API).
Ape provides multiple metrics out of the box (CosineSimilarityMetric, JsonMatchMetric, SemanticF1Metric), but you can also subclass BaseMetric to implement your own metric.
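If one of the built-in metrics already fits your task, it can be used directly instead of a custom subclass. The sketch below assumes JsonMatchMetric is exported from ape.common like the other classes shown here and needs no constructor arguments.

# Assumed import path and zero-argument constructor; adjust to the installed version.
from ape.common import JsonMatchMetric

metric = JsonMatchMetric()

The example below instead defines a custom generator and a custom exact-match metric.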
from typing import Any, Dict, Union

from ape.common import BaseGenerator, BaseMetric, DatasetItem, MetricResult, GlobalMetricResult, Prompt
from openai import AsyncOpenAI

openai = AsyncOpenAI()

class MathSolver(BaseGenerator):
    async def generate(
        self,
        prompt: Prompt,
        inputs: Dict[str, Any],
    ) -> Union[Dict[str, Any], str]:
        # Call your LLM here (e.g. with the AsyncOpenAI client above)
        # and return either a dict or a string.
        output = {
            "thought": "...",
            "answer": "...",
        }
        return output

class ExactMatchMetric(BaseMetric):
    async def compute(
        self,
        dataset_item: DatasetItem,
        pred: Union[Dict[str, Any], str],
    ) -> MetricResult:
        if dataset_item["outputs"]["answer"] == pred["answer"]:
            return MetricResult(score=1.0)
        else:
            return MetricResult(score=0.0)
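Before optimizing, you can sanity-check the generator and metric together on a single dataset item. This is a minimal sketch that only uses the classes defined above; since generate still returns placeholder values, the score will be 0.0 until it calls a real model.

import asyncio

async def smoke_test() -> None:
    item = dataset[0]
    pred = await MathSolver().generate(prompt=student_prompt, inputs=item["inputs"])
    result = await ExactMatchMetric().compute(dataset_item=item, pred=pred)
    print(result.score)

asyncio.run(smoke_test())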
You can further configure the evaluation by subclassing BaseGlobalMetric. By default, Ape uses AverageGlobalMetric, which computes the average of all the metric results as the final score. Below is an example of a custom per-item recall metric, together with a global metric that micro-averages it.
from typing import Any, Dict, List

from ape.common import BaseMetric, BaseGlobalMetric, DatasetItem, MetricResult, GlobalMetricResult


class RecallMetric(BaseMetric):
    async def compute(
        self,
        dataset_item: DatasetItem,
        pred: Dict[str, Any],  # pred = {"predictions": [a, b, c, ...]}
    ) -> MetricResult:
        gold = dataset_item["outputs"]["predictions"]
        num_correct = 0
        for item in pred["predictions"]:
            if item in gold:
                num_correct += 1
        recall = num_correct / len(gold) if gold else 0.0
        return MetricResult(
            score=recall,
            trace={"num_items": len(gold)},
        )


class MICRORecallGlobalMetric(BaseGlobalMetric):
    async def compute(
        self,
        results: List[MetricResult],
    ) -> GlobalMetricResult:
        total_num_items = 0
        total_num_correct = 0
        for result in results:
            total_num_items += result.trace["num_items"]
            total_num_correct += result.score * result.trace["num_items"]
        return GlobalMetricResult(
            score=total_num_correct / total_num_items if total_num_items else 0.0,
        )
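To see how the micro-average differs from a plain average, the global metric can be applied to a couple of hand-made per-item results. The numbers below are made up for illustration.

import asyncio

results = [
    MetricResult(score=1.0, trace={"num_items": 2}),   # 2 of 2 correct
    MetricResult(score=0.25, trace={"num_items": 4}),  # 1 of 4 correct
]

# Micro recall: (2 + 1) / (2 + 4) = 0.5, whereas the plain (macro) average would be 0.625.
global_result = asyncio.run(MICRORecallGlobalMetric().compute(results=results))
print(global_result.score)  # 0.5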
Optimization
Select a trainer and optimize your prompt.
from ape.trainer import DSPyMiproTrainer
trainer = DSPyMiproTrainer(
    generator=MathSolver(),
    metric=ExactMatchMetric(),
)

optimized_prompt, report = await trainer(prompt=student_prompt, trainset=dataset, valset=[])
optimized_prompt is a Prompt object that contains the optimized prompt. report is a Report object that contains the training process and results.
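Since optimized_prompt is a regular Prompt, it can be passed straight back to the generator defined earlier, for example (a minimal sketch; run it inside an async context):

answer = await MathSolver().generate(
    prompt=optimized_prompt,
    inputs={"problem": "12 * 7"},
)
print(answer)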