Getting Started
Installation
pip install ape-core
Preparing the prompt template
Prepare your prompt template as a Prompt object.
Curly braces such as {var_1} can be used as placeholders for variables.
from ape.common import Prompt
student_prompt = Prompt(
    name="MathSolver",
    messages=[
        {"role": "system", "content": "Solve math problem."},
        {"role": "user", "content": "{problem}"},
    ],
    model="gpt-4o-mini",
    temperature=0.0,
    response_format=None,
)
Preparing the dataset
Prepare your dataset as a list of DatasetItem objects (DatasetItem is a TypedDict).
from ape.common import DatasetItem
dataset = [
    DatasetItem(inputs={"problem": "1+1"}, outputs={"answer": "2"}),
]
Preparing the generator and metric
By default, Ape uses the built-in Generator under the hood, which relies on LiteLLM to support multiple models.
By subclassing BaseGenerator, you can implement your own generator to support custom use cases (e.g. LangChain, OpenAI Assistants API).
Ape provides multiple metrics out of the box (CosineSimilarityMetric, JsonMatchMetric, SemanticF1Metric), but you can also subclass BaseMetric to implement your own metric.
from typing import Any, Dict, Union
from ape.common import BaseGenerator, BaseMetric, DatasetItem, MetricResult, GlobalMetricResult, Prompt
from openai import AsyncOpenAI
openai = AsyncOpenAI()
class MathSolver(BaseGenerator):
    async def generate(
        self,
        prompt: Prompt,
        inputs: Dict[str, Any],
    ) -> Union[Dict[str, Any], str]:
        # Call your LLM here
        output = {
            "thought": "...",
            "answer": "...",
        }
        return output
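The body above is a placeholder. As a rough sketch of what a concrete implementation might look like, the hypothetical class below calls the AsyncOpenAI client created earlier; it assumes the Prompt object exposes its constructor arguments as messages, model, and temperature attributes and that placeholders can be filled with str.format, which may not match the actual Prompt API.
class OpenAIMathSolver(BaseGenerator):
    async def generate(
        self,
        prompt: Prompt,
        inputs: Dict[str, Any],
    ) -> Union[Dict[str, Any], str]:
        # Assumption: prompt.messages is a list of {"role", "content"} dicts whose
        # placeholders (e.g. {problem}) can be filled with str.format.
        messages = [
            {"role": m["role"], "content": m["content"].format(**inputs)}
            for m in prompt.messages
        ]
        response = await openai.chat.completions.create(
            model=prompt.model,
            messages=messages,
            temperature=prompt.temperature,
        )
        # Return the raw completion as the answer; a real generator might parse the
        # model output into separate "thought" and "answer" fields.
        return {"thought": "", "answer": response.choices[0].message.content}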
class ExactMatchMetric(BaseMetric):
    async def compute(
        self,
        dataset_item: DatasetItem,
        pred: Union[Dict[str, Any], str],
    ) -> MetricResult:
        if dataset_item["outputs"]["answer"] == pred["answer"]:
            return MetricResult(
                score=1.0,
            )
        else:
            return MetricResult(
                score=0.0,
            )
You can further configure the evaluation by subclassing BaseGlobalMetric. By default, Ape uses AverageGlobalMetric, which computes the average of all the metric results as the final score.
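For intuition, the default behaves roughly like the sketch below (a simplified illustration with a hypothetical class name, not Ape's actual implementation):
from typing import List
from ape.common import BaseGlobalMetric, GlobalMetricResult, MetricResult
class NaiveAverageGlobalMetric(BaseGlobalMetric):
    async def compute(
        self,
        results: List[MetricResult],
    ) -> GlobalMetricResult:
        # Mean of the per-item scores; 0.0 when there are no results.
        scores = [result.score for result in results]
        return GlobalMetricResult(score=sum(scores) / len(scores) if scores else 0.0)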
Below is an example of a custom metric that computes the recall score.
from typing import Any, Dict, List
from ape.common import BaseMetric, BaseGlobalMetric, DatasetItem, MetricResult, GlobalMetricResult
class RecallMetric(BaseMetric):
    async def compute(
        self,
        dataset_item: DatasetItem,
        pred: Dict[str, Any],  # pred = {"predictions": [a, b, c, ...]}
    ) -> MetricResult:
        num_correct = 0
        for item in pred["predictions"]:
            if item in dataset_item["outputs"]["predictions"]:
                num_correct += 1
        recall = num_correct / len(dataset_item["outputs"]["predictions"]) if dataset_item["outputs"]["predictions"] else 0
        return MetricResult(
            score=recall,
            trace={"num_items": len(dataset_item["outputs"]["predictions"])},
        )
class MICRORecallGlobalMetric(BaseGlobalMetric):
    async def compute(
        self,
        results: List[MetricResult],
    ) -> GlobalMetricResult:
        total_num_items = 0
        total_num_correct = 0
        for result in results:
            total_num_items += result.trace["num_items"]
            total_num_correct += result.score * result.trace["num_items"]
        return GlobalMetricResult(
            score=total_num_correct / total_num_items if total_num_items else 0.0,
        )
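To see how these two classes interact, here is a small self-contained check that calls them directly on hypothetical data (in practice Ape invokes the metrics for you during evaluation); note how the micro-averaged recall (2 correct out of 3 expected items, about 0.67) differs from a plain average of the per-item recalls (0.75).
import asyncio
async def recall_demo():
    items = [
        DatasetItem(inputs={}, outputs={"predictions": ["a", "b"]}),
        DatasetItem(inputs={}, outputs={"predictions": ["c"]}),
    ]
    preds = [{"predictions": ["a"]}, {"predictions": ["c"]}]
    metric = RecallMetric()
    results = [
        await metric.compute(dataset_item=item, pred=pred)
        for item, pred in zip(items, preds)
    ]  # per-item recalls: 0.5 and 1.0
    global_metric = MICRORecallGlobalMetric()
    global_result = await global_metric.compute(results)
    print(global_result.score)  # 2 / 3 = 0.666...
asyncio.run(recall_demo())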
Optimization
Select a trainer and optimize your prompt.
from ape.trainer import DSPyMiproTrainer
trainer = DSPyMiproTrainer(
    generator=MathSolver(),
    metric=ExactMatchMetric(),
)
optimized_prompt, report = await trainer(prompt=student_prompt, trainset=dataset, valset=[])
optimized_prompt is a Prompt object that contains the optimized prompt.
report is a Report object that contains details of the training process and its results.
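Because the trainer is awaited, the call above has to run inside an async context. In a standalone script you might wrap it like this (a minimal sketch; the trainer call itself is unchanged):
import asyncio
async def main():
    optimized_prompt, report = await trainer(
        prompt=student_prompt, trainset=dataset, valset=[]
    )
    print(optimized_prompt)
asyncio.run(main())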