Tracking Hook
The tracking hook is activated when MLFLOW_TRACKING_URI is set.
It creates hierarchical MLflow runs that mirror the evaluation structure.
It uses the MlflowClient API rather than the global mlflow state, which keeps it isolated from the user's own MLflow runs, and it is thread-safe for concurrent sample processing.
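The run hierarchy described above can be pictured with a small sketch. It is not the hook's actual code: the function name `start_eval_runs` and its arguments are hypothetical, and `client` is assumed to behave like `mlflow.tracking.MlflowClient`. The one MLflow-specific detail it relies on is the `mlflow.parentRunId` tag, which MLflow uses to mark a run as nested under another.

```python
from typing import Any

# Tag key MLflow uses to mark a run as nested under a parent run.
PARENT_RUN_TAG = "mlflow.parentRunId"


def start_eval_runs(
    client: Any,
    experiment_id: str,
    eval_name: str,
    task_configs: dict[str, dict[str, Any]],
) -> tuple[str, dict[str, str]]:
    """Create one parent run plus one nested child run per task.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    Returns the parent run ID and a task-name -> child-run-ID mapping.
    """
    parent = client.create_run(experiment_id, run_name=eval_name)
    parent_id = parent.info.run_id

    child_ids: dict[str, str] = {}
    for task_name, config in task_configs.items():
        child = client.create_run(
            experiment_id,
            run_name=task_name,
            tags={PARENT_RUN_TAG: parent_id},
        )
        child_id = child.info.run_id
        # Task configuration becomes run parameters on the child run.
        for key, value in config.items():
            client.log_param(child_id, key, value)
        child_ids[task_name] = child_id
    return parent_id, child_ids
```

Because every call goes through an explicit client and run ID, nothing here touches the active-run stack that `mlflow.start_run()` manages, which is the isolation property the hook depends on.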
Features
Parent run per eval invocation with nested child runs per task
Task configuration logged as parameters
Per-sample scores as step metrics
Model token usage (input/output/total per model)
Real-time event counting (model calls, tool calls)
Eval artifacts: per-sample results JSON + full eval log JSON
Additional rich table artifacts under inspect/*.json (tasks, samples, messages, sample scores, events, model usage)
Trace assessments: eval scores logged via mlflow.log_feedback()
Optional provider autolog integration for LLM SDKs
Async logging for reduced hook latency
Thread-safe counters for concurrent samples
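As an illustration of the thread-safe event counting listed above, here is a minimal sketch that guards a counter with a lock. The class name `EventCounter` and the event kinds are hypothetical, and the hook's own counters may be implemented differently.

```python
import threading
from collections import Counter


class EventCounter:
    """Counts events by kind; safe to call from several threads at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts = Counter()

    def increment(self, kind: str, amount: int = 1) -> None:
        # The read-modify-write on the counter happens under the lock.
        with self._lock:
            self._counts[kind] += amount

    def snapshot(self) -> dict[str, int]:
        # Return a copy so callers never see a half-updated view.
        with self._lock:
            return dict(self._counts)
```

A caller would invoke `increment("model_call")` or `increment("tool_call")` from each sample's event handler and read `snapshot()` when it is time to log metrics.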
Configuration
| Env var | Required | Default | Description |
|---|---|---|---|
| MLFLOW_TRACKING_URI | Yes | – | MLflow server URL |
| | No | | Experiment name |
| | No | | Log eval artifacts |
| | No | | Same as above (new prefix, takes priority) |
| | No | | Enable MLflow provider autolog integrations |
| | No | | CSV or JSON array of providers to autolog |
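The last setting in the table accepts either a comma-separated list or a JSON array. A parser for such a value might look like the sketch below. The function name `parse_provider_list` is hypothetical, and lowercasing and de-duplicating the names are assumptions rather than documented behaviour.

```python
import json


def parse_provider_list(raw: str) -> list[str]:
    """Parse a provider setting given as a JSON array or comma-separated text."""
    text = raw.strip()
    if not text:
        return []
    # A leading "[" is treated as JSON; anything else is split on commas.
    items = json.loads(text) if text.startswith("[") else text.split(",")

    # Normalise case and whitespace, drop empties, keep first-seen order.
    seen: dict[str, None] = {}
    for item in items:
        name = str(item).strip().lower()
        if name:
            seen.setdefault(name, None)
    return list(seen)
```

With this sketch, `openai, anthropic` and `["openai", "anthropic"]` yield the same list, and malformed JSON surfaces as a `ValueError` from `json.loads`.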
Supported provider integrations: openai, anthropic, langchain, litellm,
mistral, groq, cohere, gemini, bedrock.
Providers are enabled only when both the MLflow flavor module and provider SDK are present.
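The "both modules present" rule can be checked without importing the provider code itself. The sketch below uses `importlib.util.find_spec` for that. The helper names are hypothetical, the `mlflow.<provider>` naming of flavor modules is an assumption, and the mapping from provider to SDK module name is supplied by the caller because this page does not list it.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be found, without importing the leaf module."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example `mlflow`) is missing.
        return False


def usable_providers(requested: list[str], sdk_modules: dict[str, str]) -> list[str]:
    """Keep providers whose MLflow flavor module and SDK module both exist."""
    usable = []
    for provider in requested:
        sdk = sdk_modules.get(provider)
        if sdk is None:
            continue  # Unknown provider: nothing to check against.
        # Assumption: the flavor module is named mlflow.<provider>.
        if module_available(f"mlflow.{provider}") and module_available(sdk):
            usable.append(provider)
    return usable
```

Note that `find_spec` on a dotted name imports the parent package, so the check does import `mlflow` when it is installed, but not the individual flavor or SDK module.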
Artifacts
With artifact logging enabled, the tracking hook writes the following artifacts:
inspect/tasks.json
inspect/samples.json
inspect/messages.json
inspect/sample_scores.json
inspect/events.json
inspect/model_usage.json
sample_results/*.json
eval_logs/*.json
API Reference
MLflow Tracking hook for Inspect AI.
Logs evaluation runs, task configurations, sample scores, and model usage to an MLflow tracking server. Creates a parent run per eval run with nested child runs per task.
Uses MlflowClient API to avoid contaminating global mlflow state, so user code that calls mlflow.start_run() independently will not conflict.
Activated automatically when MLFLOW_TRACKING_URI is set.
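Activation by environment variable could be implemented along the lines of the sketch below. `EnvGatedHook` is a hypothetical stand-in written without any Inspect AI imports so that it runs on its own; a real hook would subclass the Inspect hooks base class. Caching the result is an assumption motivated by the fact that the enabled check may be called frequently.

```python
import os


class EnvGatedHook:
    """Stand-in for a hooks class that is active only when a tracking URI is set."""

    def __init__(self) -> None:
        self._enabled = None  # None means "not checked yet".

    def enabled(self) -> bool:
        # Read the environment once and reuse the answer on later calls.
        if self._enabled is None:
            uri = os.environ.get("MLFLOW_TRACKING_URI", "")
            self._enabled = bool(uri.strip())
        return self._enabled
```

One consequence of caching is that changing MLFLOW_TRACKING_URI after the first check has no effect on an existing instance.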
- class inspect_mlflow.tracking.MlflowTrackingHooks
Tracks Inspect AI evaluations in MLflow with hierarchical runs.
Uses MlflowClient API for isolation from user mlflow state.
- property artifact_manager: ArtifactManager
- property client: MlflowClient
- enabled() → bool
Check if the hook should be enabled.
Default implementation returns True.
Hooks may wish to override this to e.g. check the presence of an environment variable or a configuration setting.
Will be called frequently, so consider caching the result if the computation is expensive.
- async on_model_usage(data: ModelUsageData) → None
Called when a call to a model’s generate() method completes successfully without hitting Inspect’s local cache.
Note that this is not called when Inspect’s local cache is used and is a cache hit (i.e. if no external API call was made). Provider-side caching will result in this being called.
- Parameters:
data – Model usage data.
- async on_run_start(data: RunStart) → None
On run start.
A “run” is a single invocation of eval() or eval_retry() which may contain many Tasks, each with many Samples and many epochs. Note that eval_retry() can be invoked multiple times within an eval_set().
- Parameters:
data – Run start data.
- async on_sample_end(data: SampleEnd) → None
On sample end.
Called when a sample has either completed successfully, or when a sample has errored and has no retries remaining.
If a sample is run for multiple epochs, this will be called once per epoch.
- Parameters:
data – Sample end data.
- async on_sample_event(data: SampleEvent) → None
On sample event.
Called when a sample event is emitted. Pending events are not logged here (i.e. ToolEvent and ModelEvent are not logged until they are complete).
- Parameters:
data – Sample event.
- property settings: MLflowSettings
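To show how on_sample_end can turn per-sample scores into step metrics, here is a minimal sketch. `StepMetricLogger` is a hypothetical helper, `client` is assumed to behave like `MlflowClient` (only `log_metric(run_id, key, value, step=...)` is used), and the scores are assumed to have already been reduced to a plain name-to-value mapping.

```python
import threading
from typing import Any


class StepMetricLogger:
    """Logs each completed sample's numeric scores at the next step index."""

    def __init__(self, client: Any, run_id: str) -> None:
        self._client = client
        self._run_id = run_id
        self._lock = threading.Lock()
        self._step = 0

    def log_sample_scores(self, scores: dict[str, Any]) -> int:
        # Reserve a step under the lock so concurrent samples never share one.
        with self._lock:
            step = self._step
            self._step += 1
        for name, value in scores.items():
            # Only numeric values can be metrics; bools count as 1.0 / 0.0.
            if isinstance(value, (int, float)):
                self._client.log_metric(self._run_id, name, float(value), step=step)
        return step
```

Since on_sample_end is called once per epoch for a sample, each epoch would receive its own step under this scheme.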