Storage¶

metalab supports pluggable storage backends with filesystem as the default and PostgreSQL for query acceleration on large-scale experiments.

Quick Start¶

import metalab

# Simple path - creates FileStore at that location
handle = metalab.run(exp, store="./experiments")

# Results are stored under:
handle.store.get_working_directory()  # ./experiments/<safe_experiment_id>/

Store Configuration¶

Stores are configured via StoreConfig dataclasses, which are serializable and can be passed around before connecting:

from metalab.store import FileStoreConfig

# Create a config (doesn't connect yet)
config = FileStoreConfig(root="./experiments")

# Scope to an experiment (creates subdirectory)
scoped = config.scoped("my_exp:1.0")

# Connect to get a usable store
store = scoped.connect()

Experiment Scoping¶

When you call metalab.run(), the store is automatically scoped to the experiment:

# These are equivalent:
metalab.run(exp, store="./experiments")
metalab.run(exp, store=FileStoreConfig(root="./experiments"))

# Data ends up in: ./experiments/my_exp_1.0/

FileStore (default)¶

FileStore is the source of truth for all data. It stores everything on the local filesystem in a well-organized layout.

from metalab.store import FileStoreConfig

# Path string - this is the collection root
metalab.run(exp, store="./experiments")
# Data stored in: ./experiments/<safe_experiment_id>/

# file:// URI 
metalab.run(exp, store="file:///shared/experiments")

# Explicit config (equivalent)
config = FileStoreConfig(root="./experiments")
metalab.run(exp, store=config)

The path you provide is the collection root—a directory that can hold multiple experiments. The runner automatically scopes storage to your experiment's ID, creating a subdirectory like my_exp_1.0/.

Layout¶

{store_root}/
├── runs/{run_id}.json           # Run records (versioned JSON)
├── derived/{run_id}.json        # Derived metrics
├── artifacts/{run_id}/          # Artifact files + manifest
├── logs/{run_id}_{name}.log     # Log files
├── results/{run_id}/{name}.json # Structured results
├── experiments/{exp_id}_{ts}.json # Experiment manifests
└── _meta.json                   # Store metadata

PostgresStore (optional)¶

PostgresStore wraps a FileStore with a PostgreSQL query index for fast lookups and filtering. Files remain the source of truth—Postgres accelerates queries.

from metalab.store import PostgresStoreConfig

# PostgresStore requires file_root for logs/artifacts
config = PostgresStoreConfig(
    connection_string="postgresql://user@localhost/metalab",
    file_root="/shared/experiments",
)
metalab.run(exp, store=config)

# Or via URI with file_root parameter
metalab.run(
    exp,
    store="postgresql://user@localhost/metalab?file_root=/shared/experiments",
)

Install support:

uv add metalab[postgres]

Architecture¶

PostgresStore = FileStore (source of truth) + PostgresIndex (query acceleration)

Component	Role	Responsibilities
FileStore	Source of truth	Run records Artifacts Logs Structured data
PostgresIndex	Query acceleration	Fast record lookups Experiment filtering Field catalog (Atlas) Derived metrics index

Key principle: All data writes go to FileStore first (permanent), then indexed in Postgres (ephemeral). If Postgres is lost, call rebuild_index() to restore from files.

Index Rebuild¶

If your Postgres database is wiped or out of sync:

from metalab.store import PostgresStoreConfig

config = PostgresStoreConfig(
    connection_string="postgresql://localhost/db",
    file_root="/path/to/files",
)
store = config.connect()
store.rebuild_index()  # Re-indexes all records from FileStore

Working with Existing FileStores¶

from metalab.store import FileStoreConfig, PostgresStore

# Wrap an existing FileStore with Postgres indexing
filestore = FileStoreConfig(root="/path/to/existing/store").connect()
pg_store = PostgresStore.from_filestore(
    "postgresql://localhost/db",
    filestore,
    rebuild=True,  # Index existing records
)

# Export back to standalone FileStore
exported = pg_store.to_filestore("/path/to/export")

Browsing Collections¶

A collection is an unscoped config pointing to a root directory that may contain multiple experiments. You can discover and browse experiments programmatically:

from metalab import load_results
from metalab.store import FileStoreConfig

# Create an unscoped config (collection root)
collection = FileStoreConfig(root="./experiments")

# List all experiments in the collection
experiments = collection.list_experiments()
# ['my_exp:1.0', 'my_exp:2.0', 'other_exp:1.0']

# Get config for a specific experiment
config = collection.for_experiment("my_exp:1.0")
results = load_results(config)

The list_experiments() method scans subdirectories for _meta.json files that contain experiment IDs. Only works on unscoped configs.

Loading Results¶

Load results from a store for analysis:

from metalab import load_results
from metalab.store import FileStoreConfig

# From path string
results = load_results("./experiments", experiment_id="my_exp:1.0")

# From config (using collection API)
collection = FileStoreConfig(root="./experiments")
results = load_results(collection.for_experiment("my_exp:1.0"))

# Convert to DataFrame
df = results.to_dataframe()

Store Transfer¶

Copy data between stores:

from metalab.store import export_store

# Export specific experiment
export_store(
    source="postgresql://localhost/db?file_root=/data",
    destination="./backup",
    experiment_id="my_exp:1.0",
)

Or via CLI:

metalab store export --from ./runs/my_exp --to ./backup

Creating Stores Programmatically¶

Use create_store() for URI-based store creation, or configs for explicit control:

from metalab.store import create_store, FileStoreConfig, parse_to_config

# URI-based (convenience)
store = create_store("./runs/my_exp")
store = create_store("file:///absolute/path")
store = create_store("postgresql://localhost/db", file_root="/path/to/files")

# Config-based (recommended for programmatic use)
config = FileStoreConfig(root="./runs/my_exp")
store = config.connect()

# Parse URI to config (useful for inspection/modification)
config = parse_to_config("postgresql://localhost/db?file_root=/data")
config = config.scoped("my_exp:1.0")  # Add experiment scoping
store = config.connect()

Config Serialization¶

StoreConfig objects are serializable for use across process boundaries (e.g., distributed execution):

from metalab.store import FileStoreConfig, StoreConfig

config = FileStoreConfig(root="./experiments", experiment_id="my_exp:1.0")

# Serialize to dict
d = config.to_dict()
# {'root': '/abs/path/to/experiments', 'experiment_id': 'my_exp:1.0', '_type': 'file'}

# Deserialize back
restored = StoreConfig.from_dict(d)
assert restored == config

Custom Stores¶

Create custom store implementations by subclassing StoreConfig:

from dataclasses import dataclass
from typing import ClassVar, Any

from metalab.store import StoreConfig
from metalab.store.locator import LocatorInfo

@dataclass(frozen=True, kw_only=True)
class MyCustomStoreConfig(StoreConfig):
    # scheme is used for URI parsing and auto-registration
    scheme: ClassVar[str] = "myscheme"

    # Your config fields
    endpoint: str
    bucket: str
    experiment_id: str | None = None

    def connect(self) -> "MyCustomStore":
        return MyCustomStore(self)

    @classmethod
    def from_locator(cls, info: LocatorInfo, **kwargs: Any) -> "MyCustomStoreConfig":
        # Parse from URI like "myscheme://endpoint/bucket"
        return cls(
            endpoint=info.host,
            bucket=info.path.lstrip("/"),
            experiment_id=kwargs.get("experiment_id"),
        )

class MyCustomStore:
    def __init__(self, config: MyCustomStoreConfig):
        self.config = config
        # Connect to your backend...

    # Implement Store protocol methods...

# Auto-registered! Now usable via create_store
store = create_store("myscheme://my-endpoint/my-bucket")

The __init_subclass__ mechanism in StoreConfig automatically registers your config class with the ConfigRegistry when the module is imported.