ml-cellar: A Minimal MLOps Model Management CLI
Summary
- Released ml-cellar, a “Model Management CLI for Minimal MLOps,” written in Rust as OSS.
- Among the software supporting the Minimal MLOps Checklist, it is responsible for model management and model serving.
---
Background
When operating ML models, managing the deliverables, such as trained models, often matters more than the training code itself. Which experiment do these weights belong to? How do you reproduce them? Who should use which model? Which model was the base for this fine-tuning? To manage ML models like this, the concept of “MLOps (Machine Learning Operations)” was born: a general term for the systems, culture, and technologies needed not only to “build” machine learning models but also to “operate them safely, stably, and continuously so they keep providing value.” It is an extension of DevOps (the integration of development and operations) tailored to the specific challenges of machine learning (data, reproducibility, degradation, monitoring, and so on).
On the other hand, building this kind of MLOps requires many MLOps engineers and therefore significant cost. MLOps is usually considered more “correct” the more full-stack it becomes: experiment management, metadata, model registries, artifact stores, serving, monitoring, and so on. That full picture, however, comes with high cost and operational load. BigTech companies can (I assume) maintain huge MLOps infrastructures, but many organizations, especially small startups and research teams, cannot afford one.
In reality, there are many “light needs” for MLOps such as:
- Research-oriented, where sharing experiment results (weights and settings) is enough.
- Contract/Product-based, where you want to distribute fine-tuned models for multiple projects.
- Simply wanting a “place for models” and “minimum rules” for now.
I wrote the Minimal MLOps Checklist (Japanese article) for those “light needs,” and this time I created software to support Minimal MLOps.
In MLOps, many tools aim for full automation, but to keep costs down, ml-cellar emphasizes getting a minimum operation running immediately through manual CLI operations.
Existing services include wandb (Weights & Biases). While artifact management is possible with wandb, ml-cellar prioritizes “storing models and keeping them in a state where they are ready for use at any time.”
Our policy is not to “search for models buried among a massive amount of experiments,” but rather to “save only the high-performing models and make them accessible to anyone.”
The intended workflow is to manage experiment tracking loosely on wandb under individual responsibility, while handling model storage and serving meticulously on ml-cellar as a team responsibility.
Additionally, ml-cellar has the advantage of not requiring any modifications to your training code, making it easier to start model management immediately compared to wandb.
---
About ml-cellar
Instead of providing a service like MLflow, I decided to provide ml-cellar as a Rust CLI that makes it easy to handle a minimal model registry based on Git LFS.
It is not an “all-in-one” like MLflow; instead, it aims to be a deliberately pared-down MLOps focused only on the truly necessary functions.
By using Git LFS, the learning cost to understand the structure is low, and I aimed for a configuration that is unlikely to become technical debt.
Also, since it basically wraps git and git-lfs commands, operation is possible even without ml-cellar (using ml-cellar just makes it easier).
Therefore, I believe it can be a configuration that avoids vendor lock-in and makes operations robust.
In ml-cellar, model deliverables are managed like a “wine cellar” using rack (shelf) and bin (bottle).
Below is an example of a rack:
model_registry_repository/   # repository (called a model registry)
  vit-l/                     # rack (type of model deliverable, equivalent to a grape variety in wine)
    0.1/                     # bin (a set of versioned model deliverables, equivalent to a vintage in wine)
      # config files, weight files, logs, etc.
      config.yaml
      checkpoint.pth
      result.json
      log.log
    0.2/
      ...
  vit-m/
    ...
In this case, a model can be uniquely identified by a name like “vit-l/0.2,” making distribution and reference easier. For distribution:
ml-cellar clone {your_ml_registry}
ml-cellar download {path to a directory or a file}
This is how you get the files.
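Since ml-cellar basically wraps git and git-lfs, roughly the same distribution flow is also possible with plain commands. Below is a sketch under that assumption (not necessarily the exact commands ml-cellar runs internally):
# Clone without materializing LFS files (they stay as pointer files)
GIT_LFS_SKIP_SMUDGE=1 git clone {your_ml_registry}
cd {your_ml_registry}
# Fetch and check out only the files you need
git lfs pull --include="vit-l/0.2/*"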
Also, by using GitHub as the remote repository for Git LFS, you can develop using Pull Requests for model uploads. The flow of “development,” “review,” and “approval” can be performed with an interface familiar to many software engineers, reducing the work cost and psychological cost for engineers. Permission management is also tied simply to GitHub accounts.
---
Usage
- For details, see the README.
- Note that Git LFS on GitHub has limits on storage and bandwidth; exceeding them results in additional charges or restrictions.
- Please note that you are responsible for any costs incurred by exceeding these limits.
Creating a Repository
- Install Git LFS
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get update
sudo apt-get install git-lfs
git lfs install
- Install Rust
curl https://sh.rustup.rs -sSf | sh
- Install ml-cellar
cargo install ml-cellar
- Create a Model registry
mkdir {your_ml_registry}
cd {your_ml_registry}
ml-cellar init
- Register the remote repository
git remote add origin {remote_repository}
- Determine LFS targets in .gitattributes
# --- Log ---
*.log filter=lfs diff=lfs merge=lfs -text
# --- Your data ---
*.db filter=lfs diff=lfs merge=lfs -text
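- Weight files are usually the largest deliverables, so you will typically want to track those patterns too. A minimal sketch assuming PyTorch-style checkpoints (adjust the patterns to your own artifacts):
# --- Model weights (example patterns; adjust to your deliverables) ---
*.pth filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text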
- Create a rack (a “shelf” for each model)
ml-cellar rack {my_algorithm}
- Structure of a rack
{your_ml_registry}/
  {my_algorithm}/
    README.md
    config.toml
    template.md
- Define “required deliverables” in config.toml
- required_files: Files that must be present
- optional_files: Files that may be uploaded ("*" for anything)
[artifact]
required_files = ["config.yaml", "checkpoint.pth"]
optional_files = ["*"]
- Create README.md
- Including information such as model purpose, training data, precautions, recommended use cases, and compatible tasks will help people who search for it later.
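For example, a short README sketch for a rack might look like the following (the headings are only suggestions):
# {my_algorithm}
## Purpose
What this model is for and the problem it solves.
## Training data
Datasets used, their versions, and any licensing notes.
## Recommended use cases / compatible tasks
Where this model works well and where it does not.
## Precautions
Known limitations, input preconditions, and deployment notes.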
Committing Deliverables
- Commit the model and create a PR
# Create a branch
git switch -c feat/{my_algorithm}/0.1
# Check if deliverables are complete
ml-cellar check {my_algorithm}/0.1/
# Commit
git add {path}
git commit -m "{commit message}"
# Push to the remote repository
ml-cellar push origin HEAD
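Concretely, committing the vit-l/0.2 deliverables from the earlier example might look like this (branch name and commit message are illustrative):
git switch -c feat/vit-l/0.2
ml-cellar check vit-l/0.2/
git add vit-l/0.2/
git commit -m "Add vit-l 0.2 trained on myCOCO 2.1.1"
ml-cellar push origin HEAD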
Distributing Deliverables
- clone
ml-cellar clone {your_ml_registry}
- Immediately after cloning, the Git LFS files are not yet materialized, so you materialize the ones you need.
- Because you materialize only what is necessary, disk usage stays small even as the number of models grows.
ml-cellar download {path to a directory or a file}
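For example, to materialize only one model's weights from the registry layout shown earlier (the path is illustrative):
ml-cellar download vit-l/0.2/checkpoint.pth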
Maintaining the Repository
When hosting Git LFS on GitHub, you can use GitHub’s features as they are. The following features are particularly useful:
- Pull Request
By using GitHub as the remote repository for Git LFS, you can manage model uploads with a Pull Request-based workflow. Since you can realize the flow of “development → review → approval” on an interface familiar to many software engineers, you can reduce not only the actual work cost but also the psychological burden on engineers.
- CODEOWNERS
It is also possible to set CODEOWNERS for each rack/model name to protect each rack.
Below is an example of .github/CODEOWNERS:
/vit/ @cv-team
/llm-model/ @llm-team
/**/project_A/ @project_A
This allows for control such that models for a project are reviewed by the people operating that project, while base models are reviewed by the R&D team.
Also, by combining with branch protection / Rulesets, the operation can be made even more robust:
- Make PRs mandatory
- Set the required number of approvals (N people)
- Make CODEOWNERS approval mandatory
These can prevent models from being added or replaced without being noticed. There are more features; for details, see the documentation.
Document Preparation
In practice, writing evaluation results entirely by hand is costly.
Summarizing them in spreadsheets also makes it hard to tell which model each result belongs to.
If your training code writes the information you want in the README to a JSON file, ml-cellar can semi-automate documentation by filling a template file with those values.
If you create a template like the following and run a command like ml-cellar docs test_registry/test_docs/base/0.1/:
### {{version}}
- Write the summary for the model
- Why you updated the model
- What dataset you added
- What you fixed in the algorithm
- What you updated in training parameters
<details>
<summary> Evaluation results </summary>
- Performance
- Device: {performance.device}
- Inference time(ms): {performance.time_ms}
- Training time: {performance.gpu_num} GPU * {performance.batch_size} Batchsize * {performance.training_days} days
- Dataset
- name: {dataset.name}
- Version: {dataset.version}
- Frame number: {dataset.frame_number}
- mAP for test dataset
| Metric | Value |
| -------------- | ------------------ |
| mAP (0.5:0.95) | {ap.map.50_95.all} |
| mAP (0.5) | {ap.map.50.all} |
- Per-Class AP for test dataset
| Class | Samples | Accuracy |
| ------- | ---------------------------------------- | ---------------------- |
| person | {test_dataset.per_class_samples.person} | {ap.per_class.person} |
| bicycle | {test_dataset.per_class_samples.bicycle} | {ap.per_class.bicycle} |
| car | {test_dataset.per_class_samples.car} | {ap.per_class.car} |
| dog | {test_dataset.per_class_samples.dog} | {ap.per_class.dog} |
</details>
It outputs the following document to the standard output. By pasting this into the README, you can create documentation at a low cost:
### test_docs/base/0.1
- Write the summary for the model
- Why you updated the model
- What dataset you added
- What you fixed in the algorithm
- What you updated in training parameters
<details>
<summary> Evaluation results </summary>
- Performance
- Device: RTX4090
- Inference time(ms): 7.87
- Training time: 2 GPU * 32 Batchsize * 4.7 days
- Dataset
- name: myCOCO
- Version: 2.1.1
- Frame number: 21321
- mAP for test dataset
| Metric | Value |
| -------------- | ---------- |
| mAP (0.5:0.95) | 0.40453625 |
| mAP (0.5) | 0.598 |
- Per-Class AP for test dataset
| Class | Samples | Accuracy |
| ------- | ------- | -------- |
| person | 647 | 0.5517 |
| bicycle | 916 | 0.2899 |
| car | 663 | 0.3941 |
| dog | 988 | 0.586 |
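For reference, the JSON file consumed for the output above might look roughly like this; the nesting mirrors the template placeholders and is an assumption, since the exact schema and file location follow your own template:
{
  "performance": { "device": "RTX4090", "time_ms": 7.87, "gpu_num": 2, "batch_size": 32, "training_days": 4.7 },
  "dataset": { "name": "myCOCO", "version": "2.1.1", "frame_number": 21321 },
  "test_dataset": {
    "per_class_samples": { "person": 647, "bicycle": 916, "car": 663, "dog": 988 }
  },
  "ap": {
    "map": { "50_95": { "all": 0.40453625 }, "50": { "all": 0.598 } },
    "per_class": { "person": 0.5517, "bicycle": 0.2899, "car": 0.3941, "dog": 0.586 }
  }
}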
For details, see Documentation using template.md.
Project Operation
In practice, you might want to separate:
- Models fine-tuned for Project A
- Models fine-tuned with different data for Project B
ml-cellar also provides the concept of project-based model registries to support management on a per-project basis.
For details, see the documentation.
---
Future Plans
As a current constraint, Git LFS on GitHub is not unlimited:
- Storage limit
- Bandwidth (download) limit
- Additional charges or restrictions when limits are exceeded
For ml-cellar, we plan to support offloading large files to S3 or another object store via a Git LFS custom transfer agent (not yet implemented).
Until then, realistic operational countermeasures are:
- Narrow down LFS targets to truly necessary deliverables.
- Keep checkpoints to “best only” (do not save all epochs).