
Mithril, now on SkyPilot


by Olivier Truong

Plug into the AI omnicloud with a SkyPilot YAML

It’s been a longstanding request from the community, and we’re excited to finally announce it: SkyPilot workloads can now run on Mithril. You can run jobs, serve models, or execute large-scale inference while benefiting from Mithril's flexible ways to buy compute.

SkyPilot started at UC Berkeley's Sky Lab and is now a thriving open-source framework for running AI workloads across clouds, Kubernetes, and on-prem clusters through a single interface.

Mithril complements this on the supply side by aggregating globally distributed GPU infrastructure into a single, market-driven resource pool. Rather than leaving capacity stranded or over-subscribed, Mithril adjusts pricing and allocation in real time to match supply with demand.

Mithril gives teams two ways to procure compute, depending on whether they prioritize guaranteed access or low cost: the recently released flexible reservations and cost-efficient spot capacity.

For guaranteed access, reservations let you secure capacity ahead of time. Unlike traditional reservations that require committing to fixed usage, Mithril reservations let you give capacity back when you don’t need it, and earn usage credit as a result.

Spot capacity is allocated through a real-time auction. You set a limit price, the most you're willing to pay, and access available GPUs at the market-clearing price, trading off cost against consistency of access. With a very low limit price, your workloads run only when the market price is very low, which ensures excellent value at the cost of waiting longer for completion. A high limit price, on the other hand, ensures you'll get capacity, while you still pay only the resulting market price, not your limit price.
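To make the limit-price trade-off concrete, here is a toy uniform-price auction in Python. It is a minimal sketch under our own assumptions, not Mithril's actual clearing mechanism, which this post doesn't specify: bids are filled from the highest limit price down, and every winner pays one clearing price, set by the best bid that missed out (or by a hypothetical `floor` price when supply is ample). `Bid` and `clear_market` are illustrative names, not part of any Mithril API.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str
    limit_price: float  # the most this bidder will pay, in $/GPU-hour
    gpus: int


def clear_market(bids: list[Bid], capacity: int, floor: float = 0.0):
    """Toy uniform-price auction; an illustration, NOT Mithril's mechanism.

    Bids are filled from the highest limit price down until the next bid
    no longer fits. All winners pay one clearing price, set by the best
    bid that missed out, or by `floor` when every eligible bid fits.
    """
    ranked = sorted(
        (b for b in bids if b.limit_price >= floor),
        key=lambda b: b.limit_price,
        reverse=True,
    )
    winners: list[Bid] = []
    used = 0
    for bid in ranked:
        if used + bid.gpus > capacity:
            break  # stopping here keeps every loser at or below the price
        winners.append(bid)
        used += bid.gpus
    rejected = ranked[len(winners):]
    price = rejected[0].limit_price if rejected else floor
    return winners, price


if __name__ == "__main__":
    bids = [
        Bid("high-limit", 4.00, 8),  # wants capacity right away
        Bid("mid-limit", 1.50, 8),
        Bid("low-limit", 0.50, 8),   # happy to wait for a cheap window
    ]
    winners, price = clear_market(bids, capacity=16)
    print([w.bidder for w in winners], price)  # ['high-limit', 'mid-limit'] 0.5
```

With 16 GPUs on offer, the two higher bids win and both pay $0.50/GPU-hour, far below the $4.00 limit; the low-limit bidder waits until competition eases.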

The result is immediate access to a broader pool of high-performance GPUs — delivered seamlessly through the familiar SkyPilot interface.

What Mithril Brings to SkyPilot

Training: secure capacity and return unused capacity for credits

For reservations, you can set up a Kubernetes cluster on top of your reserved resources and interact with them through SkyPilot, giving your whole team a single interface to share capacity across projects. The downside of a traditional reservation is that you pay for capacity whether you use it or not. On Mithril, you earn credits when you return capacity you don't need, which can significantly improve the economics of your training runs.
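The economics come down to simple arithmetic: gross reservation cost minus credits for returned GPU-hours. The post doesn't publish reservation prices or credit rates, so `price_per_gpu_hour`, `credit_per_gpu_hour`, and the example figures below are hypothetical placeholders, and Mithril may compute credits differently.

```python
def net_reservation_cost(
    gpus: int,
    hours: float,
    price_per_gpu_hour: float,
    returned_gpu_hours: float,
    credit_per_gpu_hour: float,
) -> float:
    """Net cost of a reservation after credits for returned capacity.

    All rates are hypothetical inputs; this is not Mithril's billing formula.
    """
    if returned_gpu_hours > gpus * hours:
        raise ValueError("cannot return more GPU-hours than were reserved")
    gross = gpus * hours * price_per_gpu_hour
    credit = returned_gpu_hours * credit_per_gpu_hour
    return gross - credit


if __name__ == "__main__":
    # 16 GPUs for 100 hours, 400 idle GPU-hours handed back (made-up rates).
    print(net_reservation_cost(16, 100, 2.00, 400, 1.00))  # 2800.0
```

Under these made-up rates, the reservation costs $3,200 gross, and returning 400 idle GPU-hours brings the net to $2,800.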

Training: achieve dramatic cost reductions for preemptible runs

At off-peak times, H100s and B200s can be had for as low as $0.01/hr. With SkyPilot and Mithril's developer tooling, you can make your training run preemptible by handling preemption gracefully: checkpoint your training run to cloud buckets and resume automatically without manual intervention. This lets you launch a job that absorbs available, cost-effective capacity over days or weeks.
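A preemption-tolerant training loop rests on three habits: load the latest checkpoint on start, save periodically, and write checkpoints atomically so an interruption mid-write can't corrupt them. The sketch below shows that pattern in plain Python, with a JSON state file standing in for model and optimizer state. We assume the checkpoint path points at storage that outlives the instance (for example, a bucket mounted on the cluster) and that something external, such as SkyPilot managed jobs, relaunches the script after preemption; the function names and the `preempt_at` simulation hook are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def save_checkpoint(path, state):
    """Write to a temp file, then rename over the old checkpoint.

    os.replace is atomic on a local POSIX filesystem; bucket mounts may
    not offer the same guarantee.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def train(ckpt_path, total_steps, save_every=10, preempt_at=None):
    state = load_checkpoint(ckpt_path)  # resume wherever the last run stopped
    for step in range(state["step"], total_steps):
        if preempt_at is not None and step == preempt_at:
            return state["step"]  # simulated preemption: unsaved work is lost
        state["loss"] = 1.0 / (step + 1)  # stand-in for a real training step
        state["step"] = step + 1
        if state["step"] % save_every == 0:
            save_checkpoint(ckpt_path, state)
    save_checkpoint(ckpt_path, state)
    return state["step"]
```

If a 50-step run is preempted at step 25 with `save_every=10`, the relaunched process resumes from the step-20 checkpoint, losing only five steps; a shorter save interval trades more I/O for less lost work.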

Inference: buy a right of first refusal on capacity for your upcoming launch

Demand during product launches can be unpredictable. Traditionally, companies are forced either to over-provision, wasting capital and hoarding resources, or to under-provision and risk failing to capture the moment. On Mithril, you can buy reserved capacity, scale up or down, and get credits back for unused GPUs. With SkyPilot, scaling that capacity across regions or clouds requires no adjustments to your workflow.

Inference: burst on the spot market when demand spikes

A unique feature of the Mithril spot market is that you can outbid the market to acquire capacity when supply is tight everywhere else. With SkyPilot's cross-cloud capabilities, you can serve from your reserved capacity, whether on Mithril or another cloud, and then, with minimal changes to your workflow, turn to Mithril's spot market at a moment's notice to meet a surge in demand.

Batch inference: passively grab cheap capacity

SkyPilot makes workloads portable across providers, and on Mithril's auction-based spot market, popular GPUs can be available for as low as $0.01/hr during off-peak hours. On Mithril, you name a limit price and let capacity come to you: when prices drop, during off-peak hours or periods of excess supply, your jobs start automatically, capturing low-cost GPU capacity without manual intervention.
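The passive batch pattern can be modeled as a job that runs only in hours whose market price is at or below its limit price. The sketch below uses a made-up hourly price series (not real Mithril prices) and assumes, as described above, that the job pays each hour's market price rather than its limit; `schedule_under_limit` is an illustrative name.

```python
from typing import Optional


def schedule_under_limit(
    prices: list[float], limit_price: float, hours_needed: int
) -> Optional[tuple[int, float]]:
    """Toy model of a passive batch job on an auction-priced spot market.

    The job runs in any hour whose market price is at or below
    `limit_price`, paying that hour's market price (not its limit).
    Returns (wall-clock hours until done, total cost), or None if the
    price series ends before the job finishes.
    """
    if hours_needed <= 0:
        return 0, 0.0
    done, cost = 0, 0.0
    for hour, price in enumerate(prices):
        if price <= limit_price:
            done += 1
            cost += price
            if done == hours_needed:
                return hour + 1, cost
    return None


if __name__ == "__main__":
    # Made-up hourly prices for one instance; not real Mithril data.
    prices = [6.00, 5.00, 0.50, 0.25, 4.00, 0.50, 8.00, 0.25]
    print(schedule_under_limit(prices, limit_price=0.50, hours_needed=3))   # (6, 1.25)
    print(schedule_under_limit(prices, limit_price=10.00, hours_needed=3))  # (3, 11.5)
```

The low limit finishes a three-hour job in six wall-clock hours for $1.25; the high limit finishes in three hours but pays $11.50.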

Run your first workload on Mithril with SkyPilot

The steps below are enough to get up and running quickly.

  1. Install and set up the Mithril CLI

uv tool install -U --refresh mithril-client

ml setup

  2. Install SkyPilot

uv tool install --with pip "skypilot-nightly[mithril]"

  3. Define a test job

# task.yaml
resources:
  infra: mithril
  accelerators: B200:8  # An 8x B200 instance

# Maximum hourly price you're willing to pay for
# the instance.
# Due to auction-based pricing, you often pay less
# than this cap.
config:
  mithril:
    limit_price: 32.00  # Equivalent to $4.00/GPU-hour

# Command that executes your code; runs on the
# cluster every time you launch or exec.
run: |
  nvidia-smi

  4. Launch your job

❯ sky launch -c mithril-test task.yaml
YAML to run: task.yaml
Considered resources (1 node):
--------------------------------------------------------------------------------------------
 INFRA                     INSTANCE          vCPUs   Mem(GB)   GPUS     COST ($)   CHOSEN
--------------------------------------------------------------------------------------------
 Mithril (us-central5-a)   neb-b200.sxm.8x   160     1792      B200:8   0.08          ✔
--------------------------------------------------------------------------------------------
Launching a new cluster 'mithril-test'. Proceed? [Y/n]:
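Because `limit_price` is set per instance (the comments in the task YAML above equate 32.00 with $4.00/GPU-hour on an 8x B200), converting a per-GPU target is a single multiplication. The helper below also bounds worst-case spend, assuming, without confirmation from this post, that the cap applies to each node of a multi-node cluster independently; both function names are illustrative.

```python
def limit_price_for(per_gpu_hour: float, gpus_per_instance: int) -> float:
    """Per-instance limit_price for a target $/GPU-hour."""
    return round(per_gpu_hour * gpus_per_instance, 2)


def worst_case_spend(limit_price: float, num_nodes: int, hours: float) -> float:
    """Upper bound on spend if every hour clears at the limit price.

    Assumption: the cap applies to each node separately.
    """
    return round(limit_price * num_nodes * hours, 2)


if __name__ == "__main__":
    cap = limit_price_for(4.00, 8)
    print(cap)                           # 32.0
    print(worst_case_spend(cap, 2, 10))  # 640.0
```

In practice the bill is usually lower, since the auction charges the market price rather than your cap.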

Distributed training

# multi-node.yaml
# Sync this directory so your code and data are
# available on the cluster
workdir: .
resources:
  infra: mithril
  accelerators: B200:8
num_nodes: 2

# Maximum hourly price you're willing to pay for
# the instance.
# Due to auction-based pricing, you often pay less
# than this cap.
config:
  mithril:
    # Equivalent to $4.00/GPU-hour on an 8x instance.
    limit_price: 32.00

# Runs once when the cluster is first created
# (install deps, download data, etc.)
setup: |
  # Create the virtual environment before installing into it
  test -d .venv || uv venv
  source .venv/bin/activate
  uv pip install -r requirements.txt

# Command that executes your code; runs on the
# cluster every time you launch or exec.
run: |
  # setup and run are separate shells, so activate the venv again
  source .venv/bin/activate
  MASTER_ADDR=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  echo "Starting distributed training, head node: $MASTER_ADDR"
  torchrun \
    --nnodes=$SKYPILOT_NUM_NODES \
    --nproc_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
    --node_rank=${SKYPILOT_NODE_RANK} \
    --master_addr=$MASTER_ADDR \
    --master_port=8008 \
    train.py
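The `run` block passes SkyPilot's environment variables straight to torchrun. For nodes with equal GPU counts, the resulting layout can be derived from those same variables; the sketch below does so in Python, reading a plain mapping so it can be checked off-cluster. `torchrun_layout` is an illustrative helper, not part of SkyPilot or PyTorch, and the global-rank formula assumes torchrun's static rendezvous with a fixed `--node_rank`.

```python
import os
from typing import Mapping, Optional


def torchrun_layout(env: Optional[Mapping[str, str]] = None) -> dict:
    """Derive master address, world size, and this node's global ranks.

    Reads the same SkyPilot variables the run: block passes to torchrun.
    Assumes every node has the same number of GPUs.
    """
    env = os.environ if env is None else env
    node_ips = env["SKYPILOT_NODE_IPS"].strip().splitlines()  # one IP per line, head first
    num_nodes = int(env["SKYPILOT_NUM_NODES"])
    gpus_per_node = int(env["SKYPILOT_NUM_GPUS_PER_NODE"])
    node_rank = int(env["SKYPILOT_NODE_RANK"])
    first_rank = node_rank * gpus_per_node
    return {
        "master_addr": node_ips[0],  # same as: echo "$SKYPILOT_NODE_IPS" | head -n1
        "world_size": num_nodes * gpus_per_node,
        "global_ranks": list(range(first_rank, first_rank + gpus_per_node)),
    }


if __name__ == "__main__":
    # Hypothetical values for the second node of the 2 x 8-GPU cluster above.
    fake_env = {
        "SKYPILOT_NODE_IPS": "10.0.0.1\n10.0.0.2",
        "SKYPILOT_NUM_NODES": "2",
        "SKYPILOT_NUM_GPUS_PER_NODE": "8",
        "SKYPILOT_NODE_RANK": "1",
    }
    print(torchrun_layout(fake_env))
```

For the two-node B200 cluster, the world size is 16 and the second node hosts global ranks 8 through 15, all rendezvousing at the first IP in `SKYPILOT_NODE_IPS`.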


Foundry Technologies ©2025
