Free during beta — Try on Kaggle

Train Larger Models On Less Hardware

Memory-efficient optimizer for full-parameter LLM training.
Works with any PyTorch or HuggingFace model.

Training memory per parameter (weights + optimizer + gradients)
Method         Bytes/Param   Full Training
FP16 + AdamW   12            Yes
8-bit Adam     6             Yes
LoRA/QLoRA     ~2            No (subset)
AXIOM          ~2            Yes
0 GB optimizer state · 33× gradient compression · 100% of parameters trained
$ pip install quarterbit

The AI Energy Crisis Is Here

Datacenter power demand is doubling. GPU waitlists stretch for months. Training costs are exploding. The industry needs a solution.

+88% power connection requests in 8 months (Source: S&P Global)
Datacenter power demand projected to rise by 2030 (Source: IEA, Goldman Sachs)
11+ GPUs needed to train a 70B model with standard FP16 + AdamW

AXIOM reduces training memory.

Train larger models on existing hardware. Full-parameter training, not LoRA.

Where Your Memory Actually Goes

Training a 70B model: see why the standard setup needs 11 GPUs, and how AXIOM cuts that to two.

Standard (11 GPUs)

Model Weights (FP16)              140 GB
Gradients                         140 GB
Optimizer (momentum + variance)   560 GB
Total Required                    840 GB

Needs 11× H100 80GB GPUs

AXIOM (2 GPUs)

Model Weights (FP16)   140 GB
Compressed Gradients     4 GB
Optimizer State          0 GB
Total Required         144 GB

2× H100 instead of 11× (5.5× fewer GPUs)

12 bytes/param → ~2 bytes/param = 5-6× fewer GPUs
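The arithmetic behind that comparison is easy to check. A minimal sketch, treating the per-component figures quoted above as exact and ignoring activation memory:

```python
import math

PARAMS_B = 70   # model size in billions of parameters
H100_GB = 80    # VRAM per H100

def total_gb(weights_bpp, grads_bpp, opt_bpp):
    """Training memory in GB: billions of params × bytes per param."""
    return PARAMS_B * (weights_bpp + grads_bpp + opt_bpp)

# Standard: FP16 weights (2 B) + FP16 gradients (2 B)
# + AdamW momentum and variance in FP32 (2 × 4 B = 8 B)
standard = total_gb(2, 2, 8)     # 840 GB
# AXIOM: FP16 weights, ~33×-compressed gradients, no optimizer state
axiom = total_gb(2, 2 / 33, 0)   # ~144 GB

print(math.ceil(standard / H100_GB))  # 11 GPUs
print(math.ceil(axiom / H100_GB))     # 2 GPUs
```

The 2/33 term is the quoted 33× gradient compression applied to FP16 gradients; it reproduces the ~4 GB and 144 GB figures in the breakdown.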

Does It Actually Learn?

Yes. Full parameter training. Every weight updated. Verified convergence.

100% perplexity improved (Llama-2 70B)
5/5 generations changed (the model actually learned)
100% of parameters trained (not LoRA, full training)

Both Attention and LayerNorm weights update during training. This is real learning, not a frozen model.

Loss: 13.0 → 0.5 over 500 steps. Perplexity: 550,000 → converged.
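Those loss and perplexity numbers are consistent with each other: perplexity is the exponential of the cross-entropy loss. A quick sanity check (the quoted figures are rounded, so the match is only to first order):

```python
import math

# Perplexity = exp(cross-entropy loss)
start = math.exp(13.0)   # ≈ 4.4e5, the ~10^5-scale starting perplexity
end = math.exp(0.5)      # ≈ 1.65 after convergence
print(f"{start:,.0f} → {end:.2f}")
```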

Train Larger Models On Your Hardware

AXIOM reduces memory per parameter. Same hardware, larger models.

Models You Can Train

GPU               Standard   AXIOM
RTX 4090 (24GB)   3B         ~10B
A100 (80GB)       7B         ~35B
H100 (80GB)       7B         ~35B
T4 (16GB, free)   1B         ~6B
Estimated model sizes based on memory per parameter

Memory Efficiency

Component   Standard   AXIOM
Weights     2 bytes    2 bytes
Optimizer   8 bytes    0 bytes
Gradients   2 bytes    0.06 bytes
Total       12 bytes   ~2 bytes
~2 bytes/param vs 12 bytes/param for AdamW
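The model-size estimates above follow from that bytes-per-parameter math. A rough sketch; the 20% reserve for activations and workspace is an assumed figure, not from this page:

```python
def max_params_b(vram_gb, bytes_per_param, reserve=0.2):
    """Rough max trainable model size (billions of params) on one GPU,
    holding back `reserve` of VRAM for activations and workspace."""
    return vram_gb * (1 - reserve) / bytes_per_param

# RTX 4090, 24 GB, AXIOM at ~2 bytes/param
print(max_params_b(24, 2))   # ≈9.6, in line with the table's ~10B
# A100/H100, 80 GB
print(max_params_b(80, 2))   # ≈32, in line with the table's ~35B
```

The same function with `bytes_per_param=12` shows why standard AdamW training caps out far lower on the same cards.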

Train larger models without buying more GPUs.

Same hardware, bigger models. Full-parameter training for everyone.

Three lines. That's it.

No model changes. No custom layers. Just wrap and train.

$ pip install quarterbit

# Works with ANY HuggingFace or PyTorch model
from quarterbit import AXIOM_Trainer

trainer = AXIOM_Trainer(model, train_loader, val_loader)
results = trainer.fit(steps=1000)
Open Source: Llama, Mistral, Qwen
Research: GPT-J, Phi, Gemma
MoE: Mixtral, DeepSeek
Any LLM: HuggingFace models

What This Means For You

Whether you're a researcher, startup, or enterprise — AXIOM changes what's possible.

Researchers

6B on a free Kaggle T4

Full-parameter training on free cloud GPUs. No cost, no cluster required.

  • Free on Kaggle
  • 100% parameters trained
  • Not LoRA — full training
  • Works with HuggingFace

Startups

35B on a single A100

Train larger LLMs on existing hardware. Fewer GPUs, same results.

  • Eliminates optimizer memory
  • 33× gradient compression
  • Extend runway
  • Drop-in replacement

Enterprise

5-6× fewer GPUs

Train 70B on 2 GPUs instead of 11. Cut hardware and energy costs.

  • ~80% fewer GPUs
  • Lower power draw
  • Immediate capacity
  • LLM training focus

Reduce Hardware Costs

Fewer GPUs needed means lower energy and hardware costs.

GPU Requirements

AdamW: 12 bytes per parameter
AXIOM: ~2 bytes per parameter
5-6× less memory = fewer GPUs needed

Energy Efficiency

Multi-GPU: high power draw
Single GPU: lower power draw
Fewer GPUs = less energy consumption

Works With LLMs

Any HuggingFace language model. Full-parameter training, not adapters.
Vision, diffusion, and audio support planned for future releases.

LLaMA 3
Llama 2
Mistral
Mixtral
Gemma
Phi-3
Qwen
DeepSeek
GPT-2
GPT-J
Falcon
Yi
Command-R
OLMo
MPT
Pythia

Current focus: LLM training. AXIOM eliminates optimizer memory and compresses gradients 33×.
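This page does not describe how AXIOM achieves its 33× gradient compression. Purely as an illustration of the general idea, here is top-k sparsification, a common gradient-compression technique that keeps only the largest-magnitude ~3% of entries for roughly a 33× reduction (function names are made up for this sketch and are not AXIOM's API):

```python
def topk_compress(grad, ratio=1 / 33):
    """Keep the `ratio` largest-magnitude entries as (index, value) pairs."""
    k = max(1, round(len(grad) * ratio))
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return sorted((i, grad[i]) for i in top)

def decompress(pairs, n):
    """Expand the sparse (index, value) list back to a dense gradient."""
    dense = [0.0] * n
    for i, v in pairs:
        dense[i] = v
    return dense

grad = [0.01 * i * (-1) ** i for i in range(330)]
sparse = topk_compress(grad)
print(len(grad), "->", len(sparse))  # 330 -> 10 entries kept (33×)
```

Real systems pair this with error feedback so the dropped 97% of the gradient is accumulated rather than lost; whatever AXIOM does internally, the memory arithmetic on this page only depends on the ~33× ratio.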

Deploy Anywhere

pip install quarterbit. That's it. Works on any cloud, any GPU, any framework.

Any Cloud

  • RunPod
  • Lambda Labs
  • AWS / GCP / Azure
  • Kaggle / Colab (free!)

SSH in, pip install, train.

Any GPU

  • Blackwell / B200
  • H100 / A100
  • RTX 4090 / 4080 / 4070
  • T4 (free tier)

Consumer to datacenter.

Any Framework

  • PyTorch native
  • HuggingFace Transformers
  • Lightning / Fabric
  • DeepSpeed / FSDP

Zero code changes.

Free during beta

Full features. No signup. Just pip install quarterbit and start training.

Love it? Star us on GitHub or support on Ko-fi.

Beta (Most Popular)

Free
  • Unlimited usage
  • 15-17x memory compression
  • All model architectures
  • Personal & commercial use
  • No signup required
View on GitHub

Pro (Coming Soon)

$49/mo
  • Everything in Beta
  • Priority support
  • Custom model optimization
  • Direct engineer access
Coming Soon

Enterprise

Custom
  • Everything in Pro
  • Unlimited team members
  • Custom SLA guarantee
  • Direct owner access
Contact Sales