Free during beta — Try on Kaggle

Train Larger Models On Less Hardware

Memory-efficient optimizer for full-parameter LLM training.
Works with any PyTorch or HuggingFace model.

Training memory per parameter (weights + optimizer + gradients)
Method         Bytes/Param   Full Training
FP16 + AdamW   12            Yes
8-bit Adam     6             Yes
LoRA/QLoRA     ~2            No (subset)
AXIOM          ~2            Yes
0 GB optimizer state · 33× gradient compression · 100% of parameters trained
$ pip install quarterbit

The AI Energy Crisis Is Here

Datacenter power demand is doubling. GPU waitlists stretch for months. Training costs are exploding. The industry needs a solution.

+88% power connection requests in 8 months (Source: S&P Global)
Datacenter power demand projected to rise by 2030 (Source: IEA, Goldman Sachs)
11+ GPUs needed to train a 70B model with standard FP16 + AdamW

AXIOM reduces training memory.

Train larger models on existing hardware. Full-parameter training, not LoRA.

Where Your Memory Actually Goes

Training a 70B model: see why the standard setup needs 11 GPUs, and how AXIOM cuts that to two.

Standard (11 GPUs)

Model Weights (FP16)              140 GB
Gradients                         140 GB
Optimizer (momentum + variance)   560 GB
Total Required                    840 GB

Needs 11× H100 80GB GPUs

AXIOM (2 GPUs)

Model Weights (FP16)   140 GB
Compressed Gradients     4 GB
Optimizer State          0 GB
Total Required         144 GB

2× H100 instead of 11× (5.5× fewer GPUs)

12 bytes/param → ~2 bytes/param = 5-6× fewer GPUs
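The arithmetic behind that comparison is easy to check. A minimal sketch, treating the per-component figures quoted above as exact and ignoring activation memory:

```python
import math

PARAMS_B = 70   # model size in billions of parameters
H100_GB = 80    # VRAM per H100

def total_gb(weights_bpp, grads_bpp, opt_bpp):
    """Training memory in GB: billions of params × bytes per param."""
    return PARAMS_B * (weights_bpp + grads_bpp + opt_bpp)

# Standard: FP16 weights (2 B) + FP16 gradients (2 B)
# + AdamW momentum and variance in FP32 (2 × 4 B = 8 B)
standard = total_gb(2, 2, 8)     # 840 GB
# AXIOM: FP16 weights, ~33×-compressed gradients, no optimizer state
axiom = total_gb(2, 2 / 33, 0)   # ~144 GB

print(math.ceil(standard / H100_GB))  # 11 GPUs
print(math.ceil(axiom / H100_GB))     # 2 GPUs
```

The 2/33 term is the quoted 33× gradient compression applied to FP16 gradients; it reproduces the ~4 GB and 144 GB figures in the breakdown.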

Does It Actually Learn?

Yes. Full parameter training. Every weight updated. Verified convergence.

100% perplexity improved (Llama-2 70B)
5/5 generations changed (the model actually learned)
100% of parameters trained (not LoRA, full training)

Both Attention and LayerNorm weights update during training. This is real learning, not a frozen model.

Loss: 13.0 → 0.5 over 500 steps. Perplexity: 550,000 → converged.
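Those loss and perplexity numbers are consistent with each other: perplexity is the exponential of the cross-entropy loss. A quick sanity check (the quoted figures are rounded, so the match is only to first order):

```python
import math

# Perplexity = exp(cross-entropy loss)
start = math.exp(13.0)   # ≈ 4.4e5, the ~10^5-scale starting perplexity
end = math.exp(0.5)      # ≈ 1.65 after convergence
print(f"{start:,.0f} → {end:.2f}")
```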

Train Larger Models On Your Hardware

AXIOM reduces memory per parameter. Same hardware, larger models.

Models You Can Train

GPU               Standard   AXIOM
RTX 4090 (24GB)   3B         ~10B
A100 (80GB)       7B         ~35B
H100 (80GB)       7B         ~35B
T4 (16GB, free)   1B         ~6B
Estimated model sizes based on memory per parameter

Memory Efficiency

Component   Standard   AXIOM
Weights     2 bytes    2 bytes
Optimizer   8 bytes    0 bytes
Gradients   2 bytes    0.06 bytes
Total       12 bytes   ~2 bytes
~2 bytes/param vs 12 bytes/param for AdamW
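The model-size estimates above follow from that bytes-per-parameter math. A rough sketch; the 20% reserve for activations and workspace is an assumed figure, not from this page:

```python
def max_params_b(vram_gb, bytes_per_param, reserve=0.2):
    """Rough max trainable model size (billions of params) on one GPU,
    holding back `reserve` of VRAM for activations and workspace."""
    return vram_gb * (1 - reserve) / bytes_per_param

# RTX 4090, 24 GB, AXIOM at ~2 bytes/param
print(max_params_b(24, 2))   # ≈9.6, in line with the table's ~10B
# A100/H100, 80 GB
print(max_params_b(80, 2))   # ≈32, in line with the table's ~35B
```

The same function with `bytes_per_param=12` shows why standard AdamW training caps out far lower on the same cards.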

Train larger models without buying more GPUs.

Same hardware, bigger models. Full-parameter training for everyone.

Three lines. That's it.

No model changes. No custom layers. Just wrap and train.

$ pip install quarterbit

# Works with ANY HuggingFace or PyTorch model
from quarterbit import AXIOM_Trainer

trainer = AXIOM_Trainer(model, train_loader, val_loader)
results = trainer.fit(steps=1000)
Open Source: Llama, Mistral, Qwen
Research: GPT-J, Phi, Gemma
MoE: Mixtral, DeepSeek
Any LLM: HuggingFace models

What This Means For You

Whether you're a researcher, startup, or enterprise — AXIOM changes what's possible.

Researchers

6B on a free Kaggle T4

Full-parameter training on free cloud GPUs. No cost, no cluster required.

  • Free on Kaggle
  • 100% parameters trained
  • Not LoRA — full training
  • Works with HuggingFace

Startups

35B on a single A100

Train larger LLMs on existing hardware. Fewer GPUs, same results.

  • Eliminates optimizer memory
  • 33× gradient compression
  • Extend runway
  • Drop-in replacement

Enterprise

5-6× fewer GPUs

Train 70B on 2 GPUs instead of 11. Cut hardware and energy costs.

  • ~80% fewer GPUs
  • Lower power draw
  • Immediate capacity
  • LLM training focus

Reduce Hardware Costs

Fewer GPUs needed means lower energy and hardware costs.

GPU Requirements

AdamW: 12 bytes per parameter
AXIOM: ~2 bytes per parameter
5-6× less memory = fewer GPUs needed

Energy Efficiency

Multi-GPU: high power draw
Single GPU: lower power draw
Fewer GPUs = less energy consumption

Works With LLMs

Any HuggingFace language model. Full-parameter training, not adapters.
Vision, diffusion, and audio support planned for future releases.

LLaMA 3
Llama 2
Mistral
Mixtral
Gemma
Phi-3
Qwen
DeepSeek
GPT-2
GPT-J
Falcon
Yi
Command-R
OLMo
MPT
Pythia

Current focus: LLM training. AXIOM eliminates optimizer memory and compresses gradients 33×.
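This page does not describe how AXIOM achieves its 33× gradient compression. Purely as an illustration of the general idea, here is top-k sparsification, a common gradient-compression technique that keeps only the largest-magnitude ~3% of entries for roughly a 33× reduction (function names are made up for this sketch and are not AXIOM's API):

```python
def topk_compress(grad, ratio=1 / 33):
    """Keep the `ratio` largest-magnitude entries as (index, value) pairs."""
    k = max(1, round(len(grad) * ratio))
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return sorted((i, grad[i]) for i in top)

def decompress(pairs, n):
    """Expand the sparse (index, value) list back to a dense gradient."""
    dense = [0.0] * n
    for i, v in pairs:
        dense[i] = v
    return dense

grad = [0.01 * i * (-1) ** i for i in range(330)]
sparse = topk_compress(grad)
print(len(grad), "->", len(sparse))  # 330 -> 10 entries kept (33×)
```

Real systems pair this with error feedback so the dropped 97% of the gradient is accumulated rather than lost; whatever AXIOM does internally, the memory arithmetic on this page only depends on the ~33× ratio.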

Deploy Anywhere

pip install quarterbit. That's it. Works on any cloud, any GPU, any framework.

Any Cloud

  • RunPod
  • Lambda Labs
  • AWS / GCP / Azure
  • Kaggle / Colab (free!)

SSH in, pip install, train.

Any GPU

  • Blackwell / B200
  • H100 / A100
  • RTX 4090 / 4080 / 4070
  • T4 (free tier)

Consumer to datacenter.

Any Framework

  • PyTorch native
  • HuggingFace Transformers
  • Lightning / Fabric
  • DeepSpeed / FSDP

Zero code changes.

Free during beta

Full features. No signup. Just pip install quarterbit and start training.

Love it? Star us on GitHub or support on Ko-fi.

Beta (Most Popular)

Free
  • Unlimited usage
  • 15-17x memory compression
  • All model architectures
  • Personal & commercial use
  • No signup required
View on GitHub

Pro (Coming Soon)

$49/mo
  • Everything in Beta
  • Priority support
  • Custom model optimization
  • Direct engineer access
Coming Soon

Enterprise

Custom
  • Everything in Pro
  • Unlimited team members
  • Custom SLA guarantee
  • Direct owner access
Contact Sales