Memory-efficient optimizer for full-parameter LLM training.
Works with any PyTorch or HuggingFace model.
| Method | Bytes/Param | Full Training |
|---|---|---|
| FP16 + AdamW | 12 | Yes |
| 8-bit Adam | 6 | Yes |
| LoRA/QLoRA | ~2 | No (subset) |
| AXIOM | ~2 | Yes |
`pip install quarterbit`

Datacenter power demand is doubling. GPU waitlists stretch for months. Training costs are exploding. The industry needs a solution.
AXIOM cuts training memory from ~12 bytes per parameter to ~2.
Train larger models on existing hardware, with full-parameter training, not LoRA.
Training a 70B model? See why standard training needs 11 GPUs, and how AXIOM fits it on 2.
Standard: 11× H100 80GB GPUs
AXIOM: 2× H100 instead of 11× (5.8× fewer GPUs)
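The back-of-envelope arithmetic behind those GPU counts can be sketched as follows. This assumes the ~12 vs ~2 bytes-per-parameter figures from the tables in this document and ignores activation memory, which varies with batch size and sequence length:

```python
import math

def gpus_needed(params, bytes_per_param, gpu_gb=80):
    """Minimum 80GB GPUs to hold params * bytes_per_param of state."""
    total_gb = params * bytes_per_param / 1e9
    return math.ceil(total_gb / gpu_gb)

params_70b = 70e9

# Standard: fp16 weights (2) + gradients (2) + Adam states (8) = 12 bytes/param
standard = gpus_needed(params_70b, 12)   # 840 GB -> 11 GPUs

# AXIOM: ~2 bytes/param per the component table below
axiom = gpus_needed(params_70b, 2)       # 140 GB -> 2 GPUs

print(standard, axiom)  # 11 2
```

Real deployments also need headroom for activations and communication buffers, so treat these counts as lower bounds.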
Yes. Full parameter training. Every weight updated. Verified convergence.
Both Attention and LayerNorm weights update during training. This is real learning, not a frozen model.
Loss: 13.0 → 0.5 over 500 steps. Perplexity: 550,000 → converged.
AXIOM reduces memory per parameter. Same hardware, larger models.
| GPU | Standard (max model) | AXIOM (max model) |
|---|---|---|
| RTX 4090 (24GB) | 3B | ~10B |
| A100 (80GB) | 7B | ~35B |
| H100 (80GB) | 7B | ~35B |
| T4 (16GB, free tier) | 1B | ~6B |
| Component | Standard (bytes/param) | AXIOM (bytes/param) |
|---|---|---|
| Weights | 2 | 2 |
| Optimizer state | 8 | 0 |
| Gradients | 2 | 0.06 |
| Total | 12 | ~2 |
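The table's "0 bytes of optimizer state" row deserves a word of explanation. The document does not describe AXIOM's actual update rule, but stateless optimizers are a known way to get there: unlike Adam, which keeps two fp32 moment buffers (8 bytes/param), a signSGD-style update keeps no per-parameter state between steps. A minimal sketch of that idea, purely for illustration:

```python
# Hypothetical illustration -- NOT AXIOM's documented algorithm.
# A signSGD-style update needs zero bytes of optimizer state,
# versus Adam's two fp32 moment buffers (8 bytes/param).
def stateless_step(weights, grads, lr=0.01):
    # No momentum or variance buffers survive between calls.
    return [w - lr * (1 if g > 0 else -1 if g < 0 else 0)
            for w, g in zip(weights, grads)]

print(stateless_step([1.0, -1.0, 0.25], [0.3, -0.1, 0.0], lr=0.5))
# [0.5, -0.5, 0.25]
```

The trade-off is that stateless updates discard gradient-magnitude history, which is why convergence claims (like the loss curve above) need to be verified empirically.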
Train larger models without buying more GPUs.
Same hardware, bigger models. Full-parameter training for everyone.
No model changes. No custom layers. Just wrap and train.
Whether you're a researcher, startup, or enterprise — AXIOM changes what's possible.
Full-parameter training on free cloud GPUs. No cost, no cluster required.
Train larger LLMs on existing hardware. Fewer GPUs, same results.
Train a 70B model on 2 GPUs instead of 11, cutting both hardware and energy costs.
Any HuggingFace language model. Full-parameter training, not adapters.
Vision, diffusion, and audio support planned for future releases.
Current focus: LLM training. AXIOM eliminates optimizer memory and compresses gradients 33×.
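The 33× gradient-compression figure is consistent with aggressive sparsification. The document does not say which scheme AXIOM uses, but as one well-known illustration, top-k sparsification that keeps the 1% largest-magnitude entries, each stored as a (4-byte index, 2-byte fp16 value) pair, lands at exactly 0.06 bytes/param versus 2 bytes for dense fp16:

```python
# Hypothetical illustration -- the text does not specify AXIOM's scheme.
# Top-k sparsification: keep only the keep_frac largest-magnitude
# gradient entries, each stored as a (4-byte index, 2-byte value) pair.
def topk_compress(grad, keep_frac=0.01):
    k = max(1, int(len(grad) * keep_frac))
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return [(i, grad[i]) for i in sorted(top)]

def bytes_per_param(n_params, keep_frac=0.01, entry_bytes=6):
    k = max(1, int(n_params * keep_frac))
    return k * entry_bytes / n_params

# Dense fp16 gradients cost 2 bytes/param; 2 / 0.06 ~= 33x compression.
print(bytes_per_param(1_000_000))  # 0.06
print(topk_compress([0.1, -5.0, 0.2, 3.0], keep_frac=0.5))
# [(1, -5.0), (3, 3.0)]
```

Schemes like this usually pair with error feedback (accumulating the dropped residuals locally) to preserve convergence; whether AXIOM does so is not stated here.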
`pip install quarterbit`. That's it. Works on any cloud and any GPU.
SSH in, pip install, train.
Consumer to datacenter.
Zero code changes.
Full features. No signup. Just `pip install quarterbit` and start training.
Love it? Star us on GitHub or support on Ko-fi.