AXIOM Documentation
15-17x memory compression for any PyTorch model using nn.Linear layers. Train large models on single GPUs. Works with all HuggingFace Transformers (NLP, Vision, Audio, Multimodal).
Installation
Install AXIOM from PyPI. Requires Python 3.11+ and PyTorch 2.0+.
```shell
pip install quarterbit
```

QuarterBit works on all NVIDIA GPU architectures (sm_60 to sm_90). Windows and Linux are supported.
Quick Start
One function call enables memory-efficient training. Works with all HuggingFace models (NLP, Vision, Audio, Multimodal).
```python
from quarterbit import axiom, TrainingStats
from transformers import AutoModelForCausalLM
import torch

# Load any HuggingFace model
model = AutoModelForCausalLM.from_pretrained("model-name", torch_dtype=torch.float16)

# Enable memory-efficient training (15-17x compression)
model = axiom(model)
model = model.cuda()

# Training loop - no optimizer needed!
# AXIOM has a built-in optimizer that updates weights during backward()
stats = TrainingStats(log_interval=100)
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    loss.backward()  # Weights update automatically
    stats.log(step, loss.item())
stats.summary()

# Save with standard PyTorch
torch.save(model.state_dict(), "checkpoint.pt")
```

Memory Compression
AXIOM achieves 15-17x memory compression, enabling large model training on single GPUs.
Weights
Compressed weight representation reduces memory footprint significantly.
Optimizer
Built-in optimizer eliminates separate optimizer state storage.
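The documentation does not specify AXIOM's internal weight format. As a generic illustration of how low-bit weight compression works in principle, here is a simple 4-bit absmax quantizer sketch — illustrative only, not AXIOM's actual scheme:

```python
def quantize_4bit(weights):
    """Absmax-quantize floats to 4-bit signed ints plus one float scale.

    Illustrative only -- not AXIOM's actual representation.
    """
    scale = max(abs(w) for w in weights) / 7 or 1.0  # map largest |w| to +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the quantized form."""
    return [x * scale for x in q]
```

Two 4-bit values pack into one byte, so FP16 weights shrink roughly 4x before accounting for per-group scales; the bulk of the 15-17x figure quoted above comes from also eliminating separate optimizer state.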
Tip: Activation Checkpointing
For very long sequences or large batches, use PyTorch's built-in gradient checkpointing:
```python
model.gradient_checkpointing_enable()
```

Memory Comparison
| Model | Standard (FP16+AdamW) | AXIOM | Compression |
|---|---|---|---|
| 7B | 84 GB | 5.5 GB | 15x |
| 13B | 156 GB | 9 GB | 17x |
| 70B | 840 GB | 53 GB | 16x |
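As a rough cross-check of the table, assume standard FP16+AdamW training keeps about 12 bytes per parameter (2-byte weights, 2-byte gradients, 8 bytes of FP32 AdamW moments) — an assumption consistent with the "Standard" column:

```python
BYTES_PER_PARAM = 12  # FP16 weights (2) + FP16 grads (2) + AdamW FP32 moments (8)

def standard_memory_gb(params_billions):
    """Approximate FP16+AdamW training memory in decimal GB."""
    return params_billions * BYTES_PER_PARAM

for size, axiom_gb in [(7, 5.5), (13, 9), (70, 53)]:
    std = standard_memory_gb(size)
    print(f"{size}B: {std} GB standard vs {axiom_gb} GB AXIOM ({std / axiom_gb:.0f}x)")
```

This reproduces the 84/156/840 GB figures and the 15x/17x/16x ratios in the table.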
Supported Models
Works with any PyTorch model. All HuggingFace Transformers supported across all domains.
NLP
LLaMA, Mistral, Mixtral, Qwen, Yi, Phi, Gemma, GPT, BERT, T5
Vision
ViT, CLIP, DINOv2, Swin Transformer
Audio
Whisper, Wav2Vec2, HuBERT
Multimodal
LLaVA, BLIP, Flamingo, PaLI
Custom Models
Any PyTorch model using nn.Linear layers works automatically.
```python
from quarterbit import axiom
import torch.nn as nn

# Your custom model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(768, 3072),
            nn.GELU(),
            nn.Linear(3072, 768),
        )

    def forward(self, x):
        return self.layers(x)

model = MyModel()
model = axiom(model)  # Works!
```

Training Stats
Track loss, perplexity, throughput, and memory during training.
```python
import torch
from quarterbit import TrainingStats

def evaluate(model, loader):
    # Minimal validation helper (user-defined, not part of quarterbit)
    model.eval()
    with torch.no_grad():
        losses = [model(**batch).loss.item() for batch in loader]
    model.train()
    return sum(losses) / len(losses)

stats = TrainingStats(log_interval=100)
for step, batch in enumerate(train_loader):
    loss = model(**batch).loss
    loss.backward()  # AXIOM updates weights automatically
    stats.log(step, loss.item(), tokens=batch_size * seq_len)

    # Validation (optional)
    if step % 500 == 0:
        val_loss = evaluate(model, val_loader)
        stats.log_val(step, val_loss)
stats.summary()
```

Output
```
Step 100 | Loss: 3.2451 | PPL: 25.67 | 1250 tok/s | Peak: 5.2GB
Step 200 | Loss: 2.8934 | PPL: 18.05 | 1312 tok/s | Peak: 5.2GB
Step 500 | Loss: 2.5123 | PPL: 12.33 | 1285 tok/s | Peak: 5.2GB
>>> Val Loss: 2.6841 | Val PPL: 14.64 (+23.1%)
==================================================
Training Complete
Train Loss: 3.8521 -> 2.1234
Best Loss: 2.0891
Val PPL: 18.92 -> 12.45 (+34.2%)
Steps: 2000
Time: 45.2 min
==================================================
```

Save & Resume
AXIOM models use standard PyTorch checkpointing. No special handling required.
Save Checkpoint
```python
# Save model state
torch.save(model.state_dict(), "checkpoint.pt")

# Or save full training state
torch.save({
    'model': model.state_dict(),
    'step': step,
    'best_loss': best_loss,
}, 'checkpoint.pt')
```

Resume Training
```python
from quarterbit import axiom
from transformers import AutoModelForCausalLM
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained("model-name", torch_dtype=torch.float16)

# Enable AXIOM
model = axiom(model)
model = model.cuda()

# Load checkpoint
model.load_state_dict(torch.load("checkpoint.pt"))

# Continue training - no optimizer needed
for batch in dataloader:
    loss = model(**batch).loss
    loss.backward()  # Weights update automatically
```

CLI Commands
```shell
# Login via browser (recommended)
quarterbit login

# Check license status
quarterbit status

# Activate with key
quarterbit activate <LICENSE_KEY>
```

Supported GPUs
AXIOM works on all major NVIDIA GPU architectures:
Consumer
- sm_61: GTX 1050/1060/1070/1080
- sm_75: RTX 2080
- sm_86: RTX 3090/3080/3070
- sm_89: RTX 4090/4080/4070
Data Center
- sm_60: P100
- sm_70: V100
- sm_75: T4 (Kaggle/Colab)
- sm_80: A100
- sm_90: H100, H200
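A quick pre-flight check that a device falls in the supported sm_60 to sm_90 range. The two-digit sm number is `major * 10 + minor` of the CUDA compute capability; this sketch takes those two numbers as plain inputs:

```python
def sm_supported(major, minor):
    """True if compute capability major.minor is in AXIOM's sm_60-sm_90 range."""
    sm = major * 10 + minor
    return 60 <= sm <= 90

# Example: an RTX 3090 reports capability (8, 6) -> sm_86
assert sm_supported(8, 6)      # RTX 3090
assert not sm_supported(5, 2)  # GTX 980 (Maxwell) is too old
```

With PyTorch, `major, minor = torch.cuda.get_device_capability()` supplies the two numbers for the current device.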
Ready to Get Started?
15-17x memory compression. Train a 70B model on a single H100, or a 13B model on a free Kaggle T4.