AXIOM Documentation

AXIOM - Memory Compression Trainer

15-17x memory compression for any PyTorch model using nn.Linear layers. Train large models on single GPUs. Works with all HuggingFace Transformers (NLP, Vision, Audio, Multimodal).

Installation

Install AXIOM from PyPI. Requires Python 3.11+ and PyTorch 2.0+.

pip install quarterbit

The quarterbit package runs on all NVIDIA GPU architectures from sm_60 through sm_90 (see Supported GPUs below). Windows and Linux are supported.
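
To sanity-check the install, confirm that the package imports and that PyTorch sees a CUDA device:

import torch
import quarterbit  # should import without error after pip install

print(torch.__version__)          # expect 2.0 or newer
print(torch.cuda.is_available())  # True on a supported NVIDIA GPU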

Quick Start

One function call enables memory-efficient training. Works with all HuggingFace models (NLP, Vision, Audio, Multimodal).

from quarterbit import axiom, TrainingStats
from transformers import AutoModelForCausalLM
import torch

# Load any HuggingFace model
model = AutoModelForCausalLM.from_pretrained("model-name", torch_dtype=torch.float16)

# Enable memory-efficient training (15-17x compression)
model = axiom(model)
model = model.cuda()

# Training loop - no optimizer needed!
# AXIOM has a built-in optimizer that updates during backward()
stats = TrainingStats(log_interval=100)
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    loss.backward()  # Weights update automatically
    stats.log(step, loss.item())

stats.summary()

# Save with standard PyTorch
torch.save(model.state_dict(), "checkpoint.pt")
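
The training loop above assumes a dataloader that yields dicts of model inputs. A minimal sketch for causal LM data, where raw_texts is your own list of training strings (hypothetical, not part of quarterbit):

from torch.utils.data import DataLoader
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model-name")

def collate(texts):
    # Tokenize raw strings into padded input_ids / attention_mask
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    batch["labels"] = batch["input_ids"].clone()  # causal LM: predict the inputs
    return {k: v.cuda() for k, v in batch.items()}

dataloader = DataLoader(raw_texts, batch_size=4, collate_fn=collate)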

Memory Compression

AXIOM achieves 15-17x memory compression, enabling large model training on single GPUs.

Weights

Weights are stored in a compressed representation rather than as full FP16 tensors, cutting the weight footprint well below the usual 2 bytes per parameter.

Optimizer

Weight updates happen inside backward(), so there is no separate optimizer and no AdamW state (two FP32 moments, 8 bytes per parameter) to store.
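
These two savings explain the comparison table below. Standard FP16 + AdamW training needs roughly 12 bytes per parameter: 2 for FP16 weights, 2 for FP16 gradients, and 8 for AdamW's two FP32 moments. A quick check of the baseline column:

# Standard footprint: 2 (FP16 weights) + 2 (FP16 grads) + 8 (FP32 AdamW m, v) = 12 bytes/param
for params_b in (7, 13, 70):
    print(f"{params_b}B params -> {params_b * 12} GB baseline")  # 84, 156, 840 GB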

Tip: Activation Checkpointing

For very long sequences or large batches, enable gradient checkpointing. On HuggingFace models, use the built-in Transformers helper:

model.gradient_checkpointing_enable()
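
For custom nn.Module models, PyTorch's torch.utils.checkpoint offers the same trade (recompute activations during backward instead of storing them). A minimal sketch, assuming the model exposes an iterable of blocks:

from torch.utils.checkpoint import checkpoint

def forward_checkpointed(model, x):
    # Each block's activations are recomputed in backward(), not stored
    for block in model.layers:
        x = checkpoint(block, x, use_reentrant=False)
    return x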

Memory Comparison

Model   Standard (FP16+AdamW)   AXIOM    Compression
7B      84 GB                   5.5 GB   15x
13B     156 GB                  9 GB     17x
70B     840 GB                  53 GB    16x

Supported Models

Works with any PyTorch model. All HuggingFace Transformers are supported across all domains; a vision example follows the lists below.

NLP

LLaMA, Mistral, Mixtral, Qwen, Yi, Phi, Gemma, GPT, BERT, T5

Vision

ViT, CLIP, DINOv2, Swin Transformer

Audio

Whisper, Wav2Vec2, HuBERT

Multimodal

LLaVA, BLIP, Flamingo, PaLI
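
The call pattern is identical outside NLP. For example, wrapping a vision model (a sketch; substitute any image-classification checkpoint for the placeholder name):

from quarterbit import axiom
from transformers import AutoModelForImageClassification
import torch

model = AutoModelForImageClassification.from_pretrained(
    "vit-checkpoint-name", torch_dtype=torch.float16)
model = axiom(model).cuda()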

Custom Models

Any PyTorch model using nn.Linear layers works automatically.

from quarterbit import axiom
import torch.nn as nn

# Your custom model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(768, 3072),
            nn.GELU(),
            nn.Linear(3072, 768),
        )

    def forward(self, x):
        return self.layers(x)

model = MyModel()
model = axiom(model)  # Works!
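
A quick smoke test, assuming a CUDA device, to confirm the wrapped model trains (sum of outputs as a stand-in loss):

import torch

model = model.cuda()
x = torch.randn(8, 768, device="cuda")
loss = model(x).sum()  # any scalar works as a stand-in loss
loss.backward()        # AXIOM's built-in optimizer updates weights here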

Training Stats

Track loss, perplexity, throughput, and memory during training.

from quarterbit import TrainingStats

stats = TrainingStats(log_interval=100)

for step, batch in enumerate(train_loader):
    loss = model(**batch).loss
    loss.backward()  # AXIOM updates weights automatically
    stats.log(step, loss.item(), tokens=batch_size * seq_len)  # batch_size, seq_len: your batch dimensions

    # Validation (optional) - evaluate() is a user-supplied helper (sketch below)
    if step % 500 == 0:
        val_loss = evaluate(model, val_loader)
        stats.log_val(step, val_loss)

stats.summary()
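
evaluate above is a user-supplied helper, not part of quarterbit. One possible sketch that returns average validation loss:

import torch

def evaluate(model, val_loader):
    model.eval()
    total, batches = 0.0, 0
    with torch.no_grad():  # no gradients (and no AXIOM weight updates) during eval
        for batch in val_loader:
            total += model(**batch).loss.item()
            batches += 1
    model.train()
    return total / batches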

Output

Step   100 | Loss: 3.2451 | PPL: 25.67 | 1250 tok/s | Peak: 5.2GB
Step   200 | Loss: 2.8934 | PPL: 18.05 | 1312 tok/s | Peak: 5.2GB
Step   500 | Loss: 2.5123 | PPL: 12.33 | 1285 tok/s | Peak: 5.2GB
         >>> Val Loss: 2.6841 | Val PPL: 14.64 (+23.1%)

==================================================
Training Complete
  Train Loss: 3.8521 -> 2.1234
  Best Loss:  2.0891
  Val PPL:    18.92 -> 12.45 (+34.2%)
  Steps:      2000
  Time:       45.2 min
==================================================

Save & Resume

AXIOM models use standard PyTorch checkpointing. No special handling required.

Save Checkpoint

# Save model state
torch.save(model.state_dict(), "checkpoint.pt")

# Or save full training state
torch.save({
    'model': model.state_dict(),
    'step': step,
    'best_loss': best_loss,
}, 'checkpoint.pt')

Resume Training

from quarterbit import axiom
from transformers import AutoModelForCausalLM
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained("model-name", torch_dtype=torch.float16)

# Enable AXIOM
model = axiom(model)
model = model.cuda()

# Load checkpoint (plain state_dict form; full-state form shown below)
model.load_state_dict(torch.load("checkpoint.pt"))

# Continue training - no optimizer needed
for batch in dataloader:
    loss = model(**batch).loss
    loss.backward()  # Weights update automatically
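
If you saved the full training state (the dict form above), restore each piece individually. A sketch:

ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt['model'])
step = ckpt['step']
best_loss = ckpt['best_loss']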

CLI Commands

# Login via browser (recommended)
quarterbit login

# Check license status
quarterbit status

# Activate with key
quarterbit activate <LICENSE_KEY>

Supported GPUs

AXIOM works on all major NVIDIA GPU architectures:

Consumer

  • sm_61: GTX 1050/1060/1070/1080
  • sm_75: RTX 2080
  • sm_86: RTX 3090/3080/3070
  • sm_89: RTX 4090/4080/4070

Data Center

  • sm_60: P100
  • sm_70: V100
  • sm_75: T4 (Kaggle/Colab)
  • sm_80: A100
  • sm_90: H100, H200
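
To confirm which architecture your GPU reports, PyTorch exposes the compute capability directly:

import torch

major, minor = torch.cuda.get_device_capability()
print(f"sm_{major}{minor}")  # e.g. sm_80 on an A100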

Ready to Get Started?

15-17x memory compression. Train a 70B model on a single H100, or a 13B model on a free Kaggle T4.