# Cloud Training Package (Pure Nix Approach)

Since setting up CUDA locally on NixOS is complex, this approach prepares a **self-contained training package** that can be uploaded to any cloud GPU provider.

## What's Included

The package (`cloud-training-package/wan22-smoke-training.tar.gz`) contains:
- 160 training video clips (smoke test subset)
- Training metadata (DiffSynth format)
- Training script (`train.py`)
- Launch script (`launch.sh`)
- Python dependencies (`requirements.txt`)
- Usage instructions (`README.md`)

**Package size:** ~600 MB

## How It Works

```
┌─────────────────────────────────────┐
│  NixOS Machine (Your Bastion)       │
│  - Pure Nix data preparation        │
│  - No CUDA needed                   │
└──────────────┬──────────────────────┘
               │
               │ Upload tarball
               ▼
┌─────────────────────────────────────┐
│  Cloud GPU Provider                 │
│  - Ubuntu/Debian with CUDA drivers  │
│  - PyTorch + DiffSynth installed    │
│  - RTX 4090 / A100 GPU              │
└──────────────┬──────────────────────┘
               │
               │ Download trained LoRA
               ▼
┌─────────────────────────────────────┐
│  NixOS Machine                      │
│  - Use trained weights for inference│
└─────────────────────────────────────┘
```

## Recommended Providers

| Provider | GPU | Hourly Rate (USD) | Est. Cost (smoke test) |
|----------|-----|-------------------|------------------------|
| **RunPod** | RTX 4090 | $0.70 | $2-3 |
| **Vast.ai** | RTX 4090 | $0.40-0.60 | $1.50-2.50 |
| **Lambda Labs** | A100 40GB | $1.50 | $4-6 |

**Expected training time:** 2-3 hours for smoke test

## Step-by-Step Guide

### 1. Create Training Package

```bash
cd /home/workspaces/totally-spies-cultshot
nix-shell -p python3 --run "python3 tools/prepare-cloud-training-package.py"
```

Output: `cloud-training-package/wan22-smoke-training.tar.gz`
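
To sanity-check the package before uploading it, you can list the archive (optional; the exact file names inside depend on the packaging script):

```bash
# Confirm the tarball exists and peek at its contents
ls -lh cloud-training-package/wan22-smoke-training.tar.gz
tar tzf cloud-training-package/wan22-smoke-training.tar.gz | head -20
```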

### 2. Rent GPU Instance

**Example: RunPod**

1. Go to https://runpod.io/console/
2. Select an RTX 4090 (24GB) instance
3. Choose the PyTorch 2.5 + CUDA 12.4 template
4. Deploy

You'll get:
- SSH access
- Jupyter notebook
- ~50GB disk space
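
Before uploading anything, it's worth confirming the instance actually exposes the GPU you paid for (a quick check over SSH; `<gpu-ip>` is the address the provider shows you):

```bash
# Verify the GPU, driver, and CUDA versions on the fresh instance
ssh root@<gpu-ip> nvidia-smi
```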

### 3. Upload Package

```bash
# From your bastion machine
scp cloud-training-package/wan22-smoke-training.tar.gz root@<gpu-ip>:~/

# SSH into GPU instance
ssh root@<gpu-ip>

# Extract
tar xzf wan22-smoke-training.tar.gz
cd wan22-smoke-training
```
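
If the upload is interrupted or you want to rule out corruption, comparing checksums on both ends is a cheap safeguard (optional):

```bash
# On the bastion machine
sha256sum cloud-training-package/wan22-smoke-training.tar.gz

# On the GPU instance -- the two hashes should match
sha256sum ~/wan22-smoke-training.tar.gz
```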

### 4. Download Wan2.2 Model

```bash
# Install huggingface CLI
pip install huggingface-hub

# Login (accept terms first at https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
huggingface-cli login

# Download model (~15GB)
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir /models/Wan2.2-TI2V-5B
```
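
The download should come to roughly the advertised ~15GB; a quick way to confirm it completed and still fits in the instance's ~50GB disk (this only checks overall size, not individual files):

```bash
# Check the downloaded model size and remaining disk space
du -sh /models/Wan2.2-TI2V-5B
df -h /
```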

### 5. Run Training

```bash
bash launch.sh
```

Training will:
- Install Python dependencies (~5 min)
- Load Wan2.2 model (~2 min)
- Train for 1 epoch on 160 clips (~2-3 hours)
- Save LoRA checkpoints to `./output/`
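
Because a 2-3 hour job can outlive an SSH session, it is safer to launch it detached and follow the log instead; a minimal sketch (the `train.log` filename is arbitrary):

```bash
# Run training detached so an SSH disconnect doesn't kill it
nohup bash launch.sh > train.log 2>&1 &

# Follow progress
tail -f train.log

# In another terminal: watch GPU utilization and memory
watch -n 5 nvidia-smi
```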

### 6. Download Results

```bash
# From your local machine
scp root@<gpu-ip>:~/wan22-smoke-training/output/*.safetensors ./materials/trained-loras/

# Terminate GPU instance to stop billing
```
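
Before terminating the instance, confirm the checkpoints actually arrived intact, since the originals disappear with the instance (optional check):

```bash
# Verify the LoRA checkpoints landed locally and look plausible in size
ls -lh ./materials/trained-loras/*.safetensors

# Optional: compare checksums against the instance copies before terminating
ssh root@<gpu-ip> 'sha256sum ~/wan22-smoke-training/output/*.safetensors'
sha256sum ./materials/trained-loras/*.safetensors
```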

## Nix Integration

This approach is **pure Nix** because:
- ✅ Data preparation happens in Nix shell
- ✅ Package is reproducible (same inputs → same tarball)
- ✅ No CUDA/NVIDIA dependencies on NixOS
- ✅ Training environment is disposable cloud instance
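
For stronger reproducibility of the data-prep step itself, the `nix-shell` invocation from Step 1 can be pinned to an exact nixpkgs revision; a hedged sketch (the commit placeholder below is not a tested pin):

```bash
# Pin the data-prep environment to a specific nixpkgs revision (placeholder commit)
nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/<nixpkgs-commit>.tar.gz \
  -p python3 --run "python3 tools/prepare-cloud-training-package.py"
```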

## Cost Tracking

Create a simple cost tracker:

```bash
cat > track-costs.sh << 'SCRIPT'
#!/bin/bash
START_TIME=$(date +%s)
HOURLY_RATE=${1:-0.70}  # Default $0.70/hr for RTX 4090

echo "Training started at $(date)"
echo "Hourly rate: \$${HOURLY_RATE}"
echo ""

while true; do
    NOW=$(date +%s)
    ELAPSED=$((NOW - START_TIME))
    HOURS=$(echo "scale=2; $ELAPSED / 3600" | bc)
    COST=$(echo "scale=2; $HOURS * $HOURLY_RATE" | bc)
    echo -ne "\rElapsed: ${HOURS}h  Cost: \$${COST}   "
    sleep 60
done
SCRIPT
chmod +x track-costs.sh

# Run in background while training
./track-costs.sh 0.70 &
```

## Next Steps After Smoke Test

1. **Validate LoRA quality** - Generate test videos
2. **Full pilot training** - Upload full 601-clip dataset
3. **Production training** - Scale to full episode pool

## Troubleshooting

### CUDA Out of Memory
Reduce batch size in `train.py`:
```python
batch_size = 1  # Already minimal
gradient_accumulation_steps = 8  # Increase from 4
```

### Download Too Slow
Use `aria2c` for parallel downloads:
```bash
# aria2c is a system package; "pip install aria2p" only installs a Python API wrapper
apt-get update && apt-get install -y aria2

# Download with up to 16 parallel connections/segments
aria2c -x 16 -s 16 <huggingface-url>
```
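
Alternatively, `huggingface_hub` ships an optional Rust-based transfer backend that often speeds up Hub downloads without needing direct file URLs (enabled via an environment variable):

```bash
# Use the hf_transfer backend for faster downloads from the Hub
pip install hf_transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir /models/Wan2.2-TI2V-5B
```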

### Training Crashes
Check GPU memory:
```bash
watch -n 5 nvidia-smi
```

If memory usage stays above 95%, reduce the training resolution:
```bash
--height 384 --width 640
```
