# Local CUDA Training - Build in Progress

## Current Status

**Building PyTorch with CUDA support via Nix**

Started: $(date)
Expected completion: 2-4 hours

## What's Being Built

The following derivations are building in sequence:

1. **CUDA Toolkit components** (~30 min)
   - cuda_nvcc (NVIDIA CUDA compiler)
   - cuda_cudart (CUDA runtime)
   - cuda_cupti (CUDA profiling tools)
   - cuda_nvdisasm (CUDA disassembler)
   - libcusparse (Sparse linear algebra)

2. **NCCL** (~20 min)
   - NVIDIA Collective Communications Library
   - Required for multi-GPU training

3. **Triton** (~45 min)
   - GPU kernel compiler for PyTorch

4. **PyTorch with CUDA** (~1.5-2 hours)
   - Full PyTorch build with CUDA backend
   - Links against nixpkgs CUDA packages

5. **Training environment** (~5 min)
   - Python wrapper with all dependencies
   - DiffSynth, transformers, accelerate, etc.
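
Summing the per-step estimates above shows roughly where the 2-4 hour window comes from; this is only a sketch, since the steps are assumed to run strictly in sequence and actual times depend heavily on core count and clock speed:

```python
# Rough serial build-time estimate from the per-step ranges listed above.
step_minutes = {
    "CUDA Toolkit components": (30, 30),
    "NCCL": (20, 20),
    "Triton": (45, 45),
    "PyTorch with CUDA": (90, 120),
    "Training environment": (5, 5),
}
low = sum(lo for lo, _ in step_minutes.values())   # 190 min
high = sum(hi for _, hi in step_minutes.values())  # 220 min
print(f"~{low / 60:.1f}-{high / 60:.1f} hours if built strictly in sequence")
```

The extra slack in the quoted 2-4 hour figure covers hardware variance and any derivations that can be fetched from a binary cache instead of built.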

## Why This Takes Time

Unlike pip wheels (pre-built binaries), Nix builds everything from source:
- ✅ **Reproducible** - The same inputs always produce the same output
- ✅ **Pure** - Hermetic, sandboxed builds with no undeclared external dependencies
- ✅ **Integrated** - Works seamlessly with NixOS
- ⏱️ **Slow** - Compiles the CUDA toolchain and PyTorch from source (~50 GB of temporary build artifacts)

## What Happens After Build

Once complete, you can:

```bash
# Enter GPU training shell
cd ~/bastions && nix develop .#gpu-training

# Verify CUDA works
python3 -c "import torch; print(torch.cuda.is_available())"  # Should print True

# Run smoke test
cd /home/workspaces/totally-spies-cultshot
python3 tools/run_wan22_train.py \
  --training-data-dir materials/training-data/iiw-english-smoke-video-only \
  --output-dir materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints \
  --model-variant ti2v-5b \
  --lora-rank 16 \
  --epochs 1 \
  --dataset-repeat 20 \
  --learning-rate 2e-5 \
  --num-frames 81 \
  --height 480 \
  --width 832 \
  --gradient-accumulation-steps 4
```
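
For orientation, the smoke-test flags above imply the optimizer-step count per epoch below, assuming a per-device batch size of 1 (an assumption; the actual batching inside `run_wan22_train.py` may differ):

```python
# Hypothetical step-count arithmetic for the smoke run above.
clips = 160            # smoke dataset size
dataset_repeat = 20    # --dataset-repeat
grad_accum = 4         # --gradient-accumulation-steps
batch_size = 1         # assumed; not a flag shown above

samples_per_epoch = clips * dataset_repeat
optimizer_steps = samples_per_epoch // (grad_accum * batch_size)
print(optimizer_steps)  # 800
```

This is useful as a sanity check that the ~2-3 hour smoke-training estimate is plausible for the step count involved.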

## Monitoring Build Progress

In another terminal:
```bash
watch -n 60 'nix log /nix/store/$(ls /nix/store/ | grep "python.*torch.*\.drv" | head -1) 2>&1 | tail -20'
```

Or check build status:
```bash
cd ~/bastions && nix build .#devShells.x86_64-linux.gpu-training --dry-run 2>&1 | grep "will be built"
```
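
To reduce that dry-run output to a single number, a small parser can extract the pending-derivation count. This is a sketch that assumes the usual `these N derivations will be built:` header format, which may vary between Nix versions:

```python
import re

def count_pending_builds(dry_run_output: str) -> int:
    """Return how many derivations `nix build --dry-run` still plans to build."""
    m = re.search(r"(\d+) derivations? will be built", dry_run_output)
    return int(m.group(1)) if m else 0

sample = "these 41 derivations will be built:\n  /nix/store/...-python3-torch.drv\n"
print(count_pending_builds(sample))  # 41
```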

## Expected Resource Usage During Build

- **CPU**: 100% on all cores
- **RAM**: 16-32 GB peak
- **Disk**: ~50 GB temporary
- **Time**: 2-4 hours total
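
Before kicking off the build, it is worth confirming the store volume actually has that ~50 GB of headroom. A minimal pre-flight check (the path and threshold are illustrative):

```python
import shutil

def has_build_headroom(path: str = "/nix", needed_gb: float = 50.0) -> bool:
    """True if the filesystem containing `path` has at least `needed_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb

# If /nix is not a separate mount, check the root filesystem instead.
print(has_build_headroom("/"))
```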

## After Build Completes

The shell will be cached in your Nix store. Subsequent uses will be instant:

```bash
# First use: builds (2-4 hours)
nix develop .#gpu-training

# Second use: instant
nix develop .#gpu-training
```

## Troubleshooting

### Build fails with CUDA error
Ensure your NixOS config has:
```nix
hardware.opengl.enable = true;  # renamed to hardware.graphics.enable on NixOS 24.11+
nixpkgs.config.allowUnfree = true;
```

### Out of disk space
Clean old generations:
```bash
sudo nix-collect-garbage -d
```

### Build too slow
Consider cloud training alternative in `docs/CLOUD-TRAINING.md`

## Next Steps After Successful Build

1. Verify CUDA availability
2. Test with the smoke dataset (160 clips, ~2-3 hours training)
3. Validate output quality
4. Run full pilot training (601 clips, ~8-12 hours)
5. Generate validation shots for Laurent

---

**Build command:**
```bash
cd ~/bastions && nix build .#devShells.x86_64-linux.gpu-training --no-link --print-out-paths
```

**Status:** In progress (41 derivations building)
