# Local GPU Training Setup

Your machine has an **NVIDIA RTX 6000 Ada 48GB** with driver 580.142 (CUDA 13.0).

## Quick Start

### 1. Create Python Environment (Using System Python)

```bash
# Find system Python (from the NixOS system profile)
SYSTEM_PYTHON=$(ls /run/current-system/sw/bin/python* | grep -E 'python3\.(11|12)$' | head -1)
if [ -z "$SYSTEM_PYTHON" ]; then
  echo "System Python not found. Using /usr/bin/python3 as fallback."
  SYSTEM_PYTHON="/usr/bin/python3"
fi

echo "Using: $SYSTEM_PYTHON"

# Create venv
$SYSTEM_PYTHON -m venv ~/.venv/spies-gpu
source ~/.venv/spies-gpu/bin/activate

# Upgrade pip
pip install --upgrade pip
```

### 2. Install PyTorch with CUDA

```bash
# PyTorch wheels built against CUDA 12.1 (the 580.x driver is backward compatible)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Verify GPU access
python3 -c "import torch; print(f'CUDA: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}')"
```

### 3. Install DiffSynth-Studio

```bash
# Clone DiffSynth-Studio
git clone https://github.com/modelscope/DiffSynth-Studio.git ~/src/DiffSynth-Studio
cd ~/src/DiffSynth-Studio

# Install
pip install -e .

# Install additional dependencies
pip install accelerate transformers diffusers pillow opencv-python imageio imageio-ffmpeg einops safetensors omegaconf
```

### 4. Download Wan2.2 Model

```bash
# Install huggingface CLI
pip install huggingface-hub

# Log in (required: the model's terms must be accepted first)
huggingface-cli login

# Download model (requires ~15GB)
# First accept terms at: https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ~/models/Wan2.2-TI2V-5B
```
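To confirm the download actually landed (the note above expects roughly 15 GB on disk), a quick sketch; `dir_size_gb` is a hypothetical helper, not part of any tool in this repo, and the model path matches the one used above:

```python
from pathlib import Path

def dir_size_gb(path):
    """Total size of regular files under `path`, in decimal GB."""
    total = sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())
    return total / 1e9

model_dir = Path.home() / "models/Wan2.2-TI2V-5B"
if model_dir.exists():
    print(f"{dir_size_gb(model_dir):.1f} GB downloaded")
else:
    print(f"{model_dir} not found")
```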

### 5. Run Smoke Test

```bash
# Navigate to repo
cd /home/workspaces/totally-spies-cultshot

# Activate venv
source ~/.venv/spies-gpu/bin/activate

# Dry-run validation (no GPU needed)
python3 tools/run_wan22_train.py \
  --training-data-dir materials/training-data/iiw-english-smoke-video-only \
  --output-dir materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints \
  --model-variant ti2v-5b \
  --lora-rank 16 \
  --epochs 1 \
  --dataset-repeat 20 \
  --learning-rate 2e-5 \
  --num-frames 81 \
  --height 480 \
  --width 832 \
  --gradient-accumulation-steps 4 \
  --dry-run

# Actual training (GPU required)
python3 tools/run_wan22_train.py \
  --training-data-dir materials/training-data/iiw-english-smoke-video-only \
  --output-dir materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints \
  --diffsynth-path ~/src/DiffSynth-Studio \
  --wan22-model ~/models/Wan2.2-TI2V-5B \
  --model-variant ti2v-5b \
  --lora-rank 16 \
  --epochs 1 \
  --dataset-repeat 20 \
  --learning-rate 2e-5 \
  --num-frames 81 \
  --height 480 \
  --width 832 \
  --gradient-accumulation-steps 4
```
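For reference, the flags above imply an effective batch size of 4 (batch size 1 × `--gradient-accumulation-steps 4`). The resulting number of optimizer updates per epoch can be worked out with plain arithmetic; this sketch is not part of the training script:

```python
def optimizer_steps(clips, repeats, grad_accum):
    """Optimizer updates per epoch: one update every `grad_accum` forward passes."""
    return (clips * repeats) // grad_accum

# 160 smoke clips x 20 repeats, gradient accumulation 4
print(optimizer_steps(160, 20, 4))  # 800 updates per epoch
```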

## Expected Training Time

**Estimate for the RTX 6000 Ada 48GB:**
- 160 smoke clips × 20 repeats = 3,200 iterations
- ~2-3 seconds per iteration (batch size 1, gradient accumulation 4)
- **Estimated time: ~2-3 hours** (3,200 iterations × 2-3 s ≈ 1.8-2.7 h)
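The arithmetic behind that estimate, as a sketch:

```python
def estimate_hours(clips, repeats, sec_per_iter):
    """Wall-clock hours for one epoch at a fixed per-iteration cost."""
    return clips * repeats * sec_per_iter / 3600

# 160 clips x 20 repeats at 2-3 s/iteration
print(f"{estimate_hours(160, 20, 2.0):.1f}-{estimate_hours(160, 20, 3.0):.1f} hours")
```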

## Monitoring

In another terminal:
```bash
watch -n 5 nvidia-smi
```
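For a persistent log rather than a live view, `nvidia-smi` also supports CSV output, e.g. `nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv,noheader,nounits -l 5 >> gpu.log`. A sketch of parsing those lines (the sample values are made up):

```python
def parse_gpu_log(text):
    """Parse CSV lines of `memory.used, utilization.gpu` (MiB, percent)."""
    rows = []
    for line in text.strip().splitlines():
        mem, util = (field.strip() for field in line.split(","))
        rows.append({"mem_used_mib": int(mem), "util_pct": int(util)})
    return rows

print(parse_gpu_log("41250, 98\n41312, 97"))
```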

Check checkpoints:
```bash
ls -lh materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints/
```

## After Training

Test inference with your LoRA:
```bash
cd ~/src/DiffSynth-Studio
python3 examples/wan2.2/text_to_video.py \
  --model_path ~/models/Wan2.2-TI2V-5B \
  --lora_path /home/workspaces/totally-spies-cultshot/materials/training-data/iiw-english-smoke-video-only/wan22_checkpoints/checkpoint-xxx \
  --prompt "Sam in green catsuit in WOOHP HQ" \
  --output_path /tmp/test_output.mp4
```

## Troubleshooting

### CUDA Out of Memory
Reduce the resolution and/or raise gradient accumulation, e.g.:
```bash
--height 384 --width 640 --gradient-accumulation-steps 8
```
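Assuming activation memory scales roughly with pixel count (a simplification; attention and frame count matter too), the suggested drop to 384×640 buys a meaningful margin. A back-of-the-envelope sketch:

```python
def pixel_ratio(h_new, w_new, h_old=480, w_old=832):
    """Rough activation-memory ratio, assuming it scales with H*W."""
    return (h_new * w_new) / (h_old * w_old)

print(f"384x640 has ~{pixel_ratio(384, 640):.0%} of the pixels of 480x832")
```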

### Model Download Fails
The Wan2.2 model requires you to:
1. Have a HuggingFace account
2. Accept the license at https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B
3. Run `huggingface-cli login` with an account that has accepted the terms

### Import Errors
Ensure you're in the venv:
```bash
source ~/.venv/spies-gpu/bin/activate
python3 -c "import torch, diffsynth"
```
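To see every missing package at once rather than failing on the first import, a small diagnostic sketch (`missing_packages` is a hypothetical helper, not part of this repo):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level package names not importable here."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print(missing_packages(["torch", "torchvision", "torchaudio", "diffsynth", "transformers"]))
```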

## Alternative: Use System Python

If Nix Python causes issues:
```bash
# Use system Python directly
/usr/bin/python3 -m venv ~/.venv/spies-gpu
```

## Next Steps After Smoke Test

1. **Validate output quality** - Generate test videos with the trained LoRA
2. **Full pilot training** - Run on 601 clips (full iiw-english-pilot dataset)
3. **Character identity integration** - Add character plate references
4. **Validation shot generation** - Use trained model for TSV-001/002/005 shots
