> **⚠️ SUPERSEDED** — This document is a historical receipt. See `s7-current-understanding.md` for the authoritative current position.

# Totally Spies S7 animation style — Gemma 3 12B second opinion

## Purpose

This note records a second-opinion multimodal pass using local Ollama
`gemma3:12b` to validate the earlier `qwen2.5vl:7b` assessment.

Raw outputs are stored at:

- `materials/benchmark/official-vimeo-trailer/ts-vision-second-opinion-gemma3-12b.json`

## Runtime context

- Ollama version: `0.19.0`
- `gemma4:*` could not be tested because it requires a newer Ollama version
- `gemma3:12b` was chosen as the strongest practical multimodal fallback that
  is compatible with the current daemon
- Inference was slow on this machine, apparently CPU-bound (roughly 6–15
  minutes per prompt)
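
A minimal sketch of how a pass like this could be issued against a local Ollama daemon. It assumes the default endpoint `http://localhost:11434/api/generate` and a non-streaming request with base64-encoded frames. The helper names, the prompt text, and the 30-minute timeout are illustrative assumptions, not the script that produced the stored JSON.

```python
import base64
import json
import urllib.request

# Assumed default address of a local Ollama daemon.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, image_bytes_list):
    """Build a non-streaming generate request with base64-encoded images."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(b).decode("ascii") for b in image_bytes_list],
        "stream": False,
    }


def ask_ollama(payload, timeout_s=1800):
    """POST the payload and return the model's text response.

    The generous timeout is an assumption, chosen to cover the
    6-15 minute per-prompt latency noted above.
    """
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]
```

In use, a frame pair would be read as bytes and passed together, for example `ask_ollama(build_payload("gemma3:12b", prompt, [frame_a, frame_b]))`, where `prompt`, `frame_a`, and `frame_b` are placeholders for the caller's own inputs.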

## What Gemma 3 12B said

### Style assessment

Gemma identified the show as:

- clean **vector-based linework**
- **flat, solid colors** with minimal gradients
- **digital cutout / rigged** character animation
- **limited animation** rather than full animation

### Hold-pair assessment

On a dialogue pair ~120 ms apart, Gemma said:

- the **background is completely static**
- character posture is **almost identical**
- only **very slight** expression or arm-position shifts occur
- the shot reads as **held drawing / limited animation**, not fluid redraw

### Action-pair assessment

On an action pair ~120 ms apart, Gemma said:

- motion is **controlled and limited**
- changes are **sparse** between frames
- it looks more like **coordinated pose interpolation** than dense unique
  redraws
- the animation prioritizes impact over hyper-realistic fluidity
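
One way to cross-check the hold-pair and action-pair readings numerically is to measure what fraction of pixels change between the two frames of a pair. The sketch below assumes the frames are already decoded into equal-sized grids of grayscale values (0–255). The function name and the threshold of 8 are hypothetical choices, and no such measurement was part of the original pass.

```python
def changed_fraction(frame_a, frame_b, threshold=8):
    """Return the fraction of pixels whose grayscale value differs by more than threshold."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have identical dimensions")
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0


# Toy 2x4 frames: one pixel changes strongly, one changes below the threshold.
hold_a = [[10, 10, 200, 200], [10, 10, 200, 200]]
hold_b = [[10, 12, 200, 200], [10, 10, 200, 90]]
print(changed_fraction(hold_a, hold_b))  # 0.125
```

Under this sketch, a held dialogue pair would be expected to score near zero, with a somewhat higher but still sparse score for an action pair. Any cutoff between the two would need to be calibrated on real frames.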

### Direct AI-video relevance assessment

Gemma explicitly chose:

> **B) Preserving temporal consistency across mostly-held 2D cutout-style
> animation with restrained movement**

over:

> learning complex realistic motion physics

It also said the main failure mode of a generic video model would be to
**hallucinate unwanted motion and detail**, imposing realistic physics and
extra dynamics on a style that is deliberately restrained.

## Conclusion

Gemma 3 12B **confirms the earlier `qwen2.5vl:7b` assessment**.

The key conclusion remains the same:

- Totally Spies S7 is not difficult because it contains highly complex motion
- It is difficult because it requires **discipline**: stable characters,
  static or mostly static backgrounds, restrained motion, and strong
  inter-frame consistency
- Therefore, the central video-generation problem is **temporal consistency**,
  not realistic physics simulation

## Practical implication

For a stronger local follow-up:

- `gemma3:27b` is the next compatible step on the current Ollama version
- `gemma4:26b` is likely the best next test once Ollama is upgraded
