# CultGuard Autonomous Agents

Autonomous monitoring agents built on the **pi-agent framework** for continuous OSINT surveillance, content analysis, and evidence collection.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                    pi-agent Extension Layer                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  cultguard-monitor.ts                                     │   │
│  │  • Custom monitoring tools                                │   │
│  │  • Event hooks for automation                             │   │
│  │  • Session state management                               │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Autonomous Agent Processes                       │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐            │
│  │  Facebook    │ │   Content    │ │    Alert     │            │
│  │  Monitor     │ │   Analyzer   │ │    System    │            │
│  │              │ │              │ │              │            │
│  │ • Continuous │ │ • Risk       │ │ • Email      │            │
│  │   polling    │ │   scoring    │ │   alerts     │            │
│  │ • Auto-scrape│ │ • LLM        │ │ • Dashboard  │            │
│  │ • Spike      │ │   analysis   │ │   updates    │            │
│  │   detection  │ │• Coordination│ │ • Threshold  │            │
│  │              │ │   detection  │ │   monitoring │            │
│  └──────────────┘ └──────────────┘ └──────────────┘            │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │              Evidence Collector                           │   │
│  │  • Auto-ingest media                                      │   │
│  │  • HTTP archiving                                         │   │
│  │  • Identifier extraction                                  │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    PostgreSQL Database                           │
│  • investigations • entities • content • media                  │
│  • relationships • identifiers • annotations                    │
└─────────────────────────────────────────────────────────────────┘
```

## Quick Start

### 1. Enter the Pure Nix Shell

```bash
cd cultguard-agents
devenv shell
```

### 2. Configure pi-agent

The extension auto-loads from `.pi/extensions/cultguard-monitor.ts`. Verify it's loaded:

```bash
# from inside the devenv shell (step 1)
pi
# Look for "CultGuard Monitoring Extension loaded" in the startup message
```

### 3. Start Monitoring

**Option A: Interactive (via pi)**
```bash
pi
# Use the new tools:
# /start_monitoring --investigation <id> --page <url>
# /check_new_content --investigation <id>
# /analyze_content --content_ids ["id1", "id2"]
# /generate_report --investigation <id>
```

**Option B: Autonomous Agents**
```bash
# Start Facebook monitoring (runs continuously)
agent-monitor \
  --investigation lebanon-liberates-2026 \
  --page "https://facebook.com/page-url" \
  --interval 30

# Start content analysis (one-shot or continuous)
agent-analyzer \
  --investigation lebanon-liberates-2026 \
  --batch-size 50 \
  --llm

# Start alert system
agent-alert \
  --investigation lebanon-liberates-2026 \
  --channel email \
  --email alerts@example.com

# Start evidence collector
agent-collector \
  --investigation lebanon-liberates-2026 \
  --auto
```

## Agent Descriptions

### 1. Facebook Monitor (`facebook-monitor.ts`)

**Purpose:** Continuous surveillance of Facebook pages for new content.

**Features:**
- Configurable polling intervals (5-1440 minutes)
- Automatic scraping when new content detected
- Activity spike detection
- Exponential backoff on failures
- Session persistence across restarts

**Usage:**
```bash
agent-monitor \
  --investigation <id> \
  --page <facebook-url> \
  --interval <minutes> \
  --depth <posts> \
  --auto-ingest true \
  --alert-threshold <count>
```

**Key Parameters:**
- `--interval`: Check frequency (default: 30 min)
- `--depth`: Posts to scrape per cycle (default: 25)
- `--auto-ingest`: Trigger evidence collection (default: true)
- `--alert-threshold`: Alert if >N posts in interval (default: 10)
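The per-cycle decision logic implied by these flags can be sketched as a pure function. This is a hypothetical shape for illustration, not the actual `agent-monitor` internals:

```typescript
// Sketch of one monitoring cycle's decisions (hypothetical types).

interface CycleResult {
  newPosts: number;      // posts found since the last cycle
  scrapeFailed: boolean; // whether the scrape attempt errored
}

interface CyclePlan {
  autoScrape: boolean;   // collect the new content this cycle
  spikeAlert: boolean;   // fire an ACTIVITY_SPIKE alert
}

// Defaults mirror the documented flags: --auto-ingest true, --alert-threshold 10.
function planCycle(result: CycleResult, alertThreshold: number = 10): CyclePlan {
  if (result.scrapeFailed) {
    // Failures are handled by the backoff policy, not by this planner.
    return { autoScrape: false, spikeAlert: false };
  }
  return {
    autoScrape: result.newPosts > 0,
    spikeAlert: result.newPosts > alertThreshold,
  };
}
```

Keeping the decision step pure like this makes the alerting behavior easy to unit-test independently of scraping and scheduling.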

### 2. Content Analyzer (`content-analyzer.ts`)

**Purpose:** Automated risk assessment and pattern detection.

**Features:**
- Heuristic risk scoring (0-100)
- LLM-powered deep analysis
- Coordination detection (similar posts, timing patterns)
- Sentiment analysis
- Suspicious pattern detection:
  - Urgency/manipulation language
  - Call-to-action phrases
  - Conspiracy markers
  - Coordinated hashtags
  - Bot-like behavior

**Analysis Types:**
- `risk`: Overall risk assessment
- `coordination`: Astroturfing detection
- `sentiment`: Positive/negative/neutral
- `claims`: Fact-checking assertions
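As a rough illustration of the coordination heuristic, near-identical posts from several distinct authors inside a short window can be grouped like this. The types and thresholds here are assumptions for the sketch, not the analyzer's actual implementation:

```typescript
// Simplified near-duplicate grouping for coordination detection.

interface Post {
  id: string;
  author: string;
  text: string;
  postedAt: number; // epoch ms
}

// Normalize aggressively so trivially edited copies collide on the same key.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/https?:\/\/\S+/g, "") // strip URLs
    .replace(/[^a-z0-9 ]/g, "")     // strip punctuation/emoji
    .replace(/\s+/g, " ")
    .trim();
}

// Flag groups where 3+ distinct authors post near-identical text within windowMs.
function findCoordinated(posts: Post[], windowMs: number = 60 * 60 * 1000): Post[][] {
  const groups = new Map<string, Post[]>();
  for (const p of posts) {
    const key = normalize(p.text);
    if (!key) continue;
    const bucket = groups.get(key);
    if (bucket) bucket.push(p);
    else groups.set(key, [p]);
  }
  const flagged: Post[][] = [];
  for (const group of groups.values()) {
    const authors = new Set(group.map((p) => p.author));
    const times = group.map((p) => p.postedAt);
    if (authors.size >= 3 && Math.max(...times) - Math.min(...times) <= windowMs) {
      flagged.push(group);
    }
  }
  return flagged;
}
```

Exact-key grouping only catches copy-paste campaigns; the planned embedding pipeline would extend this to semantically similar but reworded posts.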

**Usage:**
```bash
agent-analyzer \
  --investigation <id> \
  --batch-size 50 \
  --type risk \
  --type coordination \
  --llm \
  --since <hours>
```

**Risk Scoring:**
- 0-14: Low risk
- 15-29: Medium risk
- 30+: High risk (triggers alerts)

### 3. Alert System (`alert-system.ts`)

**Purpose:** Real-time notifications for significant events.

**Alert Types:**
- `ACTIVITY_SPIKE`: Unusual posting volume
- `HIGH_RISK_CONTENT`: Content with high risk scores
- `COORDINATION_DETECTED`: Synchronized behavior
- `NEW_ENTITY`: Multiple new entities
- `MONITORING_FAILURE`: System errors

**Delivery Channels:**
- Email (via Gmail MCP integration)
- Dashboard (Metabase updates)
- Console logging

**Usage:**
```bash
agent-alert \
  --investigation <id> \
  --channel email \
  --email alerts@example.com \
  --min-risk 30 \
  --interval 15
```

### 4. Evidence Collector (`evidence-collector.ts`)

**Purpose:** Automated evidence ingestion and archiving.

**Features:**
- Media download and deduplication (SHA-256)
- HTTP response archiving
- Identifier extraction (phone, email, domain)
- Metadata preservation
- Batch processing
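The SHA-256 deduplication step can be sketched as follows. This is illustrative only; the real collector also persists media rows and metadata to PostgreSQL:

```typescript
import { createHash } from "node:crypto";

// Content-addressed dedup: hash the bytes, skip already-seen digests.

function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

class MediaStore {
  private seen = new Set<string>();

  // Returns the digest if the media is new, or null if it's a duplicate.
  ingest(bytes: Uint8Array | string): string | null {
    const digest = sha256Hex(bytes);
    if (this.seen.has(digest)) return null;
    this.seen.add(digest);
    return digest;
  }
}
```

Hashing the raw bytes (rather than the URL) means the same image re-posted under different filenames or CDN links is still stored exactly once.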

**Usage:**
```bash
# One-shot collection
agent-collector \
  --investigation <id>

# Continuous collection
agent-collector \
  --investigation <id> \
  --auto \
  --interval 10 \
  --media true \
  --archive true
```
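The identifier-extraction step can be approximated with simple pattern matching. The regexes below are deliberately naive and purely illustrative; real phone and domain matching needs locale- and TLD-aware handling:

```typescript
// Rough phone/email/domain extraction (illustrative regexes only).

interface Identifiers {
  emails: string[];
  domains: string[];
  phones: string[];
}

function extractIdentifiers(text: string): Identifiers {
  const emails = text.match(/[\w.+-]+@[\w-]+\.[\w.-]+/g) ?? [];
  // Drop domains that are just the host part of an already-captured email.
  const domains = (text.match(/\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b/gi) ?? [])
    .filter((d) => !emails.some((e) => e.endsWith(d)));
  const phones = text.match(/\+?\d[\d\s-]{7,}\d/g) ?? [];
  return {
    emails: [...new Set(emails)],
    domains: [...new Set(domains)],
    phones: [...new Set(phones.map((p) => p.replace(/[\s-]/g, "")))],
  };
}
```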

## pi-agent Extension Tools

The `cultguard-monitor.ts` extension provides these tools to the LLM:

### `/start_monitoring`
Start autonomous monitoring of a Facebook page.

**Parameters:**
- `investigation_id`: Target investigation
- `page_url`: Facebook page to monitor
- `monitoring_interval_minutes`: Check frequency (5-1440)
- `auto_scrape`: Automatically collect new content
- `auto_ingest`: Automatically ingest media
- `alert_on_spike`: Send spike alerts

### `/check_new_content`
Check for new content since last monitoring cycle.

**Parameters:**
- `investigation_id`: Target investigation
- `since_minutes`: Lookback window (5-1440)
- `include_comments`: Analyze comments

### `/analyze_content`
Analyze content for suspicious patterns.

**Parameters:**
- `content_ids`: Content IDs to analyze
- `analysis_type`: risk | coordination | sentiment | claims
- `include_llm`: Use LLM for deep analysis

### `/generate_report`
Generate comprehensive monitoring reports.

**Parameters:**
- `investigation_id`: Target investigation
- `time_range_hours`: Report window (1-720)
- `include_network`: Include relationship analysis
- `format`: summary | detailed | executive

### `/configure_alerts`
Configure alert thresholds.

**Parameters:**
- `post_spike_threshold`: Alert if >N posts
- `reaction_spike_threshold`: Alert if >N reactions
- `suspicious_keywords`: Keywords to flag

### Commands

```bash
# In pi interactive mode:
/monitoring-status    # Show active investigations
/monitoring-reset     # Clear monitoring state
```

## Systemd Services (Optional)

For production deployments, run agents as systemd user services:

### 1. Create Service Files

```bash
# ~/.config/systemd/user/cultguard-monitor@.service
[Unit]
Description=CultGuard Facebook Monitor - %i
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/mnm/workspaces/cultguard-agents
Environment=DEVENV_SHELL=1
ExecStart=/run/current-system/sw/bin/devenv shell -- bash -lc 'agent-monitor --investigation=%i --page="$(cat %h/.config/cultguard/%i/page.url)" --interval=30'
Restart=on-failure
RestartSec=60

[Install]
WantedBy=default.target
```

### 2. Enable Services

```bash
# Create config for investigation
mkdir -p ~/.config/cultguard/lebanon-liberates-2026
echo "https://facebook.com/page-url" > ~/.config/cultguard/lebanon-liberates-2026/page.url

# Enable and start
systemctl --user enable cultguard-monitor@lebanon-liberates-2026
systemctl --user start cultguard-monitor@lebanon-liberates-2026
```

### 3. Monitor Services

```bash
systemctl --user status cultguard-monitor@*
journalctl --user -u cultguard-monitor@lebanon-liberates-2026 -f
```

## Configuration

### Alert Thresholds

Default thresholds (configurable via `/configure_alerts`):

```typescript
{
  postSpike: 10,              // Alert if >10 posts in interval
  reactionSpike: 1000,        // Alert if >1000 reactions
  suspiciousKeywords: [
    "election", "vote", "protest",
    "breaking", "urgent"
  ],
  newEntityThreshold: 5,      // Alert if >5 new entities
}
```
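A minimal sketch of how these thresholds might be applied per monitoring interval. The `IntervalStats` shape and the mapping of keyword hits to `HIGH_RISK_CONTENT` are assumptions for illustration:

```typescript
// Hypothetical per-interval threshold evaluation.

interface AlertConfig {
  postSpike: number;
  reactionSpike: number;
  suspiciousKeywords: string[];
  newEntityThreshold: number;
}

interface IntervalStats {
  posts: number;
  reactions: number;
  newEntities: number;
  texts: string[]; // post texts collected this interval
}

function evaluateThresholds(stats: IntervalStats, cfg: AlertConfig): string[] {
  const alerts: string[] = [];
  if (stats.posts > cfg.postSpike || stats.reactions > cfg.reactionSpike) {
    alerts.push("ACTIVITY_SPIKE");
  }
  if (stats.newEntities > cfg.newEntityThreshold) {
    alerts.push("NEW_ENTITY");
  }
  const kws = cfg.suspiciousKeywords.map((k) => k.toLowerCase());
  if (stats.texts.some((t) => kws.some((k) => t.toLowerCase().includes(k)))) {
    alerts.push("HIGH_RISK_CONTENT"); // assumption: keyword hits flag content
  }
  return alerts;
}
```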

### Risk Scoring Weights

```typescript
// Heuristic weights
urgency_language:      +15
call_to_action:        +10
conspiracy_markers:    +20
coordinated_hashtags:  +15
high_engagement:       +10  (>1000)
viral_engagement:      +20  (>5000)
high_share_ratio:      +15  (shares/likes >0.8)
coordination_detected: +20  (per signal)
```
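Putting the weights and the 0-14/15-29/30+ bands together, a scorer might look like the following sketch. The `Signals` shape is hypothetical, and treating the two engagement tiers as mutually exclusive is an assumption:

```typescript
// Heuristic scorer mirroring the documented weights (illustrative only).

interface Signals {
  urgencyLanguage: boolean;
  callToAction: boolean;
  conspiracyMarkers: boolean;
  coordinatedHashtags: boolean;
  reactions: number;
  shares: number;
  likes: number;
  coordinationSignals: number; // count of detected coordination signals
}

function riskScore(s: Signals): number {
  let score = 0;
  if (s.urgencyLanguage) score += 15;
  if (s.callToAction) score += 10;
  if (s.conspiracyMarkers) score += 20;
  if (s.coordinatedHashtags) score += 15;
  if (s.reactions > 5000) score += 20;       // viral engagement
  else if (s.reactions > 1000) score += 10;  // high engagement (assumed exclusive)
  if (s.likes > 0 && s.shares / s.likes > 0.8) score += 15;
  score += 20 * s.coordinationSignals;       // per signal
  return Math.min(score, 100);               // scores are documented as 0-100
}

// Band mapping from the Risk Scoring section above.
function riskBand(score: number): "low" | "medium" | "high" {
  return score >= 30 ? "high" : score >= 15 ? "medium" : "low";
}
```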

## Monitoring Workflows

### Workflow 1: New Investigation

```bash
# 1. Start monitoring target page
pi "/start_monitoring --investigation demo-2026 --page <url> --interval 15"

# 2. Wait for initial scrape to complete
# 3. Generate baseline report
pi "/generate_report --investigation demo-2026 --time_range_hours 24"

# 4. Start alert system
agent-alert -i demo-2026 --channel email --email team@example.com
```

### Workflow 2: Event Response

```bash
# 1. Check for new content after breaking event
pi "/check_new_content --investigation demo-2026 --since_minutes 60"

# 2. Analyze high-risk content
pi "/analyze_content --content_ids [\"id1\", \"id2\"] --type risk --llm"

# 3. Generate situational report
pi "/generate_report --investigation demo-2026 --format executive"
```

### Workflow 3: Coordination Analysis

```bash
# 1. Fetch recent content batch
# 2. Run coordination detection
agent-analyzer \
  -i demo-2026 \
  --batch-size 100 \
  --type coordination \
  --since 24

# 3. Review flagged content in Metabase
# Open http://localhost:3100 and query v_content_timeline
```

## Database Integration

All agents write to the PostgreSQL database:

- **content**: Posts, comments, ads with risk scores
- **media**: Downloaded images/videos with SHA-256 hashes
- **annotations**: LLM analysis results and alerts
- **identifiers**: Extracted phone/email/domain
- **http_log**: Archived HTTP captures
- **relationships**: Detected entity connections

### Useful Views

```sql
-- Embedding coverage
SELECT * FROM v_embedding_coverage;

-- Entity summary with content counts
SELECT * FROM v_entity_summary;

-- Recent high-risk content
SELECT id, text, risk_score, collected_at
FROM content
WHERE investigation_id = 'demo-2026'
AND collected_at > NOW() - INTERVAL '24 hours'
ORDER BY risk_score DESC
LIMIT 20;

-- Alert history
SELECT ref_id, note, confidence, created_at
FROM annotations
WHERE ref_type = 'alert'
ORDER BY created_at DESC;
```

## Error Handling

### Agent Failures

- **Exponential backoff**: 2x delay per consecutive failure (max 4 hours)
- **Alert on 3+ failures**: Sends MONITORING_FAILURE alert
- **State persistence**: Monitoring state survives restarts
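The backoff policy above reduces to a small pure function, sketched here under the assumption that the delay starts from the configured interval and doubles per consecutive failure:

```typescript
// Backoff sketch matching the documented policy: 2x delay per
// consecutive failure, capped at 4 hours, alert from the 3rd failure.

const MAX_DELAY_MIN = 4 * 60;

function nextDelayMinutes(baseIntervalMin: number, consecutiveFailures: number): number {
  return Math.min(baseIntervalMin * 2 ** consecutiveFailures, MAX_DELAY_MIN);
}

function shouldAlertFailure(consecutiveFailures: number): boolean {
  return consecutiveFailures >= 3; // triggers MONITORING_FAILURE
}
```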

### Rate Limiting

- Facebook scraping respects platform rate limits
- Configurable intervals prevent aggressive polling
- Automatic backoff on HTTP errors

### Recovery

```bash
# Reset monitoring state
pi "/monitoring-reset"

# Restart failed agent
agent-monitor -i <id> -p <url>

# Check database for stuck state
search --sql "SELECT * FROM content WHERE raw_json->>'processed' = 'false'"
```

## Best Practices

1. **Start conservative**: Use 30-60 minute intervals initially
2. **Monitor token usage**: LLM analysis can be expensive
3. **Set appropriate thresholds**: Tune based on baseline activity
4. **Review alerts daily**: Avoid alert fatigue
5. **Archive everything**: Enable `--archive true` for legal preservation
6. **Test in staging**: Validate workflows before production

## Troubleshooting

### Agent won't start
```bash
# Check devenv is running
devenv up

# Verify database connection
db-stats

# Check extension loaded
pi  # Look for "CultGuard Monitoring Extension loaded"
```

### No content detected
```bash
# Verify page is being scraped
search --sql "SELECT * FROM entities WHERE investigation_id = '<id>'"

# Check last scrape time
pi "/monitoring-status"

# Manually trigger scrape
fb-scrape "<url>" --depth 10 -i <id> --save
```

### Alerts not sending
```bash
# Check alert configuration
pi "/configure_alerts"

# Verify email channel setup (TODO: integrate gmcli)

# Check database for alert records
search --sql "SELECT * FROM annotations WHERE ref_type = 'alert' ORDER BY created_at DESC LIMIT 10"
```

## Next Steps

### Planned Improvements

1. **Email integration**: Wire up gmcli skill for alert delivery
2. **Dashboard widgets**: Custom Metabase dashboards
3. **Embedding pipeline**: Semantic similarity for coordination detection
4. **Multi-platform**: Twitter/X, Telegram monitoring
5. **OCR integration**: Text extraction from images
6. **Network analysis**: Graph-based entity relationship mapping

### Contributing

- Add new analysis patterns to `content-analyzer.ts`
- Extend alert types in `alert-system.ts`
- Build custom pi-agent tools in `cultguard-monitor.ts`
- Create Metabase dashboards in `data/schema.sql`

## Security Notes

- **Local-only**: All agents run locally, no cloud dependencies
- **Database isolation**: Investigation data stays in PostgreSQL
- **Credential safety**: Facebook session managed by Chromium, not agents
- **Review before alerting**: Human confirmation recommended for high-stakes alerts

---

For more information, see:
- Main README: `/README.md`
- pi-agent docs: `/nix/store/.../pi-coding-agent/README.md`
- Database schema: `data/schema.sql`
