# Domain Scoring Notes

Date: 2026-05-01

Source file: `Namecheap_Market_Sales.csv`

Purpose: identify cheap `.com` domains suitable for an email-host or messaging/infrastructure product, while avoiding domains that look spammy, random, or historically tainted.

## Pipeline overview

The pipeline is a general-purpose domain value scanner. It was originally built to find email-host candidates and is now domain-agnostic (not tied to one use case). It filters for brandable, aged, cheap domains and scores them on a 100-point value framework.

## Current search scope

| Setting | Default |
|---|---|
| TLDs | `com,net,org,io,co,me,us,info,biz,app,dev` |
| Auction price | `< $100` |
| Age cutoff | registered before `2023-05-01` (3+ years old as of 2026-05-01) |
| Renewal fee | included in total first-year cost |
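
The scope settings above can be read as a per-row check. The sketch below is one possible reading, not the code in `process_domains.sh`: the function names, argument order, and the `example.com` sample row are invented for illustration, and the price is assumed to be a plain number without a currency symbol.

```bash
# Sketch only: names and argument order are assumptions, not the real script.
ALLOWED_TLDS="com,net,org,io,co,me,us,info,biz,app,dev"
MAX_PRICE=100

# cutoff_date RUN_DATE YEARS -> RUN_DATE shifted back by YEARS (YYYY-MM-DD).
# Pure string arithmetic, so it does not depend on GNU `date -d`.
# A Feb 29 run date yields a non-existent date, which still compares correctly.
cutoff_date() {
  echo "$(( ${1%%-*} - $2 ))-${1#*-}"
}

# passes_scope DOMAIN PRICE REG_DATE CUTOFF -> exit 0 if the row is in scope.
passes_scope() {
  tld="${1##*.}"
  case ",$ALLOWED_TLDS," in
    *",$tld,"*) ;;
    *) return 1 ;;
  esac
  # ISO dates sort lexicographically, so a string comparison is enough.
  awk -v p="$2" -v max="$MAX_PRICE" -v reg="$3" -v cut="$4" \
    'BEGIN { exit !((p + 0) < max && reg < cut) }'
}

cutoff="$(cutoff_date 2026-05-01 3)"
echo "$cutoff"
passes_scope example.com 8.50 2015-01-01 "$cutoff" && echo "in scope"
```

With the run date of these notes and a 3-year minimum age, `cutoff_date` reproduces the `2023-05-01` cutoff shown in the table.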

## Taint gates

A domain should be excluded before scoring if it trips any of these gates. For the MX part of gate 1, a domain whose MX points to the disposable/bulk email infrastructure listed there is rejected; domains with **clean MX** or **no MX** are accepted.

### 1. Reputation / infrastructure taint

Reject if:

- Domain appears in disposable-email blocklists.
- Domain has obvious search hits for spam, phishing, malware, scam, disposable email, burner email, or fake-login use.
- Historical pages show adult, gambling, pharma, warez, loan, fake crypto, SEO doorway, or bulk-email activity.
- Domain is directly tied to disposable email infrastructure, not merely parked.

Known suspicious/disposable MX infrastructure seen in the dataset:

- `mail.eye-mail.net`
- `mx1.emailsendhub.com`
- `mx.plingest.com`
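
A minimal sketch of the MX comparison, assuming exact hostname matches against the three hosts listed above. The function names are hypothetical, and `domain_mx_verdict` needs network access and `dig`, so only the pure comparison is exercised here.

```bash
# Hosts are the ones listed above; the matching logic is an assumption.
BAD_MX="mail.eye-mail.net mx1.emailsendhub.com mx.plingest.com"

# mx_is_tainted HOST -> exit 0 if HOST is one of the known bad MX hosts.
mx_is_tainted() {
  host="$(printf '%s' "$1" | tr 'A-Z' 'a-z')"
  host="${host%.}"            # dig prints hostnames with a trailing dot
  for bad in $BAD_MX; do
    [ "$host" = "$bad" ] && return 0
  done
  return 1
}

# domain_mx_verdict DOMAIN -> "reject" or "accept" (needs network and dig).
# `dig +short MX` prints "<priority> <host>." per record; awk keeps the host.
domain_mx_verdict() {
  for mx in $(dig +short MX "$1" | awk '{print $2}'); do
    mx_is_tainted "$mx" && { echo reject; return; }
  done
  echo accept                 # clean MX, or no MX records at all
}

mx_is_tainted "MX1.emailsendhub.com." && echo "tainted"
```

Exact matching would miss sibling hosts under the same provider (a different `mx` label, for instance), so a suffix match on the provider domain may be the safer choice in practice.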

### 2. Name taint

Reject or heavily penalise names containing:

- `sex`, `xxx`, `porn`, `adult`, `escort`
- `casino`, `poker`, `bet`
- `loan`, `debt`, `payday`
- `viagra`, `cialis`, `pill`, `rx`
- `hack`, `crack`, `warez`
- `scam`, `spam`, `adware`, `malware`
- obvious random machine-generated strings
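
The keyword list above can be checked with a plain substring match. This is a sketch under that assumption; the function names are hypothetical and the real script may match differently.

```bash
# Keywords copied from the list above, joined into one extended regex.
TAINT_WORDS='sex|xxx|porn|adult|escort|casino|poker|bet|loan|debt|payday|viagra|cialis|pill|rx|hack|crack|warez|scam|spam|adware|malware'

# name_taint_hits DOMAIN -> prints each keyword found in the name root, one per line.
name_taint_hits() {
  root="${1%%.*}"
  printf '%s\n' "$root" | tr 'A-Z' 'a-z' | grep -oE "$TAINT_WORDS"
}

# has_name_taint DOMAIN -> exit 0 if at least one keyword matches.
has_name_taint() {
  name_taint_hits "$1" >/dev/null
}

name_taint_hits alphabetsoup.com
```

The sample call prints `bet`, which shows why the gate says "reject or heavily penalise" rather than "reject": short keywords such as `bet` and `rx` also occur inside harmless words, so a hit is a prompt for review, not proof of taint.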

### 3. Random-sequence taint

Reject if the domain looks generated rather than named:

- 4 or more consecutive consonants
- no vowels
- no recognisable word parts
- random clusters like `xsznlyj`, `rcmygs`, `hdyutian`
- pinyin/foreign transliteration with no clear target-market relevance

Some short names score well mechanically but still look spammy to recipients. These should be manually reviewed.
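
Only the first two rules in the list are mechanical. A minimal sketch of those two checks follows, assuming `y` counts as a consonant and that digits and hyphens were already removed by the hard filter; the function name is hypothetical.

```bash
# looks_generated DOMAIN -> exit 0 if the name root trips a mechanical check.
# Assumption: "y" counts as a consonant.
looks_generated() {
  root="$(printf '%s' "${1%%.*}" | tr 'A-Z' 'a-z')"
  # Rule: no vowels at all.
  printf '%s\n' "$root" | grep -qE '[aeiou]' || return 0
  # Rule: four or more consonants in a row.
  printf '%s\n' "$root" | grep -qE '[bcdfghjklmnpqrstvwxyz]{4,}' && return 0
  return 1
}

for d in xsznlyj.com rcmygs.com hdyutian.com multipeers.com; do
  if looks_generated "$d"; then echo "$d reject"; else echo "$d pass"; fi
done
```

In this sketch `xsznlyj` and `rcmygs` are rejected for having no vowels, while `hdyutian` passes both checks and would only be caught by the word-part or pinyin rules. The consonant rule also rejects real words such as `strengths`, which is another reason to keep a manual review step.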

## 100-point scoring framework

After taint filtering, score remaining domains out of 100.

| Category | Weight | Notes |
|---|---:|---|
| History / reputation | 25 | Positive clean history beats blank slate; abuse history rejects. |
| Name appeal / least-spammy look | 25 | Real words, clean compounds, pronounceability, no random strings. |
| Email-host relevance | 20 | Communication, networking, routing, identity, reliability, infra terms. |
| Commercial/domain value | 15 | Age, length, Estibot, search count, DR, bids. |
| Cost / expiry | 10 | Total first-year price, renewal, expiry runway. |
| Legal / brand safety | 5 | Generic names score better; prior identity/trademark risk penalised. |
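
The six weights sum to 100, so the total is a capped sum of the category sub-scores. The sketch below assumes integer sub-scores; the function names and the sample numbers are invented and do not correspond to any domain in the pool.

```bash
# cap VALUE MAX -> VALUE limited to the range 0..MAX.
cap() {
  if [ "$1" -lt 0 ]; then echo 0
  elif [ "$1" -gt "$2" ]; then echo "$2"
  else echo "$1"
  fi
}

# total_score HISTORY NAME RELEVANCE VALUE COST LEGAL -> score out of 100.
# Caps follow the Weight column: 25, 25, 20, 15, 10, 5.
total_score() {
  h=$(cap "$1" 25); n=$(cap "$2" 25); r=$(cap "$3" 20)
  v=$(cap "$4" 15); c=$(cap "$5" 10); l=$(cap "$6" 5)
  echo $(( h + n + r + v + c + l ))
}

total_score 16 20 15 10 8 4   # made-up sub-scores
```

Capping each category keeps a single strong signal, such as a very old registration date, from pushing a domain past the weight the table assigns to that category.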

### History / reputation scoring

| Signal | Score |
|---|---:|
| Clean positive historical website | 20–25 |
| Blank slate, no bad history | 14–18 |
| Parked only, no bad history | 10–14 |
| Mixed or unclear history | 0–8 |
| Spam/adult/pharma/gambling/malware history | Reject |

Examples:

- `askakorean.com`: strong clean history, but poor email-host fit and prior identity residue.
- `encaustics.com`: legitimate art-site history; clean but not email-relevant.
- `multipeers.com`: clean/blank plus old and relevant.

### Name appeal scoring

High score:

- real words
- obvious two-word compounds
- clean infrastructure/product feel
- short and pronounceable
- no sketchy substrings

Low score:

- random sequences
- awkward generated names
- pinyin-like strings with no target-market relevance
- suggestive or spammy commercial terms

### Email-host relevance scoring

Strong terms:

- `peer`, `peers`, `multi`
- `talk`, `memo`, `message`, `send`, `post`, `relay`
- `webhook`, `web`, `hub`, `cloud`, `node`, `link`, `signal`, `sync`
- `reliable`, `secure`, `trust`, `safe`
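
As an illustration, the strong terms can be detected with a substring scan of the name root. How hits are converted into the 20 available points is not documented in these notes, so the sketch only lists the matches; `relevance_hits` is a hypothetical name.

```bash
# Terms copied from the list above.
STRONG_TERMS="peer peers multi talk memo message send post relay webhook web hub cloud node link signal sync reliable secure trust safe"

# relevance_hits DOMAIN -> prints the strong terms contained in the name root,
# in list order, separated by spaces (empty line if none match).
relevance_hits() {
  root="$(printf '%s' "${1%%.*}" | tr 'A-Z' 'a-z')"
  hits=""
  for term in $STRONG_TERMS; do
    case "$root" in
      *"$term"*) hits="$hits $term" ;;
    esac
  done
  echo "${hits# }"
}

relevance_hits reliablewebhook.com
relevance_hits encaustics.com
```

Overlapping terms are reported separately (`web` inside `webhook`, `peer` inside `peers`), so a points mapping built on this output would need to decide whether such pairs count once or twice.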

## Latest re-ranked top domains

These are from the 3+ year `.com` clean pool, scored using the 100-point framework above.

| Rank | Domain | Score | Total first-year cost | Expires | Age | Note |
|---:|---|---:|---:|---|---:|---|
| 1 | `multipeers.com` | 79 | $20.48 | 2026-05-16 | 20y | Best balance: network/email concept, old, cheap, clean. |
| 2 | `captaintalks.com` | 78 | $20.48 | 2026-05-10 | 6y | Strong communication wording. |
| 3 | `opensnip.com` | 72 | $20.48 | 2026-05-12 | 3y | Clean, modern, open/code/message feel. |
| 4 | `clydeauto.com` | 69 | $23.48 | 2026-05-02 | 12y | Positive clean site history, but auto-specific. |
| 5 | `asfans.com` | 66 | $20.48 | 2026-05-10 | 22y | Short, clean history, less email-specific. |
| 6 | `weddingmemo.com` | 66 | $20.48 | 2026-05-06 | 14y | Real words, clean, memo relevance. |
| 7 | `waldorfhub.com` | 66 | $23.48 | 2026-05-02 | 5y | Hub/community/inbox feel. |
| 8 | `basicsai.com` | 65 | $23.48 | 2026-05-02 | 3y | AI-relevant, clean, modern. |
| 9 | `iotcure.com` | 65 | $23.48 | 2026-05-02 | 6y | Tech feel, but health/IoT niche. |
| 10 | `reliablewebhook.com` | 65 | $23.48 | 2026-05-02 | 4y | Best literal infrastructure name, but long/newer. |
| 11 | `zebrarobot.com` | 64 | $20.48 | 2026-05-05 | 6y | Memorable tech brand feel. |
| 12 | `encaustics.com` | 61 | $20.48 | 2026-05-13 | 23y | Strong clean art-site history, weak email fit. |
| 13 | `idareweb.com` | 61 | $23.48 | 2026-05-02 | 3y | Web-related, slightly awkward. |
| 14 | `limewild.com` | 59 | $23.48 | 2026-05-02 | 4y | Brandable, clean, not email-specific. |
| 15 | `luxurysaver.com` | 56 | $20.48 | 2026-05-03 | 14y | Real words, deal-site feel. |
| 16 | `artselor.com` | 53 | $20.48 | 2026-05-13 | 3y | Art-ish, weak second half. |
| 17 | `chaintechs.com` | 53 | $23.48 | 2026-05-02 | 3y | Tech-relevant, decent. |
| 18 | `globalshopgt.com` | 53 | $23.48 | 2026-05-02 | 4y | Commerce feel, not email-specific. |
| 19 | `benedorm.com` | 52 | $23.48 | 2026-05-02 | 13y | Brandable/place-name feel. |
| 20 | `mallkun.com` | 51 | $20.48 | 2026-05-09 | 26y | Old and short, but less clear meaning. |
| 21 | `aiutonomous.com` | 50 | $23.48 | 2026-05-02 | 3y | AI/autonomous pun, slightly awkward. |
| 22 | `askakorean.com` | 50 | $23.48 | 2026-05-02 | 25y | Excellent history, poor email-host fit, prior identity. |
| 23 | `brainafy.com` | 50 | $23.48 | 2026-05-02 | 6y | Brandable, but `-afy` feels startup-generic. |
| 24 | `canutex.com` | 49 | $23.48 | 2026-05-02 | 15y | Short, neutral, not meaningful. |
| 25 | `diabinha.com` | 49 | $23.48 | 2026-05-02 | 20y | Real foreign word, not email-relevant. |
| 26 | `yizec.com` | 47 | $20.48 | 2026-05-09 | 18y | Short blank slate, but invented/random feel. |
| 27 | `flitevents.com` | 47 | $23.48 | 2026-05-02 | 7y | Event niche. |
| 28 | `jazzebel.com` | 47 | $23.48 | 2026-05-02 | 3y | Brandable, slightly odd. |
| 29 | `smartafy.com` | 47 | $23.48 | 2026-05-02 | 6y | Smartify-like, `-afy` penalty. |
| 30 | `ibluprint.com` | 45 | $20.48 | 2026-05-02 | 10y | Blueprint misspelling, acceptable but not ideal. |

## Recommended shortlist

Best overall:

1. `multipeers.com`
2. `captaintalks.com`
3. `opensnip.com`
4. `reliablewebhook.com`
5. `waldorfhub.com`
6. `zebrarobot.com`
7. `basicsai.com`
8. `weddingmemo.com`
9. `encaustics.com`

Best pure value: `multipeers.com`

Best technical/infrastructure name: `reliablewebhook.com`

Best cheap modern brand: `opensnip.com`

## LinkedIn company match check

After scoring, the top candidates are cross-checked against LinkedIn company pages using Brave Search. This detects whether a domain has (or is very close to) an existing brand presence.

| Match level | Signal | Score impact |
|---|---|---|
| **Exact** | LinkedIn company slug or title matches the domain root exactly, and the profile features the domain. | **+3 bonus** |
| **Strong** (typo/collision) | Close phonetic or spelling match to an existing LinkedIn company. | **−4 penalty** (collision risk) |
| **None** | No relevant LinkedIn company found. | **0** |
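
The score impact column maps directly onto a small adjustment step. A sketch, assuming the match level arrives as one of the strings `exact`, `strong`, or `none` (the function name and the level strings are hypothetical):

```bash
# linkedin_adjust SCORE LEVEL -> adjusted score.
# LEVEL: exact (+3 bonus), strong (-4 collision penalty), anything else (0).
linkedin_adjust() {
  case "$2" in
    exact)  echo $(( $1 + 3 )) ;;
    strong) echo $(( $1 - 4 )) ;;
    *)      echo "$1" ;;
  esac
}

linkedin_adjust 60 exact
linkedin_adjust 60 strong
linkedin_adjust 60 none
```

The notes do not say whether an adjusted score is clamped to the 0 to 100 range, so this sketch leaves the result unclamped.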

### Exact-match examples found in the pool

| Domain | LinkedIn company | Status |
|---|---|---|
| `xpenseco.com` | **XpenseCo** (exact) | ✅ Confirmed |

### Collision-risk examples

| Domain | Close match | Risk |
|---|---|---|
| `blilibili.com` | bilibili | typo/trademark |
| `untuckiy.com` | UNTUCKit | typo/trademark |
| `winzhi.com` | WinZip | typo/trademark |
| `limewild.com` | LimeWire | typo/trademark |
| `jazzebel.com` | Jezebel | typo/trademark |
| `ahgree.com` | Agree.com | typo/trademark |
| `smartafy.com` | Smartify | typo/trademark |
| `brainafy.com` | Brainify | typo/trademark |
| `chaintechs.com` | Chaintech Technology Corp | close match, different TLD |

### Cache

All LinkedIn Brave Search queries are cached in `linkedin_cache/<root>_linkedin.txt`.

This means:

- Re-runs do **not** hit the API again for already-queried domains.
- You can safely run the pipeline multiple times without exhausting Brave Search quota.
- On first run, queries are spaced with a polite `sleep 2` to avoid rate limits.
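
The cache behaviour described above amounts to a read-through lookup. The sketch below is an assumption about how `linkedin_check.sh` might do it: `brave_linkedin_query` is a stand-in for the real Brave Search call, and `QUERY_DELAY` is a hypothetical knob for the `sleep 2` spacing.

```bash
LINKEDIN_CACHE="${LINKEDIN_CACHE:-./linkedin_cache}"
QUERY_DELAY="${QUERY_DELAY:-2}"     # hypothetical; the notes mention `sleep 2`

# Hypothetical stand-in for the real Brave Search call.
brave_linkedin_query() {
  echo "placeholder result for $1"
}

# linkedin_lookup ROOT -> prints cached results, querying only on a cache miss.
linkedin_lookup() {
  file="$LINKEDIN_CACHE/${1}_linkedin.txt"
  if [ ! -s "$file" ]; then          # missing or empty file counts as a miss
    mkdir -p "$LINKEDIN_CACHE"
    brave_linkedin_query "$1" > "$file"
    sleep "$QUERY_DELAY"             # spacing applies to real queries only
  fi
  cat "$file"
}

# Usage: linkedin_lookup xpenseco
```

Testing with `-s` instead of `-f` means an empty file left behind by a failed query is retried on the next run rather than being served as a cached "no results" answer.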

## Pipeline files

| File | Purpose |
|---|---|
| `devenv.sh` | Verify all system dependencies (bash, awk, perl, dig, curl, sort, grep). |
| `process_domains.sh` | Full pipeline: filter → MX check → score → LinkedIn match → output CSV. |
| `linkedin_check.sh` | Helper: queries Brave Search for LinkedIn company pages, uses cache. |
| `domain_scoring.md` | This document. |

### Running the pipeline

```bash
# 1. Verify environment
./devenv.sh

# 2. Run against local CSV
./process_domains.sh

# 3. Or fetch updated CSV from a URL
./process_domains.sh https://example.com/Namecheap_Market_Sales.csv
```

### Pipeline phases

1. **Download / use local CSV**
2. **Hard-filter** (age, price, TLD, digits/hyphens, bad words, random strings)
3. **Parallel MX check** — reject domains with disposable/bulk email MX
4. **Score** (100-point framework)
5. **Post-filter** — remove obvious random/pinyin/gibberish after scoring
6. **LinkedIn company match** (cached, top 100 scored candidates)
7. **Output** — `domain_results_YYYYMMDD_HHMMSS.csv`
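
The hand-off from phase 4 to phase 6 (only the top-scoring candidates are LinkedIn-checked) can be sketched with `sort` and `head`. The `domain,score` row layout and the function name are assumptions; the sample rows reuse scores from the ranked table.

```bash
LINKEDIN_LIMIT="${LINKEDIN_LIMIT:-100}"

# top_candidates [N] -> reads "domain,score" rows on stdin and prints the
# N highest-scoring rows, best first (N defaults to LINKEDIN_LIMIT).
top_candidates() {
  sort -t, -k2,2nr | head -n "${1:-$LINKEDIN_LIMIT}"
}

printf '%s\n' 'opensnip.com,72' 'multipeers.com,79' 'captaintalks.com,78' |
  top_candidates 2
```

Limiting the expensive lookup to the head of the sorted list is what keeps the Brave Search quota bounded regardless of how many rows survive the earlier filters.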

### Environment variables

| Variable | Default | Purpose |
|---|---|---|
| `CUTOFF_YEARS` | `3` | Minimum domain age |
| `TMPDIR` | `/tmp` | Working temp space |
| `LINKEDIN_LIMIT` | `100` | How many top candidates to LinkedIn-check |
| `LINKEDIN_CACHE` | `./linkedin_cache` | Cache directory for LinkedIn queries |
| `BRAVE_SEARCH` | `~/.pi/agent/skills/local-pi-skills/brave-search` | Brave Search skill path |
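
One common way for a script to honour such variables is default-value parameter expansion. Whether `process_domains.sh` does exactly this is an assumption; `load_settings` is a hypothetical name.

```bash
# load_settings -> fills in the defaults from the table above, keeping any
# value that is already set in the environment.
load_settings() {
  CUTOFF_YEARS="${CUTOFF_YEARS:-3}"
  TMPDIR="${TMPDIR:-/tmp}"
  LINKEDIN_LIMIT="${LINKEDIN_LIMIT:-100}"
  LINKEDIN_CACHE="${LINKEDIN_CACHE:-./linkedin_cache}"
  # "~" does not expand inside quotes, so $HOME is spelled out.
  BRAVE_SEARCH="${BRAVE_SEARCH:-$HOME/.pi/agent/skills/local-pi-skills/brave-search}"
}

load_settings
echo "age>=${CUTOFF_YEARS}y limit=${LINKEDIN_LIMIT} cache=${LINKEDIN_CACHE}"

# Per-run override without editing the script, for example:
#   CUTOFF_YEARS=5 LINKEDIN_LIMIT=25 ./process_domains.sh
```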

## Provenance of the current pool

| Metric | Value |
|---|---:|
| Total `.com` under $10, 3+ years | 7,034 |
| After MX clean check | ~165–227 |
| After random/gibberish post-filter | ~50–70 |
| After LinkedIn matching (top 100) | 0–1 exact matches typically |

## Caveats

- This is a heuristic scoring model, not a paid historical reputation audit.
- DNSBL queries can be blocked or rate-limited from non-mailserver networks.
- Historical DNS and abuse data would be stronger with paid APIs such as SecurityTrails, DomainTools, VirusTotal, Cisco Talos, or Google Safe Browsing.
- Expiring domains may change availability or price quickly.
- LinkedIn search results depend on Brave Search indexing and may miss newly created or private company profiles.
- Exact LinkedIn matches are rare in heavily filtered bargain-domain pools because most clean, cheap names are either invented brands or were previously only parked.

