SearchLoop · Field Guide

Build a proprietary acquisition screen from raw data.

How to find the founder-owned companies Grata and Inven can't — without a $15–40k/yr seat. A worked playbook for partners, searchers, and independent sponsors, built with Claude Code.

By Russell Taylor — ex–Credit Suisse, Greenhill & Treis. Now SearchLoop.

Book a Dealflow Diagnostic Download the Starter Kit ↓

Walkthrough — coming soon ~8 min · Russell builds the screen live

The whole build, start to finish — sources, the test loop, and the funnel from 24,000 records to ~80 names.

The problem with renting a database

The expensive tools are opaque, they lock you in — and in a fragmented market they can't even give you a clean list.

Start with the part nobody at the database companies will say out loud: in a fragmented market like dental, the expensive tools can't even hand you a clean list.

The same practice shows up three times — under the dentist's own name, a DBA, and a stale duplicate. Ownership hides behind holdcos and management-services organizations. And half the single-location practices you actually want never made it into the firmographic feed at all. You're paying $15–40k a year, per seat, for a deduplication problem you then have to solve yourself.

There are three problems here, in order of how much they should bother you.

01 · Price & theatre

You book a call to see a price

Pricing is quote-based, demo-gated, and per-seat. Public estimates put entry around $15k, climbing past $40k with seats, data export, and API add-ons. The tell isn't the number — it's that you have to take a sales call to learn it. The price is whatever they think you'll pay.

02 · Lock-in

You own nothing at renewal

Annual contract, per seat. The list isn't yours, the enrichment isn't yours, and the day you stop paying your "pipeline" disappears. You were renting a view — not building an asset.

03 · The blind spot

Worst at the deals you want

These platforms aggregate digital exhaust — websites, funding, news, LinkedIn. The two-truck HVAC firm and the solo practice with a Wix site and no press are exactly the proprietary targets — and exactly what the feed under-indexes.

A database you share with every other fund is, by definition, not proprietary origination. You're all querying the same index and emailing the same top results.

The good news: Grata and Inven aren't magic. They're a clean UI and a scoring layer on top of public and semi-public data — most of it free, and some of it more authoritative than anything they resell. You can assemble the precise slice you need, own it outright, and tune it to your thesis instead of theirs.

	Rented seat (Grata / Inven)	Owned screen (this guide)
Cost	$15–40k+/yr, per seat, quote-based	~$50–300 in API calls per pull
Price transparency	Book a call to see it	Every line item visible
Who else has it	Every other fund	Only you
Founder-owned coverage	Weak — under-indexes off-grid SMBs	Built from the index they live in
At renewal	Access ends; you keep nothing	A database & scripts you own
Fit to your thesis	Their filters	Your gates, your weights
Explainability	Black-box relevance	Every score carries its reasons

Raw sources of truth

Every company is registered somewhere before it ever appears in a database product. Find that index and you're upstream of the aggregators.

There's a system of record — a place a business has to exist to operate. Work from it directly and you're using the same raw material Grata buys, normalizes, and rents back to you. Three sources cover most of the lower middle market.

Spine for trades

Google Maps

The census of Main Street. Every local, physical business has a listing because that's how customers find them — name, address, phone, website, category, rating, review count. More complete than any firmographic feed, because being listed is existential, not marketing.

Spine for healthcare

The NPI registry (NPPES)

The federal enumeration of every US healthcare provider. Free, bulk-downloadable as CSV, a public no-auth API, and taxonomy-coded — so you isolate "general-practice dentists in Texas" with a code, not a guess. 7M+ active records, kept current by law.

Spine for the UK

Companies House

The unfair advantage for UK targets. Free API with company profiles, officers, and the PSC register (owners >25%). Accounts are free to download, ~60% as structured XBRL you can parse for revenue and headcount. Ownership and financials, one source.

The pattern across all three: one source is your spine — the index — and the others become enrichment and corroboration. In the worked example below, NPPES is the spine; Google Maps, the practice website, and reviews are the enrichment.

NPPES pullfree · current · yours

# Bulk: the full FOIA-disclosable file, updated monthly (+ weekly deltas)
https://download.cms.gov/nppes/NPI_Files.html

# Or the public API — no auth, 200 records/page:
https://npiregistry.cms.hhs.gov/api/?version=2.1&taxonomy_description=Dentist&state=TX&limit=200&skip=0

# Dental taxonomy codes you'll filter on (NUCC):
122300000X  Dentist (grouping)      1223X0400X  Orthodontics
1223G0001X  General Practice        1223E0200X  Endodontics
1223P0221X  Pediatric Dentistry     1223P0300X  Periodontics

Tell Claude "go to the NPI registry" — and here's the one thing that trips everyone up: the API caps at 1,200 results per query and filters by taxonomy description, so for a whole state you pull the bulk file and match your codes across all 15 taxonomy columns, not just the first. The downloadable kit's NPI playbook covers the rest — skipping deactivated records, using the practice (not mailing) address, and the decision-maker that org records hand you for free.

The toolkit

Seven tools. You don't operate most of them — Claude Code does. Your job is to read what it writes and tell it when it's wrong.

Tool	What it does	Cost
Claude Code	The operator. Reads your thesis, writes the Python, runs the pulls, iterates in plain English. You direct and review — you don't code.	—
Apify	Google Maps scrape (compass/crawler-google-places) — turns a search into structured rows with rating and review count.	~$0.004/listing
Firecrawl	The web reader. Returns a company site as clean markdown so a model can read the About page without choking on markup.	free tier
Tavily	Research-grade search. Have a name but no website? It finds the site, the LinkedIn, the local-news mention.	pay-as-you-go
OpenRouter	One key, every model. Route cheap classification to Gemini Flash; escalate hard judgment to a frontier model.	per token
Supabase	Where the screen lives once it outgrows a JSON file — Postgres + API + auth. Also what a client-facing dashboard runs on.	free tier
A design plugin	huashu-design / frontend-design in Claude Code — so the screen looks like a product, not a generated table. Anti-AI-slop.	free

All-in cost to stand this up: the keys are pay-as-you-go, and a single vertical pull lands in the low hundreds — $50–300 depending on how deep you enrich. Two orders of magnitude under a seat.

Architecture — so it doesn't hallucinate

Accuracy is the entire game. A screen that's 95% right means opening a call by congratulating someone on a practice they sold three years ago.

One bad row doesn't cost you one deal — it costs you your credibility with that buyer for every deal after it. The pipeline is six stages, each one inspectable and idempotent:

01 pull

NPPEStaxonomy + state

02 normalize

Dedupeto practices

03 enrich

Web + reviewsread the exhaust

04 corroborate

2-sourceclaim vs fact

05 score

2-stagecheap → smart

06 store

Supabase+ dashboard

Pull 20 → eyeball against ground truth → refine → only then scale.

Pull the first 20 records. For each, establish the truth from outside your own system — open Google, load the website, look at the listed phone. If 1 of 20 is wrong, that's a 5% error rate, and at 24,000 records it's 1,200 wrong rows. Fix the prompt now, while it's 20 rows. You re-run this loop every time you change a step, and you never trust a pull you haven't ground-truthed.

What you actually typeplain English

"Pull every dentist NPI in Texas from the NPPES API — taxonomies
1223G0001X and 1223P0221X — and write them to a JSON file. Then
take the first 20, look each one up on Google, and show me every
row where our data disagrees with reality before we scale."

Four hard rules enforce the discipline. Put them in your project's CLAUDE.md so Claude Code follows them every session:

Ground-truth before you trust

Establish truth from outside the system on a 20-row sample before scaling any step.

Two sources, or it's a claim

Owner, ownership, location count, revenue — require two independent sources, or store it as a claim, not a fact. Zero sources: the field stays blank.

Thin scrape = no classification

Under ~200 characters back from Firecrawl means the site blocked you. Don't ask the model what they do — it'll invent. Mark needs_review.

Temperature 0, log the evidence

Every classification runs deterministic, and stores the raw text it was based on — so you can audit any row back to its source.

Enrichment — reading signal in the noise

The signal that matters to an acquirer is almost never in the structured fields. It's in the unstructured exhaust around the business.

The website

Services and specialty mix (implants + ortho + sedation = bigger). Provider count — your cheapest size proxy. The About page. And a "Locations" dropdown or "now part of —" banner is a consolidation tell.

The reviews

Count is a volume proxy; rating is quality; velocity tells you growing vs. winding down. And the text is gold: "Dr. Alvarez has been my dentist for 22 years" gives tenure, owner identity, and succession risk in one line.

Ownership

Shared branding, a corporate footer, one phone routing several "locations," "part of the [X] family of practices" — these mark DSO/PE assets. A disqualifier, or a comp. Either way you tag it, with evidence.

Stack the signals and a specific shape emerges — the thing you're actually hunting:

Single location · established ~20–30 years ago · owner's surname behind the practice name · no DSO branding · one phone · a founder who keeps appearing in decade-old reviews.

That is the founder-owned, no-obvious-successor practice a DSO or a search fund most wants to buy — and the exact profile Grata is least likely to surface, because none of those signals live in a firmographic feed. You're not finding companies; you're finding situations. Every enrichment is a model call over corroborated raw text — never a guess.

One enriched recordillustrative

{
  "practice": "Lakeside Family Dentistry",
  "location": "Round Rock, TX",
  "taxonomy": "1223G0001X · General Practice",
  "locations": 1,   "providers_listed": 2,   "established": 2002,
  "google_reviews": 418,  "rating": 4.8,
  "ownership": "Independent",
  "ownership_evidence": "single phone; no DSO footer; surname branding",
  "owner_signal": "Dr. Karen Vogel named in 31 reviews; 'my dentist 19 yrs'",
  "succession_flag": true,
  "sources": ["nppes", "google_maps", "practice_website"]
}
// Note what's NOT here: no invented revenue, no guessed email, no owner age.
// If two sources didn't support it, it isn't in the record.

Scoring & the two-stage funnel

You don't run a frontier model over 24,000 records. You filter cheap, then spend real money only on the survivors — and let the funnel do the work.

Stage 1 — the cheap filter. A fast model (Gemini Flash) runs over the structured fields and kills the obvious no's: wrong taxonomy, wrong geography, clearly multi-location, obvious DSO, below a volume floor. Nearly free at this volume. Stage 2 — enrich & qualify the survivors. You spend real money only on the few thousand that matter — the deeper enrichment, with a frontier model reserved for the hard judgment calls (is this PE-owned? who owns it?). Qualification itself stays deterministic: an operator-tunable gate model, not a black-box AI score — every result carries the gates it passed and the evidence behind them.

Gate	Passes when	Source
Specialty fit	Taxonomy matches thesis (e.g. GP + pediatric)	NPPES
Geography	Inside target metro / state	NPPES + Maps
Size	Provider count / review volume in band	Web + reviews
Ownership	Independent, not DSO/PE-held	Corroborated

The one rule that matters here: "unknown" is not "fail." An unknown gate is a research task, not a rejection — it routes back for another enrichment pass. Treating unknowns as fails is how you silently delete your best, hardest-to-read targets — precisely the off-grid ones you came for.

Figures below are illustrative — representative magnitudes for a state-wide pull, not a delivered count.

Raw NPPES pull24,000

Dental taxonomies, TX — Type 1 + Type 2 NPIs

Collapse to distinct practices9,200

Dedupe individuals → practices; drop inactive

Stage 1 — cheap filter2,600

Single-location, independent, minimum volume

Stage 2 — enrich + qualify survivors310

Acquirability above cutoff

Human review of the top tier~80

Worth a personal, partner-led approach

24,000 → 80

Illustrative magnitudes for a state-wide pull — the shape is the point. Every name's score is a transparent roll-up of the gates it passed, not a model's guess, so you can defend each one in an IC meeting instead of a relevance number you can't explain.

Acquirability is a transparent weighted roll-up of the gates — not a model score. Illustrative.

Practice	City	Loc.	Est.	Reviews	Acq.	Why
Lakeside Family Dentistry	Round Rock	1	2002	418	91	Independent, single-site, strong succession signal
Hill Country Dental Care	New Braunfels	1	1998	263	88	Long-tenured owner, no successor named
Brushy Creek Smiles	Cedar Park	2	2009	540	64	Two sites, younger owner, lower urgency
Capital Dental Group	Austin	6	2014	1,910	22	Multi-site, DSO branding — comp, not target

And because the thresholds are knobs, the screen is yours to re-tune. Your size band, your geographies, your weighting of succession versus scale — move the gates and the list re-ranks. A rented database gives you their filters. This gives you your thesis, expressed as a screen.

Where I stop

I just handed you the screen. I'll be just as direct about what I didn't hand you, and why.

The screen is the commoditizable part. Known sources, scripted pulls, a scoring funnel. It takes discipline — mostly the testing discipline above — but anyone serious can build it from this guide. So I give it away. It also makes the case for what I do better than any pitch deck could.

What I don't give away is turning that screen into booked calls — because that's where the actual difficulty, and the judgment, lives:

Copy

The reply you have to earn

Getting a skeptical 62-year-old founder who's been cold-pitched by a dozen DSOs to reply to you is not a template. It's voice, segment-specific framing, and a few-shot library built from messages that have actually worked.

Deliverability

Where a perfect list dies

Private IPs, domain warmup, inbox placement. Get this wrong and your 80-name list lands in spam — and you never find out it happened.

Orchestration

A system, not a script

Email, LinkedIn, and — where it fits — WhatsApp, sequenced, each prospect locked to one sender, replies routed and handled. Then the feedback loop sharpens both the copy and the screen.

The data is the cost of entry. The conversion is the moat.

I'm comfortable giving you the data layer because I've built the conversion layer enough times to know that's where the real work — and the real edge — is.

KIT

Take the starter kit

The keepable toolkit to build the screen yourself — plain text you own. Free, no email.

A ready-to-run folder: a setup guide for the whole toolbox (the CLIs — Supabase, Vercel, Playwright — a .env key template for OpenRouter/Apify/Firecrawl/Tavily, and a ready Apify MCP config), the CLAUDE.md with the four hard rules that keep it accurate, a thesis + scoring template (your tunable gates), a copy-paste Day-1 bootstrap prompt that scaffolds the pipeline in Claude Code, the DESIGN.md behind this page, and a fictional sample dataset.

Download the Starter Kit ↓ Or have it built for you

The screen is the free part

Want the whole origination engine?

The screen above, plus the outreach that turns it into a live, replying pipeline — done-for-you, tuned to your thesis, run as a system. I'm an ex-investor, so I build origination the way a deal team actually uses it, not the way a software vendor imagines you might.

Book a Dealflow Diagnostic See example screens

Russell Taylor — SearchLoop rt@searchloop.ai searchloop.ai