The OSINT Methodology Framework: 4 Phases of Every Investigation

Planning, collection, analysis, reporting. The OSINT methodology framework that turns ad-hoc searching into reproducible investigations.

The difference between a Twitter thread and a publishable investigation is methodology. Investigators who produce consistent, defensible findings follow the same four-phase framework — planning, collection, analysis, reporting — whether the subject is a sanctioned oligarch or a local zoning scandal.

This post walks through each phase, the deliverables it should produce, and the mistakes that most often break an investigation.

Phase 1: Planning

Planning translates a vague interest into a specific question. See /methodology/planning/ for the full framework.

A good investigative question is falsifiable. "Is this shell company connected to Person X?" is falsifiable. "Tell me about this shell company" is not.

The planning phase should produce:

  • A primary question and two or three subquestions
  • A list of sources you plan to hit, ranked by expected yield
  • A legal and ethical review: what jurisdictions apply, what terms of service constrain you, what personal data triggers GDPR or similar regimes
  • A time budget

Investigators skip planning because it feels like paperwork. They pay for it later when they have 400 screenshots and no thesis.

Phase 2: Collection

Collection is the phase most people associate with OSINT. See /methodology/collection/.

Good collection is:

  • Sourced — every artifact has a URL, timestamp, and retrieval method recorded.
  • Preserved — pages archived to the Wayback Machine, PDFs hashed, screenshots captured with visible URL bars.
  • Bounded — you collect against the plan, not against whatever is interesting.

A minimal collection log looks like this:

2026-03-14T15:22Z | https://opencorporates.com/companies/us_de/7412389 | archive.org/web/20260314152200/... | SHA-256: 9f4e... | notes: officer list page 1

If you cannot reproduce a finding from your log alone, your collection is not sufficient. See /tools/metadata/ for artifact-level forensics.
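
As a rough illustration of how a log line like the one above could be generated, here is a minimal Python sketch. The function name `log_entry`, the example URLs, and the page body are hypothetical; the field order simply mirrors the sample entry. It assumes you already hold the raw bytes of the saved artifact and the URL of its archived copy.

```python
import hashlib
from datetime import datetime, timezone


def log_entry(source_url: str, archive_url: str, content: bytes,
              notes: str, retrieved_at: datetime) -> str:
    """Format one pipe-delimited collection log line for a saved artifact."""
    # Hash the exact bytes that were saved so the artifact can be re-verified later.
    digest = hashlib.sha256(content).hexdigest()
    # Record the retrieval time in UTC, to the minute, with a trailing Z.
    stamp = retrieved_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    return " | ".join([
        stamp,
        source_url,
        archive_url,
        f"SHA-256: {digest}",
        f"notes: {notes}",
    ])


# Hypothetical values, for illustration only.
page_bytes = b"example page body"
line = log_entry(
    source_url="https://example.com/registry/123",
    archive_url="https://web.archive.org/web/20260314152200/https://example.com/registry/123",
    content=page_bytes,
    notes="officer list page 1",
    retrieved_at=datetime(2026, 3, 14, 15, 22, tzinfo=timezone.utc),
)
print(line)
```

Hashing the saved bytes rather than re-downloading the page matters: the live page can change, but the hash in the log should always match the file in your evidence folder.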

Common Sources by Domain

Domain | Primary sources
Corporate | OpenCorporates, state registries, SEC EDGAR, UK Companies House
Journalism | Local court records, FOIA/FOI portals, beat reporting archives
Financial | OFAC sanctions, FinCEN advisories, court dockets, property records
Digital | WHOIS, DNS, Certificate Transparency logs, Shodan
Geospatial | Sentinel Hub, Planet, Google Earth Pro, Mapillary

Phase 3: Analysis

Analysis is where entities become relationships and data becomes claims. See /methodology/analysis/.

This phase has three sub-tasks:

  1. Entity resolution — is "John A. Smith" in the Delaware filing the same "John Smith" in the Florida property record? Resolve using dates of birth, middle initials, addresses, and corroborating documents.
  2. Link analysis — map the relationships. Maltego is the standard for visualization; a spreadsheet with source/target/relationship columns works for smaller cases.
  3. Contradiction hunting — actively look for evidence that falsifies your working hypothesis. Investigators who only confirm are writing advocacy, not investigation.
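
For the spreadsheet approach to link analysis, a small sketch like the following may help. It assumes the edge list is exported as CSV with source, target, and relationship columns; the entity names and the helper names `build_graph` and `shared_neighbors` are hypothetical. It treats links as undirected and looks for nodes that two entities have in common, such as a shared registered address.

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge list; in practice this would be the exported spreadsheet.
EDGES_CSV = """source,target,relationship
John A. Smith,Acme Holdings LLC,officer
Acme Holdings LLC,123 Main St,registered_address
Beta Trading Ltd,123 Main St,registered_address
"""


def build_graph(csv_text: str) -> dict[str, set[str]]:
    """Build an undirected adjacency map from source/target/relationship rows."""
    graph: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        graph[row["source"]].add(row["target"])
        graph[row["target"]].add(row["source"])
    return graph


def shared_neighbors(graph: dict[str, set[str]], a: str, b: str) -> set[str]:
    """Return the nodes directly linked to both a and b."""
    return graph.get(a, set()) & graph.get(b, set())


graph = build_graph(EDGES_CSV)
print(shared_neighbors(graph, "Acme Holdings LLC", "Beta Trading Ltd"))
```

A shared neighbor is a lead, not a finding: a common address may be a registered-agent office used by thousands of companies, which is exactly the kind of contradiction the third sub-task is meant to catch.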

The output of analysis is a set of claims, each with a confidence level (confirmed, probable, possible, speculative) and a list of supporting artifacts.
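
One way to keep claims, confidence levels, and artifacts together is a small record type. The sketch below is an assumption about how that could look, not a prescribed format: the `Claim` class, the artifact IDs such as "log-0012", and the `unsupported` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    CONFIRMED = "confirmed"
    PROBABLE = "probable"
    POSSIBLE = "possible"
    SPECULATIVE = "speculative"


@dataclass
class Claim:
    statement: str
    confidence: Confidence
    # IDs of collection log entries that support the claim (hypothetical ID scheme).
    artifacts: list[str] = field(default_factory=list)


def unsupported(claims: list[Claim]) -> list[Claim]:
    """Return the claims that cite no supporting artifact at all."""
    return [c for c in claims if not c.artifacts]


# Hypothetical claims, for illustration only.
claims = [
    Claim("Person X is an officer of Company Y", Confidence.CONFIRMED, ["log-0012", "log-0015"]),
    Claim("Company Y shares an address with Company Z", Confidence.PROBABLE, ["log-0021"]),
    Claim("Company Z is controlled by Person X", Confidence.SPECULATIVE),
]

for claim in unsupported(claims):
    print(f"No artifacts cited: {claim.statement} ({claim.confidence.value})")
```

Running a check like this before drafting makes it harder for an uncited claim to slip into the report.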

Phase 4: Reporting

Reporting is the phase amateurs neglect and professionals obsess over. See /methodology/reporting/.

A report should:

  • State the question and scope up front
  • Separate findings from analysis from speculation
  • Cite every claim to a specific artifact in the collection log
  • Disclose limitations — what you could not verify, what sources refused to respond
  • Name contributors and date the report

The reason for rigor is not just credibility. A well-structured report survives legal review, makes peer review possible, and lets a reader reach their own conclusions. For legal-grade document review inside the reporting phase, many investigators use the Subthesis legal document analysis tool to triage large filing sets consistently.

Iteration

The framework is not strictly linear. Collection surfaces new questions that feed back into planning. Analysis reveals gaps that require more collection. What matters is that you can always say, at any moment, which phase you are in and what its current deliverable is.

Investigations without phase discipline drift. You end up with a folder of screenshots, a half-drafted Google Doc, and no idea what you were trying to prove.

A Worked Example

Say you want to investigate whether a named lobbyist registered a recent domain aligned with a dark-money campaign.

  • Planning: Question: did Lobbyist L register domain D within 60 days of PAC P's launch? Sources: WHOIS, historical WHOIS (DomainTools, WhoisXML), FEC filings, state lobbying registries.
  • Collection: Pull current WHOIS. Pull historical WHOIS snapshots. Archive both. Pull PAC filings from FEC. Log each with hash and timestamp.
  • Analysis: Compare registrant name, email, and address across records. Check date of registration against PAC formation. Look for contradicting evidence — did someone else use the same registrant address?
  • Reporting: Write a finding: "Domain D was registered on [date] by [registrant] whose email matches [source]. This is [N] days before PAC P's formation filing dated [date]. Confidence: confirmed. Limitations: registrant used privacy protection after [date], so we cannot track post-registration changes."
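
The date check in the analysis step is simple arithmetic, sketched below. The dates are invented for illustration, the function names are hypothetical, and the sketch assumes "within 60 days" means 60 days in either direction of the PAC's formation filing; adjust that if your question is one-sided.

```python
from datetime import date


def days_before(registration: date, formation: date) -> int:
    """Days from domain registration to PAC formation (negative if registered after)."""
    return (formation - registration).days


def within_window(registration: date, formation: date, window_days: int = 60) -> bool:
    """True if registration falls within window_days of formation, in either direction."""
    return abs(days_before(registration, formation)) <= window_days


# Hypothetical dates, for illustration only.
registered = date(2026, 1, 10)
pac_formed = date(2026, 2, 20)

print(days_before(registered, pac_formed))    # 41
print(within_window(registered, pac_formed))  # True
```

The signed result from `days_before` is what goes into the finding as [N]; the boolean only answers the planning question.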

The same four phases apply whether the investigation is a dark-money domain, a geolocation puzzle, or a beneficial-ownership trace.

The Framework Is the Point

Tools change. The framework does not. An investigator who internalizes planning → collection → analysis → reporting can pick up any new source or platform and slot it into the existing workflow. An investigator who learns tools without the framework will produce work that falls apart under scrutiny.

Read /methodology/ for the expanded treatment of each phase, and browse /case-studies/ to see the framework applied to real investigations.
