Building an OSINT Workflow: From Question to Report
A repeatable end-to-end OSINT workflow — templates, logs, and habits that turn individual skill into reliable output across investigations.
Individual OSINT skill is not the constraint for most investigators who stall. The constraint is workflow — the infrastructure that lets skill produce output consistently. This post lays out the end-to-end workflow this site recommends, with the templates and tools that support each step.
The Workflow at a Glance
Question → Plan → Collect → Analyze → Report → Archive
Each step has an input, an activity, an output, and a log. Discipline is keeping those four items explicit for every step.
Step 1: The Question
Every investigation starts with a falsifiable question. Not a topic, not a subject — a specific, testable claim.
Bad:
- "Investigate Meridian Holdings"
- "What's up with Sarah Lin"
Good:
- "Is Sarah Lin the beneficial owner of Meridian Holdings LLC?"
- "Did Meridian Holdings receive grants from PAC P between 2022 and 2024?"
The question determines which sources are relevant, what confidence level is required, and when the investigation is done. Without it, everything else drifts.
Template:
Investigation: [slug]
Primary question: [one sentence, testable]
Subquestions: [two or three]
Definition of done: [what evidence would conclude this?]
Definition of not-done: [what would require more work?]
Step 2: The Plan
Planning translates the question into a collection strategy. See /methodology/planning/.
Output:
Sources planned (ranked by expected yield):
1. SEC EDGAR full-text: "Meridian Holdings"
2. UK Companies House: Meridian-named entities
3. PACER: Sarah Lin as party
4. OpenCorporates: Meridian and Lin searches
5. State UCC filings: Meridian as debtor
Legal/ethics memo: [three paragraphs — public interest, subjects, safeguards]
Time budget: [hours allocated, by source]
Kill criteria: [when do we stop?]
Investigators who skip the memo produce work that falls apart when anyone asks questions about scope or proportionality. See /blog/osint-and-privacy-ethical-frameworks-for-responsible-investigation/.
Step 3: The Collection Log
The collection log is the single most important artifact in an investigation. It is append-only, timestamped, and captures every artifact retrieved.
Schema:
timestamp_utc | source_type | query_or_url | artifact_path | archive_url | sha256 | notes
Example row:
2026-04-07T09:14:00Z | SEC EDGAR | efts.sec.gov/.../q=Meridian | artifacts/edgar-1.pdf | web.archive.org/... | 9f4e2c1a... | Form D filing 2020-03-11
The log enables:
- Reproducibility (a peer can rerun the search)
- Auditability (a reviewer can verify findings)
- Recovery (your own memory in three weeks)
- Legal defense (chain of custody)
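The schema above is simple enough to maintain with a few lines of code. A minimal sketch of an append-only writer, assuming a pipe-delimited log file (the file name, field order, and helper names here are illustrative, not prescribed):

```python
# Append-only collection-log writer matching the Step 3 schema.
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

LOG_FIELDS = ["timestamp_utc", "source_type", "query_or_url",
              "artifact_path", "archive_url", "sha256", "notes"]

def sha256_of(path: str) -> str:
    """Hash an artifact file so the log row is verifiable later."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def log_capture(log_path: str, source_type: str, query: str,
                artifact_path: str = "", archive_url: str = "",
                notes: str = "") -> None:
    """Append one row; never rewrite earlier rows (append-only discipline)."""
    row = {
        "timestamp_utc": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source_type": source_type,
        "query_or_url": query,
        "artifact_path": artifact_path,
        "archive_url": archive_url,
        # Empty-result rows have no artifact and therefore no hash.
        "sha256": sha256_of(artifact_path) if artifact_path else "",
        "notes": notes,
    }
    is_new = not Path(log_path).exists()
    with open(log_path, "a", newline="") as f:
        w = csv.DictWriter(f, fieldnames=LOG_FIELDS, delimiter="|")
        if is_new:
            w.writeheader()
        w.writerow(row)
```

Because the hash is computed at log time, every row is independently checkable later, which is what makes the reproducibility and chain-of-custody claims credible.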
Step 4: Collection Discipline
Collection is not shopping. Rules:
- Every query against every source is logged, even if it returns nothing. "Empty result" is a finding.
- Every artifact of value is preserved (Wayback, local copy, hash). See /blog/preserving-digital-evidence-screenshots-archives-hashing/.
- Pivots are logged as pivots, with the source of the pivot identified. See /blog/the-art-of-pivoting-how-one-data-point-leads-to-the-next/.
- Out-of-scope findings are recorded in a parking lot, not pursued mid-investigation.
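The preservation rule can be made mechanical. A minimal sketch of a preserve-on-capture helper, under the assumption that you write raw bytes and hash them the moment you see them (pair this with a Wayback snapshot in practice; the directory layout and function name are illustrative):

```python
# Preserve-on-capture: write the raw bytes immediately and hash them,
# returning the fields the collection log needs for its row.
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def preserve(content: bytes, artifacts_dir: str, label: str) -> dict:
    """Save one artifact to disk; return its path and sha256 for the log."""
    Path(artifacts_dir).mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(content).hexdigest()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(artifacts_dir) / f"{stamp}-{label}"
    path.write_bytes(content)
    return {"artifact_path": str(path), "sha256": digest}
```

The point of the sketch is the ordering: hash and write happen in the same function call as the capture, so "archive later" never becomes an option.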
Step 5: The Analysis Matrix
Move from collection to analysis by translating artifacts into claims with confidence levels.
claim | supporting_artifacts | confidence | contradicting_evidence | open_questions
-----|------|------|------|------
Sarah Lin controls Meridian | edgar-1.pdf, pacer-12.pdf | probable | none found | did she control it in 2018?
Confidence levels:
- Confirmed — direct documentary evidence, multiple sources, no contradiction.
- Probable — strong indirect evidence, multiple signals aligning, no direct contradiction.
- Possible — one signal, plausible, not yet corroborated.
- Speculative — hypothesis, not evidence.
Reports must not present speculative claims as findings. If a speculative idea appears at all, it is labeled as such, with the reasoning exposed.
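The four levels form an ordered scale, which makes the "no unlabeled speculation" rule enforceable in code. A minimal sketch, assuming matrix rows are dicts with a `confidence` field (the names here are illustrative):

```python
# The four confidence levels as an ordered enum, plus a guard that keeps
# speculative rows out of the findings section of a report.
from enum import IntEnum

class Confidence(IntEnum):
    SPECULATIVE = 0
    POSSIBLE = 1
    PROBABLE = 2
    CONFIRMED = 3

def reportable(matrix_rows, floor=Confidence.POSSIBLE):
    """Rows at or above the floor are findings; the rest stay labeled hypotheses."""
    return [r for r in matrix_rows if r["confidence"] >= floor]
```

An ordered enum also lets you sort the report so the strongest findings lead.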
Step 6: The Report
The report is the output. It should be:
- Question-based — open with the question you investigated.
- Structured — findings separated from analysis separated from limitations.
- Cited — every factual claim cites a specific artifact in the collection log.
- Dated — publication date, and for factual claims, the "as of" date the source was captured.
- Bounded — what you did not find and what you could not verify is stated explicitly.
Template:
# [Investigation title]
## Question
[one paragraph]
## Finding 1: [short claim]
[body — what, how you know, confidence level]
Sources:
- [artifact 1, archived at URL, hash]
- [artifact 2, archived at URL, hash]
## Finding 2: ...
## Limitations
- [what you could not verify]
- [which sources did not respond or refused to]
- [what would change the findings]
## Methodology note
[one paragraph — how you approached the investigation]
## About
[credits, date, contact]
See /methodology/reporting/ for the full reporting discipline.
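Because the report template and the analysis matrix share structure, the findings sections can be generated rather than retyped. A minimal sketch, assuming matrix rows carry `claim`, `confidence`, and `supporting_artifacts` fields (illustrative names):

```python
# Render one findings section of the report template from an
# analysis-matrix row, so every claim carries its confidence
# level and its cited artifacts.
def render_finding(n: int, row: dict) -> str:
    lines = [
        f"## Finding {n}: {row['claim']}",
        f"[body — confidence: {row['confidence']}]",
        "Sources:",
    ]
    lines += [f"- {artifact}" for artifact in row["supporting_artifacts"]]
    return "\n".join(lines)
```

Generating the skeleton this way also guarantees the citation rule: a claim with no artifacts produces a visibly empty Sources list, which is a prompt to go back to the matrix.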
Step 7: Archive and Close
After reporting:
- Archive the full investigation folder (collection log, artifacts, analysis matrix, report) to durable storage.
- Hash the archive.
- Document a retention decision: how long will this be kept, under what access controls, and what triggers deletion.
- Close the investigation formally. An open investigation invites mission creep.
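The archive-and-hash steps fit in a few lines. A minimal sketch, assuming the investigation folder is packed into a gzipped tarball and the archive itself is hashed so the whole case can be verified as a unit (the `.sha256` sidecar convention is an assumption):

```python
# Archive-and-close: pack the investigation folder into one tarball,
# then hash the archive and write the digest to a sidecar file.
import hashlib
import tarfile
from pathlib import Path

def archive_investigation(folder: str, out_path: str) -> str:
    """Create folder.tar.gz and a .sha256 sidecar; return the digest."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(folder, arcname=Path(folder).name)
    digest = hashlib.sha256(Path(out_path).read_bytes()).hexdigest()
    Path(out_path + ".sha256").write_text(f"{digest}  {Path(out_path).name}\n")
    return digest
```

The sidecar makes the retention decision auditable: whoever holds the archive in three years can confirm it is byte-identical to what was closed.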
Tooling Stack
A working investigator's stack:
- Planning: plain Markdown for briefs; a git repo per investigation.
- Collection: browser + Wayback Machine + wget + WHOIS + Shodan + targeted dorks (see /tools/google-dorking/) + theHarvester + SpiderFoot.
- Analysis: spreadsheets for entity resolution; Maltego for network visualization.
- Evidence: sha256sum, exiftool, OpenTimestamps.
- Reporting: Markdown + static site or PDF; DocumentCloud for embedded documents; for legal-grade document review at scale, the Subthesis legal document analysis tool integrates triage and annotation in one pass.
- Archiving: encrypted external drive; optionally cold storage in S3 Glacier or similar.
The stack is intentionally plain. Sophisticated tooling does not compensate for weak discipline.
Habits That Separate Practitioners
- Hash before you analyze. Always. The three seconds of overhead is insurance.
- Archive on capture, not later. Later is when the page is gone.
- Log negative results. They shape the next investigation's plan.
- Use UTC. Timezone ambiguity has killed more investigations than adversarial review.
- Write the report as you go. Draft findings during collection; refine during analysis. Investigators who report only at the end forget what they learned.
- Keep an ethics review alive. Revisit the ethics memo at each phase transition.
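The "hash before you analyze" habit is only useful if you also check hashes later. A minimal verification sketch, assuming the pipe-delimited log schema from Step 3 (field names as in that schema; everything else illustrative):

```python
# Pre-analysis habit check: re-hash every artifact in the collection log
# and flag anything missing or modified since capture.
import csv
import hashlib
from pathlib import Path

def verify_log(log_path: str) -> list:
    """Return (artifact_path, reason) pairs for rows that fail verification."""
    failures = []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f, delimiter="|"):
            path = row.get("artifact_path", "")
            expected = row.get("sha256", "")
            if not path:
                continue  # query-only rows (e.g. empty results) have no artifact
            if not Path(path).exists():
                failures.append((path, "missing"))
            elif hashlib.sha256(Path(path).read_bytes()).hexdigest() != expected:
                failures.append((path, "hash mismatch"))
    return failures
```

Run it at each phase transition; an empty return is the green light to proceed with analysis.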
A Realistic Pace
A well-run investigation of a single corporate entity with no unusual complications runs roughly:
- Planning: 2 hours.
- Collection: 6–12 hours spread across 3–7 days (waiting on archives, registries, court systems).
- Analysis: 2–4 hours.
- Reporting: 3–6 hours for a substantive writeup.
- Archive and close: 1 hour.
Total: 14–25 hours. Investigators who claim routine investigations take an afternoon are doing something different — usually less preservation, less analysis, or less reporting discipline — and their output shows it.
When the Workflow Bends
Breaking news demands a compressed workflow. The shape stays the same, but the phases collapse:
- The plan is a paragraph, not a document.
- Collection is scattershot, then logged retroactively within hours.
- Analysis is live, with confidence levels flagged.
- Reporting acknowledges open questions prominently.
- Archive and follow-up happen after publication.
Compressed workflow is legitimate. Skipping phases entirely is not.
The Long Game
Investigators who build a workflow early produce an increasing number of finished investigations per year because each investigation's output accelerates the next. Investigators without one hit a ceiling — their individual skill grows but their throughput does not, because every investigation starts from scratch.
A workflow is an investment. A few weeks of discipline in the first three investigations pays back across every subsequent one.
Read the methodology framework for the conceptual background, work through a domain guide that matches your actual question, and then run one investigation end to end with the workflow in this post. The second investigation will be noticeably faster than the first.