Google Dorking: Advanced Search Operators for Investigators

Advanced Google search operators for OSINT — site, filetype, intitle, inurl, and combinations that surface what default searches miss.

Default Google search is tuned for commerce and news. The operator syntax that has existed since the early 2000s still works, and it is the fastest way to surface documents, exposed configuration, and targeted people-search results that default queries bury. See /tools/google-dorking/ for the full tutorial; this post is the working reference.

The Operators That Matter

Operator          Effect
site:             Restricts results to a domain
filetype:         Restricts results to a file extension
intitle:          Page title contains the term
inurl:            URL contains the term
intext:           Body text contains the term
"..."             Exact phrase
-term             Excludes the term
OR                Boolean OR (must be uppercase)
*                 Wildcard standing in for one or more words
before: / after:  Date range, e.g. before:2020-01-01
cache:            Google's cached copy (retired; use the Wayback Machine instead)

Combine them. Single operators are blunt; stacked operators are precise.
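A minimal sketch of stacking in Python. The helper names (phrase, any_of, stack) are hypothetical and not part of any library; the code only assembles a query string and never contacts Google.

```python
def phrase(text: str) -> str:
    """Wrap text in double quotes so it matches as an exact phrase."""
    return f'"{text}"'


def any_of(*fragments: str) -> str:
    """Group alternatives in parentheses, joined by an uppercase OR."""
    return "(" + " OR ".join(fragments) + ")"


def stack(*fragments: str) -> str:
    """Join fragments with spaces; a space between terms acts as AND."""
    return " ".join(f for f in fragments if f)


query = stack(
    "site:example.com",
    "filetype:pdf",
    any_of(phrase("confidential"), phrase("internal use")),
)
print(query)  # site:example.com filetype:pdf ("confidential" OR "internal use")
```

Keeping each operator as its own fragment makes it easy to add or drop one constraint at a time and see which one is narrowing the results.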

Document Discovery

Publicly posted documents are a gold mine. Agencies, companies, and courts routinely leave PDFs on web servers that never appear in their own navigation.

site:example.com filetype:pdf
site:example.com filetype:pdf ("confidential" OR "internal use")
site:example.com (filetype:xls OR filetype:xlsx) "budget"
site:sec.gov filetype:pdf "Meridian" "beneficial owner"

For government sources:

site:.gov filetype:pdf "privacy impact assessment" "facial recognition"
site:.gov filetype:pdf inurl:foia

These produce filings, audits, and released records that no press release announced.

Employee and Contact Mapping

Targeted site: queries against LinkedIn surface publicly indexed profiles without running into the platform's own search limits:

site:linkedin.com/in "Example Corp" ("compliance" OR "AML")
site:linkedin.com/in "former Example Corp" "general counsel"

For email pattern discovery, search for the address format itself. In the second query, replace first.last with a real name in the format you are testing:

"@example.com" "Director" -site:example.com
"first.last@example.com" site:example.com

Pair these results with Hunter.io to verify deliverability. For platform-specific techniques, see /tools/social-media/.
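A rough sketch of how the format searches can be generated for one name. The list of address formats is an assumption about common corporate conventions, not something Google or Hunter.io defines, and the function names are hypothetical.

```python
def email_candidates(first: str, last: str, domain: str) -> list[str]:
    """Return candidate addresses in a few common formats (assumed, not exhaustive)."""
    first, last = first.lower(), last.lower()
    local_parts = [
        f"{first}.{last}",    # first.last
        f"{first}{last}",     # firstlast
        f"{first[0]}{last}",  # flast
        f"{first}_{last}",    # first_last
        first,                # first
    ]
    return [f"{local}@{domain}" for local in local_parts]


def email_queries(first: str, last: str, domain: str) -> list[str]:
    """Quote each candidate so it is searched as an exact phrase."""
    return [f'"{addr}"' for addr in email_candidates(first, last, domain)]


for q in email_queries("Sarah", "Lin", "example.com"):
    print(q)
```

A hit on any one candidate is only a hint at the organization's format; it still needs to be verified before it is relied on.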

Archived and Historical Content

Google sometimes keeps pages in its index after the original site has removed them. Combined with the Wayback Machine, this surfaces content that live navigation no longer reaches.

site:web.archive.org "example.com" "board of directors"
site:example.com "board of directors" before:2020-01-01

The before: and after: operators are underused and routinely productive for tracing when a claim or statement appeared.
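One way to trace when a statement first appeared is to run the same query over successive date windows and note the earliest window that returns a hit. The sketch below assumes yearly windows and YYYY-MM-DD dates; the function name is hypothetical, and how Google treats the boundary day itself is not something this code relies on.

```python
from datetime import date


def yearly_windows(base_query: str, start_year: int, end_year: int) -> list[str]:
    """Build one query per calendar year using after: and before:."""
    queries = []
    for year in range(start_year, end_year + 1):
        after = date(year, 1, 1).isoformat()
        before = date(year + 1, 1, 1).isoformat()
        queries.append(f"{base_query} after:{after} before:{before}")
    return queries


for q in yearly_windows('site:example.com "board of directors"', 2018, 2020):
    print(q)
```

Once a year stands out, the same idea can be repeated with narrower windows inside it.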

Exposed Configuration and Index Pages

Poorly configured web servers expose directory listings. These are legitimate OSINT targets when hosted publicly, but note the legal line in /blog/legal-boundaries-of-osint/ — viewing a public listing is fine; accessing credentials is not.

intitle:"index of" "parent directory" site:example.com
intitle:"index of /backup" -site:example.com
inurl:".git/config" "remote" -site:github.com

The Google Hacking Database (exploit-db.com/ghdb) maintains thousands of such patterns. Use them for reconnaissance against systems you have authorization to test, or against your own organization.
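For the "your own organization" case, a small sketch that scopes the listing patterns above to domains you control. The pattern list is taken from this section only; the function name is hypothetical, and the output is a list of query strings to run by hand.

```python
# Patterns from this section, with the site: scope left off so it can be added per domain.
LISTING_PATTERNS = [
    'intitle:"index of" "parent directory"',
    'intitle:"index of /backup"',
    'inurl:".git/config" "remote"',
]


def self_audit_queries(domains: list[str]) -> list[str]:
    """Scope every listing pattern to each domain you are authorized to test."""
    return [
        f"{pattern} site:{domain}"
        for domain in domains
        for pattern in LISTING_PATTERNS
    ]


for q in self_audit_queries(["example.com"]):
    print(q)
```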

Court Records and Filings

Many court systems publish searchable dockets that Google indexes.

site:courtlistener.com "Meridian Holdings"
site:pacer.gov filetype:pdf "complaint"
"civil action no." filetype:pdf site:uscourts.gov

State courts vary wildly; a one-line dork against a state's domain often produces what the court's own search box hides.

People Search

For public-figure research, stacked operators produce far sharper results than a bare name search:

"John A. Smitherton" "Delaware" ("LLC" OR "Inc") -site:linkedin.com
"John A. Smitherton" site:justice.gov
"John A. Smitherton" (cv OR resume OR biography) filetype:pdf

For private individuals, the ethics framework applies in full: an operator returning a result does not make publishing that result appropriate.

Negative Operators

Exclusion is as important as inclusion. A target name may produce thousands of false positives; negative operators narrow them fast. Use -site: to drop a whole domain; a bare -term only excludes pages that contain that text.

"Meridian Holdings" -"Meridian Holdings Corporation" -site:bloomberg.com
"Sarah Lin" "Meridian" -site:linkedin.com -site:facebook.com -site:twitter.com

Pattern Language for Dorking

Think in patterns:

  1. Where would this kind of information live? Court site, SEC, agency domain, archived site.
  2. What file format is it in? PDF, XLS, DOC, MP4.
  3. What words would appear in it but not elsewhere? Jargon, section headers, form numbers.
  4. What words would a false positive contain? Exclude them.

The best dorkers are not the ones who memorize operators but the ones who model where documents live.
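The four questions map fairly directly onto a small data structure. This is a sketch under that assumption; the class and field names are hypothetical, and it only builds query strings.

```python
from dataclasses import dataclass, field


@dataclass
class DorkPattern:
    sites: list[str]         # 1. where the information would live
    filetypes: list[str]     # 2. what file format it is in
    signal_terms: list[str]  # 3. words that appear in it but not elsewhere
    noise_terms: list[str] = field(default_factory=list)  # 4. false-positive words

    def queries(self) -> list[str]:
        """Build one query per site from the four answers."""
        formats = " OR ".join(f"filetype:{ft}" for ft in self.filetypes)
        if len(self.filetypes) > 1:
            formats = f"({formats})"
        signals = " ".join(f'"{t}"' for t in self.signal_terms)
        noise = " ".join(f'-"{t}"' for t in self.noise_terms)
        return [
            " ".join(p for p in (f"site:{site}", formats, signals, noise) if p)
            for site in self.sites
        ]


pattern = DorkPattern(
    sites=[".gov"],
    filetypes=["pdf"],
    signal_terms=["privacy impact assessment", "facial recognition"],
)
print(pattern.queries()[0])
```

Filling in the structure before typing any query forces the modelling step; the query string falls out at the end.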

Rotation and Rate Limits

Google aggressively rate-limits operator-heavy queries, typically by interrupting the session with a CAPTCHA. For sustained work:

  • Rotate among Google, DuckDuckGo, Bing, and Yandex — each indexes differently.
  • Yandex is notably better for Russian-language and image sources.
  • Bing's site: operator is more forgiving than Google's on some domains.
  • Use search operators through DuckDuckGo Lite for minimal JavaScript overhead.
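Rotation is easier when the same query can be opened in each engine with one step. The sketch below builds result-page URLs to open in a browser; the endpoint paths and parameter names are assumptions based on the engines' public search URLs and may change, and not every engine honors every operator.

```python
from urllib.parse import urlencode

# Base URL and query parameter per engine (assumed from public search URLs).
ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "duckduckgo_lite": ("https://lite.duckduckgo.com/lite/", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def search_urls(query: str) -> dict[str, str]:
    """Return a URL-encoded search link for each engine."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in ENGINES.items()
    }


for name, url in search_urls('site:example.com "budget"').items():
    print(name, url)
```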

A Working Investigation Template

Given a target company "Meridian Holdings":

site:sec.gov "Meridian Holdings"
site:linkedin.com/in "Meridian Holdings"
"Meridian Holdings" filetype:pdf -site:meridianholdings.example
"Meridian Holdings" (lawsuit OR complaint OR "v.") filetype:pdf
site:opencorporates.com "Meridian Holdings"
"Meridian Holdings" site:web.archive.org
"meridianholdings.example" -site:meridianholdings.example

Seven queries. Fifteen minutes. A skeleton of everything public about the entity before any paid tool enters the picture.
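The same seven queries can be generated for any target. A minimal sketch, assuming you know the company name and its primary domain; the function name is hypothetical.

```python
def company_template(name: str, domain: str) -> list[str]:
    """Return the seven template queries for a company name and its domain."""
    n = f'"{name}"'
    return [
        f"site:sec.gov {n}",
        f"site:linkedin.com/in {n}",
        f"{n} filetype:pdf -site:{domain}",
        f'{n} (lawsuit OR complaint OR "v.") filetype:pdf',
        f"site:opencorporates.com {n}",
        f"{n} site:web.archive.org",
        f'"{domain}" -site:{domain}',
    ]


for q in company_template("Meridian Holdings", "meridianholdings.example"):
    print(q)
```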

Limitations and Failure Modes

  • Google is not comprehensive; it is curated. Important sites — deep government archives, academic preprints, some court systems — are underindexed.
  • Operators interact unpredictably. intitle: and inurl: combined sometimes silently drop results.
  • Cached results are gone: Google retired its cache links and the cache: operator. Rely on the Wayback Machine for historical fetches.
  • Personalized results may differ from a colleague's. Use incognito or a clean profile for reproducibility.
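Beyond a clean profile, reproducibility also depends on recording exactly what was asked and when. A minimal sketch, assuming a JSON Lines log kept alongside the case notes; the function and field names are illustrative only.

```python
import json
from datetime import datetime, timezone


def log_entry(query: str, engine: str, note: str = "") -> str:
    """Serialize one search as a JSON line with a UTC timestamp."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "engine": engine,
        "query": query,
        "note": note,
    }
    return json.dumps(record, ensure_ascii=False)


print(log_entry('site:sec.gov "Meridian Holdings"', "google", "clean profile"))
```

Appending each line to a file gives a colleague the exact query, engine, and time to rerun when results differ.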

Legal Frame

Dorking itself is legal: you are querying a public index. What you do with the results is where law applies. Accessing credentials in a misconfigured file, or downloading data a site clearly did not intend to publish, can cross into Computer Fraud and Abuse Act (CFAA) territory. See /blog/legal-boundaries-of-osint/ for the full treatment. Investigators documenting civil rights abuses through public records (the kind of work catalogued by the ICE Encounter rights guides) rely heavily on dorking to surface agency documents that were technically public but practically buried.

Beyond Google

The OSINT Framework catalogs specialized search engines (IntelligenceX, Kagi, Marginalia) that outperform Google on specific source types. Keep a short list; reach for them when Google starts returning noise.

Dorking is a craft, not a trick. The investigators who benefit most are the ones who add it to a disciplined methodology rather than treat it as a shortcut.
