Google Dorking: Advanced Search Operators for Investigators
Advanced Google search operators for OSINT — site, filetype, intitle, inurl, and combinations that surface what default searches miss.
Default Google search is tuned for commerce and news. The operator syntax that has existed since the early 2000s still works, and it is the fastest way to surface documents, exposed configuration, and targeted people-search results that default queries bury. See /tools/google-dorking/ for the full tutorial; this post is the working reference.
The Operators That Matter
| Operator | Effect |
|---|---|
| site: | Restricts to a domain |
| filetype: | Restricts to a file extension |
| intitle: | Page title contains the term |
| inurl: | URL contains the term |
| intext: | Body text contains the term |
| "..." | Exact phrase |
| -term | Excludes term |
| OR | Boolean OR (must be uppercase) |
| * | Wildcard |
| before: / after: | Date range |
| cache: | Google's cached copy (largely deprecated) |
Combine them. Single operators are blunt; stacked operators are precise.
Document Discovery
Publicly posted documents are a gold mine. Agencies, companies, and courts routinely leave PDFs on web servers that never appear in their own navigation.
site:example.com filetype:pdf
site:example.com filetype:pdf "confidential" OR "internal use"
site:example.com (filetype:xls OR filetype:xlsx) "budget"
site:sec.gov filetype:pdf "Meridian" "beneficial owner"
For government sources:
site:.gov filetype:pdf "privacy impact assessment" "facial recognition"
site:.gov filetype:pdf inurl:foia
These produce filings, audits, and released records that no press release announced.
Employee and Contact Mapping
Targeted site queries on LinkedIn surface profiles without the platform's rate limits:
site:linkedin.com/in "Example Corp" ("compliance" OR "AML")
site:linkedin.com/in "former Example Corp" "general counsel"
For email pattern discovery, combine with explicit format searches:
"@example.com" "Director" -site:example.com
"first.last@example.com" site:example.com
Pair these results with Hunter.io to verify deliverability. For platform-specific techniques, see /tools/social-media/.
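Once queries like the ones above turn up a few real addresses, the naming convention can be inferred by comparing each address with its owner's name. The following is a minimal Python sketch of that comparison. The helper name `infer_pattern` and the handful of formats it checks are illustrative assumptions, not an exhaustive list, and it assumes a simple "First Last" name.

```python
from typing import Optional


def infer_pattern(full_name: str, address: str) -> Optional[str]:
    """Label the local-part format of an address, or return None if unknown.

    Hypothetical helper: only a few common formats are checked.
    """
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    local = address.lower().split("@", 1)[0]

    # Candidate formats (illustrative, not exhaustive).
    candidates = {
        "first.last": f"{first}.{last}",
        "firstlast": f"{first}{last}",
        "flast": f"{first[0]}{last}",
        "first_last": f"{first}_{last}",
        "first": first,
    }
    for label, value in candidates.items():
        if local == value:
            return label
    return None


print(infer_pattern("Jane Doe", "jdoe@example.com"))  # flast
```

Two or three addresses that agree on the same label are usually enough to guess the format for other names, which can then be checked with a verification service.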
Archived and Historical Content
Google sometimes indexes pages that the original site has removed. Combined with the Wayback Machine, this produces content that live navigation will not.
site:web.archive.org "example.com" "board of directors"
site:example.com "board of directors" before:2020-01-01
The before: and after: operators are underused and routinely productive for tracing when a claim or statement appeared.
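To find when a phrase first appeared, one approach is to run the same query across consecutive date windows and note the earliest window that returns hits. The sketch below only generates the query strings. The function name `yearly_windows` is hypothetical, and the windows share a boundary date because the sketch does not assume whether Google treats the boundaries as inclusive.

```python
from datetime import date


def yearly_windows(base_query: str, start_year: int, end_year: int) -> list[str]:
    """Return one query per calendar year, bounded with after:/before:."""
    queries = []
    for year in range(start_year, end_year + 1):
        after = date(year, 1, 1).isoformat()
        before = date(year + 1, 1, 1).isoformat()
        queries.append(f"{base_query} after:{after} before:{before}")
    return queries


for q in yearly_windows('site:example.com "board of directors"', 2018, 2019):
    print(q)
# site:example.com "board of directors" after:2018-01-01 before:2019-01-01
# site:example.com "board of directors" after:2019-01-01 before:2020-01-01
```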
Exposed Configuration and Index Pages
Poorly configured web servers expose directory listings. These are legitimate OSINT targets when hosted publicly, but note the legal line in /blog/legal-boundaries-of-osint/ — viewing a public listing is fine; accessing credentials is not.
intitle:"index of" "parent directory" site:example.com
intitle:"index of /backup" -site:example.com
inurl:".git/config" "remote" -site:github.com
The Google Hacking Database (exploit-db.com/ghdb) maintains thousands of such patterns. Use them for reconnaissance against systems you have authorization to test, or against your own organization.
Court Records and Filings
Many court systems publish searchable dockets that Google indexes.
site:courtlistener.com "Meridian Holdings"
site:pacer.uscourts.gov filetype:pdf "complaint"
"civil action no." filetype:pdf site:uscourts.gov
State courts vary wildly; a one-line dork against a state's domain often produces what the court's own search box hides.
People Search
For public-figure research, stacked operators return far more precise results than a bare name search:
"John A. Smitherton" "Delaware" ("LLC" OR "Inc") -site:linkedin.com
"John A. Smitherton" site:justice.gov
"John A. Smitherton" (cv OR resume OR biography) filetype:pdf
For private individuals, the ethics framework applies in full: an operator returning a result does not make publishing that result appropriate.
Negative Operators
Exclusion is as important as inclusion. A target name may produce thousands of false positives; negative operators narrow the set fast.
"Meridian Holdings" -"Meridian Holdings Corporation" -site:bloomberg.com
"Sarah Lin" "Meridian" -site:linkedin.com -site:facebook.com -site:twitter.com
Pattern Language for Dorking
Think in patterns:
- Where would this kind of information live? Court site, SEC, agency domain, archived site.
- What file format is it in? PDF, XLS, DOC, MP4.
- What words would appear in it but not elsewhere? Jargon, section headers, form numbers.
- What words would a false positive contain? Exclude them.
The best dorkers are not the ones who memorize operators but the ones who model where documents live.
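The four questions above map directly onto a query: where it lives becomes site:, the format becomes filetype:, distinguishing words become quoted phrases, and false-positive words become exclusions. The sketch below illustrates that mapping in Python. The `DorkPattern` class and its field names are hypothetical and cover only the operators listed in this post.

```python
from dataclasses import dataclass, field


def _or_group(terms: list[str]) -> str:
    """Join alternatives with OR, parenthesized when there is more than one."""
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"


@dataclass
class DorkPattern:
    sites: list[str] = field(default_factory=list)      # where it would live
    filetypes: list[str] = field(default_factory=list)  # what format it is in
    markers: list[str] = field(default_factory=list)    # words that appear in it
    noise: list[str] = field(default_factory=list)      # words in false positives

    def to_query(self) -> str:
        parts = []
        if self.sites:
            parts.append(_or_group([f"site:{s}" for s in self.sites]))
        if self.filetypes:
            parts.append(_or_group([f"filetype:{t}" for t in self.filetypes]))
        parts.extend(f'"{m}"' for m in self.markers)
        parts.extend(f'-"{n}"' for n in self.noise)
        return " ".join(parts)


pattern = DorkPattern(
    sites=[".gov"],
    filetypes=["pdf"],
    markers=["privacy impact assessment", "facial recognition"],
    noise=["press release"],
)
print(pattern.to_query())
# site:.gov filetype:pdf "privacy impact assessment" "facial recognition" -"press release"
```

Keeping the four answers as separate fields makes it easy to change one (say, swapping the domain) while the rest of the model stays fixed.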
Rotation and Rate Limits
Google rate-limits aggressively, and operator-heavy sessions often end in a CAPTCHA. For sustained work:
- Rotate among Google, DuckDuckGo, Bing, and Yandex — each indexes differently.
- Yandex is notably better for Russian-language and image sources.
- Bing's site: operator is more forgiving than Google's on some domains.
- Use search operators through DuckDuckGo Lite for minimal JavaScript overhead.
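Rotating engines is easier when one query string can be turned into a link for each of them. The Python sketch below builds those links for opening by hand in a browser; it does not send any requests. The base URLs and parameter names in `ENGINES` are assumptions based on each engine's publicly visible search URLs and may change, and not every operator behaves the same way on every engine.

```python
from urllib.parse import urlencode

# Engine name -> (search endpoint, query parameter). Assumed values.
ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "duckduckgo_lite": ("https://lite.duckduckgo.com/lite/", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def search_urls(query: str) -> dict[str, str]:
    """Return one URL-encoded search link per engine for the same query."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in ENGINES.items()
    }


for name, url in search_urls("site:example.com filetype:pdf").items():
    print(name, url)
```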
A Working Investigation Template
Given a target company "Meridian Holdings":
site:sec.gov "Meridian Holdings"
site:linkedin.com/in "Meridian Holdings"
"Meridian Holdings" filetype:pdf -site:meridianholdings.example
"Meridian Holdings" (lawsuit OR complaint OR "v.") filetype:pdf
site:opencorporates.com "Meridian Holdings"
"Meridian Holdings" site:web.archive.org
"meridianholdings.example" -site:meridianholdings.example
Seven queries. Fifteen minutes. A skeleton of everything public about the entity before any paid tool enters the picture.
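The same seven queries can be generated for any target from its name and primary domain. This is a small Python sketch of that idea. The function name `company_template` is hypothetical, and the query list simply mirrors the template above.

```python
def company_template(name: str, domain: str) -> list[str]:
    """Return the seven starter queries for a company name and its domain."""
    quoted = f'"{name}"'
    return [
        f"site:sec.gov {quoted}",
        f"site:linkedin.com/in {quoted}",
        f"{quoted} filetype:pdf -site:{domain}",
        f'{quoted} (lawsuit OR complaint OR "v.") filetype:pdf',
        f"site:opencorporates.com {quoted}",
        f"{quoted} site:web.archive.org",
        f'"{domain}" -site:{domain}',
    ]


for query in company_template("Meridian Holdings", "meridianholdings.example"):
    print(query)
```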
Limitations and Failure Modes
- Google is not comprehensive; it is curated. Important sites — deep government archives, academic preprints, some court systems — are underindexed.
- Operators interact unpredictably. intitle: and inurl: combined sometimes silently drop results.
- Cached results are gone. Rely on the Wayback Machine for historical fetches.
- Personalized results may differ from a colleague's. Use incognito or a clean profile for reproducibility.
Legal Frame
Dorking itself is legal: you are querying a public index. What you do with the results is where law applies. Accessing credentials in a misconfigured file, or downloading data a site clearly did not intend to publish, can cross into Computer Fraud and Abuse Act (CFAA) territory. See /blog/legal-boundaries-of-osint/ for the full treatment. Investigators documenting civil rights abuses through public records, the kind of work catalogued by the ICE Encounter rights guides, rely heavily on dorking to surface agency documents that were technically public but practically buried.
Beyond Google
The OSINT Framework catalogs specialized search engines (IntelligenceX, Kagi, Marginalia) that outperform Google on specific source types. Keep a short list; reach for them when Google starts returning noise.
Dorking is a craft, not a trick. The investigators who benefit most are the ones who add it to a disciplined methodology rather than treat it as a shortcut.