> "The supreme art of war is to subdue the enemy without fighting." — Sun Tzu

Learning Objectives

  • Distinguish between passive and active reconnaissance and explain why passive recon is performed first
  • Conduct thorough domain and DNS intelligence gathering using publicly available data
  • Leverage WHOIS, registrar data, and certificate transparency logs to map target infrastructure
  • Perform advanced search engine dorking across Google, Shodan, and Censys
  • Execute social media and people-focused OSINT to build target profiles
  • Mine code repositories for leaked credentials, API keys, and sensitive configuration data
  • Deploy automated OSINT frameworks including Maltego, theHarvester, and Recon-ng
  • Document and organize reconnaissance findings for use in later penetration testing phases

Chapter 7: Passive Reconnaissance and OSINT

Every successful penetration test begins long before the first packet is sent. Before you touch a target system, before you run a single scanner, before you craft a single exploit, there is reconnaissance. And within reconnaissance, the most powerful — and most legally defensible — technique is passive reconnaissance: the art of gathering intelligence about a target using only publicly available information, without ever directly interacting with the target's systems.

In this chapter, we explore the discipline of Open-Source Intelligence (OSINT) as it applies to ethical hacking. You will learn to think like an intelligence analyst, piecing together fragments of publicly available data into a comprehensive picture of your target's attack surface. By the end of this chapter, you will be able to map an organization's digital footprint, identify its employees, discover its technology stack, and uncover potential vulnerabilities — all without sending a single packet to the target network.

7.1 The Art of Passive Reconnaissance

7.1.1 Why Passive Recon Comes First

Imagine you are a burglar casing a neighborhood. You could walk up to every house, jiggle the doorknobs, and peer through the windows — but that approach is noisy, risky, and likely to get you noticed. A smarter approach is to first drive through the neighborhood, note which houses have alarm system signs, observe when residents leave for work, and check public property records. That is the difference between active and passive reconnaissance.

In penetration testing, passive reconnaissance means gathering information about a target without directly interacting with the target's systems. You never send a packet to their network. You never visit their website from your testing infrastructure. You never call their employees. Instead, you collect and analyze information that is already publicly available.

Why does this matter?

  1. Stealth: Passive reconnaissance generates zero alerts on the target's intrusion detection systems. There are no log entries, no anomalous traffic patterns, no suspicious connection attempts. From the target's perspective, nothing has happened.

  2. Legality: Because you are accessing only publicly available information, passive reconnaissance generally does not require explicit authorization. However, we always recommend having authorization documented before beginning any reconnaissance, even passive.

  3. Foundation: The intelligence gathered during passive recon directly informs every subsequent phase. It tells you where to focus active scanning, what social engineering pretexts might work, which technologies to research for vulnerabilities, and which employees to target.

  4. Efficiency: By understanding the target before engaging with it, you avoid wasting time on dead ends during active testing phases.

⚖️ Legal Note: While passive reconnaissance uses publicly available information, always ensure you have written authorization before beginning any reconnaissance activities as part of a penetration test. The scope of your engagement should explicitly include reconnaissance activities. Even "public" information gathering can cross legal boundaries in some jurisdictions, particularly when it involves personal data protected by regulations like GDPR.

7.1.2 The OSINT Mindset

OSINT — Open-Source Intelligence — is a discipline that originated in military and government intelligence communities. The term refers to intelligence collected from publicly available sources. In the context of ethical hacking, OSINT is the systematic collection, processing, and analysis of publicly available information about a target organization, its infrastructure, and its people.

The OSINT mindset requires a particular way of thinking:

  • Everything leaves a trace. Every domain registration, every job posting, every conference presentation, every code commit creates a data point. Your job is to find and connect these data points.
  • Context is everything. A single piece of information may seem meaningless. Combined with other data, it becomes intelligence. An employee's name is just a name. Combined with their role, their email format, their social media presence, and their technology expertise, it becomes an attack vector.
  • Think like a puzzle solver. OSINT is rarely about finding a single "smoking gun." It is about assembling hundreds of small pieces into a coherent picture.
  • Document everything. Rigorous note-taking and evidence preservation are essential. You need to be able to show your client exactly what you found and where you found it.

7.1.3 The OSINT Cycle

Professional intelligence gathering follows a structured cycle, and ethical hackers should adopt the same discipline:

  1. Planning and Direction: Define what you need to know. What is the scope of the engagement? What questions do you need to answer?
  2. Collection: Gather raw data from publicly available sources.
  3. Processing: Organize and format the collected data into usable information.
  4. Analysis: Examine the processed information for patterns, relationships, and actionable intelligence.
  5. Dissemination: Present findings in a structured report format.
  6. Feedback: Identify gaps in intelligence and cycle back to collection.

Let us apply this to our running example. MedSecure Health Systems has engaged your firm for a penetration test. Your scope of work authorizes full reconnaissance. Here is how you might plan your passive recon phase:

Intelligence Requirement | Sources to Check | Priority
External IP ranges and domains | WHOIS, DNS, BGP, certificate transparency | High
Technology stack | Job postings, website headers (via cached pages), code repos | High
Employee names and roles | LinkedIn, company website, conference talks | Medium
Email format | Hunter.io, public documents, email headers | High
Physical locations | Google Maps, SEC filings, press releases | Medium
Third-party relationships | Vendor announcements, partnership press releases | Medium
Security posture indicators | Job postings for security roles, breach history | High

💡 Intuition: Think of passive recon like being an investigative journalist. You are building a dossier on your target using only publicly available sources. The more thorough your research, the more effective every subsequent phase of the penetration test becomes.

7.2 The Reconnaissance Landscape: Tools, Sources, and Data Types

Before diving into specific techniques, it helps to understand the breadth of the passive reconnaissance landscape. The information you gather falls into several categories, each feeding different phases of the penetration test.

7.2.1 Categories of Passive Intelligence

Infrastructure Intelligence encompasses everything about the target's digital footprint: domains, subdomains, IP address ranges, autonomous system numbers, DNS configurations, hosting providers, CDN usage, and cloud platform choices. This category directly feeds active scanning and vulnerability assessment phases. When you discover that MedSecure operates 47 subdomains across AWS and Azure, you have defined the perimeter you will test.

Organizational Intelligence covers the human and structural dimensions: organizational charts, department structures, key personnel, office locations, business relationships, vendor partnerships, and corporate governance. This intelligence supports social engineering planning (Chapter 9) and helps you understand how the organization makes technology decisions.

Technical Intelligence reveals the technology stack: programming languages, web frameworks, content management systems, database platforms, security tools, and operational software. This is the bridge between reconnaissance and vulnerability research — once you know MedSecure runs WordPress 6.4.2 with eight specific plugins, you can research every known vulnerability in that exact configuration.

Credential Intelligence includes exposed email addresses, leaked passwords from data breaches, API keys found in code repositories, and any other authentication material discovered in public sources. This is some of the most immediately actionable intelligence — a leaked database credential found on GitHub during passive recon may provide direct access during the testing phase.

Behavioral Intelligence encompasses communication patterns, technology preferences, security awareness indicators, and operational procedures. Does the organization respond quickly to security researchers? Do employees routinely share technical details on social media? Is there a bug bounty program? This intelligence shapes your overall approach to the engagement.

7.2.2 Building Your OSINT Toolkit

A professional passive reconnaissance toolkit includes tools from several categories:

Domain and DNS Tools: dig, nslookup, host, dnspython (Python library), DNSDumpster, SecurityTrails, VirusTotal passive DNS

WHOIS and Registration Tools: whois command, DomainTools, WhoisXMLAPI, ViewDNS.info

Certificate Transparency: crt.sh, Censys, CertStream, Facebook CT monitoring

Search Engines: Google (with advanced operators), Shodan, Censys, BinaryEdge, ZoomEye, FOFA

Social Media and People: LinkedIn (manual and API), Hunter.io, Phonebook.cz, Sherlock, Holehe

Code Repository Tools: GitHub search, GitLab search, TruffleHog, GitLeaks, Shhgit

Automation Frameworks: theHarvester, Recon-ng, SpiderFoot, Maltego, Amass (passive mode)

Document Analysis: ExifTool, Metagoofil, FOCA

Data Breach Checking: Have I Been Pwned (HIBP), DeHashed (for authorized assessments)

The selection of tools depends on your engagement scope, available API keys (many tools work better with premium API access), and the specific intelligence requirements. A thorough passive recon engagement might use 15-20 tools; a quick preliminary assessment might use 4-5.

🧪 Try It in Your Lab: Set up a Kali Linux virtual machine and verify that the following tools are installed and functional: theHarvester, Recon-ng, Amass, dig, whois, and ExifTool. Create accounts for Shodan (free tier), Censys (free tier), and Hunter.io (free tier). Run a basic reconnaissance against your own domain or an authorized bug bounty target to validate your toolkit.

7.3 Domain and DNS Intelligence

7.3.1 Starting with the Domain

The domain name is typically your entry point. From a single domain — say, medsecure.com — you can unravel an extraordinary amount of information about an organization's digital infrastructure.

Reverse WHOIS and Related Domains

Organizations rarely operate a single domain. MedSecure Health Systems might own medsecure.com, medsecurehealth.com, medsecure.net, medsecure-portal.com, and medsecure-dev.io. Each additional domain expands the attack surface.

To discover related domains:

  • Reverse WHOIS lookups: Search for other domains registered with the same registrant name, email, or organization. Tools like DomainTools, WhoisXMLAPI, and ViewDNS.info support reverse WHOIS.
  • Google searches: "medsecure" site:*.com or searching for the organization name in quotes along with "domain" or "website."
  • Certificate transparency logs: Certificates issued to the organization may cover multiple domains (Subject Alternative Names).
  • Acquisitions and subsidiaries: Press releases about acquisitions often reveal additional domains.

7.3.2 DNS Record Analysis

DNS records are a goldmine for reconnaissance. Each record type reveals different information:

A and AAAA Records: Map hostnames to IP addresses. These reveal the actual servers hosting the organization's services and whether they use IPv6.

MX Records: Mail exchange records reveal the email infrastructure. If MX records point to *.google.com, the organization uses Google Workspace. If they point to *.pphosted.com, they use Proofpoint for email security. This intelligence helps you:

  • Understand what email security controls are in place
  • Determine whether phishing emails will face specific filtering
  • Identify the email platform for social engineering
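
The suffix matching described above can be sketched in Python. This is a minimal illustration, assuming the MX hostnames have already been collected from a passive DNS source. The hint table and the names classify_mx and summarize_mx are hypothetical, and the protection.outlook.com entry is an addition beyond the two providers named in the text.

```python
# Illustrative, hand-maintained hints; not an authoritative provider list.
MX_PROVIDER_HINTS = {
    "google.com": "Google Workspace",
    "pphosted.com": "Proofpoint",
    "protection.outlook.com": "Microsoft 365",  # assumption, not from the text
}


def classify_mx(mx_host: str) -> str:
    """Return a provider guess for one MX hostname, or 'unknown'."""
    host = mx_host.rstrip(".").lower()
    for suffix, provider in MX_PROVIDER_HINTS.items():
        # Match the suffix on a label boundary so "notgoogle.com" does not match.
        if host == suffix or host.endswith("." + suffix):
            return provider
    return "unknown"


def summarize_mx(mx_hosts: list[str]) -> dict[str, str]:
    """Map each MX hostname to its provider guess."""
    return {host: classify_mx(host) for host in mx_hosts}
```

An "unknown" result is itself useful: it often means the organization runs its own mail gateway, which is worth noting for later phases.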

TXT Records: Often contain SPF, DKIM, and DMARC records that reveal email infrastructure details. They may also contain domain verification tokens for cloud services (Google, Microsoft 365, AWS), revealing which cloud platforms the organization uses.

v=spf1 include:_spf.google.com include:sendgrid.net include:spf.protection.outlook.com ~all

This single SPF record tells us: the organization uses Google Workspace for email, SendGrid for transactional email, and Microsoft 365 (possibly for a subset of users or a legacy system). Three cloud services identified from one DNS record.
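
Reading include: mechanisms out of an SPF record is easy to automate. The sketch below parses the record shown above. The hint table covers only the three services in that example, and the function names are hypothetical.

```python
# Maps include: targets to services; covers only the example record above.
SPF_INCLUDE_HINTS = {
    "_spf.google.com": "Google Workspace",
    "sendgrid.net": "SendGrid",
    "spf.protection.outlook.com": "Microsoft 365",
}


def spf_includes(record: str) -> list[str]:
    """Return the domains named by include: mechanisms, in record order."""
    includes = []
    for term in record.split():
        # A mechanism may carry a qualifier (+, -, ~, ?) in front of it.
        mechanism = term.lstrip("+-~?")
        if mechanism.lower().startswith("include:"):
            includes.append(mechanism[len("include:"):].lower())
    return includes


def spf_services(record: str) -> list[str]:
    """Translate include: targets into service names where a hint exists."""
    return [
        SPF_INCLUDE_HINTS.get(domain, f"unrecognized ({domain})")
        for domain in spf_includes(record)
    ]
```

Unrecognized include targets deserve a manual look, since they frequently point at smaller marketing or ticketing vendors.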

NS Records: Name server records identify the DNS hosting provider. If the organization hosts its own DNS, this reveals infrastructure. If they use a third-party like Cloudflare or AWS Route 53, this tells you about their infrastructure approach.

CNAME Records: Canonical name records reveal service aliases and third-party integrations. A CNAME like support.medsecure.com -> medsecure.zendesk.com reveals they use Zendesk for customer support.

SOA Records: Start of authority records contain the primary name server and the email address of the domain administrator (often the technical contact).
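
One detail helps when reading SOA records: the administrator mailbox (the RNAME field) is stored in DNS name form, so the first unescaped dot stands in for the @ sign, and a literal dot in the local part is written as a backslash followed by a dot. A small sketch of the conversion follows; the function name and the hostmaster mailbox are hypothetical.

```python
def soa_rname_to_email(rname: str) -> str:
    """Convert an SOA RNAME such as 'hostmaster.medsecure.com.' to an email address."""
    name = rname.rstrip(".")
    local = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\" and i + 1 < len(name):
            # An escaped character (typically '\.') belongs to the local part.
            local.append(name[i + 1])
            i += 2
        elif ch == ".":
            # The first unescaped dot separates the local part from the domain.
            return "".join(local) + "@" + name[i + 1:]
        else:
            local.append(ch)
            i += 1
    raise ValueError(f"no domain part found in RNAME: {rname!r}")
```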

📊 Real-World Application: During a penetration test of a healthcare organization, our team discovered through DNS TXT records that the target was using both Microsoft 365 and Google Workspace simultaneously. Further investigation revealed this was due to a recent acquisition — the parent company used Microsoft 365 while the acquired subsidiary still used Google Workspace. This "seam" between two identity systems became a significant finding, as user provisioning and deprovisioning processes had gaps between the two platforms.

7.3.3 Subdomain Discovery Through Passive Means

Subdomains are where organizations hide their most interesting — and often most vulnerable — assets. Development servers, staging environments, internal tools exposed to the internet, legacy applications — these all live on subdomains.

Passive subdomain discovery techniques include:

Certificate Transparency Logs: When an organization obtains a publicly trusted SSL/TLS certificate, the certificate is recorded in publicly accessible Certificate Transparency (CT) logs. These logs are searchable and reveal every hostname named in such a certificate. Hosts covered only by a wildcard certificate, or by certificates from a private internal CA, do not appear individually. Tools like crt.sh, Censys, and CertStream provide access to CT data.

A search on crt.sh for %.medsecure.com might reveal:

medsecure.com
www.medsecure.com
mail.medsecure.com
vpn.medsecure.com
portal.medsecure.com
dev.medsecure.com
staging.medsecure.com
api.medsecure.com
jenkins.medsecure.com
grafana.medsecure.com
kibana.medsecure.com

The presence of jenkins.medsecure.com, grafana.medsecure.com, and kibana.medsecure.com immediately tells us about the organization's CI/CD pipeline and monitoring stack.
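
Spotting telling names by eye works for eleven hosts but not for several hundred. The sketch below tags hostnames by keyword. The keyword table is a small illustrative assumption built from the example list, and tag_subdomains is a hypothetical helper.

```python
# Illustrative keyword-to-category hints drawn from the example list above.
KEYWORD_HINTS = {
    "jenkins": "CI/CD",
    "grafana": "monitoring",
    "kibana": "monitoring",
    "dev": "non-production",
    "staging": "non-production",
    "vpn": "remote access",
}


def tag_subdomains(hostnames: list[str], apex: str) -> dict[str, list[str]]:
    """Return {hostname: sorted categories} for hosts whose labels match a hint."""
    tags = {}
    for host in hostnames:
        host = host.lower().rstrip(".")
        if not host.endswith("." + apex):
            continue  # skip the apex itself and unrelated domains
        prefix = host[: -len(apex) - 1]
        # Split on dots and hyphens so "staging-api" yields "staging" and "api".
        tokens = {tok for label in prefix.split(".") for tok in label.split("-")}
        hits = sorted({KEYWORD_HINTS[tok] for tok in tokens if tok in KEYWORD_HINTS})
        if hits:
            tags[host] = hits
    return tags
```

Matching whole tokens rather than substrings avoids false positives such as "developer-docs" being tagged because it contains "dev".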

Search Engine Indexes: Google, Bing, and other search engines index subdomains. A query such as site:*.medsecure.com can reveal subdomains that have been crawled.

DNS Aggregation Services: Services like SecurityTrails, VirusTotal, and DNSDumpster aggregate DNS data over time and can reveal current and historical subdomains.

Web Archives: The Wayback Machine (web.archive.org) may have cached pages from subdomains that no longer exist, revealing historical infrastructure.

🔴 Red Team Perspective: Pay special attention to subdomains that suggest development, staging, or internal tooling. These are often less hardened than production systems and may contain default credentials, debug interfaces, or older software versions. Finding staging-api.medsecure.com or dev-portal.medsecure.com during passive recon can lead to significant findings during active testing.

7.3 WHOIS, Registrar Data, and Certificate Transparency

7.3.1 WHOIS Intelligence

WHOIS is one of the oldest reconnaissance tools available. It queries databases maintained by domain registrars and regional internet registries to provide registration information about domains and IP address blocks.

For domain WHOIS lookups, you can learn:

  • Registrant information: Name, organization, email, phone number, and address of the domain owner (though privacy services increasingly obscure this)
  • Registration and expiration dates: When the domain was first registered and when it expires. A domain nearing expiration might be vulnerable to expiration hijacking.
  • Name servers: Confirms DNS hosting provider
  • Registrar: The company through which the domain was registered
  • Status codes: Domain lock statuses that indicate security measures in place

For IP address WHOIS lookups (via Regional Internet Registries like ARIN, RIPE, APNIC), you can discover:

  • Net ranges: The full IP address blocks allocated to an organization
  • Organization details: Official name, address, and contacts
  • Abuse contacts: Who to contact for security issues
  • ASN (Autonomous System Number): The organization's BGP routing identifier, which can reveal all IP prefixes they announce

# Domain WHOIS
whois medsecure.com

# IP WHOIS
whois 203.0.113.50

# ASN lookup
whois -h whois.radb.net AS12345
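
WHOIS output is free-form text, and its layout varies by registry and registrar. The following sketch assumes the common "Key: Value" line layout and collects repeated keys into lists. SAMPLE_WHOIS is invented for illustration (it is not real output for any domain), and parse_whois is a hypothetical helper.

```python
from collections import defaultdict

# Invented sample in the common "Key: Value" layout; not real registry output.
SAMPLE_WHOIS = """
Domain Name: MEDSECURE.COM
Registrar: Example Registrar, Inc.
Creation Date: 2009-03-14T00:00:00Z
Name Server: NS1.EXAMPLE-DNS.NET
Name Server: NS2.EXAMPLE-DNS.NET
% Terms of use notice follows
"""


def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect 'Key: Value' lines into {lowercased key: [values...]}."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and the comment or footer prefixes many servers emit.
        if not line or line.startswith(("%", "#", ">>>")):
            continue
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:
            fields[key.strip().lower()].append(value)
    return dict(fields)
```

Keeping every value in a list preserves repeated fields such as name servers and status codes, which single-value parsers silently drop.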

⚠️ Common Pitfall: Many organizations now use WHOIS privacy services (like Domains By Proxy or WhoisGuard) that replace the registrant's real information with the privacy service's details. Do not assume that WHOIS privacy means you cannot identify the registrant — cross-reference with historical WHOIS data, certificate details, and other sources.

7.3.2 Historical WHOIS Data

While current WHOIS data may be redacted due to privacy regulations (particularly GDPR), historical WHOIS data from before these protections were in place can be incredibly valuable. Services like DomainTools, WhoisXMLAPI, and SecurityTrails maintain historical WHOIS databases.

Historical WHOIS data can reveal:

  • Original registrant details before privacy protection was enabled
  • Changes in DNS hosting that indicate infrastructure migrations
  • Patterns of domain registration that reveal related domains
  • Technical contact email addresses that follow the organization's email format

7.3.3 Certificate Transparency Deep Dive

Certificate Transparency (CT) is a framework for publicly logging SSL/TLS certificates. Originally designed to detect fraudulently issued certificates, CT logs have become one of the most powerful passive reconnaissance tools available.

Every publicly trusted certificate authority must submit certificates to CT logs. This means that virtually every SSL/TLS certificate issued for a domain is publicly searchable.

What CT reveals:

  1. Subdomain inventory: Every hostname that has been named in a publicly logged certificate
  2. Wildcard certificates: *.medsecure.com certificates reveal the organization uses wildcard certs (which has security implications)
  3. Internal naming conventions: Subdomains often follow naming patterns that reveal organizational structure (us-east-prod-api.medsecure.com, eu-staging-web.medsecure.com)
  4. Certificate authority choices: Which CAs the organization trusts and uses
  5. Certificate renewal patterns: How frequently certificates are renewed, which may indicate automated vs. manual certificate management
  6. Historical infrastructure: Certificates for decommissioned services that may still be resolvable

Searching CT logs:

  • crt.sh: Free web interface for CT log searching. Query %.medsecure.com to find all certificates.
  • Censys: Provides CT data along with host scanning data. More powerful search syntax.
  • CertStream: Real-time certificate transparency log monitoring. Useful for monitoring a target over time.
  • Facebook CT Monitoring: Facebook maintains a CT monitoring tool for security researchers.

# Query crt.sh via API
curl -s "https://crt.sh/?q=%25.medsecure.com&output=json" | \
  jq -r '.[].name_value' | sort -u
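
The same post-processing can be done in Python once the JSON has been saved. This sketch assumes the response is a JSON array whose entries carry a name_value field, as the jq filter above implies, and that a single name_value may hold several newline-separated names. hostnames_from_crtsh is a hypothetical helper; it does no network access itself.

```python
import json


def hostnames_from_crtsh(json_text: str, apex: str) -> list[str]:
    """Return sorted, de-duplicated hostnames under apex from crt.sh JSON output."""
    names = set()
    for entry in json.loads(json_text):
        # One certificate can list several names, separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # record the wildcard's base name
            if name == apex or name.endswith("." + apex):
                names.add(name)
    return sorted(names)
```

Filtering on the apex domain drops unrelated names that share a certificate with the target, which is common on shared hosting and CDN certificates.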

🔵 Blue Team Perspective: Organizations should monitor CT logs for their domains. Unexpected certificates could indicate unauthorized subdomain creation, shadow IT, or certificate mis-issuance. Tools like CertStream and Facebook's CT monitor can provide real-time alerts. Defensive teams should also consider using CAA (Certificate Authority Authorization) DNS records to restrict which CAs can issue certificates for their domains.

7.4 Search Engine Dorking

7.4.1 Google Dorking (Google Hacking)

Google dorking — also known as Google hacking — is the art of using advanced Google search operators to find information that organizations have inadvertently exposed to the internet. The technique was popularized by Johnny Long and the Google Hacking Database (GHDB), which catalogs thousands of search queries that reveal sensitive information.

Essential Google Dork Operators:

Operator | Purpose | Example
site: | Restrict results to a domain | site:medsecure.com
intitle: | Search page titles | intitle:"index of" site:medsecure.com
inurl: | Search URLs | inurl:admin site:medsecure.com
filetype: | Search specific file types | filetype:pdf site:medsecure.com
ext: | Search file extensions | ext:sql site:medsecure.com
intext: | Search page body text | intext:"confidential" site:medsecure.com
cache: | View Google's cached version | cache:medsecure.com
- | Exclude terms | site:medsecure.com -www
"..." | Exact phrase match | "medsecure" "password"
* | Wildcard | site:*.medsecure.com

High-Value Google Dork Patterns:

# Find login pages
site:medsecure.com inurl:login OR inurl:signin OR inurl:admin

# Find exposed documents
site:medsecure.com filetype:pdf OR filetype:doc OR filetype:xlsx

# Find directory listings
site:medsecure.com intitle:"index of"

# Find configuration files
site:medsecure.com filetype:xml OR filetype:conf OR filetype:cfg OR filetype:env

# Find error messages revealing technology stack
site:medsecure.com "fatal error" OR "stack trace" OR "syntax error"

# Find exposed database files
site:medsecure.com filetype:sql OR filetype:db OR filetype:mdb

# Find backup files
site:medsecure.com filetype:bak OR filetype:old OR filetype:backup

# Find email addresses
site:medsecure.com "@medsecure.com"

# Find subdomains through Google indexing
site:*.medsecure.com -www

# Find WordPress installations
site:medsecure.com inurl:wp-content OR inurl:wp-includes
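
Because these patterns differ only in the domain and the operator terms, they are easy to generate for each in-scope domain. The sketch below only builds query strings and submits nothing; build_dork and filetype_dork are hypothetical helpers.

```python
def build_dork(domain: str, *alternatives: str, exclude: tuple[str, ...] = ()) -> str:
    """Compose 'site:<domain> a OR b OR c -x' from operator terms."""
    parts = [f"site:{domain}"]
    if alternatives:
        parts.append(" OR ".join(alternatives))
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


def filetype_dork(domain: str, extensions: list[str]) -> str:
    """Compose a filetype: query covering several extensions."""
    return build_dork(domain, *(f"filetype:{ext}" for ext in extensions))


if __name__ == "__main__":
    print(build_dork("medsecure.com", "inurl:login", "inurl:signin", "inurl:admin"))
    print(filetype_dork("medsecure.com", ["sql", "db", "mdb"]))
    print(build_dork("*.medsecure.com", exclude=("www",)))
```

Generating the queries from a list keeps the set consistent across related domains and makes it simple to record exactly which searches were run.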

🧪 Try It in Your Lab: Set up a deliberately vulnerable web application in your lab (like DVWA or WebGoat) and practice Google dorking against it. You can also practice with authorized bug bounty targets. Never use Google dorks against organizations you do not have permission to test.

7.4.2 Shodan: The Search Engine for Internet-Connected Devices

While Google indexes web pages, Shodan indexes internet-connected devices. Shodan continuously scans the entire IPv4 address space, collecting banner information from services running on open ports. This makes it an extraordinarily powerful passive reconnaissance tool.

What Shodan reveals:

  • Open ports and running services
  • Software versions and configurations
  • Default credentials pages
  • Industrial control systems (ICS/SCADA)
  • IoT devices
  • SSL certificate details
  • Organization-specific infrastructure

Shodan Search Syntax:

# Find all Shodan results for an organization
org:"MedSecure Health Systems"

# Search by IP range
net:203.0.113.0/24

# Search by hostname
hostname:medsecure.com

# Find specific services
org:"MedSecure" port:3389    # RDP
org:"MedSecure" port:22      # SSH
org:"MedSecure" port:8080    # Alternative HTTP

# Find specific products
org:"MedSecure" product:"Apache"
org:"MedSecure" product:"nginx"

# Find vulnerable services
org:"MedSecure" vuln:CVE-2021-44228  # Log4Shell

# Search by SSL certificate
ssl.cert.subject.cn:medsecure.com

# Find exposed databases
org:"MedSecure" port:27017   # MongoDB
org:"MedSecure" port:6379    # Redis
org:"MedSecure" port:9200    # Elasticsearch

Shodan for MedSecure: A search for org:"MedSecure Health Systems" might reveal:

  • Web servers running Apache 2.4.41 on ports 80 and 443
  • An SSH server on port 22 with OpenSSH 7.9
  • A VPN endpoint running OpenVPN
  • An exposed Elasticsearch instance on port 9200 (critical finding!)
  • Several IoT medical devices with web interfaces
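
A first-pass triage of results like these can be sketched as follows. The records are hand-written samples that borrow the ip_str, port, and product field names from Shodan's banner format. The port table mirrors the ports used in the queries above, and triage is a hypothetical helper, not part of any Shodan library.

```python
# Ports taken from the example queries above; extend to suit the engagement.
HIGH_RISK_PORTS = {
    3389: "RDP",
    27017: "MongoDB",
    6379: "Redis",
    9200: "Elasticsearch",
}

# Hand-written sample records; addresses are from a documentation range.
SAMPLE_BANNERS = [
    {"ip_str": "203.0.113.10", "port": 443, "product": "Apache httpd"},
    {"ip_str": "203.0.113.20", "port": 9200, "product": "Elastic"},
    {"ip_str": "203.0.113.11", "port": 22, "product": "OpenSSH"},
]


def triage(banners: list[dict]) -> list[tuple[str, int, str]]:
    """Return (ip, port, service) for banners on ports that warrant early review."""
    findings = []
    for banner in banners:
        port = banner.get("port")
        if port in HIGH_RISK_PORTS:
            findings.append((banner.get("ip_str", "?"), port, HIGH_RISK_PORTS[port]))
    return sorted(findings)
```

A port match is only a lead: the service still has to be confirmed during authorized active testing.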

📊 Real-World Application: In 2023, researchers used Shodan to discover that over 100,000 industrial control systems (ICS) were directly accessible from the internet, many with default credentials. Healthcare organizations were particularly affected, with DICOM (medical imaging) servers, infusion pump management interfaces, and building management systems exposed. This is exactly the type of finding that passive recon with Shodan can reveal during an authorized penetration test.

7.4.3 Censys: Certificate-Centric Internet Scanning

Censys, developed by researchers at the University of Michigan, takes a certificate-centric approach to internet scanning. While similar to Shodan, Censys offers superior certificate analysis and a more structured query language.

Censys Query Examples:

# Search by domain
services.tls.certificates.leaf_data.names: medsecure.com

# Search by organization in certificate
services.tls.certificates.leaf_data.subject.organization: "MedSecure"

# Search by IP range
ip: 203.0.113.0/24

# Find hosts that run HTTP and present a medsecure.com certificate
services.service_name: HTTP AND services.tls.certificates.leaf_data.names: medsecure.com

# Find hosts that run SSH and also present a medsecure.com certificate
services.service_name: SSH AND services.tls.certificates.leaf_data.names: medsecure.com

Censys excels at revealing:

  • All certificates issued for a domain (similar to CT logs)
  • Services running on hosts associated with those certificates
  • Hosting providers and cloud infrastructure
  • Historical certificate data

7.4.4 Other Specialized Search Engines

Beyond Google, Shodan, and Censys, several other search engines provide valuable reconnaissance data:

  • ZoomEye: Chinese-developed internet scanning platform, strong for IoT and Asian-hosted infrastructure
  • BinaryEdge: Focuses on internet security data including exposed databases, open ports, and vulnerabilities
  • GreyNoise: Distinguishes between targeted attacks and internet background noise
  • FOFA: Another internet-asset search engine popular in the security research community
  • Wigle.net: Wireless network mapping database — useful for physical reconnaissance

7.5 Social Media and People OSINT

7.5.1 Why People Are the Richest Intelligence Source

Technical infrastructure is important, but people are where reconnaissance becomes truly powerful. Employees are the ones who create and manage the systems you will test. They make mistakes, share too much online, and respond to social engineering. Understanding the people behind the organization is essential.

7.5.2 LinkedIn Intelligence

LinkedIn is the single most valuable OSINT source for organizational intelligence. It provides:

Organizational Structure: By examining employee profiles, you can reconstruct an organization's hierarchy, department structure, and reporting relationships. Search for "MedSecure Health Systems" and filter by current employees.

Technology Stack: Job postings and employee skill endorsements reveal the technologies in use. If MedSecure is hiring a "Senior Java Developer with Spring Boot and AWS experience," you now know they use Java, Spring Boot, and AWS. If employees list "Splunk" or "CrowdStrike" in their skills, you know what security tools are deployed.

Key Personnel: Identify:

  • IT administrators and system engineers (who manage the infrastructure you'll test)
  • Security team members (who will be defending against your test)
  • Help desk staff (potential social engineering targets)
  • Executives (high-value targets for spear phishing)
  • Recently departed employees (who may have retained access)

Email Format Discovery: If you find an employee named "John Smith" whose email is listed as jsmith@medsecure.com, you now know the email format is first-initial-last-name. You can generate email addresses for any employee you identify.
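
Once the format is known, candidate addresses can be generated from a list of names. This is a minimal sketch: the format labels and candidate_email are hypothetical, names are reduced to ASCII letters, and every generated address remains an unverified guess until confirmed against another source.

```python
import re

# Hypothetical format labels mapped to local-part builders.
FORMATS = {
    "first_initial_last": lambda first, last: first[0] + last,
    "first.last": lambda first, last: f"{first}.{last}",
    "first": lambda first, last: first,
}


def candidate_email(full_name: str, domain: str, fmt: str = "first_initial_last") -> str:
    """Build a candidate address from a full name using a known format."""
    # Keep only a-z in each name part, so "O'Neil" becomes "oneil".
    parts = [re.sub(r"[^a-z]", "", part) for part in full_name.lower().split()]
    parts = [part for part in parts if part]
    if len(parts) < 2:
        raise ValueError(f"need a first and last name, got {full_name!r}")
    first, last = parts[0], parts[-1]
    return f"{FORMATS[fmt](first, last)}@{domain}"
```

For example, candidate_email("John Smith", "medsecure.com") yields jsmith@medsecure.com, matching the format observed above.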

⚠️ Common Pitfall: Be careful not to interact with target employees on LinkedIn during passive recon. Viewing profiles (especially with a premium account) can generate notifications. Use logged-out browsing, Google cached versions of profiles, or dedicated OSINT accounts that do not reveal your identity or affiliation.

7.5.3 Other Social Media Platforms

Twitter/X: Employees and official accounts may share technical details, outage information, or complaints that reveal infrastructure details. Search for mentions of the organization and its employees.

GitHub/GitLab (covered in detail in Section 7.6): Developer profiles linked from LinkedIn or company websites reveal coding practices and potentially leaked code.

Facebook/Instagram: While less technical, these platforms can reveal physical office locations, employee events (useful for physical recon), and personal details useful for password guessing or social engineering pretexts.

Reddit/Stack Overflow/Technical Forums: Employees asking technical questions may reveal their infrastructure, the problems they are facing, and even code snippets from internal systems. Search for the organization's domain in questions and posts.

Glassdoor/Indeed: Employee reviews may reveal internal tools, management practices, security culture, and organizational challenges.

7.5.4 Email Intelligence

Discovering valid email addresses is crucial for both technical testing (password spraying, phishing simulations) and social engineering. Passive techniques include:

  • Hunter.io: Finds email addresses associated with a domain and identifies the email format
  • Phonebook.cz: Free email and domain search
  • EmailHippo/email-checker.net: Verify whether discovered email addresses are valid
  • Have I Been Pwned (HIBP): Check whether email addresses have appeared in data breaches
  • Google dorking: "@medsecure.com" can find emails posted publicly
  • PGP key servers: People who use PGP often publish their email addresses on public key servers
  • Data breaches: Breach data compilations (legally obtained through HIBP or similar services) reveal email addresses and sometimes passwords

Best Practice: When documenting email addresses found during OSINT, always note the source where you found each address. This is important for your report and helps the client understand their exposure. Create a structured table: Name | Email | Role | Source | Date Found.
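
One way to keep that table consistent is to define the record once and export it as CSV. The sketch below uses the five columns named above. EmailFinding and findings_to_csv are hypothetical names, and any values you pass in are your own findings.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class EmailFinding:
    """One row of the Name | Email | Role | Source | Date Found table."""
    name: str
    email: str
    role: str
    source: str
    date_found: str  # ISO 8601 date string, e.g. "2024-01-15"


def findings_to_csv(findings: list[EmailFinding]) -> str:
    """Serialize findings to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(EmailFinding)],
        lineterminator="\n",
    )
    writer.writeheader()
    for finding in findings:
        writer.writerow(asdict(finding))
    return buffer.getvalue()
```

Making the source and date mandatory fields means an address cannot be recorded without its provenance.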

7.5.5 Metadata Analysis

Documents published by organizations often contain metadata that reveals internal information. PDF files, Word documents, PowerPoint presentations, and images may contain:

  • Author names: Internal usernames or full names
  • Software versions: The application and version used to create the document (e.g., "Microsoft Word 2019" or "Adobe Acrobat 11.0")
  • Internal file paths: C:\Users\jsmith\Documents\MedSecure\Internal\quarterly-report.docx reveals a username and internal folder structure
  • Printer names: Can reveal internal printer naming conventions and locations
  • GPS coordinates: Images may contain EXIF data with GPS coordinates revealing exact locations
  • Revision history: Document revision metadata may contain names of additional editors

Tools for metadata extraction:

# ExifTool — the gold standard for metadata extraction
exiftool document.pdf

# FOCA — automated metadata extraction and analysis
# (Windows GUI tool by ElevenPaths)

# Metagoofil — automated Google dorking for documents + metadata
metagoofil -d medsecure.com -t pdf,doc,xls -l 100 -o medsecure_docs
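
Once metadata has been exported, internal usernames can be pulled out of embedded file paths. This is a rough sketch that assumes the paths have already been collected into a list (for example from ExifTool output); the regular expression and the second sample path are illustrative assumptions.

```python
import re

# Matches the profile folder in Windows paths (C:\Users\<name>\...) and
# Unix-style home directories (/home/<name>/... or /Users/<name>/...).
USER_PATH = re.compile(r"(?:[A-Za-z]:\\Users\\|/home/|/Users/)([^\\/]+)")


def usernames_from_paths(paths):
    """Return the sorted, de-duplicated usernames embedded in file paths."""
    found = set()
    for path in paths:
        match = USER_PATH.search(path)
        if match:
            found.add(match.group(1))
    return sorted(found)


paths = [
    r"C:\Users\jsmith\Documents\MedSecure\Internal\quarterly-report.docx",
    "/home/adoe/reports/draft.odt",  # hypothetical second path
]
print(usernames_from_paths(paths))
```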

🔗 Connection: The metadata you extract here connects directly to the social engineering techniques we will cover in Chapter 9. Internal usernames from document metadata become targets for password spraying (Chapter 13). Software versions become targets for vulnerability research (Chapter 14). Everything in passive recon feeds forward.

7.6 Code Repository Mining

7.6.1 GitHub and GitLab: The Developer's OSINT Goldmine

Developers are notoriously poor at operational security when it comes to code repositories. Despite years of warnings, sensitive data continues to be committed to public repositories at an alarming rate. A 2023 GitGuardian report found over 10 million new secrets leaked on GitHub in a single year.

What you are looking for:

  1. API Keys and Tokens: AWS access keys, API tokens for services like Stripe, Twilio, SendGrid, Slack webhooks
  2. Database Credentials: Connection strings with embedded usernames and passwords
  3. Private Keys: SSH keys, SSL/TLS private keys, PGP private keys
  4. Configuration Files: .env files, config.yml, settings.py with database URLs, email credentials, and service endpoints
  5. Internal URLs: References to internal hostnames, staging servers, and development endpoints
  6. Intellectual Property: Proprietary source code, algorithms, and business logic
  7. Infrastructure as Code: Terraform, CloudFormation, or Ansible files revealing cloud architecture

7.6.2 GitHub Search Techniques

GitHub's built-in search is surprisingly powerful for finding leaked secrets:

# Search for organization-specific leaks
org:medsecure password
org:medsecure secret
org:medsecure api_key
org:medsecure "BEGIN RSA PRIVATE KEY"

# Search for specific file names
# (filename: is the legacy qualifier; GitHub's newer code search favors path:)
org:medsecure filename:.env
org:medsecure filename:wp-config.php
org:medsecure filename:id_rsa
org:medsecure filename:.htpasswd
org:medsecure filename:shadow
org:medsecure filename:credentials

# Search for AWS keys (AKIA is the access key ID prefix)
org:medsecure "AKIA"

# Search for connection strings
org:medsecure "mongodb+srv://"
org:medsecure "postgresql://"
org:medsecure "mysql://"

# Search for configuration patterns
org:medsecure "DB_PASSWORD"
org:medsecure "SECRET_KEY"
org:medsecure "PRIVATE_KEY"
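
Typing each query by hand invites omissions, so the list can be generated. A minimal sketch, assuming the same `org:` qualifier used above; the `build_queries` helper and the term list are illustrative and simply produce strings to paste into the search box.

```python
def build_queries(org, terms):
    """Pair an org: qualifier with each quoted search term."""
    return [f'org:{org} "{term}"' for term in terms]


# Terms taken from the searches above; extend per engagement
TERMS = ["password", "api_key", "BEGIN RSA PRIVATE KEY", "DB_PASSWORD", "mongodb+srv://"]

for query in build_queries("medsecure", TERMS):
    print(query)
```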

Beyond organization repositories: Do not limit your search to the organization's official GitHub account. Employees often:

  • Fork company repositories to personal accounts
  • Create personal projects that reference company infrastructure
  • Upload scripts or tools they created at work
  • Push code from their work laptop that includes configuration files

Search for the organization's domain, internal hostnames, or employee names across all of GitHub.

7.6.3 Pastebin and Code Sharing Sites

Pastebin, Ghostbin, GitHub Gists, and similar code-sharing platforms are frequently used to share:

  • Configuration snippets
  • Error logs (which may contain sensitive data)
  • Credentials for sharing between team members
  • Data dumps from breaches
  • Code snippets with embedded secrets

Tools for monitoring paste sites:

  • PasteHunter: Automated monitoring of multiple paste sites for keywords
  • Dumpster Diver: Searches for secrets in files and URLs
  • Google Dorking: site:pastebin.com "medsecure" or site:gist.github.com "medsecure"

🔴 Red Team Perspective: During a red team engagement targeting a financial services company, we found a developer's personal GitHub repository containing a Python script for automating database backups. The script included hardcoded database credentials for the production database. The developer had committed the script with real credentials, then attempted to remove them in a subsequent commit — but git history preserved the original credentials. Always check commit history, not just the current state of files.
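
The point about commit history can be sketched in a few lines. This assumes a local clone and `git` on the PATH for `history_patch`; `secret_lines` works on any patch text, and the sample patch and pattern are hypothetical. A naive prefix check like this can misread content lines that themselves begin with `--` or `++`, so treat hits as leads to review.

```python
import re
import subprocess


def history_patch(repo_path):
    """Return the full patch history of a local clone (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def secret_lines(patch_text, pattern):
    """Yield added or removed diff lines that match the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    for line in patch_text.splitlines():
        # Skip file headers such as "+++ b/file" and "--- a/file"
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")) and regex.search(line):
            yield line


sample_patch = """\
--- a/backup.py
+++ b/backup.py
-DB_PASSWORD = "Sup3rS3cret!"
+DB_PASSWORD = os.environ["DB_PASSWORD"]
"""
for hit in secret_lines(sample_patch, r"password\s*="):
    print(hit)
```

The removed line (prefixed with `-`) is the one that matters here: it is gone from the current file but still present in history.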

7.6.4 Automated Secret Scanning

Several tools automate the process of scanning code repositories for leaked secrets:

  • TruffleHog: Searches git repositories for secrets using regex and entropy analysis. Examines the entire commit history.
  • GitLeaks: Fast scanning of git repositories for hardcoded secrets using regex patterns
  • git-secrets: AWS tool that prevents committing secrets to git repositories (defensive) and can scan existing repos (offensive)
  • Shhgit: Real-time GitHub secret monitoring
  • GitGuardian: Commercial solution with a free tier for monitoring public repositories

# TruffleHog scanning a repository
trufflehog git https://github.com/medsecure/patient-portal --only-verified

# GitLeaks scanning
gitleaks detect --source /path/to/repo --report-path findings.json

# Search all repos of an organization
trufflehog github --org medsecure --only-verified
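
To make "regex and entropy analysis" concrete, here is a simplified sketch of both ideas. It is not how TruffleHog or GitLeaks are implemented; the threshold, minimum token length, and function names are assumptions chosen for illustration. The sample key is the placeholder value AWS uses in its own documentation.

```python
import math
import re
from collections import Counter

# AWS access key IDs: the AKIA prefix followed by 16 uppercase letters or digits
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def shannon_entropy(text):
    """Bits of entropy per character; 0.0 for empty or single-symbol strings."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan(text, threshold=4.0, min_length=20):
    """Return (kind, value) pairs for regex hits and high-entropy tokens."""
    findings = [("aws_access_key_id", m.group()) for m in AWS_KEY_ID.finditer(text)]
    for token in re.findall(r"[A-Za-z0-9+/=_-]{%d,}" % min_length, text):
        if shannon_entropy(token) >= threshold:
            findings.append(("high_entropy", token))
    return findings


print(scan("aws_access_key_id = AKIAIOSFODNN7EXAMPLE"))
```

Regex rules catch secrets with a known shape, while the entropy check flags long random-looking strings that no rule describes, at the cost of more false positives.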

7.7 Automated OSINT Frameworks

7.7.1 Maltego: Visualization and Link Analysis

Maltego is the premier OSINT visualization and link analysis tool. It provides a graphical interface for discovering relationships between entities such as people, organizations, domains, IP addresses, and email addresses.

Key Concepts:

  • Entities: The data points you are investigating (domains, IPs, people, emails, etc.)
  • Transforms: Automated queries that take an entity as input and produce related entities as output. For example, a "DNS to IP" transform takes a domain and returns its IP addresses.
  • Graphs: Visual representations of the relationships between entities

Reconnaissance Workflow in Maltego:

  1. Start with a domain entity: medsecure.com
  2. Run DNS transforms to discover subdomains, MX records, NS records
  3. Resolve domains to IP addresses
  4. Run WHOIS transforms on IP addresses to discover net ranges
  5. Run reverse DNS on IP ranges to discover additional hosts
  6. Search for email addresses associated with the domain
  7. Look up people associated with the email addresses
  8. Search social media profiles for those people
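
The entity and transform model can be illustrated without Maltego itself. The sketch below is a toy, not Maltego's API: the transforms read from canned dictionaries (all values hypothetical, with IPs from a documentation range) and `expand` applies them breadth-first to build a graph.

```python
from collections import defaultdict

# Canned lookup data standing in for real transform results (hypothetical)
SUBDOMAINS = {"medsecure.com": ["vpn.medsecure.com"]}
DNS_RECORDS = {
    "medsecure.com": ["203.0.113.10"],
    "vpn.medsecure.com": ["203.0.113.20"],
}


def domain_to_subdomains(domain):
    return [("domain", sub) for sub in SUBDOMAINS.get(domain, [])]


def domain_to_ips(domain):
    return [("ip", ip) for ip in DNS_RECORDS.get(domain, [])]


# Transforms registered per entity type
TRANSFORMS = {"domain": [domain_to_subdomains, domain_to_ips]}


def expand(seed):
    """Apply every registered transform, breadth-first, starting at seed."""
    graph = defaultdict(set)
    queue, seen = [seed], {seed}
    while queue:
        entity = queue.pop(0)
        kind, value = entity
        for transform in TRANSFORMS.get(kind, []):
            for result in transform(value):
                graph[entity].add(result)
                if result not in seen:
                    seen.add(result)
                    queue.append(result)
    return graph


for entity, linked in expand(("domain", "medsecure.com")).items():
    print(entity, "->", sorted(linked))
```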

Maltego excels at visualizing complex relationships. When you have gathered dozens of domains, hundreds of subdomains, IP ranges, employee names, and email addresses, Maltego's graph view helps you see patterns and connections that would be invisible in a spreadsheet.

Maltego Editions:

  • Maltego CE (Community Edition): Free, limited transforms and results
  • Maltego Classic/XL: Commercial licenses with full access to transform marketplace
  • Maltego CaseFile: Free offline analysis tool (no transforms, manual data entry)

7.7.2 theHarvester: Quick and Effective Email and Subdomain Discovery

theHarvester is a purpose-built tool for gathering email addresses, subdomains, IPs, and URLs from multiple public sources. It is included in Kali Linux and is often the first OSINT tool run during a penetration test.

# Comprehensive theHarvester scan
theHarvester -d medsecure.com -b all -l 500 -f medsecure_results

# Specific source searches
theHarvester -d medsecure.com -b google
theHarvester -d medsecure.com -b linkedin
theHarvester -d medsecure.com -b shodan
theHarvester -d medsecure.com -b crtsh
theHarvester -d medsecure.com -b dnsdumpster

# Available sources vary by theHarvester version (run theHarvester -h
# to list those in your install); they have included:
# anubis, baidu, bevigil, binaryedge, bing, bingapi,
# bufferoverun, censys, certspotter, crtsh, dnsdumpster,
# duckduckgo, fullhunt, github-code, google, hackertarget,
# hunter, hunterhow, intelx, linkedin, netlas, onyphe,
# otx, pentesttools, projectdiscovery, rapiddns, rocketreach,
# securityTrails, shodan, sitedossier, subdomaincenter,
# subdomainfinderc99, threatminer, tomba, urlscan,
# virustotal, yahoo, zoomeye

theHarvester consolidates results from dozens of sources into a single output, making it an efficient first step in passive reconnaissance.
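
When combining output from several tools, the same consolidation step is worth doing yourself so that each host keeps a record of where it was seen. A minimal sketch, assuming per-source host lists have already been parsed; the `consolidate` helper and sample data are illustrative.

```python
def consolidate(results_by_source):
    """Merge per-source host lists into {host: sorted list of sources}."""
    merged = {}
    for source, hosts in results_by_source.items():
        for host in hosts:
            # Normalize case, whitespace, and a trailing root dot
            key = host.strip().lower().rstrip(".")
            if key:
                merged.setdefault(key, set()).add(source)
    return {host: sorted(sources) for host, sources in sorted(merged.items())}


results = {
    "crtsh": ["API.medsecure.com", "vpn.medsecure.com."],
    "dnsdumpster": ["api.medsecure.com"],
}
print(consolidate(results))
```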

7.7.3 Recon-ng: The Reconnaissance Framework

Recon-ng is a full-featured reconnaissance framework written in Python. Modeled after Metasploit, it provides a modular architecture for automating reconnaissance tasks.

Key Features:

  • Modular architecture: Install only the modules you need
  • Database-backed: All results are stored in a SQLite database
  • Workspaces: Separate data for different engagements
  • Reporting: Generate reports in various formats
  • API key management: Centralized storage for API keys used by modules

Basic Recon-ng Workflow:

# Start Recon-ng
recon-ng

# Create a workspace for the engagement
workspaces create medsecure

# Add the seed domain
db insert domains
# Enter: medsecure.com

# Install and run modules
marketplace install recon/domains-hosts/certificate_transparency
modules load recon/domains-hosts/certificate_transparency
run

marketplace install recon/domains-hosts/hackertarget
modules load recon/domains-hosts/hackertarget
run

marketplace install recon/hosts-hosts/resolve
modules load recon/hosts-hosts/resolve
run

# View results
show hosts
show contacts

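# Generate report (the module must be installed before it can be loaded)
marketplace install reporting/html
modules load reporting/html
options set CREATOR pentest-team
options set CUSTOMER MedSecure
options set FILENAME /root/medsecure_report.html
run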
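
Because results live in SQLite, they can also be queried outside the console. The sketch below uses an in-memory database with a simplified, assumed `hosts` table (three columns only) rather than a real workspace file, and the rows are hypothetical.

```python
import sqlite3

# In-memory stand-in for a workspace database; the schema is a simplification
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (host TEXT, ip_address TEXT, module TEXT)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [
        ("api.medsecure.com", "203.0.113.10", "certificate_transparency"),
        ("vpn.medsecure.com", "203.0.113.20", "hackertarget"),
        ("ehr.medsecure.com", "203.0.113.10", "hackertarget"),
    ],
)


def hosts_per_ip(connection):
    """Group hostnames by resolved IP to spot shared infrastructure."""
    rows = connection.execute(
        "SELECT ip_address, GROUP_CONCAT(host, ', ') FROM hosts "
        "GROUP BY ip_address ORDER BY ip_address"
    )
    return dict(rows.fetchall())


for ip, hosts in hosts_per_ip(conn).items():
    print(ip, "->", hosts)
```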

Useful Recon-ng Module Categories:

Category | Purpose | Example Modules
recon/domains-hosts | Find hosts from domains | certificate_transparency, hackertarget
recon/domains-contacts | Find contacts from domains | whois_pocs, metacrawler
recon/hosts-hosts | Resolve and enrich hosts | resolve, ssltools
recon/contacts-credentials | Check contacts against breach data | hibp_breach, hibp_paste
recon/profiles-profiles | Cross-reference social profiles | profiler
discovery/info_disclosure | Find information disclosures | Various
reporting | Generate reports | html, csv, json

7.7.4 SpiderFoot: Automated OSINT Collection

SpiderFoot is an open-source OSINT automation tool that integrates over 200 data sources. It can be run from the command line or through a web interface.

# Run SpiderFoot with web interface
spiderfoot -l 127.0.0.1:5001

# Command-line scan
spiderfoot -s medsecure.com -t INTERNET_NAME,IP_ADDRESS,EMAILADDR -o csv

SpiderFoot's strength lies in its breadth of data sources and automatic correlation of findings. It automatically identifies:

  • Related domains and subdomains
  • IP addresses and network ranges
  • Email addresses and phone numbers
  • Dark web mentions
  • Data breach exposures
  • Malware associations
  • Social media profiles
  • Technology fingerprints

7.7.5 Additional OSINT Tools

The OSINT toolkit extends well beyond these frameworks:

  • Amass (OWASP): Advanced subdomain enumeration with DNS brute forcing and data source integration
  • Subfinder: Fast passive subdomain discovery tool
  • Photon: Web crawler designed for OSINT
  • Shodan CLI: Command-line interface for Shodan searches
  • Sherlock: Find social media accounts by username across platforms
  • Holehe: Check if an email is registered on various platforms
  • GHunt: OSINT tool for Google accounts
  • Twint: Twitter intelligence tool (no API required)
  • Maigret: Username search across 2500+ sites

7.8 Building Your Passive Recon Methodology

7.8.1 A Structured Approach

Passive reconnaissance should follow a structured methodology to ensure thoroughness and consistency. Here is a recommended workflow:

Phase 1: Seed Identification (30 minutes)

  1. Identify all known domains associated with the target
  2. Identify the target's IP ranges and ASN
  3. Identify key personnel names
  4. Document all seeds in your notes

Phase 2: Domain and Infrastructure Intelligence (2-3 hours)

  1. WHOIS lookups for all known domains
  2. Reverse WHOIS to find related domains
  3. DNS record analysis for all domains
  4. Certificate transparency log searches
  5. Subdomain enumeration through passive sources
  6. Shodan/Censys searches for the organization
  7. IP range enumeration via BGP/ASN data

Phase 3: People and Organization Intelligence (1-2 hours)

  1. LinkedIn employee enumeration
  2. Email address discovery and format identification
  3. Social media profiling of key personnel
  4. Organizational structure mapping
  5. Technology stack identification via job postings

Phase 4: Code and Document Intelligence (1-2 hours)

  1. GitHub/GitLab organization and employee repository searches
  2. Pastebin and code sharing site monitoring
  3. Document discovery and metadata extraction
  4. Google dorking for exposed files and directories

Phase 5: Automated Collection (1-2 hours)

  1. theHarvester comprehensive scan
  2. Recon-ng module execution
  3. SpiderFoot automated scan
  4. Maltego graph construction

Phase 6: Analysis and Documentation (1-2 hours)

  1. Consolidate all findings
  2. Identify patterns and relationships
  3. Prioritize findings by potential impact
  4. Prepare the reconnaissance report
  5. Identify areas requiring active reconnaissance
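
Adding up the phase estimates gives a planning range for the whole effort. The short sketch below does that arithmetic using the durations stated for each phase; the dictionary and function name are illustrative.

```python
# (minimum hours, maximum hours) per phase, taken from the estimates above
PHASES = {
    "Seed Identification": (0.5, 0.5),
    "Domain and Infrastructure Intelligence": (2, 3),
    "People and Organization Intelligence": (1, 2),
    "Code and Document Intelligence": (1, 2),
    "Automated Collection": (1, 2),
    "Analysis and Documentation": (1, 2),
}


def total_budget(phases):
    """Sum the low and high estimates across all phases."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


low, high = total_budget(PHASES)
print(f"Planned effort: {low} to {high} hours")
```

The estimates sum to between 6.5 and 11.5 hours, roughly one to one and a half working days for a single tester.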

7.8.2 Documentation and Evidence Preservation

Professional passive reconnaissance requires rigorous documentation. For every finding, record:

  1. What was found: The specific data point or intelligence
  2. Where it was found: The exact URL, search query, or tool output
  3. When it was found: Timestamp (important because public data can change)
  4. How it was found: The tool and technique used
  5. Why it matters: The security implication of the finding
  6. Screenshot/evidence: Preserve evidence in case the source changes
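
These six fields map naturally onto a small record type. The following is a sketch, assuming standard-library Python; the `Finding` class and sample values are hypothetical. Storing a SHA-256 hash of each evidence file lets you show later that a screenshot or download has not been altered.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def fingerprint(evidence: bytes) -> str:
    """SHA-256 hex digest of an evidence file's contents."""
    return hashlib.sha256(evidence).hexdigest()


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Finding:
    what: str             # the data point or intelligence
    where: str            # exact URL, query, or tool output
    how: str              # tool and technique used
    why: str              # security implication
    evidence_sha256: str  # hash of the preserved screenshot or file
    when: str = field(default_factory=utc_now)  # UTC timestamp


# Hypothetical example; real evidence would be read from a saved file
screenshot = b"placeholder bytes standing in for a PNG screenshot"
finding = Finding(
    what="Staging EHR hostname in certificate transparency logs",
    where="crt.sh query for %.medsecure.com",
    how="Certificate transparency search",
    why="Staging systems often have weaker controls",
    evidence_sha256=fingerprint(screenshot),
)
print(finding)
```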

Use a structured format. Many penetration testers use tools like CherryTree, Obsidian, or custom Markdown templates to organize their findings. At minimum, maintain:

  • A master spreadsheet of domains, subdomains, and IP addresses
  • An employee/contact database with names, roles, emails, and social profiles
  • A technology inventory listing all identified software, services, and versions
  • A findings log with potential vulnerabilities identified through OSINT
  • A raw evidence archive with screenshots, downloaded files, and tool outputs

7.8.3 Applying Passive Recon to Our Running Examples

MedSecure Health Systems: For our healthcare target, passive recon is especially critical. Healthcare organizations must comply with HIPAA, and any exposed patient data discovered during reconnaissance constitutes a major finding. Key focus areas:

  • Medical device exposure on Shodan (DICOM, HL7 FHIR endpoints)
  • Employee credentials in healthcare data breaches
  • Clinical trial data or patient information in exposed documents
  • Third-party vendor relationships (billing, EHR, telehealth)

ShopStack E-commerce: For our e-commerce target, passive recon focuses on:

  • Payment processing infrastructure (PCI DSS implications)
  • Customer data exposure
  • Third-party integrations (shipping, analytics, marketing)
  • Development team repositories and deployed technology stack
  • CDN and cloud infrastructure mapping

Student Home Lab: Practice passive recon against deliberately vulnerable targets:

  • Set up OWASP Juice Shop or DVWA and practice Google dorking techniques
  • Use Shodan to understand your own home network's exposure
  • Create a test GitHub repository with (fake) secrets and practice detection
  • Run theHarvester and Recon-ng against authorized bug bounty targets

🧪 Try It in Your Lab: Create a complete passive reconnaissance dossier for an authorized target. Use the methodology outlined above and document your findings using the evidence preservation guidelines. Time yourself and aim to complete a thorough passive recon within 8 hours.

7.9 Threat Intelligence Feeds and Dark Web Monitoring

7.9.1 Leveraging Threat Intelligence in Reconnaissance

While traditional OSINT focuses on publicly accessible information, threat intelligence adds another dimension to passive reconnaissance. Threat intelligence feeds aggregate data from multiple sources — including information about active threats, compromised credentials, and malware campaigns — that can be queried without interacting with the target.

Data Breach Intelligence: Services like Have I Been Pwned (HIBP) allow you to check whether email addresses from the target organization have appeared in known data breaches. HIBP's domain-wide search requires proof of control over the domain, so it is normally run with the client's cooperation; otherwise, addresses are checked one at a time. For MedSecure, querying their email domain against HIBP might reveal:

  • 47 employee email addresses found in the LinkedIn 2012 breach
  • 12 email addresses found in the Adobe 2013 breach
  • 3 email addresses found in a healthcare-specific breach in 2022

This information tells you: (1) employee credentials may have been exposed and could be reused, (2) the organization has been indirectly affected by multiple breaches, and (3) password spraying with common passwords from these breach datasets may be effective during active testing.

# Check a single email against HIBP API
curl -s -H "hibp-api-key: YOUR_KEY" \
  "https://haveibeenpwned.com/api/v3/breachedaccount/john.smith@medsecure.com"

# Check whether the same address appears in indexed pastes
curl -s -H "hibp-api-key: YOUR_KEY" \
  "https://haveibeenpwned.com/api/v3/pasteaccount/john.smith@medsecure.com"

Reputation and Indicator Databases: Services like VirusTotal, AlienVault OTX, and Abuse.ch provide reputation data for domains and IP addresses. Checking the target's infrastructure against these databases can reveal:

  • Whether the target's IP addresses have been associated with malicious activity (possibly indicating prior compromise)
  • Whether the target's domains have been flagged for hosting malware or phishing
  • Historical threat intelligence that reveals past security incidents

Dark Web Monitoring: While deep dark web monitoring is typically a specialized service, some passive reconnaissance benefits from awareness of dark web data:

  • Stolen credentials for the target organization being sold on dark web marketplaces
  • Internal documents or intellectual property being traded
  • Discussions about the target organization in hacking forums
  • Ransomware group "leak sites" where stolen data is published

⚠️ Common Pitfall: Accessing dark web marketplaces, even for reconnaissance purposes, raises significant legal and ethical concerns. Unless your engagement explicitly authorizes dark web monitoring and you have appropriate legal guidance, limit your breach intelligence to legitimate services like HIBP that aggregate publicly known breach data.
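
For the HIBP lookups shown earlier in this section, scripting the request makes it easier to work through a list of addresses. The sketch below only builds the request and does not send it; it assumes the v3 endpoints used in the curl examples, and the user-agent string is a placeholder you should replace with something that identifies your engagement.

```python
from urllib.parse import quote
from urllib.request import Request

API_ROOT = "https://haveibeenpwned.com/api/v3"


def build_request(endpoint, account, api_key):
    """Build (but do not send) an HIBP v3 request for one account."""
    # URL-encode the address so characters such as @ and + are safe in the path
    url = f"{API_ROOT}/{endpoint}/{quote(account, safe='')}"
    headers = {
        "hibp-api-key": api_key,
        # HIBP rejects requests without a user agent; this value is a placeholder
        "user-agent": "medsecure-osint-assessment",
    }
    return Request(url, headers=headers)


request = build_request("breachedaccount", "john.smith@medsecure.com", "YOUR_KEY")
print(request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose: the API is rate limited, so a real script would need to pause between addresses and handle error responses.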

7.9.2 OSINT for Supply Chain Intelligence

Modern organizations depend on extensive supply chains of vendors, cloud services, and technology partners. Passive reconnaissance should map these relationships because:

  1. Vendors as attack vectors: A compromised vendor can provide access to the target (as in the SolarWinds and Kaseya incidents)
  2. Shared infrastructure risks: If the target and a compromised vendor share cloud infrastructure or network peering, lateral movement may be possible
  3. Third-party data exposure: Vendor data breaches may expose the target's data

Supply chain OSINT techniques:

  • Job postings: "Experience with [vendor product] required" reveals vendor relationships
  • Press releases: Partnership announcements, implementation case studies
  • Conference presentations: Vendor-sponsored talks often feature customer success stories
  • DNS records: Domain verification tokens for cloud services
  • LinkedIn: Employee skills listing vendor products; vendor employees connected to target employees
  • SEC filings: Public companies must disclose material vendor relationships
  • GitHub: Integration code, API clients, and configuration files referencing vendor services
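
The DNS technique in this list lends itself to a small lookup table. The sketch below assumes the TXT records have already been collected by passive means and are passed in as strings; the prefix table covers only a handful of well-known verification formats and is a starting point, not a complete list.

```python
# A few widely used TXT verification prefixes (extend as you encounter more)
VENDOR_PREFIXES = {
    "google-site-verification=": "Google",
    "MS=": "Microsoft 365",
    "facebook-domain-verification=": "Meta",
    "atlassian-domain-verification=": "Atlassian",
    "apple-domain-verification=": "Apple",
}


def vendors_from_txt(records):
    """Infer vendor relationships from TXT record contents."""
    vendors = set()
    for record in records:
        for prefix, vendor in VENDOR_PREFIXES.items():
            if record.startswith(prefix):
                vendors.add(vendor)
        if record.startswith("v=spf1"):
            # SPF include: mechanisms name third parties allowed to send mail
            for part in record.split():
                if part.startswith("include:"):
                    vendors.add(part[len("include:"):])
    return sorted(vendors)


# Hypothetical records for illustration
records = [
    "google-site-verification=abc123",
    "v=spf1 include:_spf.google.com ~all",
]
print(vendors_from_txt(records))
```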

For MedSecure, supply chain OSINT might reveal: Epic Systems (EHR), Palo Alto Networks (firewalls), CrowdStrike (endpoint security), Okta (identity management), and AWS (cloud hosting). Each of these vendor relationships becomes an attack surface element and a potential social engineering pretext.

🔗 Connection: Supply chain intelligence connects to the broader themes of attack surface evolution. As organizations adopt more cloud services and third-party integrations, their attack surface extends far beyond their own infrastructure. A comprehensive passive recon must map these extended attack surfaces.

7.10 Legal and Ethical Considerations

7.10.1 Legal Boundaries

Passive reconnaissance occupies a generally permissive legal space: you are accessing publicly available information. However, several boundaries must be respected:

  1. Personal Data Regulations: GDPR, CCPA, and similar regulations restrict the collection and processing of personal data, even if that data is publicly available. Collecting employee personal information during a penetration test should be authorized in your scope of work.

  2. Terms of Service: Scraping LinkedIn, automated Google searches, and bulk queries to services like Shodan may violate their terms of service. While ToS violations are generally civil matters rather than criminal ones, be aware of the boundaries.

  3. Accessing Leaked Data: Using breached credential databases (beyond HIBP-style existence checks) raises legal and ethical questions. Downloading and using stolen data is legally questionable in many jurisdictions, even for penetration testing purposes.

  4. Data Retention: How long should you retain OSINT data? Your engagement contract should specify data retention and destruction policies.

  5. Scope Boundaries: Even though passive recon is low-risk, your scope of work should explicitly authorize it. Never conduct reconnaissance outside your authorized scope.

⚖️ Legal Note: Some penetration testing frameworks, such as the PTES (Penetration Testing Execution Standard) and OWASP Testing Guide, provide guidance on the legal boundaries of reconnaissance. Familiarize yourself with these frameworks and ensure your engagement contracts are reviewed by legal counsel. In the United States, the Computer Fraud and Abuse Act (CFAA) primarily addresses unauthorized access to computer systems — passive recon using public sources generally does not constitute access. However, laws vary by jurisdiction, and you should always err on the side of caution.

7.10.2 Responsible OSINT Practices

Beyond legal compliance, ethical hackers should follow responsible OSINT practices:

  • Minimize personal data collection: Collect only what is necessary for the engagement. You do not need an employee's family photos or personal social media posts unless they are directly relevant.
  • Secure your findings: OSINT data, especially employee information and potential vulnerabilities, must be stored securely and shared only with authorized parties.
  • Report responsibly: If you discover serious exposures during passive recon (such as exposed databases or leaked credentials), inform your client promptly rather than waiting for the final report.
  • Do not weaponize personal information: The goal is to identify security risks, not to harm individuals. Employee information discovered during OSINT should be reported to help the organization improve its security posture, not to embarrass individuals.

7.11 Advanced OSINT Techniques

7.11.1 Geospatial Intelligence (GEOINT)

Geospatial intelligence — gathering intelligence from geographic and location-based data — is an often-overlooked dimension of passive reconnaissance that provides valuable context for penetration tests, especially those that include physical security assessment.

Google Maps and Street View Analysis: Google Maps provides satellite imagery, street-level photography, and business listings that reveal:

  • Building exterior and entry points
  • Parking lot layouts and access control
  • Adjacent buildings and shared spaces
  • Loading dock locations and service entrances
  • CCTV camera positions (visible in Street View)
  • Security guard booths and barrier gates
  • Rooftop equipment (HVAC systems indicating server rooms)
  • Signage revealing department names and building numbering

Historical Imagery: Google Earth Pro provides historical satellite imagery spanning decades. This can reveal construction timelines for data centers, changes in perimeter security (new fences, cameras), and expansion or contraction of operations.

Wi-Fi Geolocation: WiGLE.net aggregates wireless network data submitted by wardrivers worldwide. Searching for the target organization's SSID names can reveal office locations (including unlisted satellite offices), wireless network naming conventions, encryption types in use, and the density of access points indicating building layout.

Employee Location Intelligence: Social media geotagged posts, Strava running routes around office buildings, and Foursquare/Swarm check-ins can reveal employee routines, preferred lunch locations (useful for physical SE encounters), and commute patterns.

⚖️ Legal Note: Geospatial intelligence gathering should be conducted using only publicly available imagery and data. Do not use drone photography, enter private property, or conduct physical surveillance beyond what is visible from public spaces unless your engagement specifically authorizes physical reconnaissance.

7.11.2 Financial and Business Intelligence

Publicly available financial and business data provides context that enriches technical reconnaissance:

SEC Filings (for public companies): 10-K annual reports, 10-Q quarterly reports, and 8-K current reports disclose material IT system investments, risk factors related to cybersecurity, subsidiary companies and their domains, executive officers and board members, and material vendor relationships.

Corporate Registry Filings: State business registries reveal registered agents and addresses, officers and directors, subsidiary and DBA names, and filing dates.

Patent and Trademark Filings: The USPTO and international patent databases reveal proprietary technologies being developed, key researchers and inventors, and technical details of systems and algorithms.

Government Contract Databases: USASpending.gov, FPDS, and SAM.gov reveal contract values, services provided, subcontractor relationships, and security clearance requirements for government contractors.

Job Posting Analysis: Beyond technology stack identification, job postings reveal salary ranges indicating company financial health, growth patterns, security maturity, remote work policies, and specific products and tools in use.

7.11.3 Internet Archive and Historical Analysis

The Wayback Machine (web.archive.org) preserves historical versions of websites. This historical intelligence can reveal previous technology stacks that might still be running on forgotten subdomains, administrative interfaces that were once publicly linked but later hidden, staff directories with names and contact information that have been removed, and deprecated API endpoints that may still be functional.

Historical website snapshots also capture former employees who may still have active accounts, merger and acquisition announcements with subsidiary domain information, and previous versions of robots.txt and other configuration files.

# Query the Wayback Machine CDX API for historical URLs
curl -s "https://web.archive.org/cdx/search/cdx?url=medsecure.com/*&output=json&fl=original,timestamp,mimetype&collapse=urlkey" | head -50
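
With `output=json`, the CDX API returns a list of rows whose first row names the fields. A minimal sketch of turning that into dictionaries follows; the sample response is hypothetical and the `parse_cdx` helper is illustrative.

```python
import json


def parse_cdx(json_text):
    """Convert CDX JSON (header row, then data rows) into a list of dicts."""
    rows = json.loads(json_text)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


# Hypothetical response; timestamps use the YYYYMMDDhhmmss format
sample = (
    '[["original","timestamp","mimetype"],'
    '["http://medsecure.com/staff.html","20190412083000","text/html"]]'
)
for entry in parse_cdx(sample):
    snapshot = f"https://web.archive.org/web/{entry['timestamp']}/{entry['original']}"
    print(snapshot)
```

The timestamp and original URL together form the address of the archived snapshot, which is what you would record as evidence.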

💡 Intuition: Think of the Wayback Machine as a time machine for reconnaissance. Organizations change their public-facing information over time, but the internet remembers. A security team that removed sensitive information from their website five years ago may not realize that the information is permanently archived and accessible to anyone who knows where to look.

7.12 Putting It All Together: The MedSecure OSINT Assessment

Let us walk through a condensed example of a passive reconnaissance assessment against MedSecure Health Systems.

Step 1: Domain Enumeration

Starting with medsecure.com, we perform WHOIS lookups, reverse WHOIS searches, and CT log queries. We discover:

  • medsecure.com (primary domain)
  • medsecure.net (redirects to medsecure.com)
  • medsecure-portal.com (patient portal)
  • medsecuredev.io (development domain)

Step 2: DNS Analysis

DNS records reveal Google Workspace for email, Cloudflare for DNS/CDN, and AWS as their primary cloud provider. TXT records contain verification tokens for Salesforce, HubSpot, and Zoom.

Step 3: Subdomain Discovery

CT logs and passive sources reveal 47 subdomains, including:

  • api.medsecure.com: API endpoint
  • ehr.medsecure.com: Electronic health records
  • staging-ehr.medsecure.com: Staging EHR (interesting!)
  • jenkins.medsecuredev.io: CI/CD server
  • grafana.medsecuredev.io: Monitoring dashboard
  • vpn.medsecure.com: VPN endpoint
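
With dozens of subdomains, a quick keyword pass helps decide which hosts to look at first. The sketch below is a simple triage aid; the keyword list is an assumption to tune per engagement, and matching is done on whole hostname labels to limit false positives.

```python
# Labels that often indicate non-production or administrative systems
KEYWORDS = ("staging", "dev", "test", "jenkins", "grafana", "vpn", "admin")


def flag_hosts(hosts):
    """Return {host: matched keywords} for hosts containing a keyword label."""
    flagged = {}
    for host in hosts:
        # Split on dots and hyphens so "staging-ehr" yields "staging" and "ehr"
        labels = host.lower().replace("-", ".").split(".")
        hits = [keyword for keyword in KEYWORDS if keyword in labels]
        if hits:
            flagged[host] = hits
    return flagged


hosts = [
    "api.medsecure.com",
    "staging-ehr.medsecure.com",
    "jenkins.medsecuredev.io",
    "vpn.medsecure.com",
]
for host, reasons in flag_hosts(hosts).items():
    print(host, reasons)
```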

Step 4: Shodan/Censys

Shodan reveals 23 internet-facing hosts, including an Elasticsearch instance on port 9200 with no authentication (critical finding) and several servers running outdated Apache versions.

Step 5: People Intelligence

LinkedIn reveals 340 employees. We identify:

  • 12 IT staff members
  • 4 security team members
  • The CISO (recently hired, which suggests security improvements are underway)
  • Email format: first.last@medsecure.com
  • 28 valid email addresses from Hunter.io
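
Once the email format is known, candidate addresses for the remaining employees can be derived from their names. This is a rough sketch: the `candidate_email` helper, its pattern placeholders, and the name handling (first and last token only, accents stripped) are simplifying assumptions, and the results are unverified guesses until confirmed.

```python
import unicodedata


def to_ascii(text):
    """Strip accents and drop anything other than ASCII letters and hyphens."""
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in normalized if ch.isascii() and (ch.isalpha() or ch == "-"))


def candidate_email(full_name, domain, pattern="{first}.{last}"):
    """Build a candidate address from a name using the observed format."""
    parts = full_name.split()
    first = to_ascii(parts[0]).lower()
    last = to_ascii(parts[-1]).lower()
    local = pattern.format(first=first, last=last, f=first[:1])
    return f"{local}@{domain}"


print(candidate_email("John Smith", "medsecure.com"))
print(candidate_email("Mary Ann Jones", "medsecure.com", pattern="{f}{last}"))
```

Record these as candidates, separate from addresses actually observed in a source, so the report does not overstate what was found.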

Step 6: Code Repository Mining

A developer's personal GitHub account contains a repository with an old .env file containing AWS access keys. TruffleHog confirms the keys were committed 8 months ago. (These need immediate reporting to the client.)

Step 7: Documentation

All findings are compiled into a structured report with evidence, screenshots, and security implications. Critical findings (exposed Elasticsearch, leaked AWS keys) are reported immediately.

This single passive reconnaissance engagement — conducted entirely without touching the target's systems — has revealed the technology stack, identified dozens of potential targets for active testing, discovered two critical vulnerabilities, and mapped the organization's personnel and structure.

Best Practice: Always report critical findings discovered during passive recon immediately, even if you are still in the reconnaissance phase. An exposed database or leaked production credentials represent active risk that the client needs to address before your penetration test continues.

7.13 Applying OSINT to ShopStack and Your Home Lab

7.13.1 ShopStack E-Commerce Reconnaissance

Let us also apply our methodology to our second running example — ShopStack, the e-commerce platform.

Domain Intelligence: Starting with shopstack.io, WHOIS reveals registration through Namecheap with privacy protection. DNS analysis shows Cloudflare name servers, MX records pointing to Google Workspace, and TXT records containing verification tokens for Stripe, Shopify (their competitor — possibly for research), and Google Analytics.

Certificate Transparency: CT logs reveal 23 subdomains including api.shopstack.io, admin.shopstack.io, staging.shopstack.io, docs.shopstack.io, and status.shopstack.io. The staging subdomain is particularly interesting — it may contain test data or weaker security controls.

Technology Stack from Job Postings: LinkedIn job postings for ShopStack reveal: React (frontend), Node.js/Express (backend API), PostgreSQL (database), Redis (caching), Elasticsearch (search), Docker/Kubernetes (containerization), and AWS (cloud infrastructure). This gives us a nearly complete picture of the technology stack.

GitHub Presence: ShopStack maintains a public GitHub organization with open-source libraries. Searching employee personal GitHub accounts reveals a developer who pushed a copy of a private repo to a public repository under their personal account. The copy contains environment variable references to internal service URLs and a .env.example file listing all required configuration variables (including variable names like DATABASE_URL, STRIPE_SECRET_KEY, JWT_SECRET).

Employee Intelligence: LinkedIn shows 150 employees, primarily engineers. The CEO has an active Twitter presence posting about technology decisions. The CTO presented at a Node.js conference — the video is on YouTube and shows slides with architecture diagrams. These slides reveal the internal microservice structure and communication patterns.

7.13.2 Student Home Lab Practice Targets

For practicing passive reconnaissance skills in your home lab:

Bug Bounty Programs: Many organizations run public bug bounty programs through HackerOne, Bugcrowd, and their own programs. These explicitly authorize reconnaissance against their systems. Start with programs that have broad scopes and permissive rules.

Your Own Infrastructure: Conduct passive recon against your own domain, home network, and social media presence. You may be surprised by what you discover. Check your own email addresses against HIBP. Search for your own name on data broker sites. Examine your social media privacy settings.

Deliberately Vulnerable Targets: OWASP Juice Shop, DVWA, and similar deliberately vulnerable applications can be deployed locally and used to practice dorking-style queries (after crawling them into a search index you control, since public search engines do not index a local deployment) and document metadata extraction.

CTF Platforms: TryHackMe, HackTheBox, and PicoCTF include OSINT challenges that provide safe, legal targets for practicing reconnaissance skills. These challenges range from beginner (finding hidden information in images) to advanced (tracking individuals across multiple platforms).

Summary

Passive reconnaissance and OSINT form the foundation of every successful penetration test. By the time you begin active testing, you should have a comprehensive understanding of your target's digital footprint, technology stack, organizational structure, and potential vulnerabilities — all gathered without sending a single packet to the target network.

The key principles to remember:

  1. Passive recon is non-negotiable: It must always precede active testing
  2. Breadth before depth: Cast a wide net first, then investigate promising leads
  3. Document everything: Your findings are only valuable if they are properly documented
  4. Connect the dots: Individual data points become intelligence when connected
  5. Stay legal and ethical: Even "public" information gathering has boundaries
  6. Automate where possible: Tools like theHarvester, Recon-ng, and SpiderFoot save enormous time
  7. Think like an attacker: What would a real adversary find most valuable?

In the next chapter, we cross the line from passive to active reconnaissance — sending packets to the target's systems to discover what passive methods could not reveal. The intelligence you have gathered in this chapter will guide every aspect of that active enumeration.