Chapter 7 Exercises: Passive Reconnaissance and OSINT

Exercise 1: Certificate Transparency Subdomain Discovery

Difficulty: Beginner | Estimated Time: 30 minutes

Write a Python script that queries crt.sh for all certificates associated with a given domain. The script should:

  - Accept a domain as a command-line argument
  - Query the crt.sh JSON API
  - Extract all unique subdomain names from the results
  - Remove wildcard prefixes (*.example.com becomes example.com)
  - Sort the results alphabetically
  - Output the total count and the full list
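Here is a minimal sketch of such a script using only the standard library. It assumes crt.sh's JSON output (`https://crt.sh/?q=%25.<domain>&output=json`) returns a list of objects whose `name_value` field holds one or more newline-separated hostnames. crt.sh is often slow or overloaded, so expect timeouts and the occasional non-JSON response.

```python
import json
import sys
import urllib.parse
import urllib.request


def fetch_crtsh(domain: str, timeout: float = 60.0) -> list[dict]:
    """Download certificate entries for a domain and its subdomains from crt.sh."""
    # "%.example.com" is crt.sh's wildcard syntax; quote() encodes % as %25.
    query = urllib.parse.quote(f"%.{domain}")
    url = f"https://crt.sh/?q={query}&output=json"
    request = urllib.request.Request(url, headers={"User-Agent": "osint-exercise"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_subdomains(entries: list[dict], domain: str) -> list[str]:
    """Return sorted, de-duplicated hostnames under `domain`, wildcards stripped."""
    domain = domain.lower()
    names: set[str] = set()
    for entry in entries:
        # name_value can contain several hostnames separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # *.example.com -> example.com
            # Keep only the target domain and its subdomains.
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


if __name__ == "__main__" and len(sys.argv) == 2:
    target = sys.argv[1].strip().lower()
    subdomains = extract_subdomains(fetch_crtsh(target), target)
    print(f"Total unique names: {len(subdomains)}")
    for name in subdomains:
        print(name)
```

Keeping the parsing in `extract_subdomains` separate from the network call lets you test it against saved JSON without querying crt.sh repeatedly.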

Test your script against an authorized target or a deliberately vulnerable lab environment. Compare the results with what you find using manual crt.sh web searches.

Exercise 2: WHOIS Intelligence Report

Difficulty: Beginner | Estimated Time: 45 minutes

Using command-line WHOIS tools (or the python-whois library), conduct a WHOIS investigation of three related domains belonging to the same organization. For each domain, document:

  - Registrant information (or privacy service used)
  - Registration and expiration dates
  - Name servers
  - Domain status codes

Then answer: What relationships between the domains can you establish from WHOIS data alone? Did you find any domains nearing expiration? Are the name servers consistent across all domains?
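Once you have the records, a small helper can answer the comparison questions consistently. This sketch assumes each record is a dict shaped like the result of `whois.whois(domain)` from the python-whois package, with `expiration_date`, `name_servers`, and `registrar` keys. Field names and formats vary by registry. Dates may be a single naive `datetime`, a list, or missing. The 90-day expiry threshold is an arbitrary choice.

```python
from datetime import datetime
from typing import Any, Optional


def earliest_date(value: Any) -> Optional[datetime]:
    """python-whois may return a datetime, a list of datetimes, or None."""
    if isinstance(value, list):
        dates = [v for v in value if isinstance(v, datetime)]
        return min(dates) if dates else None
    return value if isinstance(value, datetime) else None


def normalize_name_servers(value: Any) -> frozenset:
    """Lowercase and strip trailing dots so 'NS1.Host.net.' == 'ns1.host.net'."""
    if not value:
        return frozenset()
    if isinstance(value, str):
        value = [value]
    return frozenset(ns.strip().rstrip(".").lower() for ns in value)


def compare_whois(records: dict, now: datetime, warn_days: int = 90) -> dict:
    """Summarize expiry, name-server overlap, and registrars across domains."""
    expiring, ns_sets, registrars = {}, {}, {}
    for domain, record in records.items():
        expiry = earliest_date(record.get("expiration_date"))
        if expiry is not None and (expiry - now).days <= warn_days:
            expiring[domain] = (expiry - now).days
        ns_sets[domain] = normalize_name_servers(record.get("name_servers"))
        registrars[domain] = record.get("registrar")
    all_sets = list(ns_sets.values())
    return {
        "expiring_soon": expiring,
        "name_servers_identical": len(set(all_sets)) <= 1,
        "shared_name_servers": sorted(frozenset.intersection(*all_sets)) if all_sets else [],
        "registrars": registrars,
    }


def fetch_whois(domains: list) -> dict:
    """Live lookups; requires `pip install python-whois` (imported lazily)."""
    import whois
    return {domain: dict(whois.whois(domain)) for domain in domains}
```

For example, `compare_whois(fetch_whois(["example.com", "example.net", "example.org"]), datetime.now())` flags domains expiring within 90 days and lists the name servers they all share.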

Exercise 3: DNS Record Technology Identification

Difficulty: Intermediate | Estimated Time: 45 minutes

Write a Python script using the dnspython library that:

  - Queries all common DNS record types (A, AAAA, MX, NS, TXT, SOA, CNAME, CAA) for a target domain
  - Analyzes MX records to identify the email provider
  - Parses TXT records to identify SPF includes, DMARC policies (published as a TXT record at _dmarc.<domain>), and domain verification tokens
  - Generates a "technology fingerprint" report listing all identified services
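Below is a sketch of one possible structure. Querying uses dnspython 2.x (`dns.resolver.resolve`), imported inside `query_all` so you can try the parsing helpers offline. The MX suffix table and verification-token prefixes are a small, hypothetical starting set to extend. DMARC is looked up at `_dmarc.<domain>` because the policy record is published there, not at the domain itself.

```python
import re
from typing import Optional

RECORD_TYPES = ["A", "AAAA", "MX", "NS", "TXT", "SOA", "CNAME", "CAA"]

# Hypothetical starter mapping of MX host suffixes to mail providers.
MX_PROVIDERS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "outlook.com": "Microsoft 365",
    "pphosted.com": "Proofpoint",
    "mimecast.com": "Mimecast",
}

# A few common domain-verification TXT prefixes; extend as you find more.
VERIFICATION_PREFIXES = {
    "google-site-verification=": "Google",
    "MS=": "Microsoft 365",
    "facebook-domain-verification=": "Facebook",
    "atlassian-domain-verification=": "Atlassian",
}


def query_all(domain: str) -> dict[str, list[str]]:
    """Resolve each record type (plus the _dmarc TXT record) as text values."""
    import dns.exception
    import dns.resolver

    def lookup(name: str, rtype: str) -> list[str]:
        try:
            answer = dns.resolver.resolve(name, rtype)
        except dns.exception.DNSException:
            return []  # NXDOMAIN, no answer, timeout, etc.
        if rtype == "TXT":
            # A TXT record may be split into several strings; join them.
            return [b"".join(r.strings).decode("utf-8", "replace") for r in answer]
        return [r.to_text() for r in answer]

    records = {rtype: lookup(domain, rtype) for rtype in RECORD_TYPES}
    records["DMARC"] = lookup(f"_dmarc.{domain}", "TXT")
    return records


def mx_providers(mx_records: list[str]) -> list[str]:
    found = set()
    for record in mx_records:
        host = record.split()[-1].rstrip(".").lower()  # "10 mx.host." -> "mx.host"
        for suffix, provider in MX_PROVIDERS.items():
            if host == suffix or host.endswith("." + suffix):
                found.add(provider)
    return sorted(found)


def spf_includes(txt_records: list[str]) -> list[str]:
    includes = []
    for txt in txt_records:
        if txt.lower().startswith("v=spf1"):
            includes += [t.split(":", 1)[1] for t in txt.split() if t.lower().startswith("include:")]
    return includes


def dmarc_policy(dmarc_records: list[str]) -> Optional[str]:
    for txt in dmarc_records:
        if txt.lower().startswith("v=dmarc1"):
            match = re.search(r"\bp=(\w+)", txt, re.IGNORECASE)  # \b skips "sp="
            if match:
                return match.group(1).lower()
    return None


def verified_services(txt_records: list[str]) -> list[str]:
    return sorted({svc for txt in txt_records
                   for prefix, svc in VERIFICATION_PREFIXES.items() if txt.startswith(prefix)})


def fingerprint(records: dict[str, list[str]]) -> dict:
    return {
        "email_providers": mx_providers(records.get("MX", [])),
        "spf_includes": spf_includes(records.get("TXT", [])),
        "dmarc_policy": dmarc_policy(records.get("DMARC", [])),
        "verified_services": verified_services(records.get("TXT", [])),
    }
```

A live run is `fingerprint(query_all("example.com"))`. SPF includes are worth listing in full because each one usually names a third-party sending service.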

Run your script against three different authorized domains and compare the technology footprints. Which domain reveals the most information through DNS alone?

Exercise 4: Google Dorking Workshop

Difficulty: Intermediate | Estimated Time: 1 hour

Create a comprehensive Google dork cheat sheet for a hypothetical penetration test. For each of the following categories, write at least three Google dork queries:

  1. Login and administration pages
  2. Exposed documents (PDF, DOCX, XLSX)
  3. Directory listings
  4. Error messages and stack traces
  5. Configuration files
  6. Email addresses
  7. Subdomains not linked from the main site
  8. Technology stack identification
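To make the cheat sheet reusable, store each dork as a template and render it for each engagement. In this minimal sketch, `{d}` is a hypothetical placeholder for the target domain. Only a few categories are filled in as a starting point. The operators used (`site:`, `inurl:`, `intitle:`, `filetype:`, and `-` for exclusion) are standard Google search operators. The script only builds query strings and search URLs and sends no requests, so you still run the dorks by hand.

```python
import urllib.parse

# Starter templates for a few categories; {d} is replaced with the target domain.
DORK_TEMPLATES = {
    "Login and administration pages": [
        "site:{d} inurl:login",
        "site:{d} inurl:admin",
        'site:{d} intitle:"sign in"',
    ],
    "Exposed documents": [
        "site:{d} filetype:pdf",
        "site:{d} filetype:docx",
        "site:{d} filetype:xlsx",
    ],
    "Directory listings": [
        'site:{d} intitle:"index of"',
        'site:{d} intitle:"index of" "parent directory"',
        'site:{d} intitle:"index of" "backup"',
    ],
    "Subdomains not linked from the main site": [
        "site:*.{d} -site:www.{d}",
    ],
}


def build_dorks(domain: str, templates: dict = DORK_TEMPLATES) -> dict[str, list[str]]:
    """Render every template in every category for one domain."""
    return {category: [t.format(d=domain) for t in queries]
            for category, queries in templates.items()}


def search_url(query: str) -> str:
    """Turn a dork into a Google search URL for manual use in a browser."""
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)


if __name__ == "__main__":
    for category, queries in build_dorks("example.com").items():
        print(f"\n{category}")
        for query in queries:
            print(f"  {query}\n    {search_url(query)}")
```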

Then, using an authorized target (such as an in-scope asset from a bug bounty program), execute your dorks and document any findings. Note which dork categories yielded the most results.

Exercise 5: Email Format Detection and Generation

Difficulty: Intermediate | Estimated Time: 45 minutes

Using publicly available information (company websites, press releases, public documents), determine the email format used by three different organizations. For each organization:

  - Find at least two verified email addresses
  - Determine the email format pattern (first.last, flast, etc.)
  - Generate a list of 10 probable email addresses using employee names from LinkedIn
  - Document your methodology and sources

Write a Python function that accepts a domain, a list of (first_name, last_name) tuples, and an email format string, then returns a list of generated email addresses.
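One way to write this function assumes the format string uses four hypothetical placeholders: `{first}`, `{last}`, `{f}` (first initial), and `{l}` (last initial). Names are lowercased and stripped of accents and punctuation. That is a common convention but not a universal one, so check it against the verified addresses you found.

```python
import unicodedata


def _normalize(name: str) -> str:
    """Lowercase, strip accents and non-alphanumerics: "O'Brien" -> "obrien"."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_name.lower() if ch.isalnum())


def generate_emails(domain: str, names: list[tuple[str, str]], fmt: str) -> list[str]:
    """Build addresses from a pattern such as "{first}.{last}" or "{f}{last}".

    Placeholders: {first}, {last}, {f} (first initial), {l} (last initial).
    """
    emails, seen = [], set()
    for first, last in names:
        fn, ln = _normalize(first), _normalize(last)
        if not fn or not ln:
            continue  # skip names that normalize to nothing
        address = f"{fmt.format(first=fn, last=ln, f=fn[0], l=ln[0])}@{domain.lower()}"
        if address not in seen:  # preserve order, drop duplicates
            seen.add(address)
            emails.append(address)
    return emails
```

For example, `generate_emails("example.com", [("Jane", "Doe")], "{f}{last}")` returns `["jdoe@example.com"]`.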

Exercise 6: Shodan Organization Reconnaissance

Difficulty: Intermediate | Estimated Time: 1 hour

Using Shodan (free account), investigate the internet-facing infrastructure of an authorized target organization. Document:

  - Total number of hosts found
  - Open ports and services
  - Software versions identified
  - Any potentially vulnerable services (outdated versions, known CVEs)
  - Cloud vs. on-premises infrastructure indicators
  - Any unexpected services (databases, development tools, IoT devices)

Create a risk-prioritized summary of your findings. Which discoveries would you investigate first during active reconnaissance?
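A simple scoring pass can turn raw results into a first-draft priority list. This sketch assumes each host is a Shodan banner dict (for example, an element of the `matches` list returned by `shodan.Shodan(api_key).search(query)`) with `ip_str`, `port`, and optionally `product`, `version`, and `vulns`. Which fields appear depends on your account's API access and on what Shodan has recorded. The port list and weights are hypothetical and should be tuned to the engagement.

```python
# Hypothetical weighting: ports that commonly expose data stores or remote access.
HIGH_RISK_PORTS = {
    23: "Telnet", 445: "SMB", 3306: "MySQL", 3389: "RDP", 5432: "PostgreSQL",
    5900: "VNC", 6379: "Redis", 9200: "Elasticsearch", 11211: "Memcached", 27017: "MongoDB",
}


def score_banner(banner: dict) -> tuple[int, list[str]]:
    """Assign a rough risk score and human-readable reasons to one banner."""
    score, reasons = 0, []
    port = banner.get("port")
    if port in HIGH_RISK_PORTS:
        score += 5
        reasons.append(f"exposed {HIGH_RISK_PORTS[port]} on port {port}")
    vulns = banner.get("vulns") or {}
    if vulns:
        score += 3 * len(vulns)
        reasons.append("CVEs listed: " + ", ".join(sorted(vulns)))
    if banner.get("version"):
        score += 1
        reasons.append(f"version disclosed: {banner.get('product', 'unknown')} {banner['version']}")
    return score, reasons


def prioritize(banners: list[dict]) -> list[dict]:
    """Return one row per banner, highest score first."""
    rows = []
    for banner in banners:
        score, reasons = score_banner(banner)
        rows.append({"host": f"{banner.get('ip_str')}:{banner.get('port')}",
                     "score": score, "reasons": reasons})
    # sorted() is stable, so equal scores keep their input order.
    return sorted(rows, key=lambda row: row["score"], reverse=True)
```

The score is only a triage aid. CVEs that Shodan attributes from version strings are unconfirmed until you verify them during active testing.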

Exercise 7: Document Metadata Extraction

Difficulty: Intermediate | Estimated Time: 45 minutes

Download 5-10 publicly available documents (PDFs, Word files, presentations) from an authorized target's website. Using ExifTool or a Python metadata extraction library:

  - Extract all metadata from each document
  - Identify any internal usernames or author names
  - Note any internal file paths or system information
  - List software and version information
  - Check for GPS coordinates in any images
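If you use ExifTool, its `-json` option prints one JSON object per file, and a short script can pull out the fields this exercise asks about. The tag names below (`Author`, `Creator`, `LastModifiedBy`, `Producer`, `CreatorTool`, `Software`, `Application`, `GPSLatitude`, `GPSLongitude`) are common ExifTool tag names, but which ones appear depends on the file type. The path regex is a hypothetical heuristic for Windows drive and UNC paths.

```python
import json
import re
import shutil
import subprocess

# "Creator" is often a person in Office files but usually an application in PDFs.
NAME_TAGS = ("Author", "Creator", "LastModifiedBy")
SOFTWARE_TAGS = ("Producer", "CreatorTool", "Software", "Application")
# Windows drive paths (C:\...) and UNC paths (\\server\share\...).
PATH_RE = re.compile(r"""(?:[A-Za-z]:\\|\\\\[\w.-]+\\)[^\s"']*""")


def run_exiftool(paths: list[str]) -> list[dict]:
    """Return ExifTool's JSON output (one dict per file)."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool is not installed or not on PATH")
    result = subprocess.run(["exiftool", "-json", *paths], capture_output=True, text=True)
    return json.loads(result.stdout or "[]")


def summarize(records: list[dict]) -> dict:
    """Collect names, software, internal paths, and GPS data across files."""
    names, software, paths, gps = set(), set(), set(), []
    for record in records:
        names.update(str(record[t]) for t in NAME_TAGS if record.get(t))
        software.update(str(record[t]) for t in SOFTWARE_TAGS if record.get(t))
        for tag, value in record.items():
            # Skip SourceFile: it is the tester's local path, not the target's.
            if tag != "SourceFile" and isinstance(value, str):
                paths.update(PATH_RE.findall(value))
        if "GPSLatitude" in record and "GPSLongitude" in record:
            gps.append((record.get("SourceFile"), record["GPSLatitude"], record["GPSLongitude"]))
    return {"names": sorted(names), "software": sorted(software),
            "paths": sorted(paths), "gps": gps}
```

A typical run is `summarize(run_exiftool(["report.pdf", "brochure.docx"]))`, with the file names standing in for whatever you downloaded.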

Write a brief intelligence report summarizing what the metadata reveals about the organization's internal environment.

Exercise 8: GitHub Secret Scanning

Difficulty: Intermediate | Estimated Time: 1 hour

Using the GitHub search interface (or the GitHub API), conduct a secret scan for an authorized target. Search for:

  - The organization's domain in code files
  - Common credential patterns (API keys, passwords, tokens)
  - Configuration files (.env, config.yml, settings.py)
  - Internal URLs and hostnames

Document each finding with:

  - The repository and file path
  - The type of secret or sensitive data found
  - The risk level (high, medium, low)
  - Whether the secret appears to be active or historical
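After you have retrieved file contents, a local pattern scan helps you classify findings consistently. The patterns below are a small hypothetical set built on well-known token formats: AWS access key IDs begin with `AKIA`, and classic GitHub personal access tokens begin with `ghp_`. The risk labels are assumptions. Matched secrets are redacted in the output so the report never repeats a usable credential.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    path: str
    line_no: int
    kind: str
    risk: str
    excerpt: str


# Hypothetical starter patterns and risk labels; extend for your engagement.
PATTERNS = [
    ("AWS access key ID", "high", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("GitHub personal access token", "high", re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
    ("Private key block", "high", re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")),
    ("Hard-coded credential", "medium",
     re.compile(r"""(?i)(?:password|passwd|secret|api_?key|token)\s*[:=]\s*['"][^'"]{6,}['"]""")),
]


def scan_text(path: str, text: str, domain: Optional[str] = None) -> list[Finding]:
    """Scan one file's text line by line and return classified findings."""
    patterns = list(PATTERNS)
    if domain:
        host_re = re.compile(r"\b[\w-]+(?:\.[\w-]+)*\." + re.escape(domain) + r"\b")
        patterns.append(("Hostname under target domain", "low", host_re))
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, risk, regex in patterns:
            for match in regex.finditer(line):
                value = match.group(0)
                # Redact secrets; hostnames are left readable for the report.
                excerpt = value if risk == "low" else value[:4] + "...[redacted]"
                findings.append(Finding(path, line_no, kind, risk, excerpt))
    return findings
```

Whether a matched secret is active or historical cannot be decided from the text alone. Check the commit history for when it was added or removed, but do not test the credential.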

Important: Do not attempt to use any discovered credentials. Document and report them to the organization.

Exercise 9: Automated OSINT with theHarvester

Difficulty: Beginner | Estimated Time: 30 minutes

Run theHarvester against an authorized target domain using at least five different data sources. Compare the results from each source:

  - Which source found the most email addresses?
  - Which source found the most subdomains?
  - Which sources had overlapping results?
  - Were any findings unique to a single source?
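Run theHarvester once per data source (its `-d` option sets the domain and `-b` selects the source), then load each source's email addresses or hostnames into a set. This sketch covers only the comparison step. Parsing theHarvester's output is left out because its output format has changed between versions.

```python
from collections import Counter


def compare_sources(results: dict[str, set[str]]) -> dict:
    """results maps a source name (e.g. 'bing') to the set of items it found."""
    # Count how many sources reported each item.
    counts = Counter(item for items in results.values() for item in set(items))
    return {
        "per_source_total": {src: len(set(items)) for src, items in results.items()},
        "unique_to_source": {src: sorted(i for i in set(items) if counts[i] == 1)
                             for src, items in results.items()},
        "found_by_multiple": sorted(i for i, n in counts.items() if n > 1),
        "union_total": len(counts),
    }


def print_table(results: dict[str, set[str]]) -> None:
    """Print a per-source table of total and unique findings."""
    summary = compare_sources(results)
    print(f"{'source':<15}{'total':>7}{'unique':>8}")
    for src in results:
        total = summary["per_source_total"][src]
        unique = len(summary["unique_to_source"][src])
        print(f"{src:<15}{total:>7}{unique:>8}")
    print(f"{'union':<15}{summary['union_total']:>7}")
```

Run it separately for emails and for hosts. The gap between the largest single-source total and the union total shows what a single-source scan would have missed.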

Document your findings in a structured table and explain why using multiple sources is important for comprehensive OSINT.

Exercise 10: Complete OSINT Report

Difficulty: Advanced | Estimated Time: 3-4 hours

Conduct a complete passive reconnaissance assessment of an authorized target using the methodology described in Section 7.8. Your assessment should include:

  1. Domain and infrastructure intelligence: WHOIS, DNS records, subdomains, IP ranges
  2. People and organization intelligence: Key personnel, organizational structure, email addresses
  3. Technology identification: Identified services, software, and cloud platforms
  4. Code and document intelligence: GitHub findings, document metadata
  5. Automated collection: theHarvester, Recon-ng, or SpiderFoot results

Compile your findings into a professional OSINT report with:

  - Executive summary
  - Methodology description
  - Findings organized by category
  - Risk assessment for each finding
  - Recommendations for reducing the organization's OSINT footprint
  - Evidence (screenshots, tool outputs) for each finding

Exercise 11: OSINT Tool Comparison

Difficulty: Intermediate | Estimated Time: 1.5 hours

Using the same authorized target domain, run reconnaissance with three different tools:

  1. theHarvester
  2. Recon-ng (at least 5 modules)
  3. SpiderFoot (or Amass)

Create a comparison matrix documenting:

  - Unique findings per tool
  - Overlapping findings
  - Time to complete each scan
  - Ease of use and learning curve
  - Output format and report quality

Which tool would you choose for a quick assessment? Which for a comprehensive engagement?

Exercise 12: OSINT Monitoring Setup

Difficulty: Advanced | Estimated Time: 2 hours

Design (and if possible, implement) a continuous OSINT monitoring system for an organization. Your system should:

  - Monitor certificate transparency logs for new certificates on the target's domains
  - Check for new paste entries mentioning the organization
  - Monitor for new GitHub repositories or commits referencing the organization
  - Alert on new breach data involving the organization's domain

Document your design including data sources, collection frequency, alerting thresholds, and storage considerations. If implementing, use APIs and a simple script or notebook.
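The certificate transparency part of the design can start as a polling script that remembers which crt.sh entry IDs it has already reported. In this sketch, `fetch` is any function you supply that returns crt.sh-style JSON entries, assumed to be dicts with an `id` and a `name_value`. The JSON state file stands in for real storage. Schedule `check_once` with cron or a similar scheduler at whatever collection frequency your design specifies.

```python
import json
from pathlib import Path
from typing import Callable


def load_seen(state_file: Path) -> set:
    """IDs of certificates already reported, persisted between runs."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def save_seen(state_file: Path, seen: set) -> None:
    state_file.write_text(json.dumps(sorted(seen)))


def check_once(domain: str, state_file: Path, fetch: Callable[[str], list]) -> list:
    """Fetch entries, alert on unseen IDs, and record them as seen."""
    seen = load_seen(state_file)
    # Key by ID so duplicate entries in one response alert only once.
    unseen = {e["id"]: e for e in fetch(domain) if "id" in e and e["id"] not in seen}
    fresh = [unseen[i] for i in sorted(unseen)]
    for entry in fresh:
        names = entry.get("name_value", "").replace("\n", ", ")
        # Replace print() with email, chat, or ticketing alerts in a real deployment.
        print(f"[new certificate] id={entry['id']} names={names}")
    seen.update(unseen)
    save_seen(state_file, seen)
    return fresh
```

The first run reports every existing certificate. Some designs therefore seed the state file once without alerting and only raise alerts from the second run onward.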

Exercise 13: Reverse Image and Social Media OSINT

Difficulty: Intermediate | Estimated Time: 45 minutes

Using publicly available profile photos from an authorized target organization's website:

  1. Perform reverse image searches to find where else these images appear
  2. Determine if any images are stock photos (indicating they may not represent real employees)
  3. Check if employee images appear on personal social media accounts
  4. Assess what additional information is discoverable from social media profiles

Document your findings while respecting privacy boundaries. Focus on what an adversary could discover, not on building invasive personal dossiers.

Exercise 14: Building Custom Wordlists from OSINT

Difficulty: Advanced | Estimated Time: 1 hour

Using OSINT findings from a passive reconnaissance assessment, build custom wordlists for use in later active testing phases:

  1. A subdomain wordlist based on naming patterns discovered in CT logs and DNS records
  2. A directory wordlist based on technology stack identification
  3. A username wordlist based on discovered employee names and email formats
  4. A password candidate list based on the organization's name, location, industry, and common patterns
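Two of these lists lend themselves to simple generation rules. This sketch splits discovered hostnames into labels and recombines them with environment words, and it combines base words with years and common suffixes to produce password candidates. The seed lists (`ENV_WORDS`, `SUFFIXES`) are hypothetical starting points. Use the password list only within the engagement's rules of engagement.

```python
import re
from itertools import product

# Hypothetical seed lists; adjust to naming patterns you actually observe.
ENV_WORDS = ["dev", "test", "staging", "uat", "prod"]
SUFFIXES = ["", "!", "123", "@"]


def subdomain_wordlist(hostnames: list[str], domain: str) -> list[str]:
    """Split discovered hostnames into labels and recombine them with environment words."""
    labels = set()
    for host in hostnames:
        host = host.lower().rstrip(".")
        if host.endswith("." + domain):
            prefix = host[: -len(domain) - 1]  # "dev-api.example.com" -> "dev-api"
            labels.update(t for t in re.split(r"[.-]", prefix) if t and not t.isdigit())
    words = set(labels)
    for label, env in product(labels, ENV_WORDS):
        if label != env:  # avoid "dev-dev"
            words.update({f"{label}-{env}", f"{env}-{label}"})
    return sorted(words)


def password_candidates(base_words: list[str], years: list[int]) -> list[str]:
    """Combine base words (org name, city, product) with years and common suffixes."""
    candidates = set()
    for word in base_words:
        for variant in {word.lower(), word.capitalize()}:
            for year, suffix in product([""] + [str(y) for y in years], SUFFIXES):
                candidates.add(f"{variant}{year}{suffix}")
    return sorted(candidates)
```

Because every entry derives from something observed about the target, these lists stay short, which matters when lockout policies limit how many guesses you can make.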

Explain the methodology behind each wordlist and how OSINT-informed wordlists outperform generic ones.

Exercise 15: Legal and Ethical Scenarios

Difficulty: Intermediate | Estimated Time: 45 minutes

For each of the following scenarios, analyze the legal and ethical implications:

  1. During passive recon for an authorized pentest, you discover an exposed database containing patient health records. What are your obligations?

  2. You find an employee's personal social media account with posts complaining about their employer's security practices. Can you include this in your report?

  3. A reverse WHOIS search reveals that the target organization's CTO personally registered several domains for what appear to be businesses competing with the employer. What do you do?

  4. Your client asks you to conduct passive reconnaissance against a competitor (not the client). Is this within scope? What are the legal implications?

  5. You discover credentials in a public GitHub repository that appear to be for a production system. You have not yet begun the active testing phase. What is the correct course of action?

Write a 200-300 word analysis for each scenario, citing relevant laws, regulations, or professional standards.