> "In preparing for battle, I have always found that plans are useless, but planning is indispensable." — Dwight D. Eisenhower

Learning Objectives

  • Clearly distinguish between passive and active reconnaissance and understand the legal implications of crossing that line
  • Perform comprehensive DNS enumeration including zone transfer attempts and subdomain brute forcing
  • Identify and exploit subdomain takeover vulnerabilities
  • Fingerprint web applications and identify their underlying technology stacks
  • Leverage certificate transparency data for active infrastructure discovery
  • Build a repeatable, systematic active reconnaissance methodology
  • Integrate passive and active reconnaissance findings into a unified intelligence picture
  • Apply active recon techniques to the MedSecure and ShopStack running examples

Chapter 8: Active Reconnaissance

In the previous chapter, we gathered intelligence about our targets without ever touching their systems. We studied them from a distance, collecting publicly available information to build a picture of their attack surface. Now we cross a critical threshold: we begin interacting directly with the target's infrastructure.

Active reconnaissance involves sending packets to target systems, querying their services, and analyzing their responses. Unlike passive reconnaissance, active recon is detectable. Every DNS query, every HTTP request, every connection attempt generates log entries on the target's systems. This is the moment when your engagement moves from invisible to potentially visible — and it is the moment when explicit authorization becomes absolutely non-negotiable.

In this chapter, we will systematically enumerate the target's DNS infrastructure, discover hidden subdomains, fingerprint web applications, identify technology stacks, and build a comprehensive picture of the attack surface that will guide vulnerability assessment and exploitation in later chapters.

8.1 Crossing the Line: From Passive to Active

8.1.1 Understanding the Distinction

The boundary between passive and active reconnaissance is defined by a single question: Are you sending data to the target's systems?

  • Passive: Querying Google for site:medsecure.com — Google's servers process your request, not MedSecure's.
  • Active: Sending an HTTP request to https://medsecure.com — MedSecure's web server receives and processes your request.
  • Passive: Searching crt.sh for certificates — you are querying certificate transparency logs, not MedSecure's infrastructure.
  • Active: Connecting to medsecure.com:443 and examining the certificate presented — you are interacting with MedSecure's server.
  • Passive: Looking up MedSecure's DNS records using a third-party API — the API provider makes the queries.
  • Active: Running dig @ns1.medsecure.com medsecure.com AXFR — you are directly querying MedSecure's name server.

This distinction matters for three reasons:

  1. Legal exposure: Active reconnaissance constitutes interaction with the target's systems. Without authorization, this could violate computer fraud and abuse laws.
  2. Detection risk: Active reconnaissance generates logs and network traffic that can alert the target's security team.
  3. Scope compliance: Your engagement scope may permit passive recon broadly but restrict active recon to specific IP ranges, domains, or time windows.

⚖️ Legal Note: Before beginning any active reconnaissance, verify that your authorization explicitly covers the systems you plan to interact with. Your scope of work should list authorized IP ranges, domains, and testing windows. If you discover additional systems during passive recon that are not in scope, request a scope amendment before actively probing them. In many jurisdictions, unauthorized active reconnaissance can be prosecuted under laws like the US Computer Fraud and Abuse Act (CFAA), the UK Computer Misuse Act, or the EU's Directive on Attacks Against Information Systems.

8.1.2 The Active Recon Mindset

Active reconnaissance requires a different mindset than passive recon. You are now operating in an environment where:

  • Time matters: The longer you probe, the more likely you are to be detected. Efficient, targeted reconnaissance is better than noisy, comprehensive scanning.
  • Noise management is critical: You need to balance thoroughness with stealth. In a red team engagement, detection could mean game over. In a standard penetration test, excessive noise may still generate false alarms that waste the client's security team's time.
  • Every response is intelligence: Error messages, timeout patterns, redirect behaviors, and even the absence of a response all provide information.
  • You are building on passive recon: The intelligence gathered in Chapter 7 guides your active recon. Instead of blindly scanning, you target specific areas identified during passive reconnaissance.

💡 Intuition: Think of passive recon as studying the blueprint of a building from public records, and active recon as walking around the building, testing doors, and peering through windows. Both are valuable, but only the second is visible to anyone watching, and it works best when it is informed by the first.

8.1.3 Planning Active Reconnaissance

Before launching any tools, plan your active reconnaissance based on passive recon findings:

  1. Prioritize targets: Focus on hosts and services most likely to yield actionable intelligence
  2. Define boundaries: Know exactly which systems are in scope
  3. Choose your tools: Select the right tools for each target, avoiding unnecessary noise
  4. Set timing: Plan scans for authorized testing windows; consider time zones
  5. Prepare documentation: Set up your note-taking environment and evidence capture tools

For MedSecure Health Systems, our passive recon revealed 47 subdomains, several cloud services, and an exposed Elasticsearch instance. Our active recon plan might prioritize:

  • Verifying which passive recon subdomains actually resolve and respond
  • Attempting DNS zone transfers against MedSecure's name servers
  • Fingerprinting the web applications on key subdomains (patient portal, API, EHR)
  • Enumerating additional subdomains through brute forcing
  • Checking for subdomain takeover opportunities

8.2 DNS Enumeration and Zone Transfers

8.2.1 Why DNS Enumeration Matters

DNS is the backbone of internet infrastructure, translating human-readable domain names into IP addresses. For an ethical hacker, DNS is a map of the target's digital territory. Comprehensive DNS enumeration reveals:

  • All publicly accessible hostnames and their IP addresses
  • Mail servers and email routing
  • Service locations (SRV records)
  • Load balancing configurations
  • Cloud provider usage
  • Internal naming conventions
  • Legacy systems that may still be accessible

8.2.2 DNS Record Enumeration

Active DNS enumeration goes beyond the passive DNS lookups covered in Chapter 7. We now directly query the target's DNS infrastructure:

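# Enumerate common record types
# Note: many servers return a minimal answer to ANY queries (RFC 8482),
# so query each type individually rather than relying on ANY
dig medsecure.com ANY
dig medsecure.com A
dig medsecure.com AAAA
dig medsecure.com MX
dig medsecure.com NS
dig medsecure.com TXT
dig medsecure.com SOA
dig medsecure.com CAA

# SRV records live under _service._proto labels, not at the zone apex
dig _sip._tcp.medsecure.com SRV
dig _ldap._tcp.medsecure.com SRV

# A CNAME cannot exist at the zone apex, so check specific hostnames
dig www.medsecure.com CNAME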

# Query specific name servers
dig @ns1.medsecure.com medsecure.com ANY

# Trace DNS resolution path
dig +trace medsecure.com

# Reverse DNS lookup
dig -x 203.0.113.50

# Reverse DNS for an entire /24 (prints only IPs that have PTR records)
for ip in $(seq 1 254); do
  ptr=$(dig -x "203.0.113.$ip" +short)
  [ -n "$ptr" ] && echo "203.0.113.$ip $ptr"
done

Interpreting DNS Responses:

Every DNS response contains valuable intelligence. Pay attention to:

  • TTL values: Low TTLs (e.g., 60-300 seconds) may indicate dynamic infrastructure, load balancing, or CDN usage. High TTLs (e.g., 86400 seconds) suggest static infrastructure.
  • Multiple A records: The same hostname resolving to multiple IPs indicates load balancing or CDN distribution.
  • CNAME chains: Following CNAME records reveals third-party services and CDN configurations.
  • SOA records: The serial number format can reveal whether DNS is managed by hand (date-based serials in YYYYMMDDnn form, such as 2024012301) or automatically (plain sequential or tool-generated numbers).
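The SOA heuristic can be checked mechanically. The sketch below is only an illustration: the function name classify_soa_serial is ours, it treats a 10-digit serial with a plausible month and day as date-based, and a sequential counter can occasionally look like a date by coincidence.

```shell
# classify_soa_serial SERIAL
# Prints "date-based" if SERIAL matches the YYYYMMDDnn convention, else "other".
classify_soa_serial() (
  serial=$1
  case $serial in
    [12][0-9][0-9][0-9][01][0-9][0-3][0-9][0-9][0-9])
      # Extract MM and DD, then drop a leading zero before numeric comparison
      month=$(printf '%s' "$serial" | cut -c5-6)
      day=$(printf '%s' "$serial" | cut -c7-8)
      month=${month#0}
      day=${day#0}
      if [ "$month" -ge 1 ] && [ "$month" -le 12 ] &&
         [ "$day" -ge 1 ] && [ "$day" -le 31 ]; then
        echo "date-based"
      else
        echo "other"
      fi
      ;;
    *)
      echo "other"
      ;;
  esac
)
```

In `dig +short SOA` output the serial is the third field, so a typical call is `classify_soa_serial "$(dig +short SOA medsecure.com | awk '{print $3}')"`.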

8.2.3 DNS Zone Transfers (AXFR)

A DNS zone transfer is a mechanism for replicating DNS databases between servers. In a properly configured environment, zone transfers are restricted to authorized secondary DNS servers. In a misconfigured environment, anyone can request a full copy of the DNS zone — effectively getting a complete list of every hostname in the domain.

# Attempt zone transfer
dig @ns1.medsecure.com medsecure.com AXFR

# Using host command
host -t axfr medsecure.com ns1.medsecure.com

# Using nslookup
nslookup
> server ns1.medsecure.com
> set type=axfr
> medsecure.com

A successful zone transfer returns every DNS record in the zone, potentially revealing:

  • Internal hostnames (internal.medsecure.com, dc01.medsecure.com)
  • Development and staging servers
  • Database servers
  • VPN endpoints
  • All mail servers
  • Service records revealing internal services

⚠️ Common Pitfall: Zone transfers are increasingly rare in well-configured environments. Most modern DNS providers disable zone transfers by default. However, do not skip this check — misconfigurations still exist, especially in organizations that manage their own DNS infrastructure. In our experience, approximately 5-10% of organizations still have at least one name server that permits zone transfers.

🔵 Blue Team Perspective: To prevent zone transfer leakage, configure your DNS servers to restrict AXFR requests to authorized secondary servers only. In BIND, use the allow-transfer directive. In Microsoft DNS, restrict zone transfers in the zone properties. Regularly audit your DNS configuration and test for zone transfer vulnerabilities from external IPs.

8.2.4 DNS Brute Forcing

When zone transfers fail (as they usually will), subdomain brute forcing is the next technique. This involves querying the DNS server for common subdomain names to see which ones resolve:

# Using fierce
fierce --domain medsecure.com --dns-servers ns1.medsecure.com

# Using dnsenum
dnsenum medsecure.com

# Using dnsrecon
dnsrecon -d medsecure.com -t brt -D /usr/share/wordlists/subdomains-top1million-5000.txt

# Using gobuster DNS mode
gobuster dns -d medsecure.com -w /usr/share/wordlists/subdomains-top1million-20000.txt -t 50

# Using Amass (active mode)
amass enum -active -d medsecure.com -brute -w /usr/share/wordlists/subdomains.txt

# Using massdns for high-speed resolution
massdns -r /usr/share/massdns/lists/resolvers.txt \
  -t A -o S -w results.txt subdomains.txt

Wordlist Selection Matters:

The quality of your subdomain brute forcing depends entirely on your wordlist. Common wordlist sources include:

  • SecLists: the Discovery/DNS lists, including Jason Haddix's all.txt (shipped as dns-Jhaddix.txt) and the subdomains-top1million-* lists
  • Assetnote: Regularly updated wordlists based on real-world data
  • Custom wordlists: Build wordlists based on patterns discovered during passive recon (e.g., if you find us-east-prod-api, try us-west-prod-api, eu-east-prod-api, etc.)

🔴 Red Team Perspective: Custom wordlists built from passive recon findings are far more effective than generic wordlists. If passive recon reveals a naming convention like {region}-{env}-{service}.medsecure.com, create a targeted wordlist that enumerates all combinations. This approach finds subdomains that no generic wordlist would contain.
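A {region}-{env}-{service} convention turns into a wordlist with three nested loops. In this sketch the token lists and the function name gen_names are hypothetical placeholders; in practice you would fill them with the regions, environments, and service names actually observed during passive recon.

```shell
# Hypothetical token lists; replace with values seen during passive recon.
REGIONS="us-east us-west eu-west"
ENVS="prod staging dev"
SERVICES="api portal auth"

# gen_names DOMAIN: print every {region}-{env}-{service}.DOMAIN combination
gen_names() (
  domain=$1
  for r in $REGIONS; do
    for e in $ENVS; do
      for s in $SERVICES; do
        printf '%s-%s-%s.%s\n' "$r" "$e" "$s" "$domain"
      done
    done
  done
)
```

Because the output is fully qualified names, it can be fed straight to a resolver such as massdns, for example `gen_names medsecure.com > custom.txt` followed by `massdns -r resolvers.txt -t A -o S custom.txt`.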

8.2.5 DNS Cache Snooping

DNS cache snooping exploits the caching behavior of DNS resolvers to determine which domains have been recently queried. If a resolver holds a cached answer for a domain, someone using that resolver looked the domain up within the record's TTL.

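# Non-recursive query against the organization's recursive resolver (cache snoop)
# 203.0.113.53 is a hypothetical address for MedSecure's resolver; authoritative
# servers such as ns1.medsecure.com normally do not cache other domains
dig @203.0.113.53 www.competitor.com +norecurse

# If the response contains answers, the record is in the resolver's cache,
# meaning someone using that resolver queried it within the record's TTL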

This technique can reveal:

  • Third-party services the organization uses
  • Websites employees visit
  • Potential vendor relationships
  • Shadow IT services

However, most modern DNS resolvers are configured to prevent cache snooping, so this technique has limited reliability.
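Reading the result of a snoop comes down to two fields in dig's header: the status and the ANSWER count. The sketch below (snoop_verdict is our own name) turns a saved dig response into a rough verdict; it is a heuristic, and a REFUSED status simply means the resolver will not answer outside clients.

```shell
# snoop_verdict: reads the full output of a `dig ... +norecurse` query on stdin
# and prints "cached", "not cached", or "inconclusive (...)". Heuristic only.
snoop_verdict() (
  out=$(cat)
  # Header line looks like: ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242"
  status=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  # Flags line looks like: ";; flags: qr ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ..."
  answers=$(printf '%s\n' "$out" | sed -n 's/.*ANSWER: \([0-9][0-9]*\),.*/\1/p' | head -n 1)
  if [ "$status" != "NOERROR" ]; then
    # REFUSED and similar statuses tell us nothing about the cache contents
    echo "inconclusive (status: ${status:-none})"
  elif [ "${answers:-0}" -gt 0 ]; then
    echo "cached"
  else
    echo "not cached"
  fi
)
```

Usage, with RESOLVER standing in for the resolver's address: `dig @"$RESOLVER" www.competitor.com +norecurse | snoop_verdict`.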

8.3 Subdomain Discovery and Takeover

8.3.1 Comprehensive Subdomain Discovery

Subdomain discovery combines passive and active techniques into a comprehensive enumeration. The goal is to identify every accessible subdomain — because each one represents a potential entry point.

The Multi-Source Approach:

The most effective subdomain enumeration uses multiple tools and techniques in combination:

  1. Certificate Transparency (passive): crt.sh, Censys
  2. Search engines (passive): Google, Bing, DuckDuckGo
  3. DNS aggregators (passive): SecurityTrails, VirusTotal, DNSDumpster
  4. Brute forcing (active): Gobuster, Amass, MassDNS
  5. Permutation scanning (active): altdns, dnsgen
  6. Web archive (passive): Wayback Machine, CommonCrawl
  7. Content analysis (active): Scraping JavaScript files for referenced subdomains

# Comprehensive Amass enumeration (combines passive + active)
amass enum -d medsecure.com -active -brute -ip -src -o amass_results.txt

# altdns for permutation scanning
# First, create a base list from discovered subdomains
altdns -i discovered_subdomains.txt -o permuted.txt \
  -w /usr/share/altdns/words.txt
# Then resolve the permutations
massdns -r resolvers.txt -t A -o S permuted.txt > permuted_resolved.txt

# dnsgen for smart permutations
cat discovered_subdomains.txt | dnsgen - | massdns -r resolvers.txt -t A -o S

Permutation scanning is particularly powerful. Given a discovered subdomain like staging-api.medsecure.com, permutation tools generate variations such as:

  • staging-api-v2.medsecure.com
  • staging-api-old.medsecure.com
  • dev-api.medsecure.com
  • test-api.medsecure.com
  • staging-api2.medsecure.com
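To see what tools like altdns and dnsgen do under the hood, here is a deliberately small permutation sketch. It is not how either tool works internally: permute_host and both token lists are hypothetical, and it only appends suffixes to the discovered label and swaps its first dash-separated token for an environment word.

```shell
# Hypothetical token lists modelled on the variations listed above.
ENV_WORDS="dev test staging prod"
SUFFIXES="-v2 -old 2"

# permute_host HOST: print simple permutations of HOST, excluding HOST itself
permute_host() (
  host=$1
  label=${host%%.*}    # staging-api
  domain=${host#*.}    # medsecure.com
  core=${label#*-}     # api (first dash-separated token dropped)
  {
    for s in $SUFFIXES; do
      printf '%s%s.%s\n' "$label" "$s" "$domain"
    done
    for e in $ENV_WORDS; do
      printf '%s-%s.%s\n' "$e" "$core" "$domain"
    done
  } | grep -vxF "$host" | sort -u
)
```

The output of `permute_host staging-api.medsecure.com` can be piped into massdns in the same way as the dnsgen example.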

8.3.2 Subdomain Takeover Vulnerabilities

Subdomain takeover is one of the most impactful findings that can emerge from reconnaissance. It occurs when a subdomain points (via CNAME or A record) to a third-party service that the organization no longer controls.

How Subdomain Takeover Works:

  1. Organization creates blog.medsecure.com with a CNAME pointing to medsecure.ghost.io (Ghost blogging platform)
  2. Organization cancels their Ghost account but forgets to remove the DNS record
  3. blog.medsecure.com still has a CNAME to medsecure.ghost.io, but that Ghost account no longer exists
  4. An attacker registers medsecure.ghost.io on Ghost's platform
  5. Now blog.medsecure.com serves the attacker's content
  6. The attacker can host phishing pages on a trusted domain, receive cookies scoped to the parent domain (.medsecure.com), and damage the organization's reputation

Common Services Vulnerable to Subdomain Takeover:

| Service | CNAME Pattern | Takeover Indicator |
|---|---|---|
| AWS S3 | *.s3.amazonaws.com | "NoSuchBucket" error |
| GitHub Pages | *.github.io | 404 page with GitHub branding |
| Heroku | *.herokuapp.com | "No such app" error |
| Azure | *.azurewebsites.net | "Web app not found" |
| Shopify | *.myshopify.com | "Sorry, this shop is currently unavailable" |
| Fastly | *.fastly.net | "Fastly error: unknown domain" |
| Ghost | *.ghost.io | "The thing you were looking for is no longer here" |
| Pantheon | *.pantheonsite.io | "404 error unknown site" |
| Tumblr | *.tumblr.com | "There's nothing here" |
| Zendesk | *.zendesk.com | "Help Center Closed" |
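The indicator strings in this table are what tools like subjack match against. A stripped-down version of that idea is sketched below; match_takeover is our own name and the fingerprint list is only a subset of the table. A match is a lead to verify against the CNAME record and the can-i-take-over-xyz notes, not proof of a takeover.

```shell
# Subset of indicator strings from the table above, one "service|substring" per line.
TAKEOVER_FINGERPRINTS='AWS S3|NoSuchBucket
Heroku|No such app
Fastly|Fastly error: unknown domain
Shopify|Sorry, this shop is currently unavailable
Zendesk|Help Center Closed'

# match_takeover: reads an HTTP response body on stdin and prints the name of
# every service whose indicator string appears in it.
match_takeover() (
  body=$(cat)
  printf '%s\n' "$TAKEOVER_FINGERPRINTS" | while IFS='|' read -r service needle; do
    case $body in
      *"$needle"*) echo "$service" ;;
    esac
  done
)
```

Usage: `curl -s https://blog.medsecure.com | match_takeover`.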

Detecting Subdomain Takeover:

# subjack — automated subdomain takeover detection
subjack -w subdomains.txt -t 100 -timeout 30 -ssl \
  -c /usr/share/subjack/fingerprints.json -v

# nuclei with takeover templates
nuclei -l subdomains.txt -t takeovers/ -o takeover_results.txt

# can-i-take-over-xyz — manual reference
# https://github.com/EdOverflow/can-i-take-over-xyz

# Manual verification: check CNAME and then visit the URL
dig blog.medsecure.com CNAME
curl -v https://blog.medsecure.com

📊 Real-World Application: Subdomain takeovers are more common than most organizations realize. A 2022 study by Detectify found that approximately 15% of large organizations had at least one subdomain vulnerable to takeover. Bug bounty programs regularly pay $500-$5,000 for subdomain takeover findings, depending on the domain's significance.

8.3.3 Virtual Host Discovery

In addition to subdomains that have their own DNS records, web servers often host multiple websites on a single IP address using virtual hosting. The web server uses the Host header in the HTTP request to determine which website to serve.

Virtual host enumeration sends HTTP requests to a target IP with different Host headers to discover which hostnames the server responds to:

# Using gobuster vhost mode
gobuster vhost -u http://203.0.113.50 -w /usr/share/wordlists/subdomains.txt \
  -t 50 --append-domain -d medsecure.com

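# Using ffuf (-fs filters responses by size; set it to the size of the default
# response the server returns for an unknown Host header, which is often not 0)
ffuf -w /usr/share/wordlists/subdomains.txt -u http://203.0.113.50 \
  -H "Host: FUZZ.medsecure.com" -fs 0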

# Using wfuzz
wfuzz -w /usr/share/wordlists/subdomains.txt -H "Host: FUZZ.medsecure.com" \
  --hc 404 --hw 0 http://203.0.113.50

Virtual host enumeration can reveal:

  • Internal applications served on the same IP as public websites
  • Development or staging versions of applications
  • Administrative interfaces not linked from the public site
  • Legacy applications still running on shared hosting

🧪 Try It in Your Lab: Set up an Apache or Nginx web server with multiple virtual hosts. Create one public site and one "hidden" admin interface. Practice discovering the hidden virtual host using gobuster or ffuf. This is an excellent exercise for understanding how virtual hosting works and why it creates security exposure.

8.4 Web Application Fingerprinting

8.4.1 Why Fingerprinting Matters

Web application fingerprinting is the process of identifying the technologies, frameworks, and software versions running on a web server. This intelligence directly maps to vulnerability research — once you know a server runs Apache 2.4.49, you can search for CVE-2021-41773 (path traversal). Once you know an application uses WordPress 5.8, you can search for known vulnerabilities in that version and its plugins.

8.4.2 HTTP Header Analysis

HTTP response headers are the first source of fingerprinting intelligence:

# Retrieve HTTP headers
curl -I https://medsecure.com

# Example response:
# HTTP/2 200
# Server: nginx/1.18.0
# Content-Type: text/html; charset=UTF-8
# X-Powered-By: PHP/7.4.33
# Set-Cookie: PHPSESSID=...; path=/
# X-Frame-Options: SAMEORIGIN
# X-Content-Type-Options: nosniff
# Strict-Transport-Security: max-age=31536000

From this single response, we learn:

  • Web server: nginx 1.18.0
  • Server-side language: PHP 7.4.33
  • Session management: PHP sessions (PHPSESSID cookie)
  • Security headers present: X-Frame-Options, X-Content-Type-Options, HSTS
  • Security headers missing: Content-Security-Policy, Permissions-Policy
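The same reading can be automated across many hosts. The sketch below (audit_headers is our own name) takes raw headers such as those printed by `curl -sI`, echoes the technology-revealing headers, and flags missing security headers. The list of headers checked is our choice, not a standard; adjust it to your report template.

```shell
# audit_headers: reads raw HTTP response headers on stdin.
# Prints technology-revealing headers verbatim, then "MISSING: <name>" for each
# security header that is absent. Header names are matched case-insensitively.
audit_headers() (
  headers=$(tr -d '\r')
  printf '%s\n' "$headers" | grep -iE '^(server|x-powered-by|x-aspnet-version|x-generator|via):'
  for h in Content-Security-Policy Strict-Transport-Security X-Frame-Options \
           X-Content-Type-Options Permissions-Policy; do
    if ! printf '%s\n' "$headers" | grep -qi "^$h:"; then
      echo "MISSING: $h"
    fi
  done
)
```

Usage: `curl -sI https://medsecure.com | audit_headers`. Note that HTTP/2 responses print header names in lowercase, which is why the matching ignores case.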

Headers to examine:

| Header | Intelligence Value |
|---|---|
| Server | Web server software and version |
| X-Powered-By | Server-side language/framework |
| Set-Cookie | Session management and frameworks (e.g., JSESSIONID for Java, PHPSESSID for PHP, ASP.NET_SessionId for .NET) |
| X-AspNet-Version | .NET version |
| X-Generator | CMS or framework (e.g., WordPress, Drupal) |
| X-Drupal-Cache | Drupal CMS |
| X-Varnish | Varnish cache server |
| Via | Proxy servers and CDNs |
| X-Cache | CDN caching status |
| CF-Ray | Cloudflare CDN |
| X-Amz-* | AWS infrastructure |
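The Set-Cookie row is easy to turn into a lookup. In this sketch cookie_hint is our own name; besides the three cookies from the table it also recognizes connect.sid (the express-session default) and laravel_session (the Laravel default). Session cookie names are configurable, so a match is a hint rather than a confirmation.

```shell
# cookie_hint COOKIE: COOKIE is a Set-Cookie value such as "PHPSESSID=abc; path=/".
# Prints the technology suggested by the cookie name, or "unknown".
cookie_hint() (
  name=${1%%=*}    # keep only the cookie name
  case $name in
    JSESSIONID)        echo "Java (servlet container)" ;;
    PHPSESSID)         echo "PHP" ;;
    ASP.NET_SessionId) echo ".NET (ASP.NET)" ;;
    connect.sid)       echo "Node.js (Express)" ;;
    laravel_session)   echo "PHP (Laravel)" ;;
    *)                 echo "unknown" ;;
  esac
)
```

Usage: `curl -sI https://medsecure.com | grep -i '^set-cookie:' | sed 's/^[^:]*: *//' | while read -r c; do cookie_hint "$c"; done`.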

8.4.3 Response Body Analysis

The HTML source code of web pages provides rich fingerprinting data:

Meta tags and generators:

<meta name="generator" content="WordPress 6.4.2" />
<meta name="generator" content="Drupal 10" />
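Pulling the generator value out of a saved page is a one-liner with grep and sed. The sketch below (extract_generator is our own name) assumes the tag carries a double-quoted content attribute, as in the examples above; single-quoted or unusual markup will slip past it.

```shell
# extract_generator: reads HTML on stdin and prints the content of every
# <meta name="generator" ...> tag (double-quoted attributes assumed).
extract_generator() {
  grep -oiE '<meta[^>]+name="generator"[^>]*>' |
    sed -E 's/.*content="([^"]*)".*/\1/'
}
```

Usage: `curl -s https://medsecure.com | extract_generator`.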

JavaScript and CSS file paths:

<script src="/wp-includes/js/jquery/jquery.min.js?ver=3.7.1"></script>
<link rel="stylesheet" href="/wp-content/themes/medsecure-theme/style.css" />

The path /wp-includes/ immediately identifies WordPress. The theme name (medsecure-theme) may be a custom theme with its own vulnerabilities.

HTML comments:

<!-- Built with Angular 16.2.0 -->
<!-- Generated by Joomla! 4.3 -->

Error pages: Custom error pages may reveal the underlying technology. A 404 page saying "Whitelabel Error Page" identifies Spring Boot. A detailed stack trace may reveal the exact Java version, framework, and even internal package names.

Favicon hashes: Web frameworks and applications often use default favicons. By hashing the favicon, you can identify the technology:

# Download and hash the favicon
curl -s https://medsecure.com/favicon.ico | md5sum

# Compare against known favicon hashes (e.g., OWASP favicon database)
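The comparison step can be scripted once you have a local hash list. In this sketch the function names are ours, and the database file is assumed to hold one "md5hash:technology name" entry per line; check the actual format of whichever list you download before relying on it.

```shell
# favicon_md5 URL: print the MD5 of the bytes served at URL
# (md5sum is GNU coreutils; on macOS, `md5 -q` gives the same digest).
favicon_md5() {
  curl -s "$1" | md5sum | cut -d' ' -f1
}

# lookup_favicon HASH DBFILE: print the technology recorded for HASH, if any.
# DBFILE is assumed to contain lines of the form "md5hash:technology name".
lookup_favicon() {
  grep -i "^$1:" "$2" | cut -d: -f2-
}
```

Usage: `lookup_favicon "$(favicon_md5 https://medsecure.com/favicon.ico)" favicon_db.txt`. Shodan's http.favicon.hash filter uses a different hash (MurmurHash3 over the base64-encoded icon), so these MD5 values cannot be searched there directly.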

8.4.4 Automated Web Fingerprinting Tools

Wappalyzer / BuiltWith: Browser extensions and web services that identify technologies used on websites. Wappalyzer detects CMS, frameworks, JavaScript libraries, analytics tools, CDNs, and more.

WhatWeb: Command-line web fingerprinting tool included in Kali Linux:

# Basic scan
whatweb medsecure.com

# Aggressive scan (more requests, more detail)
whatweb -a 3 medsecure.com

# Scan multiple targets
whatweb -i targets.txt -a 3 --log-json=whatweb_results.json

WhatWeb output example:

https://medsecure.com [200 OK] Bootstrap, Country[US], HTML5,
HTTPServer[nginx/1.18.0], IP[203.0.113.50], JQuery[3.6.0],
Meta-Author[MedSecure Health Systems], Modernizr, PHP[7.4.33],
Script, Title[MedSecure - Patient Portal], WordPress[6.4.2],
X-Powered-By[PHP/7.4.33]

Nikto: Web server scanner that identifies known vulnerabilities, misconfigurations, and version-specific issues:

# Basic Nikto scan
nikto -h https://medsecure.com -o nikto_results.html -Format html

# Scan a specific port (-ssl forces HTTPS on that port)
nikto -h medsecure.com -p 8443 -ssl

# Tuning options (select specific test categories)
nikto -h https://medsecure.com -Tuning 1234

Webanalyze: Go-based tool that uses Wappalyzer's technology fingerprint database:

webanalyze -host medsecure.com -crawl 2 -output json

Best Practice: Use multiple fingerprinting tools. Each tool has different detection methods and databases. WhatWeb might identify the CMS while Wappalyzer catches the JavaScript frameworks. Nikto might find specific misconfigurations that other tools miss. Cross-reference results for the most complete picture.

8.4.5 WAF Detection and Identification

Web Application Firewalls (WAFs) sit between the client and the web server, filtering malicious requests. Identifying the WAF is important because:

  • Different WAFs have different bypass techniques
  • WAF presence indicates security investment
  • WAF blocking can distort scan results, producing false positives (block pages mistaken for findings) or false negatives (real issues hidden behind blocked requests)

# wafw00f — WAF detection tool
wafw00f https://medsecure.com

# Output example:
# [+] The site https://medsecure.com is behind Cloudflare (Cloudflare Inc.)

# nmap WAF detection script
nmap -p 443 --script http-waf-detect medsecure.com
nmap -p 443 --script http-waf-fingerprint medsecure.com

Common WAF indicators:

| WAF | Detection Indicators |
|---|---|
| Cloudflare | CF-Ray header, Server: cloudflare, specific error pages (the older __cfduid cookie was retired in 2021) |
| AWS WAF | x-amzn-RequestId header, specific 403 page format |
| Akamai | AkamaiGHost server header, specific cookie patterns |
| Imperva/Incapsula | X-CDN: Imperva, incap_ses_* cookies |
| F5 BIG-IP ASM | TS cookies, specific blocking pages |
| ModSecurity | Specific 403 error pages, Mod_Security in headers |
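wafw00f is far more thorough, but a crude header-only check built from this table is handy when triaging many hosts. waf_hint is our own name, it inspects headers only (not block pages), and the F5 rule is labeled "possible" because any cookie starting with "TS" plus hex characters will trigger it.

```shell
# waf_hint: reads raw HTTP response headers on stdin and prints the WAF/CDN
# names suggested by the indicators in the table above. Header-only heuristic.
waf_hint() (
  h=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "$h" | grep -q '^cf-ray:' && echo "Cloudflare"
  printf '%s\n' "$h" | grep -q '^server: *akamaighost' && echo "Akamai"
  printf '%s\n' "$h" | grep -qE '^x-cdn: *(imperva|incapsula)|^set-cookie: *incap_ses_' &&
    echo "Imperva/Incapsula"
  printf '%s\n' "$h" | grep -q '^x-amzn-requestid:' && echo "AWS (possibly AWS WAF)"
  printf '%s\n' "$h" | grep -q '^set-cookie: *ts[0-9a-f]' && echo "F5 BIG-IP ASM (possible)"
  true
)
```

Usage: `curl -sI https://medsecure.com | waf_hint`. Sending a request that a WAF is likely to block and running the same check on the response often reveals more than a benign request does.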

🔵 Blue Team Perspective: While WAFs provide valuable protection, security teams should not rely on WAF obscurity. Assume that attackers can identify your WAF — because they can. Layer your defenses: WAF for broad protection, application-level security controls for depth, and monitoring for detection. Regularly review WAF logs for reconnaissance patterns like the probes described in this section.

8.4.6 HTTP Method Enumeration

Testing which HTTP methods a web server accepts reveals both its functionality and potential misconfiguration. The HTTP specification defines many methods beyond GET and POST, and dangerous methods left enabled are a common finding:

# Test allowed HTTP methods using OPTIONS (look for the Allow header)
curl -s -i -X OPTIONS https://medsecure.com/

# Nmap HTTP methods script
nmap -p 443 --script http-methods --script-args http-methods.url-path='/' medsecure.com

# Manually test specific methods
curl -s -X PUT https://medsecure.com/test.html -d "test content" -o /dev/null -w "%{http_code}"
curl -s -X DELETE https://medsecure.com/test.html -o /dev/null -w "%{http_code}"
curl -s -X TRACE https://medsecure.com -o /dev/null -w "%{http_code}"

Dangerous methods to check for:

  • PUT/DELETE: If enabled without authentication, attackers may be able to upload arbitrary files or delete existing resources. PUT in particular can enable web shell uploads.
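  • TRACE: Enables Cross-Site Tracing (XST) attacks, which abuse the server's echoing of the entire request, including headers, to expose cookies marked with the HttpOnly flag. Modern browsers refuse to send TRACE from scripts, so XST is largely historical, but an enabled TRACE method is still reported as a misconfiguration.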
  • CONNECT: Can be abused to tunnel traffic through the web server as a proxy, potentially allowing access to internal networks.
  • PATCH: While less dangerous than PUT, unsecured PATCH endpoints can allow partial modification of resources.

💡 Intuition: HTTP method testing is particularly important on WebDAV-enabled servers. WebDAV extends HTTP with methods like PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, and LOCK. If WebDAV is enabled but not properly secured, you may have read/write access to every directory WebDAV exposes, which often means the web root itself. This is a common finding on legacy IIS servers and is frequently exploited in penetration tests.
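When triaging many OPTIONS responses, it helps to pull out only the methods worth a closer look. The sketch below (risky_methods is our own name) reads the Allow header and prints any method from the two lists above. The Allow header reflects configuration, not behavior, so each flagged method still needs a manual test like the PUT and DELETE requests shown earlier.

```shell
# risky_methods: reads raw HTTP response headers (e.g. from an OPTIONS request)
# on stdin and prints each method in the Allow header that deserves manual testing.
risky_methods() (
  allow=$(tr -d '\r' | grep -i '^allow:' | head -n 1 | cut -d: -f2- |
          tr ',' ' ' | tr '[:lower:]' '[:upper:]')
  for m in $allow; do
    case $m in
      PUT|DELETE|TRACE|CONNECT|PATCH|PROPFIND|PROPPATCH|MKCOL|COPY|MOVE|LOCK)
        echo "$m" ;;
    esac
  done
)
```

Usage: `curl -s -i -X OPTIONS https://medsecure.com/ | risky_methods`.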

8.4.7 Service Version Vulnerability Correlation

Once you have identified specific technology versions through fingerprinting, the next step is correlating these versions against known vulnerability databases. This is the bridge between reconnaissance and vulnerability assessment:

# Search for known vulnerabilities in identified versions
searchsploit wordpress 6.4
searchsploit apache 2.4.49
searchsploit spring boot 3.1

# Query the NIST NVD API for specific CPE entries
curl -s "https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName=cpe:2.3:a:wordpress:wordpress:6.4.2:*:*:*:*:*:*:*"

# Check CVE Details
# https://www.cvedetails.com/version-search.php

Building a version-to-vulnerability mapping:

For each identified technology, document:

  1. Product and exact version: e.g., Apache HTTP Server 2.4.49
  2. End-of-life status: Is this version still receiving security patches? PHP 7.4 reached EOL in November 2022, so any application running it is missing years of security fixes.
  3. Known CVEs: What vulnerabilities affect this version? Apache 2.4.49 is famously vulnerable to CVE-2021-41773 (path traversal and RCE).
  4. Exploit availability: Are public exploits available? A vulnerability with a Metasploit module or public proof-of-concept is far more dangerous than one with only a theoretical description.
  5. Applicability: Does the vulnerability apply to the target's specific configuration? A vulnerability in Apache's mod_cgi module is irrelevant if mod_cgi is not enabled.
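Checking a fingerprinted version against an advisory's affected range is a numeric comparison, not a string comparison ("1.10" is newer than "1.9"). The helpers below are a sketch with names of our choosing; they compare dotted numeric versions only, treat a missing component as 0, and ignore suffixes such as "-ubuntu". The affected range itself must come from the advisory.

```shell
# version_cmp A B: print -1, 0 or 1 as dotted version A is older, equal or newer than B.
version_cmp() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi < yi) { print -1; exit }
      if (xi > yi) { print 1; exit }
    }
    print 0
  }'
}

# in_range VERSION MIN MAX: succeed if MIN <= VERSION <= MAX (inclusive).
in_range() {
  [ "$(version_cmp "$1" "$2")" -ge 0 ] && [ "$(version_cmp "$1" "$3")" -le 0 ]
}
```

For example, CVE-2021-41773 affects Apache 2.4.49 only, while the follow-up CVE-2021-42013 affects 2.4.49 and 2.4.50, so `in_range 2.4.50 2.4.49 2.4.50 && echo "check CVE-2021-42013"` flags a 2.4.50 server for the second issue.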

🔴 Red Team Perspective: Maintain a personal database of version-to-vulnerability mappings for common technologies. When fingerprinting reveals "nginx 1.18.0" or "jQuery 3.5.1," you should immediately know the relevant CVEs without needing to search. Speed in correlating versions to vulnerabilities gives you an edge in time-boxed engagements. Many experienced testers keep a spreadsheet or notes file organized by technology, listing critical CVEs and their detection methods alongside the version ranges affected.

8.5 Technology Stack Identification

8.5.1 The Full-Stack Picture

A complete technology stack identification combines information from multiple sources to build a comprehensive picture of the target's infrastructure:

Frontend Technologies:

  • HTML/CSS framework (Bootstrap, Tailwind, Material UI)
  • JavaScript framework (React, Angular, Vue.js, jQuery)
  • Client-side libraries (Lodash, Moment.js, D3.js)
  • Analytics and tracking (Google Analytics, Hotjar, Mixpanel)

Backend Technologies:

  • Web server (Apache, Nginx, IIS, Tomcat)
  • Application framework (Django, Rails, Spring, Express, Laravel)
  • Server-side language (Python, Ruby, Java, Node.js, PHP, C#)
  • API style (REST, GraphQL, gRPC)

Infrastructure:

  • Cloud provider (AWS, Azure, GCP, on-premises)
  • CDN (Cloudflare, Akamai, Fastly, CloudFront)
  • Container orchestration (Kubernetes, Docker Swarm, ECS)
  • Load balancer (ELB, HAProxy, F5)

Data Layer:

  • Database (MySQL, PostgreSQL, MongoDB, Oracle, SQL Server)
  • Caching (Redis, Memcached, Varnish)
  • Search (Elasticsearch, Solr)
  • Message queue (RabbitMQ, Kafka, SQS)

Security Controls:

  • WAF (identified above)
  • SSL/TLS configuration and certificate authority
  • Security headers
  • Authentication mechanisms (SSO, OAuth, SAML)

8.5.2 Techniques for Stack Identification

Job Postings: Perhaps the most reliable way to identify a technology stack. If MedSecure is hiring for "Senior React Developer with GraphQL and AWS experience," you have confirmed three technologies without touching their systems.

Error Messages: Deliberately triggering errors (within scope) can reveal detailed stack information:

# Request a non-existent page
curl -v https://medsecure.com/nonexistent_page_12345

# Submit malformed input
curl -v "https://medsecure.com/search?q=%00%01%02"

# Request with unusual HTTP methods
curl -v -X DELETE https://medsecure.com/
curl -v -X TRACE https://medsecure.com/
curl -v -X OPTIONS https://medsecure.com/

robots.txt and sitemap.xml: These files often reveal technology-specific paths:

curl https://medsecure.com/robots.txt
curl https://medsecure.com/sitemap.xml

# robots.txt might contain:
# Disallow: /wp-admin/          (WordPress)
# Disallow: /administrator/      (Joomla)
# Disallow: /admin/login         (Django)
# Disallow: /rails/info/         (Ruby on Rails)
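Matching Disallow entries against known technology paths can be done with a case statement. In this sketch robots_hints is our own name and the mapping covers only a few well-known paths; generic entries such as /admin/login are deliberately left unlabeled ("-") because many frameworks use them.

```shell
# robots_hints: reads robots.txt on stdin and prints "<path><TAB><hint>" for
# every Disallow entry. Trailing comments after the path are ignored.
robots_hints() {
  tr -d '\r' |
    sed -n 's/^[Dd]isallow:[[:space:]]*\([^[:space:]#]*\).*/\1/p' |
    while read -r path; do
      [ -n "$path" ] || continue     # "Disallow:" with no path allows everything
      case $path in
        /wp-admin*|/wp-includes*|/wp-content*) hint="WordPress" ;;
        /administrator*)                       hint="Joomla (likely)" ;;
        /rails/*)                              hint="Ruby on Rails" ;;
        /core/*|/user/login*)                  hint="Drupal (likely)" ;;
        *)                                     hint="-" ;;
      esac
      printf '%s\t%s\n' "$path" "$hint"
    done
}
```

Usage: `curl -s https://medsecure.com/robots.txt | robots_hints`.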

Default files and paths: Every technology stack has default files:

# WordPress
curl -s https://medsecure.com/wp-login.php
curl -s https://medsecure.com/xmlrpc.php
curl -s https://medsecure.com/wp-json/

# Drupal
curl -s https://medsecure.com/CHANGELOG.txt
curl -s https://medsecure.com/core/CHANGELOG.txt

# Joomla
curl -s https://medsecure.com/administrator/
curl -s https://medsecure.com/configuration.php-dist

# ASP.NET
curl -s https://medsecure.com/web.config
curl -s https://medsecure.com/elmah.axd

# Java/Spring
curl -s https://medsecure.com/actuator/
curl -s https://medsecure.com/actuator/env

# Node.js/Express
curl -s https://medsecure.com/package.json

# PHP
curl -s https://medsecure.com/phpinfo.php
curl -s https://medsecure.com/info.php
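Rather than reading each response by hand, the checks above can be run as a loop that records only the status code per path. probe_paths is our own name; note that a 200 alone is not conclusive, because many applications return 200 with a generic page for every path, so interesting hits still need their bodies inspected.

```shell
# probe_paths BASE_URL PATH...: request each PATH under BASE_URL and print
# "<status> <path>". curl reports 000 when no HTTP response was received.
probe_paths() (
  base=$1
  shift
  for p in "$@"; do
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$base$p")
    printf '%s %s\n' "$code" "$p"
  done
)
```

Usage: `probe_paths https://medsecure.com /wp-login.php /actuator/env /elmah.axd /phpinfo.php`.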

🔴 Red Team Perspective: Spring Boot Actuator endpoints (/actuator/env, /actuator/health, /actuator/beans, /actuator/configprops) are among the most commonly found and most impactful misconfigurations. These endpoints can reveal environment variables (including database passwords), internal application structure, and system configuration. Always check for exposed actuator endpoints on Java/Spring applications.

8.5.3 Content Discovery and Forced Browsing

Content discovery (also called forced browsing or directory brute forcing) is the process of discovering hidden files, directories, and endpoints that are not linked from the main website:

# Using gobuster
gobuster dir -u https://medsecure.com -w /usr/share/wordlists/dirb/common.txt \
  -t 50 -o gobuster_results.txt -x php,asp,aspx,jsp,html,js

# Using feroxbuster (Rust-based, faster)
feroxbuster -u https://medsecure.com -w /usr/share/wordlists/dirb/common.txt \
  -t 50 -x php,asp,aspx,jsp -o ferox_results.txt

# Using dirsearch
dirsearch -u https://medsecure.com -w /usr/share/wordlists/dirb/common.txt \
  -e php,asp,aspx,jsp -t 50

# Using ffuf
ffuf -u https://medsecure.com/FUZZ -w /usr/share/wordlists/dirb/common.txt \
  -mc 200,204,301,302,307,401,403,405 -t 50 -o ffuf_results.json

What to look for in content discovery:

  • /admin/, /administrator/, /manage/ — Administrative interfaces
  • /api/, /api/v1/, /api/v2/ — API endpoints
  • /backup/, /bak/, /old/ — Backup files and old versions
  • /config/, /conf/, /settings/ — Configuration files
  • /debug/, /test/, /dev/ — Debug and development resources
  • /docs/, /documentation/ — API documentation (Swagger/OpenAPI)
  • /uploads/, /files/, /media/ — User-uploaded content
  • /.git/, /.svn/, /.hg/ — Version control directories
  • /.env, /.htaccess, /.htpasswd — Configuration files
  • /phpmyadmin/, /adminer/, /pgadmin/ — Database management tools
  • /grafana/, /kibana/, /prometheus/ — Monitoring dashboards

📊 Real-World Application: Exposed .git directories are alarmingly common. If https://medsecure.com/.git/HEAD returns content, the entire source code repository can potentially be downloaded and reconstructed using tools like git-dumper. This gives an attacker complete source code access, including configuration files, hardcoded credentials, and internal logic.
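Because many servers answer 200 for any path, a 200 on /.git/HEAD is not proof on its own. A small sanity check, assuming you have already saved the response body (for example with curl -s https://medsecure.com/.git/HEAD). Only a positive result is worth following up with git-dumper.

```python
import re

# .git/HEAD normally holds a symbolic ref ("ref: refs/heads/main") or, when
# detached, a bare commit ID (40 hex chars for SHA-1, 64 for SHA-256 repos).
_HEAD_RE = re.compile(r"ref: refs/[\w./-]+|[0-9a-f]{40}|[0-9a-f]{64}")


def looks_like_git_head(body: str) -> bool:
    """True if an HTTP response body plausibly is a Git HEAD file,
    which filters out soft-404 pages that answer 200 for every path."""
    return _HEAD_RE.fullmatch(body.strip()) is not None
```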

8.5.4 API Discovery and Enumeration

Modern web applications increasingly rely on APIs. Discovering and mapping API endpoints is a critical part of active reconnaissance:

# Check for OpenAPI/Swagger documentation
curl -s https://api.medsecure.com/swagger.json
curl -s https://api.medsecure.com/swagger-ui/
curl -s https://api.medsecure.com/openapi.json
curl -s https://api.medsecure.com/api-docs/
curl -s https://api.medsecure.com/v1/api-docs/

# GraphQL introspection
curl -s -X POST https://api.medsecure.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{__schema{types{name,fields{name}}}}"}'

# Check for common API paths
ffuf -u https://api.medsecure.com/FUZZ -w /usr/share/wordlists/api-endpoints.txt \
  -mc 200,201,204,301,302,401,403
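When one of those documentation URLs does return a spec, you can turn it into an endpoint checklist. A minimal standard-library sketch, assuming a JSON (not YAML) Swagger 2.0 or OpenAPI 3.x document. The example spec is hypothetical and heavily trimmed.

```python
import json

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec_text: str) -> list:
    """Flatten a Swagger 2.0 / OpenAPI 3.x JSON document into sorted
    'METHOD /path' strings. Swagger 2.0 may add a basePath prefix;
    OpenAPI 3.x keeps server URLs in a separate 'servers' list instead."""
    spec = json.loads(spec_text)
    base = spec.get("basePath", "").rstrip("/")
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-operation keys such as 'parameters'.
            if key.lower() in HTTP_METHODS:
                ops.append(f"{key.upper()} {base}{path}")
    return sorted(ops)


# Hypothetical, heavily trimmed swagger.json
example = json.dumps({
    "swagger": "2.0",
    "basePath": "/v1",
    "paths": {
        "/patients": {"get": {}, "post": {}},
        "/patients/{id}": {"parameters": [], "get": {}},
    },
})
```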

JavaScript analysis for API endpoints:

Modern single-page applications embed API endpoints in their JavaScript code. By analyzing JavaScript files, you can discover:

  • API base URLs and endpoints
  • Authentication token handling
  • Hidden functionality
  • API keys embedded in client-side code

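# Download and search JavaScript files
curl -s https://medsecure.com | grep -oP 'src="\K[^"]*\.js(?=")' | \
  while read -r js; do
    # Turn the src attribute into a full URL (absolute, protocol-relative, or site path)
    case "$js" in
      http://*|https://*) url="$js" ;;
      //*) url="https:$js" ;;
      *) url="https://medsecure.com/${js#/}" ;;
    esac
    curl -s "$url" | grep -oP '"/api/[^"]*"'
  done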

# Using LinkFinder for automated JS analysis
python3 linkfinder.py -i https://medsecure.com -d -o cli

8.6 Certificate Transparency Mining as Active Reconnaissance

8.6.1 Beyond Passive CT Log Searching

While we covered certificate transparency logs as a passive technique in Chapter 7, active CT mining involves deeper analysis and real-time monitoring that complements active reconnaissance.

Certificate analysis reveals:

  1. Internal hostname patterns: Certificates often cover internal hostnames that follow naming conventions. Analyzing these patterns enables targeted subdomain brute forcing.

  2. Wildcard certificate scope: A wildcard certificate for *.internal.medsecure.com reveals an internal subdomain space worth exploring.

  3. Certificate validity periods: Short-lived certificates (90 days, as with Let's Encrypt) may indicate automated certificate management. Long-lived certificates might indicate manual processes with potential for expiration-related outages.

  4. SAN (Subject Alternative Name) analysis: A single certificate covering medsecure.com, portal.medsecure.com, ehr.medsecure.com, and api.medsecure.com confirms these hostnames are related and likely hosted on the same infrastructure.

  5. Pre-certificates: Certificate Transparency includes pre-certificates that are logged before the final certificate is issued. Monitoring pre-certificates can reveal new infrastructure before it goes live.
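Point 1 can be turned directly into a targeted wordlist. The sketch below is a heuristic, assuming names already collected from certificates for the apex domain. The environment prefixes (dev, staging, test, uat) and the example SAN list are hypothetical.

```python
import itertools


def left_label(fqdn: str, apex: str):
    """Return the part of fqdn to the left of apex (wildcards stripped), or None."""
    name = fqdn.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    if not name.endswith("." + apex):
        return None
    return name[: -(len(apex) + 1)]


def candidates_from_sans(sans, apex, env_prefixes=("dev", "staging", "test", "uat")):
    """Heuristic: pair service words seen in certificates with environment
    prefixes, e.g. 'ehr' + 'staging' -> 'staging-ehr.<apex>'."""
    services = set()
    for name in sans:
        sub = left_label(name, apex)
        if not sub or "." in sub:
            continue  # skip the apex itself and deeper names for simplicity
        env, _, rest = sub.partition("-")
        services.add(rest if env in env_prefixes and rest else sub)
    known = {n.lower().rstrip(".") for n in sans}
    guesses = {f"{e}-{s}.{apex}" for e, s in itertools.product(env_prefixes, services)}
    guesses |= {f"{s}.{apex}" for s in services}
    return sorted(guesses - known)


# Hypothetical SAN list gathered from MedSecure certificates
example_sans = ["medsecure.com", "portal.medsecure.com", "ehr.medsecure.com",
                "staging-ehr.medsecure.com", "*.internal.medsecure.com"]
```

The output feeds straight into the subdomain brute forcing step. Because it only contains names derived from the target's own conventions, it is far shorter than a generic wordlist.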

8.6.2 Active Certificate Inspection

Beyond searching CT logs, you can actively connect to hosts and examine their certificates:

# Retrieve and examine a certificate
echo | openssl s_client -connect medsecure.com:443 -servername medsecure.com 2>/dev/null | \
  openssl x509 -text -noout

# Extract Subject Alternative Names
echo | openssl s_client -connect medsecure.com:443 -servername medsecure.com 2>/dev/null | \
  openssl x509 -text -noout | grep -A1 "Subject Alternative Name"

# Check certificate chain
echo | openssl s_client -connect medsecure.com:443 -servername medsecure.com -showcerts 2>/dev/null

# Test SSL/TLS configuration
nmap -p 443 --script ssl-enum-ciphers medsecure.com
sslscan medsecure.com
sslyze medsecure.com
testssl.sh medsecure.com

Intelligence from certificate inspection:

  • Certificate authority: Commercial CAs vs. Let's Encrypt vs. internal CA
  • Organization details: The certificate's Organization (O) and Organizational Unit (OU) fields
  • Key strength: RSA key size, ECDSA curve
  • Cipher suites: Supported encryption algorithms (weak ciphers indicate outdated configuration)
  • Protocol versions: TLS 1.0/1.1 support indicates legacy requirements
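  • Certificate pinning: Whether the site still sends the deprecated HPKP (Public-Key-Pins) header; major browsers have dropped HPKP support, so pinning today is found mostly in mobile and thick-client applications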
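Several of these fields can also be pulled programmatically. A sketch using Python's ssl module, assuming the certificate chains to a CA your system trusts: the verifying context raises ssl.SSLCertVerificationError for self-signed or internal-CA certificates, which are better inspected with the openssl commands above. The example certificate dict is hypothetical, in the shape getpeercert() returns.

```python
import socket
import ssl


def fetch_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect with SNI and return the validated peer certificate as a dict."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def summarize_cert(cert: dict) -> dict:
    """Pull recon-relevant fields out of an ssl.getpeercert()-style dict."""
    def flatten(rdns):
        return {key: value for rdn in rdns for key, value in rdn}

    issuer = flatten(cert.get("issuer", ()))
    subject = flatten(cert.get("subject", ()))
    start = ssl.cert_time_to_seconds(cert["notBefore"])
    end = ssl.cert_time_to_seconds(cert["notAfter"])
    return {
        "issuer_org": issuer.get("organizationName"),
        "subject_org": subject.get("organizationName"),
        "sans": [v for kind, v in cert.get("subjectAltName", ()) if kind == "DNS"],
        "validity_days": round((end - start) / 86400),
    }


# Hypothetical certificate in the shape getpeercert() returns
example_cert = {
    "subject": ((("commonName", "medsecure.com"),),),
    "issuer": ((("countryName", "US"),), (("organizationName", "Let's Encrypt"),),
               (("commonName", "R3"),)),
    "notBefore": "Jan  1 00:00:00 2025 GMT",
    "notAfter": "Apr  1 00:00:00 2025 GMT",
    "subjectAltName": (("DNS", "medsecure.com"), ("DNS", "portal.medsecure.com")),
}
```

Against a live host you would call summarize_cert(fetch_cert("medsecure.com")). A 90-day validity with a Let's Encrypt issuer points to automated renewal, as discussed in 8.6.1.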

🔗 Connection: The SSL/TLS configuration intelligence gathered here feeds directly into vulnerability assessment (Chapter 14). Weak cipher suites, outdated TLS versions, and expired certificates are all findings that will appear in your penetration test report.

8.6.3 Real-Time Certificate Monitoring

For longer engagements (red team operations, continuous security monitoring), real-time CT monitoring reveals new infrastructure as it is created:

# CertStream: real-time certificate transparency monitoring
# Python client (recent versions of the library require the stream URL argument)
pip install certstream
python -c "
import certstream

def callback(message, context):
    if message['message_type'] == 'certificate_update':
        domains = message['data']['leaf_cert']['all_domains']
        for domain in domains:
            if 'medsecure' in domain.lower():
                print(f'New cert: {domain}')

certstream.listen_for_events(callback, url='wss://certstream.calidog.io/')
"

8.7 Building Your Active Recon Methodology

8.7.1 The Systematic Approach

Active reconnaissance should follow a systematic methodology that builds on passive recon findings. Here is a recommended workflow:

Phase 1: DNS Deep Dive (1-2 hours)

  1. Attempt zone transfers against all discovered name servers
  2. Enumerate DNS records for all discovered domains
  3. Brute force subdomains using targeted wordlists
  4. Perform permutation scanning on discovered subdomains
  5. Resolve all discovered hostnames to IP addresses
  6. Perform reverse DNS on all discovered IP ranges

Phase 2: Host Discovery and Validation (1-2 hours)

  1. Verify which hosts from passive recon are actually live
  2. Check for subdomain takeover vulnerabilities
  3. Identify virtual hosts on shared IP addresses
  4. Map the relationship between hostnames and IP addresses

Phase 3: Web Application Fingerprinting (2-3 hours)

  1. Analyze HTTP headers for all web applications
  2. Run WhatWeb/Wappalyzer against all web targets
  3. Identify WAFs and security controls
  4. Check for exposed administrative interfaces
  5. Examine SSL/TLS configurations
  6. Run Nikto against key targets

Phase 4: Content and API Discovery (2-3 hours)

  1. Content discovery (directory brute forcing) on key targets
  2. API endpoint discovery and documentation analysis
  3. JavaScript analysis for hidden endpoints
  4. Check for exposed version control directories
  5. Check for exposed configuration files and backups
  6. Identify development and staging environments

Phase 5: Analysis and Integration (1-2 hours)

  1. Consolidate active and passive recon findings
  2. Map the complete attack surface
  3. Prioritize targets for vulnerability assessment
  4. Document all findings with evidence
  5. Update the engagement tracking document

8.7.2 Tool Chains and Automation

Experienced penetration testers chain tools together for efficiency. Here is an example automated reconnaissance workflow:

#!/bin/bash
# Automated active recon workflow
TARGET="medsecure.com"
OUTPUT_DIR="/root/engagements/medsecure/active_recon"
mkdir -p "$OUTPUT_DIR"

echo "[*] Phase 1: DNS Enumeration"
# Zone transfer attempt against each authoritative name server
for ns in $(dig +short "$TARGET" NS); do
  dig @"$ns" "$TARGET" AXFR >> "$OUTPUT_DIR/zone_transfers.txt" 2>&1
done

# Subdomain brute force
amass enum -active -d "$TARGET" -brute -o "$OUTPUT_DIR/amass_results.txt"

echo "[*] Phase 2: Host Validation"
# Probe discovered subdomains for live HTTP(S) services (ProjectDiscovery httpx)
cat "$OUTPUT_DIR/amass_results.txt" | httpx -silent -o "$OUTPUT_DIR/live_hosts.txt"

# Check for subdomain takeover
subjack -w "$OUTPUT_DIR/amass_results.txt" -t 100 -ssl \
  -o "$OUTPUT_DIR/takeover_results.txt"

echo "[*] Phase 3: Web Fingerprinting"
# Fingerprint all live web hosts
while read -r host; do
  whatweb -a 3 "$host" >> "$OUTPUT_DIR/whatweb_results.txt"
done < "$OUTPUT_DIR/live_hosts.txt"

# WAF detection
while read -r host; do
  wafw00f "$host" >> "$OUTPUT_DIR/waf_results.txt"
done < "$OUTPUT_DIR/live_hosts.txt"

echo "[*] Phase 4: Content Discovery"
# Directory brute force on every live host (trim live_hosts.txt to key targets for large scopes)
# "< /dev/null" keeps the tool from consuming the host list on stdin
while read -r host; do
  feroxbuster -u "$host" -w /usr/share/wordlists/dirb/common.txt \
    -t 30 -x php,asp,aspx,jsp --silent < /dev/null >> "$OUTPUT_DIR/content_discovery.txt"
done < "$OUTPUT_DIR/live_hosts.txt"

echo "[*] Recon complete. Results in $OUTPUT_DIR"

⚠️ Common Pitfall: Automated recon scripts can generate enormous amounts of traffic. Always throttle your tools: the -t flag in many tools sets the number of concurrent threads, and some tools also offer an explicit request rate or delay (for example, ffuf's -rate and -p options). For a standard penetration test, 30-50 threads is usually appropriate. For a stealthy red team engagement, you may need to reduce this to 1-5 threads and add delays between requests.

8.7.3 Active Recon Against Our Running Examples

MedSecure Health Systems:

Building on our Chapter 7 passive recon, active reconnaissance of MedSecure reveals:

  • DNS zone transfers fail on the primary name servers (properly configured)
  • Subdomain brute forcing discovers 12 additional subdomains not found passively
  • staging-ehr.medsecure.com runs an older version of the EHR application
  • jenkins.medsecuredev.io exposes a login page (potential for credential attacks)
  • The patient portal uses WordPress 6.4.2 with 8 plugins identified
  • An exposed .git directory on dev.medsecure.com allows source code download
  • API documentation is accessible at api.medsecure.com/swagger-ui/
  • The Elasticsearch instance (found via Shodan) responds to queries without authentication

ShopStack E-commerce:

Active recon of our e-commerce target discovers:

  • The main site uses React (frontend) with a Node.js/Express API backend
  • The API supports GraphQL with introspection enabled (full schema disclosure)
  • Content discovery reveals an /admin panel (returns 401, requires authentication)
  • JavaScript analysis reveals API endpoints for payment processing, user management, and inventory
  • A staging environment at staging.shopstack.io uses the same SSL certificate as production
  • The Cloudflare CDN and WAF are in place, but the origin server also accepts direct connections to its IP address, which bypasses the WAF entirely

🧪 Try It in Your Lab: Set up a vulnerable web application (OWASP Juice Shop, DVWA, or HackTheBox machines) and perform a complete active reconnaissance workflow. Practice DNS enumeration, web fingerprinting, content discovery, and API enumeration. Document your findings as you would for a real engagement.

8.8 Cloud-Specific Active Reconnaissance

8.8.1 AWS Reconnaissance

When passive recon reveals that a target uses Amazon Web Services, active reconnaissance should include cloud-specific techniques:

S3 Bucket Enumeration: AWS S3 buckets follow predictable naming patterns. Given a company name or domain, you can guess and probe bucket names:

# Common S3 bucket naming patterns
# {company}-backups, {company}-data, {company}-logs, {company}-dev
# {domain}-assets, {domain}-static, {domain}-uploads

# Check if a bucket exists and is publicly accessible
aws s3 ls s3://medsecure-backups --no-sign-request 2>&1
aws s3 ls s3://medsecure-data --no-sign-request 2>&1
aws s3 ls s3://medsecure-patient-data --no-sign-request 2>&1

# Using specialized tools
# cloud_enum - enumerate cloud resources
cloud_enum -k medsecure -k medsecure.com

# S3Scanner - check S3 bucket permissions
s3scanner scan --bucket medsecure-backups

Publicly accessible S3 buckets have been responsible for some of the largest data exposures in history. During active reconnaissance, finding an open S3 bucket containing sensitive data is a critical finding that should be reported immediately.
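The naming patterns above can be generated and checked without the AWS CLI. A sketch under a few stated assumptions: the patterns are the ones from the comments above, and the status interpretation reflects how S3 commonly answers an unauthenticated GET (200 when anonymous listing is allowed, 403 when the bucket exists but is private, 404 for a missing bucket). Anything else, such as a region redirect, is treated as inconclusive. Names containing dots usually fail TLS validation on the virtual-hosted URL, which the sketch also reports as inconclusive.

```python
import re
import urllib.error
import urllib.request

# Naming patterns from the comments above; {n} is a seed such as a company name.
PATTERNS = ["{n}", "{n}-backups", "{n}-data", "{n}-logs", "{n}-dev",
            "{n}-assets", "{n}-static", "{n}-uploads"]
# S3 bucket names: 3-63 chars of lowercase letters, digits, dots, hyphens.
VALID_BUCKET = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_candidates(*seeds: str) -> list:
    names = set()
    for seed in seeds:
        base = seed.lower()
        for variant in {base, base.replace(".", "-")}:
            for pattern in PATTERNS:
                name = pattern.format(n=variant)
                if VALID_BUCKET.fullmatch(name):
                    names.add(name)
    return sorted(names)


def classify_status(code: int) -> str:
    # Unauthenticated GET on the bucket endpoint; other codes need a manual look.
    return {200: "exists, anonymous listing allowed",
            403: "exists, access denied",
            404: "does not exist"}.get(code, "inconclusive")


def probe(bucket: str, timeout: float = 5.0) -> str:
    """Unauthenticated request to the virtual-hosted bucket URL."""
    url = f"https://{bucket}.s3.amazonaws.com/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        return classify_status(err.code)
    except urllib.error.URLError:
        return "inconclusive (network or TLS error)"
```

For example, [(b, probe(b)) for b in bucket_candidates("medsecure", "medsecure.com")] probes every generated name once.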

CloudFront and Origin IP Discovery: Many organizations place their applications behind the AWS CloudFront CDN. The CDN masks the origin server's real IP address; if you can find that origin and it accepts direct connections, you may be able to bypass the WAF protections applied at the CDN edge. Active recon techniques to find the origin include:

  • Checking DNS history for A records that predate CloudFront adoption
  • Searching for the origin IP in Censys or Shodan by matching the SSL certificate
  • Checking whether the application responds to direct IP access
  • Looking for subdomains that point directly to EC2 instances
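To apply the last technique systematically, resolve each subdomain and compare its IP address with the ranges AWS publishes at https://ip-ranges.amazonaws.com/ip-ranges.json. The sketch assumes that file's prefixes list of {"ip_prefix", "service"} entries. An address tagged CLOUDFRONT is an edge node, while one tagged EC2 is a candidate origin worth testing directly. Only IPv4 is handled here (the file lists IPv6 separately), and the example excerpt is made up for illustration.

```python
import ipaddress
import json
import urllib.request

RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def load_ranges(url: str = RANGES_URL) -> dict:
    """Download AWS's published IP range list (network access required)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def aws_services_for(ip: str, ranges: dict) -> set:
    """Service tags (e.g. CLOUDFRONT, EC2) whose IPv4 prefixes contain ip.
    'AMAZON' is a catch-all tag that overlaps the others."""
    addr = ipaddress.ip_address(ip)
    return {p["service"] for p in ranges.get("prefixes", [])
            if addr in ipaddress.ip_network(p["ip_prefix"])}


# Hypothetical excerpt in the ip-ranges.json shape (prefixes are illustrative)
example_ranges = {"prefixes": [
    {"ip_prefix": "13.32.0.0/15", "service": "AMAZON"},
    {"ip_prefix": "13.32.0.0/15", "service": "CLOUDFRONT"},
    {"ip_prefix": "3.80.0.0/12", "service": "EC2"},
]}
```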

AWS API Gateway Discovery: If the target uses API Gateway, you may discover API endpoints through:

  • Content discovery on API subdomains
  • JavaScript analysis revealing API endpoint URLs
  • Checking for exposed Swagger/OpenAPI documentation
  • Searching for the API Gateway execution endpoint pattern: {api-id}.execute-api.{region}.amazonaws.com

8.8.2 Azure Reconnaissance

Azure Blob Storage Enumeration:

# Azure blob storage follows the pattern:
# https://{account}.blob.core.windows.net/{container}

# Check for common container names
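for container in data backups files uploads logs; do
  # Typically: 200 = anonymous listing allowed; 404/409 = missing or not public;
  # 000 = the storage account name did not resolve
  curl -s -o /dev/null -w "$container: %{http_code}\n" \
    "https://medsecure.blob.core.windows.net/$container?restype=container&comp=list"
done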

Azure Subdomain Discovery: Azure services create predictable subdomains:

  • {name}.azurewebsites.net (Web Apps)
  • {name}.database.windows.net (SQL Database)
  • {name}.vault.azure.net (Key Vault)
  • {name}.blob.core.windows.net (Storage)
  • {name}.azurecr.io (Container Registry)

Brute forcing these Azure-specific subdomains using the target's company name, abbreviations, and project names can reveal cloud resources.
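A minimal sketch of that brute force, assuming (as tools such as cloud_enum do) that these Azure hostnames only resolve in DNS when the resource exists. The suffix list is a hypothetical starting point. Storage account names may contain only lowercase letters and digits, so hyphenated variants can never exist there and simply fail to resolve.

```python
import socket

# Azure hostname templates from the list above; {n} is a candidate name.
AZURE_TEMPLATES = {
    "Web App": "{n}.azurewebsites.net",
    "SQL Database": "{n}.database.windows.net",
    "Key Vault": "{n}.vault.azure.net",
    "Storage": "{n}.blob.core.windows.net",
    "Container Registry": "{n}.azurecr.io",
}


def name_variants(seeds, suffixes=("", "dev", "prod", "staging", "backup")):
    """Lowercased seeds with and without hyphenated suffixes (illustrative)."""
    out = set()
    for seed in seeds:
        base = seed.lower()
        for suffix in suffixes:
            out.add(base + suffix)
            if suffix:
                out.add(f"{base}-{suffix}")
    return sorted(out)


def candidate_hosts(seeds):
    return [(service, template.format(n=name))
            for name in name_variants(seeds)
            for service, template in AZURE_TEMPLATES.items()]


def resolves(host: str) -> bool:
    """True if the hostname exists in DNS."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False
```

Typical use is [h for h in candidate_hosts(["medsecure", "msh"]) if resolves(h[1])], where "msh" stands in for any abbreviation found during OSINT.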

8.8.3 GCP Reconnaissance

Google Cloud Platform resources follow similar patterns:

  • {project}.appspot.com (App Engine)
  • {bucket}.storage.googleapis.com (Cloud Storage)
  • {region}-{project}.cloudfunctions.net (Cloud Functions, first generation; individual functions are served as paths on this host)

Active enumeration of GCP resources uses the same approach: generate candidate names from OSINT findings and probe for their existence.

8.8.4 Multi-Cloud Considerations

Many organizations use multiple cloud providers. Active recon should enumerate all cloud platforms identified during passive recon. Finding a development environment on a secondary cloud provider is a common high-value discovery — organizations often enforce strict security on their primary cloud platform while neglecting secondary deployments.

📊 Real-World Application: During a penetration test of a healthcare company, we discovered through active cloud enumeration that while their production AWS environment was well-secured (WAF, VPN-only access, encrypted storage), they had a GCP project used by the data science team that contained anonymized (but re-identifiable) patient data in publicly accessible Cloud Storage buckets. The data science team had provisioned this environment outside of IT governance, creating a significant compliance and security gap.

🔵 Blue Team Perspective: Organizations should implement Cloud Security Posture Management (CSPM) tools that continuously scan their cloud environments for misconfigurations. Regular external reconnaissance of your own cloud resources — essentially doing what an attacker would do — helps identify exposure before adversaries find it.

8.9 Evading Detection During Active Recon

8.9.1 Understanding Detection Mechanisms

During active reconnaissance, the target's security infrastructure may detect your activities:

  • Intrusion Detection Systems (IDS): Monitor network traffic for known attack patterns. Nmap scans, directory brute forcing, and zone transfer attempts all have recognizable signatures.
  • Web Application Firewalls (WAFs): May block or rate-limit requests from IPs that exhibit scanning behavior.
  • Security Information and Event Management (SIEM): Correlates log data from multiple sources to identify reconnaissance patterns.
  • Rate limiting: Web servers and APIs may block or throttle IPs that send too many requests.

8.9.2 Monitoring Your Own Noise

Before discussing stealth techniques, it is valuable to understand how your reconnaissance activity appears to the target's defenders. Setting up a lab environment with IDS/IPS (Snort, Suricata) and monitoring tools allows you to see your own scans from the defender's perspective.

What defenders see during active recon:

Your Activity          | Defender's View
-----------------------|------------------------------------------------------------
DNS brute forcing      | Thousands of NXDOMAIN responses from a single source IP (yours, or the resolver you query through)
Content discovery      | Sequential 404 responses with predictable URL patterns
Web fingerprinting     | Requests to common technology-specific paths
Zone transfer attempt  | AXFR request from an unauthorized IP
Port scanning          | SYN packets to sequential or random ports
Banner grabbing        | Connection followed by immediate disconnect

Understanding these signatures helps you design scanning strategies that minimize detection. It also informs your defensive recommendations — if you can spot your own scans, you can explain to the client exactly what they should be monitoring for.
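From the defender's side, the content discovery row is easy to spot in ordinary web server logs. A minimal detection sketch, assuming access logs in the Common (or Combined) Log Format. The 100-miss threshold and the log lines in the test are illustrative; a real detection would also bucket counts by time window.

```python
import re
from collections import Counter

# Common/Combined Log Format prefix: client IP, ident, user, [time], "request", status
CLF_PREFIX = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')


def brute_force_suspects(lines, min_404s=100):
    """Count 404 responses per client IP; return (ip, count) pairs at or above
    the threshold, noisiest first."""
    misses = Counter()
    for line in lines:
        match = CLF_PREFIX.match(line)
        if match and match.group(2) == "404":
            misses[match.group(1)] += 1
    return [(ip, n) for ip, n in misses.most_common() if n >= min_404s]
```

Running it over the logs from the lab exercise below at different feroxbuster speeds shows how quickly a simple counter notices each setting.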

🧪 Try It in Your Lab: Set up a Suricata IDS in front of a vulnerable web application. Run content discovery with feroxbuster at different speeds (50 threads, 10 threads, 1 thread). Examine the Suricata alerts for each speed. At what rate does the IDS start generating alerts? How do different User-Agent strings affect detection?

8.9.3 Stealth Techniques

For engagements where stealth is required (particularly red team operations):

  1. Slow and steady: Reduce scan speed dramatically. Instead of 50 threads, use 1-3 threads with random delays between requests.

  2. Distribute traffic: Use multiple source IP addresses (VPN, cloud instances, residential proxies) to distribute your requests across different origins.

  3. User-Agent rotation: Change your User-Agent string to mimic legitimate browsers and crawlers.

  4. Time-based distribution: Spread your reconnaissance over days or weeks rather than hours.

  5. Mimic legitimate traffic: Interleave reconnaissance requests with legitimate-looking browsing behavior.

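  6. DNS over HTTPS (DoH): Use encrypted DNS to hide your enumeration from network-level DNS monitoring between you and the resolver. The target's authoritative name servers still receive the lookups, but from the DoH provider's resolvers rather than from your IP.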

  7. Indirect reconnaissance: Use third-party services (Shodan, Censys, SecurityTrails) to query targets indirectly.
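Techniques 1 and 3 combine naturally in a custom fetcher. A minimal sketch using only the standard library: one request at a time, a random pause before each request, and a rotated User-Agent. The User-Agent strings and the default 1-5 second window are illustrative assumptions, not values any tool prescribes.

```python
import random
import time
import urllib.error
import urllib.request

# Illustrative browser User-Agent strings; keep a current list in practice.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def jittered_delays(count, base=3.0, jitter=2.0, rng=None):
    """Random delays drawn uniformly from [base - jitter, base + jitter], floored at 0."""
    rng = rng or random.Random()
    return [max(0.0, rng.uniform(base - jitter, base + jitter)) for _ in range(count)]


def slow_fetch(urls, base=3.0, jitter=2.0):
    """One request at a time (a single 'thread'), a random pause before each,
    and a rotated User-Agent. Returns (url, status) pairs."""
    results = []
    for url, delay in zip(urls, jittered_delays(len(urls), base, jitter)):
        time.sleep(delay)
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            with urllib.request.urlopen(urllib.request.Request(url, headers=headers),
                                        timeout=10) as resp:
                results.append((url, resp.status))
        except urllib.error.HTTPError as err:
            results.append((url, err.code))
    return results
```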

# Slow, stealthy Nmap scan
nmap -sS -T2 --max-rate 10 --randomize-hosts -D RND:5 medsecure.com

# Slow content discovery with random delays (1-3 seconds between requests)
ffuf -u https://medsecure.com/FUZZ -w wordlist.txt -rate 5 -t 1 -p 1.0-3.0 \
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# gobuster with reduced speed
gobuster dir -u https://medsecure.com -w wordlist.txt -t 1 --delay 2s

🔴 Red Team Perspective: In real adversary simulations, stealth is paramount. A red team that gets detected during reconnaissance has failed. Plan your active recon to be indistinguishable from normal traffic patterns. This might mean spending a week on active reconnaissance that a pentester would complete in hours. The trade-off is that your findings are far more realistic — they represent what a patient, sophisticated adversary would discover.

🔵 Blue Team Perspective: To detect active reconnaissance, monitor for: (1) Sequential requests to non-existent URLs (directory brute forcing), (2) DNS queries for common subdomain names from a single source, (3) Multiple requests with unusual User-Agent strings, (4) Zone transfer attempts from unauthorized IPs, (5) Requests to known sensitive paths (.git, .env, /actuator/). Implementing rate limiting, CAPTCHA challenges, and honeypot URLs can help detect and deter reconnaissance.

8.10 Advanced Active Reconnaissance Techniques

8.10.1 JavaScript Analysis for Hidden Endpoints

Modern single-page applications (SPAs) embed significant intelligence in their JavaScript bundles. Analyzing these files reveals API endpoints, authentication flows, hidden features, and even hardcoded credentials:

# Download and analyze JavaScript files
# Step 1: Find all JS files referenced in the page
curl -s https://medsecure.com | grep -oP 'src="[^"]*\.js[^"]*"' | \
  sed 's/src="//;s/"//'

# Step 2: Download each JS file and search for interesting patterns
# Using LinkFinder for automated endpoint extraction
python3 linkfinder.py -i https://medsecure.com -d -o cli

# Step 3: Search for API endpoints, tokens, and configuration
# Common patterns to search for in JS files:
# /api/v[0-9]/     - API versioned endpoints
# apiKey            - Hardcoded API keys
# secret            - Secret values
# token             - Authentication tokens
# admin             - Administrative functionality
# internal          - Internal endpoints
# debug             - Debug features
# password          - Hardcoded credentials
# \.env             - Environment file references
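Step 3 can be scripted once the bundles are saved locally. A sketch of a small scanner, assuming the keyword list above. The API Gateway regex assumes the usual 10-character lowercase alphanumeric API ID. Keyword hits only flag places to read by hand; they are not findings on their own. The example bundle is made up.

```python
import re

# Quoted relative API paths, e.g. "/api/v1/patients" or '/v2/orders'
ENDPOINT_RE = re.compile(r"""["'](/(?:api|v[0-9]+)/[^"'\s]*)["']""")
# AWS API Gateway execution endpoints: {api-id}.execute-api.{region}.amazonaws.com
APIGW_RE = re.compile(r"[a-z0-9]{10}\.execute-api\.[a-z0-9-]+\.amazonaws\.com")
# Keywords from the list above, matched case-insensitively
KEYWORD_RE = re.compile(r"apiKey|secret|token|admin|internal|debug|password|\.env", re.I)


def scan_js(source: str) -> dict:
    """Collect endpoint paths, API Gateway hosts, and keyword hits from one bundle."""
    return {
        "endpoints": sorted(set(ENDPOINT_RE.findall(source))),
        "api_gateways": sorted(set(APIGW_RE.findall(source))),
        "keywords": sorted({m.group(0).lower() for m in KEYWORD_RE.finditer(source)}),
    }


# Hypothetical minified snippet
example_js = ('const base="/api/v2/orders";'
              'fetch("https://ab12cd34ef.execute-api.us-east-1.amazonaws.com/prod/x");'
              'let apiKey="k";')
```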

Source Maps: Developers sometimes accidentally deploy JavaScript source maps (.js.map files) to production. Source maps contain the original, unminified source code — providing complete access to the application's frontend logic, including code comments and variable names that reveal business logic and potential vulnerabilities:

# Check for source maps
curl -s -o /dev/null -w "%{http_code}" https://medsecure.com/assets/app.js.map
curl -s -o /dev/null -w "%{http_code}" https://medsecure.com/static/js/main.chunk.js.map
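Guessing .map names works, but bundles usually say where their map lives. A sketch that follows the Source Map v3 conventions: a SourceMap response header, or a trailing //# sourceMappingURL= comment, with the .map suffix as a fallback guess. It also extracts original files from the optional, index-aligned sourcesContent array. The URLs and map contents in the tests are hypothetical.

```python
import json
import re
from typing import Optional
from urllib.parse import urljoin

# Trailing '//# sourceMappingURL=...' (or legacy '//@') comment in a bundle.
MAP_COMMENT = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)\s*$", re.M)


def source_map_url(js_url: str, js_source: str, sourcemap_header: Optional[str] = None) -> str:
    """Where a bundle's source map should live: the SourceMap response header
    if present, else the last sourceMappingURL comment, else '<bundle>.map'."""
    if sourcemap_header:
        return urljoin(js_url, sourcemap_header)
    found = MAP_COMMENT.findall(js_source)
    if found:
        return urljoin(js_url, found[-1])
    return js_url + ".map"


def original_sources(map_text: str) -> dict:
    """Map original file names to embedded source text, using the 'sources'
    array and the optional, index-aligned 'sourcesContent' array."""
    data = json.loads(map_text)
    contents = data.get("sourcesContent") or []
    return {name: contents[i] for i, name in enumerate(data.get("sources", []))
            if i < len(contents) and contents[i] is not None}
```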

8.10.2 GraphQL Introspection

GraphQL APIs that have introspection enabled (the default in many frameworks) expose their entire schema — every query, mutation, type, and field. This is equivalent to getting complete API documentation without the developer intending to share it:

# GraphQL introspection query
curl -s -X POST https://api.medsecure.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __schema { queryType { name } mutationType { name } types { name kind description fields { name description type { name kind } } } } }"}'

# Using graphql-voyager or graphiql for visualization
# These tools present the schema as an interactive diagram

A full GraphQL introspection response reveals every data type in the application, every query and mutation available, and the relationships between objects — effectively providing a complete data model of the application. This is one of the highest-value findings during active web reconnaissance.
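Raw introspection output is long. A sketch that condenses the response to the query above into root queries, mutations, and application object types, skipping GraphQL's built-in "__" types. It assumes the standard {"data": {"__schema": ...}} envelope. The example response is hypothetical.

```python
import json


def summarize_schema(response_text: str) -> dict:
    """Root query/mutation field names plus application object types from the
    introspection query above (built-in '__' types are skipped)."""
    schema = json.loads(response_text)["data"]["__schema"]
    types = {t["name"]: t for t in schema["types"]}

    def root_fields(root):
        if not root:  # e.g. mutationType is null when there are no mutations
            return []
        return sorted(f["name"] for f in (types.get(root["name"], {}).get("fields") or []))

    return {
        "queries": root_fields(schema.get("queryType")),
        "mutations": root_fields(schema.get("mutationType")),
        "object_types": sorted(name for name, t in types.items()
                               if t.get("kind") == "OBJECT" and not name.startswith("__")),
    }


# Hypothetical, trimmed introspection response
example_response = json.dumps({"data": {"__schema": {
    "queryType": {"name": "Query"},
    "mutationType": {"name": "Mutation"},
    "types": [
        {"name": "Query", "kind": "OBJECT", "fields": [{"name": "patients"}, {"name": "patient"}]},
        {"name": "Mutation", "kind": "OBJECT", "fields": [{"name": "updatePatient"}]},
        {"name": "Patient", "kind": "OBJECT", "fields": [{"name": "id"}, {"name": "ssn"}]},
        {"name": "String", "kind": "SCALAR", "fields": None},
        {"name": "__Schema", "kind": "OBJECT", "fields": [{"name": "types"}]},
    ]}}})
```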

8.10.3 CORS and Cross-Origin Configuration Analysis

Cross-Origin Resource Sharing (CORS) configurations can reveal trust relationships between domains and potential security misconfigurations:

# Test CORS configuration
curl -s -I https://api.medsecure.com/ \
  -H "Origin: https://evil.com" | grep -i "access-control"

# Test with null origin
curl -s -I https://api.medsecure.com/ \
  -H "Origin: null" | grep -i "access-control"

# Test with subdomain
curl -s -I https://api.medsecure.com/ \
  -H "Origin: https://dev.medsecure.com" | grep -i "access-control"

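If the API responds with Access-Control-Allow-Origin: https://evil.com, it is reflecting arbitrary origins, and the CORS configuration is overly permissive. The impact depends on Access-Control-Allow-Credentials: when it is also true, a malicious page can read authenticated responses on a victim's behalf, which is cross-origin data theft. Reflecting the null origin is similarly dangerous, because an attacker can send Origin: null from a sandboxed iframe.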
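A small decision helper makes it easier to read the three probe responses consistently. This sketch assumes you have already parsed the response headers into a dict. The labels are this example's own shorthand, and the logic follows the browser rule that a * wildcard cannot be combined with credentials.

```python
def assess_cors(test_origin: str, headers: dict) -> str:
    """Classify the CORS response to one of the Origin probes above."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    allow_origin = h.get("access-control-allow-origin")
    with_creds = h.get("access-control-allow-credentials", "").lower() == "true"
    if allow_origin is None:
        return "no CORS grant for this origin"
    if allow_origin == "*":
        return "wildcard: public read, but browsers will not send credentials"
    if allow_origin == test_origin:
        if with_creds:
            return "origin reflected with credentials: likely exploitable"
        return "origin reflected without credentials: unauthenticated data only"
    return f"restricted to {allow_origin}"
```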

8.10.4 WebSocket Reconnaissance

Many modern applications use WebSocket connections for real-time features. WebSocket endpoints are often overlooked during reconnaissance:

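# Check for WebSocket endpoints (force HTTP/1.1: the Upgrade handshake is not used over HTTP/2)
# --max-time stops curl from waiting on the open socket after a successful upgrade
curl -s -i -N --http1.1 --max-time 5 -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  https://medsecure.com/ws
# A WebSocket endpoint answers "HTTP/1.1 101 Switching Protocols"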

# Common WebSocket paths
# /ws, /websocket, /socket.io/, /signalr/, /cable, /hub

WebSocket endpoints may bypass WAF protections (since many WAFs do not inspect WebSocket traffic), may lack authentication controls, and can provide access to real-time application data.
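The key in the curl example is the sample nonce from RFC 6455. A genuine WebSocket server proves it understood the handshake by returning a Sec-WebSocket-Accept value, computed from that key, in a 101 response. This sketch of the check helps separate real endpoints from servers that ignore the Upgrade headers. The helper names are this example's own.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_ws_key() -> str:
    """Random 16-byte nonce, base64-encoded, for the Sec-WebSocket-Key header."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept = base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_websocket_upgrade(status_line: str, headers: dict, key: str) -> bool:
    """True only for a 101 response whose accept value matches our key."""
    parts = status_line.split()
    lower = {k.lower(): v.strip() for k, v in headers.items()}
    return (len(parts) >= 2 and parts[1] == "101"
            and lower.get("sec-websocket-accept") == expected_accept(key))
```

For the sample key, expected_accept("dGhlIHNhbXBsZSBub25jZQ==") returns "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=", the value given in RFC 6455.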

Best Practice: Active reconnaissance should not stop at traditional HTTP endpoints. Modern web applications use WebSockets, GraphQL, gRPC, Server-Sent Events, and other protocols that each present unique attack surfaces. A thorough reconnaissance covers all communication channels.

8.11 Email Infrastructure Reconnaissance

8.11.1 Email Security Analysis

Active reconnaissance of the target's email infrastructure reveals critical intelligence for both phishing simulations and technical testing:

SPF, DKIM, and DMARC Verification: While DNS records were examined passively in Chapter 7, active verification confirms the actual enforcement:

# Check SPF record
dig medsecure.com TXT | grep "v=spf1"

# Check DMARC policy
dig _dmarc.medsecure.com TXT

# Check DKIM (requires knowing the selector — common selectors include
# google, default, selector1, selector2, k1, mail)
dig google._domainkey.medsecure.com TXT
dig selector1._domainkey.medsecure.com TXT
dig default._domainkey.medsecure.com TXT

A DMARC policy of p=none means the organization monitors but does not block spoofed emails — making phishing simulations easier. A policy of p=quarantine or p=reject means stricter enforcement and requires more sophisticated phishing infrastructure.
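Interpreting these records by eye is error-prone when there are many domains. A minimal parser sketch for the dig output above, assuming the record text has already been joined into one string (dig may split long TXT records into several quoted chunks). It reads the DMARC p and pct tags and the qualifier on SPF's all mechanism. The example records are hypothetical.

```python
from typing import Optional


def parse_tags(record: str) -> dict:
    """Parse a 'tag=value; tag=value' TXT record such as DMARC (RFC 7489)."""
    tags = {}
    for part in record.strip().strip('"').split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_enforcement(record: str) -> str:
    tags = parse_tags(record)
    if tags.get("v") != "DMARC1":
        return "no valid DMARC record"
    policy = tags.get("p", "none").lower()
    if policy == "none":
        return "monitor only (p=none)"
    pct = tags.get("pct", "100")  # share of failing mail the policy applies to
    return f"{policy} applied to {pct}% of failing mail"


def spf_all_qualifier(record: str) -> Optional[str]:
    """Qualifier on the SPF 'all' term: '-' fail, '~' softfail, '?' neutral, '+' pass."""
    for term in record.strip().strip('"').split():
        if term.lower().lstrip("+-~?") == "all":
            return term[0] if term[0] in "+-~?" else "+"
    return None
```

A p=quarantine policy with pct=50, for instance, still lets half of the failing messages through untouched.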

MTA-STS (Mail Transfer Agent Strict Transport Security): Check whether the target enforces encrypted email delivery:

# Check for MTA-STS policy
curl -s https://mta-sts.medsecure.com/.well-known/mta-sts.txt

# Check MTA-STS DNS record
dig _mta-sts.medsecure.com TXT

Email Gateway Identification: Understanding the email security gateway helps assess what filtering a phishing email must bypass:

  • Proofpoint: MX records containing pphosted.com
  • Mimecast: MX records containing mimecast.com
  • Barracuda: MX records containing barracudanetworks.com
  • Microsoft Defender for Office 365: MX records pointing to mail.protection.outlook.com
  • Google Workspace: MX records pointing to google.com or googlemail.com

Each email security gateway has different strengths and weaknesses. Knowing which gateway protects the target helps you choose appropriate phishing payloads and delivery techniques.
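The mapping above is easy to apply automatically to dig +short MX output. A sketch assuming the suffix hints from the list (not an exhaustive vendor database). Anything unmatched is reported as unknown rather than guessed.

```python
# Suffix-to-vendor hints from the list above (not exhaustive).
GATEWAY_HINTS = [
    ("pphosted.com", "Proofpoint"),
    ("mimecast.com", "Mimecast"),
    ("barracudanetworks.com", "Barracuda"),
    ("mail.protection.outlook.com", "Microsoft Defender for Office 365"),
    ("google.com", "Google Workspace"),
    ("googlemail.com", "Google Workspace"),
]


def mx_hosts_from_dig(output: str) -> list:
    """Turn 'dig +short MX domain' lines ('10 mx1.example.com.') into hostnames."""
    return [line.split()[-1].rstrip(".").lower()
            for line in output.splitlines() if line.strip()]


def identify_gateways(mx_hosts) -> set:
    found = set()
    for host in mx_hosts:
        host = host.lower().rstrip(".")
        for suffix, vendor in GATEWAY_HINTS:
            if host == suffix or host.endswith("." + suffix):
                found.add(vendor)
    return found or {"unknown / possibly self-hosted"}
```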

8.11.2 Email Address Verification

Active email verification techniques confirm whether discovered email addresses are valid:

# SMTP verification approach (connect to a mail server listed in the MX records)
# Note: Many modern mail servers reject VRFY and return
# uniform responses to RCPT TO for both valid and invalid addresses

telnet mail.medsecure.com 25
# Then type at the SMTP prompt:
EHLO test.com
MAIL FROM:<test@test.com>
RCPT TO:<john.smith@medsecure.com>
# 250 = likely valid
# 550 = likely invalid
# 452/552 with a "mailbox full" message = suggests the mailbox exists

⚠️ Common Pitfall: SMTP verification is an active technique that contacts the target's mail server and generates log entries. Many organizations monitor for SMTP enumeration. If stealth is important, use passive verification methods (HIBP, Hunter.io) instead. Also be aware that catch-all mailboxes return 250 for every address, making SMTP verification unreliable.
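The catch-all problem can be detected rather than guessed at. This sketch uses Python's smtplib and assumes you are authorized to contact the MX host on port 25 (many ISPs and cloud providers block outbound port 25). The canary approach, the function names, and the HELO/sender values are illustrative choices. The probe is just as visible as the telnet session, so the stealth caveat above still applies.

```python
import smtplib
import uuid


def interpret_rcpt(code: int) -> str:
    """Rough reading of an RCPT TO reply code (servers differ; verify manually)."""
    if code in (250, 251):
        return "accepted"
    if code in (452, 552):
        return "possibly exists (mailbox full or a server limit)"
    if code in (550, 551, 553):
        return "rejected"
    return "inconclusive"


def rcpt_probe(mx_host, domain, users, helo="probe.example", sender="probe@probe.example"):
    """RCPT TO check that first tests a random, non-existent canary address.
    If the canary is accepted, the domain behaves as catch-all and 'accepted'
    results for real users tell you nothing."""
    with smtplib.SMTP(mx_host, 25, timeout=15) as smtp:
        smtp.ehlo(helo)
        smtp.mail(sender)
        canary_code, _ = smtp.rcpt(f"{uuid.uuid4().hex}@{domain}")
        results = {user: interpret_rcpt(smtp.rcpt(f"{user}@{domain}")[0]) for user in users}
        smtp.rset()
    return interpret_rcpt(canary_code) == "accepted", results
```

For example, rcpt_probe("mail.medsecure.com", "medsecure.com", ["john.smith"]) returns a catch-all flag plus a per-user reading. Treat any result as a lead to confirm, not as proof.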

8.12 Network Topology Mapping

8.12.1 Traceroute Analysis

Traceroute reveals the network path to the target, including intermediate routers, ISPs, and CDN nodes:

# Standard traceroute
traceroute medsecure.com

# TCP traceroute (more likely to traverse firewalls)
tcptraceroute medsecure.com 443

# MTR (combines ping and traceroute for continuous monitoring)
mtr medsecure.com

Traceroute data reveals: the target's ISP and hosting provider, network boundaries (often visible as a change in hop naming or ownership, a sharp latency jump, or hops that stop responding at a firewall), CDN presence (different edge endpoints for the same hostname), and geographic distribution of infrastructure.

8.12.2 BGP and ASN Analysis

Autonomous System Numbers (ASNs) and Border Gateway Protocol (BGP) data reveal the target's network infrastructure at the routing level:

# Find the ASN announcing the target's IP (Team Cymru IP-to-ASN lookup)
dig +short medsecure.com A | grep -E '^[0-9.]+$' | \
  xargs -I{} whois -h whois.cymru.com " -v {}"

# List all prefixes announced by the ASN
whois -h whois.radb.net -- '-i origin AS12345'

# Using bgp.he.net web interface for ASN investigation
# https://bgp.he.net/AS12345

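BGP intelligence reveals the IP address ranges announced by the organization's ASN, which may include ranges not discoverable through DNS enumeration alone. Treat these ranges as candidates for the scanning perimeter, not as authorization. If the ASN belongs to a hosting or cloud provider rather than the target, its prefixes also cover other customers, so confirm ownership and get every range written into the scope before scanning it.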
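A sketch of the bookkeeping that sits between these lookups and any scanning. It parses Team Cymru's verbose output (pipe-separated columns: AS, IP, BGP Prefix, CC, Registry, Allocated, AS Name), merges the prefixes you collect, and checks every candidate IP against the ranges the client has actually authorized. The example output uses a documentation-reserved ASN and address block and is made up.

```python
import ipaddress


def parse_cymru_verbose(output: str) -> list:
    """Parse Team Cymru ' -v' whois rows: AS | IP | BGP Prefix | CC | Registry | Allocated | AS Name."""
    rows = []
    for line in output.splitlines():
        cols = [c.strip() for c in line.split("|", 6)]
        if len(cols) < 7 or not cols[0].isdigit():
            continue  # header, blank, or unexpected line
        rows.append({"asn": int(cols[0]), "ip": cols[1],
                     "prefix": cols[2], "as_name": cols[6]})
    return rows


def consolidate(prefixes) -> list:
    """Merge overlapping/adjacent prefixes (IPv4 and IPv6 kept separate)."""
    nets = [ipaddress.ip_network(p, strict=False) for p in prefixes]
    merged = []
    for version in (4, 6):
        merged += ipaddress.collapse_addresses(n for n in nets if n.version == version)
    return merged


def in_scope(ip: str, authorized_ranges) -> bool:
    """True only if ip falls inside a range the client has authorized in writing."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r, strict=False) for r in authorized_ranges)


# Made-up output using a documentation-reserved ASN and address block
example_output = """AS      | IP               | BGP Prefix          | CC | Registry | Allocated  | AS Name
64500   | 198.51.100.7     | 198.51.100.0/24     | US | arin     | 2010-01-01 | EXAMPLE-AS - Example Org, US"""
```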

8.13 From Reconnaissance to Vulnerability Assessment

8.13.1 Synthesizing Your Findings

Active reconnaissance produces a wealth of data. The final step is synthesizing this data into an actionable attack surface map. For each target, you should have:

  1. Hostname/URL: The specific endpoint
  2. IP Address: The underlying IP
  3. Technology Stack: Web server, framework, language, CMS
  4. Security Controls: WAF, security headers, authentication
  5. Potential Vulnerabilities: Based on identified versions and configurations
  6. Priority: How likely is this target to yield findings?

Example attack surface summary for MedSecure:

Target                      | Technology                      | Potential Issues                        | Priority
----------------------------|---------------------------------|-----------------------------------------|---------
portal.medsecure.com        | WordPress 6.4.2, PHP 7.4, nginx | Plugin vulnerabilities, PHP version EOL | High
api.medsecure.com           | Node.js, Express, Swagger UI    | Exposed API docs, potential auth issues | High
staging-ehr.medsecure.com   | Java, Spring Boot, Tomcat       | Staging env, likely less hardened       | Critical
jenkins.medsecuredev.io     | Jenkins (version TBD)           | CI/CD, potential for code execution     | Critical
dev.medsecure.com           | Exposed .git directory          | Full source code disclosure             | Critical
elasticsearch.medsecure.com | Elasticsearch, no auth          | Data exposure, potential RCE            | Critical
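The same summary is more useful as structured data than as prose. This sketch models the six fields listed above as a dataclass, sorts assets by priority, and exports them to CSV. The priority ordering is an assumption of this example, and the IP addresses are documentation placeholders rather than MedSecure data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Lower number = test first; unknown labels sort last.
PRIORITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Asset:
    hostname: str
    ip: str
    technology: str
    security_controls: str = ""
    potential_issues: str = ""
    priority: str = "Medium"


def prioritized(assets):
    return sorted(assets, key=lambda a: (PRIORITY_ORDER.get(a.priority, 99), a.hostname))


def to_csv(assets) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Asset)])
    writer.writeheader()
    for asset in prioritized(assets):
        writer.writerow(asdict(asset))
    return buf.getvalue()


# Two rows from the table above; the IPs are documentation placeholders.
inventory = [
    Asset("portal.medsecure.com", "192.0.2.10", "WordPress 6.4.2, PHP 7.4, nginx",
          potential_issues="Plugin vulnerabilities, PHP version EOL", priority="High"),
    Asset("dev.medsecure.com", "192.0.2.20", "Exposed .git directory",
          potential_issues="Full source code disclosure", priority="Critical"),
]
```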

8.13.2 Preparing for the Next Phase

With reconnaissance complete, you are ready to move into vulnerability assessment and scanning (covered in Chapters 10-12). Your reconnaissance data directly informs:

  • Which targets to scan: Focus on the highest-priority targets identified during recon
  • What to scan for: Use technology-specific vulnerability checks based on identified versions
  • How to scan: Choose appropriate tools and techniques based on the target's technology stack and security controls
  • What to expect: Anticipate the types of vulnerabilities likely present in the identified technologies

8.13.3 Documenting Active Reconnaissance

Thorough documentation during active reconnaissance is not just good practice — it is essential for professional engagements. Your recon notes serve three critical functions: they support reproducibility (allowing another tester to verify your findings), they provide evidence for the final report, and they create an audit trail demonstrating that you operated within the authorized scope.

Recommended documentation structure:

  1. Timestamp every activity: Record when each tool was run, what parameters were used, and what target was probed. If a client later asks whether a specific alert was caused by your testing, you need to be able to correlate your activities with their detection timeline.

  2. Save raw tool output: Do not rely on memory or summary notes. Save the complete output of every tool invocation — DNS enumeration results, web fingerprinting output, content discovery findings. Disk space is cheap; lost evidence is expensive.

  3. Maintain a running attack surface inventory: As you discover new hosts, subdomains, services, and technologies, update a centralized inventory. A spreadsheet or structured data file (JSON, CSV) that maps each discovered asset to its IP address, technology stack, identified vulnerabilities, and testing priority provides the backbone for the vulnerability assessment phase.

  4. Screenshot key findings: For web-based findings — exposed admin panels, directory listings, default installation pages, error pages revealing stack traces — take screenshots immediately. Web applications change, and a finding that exists during reconnaissance may be remediated before you write your report.

  5. Separate confirmed from inferred findings: Clearly distinguish between what you have confirmed through direct testing and what you have inferred from indirect evidence. "The server returns a WordPress-style 404 page" is an inference. "The /wp-login.php page loads and the HTML source contains <meta name="generator" content="WordPress 6.4.2" />" is a confirmed finding.
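Points 1 and 2 are easiest to honor when a wrapper records them automatically. This sketch runs a tool, saves its raw output to a timestamped file, and appends one JSON line per run (UTC start and end, target, exact command, exit code) to an activity log. The directory layout and file names are hypothetical choices.

```python
import datetime as dt
import json
import pathlib
import shlex
import subprocess
import sys
import tempfile


def run_logged(cmd, target, outdir="evidence"):
    """Run a tool, save its raw stdout+stderr to a timestamped file, and append
    one JSON line (UTC start/end, target, exact command, file, exit code) to
    activity_log.jsonl so activity can later be matched to client alerts."""
    out = pathlib.Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    start = dt.datetime.now(dt.timezone.utc)
    raw_file = out / f"{start:%Y%m%dT%H%M%S_%f}Z_{pathlib.Path(cmd[0]).name}.txt"
    proc = subprocess.run(cmd, capture_output=True, text=True)
    end = dt.datetime.now(dt.timezone.utc)
    raw_file.write_text(proc.stdout + proc.stderr, encoding="utf-8")
    entry = {"start": start.isoformat(), "end": end.isoformat(), "target": target,
             "command": shlex.join(cmd), "raw_output": str(raw_file),
             "exit_code": proc.returncode}
    with open(out / "activity_log.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def demo():
    """Log a harmless command into a fresh temporary directory."""
    return run_logged([sys.executable, "-c", "print('hello')"], "lab.local",
                      outdir=tempfile.mkdtemp())
```

On an engagement, a call might look like run_logged(["whatweb", "-a", "3", "https://medsecure.com"], "medsecure.com", outdir="evidence/medsecure"). Organizing the output directory by target, as the best practice below suggests, keeps retrieval simple.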

Best Practice: Many professional penetration testers use tools like CherryTree, Obsidian, or Notion to maintain structured notes during engagements. Whatever tool you use, ensure that notes are organized by target (not by tool or chronologically), so you can quickly retrieve all information about a specific system when you move to the vulnerability assessment phase.

Summary

Active reconnaissance is where the penetration test truly begins engaging with the target. By systematically enumerating DNS infrastructure, discovering hidden subdomains, fingerprinting web applications, and mapping the complete technology stack, you build the intelligence foundation for every subsequent testing phase.

The key principles of effective active reconnaissance:

  1. Build on passive recon: Never start active recon from scratch — let passive findings guide your approach
  2. Be systematic: Follow a methodology to ensure thorough coverage
  3. Use multiple tools: Cross-reference findings from different tools for accuracy
  4. Manage your noise: Balance thoroughness with stealth appropriate to the engagement type
  5. Document continuously: Capture evidence for every finding as you discover it
  6. Think about the full stack: Enumerate everything from DNS to application to API
  7. Prioritize for action: Not every finding is equally important — focus on what matters

In the next chapter, we explore the human element of reconnaissance. Social engineering reconnaissance combines OSINT with psychological techniques to map the human attack surface — the employees, processes, and behaviors that technical controls cannot fully protect.