> "The web was designed to be open, interconnected, and accessible. Every one of those qualities is also a potential attack vector." --- Jeff Williams, Co-Founder of OWASP

Learning Objectives

  • Understand modern web application architecture and its attack surfaces
  • Master the OWASP Top 10 (2021) vulnerability categories
  • Analyze HTTP requests and responses at a protocol level
  • Configure and operate Burp Suite for web application testing
  • Perform web application reconnaissance and site mapping
  • Implement input validation and output encoding defenses

Chapter 18: Web Application Security Fundamentals

Web applications are the dominant interface between organizations and their users, handling everything from e-commerce transactions to healthcare records. They are also, overwhelmingly, the most targeted attack surface in modern cybersecurity. According to Verizon's Data Breach Investigations Report, web application attacks account for approximately 26% of all breaches---more than any other attack pattern. Understanding how web applications work, where they break, and how to test them ethically is not merely a specialization within penetration testing; it is the foundation upon which most modern security assessments are built.

This chapter establishes that foundation. We will dissect web application architecture to understand the components an attacker can reach. We will study the OWASP Top 10 as a structured taxonomy of the most critical web risks. We will go deep into HTTP---the protocol that carries every web interaction---because you cannot test what you do not understand at the wire level. We will set up Burp Suite, the industry-standard intercepting proxy, and learn to use it as an extension of our hands. Finally, we will explore web application reconnaissance: the methodical process of mapping an application's structure, technology stack, and entry points before a single exploit is attempted.

By the end of this chapter, you will see web applications not as polished user interfaces, but as layered systems of trust boundaries, input channels, and state management---each layer presenting opportunities for authorized security testing.


18.1 Modern Web Application Architecture

Before you can test a web application, you must understand what you are testing. Modern web applications are far more complex than the static HTML pages of the early web. They are distributed systems with multiple tiers, each with its own attack surface.

18.1.1 The Three-Tier Architecture

The canonical web application architecture consists of three logical tiers:

Presentation Tier (Client Side) This is what runs in the user's browser. In our running example, ShopStack uses a React frontend---a single-page application (SPA) that renders the user interface, manages client-side state, and communicates with the backend via API calls. The presentation tier includes HTML, CSS, JavaScript, and increasingly, WebAssembly. From a security perspective, everything in this tier is under the attacker's control. The browser is hostile territory.

Application Tier (Server Side) ShopStack's Node.js API server handles business logic: authenticating users, processing orders, calculating prices, enforcing access controls. This tier receives requests from the presentation tier, validates them (or fails to), and interacts with the data tier. It is where most application-level vulnerabilities manifest---injection flaws, broken authentication, broken access control.

Data Tier (Backend) ShopStack uses PostgreSQL for relational data (users, orders, products) and might use Redis for session caching, Elasticsearch for product search, and S3 for file storage. The data tier should never be directly accessible from the internet, but misconfigurations and injection attacks can bridge that gap.

18.1.2 The Expanded Modern Stack

Real-world applications extend well beyond three tiers. Consider ShopStack's full deployment on AWS:

[Browser/Mobile App]
        |
    [CloudFront CDN]
        |
    [AWS WAF]
        |
    [Application Load Balancer]
        |
    [ECS/Fargate Containers]
     /     |      \
[Node.js] [Node.js] [Node.js]  (Application instances)
     \     |      /
    [Internal ALB]
        |
   [PostgreSQL RDS]  [Redis ElastiCache]  [S3 Buckets]

Each component introduces potential vulnerabilities:

  • CDN (CloudFront): Cache poisoning, origin misconfiguration
  • WAF (AWS WAF): Bypass techniques, rule gaps
  • Load Balancer: HTTP request smuggling, header injection
  • Containers: Escape vulnerabilities, misconfigured orchestration
  • Application Servers: The full OWASP Top 10
  • Databases: Injection, misconfigurations, excessive privileges
  • Object Storage: Public bucket exposure, insecure direct object references

18.1.3 API-Driven Architecture

Modern applications increasingly separate their frontend and backend through APIs. ShopStack exposes a RESTful API:

GET    /api/v2/products          # List products
GET    /api/v2/products/:id      # Get specific product
POST   /api/v2/orders            # Create order
GET    /api/v2/users/me          # Get current user profile
PUT    /api/v2/users/me          # Update profile
DELETE /api/v2/admin/users/:id   # Admin: delete user

API-driven architectures shift the attack surface. Instead of testing HTML forms, you are testing JSON endpoints. Instead of session cookies alone, you may encounter JWT tokens, API keys, or OAuth flows. The MedSecure patient portal uses a similar API architecture, but with additional FHIR-compliant healthcare endpoints that add complexity.

Key Concept

Every API endpoint is a potential entry point. During testing, you must enumerate all endpoints---not just those the frontend uses, but also deprecated versions (v1), admin endpoints, and debug routes that may have been left exposed.

18.1.4 Session and State Management

HTTP is stateless, but web applications are not. They maintain state through several mechanisms:

  • Cookies: Server-set values sent with every request. Session cookies (e.g., connect.sid in Express) link requests to server-side session data.
  • JWT (JSON Web Tokens): Self-contained tokens that carry user identity and claims. ShopStack uses JWTs stored in localStorage for API authentication.
  • URL Parameters: Query strings or path parameters that carry state (dangerous if containing sensitive data).
  • Hidden Form Fields: Values embedded in HTML forms that the server expects to receive back.
  • Local/Session Storage: Browser-side storage accessible via JavaScript.

Each state management mechanism has distinct security implications. Cookies can be stolen via XSS if not marked HttpOnly. JWTs can be forged if the signing key is weak or if the server accepts an attacker-chosen algorithm (such as none). URL parameters leak through the Referer header and browser history.
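To make the "self-contained" property of JWTs concrete, here is a minimal sketch that decodes a token's header and payload without any key. The function name decodeJwtUnverified is hypothetical, the token is an illustrative value rather than a real credential, and the sketch assumes a Node.js version with base64url support (16 or later).

```javascript
// Decode (NOT verify) a JWT: header and payload are only base64url-encoded JSON.
function decodeJwtUnverified(token) {
  const parts = token.split('.');
  if (parts.length !== 3) {
    throw new Error('Expected three dot-separated JWT segments');
  }
  const [header, payload] = parts
    .slice(0, 2)
    .map((segment) => JSON.parse(Buffer.from(segment, 'base64url').toString('utf8')));
  // The signature is returned untouched; nothing here checks it.
  return { header, payload, signature: parts[2] };
}

const sample = 'eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjo0Mn0.abc123';
console.log(decodeJwtUnverified(sample));
```

Because anyone holding the token can read its claims this way, a JWT's payload should be treated as visible to the client, and only the signature check on the server establishes trust.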

18.1.5 Trust Boundaries

A trust boundary exists wherever data crosses from one trust level to another. In ShopStack:

  1. Browser to Server: The most critical boundary. All user input crosses here.
  2. Server to Database: SQL queries constructed from user input cross here.
  3. Server to External APIs: Payment processor calls, email service integrations.
  4. Between Microservices: Internal service-to-service calls that may lack authentication.
  5. Server to File System: File uploads, log writes, template rendering.

Blue Team Perspective: Defense-in-depth means validating data at every trust boundary, not just at the perimeter. ShopStack should validate input at the API gateway, again in the application logic, and again through database constraints. If any single layer fails, the others should catch the attack.


18.2 The OWASP Top 10 (2021)

The Open Web Application Security Project (OWASP) Top 10 is the most widely recognized document for web application security awareness. Updated periodically (most recently in 2021), it represents a broad consensus on the most critical web application security risks. Understanding the Top 10 is essential not just for testing, but for communicating findings to developers and management.

18.2.1 A01:2021 --- Broken Access Control

Moved from #5 to #1. This is now the most common critical web vulnerability. Broken access control occurs when users can act outside their intended permissions.

Examples in ShopStack:

  • A regular user accessing /api/v2/admin/users by simply changing the URL
  • Modifying the user_id parameter in a request to view another user's orders
  • Accessing the order management API without being an authenticated merchant

Testing Approach:

  • Test every endpoint with different privilege levels
  • Attempt Insecure Direct Object Reference (IDOR) by manipulating identifiers
  • Check for missing function-level access controls on admin endpoints
  • Verify that CORS policies are properly restrictive

Key Statistic: In the data behind the 2021 Top 10, 94% of applications were tested for some form of broken access control, with an average incidence rate of 3.81%.
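The IDOR case above comes down to a missing object-level check on the server. The following sketch shows what such a check might look like; the helper name canViewOrder and the data shapes (user.id, user.role, order.ownerId) are assumptions for illustration, not ShopStack's actual schema.

```javascript
// Hypothetical authorization helper: decides whether a user may read an order.
function canViewOrder(user, order) {
  if (!user || !order) return false;        // deny by default
  if (user.role === 'admin') return true;   // privileged role may read any order
  return order.ownerId === user.id;         // object-level ownership check
}

const customer = { id: 42, role: 'customer' };
const someoneElsesOrder = { id: 1001, ownerId: 7 };
const ownOrder = { id: 1002, ownerId: 42 };

console.log(canViewOrder(customer, someoneElsesOrder)); // false
console.log(canViewOrder(customer, ownOrder));          // true
```

When testing, the question is whether every endpoint that accepts an identifier performs an equivalent check, or whether it trusts the identifier supplied in the request.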

18.2.2 A02:2021 --- Cryptographic Failures

Previously "Sensitive Data Exposure." This category focuses on failures related to cryptography that lead to exposure of sensitive data.

Examples in ShopStack:

  • Transmitting credit card data over HTTP instead of HTTPS
  • Using MD5 or SHA-1 for password hashing instead of bcrypt/Argon2
  • Storing API keys in plaintext in the database
  • Using weak or default TLS configurations

Testing Approach:

  • Verify TLS configuration using tools like testssl.sh
  • Check for sensitive data in URLs, logs, or error messages
  • Examine password storage mechanisms
  • Test for sensitive data transmitted in cleartext
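As a point of comparison for the password storage examples, here is a minimal sketch of salted, slow password hashing. It uses scrypt from Node.js's built-in crypto module rather than the bcrypt or Argon2 mentioned above, purely so the example has no external dependencies; the function names and the "salt:hash" storage format are assumptions.

```javascript
const crypto = require('crypto');

// Hash a password with a random per-user salt using scrypt (Node.js built-in).
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const record = hashPassword('correct horse battery staple');
console.log(verifyPassword('correct horse battery staple', record)); // true
console.log(verifyPassword('wrong guess', record));                  // false
```

A stored value that looks like a bare 32- or 40-character hex string, by contrast, suggests unsalted MD5 or SHA-1 and is worth reporting.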

18.2.3 A03:2021 --- Injection

Dropped from #1 to #3 as frameworks have improved default protections, but still critically dangerous. Injection occurs when untrusted data is sent to an interpreter as part of a command or query.

Types Covered in Chapter 19:

  • SQL Injection
  • NoSQL Injection
  • Command Injection
  • LDAP Injection
  • Template Injection

ShopStack Example:

# Vulnerable product search
GET /api/v2/products?search=laptop' OR '1'='1

18.2.4 A04:2021 --- Insecure Design

New category in 2021. This represents a fundamental shift in thinking---some vulnerabilities exist because the application was designed insecurely, not because it was implemented incorrectly.

Examples:

  • A password recovery flow that reveals whether an email is registered
  • A shopping cart that trusts client-side price calculations
  • No rate limiting on authentication endpoints
  • Insufficient anti-automation on critical business flows

Why This Matters: You cannot fix insecure design with a patch. It requires rethinking the architecture.

18.2.5 A05:2021 --- Security Misconfiguration

Expanded to include XML External Entities (XXE). This covers improper configuration at any level of the application stack.

ShopStack Examples:

  • Default credentials on the PostgreSQL admin interface
  • Directory listing enabled on the static file server
  • Stack traces exposed in production error responses
  • Unnecessary HTTP methods enabled (PUT, DELETE on static resources)
  • S3 buckets with public read access

18.2.6 A06:2021 --- Vulnerable and Outdated Components

Using components with known vulnerabilities---libraries, frameworks, or other software modules. ShopStack's package.json might include dozens of npm dependencies, each a potential vulnerability.

Testing Approach:

  • Run npm audit or yarn audit for Node.js dependencies
  • Use OWASP Dependency-Check or Snyk
  • Check CVE databases for known vulnerabilities in identified versions
  • Test for outdated jQuery, Angular, or React versions with known XSS vectors

18.2.7 A07:2021 --- Identification and Authentication Failures

Previously "Broken Authentication." Covers weaknesses in authentication mechanisms.

ShopStack Testing Points:

  • Credential stuffing resistance (rate limiting, CAPTCHA)
  • Password policy enforcement
  • Session fixation after login
  • JWT implementation flaws (none algorithm, weak secrets)
  • Multi-factor authentication bypass

18.2.8 A08:2021 --- Software and Data Integrity Failures

New category. Relates to assumptions about software updates, critical data, and CI/CD pipelines without verification.

Examples:

  • Auto-update mechanisms without signature verification
  • Deserialization of untrusted data
  • CI/CD pipeline compromise (supply chain attacks)
  • Insecure use of CDN-hosted JavaScript without Subresource Integrity (SRI)
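For the Subresource Integrity case, the integrity value is simply a base64-encoded digest of the file, prefixed with the hash algorithm. The sketch below computes one with Node.js's crypto module; the function name sriHash and the CDN URL are hypothetical.

```javascript
const crypto = require('crypto');

// Compute a Subresource Integrity value ("sha384-<base64 digest>") for file contents.
function sriHash(content) {
  const digest = crypto.createHash('sha384').update(content).digest('base64');
  return `sha384-${digest}`;
}

// In practice the content would be the exact bytes of the CDN-hosted file.
const script = 'console.log("hello");';
console.log(
  `<script src="https://cdn.example.com/lib.js" integrity="${sriHash(script)}" crossorigin="anonymous"></script>`
);
```

If the CDN later serves a file whose digest differs, a browser that supports SRI refuses to execute it, which is the verification step this category is about.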

18.2.9 A09:2021 --- Security Logging and Monitoring Failures

Previously "Insufficient Logging & Monitoring." Without proper logging, attacks go undetected.

What to Test:

  • Are login failures logged?
  • Are access control failures logged?
  • Do logs include sufficient context (IP, timestamp, user, action)?
  • Are logs protected from tampering?
  • Is there alerting on suspicious patterns?

18.2.10 A10:2021 --- Server-Side Request Forgery (SSRF)

New to the Top 10 in 2021. SSRF occurs when a web application fetches a remote resource without validating the user-supplied URL.

ShopStack Example:

# Image preview feature
POST /api/v2/products/preview
{"image_url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}

This request could make the server fetch AWS metadata, exposing IAM credentials. SSRF has been involved in some of the most significant cloud breaches, including the 2019 Capital One breach.
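A server-side check for this kind of request might look like the following sketch. It is deliberately incomplete: it only inspects the URL scheme and literal IP addresses, the function name isUrlAllowed is hypothetical, and a real defense would also resolve hostnames and re-check the resulting addresses (and ideally use an allowlist of permitted hosts).

```javascript
const net = require('net');

// Return true only for http(s) URLs whose host is not an obviously internal
// address literal. Hostnames still need a DNS-resolution check, omitted here.
function isUrlAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;                                        // not parseable as a URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  const host = url.hostname;
  if (host === 'localhost') return false;
  if (net.isIP(host) === 4) {
    const [a, b] = host.split('.').map(Number);
    if (a === 0 || a === 10 || a === 127) return false;  // "this" network, private, loopback
    if (a === 169 && b === 254) return false;            // link-local, incl. cloud metadata
    if (a === 172 && b >= 16 && b <= 31) return false;   // private
    if (a === 192 && b === 168) return false;            // private
  }
  // Reject all IPv6 literals in this sketch (hostname keeps the brackets).
  if (net.isIP(host.replace(/^\[|\]$/g, '')) === 6) return false;
  return true;
}

console.log(isUrlAllowed('http://169.254.169.254/latest/meta-data/iam/security-credentials/'));
console.log(isUrlAllowed('https://images.example.com/a.png'));
```

Parsing with the URL class first matters because it normalizes alternative IPv4 spellings (for example a single decimal number) into dotted form before the range checks run. When testing, such alternative encodings and DNS names that resolve to internal addresses are the usual ways around naive filters.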

Blue Team Perspective: OWASP Top 10 coverage should be your minimum standard, not your goal. Use it as a communication framework with development teams, but your actual testing should go deeper. Consider OWASP's Application Security Verification Standard (ASVS) for comprehensive coverage.


18.3 HTTP in Depth

HTTP (Hypertext Transfer Protocol) is the language of the web. Every web application test begins and ends with HTTP. To be an effective web application tester, you must be able to read, write, and manipulate HTTP at the raw level.

18.3.1 HTTP Request Structure

An HTTP request consists of four parts:

POST /api/v2/auth/login HTTP/1.1        <-- Request Line
Host: shopstack.example.com              <-- Headers begin
Content-Type: application/json
Content-Length: 42
Cookie: session=abc123; tracking=xyz789
Authorization: Bearer eyJhbGciOiJIUzI1NiJ9...
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: application/json
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
                                         <-- Blank line (CRLF)
{"username":"admin","password":"P@ssw0rd"} <-- Body

The Request Line contains the method, the path (including query string), and the HTTP version. Each component is testable:

  • Method: Can you change POST to PUT? Does the server behave differently?
  • Path: Are there path traversal possibilities? Hidden endpoints?
  • Version: Does HTTP/1.0 vs 1.1 change behavior (useful for request smuggling)?
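The four-part structure can be made explicit by splitting a raw request in code. The sketch below is a simplified parser for illustration only (the function name parseHttpRequest is hypothetical, and it ignores edge cases such as repeated headers or chunked bodies).

```javascript
// Split a raw HTTP/1.1 request into request line, headers, and body.
function parseHttpRequest(raw) {
  const separator = raw.indexOf('\r\n\r\n');        // the blank line ends the header block
  const head = separator === -1 ? raw : raw.slice(0, separator);
  const body = separator === -1 ? '' : raw.slice(separator + 4);
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, target, version] = requestLine.split(' ');
  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    if (colon === -1) continue;                      // skip malformed lines in this sketch
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, target, version, headers, body };
}

const raw = [
  'POST /api/v2/auth/login HTTP/1.1',
  'Host: shopstack.example.com',
  'Content-Type: application/json',
  '',
  '{"username":"admin","password":"P@ssw0rd"}',
].join('\r\n');

const parsed = parseHttpRequest(raw);
console.log(parsed.method, parsed.target, parsed.version);
console.log(Buffer.byteLength(parsed.body)); // 42: the byte count Content-Length must carry
```

Counting the body bytes yourself is a useful habit: when you edit a body by hand outside a tool that recalculates Content-Length, a stale value can cause the server to truncate or wait on the request.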

18.3.2 HTTP Methods

Method | Purpose | Security Relevance
GET | Retrieve resource | Should never modify state; parameters in URL (logged, cached)
POST | Submit data | Body not cached; used for state changes
PUT | Replace resource | Often used in REST APIs; test for unauthorized updates
PATCH | Partial update | May bypass validation that checks full PUT requests
DELETE | Remove resource | Test for unauthorized deletion
HEAD | GET without body | Useful for recon; may reveal headers without triggering WAF body inspection
OPTIONS | Show allowed methods | Reveals CORS configuration and supported methods
TRACE | Echo request back | Can enable Cross-Site Tracing (XST) attacks
CONNECT | Establish tunnel | Proxy-related; can enable SSRF

Testing Tip: Always send an OPTIONS request to every endpoint during reconnaissance. The Allow header in the response reveals which methods the server accepts:

HTTP/1.1 200 OK
Allow: GET, POST, PUT, DELETE, OPTIONS

If DELETE is allowed on a resource endpoint, test whether access control is enforced for that method.

18.3.3 HTTP Response Structure

HTTP/1.1 200 OK                                    <-- Status Line
Date: Fri, 27 Feb 2026 10:30:00 GMT               <-- Headers
Content-Type: application/json; charset=utf-8
Set-Cookie: session=def456; HttpOnly; Secure; SameSite=Strict; Path=/
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Content-Security-Policy: default-src 'self'
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Request-Id: 7a8b9c0d-1e2f-3a4b-5c6d-7e8f9a0b1c2d
Server: nginx/1.24.0
Content-Length: 157

{"success":true,"token":"eyJhbGciOiJIUzI1NiJ9...","user":{"id":42,"role":"customer"}}

18.3.4 Status Codes as Intelligence

Status codes reveal application behavior:

Code | Meaning | Security Insight
200 | OK | Request succeeded; baseline response
201 | Created | Resource created; useful for confirming injection
301/302 | Redirect | May lead to open redirect vulnerabilities
400 | Bad Request | Input validation triggered; modify payload
401 | Unauthorized | Authentication required; test bypass
403 | Forbidden | Access denied; test for bypass (different method, headers)
404 | Not Found | Resource doesn't exist; use for directory brute-forcing
405 | Method Not Allowed | This method blocked; try others
429 | Too Many Requests | Rate limiting active; note threshold
500 | Internal Server Error | Server-side error; may indicate injection success
502/503 | Bad Gateway/Unavailable | Backend failure; may indicate DoS potential

Critical Testing Pattern: A difference between the responses for existing and non-existent resources can reveal information. If /api/v2/admin/users returns 403 (forbidden) but /api/v2/admin/nonexistent returns 404 (not found), you have confirmed that the users endpoint exists and is protected. A secure application returns the same response for both.
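This comparison is easy to automate once status codes have been collected. The sketch below classifies a probed path against the status of a path known not to exist; the function name classifyProbe and its labels are hypothetical, and the status codes are the ones from the example above rather than live results.

```javascript
// Classify a probed path by comparing its status code with the status code of
// a path that is known not to exist (the baseline).
function classifyProbe(status, baselineStatus) {
  if (status === baselineStatus) return 'indistinguishable';
  if (status === 401 || status === 403) return 'exists-protected';
  if (status >= 200 && status < 400) return 'exists-accessible';
  return 'needs-manual-review';
}

const observed = {
  '/api/v2/admin/users': 403,
  '/api/v2/admin/nonexistent': 404,
};
const baseline = observed['/api/v2/admin/nonexistent'];

for (const [path, status] of Object.entries(observed)) {
  console.log(path, '->', classifyProbe(status, baseline));
}
```

An application that answers 403 (or 404) uniformly for both paths yields "indistinguishable" for everything, which is the behavior a defender wants.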

18.3.5 Security-Critical Headers

Request Headers to Manipulate:

  • Host: --- Virtual host routing; test for host header injection
  • X-Forwarded-For: --- IP spoofing behind load balancers
  • Referer: --- CSRF protection bypass if checking referer
  • Content-Type: --- Change from application/json to application/xml to test XXE
  • Cookie: --- Session manipulation, parameter tampering

Response Headers to Verify (Blue Team Checklist):

Header | Secure Value | Purpose
Strict-Transport-Security | max-age=31536000; includeSubDomains | Force HTTPS
X-Content-Type-Options | nosniff | Prevent MIME sniffing
X-Frame-Options | DENY or SAMEORIGIN | Prevent clickjacking
Content-Security-Policy | Restrictive policy | Prevent XSS, data exfil
X-XSS-Protection | 0 (modern approach) | Deprecated; CSP replaces it
Referrer-Policy | strict-origin-when-cross-origin | Control referer leakage
Permissions-Policy | Restrict features | Control browser APIs
Cache-Control | no-store (for sensitive pages) | Prevent caching of sensitive data
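Part of the checklist above can be turned into a quick automated check. The following sketch takes a response's headers as a plain object and reports missing or weak values; the function name auditHeaders and the acceptance rules are simplified assumptions, not a complete policy evaluation.

```javascript
// Acceptance rules for a subset of the checklist (deliberately simplified).
const REQUIRED_HEADERS = {
  'strict-transport-security': (v) => {
    const match = /max-age=(\d+)/i.exec(v);
    return match !== null && Number(match[1]) > 0;
  },
  'x-content-type-options': (v) => v.toLowerCase() === 'nosniff',
  'x-frame-options': (v) => ['deny', 'sameorigin'].includes(v.toLowerCase()),
  'content-security-policy': (v) => !/unsafe-inline/i.test(v),
  'referrer-policy': (v) => v.length > 0,
};

function auditHeaders(headers) {
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = String(value).trim(); // header names are case-insensitive
  }
  const findings = [];
  for (const [name, isAcceptable] of Object.entries(REQUIRED_HEADERS)) {
    if (!(name in normalized)) findings.push(`${name}: missing`);
    else if (!isAcceptable(normalized[name])) findings.push(`${name}: weak value`);
  }
  return findings;
}

console.log(auditHeaders({
  'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
  'X-Content-Type-Options': 'nosniff',
  'Content-Security-Policy': "default-src 'self' 'unsafe-inline'",
}));
```

A check like this only confirms presence and obvious weaknesses; whether a Content-Security-Policy is actually restrictive enough still needs manual review.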

18.3.6 Cookies Deep Dive

Cookies are the primary session management mechanism and deserve special attention:

Set-Cookie: session=eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjo0Mn0.abc123;
    Domain=shopstack.example.com;
    Path=/;
    Expires=Fri, 06 Mar 2026 10:30:00 GMT;
    HttpOnly;
    Secure;
    SameSite=Strict

Cookie Attributes and Their Security Impact:

Attribute | Effect | Risk If Missing
HttpOnly | Not accessible via JavaScript | XSS can steal cookie
Secure | Only sent over HTTPS | Cookie exposed on HTTP
SameSite=Strict | Not sent on cross-site requests | CSRF attacks possible
SameSite=Lax | Sent cross-site only on top-level navigations that use safe methods (e.g., GET) | Some CSRF vectors remain
Domain | Scope of cookie sharing | Overly broad = sibling subdomain access
Path | URL path restriction | Rarely effective as security control
Expires/Max-Age | Cookie lifetime | Persistent sessions if too long
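Checking these attributes by eye gets tedious across many cookies, so a small helper can flag the missing ones. The sketch below parses a single Set-Cookie header value; the function name auditSetCookie is hypothetical and the parser is intentionally minimal.

```javascript
// Parse one Set-Cookie header value and list security attributes that are absent.
function auditSetCookie(headerValue) {
  const [nameValue, ...attributes] = headerValue.split(';').map((part) => part.trim());
  const name = nameValue.split('=')[0];
  const attrs = new Map();
  for (const attribute of attributes) {
    if (attribute === '') continue;                       // tolerate a trailing semicolon
    const eq = attribute.indexOf('=');
    const key = (eq === -1 ? attribute : attribute.slice(0, eq)).toLowerCase();
    attrs.set(key, eq === -1 ? true : attribute.slice(eq + 1));
  }
  const missing = [];
  if (!attrs.has('httponly')) missing.push('HttpOnly');
  if (!attrs.has('secure')) missing.push('Secure');
  if (!attrs.has('samesite')) missing.push('SameSite');
  return { name, missing };
}

console.log(auditSetCookie('session=def456; HttpOnly; Secure; SameSite=Strict; Path=/'));
console.log(auditSetCookie('tracking=xyz789; Path=/'));
```

The first call reports nothing missing, while the second flags all three attributes, which is the pattern to look for on session cookies in particular.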

Lab Exercise Preview: In your home lab, use Browser Developer Tools (F12 > Application > Cookies) on DVWA to examine cookie attributes. Note which security flags are missing. Then use Burp Suite to intercept and modify cookie values.

18.3.7 HTTP/2 and HTTP/3 Considerations

Modern web applications increasingly use HTTP/2 (binary framing, multiplexing, header compression) and HTTP/3 (QUIC-based, UDP transport). While the security fundamentals remain similar, these protocols introduce new attack surfaces:

  • HTTP/2 request smuggling: Binary framing can create desynchronization between frontend and backend servers
  • HPACK compression attacks: Header compression can leak information (similar to CRIME/BREACH)
  • HTTP/3 UDP-based attacks: New amplification and reflection possibilities

Burp Suite handles HTTP/2 transparently, but you should be aware of protocol differences when analyzing raw traffic.


18.4 Burp Suite Setup and Configuration

Burp Suite, developed by PortSwigger, is the single most important tool for web application penetration testing. It functions as an intercepting proxy that sits between your browser and the target, allowing you to inspect, modify, and replay every HTTP request and response. If you learn only one web testing tool, make it Burp Suite.

18.4.1 Architecture and Components

Burp Suite is a platform with multiple integrated tools:

Component | Purpose | Usage Frequency
Proxy | Intercept and modify HTTP traffic | Every test
Scanner | Automated vulnerability scanning (Pro only) | Most tests
Intruder | Automated request manipulation | Frequent
Repeater | Manual request modification and replay | Every test
Sequencer | Token randomness analysis | Occasional
Decoder | Encoding/decoding utility | Frequent
Comparer | Diff two responses | Occasional
Logger | Full HTTP history | Background
Extender | Plugin management | Setup

18.4.2 Initial Setup

Step 1: Install Burp Suite Community Edition

Download from portswigger.net. Install with default settings. Launch and create a temporary project (Community) or named project (Pro).

Step 2: Configure Browser Proxy

Option A --- FoxyProxy Browser Extension (Recommended):

  1. Install FoxyProxy in Firefox or Chrome
  2. Add a proxy: Host 127.0.0.1, Port 8080, Type HTTP
  3. Name it "Burp Suite" and enable when testing

Option B --- System Proxy:

  1. Set system proxy to 127.0.0.1:8080
  2. Remember to disable after testing

Step 3: Install Burp CA Certificate

With proxy active, navigate to http://burpsuite. Download the CA certificate. Import it into your browser's certificate store as a trusted root CA. This allows Burp to intercept HTTPS traffic. Only install this certificate in your testing browser.

Step 4: Configure Scope

In Burp, go to Target > Scope. Add your target:

Protocol: HTTPS
Host: shopstack.example.com
Port: 443
File: ^/api/.*

Enable "Use advanced scope control" for precise targeting. Under Proxy > Options, select "And URL is in target scope" for both interception and history. This prevents capturing traffic from unrelated sites.

18.4.3 Essential Burp Workflows

Workflow 1: Passive Reconnaissance

  1. Turn intercept OFF (Proxy > Intercept > "Intercept is off")
  2. Browse the application normally
  3. Review HTTP history (Proxy > HTTP history)
  4. Examine the Site Map (Target > Site map)
  5. Note API endpoints, parameters, authentication mechanisms

Workflow 2: Request Manipulation

  1. Turn intercept ON
  2. Perform action in browser (e.g., submit login form)
  3. Modify the intercepted request in Burp
  4. Forward the modified request
  5. Observe the response

Workflow 3: Repeater Testing

  1. Right-click any request in HTTP history
  2. "Send to Repeater"
  3. Modify the request manually
  4. Click "Send"
  5. Analyze the response
  6. Iterate with different payloads

Workflow 4: Intruder Attacks

  1. Send request to Intruder
  2. Mark payload positions (e.g., the password field)
  3. Select attack type (Sniper, Battering Ram, Pitchfork, Cluster Bomb)
  4. Configure payload list
  5. Start attack
  6. Sort results by status code, length, or response time to identify anomalies
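The four Intruder attack types differ only in how payloads are assigned to the marked positions. The sketch below models that assignment in simplified form so the differences are easy to compare; the function names are hypothetical, and the order in which Burp actually sends the requests may differ from the order produced here.

```javascript
// Each function returns the list of value combinations that would be sent.

// Sniper: one payload set; each position is fuzzed in turn, others keep their base value.
function sniper(base, payloads) {
  const requests = [];
  base.forEach((_, position) => {
    for (const payload of payloads) {
      const values = [...base];
      values[position] = payload;
      requests.push(values);
    }
  });
  return requests;
}

// Battering ram: one payload set; the same payload goes into every position at once.
function batteringRam(base, payloads) {
  return payloads.map((payload) => base.map(() => payload));
}

// Pitchfork: one payload set per position, iterated in parallel (first with first, ...).
function pitchfork(payloadSets) {
  const length = Math.min(...payloadSets.map((set) => set.length));
  return Array.from({ length }, (_, i) => payloadSets.map((set) => set[i]));
}

// Cluster bomb: one payload set per position, every combination is tried.
function clusterBomb(payloadSets) {
  return payloadSets.reduce(
    (combos, set) => combos.flatMap((combo) => set.map((payload) => [...combo, payload])),
    [[]]
  );
}

const users = ['alice', 'bob'];
const passwords = ['123456', 'letmein'];
console.log(sniper(['USER', 'PASS'], passwords).length);  // 4 requests
console.log(batteringRam(['USER', 'PASS'], passwords));
console.log(pitchfork([users, passwords]));
console.log(clusterBomb([users, passwords]).length);      // 4 requests
```

The request counts explain the usual choices: Pitchfork suits known username/password pairs, while Cluster Bomb grows multiplicatively and quickly becomes impractical with large lists.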

18.4.4 Essential Extensions

Install these via Extender > BApp Store (the tab is labeled Extensions in recent Burp releases):

  • Autorize: Automatically tests access control by replaying requests with different session tokens
  • Logger++: Enhanced logging with filtering capabilities
  • JSON Beautifier: Pretty-print JSON in all Burp tabs
  • Param Miner: Discover hidden parameters and headers
  • Hackvertor: Advanced encoding and decoding
  • Active Scan++: Enhanced scanning capabilities

Blue Team Perspective: Understanding Burp Suite is equally important for defenders. Security teams should use Burp to validate their WAF rules, test their security headers, and verify that access controls work correctly. Many organizations run Burp scans as part of their CI/CD pipeline using Burp's REST API.


18.5 Web Application Reconnaissance and Mapping

Reconnaissance is arguably the most critical phase of web application testing. A thorough recon phase often surfaces more vulnerabilities than unguided automated scanning, because it exposes functionality a scanner never reaches. The goal is to build a complete map of the application: its pages, its API endpoints, its parameters, its technology stack, and its trust boundaries.

18.5.1 Technology Stack Identification

Before testing, determine what you are testing:

Passive Identification:

Source | Information | Example
HTTP Headers | Server software, framework | Server: nginx/1.24.0, X-Powered-By: Express
Cookies | Framework/language | JSESSIONID = Java, PHPSESSID = PHP, connect.sid = Express
HTML Source | Framework markers | React root div, Angular ng- attributes, Vue.js v- directives
JavaScript Files | Framework/version | react.production.min.js, angular.min.js
Error Pages | Stack/version | Detailed stack traces in development mode
Favicon Hash | Technology | Known favicon hashes for default installs
robots.txt | Hidden paths | Disallowed directories reveal application structure
sitemap.xml | Full URL list | Intended for search engines; useful for testers

Tool: Wappalyzer A browser extension that automatically identifies technologies. For ShopStack, it would reveal: React, Node.js, Express, nginx, AWS CloudFront, and potentially PostgreSQL if error messages leak.
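The header and cookie rows of the passive-identification table can be applied mechanically. The sketch below derives technology hints from a response's headers; the function name fingerprint is hypothetical, the signature table contains only the three cookie names listed above, and real tools such as Wappalyzer rely on far larger rule sets.

```javascript
// Session cookie names that hint at the server-side technology.
const COOKIE_SIGNATURES = new Map([
  ['jsessionid', 'Java'],
  ['phpsessid', 'PHP'],
  ['connect.sid', 'Express'],
]);

function fingerprint(headers) {
  const lower = {};
  for (const [name, value] of Object.entries(headers)) {
    lower[name.toLowerCase()] = String(value);
  }
  const hints = [];
  if (lower.server) hints.push(`Server: ${lower.server}`);
  if (lower['x-powered-by']) hints.push(`Framework: ${lower['x-powered-by']}`);
  // Only the cookie name (text before the first "=") is needed for the lookup.
  const cookieName = (lower['set-cookie'] || '').split('=')[0].trim().toLowerCase();
  if (COOKIE_SIGNATURES.has(cookieName)) {
    hints.push(`Session cookie suggests ${COOKIE_SIGNATURES.get(cookieName)}`);
  }
  return hints;
}

console.log(fingerprint({
  Server: 'nginx/1.24.0',
  'X-Powered-By': 'Express',
  'Set-Cookie': 'connect.sid=s%3Aabc; Path=/; HttpOnly',
}));
```

Hints like these are suggestive rather than conclusive, since headers can be removed or deliberately falsified, so corroborate them with other sources in the table.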

Command-Line Identification:

# Check HTTP headers
curl -sI https://shopstack.example.com | grep -i "server\|x-powered\|set-cookie"

# Retrieve robots.txt
curl -s https://shopstack.example.com/robots.txt

# Check common files
for file in robots.txt sitemap.xml .well-known/security.txt crossdomain.xml; do
    echo "--- $file ---"
    curl -s -o /dev/null -w "%{http_code}" "https://shopstack.example.com/$file"
    echo
done

18.5.2 Directory and File Discovery

Automated discovery finds resources not linked from the visible application:

Gobuster:

# Directory brute-force
gobuster dir -u https://shopstack.example.com -w /usr/share/wordlists/dirb/common.txt \
    -t 50 -o gobuster-results.txt --no-error

# With file extensions
gobuster dir -u https://shopstack.example.com \
    -w /usr/share/seclists/Discovery/Web-Content/raft-medium-files.txt \
    -x php,asp,aspx,jsp,html,js,json,xml,txt,bak,old,conf \
    -t 50 -o gobuster-files.txt

# API endpoint discovery
gobuster dir -u https://shopstack.example.com/api/ \
    -w /usr/share/seclists/Discovery/Web-Content/api/api-endpoints.txt \
    -t 30

Valuable Files to Find:

File/Path | Significance
.git/ | Source code disclosure via git repository
.env | Environment variables with secrets
backup.sql, dump.sql | Database dumps
web.config, .htaccess | Server configuration
/api/swagger.json | API documentation
/api/v1/ | Old API versions with fewer protections
/debug/, /test/ | Debug interfaces
phpinfo.php | PHP configuration disclosure
/server-status | Apache status page
/actuator/ | Spring Boot management endpoints

18.5.3 Spidering and Crawling

Automated crawling follows links to build a site map:

Burp Suite Spider:

  1. Right-click the target in Site Map
  2. "Crawl" (Burp Scanner) or "Spider this host" (older versions)
  3. Configure crawl depth and scope
  4. Review discovered content in the Site Map tree

Custom Crawling with Python: For more control, write a custom crawler (see code example example-01-web-crawler.py). Custom crawlers can handle JavaScript rendering, respect rate limits, and extract specific data patterns.

Important Considerations:

  • Always set crawl scope to prevent testing unauthorized targets
  • Respect robots.txt during authorized tests (but also review it for intelligence)
  • Be aware that crawlers can trigger destructive actions (DELETE endpoints, logout links)
  • Use authenticated crawling to discover protected content
  • Note: JavaScript-heavy SPAs (like ShopStack's React frontend) require a browser-based crawler or tools that execute JavaScript (like Burp's built-in browser)

18.5.4 API Endpoint Enumeration

For API-driven applications, systematic endpoint discovery is crucial:

From JavaScript Source:

# Extract API calls from JavaScript bundles
curl -s https://shopstack.example.com/static/js/main.js | \
    grep -oP '["'"'"']/api/[^"'"'"'\s]+["'"'"']' | sort -u
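The same extraction can be done in JavaScript once a bundle has been saved locally, which makes it easier to post-process the results. This is a minimal sketch: the function name extractApiPaths is hypothetical, and the bundle string stands in for real downloaded source.

```javascript
// Extract quoted strings that start with /api/ from JavaScript source text.
function extractApiPaths(source) {
  const pattern = /["'`](\/api\/[^"'`\s]+)["'`]/g;
  const paths = new Set();                       // a Set removes duplicates
  for (const match of source.matchAll(pattern)) {
    paths.add(match[1]);
  }
  return [...paths].sort();
}

const bundle = `
  fetch("/api/v2/products");
  axios.get('/api/v2/users/me');
  fetch("/api/v2/products");
`;
console.log(extractApiPaths(bundle)); // [ '/api/v2/products', '/api/v2/users/me' ]
```

Like the grep approach, this only finds paths that appear as complete string literals; endpoints assembled at runtime from fragments will be missed and need traffic analysis instead.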

From Swagger/OpenAPI: If the application exposes API documentation:

# Common API documentation paths
curl -s https://shopstack.example.com/api/swagger.json
curl -s https://shopstack.example.com/api/docs
curl -s https://shopstack.example.com/api/v2/openapi.yaml
curl -s https://shopstack.example.com/api-docs

From Traffic Analysis: Browse the application extensively with Burp Proxy running. Every API call the frontend makes will appear in HTTP history. Sort by URL to identify endpoint patterns.

18.5.5 Parameter Discovery

Parameters are the primary input channels for attacks:

Visible Parameters:

  • URL query parameters: ?search=laptop&category=electronics&page=2
  • POST body parameters: {"username":"admin","password":"secret"}
  • Path parameters: /api/v2/products/42 (where 42 is a parameter)
  • Cookie values: session=abc123

Hidden Parameters: Use Burp's Param Miner extension or manual testing:

# Arjun - parameter discovery tool
arjun -u https://shopstack.example.com/api/v2/products --stable

Common hidden parameters to test: debug, test, admin, verbose, callback, _method, format, template, id, role.

18.5.6 Authentication and Session Analysis

Map the authentication flow completely:

  1. Registration: What validation exists? Can you create admin accounts?
  2. Login: What credentials are required? How does the token/session work?
  3. Session Management: How are sessions maintained? What is the session lifetime?
  4. Password Reset: What is the reset flow? Are tokens time-limited and single-use?
  5. Logout: Is the session properly invalidated server-side?
  6. MFA: If present, can it be bypassed?

For ShopStack, the authentication flow is:

POST /api/v2/auth/register    -> Create account
POST /api/v2/auth/login       -> Receive JWT
GET  /api/v2/users/me         -> Authorization: Bearer <JWT>
POST /api/v2/auth/refresh     -> Refresh expired JWT
POST /api/v2/auth/logout      -> Invalidate refresh token
POST /api/v2/auth/reset       -> Password reset request
POST /api/v2/auth/reset/:token -> Password reset completion

Each of these endpoints requires thorough testing.

18.5.7 Creating a Test Plan from Reconnaissance

After reconnaissance, organize findings into a structured test plan:

Target: ShopStack (shopstack.example.com)
Technology: React + Node.js/Express + PostgreSQL on AWS

Authentication:
  - JWT-based (HS256)
  - Refresh token rotation
  - Password reset via email

API Endpoints Discovered: 47
  - Public: 12 (product listing, search, categories)
  - Authenticated: 28 (orders, profile, cart, wishlist)
  - Admin: 7 (user management, product management, reporting)

Input Parameters Identified: 156
  - Search/filter: 23
  - User input forms: 18
  - File upload: 3
  - API body parameters: 112

Security Headers: Partial
  - HSTS: Present
  - CSP: Present but permissive (unsafe-inline)
  - X-Frame-Options: Missing
  - Cookie flags: HttpOnly present, SameSite=Lax

Priority Test Areas:
  1. Access control on admin endpoints (A01)
  2. JWT implementation (A07)
  3. Search functionality - injection (A03)
  4. File upload - arbitrary file upload (A04)
  5. CSP bypass for XSS (A03)

18.6 Input Validation and Output Encoding

The root cause of most web application vulnerabilities is the same: the application treats user-supplied data as trusted. Input validation and output encoding are the two fundamental defenses against this entire class of attacks.

18.6.1 The Core Problem

Consider ShopStack's product search:

// VULNERABLE: User input directly concatenated into SQL
app.get('/api/v2/products', (req, res) => {
    const query = `SELECT * FROM products WHERE name LIKE '%${req.query.search}%'`;
    db.query(query).then(results => res.json(results));
});

The developer assumed req.query.search would contain a product name. An attacker sends:

GET /api/v2/products?search=' UNION SELECT username,password,null,null,null FROM users--

The server constructs:

SELECT * FROM products WHERE name LIKE '%' UNION SELECT username,password,null,null,null FROM users--%'

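The injected SELECT supplies the same number of columns as the original query (five in this example), so, assuming the column types are compatible, the UNION succeeds. The trailing -- comments out the leftover %', and the database returns every username and password alongside the product rows.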

18.6.2 Input Validation Strategies

Allowlisting (Preferred): Define exactly what valid input looks like and reject everything else:

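// Validate search input: it must be a single string that matches the allowlist.
// The typeof check matters because a missing parameter would otherwise be
// coerced to the string "undefined", which matches the pattern.
const searchPattern = /^[a-zA-Z0-9\s\-]{1,100}$/;
const search = req.query.search;
if (typeof search !== 'string' || !searchPattern.test(search)) {
    return res.status(400).json({ error: 'Invalid search query' });
}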

Denylisting (Fragile): Block known-bad patterns. This approach is inherently incomplete because attackers will find patterns you have not blocked:

// DON'T rely on this alone
const blocked = /['";-]|UNION|SELECT|DROP|INSERT/i;
if (blocked.test(req.query.search)) {
    return res.status(400).json({ error: 'Invalid input' });
}
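// Bypass: keywords missing from the list (OR, AND, UPDATE, DELETE), SEL/**/ECT where
// comments are stripped downstream, payloads encoded so they decode after this check.
// The /i flag does stop simple case variation such as sElEcT, but the pattern also
// rejects legitimate input like O'Brien or t-shirt.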

Type Validation: Enforce data types strictly:

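// Parse and validate numeric input. The digits-only check comes first because
// parseInt('12abc', 10) returns 12 rather than NaN.
if (!/^[1-9]\d{0,8}$/.test(req.params.id)) {
    return res.status(400).json({ error: 'Invalid product ID' });
}
const productId = parseInt(req.params.id, 10);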

Length Validation: Enforce reasonable length limits:

const username = req.body.username;
if (typeof username !== 'string' || username.length < 3 || username.length > 50) {
    return res.status(400).json({ error: 'Username must be 3-50 characters' });
}
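
The strategies above can be collected into small reusable functions. This is a sketch under stated assumptions: the function names (validateSearch, parseProductId, validateUsername) and the nine-digit cap on product IDs are hypothetical choices for illustration, not part of ShopStack's codebase.

```javascript
// Allowlist: letters, digits, whitespace, and hyphen; 1-100 characters.
const SEARCH_PATTERN = /^[a-zA-Z0-9\s\-]{1,100}$/;

// Returns true only for a single string that matches the allowlist.
// Arrays, objects, and undefined are rejected before pattern matching.
function validateSearch(value) {
    return typeof value === 'string' && SEARCH_PATTERN.test(value);
}

// Returns a positive integer, or null if the input is not digits-only.
// "12abc" is rejected instead of being truncated to 12.
function parseProductId(value) {
    if (typeof value !== 'string' || !/^[1-9][0-9]{0,8}$/.test(value)) {
        return null;
    }
    return Number(value);
}

// Length validation with an explicit type check.
function validateUsername(value) {
    return typeof value === 'string' && value.length >= 3 && value.length <= 50;
}

console.log(validateSearch('blue t-shirt'));
console.log(validateSearch("' UNION SELECT username FROM users--"));
console.log(parseProductId('12abc'));
```

Keeping validation in named functions like these makes each rule testable in isolation and keeps route handlers short.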

18.6.3 Parameterized Queries

The definitive defense against SQL injection is parameterized queries (also called prepared statements). Instead of building SQL strings with user data, you separate the query structure from the data:

// SECURE: Parameterized query
app.get('/api/v2/products', (req, res) => {
    const query = 'SELECT * FROM products WHERE name LIKE $1';
    const searchParam = `%${req.query.search}%`;
    db.query(query, [searchParam]).then(results => res.json(results));
});

The database driver ensures that searchParam is always treated as data, never as SQL code, regardless of its content.
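
One subtlety remains with LIKE. Parameterization stops the input from being parsed as SQL, but % and _ inside the parameter still act as wildcards. If that matters for the search feature, the wildcards can be escaped before binding. The helper name escapeLike below is an assumption for illustration, and the sketch relies on PostgreSQL's ESCAPE clause with a backslash.

```javascript
// Escape LIKE metacharacters so user input matches literally.
// Backslash is escaped too, because it is the escape character.
function escapeLike(term) {
    return term.replace(/[\\%_]/g, (ch) => '\\' + ch);
}

// The query text stays constant; only the bound parameter varies.
const sql = "SELECT * FROM products WHERE name LIKE $1 ESCAPE '\\'";
const userInput = '100%_cotton';
const param = `%${escapeLike(userInput)}%`;

console.log(sql);
console.log(param);
```

With this in place, a search for 100%_cotton matches that literal text instead of treating % and _ as pattern characters.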

18.6.4 Output Encoding

Output encoding prevents injected data from being interpreted as code when rendered. The encoding must match the output context:

HTML Context:

// Encode for HTML body
function htmlEncode(str) {
    return str.replace(/&/g, '&amp;')
              .replace(/</g, '&lt;')
              .replace(/>/g, '&gt;')
              .replace(/"/g, '&quot;')
              .replace(/'/g, '&#x27;');
}
// <p>Search results for: &lt;script&gt;alert(1)&lt;/script&gt;</p>

JavaScript Context:

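// Encode for a JavaScript string literal
function jsEncode(str) {
    return str.replace(/\\/g, '\\\\')
              .replace(/'/g, "\\'")
              .replace(/"/g, '\\"')
              .replace(/\n/g, '\\n')
              .replace(/\r/g, '\\r')
              .replace(/</g, '\\x3C')        // stops </script> from closing an inline script block
              .replace(/\u2028/g, '\\u2028') // line separators end string literals in older engines
              .replace(/\u2029/g, '\\u2029');
}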

URL Context:

// Encode for URL parameter
const safeParam = encodeURIComponent(userInput);
// Turns <script>alert(1)</script> into %3Cscript%3Ealert(1)%3C%2Fscript%3E

CSS Context:

// CSS encoding for dynamic values
function cssEncode(str) {
    return str.replace(/[^a-zA-Z0-9]/g, function(char) {
        return '\\' + char.charCodeAt(0).toString(16) + ' ';
    });
}
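
To see why the context matters, consider one value that appears in two places in the same response: inside a URL and inside HTML. The sketch below is illustrative only. renderSearchSummary is a hypothetical helper, and htmlEncode is repeated so the example is self-contained.

```javascript
// Same HTML encoder as in the HTML context example.
function htmlEncode(str) {
    return str.replace(/&/g, '&amp;')
              .replace(/</g, '&lt;')
              .replace(/>/g, '&gt;')
              .replace(/"/g, '&quot;')
              .replace(/'/g, '&#x27;');
}

// Hypothetical helper: the search term is used in a URL and in HTML text.
function renderSearchSummary(term) {
    // URL context first: the term becomes one query parameter value.
    const link = '/api/v2/products?search=' + encodeURIComponent(term);
    // HTML context second: both the visible text and the attribute value
    // are HTML-encoded. encodeURIComponent leaves ' untouched, so the
    // attribute still needs htmlEncode.
    return '<p>Results for: ' + htmlEncode(term) +
           ' (<a href="' + htmlEncode(link) + '">permalink</a>)</p>';
}

console.log(renderSearchSummary('<script>alert(1)</script>'));
```

The order of operations follows the nesting of contexts: the value is encoded for the innermost context (the URL) and the result is then encoded for the outer one (the HTML attribute).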

18.6.5 Content Security Policy (CSP)

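CSP is a powerful defense-in-depth control against XSS and data exfiltration. It tells the browser which sources of content are legitimate, so injected markup that slips past output encoding is still blocked from loading or executing: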

Content-Security-Policy:
    default-src 'self';
    script-src 'self' 'nonce-abc123';
    style-src 'self' 'unsafe-inline';
    img-src 'self' data: https://cdn.shopstack.com;
    connect-src 'self' https://api.shopstack.com;
    font-src 'self' https://fonts.gstatic.com;
    frame-src 'none';
    base-uri 'self';
    form-action 'self';
    report-uri /api/csp-report;

CSP Directives for Defense:

Directive                Purpose
default-src              Fallback for fetch directives that are not set explicitly
script-src               JavaScript sources; use nonces or hashes, avoid unsafe-inline
style-src                CSS sources
img-src                  Image sources
connect-src              fetch/XHR/WebSocket/EventSource targets
frame-src                Sources this page may load in iframes (clickjacking is mitigated by frame-ancestors, not frame-src)
base-uri                 Restricts the <base> tag; prevents base tag injection
form-action              Restricts form submission targets
report-uri / report-to   Where to send violation reports (report-uri is deprecated in favor of report-to)
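
A nonce only helps if it is unpredictable and different on every response; a fixed value such as abc123 in the sample policy is a placeholder. The following sketch shows one way to generate a nonce and assemble a header value using Node's crypto module. The function names generateNonce and buildCspHeader are assumptions for illustration, and the policy is a trimmed subset of the one above.

```javascript
const crypto = require('node:crypto');

// 16 random bytes, base64-encoded, generated fresh for every response.
function generateNonce() {
    return crypto.randomBytes(16).toString('base64');
}

// Assemble a single-line header value from individual directives.
function buildCspHeader(nonce) {
    return [
        "default-src 'self'",
        `script-src 'self' 'nonce-${nonce}'`,
        "base-uri 'self'",
        "form-action 'self'"
    ].join('; ');
}

const nonce = generateNonce();
console.log('Content-Security-Policy:', buildCspHeader(nonce));
// The same nonce value must appear on each legitimate inline script tag:
console.log(`<script nonce="${nonce}">/* trusted inline code */</script>`);
```

An injected script tag will not carry the current nonce, so the browser refuses to execute it even though it is inline.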

18.6.6 Framework-Level Protections

Modern frameworks provide built-in protections:

React (ShopStack Frontend):

  • JSX automatically escapes values rendered in templates
  • dangerouslySetInnerHTML is explicitly named to discourage use
  • Component-based architecture limits global DOM manipulation

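Express.js (ShopStack API):

  • Helmet middleware adds security headers
  • express-validator provides input validation
  • express-rate-limit throttles repeated requests, which slows brute-force attempts
  • csurf has historically provided CSRF protection, but the package is now deprecated; prefer a maintained alternative combined with SameSite cookies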

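// ShopStack security middleware stack
const express = require('express');
const helmet = require('helmet');
const rateLimit = require('express-rate-limit');
const { body, validationResult } = require('express-validator');

const app = express();
app.use(express.json());  // parse JSON bodies so the validators can read req.body
app.use(helmet());        // sets common security headers
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));  // 100 requests per 15 minutes per client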

app.post('/api/v2/auth/login',
    body('username').isEmail().normalizeEmail(),
    body('password').isLength({ min: 8 }),
    (req, res) => {
        const errors = validationResult(req);
        if (!errors.isEmpty()) {
            return res.status(400).json({ errors: errors.array() });
        }
        // ... authentication logic
    }
);

Blue Team Perspective: Defense in depth for web applications means layering protections: WAF at the edge, input validation in the application, parameterized queries at the database, output encoding in the response, and CSP in the browser. No single layer is sufficient. When performing authorized testing, you are looking for gaps where one or more layers are missing.


18.7 Web Application Firewalls (WAF)

Web Application Firewalls sit in front of web applications and filter malicious requests. Understanding WAFs is important for both testers and defenders.

18.7.1 How WAFs Work

WAFs inspect HTTP traffic using:

  1. Signature-Based Detection: Known attack patterns (regex matching)
  2. Anomaly-Based Detection: Deviations from normal traffic patterns
  3. Machine Learning: Behavioral analysis of request patterns
  4. Rate Limiting: Throttling excessive requests from single sources
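
Of the four techniques, rate limiting is the simplest to reason about in code. The sketch below is a toy fixed-window counter, not how any particular WAF product implements throttling. The class name FixedWindowLimiter and its parameters are assumptions for illustration.

```javascript
// Toy fixed-window rate limiter keyed by request source (for example, an IP).
class FixedWindowLimiter {
    constructor(windowMs, maxRequests) {
        this.windowMs = windowMs;
        this.maxRequests = maxRequests;
        this.counters = new Map(); // source -> { windowStart, count }
    }

    // Returns true if the request is allowed, false if it should be throttled.
    // nowMs is injectable so the behavior can be tested without waiting.
    allow(source, nowMs = Date.now()) {
        const entry = this.counters.get(source);
        if (!entry || nowMs - entry.windowStart >= this.windowMs) {
            // First request from this source, or the previous window has ended.
            this.counters.set(source, { windowStart: nowMs, count: 1 });
            return true;
        }
        entry.count += 1;
        return entry.count <= this.maxRequests;
    }
}

const limiter = new FixedWindowLimiter(1000, 2);
console.log(limiter.allow('203.0.113.7', 0));
console.log(limiter.allow('203.0.113.7', 10));
console.log(limiter.allow('203.0.113.7', 20));
```

From a tester's point of view, the window size and threshold determine how fast a scan can run before requests start being dropped, which is worth measuring early in an engagement.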

Common WAFs:

  • AWS WAF: Integrated with CloudFront and ALB
  • Cloudflare WAF: CDN-integrated
  • ModSecurity: Open-source, runs with Apache/nginx
  • Imperva/Incapsula: Enterprise-grade
  • F5 Advanced WAF: Hardware/virtual appliance

18.7.2 WAF Detection

Before testing, determine if a WAF is present:

# Wafw00f - WAF detection tool
wafw00f https://shopstack.example.com

# Manual detection - send an obvious attack string and check the response code
curl -s -G "https://shopstack.example.com/search" --data-urlencode "q=<script>alert(1)</script>" -o /dev/null -w "%{http_code}\n"
# A 403 (often with a vendor-branded block page) suggests a WAF

# Check for WAF headers
curl -sI "https://shopstack.example.com" | grep -i "cf-ray\|x-sucuri\|x-cdn\|server.*cloudflare"
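
The same header check can be scripted once response headers have been captured, for example from Burp's history. This is a rough sketch that mirrors the grep patterns above. The function name detectWafHints and the hint list are assumptions, and a match is only a hint: headers can be stripped or spoofed.

```javascript
// Each hint receives a lowercased header map and returns true on a match.
const WAF_HINTS = [
    {
        name: 'Cloudflare',
        test: (h) => 'cf-ray' in h || /cloudflare/i.test(h['server'] || '')
    },
    {
        name: 'Sucuri',
        test: (h) => Object.keys(h).some((k) => k.startsWith('x-sucuri'))
    },
    {
        name: 'Generic CDN/WAF (x-cdn header)',
        test: (h) => 'x-cdn' in h
    }
];

// Normalize header names to lowercase, then collect every matching hint.
function detectWafHints(rawHeaders) {
    const headers = {};
    for (const [key, value] of Object.entries(rawHeaders)) {
        headers[key.toLowerCase()] = String(value);
    }
    return WAF_HINTS.filter((hint) => hint.test(headers)).map((hint) => hint.name);
}

// Hypothetical captured headers for illustration.
console.log(detectWafHints({ 'CF-RAY': '8a1b2c3d4e5f-LHR', Server: 'cloudflare' }));
```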

18.7.3 WAF Bypass Techniques (For Authorized Testing)

During authorized penetration tests, you may need to test whether the WAF can be bypassed:

  • Case variation: SeLeCt instead of SELECT
  • Encoding: URL encoding, double encoding, Unicode
  • Comment insertion: SEL/**/ECT
  • Alternative syntax: HAVING 1=1 instead of OR 1=1
  • HTTP parameter pollution: Duplicate parameters with different values
  • Content-Type confusion: Sending a payload under a Content-Type the WAF does not parse (for example, JSON where form data is expected)
  • Request smuggling: Exploiting differences between WAF and backend parsing
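
The encoding technique is easiest to understand from the defender's side. The sketch below is a toy signature filter, not a real WAF rule set. The signature list and function names are assumptions for illustration. It shows how a filter that decodes input only once misses a double-encoded payload, and how decoding until the value stops changing closes that gap. The bypass only matters if some downstream component decodes the value a second time.

```javascript
// Toy signatures; real rule sets are far larger.
const SIGNATURES = [/<script\b/i, /\bunion\b[\s\S]*\bselect\b/i];

// decodeURIComponent throws on malformed input, so fall back to the raw value.
function safeDecode(value) {
    try {
        return decodeURIComponent(value);
    } catch {
        return value;
    }
}

// Decodes once, as a simplistic filter might.
function naiveInspect(rawValue) {
    const decoded = safeDecode(rawValue);
    return SIGNATURES.some((sig) => sig.test(decoded));
}

// Decodes repeatedly (bounded) until the value is stable, then matches.
function normalizingInspect(rawValue, maxRounds = 5) {
    let current = rawValue;
    for (let i = 0; i < maxRounds; i++) {
        const next = safeDecode(current);
        if (next === current) break;
        current = next;
    }
    return SIGNATURES.some((sig) => sig.test(current));
}

// %25 is an encoded percent sign, so %253C decodes to %3C and then to <.
const doubleEncoded = '%253Cscript%253Ealert(1)%253C%252Fscript%253E';
console.log('naive filter flags it:', naiveInspect(doubleEncoded));
console.log('normalizing filter flags it:', normalizingInspect(doubleEncoded));
```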

Important Note: WAF bypass testing must be explicitly authorized in your rules of engagement. Never bypass a WAF on a production system without written permission.


18.8 Web Application Testing Methodology

Bringing everything together, here is a structured methodology for web application testing:

Phase 1: Scope and Authorization

  • Confirm target URLs and IP ranges
  • Review rules of engagement
  • Understand testing windows and restrictions
  • Set up dedicated testing environment and tools

Phase 2: Reconnaissance (This Chapter)

  • Identify technology stack
  • Map application structure
  • Enumerate endpoints and parameters
  • Analyze authentication mechanisms
  • Document trust boundaries

Phase 3: Vulnerability Discovery (Chapters 19-24)

  • Test each OWASP Top 10 category
  • Focus on input handling (injection, XSS)
  • Test access controls systematically
  • Check authentication and session management
  • Review security configuration

Phase 4: Exploitation and Validation

  • Confirm vulnerabilities are exploitable
  • Document proof of concept
  • Assess business impact
  • Chain vulnerabilities where possible

Phase 5: Reporting

  • Document all findings with evidence
  • Provide risk ratings (CVSS)
  • Include remediation recommendations
  • Offer retesting after fixes

Applying the Methodology to ShopStack

For ShopStack, your test plan might prioritize:

  1. Access Control Testing: Can a regular user access admin API endpoints? Can user A view user B's orders?
  2. JWT Analysis: How are tokens generated? Can the signing key be brute-forced? Is the none algorithm accepted?
  3. Injection Points: Search functionality, product reviews, user profile updates, file upload names
  4. Client-Side Attacks: XSS in search results, CSRF on state-changing operations, clickjacking on the checkout page
  5. Business Logic: Can prices be manipulated? Can order quantities be negative? Can discount codes be reused?
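
For the business logic questions, it helps to know what a correct server-side check looks like, because the test is essentially whether a check like this is missing. The sketch below is hypothetical: the catalog, SKU names, quantity limit, and the function priceOrderLine are all assumptions for illustration, not ShopStack's implementation.

```javascript
// Hypothetical server-side catalog; prices are integer cents.
const CATALOG = new Map([
    ['sku-1001', 1999],
    ['sku-1002', 4500]
]);

// Compute the price of one order line from server-side data only.
function priceOrderLine(line) {
    const unitPrice = CATALOG.get(line.sku);
    if (unitPrice === undefined) {
        throw new Error('Unknown SKU');
    }
    if (!Number.isInteger(line.quantity) || line.quantity < 1 || line.quantity > 100) {
        throw new Error('Quantity must be an integer between 1 and 100');
    }
    // Any client-supplied price field (line.price) is deliberately ignored.
    return unitPrice * line.quantity;
}

// A tampered request body: the client claims a price of 1 cent.
console.log(priceOrderLine({ sku: 'sku-1001', quantity: 2, price: 1 }));
```

When testing, the equivalent probes are a modified price field, a negative or fractional quantity, and a very large quantity, each sent through Burp Repeater to see whether the server recomputes or trusts the value.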

Lab Exercise: Set up DVWA (Damn Vulnerable Web Application) in your home lab. Configure Burp Suite to proxy traffic to DVWA. Complete the following exercises at "Low" security level: (1) Browse DVWA with Burp intercepting and examine the Site Map, (2) Identify all cookies and their attributes, (3) Find hidden directories using Gobuster, (4) Document the technology stack using only HTTP headers and source code.


18.9 MedSecure Portal: Web Application Security Considerations

The MedSecure patient portal presents unique web application security challenges due to the sensitivity of healthcare data and regulatory requirements (HIPAA in the US, GDPR for EU patients).

Architecture Differences from ShopStack:

  • Additional authentication layer (patient identity verification)
  • Strict session timeouts (auto-logout after 15 minutes of inactivity)
  • Audit logging of all data access (who viewed which patient record, when)
  • Role-based access control with multiple roles (patient, nurse, doctor, admin)
  • Integration with external systems (lab results, pharmacy, insurance)

Testing Priorities:

  1. Access Control: Can Patient A view Patient B's records? Can a nurse access admin functions?
  2. Session Security: Are sessions properly expired? Can sessions be hijacked?
  3. API Security: Are FHIR endpoints properly authenticated? Can bulk data be exported?
  4. Data Exposure: Do error messages leak patient data? Are logs sanitized?
  5. Integration Security: Are connections to external systems encrypted and authenticated?

The consequences of web application vulnerabilities in healthcare are severe: HIPAA civil penalties are tiered by culpability, with a statutory cap of $1.5 million per year for each violation category (a figure that is adjusted for inflation), and breached patient data can never be "unbreached."


18.10 Summary

Web application security fundamentals encompass a broad landscape of knowledge:

  • Architecture: Modern web applications are complex distributed systems with multiple tiers, each presenting attack surfaces. Understanding the architecture is prerequisite to effective testing.
  • OWASP Top 10: The standard taxonomy of web risks, with Broken Access Control ranked first (A01) in the 2021 edition. Use it as a communication framework and testing baseline.
  • HTTP Protocol: The language of the web. Master it at the raw level---request structure, methods, headers, status codes, cookies---because every attack and defense is expressed in HTTP.
  • Burp Suite: The essential intercepting proxy. Configure it properly, learn its workflows, and extend it with BApp Store plugins.
  • Reconnaissance: Thorough recon finds more vulnerabilities than automated scanning. Map the technology stack, discover endpoints, enumerate parameters, and analyze authentication flows before attacking.
  • Input Validation and Output Encoding: The fundamental defenses against injection and XSS. Allowlisting, parameterized queries, context-aware output encoding, and CSP form the defensive layers.

With this foundation, you are prepared to dive into specific attack categories. Chapter 19 begins with injection attacks---the third most critical OWASP category and historically the most devastating. Chapter 20 explores cross-site scripting and client-side attacks. Both chapters build directly on the HTTP knowledge, Burp Suite skills, and architectural understanding established here.

The key insight of web application security is this: the browser is a universal client that will faithfully execute whatever the server tells it to. The server processes whatever the client sends it. When neither side validates the other's input, vulnerabilities emerge. Your job as an ethical hacker is to find those gaps before malicious actors do.


Chapter 18 References

  1. OWASP Foundation. "OWASP Top 10:2021." https://owasp.org/Top10/
  2. PortSwigger. "Web Security Academy." https://portswigger.net/web-security
  3. Stuttard, D. and Pinto, M. The Web Application Hacker's Handbook, 2nd Edition. Wiley, 2011.
  4. OWASP Foundation. "Application Security Verification Standard (ASVS) 4.0." https://owasp.org/www-project-application-security-verification-standard/
  5. Verizon. "2024 Data Breach Investigations Report." https://www.verizon.com/business/resources/reports/dbir/
  6. Fielding, R. et al. "RFC 9110: HTTP Semantics." IETF, 2022.
  7. OWASP Foundation. "Web Security Testing Guide (WSTG) v4.2." https://owasp.org/www-project-web-security-testing-guide/
  8. Mozilla Developer Network. "Content Security Policy." https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP