Case Study 10.2: Exposed Databases — When MongoDB and Elasticsearch Meet Shodan

"There is no such thing as a misconfigured database that no one will find. Shodan and Censys guarantee that every exposed service will be discovered — usually within hours." — Victor Gevers, GDI Foundation

The Scale of the Problem

Between late 2016 and 2023, security researchers discovered tens of thousands of MongoDB, Elasticsearch, Cassandra, Redis, and CouchDB instances exposed to the public Internet with no authentication. These databases contained billions of records (personal information, medical records, financial data, login credentials, corporate emails, and government documents), all accessible to anyone who knew how to look.

The common thread in nearly every case was a database running with a configuration that bound to all network interfaces (0.0.0.0) and required no authentication, often simply the default. Because search engines such as Shodan, Censys, BinaryEdge, and ZoomEye continuously scan the Internet, these exposed databases became trivially discoverable.

This case study examines how enumeration tools designed for legitimate security research became the mechanism for discovering (and in many cases, exploiting) some of the largest data exposures in history.

Shodan and Censys: The Search Engines for Internet-Connected Devices

What They Are

Shodan, created by John Matherly in 2009, continuously scans the entire IPv4 address space and indexes the banners and responses from discovered services. It has been described as "the search engine for the Internet of Things" and "the most dangerous search engine in the world."

Censys, developed by researchers at the University of Michigan (including Zakir Durumeric, the creator of ZMap), performs similar Internet-wide scanning but focuses on TLS certificates, HTTP responses, and structured data about Internet services.

Both tools perform the techniques covered in Chapter 10 (port scanning, banner grabbing, service identification, and enumeration) continuously and across the entire Internet.

How They Work

Shodan's scanning infrastructure sends packets to every IPv4 address on common ports (HTTP, HTTPS, FTP, SSH, Telnet, SNMP, SIP, and dozens more). When a service responds, Shodan records the banner, HTTP headers, SSL certificate information, and other metadata. This data is then indexed and made searchable.
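The banner-recording step can be illustrated with a minimal sketch. This is not Shodan's implementation; the function names read_banner and grab_banner are hypothetical, and the code only handles services that speak first (SSH, FTP, SMTP and similar). Services such as HTTP wait for the client to send a request, so a passive read returns nothing for them.

```python
import socket


def read_banner(sock: socket.socket, max_bytes: int = 1024) -> str:
    """Read whatever the service sends first and return it as text."""
    try:
        data = sock.recv(max_bytes)
    except socket.timeout:
        # The service is waiting for the client to speak first (e.g. HTTP).
        return ""
    return data.decode("utf-8", errors="replace").strip()


def grab_banner(host: str, port: int, timeout: float = 3.0) -> str:
    """Connect to host:port and return the greeting banner, if any."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        return read_banner(sock)
```

Calling grab_banner("127.0.0.1", 22) against a host you are authorized to test would typically return a string such as an SSH version line; an Internet-wide scanner repeats this kind of probe for every address and stores the result.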

Example Shodan queries that reveal exposed databases:

# MongoDB instances (matches the service on its default port; the query alone does not prove that authentication is disabled)
product:"MongoDB" port:27017

# Elasticsearch clusters
product:"Elastic" port:9200

# Redis instances
product:"Redis" port:6379

# CouchDB
product:"CouchDB" port:5984

# Cassandra
product:"Cassandra" port:9042
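For defenders, the same queries become more useful when scoped to address space they own. The sketch below builds one query per product from the list above and appends Shodan's net: filter. The helper name build_queries and the example CIDR (a documentation range) are assumptions for illustration.

```python
import ipaddress

# Product names and default ports taken from the example queries above.
DB_SERVICES = {
    "MongoDB": 27017,
    "Elastic": 9200,
    "Redis": 6379,
    "CouchDB": 5984,
    "Cassandra": 9042,
}


def build_queries(cidr: str) -> list:
    """Return one Shodan query per database product, limited to `cidr`."""
    network = ipaddress.ip_network(cidr)  # raises ValueError on a malformed range
    return [
        f'product:"{product}" port:{port} net:{network}'
        for product, port in DB_SERVICES.items()
    ]


for query in build_queries("203.0.113.0/24"):
    print(query)
```

Validating the range with ipaddress before building the query avoids silently sending a malformed filter that would match nothing.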

Censys provides similar capabilities with a different query syntax:

# Elasticsearch on Censys
services.service_name: "ELASTICSEARCH"

# MongoDB
services.port: 27017 AND services.banner: "MongoDB"

⚖️ Legal and Ethical Note: Shodan and Censys are legitimate research tools used by security professionals, researchers, and organizations to understand their own exposure. Using them to discover exposed databases is generally legal; accessing data within those databases without authorization is not. The distinction between discovery and access is legally critical.

The MongoDB Apocalypse (2017)

Timeline of Events

In late December 2016, security researcher Victor Gevers of the GDI Foundation identified approximately 200 MongoDB instances that had been compromised by an attacker who replaced their contents with a ransom demand. The attacker, using the handle "harak1r1," had found open MongoDB instances, exported their data, deleted it from the server, and left a message demanding 0.2 Bitcoin for its return.

By January 2017, the situation had escalated dramatically:

Date	Compromised Instances	Ransom Groups Active
Dec 27, 2016	~200	1
Jan 3, 2017	~2,000	3
Jan 6, 2017	~10,000	5
Jan 9, 2017	~28,000	15+
Jan 12, 2017	~34,000	20+

By mid-January, roughly 34,000 MongoDB databases had been ransomed, and the attacks had spread to Elasticsearch, CouchDB, Hadoop, and Cassandra instances.
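To put the escalation in perspective, the first and last rows of the table can be turned into an approximate doubling time. This is a rough sketch that assumes smooth exponential growth between December 27 and January 12, which the intermediate rows only loosely follow.

```python
import math
from datetime import date

# First and last data points from the timeline table.
first = (date(2016, 12, 27), 200)
last = (date(2017, 1, 12), 34_000)

days = (last[0] - first[0]).days      # elapsed days between the two rows
growth = last[1] / first[1]           # overall growth factor
doubling_days = days * math.log(2) / math.log(growth)

print(f"{growth:.0f}x in {days} days, doubling roughly every {doubling_days:.1f} days")
```

Under that assumption, the number of ransomed databases doubled about every two days.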

Why So Many Databases Were Exposed

The root cause was a combination of insecure defaults and cloud deployment patterns:

MongoDB's default configuration: Before version 2.6 (released in 2014), MongoDB bound to 0.0.0.0 (all interfaces) with no authentication enabled. From 2.6 onward the official RPM and DEB packages shipped a configuration bound to localhost, and from 3.6 the server binaries themselves default to localhost. Even so, many deployments ran older versions or explicitly overrode the safer defaults.

Cloud deployment without firewalls: Organizations deploying MongoDB on cloud infrastructure (AWS, Azure, GCP) often launched instances without configuring security groups or network access control lists. The database was immediately reachable from the Internet.

Docker and automation: Automated deployment scripts frequently used configurations that exposed services externally for developer convenience but were never secured for production.

No authentication by default: MongoDB did not require authentication unless explicitly configured. An administrator who installed MongoDB and started populating it with data might not realize that anyone on the Internet could connect.
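The two root causes that live in the database's own configuration can be checked mechanically. The sketch below assumes a mongod configuration file that has already been parsed into a dictionary (for example by a YAML library, not shown). The option names net.bindIp, net.bindIpAll, and security.authorization are mongod settings; the function name audit_mongod_config is hypothetical.

```python
def audit_mongod_config(cfg: dict) -> list:
    """Return a list of exposure findings for a parsed mongod config."""
    findings = []

    net = cfg.get("net", {})
    # bindIp is a comma-separated string of addresses.
    addresses = [a.strip() for a in str(net.get("bindIp", "")).split(",") if a.strip()]
    if net.get("bindIpAll") or any(a in ("0.0.0.0", "::") for a in addresses):
        findings.append("listens on all interfaces")
    # Note: a missing bindIp is not flagged here because the effective
    # default depends on the MongoDB version and how it was packaged.

    if cfg.get("security", {}).get("authorization") != "enabled":
        findings.append("authorization is not enabled")

    return findings


print(audit_mongod_config({"net": {"bindIp": "0.0.0.0", "port": 27017}}))
```

A check like this catches only configuration-level problems; a database bound to a private address can still be exposed by a permissive security group or port mapping.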

How Attackers Found Them

The attack chain was straightforward:

Discovery via Shodan: Search for product:"MongoDB" port:27017 — returns thousands of results with IP addresses, banners, and database metadata.
Connection: mongo --host <target_ip> — no credentials needed.
Enumeration: show dbs; use <database>; show collections; db.<collection>.count(), which lists every database and collection and reports how many documents each holds.
Exfiltration: mongodump --host <target_ip> — exports entire database.
Ransoming: Drop all collections, insert ransom note demanding Bitcoin.

🔴 Impact: Researchers estimated that over 680 terabytes of data were exposed across the affected MongoDB instances. Many organizations had no backup, so the ransomed database was their only copy and paying looked like the only hope of recovery. In many cases even that failed, because the attackers had never copied the data before deleting it.

The Elasticsearch Data Exposures

Pattern of Discovery

Elasticsearch, a popular search and analytics engine, suffered from similar exposure problems. Between 2018 and 2022, security researchers discovered hundreds of major data exposures via Elasticsearch instances accessible without authentication.

Notable incidents:

Exactis (June 2018): Security researcher Vinny Troia discovered an Elasticsearch instance containing approximately 340 million records on nearly every American adult — including names, addresses, phone numbers, email addresses, interests, ages, children's ages, and religions. The 2 TB database was accessible without any authentication.

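First American Financial (May 2019): Often grouped with these incidents, although public reporting attributed it to a web application flaw rather than an open Elasticsearch cluster: document URLs could be viewed without authentication by changing the record number. About 885 million records of sensitive financial documents were exposed, including Social Security numbers, bank account numbers, mortgage records, tax documents, and wire transfer receipts dating back to 2003.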

Facebook (September 2019): Over 419 million Facebook user records were found in an exposed Elasticsearch database, including phone numbers, Facebook IDs, and in some cases names and locations.

Microsoft (December 2019): A misconfigured Elasticsearch server exposed 250 million Microsoft customer service records, including email addresses, IP addresses, and case descriptions.

Decathlon (February 2020): An Elasticsearch server exposed 123 million records containing employee data for the sporting goods retailer, including Social Security equivalents, usernames, passwords, API logs, and personal information.

The Discovery Process

In most cases, the discovery followed a predictable pattern:

Researcher queries Shodan/Censys for Elasticsearch instances (port 9200).
Researcher connects to the Elasticsearch API: curl http://<target>:9200
Cluster info returned without authentication: cluster name, version, node count.
Index enumeration: curl "http://<target>:9200/_cat/indices?v", which lists every index with its document count.
Data sampling: curl "http://<target>:9200/<index>/_search?pretty&size=10", which returns actual documents. (The URL is quoted so that the shell does not treat & as a background operator.)
Scale assessment: Researcher determines total record count and data sensitivity.
Responsible disclosure: Researcher contacts the organization (sometimes directly, sometimes through CERTs or platforms like GDI Foundation or Open Bug Bounty).
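The scale-assessment step in the process above amounts to adding up document counts per index. The sketch below assumes the cat indices API was called with format=json, which returns a list of objects whose docs.count field is a string. The function name summarize_indices and the sample data are made up for illustration.

```python
import json


def summarize_indices(cat_indices_json: str) -> dict:
    """Sum document counts from the JSON output of _cat/indices."""
    per_index = {}
    for row in json.loads(cat_indices_json):
        count = row.get("docs.count")
        # Closed indices can report no count; treat that as zero.
        per_index[row["index"]] = int(count) if count is not None else 0
    return {"indices": per_index, "total_docs": sum(per_index.values())}


# Fabricated sample response, for illustration only.
sample = json.dumps([
    {"index": "customers", "docs.count": "1200"},
    {"index": "logs-2019", "docs.count": "300"},
    {"index": "archive", "docs.count": None},
])
print(summarize_indices(sample))
```

Counting documents this way relies only on index metadata, so an assessor can estimate scale without retrieving the records themselves.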

Why Elasticsearch Gets Exposed

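Elasticsearch, like MongoDB, was designed for internal use within trusted networks. In the deployments behind these incidents, it typically:

Listened on all interfaces (0.0.0.0), either by default (versions before 2.0) or because network.host had been changed from the later localhost default so that other machines could reach the cluster
Required no authentication (X-Pack Security was a paid add-on until versions 6.8 and 7.1 made the core security features free, and security was not enabled by default until 8.0)
Returned detailed API responses to any request
Offered a REST API that is trivially easy to query with curl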

When developers deploy Elasticsearch on cloud infrastructure without configuring network security, it becomes an open book on the Internet.
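Whether a cluster is an "open book" can be judged from its response to a request for the root URL. The following sketch classifies a status code and body that have already been fetched. It assumes that a secured cluster answers 401 and that an open one returns JSON containing cluster_name and version; the function name and return strings are hypothetical.

```python
import json


def classify_root_response(status: int, body: str) -> str:
    """Classify the response to GET / on port 9200."""
    if status == 401:
        return "authentication required"
    if status == 200:
        try:
            info = json.loads(body)
        except ValueError:
            return "unknown"
        if isinstance(info, dict) and "cluster_name" in info and "version" in info:
            return "open: cluster info returned without credentials"
    return "unknown"


# Fabricated example body, shaped like an Elasticsearch root response.
body = '{"name": "node-1", "cluster_name": "dev", "version": {"number": "6.4.0"}}'
print(classify_root_response(200, body))
```

Separating classification from the network request keeps the logic easy to test and lets the same check run over responses collected by any scanner.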

Defensive Lessons

For Database Administrators

Never expose database ports to the Internet. Use firewalls, security groups, and network segmentation to restrict access to authorized IP addresses only.
Always enable authentication. Even for "internal" databases, enforce username/password or certificate-based authentication.
Change default ports as a defense-in-depth measure (though not a substitute for proper security).
Monitor Shodan/Censys for your organization. Regularly search for your IP ranges and domain names to detect accidental exposures.
Use infrastructure-as-code with security defaults baked in, rather than relying on manual configuration.
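Monitoring for accidental exposure can be reduced to a filter over scan results. The sketch below assumes each result is a dictionary with ip_str and port keys, following the shape of Shodan's result records, and keeps only database ports inside the organization's own ranges. The function name, the port set, and the sample addresses (documentation ranges) are illustrative assumptions.

```python
import ipaddress

# Default ports for MongoDB, Elasticsearch, Redis, CouchDB, and Cassandra.
DB_PORTS = {27017, 9200, 6379, 5984, 9042}


def exposures_in_ranges(results: list, cidrs: list) -> list:
    """Return (ip, port) pairs for database ports seen inside our ranges."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    hits = []
    for record in results:
        ip = ipaddress.ip_address(record["ip_str"])
        if record["port"] in DB_PORTS and any(ip in net for net in networks):
            hits.append((record["ip_str"], record["port"]))
    return hits


# Fabricated scan results, for illustration only.
results = [
    {"ip_str": "203.0.113.7", "port": 9200},
    {"ip_str": "203.0.113.8", "port": 443},
    {"ip_str": "198.51.100.5", "port": 27017},
]
print(exposures_in_ranges(results, ["203.0.113.0/24"]))
```

Run on a schedule against fresh scan data, a filter like this turns an occasional manual search into an alert whenever a new database port appears in owned address space.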

For Penetration Testers

This case study illustrates the immense value of enumeration. During authorized assessments:

Check Shodan/Censys for the client's external IP ranges to identify exposed services that may have been overlooked.
Scan for database ports on all in-scope networks: 27017 (MongoDB), 9200 (Elasticsearch), 6379 (Redis), 5984 (CouchDB), 9042 (Cassandra), 5432 (PostgreSQL), and 3306 (MySQL).
Attempt unauthenticated connections to any discovered database services.
Document the data exposure — not just that the database is accessible, but what types of data it contains and the potential business impact.
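The port check in the list above can be sketched as a simple TCP connect test. This is a minimal illustration for in-scope hosts only, not a replacement for a full scanner; the names tcp_open and scan_database_ports are hypothetical, and the probe function is passed in as a parameter so the logic can be exercised without touching the network.

```python
import socket

# Default ports for common database services.
DB_PORTS = {
    27017: "MongoDB",
    9200: "Elasticsearch",
    6379: "Redis",
    5984: "CouchDB",
    9042: "Cassandra",
    5432: "PostgreSQL",
    3306: "MySQL",
}


def tcp_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan_database_ports(host: str, probe=tcp_open) -> dict:
    """Return {port: service} for every database port that accepts connections."""
    return {port: name for port, name in DB_PORTS.items() if probe(host, port)}
```

An open port shows only that something is listening; confirming the service and whether it requires authentication is a separate step.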

💡 MedSecure Scenario: During the MedSecure external assessment, a Shodan search reveals an Elasticsearch instance on port 9200 associated with one of MedSecure's development servers. The instance is accessible without authentication and contains a development copy of patient records. This is classified as a critical finding: it exposes protected health information, which makes it a potential HIPAA violation as well as a patient privacy breach.

The Ongoing Challenge

Despite years of high-profile incidents, the problem persists. As of 2024, Shodan queries for exposed MongoDB and Elasticsearch instances still return tens of thousands of results. The reasons are structural:

Cloud makes deployment easy — but security is not automatic.
Default configurations prioritize convenience over security.
Rapid development cycles leave security as an afterthought.
Lack of monitoring means exposures go undetected for months or years.

Internet-wide scanning tools like Shodan and Censys have made the Internet more transparent. This transparency is a double-edged sword: it enables researchers to find and report exposures, but it also enables attackers to find and exploit them. The race between discovery and remediation continues.

Discussion Questions

Shodan and Censys make Internet-wide scanning data available to anyone. Should access to these tools be restricted? What would be the consequences of restricting versus not restricting access?
Many of the exposed databases were discovered by independent security researchers who then attempted responsible disclosure. However, some organizations responded with legal threats rather than gratitude. How should the security community handle this tension?
Should database vendors be held liable for default configurations that lead to data exposure? Should cloud providers bear responsibility for not flagging obviously exposed services?
If you discovered an exposed database containing medical records during an authorized penetration test, but the database belongs to a different organization than your client, what would you do?
How does the pattern of MongoDB/Elasticsearch exposure relate to the broader concept of "shifting security left" in the development lifecycle?

Key Takeaways

Default configurations kill security. Older MongoDB and Elasticsearch releases bound to all interfaces without authentication, and many later deployments overrode the safer defaults, directly leading to billions of exposed records.
Shodan and Censys transform Internet-wide scanning from a research curiosity into a practical security tool. An exposed service is likely to be indexed and discoverable within hours.
Enumeration reveals data exposure. The ability to connect to a database and enumerate its contents is the same skill used by both researchers and attackers.
Cloud deployment amplifies risk when security controls are not configured alongside infrastructure.
Continuous monitoring of your external attack surface (including Shodan/Censys monitoring) is essential for any organization.