Case Study 2 — The Citation That Didn't Exist

DataField.Dev

A composite, fictional-but-realistic scenario reflecting a now-common failure mode. The people are invented; the phenomenon — AI tools generating references to papers that were never written — is real and documented.

The situation

Marcus Webb is a junior data analyst writing his first white paper for the company's engineering blog: an argument that the team should move some processing to edge devices to cut latency. He's a strong writer but new to formal sourcing, and he's on a tight deadline. To speed things up, he asks an AI assistant to "write a paragraph arguing that edge computing reduces latency for IoT applications, with citations to support it."

The tool returns a fluent, confident paragraph. It even includes three references, formatted in clean IEEE style:

PARAGRAPH (AI-generated):
"Edge computing significantly reduces latency for IoT applications by
processing data closer to its source, avoiding the round-trip to
centralized cloud servers. Studies have demonstrated latency reductions
of up to 80% in real-time analytics workloads [1], with particularly
strong gains in autonomous-vehicle and industrial-sensor deployments
[2], [3]."

REFERENCES (AI-generated):
[1] J. Martinez and S. Kapoor, "Latency optimization in edge-computing
    architectures for IoT," IEEE Internet of Things Journal, vol. 7,
    no. 4, pp. 2891–2905, 2020.
[2] R. Thompson et al., "Real-time edge analytics for autonomous
    systems," in Proc. IEEE Int. Conf. Edge Computing (EDGE), 2019,
    pp. 144–152.
[3] L. Chen and M. Okafor, "Industrial IoT and the latency advantage of
    distributed processing," ACM Trans. Internet Technol., vol. 21,
    no. 3, 2021.

Marcus is impressed. The paragraph reads better than his draft, the claim is one he believes, and the references look completely real — plausible authors, real-sounding journals (IEEE Internet of Things Journal and ACM Transactions on Internet Technology both exist), proper volume/issue/page formatting. He's about to paste the whole thing into his white paper.

What almost went wrong

His mentor, reviewing the draft over his shoulder, stops him with a single question: "Have you read any of those three papers?"

He hasn't. He's never seen them. The AI produced them.

The mentor pulls up a library database and they search for reference [1] — "Latency optimization in edge-computing architectures for IoT" by Martinez and Kapoor, supposedly in the IEEE Internet of Things Journal, volume 7, 2020. It doesn't exist. No paper by that title, by those authors, in that journal, that year. They check [2] and [3]. Same result: real-looking, real-formatted, completely fabricated. The journals are real; the papers are not. The "80% latency reduction" figure traces to nothing at all.

This is the AI fabrication problem (§11.6), and it's exactly as dangerous as it looks. Language models generate text that is statistically plausible, and a plausible citation — by construction — has real-sounding authors, a real journal, and correct formatting. The model isn't looking anything up; it's producing the shape of a citation. Had Marcus pasted that paragraph in, he would have committed citation fabrication under his own name — one of the gravest integrity violations there is — without any intent to deceive. He'd have published a specific statistic with no source and three references to papers that don't exist. The first reader to try to follow [1] would have found nothing, and Marcus's credibility, and the company's, would have taken the hit.
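A minimal Python sketch can make the "shape of a citation" point concrete. It checks only whether a string is laid out like an IEEE-style journal reference. The pattern `IEEE_JOURNAL` and the helper `looks_well_formed` are hypothetical names invented for this illustration, and the regular expression is a rough simplification, not an official IEEE grammar. The two references tested are the fabricated [1] and [3] from the AI output above.

```python
import re

# Hypothetical, simplified pattern for an IEEE-style journal reference:
# [n] Authors, "Title," Journal, vol. V, no. N, [pp. A-B,] YEAR.
IEEE_JOURNAL = re.compile(
    r'^\[\d+\]\s+'                  # bracketed reference number
    r'.+?,\s+'                      # author list
    r'"[^"]+,"\s+'                  # quoted title ending in a comma
    r'.+?,\s+'                      # journal name
    r'vol\.\s*\d+,\s+'              # volume
    r'no\.\s*\d+,\s+'               # issue
    r'(?:pp\.\s*\d+[–-]\d+,\s+)?'   # optional page range
    r'\d{4}\.$'                     # year and final period
)


def looks_well_formed(reference: str) -> bool:
    """Return True if the string has the layout of an IEEE journal reference.

    This says nothing about whether the paper exists.
    """
    single_line = " ".join(reference.split())  # collapse wrapped lines
    return IEEE_JOURNAL.match(single_line) is not None


# The fabricated references [1] and [3] from the case study.
fabricated = [
    '[1] J. Martinez and S. Kapoor, "Latency optimization in edge-computing '
    'architectures for IoT," IEEE Internet of Things Journal, vol. 7, '
    'no. 4, pp. 2891–2905, 2020.',
    '[3] L. Chen and M. Okafor, "Industrial IoT and the latency advantage of '
    'distributed processing," ACM Trans. Internet Technol., vol. 21, '
    'no. 3, 2021.',
]

format_ok = [looks_well_formed(ref) for ref in fabricated]
print(format_ok)  # [True, True]
```

Both fabricated references pass. A format check, whether done by eye or by code, can only confirm that a citation is shaped correctly; it cannot confirm that anyone ever wrote the paper.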

The fix

The fix is not "never use AI." Marcus's mentor was clear about that — the tool had genuinely helped him see a cleaner way to structure the argument. The fix is the verification rule (§11.6): every fact and every citation an AI produces must be independently verified before it enters a document you'll put your name on.

So Marcus did the real work the tool had skipped:

He verified the citations — and discarded them. All three were fabricated, so all three were deleted. Fabricated references aren't "close enough to fix"; they're nothing.
He found real sources. He searched the actual literature for evidence that edge computing reduces latency, read what he found, and located two genuine, citable sources that supported the claim — at the magnitude the real papers reported, not the AI's invented "80%."
He corrected the claim to match the real evidence. The honest number was more modest and more conditional than the AI's confident "up to 80%," so he wrote what the sources actually supported, with the conditions.
He kept the AI's structural help, in his own words. The clearer paragraph shape the tool suggested was fine to use; he rewrote the prose himself, around real sources.
He disclosed. Because his company's policy required it, he added a one-line note that AI assistance was used for drafting and structure, with all facts and sources independently verified.
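The first step above, checking whether a claimed reference exists, can be sketched in Python under some loud assumptions. The `Citation` record, the helpers `title_similarity` and `find_match`, and the 0.9 similarity threshold are all hypothetical choices made for illustration. The sketch also assumes you have already run a search in a library database and copied the results into `Citation` records by hand or by export; it does not query any real service.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple
import re


@dataclass(frozen=True)
class Citation:
    authors: Tuple[str, ...]  # family names only
    title: str
    venue: str
    year: int


def _norm(text: str) -> str:
    """Lowercase and collapse punctuation so trivial differences don't matter."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def title_similarity(a: str, b: str) -> float:
    """Similarity of two titles in [0, 1] after normalization."""
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def find_match(
    claimed: Citation, records: List[Citation], threshold: float = 0.9
) -> Optional[Citation]:
    """Return the first database record matching the claimed citation, or None.

    A match needs a near-identical title, the same year, and at least one
    shared author family name. The threshold is an arbitrary assumption.
    """
    claimed_authors = {_norm(name) for name in claimed.authors}
    for record in records:
        record_authors = {_norm(name) for name in record.authors}
        if (
            title_similarity(claimed.title, record.title) >= threshold
            and claimed.year == record.year
            and claimed_authors & record_authors
        ):
            return record
    return None


# Reference [1] exactly as the AI produced it.
ai_ref_1 = Citation(
    authors=("Martinez", "Kapoor"),
    title="Latency optimization in edge-computing architectures for IoT",
    venue="IEEE Internet of Things Journal",
    year=2020,
)

# In the case study, the library search returned nothing for this title.
search_results: List[Citation] = []

status = "found" if find_match(ai_ref_1, search_results) else "not found: delete it"
print(status)  # not found: delete it
```

Even a successful match only establishes that a paper exists. Confirming that it says what you cite it for still means reading it, which is the part no script can do for you.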

❌ Before (AI-generated, unverified): "Studies have demonstrated latency reductions of up to 80% in real-time analytics workloads [1]…" — citing three fabricated papers and a sourceless statistic.

✅ After (verified, honest): "Edge processing can cut round-trip latency by moving computation closer to the data source; measured reductions vary widely by workload and network conditions, with [real source A] reporting [actual figure] for [actual scenario] and [real source B] reporting [actual figure]." Both citations are real and verified.

Why it works: Every claim now traces to a source Marcus actually read, the magnitude matches the real evidence instead of an invented number, and the fabricated references are gone. The AI's contribution was reduced to what it's actually good at — suggesting a structure — while the facts, sources, and accountability are Marcus's, which is exactly where they belong.
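One way to keep "every claim traces to a source" honest before publishing is a small pre-publication audit. The Python sketch below is an illustration only: the function names `unverified_markers` and `statistics_in` are hypothetical, and the `verified` set is assumed to be a list you maintain by hand of reference numbers whose papers you have actually found and read.

```python
import re
from typing import List, Set


def unverified_markers(text: str, verified: Set[int]) -> List[int]:
    """Return citation numbers used in the text that are not yet verified."""
    used = {int(n) for n in re.findall(r"\[(\d+)\]", text)}
    return sorted(used - verified)


def statistics_in(text: str) -> List[str]:
    """Return percentage figures in the text; each one needs a real source."""
    return re.findall(r"\d+(?:\.\d+)?%", text)


# The AI-generated paragraph from the case study (second sentence).
ai_paragraph = (
    "Studies have demonstrated latency reductions of up to 80% in real-time "
    "analytics workloads [1], with particularly strong gains in "
    "autonomous-vehicle and industrial-sensor deployments [2], [3]."
)

# Marcus had read none of the three papers, so nothing is verified yet.
blocked = unverified_markers(ai_paragraph, verified=set())
figures = statistics_in(ai_paragraph)

print(blocked)  # [1, 2, 3]
print(figures)  # ['80%']
```

An empty `blocked` list means only that you have ticked every reference off your own list. The audit is as trustworthy as the reading behind it.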

The takeaways

AI fabricates citations that look perfectly real. Plausible authors, real journals, correct formatting — none of it is evidence the paper exists. Format is cosmetic; existence is what matters.
Verify every AI-suggested source before it enters your document. Find the actual paper and confirm that it says what you're citing it for. If you can't find it, it probably isn't real.
The fluency is the trap. AI prose sounds authoritative, and a sourceless "80%" sounds like a finding. Confident phrasing is not verification.
Use AI for what it's good at; own what it can't do. Structure and rephrasing: fine, as long as you verify and rewrite. Facts, sources, and accountability: yours, and they can't be delegated. If you can't evaluate the output, you can't put your name on it.

One sentence to remember: A citation you didn't verify is a citation you didn't make — and if the AI invented it, it's a fabrication with your name on it.