In 2023, a New York attorney submitted a legal brief containing citations to six court cases that did not exist. The cases had been generated by ChatGPT. The attorney had not verified them. The consequences — sanctions, public embarrassment, disciplinary proceedings — were severe and entirely preventable.
That story circulated widely. Most professionals read it, nodded, and moved on. What they did not do was change their behavior, because they assumed the lesson was "do not use AI for legal citations." The actual lesson is broader and more important than that.
The Nature of the Problem
AI language models do not retrieve facts from a database. They generate text that is statistically plausible given the patterns in their training data, which is not the same thing as text that is true. This means they can produce output that is fluent, confident, structurally plausible, and completely fabricated, and they will do so without any signal that something is wrong.
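A toy sketch makes the mechanism concrete. The candidate continuations and probabilities below are invented for illustration (a real model scores tens of thousands of tokens at a time), but the generation step is faithful: the model samples whatever is statistically likely, and nothing in that step consults a source of truth.

```python
import random

# Invented, illustrative continuation probabilities. The case names are
# hypothetical; the point is that a fabricated citation and a genuine one
# look identical to the sampling step.
continuation_probs = {
    "Doe v. Acme Corp., 2019": 0.45,             # fluent and plausible, may not exist
    "Roe v. Globex Inc., 2021": 0.35,            # equally fluent, equally unchecked
    "No supporting case could be found.": 0.20,  # the honest answer is just another candidate
}

continuations = list(continuation_probs)
weights = list(continuation_probs.values())

# Generation: sample by likelihood. There is no lookup against a database
# of real cases and no flag marking a fabrication as such.
print(random.choices(continuations, weights=weights, k=1)[0])
```

Run it a few times and the output varies, but it is delivered with the same fluency every time. That is the failure mode in miniature.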
This is what professionals call a hallucination: output that is presented with full confidence and is factually incorrect. The term is imprecise but the phenomenon is real and consequential.
The professional risk is not that AI will obviously fail. It is that AI will fail in ways that are difficult to detect without domain expertise and careful verification.
Where Hallucinations Are Most Likely
Not all AI tasks carry equal hallucination risk. Understanding the risk distribution helps you allocate your verification effort appropriately.
High risk: Specific facts, statistics, dates, names, citations, legal or regulatory references, technical specifications, medical information, financial data. Any claim that depends on a specific piece of information being accurate.
Moderate risk: Summaries of documents you have provided (AI can misread or misrepresent), analysis that requires accurate data as inputs, historical context, claims about how specific organizations or policies work.
Lower risk: Structural tasks (organizing information you have provided), stylistic tasks (rewriting for tone), brainstorming (where accuracy is not the point), tasks where you can evaluate the output from your own expertise.
Notice that lower risk does not mean no risk. It means the risk is more manageable with normal professional review.
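For readers who think in checklists, here is a minimal sketch of the allocation rule. The tier assignments restate the list above; the review actions attached to each tier are illustrative defaults, not prescriptions, and the task labels are invented for the example.

```python
# Verification effort per risk tier (illustrative defaults, not prescriptions).
RISK_TIERS = {
    "high": "independently verify every specific claim before use",
    "moderate": "spot-check against the source material you supplied",
    "lower": "review at your normal professional standard",
}

# Task type -> tier, restating the breakdown above.
TASK_RISK = {
    "citation or regulatory reference": "high",
    "statistic, date, or named fact": "high",
    "summary of a supplied document": "moderate",
    "historical or organizational context": "moderate",
    "restructuring content you provided": "lower",
    "rewriting for tone": "lower",
    "brainstorming": "lower",
}

def verification_effort(task_type: str) -> str:
    """Map a task type to a suggested review step; unknown tasks default to high risk."""
    tier = TASK_RISK.get(task_type, "high")
    return f"{task_type}: {tier} risk -> {RISK_TIERS[tier]}"

print(verification_effort("rewriting for tone"))
print(verification_effort("citation or regulatory reference"))
```

The default matters: anything you cannot classify is treated as high risk, which is the coded version of lower risk never meaning no risk.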
The Three Categories of Consequential Error
Fabricated specifics. Made-up statistics, citations, studies, names, or data points that sound legitimate. These are the classic hallucinations; nothing on the page distinguishes them from the genuine article, so only checking the underlying source exposes them.
Plausible distortions. Real information that has been slightly altered — a real statistic with the wrong year, a real policy with a key detail changed, a real person's position described inaccurately. These are harder to catch than pure fabrications because part of the information is correct.
Confident extrapolations. AI filling in gaps with what seems likely, presented as fact rather than inference. This is particularly common when you ask AI to analyze or explain something it has incomplete information about.
A Practical Verification Standard
The goal is not to verify everything — that would eliminate most of the efficiency benefit. The goal is to verify claims that would cause professional damage if wrong.
Before any AI-assisted work leaves your desk, ask: which specific claims in this output, if wrong, would embarrass me, harm someone else, or create legal or professional liability? Those claims require independent verification. Everything else can be reviewed at your normal professional standard.
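Sketched in code, under the assumption that you tag each claim with the three damage questions above (the Claim structure is hypothetical, not a real tool), the triage looks like this:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    could_embarrass: bool = False      # if wrong: embarrassment to you or your client
    could_harm_someone: bool = False   # if wrong: harm to a client or third party
    creates_liability: bool = False    # if wrong: legal or professional exposure

def needs_independent_verification(claim: Claim) -> bool:
    # The standard above: verify any claim whose failure would embarrass,
    # harm, or create liability. Everything else gets normal review.
    return claim.could_embarrass or claim.could_harm_someone or claim.creates_liability

claims = [
    Claim("The brief cites Doe v. Acme Corp., 2019.", creates_liability=True),
    Claim("The summary restates the client's own memo."),
]

for c in claims:
    action = "verify independently" if needs_independent_verification(c) else "normal review"
    print(f"{action}: {c.text}")
```

The point of writing it down is not automation; it is that the gate is a simple disjunction you can apply claim by claim in a few seconds.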
This is not a new skill. It is the same judgment you apply when reviewing any work product. AI simply adds a new category of risk that requires explicit attention: the possibility that something sounds completely authoritative and is completely invented.
The attorney in 2023 had the skills to verify citations. What he lacked was the discipline to apply them to AI-generated output. That discipline is the professional standard the current moment demands.
Built on the J.E.T. Model
This article is part of the Trust pillar of the J.E.T. framework — a professional competency model for AI use. Explore it in full in Don't Wait.
