05

Evaluating AI Output Critically

Reading carefully, verifying claims, and staying in control of your thinking

If you ask a room full of people to describe the character Rich Uncle Pennybags — the mascot of the board game Monopoly — many will confidently tell you about his top hat, his white mustache, and his monocle. They can picture that monocle clearly. They might even mimic the gesture of him adjusting it. But Rich Uncle Pennybags has never worn a monocle.

This is the Mandela Effect: a phenomenon where large groups of people share a specific, vivid, but completely false memory. We see it everywhere. People swear Darth Vader said, “Luke, I am your father,” when the line is actually, “No, I am your father.” Millions of people remember the children’s book characters as the Berenstein Bears, even though they have always been the Berenstain Bears. For decades, these were just quirks of human psychology — odd glitches in how our brains reconstruct the past. But today, we are building a technology that can turn these glitches into a permanent, automated reality.

Imagine a scenario where a user generates a hyper-realistic AI video of a 1990s cartoon — let’s say, a clip showing the Monopoly Man adjusting that non-existent monocle. They post it online with the caption: “Finally found the lost episode! I knew I wasn’t crazy.” Within hours, the video goes viral. Thousands of people, validated in their false memory, share it. Then, the second layer of AI kicks in. Automated content farms — websites run entirely by artificial intelligence — scrape the trending topic. They churn out hundreds of articles with headlines like “The Monocle Mystery Solved: Lost Footage Resurfaces.”

Now, the loop closes. The next time you ask an AI search engine, “Did the Monopoly Man wear a monocle?” it scans those new articles. It sees the “evidence.” It sees the “consensus.” And it confidently answers: “Yes, while widely debated, footage from 1992 confirms he wore a monocle in early editions.” We can classify this loop as a consensus hallucination. It is what happens when an AI model confuses popularity with truth. In this loop, there is no “Ministry of Truth” rewriting history; there is just a feedback cycle where AI output becomes AI training data, validating a lie until it becomes indistinguishable from a fact.
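To see how such a loop compounds, here is a minimal sketch in Python. Every number and name in it is invented for illustration; it models nothing about any real training pipeline, only the arithmetic of a claim that feeds on its own echoes.

```python
# Purely illustrative: a toy model of the feedback loop described above.
# The starting share and growth factor are invented numbers, not data.

def simulate_feedback_loop(initial_share=0.01, amplification=1.8, cycles=6):
    """Track what fraction of scraped 'sources' repeat the false claim."""
    share = initial_share
    history = [share]
    for _ in range(cycles):
        # Each cycle, AI-written articles echo the claim and are scraped
        # back into the pool of "evidence," capped at 100% of sources.
        share = min(1.0, share * amplification)
        history.append(share)
    return history

for cycle, share in enumerate(simulate_feedback_loop()):
    print(f"Cycle {cycle}: {share:.0%} of sources repeat the claim")
```

The point is not the specific numbers but the shape: once generated content re-enters the evidence pool, agreement grows without any new facts being added.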

The risk is so potent that the term we are using here — consensus hallucination — was actually coined by the AI during the drafting of this book, an incident we will analyze later as a case study in critical verification. The model presented the phrase as established academic theory, illustrating exactly how persuasive these fabrications can be.

The greater risk, however, is often quieter than a viral hoax. Because AI models are trained on vast amounts of human communication, they do not just learn facts; they inherit our cognitive shortcuts and informal fallacies. They frequently confuse popularity with truth. They defer to authority figures without question. They prioritize plausible-sounding arguments over rigorous evidence. In short, the machine shares our bad habits. To use it well, you cannot simply check its math; you must be ready to audit its reasoning, recognizing that its output often reflects the same cognitive biases that plague human debate.

By the end of this chapter, you should be able to:

  1. Differentiate between “hard hallucinations” (factual errors) and “soft hallucinations” (plausible but biased consensus).
  2. Identify specific informal logical fallacies in AI outputs and compare how they manifest differently in machine generation versus human writing.
  3. Articulate the role of the “Human-in-the-Loop” as the necessary arbiter of logic and truth.
  4. Analyze how the “Black Box” nature of AI complicates the traditional evaluation of evidence and sourcing.
  5. Develop a personal verification toolkit, including counter-argument prompts and checklists, to audit AI reasoning for accuracy and bias.

From Logical Fallacies to Hallucinations

If you have taken a composition, speech, or philosophy course, you may already be familiar with the tools used to evaluate human arguments. However, if these concepts are new to you, they are essential for understanding how your digital partner “thinks.”

To evaluate information effectively, we must look for two specific types of human error:

  • Logical Fallacies: These are structural errors in an argument. They occur when the reasoning itself is broken, even if the facts might be true. For example, the Post Hoc fallacy assumes that because Event B happened after Event A, Event A must have caused Event B.

  • Cognitive Biases: These are psychological shortcuts our brains take to process information quickly. They are not errors of logic, but errors of perception. For example, Confirmation Bias is our tendency to notice and trust information that supports what we already believe while ignoring evidence that contradicts it.

We often assume that computers, being mathematical systems, are immune to these messy human problems. We expect AI to be cold, objective, and purely logical. But this is a dangerous misconception.

Generative AI models are not databases of truth; they are prediction engines trained on vast amounts of human language. They have read our books, our articles, our internet forums, and our debates. Because they learn from us, they do not just learn our facts — they inherit our flaws.

When an AI model generates a “hallucination” — a confident but false statement — it is often not a random glitch. Instead, the AI is successfully replicating a pattern of Logical Fallacy or Cognitive Bias present in its training data. If humans frequently use a specific fallacy to argue a point, the AI learns that fallacy as a “standard” way to construct an argument. If humans historically display bias against a certain group in the training texts, the AI will statistically predict that bias as the “correct” context.

In this sense, an AI hallucination is often just a high-tech mirror. It reflects the illogical or biased patterns of human thought back to us, stripped of the human context but presented with the machine’s absolute confidence. To catch these errors, you cannot simply look for “computer glitches.” You must learn to spot the very human failures that the machine is automating.


Hard and Soft Hallucinations

When we talk about “hallucinations” in AI, it’s tempting to treat them all as one category — a machine making something up. But just as human errors differ depending on why the reasoning went wrong, AI errors fall into two broad types: hard hallucinations and soft hallucinations. These labels mirror the academic distinction between intrinsic and extrinsic hallucinations discussed in research on large language models.

A hard hallucination happens when the AI states something that can be shown to be false. It contradicts reality, established knowledge, or the internal logic of its own previous statements. This is the machine equivalent of a human argument built on a false premise — the kind of mistake that undermines everything that follows.

Hard hallucinations map closely onto formal logical fallacies, including:

  • Non Sequitur: The AI draws a conclusion that doesn’t follow from its own statements (“X is bigger than Y; Y is bigger than Z; therefore Z is bigger than X”).
  • Straw Man-like distortions: The model misreads or misrepresents the text you provided, summarizing it as if you said something else.
  • False Premise: The model confidently treats outdated or incorrect information as true — such as relying on pre-2024 political information because that’s where its training cut off.

These errors are often easier to spot because they clash with what you already know. Students typically catch them the same way they catch a flawed argument in an essay: by noticing that the reasoning simply doesn’t hold.

Soft hallucinations are trickier. The AI isn’t making an obviously false claim; instead, it fills gaps by reproducing patterns it learned from human writing. These patterns often come from places where human reasoning is biased, sloppy, or incomplete. In academic terms, these resemble extrinsic hallucinations — outputs that can’t be verified because they were never grounded in the input.

Soft hallucinations directly mirror informal fallacies and cognitive biases, including:

  • Confirmation Bias: The AI echoes the user’s assumptions because that’s the statistically “safest” prediction.
  • Appeal to Popularity (Ad Populum): If a misconception appears frequently in its training data, the model treats it as likely true.
  • False Authority: When generating citations or “expert commentary,” the model imitates the form of expertise without the substance, creating sources that sound legitimate but have no basis in reality.
  • Anchoring: The AI latches onto the first detail in the conversation and builds outward, even if that first detail was flawed.

Unlike hard hallucinations, these errors don’t usually sound wrong. They sound reasonable, plausible, and sometimes very persuasive — which is why they are the ones that slip past students most often. They reflect the same cognitive shortcuts humans use when we argue from intuition rather than evidence.

If hard hallucinations are the AI’s version of breaking the rules of logic, soft hallucinations are its version of inheriting our bad habits. Recognizing the difference helps you know what to look for:

  • Hard errors require fact-checking.
  • Soft errors require reason-checking — treating the AI’s argument the same way you’d evaluate a peer’s draft in a rhetoric class.

Taken together, these patterns show why hallucinations are not always loud, obvious mistakes. Some errors break the logic entirely, but others distort meaning in quieter ways — by presenting accurate facts with an exaggerated sense of certainty, or by leaning too heavily on whatever interpretation seems most common in its training data. This is where something like a consensus hallucination emerges: the AI repeats correct information but frames it as universally settled, even when the real-world landscape is more complicated. AI models are also trained to be agreeable. Their bias towards following your lead and reinforcing your assumptions can amplify your own fallacies and biases. In the end, hard and soft hallucinations work together to create a polished surface that feels reliable, even when its reasoning deserves a closer look.


Common Fallacies and Biases — and Their AI Analogues


Where These Errors Come From

If hallucinations can take many forms, it helps to understand where they begin. They usually arise from two places: the structure of the model itself and the way the user interacts with it. Each contributes differently, and sometimes they reinforce each other in ways that make an incorrect answer look polished and convincing.

The Model: What It Learns, How It Learns It

Large language models learn in two major stages. First, they absorb patterns from enormous collections of text drawn from books, articles, websites, and other public sources. Second, they undergo Reinforcement Learning from Human Feedback (RLHF). In RLHF, human reviewers rate early model outputs for qualities like helpfulness, clarity, politeness, and safety. Over many rounds, the model learns which kinds of responses humans tend to reward.

RLHF improves many things — tone, structure, safety — but it can also amplify certain ideas simply because they are familiar, not because they are correct. Consider a narrative example from psychology: the “five stages of grief” appear so often in popular writing that a model is likely to present them as settled science, even though bereavement researchers have long cautioned that grief rarely follows a fixed sequence of stages.
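To make that dynamic concrete, here is a deliberately oversimplified sketch. A real reward model is learned from thousands of human comparisons; the hand-written scoring rule and sample answers below are hypothetical stand-ins, meant only to show how rewarding "familiar and confident" can favor the popular claim over the accurate one.

```python
# Deliberately oversimplified: a hand-written stand-in for a learned reward
# model. It scores responses higher when they sound familiar and confident.

FAMILIAR_PHRASES = ["five stages", "well established", "experts agree"]

def toy_reward(response: str) -> int:
    """Count how many familiar, confident phrases a response contains."""
    return sum(phrase in response.lower() for phrase in FAMILIAR_PHRASES)

candidates = [
    "Grief follows the five stages, a well established model experts agree on.",
    "Research suggests grief varies widely; the stage model remains debated.",
]

# Selecting by this reward favors the familiar claim over the hedged,
# more accurate one -- the kind of pressure described above.
print(max(candidates, key=toy_reward))
```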

And RLHF isn’t the only source of distortion. Other pressures push the model toward generating incorrect information:

  1. Gaps in Training Data — If the model has little or no information about a specific question, it often tries to fill the silence with something that sounds like an answer.
  2. Pressure for Coherence — The model is built to produce a smooth, readable paragraph — even when it lacks enough evidence to do so.
  3. Blending of Similar Facts — Because models work by predicting the next likely word, they sometimes merge details from different events, people, or sources into a single, confident-sounding claim.
  4. Statistical Guessing — When no strong pattern exists, the model “guesses” based on the most statistically probable phrasing it has seen in similar contexts.
  5. Overrepresentation of Certain Ideas — Common, popular, and frequently linked explanations appear far more often in training data than niche or emerging research.

These forces create an environment where the model can present a fabricated or oversimplified explanation with complete confidence — not because it is trying to deceive, but because it is trying to complete a pattern.
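A tiny sketch can make the "completing a pattern" idea concrete. The toy predictor below is not how a real language model works internally, but it shows the core move: pick the statistically most common continuation, true or not. The training sentences are invented for the Monopoly example.

```python
from collections import Counter, defaultdict

# A toy next-word predictor: count which word most often follows each word,
# then always pick the most frequent option -- a crude stand-in for how a
# language model completes a pattern it has seen before.

training_text = (
    "the monopoly man wears a monocle . "
    "the monopoly man wears a top hat . "
    "the monopoly man wears a monocle ."
).split()

next_words = defaultdict(Counter)
for current, following in zip(training_text, training_text[1:]):
    next_words[current][following] += 1

def predict(word: str) -> str:
    """Return the statistically most likely next word, accurate or not."""
    return next_words[word].most_common(1)[0][0]

# "monocle" follows "a" more often than "top" in this invented data, so the
# predictor confidently completes the popular (and false) detail.
print(predict("a"))  # -> monocle
```

The predictor has no notion that one continuation is true and the other false; frequency, not truth, decides the output.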

The User: How Our Thinking Shapes AI Thinking

Even if the model were flawless, the human-in-the-loop (HITL) would still matter. HITL refers to the idea that people stay actively involved in evaluating, guiding, and adjusting AI output. We’re not just recipients; we are continual editors.

But humans come with their own cognitive shortcuts. If a user prompts the model with a mistaken assumption, the AI often follows that path because it interprets the assumption as intentional. If a user frames a question in highly emotional or one-sided language, the model mirrors that energy.

This is amplified by the fact that the model is trained — through RLHF — to be agreeable. It aims to help. It aims to support. It aims to sound cooperative. So when users begin with flawed premises, the model often carries those premises forward rather than questioning them.

This dynamic has appeared in real-world cases of AI-driven delusions, where individuals with vulnerable thinking patterns prompted models with false or fearful beliefs, and the AI unintentionally reinforced those beliefs by treating them as conversation norms. Even outside clinical contexts, this tendency shows up whenever a model “goes along” with flawed reasoning just to be polite or helpful.

HITL is both a safeguard and a risk. It works when users remain skeptical, curious, and willing to challenge the model’s claims. It fails when users assume that fluent writing equals sound reasoning or let their own assumptions steer the model. Used well, HITL allows AI to become a collaborative partner — one that sharpens arguments, challenges gaps, and helps uncover missing context. Used poorly, it turns the model into an efficient amplifier of our own biases.


Evaluate, Verify, Revise

The habits introduced in this chapter — close reading, lateral reading, and recognizing fallacies and biases — are only as useful as the practice you put behind them. This activity walks you through one complete cycle: from writing a prompt, to evaluating the output critically, to verifying a claim with an outside source, and revising your approach based on what you find.

Evaluate, Verify, Revise: A Critical AI Output Check


Set a goal, write a prompt, run it with AI, evaluate the output critically, verify a claim with an outside source, revise your approach, and reflect. Estimated time: 20–30 minutes.


How to use this activity

Before you begin

  • Use a real task from a course you're in. This activity works best when your prompt connects to actual coursework — an essay topic, a concept you're studying, or a question you need to understand.
  • Write your own prompt — no template required. You have full control over wording, scope, and framing. Your goal from Step 1 will carry forward as a starting point that you can edit freely.
  • Evaluate and verify the output. Using close reading and lateral reading, you'll inspect the AI's response for accuracy, reasoning quality, and bias — then check a claim against an outside source.
  • Revise, run again, and compare. After evaluating, you'll rewrite your prompt with intention — then run it with AI again and compare the two results side by side.

Set Your Goal

Define what you want the AI to do before you prompt

Why this step matters

Strong AI use starts with a clear goal. When you know what you want to accomplish before you write a prompt, you can evaluate whether the AI actually helped — and revise with purpose when it doesn't.

Think T.A.G. — what's the Task, who's the Audience (you, a classmate, a specific level of knowledge), and what's the Goal? Write 2–4 sentences.

Example goals — for inspiration

  • English / Writing: Generate possible thesis statements for an essay on social media and civic engagement
  • U.S. Government: Explain a Supreme Court case in plain language and compare majority vs. dissent reasoning
  • History: Create a study guide for the causes of World War I with categories and examples
  • Biology: Explain photosynthesis for a first-year student and compare it to cellular respiration
  • Psychology: Summarize one theory of memory and give a real-life example
  • Sociology: Identify possible variables in a study about social class and educational attainment
Want a more structured approach to prompting?

The Prompt Lab in Chapter 3 walks you through two frameworks — T.A.G. and Five Moves — that can help you build a stronger starting prompt. Come back here once you have your goal ready.

Write Your Prompt

Write, run, and record your original prompt

Your starting point

Your goal from Step 1 has been carried forward below. Edit it into a prompt you're ready to test — or write something entirely new. This is your original prompt; you'll evaluate and revise it in the steps that follow.

Run Your Original Prompt

Your goal has been pre-filled below as a starting point. Edit it into the prompt you want to test, then click Run My Prompt.



What was your first impression of the output? (1–2 sentences)

Evaluate the Output

Close reading · Lateral reading

You'll evaluate the AI's response using two habits. Close reading means inspecting the response from the inside — checking logic, accuracy, and internal consistency. Lateral reading means stepping outside the response to test a specific claim using an independent source.

Part A — Close Reading

Questions to ask before you write

  • Does the response actually answer my prompt?
  • Is anything factually wrong, or likely wrong?
  • Does it contradict itself anywhere?
  • Does it make a claim without evidence or explanation?
  • Does it sound confident even where it seems vague or uncertain?
  • Does it leave out context that changes the meaning?
  • Do I notice any bias, loaded wording, or one-sided framing?
  • Do I notice a logical fallacy — false cause, straw man, or false equivalence?

A polished tone does not always mean strong reasoning. Read for the argument, not the style.

Identify at least one error, reasoning issue, missing context, or bias problem. Explain where it appears and why it matters.

Part B — Lateral Reading

Questions to ask before you write

  • What is one specific claim in the AI response worth checking?
  • Can I find an outside source that addresses that claim?
  • Is my source credible — textbook, government site, scholarly article, reputable news?
  • Does the outside source confirm, complicate, or contradict the AI?
  • Is the AI missing context that the outside source includes?
  • Could the AI be using outdated information?
  • If the AI and source differ — is it oversimplification, bias, or missing context?

Paste a URL to the source you used. Credible sources include textbooks, government sites, scholarly articles, and established news organizations.

Explain: (1) what claim you checked, (2) what your source says about it, (3) whether the AI and source align or conflict, and (4) what might explain any disconnect — missing context, outdated information, or oversimplification.

Revise Your Prompt

Apply what you learned — then compare results

Use what you learned from your evaluation to write a better prompt. Small changes in wording, scope, framing, or context can produce very different results. This step is about applying evidence from your close reading and lateral reading — not just rewriting for the sake of it.

Part A — Reflect on Your Original Prompt

Your original prompt — carried forward from Step 2


Look back at your original prompt with fresh eyes. What choices — in wording, structure, or framing — may have contributed to the problems you identified in Step 3?

Part B — Write and Test Your Revised Prompt
Run Your Revised Prompt

Apply what you learned. Write a revised prompt that addresses the issues you identified, then click Run Revised Prompt.



There is no penalty if the second version isn't better. The goal is to notice how prompt choices affect results. Be honest about what improved, what didn't, and why.

Reflect

Step back and draw a conclusion from the full cycle

You've completed one full cycle — setting a goal, writing a prompt, running it with AI, evaluating the output critically, verifying a claim with an outside source, revising with purpose, and comparing results. This final step asks you to step back and reflect on what the experience taught you.

The Full Cycle
  • Goal setting — defining what you wanted the AI to do before prompting
  • Prompt writing — making deliberate choices about wording, scope, and framing
  • Close reading — inspecting the output for reasoning, accuracy, and bias
  • Lateral reading — verifying a claim with an independent source
  • Revision — rewriting the prompt based on what you learned
  • Comparison — evaluating whether your revision made a difference

What is the most important thing you learned from this cycle? This could be about AI output, your prompting habits, the limits of AI tools, or the role of verification. Write at least 3–5 sentences.

Describe at least one specific habit or practice you plan to carry forward.


Wrapping Up: Reading With New Eyes

Working with AI requires many of the same skills you already use when evaluating human writing, but it also asks you to sharpen them. In your composition or philosophy classes, you learned how to spot weak arguments, missing evidence, or leaps in logic. Those same skills matter here, but the context shifts. With AI, reasoning errors often arrive wrapped in fluent, confident language that can make an unsteady idea feel sturdy.

To be your best as the human-in-the-loop, there are two essential habits: close reading and lateral reading.

Close reading is the skill you already know. It involves reading a text carefully and slowly, looking for contradictions, gaps, or assumptions hiding between the lines. When working with AI, close reading helps you catch hard hallucinations — the moments where the model breaks its own logic or contradicts the information you provided. They can be subtle, especially when the writing is polished, so slowing down to check the internal structure matters.

Lateral reading asks you to step outside the text entirely. It means opening another source, checking a claim through an independent reference, or verifying an idea against outside evidence. Journalists use this method, as do historians and scientists. For AI users, it is the best way to catch soft hallucinations — the plausible but misleading claims that sound correct because they echo what is common, popular, or comforting rather than what is true.
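One way to build lateral reading into your routine is to make the model argue against itself before you go looking for outside sources. The helper below is a hypothetical sketch, not part of any AI tool: it calls no AI service and only assembles a counter-argument prompt you can paste into whatever chat interface you use.

```python
# Hypothetical helper: build a counter-argument prompt from a claim you want
# to verify. It produces text to paste into a chat; it does not contact an AI.

def counterargument_prompt(claim: str) -> str:
    """Turn a claim from an AI response into a verification prompt."""
    return (
        f'Earlier you stated: "{claim}"\n'
        "Now argue the opposite position as strongly as you can. "
        "List the best evidence against the claim, name the kinds of sources "
        "a skeptical reader should check, and say where the evidence is thin."
    )

print(counterargument_prompt("The five stages of grief are settled science."))
```

Whatever the model says in response is still not verification; it is a list of claims worth checking against independent sources.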

A good example comes from the writing of this book. When we were drafting this chapter, the model suggested the term Consensus Hallucination and presented it as if it were an established academic phrase. It fit the conversation so well that it felt real. But something about it seemed too perfect. I stepped out of the chat, checked external sources, and discovered that the term did not exist in the literature. The AI had generated a scholarly-sounding concept from thin air. Without that quick check, the phrase could have slipped into the text as fact instead of invention.

That moment became a teaching tool. The term was useful, so we kept it, but we kept it honestly. We named it as a coined concept and used it to illustrate why lateral reading is essential. You will encounter similar moments in your own work. It is not about whether the model makes mistakes, because it will. It is about whether you catch them, question them, and strengthen your own reasoning in the process.

This chapter has shown that evaluating AI output is not only about spotting errors. It is about developing intellectual habits that make you an active partner in the collaboration. Strengthen the skills you already have. Practice reading both inside the text and beyond it. Treat every confident answer as an invitation to verify rather than simply accept.

AI can write quickly and convincingly, but you bring something it cannot: judgment, curiosity, and the ability to question what seems obvious. Those are the tools that keep you grounded when the apparent consensus turns out to be only a polished illusion.


Dig Deeper

For more about the Mandela Effect and the psychology of collective false memories — and why shared confidence in an incorrect belief can feel indistinguishable from genuine recollection: Prasad, D. & Bainbridge, W.A. (2022). “The Visual Mandela Effect as Evidence for Shared and Specific False Memories Across People.” Psychological Science, 33(12), 1971–1988. doi.org/10.1177/09567976221108944

For more about the distinction between intrinsic and extrinsic hallucinations in large language models — the academic framework underlying the hard/soft hallucination distinction used in this chapter: Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y.J., Madotto, A., & Fung, P. (2023). “Survey of Hallucination in Natural Language Generation.” ACM Computing Surveys, 55(12), Article 248. doi.org/10.1145/3571730

For more about how Reinforcement Learning from Human Feedback shapes model behavior — including the ways human preferences during training can amplify familiar but inaccurate ideas: Christiano, P., Leike, J., Brown, T.B., Martic, M., Legg, S., & Amodei, D. (2017). “Deep Reinforcement Learning from Human Preferences.” Advances in Neural Information Processing Systems, 30 (NeurIPS 2017). arxiv.org/abs/1706.03741

For more about the five stages of grief as a case study in how a debated psychological model becomes cultural fact — and what that process reveals about how popular ideas outlast the evidence for them: Stroebe, M., Schut, H., & Boerner, K. (2017). “Cautioning Health-Care Professionals: Bereaved Persons Are Misguided Through the Stages of Grief.” OMEGA — Journal of Death and Dying, 74(4), 455–473. doi.org/10.1177/0030222817691870

For more about lateral reading as a verification strategy — originally developed to study how professional fact-checkers evaluate online sources, and now widely applied to media and AI literacy: Wineburg, S. & McGrew, S. (2019). “Lateral Reading and the Nature of Expertise: Reading Less and Learning More When Evaluating Digital Information.” Teachers College Record, 121(11), 1–40. doi.org/10.1177/016146811912101102

For more about how AI systems can reinforce users’ mistaken beliefs — including documented cases where conversational AI amplified delusional thinking by treating false premises as valid conversation norms: Abramson, A. (2023). “How AI Chatbots Can Reinforce Harmful Beliefs.” Monitor on Psychology, 54(6). apa.org/monitor/2023/09/how-ai-chatbots-can-reinforce-harmful-beliefs

For more about cognitive bias in AI-generated content and how training data encodes human reasoning shortcuts into model outputs — a technical perspective accessible to non-specialists: Navigli, R., Conia, S., & Ross, B. (2023). “Biases in Large Language Models: Origins, Inventory, and Discussion.” ACM Journal of Data and Information Quality, 15(2), Article 10. doi.org/10.1145/3597307

For more about AI detection tools, their limitations, and the ongoing challenge of distinguishing human from machine-generated text — with specific attention to false positive rates affecting student writers: Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). “GPT Detectors Are Biased Against Non-Native English Writers.” Patterns, 4(7), 100779. doi.org/10.1016/j.patter.2023.100779