Cognisense Insights

False Positives Are the Hidden Cost of AI Proctoring — And the Damage Is Real

AI proctoring systems often produce false positives by misinterpreting human behavior, leading to unjust accusations of cheating. These systems should not make definitive judgments without human review; instead, assessments should focus on maintaining valid testing conditions and accommodating learners' needs.

AI-driven assessment systems don’t catch cheaters. They catch behavior. And too often, that’s not the same thing.

When we talk about AI in assessment and training, we tend to focus on what it can detect: eye movement, background noise, browser activity, face position. But we rarely talk about what it can’t: intent, context, or reality.

Over the past few years, I’ve seen dozens of cases where online proctoring systems raised red flags that had nothing to do with cheating — and everything to do with misunderstanding human behavior or environmental nuance. The result? False positives. And the consequences? Often career-threatening.

These Aren’t Edge Cases — They’re Common

Let me give you a few real-world examples:

  • Monitor Size Mismatch: One learner was flagged for excessive eye movement. Why? They had a large ultrawide monitor, and the proctoring system was calibrated to expect a standard screen size. Moving their eyes from left to right was interpreted as checking external sources — even though it was just a wide field of view (see the sketch after this list).
  • Eye Tracking and Disability: Another learner had strabismus — commonly called a lazy eye — which caused their pupils to track unevenly. The AI flagged this as suspicious activity. It wasn’t. It was a medical condition.
  • Background Noise: Systems often flag “multiple voices” as evidence of collusion. In one case, it was a radio playing softly in the next room. In another, street noise filtered through thin windows. No conversation. No cheating. Just life happening nearby.
  • System Notifications: Learners have been flagged for pop-ups like “Printer low on ink” or “System update available.” Harmless. Unavoidable. But in the world of automated suspicion, enough to generate a violation report.
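
The monitor mismatch is the easiest of these to see in code. Below is a minimal sketch, in Python, of how a gaze threshold calibrated against an assumed screen width turns ordinary eye movement on a wider display into a "violation." Every name and number here is hypothetical, not taken from any real proctoring product.

```python
# Hypothetical illustration (not any vendor's real code): a fixed gaze-angle
# threshold tuned for a standard-width display misfires on an ultrawide monitor.
import math

def max_gaze_angle_deg(screen_width_cm: float, viewing_distance_cm: float = 60.0) -> float:
    """Horizontal angle from screen center to its edge, as seen by the test taker."""
    return math.degrees(math.atan((screen_width_cm / 2) / viewing_distance_cm))

# Threshold calibrated (hypothetically) against a 24-inch 16:9 display, ~53 cm wide:
THRESHOLD_DEG = max_gaze_angle_deg(53.0)        # about 24 degrees

# The same calculation for a 34-inch ultrawide display, ~80 cm wide:
ultrawide_edge_deg = max_gaze_angle_deg(80.0)   # about 34 degrees

def flag_gaze(observed_angle_deg: float) -> bool:
    """Naive rule: any gaze beyond the calibrated screen edge counts as 'looking away'."""
    return observed_angle_deg > THRESHOLD_DEG

# Simply reading the far edge of their own screen gets the ultrawide user flagged:
print(flag_gaze(ultrawide_edge_deg))  # True: a false positive, with no cheating involved
```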

Each of these incidents was framed not as a maybe, but as a definite breach, passed along to organizations or employers as if the AI had caught someone red-handed. But it hadn't. It had seen behavior, made an assumption, and left human interpretation out of the loop.

The EU AI Act Has It Right: AI Shouldn’t Be Making These Calls

Under the EU AI Act, AI systems used in education and vocational training are classified as high-risk and must be designed for effective human oversight. In practice, that means AI shouldn't be labeling a test session as "cheating" or "compromised" without a human in the loop.

Why? Because behavior is not evidence. And inference is not fact.

That’s the foundational problem: most AI proctoring doesn’t detect wrongdoing. It detects patterns and anomalies — and outsources the guilt to the people reviewing the report, often with an implied conclusion.

Assessment Is Not About Policing — It’s About Standards

If your goal is to prove someone cheated, you're playing a probability game you’ll rarely win cleanly. Instead, the better approach is to define the conditions under which the assessment is valid, and enforce those.

In other words: don’t try to catch cheaters after the fact. Instead, pause or remove individuals from assessments when the environment clearly fails to meet baseline conditions for trust.

In-person testing centers have understood this for decades. Their job isn't to prove someone cheated — it's to ensure a quiet, distraction-free space, monitor basic behavior, and interrupt the assessment if anything suspicious occurs. The standard isn't proof of guilt — it's a departure from acceptable conduct.

We should treat online testing the same way.

Suspicion Is a Threshold — Not a Verdict

Imagine an in-person exam. A student keeps glancing at their palm. The instructor walks over and sees them rub their hand, smearing what looks like ink. When the instructor looks closer, the evidence is gone.

Did the student cheat? Maybe. Maybe not. But the behavior created reasonable suspicion, and that’s enough to intervene — not accuse.

In online assessments, that intervention is usually missing. The system flags. The learner is told they’ve “violated exam rules.” But no one asks the most important question: Did the learner actually do anything wrong — or just fail to meet unrealistic environmental conditions?

The Responsibility Must Shift

The lesson here is twofold:

  1. AI tools should never be used to characterize human behavior as dishonest without human review.
  2. Assessment systems must focus less on detecting violations, and more on enforcing controlled environments.

This means placing the responsibility where it belongs: on the learner to create a distraction-free, private space — and on the organization to define and communicate those expectations clearly. If those expectations can’t be met due to disability, housing, or other constraints, then accommodation should be sought — not penalization.
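
To make those two principles concrete, here is a minimal sketch, again with entirely hypothetical names rather than any real proctoring API, of a system that records observations for human review and pauses a session when baseline conditions break, instead of issuing a verdict.

```python
# Hypothetical sketch: the system records observations and can pause a session,
# but only a human reviewer may characterize it as a violation.
from dataclasses import dataclass, field
from enum import Enum

class SessionAction(Enum):
    CONTINUE = "continue"
    PAUSE_FOR_CHECK = "pause_for_check"  # interrupt, the way an in-person proctor would

@dataclass
class Observation:
    """What the AI actually saw, never a conclusion about intent."""
    signal: str             # e.g. "multiple_voices", "gaze_outside_calibration"
    context: str            # raw evidence a human reviewer can inspect
    needs_human_review: bool = True

@dataclass
class Session:
    observations: list[Observation] = field(default_factory=list)

    def record(self, signal: str, context: str) -> SessionAction:
        self.observations.append(Observation(signal, context))
        # The automated layer may pause the assessment, but it never labels the
        # learner as "cheating"; that judgment is reserved for a human.
        return SessionAction.PAUSE_FOR_CHECK

# Usage: background noise produces a pause and a review item, not a verdict.
session = Session()
action = session.record("multiple_voices", "audio clip, 14:02-14:09")
print(action)                                       # SessionAction.PAUSE_FOR_CHECK
print(session.observations[0].needs_human_review)   # True
```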

What Real Integrity Looks Like

The goal of assessment isn’t to trick learners or catch them out. It’s to create conditions where demonstrating competence is possible and trustworthy.

If we design systems that default to suspicion, we don't protect integrity — we erode it.

And if we let AI make the call without context, we’re not enforcing standards — we’re outsourcing judgment.

False positives aren’t just a technical glitch. They’re a human cost. And unless we rethink how we use AI in assessments, that cost will keep falling on the very people the system was supposed to serve.
