I am going to show you three short paragraphs. One was written by a human. Two were written by AI. See if you can tell.
Paragraph A: "The integration of artificial intelligence in educational settings presents both opportunities and challenges. Educators must balance the efficiency gains of AI tools with the imperative to maintain academic integrity. A thoughtful, evidence-based approach will be essential as institutions navigate this evolving landscape."
Paragraph B: "My daughter came home from school last Tuesday and told me her teacher accused a kid of using AI on a paper. The kid cried. The teacher apologized the next day. My daughter said she couldn't tell if the teacher was sorry or just scared of the parents. I couldn't tell either."
Paragraph C: "The proliferation of generative AI tools has created an urgent need for new frameworks in assessment design. Research suggests that educators who proactively adapt their pedagogical approaches will be better positioned to leverage AI as a complement to, rather than a replacement for, traditional learning methodologies."
If you picked B as the human-written one, you are right, and you did not need a detection tool to do it. You needed about three seconds and a question most people can answer without being trained: which one sounds like a person who has a life?
That question is the whole article.
Why Detection Tools Are a Liability
The tool everyone reaches for first is the detector. GPTZero, Turnitin's AI module, ZeroGPT, a dozen others. The pitch is the same: paste in the text, get a score, make a decision. Most educators I know have tried at least one, and most of them have a story about the time it went wrong.
Here is the math nobody does before they install one.
GPTZero, in its own published benchmarks, reports a false positive rate of about 1.28%. That is the best-case number, from the vendor's own testing, under clean conditions. ZeroGPT, one of the most widely used free tools, has a documented false-positive rate as high as 20%.
A class of 200 students writing weekly papers across a 15-week semester produces about 3,000 submissions. At a 1.28% false-positive rate, that is roughly 38 students falsely accused in one class, in one semester, by the best detector available. At ZeroGPT's rate, it is 600.
Thirty-eight innocent students per semester, per class, from the best tool on the market. That is not a detection tool. That is a liability.
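The arithmetic above is simple enough to check in a few lines. This sketch uses the article's own figures (200 students, 15 weekly papers, GPTZero's self-reported 1.28% rate and ZeroGPT's documented 20% rate); the function name is mine, not anyone's API.

```python
# Expected false accusations implied by a detector's false-positive rate.
# Figures are the ones cited in the article, under its assumptions
# (one paper per student per week, errors independent per submission).

def expected_false_positives(submissions: int, fp_rate: float) -> int:
    """Expected number of honest submissions flagged as AI-written."""
    return round(submissions * fp_rate)

submissions = 200 * 15  # 3,000 papers per semester

print(expected_false_positives(submissions, 0.0128))  # GPTZero: 38
print(expected_false_positives(submissions, 0.20))    # ZeroGPT: 600
```

Note what the calculation assumes: every flag is acted on, and every student writes every week. Loosen either assumption and the number moves, but not by enough to change the conclusion.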
Brandeis University's AI Steering Council published explicit guidance telling faculty not to use detection tools as the basis for integrity proceedings. A peer-reviewed study indexed in PubMed Central concluded that AI detectors misclassify original creative work with enough frequency that institutional policy should not rest on a single tool's output.
The detectors are not getting better fast enough. The models are getting better faster. That race is over, and the detectors lost; most institutions simply have not updated their policies to reflect it.
What Voice Actually Is
Voice is the residue of a specific person's way of seeing the world. It is not a style. It is not grammar. It is not tone. It is the thing underneath all of those, the thing that makes a reader say "I know who wrote this" before they check the byline.
It shows up in five places, and once you know the five, you cannot unsee them.
Specificity that does not pay off. A human will mention an irrelevant, true detail (the broken vending machine, the cousin who showed up late, the color of the chair in the waiting room) that an AI would not bother to invent because it serves no narrative purpose. AI omits useless truth. Humans include it, because that is how memory works.
Asymmetric emphasis. A human cares disproportionately about one part of a topic and undersells the rest. Read any paragraph a human wrote about a subject they love, and you will find a sentence that got three times more attention than its neighbors. AI distributes attention evenly. It gives every point the same weight, because it has no preferences.
Voice contradictions. A human will say something they half-believe and walk it back two sentences later. AI maintains consistent confidence within a single output because it does not have the experience of being uncertain about its own thinking in real time.
Idiosyncratic word choice that recurs. Every human has a small set of vocabulary tics, four to six words or phrases they lean on without noticing. AI's vocabulary tics are generic. If you see "delve," "tapestry," "navigate," or "landscape" in a piece of writing, you are probably reading AI. If you see a specific weird word three times in 800 words, you are probably reading a human.
Stakes. A human reveals what they are afraid of, what they want, and what they are protecting. AI describes those things from the outside, the way an observer would, because it does not have anything at stake.
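The fourth marker, recurring idiosyncratic vocabulary, is the only one of the five that is even partly countable. Here is a toy sketch of that one heuristic: flag uncommon words that recur, and flag the generic AI tells the article names. The stopword list and threshold are illustrative assumptions, not a vetted lexicon, and a real reader does this by feel, not by script.

```python
import re
from collections import Counter

# Words the article names as generic AI vocabulary tics.
AI_TELLS = {"delve", "tapestry", "navigate", "landscape"}

# Minimal stopword list so common function words don't count as tics.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "that", "this", "was", "for", "on", "with", "as"}

def vocabulary_tics(text: str, min_count: int = 3) -> dict:
    """Return uncommon words repeated min_count+ times, plus any AI tells."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {
        "recurring": {w: c for w, c in counts.items() if c >= min_count},
        "ai_tells": sorted(AI_TELLS & counts.keys()),
    }
```

Run it on 800 words of prose and the article's rule of thumb applies: a specific weird word in `recurring` points toward a human; hits in `ai_tells` with an empty `recurring` points the other way. It catches the tell, not the voice.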
The One-Sentence Test
Here is the test. One sentence, usable immediately, no software required.
What does this writer love, fear, or want, and would the writer themselves recognize it on the page?
AI output reliably fails this test even when it passes every detector. The reason is structural: AI does not love, fear, or want anything, and when it simulates those things, it simulates them generically, the way a person would describe an emotion they have read about but never felt.
That is the tell. Not the vocabulary. Not the sentence length. Not the statistical signature. The tell is the absence of stakes.
The Honest Challenge
I owe you two caveats before this becomes a neat answer.
First: the voice test punishes inexperienced writers. A first-year college student writing in a register they have not yet developed will produce voiceless prose for honest reasons. Their voice is not absent. It is nascent. Using the voice test as a disciplinary instrument against novice writers is a tax on the people who most need time to develop the thing you are testing for.
The fix is to use the voice test as a teaching tool, not a policing tool. Teach students what voice is. Show them the five markers. Have them find them in their own writing and in each other's. The skill of recognizing voice is the same skill that builds voice, and it transfers.
Second: AI is getting better at faking a voice. Models are being fine-tuned on individual writers and on idiosyncratic samples. The voice test will erode over time. I am not going to promise it is permanent. I am going to promise it is the best heuristic we have right now, and that the underlying skill (paying attention to specific human signals) is durable even if the specific tells are not.
The person who learns to read for voice in 2026 will still be reading for voice in 2036. The tells will be different. The skill will be the same.
Same Story In Home Services
Here is where the voice test has commercial value that most business owners are not thinking about.
AI-generated reviews are flooding Google and Yelp. A service business owner who learns to spot them protects their reputation and guards against competitors' ranking manipulation at the same time. The heuristic is the same: no irrelevant true detail, no asymmetric emphasis, no stakes. A real customer mentions the dog, the wrong-colored nail polish, and the kid who answered the door. A fake one talks about "professionalism" and "punctuality" in a generic register that could describe any company in any city.
Customer communications have the same problem. A homeowner can usually tell when a contractor's reply was AI-drafted because the reply does not sound like the person who came to the house. The voice mismatch destroys trust faster than a slow reply ever did. The contractor who writes "I'd run this line behind the dryer because the previous owner did something weird with the breaker box" wins the job over the one who writes "We will leverage industry-leading techniques to optimize your installation." Useless true detail wins the job. Every time.
The same literacy that helps an educator catch AI-written homework helps a business owner protect their reputation, spot a fake review, and write a proposal that sounds like a person.
What I Want You To Do This Week
Print the one-sentence test and tape it next to your screen.
What does this writer love, fear, or want, and would the writer themselves recognize it on the page?
Use it for one week. On student papers. On marketing copy. On emails. On your own writing. See what you notice. The test is not a policy. It is a practice, and the practice sharpens something in you that no detection tool can replace: the ability to recognize another human being on the page.
That is a skill worth having, whether AI gets better or worse, whether the detection tools improve or collapse, whether the policy debate resolves or drags on for another decade. It is the durable bet.
If you want a printable one-pager of the voice test and the five markers, I have one at bensaibrain.com. Come say hi.
Created with ❤️ by humans + AI assistance 🤖