What is an AI interview co-pilot, and why does it matter?

The fastest-growing category of interview fraud is not deepfakes, it is real candidates running AI assistants in a second window. Here is how to spot it and what to do.

Poya Farighi

Founder, Veref

April 2, 2026 · 8 min read

Deepfakes are the dog that did not bark. Everyone expected them to be the defining threat of remote hiring in 2026. In practice, the fastest-growing and most common form of interview fraud is simpler: a real candidate, on a real webcam, reading answers off a second screen, generated in real time by an AI assistant that is listening to the interviewer.

This piece explains how co-pilots actually work, why they are harder to detect than deepfakes, what the three reliable tells look like in practice, and what recruiters should do about it on a live call.

What is an AI interview co-pilot?

An AI interview co-pilot is a tool that transcribes the interviewer's question in real time and generates an answer the candidate reads aloud. The candidate on the webcam is genuinely themselves. Their identity verification passes. Their face matches their ID. Their voice is their own. The words coming out of their mouth are written by a large language model two seconds before they speak them.

The common forms fall into three groups.

Browser extensions are the simplest. Tools like Final Round AI, Sensei Copilot, and a long tail of free Chrome extensions overlay an answer box on the interview page. The candidate reads the answer from the same screen as the video call. These are the easiest to catch because the gaze signal is obvious: the candidate is reading, not thinking.

Second-device tools run on a phone, tablet, or separate monitor. The candidate looks slightly off-camera to read. Screen share does not reveal them because they are not on the interview device. These are significantly harder to catch visually but produce the same latency and perplexity signals.

AI-augmented earpiece setups pipe the generated answer into a small earpiece. The candidate hears the answer and repeats it. Gaze drift is minimal, but the latency between question and answer is still measurable and the answer content is statistically distinguishable from unscripted speech.

All three variants have become consumer-accessible in 2025. Final Round AI alone reported more than 500,000 users by mid-2024 and is one of many similar products. Not all users are cheating; some use it for legitimate practice. But the same tool that helps a candidate rehearse a system design interview on a weekend can be loaded into a live interview on Monday, and the product does not distinguish.

Why is this harder to catch than deepfakes?

Deepfakes leave biometric artifacts. Face-swap models produce subtle frequency-domain anomalies on the video stream. Voice clones produce detectable fingerprints on the audio. Video deepfake detectors and voice anti-spoof systems are getting good at finding them.

AI co-pilots produce neither. The candidate's face on camera is the candidate's face. The candidate's voice is the candidate's voice. Identity verification passes. Continuous face match passes. Voice authenticity passes. Every biometric signal in a deepfake-oriented stack gives a green light while the entire interview is being ghost-written.

This is why AI co-pilots are the dominant attack class in 2026 despite deepfakes getting all the press. They are cheaper to deploy (no face-swap setup), harder to detect (no biometric signature), and lower-risk for the attacker (if caught, deniable as "I just had ChatGPT open for note-taking"). They also do not require any skill beyond using a web app.

The defense has to move from biometric to behavioral.

What are the three tells of a co-pilot in use?

Three signals, when combined, catch the large majority of AI co-pilot use. None is decisive on its own. Together they produce a confidence score a recruiter can act on.

Latency spikes

A typical human answer to a well-formed interview question begins within a second of the question ending. Short clarifying answers ("yes", "I think so", "in my last role") start in under 500 milliseconds. Longer structured answers (explaining a past project) typically begin within 1 to 1.5 seconds of the question ending. Candidates pause longer on hard questions, of course, but the distribution is recognizable.

AI co-pilots introduce a mandatory pause of two to four seconds on every answer, not just the hard ones. The tool has to hear the question, transcribe it, send it to the model, receive the answer, and display it. Two seconds is the floor on current-generation tools. The tell is not the absolute latency; it is the flat distribution. Real candidates are faster on easy questions and slower on hard ones. Co-pilot users are slow on everything.

Measured across a twenty-minute interview, the median response latency is a statistically significant signal. A real candidate typically lands around 800 milliseconds. A co-pilot user typically lands north of 2.5 seconds.
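To make that concrete, here is a minimal Python sketch of the distribution test, assuming you already have diarized timestamps for when each question ends and each answer begins. The `Turn` type and both cutoffs are illustrative assumptions, not production values.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Turn:
    question_end: float   # seconds, from the diarized transcript
    answer_start: float

def latency_profile(turns: list[Turn]) -> dict:
    """Median response latency plus spread across an interview.

    A real candidate shows a wide spread (fast on easy questions,
    slow on hard ones); a co-pilot user shows a high median with a
    flat distribution. The 2.0 s / 1.0 s cutoffs are illustrative."""
    latencies = sorted(t.answer_start - t.question_end for t in turns)
    n = len(latencies)
    med = median(latencies)
    iqr = latencies[(3 * n) // 4] - latencies[n // 4]  # rough interquartile range
    return {"median_s": med, "iqr_s": iqr,
            "flag": med > 2.0 and iqr < 1.0}   # slow on everything, uniformly
```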

Gaze drift

The second signal is where the candidate's eyes go between hearing the question and starting their answer. Real candidates look at the interviewer, at the ceiling, or at the space just past the camera while they think. Co-pilot users look at their screen, usually below and to the side of the camera where the answer is being displayed.

In-browser gaze tracking using MediaPipe or WebGazer is accurate enough to flag this reliably. The signal is a sustained gaze off-axis during the latency window. Candidates who look down briefly to check notes do not fit this pattern: a quick glance is not a three-second read.
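As a rough sketch of the mechanism, the Python version of MediaPipe's Face Mesh exposes the same iris landmarks the browser model uses. The on-axis band and the sustained-frame count below are illustrative assumptions, not calibrated thresholds.

```python
import cv2
import mediapipe as mp

# refine_landmarks=True adds iris landmarks (index 468+); 468 is the
# left iris center, 33/133 the left eye's outer/inner corners.
LEFT_IRIS, EYE_OUTER, EYE_INNER = 468, 33, 133
ON_AXIS_BAND = (0.35, 0.65)   # assumed "looking at camera" range
SUSTAINED_FRAMES = 60          # ~2 seconds at 30 fps

face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

def gaze_ratio(landmarks) -> float:
    """Horizontal iris position between the eye corners; ~0.5 is on-axis."""
    outer, inner = landmarks[EYE_OUTER].x, landmarks[EYE_INNER].x
    return (landmarks[LEFT_IRIS].x - outer) / (inner - outer + 1e-9)

def sustained_off_axis(frames) -> bool:
    """Flag a gaze held off-axis long enough to be a read, not a glance."""
    run = 0
    for frame in frames:   # BGR frames, e.g. from cv2.VideoCapture
        result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not result.multi_face_landmarks:
            continue
        ratio = gaze_ratio(result.multi_face_landmarks[0].landmark)
        run = run + 1 if not (ON_AXIS_BAND[0] <= ratio <= ON_AXIS_BAND[1]) else 0
        if run >= SUSTAINED_FRAMES:
            return True
    return False
```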

Gaze drift is the single most reliable visual tell because it correlates strongly with the reading behavior co-pilots require. An attacker who routes the co-pilot's output through an earpiece removes this signal, but very few candidates have gone to that level of sophistication in observed cases.

Transcript perplexity

The third signal lives in the text of what the candidate actually says. AI-generated text has a measurable statistical property: it is more predictable than human speech. Under a reasonable language model, the likelihood of each next token in AI-generated text is systematically higher than in spontaneous human speech.

Stanford's Center for Research on Foundation Models and a dozen academic groups have published reliable methods for scoring this. The practical application in an interview context is to run the live transcript through a perplexity scorer as the candidate speaks. Low perplexity across a multi-sentence answer is evidence of AI origin. The signal is probabilistic, not deterministic, but combined with latency and gaze it reliably separates generated answers from human ones.
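A minimal version of that scorer, using GPT-2 through Hugging Face's transformers library as the "reasonable language model", might look like the sketch below. The 20.0 cutoff is an illustrative assumption that a real system would calibrate against spontaneous-speech transcripts.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """exp of the mean next-token cross-entropy under GPT-2.
    Lower = more predictable = more likely machine-written."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # model shifts labels internally
    return float(torch.exp(loss))

answer_text = "In my last role I led the migration of our billing service."
# Illustrative cutoff only; calibrate on real interview transcripts.
if perplexity(answer_text) < 20.0:
    print("low perplexity: consistent with AI-generated text")
```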

Perplexity is also the one signal that survives the earpiece case, because it does not depend on visual behavior.
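One way to combine the three into the single confidence score described above is a capped weighted sum. The weights here are illustrative assumptions, and each input is presumed pre-normalized to a 0-to-1 range.

```python
def copilot_confidence(latency: float, gaze: float, perplexity: float) -> float:
    """Fuse the three per-signal scores (each normalized to 0..1).

    Clipping each input keeps one noisy signal from dominating,
    matching the principle that no tell is decisive on its own.
    The weights are illustrative assumptions."""
    clip = lambda v: max(0.0, min(1.0, v))
    return 0.35 * clip(latency) + 0.35 * clip(gaze) + 0.30 * clip(perplexity)
```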

What should recruiters do in the interview itself?

Three changes to interview design close the gap that the three signals leave open.

Change the question format. AI co-pilots are optimized for common interview prompts. "Tell me about a time you..." and "How would you design..." questions are their home turf. They struggle on questions that reference the specific interview context or require the candidate to commit to a position in dialogue. Ask questions that build on the candidate's previous answer. Ask the candidate to rate their own response and explain the rating. Ask the candidate to disagree with a premise you introduce mid-question.

Make them narrate a process. Most co-pilots are reactive: they answer the question asked. They are not good at narrating an ongoing process, because there is no question to transcribe. Ask the candidate to walk you through a decision they would make in real time, while you add constraints. The candidate who is reading cannot keep up.

Ask for a sketch, a diagram, or a code trace. Anything that requires a visual output creates a gap in co-pilot support. The candidate has to look away from the answer screen, produce something original, and explain it. Real candidates find this the easy part of the interview. Co-pilot users find it the hardest.

None of these are gotcha moves. They are good interview design regardless. The co-pilot era just makes them mandatory.

How does Veref handle AI co-pilots?

Three signals surface live in the recruiter console: a rolling-average response latency, a sustained gaze-off-axis indicator, and an LLM-perplexity score on the live transcript. Each feeds into the overall integrity score for the session. When any of the three crosses a configurable threshold, the session record captures a short clip around the event, and the recruiter sees the evidence alongside the score.
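Conceptually, the threshold-and-clip step is a rolling buffer that freezes when a signal fires. The sketch below is a hypothetical illustration of that pattern only; none of the names, thresholds, or buffer sizes reflect Veref's actual implementation.

```python
from collections import deque

class EvidenceRecorder:
    """Hypothetical sketch: keep a rolling buffer of recent frames and,
    when a signal crosses its configured threshold, freeze a short clip
    around the event for the recruiter to review. All names and sizes
    here are illustrative assumptions."""

    def __init__(self, fps: int = 30, clip_seconds: int = 10):
        self.buffer = deque(maxlen=fps * clip_seconds)
        self.clips = []

    def on_frame(self, frame, signals: dict[str, float],
                 thresholds: dict[str, float]) -> None:
        self.buffer.append(frame)
        fired = [k for k, v in signals.items() if v > thresholds.get(k, 1.0)]
        if fired:
            # Capture the surrounding window plus which signals fired.
            self.clips.append({"signals": fired, "frames": list(self.buffer)})
```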

The system does not reject candidates. It never has and it never will. Every flag is evidence for a human to weigh, consistent with the EU AI Act and with our own position that an AI system should not make adverse hiring decisions. What the system does is eliminate the "we did not know" defense. If the signals fire during the interview, the recruiter sees them and decides how to handle the call in real time.

The in-product default is to surface the signals without interrupting the flow. The recruiter sees a pulsing icon on the integrity dial. They can click through to see the evidence. The candidate does not see anything different from a normal Veref interview. The decision about whether to confront the candidate, reset the interview, or continue and flag for review belongs entirely to the interviewer.

See the full platform on Veref Interview or the underlying verification layer in the Veref Passport. If you want to see an AI co-pilot flagged live on a real interview flow, book a demo.

Sources and further reading

  1. Rise of real-time AI interview assistance tools · TechCrunch, 2024
  2. Perplexity as a signal of AI-generated text · Stanford CRFM, 2023
  3. Behavioral signals in remote interviews · MIT Sloan Management Review, 2024
  4. Final Round AI and real-time interview assistants · The Information, 2024

Frequently asked questions

Is an AI interview co-pilot always fraud?

Not always. Some candidates use AI for research, practice, or accessibility. The issue is undisclosed real-time use during a live interview. Disclose, and it stops being fraud; hide, and it is.

Can a deepfake detector catch an AI co-pilot?

No. Deepfake detectors look at the video and audio stream for synthesis artifacts. An AI co-pilot produces neither; the candidate's face and voice are real. You need behavioral signals instead.

Should we just ban AI use in interviews?

Banning without detection creates a paper policy nobody can enforce. Pair the policy with real-time signals so the recruiter knows when it happens and can address it on the call.

Will asking a candidate to share their screen solve this?

Partially. Many co-pilot tools run on a second device (phone or tablet) or display answers on a smart monitor that does not show up in screen share. Screen share is a deterrent, not a detector.

What is a reasonable response when an AI co-pilot is detected mid-interview?

Name the signal, not the verdict. Tell the candidate what you observed (for example, a multi-second latency pattern with off-screen gaze) and ask them to narrate their reasoning out loud for the next question. You get a reset, they get a chance to recover, and you preserve the evidence either way.

Ready to verify your next hire end-to-end?

See Veref on a 25-minute demo with your real candidate flow.

Book a demo