There’s a genre of video doing the rounds where someone asks a chatbot to write their interview answer, reads out the bland result, and concludes that AI is no good for promotion preparation. The demonstration is fair, and we’ve made the same point ourselves about why ChatGPT won’t get you promoted. The conclusion is wrong, though, and the reason is the part nobody films: what actually sits underneath a tool that scores you. So here’s a look under our bonnet, and a short education in how this kind of AI really works.
A schema is the difference between a chat and a measurement
Start with the word schema, because it’s where serious AI parts company with a chat box. A schema is a strict, predefined structure that the model is forced to fill in. Instead of letting it reply with a paragraph of whatever sounds good, you require it to return its judgement as defined fields, each with a type and a rule: a score from one to five in one field, and the evidence for that score in another. The software then checks that the structure is complete and valid before accepting it.
A chat box is free text. A schema is a form the model has to complete properly, every time, with no room to waffle or flatter. That single constraint is what lets you treat a model’s judgement as data you can rely on, rather than an opinion you have to take on trust.
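Here’s a minimal sketch in Python of the “check it before you accept it” step. It assumes the model has been asked to reply in JSON with two fields, `score` and `evidence`. Those names, the one-to-five range and the `parse_judgement` helper are hypothetical illustrations, not our production schema. In practice a JSON Schema or a validation library often does the same job.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("score", "evidence")  # hypothetical field names, for illustration


@dataclass(frozen=True)
class Judgement:
    """One scored judgement: a score from 1 to 5 plus the evidence behind it."""
    score: int
    evidence: str


def parse_judgement(raw: str) -> Judgement:
    """Accept the model's reply only if it fills every field of the schema correctly."""
    data = json.loads(raw)  # free text fails here: JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    score, evidence = data["score"], data["evidence"]
    # bool is a subclass of int in Python, so rule it out explicitly
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("score must be a whole number from 1 to 5")
    if not isinstance(evidence, str) or not evidence.strip():
        raise ValueError("evidence must be a non-empty string")
    return Judgement(score=score, evidence=evidence.strip())


if __name__ == "__main__":
    print(parse_judgement('{"score": 4, "evidence": "Named the stakeholders and the trade-off."}'))
    try:
        parse_judgement("What a brilliant answer, you're ready!")
    except ValueError as err:
        print("rejected:", err)
```

A paragraph of praise never gets past `json.loads`. A reply with a score of 7, or with no evidence, is rejected rather than quietly accepted. The sketch leaves out what happens next, such as asking the model again or flagging the answer, because that is a separate design choice.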
A rubric is the standard, written down
A schema is the shape. A rubric is the standard that goes inside it. A rubric is your assessment criteria turned into explicit, encoded instructions: exactly what’s being judged, and what each level of performance looks like. Encode the rubric and the model stops marking against a vague memory of frameworks it once read online, and starts marking against a fixed standard, the same one, every single time.
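Here’s one way a rubric can be encoded, sketched in Python under stated assumptions. It uses a single criterion with a written descriptor for each level from one to five, and turns them into the same fixed marking instructions every time. The `Rubric` class, the criterion and the level descriptors are invented placeholders for illustration. They are not CVF wording and not our real rubrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rubric:
    """A hypothetical encoded rubric: what is judged, and what each score level looks like."""
    criterion: str
    levels: dict[int, str]  # score -> descriptor for that level of performance

    def __post_init__(self) -> None:
        # Every score the schema allows must have a written descriptor, with no gaps.
        if sorted(self.levels) != [1, 2, 3, 4, 5]:
            raise ValueError("rubric must describe every level from 1 to 5")

    def to_instructions(self) -> str:
        """Render the rubric as marking instructions: same rubric in, same text out."""
        lines = [f"Judge only this criterion: {self.criterion}",
                 "Score the answer against these level descriptors:"]
        lines += [f"{score}: {self.levels[score]}" for score in sorted(self.levels)]
        lines.append("Quote the part of the answer that justifies your score.")
        return "\n".join(lines)


# Placeholder descriptors for illustration only, not CVF wording.
EXAMPLE = Rubric(
    criterion="explains the reasoning behind a decision",
    levels={
        1: "No reasoning given.",
        2: "Reasoning asserted but not explained.",
        3: "Explains the main reason for the decision.",
        4: "Explains the reasons and weighs an alternative.",
        5: "Explains the reasons, weighs alternatives and reviews the outcome.",
    },
)

if __name__ == "__main__":
    print(EXAMPLE.to_instructions())
```

The instructions are built from data rather than retyped, so every answer is marked against identical text. The check in `__post_init__` also stops a rubric with a missing level from being used at all.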
Get the rubric right and two officers who give the same quality of answer get the same score. Get it wrong, or skip it entirely, and you’ve got a chatbot guessing. The work is in writing rubrics that actually match how a panel marks, then testing them against real answers that have already been scored, and adjusting until the machine agrees with the humans. In this field that testing is called evaluation, and it’s most of the job.
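Evaluation can start as simply as lining up the model’s scores against the scores a panel already gave the same answers. The Python sketch below assumes scores on a one-to-five scale and reports three common measures: exact agreement, agreement within one point, and the average gap. The names and the choice of measures are illustrative assumptions, not a description of our test suite. More formal work often adds a chance-corrected statistic such as Cohen’s kappa.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    exact: float          # share of answers where model and panel gave the same score
    within_one: float     # share where they differ by at most one point
    mean_abs_diff: float  # average size of the disagreement, in score points


def agreement(model_scores: list[int], panel_scores: list[int]) -> Agreement:
    """Compare model scores with the scores a human panel already gave the same answers."""
    if len(model_scores) != len(panel_scores) or not model_scores:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [abs(m - p) for m, p in zip(model_scores, panel_scores)]
    n = len(diffs)
    return Agreement(
        exact=sum(d == 0 for d in diffs) / n,
        within_one=sum(d <= 1 for d in diffs) / n,
        mean_abs_diff=sum(diffs) / n,
    )


if __name__ == "__main__":
    result = agreement([3, 4, 2, 5, 3], [3, 4, 3, 3, 3])
    print(f"exact {result.exact:.0%}, within one {result.within_one:.0%}, "
          f"average gap {result.mean_abs_diff:.1f}")
```

Take five answers where the model scored 3, 4, 2, 5, 3 and the panel scored 3, 4, 3, 3, 3. That gives 60% exact agreement, 80% within one point, and an average gap of 0.6 points. Re-running the same comparison after each rubric change shows whether the change moved the machine closer to the humans or further away.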
This is the same craft behind some very large projects
None of this is exotic. It’s the standard discipline behind much of the most serious AI work being done anywhere. DeepMind’s AlphaFold predicted the three-dimensional shape of almost every protein known to science and won its makers a share of the 2024 Nobel Prize in Chemistry. It is structured prediction, not a chat. Tools that turn millions of messy documents into clean records typically do it by forcing the model into a strict schema. And a common way for the big AI labs to grade their own models’ output is a rubric and a structured score, the very technique we use to grade an answer.
We’re not pretending to be DeepMind. Those are enormous, expensively funded projects, and ours is a small one. But it’s the same craft on a much smaller canvas: a schema, a rubric, grounding in real data, and relentless testing against known good results. The distance between that and a sentence typed into a chatbot is the distance between an instrument and a guess.
One engine, every tool
That craft runs through the whole platform, not one feature. Every tool is the same engine pointed at a different task. When you build a STARR or PEEL answer, the review scores it descriptor by descriptor against the Competency and Values Framework (CVF) at your target rank. The coaching takes a single section and tells you what to change and why, against the same standard. The question bank generates practice shaped by your force’s real inspection picture. STANCE scores a presentation the same structured way. And the mock board runs the whole thing live, in voice, probing what you actually said and marking it.
Different surfaces, one discipline underneath: a defined schema, an encoded rubric, your rank, your force, checked against real graded answers. That’s what we mean when we say we built something. It isn’t a chatbot with a police theme. It’s an assessment engine, and the policing judgement inside it was put there by serving officers who’ve sat on the panels it imitates.
Why it was hard, and why that’s the point
Here’s the honest part. This was hard: a rubric for every competency at every rank, a schema for the model to fill, your force’s inspection data wired in, a voice pipeline that listens and speaks, and months of testing every score against answers a real panel had already marked. The chatbot in that video involved none of that work, which is exactly why it can sound confident and still have no idea whether you’d pass.
That effort is exactly what you’re paying for. Not access to AI; you’ve already got that for free in your pocket. What you’re buying is the engineering between you and the model: the standard it holds you to, the rank it judges you at, the force context it knows, and the honesty that comes from a fixed rubric rather than a tool built to please you. A tool built to please will tell you your answer is excellent. A fixed rubric will tell you the truth, which is the only feedback worth having before the day.
You can sit a full AI mock board and practise your answers out loud. Every score you get comes from the engine described above, not from a paragraph a chatbot wrote to make you feel ready.
For the full picture of the process, from how the NPPF works to what panels score at each rank, see the complete guide to UK police promotion boards.