How does AI score a police promotion board answer?

A purpose-built assessment tool forces the model to return its judgement in a fixed structure rather than free text, and marks your answer against an encoded rubric: the CVF 2024 descriptors or the Met's future-focused indicators at your rank, with the evidence for each mark. Logic in the software caps the score of an answer pitched below your target rank, and your force's inspection context is fed in. It's structured scoring, not a chatbot opinion.

Is AI accurate enough to assess a promotion board?

A general chatbot isn't, because it has no fixed standard and is built to please you. A system built for the job is a different thing: a schema, an encoded rubric, and testing against answers a real panel has already scored. The same engineering discipline grades the outputs of frontier AI models and predicts protein structures. A promotion answer is a far smaller problem.

How AI Scoring Actually Works for a Police Promotion Board

Q: Does State6 write your promotion board answer for you?

No. State6 never writes your answer. You write it and you speak it, from your own anonymised evidence. The AI scores and coaches the answer you give, descriptor by descriptor against the CVF 2024 or the Met's own model at your target rank. Using AI to write an answer you then submit is exactly what panels catch, which is why the tool is built to assess yours, not to author one.

There’s a genre of video doing the rounds where someone asks a chatbot to write their interview answer, reads out the bland result, and concludes that AI is no good for promotion preparation. The demonstration is fair, and we’ve made the same point ourselves about why ChatGPT won’t get you promoted. The conclusion is wrong, though, and the reason is the part nobody films: what actually sits underneath a tool that scores you. So here’s a look under our bonnet, and a short education in how this kind of AI really works.

What is a schema and why does it matter for AI scoring?

Start with the word schema, because it’s where serious AI parts company with a chat box. A schema is a strict, predefined structure that the model is forced to fill in. Instead of letting it reply with a paragraph of whatever sounds good, you require it to return its judgement as defined fields, each with a type and a rule: a score from one to five in one field, and the evidence for that score in another. The software then checks the structure is complete and valid before it accepts it.

A chat box is free text. A schema is a form the model has to complete properly, every time, with no room to waffle or flatter. That single constraint is what lets you treat a model’s judgement as data you can rely on, rather than an opinion you have to take on trust.
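To make the idea concrete, here is a minimal sketch of a schema check in Python. It is an illustration only, not the code our engine runs: the field names (`descriptor`, `score`, `evidence`) and the function `parse_judgement` are hypothetical, chosen to mirror the one-to-five score and the evidence field described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorScore:
    # Hypothetical field names, used for illustration only.
    descriptor: str  # which descriptor was judged
    score: int       # whole number from 1 to 5
    evidence: str    # the part of the answer that justifies the score


def parse_judgement(raw: dict) -> DescriptorScore:
    """Accept the model's reply only if it is complete, typed and in range."""
    expected = {"descriptor", "score", "evidence"}
    if set(raw) != expected:
        raise ValueError(f"fields must be exactly {sorted(expected)}")

    score = raw["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("score must be a whole number from 1 to 5")

    for name in ("descriptor", "evidence"):
        if not isinstance(raw[name], str) or not raw[name].strip():
            raise ValueError(f"{name} must be non-empty text")

    return DescriptorScore(raw["descriptor"].strip(), score, raw["evidence"].strip())
```

A reply with a missing field, a score of six, or an empty evidence box raises an error instead of being passed on, which is the difference between a form and a paragraph.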

What is an AI rubric?

A schema is the shape. A rubric is the standard that goes inside it. A rubric is your assessment criteria turned into explicit, encoded instructions: exactly what’s being judged, and what each level of performance looks like. Encode the rubric and the model stops marking against a vague memory of frameworks it once read online, and starts marking against a fixed standard, the same one, every single time.
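As a rough sketch of what “encoded” means, a rubric can be held as plain data and turned into the same instructions on every run. Everything in the example is a placeholder: the descriptor name and level wording are invented for illustration and are not taken from the CVF 2024 or the Met’s model, and `render_rubric` is a hypothetical helper.

```python
# Placeholder rubric: the wording is invented, not real framework text.
RUBRIC = {
    "descriptor": "Example descriptor (placeholder)",
    "levels": {
        1: "Placeholder: no relevant evidence offered.",
        3: "Placeholder: relevant evidence, pitched below the target rank.",
        5: "Placeholder: strong evidence, pitched at the target rank.",
    },
}


def render_rubric(rubric: dict) -> str:
    """Turn the rubric into fixed instructions, identical on every call."""
    lines = [f"Judge only this descriptor: {rubric['descriptor']}"]
    for level in sorted(rubric["levels"]):
        lines.append(f"Score {level}: {rubric['levels'][level]}")
    lines.append("Give one score and quote the evidence for it.")
    return "\n".join(lines)


print(render_rubric(RUBRIC))
```

Because the instructions are built from stored data rather than retyped, every answer is marked against the same wording.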

Get the rubric right and two officers who give the same quality of answer get the same score. Get it wrong, or skip it entirely, and you’ve got a chatbot guessing. The work is in writing rubrics that actually match how a panel marks, then testing them against real answers that have already been scored, and adjusting until the machine agrees with the humans. That testing has a name in this field, evaluation, and it’s most of the job.
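One simple way to picture evaluation is to compare the machine’s scores with the panel’s on the same answers and count how often they agree. The sketch below assumes both sets of scores are whole numbers on the same one-to-five scale; the function name `agreement` and the three measures it reports are our choices for illustration, not a description of the exact metrics used in practice.

```python
def agreement(panel: list[int], model: list[int]) -> dict:
    """Compare model scores with panel scores for the same answers."""
    if not panel or len(panel) != len(model):
        raise ValueError("need two non-empty score lists of equal length")

    # Absolute gap between the panel's mark and the model's mark, per answer.
    diffs = [abs(p - m) for p, m in zip(panel, model)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,       # same score
        "within_one": sum(d <= 1 for d in diffs) / n,  # at most one mark apart
        "mean_abs_diff": sum(diffs) / n,               # average gap in marks
    }


# Made-up scores for four answers, purely to show the shape of the result.
print(agreement(panel=[3, 4, 2, 5], model=[3, 3, 2, 3]))
```

When the numbers are poor, the rubric wording is what gets adjusted, and the comparison is run again.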

This is the same craft behind some very large projects

None of this is exotic. It’s the standard discipline behind the most serious AI work being done anywhere. The system that predicted the three-dimensional shape of almost every protein known to science, and earned its makers a share of the 2024 Nobel Prize in Chemistry, is structured prediction, not a chat. The tools that turn millions of messy documents into clean records do it by forcing the model into a strict schema. And the way the big AI labs grade their own models is a rubric and a structured score, the very technique we use to grade an answer.

We’re not pretending to be DeepMind. Those are enormous, expensively funded projects, and ours is a small one. But it’s the same craft on a much smaller canvas: a schema, a rubric, grounding in real data, and relentless testing against known good results. The distance between that and a sentence typed into a chatbot is the distance between an instrument and a guess.

One engine, every tool

That craft runs through the whole platform, not one feature. Every tool is the same engine pointed at a different task. When you build a STARR or PEEL answer, the review scores it descriptor by descriptor against the CVF 2024 or the Met’s own model at your target rank. It also judges whether the example you reached for actually fits the question, telling you when your story or angle is a strong fit, a workable one, or simply the wrong choice for what was asked. The coaching takes a single section and tells you what to change and why, against the same standard. The question bank generates practice shaped by your force’s real inspection picture. STANCE structures a presentation and coaches it section by section against the same standard. And the mock board runs the whole thing live, in voice, probing what you actually said and marking it.

Different surfaces, one discipline underneath: a defined schema, an encoded rubric, your rank, your force, checked against real graded answers. That’s what we mean when we say we built something. It isn’t a chatbot with a police theme. It’s an assessment engine, and the policing judgement inside it was put there by serving officers who’ve sat on the panels it imitates.

Why it was hard, and why that’s the point

Here’s the honest part. This was hard. A rubric for every competency at every rank, a schema for the model to fill, your force’s inspection data wired in, a voice pipeline that listens and speaks, and months of testing every score against answers a real panel had already marked. The chatbot in that video took none of that work, which is the whole reason it can sound confident and still have no idea whether you’d pass.

That effort is exactly what you’re paying for. It isn’t access to AI: you’ve already got that for free in your pocket. What you’re buying is the engineering between you and the model: the standard it holds you to, the rank it judges you at, the force context it knows, and the honesty that comes from a fixed rubric rather than a tool built to please you. The tool built to please you will tell you your answer is excellent. The fixed rubric will tell you the truth, which is the only feedback worth having before the day.

You can sit a full AI mock board and practise your answers out loud, and every score you get comes from the engine described above, not from a paragraph a chatbot wrote to make you feel ready.

For the full picture of the process, from how the NPPF works to what panels score at each rank, see the complete guide to UK police promotion boards.
