The world's most-read Scottish politics website

Wings Over Scotland


On review scoring

Posted on August 01, 2010 by

It's one of the most-observed truths of videogame reviewing that the entire concept of scoring is, as practised almost universally in all forms of current print, broadcast and online media, fundamentally broken.

Everyone knows that the marks awarded in game reviews – whether out of five stars, ten points or 100% – are not in fact sequential numbers as we were taught them in arithmetic lessons, but abstract ciphers whose true value is heavily encoded. In videogame reviewing, 4 isn't any bigger than 2, 6=7, and 10 is more than twice as many as 9.

And therefore – since the sole and entire point of scoring is to attach an instantly comprehensible numerical summary of the reviewer's opinion to the text – videogame review scores are functionally almost meaningless.

There are certain phenomena that we know to be almost universally true:

—————————————————————————-

1. In scoring systems marking out of 10, the vast majority of scores clump around the scores 6, 7 and 8. (For the sake of clarity, this feature will generally use marks out of 10 as the default reference.) Similarly, with percentage-marking systems, most games score between 51% and 80%. (Which – rather than 60% to 80% – is the true equivalent range.)

2. In scoring systems marking out of five (or less), maximum scores (ie 5/5) are commonplace. Yet in systems marking out of 10 or 100, they're almost unheard of. This isn't very rational. Marking out of 10 only doubles the fineness of division, so there should be roughly half as many games scoring 10/10 in a 10-point system as there are scoring 5/5 in a five-point system (with the other half getting 9/10).

Yet something like Edge, which has a declared policy of cherry-picking only the best games each month, still only awards a 10 roughly once per 250 games reviewed. There seems to be a disproportionately huge invisible glass barrier between 9 and 10 in a way that there isn't between 4 and 5, and another between 99% and 100% that's so vast as to be almost infinite.

(Edge has reviewed somewhere in the vicinity of 3000 games in its 17-year life. If we assume that their cherry-picking policy selects only, say, the best 40% of games to start with, those 3000 reviews represent a pool of around 7500 games. An evenly used ten-point scale would award a 10 to the top tenth of that pool, which suggests that approximately 750 games should have had a 10. The actual number is 12.)

3. Despite the above, there are always far more games clustered around the 9 (or equivalent) mark than there are in the entire range from 5 down to 1. It seems to be far, far easier to break into the 9/10 club than it is to score 5 or less in a 10-point system, yet astonishingly hard to take that one extra step.

4. A similar marginal level of distinction doesn't apply in the bottom half of the scoring range. To most modern reviewers (and readers), the scores 5/10 and 1/10 are basically interchangeable. In other words, any system marking out of 10 or more is in fact really marking out of three:

"Definitely buy this" (encompassing scores of 9 and 10. Strangely, in a percentage system the cut-off here is 90% and above, even though the actual equivalent should be 81% and above).

"Definitely don't buy this" (encompassing anything from 1 to 5).

"Um, it's sort of in the middle. Buy this if this is the sort of thing you usually like buying", (encompassing scores from 6 to 8, and basically not a review at all).

(What this means is that systems marking out of more than five are more often than not actually LESS precise than those marking out of five, in which the full range is more widely employed.)

—————————————————————————-
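The mark-to-band arithmetic in points 1 and 4 can be made concrete. What follows is a minimal sketch, assuming whole-number percentages and marks out of 10, with each mark covering a ten-point slice of the percentage scale. The function names and verdict labels are illustrative paraphrases, not any publication's actual scheme:

```python
import math


def ten_point_equivalent(percent: int) -> int:
    """Map a whole-number percentage (1-100) onto a mark out of 10.

    Each mark covers a ten-point slice: 1-10% -> 1, 11-20% -> 2, ..., 91-100% -> 10.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percentage must be between 1 and 100")
    return math.ceil(percent / 10)


def effective_verdict(mark: int) -> str:
    """Collapse a mark out of 10 into the three verdicts readers actually take from it."""
    if not 1 <= mark <= 10:
        raise ValueError("mark must be between 1 and 10")
    if mark >= 9:
        return "definitely buy"
    if mark <= 5:
        return "definitely don't buy"
    return "sort of in the middle"


# The 6-8 clump corresponds to 51%-80%, and a 9 begins at 81%, not 90%.
clump = [p for p in range(1, 101) if 6 <= ten_point_equivalent(p) <= 8]
first_buy = min(
    p for p in range(1, 101)
    if effective_verdict(ten_point_equivalent(p)) == "definitely buy"
)
print(clump[0], clump[-1], first_buy)  # 51 80 81

# However many marks the scale offers, only three distinct verdicts come out.
print(len({effective_verdict(m) for m in range(1, 11)}))  # 3
```

Chaining the two functions shows why a ten-point scale used this way carries less information than a fully used five-point one: ten inputs, three outputs.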

Odd, isn't it? There are two core reasons for these phenomena, and we'll very quickly deal with the less-interesting one first: corruption.

Corruption takes two forms: hard and soft. The "hard" variety is much rarer, and involves direct bribery of some sort – either in the form of a reviewer or editor being offered money or other incentives by the game's publisher, or the magazine being offered advertising placement (or removal), or other editorial benefits (eg exclusive access) conditional on the game's score. All of these things go on depressingly regularly, but they're much less common than the second form of corruption.

"Soft" corruption is when reviewers self-censor, not as a result of any direct instruction or threat but simply as a means to make their own lives easier by not causing what they regard as needless bad feeling.

That is, if you have to review Game X and it's absolutely terrible, but you know that Game X's publisher is about to announce Game Y which you want privileged access to, you might find yourself giving Game X a score of 5/10 – safe in the knowledge that that will be enough to put your readers off buying it, while much less likely to make the publisher apoplectic with rage than the 1 or 2 score that it really deserves.

 

But neither of those is what I want to talk about. The second reason that review scores are so hopelessly debased is that reviewers have absolutely no grasp of the fundamental purpose of marking.

The purpose of having a marking scale is to compare and sort games against each other, so that you can give your readers advice on which to buy. It's the only criterion that makes any sense, because even if we leave aside the vagaries of personal taste there is no such thing as an absolute empirical measure of game quality – it's a constantly changing baseline.

(If you judged every game against the standard of the entire history of videogames, for example, you'd have to give everything 10/10, because even the crappiest 400-point XBLA game is immeasurably superior to, say, Video Pinball. And the baseline changes even if you stay within a single format – the Spectrum games of 1992 were incomparable to the ones of 1982.)

Now, very stupid people sometimes point out that it's appropriate to cluster things uselessly around the middle, because of the bell curve. But this is a disastrously wrong-headed notion, because the bell curve method of grading is an artificial process specifically designed to clump its constituent elements around the middle, which is exactly what review scoring ISN'T supposed to do.

(Because they're very stupid, incidentally, what these people are almost certainly doing is confusing bell-curve grading with the other form of the bell curve, the normal-distribution graph. This is equally inapplicable because it refers to naturally-occurring phenomena such as human height, whereas there is no such thing as a naturally-mediocre videogame. You can't change your height, but you can employ more QA testers or make your fricking cutscenes skippable.)

The bell curve is completely the wrong model for reviewing – any idiot can say "most things are sort of average with only a few examples at either extreme", because everyone already knows that. But a reviewer's SOLE AND ENTIRE REASON FOR EXISTING is to separate out that clump of things and tell people which ones are most worthy of their limited time and money.

If people invest precious moments of their lives reading your reviews, the absolute minimum they should be able to expect is that they should come out at the other end knowing more than they did when they started. If all they've gleaned from your 1500-word review is "most things are in the middle with only a few examples at either extreme", then you've wasted their time, because they ALREADY KNEW THAT.

So a review scale should be spread as evenly as possible in order to achieve the greatest possible amount of distinction between games. That is, for a scoring system to have ANY worthwhile meaning at all, there should be roughly the same number of games in each division on the scale at any given point in time. If instead you've just clumped everything in the middle and left readers to judge for themselves which of 900 different 8-rated games they should buy, you've failed.
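One way to get that even spread is to score by rank rather than by gut feel. The reviewer orders the games from worst to best, and the ordering is then cut into equal-sized slices. What follows is a minimal sketch, assuming the reviewer can produce such an ordering and that titles are unique. The function and game names are hypothetical:

```python
from collections import Counter


def spread_scores(ranked_games: list[str], divisions: int = 10) -> dict[str, int]:
    """Give every game a mark so that each division holds (roughly) the same number of games.

    `ranked_games` must already be ordered from worst to best by the reviewer's judgement,
    and titles are assumed unique; a mark depends only on a game's position in the ordering.
    """
    n = len(ranked_games)
    # position * divisions // n slices the ordering into `divisions` near-equal bands
    # (band sizes differ by at most one when n is not a multiple of divisions).
    return {game: position * divisions // n + 1 for position, game in enumerate(ranked_games)}


# Hypothetical titles, listed from worst to best.
games = [f"Game {i}" for i in range(1, 21)]
scores = spread_scores(games)
print(Counter(scores.values()))  # every mark from 1 to 10 is used exactly twice
```

The hard part, deciding the order, stays with the reviewer. The scale simply refuses to let the marks pile up in the middle.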

(As a theoretical ideal, reviewing would take the form of a single chart comprising the total number of games reviewed in the publication's entire lifespan, with each new review being assigned a unique position on the chart. If 576 games had been reviewed and a new game was deemed the best ever, its "score" would be 1/577. Sadly this approach is slightly impractical, at least in print magazines, due to the arguable need to categorise games by genre. You'd end up with a hefty slab of pages every month occupied by numerous charts of what was basically reprint.)
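The single-chart ideal amounts to an ordered list with insertion. Each new review slots in at its rank, and its "score" is that rank over the new total. What follows is a minimal sketch, assuming each game can be given a numeric merit key that reflects the reviewer's ordering. The class and the keys are illustrative, not any publication's actual system:

```python
import bisect


class RankedChart:
    """A single all-time chart in which every reviewed game holds a unique position.

    Each game is assumed to carry a numeric merit key reflecting the reviewer's
    ordering (higher = better); only the resulting position is ever published.
    """

    def __init__(self) -> None:
        self._keys: list[float] = []    # ascending, worst game first
        self._titles: list[str] = []    # kept parallel to self._keys

    def add(self, title: str, key: float) -> str:
        """Slot a newly reviewed game into the chart and return its 'score' as position/total."""
        # bisect_left places a tied newcomer just below the game(s) already holding that key.
        index = bisect.bisect_left(self._keys, key)
        self._keys.insert(index, key)
        self._titles.insert(index, title)
        total = len(self._keys)
        return f"{total - index}/{total}"  # position 1 is the best game ever reviewed

    def top(self, n: int) -> list[str]:
        """Return the n best titles, best first."""
        return self._titles[::-1][:n]


chart = RankedChart()
for i in range(576):  # 576 earlier reviews with hypothetical keys 0..575
    chart.add(f"Game {i}", float(i))

score = chart.add("New best game", 1000.0)
print(score)  # 1/577
```

The `top` method is the monthly-chart view. Printing it per genre would produce the reprint-heavy slab of pages that makes the idea impractical in print.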

The thing that sparked these thoughts off today was some Sunday morning idle time, which I whiled away by examining the reviews of an iPod gaming site. The site marked out of four, which is just about the least useful grading system you could possibly imagine, but the implementation made it even worse. Out of 1063 games "reviewed", the marks broke down like this:

1 ("Avoid"): 55
2 ("Caution"): 273
3 ("Good"): 471
4 ("Must Have"): 264

Now, to establish exactly how useless this system is, first we have to look at the games that get 2/4 ("Games in this category should not be avoided in all cases. Some players will find value in them", which is an American translation of the classic "If you like this sort of thing, this is the sort of thing you'll like").

How exactly is someone supposed to exercise "caution" over buying an iPod game? If the game has a Lite version, we don't need some muppet stating the bleeding obvious by telling us that we should probably check that out first if we're not sure. And if it doesn't have a Lite, then there's no way of being cautious. We can either buy the game or not buy it – there's no "buy it cautiously" option available on the App Store.

(What that score is in fact telling us, more or less explicitly, is "Go and check out some other people's reviews, because ours just wasted your time.")

So we can immediately bin the 2/4 scores, because they offer us no help whatsoever in making a buying decision. That leaves us with 790 reviews, of which 735 – or 93% – come with a buying recommendation. ("This rating is our seal of approval", says the site of its 3/4 "Good" mark.)

Well, thanks. You've really filtered the wheat from the chaff for us there, guys. Who knew that only 13 out of every 14 iPod games were worth buying? Where would we have been without you?
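Those figures are easy to re-derive from the breakdown. What follows is a short sketch with illustrative variable names. For contrast, it also computes what an evenly spread four-point scale would look like, in the spirit of the even-spread argument above:

```python
from fractions import Fraction

# Mark breakdown of the iPod gaming site, as quoted above (1 = "Avoid" ... 4 = "Must Have").
breakdown = {1: 55, 2: 273, 3: 471, 4: 264}

total = sum(breakdown.values())              # all reviews
decisive = total - breakdown[2]              # bin the 2/4 "Caution" scores
recommended = breakdown[3] + breakdown[4]    # "Good" plus "Must Have"
share = Fraction(recommended, decisive)

print(total, decisive, recommended)          # 1063 790 735
print(f"{float(share):.0%}")                 # 93%
print(share.limit_denominator(14))           # 13/14, the closest fraction with denominator <= 14

# For contrast: an evenly spread four-point scale would hold about a quarter of reviews per mark.
print(round(total / len(breakdown)))         # 266
```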

As a concept, giving reviews scores is brilliant. It's informative and a great time-saver in today's busy world, and it makes things like Metacritic (another terrific idea in theory) possible. Writers bemoan them at every turn, because they seem to make all the writer's finely-crafted words an afterthought to a number, but the scores aren't the problem. It's the idiots awarding them.

Responses to “On review scoring”

  1. Derek says:

    I still think Pinball Scoring is the only way to go. With New High Score occasionally.

  2. Marc says:

    I like your game ranking idea: "if you're going to buy games, buy them in this order." Also, discerning whether game Y is better or worse than game X makes a lot more sense than game Y is 1% better than game X. This reminds me of ACE magazine (I think) that had scores out of 1000.
     
    It poses the question: can the ranking system work if your site has multiple reviewers?
     
    I gave your post 4/5, by the way. It didn't quite break 'the invisible wall' 😉

  3. Marco Gazpacho says:

    If this article is the sort of thing you'll like, then you'll like this. 73%.

  4. CheapSheep says:

    Sounds a bit like the Amiga Action Power League (was that what it was called?) to me.

  5. tssk says:

    The sad thing is that magazine reviews seem to have gone backwards since the Amiga Power/Zzap/Crash days of yore. (Of course even Zzap, which I was a big fan of, failed its readers sometimes, most notably by not pointing out that Thalamus was run by Newsfield, and the Operation Thunderbolt C64 review debacle.)
    Nowadays the only highly critical stuff in print media for games can be found in Retro Gamer because, let's face it, people are much more frank once things are 10-20 years dead in the market.
    Amiga Power was the last mag where I felt that the journos were on the consumer's side. (Arcade had its moments though, especially big write-ups of games they loved.)

  6. VLII says:

    "the Spectrum games of 1992 were incomparable to the ones of 1982"
    Which were better?  I can't think off-hand of any Speccy classics that were released in 1992, but likewise there were a hell of a lot of supremely crude "I MADE A GAEM" crap in the early years.

  7. Tom Camfield says:

    Excellent article, I was hoping for it to run a little longer and come up with a solution.
    As far as I can see you've come up with two workable solutions that mags tend to use anyway. Take PC Gamer, near the back they always have a list of the best games for each genre then two or more that are just beneath it. PC Gamer also has a regular top 100 which helps to order the games.
    Is that a good enough solution, or should there be something more like GamesTM where at the end of each review there's a "better than" and "worse than" comparison (or at least there used to be)? Or, indeed, should the whole review be about comparing it to other games within the genre, how it handles itself compared to them?
    Is the widespread use of top 100s an antidote to badly applied review scores or does the reviewing itself have to change?

    @tssk: Actually, I don't think Retro Gamer's entirely off the hook. It's pretty lenient when it comes to ratings, and then every now and again it'll absolutely slam something (such as PMCE for iPod); the homebrew section's also scored insanely highly throughout, and the writer admitted on the RG forum he tends to think of 70% as an 'average'.
    @Tom: A solution is to award games the rating they deserve and to use the full range. If something is unmitigated shit, give it 1/10. If it's brilliant, give it 10. Sadly, most publishers get scared of the former: I wrote a couple of reviews for a publication that shall remain nameless, but the editor went mental when I tried to give a rubbish game a 2. His argument: he'd read my review, bought the game himself and it worked fine. Therefore, it was worth "at least a 4". No matter that it was utter bollocks.

  9. DG says:

    RG is also the magazine that gave the "Ultimate" Sega collection 98% despite you not being able to play Sonic 3 + Knuckles lock on with it.
    Sega of course released said item for XBLA very shortly afterwards.

  10. Rev. Stuart Campbell says:

    I'm pretty sure RG uses a scoring system that starts at 84%.

  11. CdrJameson says:

    On the 'try cautiously' mark, some US scoring systems have the baffling concept of a 'rent it' recommendation, with hilarious consequences when applied to downloadables.
    Personally, I still mentally work on the Zzap/Crash model. Games are one of:
    – Stay up way past bedtime.
    – Looking forward all day to playing later.
    – Something to try when there's nothing else on
    – Possibly get round to if I have a long, but not too debilitating illness
    – Real Crap

  12. The Owl says:

    I find it hard to believe that someone managed to mention ("Michael Jackson" -Ed) in these comments in a vaguely positive way given the approach they took to marking games.

  13. DG says:

    I don't believe Craig ever wrote for them but I've heard a very very similar story to his with regard to ("Michael Jackson", – ed)  so consider that redressing the balance.

  14. Irish Al says:

    Is it not the case that rampant payola and the threatened withdrawal of ad revenue and exclusives make any scoring method in any mainstream print or web publication so skewed as to be fairly useless?

  15. asdasdasd says:

    link to escapistmagazine.com

    Reading the above review made me think of this article. Points awarded for :
    – if you like this sort of thing, you'll like this.
    – if you don't like this sort of thing, think about how you don't like this sort of thing when considering whether to buy it.
    you may want to rent it instead.
    – making the assertion that the game is 'more of' its prequel fully five times across its 750-odd words.
    – the use of the word 'gameplay' twice in the same sentence.

  16. bedroomcoder says:

    Anyone remember Simon Kirrane giving Micro Machines 2 100% for playability?
    How could you justify giving anything 100%? Seriously..?

  17. Mr Lizard says:

    One reason most review scores cluster around 51%-80% is that in the main, reviewers are still scoring for competence as well as quality.
    The equivalent would be a film review by Barry Norman in the Radio Times that says "The actors keep forgetting their lines and I swear I saw a boom mike in shot in one scene. One star."

  18. Rev. Stuart Campbell says:

    Or more accurately, "the actors DON'T forget their lines and I DIDN'T see any boom mikes, therefore it gets at least 3/5 straight away".

  19. Darran says:

    Our scores start at 84%? Tosh.
    The thing with Retro Gamer is that we have a very limited amount of space, so I mainly tend to cover the good stuff I enjoy playing. Most of our readers also know what we like so will typically change a score because they know what our personal preferences are, just like in the days of old.
    I'll admit we sometimes get things wrong (I awarded Pac-Man Championship 4/5 recently for a bookazine) but hey, we're only human and sometimes make mistakes. We do take a harsher viewpoint now, but as we tend to review the best titles it's not really noticeable 😉

  20. Andrew says:

    Are you familiar with monitor gamma values? (Bear with me here.) The brightness of a pixel on your monitor is the RGB value to the power of 2.5 (ish). It's done for dumb historical reasons, but one benefit is that we concentrate the 255 available divisions around the dark colours where the human eye is better at distinguishing tones.
    Your anti-bell-curve system, where 1% of games get 1%, 1% get 2% and so on means the divisions are clustered around 50% — because "most things are sort of average with only a few examples at either extreme". So, the difference between a 50th percentile game and a 51st percentile game is tiny compared to the difference between 1st and 2nd percentile games. It's all very noble to separate out the clump, but it's of no use to the reader, because he can't afford to buy the best 1% of all games. He can afford, at a push, the best .1% of all games. It would be far more useful to score the top 0.5% from 1-10 and give everything else zero.
    The theoretical answer to this is to set gamma less than one, so the average game gets maybe 25%, 0-1% is a huge change and 99-100% is a tiny change. But people won't intuitively understand that. People intuitively understand the bell-curve, and it does cluster marks around 100% (albeit at the cost of also clustering them around 0%).
    Or, instead of a score, give each game a salary — "Game X is worth buying if you earn more than $53,000/year ($47,000 for fans of the genre)".

    • Rev. Stuart Campbell says:

      “It’s all very noble to separate out the clump, but it’s of no use to the reader, because he can’t afford to buy the best 1% of all games. He can afford, at a push, the best .1% of all games.”

      Then he buys the tenth of the best 1% that he’s interested in. If he’s into arcade games he’s not going to care about even a 99%-rated flight sim.

  21. MattyFTM says:

    I have a comment on the criticism that 93% of that site’s ratings are high – I feel that this is an inevitable trend in any review site or magazine, and one that isn’t necessarily a bad thing. Ultimately, a review outlet is going to have limited resources. They can only review a limited number of the games that are out there. It’s perfectly rational for them to focus their reviews on games that appeal to them and their audience, and those games are far more likely to score highly than games in which the reviewer has little interest to begin with.

    In a perfect world the outlet would review every iPhone game available, and then theoretically there would be an equal number of 1/4 reviews as there are 4/4 reviews, but that’s never going to happen. Instead they focus on games that they are interested in, and they are likely to enjoy, and thus reviews end up clumped together at the high end of the scale.

    Overall though, this is a fantastic read, and you raise a lot of fantastic points.

  22. Andrew says:

    I don't want the best 1% of all arcade games, I want the best arcade game, the best fighting game, the best racing game, and the top two FPS games.
     
    In any case, my point is that it's dumb to say "we can immediately bin the 2/4 scores, because they offer us no help whatsoever in making a buying decision" and then say "a reviewer's SOLE AND ENTIRE REASON FOR EXISTING is to separate out that clump of things and tell people which ones are most worthy of their limited time and money" because separating the clump offers us no help whatsoever in making a buying decision either. The whole clump is "no". Nobody cares that Block Puzzle II (50%) is one point better than The Averageventures of Tim the Person (51%) or one point worse than Banal Rally February 2011 (49%) because all three games are far too dull to ever consider buying, whereas the difference between a 99% game and a 98.5% game might well affect a purchasing decision.
