Information for participants on Scheme purpose, design, marking and reports

The way we mark Cases using Peer Assessors, calculate Rolling Time-Window Scores and present the data in reports follows most of the approaches that will be familiar to anyone participating in any of the Birmingham Quality UK NEQAS Schemes.

There are a few basic principles on which the Scheme operates

The Scheme is intended to augment familiar, analytically based laboratory EQA by examining interpretation: the stage that begins once you have the test results.

The Scheme is aimed primarily at individuals, and CPD certificates are provided annually at 1 point per case. We also provide ‘Group’ participations, which may be operated locally in a variety of ways, but the Scheme cannot and does not provide CPD certification for them.

A scenario is presented and a clinical question posed. The question may come from a GP or a hospital doctor, and you may be asked to interpret the results, suggest what to do next or suggest a diagnosis. You may also be asked what comment you would add to the result or what you would say in a phone call. It is important to answer the question asked and to take account of the recipient: though the facts may not change, what you say to a GP and what you say to a Hospital Consultant will inevitably differ. The Scheme judges both what you say and how you say it, but more of that later.

Marks are awarded for giving ‘added value’. Adding nothing attracts a mark of 0, and a comment that is deemed to be wrong or misleading attracts -1. Marks of +1 and +2 are awarded for adding some value and a lot of value respectively, while +3 is reserved for a strong, thorough, well-worded answer expressed clearly. Needless to say, marks of +3 are not given out willy-nilly. This marking scale is based on the long-established scoring system used in UK NEQAS for Microbiology and adopted for phenotyping in our UK NEQAS for Cholinesterase Investigations, though in those schemes the scale stops at +2 for a fully correct response and there is no +3 for ‘excellence’.

Marks are awarded by a number of individual Assessors and an average ‘Participant Case Mark’ [PCM] is reported back to the Participant. Over a 12-Case time-window an average ‘Participant Time-window Score’ [PTS] is calculated. It follows that if you regularly ‘add value’ (aka ‘get it right’) your PTS will be positive and the higher the score the better.
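The averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the Scheme's own software: the function names are hypothetical, and it assumes that only Cases you actually answered count towards the window and that no rounding is applied.

```python
from statistics import mean

WINDOW = 12  # the 12-Case time-window described above


def participant_case_mark(assessor_case_marks):
    """PCM: the average of the Assessor Case Marks (ACMs) for one Comment."""
    return mean(assessor_case_marks)


def participant_time_window_score(case_marks, window=WINDOW):
    """PTS: the average of the most recent `window` PCMs.

    Assumption (not stated by the Scheme): `case_marks` holds only the Cases
    the Participant answered, oldest first; fewer than `window` are averaged
    as they stand.
    """
    recent = case_marks[-window:]
    return mean(recent)


# Hypothetical ACMs from six Assessors for one Comment
print(participant_case_mark([2, 1, 1, 0, 2, 0]))
```

For example, ACMs of +2, +1, +1, 0, +2 and 0 average to a PCM of +1; the PTS then simply averages the latest 12 such PCMs.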

We produce a report for every Case. It tells you your PCM and PTS and also plots them graphically, in a way analogous to the standard Birmingham Quality UK NEQAS format. We use shading, and we use Box and Whisker Plots to describe the spread of ranked data. Each plot comprises the median; a box spanning the ‘inter-quartile range’ [25th to 75th Centiles], which contains half the data points; and whiskers extending to the 5th and 95th Centiles (not the absolute highest or lowest values), a common statistical convention.
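The five values behind each Box and Whisker Plot can be computed with Python's standard `statistics` module. This sketch assumes the ‘inclusive’ centile method; the Scheme does not say which interpolation method it uses, so values may differ slightly from those in the Report, and the function name is hypothetical.

```python
from statistics import quantiles


def box_whisker_summary(scores):
    """Return the five values drawn in a Box and Whisker Plot.

    Whiskers sit at the 5th and 95th centiles (not the minimum and maximum),
    the box spans the 25th to 75th centiles and the line marks the median.
    """
    # n=20 yields 19 cut points: the 5th, 10th, ..., 95th centiles.
    # 'inclusive' is an assumed choice of interpolation method.
    cuts = quantiles(scores, n=20, method="inclusive")
    return {
        "p5": cuts[0],
        "p25": cuts[4],
        "median": cuts[9],
        "p75": cuts[14],
        "p95": cuts[18],
    }


# Hypothetical PTS values for a group of Participants
print(box_whisker_summary([0.2, 0.6, 0.9, 1.0, 1.1, 1.1, 1.2, 1.4, 1.7, 2.3]))
```

Because the whiskers stop at the 5th and 95th centiles, a single unusually high or low Participant does not stretch the plot.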

The Report contains a summary of the background to the Case and, where known, its outcome. This has been covered elsewhere by others, but it is worth stating that the true outcome may actually be rather obscure, and identifying it from the information given may not earn you the highest mark. It is often better to work through the most common or likely possibilities than to back the 100 to 1 outside bet that occasionally comes off. The odds of rolling a total of 7 rather than 12 with a pair of dice might be a better analogy.
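To put numbers on the dice analogy, the sketch below counts the 36 equally likely outcomes of rolling two fair dice. There are six ways to make 7 but only one way to make 12, so backing the common answer is six times as likely to pay off. The function name is illustrative only.

```python
from fractions import Fraction
from itertools import product


def probability_of_total(total):
    """Exact probability that two fair six-sided dice sum to `total`."""
    outcomes = list(product(range(1, 7), repeat=2))  # all 36 ordered rolls
    favourable = sum(1 for a, b in outcomes if a + b == total)
    return Fraction(favourable, len(outcomes))


print(probability_of_total(7))   # 1/6
print(probability_of_total(12))  # 1/36
```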

As well as your own Comment, the Report also contains some examples of Comments which attracted high marks, some which attracted ‘average’ marks and some which attained only low marks.

Why do we use Assessors?

A key point to note is that there is often no single ‘right answer’. We are NOT asking for exam answers along the lines of: ‘What are the four most common reasons for a high potassium?’

This is not a failing of the Scheme; it is one of its main strengths. There is no marking scheme and there is no model answer. What there is, is peer judgement on the quality of the Comment. Each Comment is ‘marked’ or ‘assessed’ by a number of Assessors, and the final PCM is the average of the marks awarded to you. Each Assessor marks independently, without knowing whose Comment he or she is marking or what other Assessors have made of it. The mark awarded by each Assessor is known as the Assessor Case Mark [ACM].

The original aim of the Scheme was a rolling replacement programme in which, at any one time, around 10% of Participants would be Assessors. This remains an aspiration, so volunteers are welcome!

Is an ‘average’ of ACMs to produce a PCM a fair way to judge a Comment?

In an ideal world every Comment would be looked at by all the Assessors, and the final PCM might be based on 12 to 20 Assessor Case Marks. In practice, a smaller subset of Assessors looks at the Comments and not every Assessor marks every Comment. In general, each Comment is looked at by around 6 Assessors, allocated at random.

We have no fixed view on whether an Assessor should mark his or her own Comment. It doesn't really skew the picture, as the Comment will also be marked by other Assessors and the average taken. Even though the Comment was the Assessor's best initial answer, he or she will mark it in light of an overview of all the Comments to be marked, which may bring new knowledge and clarity.

We have never reported back individual ACMs to Participants. This was to prevent the inevitable “why did I get a 0 when all my other marks were +2?” query. The short answer is that the Assessor who awarded you a 0 thought that was what the Comment merited.

We do keep a breakdown for internal use, which can assist with appeals. It might be interesting to know whether your PCM of “+1” was made up of six +1 ACMs or of two -1s and four +2s, but (while not wishing to give the impression that we are less than transparent) we are keen to avoid protracted and ultimately damaging navel-gazing and dispute. We might be tempted to give a broad-brush “tight 1” or “loose 1”, but this would inevitably lead to fractious poring over the data: Who gave me the bad mark? Is he a Scientist or a Medic? Was it him who gave me that bad mark three Cases ago?
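The difference between a “tight 1” and a “loose 1” is simply one of spread. The sketch below assumes that spread is measured by the population standard deviation (the Scheme does not report any such figure, and the function name is hypothetical). It shows that both sets of ACMs in the example above give the same PCM of +1 but very different spreads.

```python
from statistics import mean, pstdev


def acm_spread(assessor_case_marks):
    """Return (PCM, spread) for one Comment's ACMs.

    Spread is the population standard deviation: an illustrative choice only.
    """
    return mean(assessor_case_marks), pstdev(assessor_case_marks)


tight = [1, 1, 1, 1, 1, 1]    # six +1 ACMs
loose = [-1, -1, 2, 2, 2, 2]  # two -1s and four +2s

for label, acms in (("tight", tight), ("loose", loose)):
    pcm, spread = acm_spread(acms)
    print(label, pcm, round(spread, 2))
```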

A football analogy - Why do you always say that the Scheme is a League and not a Cup-tie?

In a nutshell, anyone can get a good, or indeed a bad, mark from any Assessor, and this could be compounded by similar marks from several Assessors in the same Case. If, on the other hand, all your marks are bad Case in, Case out, then this is likely to be a fair representation of your performance. In the football analogy, there are many Cup-ties where the minnows outplay their Premier League opponents, but the Cup-run rarely goes all the way. If you are at the top of the league at the end of the season, you must be doing things right most of the time. Similarly, if you are at the bottom then, rather than relegation, the Scheme will recommend more training, education and so on.

Looking at your own data

The way the Scheme operates can be viewed from the bottom up, starting with an individual Case, or from the top down, taking a wider view and drilling down where necessary.

It is self-evident that to get a high PCM (Participant Case Mark) you need to be awarded a high mark by all or most Assessors. Conversely, a low PCM means that you must have been awarded a low mark by all or most Assessors. Because the Assessors work independently and have no knowledge of the marks awarded by others, this agreement lends credence to the PCM.

It really doesn't matter if you get a bad set of marks for one Case. Even if you genuinely feel that you have been hard done by, it doesn't matter that much over time. The whole point of the Scheme is educational: if you come away with the learning points, that is the main benefit. Most Participants are strictly mid-table. Returning to the football analogy, Europe may be out of the question, but relegation is not on the horizon either. The Box and Whisker plots show that the central 50% of Participants are all fairly close together, with PTSs currently around 1.08. Very few Participants have genuine problems, but we have helpful procedures in place to address them.

How do we ensure that the Assessors are fair?

As is customary at this point, we quote Juvenal's “Quis custodiet ipsos custodes?” (“Who will guard the guards themselves?”) and describe how we assess the Assessors.

After each Case an Assessor receives a report. This shows the number of Comments marked, the breakdown of ACMs awarded, whether this was similar to their usual marking behaviour, and whether it was deemed more dove-like or hawk-like than the Assessor group as a whole. The point of this is not to intervene and tell Attila the Hun to be a little more understanding. All it does is let little Attila, or indeed Mother Teresa of Calcutta, know where they lie on the spectrum of marking.
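How ‘dove-like’ or ‘hawk-like’ marking is judged is not set out here. Purely as a hypothetical illustration, one simple approach would compare an Assessor's mean ACM for the Case with the mean ACM of the whole Assessor group, using an arbitrary tolerance. The function name and the 0.25 threshold are assumptions, not the Scheme's method.

```python
from statistics import mean


def dove_or_hawk(own_marks, group_marks, tolerance=0.25):
    """Classify one Assessor's marking for a Case against the Assessor group.

    HYPOTHETICAL: comparing mean ACMs with a fixed tolerance is an
    illustrative assumption, not the Scheme's published method.
    """
    difference = mean(own_marks) - mean(group_marks)
    if difference > tolerance:
        return "dove-like"  # marks more generously than the group
    if difference < -tolerance:
        return "hawk-like"  # marks more harshly than the group
    return "in line with the group"


# Hypothetical ACMs: one Assessor's marks versus all marks for the Case
print(dove_or_hawk([2, 2, 1, 2], [0, 1, 1, 2, 1, 0, 2, 1]))
```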

There are two further ways in which we monitor Assessors. We allow, indeed encourage, Assessors to submit a brief statement on what they made of the Case; these statements are collated and fed back to all Assessors. We also hold an annual Assessors' Meeting, which allows informal feedback between the Assessors themselves as well as with the Scheme Organisers/Secretariat and SAG (Specialist Advisory Group) members.

Throw my toys out of the pram?

Another typical ‘complaint’ is “I said exactly the same as the good Comment you quoted in the Report but I got a lower mark”. This can happen, but the usual reason is that the Participant has a strange notion as to what the word ‘exactly’ means! Word order, phrasing, emphasis and having additional or fewer facts usually do matter.

Participation Rates

The best way to minimise ‘uncertainty’ [in the statistical sense, not meaning ‘I'm not sure’] is to have a full data set: if you answer every Case, your data will be more robust. There is no point arguing over the odd decimal place in your marks when you have only answered a third of the Cases! Indeed, one performance criterion for the Scheme is a 50% return rate, which most Participants do not meet. As a result, some Participants are being assessed on both easy and hard Cases, while others might flit in and out, answering only what they assume to be ‘easy’ Cases. The bad news for this approach is twofold: as stated above, your statistics are built on too few data points, and Cases that appear ‘easy’ on the surface often turn out to be more complicated!
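The effect of answering fewer Cases can be put in rough numbers with the standard error of a mean, which shrinks with the square root of the number of data points. This sketch assumes, for illustration only, that a Participant's PCMs are independent and share a hypothetical spread of 0.8 marks; neither figure comes from the Scheme, and the function name is hypothetical. On those assumptions, answering 4 of 12 Cases gives a standard error of about 0.4, compared with about 0.23 for all 12.

```python
from math import sqrt


def standard_error_of_pts(pcm_sd, cases_answered):
    """Approximate standard error of a mean of `cases_answered` PCMs.

    Assumes (illustration only) independent PCMs with a common spread `pcm_sd`.
    """
    if cases_answered < 1:
        raise ValueError("need at least one answered Case")
    return pcm_sd / sqrt(cases_answered)


# Hypothetical spread of 0.8 marks between a Participant's Cases
for answered in (4, 12):
    print(answered, round(standard_error_of_pts(0.8, answered), 2))
```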

Closing comments

We know that the Scheme is artificial in nature, so pointing this out doesn't tell us anything new. All exams and tests are artificial to some extent, and this is no different. We ask Participants to enter into the spirit of the Scheme to get the most out of it. Rather than saying “I would always telephone”, assume that is not possible: it is the Friday before a Bank Holiday, you are on annual leave the following week, your colleagues are off sick and this is one of 10 queries you have to clear off your desk in the next 5 minutes. Play the game: the Scheme is educational, not punitive, in outlook.

As Churchill might have said: “The ComC Scheme is the worst form of EQA for individuals, except for all the others that have been tried”. Since this was, we believe, the world's first such EQA programme [having evolved from Gordon Challand's Cases for Comment], we had nothing to base it on, and we know it's not perfect. We presented some options for the way forward at Focus 2015. We thank all our Assessors and Advisors who have helped deliver the Scheme, and a big thank you also to Participants for taking part.

