
Assessment

Multiple choice questions

Multiple-choice Questions (MCQs) are a subset of what are referred to as "objective questions". Objective questions are questions which have a correct answer (usually only one). The term "objective" here means there is complete objectivity in marking the test. The construction, specification and writing of the individual questions (items) are influenced by the judgements of examiners as much as in any other test.

The objective test is largely used to test factual material and the understanding of concepts. Because of the objectivity and ease of marking, it is frequently used for testing large groups. It is claimed that skilled item writers can develop items to test higher-level intellectual skills (Cannon and Newble, 1983), but if students perceive that these types of questions usually test the recall of facts, they will prepare for them accordingly.

Advantages

  • objective tests can sample a broad range of a course
  • objective tests are rapidly marked
  • students are not able to "bluff" or "pool" answers
  • scoring is objective and reliable (e.g. no halo effect)
  • distribution of scores is determined by the test - not by the examiner
  • only the objectives tested for are marked
  • "good" items may be stored in an item bank and reused

Disadvantages

  • objective tests are time consuming to set
  • there is no credit for partial information
  • objective tests may encourage guessing
  • good objective tests are difficult to set
  • student "selects" information
  • objective tests may encourage reproduction learning

Note that the quality of an objective test is determined by the skill of those who construct it.

While the following notes refer specifically to the most common form of objective test (MCQs), many of the comments relate to all objective questions.

Characteristics of multiple-choice questions (MCQs)

Multiple-choice Questions are probably the most widely used of objective tests. Such questions are normally composed of four parts:

  • STEM - question or incomplete statement
  • OPTIONS - suggested answers or completions
  • DISTRACTERS - incorrect responses
  • KEY - correct response

While the stem is, in most cases, written material, other material such as graphs, diagrams or sets of results may be used. In these cases it is probable that abilities other than recall may be tested. In some cases a number of questions (items) may be related to the same material. In these circumstances, care must be taken to make the questions independent (i.e. the answer to one question should not depend on the answer to a previous one).

In some cases, MCQs may be used to test higher abilities by asking students to judge the most appropriate answers to a problem from several "correct" ones. Obviously, great care is required for the setting of such a question.

It is usual to have four or five options, with five options giving the most reliable test. However, it may be difficult to provide five plausible options and in this case it is better to stay with four.

Guidelines for setting MCQs

Dunstan (1971), in a short monograph on multiple choice testing, summarises the steps involved in writing MCQs. He makes three important general statements which may briefly be summarised as follows:

  • Let objectives determine what you require to be tested.
  • Have a panel rather than an individual write and review tests, to avoid bias and to ensure testing covers a wide range of student abilities.
  • Remember that good tests require time to construct.

More detailed setting guidelines and considerations are given below.

1. Objectives:

  • What is the item intended to assess?
  • Is an objective test the best means of assessment?
  • What type of objective test should be used?

The answers to these questions are important. With the growing use of computer-marked tests, MCQs are becoming increasingly popular, particularly where large classes are involved.

In some cases, the use of MCQs may not be appropriate. For example, it may be difficult to find suitable distracters for a question. In other words, the material and abilities under test should determine the nature of the examination rather than fitting the examinable materials to a specific type of test.

2. Stem:

  • Is the problem clearly stated?
  • Is the problem clearly and concisely worded?

It is most important that the stem is clearly and unambiguously worded. The language used should be simple and clear. If these guidelines are not followed, the test item becomes one in which comprehension is tested as much as the material on which the question is based. The stem should also state clearly what is expected of the student before the options are read. This requirement would rule out use of the question "Which of the following is correct?".

3. Options:

Options should be independent of one another, consistent in logic and grammar with the stem, and expressed in clear, simple language. The examiner should avoid clues to the correct answer and clues which allow a student to discard distracters even when little is known of the material under test. Poorly constructed items may allow a student to reduce the number of plausible distracters and hence increase the chance of guessing the correct answer. In particular, the person constructing the test should AVOID USING:

  • stereotyped or standard phrases which direct a student to the correct answer
  • unequal length of options where, in order to make the key unambiguously correct, the examiner makes it obviously longer than the distracters
  • mutually exclusive options where, if one option is incorrect, its opposite must be correct
  • overlapping alternatives when one or more options is a subset of another
  • grammatical inconsistencies where options may be disregarded because they are grammatically inconsistent with the stem
  • absolutes (never, always, only, none of the above, all of the above). These options are usually incorrect
  • a key in a different form from the distracters, which makes the correct response more conspicuous
  • double negatives, which cause great confusion. Where negatives are used they should appear only in the stem and should be highlighted by printing in capitals or by underlining. "None of the above" should only be used when it is important to conceal a correct response which is easily recognised.

Testing the test

One of the advantages of using multiple-choice questions is that statistics on the test are easily obtained (particularly when the tests are computer scored). The parameters which define the quality of a test item are discrimination (D) and facility (F).

Discrimination compares the number of correct responses to an item for the upper and lower 27% of the class (based on the total test score). If for any one item the number of correct responses from people who are in the lower 27% for the whole test is greater than the number from those in the top 27%, then the item may not be effectively discriminating between students.

Facility (F) is the percentage of the class obtaining the correct answer. In general:

  • if F < 30% the question is hard
  • if F = 30-75% the question is satisfactory
  • if F > 75% the question is easy
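
Both statistics can be calculated directly from a matrix of item scores. The sketch below is a minimal illustration in Python, assuming a 0/1 response matrix with one row per student and one column per item; the 27% split follows the description above, and all names and data are illustrative only.

```python
import numpy as np

def item_statistics(responses):
    """Facility (F, %) and discrimination (D) for each item in a 0/1 score matrix."""
    responses = np.asarray(responses, dtype=float)
    n_students, n_items = responses.shape

    # Facility: percentage of the class answering each item correctly.
    facility = responses.mean(axis=0) * 100

    # Rank students by total test score and take the top and bottom 27%.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    k = max(1, int(round(0.27 * n_students)))
    lower = responses[order[:k]]
    upper = responses[order[-k:]]

    # Discrimination: proportion correct in the upper group minus the lower group.
    # Values near zero or negative suggest the item is not discriminating effectively.
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)

    return facility, discrimination

# Illustrative data: 6 students, 3 items.
scores = [[1, 1, 0],
          [1, 0, 0],
          [1, 1, 1],
          [0, 0, 1],
          [1, 1, 0],
          [0, 0, 0]]
F, D = item_statistics(scores)
print(F)  # facility per item, as a percentage
print(D)  # discrimination per item, between -1 and 1
```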

Whether or not items are acceptable in terms of difficulty is largely the judgement of the examiner. A hard item with a high discrimination may be useful in establishing a rank order of students. An easy item with a low discrimination is probably a give-away question and consideration might be given to using another form of assessment. Items with acceptable discrimination and facility may be stored for future use. In this way an item bank may be built up over time and questions selected randomly for tests.

The quality of the test as a whole is measured by its validity (v) and reliability (r). Validity for lecturer-developed tests is judged in the light of course content and course objectives: the degree to which the test, the content and the objectives coincide.

Reliability is, in effect, the probability that the same score on a test would be achieved, on repetition of the test, by the same or an equivalent group of students, or when the test is scored by another examiner. Values of this statistic range from 0 to 1, and a value of 0.7 or greater is generally acceptable. Objective tests in general have a higher reliability than essays, primarily because of marker objectivity. Reliability increases with the length of the test and with the inclusion of questions that have high discrimination values. Clear statements of instructions and items will also increase reliability.
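
The notes above do not name a particular reliability formula. One commonly used coefficient for objective tests with 0/1 items is Kuder-Richardson 20 (KR-20); the sketch below assumes that formula, and the data are illustrative only.

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson 20 reliability estimate for dichotomously scored (0/1) items."""
    responses = np.asarray(responses, dtype=float)
    n_students, k = responses.shape

    p = responses.mean(axis=0)                           # proportion correct per item
    q = 1.0 - p
    total_variance = responses.sum(axis=1).var(ddof=1)   # variance of students' total scores

    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_variance)

# Illustrative data: 5 students, 4 items.
scores = [[1, 1, 0, 1],
          [1, 0, 0, 0],
          [1, 1, 1, 1],
          [0, 1, 0, 0],
          [1, 1, 1, 0]]
print(kr20(scores))  # about 0.74 here; 0.7 or greater is generally acceptable
```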

A note of caution: high reliability does not imply high validity. An examiner could consistently be testing other objectives than those believed to be under examination.

Guessing

Objective tests are often criticised because they encourage guessing. Marking schemes which can take account of guessing include the following (both schemes are sketched in code after the list):

  • Raising of the pass mark: In a test of 100 questions with 5 options per question, random guessing should allow a score, on average, of 20. Hence the range of marks to be considered is from 20 to 100. The mid-point of this range (or pass mark) is 60, rather than 50. This method assumes that random guessing takes place and that a score of 60 represents a satisfactory attainment of course objectives. Both these assumptions may or may not be valid.
  • Deductions for guessing: A score of 1/(n-1) is deducted for each incorrect answer where n is the number of options per item. There is no penalty for an omitted answer. Assumptions are made when this method is applied. Firstly, it is assumed that incorrect answers are made on the basis of random guessing and that omissions are made only on the basis of insufficient information. Secondly, it is assumed that all options are equally attractive. Other considerations which may be taken into account when guessing corrections are used include:
    • the influence of personality and test-taking strategy on the scores of students - confident students may guess; cautious students may adopt "error avoidance" schemes;
    • the spill-over of penalties into other areas of the test, which means that the score does not vary directly with knowledge.
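
The two schemes above can be expressed as simple formulas. The following is a minimal sketch, assuming n options per item; the function names are illustrative, and the example figures reproduce the 100-question, 5-option case described above.

```python
def raised_pass_mark(n_items, n_options):
    """Pass mark raised to the midpoint between the expected chance score and the maximum."""
    chance_score = n_items / n_options      # expected score from random guessing
    return (chance_score + n_items) / 2

def corrected_score(n_correct, n_wrong, n_options):
    """Formula scoring: deduct 1/(n-1) marks for each wrong answer; omissions cost nothing."""
    return n_correct - n_wrong / (n_options - 1)

# 100 questions with 5 options each: chance score 20, so the pass mark becomes 60.
print(raised_pass_mark(100, 5))    # 60.0

# 70 correct, 20 wrong and 10 omitted out of 100 five-option items.
print(corrected_score(70, 20, 5))  # 65.0
```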

Obviously, the quality of the item writing will have a large bearing on the way students select options as being correct or otherwise. Where poor distracters are used students will be able to use logical deductions to assist them in their search for the correct answer rather than guessing. In this case one of the assumptions made in the use of guessing corrections does not apply.

References

Billing, D.E. (1973) Objective Tests in Tertiary Science Courses, in D.E. Billing and B.F. Furniss (Eds), Aims, Methods and Assessment in Advanced Science Education, Heydon & Son, London, pp. 131-148.

Billington, D.R. (1981) The uses and abuses of assessment in biochemistry education, in C.F.A. Bryce (Ed.), Biochemical Education, Croom Helm, London, pp. 95-122.

Cannon, R.A. and Newble, D. (1983) A Handbook for Clinical Teachers, MTP Press, Lancaster and Boston, pp. 97-105.

Dunstan, M. (1971) A Guide to the Planning, Writing and Review of Multiple Choice Tests, Tertiary Education Research Centre, University of New South Wales.

Gibbs, G., Habeshaw, S. and Habeshaw, T. (1986) 53 Interesting Ways to Assess Your Students, Technical and Educational Services, Bristol.

Gronlund, N.E. (1982) Constructing Achievement Tests, 3rd edition, Prentice Hall, pp. 36-70.

Hudson, B. (1973) Assessment Techniques - An Introduction, Methuen, London, pp. 122-126.

Lennox, B. (1974) Hints on Setting and Evaluation of Multiple Choice Questions of the One from Five Type, Association for the Study of Medical Education, Dundee, Scotland.

Stratton, J.J. (1981) Recurrent Faults in Objective Test Items, Teaching at a Distance, 20, pp. 66-73.

 

(Originally published in Trigwell, K. (1992). Information for UTS staff on Assessment. Sydney: UTS Working Party on Assessment).