Writing questions that check for more than recall of information

Writing questions that check for more than recall of information

By - Deleted user
Number of replies: 13

My most difficult challenge has been to create questions that ask students to apply, analyze and evaluate information based on the content of a course.  Over time, I have collected a number of techniques that have made this type of question writing easier.

Recently, I have found it difficult to convince my experts that such questions should be included in assessments.  They feel that questions asking students to apply and analyze should be left out because students are not used to them and will complain.  They argue that students expect only recall questions and dislike anything else.

In reply to Deleted user

Re: Writing questions that check for more than recall of information

By - Deleted user

I would see the problem here as poor specification of the learning outcomes. If the learning outcomes say that all the learner has to do is recognise or recall content, then these questions would be fine. However, recognition and recall are the two lowest levels of Bloom's taxonomy, and I would question whether that learning is worthwhile in all but minor learning situations. Looking up a handout, an electronic file, or a performance support system would meet the need.

The assessment must test the learning outcomes (or competencies). If these are specified at higher levels, then the assessment needs to also be at higher levels. This is where I would have my discussion with the subject experts. If they agree on an appropriate level for the learning outcomes, the assessment follows naturally.

A very nice parody, "How (not) to… Write Multiple-Choice Questions" by Cathy Moore, is available here:

http://blog.cathy-moore.com/2007/08/can-you-answer-these-6-questions-about-multiple-choice-questions/

I have also attached it as a PPT presentation, bringing in the distractors one at a time.

Unfortunately I won't be able to attend the online session because:

a. it starts at midnight in Melbourne.

b. I'm not interested

c. none of the above

d. all of the above

cheers

Ian Bell

In reply to Deleted user

Re: Writing questions that check for more than recall of information

By - Deleted user

Great points Ian!

The learning outcomes/objectives do specify performances at the higher levels of Bloom's taxonomy.  Yet the experts feel that those questions are too challenging and may frustrate learners.

We'll miss your input during the live session, but I encourage you to rest and not stay up until midnight.

Thanks,
Tsvet 

In reply to Deleted user

Re: Writing questions that check for more than recall of information

By - Patrick Parrish

Hi Ian,

I would not let the surface of a multiple choice question fool you into thinking it can't evaluate deeper levels beyond recognition, as Tsvet's presentation showed. If that were so, then perhaps cloud identification is low level as well, since in that task a cloud is presented and a person just has to go through the official list of cloud types she has memorized and choose the correct answer. Of course, much more is going on, and so it is with many multiple choice questions.

Here are just a few other interesting resources you can find free on the Web; any educational evaluation textbook will also provide plenty of examples of deeper-level objective questions:

http://theelearningcoach.com/elearning_design/multiple-choice-questions/

http://jfmueller.faculty.noctrl.edu/toolbox/tests/morethanfacts.htm

http://www.learningsolutionsmag.com/articles/804/writing-multiple-choice-questions-for-higher-level-thinking

Pat

In reply to Patrick Parrish

Re: Writing questions that check for more than recall of information

By - Deleted user

Hi Pat and others,

I had previously read two of your references, but I am still not convinced that we should use multiple choice questions. Connie Malamed is an excellent writer, yet even she has said that multiple choice questions are the hardest thing to write.

The Learning Solutions article says, "On the other hand, Bloom’s top two levels – Synthesis and Evaluation – being divergent thinking, are best tested with fill-in or essay questions since a predetermined correct answer does not exist" and " In my research for this article, I was surprised by the number of poorly written multiple-choice questions I found while randomly searching for ideas among online multiple-choice tests."

I agree that multiple choice questions CAN be used to test deeper level learning but I generally think that they should NOT be used. Let me explain.

Firstly, consider this example of a high-level question: "Here is a lot of data ... Would you expect tornadoes to occur in the forecast area during the forecast time? a) Yes b) No." This is high level but would not tell us whether someone can really forecast tornadoes, so clearly it wouldn't be a satisfactory question. If we had a 50-50 result, would that mean that half the people can do it and the other half have misconceptions, or that they all guessed?
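
To make the guessing problem concrete, here is a purely illustrative Python sketch; the class size and the split between "knowers" and "misconception holders" are invented numbers, not data from any course. It simply shows that a class of pure guessers and a class split evenly between sound knowledge and a firm misconception produce essentially the same aggregate score on a single Yes/No item.

import random

random.seed(1)
N = 100  # hypothetical class size

# Class A: everyone guesses at random on the Yes/No item.
class_a_correct = sum(random.random() < 0.5 for _ in range(N))

# Class B: half genuinely know the answer, half hold a firm misconception.
class_b_correct = N // 2  # the knowers are right, the misconception group is wrong

print(f"Class A (all guessing):          {class_a_correct}/{N} correct")
print(f"Class B (half know, half wrong): {class_b_correct}/{N} correct")
# Both land near 50/100, so the aggregate score alone cannot tell the classes apart.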

We could make the question stem and distractors more complex, but this introduces new problems.

Tsvet's presentation was very good, with plenty of interaction. However, it didn't convince me. There were a small number of synchronous participants, who all appeared to be experienced trainers. Yet for the deeper questions there was considerable variation in the answers.

I don't believe that this was due to the fact that people weren't competent in terms of the content, but that it was an artifact of the assessment technique, i.e., multiple choice questions. I'm sure that if we all sat around discussing the issues there would be much more convergence than the answers showed.

How did this occur? Aulikki has already said that some of her answers were different because of her level of English. Even for native English speakers my impression is that multiple choice questions are often difficult to understand. They require a high linguistic level from both the writer of the questions and the person trying to interpret them. Further, they often require a high level of logic to separate the possible answers. That is usually inherent in the idea of "distractors". They need to appear similar.

This is why the cloud identification module is an example of multiple choice questions that work. The question is the same at all times: "What cloud type is this image?" There is no dependence on language or logic, and so no room for misinterpretation. In fact, the question is implied rather than stated (if I recall correctly), and the module has been used widely by people who don't speak English. Also, it's a visual test.

Let's return now to the linguistic and logic issues and what it is we are trying to assess. Clearly we need to assess against the learning objectives and/or competencies. If the assessment depends on other skills then I would question its validity.

Our challenge is to make our assessment authentic. (Yes, that word keeps coming up, because it is important.) Does the assessment convince us as trainers that our students are competent to perform the tasks they will be required to do on the job? They will not be required to answer multiple choice questions.

I can see a possible, limited role for multiple choice questions in formative assessment but my criticisms still hold in terms of linguistics and logic.

What do others think?

Cheers

Ian Bell

In reply to Deleted user

Re: Writing questions that check for more than recall of information

By - Deleted user

Thank you all for sharing your thoughts and engaging with the topic!

As you might imagine, I'd also like to share some thoughts but I am currently working under a project deadline and will be able to get to the forum later today or early next week.

Thank you all!

Tsvet 

In reply to Deleted user

Re: Writing questions that check for more than recall of information

By - Deleted user

Ian, Pat and Aulikki,

Thank you for posting your reflections on the use of the multiple choice question.  Ian posed some excellent questions and I have tried to answer them.  So, Ian, here we go.

Let me start by saying that, as a non-native English speaker assessed primarily through essays, I detested multiple choice questions.  I first encountered them on college entrance tests like the TOEFL and the SAT.  Yet even then I acknowledged to myself that if I knew the answer to a question, I did not find it confusing.  Since then I have learned a great deal about the use of multiple choice questions through formal courses and a lot of experience.  I now find myself offering sessions at various training events on how to craft good multiple choice questions that assess higher-level learning objectives.

One of the first things I learned from my professors, who held joint appointments at a university and a company designing college entrance exams, was that one should never base a decision about a person’s future on written exams alone.  Written exams are one piece of the picture of a person’s competence and should be treated with great caution.

You identified several of the reasons for this caution - the authenticity of assessment, the difficulty of writing good questions, and the fact that even a well-written question can be misinterpreted by a very capable learner.  These are true for the entire field of written assessment; they apply to all types of text-based questions.  These issues are not unique to the multiple choice question.

I think that every poster to this forum is in agreement that the most authentic assessment is direct observation of the learner performing the actual task.  The closer to reality the test is, the better.  No form of written question will get close to the authenticity of reality.  For performance-based tasks, we can focus on the products (the final outcomes) and examine the processes (the actions that lead to those products) as well.  But before we all move toward direct observation, we need to consider these questions:
How will we grade the learner's performance?

  • checklists (did they do all the tasks and did they do them in the right sequence)
  • rating scales (was the performance poor, fair, good, or excellent)
  • rubrics (a more descriptive way to characterize the quality of the student’s work)


How can we ensure that the grader was objective, did not miss elements of the performance, and has a good handle on what constitutes poor or excellent performance?

  • we can record the exam and review it again but how much more time does that entail?

If we have multiple raters, how do we ensure inter-rater reliability?
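
One common way to put a number on inter-rater reliability is Cohen's kappa, which corrects raw agreement between two raters for the agreement they would reach by chance. The sketch below is only an illustration: the ratings are hypothetical, and the poor/fair/good scale is borrowed from the rating-scale bullet above.

from collections import Counter

def cohens_kappa(rater1, rater2):
    # Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    chance = sum((counts1[c] / n) * (counts2[c] / n) for c in set(rater1) | set(rater2))
    return (observed - chance) / (1 - chance)

# Two hypothetical raters grading ten performances on a poor/fair/good scale.
rater_a = ["good", "fair", "poor", "good", "good", "fair", "poor", "good", "fair", "good"]
rater_b = ["good", "fair", "fair", "good", "good", "poor", "poor", "good", "fair", "good"]

print(f"Cohen's kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # 1.0 = perfect agreement, 0 = chance level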

What about essay questions?  I turned your example into one: “Here is a lot of data ... Please write an essay explaining whether you expect tornadoes to occur in the forecast area during the forecast time.”  Sure, we will gain more information from a question like this than from a multiple choice question, but in reality a forecaster will not write an essay when forecasting a tornado.  The essay question does not call forth the desired performance either.  An essay question like this also lacks fidelity, because we cannot give the learners all the data that they can access in their offices, and we do not wish them to talk to a colleague while writing the essay.

Like a multiple choice question, the question above will not reveal the process that the learner went through to determine the answer.  To get at that, we’ll need to write a better question. For instance:

Review the data above and determine if a tornado will occur in the forecast area during the forecast time.  Write out a complete justification for your forecast using the data provided, and describe the sequence of steps that you used in arriving at your conclusion.  Explain why you performed the steps in the order that you did.


Perhaps this illustrates the point that good essay questions that call forth the desired performance are not easy to write.  As a grader of essay questions, I have seen plenty of examples of poorly written questions, as well as plenty of cases in which a good essay question was still misinterpreted by the learner.  How do we grade those answers?  They contain sound, well-structured, logical arguments that are somewhat tangential to the topic.

We can ask, in general, how do we grade essay questions?  Should we base our judgment on an ideal answer?  If so, we’ll need to look for the structure of the answer, the sequence of steps, and the completeness of the justification.  This means that we need to have written at least a “model answer” outline (better yet, a complete model answer) in order to be fair in our grading.  And what about the human tendency to rate a “good” answer as only “fair” if it was preceded by an “excellent” answer?  How about inter-rater reliability and objectivity?  We can control these issues to some extent, but not completely.  So even after we have put all this effort and time into creating a good essay question, we are still uncertain whether the learner can perform what they have written out.

Furthermore, all written question types will be susceptible to linguistic problems, especially for second language speakers.

A good essay question and a good multiple choice question (all question types for that matter) are difficult to write.  I am sure you realize that the example of a multiple choice question that you offered is a True/False question written in multiple-choice format:  

Here is a lot of data ... Would you expect tornadoes to occur in the forecast area during the forecast time?

a) Yes

b) No


To assess such a complex task, we’ll need a series of multiple choice questions. The series may begin with a question like this:

Review this data.  What is the probability that a tornado will occur in the forecast area at this forecast time and why?

a. 10% … for these reasons

b. 30% … for these reasons

c. 60% … for these reasons

d. 90% … for these reasons


The series could continue with: "Where in the forecast area would you expect the tornado to occur?" or “When would you issue your tornado warning?”

You suggested that the spread of answers on some of the example questions in my presentation is an artifact of the multiple choice question form: “I don't believe that this [diversity of answers] was due to the fact that people weren't competent in terms of the content, but that it was an artifact of the assessment technique, i.e., multiple choice questions. I'm sure that if we all sat around discussing the issues there would be much more convergence than the answers showed.”

I would like to offer a few more ideas as to why the presentation participants evaluated the example questions with such divergent perspectives. After a previous offering of this presentation, the participants (an international group of future WMO trainers) and I discussed their experience of the presentation and its example questions. Their evaluation of the examples, too, was diverse; their explanations for their answers, however, were not related to the multiple choice format itself, but rather, to their experience with assessment and the unique circumstances of their practice.

For example, for the example question about a snowstorm during rush hour, some participants explained that this was the first time they had seen a multiple choice question used for anything other than simple facts: in other words, they were not accustomed to evaluating multiple choice questions for higher-level learning objectives. Another participant suggested that the observational network in his country was not extensive and could not be used to provide the ground truth as the question suggested. Another pointed out that their models do not include simulated reflectivity; yet another indicated that he had never thought about how forecasters decide if a model has a good handle on reality.

Thus, as you can see, there are more reasons for the spread among participants’ evaluation of the example questions than simply artifacts of the multiple choice format.  

While I will continue to use these example questions with U.S. audiences, I will look for other examples for international audiences. The diversity of experience and practice that a global audience brings to the table makes it impossible for a single question, multiple choice or otherwise, to be evaluated in the same way by everyone, given the world community’s variety of meteorological tools and forecasting methods.

You also mentioned that multiple choice questions require a “high linguistic level” from both the writer and the person taking the question.  I agree, and would add that the same is true for essay questions.  Second-language speakers may find answering essay questions even more taxing than answering multiple choice ones.

A common misconception about multiple choice questions is that they need to be convoluted and difficult to understand in order to function well.  In fact, the opposite is true: a well-written multiple choice question reduces the learner’s cognitive load by being clear and precise in its question and alternatives.  In this way, only a lack of knowledge of the content prevents a learner from selecting the correct answer.  Too often, trainers think that their first or second draft of a question is ready for use.  As I pointed out in my presentation, we need to make more drafts and test them with actual users before we put the questions into operation in an active assessment.  I would also suggest that the same is true for all question types, including essay questions.

In a well-written multiple choice question, every distractor is designed to address a common misunderstanding, a confusing aspect, or something that most learners find difficult to apply.  If a learner selects the wrong answer, the instructor can identify an area in which the student needs to improve.  In order for a multiple choice question to address a higher-level learning objective, it needs to have a clearly defined problem, be based on real-life situations, and have well-written options that test for the common misconceptions about the topics.  With a multiple choice question rather than an essay question, we can also be a bit more confident that human subjectivity did not influence our grading approach.
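
This is also what makes simple distractor analysis possible. The sketch below is purely illustrative; the option labels, responses, and misconception descriptions are invented for the example, not taken from the presentation. Tallying which option each learner chose shows the instructor which misconception is most widespread.

from collections import Counter

# Hypothetical mapping: each distractor is written to target one specific misconception.
misconception_by_option = {
    "a": None,  # the correct answer
    "b": "confuses simulated reflectivity with observed reflectivity",
    "c": "assumes the model output always matches ground truth",
    "d": "ignores the timing of the event relative to rush hour",
}

# Hypothetical responses, one selected option per learner.
responses = ["a", "b", "b", "a", "c", "b", "a", "d", "b", "a"]

tally = Counter(responses)
print(f"Correct: {tally['a']} of {len(responses)} learners")
for option, misconception in misconception_by_option.items():
    if misconception and tally[option]:
        print(f"Option {option}: chosen by {tally[option]}; likely issue: {misconception}")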

As Pat mentioned, multiple choice questions are an efficient and effective way of measuring whether learning has occurred for all levels of learning objectives in a variety of situations.  They do not suffer from some of the grading problems of other question types.  They have also been used successfully for decades in college entrance tests that diagnose the ability of learners to succeed in academia.  I hope that you can see that the issues you identified as problematic for multiple choice questions -- the authenticity of assessment, the difficulty of writing good questions, and the fact that even a well-written question can be misinterpreted by a very capable learner -- are in fact problematic for the entire field of text-based assessment.

With all this in mind, I would still prefer that pilots, medical staff, and nuclear power plant operators -- as well as personnel in the many other fields in which the consequences of mistakes can be catastrophic -- have passed a direct observation test with flying colors.  Text-based assessment can give us only a crude approximation of someone’s competency, at best.

Cheers,
Tsvet

In reply to Deleted user

Re: Writing questions that check for more than recall of information

By - Patrick Parrish

Tsvet,

Your reply is a valuable contribution to this conference. Thanks for the time it took to put together such a well thought out summary, response, and extension of the discussion.

I would like to reiterate just a couple of things, and I hope this is in the same spirit as your posting. In my earlier posts I didn't mean to dismiss the concerns about the problematic nature of MC questions, but to argue for their continued utility despite those concerns. I think the concerns are there for any assessment, not just language-based evaluations, as you point out. The concerns just shift.

- I don't believe, and I don't think others do either, that MC questions on their own should be used to determine competency. But they can help broaden the accumulation of evidence for competency.

- MC questions CAN be written to evaluate higher levels of learning. You and others have demonstrated that well. However, it is very hard to say for certain WHICH level they measure. This is probably due more to the vagueness inherent in categorizing levels of cognitive processing than to any issues with MC questions.

- Since complex cognitive tasks involve many levels of mental processing, including recalling knowledge, discrimination, analysis and determination of the salient features of a situation, MC questions are "authentic" to this extent. They do not measure success in completing whole tasks, but they can capture success at completing parts of tasks, which is a highly valuable contribution to evaluation.

- Writing good, valid MC questions is hard, just like creating and implementing any other valid assessment.

- We should not throw out a useful tool like MC questions, but use them wisely and with good measure. Competency assessments might draw from them, but competency means just that--doing the task. (For this reason, simulations don't measure competency either...just to stir up a discussion in another part of this conference.)

We thank you, Tsvet, for bringing this discussion forward, and just because the website says the session ends today doesn't mean we can't keep talking about this. I am sure there is more to say.

Pat

In reply to Patrick Parrish

Re: Writing questions that check for more than recall of information

By - Deleted user

Pat,

Thank you for your kind words and insightful comment!  Our posts are indeed in the same spirit and I heartily agree with all your points.

Thank you for the opportunity to share this small piece of the competency assessment puzzle.

Tsvet

In reply to Deleted user

Re: Writing questions that check for more than recall of information

By - Deleted user

I think that your experts are too prejudiced. Young people adapt to new ways, and might even find them more interesting. Old ways may be liked because they demand the least amount of thinking, but actually, they are boring!

In reply to Deleted user

Re: Writing questions that check for more than recall of information

By - Vesa Nietosvaara

I am slowly getting warmed up by this multiple-choice forum, thanks to Ian's and others' recent postings, and thanks to the guilt I feel for not yet having done my homework.

I wanted to test an MCQ quiz I found on the COMET page, so I worked on a recent ASMET module; I am supposed to be managing ASMET, so it's better to know what we are doing.

So I went here...

https://www.meted.ucar.edu/training_module.php?id=921

and did the quiz, without reading the module.

My result:

"Congratulations!
You scored 18 of 19 points, or 95%.
A score of 70% is required to pass this quiz.
Your certificate of completion for this quiz is now available."

Either I am very good at hydrology, or the quiz was too merciful. But looking back at the questions there, I am starting to see the difficulty in writing effective questions.

It's time to listen to Tsvet's playback.


In reply to Vesa Nietosvaara

Re: Writing questions that check for more than recall of information

By - Patrick Parrish

We all said writing good questions is hard. :) But remember, Vesa, you have a pretty solid science and meteorology background to draw upon. The tests in those modules are designed to show whether someone knows the important points that came up, not to fail people. If they already know much of it, or can figure it out because they have a good background, they will pass easily.

Being hard is not a reason to not do something. Training is hard, and assessment is particularly hard. Creating authentic assessments is REALLY hard. And if they end up being about a situation that the person has experienced in some way before, they might get a good result without really understanding everything you are hoping they can transfer into new situations.

Multiple choice questions are a tool that can be used for a great many things, and should not be dismissed due to a stereotype that they can't measure higher-level knowledge. That is, I think, Tsvet's key point. They are NOT the solution to every assessment.

Multiple choice questions are easy to administer, can be computer graded, and are fast, unlike other forms of assessment.

If you want a rich assessment, use a simulation, but accompany it with a simpler-to-administer multiple choice test that gets at underlying knowledge and skills, so you get a broader picture of a person's knowledge, just in case they get lucky with the simulation.

Use multiple choice quizzes during instruction to quickly measure whether people are getting it. Then teach a different way if what you were doing wasn't working.

Use multiple choice tests for placement tests and for standardized tests that require some objectivity.

Use well-written, higher-order multiple choice tests for self-paced online learning that must be graded by computer.

So, to summarize, when is it good to use multiple choice tests for assessment?

a. Never

b. Always

c. When the situation calls for them

d. Wombats

Please, let's not be purists. Let's use the tools we've got effectively.

Pat