The evaluator effect

A chilling fact about usability evaluation methods

Morten Hertzum, Niels E. Jacobsen

Publication: Contribution to journal › Journal article › Research › peer review

Abstract

Computer professionals have a need for robust, easy-to-use usability evaluation methods (UEMs) to help them systematically improve the usability of computer artefacts. However, cognitive walkthrough, heuristic evaluation, and thinking-aloud studies – three of the most widely used UEMs – suffer from a substantial evaluator effect in that multiple evaluators evaluating the same interface with the same UEM detect markedly different sets of problems. A review of eleven studies of these three UEMs reveals that the evaluator effect exists for both novice and experienced evaluators, for both cosmetic and severe problems, for both problem detection and severity assessment, and for evaluations of both simple and complex systems. The average agreement between any two evaluators who have evaluated the same system using the same UEM ranges from 5% to 65%, and no one of the three UEMs is consistently better than the others. While evaluator effects of this magnitude may not be surprising for a UEM as informal as heuristic evaluation, it is certainly notable that a substantial evaluator effect persists for evaluators who apply the strict procedure of cognitive walkthrough or observe users thinking out loud. Hence, it is highly questionable to use a thinking-aloud study with one evaluator as an authoritative statement about what problems an interface contains. Generally, the application of the UEMs is characterised by (1) vague goal analyses leading to variability in the task scenarios, (2) vague evaluation procedures leading to anchoring, and/or (3) vague problem criteria leading to anything being accepted as a usability problem. The simplest way of coping with the evaluator effect, which cannot be completely eliminated, is to involve multiple evaluators in usability evaluations.
Original language: English
Journal: International Journal of Human-Computer Interaction
Volume: 15
Issue number: 1
Pages (from-to): 183-204
ISSN: 1044-7318
Status: Published - 2003
Published externally: Yes

Cite this

@article{acc65fe052bd11dba4bc000ea68e967b,
title = "The evaluator effect: A chilling fact about usability evaluation methods",
abstract = "Computer professionals have a need for robust, easy-to-use usability evaluation methods (UEMs) to help them systematically improve the usability of computer artefacts. However, cognitive walkthrough, heuristic evaluation, and thinking-aloud studies – three of the most widely used UEMs – suffer from a substantial evaluator effect in that multiple evaluators evaluating the same interface with the same UEM detect markedly different sets of problems. A review of eleven studies of these three UEMs reveals that the evaluator effect exists for both novice and experienced evaluators, for both cosmetic and severe problems, for both problem detection and severity assessment, and for evaluations of both simple and complex systems. The average agreement between any two evaluators who have evaluated the same system using the same UEM ranges from 5{\%} to 65{\%}, and no one of the three UEMs is consistently better than the others. While evaluator effects of this magnitude may not be surprising for a UEM as informal as heuristic evaluation, it is certainly notable that a substantial evaluator effect persists for evaluators who apply the strict procedure of cognitive walkthrough or observe users thinking out loud. Hence, it is highly questionable to use a thinking-aloud study with one evaluator as an authoritative statement about what problems an interface contains. Generally, the application of the UEMs is characterised by (1) vague goal analyses leading to variability in the task scenarios, (2) vague evaluation procedures leading to anchoring, and/or (3) vague problem criteria leading to anything being accepted as a usability problem. The simplest way of coping with the evaluator effect, which cannot be completely eliminated, is to involve multiple evaluators in usability evaluations.",
author = "Hertzum, Morten and Jacobsen, {Niels E.}",
year = "2003",
language = "English",
volume = "15",
pages = "183--204",
journal = "International Journal of Human-Computer Interaction",
issn = "1044-7318",
publisher = "Taylor \& Francis Inc.",
number = "1",
}

The evaluator effect: A chilling fact about usability evaluation methods. / Hertzum, Morten; Jacobsen, Niels E.

In: International Journal of Human-Computer Interaction, Vol. 15, No. 1, 2003, pp. 183-204.


TY  - JOUR
T1  - The evaluator effect
T2  - A chilling fact about usability evaluation methods
AU  - Hertzum, Morten
AU  - Jacobsen, Niels E.
PY  - 2003
Y1  - 2003
N2  - Computer professionals have a need for robust, easy-to-use usability evaluation methods (UEMs) to help them systematically improve the usability of computer artefacts. However, cognitive walkthrough, heuristic evaluation, and thinking-aloud studies – three of the most widely used UEMs – suffer from a substantial evaluator effect in that multiple evaluators evaluating the same interface with the same UEM detect markedly different sets of problems. A review of eleven studies of these three UEMs reveals that the evaluator effect exists for both novice and experienced evaluators, for both cosmetic and severe problems, for both problem detection and severity assessment, and for evaluations of both simple and complex systems. The average agreement between any two evaluators who have evaluated the same system using the same UEM ranges from 5% to 65%, and no one of the three UEMs is consistently better than the others. While evaluator effects of this magnitude may not be surprising for a UEM as informal as heuristic evaluation, it is certainly notable that a substantial evaluator effect persists for evaluators who apply the strict procedure of cognitive walkthrough or observe users thinking out loud. Hence, it is highly questionable to use a thinking-aloud study with one evaluator as an authoritative statement about what problems an interface contains. Generally, the application of the UEMs is characterised by (1) vague goal analyses leading to variability in the task scenarios, (2) vague evaluation procedures leading to anchoring, and/or (3) vague problem criteria leading to anything being accepted as a usability problem. The simplest way of coping with the evaluator effect, which cannot be completely eliminated, is to involve multiple evaluators in usability evaluations.
AB  - Computer professionals have a need for robust, easy-to-use usability evaluation methods (UEMs) to help them systematically improve the usability of computer artefacts. However, cognitive walkthrough, heuristic evaluation, and thinking-aloud studies – three of the most widely used UEMs – suffer from a substantial evaluator effect in that multiple evaluators evaluating the same interface with the same UEM detect markedly different sets of problems. A review of eleven studies of these three UEMs reveals that the evaluator effect exists for both novice and experienced evaluators, for both cosmetic and severe problems, for both problem detection and severity assessment, and for evaluations of both simple and complex systems. The average agreement between any two evaluators who have evaluated the same system using the same UEM ranges from 5% to 65%, and no one of the three UEMs is consistently better than the others. While evaluator effects of this magnitude may not be surprising for a UEM as informal as heuristic evaluation, it is certainly notable that a substantial evaluator effect persists for evaluators who apply the strict procedure of cognitive walkthrough or observe users thinking out loud. Hence, it is highly questionable to use a thinking-aloud study with one evaluator as an authoritative statement about what problems an interface contains. Generally, the application of the UEMs is characterised by (1) vague goal analyses leading to variability in the task scenarios, (2) vague evaluation procedures leading to anchoring, and/or (3) vague problem criteria leading to anything being accepted as a usability problem. The simplest way of coping with the evaluator effect, which cannot be completely eliminated, is to involve multiple evaluators in usability evaluations.
M3  - Journal article
VL  - 15
SP  - 183
EP  - 204
JO  - International Journal of Human-Computer Interaction
JF  - International Journal of Human-Computer Interaction
SN  - 1044-7318
IS  - 1
ER  -