Summary9
Two displays were evaluated: 1) a display made of bus tickets on strings that shows how far a bus is from the bus stop, and 2) a light that tells you how light or dark it is outside.
The goal of the study was to obtain feedback on heuristics based on Nielsen's but modified to be more applicable to ambient displays.
They modified Nielsen's heuristics, omitting those that were not relevant and adding 5 heuristics of their own relevant to ambient displays, then gave out a pilot survey with the modified heuristics and refined them further.
Results of the survey: the average relevance rating was high for each heuristic, none lower than 4.2 (on a 5-point Likert scale).
A Likert scale (pronounced 'lick-ert') is a type of psychometric scale often used in questionnaires. It asks respondents to specify their level of agreement to each of a list of statements. It was named after Rensis Likert, who invented the scale in 1932.
Evaluating their heuristics against Nielsen's:
Sixteen participants with a median of 5 years of evaluation experience, split into 2 groups of 8.
Used a between-subjects design where group 1 was asked to evaluate the busMobile and daylight display using Nielsen's heuristics and group 2 using the modified heuristics.
Each participant completed the evaluation individually and spent as much time as necessary.
RESULT:
A total of 26 issues are known for the daylight display, 24 of which were found in the heuristic evaluation. Thirty-nine issues are known for the bus display, 35 of which were found in the heuristic evaluation. Two of the missed issues were severe, while four were not severe. The order in which displays were evaluated had no significant effect on the severity or number of issues found.
Hypothesis 1: The number of issues found in the ambient condition will be greater and the issues will be more severe than those found in the Nielsen condition.
They tested this hypothesis using a univariate analysis of variance, comparing the average number of cases and the average severity of the issues found, with heuristic set and display (busMobile and daylight display) as factors. By cases, they mean that if the same issue was found multiple times by different evaluators, each finding is counted. There was no statistically significant difference for any factor or combination of factors in the average number or severity of issues found by evaluators.
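The difference between counting "cases" and counting distinct issues can be sketched as below. This is only an illustration of the counting rule described above: the evaluator and issue ids are made up, and the function names are hypothetical, not taken from the paper.

```python
from collections import Counter

# Hypothetical findings as (evaluator id, issue id) pairs. NOT data from the study.
findings = [
    ("e1", "A"), ("e1", "B"),
    ("e2", "A"),
    ("e3", "A"), ("e3", "C"),
]

def count_cases(findings):
    """Count every finding: the same issue reported by different
    evaluators counts once per evaluator."""
    # set() only removes exact repeats of the same (evaluator, issue) pair.
    return len(set(findings))

def count_unique_issues(findings):
    """Count each distinct issue once, however many evaluators reported it."""
    return len({issue for _, issue in findings})

def cases_per_issue(findings):
    """Number of evaluators who reported each issue."""
    return Counter(issue for _, issue in set(findings))

print(count_cases(findings))          # 5 cases
print(count_unique_issues(findings))  # 3 distinct issues
```

Issue "A" contributes three cases here because three different evaluators found it, which is the sense of "cases" used in the analysis.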
Hypothesis 2: The percentage of known issues found in the ambient condition will be higher than the percentage of known issues found in the Nielsen condition.
The percentage of issues found by all of the evaluators in the ambient condition, combined, is higher than the percentage of issues found by all of the evaluators in the Nielsen condition, combined. Visual inspection shows that three to five novice evaluators find 40-60% of problems using the ambient heuristics, a result consistent with Nielsen’s original work [18]. Graphs of two other randomly chosen orderings of evaluators showed similar results.
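A rough sketch of how the combined percentage of known issues grows as evaluators are added in a given order, which is roughly what the graphs mentioned above plot. The issue ids, the ordering, and the function name are hypothetical, and the numbers are invented for illustration only.

```python
def cumulative_coverage(found_by_evaluator, known_issues):
    """Return, for k = 1..n, the percentage of known issues found by
    the first k evaluators combined (in the order given)."""
    known = set(known_issues)
    seen = set()
    coverage = []
    for found in found_by_evaluator:
        # Only count findings that match a known issue.
        seen |= set(found) & known
        coverage.append(100.0 * len(seen) / len(known))
    return coverage

# Hypothetical example: 10 known issues, 4 evaluators. NOT data from the study.
known = range(1, 11)
found = [{1, 2}, {2, 3}, {4}, {1, 5, 6}]
print(cumulative_coverage(found, known))  # [20.0, 30.0, 40.0, 60.0]
```

Reordering the `found` list changes the shape of the curve but not its final value, which is why the notes mention checking other randomly chosen orderings.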
Hypothesis 3: The ambient heuristics will be more useful to evaluators than Nielsen’s heuristics. A heuristic that finds many severe problems is more useful than a heuristic that finds fewer problems with lower severity.
Figure 4 shows the correlation between the average number of issues found across all evaluators with each heuristic (including both the Nielsen and ambient heuristics) and the average severity of issues found with each heuristic. There is one outlier (a heuristic for which an average of 2 issues were found, with an average severity of 4), visible in the upper left. When that outlier was removed, a strong correlation supporting our hypothesis was found (Pearson’s r=.657, p<.004). Visual inspection shows that the ambient heuristics tend to be on the upper right of the plot. This is important because it means evaluators using the most useful ambient heuristics are finding a greater number of problems, and problems of greater severity, than evaluators using the most useful Nielsen heuristics.
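To illustrate how a single outlier in the upper left can weaken a correlation like the one in Figure 4, here is a minimal sketch that computes Pearson's r with and without one such point. The data points are invented (only the outlier's coordinates, 2 issues at severity 4, echo the description above), so the resulting values are not the paper's.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (average issues found, average severity) per heuristic.
# NOT data from the study; the first point plays the role of the outlier.
points = [(2, 4.0), (3, 1.5), (5, 2.0), (8, 2.5), (10, 3.0), (12, 3.5)]

xs, ys = zip(*points)
r_with_outlier = pearson_r(xs, ys)

xs_clean, ys_clean = zip(*points[1:])  # drop the outlier
r_without_outlier = pearson_r(xs_clean, ys_clean)

print(round(r_with_outlier, 3), round(r_without_outlier, 3))
```

With these made-up points the correlation is noticeably stronger once the outlier is dropped, which mirrors the reasoning in the notes but says nothing about the actual effect size in the study.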
We also looked at the proportion of major (severity >= 4) issues and compared that to the proportion of minor (severity <= 2) issues found by each evaluator. On average, evaluators using the ambient heuristics found significantly more major issues (22%) than minor issues (12%) (p<.05). Evaluators using Nielsen’s heuristics found only 11-13% of both major and minor issues.
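The per-evaluator proportions above can be sketched as follows, assuming severity is rated 1 to 5 and using the thresholds quoted in the notes (major: severity >= 4, minor: severity <= 2). The issue ids, severities, and function name are hypothetical, and the significance test is not reproduced here.

```python
def proportion_found(found_ids, severities, predicate):
    """Fraction of the known issues matching `predicate` (on severity)
    that one evaluator found."""
    pool = {issue for issue, sev in severities.items() if predicate(sev)}
    if not pool:
        return 0.0
    return len(pool & set(found_ids)) / len(pool)

# Hypothetical known issues with severities. NOT data from the study.
severities = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 2, "G": 4, "H": 1}
found_by_one_evaluator = {"A", "C", "D"}

major = proportion_found(found_by_one_evaluator, severities, lambda s: s >= 4)
minor = proportion_found(found_by_one_evaluator, severities, lambda s: s <= 2)
print(major, minor)  # 1 of 3 major issues, 1 of 4 minor issues
```

Averaging these two proportions over all evaluators in a condition gives figures comparable in kind to the 22% and 12% reported for the ambient heuristics.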