Summary9

From EQUIS Lab Wiki


Two displays were evaluated: 1) the busMobile, a display made of bus tickets on strings that shows how far a bus is from the bus stop, and 2) the daylight display, a light that tells you how light or dark it is outside.

The goal of the study was to obtain feedback on a set of heuristics based on Nielsen's but modified to be more applicable to ambient displays.

They modified Nielsen's heuristics, omitting those that were not relevant and adding 5 heuristics of their own relevant to ambient displays. They then gave out a pilot survey with the modified heuristics and refined them further.

Results of the survey: the average relevance rating was high for each heuristic, none lower than 4.2 (on a 5-point Likert scale).

A Likert scale (pronounced 'lick-ert') is a type of psychometric scale often used in questionnaires. It asks respondents to specify their level of agreement to each of a list of statements. It was named after Rensis Likert, who invented the scale in 1932.


Evaluating their heuristics against Nielsen's

Sixteen participants with a median of 5 years of evaluation experience, split into 2 groups of 8.

They used a between-subjects design: group 1 was asked to evaluate the busMobile and the daylight display using Nielsen's heuristics, and group 2 using the modified (ambient) heuristics.

Each participant completed the evaluation individually and spent as much time as necessary.

RESULT:


A total of 26 issues are known for the daylight display, 24 of which were found in the heuristic evaluation. Thirty-nine issues are known for the bus display, 35 of which were found in the heuristic evaluation. Two of the missed issues were severe, while four were not severe. The order in which displays were evaluated had no significant effect on the severity or number of issues found.
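The counts above can be turned into coverage figures with a little arithmetic. The following is a minimal sketch that uses only the numbers stated in this summary; the dictionary and function names are made up for illustration.

```python
# Known vs. found issue counts, taken from the paragraph above.
known = {"daylight": 26, "bus": 39}
found = {"daylight": 24, "bus": 35}


def coverage(display):
    """Fraction of the known issues that the heuristic evaluation found."""
    return found[display] / known[display]


# Issues that no evaluator found, per display and overall.
missed = {d: known[d] - found[d] for d in known}
total_missed = sum(missed.values())

for d in known:
    print(f"{d}: {coverage(d):.1%} found, {missed[d]} missed")
print(f"total missed: {total_missed}")
```

The total of missed issues agrees with the two severe plus four non-severe missed issues reported above.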

Hypothesis 1: The number of issues found in the ambient condition will be greater and the issues will be more severe than those found in the Nielsen condition.

We tested this hypothesis using a univariate analysis of variance, comparing the average number of cases and the average severity of each issue found, with heuristic set and display (busMobile and daylight display) as factors. By cases, we mean that if the same issue was found multiple times by different evaluators, we count each finding. There was no statistically significant difference for any factor or combination of factors in the average number or severity of issues found by evaluators.
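To make the notion of "cases" concrete, here is a small sketch of how cases differ from unique issues. The findings list is entirely hypothetical (it is not data from the study), and the function names are illustrative only; the analysis of variance itself is not reproduced here.

```python
from collections import defaultdict

# Hypothetical findings: (evaluator, issue id, severity on a 1-5 scale).
# Issue "A" is reported by three evaluators, so it contributes three cases.
findings = [
    ("e1", "A", 4), ("e1", "B", 2),
    ("e2", "A", 4), ("e2", "C", 3),
    ("e3", "A", 4),
]


def cases_per_evaluator(rows):
    """Count every finding, including issues other evaluators also found."""
    counts = defaultdict(int)
    for evaluator, _issue, _severity in rows:
        counts[evaluator] += 1
    return dict(counts)


total_cases = len(findings)
unique_issues = len({issue for _e, issue, _s in findings})
mean_case_severity = sum(s for _e, _i, s in findings) / total_cases

print(cases_per_evaluator(findings))
print(total_cases, unique_issues, mean_case_severity)
```

Counting cases rather than unique issues means an issue that many evaluators notice weighs more heavily in the per-evaluator averages.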


Hypothesis 2: The percentage of known issues found in the ambient condition will be higher than the percentage of known issues found in the Nielsen condition.

The percentage of issues found by all of the evaluators in the ambient condition, combined, is higher than the percentage of issues found by all of the evaluators in the Nielsen condition, combined. Visual inspection shows that three to five novice evaluators find 40-60% of problems using the ambient heuristics, a result consistent with Nielsen’s original work [18]. Graphs of two other randomly chosen orderings of evaluators showed similar results.
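The kind of curve described above (percentage of known issues found as evaluators are added one at a time) can be sketched as follows. The issue sets and the count of known issues are hypothetical, chosen only to illustrate the computation; they are not the study's data.

```python
def cumulative_coverage(found_by_evaluator, n_known):
    """Fraction of known issues found by the first k evaluators, for each k."""
    seen = set()
    curve = []
    for issues in found_by_evaluator:
        seen |= set(issues)  # union with everything found so far
        curve.append(len(seen) / n_known)
    return curve


# Hypothetical: 10 known issues, four evaluators in one particular order.
evaluators = [{1, 2, 3}, {2, 4}, {1, 5, 6}, {6, 7}]
curve = cumulative_coverage(evaluators, n_known=10)
print(curve)
```

Because the curve depends on the order in which evaluators are added, it makes sense to check more than one ordering, as the authors did.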

Hypothesis 3: The ambient heuristics will be more useful to evaluators than Nielsen’s heuristics. A heuristic that finds many, severe problems is more useful than a heuristic that finds fewer problems with lower severity.


Figure 4 shows the correlation between the average number of issues found across all evaluators with each heuristic (including both the Nielsen and ambient heuristics) and the average severity of issues found with each heuristic. There is one outlier (a heuristic for which an average of 2 issues were found, with an average severity of 4), visible in the upper left. When that outlier was removed, a strong correlation supporting our hypothesis was found (Pearson’s r=.657, p<.004). Visual inspection shows that the ambient heuristics tend to be on the upper right of the plot. This is important because it means evaluators using the most useful ambient heuristics are finding a greater number of problems, and problems of greater severity, than evaluators using the most useful Nielsen’s heuristics.
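The effect of dropping a single outlier on Pearson's r can be illustrated with a short sketch. The points below are hypothetical (average issues found, average severity) pairs, one per heuristic; only the outlier at (2, 4) mirrors the one described above, and the resulting r values are not the study's.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-heuristic points: (avg. issues found, avg. severity).
points = [(2, 4.0), (3, 2.0), (5, 2.5), (6, 3.0), (8, 3.5), (9, 3.2)]
outlier = (2, 4.0)  # few issues, high severity: upper left of the plot

kept = [p for p in points if p != outlier]
r_with = pearson_r([x for x, _ in points], [y for _, y in points])
r_without = pearson_r([x for x, _ in kept], [y for _, y in kept])
print(f"with outlier: {r_with:.3f}, without: {r_without:.3f}")
```

A single point far from the trend can mask an otherwise clear linear relationship, which is why the reported correlation is given for the data with the outlier removed.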

We also looked at the proportion of major (severity >= 4) issues and compared that to the proportion of minor (severity <= 2) issues found by each evaluator. On average, evaluators using the ambient heuristics found significantly more major issues (22%) than minor issues (12%) (p<.05). Evaluators using Nielsen’s heuristics found only 11-13% of both major and minor issues.
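The major/minor comparison can be sketched per evaluator as below, using the severity thresholds given above (major: severity >= 4, minor: severity <= 2). The issue ids, severities, and the evaluator's findings are hypothetical, and the helper name is illustrative.

```python
def proportion_found(found_ids, known_severity, predicate):
    """Share of the known issues matching `predicate` that were found."""
    pool = {i for i, sev in known_severity.items() if predicate(sev)}
    if not pool:
        return 0.0
    return len(pool & set(found_ids)) / len(pool)


# Hypothetical known issues with severities on a 1-5 scale.
known_severity = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 4, "G": 2, "H": 4}
found_by_one_evaluator = {"A", "D", "F"}

major = proportion_found(found_by_one_evaluator, known_severity, lambda s: s >= 4)
minor = proportion_found(found_by_one_evaluator, known_severity, lambda s: s <= 2)
print(f"major: {major:.0%}, minor: {minor:.0%}")
```

Averaging these two proportions over the evaluators in each condition gives figures of the kind reported above.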