Task-Centered User Interface Design
A Practical Introduction
by Clayton Lewis and John Rieman
Copyright ©1993, 1994: Please see the "shareware notice" at the front of the book.

5.1 Choosing Users to Test
5.2 Selecting Tasks for Testing
5.3 Providing a System for Test Users to Use
5.4 Deciding What Data to Collect
5.5 The Thinking Aloud Method
        5.5.1 Instructions
        5.5.2 The Role of the Observer
        5.5.3 Recording
        5.5.4 Summarizing the Data
        5.5.5 Using the Results
5.6 Measuring Bottom-Line Usability
        5.6.1 Analyzing the Bottom-Line Numbers
        5.6.2 Comparing Two Design Alternatives
5.7 Details of Setting Up a Usability Study
        5.7.1 Choosing the Order of Test Tasks
        5.7.2 Training Test Users
        5.7.3 The Pilot Study
        5.7.4 What If Someone Doesn't Complete a Task?
        5.7.5 Keeping Variability Down
        5.7.6 Debriefing Test Users


5.6.2 Comparing Two Design Alternatives


If you are using bottom-line measurements to compare two design alternatives, the same considerations apply as for a single design, and then some. Your ability to draw a firm conclusion will depend on how variable your numbers are, as well as on how many test users you use. But you also need some way to compare the numbers you get for one design with the numbers you get for the other.


The simplest approach is called a BETWEEN-GROUPS EXPERIMENT. You use two groups of test users; one group uses version A of the system, the other version B. What you want to know is whether the typical value for version A is likely to differ from the typical value for version B, and by how much. Here's a cookbook procedure for this; a short code sketch follows the list.


   1. Using parts of the cookbook method above, compute the means for the two groups separately. Also compute their standard deviations. Call the results ma, mb, sa, sb. You'll also need na and nb, the number of test users in each group (usually you'll try to make these the same, but they don't have to be).
   2. Combine sa and sb to get an estimate of how variable the whole scene is, by computing
      s = sqrt( ( na*(sa**2) + nb*(sb**2) ) / (na + nb - 2) )
      ("*" represents multiplication; "sa**2" means "sa squared").
   3. Compute a combined standard error:
      se = s * sqrt(1/na + 1/nb)
   4. Your range of typical values for the difference between version A and version B is now:
      ma - mb plus-or-minus 2*se
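
If you'd rather let a computer do the arithmetic, here is a minimal sketch of the whole procedure in Python. The function name and the sample completion times are invented for illustration. One assumption to flag: for step 2's weighting by na and nb to give the standard pooled estimate, sa and sb should be the "divide by n" standard deviations (what Python's statistics.pstdev computes); if yours were computed with the "divide by n-1" rule, weight by (na-1) and (nb-1) instead.

    from math import sqrt
    from statistics import mean, pstdev

    def compare_versions(times_a, times_b):
        na, nb = len(times_a), len(times_b)
        # Step 1: means and "divide by n" standard deviations.
        ma, mb = mean(times_a), mean(times_b)
        sa, sb = pstdev(times_a), pstdev(times_b)
        # Step 2: pooled estimate of how variable the whole scene is.
        s = sqrt((na * sa**2 + nb * sb**2) / (na + nb - 2))
        # Step 3: combined standard error.
        se = s * sqrt(1 / na + 1 / nb)
        # Step 4: range of typical values for the difference.
        diff = ma - mb
        return diff - 2 * se, diff + 2 * se

    # Invented data: task times in minutes for two groups of five users.
    low, high = compare_versions([12, 15, 11, 18, 14], [9, 13, 8, 12, 10])
    print("version A minus version B: %.1f to %.1f minutes" % (low, high))

Run on these invented numbers, the sketch prints a range of roughly 0.5 to 6.7 minutes. Since the range doesn't include zero, you'd have some grounds for believing that version B really is faster.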

Another approach you might consider is a WITHIN-GROUPS EXPERIMENT. Here you use only one group of test users, and each of them uses both versions of the system. This brings with it some headaches. You obviously can't use the same tasks for the two versions, since doing a task the second time would be different from doing it the first time. You also have to worry about who uses which system first, because whichever version someone tries first may be at an advantage or a disadvantage. There are ways around these problems, but they aren't simple. They work best for very simple tasks about which there isn't much to learn. You might want to use this approach if you were comparing two low-level interaction techniques, for example. You can learn more about the within-groups approach from any standard text on experimental psychology (check your local college library or bookstore).
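
To give the flavor of one standard workaround for the ordering problem: COUNTERBALANCING has half the test users try version A first and half try version B first, so any benefit (or cost) of going first is spread evenly over the two versions. Here is a minimal sketch; the user names are invented.

    # Counterbalanced ordering for a within-groups test. With an odd
    # number of users, one order will occur once more than the other.
    users = ["jan", "lee", "pat", "sam"]
    orders = [("A", "B"), ("B", "A")]

    for i, user in enumerate(users):
        first, second = orders[i % 2]
        print("%s: version %s first, then version %s" % (user, first, second))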

HyperTopic: Don't Push Your Statistics Too Far.

The cookbook procedures we've described are broadly useful, but they rest on some technical assumptions for complete validity. Both procedures assume that your numbers are drawn from what is called a "normal distribution," and the comparison procedure assumes that the data from both groups come from distributions with the same standard deviation. But experience has shown that these methods work reasonably well even if these assumptions are violated to some extent. You'll do OK if you use them for broad guidance in interpreting your data, but don't get drawn into arguments about exact values.


As we said before, statistics is a big topic. You don't need an MS in statistics to work as an interface designer, but it's a fascinating subject, and you won't regret knowing more about it than we've discussed here. If you want to be an interface researcher, rather than a working stiff, then that MS really would come in handy.


