Designing Agents as if People Mattered


Thomas Erickson

User Experience Architects' Office, Apple Computer

(now at) snowfall@acm.org

 

One of Apple Computer's buildings used to have an advanced energy management system. Among its many features was the ability to make sure lights were not left on when no one was around. It did this by automatically turning the lights off after a certain interval, during times when people weren't expected to be around. I overheard the following dialog between a father and his six-year-old daughter, one Saturday evening at Apple. The energy management system had just noticed that the lights were on during 'off hours,' and so it turned them off.

Daughter: Who turned out the lights?
Father: The computer turned off the lights.
Daughter: (pause) Did you turn off the lights?
Father: No, I told you, the computer turned off the lights.
(someone else manually turns the lights back on)
Daughter: Make the computer turn off the lights again!
Father: (with irony in his voice) It will in a few minutes.


I like this vignette. It illustrates a number of the themes we're going to be exploring in this chapter. It is evident that the child is struggling to understand what is going on. She clearly had a model of how the world worked: people initiate actions; computers don't. But the world didn't behave as expected. Even after double checking to make sure Dad really didn't turn off the lights, she still assumed that ultimately he was in control: surely he could make the computer turn off the lights again. In this, too, she was mistaken. One wonders how the little girl revised her model of the world to account for the apparently capricious, uncontrollable, but semi-predictable behavior of the computer. The computer as weather? The computer as demigod?

Just like the little girl, we all strive to make sense of our world. We move through life with sets of beliefs and expectations about how things work. We try to understand what is happening. We make up stories about how things work. We try to change things. We make predictions about what will happen next. The degree to which we succeed in doing these things is the degree to which we feel comfortable and in control of our world.

As the opening vignette illustrates, we have no guarantees that technology will behave in accordance with our expectations and wishes. We may suddenly find ourselves in the dark, wondering what on earth happened. The goal of this chapter is to explore ways of preventing this. The central theme is that we need to focus not just on inventing new technologies, not just on making them smarter, but on designing technologies so that they fit gracefully into our lives.

Agents are a case in point. As this volume illustrates, a lot of work is being directed at the development of agents. Researchers are exploring ways to make agents smarter, to allow them to learn by observing us, to make them appear more lifelike. However, relatively little work is being focused on how people might actually experience agents, and on how agents might be designed so that we feel comfortable with them.

What Does "Agent" Mean?

To begin, let's take a look at the concept of agent. "Agent" is the locus of considerable confusion. Much of this confusion is due to the fact that "agent" has two different meanings that are often conflated.

One way in which the word is used is to designate an autonomous or semi-autonomous computer program. An agent is a program that is, to some degree, capable of initiating actions, forming its own goals, constructing plans of action, communicating with other agents, and responding appropriately to events--all without being directly controlled by a human. This sense of agent implies the existence of particular functional capacities often referred to as intelligence, adaptivity, or responsiveness. To discuss agents in this sense of the term, I will use the phrase adaptive functionality.

The second meaning of agent is connected with what is portrayed to the user. Here, agent is used to describe programs which appear to have the characteristics of an animate being, often a human. This is what I will call the agent metaphor. The agent metaphor suggests a particular model of what the program is, how it relates to the user, and its capabilities and functions. Examples of the agent metaphor include the bow-tied human figure depicted in Apple's Knowledge Navigator video (Apple Computer 1987), the digital butler envisioned by Negroponte (this volume), and the Personal Digital Parrot described by Ball and his colleagues (this volume).

Now, of course, these two meanings of agent often go together. A common scenario is that of a program which intercepts incoming communications and schedules meetings based on a set of rules derived from its understanding of its user's schedule, tasks, and responsibilities. Such a program might be portrayed using the metaphor of an electronic secretary, and would of course require adaptive functionality to learn and appropriately apply the rules. But it is important to recognize that the metaphor and functionality can be decoupled. The adaptive functionality that allows the 'agent' to perform its task need not be portrayed as a talking head or animated character: it could, for example, be presented as a smart, publicly accessible calendar. Thus, someone wanting to schedule a meeting could log on to it and directly schedule a meeting in an available slot. The rules would still be present, but rather than being portrayed through an agent which handled the scheduling, they would be reflected in which (if any) calendar slots were made available to the person seeking the meeting.
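
To make the decoupling concrete, here is a minimal sketch of scheduling rules surfaced as a smart calendar rather than as an agent; the function, attribute, and rule names are invented purely for illustration and are not drawn from any actual system. The rules simply determine which free slots a visitor is offered.

def available_slots(calendar, rules, requester):
    # Offer only the free slots that every one of the owner's rules accepts.
    return [slot for slot in calendar.free_slots()
            if all(rule(slot, requester) for rule in rules)]

# Rules of the sort the owner (or a learning component) might maintain:
example_rules = [
    lambda slot, requester: slot.hour >= 10,      # no meetings before 10 a.m.
    lambda slot, requester: slot.weekday() < 5,   # weekdays only
]

The same rule set could just as easily drive an electronic secretary that negotiates on the owner's behalf; the functionality is identical, only the portrayal differs.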

It is important to distinguish between these two meanings of agent because each gives rise to different problems. Adaptive functionality raises a number of design issues--as we saw in the opening vignette--that are independent of how it is portrayed to users. Programs that take initiative, attempt to act intelligently (sometimes failing), and change their behavior over time fall outside our range of experience with computer programs. Likewise, the agent metaphor has its own set of problems that are distinct from those caused by adaptive functionality. Portraying a program as a human or animal raises a variety of expectations that designers have not had to deal with in the past.

In this chapter we will explore the difficulties surrounding adaptive functionality and the agent metaphor in turn. In the first case, I describe the three basic problems that computer researchers and developers will have to address, regardless of whether or not they use an agent metaphor. In the second case, I discuss how people react to the agent metaphor, and consider the implications of these reactions for designing agents. Finally, we look beyond the surface of the agent metaphor and note that it suggests a very different conceptual model for human computer interfaces. This, in turn, has implications for when and how the agent metaphor should be used. Throughout the chapter, the ultimate concern is with how to design agents that interact gracefully with people. What good are agents? When should functionality--adaptive or not--be portrayed through the agent metaphor? What benefits does depicting something as an agent bring, and what sorts of drawbacks? While there are no absolute answers, an understanding of some of the tradeoffs, as well as of the issues that require further research, can only aid us as we move into the future.

Adaptive Functionality: Three Design Issues

Whether our future is filled with agents or not, there is no question that there will be lots of adaptive functionality. Consider just a few of the things brewing in university and industry laboratories:

  • After observing its user performing the same set of actions over and over again, a computer system offers to produce a system-generated program to complete the task (Cypher 1991).
  • An adaptive phone book keeps track of which numbers are retrieved; it then uses that information to increase the accessibility of frequently retrieved numbers (Greenberg and Witten 1985).
  • A "learning personal assistant" fits new appointments into the busy calendar of its user, according to rules inferred by observing previous scheduling behavior (Mitchell, et al. 1995).
  • A multi-user database notices that over time certain seemingly unrelated bibliographic records--call them X and Y--are frequently retrieved in the same search session. It uses that information to increase the probability that Y is retrieved whenever X is specified, and vice versa (Belew 1989).
  • A full text database allows its users to type in questions in plain English. It interprets the input, and returns a list of results ordered in terms of their relevance. Users can select an item, and tell it to 'find more like that one' (Dow Jones and Co. 1989).
  • A variety of recognition systems transform handwriting, speech, gestures, drawings, or other forms of human communication from fuzzy, analog representations into structured, digital representations.


In general, systems with adaptive functionality are doing three things:

  • noticing: trying to detect potentially relevant events
  • interpreting: trying to recognize the events (generally, this means mapping the external event into an element in the system's 'vocabulary') by applying a set of recognition rules
  • responding: acting on the interpreted events by using a set of action rules, either by taking some action that affects the user, or by altering their own rules (i.e. learning)


Thus, a speech recognition system tries to notice sounds that may correspond to words, tries to interpret each sound by matching it to a word in its vocabulary (using rules about phonetics and what the user is likely to be saying at the moment), and then responds by doing an action that corresponds to the word it recognized, reporting an error if it couldn't interpret the word, or adjusting its recognition rules if it is being trained.
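
As a rough sketch of this division of labor, the three activities might be organized as shown below; the class and rule names are invented for the purpose of the sketch and do not describe any particular system.

class AdaptiveSystem:
    def __init__(self, recognition_rules, action_rules):
        self.recognition_rules = recognition_rules  # map raw events onto the system's vocabulary
        self.action_rules = action_rules            # map vocabulary items onto responses

    def notice(self, raw_event):
        # Decide whether an event is potentially relevant at all.
        return raw_event is not None

    def interpret(self, raw_event):
        # Try each recognition rule; return the first vocabulary item that matches.
        for rule in self.recognition_rules:
            item = rule(raw_event)
            if item is not None:
                return item
        return None

    def respond(self, item):
        # Act on the interpreted event, or report failure.
        action = self.action_rules.get(item)
        if action is None:
            return "error: could not interpret event"
        return action()

    def handle(self, raw_event):
        if self.notice(raw_event):
            return self.respond(self.interpret(raw_event))

Note that a failure at any one of the three steps produces a different kind of error, a point taken up below.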

Such adaptive functionality holds great promise for making computer systems more responsive, personal, and proactive. However, while such functionality is necessary for enhancing our systems, it is not sufficient. Adaptive functionality does no good if it is not, or cannot be, used; it may do harm if it confuses its users, interferes with their work practices, or has unanticipated effects.

Notice that there are many chances for adaptive functionality to fail. The system may fail to notice a relevant event (or may mistakenly notice an irrelevant event). It may misinterpret an event that has been noticed. Or it may respond incorrectly to an event that it has correctly noticed and interpreted (that is, the system does everything right, but the rules that it has for responding to the event don't match what the user expects). These failures are important to consider because they have a big impact on the user's experience. Let's take a closer look at some of the design issues which are raised by adaptive functionality.

Understanding: What Happened and Why?

Consider an intelligent tutoring system that is teaching introductory physics to a teenager. Suppose the system notices that the student learns best when information is presented as diagrams, and adapts its presentation appropriately. But even as the system is watching for events, interpreting them, and adjusting its actions, so is the student watching the system, and trying to interpret what the system is doing. Suppose that after a while the student notices that the presentation consists of diagrams rather than equations: it is likely that the student will wonder why: 'Does the system think I'm stupid? If I start to do better will it present me with equations again?' There is no guarantee that the student's interpretations will correspond with the system's. How can such potentially negative misunderstandings on the user's part be minimized?

Control: How can I Change It?

If the system makes an error--either because it has failed in noticing or interpretation, or because its actions are not in line with the user's wishes or expectations--what should the design response be? In most circumstances the user ought to be given a way to take control of the system and to undo what the adaptive functionality has wrought. But how is this to be done?

The problem is not simply one of providing an undo capability. That works well for today's graphic user interfaces where users initiate all actions and the "undo" command can be invoked when a mistake is made. However, with adaptive functionality, the difficulty is that the user did not initiate the action. This leads to several problems.

First, since the user didn't initiate the change, it may not be clear how to undo it. Thus, the student who wants the teaching system to continue presenting equations will have no idea what to do, or even where to look, to make the system return to its earlier behavior. This is complicated by the fact that it may take the user a while to notice that the system has changed in an undesirable way, and so clues about what actually happened have vanished.

Second, there may be a mismatch between the user's description of what has happened, and the system's description of its action. What the user notices may only be a side effect of the system's action. Users may need assistance in discovering what the relevant action was in the first place, and it is an open question whether the system will be able to provide it. If the tutoring system shifted to content which just happened to consist of diagrams, a student searching for a way to modify the style of presentation may be baffled. If the energy management system describes its action as shutting off a particular power subsystem, a user searching for a way to control the lights on the fourth floor may have difficulty.

All of this presupposes that the users understand that the system can be controlled in the first place. What kind of model of the system is necessary to make this clear? It would be important for the model to not only indicate what aspects of the system can be controlled, but to provide an obvious representation and set of methods for exercising control.

Prediction: Will it Do What I Expect?

Prediction goes hand in hand with understanding and trying to control what is happening. Let's take a close look at an actual example of adaptive functionality, found in a program called DowQuest (Dow Jones and Co. 1989). DowQuest is a commercially available system with a basic, command line interface, but very sophisticated functionality. It provides access to the full text of the last 6 to 12 months of over 350 news sources, and permits users to retrieve information via relevance feedback (Stanfill and Kahle 1986).

Rather than using a sophisticated query language, DowQuest allows users to type in a sentence (e.g. 'Tell me about the eruption of the Alaskan volcano'), get a list of articles, and then say--in essence--'find more like that one.' Figures 1 and 2 show two phases of the process of constructing a query. In Figure 1 the user has entered a question and pressed return. DowQuest does not try to interpret the meaning of the question; in the example shown, the system will drop out the words "tell," "me," "about," "the," and "of," and use the lower frequency words to search the database. Next the system returns the titles of the 16 most 'relevant' articles, where relevance is defined by a sophisticated statistical algorithm based on a variety of features over which the user has no control (and often no knowledge). While this list frequently contains articles relevant to the user's question, it also usually contains items which appear--to the user--to be irrelevant. At this point, the user has the option of reading the articles retrieved or continuing to the second phase of the query process.


tell me about the eruption of the alaskan volcano

DOWQUEST STARTER LIST HEADLINE PAGE 1 OF 4

1 OCS: BILL SEEKS TO IMPOSE BROAD LIMITS ON INTERIOR . . .
INSIDE ENERGY, 11/27/89 (935 words)

2 Alaska Volcano Spews Ash, Causes Tremors
DOW JONES NEWS SERVICE , 01/09/90 (241)

3 Air Transport: Volcanic Ash Cloud Shuts Down All Four . . .
AVIATION WEEK and SPACE TECHNOLOGY, 01/01/90 (742)

4 Volcanic Explosions Stall Air Traffic in Anchorage
WASHINGTON POST: A SECTION, 01/04/90 (679)

* * * * *


Figure 1. The first phase of DowQuest interaction: the user types in a 'natural language' query; the system searches the database using the non-'noise' words in the query and returns a list of titles of the 'most relevant' articles.
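
A much-simplified sketch of this first phase might look like the following; the noise-word list, the crude term-count scoring, and the data layout are invented stand-ins for DowQuest's far more sophisticated statistical algorithm.

NOISE_WORDS = {"tell", "me", "about", "the", "of", "a", "an", "in"}   # illustrative only

def content_words(text):
    # Keep the lower-frequency 'content' words (punctuation handling omitted).
    return [w for w in text.lower().split() if w not in NOISE_WORDS]

def score(article_text, terms):
    # Crude relevance score: how often the query terms occur in the article.
    words = article_text.lower().split()
    return sum(words.count(t) for t in terms)

def phase_one(query, articles, n=16):
    terms = content_words(query)
    ranked = sorted(articles, key=lambda a: score(a["text"], terms), reverse=True)
    return ranked[:n]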


In phase 2 of the process (figure 2) the user tells the system which articles are good examples of what is wanted. The user may specify an entire article or may open an article and specify particular paragraphs within it. The system takes the full text of the selections, drops out the high frequency noise words, and uses a limited number of the most informative words for use in the new query. It then returns a new list of the 16 'most relevant' items. This second, relevance feedback phase may be repeated as many times as desired.


search 2 4 3

DOWQUEST SECOND SEARCH HEADLINE PAGE 1 OF 4

1 Air Transport: Volcanic Ash Cloud Shuts Down All Four . . .
AVIATION WEEK and SPACE TECHNOLOGY, 01/01/90 (742 words)

2 Alaska Volcano Spews Ash, Causes Tremors
DOW JONES NEWS SERVICE , 01/09/90 (241)

3 Volcanic Explosions Stall Air Traffic in Anchorage
WASHINGTON POST: A SECTION, 01/04/90 (679)

4 Alaska's Redoubt Volcano Gushes Ash, Possibly Lava
DOW JONES NEWS SERVICE , 01/03/90 (364)

* * * * *


Figure 2. The second phase of DowQuest interaction: the user instructs the database to find more articles like 2, 3 and 4, and the system returns a new set of relevant articles. (Note that the first three, 'most relevant' articles are those that were fed back (an article is most 'like' itself); the fourth article is new.)
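
Continuing the sketch above (and reusing its content_words and score helpers), the relevance feedback phase might be caricatured like this; again, the real system's statistics are far more sophisticated than this illustration.

from collections import Counter

def phase_two(example_articles, all_articles, max_terms=30, n=16):
    # Pool the content words of the example articles and keep a limited
    # number of the most frequent ones as the new query terms.
    counts = Counter()
    for article in example_articles:
        counts.update(content_words(article["text"]))
    terms = [word for word, _ in counts.most_common(max_terms)]
    ranked = sorted(all_articles, key=lambda a: score(a["text"], terms), reverse=True)
    return ranked[:n]

Because an article matches its own words better than anything else does, the example articles naturally float to the top of the new list--the very behavior that, as described below, some users misread as 'nothing new was found.'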


New users generally had high expectations of DowQuest: it seemed quite intelligent. However, their understanding of what the system was doing was quite different from what the system was actually doing. The system appeared to understand plain English; but in reality it made no effort to understand the question that was typed in--it just used a statistical algorithm. Similarly, the system appeared to be able to 'find more items like this one;' but again, it had no understanding of what an item was like--it just used statistics. These differences were important because they led to expectations that could not be met.

Users' expectations were usually dashed when, in response to the first phase of the first query, DowQuest returned a list of articles containing many obviously irrelevant items. When this happened some users concluded that the system was 'no good,' and never tried it again. While reactions like this may seem hasty and extreme, they are not uncharacteristic of busy people who do not love technology for its own sake. Furthermore, such a reaction is perfectly appropriate in the case of a conventional application: a spreadsheet that adds incorrectly should be rejected. Users who had expected DowQuest to be intelligent could plainly see that it was not. They did not see it as a semi-intelligent system that they had control over, and that would do better as they worked with it. This was quite ironic, as the second stage of the process, relevance feedback, was the most powerful and helpful aspect of the system.

Only a few users gave up after the first phase. However, efforts to understand what was going on and to predict what would happen continued to influence their behavior. In the second phase of a DowQuest query, when users requested the system to retrieve more articles 'like that one,' the resulting list of articles was ordered by 'relevance.' While no computer scientist would be surprised to find that an article is most relevant to itself, some ordinary users lacked this insight: when they looked at the new list of articles and discovered that the first, most relevant article was the one they had used as an example, they assumed that there was nothing else relevant available and did not inspect the rest of the list. Obviously, a system with any intelligence at all would not show them articles that they had already seen if it had anything new.

DowQuest is a very compelling system. It holds out the promise of freeing users from having to grapple with arcane query languages. But, as is usually the case with adaptive functionality, it doesn't work perfectly. Here we've seen how users have tried to understand how the system works (it's smart!), and how their expectations have shaped their use of the system.

How can designers address these problems? One approach is to provide users with a more accurate model of what is going on. Malone, Grant, and Lai (this volume) advocate this sort of approach with their dictum of 'glass boxes, not black boxes,' suggesting that agents' rules be made visible to and modifiable by users. This is certainly a valid approach, but it is not likely to always work. After all, the statistical algorithm which computes the 'relevance' of stories is sufficiently complex that describing it would probably be futile, if not counterproductive, and allowing users to tinker with its parameters would probably lead to disaster. In the case of DowQuest, perhaps the aim should not be to give users an accurate picture of what is going on. One approach might be to encourage users to accept results that seem to be of low quality, so that they will use the system long enough to benefit from its sophistication. Another approach might be to construct a 'fictional' model of what the system is doing, something that will set up the right expectations without exposing users to the full complexity of the system's behavior. See Erickson and Salomon (1991) and Erickson (1996) for a discussion of other issues in this task domain, and a glimpse of one type of design solution.

Understanding how to portray a system which exhibits partially intelligent behavior is a general problem. Few will dispute that, for the foreseeable future, intelligent systems will fall short of the breadth and flexibility which characterize human-level intelligence. But how can the semi-intelligence of computer systems be portrayed? People have little if any experience with systems which are extremely (or even just somewhat) intelligent in one narrow domain, and utterly stupid in another, so appropriate metaphors or analogies are not easy to find. Excellent performance in one domain or instance is likely to lead to expectations of similar performance everywhere. How can these expectations be controlled?

The Agent Metaphor: Reactions and Expectations

In this section we turn to the agent metaphor and the expectations it raises. Why should adaptive functionality be portrayed as an agent? What is gained by having a character appear on the screen, whether it be a bow-tied human visage, an animated animal character, or just a provocatively named dialog box? Is it somehow easier or more natural to have a back-and-forth dialog with an agent than to fill in a form that elicits the same information? Most discussions that advance the cause of agents focus on the adaptive functionality that they promise--however, as we've already argued, adaptive functionality need not be embodied in the agent metaphor. So let's turn to the question of what good agents are as ways of portraying functionality. When designers decide to invoke the agent metaphor, what benefits and costs does it bring with it?

First it must be acknowledged that in spite of the popularity of the agent metaphor there is remarkably little research on how people react to agents. The vast bulk of work has been focused either on the development of adaptive functionality itself, or on issues having to do with making agents appear more lifelike: how to animate them, how to make them better conversants, and so on. In this section, we'll look at three strands of research that shed some light on the experience of interacting with agents.

Guides

The Guides project involved the design of an interface to a CD-ROM-based encyclopedia (Salomon, Oren, and Kreitman 1989; Oren, et al. 1990). The intent of the design was to encourage students to explore the contents of the encyclopedia. The designers wanted to create a halfway point between directed searching and random browsing by providing a set of travel guides, each of which was biased towards a particular type of information.

The interface used stereotypic characters such as a settler woman, an Indian, and an inventor (the CD-ROM subset of the encyclopedia covered early American history). The guides were represented by icons that depicted each guide's role--no attempt was made to reify the guide, either by giving it a realistic-looking picture or by providing information such as a name or personal history. As users browsed through stories in the encyclopedia, each guide would create a list of articles that were related to the article being looked at and were in line with its interests. When clicked on, the guide would display its 'suggestions.' Thus, if the user were reading an article about the gold rush, the Indian guide might suggest articles about treaty violations, whereas the inventor guide might suggest an article about machines for extracting gold.
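
A rough sketch of how such suggestions might be computed follows; this is an illustration with invented names and data layout, not the actual Guides implementation. Each guide has a set of interests, and it suggests the related articles that overlap with them.

def suggestions(guide_interests, current_article, encyclopedia):
    # Articles linked to the current one that also touch on the guide's interests.
    related = [a for a in encyclopedia if a["id"] in current_article["related_ids"]]
    return [a for a in related if guide_interests & set(a["topics"])]

inventor_interests = {"invention", "machinery", "technology"}
# suggestions(inventor_interests, gold_rush_article, encyclopedia) might return
# the article on machines for extracting gold.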

The system was implemented and was then tested with high school students. The students had a variety of reactions. Although the guides were presented as stock characters, students tended to assume that they embodied particular individuals. For example, since many of the articles in the encyclopedia were biographies, users would assume that the first biography suggested by a guide was its own. If the inventor guide first suggested an article on Samuel Morse, users often assumed that Morse was now their guide. Students also wondered if they were seeing the article from the guide's point of view (they weren't). And they sometimes assumed that guides had specific reasons for suggesting each story and wanted to know what they were (in line with users' general wish to understand what adaptive functionality is actually doing).

In some cases the students also became emotionally engaged with the guides. Oren, et al. (1990) report some interesting examples of this: "the preacher guide brought one student to the Illinois history article and she could not figure out why. The student actually got angry and did not want to continue with the guide. She felt the guide had betrayed her." While anecdotes of users getting angry with their machines are common, stories about users getting angry with one interface component are much less so. In another case, a bug in the software caused the guide to disappear. Oren, et al., write: "One student interpreted this as 'the guide got mad, he disappeared.' He wanted to know 'if I go back and take his next choice, will he come back and stay with me?'" Here the tables are turned. The user infers that the guide is angry. While no controlled experiment is available, it is hard to believe that the user would have made such an inference if the suggested articles had been presented in a floating window that had vanished.

While this evidence is anecdotal, it is nevertheless interesting and relevant. Here we again see users engaged in the effort to understand, control, and predict the consequences of adaptive functionality. What is particularly interesting is how these efforts are shaped by the agent metaphor. The students are trying to understand the guides by particularizing them, and thinking about their points of view. One student wants to control his guide (the one that 'got mad and disappeared') by being more agreeable, suspecting that the guide will come back if his recommendations are followed. All of this happens in spite of the rudimentary level of the guides' portrayals.

Computers as Social Actors

Nass and his colleagues at Stanford have carried out an extensive research program on the tendency of people to use their knowledge of people and social rules to make judgments about computers. Two aspects of their results are interesting in relation to the agent metaphor. First, they show that very small cues can trigger people's readiness to apply social rules to computers. For example, simply having a computer use a human voice is sufficient to cause people to apply social rules to the computer (Nass and Steuer 1993). This suggests that the agent metaphor may be invoked very easily--human visages with animated facial expressions, and so forth, are not necessary. This is in accord with the finding from the Guides study, in which stereotypic pictures and role labels triggered attributions of individual points of view and emotional behavior. The second aspect of interest is the finding that people do, indeed, apply social rules when making judgments about machines.

Let's look at an example. One social rule is that if person B praises person A, a third person will perceive the praise as more meaningful and valid than if person A praises himself. Nass, Steuer, and Tauber (1994) designed an experiment to show that this social rule holds when A and B are replaced with computers. The experiment went something like this (it has been considerably simplified for expository purposes):

  • In part 1, a person went through a computer-based tutorial on a topic.
  • In part 2, the person was given a computer-based test on the material covered.
  • In part 3, the computer critiqued the effectiveness of the tutorial given in part 1.

The experimental manipulation was that in one condition, parts 1, 2, and 3 were all done on computer A (i.e. computer A praised itself), whereas in the second condition computer A was used for giving the tutorial and computer B was used to give the test and critique the tutorial (i.e., B praised A). Afterwards, the human participants in the study were asked to critique the tutorial themselves. The result was that their ratings were much more favorable when computer B had praised A's tutorial, than when computer A had praised itself. That is, they were more influenced by B's praise of A than by A's praise of itself.

The finding that people are willing to apply their social heuristics to computers is surprising, particularly since the cues that trigger the application of the social rules are so minimal. In the above experiment, the only cue was voice. There was no attempt to portray the tutorial as an agent or personal learning assistant. No animation, no picture, no verbal invocation of a teacher role, just a voice that read out a fact each time the user clicked a button. This finding appears to be quite general. Nass and colleagues are engaged in showing that a wide variety of social rules are applied to computers given the presence of certain cues: to date, these range from rules about politeness, to gender biases, to attributions about expertise (Nass, Steuer, and Tauber 1994; Nass and Steuer 1993).

While this research is important and interesting, there is a tendency to take it a bit too far. The finding that people apply social rules to interpret the behavior of computers is sometimes generalized to the claim that individuals' interactions with computers are fundamentally social (e.g., Nass, Steuer, and Tauber 1994; Ball, et al., this volume). I think that this is incorrect. It is one thing for people to apply social heuristics to machines; it is quite another to assume that this amounts to social interaction, or to suggest that the ability to support social interaction between humans and machines is now within reach. Interaction is a two way street: just as people act on and respond to computers, so computers act on and respond to people. Interaction is a partnership. But social interaction relies on deep knowledge, complex chains of inferences and subtle patterns of actions and responses on the part of all participants (see, for example, Goffman 1967). Computers lack the knowledge, the inferential ability, and the subtlety of perception and response necessary to be even marginally competent social partners. Does this mean that this research should be disregarded? Certainly not. If anything, the willingness of people to apply social rules to entities that can't hold up their end of an anticipated social interaction raises more problems for designers.

Faces

Thus far we have looked at cases where rather minimal portrayals of agents have evoked surprising reactions. For an interesting contrast, let's move to the other end of the spectrum and examine work on extremely realistic portrayals of agents.

One of the more famous examples of a highly realistic agent is "Phil", an agent played by a human actor in the Knowledge Navigator videotape (Apple Computer 1987). During the video, Phil interacts via natural language, and uses vocal inflection, direction of gaze, and facial expressions to support the interaction. While, as noted in the previous section, the intelligence and subtlety necessary to support such interaction is far beyond the capacities of today's software and hardware, it is possible to create portrayals of agents which synchronize lip movements with their speech and make limited use of gaze and facial expression (e.g. Walker, Sproull, and Subramani 1994; Takeuchi and Naito 1995).

Walker, Sproull, and Subramani (1994) report on a controlled study of human responses to two versions of a synthesized talking face that was used to administer a questionnaire. One group simply filled in a textual questionnaire presented on the computer. Two other groups listened while a synthesized talking face (a different one for each group) read each question aloud, and then typed their answers on the computer. Compared to people who simply filled out the questionnaire, those who answered the questions delivered by the synthesized faces spent more time, wrote more comments, and made fewer errors. People who interacted with the faces seemed more engaged by the experience.

Of particular interest was the difference between people's responses to the two synthesized faces. The faces differed only in their expression: one face was stern, the other more neutral. Although the difference in expression was extremely subtle--the only difference was that the inner portion of the eyebrows was pulled inward and downward--it did make a difference. People who answered questions delivered by the stern face spent more time, wrote more comments, and made fewer errors. Interestingly enough, they also liked the experience and the face less.

Is the Agent Metaphor Worth the Trouble?

So far it looks like the agent metaphor is more trouble than it's worth. Designers who use the agent metaphor have to worry about new issues like emotion and point of view and politeness and other social rules and--if they put a realistic face on the screen--whether people like the face's expression! Perhaps the agent metaphor should be avoided.

I think there are several reasons not to give up on agents. First, it is simply too soon. The difficulties noted above are problems for designers--not necessarily for users. They may very well be solvable. We simply don't know enough about how people react to agents; far more research is needed on how people experience them. Second, the research by Nass and his colleagues suggests that we may not have much of a choice. Very simple cues like voice may be sufficient to invoke the agent metaphor. Perhaps our only choice is to try to control expectations, to modulate the degree to which the agent metaphor is manifested. It's not clear. The third reason is that I believe the agent metaphor brings some clear advantages with it.

The Agent Conceptual Model

We've discussed the two meanings of agent--adaptive functionality and the agent metaphor--and some of the new problems they raise. In this section I want to look below the surface of the agent metaphor at its most fundamental characteristics. The agent metaphor brings with it a new conceptual model, one that is quite different from that which underlies today's graphic user interfaces. It is at this level that the agent metaphor has the most to offer. To begin with, let's look at the conceptual model that underlies today's interfaces, and then we'll consider the agent conceptual model in relation to it.

The Object-Action Conceptual Model

Today's graphic interfaces use a variety of different metaphors. The canonical example is the desktop metaphor, in which common interface components such as folders, documents, and the trash can, can be laid out on the computer screen in a manner analogous to laying items out on a desktop. However, I don't think the details of the metaphors--folders, trash cans, etc.--are what is most important. Rather, it is the conceptual model that underlies them.

The underlying conceptual model of today's graphical user interfaces has to do with objects and actions. That is, graphic user interface elements are portrayed as objects on which particular actions may be performed. The power of this object-action conceptual model is rooted in the fact that users know many things about objects. Some of the general knowledge that is most relevant to the objects found in graphic user interfaces includes the following:

objects are visible
objects are passive
objects have locations
objects may contain things

This knowledge translates into general expectations. An object has a particular appearance. Objects may be moved from one location to another. Because objects are passive, if users wish to move them, they must do so themselves. Objects that contain things may be opened, their contents inspected or changed, and then closed again.

Graphic user interfaces succeed in being easy to use because these expectations are usually met by any component of the interface. When users encounter an object--even if they have absolutely no idea what it is--they know that it is likely that they can move it, open it, and close it. Furthermore, they know that clicking and dragging will move or stretch the object, and that double clicking will open it. They know that if they open it up and find text or graphics inside it, they will be able to edit the contents in familiar ways, and close it in the usual way. Because this general knowledge is applicable to anything users see in the interface, they will always be able to experiment with any new object they encounter, regardless of whether they recognize it.
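
One way to see the uniformity this buys is to imagine the object-action model as a single, small interface that every screen object honors; the class and method names below are invented for the purpose of the sketch.

class InterfaceObject:
    def __init__(self, name, location):
        self.name = name
        self.location = location     # objects are visible and have locations
        self.contents = []           # objects may contain things

    def move_to(self, new_location):
        # Objects are passive: they move only when the user moves them.
        self.location = new_location

    def open(self):
        # Double-clicking any object opens it and exposes its contents.
        return self.contents

    def close(self):
        pass

Whatever the object turns out to be--folder, document, trash can--these same few operations apply, which is why users can experiment safely with things they have never seen before.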

The Agent Conceptual Model

The agent metaphor is based on a conceptual model that is different from the object-action conceptual model. Rather than passive objects that are acted upon, the agent metaphor's basic components (agents, of course) have a degree of animacy and thus can respond to events. We'll call this the responsive agent conceptual model.

Consider some of the general knowledge people have about agents:

agents can notice things
agents can carry out actions
agents can know things
agents can go places

This knowledge translates into expectations for agents that differ from those for objects. Since agents can notice things and carry out actions, in contrast to inanimate objects where these attributes don't apply, the responsive agent conceptual model is well suited to representing aspects of a system which respond to events. The sorts of things an agent might notice, and the ways in which it might respond, are a function of its particular portrayal.

Another basic difference is that while objects can contain things, agents know things, and, as a corollary, can learn things. Thus, the agent conceptual model is suitable for representing systems which acquire, contain, and manage knowledge. What sort of things are agents expected to learn or know? That depends on the way in which the agent is portrayed. To paraphrase Laurel (1990), one might expect an agent portrayed as a dog to fetch the electronic newspaper, but one would not expect it to have a point of view on its contents. A 'stupid' agent might only know a few simple things that it is taught, and might be unable to offer explanations for its actions beyond citing its rules; a more intelligent agent might be able to learn by example, and construct rationales for its actions. Note that more intelligence or knowledge is not necessarily better: what is important is the match between the agent's abilities and the user's expectations. Ironically, the agent metaphor may be particularly useful not because agents can represent intelligence, but because agents can represent very low levels of intelligence.

Another difference between the object-action and agent conceptual models is that agents can go places. Users expect objects to stay where they're put; agents, on the other hand, are capable of moving about. Where can agents go? That depends both on the particular portrayal of the agent and on the spatial metaphor of the interface. At the very least, an agent is well suited for representing a process that can log onto a remote computer, retrieve information, and download it to its user's machine. Another consequence of an agent's ability to go places is that it need not be visible to be useful or active. The agent may be present 'off stage,' able to be summoned by the user when interaction is required, but able to carry out its instructions in the background.
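
By way of contrast with the object sketch above, the responsive agent conceptual model might be caricatured as a different small interface; again, all names are invented for illustration.

class InterfaceAgent:
    def __init__(self, name):
        self.name = name
        self.knowledge = {}          # agents know things, and can learn
        self.location = "desktop"    # agents can go places, including 'off stage'

    def notice(self, event):
        # Agents can notice things.
        return event in self.knowledge.get("watch_list", [])

    def act(self, event):
        # Agents can carry out actions, according to what they know.
        return self.knowledge.get("responses", {}).get(event)

    def learn(self, event, response):
        self.knowledge.setdefault("responses", {})[event] = response

    def go_to(self, place):
        # e.g. a remote machine, or the background of the user's own machine.
        self.location = place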

Objects and Agents

These arguments about the differences between the object and agent conceptual models could be ignored. After all, interface components ignore many properties of the real things on which they are based. For example, 'folder objects' in graphic user interfaces can be deeply nested, one inside another inside another inside another, unlike their real-world counterparts. Yet in spite of this departure from our knowledge of real-world objects, it works well. Perhaps we could simply integrate adaptive functionality into what were formerly passive, unintelligent objects. It's easy to conceive of an interface folder that is 'smart,' or that can 'notice' particular kinds of documents and 'grab' them, or that can 'migrate' from a desktop machine to a portable when it is time to go home. However, the drawback of such a design tack is that it undermines the object-action conceptual model. If that tack were pursued, users wouldn't know as much about what they see on the screen. If they encounter a new object, what will it do? Perhaps it will just sit there, or perhaps it will wake up and do something. Perhaps double clicking will open it, or perhaps double clicking will start it running around, doing things.

I believe that there is much to be said for maintaining the separation between the object and agent conceptual models. It becomes a nice way of dividing up the computational world. That is, objects and agents can be used in the same interface, but they are clearly distinguished from one another. Objects stay what they are: nice, safe, predictable things that just sit there and hold things. Agents become the repositories for adaptive functionality. They can notice things, use rules to interpret them, and take actions based on their interpretations. Ideally, a few consistent methods can be defined to provide the users with the knowledge and control they need. That is, just as there are consistent ways of moving, opening, and closing objects, so can there be consistent ways of finding out what an agent will notice, what actions it will carry out, what it knows, and where it is. Such methods get us a good deal of the way to providing users with the understanding, control, and prediction they need when interacting with adaptive systems.
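
Extending the agent sketch above, such consistent methods might amount to every agent being able to answer the same four questions; the method names are invented, and the point is their uniformity rather than their particulars.

class InspectableAgent(InterfaceAgent):
    def what_do_you_notice(self):
        return self.knowledge.get("watch_list", [])

    def what_will_you_do(self):
        return self.knowledge.get("responses", {})

    def what_do_you_know(self):
        return self.knowledge

    def where_are_you(self):
        return self.location

A user who can always ask these questions, of any agent, has the same kind of safety net that move, open, and close provide for objects.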

There is a risk of overemphasizing the importance of metaphors and conceptual models. Normally, people are not aware of the conceptual model, the metaphor, or even individual components of the interface. Rather, they are absorbed in their work, accomplishing their actions with the kind of unreflective flow that characterizes expert performance. It is only when there are problems--the lights go out, the search agent brings back worthless material, the encyclopedia guide vanishes--that we begin to reflect and analyze and diagnose.

But this is why metaphors and conceptual models are particularly important for adaptive functionality. For the foreseeable future, it will fall short of perfection. After all, even humans make errors doing these sorts of tasks, and adaptive functionality is immeasurably distant from human competence. As a consequence, systems will adapt imperfectly, initiate actions when they ought not, and act in ways that seem far from intelligent.

Concluding Remarks

In this chapter we've explored a number of problems that are important to consider when designing agents. First we noted that there are two distinct senses of agent: the metaphor that is presented to the user, and the adaptive functionality that underlies it. Each gives rise to particular problems. The agent metaphor brings a number of expectations that are new to user interface design. And adaptive functionality raises a number of other issues that are independent of how the functionality is portrayed.

The chief challenge in designing agents, or any other portrayal of adaptive systems, is to minimize the impact of errors and to enable people to step in and set things right as easily and naturally as possible. We've discussed two approaches to this. One is to make sure that adaptive systems are designed to enable users to understand what they're doing, and predict and control what they may do in the future. Here we've suggested that the agent conceptual model may provide a good starting point, providing general mechanisms for accessing and controlling agents. Second, since the agent metaphor can create a wide variety of expectations, we need to learn more about how portrayals of agents shape users' expectations and then use that knowledge to adjust (which usually means lower) people's expectations. Research which focuses on the portrayal of adaptive functionality, rather than on the functionality itself, is a crucial need if we wish to design agents that interact gracefully with their users.

Acknowledgments

Gitta Salomon contributed to the analysis of the DowQuest system. A number of the findings about the use of DowQuest are from an unpublished manuscript by Meier, et al. (1990), carried out as a project for a Cognitive Engineering class under the supervision of Don Norman, with Salomon and myself as outside advisors. The paper benefited from the comments of Stephanie Houde, Gitta Salomon, and three anonymous reviewers.

References

Apple Computer. 1987. The Knowledge Navigator. (Videotape)

Belew, R. K. 1989. Adaptive Information Retrieval: Using a Connectionist Representation to Retrieve and Learn about Documents. In Proceedings of SIGIR. Cambridge, MA: ACM Press, pp. 11-20.

Cypher, A. 1991. EAGER: Programming Repetitive Tasks by Example. Human Factors in Computing Systems: the Proceedings of CHI '91, pp. 33-39. New York: ACM Press.

Dow Jones and Company, Inc. 1989. Dow Jones News/Retrieval User's Guide.

Erickson, T. 1996. Feedback and Portrayal in Human Computer Interface Design. In Dialogue and Instruction, eds. R. J. Beun, M. Baker, and M. Reiner. Heidelberg: Springer-Verlag, in press.

Erickson, T., and Salomon, G. 1991. Designing a Desktop Information System: Observations and Issues. Human Factors in Computing Systems: the Proceedings of CHI '91. New York: ACM Press.

Goffman, E. 1967. Interaction Ritual. New York: Anchor Books.

Greenberg, S., and Witten, I. 1985. Adaptive Personalized Interfaces--A Question of Viability. Behaviour and Information Technology, 4(1): 31-45.

Laurel, B. 1990. Interface Agents: Metaphors with Character. In The Art of Human-Computer Interface Design, ed. B. Laurel. Addison-Wesley, pp. 355-365.

Meier, E.; Minjarez, F.; Page, P.; Robertson, M.; and Roggenstroh, E. Personal communication, 1990.

Mitchell, T.; Caruana, R.; Freitag, D.; McDermott, J.; and Zabowski, D. 1994. Experience with a Learning Personal Assistant. Communications of the ACM, 37(7): 80-91.

Nass, C., and Steuer, J. 1993. Anthropomorphism, Agency, and Ethopoeia: Computers as Social Actors. Human Communication Research, 19(4): 504-527.

Nass, C.; Steuer, J.; and Tauber, E. R. 1994. Computers are Social Actors. Human Factors in Computing Systems: CHI '94 Conference Proceedings. New York: ACM Press.

Oren, T.; Salomon, G.; Kreitman, K.; and Don, A. 1990. Guides: Characterizing the Interface. In The Art of Human-Computer Interface Design, ed. B. Laurel. Addison-Wesley, pp. 367-381.

Salomon, G.; Oren, T.; and Kreitman, K. 1989. Using Guides to Explore Multimedia Databases. In Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences.

Stanfill, C., and Kahle, B. 1986. Parallel Free-text Search on the Connection Machine System. Communications of the ACM, 29(12): 1229-1239.

Takeuchi, A., and Naito, T. 1995. Situated Facial Displays: Towards Social Interaction. Human Factors in Computing Systems: CHI '95 Conference Proceedings. New York: ACM Press.

Walker, J.; Sproull, L.; and Subramani, R. 1994. Using a Human Face in an Interface. Human Factors in Computing Systems: CHI '94 Conference Proceedings. New York: ACM Press.