Case Based Systems

“I have but one lamp by which my feet are guided, and that is the lamp of experience. I know no way of judging of the future but by the past.” —Patrick Henry

Background: Natural Language Processing

Chomsky, Linguistics, Syntax, Generative Grammar.
Semantics - discerning the meaning of language.
Schank: Conceptual Dependency - explicit representation of concepts, thoughts, ideas.
MARGIE system (Stanford)
- Conceptual Information Processing (1975)
- Parser (Riesbeck)
- Inferencer (Rieger)
- Language generator (Goldman)
Schank and Abelson (Yale):
- Scripts, Plans, Goals and Understanding (1977)
- SAM (Cullingford)
- PAM (Wilensky)
- FRUMP (DeJong)
Schank: Dynamic Memory (1982)
- Semantic memory
  - Fido ISA dog
  - Dog ISA mammal
  - Mammals have hair.
- Episodic memory (Tulving, 1972)
  - What you had for lunch today.
  - Semantic memory does not explain how memories are learned.
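The ISA links above can be read as a small inheritance lookup. The following is a minimal sketch in Python, not a reconstruction of any system named here; the names `ISA`, `PROPERTIES`, and `has_property` are hypothetical.

```python
# Hypothetical toy semantic network: ISA links plus properties attached to concepts.
ISA = {"Fido": "dog", "dog": "mammal"}
PROPERTIES = {"mammal": {"hair"}}


def has_property(concept, prop):
    """Walk up the ISA chain and report whether any concept on it carries prop."""
    while concept is not None:
        if prop in PROPERTIES.get(concept, set()):
            return True
        concept = ISA.get(concept)  # None once we pass the top of the hierarchy
    return False


print(has_property("Fido", "hair"))      # True: Fido ISA dog ISA mammal, and mammals have hair
print(has_property("Fido", "feathers"))  # False: nothing on the chain has feathers
```

Nothing in this structure records when or how a fact was acquired, which is the gap noted in the last bullet.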

Episodic memory receives and stores information about temporally dated episodes or events, and temporal-spatial relations among these events. A perceptual event can be stored in the episodic system solely in terms of its perceptible properties or attributes, and it is always stored in terms of its autobiographical reference to the already existing contents of the episodic memory store. The act of retrieval of information from the episodic store, in addition to making the retrieval contents accessible to inspection, also serves as a special type of input into episodic memory and thus changes the contents of the episodic memory store. (Tulving 1972, pp. 385–386)

By contrast, “semantic memory is the memory necessary for the use of language. It is a mental thesaurus” (Tulving 1972, p. 386).

Conceptual Memory (Schank)

The distinction between semantic memory and episodic memory is a false one. We shall argue that what must be present is a lexical memory which contains all of the information about words, idioms, common expressions etc., and which links these to nodes in a conceptual memory, which is language free. We believe that it is semantic memory rather than episodic memory which is the misleading notion. Once we change semantic memory by separating out lexical memory, we are left with a set of associations and other relations between concepts that could only have been acquired by personal experience. We claim that conceptual memory, therefore, is episodic in nature. (Schank 1975b, pp. 255–256)

A key feature of Schank’s conceptual memory is the notion that information is derived from experience. Knowledge is not innate. A theory of memory must account for the acquisition of knowledge.

Scripts, Memory Organization Packets (MOPs) and Reminding

Schank and Abelson (1975, 1977) proposed knowledge structures for representing episodic information. Their primary knowledge structure was the script. Scripts accounted for information about stereotypical events, such as going to a restaurant, taking a bus, or visiting the dentist. In such common situations, a person has a set of expectations concerning the default setting, goals, props, and behaviors of the other people involved. Scripts are analogous to Minsky’s (1975) frames, which were proposed in the context of visual processing. It is important to note that scripts are directly related to autobiographical events. Scripts are inherently episodic in origin and use. That is, scripts arise from experience and are applied to understand new events.
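One way to picture a script is as a record of default expectations that can fill in what a story leaves unsaid. The sketch below is only illustrative: the `Script` class, its fields, the scene names, and `unmentioned_scenes` are assumptions made for this example, not Schank and Abelson's notation.

```python
from dataclasses import dataclass


@dataclass
class Script:
    """Hypothetical container for the default expectations of a stereotyped event."""
    name: str
    setting: str
    props: list
    roles: list
    scenes: list  # expected order of events


restaurant = Script(
    name="restaurant",
    setting="dining room",
    props=["table", "menu", "food", "check"],
    roles=["customer", "waiter", "cook"],
    scenes=["enter", "order", "eat", "pay", "leave"],
)


def unmentioned_scenes(script, story_events):
    """Return the expected scenes that a story left implicit."""
    return [scene for scene in script.scenes if scene not in story_events]


# A story that mentions only ordering and leaving still lets a reader assume the rest.
print(unmentioned_scenes(restaurant, ["order", "leave"]))  # ['enter', 'eat', 'pay']
```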

Scripts were proposed as a knowledge structure for a conceptual memory. The acquisition of scripts is the result of repeated exposure to a given situation. For example, children learn the restaurant script by going to restaurants over and over again. As a psychological theory of memory, scripts suggested that people would remember an event in terms of its associated script. However, an experiment by Bower, Black, and Turner (1979) showed that subjects often confused events that had similar scripts. For example, a subject might mix up waiting room scenes from a visit to a doctor’s office with a visit to a dentist’s office.

Schank (1979, 1980, 1982) postulated a more general knowledge structure to account for the diverse and heterogeneous nature of episodic knowledge. This new structure was the memory organization packet (MOP). MOPs can be viewed as metascripts. For example, instead of a dentist script or a doctor script, there might be a professional-office-visit MOP that can be instantiated and specified for both the doctor and the dentist. This MOP would contain a generic waiting room scene, thus providing the basis for confusion between doctor and dentist episodes.
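The doctor and dentist example can be sketched as one generic structure that is specialized twice. This is a rough illustration under assumed scene names; `PROFESSIONAL_OFFICE_VISIT` and `instantiate` are hypothetical and do not reproduce Schank's MOP formalism.

```python
# Hypothetical MOP: a generic sequence of scenes shared by several specializations.
PROFESSIONAL_OFFICE_VISIT = ["make appointment", "waiting room", "see professional", "pay"]


def instantiate(mop_scenes, professional, special_scene):
    """Specialize the generic MOP by naming the professional and adding one specific scene."""
    scenes = []
    for scene in mop_scenes:
        if scene == "see professional":
            scenes.append(f"see {professional}")
            scenes.append(special_scene)
        else:
            scenes.append(scene)  # shared scenes are reused unchanged
    return scenes


doctor_visit = instantiate(PROFESSIONAL_OFFICE_VISIT, "doctor", "physical exam")
dentist_visit = instantiate(PROFESSIONAL_OFFICE_VISIT, "dentist", "teeth cleaning")

# Scenes common to both episodes are the ones a subject could plausibly mix up.
shared = [scene for scene in doctor_visit if scene in dentist_visit]
print(shared)  # ['make appointment', 'waiting room', 'pay']
```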

The steak and the haircut. X described how his wife would never make his steak as rare as he liked it. When this was told to Y, it reminded Y of a time, 30 years earlier, when he tried to get his hair cut in a short style in England, and the barber would not cut it as short as he wanted it.

The sand dollars and the drunk. X’s daughter was diving for sand dollars. X pointed out where there were a great many sand dollars, but X’s daughter continued to dive where she was. X asked why. She said that the water was shallower where she was diving. This reminded X of the joke about the drunk who was searching for his ring under the lamppost because the light was better there even though he had lost the ring elsewhere. (Schank 1982, p. 47)

Process Model

I began with the intent of representing knowledge, thus deriving a theory of memory to account for episodic information. Scripts and MOPs were postulated as knowledge structures for representing experience. However, the knowledge structures provide only part of the answer. We must also specify the processes involved in acquiring and accessing these structures. We need a process model. In figure 1 (after Riesbeck and Bain [1987]), I present a flowchart that illustrates the basic process of case-based reasoning and learning. Boxes represent processes, and ovals represent knowledge structures. The process of interpreting and assimilating a new event breaks down into the following steps, starting with an input event, as shown at the top of the flowchart:

1. Assign Indexes: Features of the new event are assigned as indexes characterizing the event. For example, our first air shuttle flight might be characterized as an airplane flight.

2. Retrieve: The indexes are used to retrieve a similar past case from memory. The past case contains the prior solution. In our example, we might be reminded of a previous airplane trip.

3. Modify: The old solution is modified to conform to the new situation, resulting in a proposed solution. For our airplane case, we would make appropriate modifications to account for changes in various features such as destination, price, purpose of the trip, departure and arrival times, weather, and so on.

4. Test: The proposed solution is tried out. It either succeeds or fails. Our airplane reminding generates certain expectations, not all of which can be met.

5. Assign and Store: If the solution succeeds, then assign indexes and store a working solution. The successful plan is then incorporated into the case memory. For a typical airplane trip, there will be few expectation failures and, therefore, little to make this new trip memorable. It will be just one more instance of the airplane script.

6. Explain, Repair, and Test: If the solution fails, then explain the failure, repair the working solution, and test again. The explanation process identifies the source of the problem. The predictive features of the problem are incorporated into the indexing rules to anticipate this problem in the future. The failed plan is repaired to fix the problem, and the revised solution is then tested. For our air shuttle example, we realize that certain expectations fail. We learn that we do not get an assigned seat and that we do not have to pay ahead of time. We might decide that taking the air shuttle is more like riding on a train. We can then create a new case in memory to handle this new situation and identify predictive features so that we will be reminded of this episode the next time we take the shuttle.
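The six steps above can be sketched as a short loop. This is a simplified illustration under several assumptions: cases are plain dictionaries, indexes are feature/value pairs, "explanation" is reduced to listing the failed expectations, and every function and field name is hypothetical.

```python
def assign_indexes(event):
    """Step 1: use the event's feature/value pairs as its indexes."""
    return set(event["features"].items())


def retrieve(indexes, case_memory):
    """Step 2: return the stored case that shares the most indexes."""
    return max(case_memory, key=lambda case: len(indexes & case["indexes"]))


def modify(old_case):
    """Step 3: start from a copy of the old solution (no real adaptation here)."""
    return dict(old_case["solution"])


def try_out(proposed, event):
    """Step 4: return the expectations that the actual outcome contradicts."""
    return {k: v for k, v in event["outcome"].items() if proposed.get(k) != v}


def repair(proposed, failures):
    """Step 6: overwrite each failed expectation with what was observed."""
    return {**proposed, **failures}


def process(event, case_memory):
    """Run one pass of the cycle and store the resulting case."""
    indexes = assign_indexes(event)
    old_case = retrieve(indexes, case_memory)
    proposed = modify(old_case)
    failures = try_out(proposed, event)
    if failures:
        proposed = repair(proposed, failures)
        assert not try_out(proposed, event)  # the repaired solution is tested again
    # Step 5: store the working solution under the new event's indexes.
    case_memory.append({"name": event["name"], "indexes": indexes, "solution": proposed})
    return old_case["name"], failures


case_memory = [
    {"name": "airplane trip",
     "indexes": {("vehicle", "airplane"), ("departs_from", "airport")},
     "solution": {"seat": "assigned", "payment": "in advance"}},
    {"name": "train ride",
     "indexes": {("vehicle", "train"), ("departs_from", "station")},
     "solution": {"seat": "unassigned", "payment": "on board"}},
]

shuttle = {
    "name": "air shuttle",
    "features": {"vehicle": "airplane", "departs_from": "airport", "service": "shuttle"},
    "outcome": {"seat": "unassigned", "payment": "on board"},
}

reminded_of, failures = process(shuttle, case_memory)
print(reminded_of)  # airplane trip
print(failures)     # {'seat': 'unassigned', 'payment': 'on board'}

# The stored shuttle case carries the extra "service" index, so a second trip retrieves it.
reminded_of, failures = process(dict(shuttle, name="second shuttle"), case_memory)
print(reminded_of, failures)  # air shuttle {}
```

The first trip is understood through the airplane case and produces two expectation failures; the second trip matches the newly stored case and produces none.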

In support of this process are the following types of knowledge structures, represented by ovals in the figure:

Indexing Rules: Indexing rules identify the predictive features in the input that provide appropriate indexes into the case memory. Determining the significant input features is a persistent problem (Schank, Collins, and Hunter 1986).

Case Memory: Case memory is the episodic memory, which comprises the database of experience.

Similarity Metrics: If more than one case is retrieved from episodic memory, the similarity metrics can be used to decide which case is more like the current situation. For example, in the air shuttle case, we might be reminded of both airplane rides and train rides. The similarity rules might initially suggest that we rely on the airplane case.

Modification Rules: No old case is going to be an exact match for a new situation. The old case must be modified to fit. We require knowledge about what kinds of factors can be changed and how to change them. For the airplane ride, it is acceptable to ride in a different seat, but it is usually not advisable to change roles from passenger to pilot.

Repair Rules: Once we identify and explain an expectation failure, we must try to alter our plan to fit the new situation. Again, we have rules for what kinds of changes are permissible. For the air shuttle, we recognize that paying for the ticket on the plane is an acceptable change. We can generate an explanation that recognizes an airplane ride as a type of commercial transaction and suggests that there are alternative acceptable means of paying for services.
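A similarity metric of the kind described above can be as simple as a weighted count of matching features. The sketch below assumes hand-chosen features and weights; `similarity`, `most_similar`, and the feature names are hypothetical and are not drawn from any of the systems discussed.

```python
def similarity(new_features, case_features, weights):
    """Weighted fraction of features on which the new situation and a case agree."""
    total = sum(weights.values())
    matched = sum(
        weight
        for feature, weight in weights.items()
        if feature in new_features and new_features[feature] == case_features.get(feature)
    )
    return matched / total


def most_similar(new_features, cases, weights):
    """Return the name of the stored case with the highest similarity score."""
    return max(cases, key=lambda name: similarity(new_features, cases[name], weights))


# Assumed weights: the kind of vehicle counts for more than the other features.
weights = {"vehicle": 3.0, "reserved_seat": 1.0, "pay_on_board": 1.0}
cases = {
    "airplane trip": {"vehicle": "airplane", "reserved_seat": True, "pay_on_board": False},
    "train ride": {"vehicle": "train", "reserved_seat": False, "pay_on_board": True},
}
shuttle = {"vehicle": "airplane", "reserved_seat": False, "pay_on_board": True}

print(most_similar(shuttle, cases, weights))  # airplane trip
```

With equal weights the train ride would score higher instead, which is one way to see why choosing the significant features is the hard part.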

Psychological Issues

The process model depicted in figure 1 is not meant to stipulate the necessary and sufficient conditions for simulating cognitive behavior. Rather, it illustrates a variety of salient issues in case-based reasoning. I can summarize the psychological assumptions of the case-based reasoning paradigm as follows:

First, memory is predominantly episodic. The primary content of memory is experience.

Second, memory is richly indexed. Experiences are related to each other in many complex and abstract ways.

Third, memory is dynamic. The organization and structure of memory changes over time.

Fourth, experience guides reasoning. We interpret and understand new situations in terms of prior experience.

Fifth, learning is triggered by failure. When an expectation from a previous case fails to predict a new situation, we learn by incorporating the new episode into memory.

Similarly, we can present the research questions that arise from these respective assumptions:

First, what makes up a case? What is the content and structure of an episode in memory? What is the relationship between episodic memory and other types of knowledge? How can we represent case memory?

Second, how is memory organized? What set of indexes is appropriate for classifying cases? What search algorithms complement the structure of memory? What are the indexing rules?

Third, how does memory change? What leads to forgetting? How does the memory of cases and stories degrade? How do the case memory and indexing rules change over time?

Fourth, how can we adapt old solutions to new problems? How can we recognize a new situation as similar to a previous episode? What are the similarity metrics and modification rules?

Fifth, what leads us to reject or accept a new case that is in conflict with a previous case? How do we explain the differences between episodes? How can we learn from mistakes? What are the repair rules?

It might seem that I present more questions than answers. However, my basic premise is that case-based reasoning provides a foundation for a broad range of research. It is appropriate and, indeed, desirable to stimulate research through the principled identification and examination of cognitive phenomena. I now review the history of case-based reasoning in AI research.

Computer Models

Many of the principles of case-based reasoning can be found in Sussman’s (1975) HACKER program. HACKER’s answer library was similar to a case memory, and its debugging process was analogous to plan repair. Furthermore, the underlying cognitive premise of the HACKER model was learning through experience, clearly at the heart of the case-based reasoning paradigm. The episodes for HACKER were restricted to computer programs rather than more general human experiences.

The first computer programs to use scripts were SAM (script applier mechanism) (Cullingford 1978) and FRUMP (fast reading understanding memory program) (DeJong 1979). These programs read newspaper stories and performed various language tasks, such as translation, summarization, and question answering. These programs contained static knowledge structures that were used in processing stories. The content of the programs’ memory did not change as a result of processing—in spite of the memory in FRUMP’s name.

These programs were a successful demonstration of the natural language processing of stories and of scripts as a knowledge structure. Understanding a story entailed processing an episode or event. Scripts provided a feasible means for representing such episodic knowledge. However, the programs failed to demonstrate knowledge acquisition. The scripts of SAM and FRUMP were innate, as it were, having been written by programmers. The programs used the scripts to guide the processing of stories, but the programs did not learn their scripts through experience.

Furthermore, the programs did not remember anything. SAM and FRUMP could read the same story 20 times in a row and not recognize that they previously saw this story. Clearly, a program that modeled human memory should remember its own experience.

Two programs that addressed the issue of memory organization for episodic knowledge were CYRUS and IPP. CYRUS (Kolodner 1980; Schank and Kolodner 1979; Kolodner 1984; Kolodner and Cullingford 1986) simulated an episodic memory of events relating to former Secretary of State Cyrus Vance. The program answered questions about a range of autobiographical episodes, such as meetings, diplomatic trips, and state dinners. CYRUS was the first program to model episodic storage and retrieval strategies. Although the focus of CYRUS was on memory organization and indexing, an attempt was made to integrate CYRUS with the FRUMP newswire program to provide an automatic update for CYRUS’s memory (Schank, Kolodner, and DeJong 1980). The combined system, Cyfr, read news stories about the secretary of state and integrated the events into CYRUS’s episodic memory.

IPP (Lebowitz 1980) was a prototype case-based reasoning and learning program. IPP read news stories about terrorist acts, such as bombings, kidnappings, and shootings. The program started with generic knowledge about such acts and, after reading hundreds of stories, developed its own set of generalizations about terrorism that it could apply to understanding new stories. For example, IPP read the following two stories about terrorism by the Irish Republican Army (IRA) in Northern Ireland:

Story: XX1 (4 12 79) Northern-Ireland (Irish Republican Army guerrillas ambushed a military patrol in West Belfast yesterday killing one British soldier and badly wounding another Army headquarters reported)

Story: XX2 (11 11 79) Northern-Ireland (A suspected Irish Republican Army gunman killed a 50-year-old unarmed security guard in East Belfast early today the police said)

The program noticed that in each case, the victims were establishment, authority figures (policemen and soldiers) and that the terrorists were IRA members. IPP formed a generalization based on this similarity. It then read the following story about another shooting in Northern Ireland:

Story: XX3 (1 12 80) Northern-Ireland (A gunman shot and killed a part-time policeman at a soccer match Saturday and escaped through the crowd to a waiting getaway car, police said)

Based on its prior experience, IPP inferred that the unidentified gunman in story XX3 is an IRA member. This inference might or might not be correct, but it demonstrates the ability to relate previous episodes to new situations.
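The generalize-then-infer behavior described for the three stories can be sketched with feature dictionaries. The encoding below is hand-made for illustration (including the choice to code every victim as an authority figure, following the description above); it is not IPP's actual representation, and `generalize` and `infer_missing` are hypothetical names.

```python
def generalize(stories):
    """Keep only the feature values shared by every story."""
    common = dict(stories[0])
    for story in stories[1:]:
        common = {k: v for k, v in common.items() if story.get(k) == v}
    return common


def infer_missing(story, generalization, match_on):
    """If the story agrees with the generalization on match_on, fill in the rest as guesses."""
    if all(story.get(f) == generalization.get(f) for f in match_on):
        return {**generalization, **story}  # the story's own values take priority
    return dict(story)


xx1 = {"place": "Northern Ireland", "actor": "IRA", "victim": "authority figure", "method": "ambush"}
xx2 = {"place": "Northern Ireland", "actor": "IRA", "victim": "authority figure", "method": "shooting"}
xx3 = {"place": "Northern Ireland", "victim": "authority figure", "method": "shooting"}

pattern = generalize([xx1, xx2])
print(pattern)  # {'place': 'Northern Ireland', 'actor': 'IRA', 'victim': 'authority figure'}

guess = infer_missing(xx3, pattern, match_on=["place", "victim"])
print(guess["actor"])  # IRA (an inference, which may or may not be correct)
```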

Expert Systems: Rules versus Cases

The programs from the late 1970s that modeled episodic memory were largely natural language processing programs. By this time, another topic of AI research had developed into a primary application area, namely, rule-based expert systems. Early programs such as DENDRAL (Buchanan, Sutherland, and Feigenbaum 1969) and MYCIN (Shortliffe 1976) demonstrated the possibility of simulating the problem-solving ability of human experts, such as chemists or physicians. The success of these and other programs stimulated interest in developing expert systems for a vast number of technical applications.

The basic unit of knowledge in these expert systems was the rule. A rule comprised a conditional test-action pair, for example, if condition, then action. Several hundred rules might be required for a typical diagnostic or repair task.

Building rule-based or production systems became a popular enterprise. As experience with expert systems increased, so did awareness of some basic shortcomings of the rule-based system paradigm.

The first problem was knowledge acquisition. To build an expert system, a computer programmer (or knowledge engineer) had to sit with the human expert to determine what rules were appropriate for the given domain. This knowledge was difficult to uncover. The human expert could not simply make a list of the hundreds of rules s/he used to solve problems. Often, the informant articulated a set of rules that in fact did not accurately reflect his(her) own problem-solving behavior. For these reasons, this difficult knowledge-acquisition process became known as a bottleneck in constructing rule-based expert systems (Hayes-Roth, Waterman, and Lenat 1983).

Second, the rule-based systems did not have a memory. That is, just as SAM and FRUMP would not remember news stories that they had already read, rule-based systems would not remember problems that they had already solved. For example, if a medical diagnosis program were presented with a patient with a certain set of symptoms, the program might fire dozens, hundreds, or thousands of rules to come up with a diagnosis or treatment. Subsequently, if the program were presented with another patient displaying the same set of symptoms, the program would fire the same set of rules. The program would not remember having previously seen a similar patient. One might state that this observation is of little consequence beyond some argument for computational efficiency. However, efficiency can be a significant concern in many situations. Moreover, a program without a memory cannot remember its mistakes and, thus, is destined to repeat them. Thus, accuracy and efficiency are related problems for a system without a memory.

Third, rule-based systems were not robust. If a problem were presented to the system that did not match any of the rules, the program could not respond. The system’s knowledge base was limited to its rules, so if none of the rules could apply, the system had no alternatives. It was brittle.
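The second and third problems can be seen in even a tiny rule interpreter. The rules below are made up for illustration and are not taken from MYCIN or any real expert system; `RULES` and `diagnose` are hypothetical names.

```python
# Made-up toy rules: each is a (condition, action) pair in the "if condition, then action" form.
RULES = [
    (lambda facts: {"fever", "cough"} <= facts, "suspect flu"),
    (lambda facts: {"sneezing", "itchy eyes"} <= facts, "suspect allergy"),
]


def diagnose(symptoms):
    """Return (action, rules_tried); action is None when no rule matches."""
    facts = set(symptoms)
    for tried, (condition, action) in enumerate(RULES, start=1):
        if condition(facts):
            return action, tried
    return None, len(RULES)


print(diagnose(["sneezing", "itchy eyes"]))  # ('suspect allergy', 2)
print(diagnose(["sneezing", "itchy eyes"]))  # ('suspect allergy', 2): same work, nothing remembered
print(diagnose(["rash"]))                    # (None, 2): no rule applies, so there is no answer
```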

We can compare the behavior of the rule-based expert system with the behavior of the human expert. First, the central feature of expertise is experience. An expert is someone who has vast, specialized experience; has witnessed numerous cases in the domain; and has generalized this experience to apply it to new situations. When confronted with a problem, the expert is reminded of previous, similar problems and their respective resolutions. It might be that the expert has so many exemplars for a given problem that the experiences have been distilled into a general rule to be applied. Still, this general rule has its roots in actual experience.

Thus, the human expert derives knowledge from experience. The basic unit of knowledge is not the rule but the case. Human experts acquire knowledge by assimilating new cases, either first hand or through reports from others. Furthermore, it is easier for people to articulate knowledge as experience than as rules. This observation suggests the psychological hypothesis that expert knowledge might, in fact, primarily be encoded as episodes rather than rules. Contrast this acquisition of knowledge from experience with the knowledge-acquisition bottleneck given as the first problem of rule-based systems.

Second, human experts remember their own experience. The doctor who fails to effectively treat a case should remember this case when another patient presents the same symptoms. The doctor can learn from his(her) mistakes.

Third, human experts can reason by analogy. If our doctor sees a patient who presents symptoms that are unlike anything in his(her) experience, the doctor does not need to simply give up. The doctor might be reminded of various previous cases that were similar in one way or another and devise a treatment accordingly. Just as our first air shuttle trip might remind us of both an airplane trip and a train ride, the doctor might be able to arrive at a composite diagnosis based on different earlier cases.

These arguments suggest an alternative to the rule-based system: a case-based system. An expert system that can extract information from its experience can grow and acquire knowledge on its own. This ability is crucial for the long-range success of the expert system concept in AI. The automated reasoning power can be applied to so many tasks that it is necessary to develop a mechanism that can directly assimilate new knowledge from experience.

The technology of case-based systems directly addresses problems found in rule-based systems: First is knowledge acquisition. The unit of knowledge is the case, not the rule. It is easier to articulate, examine, and evaluate cases than rules. Second is performance experience. A case-based system can remember its own performance and modify its behavior to avoid repeating prior mistakes. Third are adaptive solutions. By reasoning from analogy with past cases, a case-based system should be able to construct solutions to novel problems.

The scientific research issues previously given for case-based reasoning models also directly apply to the technological research issues for case-based systems. We must answer these same questions in building case-based systems:

First, what makes up a case? How can we represent case memory?

Second, how is memory organized? What are the indexing rules?

Third, how does memory change? How do the case memory and indexing rules change over time?

Fourth, how can we adapt old solutions to new problems? What are the similarity metrics and modification rules?

Fifth, how can we learn from mistakes? What are the repair rules?

At this point, the astute reader might ask why case-based systems use rules for indexing, modification, and repair because rules seem to be at the heart of so many problems with rule-based systems. There are two answers. First, the rules in case-based systems do not make up the primary knowledge base but, rather, independent support modules. Thus, there should be less complexity. However, the theory of case-based reasoning suggests that these rules would themselves be acquired by experience from cases through a recursive application of the case-based reasoning algorithm. That is, the system would derive rules for indexing, modification, and repair from cases and experience. Early case-based systems have not addressed this problem, but the basic paradigm suggests this approach.

The technology of case-based systems is an instantiation of the psychological theories of case-based reasoning. CYRUS and IPP can be viewed as prototypes for case-based systems. They began to address the fundamental questions of case representation and indexing previously posed. In the 1980s, researchers began explicitly to develop case-based systems.

Case-Based Systems

To see how the technology of case-based systems developed, we first look at Hammond’s (1984, 1986, 1989) CHEF program. Unlike CYRUS and IPP, CHEF is not based in a natural language-understanding task but instead focuses on planning. CHEF develops new plans based on its own experience in the domain of cooking. When faced with the task of preparing a dish for which it has no appropriate plan (recipe), CHEF modifies an existing plan to fit the new situation and then tries to detect and correct any errors that result. CHEF learns from its own mistakes.

CHEF demonstrates how episodic knowledge can be used to guide planning and avoid past failures. When presented with a problem (how to prepare a certain dish), the program is reminded of previous related recipes. It modifies the most similar previous recipe to fit the new requirements and then tries out the new recipe. CHEF tests the recipe through a simulation involving rules that specify the physical effects of each step of the cooking process. The results are then examined to see if they match the goals of the intended dish. If the program recognizes a failure, it then tries to analyze and explain the failure through a process of reasoning by asking questions. Finally, the program modifies the recipe in light of its explanation to correct the failure. This case-based planning process closely follows the flowchart in figure 1.

For example, when presented with the task of creating a strawberry souffle, CHEF resorts to modifying a vanilla souffle recipe. However, simply adding strawberries to the standard recipe keeps the souffle from rising properly. CHEF discovers the source of the problem in the excess liquid from the berries and decides that the best remedy is to add more whipped egg whites. This solution fixes the recipe. CHEF never repeats this mistake and can use this experience in other recipes, such as a raspberry souffle.
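The souffle episode can be caricatured in a few lines to show how a repair, once stored under the feature that predicted the failure, is reused on a later dish. Everything here is a toy assumption: the quantities, the `WET_FRUIT` set, and the functions `simulate` and `plan_souffle` are invented for illustration and do not describe CHEF's actual simulator or repair strategies.

```python
WET_FRUIT = {"strawberry", "raspberry"}  # assumed ingredient knowledge
repair_memory = {}  # learned fixes, indexed by the feature that predicted the failure


def simulate(recipe):
    """Toy stand-in for a simulator: wet fruit needs extra egg whites to rise."""
    needed = 6 if recipe["flavor"] in WET_FRUIT else 4
    return "rises" if recipe["egg_whites"] >= needed else "falls"


def plan_souffle(flavor, base):
    """Adapt the base recipe to a new flavor, repairing and remembering any failure."""
    recipe = dict(base, flavor=flavor)
    feature = "wet fruit" if flavor in WET_FRUIT else "dry flavoring"
    # Anticipate a known problem before testing.
    recipe["egg_whites"] += repair_memory.get(feature, 0)
    failed = simulate(recipe) == "falls"
    if failed:
        recipe["egg_whites"] += 2    # repair: add more whipped egg whites
        repair_memory[feature] = 2   # remember the fix under its predictive feature
    return recipe, failed


vanilla = {"flavor": "vanilla", "egg_whites": 4}
strawberry, failed_first = plan_souffle("strawberry", vanilla)
raspberry, failed_second = plan_souffle("raspberry", vanilla)
print(strawberry["egg_whites"], failed_first)   # 6 True
print(raspberry["egg_whites"], failed_second)   # 6 False
```

The strawberry plan fails once and is repaired; the raspberry plan applies the remembered fix up front and does not fail.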

CHEF provides one set of answers for our cardinal research questions:

First, what makes up a case? For CHEF, cases are recipes—a particular set of plans. CHEF suggests the feasibility of looking at planning as case-based reasoning.

Second, how is memory organized? CHEF’s memory is indexed in many ways, including goals, plan failures, and plan interactions.

Third, how does memory change? CHEF learns from mistakes. When a failure occurs, CHEF must identify the source of the failure, fix the plan, and remember the result. CHEF tries not to make the same mistake twice.

Fourth, how can we adapt old solutions to new problems? CHEF can create new recipes by starting with old recipes. In selecting an old plan, CHEF tries to match as many of the new plan’s goals as possible. The modification to the recipe is driven first by a rudimentary knowledge of the ingredients and then by trial-and-error testing.

Fifth, how can we learn from mistakes? CHEF can identify errors when it observes that a recipe does not satisfy its intended goals. At this point, CHEF must find the source of the error and apply a set of strategies for modifying the plan. Many of CHEF’s modification strategies are domain specific, relating to the substitution or preparation of cooking ingredients.

As an ongoing research enterprise, case-based reasoning finds new ways of asking and addressing these questions.

References

Case-Based Reasoning: A Research Paradigm. Slade, 1991.

http://www.aaai.org/ojs/index.php/aimagazine/article/view/883

An Introduction to Case-Based Reasoning. Kolodner, 1992.

http://alumni.media.mit.edu/~jorkin/generals/papers/Kolodner_case_based_reasoning.pdf