Why a computational model?

It is not always easy to figure out what a psychological model can or cannot explain, just based on a list of its assumptions.  For example, for a long time psychologists took for granted that any model assuming the dimensional overlap hypothesis could not explain the S-S consistency effects found in 2-to-1 mapping tasks.  However, it turns out that this was based on the implicit assumption of a static stimulus identification process.  If a dimensional overlap model instead assumes flexible stimulus identification, it can account for these data.  Because the response selection and dimensional overlap models were evaluated with the implicit assumption of static stimulus identification, it was difficult to evaluate and compare them clearly and accurately.

Moreover, when a model contains a large number of assumptions, trying to account for effects that interact with one another, reading through arguments about the model’s explanations and predictions can also be confusing.  A good example of this might be the argument that dimensional overlap models can account for the electrophysiological data in S-S consistency tasks.  This argument introduces a number of ideas about processing, such as the notion that stimulus identification can make a “mistake” that can later be corrected, that are not intrinsic to dimensional overlap models.  It would not be surprising for the reader to want to take a step back and wonder:  Does this model really fulfill the requirements of a dimensional overlap model?  Can it actually account for the data that it is purported to account for?  Or is there a slip in the logic somewhere, glossed over through vague words and convoluted argumentation?

One way of addressing these problems is to develop psychological models that can be implemented as computational algorithms.  This approach is very natural, in fact, for anyone taking the information processing approach to understanding how the mind works.  If mental life is all about how information is represented by mental codes and transformed by mental processes, then it has a natural analog in a computer program, which represents information with variables and data structures and transforms that information with functions and procedures.  By taking assumptions about mental codes and mental processes, and implementing them as a program with variables and functions, we create a very concrete and unambiguous form of psychological model.

The first and most obvious advantage to doing this is that a computer program cannot be vague, and must be completely explicit about all of its assumptions.  This not only keeps things honest, but also allows for a more direct comparison of different models: for example, if two computational models are completely identical except for one parameter, and there is a difference in the predictions of the two models, then the difference in predictions must be due to that one parameter.

Moreover, when slick words and convoluted logical arguments leave you wondering, “Can this model really explain that result?”, a computational model can provide a compelling existence proof.  That is, a computer program is one of the most compelling demonstrations that a system with a certain set of assumptions about the representation and processing of information can explain a certain pattern of results.  Computer programs do not engage in fast-talking or hand-waving.

Finally, computer programs provide actual numerical predictions about people’s reaction times, rather than vague predictions like “people are faster when X than when Y.”  If the particular set of assumptions that are implemented in a computer program accurately reflects how our minds work, then the (simulated) time it takes the program to go from stimulus information to the formation of a motor code should be proportional to the time it takes people to do the same thing (i.e. their reaction time).

Unfortunately, there is also a cost to implementing models as computer programs.  In order to run, a computer program must be more specific than the psychological hypotheses that it is designed to implement.  For example, the generic dimensional overlap and generic response selection models are characterized by a list of critical defining assumptions (i.e. the assumptions in the coding model framework, plus either the dimensional overlap hypothesis or the response selection hypothesis).  A number of details about cognitive processing are left unspecified: for example, how mental codes actually form over time, how information is transferred from stimulus codes to response codes, and how the formation of one mental code influences the formation of other alternative codes of the same type (i.e. stimulus or response).  These details are deliberately left unspecified, in order to allow the models to focus on the critical assumptions about cognitive processing to be examined.

However, these other details have to be specified in order to get a computer program to run.  As a result, implementing computational versions of these models involves making arbitrary decisions – decisions about information processing not inherent in the models that they are based on.  In contrast with the critical assumptions of the models, these are auxiliary assumptions, or implementation details, associated with the computational models.  They have to be there for the computational implementation to run, but they do not represent crucial psychological hypotheses.

Because of this, the “existence proof” mentioned above does not work in reverse: if a particular computational model cannot explain a particular result, this does not mean that the psychological model it is based on cannot predict that result.  If a computational model were implemented with the same set of critical assumptions, but a different set of auxiliary assumptions, it could very well make different predictions.  This means that whenever a computational model can or cannot explain some finding, a great deal of care must be taken to ascertain why: what aspect of the model leads to its success or failure? A critical assumption, an auxiliary assumption, or an interaction between the two?


Where does selective inhibition happen?

One question about which there is little consensus among connectionist models of reaction time is the locus of inhibition between alternatives.  Are inhibitory mechanisms active within each cognitive process, or between them?

According to the within-process inhibition view, the activation of one mental code in a particular process (i.e. within a module) will inhibit all of the other mental codes in that process (i.e. within the same module).  According to the between-process inhibition view, activation of a mental code in one process (e.g. stimulus identification) will directly inhibit non-corresponding codes in the following process (e.g. response selection).
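The difference between the two views can be sketched as alternative connectivity patterns in a toy two-choice network.  This is only an illustration: the unit counts, weight values, and update rule below are hypothetical, not taken from any of the models discussed here.

```python
# Minimal sketch of the two inhibition schemes for a two-choice task.
# All weights and rates here are hypothetical illustration values.

def step_between(stim_act, resp_act, rate=0.1):
    """Between-process inhibition: each stimulus unit excites its
    corresponding response unit and directly inhibits the other one;
    there are no connections among units within the response module."""
    new_resp = []
    for j in range(len(resp_act)):
        net = sum(s * (1.0 if i == j else -1.0)
                  for i, s in enumerate(stim_act))
        new_resp.append(resp_act[j] + rate * net)
    return new_resp

def step_within(stim_act, resp_act, rate=0.1):
    """Within-process inhibition: stimulus units send only excitatory
    input forward; response units laterally inhibit one another."""
    new_resp = []
    for j in range(len(resp_act)):
        excitation = stim_act[j]          # feed-forward, positive only
        lateral = sum(a for k, a in enumerate(resp_act) if k != j)
        new_resp.append(resp_act[j] + rate * (excitation - lateral))
    return new_resp

stim = [1.0, 0.0]      # stimulus code 1 has been activated
resp = [0.0, 0.0]
for _ in range(20):
    resp = step_between(stim, resp)
print(resp)            # response 1 grows while its competitor is suppressed
```

Under either scheme, the corresponding response unit ends up dominating; the schemes differ only in whether the suppression of the competitor comes directly from the stimulus module or laterally from within the response module.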

Most of the earliest connectionist network models, including McClelland’s (1979) Cascade model, were feed-forward networks (see Rumelhart, McClelland, et al., 1986).  This meant that any given unit could only supply input to units in later processes, and there were no connections at all between units within the same module.  As a result, these network models naturally had to implement between-process inhibition: each stimulus unit had a positive association with one or more response units, and negative associations with all of the rest of the response units.

This feed-forward architecture also arises from many “learning algorithms”: processes through which the association strengths in a network can change from trial to trial, “learning” based on experience.  For example, the model of the Stroop effect by Cohen, Dunbar and McClelland (1990) assigns association strengths based on a standard learning algorithm (known as backpropagation) that “trains” the network on the process of reading (giving color word responses to word inputs) and on the process of color-naming (giving color word responses to color inputs).  The result of this algorithm is that, for example, the stimulus unit for the word “red” has a positive association with the response unit for the word “red,” and also a negative association with the response unit for the word “green”.  Similarly, the stimulus unit for the word “green” has a positive association with the response unit for the word “green”, and also a negative association with the response unit for the word “red”.  The result is between-process inhibition: negative associations connect stimulus units to response units.

McClelland and Rumelhart (1981), however, introduced a framework with a different assumption about connections between units.  They describe a semantic activation model with a feature module (with units representing mental codes for individual visual features), a letter module (with units representing mental codes for individual letters), and a word module (with units representing mental codes for whole words). They suggest that units are connected with positive or negative associations based on their consistency: that is, because the existence of the word “THE” is consistent with the existence of the letter “T” in the initial position, the initial “T” letter unit and the “THE” word unit have a positive association; on the other hand, because the existence of the word “ARE” is inconsistent with the existence of the letter “T” in the initial position, the initial “T” letter unit and the “ARE” word unit have a negative association.

One result of this kind of assumption is that all of the units within the same module are mutually inhibitory: the existence of a letter “T” in the initial position of a word is inconsistent with there being any other letter in that initial position, the existence of the word “ARE” is inconsistent with there being any other word, and so on.  As a result, this model implements within-process inhibition: activation of any unit within a module (process) inhibits activation of all of the other units in the same module.  However, this model also implements between-process inhibition, because the same rules apply to connections between units in different modules (e.g. letter units and word units).
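The consistency rule can be sketched with a toy two-word lexicon.  The words, letter positions, and weight magnitudes here are hypothetical illustrations, not the actual parameters of McClelland and Rumelhart’s (1981) model.

```python
# Toy mini-lexicon: two word units and two units for the letter in the
# initial position (the words and weight magnitudes are hypothetical).
words = ["THE", "ARE"]
initial_letters = ["T", "A"]

# Between-module weights: positive when letter and word are consistent
# (the word starts with that letter), negative when they are not.
letter_to_word = {(l, w): (1.0 if w.startswith(l) else -1.0)
                  for l in initial_letters for w in words}

# Within-module weights: alternative codes of the same type are mutually
# inconsistent, so every pair of word units gets an inhibitory connection.
word_to_word = {(w1, w2): -1.0 for w1 in words for w2 in words if w1 != w2}

print(letter_to_word[("T", "THE")])   # positive: "T" is consistent with "THE"
print(letter_to_word[("T", "ARE")])   # negative: "T" is inconsistent with "ARE"
```

Applying the same consistency rule within a module automatically yields the mutual inhibition described above, while applying it between modules yields between-process inhibition as well.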

This set of assumptions is used as the basis for the Stroop model developed by Phaf, van der Heijden, and Hudson (1990).  In this model, color units (representing alternative possible color codes) inhibit both each other (within-process inhibition) and inconsistent color-word response units (between-process inhibition).  Phaf et al. (1990) further argue in support of within-process inhibition by citing neurophysiological evidence: they claim that the neurophysiological phenomenon of “lateral inhibition” (inhibition among all of the nearby neurons in the same layer in cortical tissue) can be thought of as evidence that the brain implements within-process inhibition.

It was soon noted, of course, that having both of these kinds of inhibition is computationally redundant.  Several years later, McClelland (1993) proposed a normative framework for connectionist network models of performance called the Graded Random And Interactive Network (GRAIN) framework, in which he suggested that models should use only inhibitory connections between units in the same module, and only excitatory connections between units in different modules.  This effectively called for models of performance to constrain themselves to implementing within-module inhibition.  Most of the motivation for this move was computational, not psychological, and almost every “advantage” described by McClelland (1993, pp. 659-660) for within-process inhibition could also be found in an appropriately structured model with between-process inhibition.

Many models of consistency effects, nonetheless, have followed this normative suggestion (e.g. Barber & O’Leary, 1997; Cohen & Huston, 1994; Cohen, Servan-Schreiber, & McClelland, 1992; O’Leary & Barber, 1993; Zhang & Kornblum, 1998; Zhang, Zhang, & Kornblum, 1999; Zorzi & Umilta, 1995).  Zorzi and Umilta (1995), however, did explore the performance of an alternative version of their model that implemented between-process inhibition.  They found that both models could account for performance equally well, and that in the end the only motivation for preferring within-process inhibition was theoretical consistency: because everyone else was doing it.

Kornblum et al. (1999) pointed out that having within-process inhibition can lead to explosive inhibitory feedback effects if activation values are allowed to go below zero (see Kornblum et al., 1999, p. 706).  Zorzi and Umilta (1995) and Cohen and Huston (1994) dealt with this by constraining the output of units to lie between 0 and 1.  Instead of adding this additional constraint on the dynamics of processing, Kornblum et al. (1999) implemented between-process inhibition.  By using this alternative inhibition mechanism, and imposing fewer constraints on processing (because output could be either positive or negative with no ill consequences to the model), they were still easily able to account for performance in Simon tasks, Stroop-like tasks, and their factorial combinations.
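The feedback problem, and the output-clipping fix used by Zorzi and Umilta (1995) and Cohen and Huston (1994), can be illustrated with a pair of mutually inhibitory units.  This is a minimal sketch; the weight, rate, and input values are hypothetical.

```python
def run(steps, clip, w=1.5, rate=0.5):
    """Two mutually inhibitory units; unit 0 receives external input 1.0.
    With clip=True, unit outputs are bounded to [0, 1] before being
    passed through the inhibitory weights; with clip=False, negative
    activations feed back through the negative weights: a strongly
    negative unit disinhibits its competitor, which in turn drives it
    further negative, and activations explode."""
    act = [0.0, 0.0]
    for _ in range(steps):
        out = [min(max(a, 0.0), 1.0) for a in act] if clip else list(act)
        net = [1.0 - w * out[1], 0.0 - w * out[0]]
        act = [a + rate * (n - a) for a, n in zip(act, net)]
    return act

print(run(50, clip=True))    # settles at bounded values
print(run(50, clip=False))   # runs away without the output constraint
```

With clipping, the suppressed unit’s output bottoms out at zero, so the feedback loop is broken and the network settles; without it, the positive feedback between the two negative weights grows without bound.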



Debate: Processing Stages or Continuous Activation

Do stimulus and response processing happen in stages? Or does information get continuously transmitted from stimulus-processing to response-processing over time?

This is an important question that models of performance heatedly disagree about. According to the continuous transfer view, any activation that accumulates for a stimulus code is immediately used as input to any associated response codes, which in turn has an immediate influence on response code activation.  According to the discrete transfer view, on the other hand, activation of a stimulus code must accumulate to some critical level, indicating that stimulus identification has been completed with some degree of certainty, before a signal is sent to the response selection process and activation of the response codes can accumulate.
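The two views can be contrasted in a small simulation of a single stimulus-to-response channel.  This is a sketch under hypothetical parameters: the rates, identification threshold, and decision criterion below are illustrative values, not taken from any published model.

```python
def simulate(transfer, stim_rate=0.2, resp_rate=0.2,
             threshold=0.9, criterion=0.8, max_steps=1000):
    """Accumulate activation for a stimulus code and pass it on to a
    response code either continuously (every step) or discretely (only
    after stimulus activation crosses an identification threshold).
    Returns the simulated RT: the number of steps until the response
    code reaches its decision criterion."""
    stim, resp = 0.0, 0.0
    for t in range(1, max_steps + 1):
        stim += stim_rate * (1.0 - stim)           # stimulus evidence accrues
        if transfer == "continuous":
            resp += resp_rate * (stim - resp)      # graded, immediate transfer
        elif stim >= threshold:                    # discrete, all-or-none signal
            resp += resp_rate * (1.0 - resp)
        if resp >= criterion:
            return t
    return None

print(simulate("continuous"), simulate("discrete"))
```

With these particular parameters the continuous channel responds sooner, because response activation starts accruing before stimulus identification is complete; the qualitative predictions of the two assumptions diverge most clearly in how early partial stimulus information can influence the response.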

Historically, popularity has swung back and forth between these alternatives.  Over thirty years ago, Sternberg (1969) proposed a method of interpreting reaction time data called the additive factors method (AFM). This framework assumes that information is transferred discretely between processes, and shows that, given this assumption, if the effect of manipulating some factor A changes (i.e. becomes larger or smaller) as one manipulates another factor B, then these two manipulations must influence the same underlying process.  The logic of this method was very compelling, and the method was used with much success for interpreting empirical results (see Sanders, 1980, 1990; Sternberg, 1971); so, people were happy, for a while, to assume discrete information transfer.

A decade later, however, McClelland (1979) developed the Cascade model, in which information is transferred continuously from stimulus to response processes, and showed that this model could account for many of the same kinds of empirical data that AFM could account for, but also allowed for different inferences to be drawn about what processes are influenced by experimental manipulations.  Around the same time, a large number of other criticisms of AFM also arose, as well as new models favoring the assumption of continuous information transfer (e.g. Eriksen & Schulz, 1979; Taylor, 1976; Wickelgren, 1977).  The tide had turned toward continuous models.

Approximately a decade after that, Miller (1988) brought extensive criticism against this shift, saying sharply: “We consider the en masse abandonment of discrete models in favor of continuous ones to be wholly unjustified given the evidence currently available, and thus scientifically premature.”  He analyzed much of the empirical data that had been adduced in support of continuous models, and found that it could all be accounted for by models that did not assume continuous information transfer.  Most often, the empirical evidence spoke against models that consisted of only a single unitary stimulus identification process, but could easily be accounted for by models that assumed the formation of multiple parallel stimulus codes.  Miller showed that his Asynchronous Discrete Coding (ADC) model, in which multiple stimulus identification processes each give separate and independent discrete outputs to response processing, was able to account for much of the critical data (see also Miller 1982a, 1982b, 1983).

Increasingly sophisticated methods have been proposed to empirically test whether information transfer is discrete or continuous (Roberts & Sternberg, 1993), and increasingly complex models have been proposed that manipulate assumptions about different ways in which information transmission can be discrete, continuous, or varying from one to the other on a continuum (e.g. Liu, 1996; Miller, 1993).  No overall agreement, however, has been reached in this debate.

Finally, it should be noted that the terms “discrete” and “continuous” can be, and have been, used in other ways when talking about models of mental processes.  Miller (1988) distinguishes between three different ways in which a model can be “discrete” or “continuous”.  First, it may have discrete or continuous representation: mental representations may either vary freely across a continuum, or may be restricted to a limited number of mental codes.  Second, it may have discrete or continuous transformation: mental representations may vary continuously in the degree to which they are formed or activated, or may have only a limited number of states they can be in (e.g. formed or not, prepared or not).  Third, it may have discrete or continuous transfer of information, as has been discussed so far here.  Most of the empirical tests and theoretical debates have been focused on the question of information transmission, although there has been some work in trying to empirically establish whether the transformation of information within response selection is discrete or continuous (e.g. Meyer, Irwin, Osman, & Kounios, 1988; Meyer, Yantis, Osman, & Smith, 1985).

Connectionist models of performance and consistency effects all have discrete representation (a finite number of units, representing discrete mental codes) and continuous transformation (continuous accumulation of activation within each unit).  However, although most of these models assume continuous information transfer (e.g. Barber & O’Leary, 1993; Cohen, Dunbar & McClelland, 1990; Cohen & Huston, 1994; Cohen, Servan-Schreiber, & McClelland, 1992; O’Leary & Barber, 1997; Phaf, van der Heijden, & Hudson, 1990; Servan-Schreiber, 1990; Zhang & Kornblum, 1998), at least one of these models explicitly assumes discrete transfer (Kornblum et al., 1999), and another implies it (Zorzi & Umilta, 1995; this case will be discussed below).

This debate is important when evaluating computational implementations of generic dimensional overlap and response selection models.  The way a model implements the transfer of information could seriously affect the predictions that it makes, and thereby any comparison made between it and other models.


Inverted-U activation of irrelevant stimuli

Most connectionist models of reaction time that try to account for the effects of irrelevant stimuli model the influence of attentional focus by having the activation of the irrelevant stimulus unit increase at first, and then decrease again, producing a non-monotonic, inverted U-shaped activation function over time (Kornblum et al., 1999; Lu, 1997).  This activation curve can be implemented in a computational model in a number of ways.  For example, if the input to the irrelevant stimulus unit is turned off or decreased shortly after it is initially turned on, activation will follow a rising and falling course over time (e.g. Kornblum et al., 1999).  Alternatively, an explicit “attentional inhibition” mechanism could be implemented, where other units become activated in response to irrelevant stimulus unit activation, and these units subsequently inhibit activation in the irrelevant stimulus units (e.g. Houghton & Tipper, 1994).  Regardless of the mechanism, the basic assumption of these models is that irrelevant stimulus unit activation increases and then decreases again over time.
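The first of these implementation strategies, gating off the input to the irrelevant stimulus unit, can be sketched as follows.  The rate, decay, and timing values are hypothetical illustration parameters.

```python
def irrelevant_activation(off_at=15, steps=60, rate=0.15, decay=0.15):
    """Activation of the irrelevant stimulus unit when its external
    input is switched off after `off_at` steps: the unit charges up
    while the input is on, then leaks back toward zero, tracing an
    inverted U over time."""
    act, trace = 0.0, []
    for t in range(steps):
        inp = 1.0 if t < off_at else 0.0
        act += rate * inp - decay * act   # charge from input, constant leak
        trace.append(act)
    return trace

trace = irrelevant_activation()
peak = max(range(len(trace)), key=trace.__getitem__)
print(peak, round(trace[peak], 3))   # rises, peaks near input offset, falls
```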

This characteristic of irrelevant stimulus unit activation has been inferred primarily from empirical results that have used stimulus onset asynchrony (SOA), or the relative timing of the relevant and irrelevant stimulus characteristics, to measure how the size of consistency effects changes over time.  All consistency effects seem to follow a non-monotonic, inverted U-shaped time-course, with different effects differing only in the shape and peak of this curve (see Kornblum et al., 1999; Lu, 1997).  Moreover, in connectionist network models, the size of a consistency effect reflects the amount of activation in the irrelevant stimulus unit during processing.

It should be noted that most early models actually do not include this assumption (Cohen, Dunbar, & McClelland, 1990; Cohen & Huston, 1994; Phaf, et al., 1990).  However, these models also do a poor job of accounting for time-course, and are unable to account for the decrease in the size of consistency effects for long SOA values (Cohen & Huston, 1994).  More recent models have either included this assumption from the outset (Kornblum, et al., 1999; Zorzi & Umilta, 1995), or incorporated the assumption at some point later on (Barber & O’Leary, 1997; Zhang, Zhang, & Kornblum, 1999).


Mental codes form gradually

One of the basic assumptions shared by all connectionist models of reaction time is that mental codes form gradually, through the accumulation of evidence over time based on input information.  For example, when a blue stimulus appears, information from the sensory signals gradually causes evidence to accumulate in favor of the stimulus code representing the color blue.  In the language of connectionist networks, the activation of the stimulus unit for the color blue gradually increases over time. A mental code can be thought of as “completely formed” once activation in the appropriate unit has reached some criterion level.  Activation in a stimulus unit can trigger the accumulation of activation in a response unit either right away, or after a decision threshold is met, depending on whether the model assumes stages or continuous processing. The ultimate speed of performance is determined by how long it takes for activation of the motor units to reach some “decision criterion,” indicating that the motor codes have been fully formed.

This idea evolved out of the combination of three ideas.  Signal detection theory (Green & Swets, 1966; Swets, 1964; Tanner & Swets, 1954) suggested that detecting a particular stimulus (i.e. forming a particular stimulus code) is a statistical decision: the signals that you get from your sensory system are subject to variability, so although a particular stimulus characteristic on average produces a particular sensory signal, there will be times when the signal appears without the stimulus being there, and there will be times when the signal fails to appear when the stimulus is there.  So, you have to use a decision threshold for how strong the signal has to be so that you are most likely to detect it when it is there, but least likely to think it is there when it is actually not.

Stimulus sampling theory (Estes, 1950, 1955) introduced the idea that perception involves repeated sampling of a stimulus over time, allowing signal detection theory to be extended over time (see, e.g., Pike, 1973).  The statistical decision tool called sequential sampling and optional stopping (Wald, 1947) is used, whenever you do not want to sample more data than necessary, to determine how many times a signal has to be sampled before a confident decision can be made about its value.  According to this method, each additional sample of information modifies your cumulative level of confidence, allowing you to continue sampling information until your confidence is high enough to meet some decision criterion, at which point you stop sampling.
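A minimal sketch of sequential sampling with optional stopping, assuming unit-strength evidence steps and a hypothetical criterion value (the sample distribution here is illustrative, not Wald’s original formulation):

```python
import random

def sequential_sample(p_toward_signal, criterion=5.0, seed=None):
    """Optional stopping: keep drawing noisy samples, adding each one
    to a running evidence total, until that total crosses the criterion
    in either direction.  Returns the decision and the number of
    samples that were needed to reach it."""
    rng = random.Random(seed)
    evidence, n = 0.0, 0
    while abs(evidence) < criterion:
        # each unit-strength sample is more likely to point toward the
        # signal actually being present (hypothetical distribution)
        evidence += 1.0 if rng.random() < p_toward_signal else -1.0
        n += 1
    return ("signal" if evidence > 0 else "no signal"), n

decision, n_samples = sequential_sample(0.65, seed=1)
print(decision, n_samples)
```

The number of samples needed varies from run to run, which is exactly why models built on this idea predict variable reaction times even for identical stimuli.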

This idea was immediately incorporated into a large number of psychological models of performance in choice reaction time (CRT) tasks (e.g. Audley, 1960; Audley & Pike, 1965; LaBerge, 1962; McGill, 1963, 1967; Stone, 1960; Vickers, 1970).  Although the details of these models differ, the basic premise is the same: evidence for each stimulus code accumulates over time, due to repeated sampling of sensory information, until a decision criterion of some sort is reached, indicating that the code has been fully formed and the stimulus has therefore been fully identified (see Luce, 1986, for more details).  Currently, two major types of models based on this premise are being pursued: diffusion models (Ratcliff, 1978, 1980, 1981, 1988) and the connectionist models discussed here.

Most connectionist models use the same equation to determine how activation changes over time, drawing on the first connectionist model of performance, McClelland’s (1979) Cascade model.  McClelland proposed that units be understood as first-order linear integrators, so that their activation at any given point in time is a time-averaging function of their input.  When units like this are given a constant input, their activation will asymptotically approach that input value according to a “loading curve”: approaching the input level at a rate proportional to how far away it is from the input.  This function actually first appeared in a psychological model proposed by Grice (1972, 1977; Grice, Nullmeyer, & Spiker, 1982), although he rarely gets credit for this contribution (see Luce, 1986).
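A minimal sketch of such a first-order linear integrator, checked against its closed-form loading curve (the input and rate values are arbitrary illustration choices):

```python
import math

def loading_curve(inp, rate, steps):
    """A first-order linear integrator: activation moves toward the
    input at a rate proportional to how far away from the input it
    currently is, producing an exponential approach to the asymptote."""
    act, trace = 0.0, []
    for _ in range(steps):
        act += rate * (inp - act)
        trace.append(act)
    return trace

trace = loading_curve(inp=1.0, rate=0.2, steps=30)

# the discrete-time solution is act(t) = inp * (1 - (1 - rate) ** (t + 1)),
# an exponential "loading curve" toward the input level
assert all(math.isclose(a, 1.0 - 0.8 ** (t + 1)) for t, a in enumerate(trace))
print(trace[0], trace[-1])   # starts at rate * inp, asymptotes toward inp
```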


Connectionist models of consistency effects

Before the DO2000 model, there were already a number of computational models of consistency effects.  All of these models can be classified as specific instances of either the generic response selection model or the generic dimensional overlap model.  Interestingly, the response selection models are each designed to account for performance in only one kind of task: Cohen (Cohen, Dunbar & McClelland, 1990; Cohen & Huston, 1994) and Phaf, van der Heijden, and Hudson (1990) have described response selection models of performance in the Stroop task;  Servan-Schreiber (1990; Cohen, Servan-Schreiber, & McClelland, 1992) has described a response selection model of performance in the Eriksen task; and Zorzi and Umilta (1995) have described a model of performance in the Simon task (which could be classified as either a dimensional overlap model or a response selection model, since both models agree on their explanation of the S-R consistency effect).

The three models that have specifically been designed as general models of consistency effects, on the other hand, are all dimensional overlap models: Barber and O’Leary (1993; O’Leary & Barber, 1997) have described a dimensional overlap model of performance in Simon and Stroop tasks, and their variants; and both Zhang and Kornblum (1998) and Kornblum et al. (1999) have described dimensional overlap models of performance in consistency tasks in general, including Eriksen, Simon, Stroop and Stroop-like tasks, and their variants and factorial combinations.

All of these models also have a common computational heritage, and therefore share a number of common assumptions, as well as a common descriptive language.  They can generally be classified as connectionist network models (Quinlan, 1991; Rumelhart, McClelland, et al., 1986).  Connectionist models consist of a network of interconnected processing units, where each unit is very simple, usually involving a single variable (called the unit’s “activation”) that changes as a function of input to the unit, and determines the output of the unit to be transferred to other connected units.

More specifically, these models are localist connectionist models (see, e.g. Grainger & Jacobs, 1998; Page, in press).  This means that each unit in the network represents a mental code.  In models of performance in classification tasks, the units in the network can be divided into three groups or modules: a relevant stimulus module, containing units corresponding to each of the elements of the relevant stimulus set; an irrelevant stimulus module, containing units corresponding to each of the elements in the irrelevant stimulus set; and a response module, containing units corresponding to each of the elements in the response set.  Some models also include modules of units representing executive cognitive functions, such as “task demand units,” which represent mental codes that specify which of the two stimulus sets is relevant (e.g. Cohen & Huston, 1994).
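As a schematic illustration, such a localist network for a two-choice Stroop-like task might be laid out as follows.  All of the unit and module names here are hypothetical, chosen only to mirror the module structure described above.

```python
# Hypothetical layout for a two-choice Stroop-like task: one unit per
# mental code, grouped into modules (names are illustrative only).
network = {
    "relevant_stimulus":   ["color_red", "color_green"],
    "irrelevant_stimulus": ["word_RED", "word_GREEN"],
    "response":            ["say_red", "say_green"],
    "task_demand":         ["attend_color", "attend_word"],  # executive codes
}

# one activation variable per unit, i.e. one per mental code
activation = {unit: 0.0 for units in network.values() for unit in units}
print(sorted(network), len(activation))
```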
