The development of features in object concepts

Philippe G. Schyns
Dept. of Psychology, University of Glasgow
Glasgow, G12 8QB, UNITED KINGDOM
philippe@psy.gla.ac.uk

Robert L. Goldstone
Dept. of Psychology, Indiana University
Bloomington, IN 47405, USA
rgoldsto@ucs.indiana.edu

Jean-Pierre Thibaut
Dept. of Psychology, Université de Liège
Batiment B32, Sart-Tilman, 4000 Liège, BELGIUM
jthibaut@vm1.ulg.ac.be

Keywords: Concept learning, conceptual development, perceptual learning, features, stimulus encoding

SHORT ABSTRACT

One productive and influential approach to cognition maintains that object categorization and higher-level cognitive processes operate on the output of lower-level perceptual processing. Our perceptual systems provide us with a set of fixed features which are the inputs to higher-level processes. We question this unidirectional approach, arguing that in many situations, categorization and higher-level processes cause lower-level features to be developed. Rather than viewing the "vocabulary" of primitives as fixed by low-level processes, our view maintains that the vocabulary is dependent on the higher-level processes that use the vocabulary.

LONG ABSTRACT

One productive and influential approach to cognition maintains that categorization, object recognition and higher-level cognitive processes operate on the output of lower-level perceptual processes. That is, our perceptual systems provide us with a set of fixed features which are the inputs to higher-level cognitive processes.

We question this unidirectional approach, arguing that in many situations, the higher-level cognitive process being executed influences the lower-level features that are developed. Rather than viewing the "vocabulary" of features as being fixed by low-level processes, we present a theory in which people create features in order to subserve the representation and categorization of objects.

In our view, two types of category learning should be distinguished. Fixed space category learning occurs when new categorizations are representable with the available feature set. Flexible space category learning occurs when new categorizations are not representable with the available features. Whether fixed or flexible learning occurs depends on the requirements of a particular categorization situation. That is, it depends on the featural contrasts and similarities between the new category to be represented and the individual's existing concepts. Fixed feature approaches face one of two problems when they are confronted with tasks that require new features. If the fixed features are fairly high-level and directly useful for categorizations, then they will have insufficient flexibility to represent all objects that may be relevant for a new task. If the fixed features are small, subsymbolic fragments (such as pixels), then regularities at the level of functional features, regularities that are required to predict categorizations, will not be captured by these primitives.

We present psychological evidence suggesting flexible perceptual changes due to category learning, and theoretical arguments for the importance of such perceptual flexibility. We characterize situations that promote feature creation, and argue against interpretations of these situations in terms of fixed features. Finally, we discuss implications of functional features for object categorization, conceptual development, chunking, constructive induction and formal models of dimensionality reduction.

1. INTRODUCTION

There has been an influential and powerful idea in cognitive science that we nonetheless believe must be revised in order to provide a full account of cognition. This idea is that cognitive processes such as categorization and object recognition operate on a fixed set of perceptual or conceptual features which are the building blocks for complex object representations. We will argue that categorization and object recognition often require the creation of new featural descriptors. Rather than viewing the featural vocabulary as being fixed, our view maintains that the vocabulary is dependent on situation demands, novel categorization requirements, and environmental contingencies.

In this paper, a feature refers to any elementary aspect of a stimulus (object, event) that is psychologically processed. This does not imply that people are consciously aware of these aspects as separable features. Instead, features are individuated by their functional role in the cognitive architecture. Dimensions are ordered sets of feature values, such as size, brightness, and hue. Note that two features can create a new dimension, for example by interpolating the intermediate values between the poles defined by the two features.
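The interpolation idea can be made concrete with a small sketch. Encoding the two feature poles as numbers is our illustrative assumption, not a claim about how such values are perceptually represented:

```python
# A sketch of the claim that two features can induce a dimension:
# given two feature values as poles, interpolation generates the
# ordered set of intermediate values between them.

def make_dimension(pole_a, pole_b, steps):
    """Return `steps` evenly spaced values from pole_a to pole_b."""
    return [pole_a + t / (steps - 1) * (pole_b - pole_a)
            for t in range(steps)]

# e.g. a brightness dimension induced between a "dark" pole (0.0)
# and a "light" pole (1.0):
brightness = make_dimension(0.0, 1.0, 5)
print(brightness)   # [0.0, 0.25, 0.5, 0.75, 1.0]
```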

1.1. Fixed Feature Vocabularies

In a typical application of the fixed features approach in categorization (e.g. Bruner, Goodnow, & Austin, 1956), subjects are shown simple objects, and are instructed to learn the rule for their categorization. Such rules involve logical combinations of features that are manifestly present in the stimuli. A subject might learn, for example, a rule combining white and square features to provide a categorization. Importantly, the subject does not have to create the relevant features to be used for categorization. Instead, there is an implicit agreement between the experimenter and the subject about what features compose the stimuli.

Although categorization research has come a long way since these early experiments, many recent approaches to categorization have continued to use stimuli that "wear their features on their sleeves." Clear-cut dimensions with clearly different values are often used for reasons of experimental hygiene. This constraint has led researchers to use simple shapes (Murphy & Ross, 1994), line positions (Aha & Goldstone, 1992), colors (Bruner et al., 1956), and line orientations (Nosofsky, 1987) as the relevant sources of variation in their experiments. This approach of composing stimuli out of components has also been influential in other fields. In the Recognition By Components (RBC) theory (Biederman, 1987), compositions of a fixed set of 36 geometric elements are designed to account for the recognition of a very large set of objects. Theories of phoneme (Jakobson, Fant & Halle, 1963) and letter (Gibson, 1971; Selfridge, 1959) recognition also hypothesize a limited set of primitives. In a similar vein, Schank's (1972) Conceptual Dependency theory postulates a fixed set of about 20 semantic primitives such as PTRANS (physical transfer) and INGEST (see also Katz & Fodor, 1962, for a related point of view). These approaches vary widely on the nature of their components and the means of combining those components, but all assume representations composed out of a fixed feature set.

The fundamental aspect of fixed features is that they are the lowest building blocks of object representation and categorization. That is, any functionally important difference between objects must be representable as differences in their building blocks if it is to be used within the system. Typically these features are assumed to be nondecomposable units or "atoms," although, if pressed, many researchers would concede that their atoms may be decomposable if required. All of the strengths of "mental chemistry" are inherited by this approach. Namely, a very large number of object descriptions can be generated from a finite set of elements and a set of combination rules. In addition, compositions of features allow for structured representations (Palmer, 1977), as opposed to the template approach to recognition (Ullman, 1989). Also, the systematic relations between different objects can be expressed in terms of their features and their combination rules (Fodor & Pylyshyn, 1988).
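The generative power of "mental chemistry" is easy to illustrate: with f feature values per dimension and s dimensions, a vocabulary supports f1 x f2 x ... x fs composite descriptions. The particular feature values below are arbitrary examples in the spirit of the Bruner et al. stimuli:

```python
from itertools import product

# A small fixed vocabulary and a conjunctive combination rule
# generate a much larger space of object descriptions.
colors = ["white", "black", "red"]
shapes = ["square", "circle", "triangle"]
sizes  = ["small", "large"]

# Every object description is one value per dimension.
descriptions = list(product(colors, shapes, sizes))
print(len(descriptions))    # 3 * 3 * 2 = 18
print(descriptions[0])      # ('white', 'square', 'small')
```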

We wish to preserve those powerful properties of componential representations. However, we also wish to provide a framework for augmenting feature sets with new features. In our view, componential theories of cognition should provide powerful principles for developing new representations. Fixed feature theories limit new representations to new combinations of the fixed features. Consequently, all possible categorizations are bounded by the possible combinations of the features (the conceptual repertoire). If a categorization requires a feature not originally present in the feature set or derivable from this set, then the categorization cannot be learned. This is a rather restrictive conception of conceptual change. There may be occasions when features not originally present in the system are useful to distinguish between important categories in the world that newly confront the organism. A system that is constructed so as to flexibly learn such features would be able to tailor its vocabulary to the demands of categorization. In many situations, it is unrealistic to think that a system could come fully equipped to deal with all possible contingencies of a complex environment.

We will provide an account of feature development in which the components of a representation have close ties to the developmental history of the organism. We will discuss the empirical evidence suggesting that such learning occurs, and the theoretical grounds that necessitate learned features. Although we will not propose a particular implementation of flexible feature learning, we will discuss computational mechanisms that can account for learned features, and how current incarnations of these mechanisms must be supplemented. Our analysis is addressed to literatures in both object recognition and categorization. Although these fields have not traditionally been linked, both deal with the question: "What is this object?" To recognize an object as a cart does not seem to be fundamentally different from placing the object into the category of things that are carts. In both cases, the problem is to extract the relevant components of the object and compare this featural encoding with memory representations.

1.2. Empirical Evidence for Learned Features

Although not addressed by fixed feature approaches to categorization, there is a corpus of evidence that indicates substantial changes to perceptual systems during learning. A subset of these perceptual changes is most parsimoniously explained by postulating the discovery of new features.

Before we review the evidence, a few ground rules are necessary. First, we distinguish between varieties of feature weighting and feature creation. A feature that is useful (diagnostic) for a categorization may be selectively attended. This selective attention may simply be a decisional strategy that does not affect the perceptual appearances of the objects to be categorized (Elio & Anderson, 1981; Nosofsky, 1987). For example, to achieve efficient categorical judgments, Elio and Anderson's subjects learned to selectively base their categorizations on diagnostic features even though the subjects could still easily notice the nondiagnostic features. On the other hand, some researchers have hypothesized that features are selectively weighted if they are diagnostic, and that this selective weighting affects actual perceptual, rather than strategic or final decisional, processes (Gibson, 1969). Both of these conceptions assume that changes are attributable to previously existing features or dimensions. A third conception is possible in which new features or dimensions are created due to categorization requirements. Explanations involving new features are favored when accounting for performance with pre-specified features alone would require an implausibly large feature set.

Second, the reported experiments will differ in the posited psychological level of the representational change. On some occasions, representational changes are relatively late and strategic. Learning may consist of strategically using a previously diagnostic feature in new situations (Lawrence, 1949). On other occasions, feature changes are relatively perceptual and nonstrategic. Categorization relevance may influence relatively perceptually-based tasks. It is notoriously difficult to draw a sharp distinction between perceptual and conceptual tasks, and, in fact, it will be our contention that such a distinction is ill-advised. For example, same/different judgment tasks (tasks in which subjects must respond whether two simultaneously presented stimuli are physically identical or not) have usually been thought of as providing relatively clear evidence for perceptual similarity. However, to the extent that subjects always have to represent, remember (albeit for a very short time), and attend to aspects of the compared stimuli, we cannot be certain that these tasks tap purely sensory representations. Still, by examining the particular stimuli and task demands, we might be able to assess the relative contributions of strategic and perceptual factors.

1.2.1. Preexposure

The simplest form of perceptual learning that has been studied is predifferentiation (Gibson & Walk, 1956). In predifferentiation, exposure to stimuli before testing results in heightened sensitivity to those stimuli. For example, human subjects are better able to distinguish between "doodles" after repeated exposures to them. Researchers (e.g. Gibson, 1991) have discussed preexposure results in terms of perceptual differentiation, a process whereby aspects of the stimuli that serve to distinguish them are made more salient. Feedback on the classification or use of stimuli is not required for sensitization; simple exposure to the stimuli suffices.

1.2.2. Diagnosticity Driven Learning

Although preexposure effects indicate that category feedback is not a prerequisite for learning new aspects of the stimuli, other studies have suggested that categorizations exert an additional influence on how subjects deal with the stimuli. Subjects become selectively attuned to diagnostic features that facilitate discriminations between categories or classes of responses. Lawrence (1949) described a theory of acquired distinctiveness of cues in which relevant cues for a task become generally distinctive. For example, rats were rewarded for choosing one stimulus over another in a rough-smooth discrimination task. Subsequently, the rats were transferred to a discrimination task in which, for example, rough patterns required left responses and smooth patterns required right responses. Rats learned this second discrimination more quickly than rats who were first given a black-white discrimination.

Although experiments of this sort show that dimensions can be selectively sensitized, they provide little evidence for perceptual changes per se. One simple account of these results is that the organism simply generalizes the usefulness of a dimension from one situation to another. However, other recent data suggest that categorization diagnosticity influences an object's representation in terms of features. Categorization diagnosticity can influence perceptual changes in at least two ways. First, it can influence the discriminability of values within existing dimensions, or the discriminability of entire preexisting dimensions. For example, Goldstone (1994a) gave human subjects categorization training involving the sizes or brightnesses of squares. Subsequent to prolonged training, subjects were transferred to a same/different task in which squares that varied slightly on their sizes or brightnesses were presented (or the same square was repeated twice). When a dimension was relevant for categorization, subjects' same/different judgments along this dimension were sensitized (using the d' measure from Signal Detection Theory) relative to subjects for whom the dimension was irrelevant, and to control subjects who did not undergo categorization training. The greatest sensitization of the categorization-relevant dimension was found along those particular dimension values that served as the boundaries between the learned categories. However, the sensitization of the relevant dimension also extended to other values along the dimension even though the values were originally placed in the same category. In addition, one case of acquired equivalence was found in which a dimension that was irrelevant for categorization became desensitized relative to control subjects. 
Because same/different judgments involve such "cognitive" factors as (very) short term memory, attention, and encoding, these results do not strictly isolate a perceptual change, but at least it can be said that the categorization training influences a task that many researchers have assumed to tap relatively low-level processes. Andrews, Livingston and Harnad (in press) have found similar influences of categorization on similarity judgments.
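Concretely, d' is the difference between the z-transformed hit and false-alarm rates. A minimal computation sketch follows; the rates are hypothetical, chosen only to illustrate a sensitized versus an unsensitized dimension, and are not Goldstone's data:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index from Signal Detection Theory:
    d' = z(H) - z(FA), where z is the inverse normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# In a same/different task, a "hit" is responding "different" when
# the two squares really differ; a "false alarm" is responding
# "different" to identical squares. Hypothetical rates:
relevant = d_prime(0.80, 0.10)    # dimension was category-relevant
irrelevant = d_prime(0.60, 0.10)  # dimension was irrelevant

print(round(relevant, 2), round(irrelevant, 2))   # 2.12 1.53
```

A higher d' for the category-relevant dimension is what "sensitization" means operationally: the same physical difference is detected more reliably after training.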

Second, category diagnosticity can also influence perceptual change by participating in the creation of new features for object categorization. For example, Schyns & Murphy (1991, 1994) provide evidence for such a process (see Figure 2, picture a). In a typical experiment, subjects had to learn labeled categories of new objects and were later tested on the features used to encode the categories. The stimuli were three-dimensional continuous rock-like "blobby" objects. The stimuli had a complex blob structure so that naive subjects showed little agreement on how the objects were decomposed into parts before experience with the categories. The categories were defined by a coherent group of a few contiguous blobs present on each category exemplar; all other blobs of an exemplar were given random shapes. After learning the categorizations, when subjects were instructed to delineate the objects into the parts that they thought were relevant, subjects tended to parse the objects into those parts that were diagnostic for categorization. This is despite the failure of subjects to provide this parsing prior to experience with the categories, and despite a strong bottom-up constraint on object segmentation (the minima rule, Hoffman & Richards, 1984) that would predict parsings other than those obtained in the experiments. Schyns and Murphy's technique was a delineation method in which subjects draw outlines around the parts of each object (using either a computer mouse or a pen). Although not free of cognitive influences, this technique has the advantage of leaving subjects free to report any fragment of a stimulus that they wish (independently of whether this fragment has an easily expressible name or not). In fact, Braunstein, Hoffman and Saidpour (1989) found that a delineation method gave the strongest evidence for segmentation by the minima rule, so if segmentation were determined purely by the physical stimulus, this task should have revealed the physically predicted parsings rather than the diagnostic ones.

A similar influence of categorization on the segmentation of objects was suggested by Pevtzow and Goldstone (1994). Stick figures composed out of six lines were categorized in one of two ways. Different arbitrary combinations of three contiguous lines were diagnostic for the different categorizations. After categorization training, subjects participated in part/whole judgments, responding as to whether a particular set of three lines (a part) was present in a whole stick figure. Subjects were significantly faster to determine that a part was present in a whole when the part was previously diagnostic during categorization. The part/whole judgment task is arguably the most perceptually based task used by Palmer (1977) to explore the "naturalness" of a way of segmenting an object into parts. Although Palmer's model bases the naturalness of a particular segmentation of an object on objective properties of the object (e.g. the proximities, similarities, and shapes of the line segments), the above results indicate that the subjects' experience also influences how they will segment an object into parts.

For both of these experiments, hypothesizing that the effects are due to shifts of attention to existing features would require positing an implausibly large number of dimensions or features. Explanations in terms of mechanisms for dynamically creating new features are more parsimonious in these cases than explanations involving very large numbers of pre-existing features.

1.2.3. Differentiation

Several researchers have suggested that experience with stimuli results in subjects differentiating stimulus dimensions that were originally processed together. There is a substantial amount of developmental evidence that children are more likely to perceive stimuli in an undifferentiated manner whereas adults analyze the stimuli into distinct dimensions. For adults, some pairs of dimensions, like size and brightness, are called "separable" (Garner, 1974). They are processed separately, attention can be selectively placed on just one of the dimensions, and similarities between stimuli are computed by summing their separately determined dimension differences. Other pairs of dimensions, like the saturation and brightness of a color, are called "integral." Such dimensions appear to be psychologically "fused" in that it is difficult to selectively attend to just one dimension, and similarities between stimuli are computed by considering the two dimensional differences simultaneously. Several studies have indicated that children process separable dimensions in the same way that adults process integral dimensions (Smith & Kemler, 1978; Ward, 1983). One way to understand these results is to hypothesize that part of the maturation process is to separate dimensions that were not originally separated. Such a process has also been implicated in learning distinctions between more conceptual dimensions such as weight and density (Smith, Carey, & Wiser, 1985). Even in adulthood, differentiation of dimensions seems to occur. Through training, the saturation of a color can be psychologically differentiated from its brightness (Goldstone, 1994a; Burns & Shepp, 1984).

There is also a second type of differentiation in which categories, rather than dimensions, are split apart. Several studies in developmental psychology show that the lexical categories of young children are frequently broader than the lexical categories of adults (Clark, 1973). For example, when children overgeneralize category labels, they may group together all round objects as instances of "ball" (Chapman, Leonard & Mervis, 1986). Eventually, after a progressive reorganization of their concepts, children's lexical categories narrow down and match adults' lexical categories. Presumably, adding features to an initially broad concept allows its differentiation into more specific concepts. The acquisition of new features more specifically tuned to the categorizations at hand may also underlie the development of adults' conceptual expertise. Tanaka and Taylor (1991) studied the categorizations of dog and bird experts. Their results showed that experts are particularly adept at making fine discriminations within their category of expertise, suggesting that experts acquire domain-specific features to structure their categories of expertise. Schyns (1991) has given a network implementation of this type of conceptual differentiation. In a two-layered neural network, units initially representing a broad category became progressively specialized to the representation of finer categories through a feature extraction process.
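One family of mechanisms that produces this kind of specialization is competitive learning. The sketch below is our illustrative reconstruction with made-up data and parameters, not Schyns's (1991) network: two units that initially respond alike to all members of a broad category gradually specialize, one per sub-category, as each unit's weights move toward the inputs it wins.

```python
import random

random.seed(0)

# Two sub-categories of one broad category, as noisy 2-D "feature" vectors.
cluster_a = [(1.0 + random.gauss(0, 0.1), 0.0 + random.gauss(0, 0.1))
             for _ in range(20)]
cluster_b = [(0.0 + random.gauss(0, 0.1), 1.0 + random.gauss(0, 0.1))
             for _ in range(20)]
data = cluster_a + cluster_b

# Both units start near the center: an undifferentiated, broad response.
weights = [[0.5, 0.5], [0.5, 0.51]]

def distance(unit, x):
    """Squared Euclidean distance between a unit's weights and an input."""
    return sum((w - xi) ** 2 for w, xi in zip(unit, x))

for _ in range(50):
    random.shuffle(data)
    for x in data:
        # Winner-take-all: the closest unit claims the input...
        winner = min(weights, key=lambda u: distance(u, x))
        # ...and moves 10% of the way toward it.
        for i in range(2):
            winner[i] += 0.1 * (x[i] - winner[i])

# After training, each unit has specialized to one sub-category mean.
print([[round(w, 1) for w in u] for u in weights])
```

The end state (one unit near each cluster center) is the network analogue of a broad concept differentiating into two finer ones.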

1.2.4. Summary

The experimental evidence reviewed above indicates that our categorizations, rather than simply using our perceptually extracted features, also determine the featural description of the object that is used. Some perceptual changes may arise from mere exposure with an environment, but others depend on the way in which objects of the environment are organized into categories. Features are selectively attended, and apparently created, when they distinguish between relevant categories. In addition, perceptual dimensions and categories both undergo a differentiation process based on environmental contingencies. These results provide an initial indication that categorical constraints could significantly affect the process by which features are extracted from objects.

2. A FUNCTIONAL APPROACH TO FEATURE CREATION

2.1. The Function of Features

The function of a feature is to express commonalities between members of the same category, and to distinguish between categories. In fixed feature set approaches to categorization and object recognition, potential functionality guides the construction of the vocabulary of features. That is, the researchers develop their feature sets by keeping in mind the question: "What features would be required to solve this categorization task?" In many cases, the researchers then test their theories using stimuli that were constructed from these derived feature sets.

We completely agree with the premise that features should be functionally determined. However, these constraints should be defined by the environment and not simply by the experimenter. Even if the fixed set researcher manages to create a clever and plausible feature set, the resulting set will probably be limited to a specific domain, and will not adapt to temporary or local environmental states. Moreover, this approach may oversimplify the task of categorization by restricting it to a problem of combining obvious, clearly demarcated features.

As a case study, consider the current object recognition literature in which Biederman's geon theory of object recognition (Biederman, 1987) is contrasted to the multiple-views approach (Edelman & Bülthoff, 1992; Poggio & Edelman, 1990; Tarr & Pinker, 1989). In Recognition By Components (Biederman, 1987) objects are represented by a set of geometric elements derived by taking various geometric slices through the possible transformations of a generalized cone. Importantly, the resulting elements can be distinguished from each other on the basis of a few nonaccidental features--features that are invariant over a wide range of transformations (rotational, translational and scalar). Transformational invariance is a desirable property because telephones do not change their category membership (the fact that they are telephones) simply because they are rotated. However, Biederman's set is severely limited in its application to many natural objects (Kurbat, 1995; Ullman, 1989), it does not allow discriminations between many similar categories, and objects within the same category will not necessarily be represented by the same geon structure. Rather than viewing these limitations as particular problems for Biederman's theory alone, we think that such problems will arise with any approach that does not flexibly tune its building blocks to categorical constraints.

For example, recent object recognition research has demonstrated that the relationship between the observer and the object influences recognition performance (e.g. Edelman & Bülthoff, 1992; Palmer, Rosch & Chase, 1981; Tarr & Pinker, 1989). Viewpoint-dependent recognition was interpreted by Tarr and Pinker (1989) as evidence that objects are represented in memory by a collection of specific views (see also Poggio & Edelman, 1990). When views of an object are the basis of object representation, it is difficult to determine among the set of all possible views which subset best predicts categorizations. Categorizations are so diverse that there may not be a unique, canonical, and task-independent view-based representation of a particular object (Hill, Schyns & Akamatsu, 1995). For example, any view of your face could reveal diagnostic information to distinguish it from a car, but fewer views would be well-suited to discriminate your face from a face of the other sex, and very few views would reliably distinguish your face from another face of the same gender. Viewpoint dependence appears to be relative to the diagnostic information in the task considered, and the location of the information on the object.

Therefore both the geon- and the view-based approach to object recognition must tune their representations to the functional roles of their building blocks. That is, both theories must consider the possible categorizations of an object before considering the possible geometric elements or views that will be used to represent the object.

2.2. Categorical Constraints on Feature Creation

A categorical context is composed of the categories and features an individual knows at a particular time of his/her conceptual development, and the new category to be encoded. The individual knows what the categories are from external feedback, or from the consequences of his/her miscategorizations. The categorical context imposes contrasts and similarities between previously acquired concepts and novel categories that are particular to the individual's history of categorization, and may therefore change from individual to individual.

To illustrate, consider the simple task of learning three categories X, Y and XY. The category X (or Y) is defined by a particular part x (or y) common to all category exemplars. The category XY is defined by two adjacent parts, x and y, which are present in all exemplars. If x and y were prespecified in the feature vocabulary, an initial discretization of the objects into the features x and y could be input to context-sensitive weighting mechanisms (e.g., Nosofsky, 1987). However, if x and y did not exist as features before experience with the categories, people would have to learn the features in order to solve the categorizations. Learning the features together with the categorizations would challenge the main claim of fixed feature approaches that people systematically analyze and perceive input stimuli with components sampled from a fixed and pre-specified feature set.

This was tested in Schyns and Rodet (1995) using three categories of "Martian cells" (see Figure 2, picture d). Categories were defined by specific blobs common to all category members to which irrelevant blobs were added (to simulate various cell bodies). One group was asked to learn X before Y before XY (X->Y->XY), and the other group learned the categories in a different order (XY->X->Y). The experiment was designed to explore whether the first group would learn XY as a conjunction of the features x and y, while the second group would represent the same category as a unitary, xy feature. (x&y and xy are mutually exclusive perceptions of the same stimulus aspect.) Similar categorization performances for X, Y and XY stimuli during the testing phase assured that x and y were equally diagnostic in both conditions. However, categorizations of X-Y cells (XY stimuli in which x and y were not adjacent to each other) were different. X->Y->XY subjects tended to categorize X-Y stimuli as XY while XY->X->Y subjects tended to categorize the same stimuli as X, or as Y. Thus, it is likely that category learning affected the perceptual encoding of the XY category.

In X->Y->XY, when X and Y are learned first, subjects have learned two features that they can apply to the analysis of the XY category. The XY category is then represented as a conjunction of two features x&y. On the other hand, when XY is learned first, it tends to be represented as a single configural unit xy because there has been no reason up until then to encode x and y individually. Note that this result is difficult to interpret in terms of a weighting of pre-existing features because both x and y are equally diagnostic in the experimental groups. Thus, both groups see x and y in X-Y stimuli, but the groups nonetheless categorize these stimuli very differently. Other experiments showed that these extracted features changed subjects' perceived similarities between pairs of stimuli. The same pairs were found to be "same" for one group, and "different" for another group.

Schyns and Rodet (1995) also tested more specifically the role of the context of categorization on categorization and similarity judgments. In the first part of their Experiment 3, two groups of subjects learned two Martian cell categories that would later serve as the background context for learning a third category. The categories were designed so that the two groups would learn different concepts using the same learned features. Both groups learned that the feature x characterized the first category X. The two groups differed on the nature of the second category. The first group was exposed to a XY category defined by the x feature adjacent to the y feature. The category of the second group was defined by only the y feature. Subjects then learned a third XYZ category defined by adjacent x, y and z components. Subjects' encoding of the new category was tested with a sorting task and a speeded same/different judgment task. It was found that the first group, but not the second group, of subjects distinguished XY stimuli from XYZ stimuli. These results confirmed the hypothesis that different histories of categorization generate different feature spaces to encode similarities and contrasts between objects.

2.2.1. Two types of concept learning

The previously described experiment indicates that a history of categorization can trigger different concept learning mechanisms. By the time the third concept is to be acquired, subjects of the second group have the necessary features x and y to identify the third category; subjects of the first group must create a third, novel feature z in order to identify the third category.

In the concept learning space fixed by x and y, Group 2 subjects represent XYZ as a combination of the two previously acquired features. This particular encoding illustrates what we call "fixed space learning," the familiar diagnosticity-driven learning that Gibson, Lawrence, and concept learning researchers have discussed. However, the combination of x and y already represents the second category of Group 1, and so subjects must develop a new feature z to distinguish the third category. We call this encoding "flexible space learning," to emphasize the expansion of the categorization space to include a new feature or dimension.

2.3. Functional Features and Primitives

The premise that features are created in order to subserve categorizations concerns the creation of functional vocabulary elements, but it is neutral as to their perceptual realization. For example, the shape feature "square" could be implemented as a concatenation of image pixels, as four line segments, as four corners, as four smaller squares, as two smaller rectangles, as a linear combination of sinusoids, and so forth. In short, there are many possible realizations of a functional feature. We have proposed that object aspects that become diagnostic for important categorizations can become functional elements of a system's vocabulary. However, one potential problem that must be addressed is the degree to which these functional features are themselves based upon a (more) primitive set of features. If a primitive set of features can capture all of the regularities and categorizations that are accommodated by the functional features, then the new functional features do not increase the representational capacity of the system. And if this is the case, then the argument that feature creation is needed to allow a system to represent objects that it was incapable of previously representing cannot be maintained. Accordingly, we will argue that functional features are not always constructed out of a fixed catalog of primitive features.

A set of shape primitives that could ground categorization must satisfy at least three conditions: The primitives must exist prior to experience with the objects they describe, they must be sufficient to represent the entire set of representable objects, and they must be able to bootstrap complex recognition systems. Ultimately, there are two ways of conceptualizing these primitives, each with its own problem. Either primitives are fine-grained and relatively unstructured, or primitives already represent complex structures of the environment.

2.3.1. Unstructured primitives

According to the unstructured approach, if one takes sufficiently fine-grained primitives (e.g. very small line segments, or even pixels) together with powerful combination rules, diagnostic compositions of the primitives could account for increasingly complex features. However, functionally important regularities (e.g., symmetry, serif, beauty, and so forth) are often not captured by simple pixel-based representations. It is unlikely that systems which hypothesize properties such as symmetry as a primitive of object recognition (Gibson, 1969) can have these properties explained by commonalities at the pixel level. Moreover, as will be discussed in the section on formal models of feature extraction, it is practically infeasible (although logically possible) to extract relevant categorization features from pixel-based (or similarly unstructured) representations of the input.

2.3.2. Structured primitives

Another approach to primitives posits that the catalog includes more complex primitives such as larger curves, corners, squares, circles, triangles, or even three-dimensional shapes such as cones and cylinders (see Biederman, 1987; Garner, 1974; Treisman & Gelade, 1980; among others). Complex (rather than simple) primitives would already mirror important structures of the visual environment and could therefore account for complex recognition by initially segmenting the visual environment into useful primitives for recognition. However, such preformed recognition systems are blind to structures that are not represented as primitives, and that are not compositions of simpler primitives.

To illustrate, in Fisher's (1986) influential model of letter recognition (cited in Czerwinski, Lightfoot & Shiffrin, 1992), a capital "A" is identified by composing three primitives (two diagonal bars and a horizontal bar). Clearly, diagonal and horizontal bars were selected as primitives with the task of letter categorization in mind; the same primitives would be particularly clumsy in categorizing varieties of ellipses. One might imagine adding a second subset of primitives for distinguishing ellipses. However, any large-scale, highly structured set of primitives is bound to be too coarse to represent all of the distinctions that might be required by different categories of objects.

2.3.3. Interactions between choice of primitives and task constraints

Task constraints almost always influence the primitives that scientists import into their componential theories of recognition. In our view, the task of the subject creating new functional features for categorization is not substantially different from the task of the scientist creating a componential theory of recognition: both must create a catalog of features that are defined by their role in recognition. If the scientist wants to posit a complete fixed set of primitives, he/she must envision all possible recognition tasks before conceiving of the features that would solve them. So, the envisioned set of tasks influences the primitives of recognition that will be selected by a theory of object recognition. Similarly, the particular categorization tasks confronting an individual influence the units of representation that he/she will adopt. Thus, rather than drawing a correspondence between a particular theory of object recognition (with its static primitives) and an individual's object recognition capabilities, the proper correspondence may be between the individual and the meta-theoretic search for a proper object recognition theory.

2.4. Functions, Perceptions and their Interactions

The idea that new features are created that are useful for categorization partially, but not fully, constrains feature creation. Our claim for functionally determined features does not mean that physiological or sensory facts are unimportant for defining the feature vocabularies. Features are also based on general perceptual constraints such as contiguity, topological cohesion, changes of curvature, and perceptual salience. In many cases, these constraints are not a catalog of shape primitives, but the constraints nonetheless exert strong pressures to create certain features. To illustrate, Hoffman and Richards (1984) have proposed that objects are segmented by creating parts with endpoints that are local minima of principal curvature. Instead of assuming that objects are segmented into primitive shapes, the authors suggest that a particular patch of shape will be identified as a part because it lies between two points of extreme curvature, not because it matches a primitive element. This approach does not limit possible shape features to the compositions of a catalog of primitives. Instead, as a sheet adjusts to the surface on which it is thrown, new features can be acquired to mirror the shapes lying between the segmentations suggested by the minima rule. Hoffman and Richards' constraint on object segmentation illustrates that the structures required for organizing complex representations are not necessarily structured primitives. Instead, general shape-processing constraints can produce segmentations that interact with structuring principles. As Hoffman and Richards (1984, p. 77) state it, "a boundary-based scheme, then, is to be preferred over a primitive-based scheme because of its greater versatility."

A very interesting aspect of Hoffman and Richards' proposal as it applies to the creation of new shape features is that it allows the feature vocabulary to partially mirror the shapes the categorizer experiences in his/her environment. This presents new challenges for effective procedures of feature creation. It is conceivable, even desirable, that several distinctive methods are used for developing features, depending on the idiosyncrasies of different object classes. For example, smooth objects such as faces could be parsed into their relevant component features using elastic 3D templates (e.g., Hinton, Williams & Revow, 1992). These elastic templates would behave as elastic masks whose parameters would adjust to shape variations within the class. At the time of writing, there is no agreement on the features or feature configurations these masks would represent. Class-specific variations (e.g., learning to categorize Caucasian faces) would result in class-specific features which would not be directly applicable to the shape variations of other classes (e.g., Asian faces). Mismatches between expected shapes and expected shape variations could give rise to the 'other race effect' in which people perceive faces of their own race with greater facility than those of another race (Brigham, 1986).

While face stimuli are mostly smooth, many man-made objects are discontinuous. This imposes different biases on the eventual elastic templates (and also different segmentation constraints than Hoffman and Richards' minima rule, which operates on continuous surfaces). The templates could be biased so as to "break" at sharp discontinuities of the surfaces, if a categorization required such a break. Such templates could progressively evolve into a vocabulary whose asymptotic state could resemble Biederman's (1987) geons, if they were exposed to many man-made object categories. The extraction of 2D shape features could also require distinct mechanisms and representations. For example, 2D patterns (letters, numbers, textures, and so forth) could use feature creation mechanisms based on "growth" (see, e.g., Marr, 1982; Ullman, 1984). Small 2D patches could locally grow from the interior of a 2D pattern until boundary edges stop the growth. New shapes could then be learned from correlations across category exemplars. To illustrate, consider a simple example of this process (adapted from Schyns & Murphy, 1994). Object 1 is a 2D pattern in which arrows show the cusps, which are perceptual indicators of its parts (see Figure 1). Consider that Object 1 and Object 2 (or Object 3) form a category. If a 2D contiguous patch is grown in Object 1, its intersection with the patch grown in Object 2 will identify a part feature (indicated by dotted lines in Figure 1). A different feature would result from the intersection of Object 1 with Object 3.
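
The growth-and-intersection process just described can be sketched in a few lines of code. This is our illustrative reconstruction, not an implementation from the literature: each object is a set of filled interior pixels, a patch is grown by flood fill from an interior seed until the object's boundary stops the growth, and a candidate part feature is the intersection of the patches grown in two exemplars of the same category.

```python
from collections import deque

def grow_patch(mask, seed):
    """Flood-fill a contiguous patch of filled pixels, starting at seed.

    mask: set of (row, col) pixels belonging to the object's interior.
    Growth stops wherever the object ends, i.e., at boundary edges."""
    patch, frontier = {seed}, deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (nr, nc) in mask and (nr, nc) not in patch:
                patch.add((nr, nc))
                frontier.append((nr, nc))
    return patch

def candidate_feature(obj1, obj2, seed):
    """A part feature suggested by a shared category: the overlap of
    the patches grown from the same seed in two category exemplars."""
    return grow_patch(obj1, seed) & grow_patch(obj2, seed)

# Two toy "objects" (hypothetical shapes, for illustration only):
# a 3x3 square and an L-shape sharing the square's left column and bottom row.
square = {(r, c) for r in range(3) for c in range(3)}
ell = {(r, 0) for r in range(3)} | {(2, c) for c in range(3)}

feature = candidate_feature(square, ell, seed=(0, 0))  # the shared L-shaped part
```

The intersection extracts the L-shaped region common to both exemplars, mirroring how the patch shared by Object 1 and Object 2 in Figure 1 would become a new part feature.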

----INSERT FIGURE 1 AROUND HERE----

In short, we are arguing that different object categories are likely to prompt the acquisition of different types of features. These different categories are likely to necessitate differently biased mechanisms. Perceptual biases should facilitate the extraction of features in the considered objects (e.g., smooth vs. discontinuous, 2D vs. 3D, and so forth). Categorical biases should tune the nature of the features for the categorizations to be solved. Obviously, the examples discussed suggest the possibility of creating such features, but do not provide detailed realizations. It will be a difficult (but necessary) task to extract class-specific perceptual biases to build task-specific feature extraction mechanisms. We will come back to this point when we discuss formal mechanisms of feature extraction.

Both functional (categorical) and perceptual constraints determine what features will be created. Importantly, we see these constraints as mutually interactive rather than strictly sequential (see also Wisniewski & Medin, 1994). We might envision a system that first created a set of candidate features by applying perceptual constraints, and then selected the new feature from this set of candidates by applying functional constraints. However, such a system suffers from several problems. First, in many cases, an implausibly large number of candidates would need to be considered because of the underdetermining nature of perceptual constraints (e.g. a 2D object silhouette with 20 bumps on it would have 380 possible parsings even if only contiguous segments were considered). If functional constraints are only considered secondarily, then processing will be inefficient in that too many candidate features that are not potentially useful will be considered; the constraining role of functionality would not be fully exploited. Second, if highly restrictive perceptual constraints are applied (e.g., shape primitives), then the relevant feature will often not be in the set of candidates. Third, there is substantial evidence that suggests that the functionality of a feature influences relatively low-level perceptual processing (Algom, 1992; Goldstone, 1994a; Goldstone, 1995; Oliva & Schyns, 1995; Rodet & Schyns, 1994). The cumulative effect of this evidence makes it unlikely that functionality is only considered after perceptual processing has been completed.

While we admit the intrinsic futility of searching for the boundary between perception and conception, we believe it useful to describe a continuum from perceptual to conceptual. What varies along this continuum is how much and what sort of processing has been done to the input information. Specifying exactly where experiential and categorical pressures influence processing along the perception/conception continuum is a real, though largely empirical, question. One apparently fruitful approach to specifying how early conceptual factors exert their influence is to identify their effects on other processes. Thus, there is evidence that conceptual factors (knowledge of categories and attitudes) not only influence physical and immediate color judgments (Delk & Fillenbaum, 1965; Goldstone, 1995), but also exert an influence before the perceptual stage that creates color after-images has completed its processing (Moscovici, 1991). Similarly, there is evidence that conceptual factors related to one's knowledge of object categories exert an influence before the processing stage that produces figure/ground segregation (Peterson & Gibson, 1994).

Another approach to specifying locations of influence on a perceptual/conceptual continuum is by observing the time course for the use of particular types of information. For example, on the basis of priming evidence, Sekuler, Palmer, and Flynn (1992) argue that knowledge about what an occluded object would look like if it were completed influences perception after as little as 150 milliseconds. In general, there are experimental tools available that can identify when, absolutely and relative to other processes, conceptual factors modify information processing. Although the bulk of the work needed to specify the locus of influences precisely has yet to be done, the currently available work suggests a surprisingly early contribution of conceptual factors such as background knowledge and learned categories.

2.5. Feature Extraction and Experimental Materials

For reasons of experimental control, many cognitive psychology experiments in concept learning have used very simple stimuli varying on clearly demarcated dimensions. However, real-world objects often vary along many dimensions, and in most cases, it is difficult to even know what the correct dimensional descriptions are. Although there are excellent reasons for using simple, easily described experimental materials, one major disadvantage with this approach is that it may systematically underestimate the importance of finding an appropriate encoding for the stimuli. It may even be that the traditional use of simple materials produces a bias against finding evidence for feature creation.

Table 1 illustrates some properties of different types of materials used in experiments. The properties listed in the left column characterize many typical stimuli used in concept learning experiments. The properties listed in the right column "alternative materials" characterize materials that we believe are likely to promote the encoding of new features during concept learning. Conceptually, all of the properties in the left column serve to make task-relevant features easy to isolate and identify. Conversely, the properties in the right column make it likely that the relevant features are not originally encoded, but allow for the derivation of these features.

----INSERT TABLE 1 AROUND HERE----

Typically, alternative materials are dense (Goodman, 1965) in the sense that there is no limit on the amount of information that can be obtained from the input or the number of interpretations that can be made. So, alternative materials may contain many different levels of intrinsic structure, allowing for widely diverse feature sets to become relevant. Many blobby structures can be extracted from, for example, X-ray pictures that are not combinations of a priori diagnostic features (except to the radiologists). Conversely, traditional materials embody a single level of analysis into a priori known features. The primary level of analysis for alternative materials is subsymbolic because they are designed to ensure that symbols (e.g., "square," "circle," "has-legs," etc.) are not easy to assign a priori to the important structures of the stimuli. Stimuli that are likely to be represented in an analog fashion may preserve topological relations, which leaves open the possibility of a stimulus reinterpretation if new categorizations require such a reinterpretation. Discrete stimuli do not allow this possibility because their interpretation is often unequivocal and automatic. Figure 2 presents several examples of alternative materials that are used in our experiments on feature creation. Picture (a) shows a Martian Rock (Schyns & Murphy, 1991, 1994), picture (b) some doodles (Goldstone, work in progress), picture (c) some Japanese hiragana characters (Ryner & Goldstone, work in progress), picture (d), from left to right, an XY and an X Martian cell (Rodet & Schyns, 1994), picture (e) a Martian Lobster (Thibaut, 1995), and picture (f) a Martian landscape (Schyns & Thibaut, work in progress).

----INSERT FIGURE 2 AROUND HERE----

The task confronting subjects who are given what we are calling alternative materials is similar to the task confronting the child who must learn concepts such as dog, table, and father. The child must learn the features that comprise these objects in addition to learning the proper characterization of the concept. Many formal approaches to categorization explicitly avoid issues of feature representation. Researchers often adopt a stance of: "You tell me what the features are, and I will tell you how they are integrated to perform the categorization." That is, such formal approaches often place no constraints on what may count as a feature. In fact, the lion's share of the work involved in concept learning seems to lie in finding the "right" description space for the concepts to be learned.

2.6. Evidence for Novel Functional Features

We claim that novel features are sometimes created, and are possibly irreducible to previously existing features of the system. One version of this latter claim is certainly false. Novel visual features are certainly reducible to their retinal encodings, and possibly to existing structures in lower-level, early representations. Thus, it is a conceptual challenge to characterize what counts as a "novel feature." Part of the difficulty is that novelty implies a reference point. At the level of the retina, different encodings of the same object are always novel due to differences in the retinal projections of the input. However, conceptual encodings of this object are much more stable. Functional, high-level features presumably supply the basis for this stability. The question, thus, becomes: When is a functional feature novel?

Functionally, a feature may be novel simply because it encodes a categorization that was not performed previously. However, our conception of feature novelty is more constrained than this, referring to the synthesis of new elements from raw data. There are two difficulties with this latter variety of novelty. The first difficulty is pre-existence. How do we know empirically that a "created" functional feature did not exist prior to the categorization problem? The second difficulty is reduction. How can we ensure that a "created" functional feature does not result from the combination of pre-existing functional features?

An ideal empirical test of pre-existence would demonstrate that a functional feature fx not initially present in the feature vocabulary becomes a member of the set as a result of learning a new categorization. The absence of fx from the initial vocabulary, together with successful categorizations of the new objects would suggest that fx was created (instead of merely weighted for its diagnosticity), assuming that fx is required to perform the categorization. However, empirical evidence for the absence vs. presence of fx is limited to a behavioral manifestation of the new feature (for example, in a transfer or a priming task). Unfortunately, a nonexistent feature is behaviorally equivalent to an existing feature with an "attentional weight" of 0. This equivalence pre-empts attempts to tease apart feature weighting from feature creation based on simple, direct tests of the existence of a feature in memory. Evidence of feature creation is, thus, necessarily indirect, testing the implications of foundational assumptions of fixed feature theories. Two of these assumptions are: (1) that input objects are systematically described by a pre-specified, fixed, unambiguous, and non-decomposable set of features, and (2) that learning always selects, combines, and weights the features of the fixed set that tend to characterize the relevant categories. An important implication of these two assumptions is that a change resulting from category learning is decisional and strategic. In this conception, learning weights features of the fixed set, but it does not change the perceptual analysis and the perceptual appearance of the input.

Consequently, one way to provide evidence for feature creation rather than feature weighting would be to find a suitable empirical test to show that category learning changes the initial featural analysis of the input. However, as it stands, this demonstration would not suggest that features are created, but only that different features are used for different categorizations. To be more convincing, the demonstration needs further constraints. For example, a design could be devised in which all possible pre-specified features entering the composition of the stimuli would be made equally diagnostic in different categorization conditions. Equally diagnostic features should elicit identical percepts of the category exemplars (because the groups would equally attend to the features defining the exemplars). If it could be shown that the same exemplar was differently perceived in the experimental groups, a feature weighting interpretation of these data would be difficult to sustain. Unfamiliar, perceptually dense materials must be used for such an experiment. As already explained, traditional materials tend to have singular featural descriptions, and their analysis into these features is relatively straightforward. Unfamiliar, dense stimuli must be used so that subjects learn how they are analyzed while learning their categorizations. In sum, a feature discovery explanation could be preferred to a feature weighting explanation if (1) an objectively identical object aspect was perceived differently as a consequence of category learning, while (2) the experimental design predicted identical perceptions in terms of pre-specified weighted features.

The problem of the reduction of a functional feature to other functional features could be comparatively simpler to address empirically. In principle, if a functional feature is the combination of two or more other functional features, these other features should become active each time the new feature is presented. Thus, priming tests on these subfeatures could indicate whether or not the subfeatures participated in the encoding of the new feature.

These precautions should make it clear that it is always difficult to refute a feature weighting interpretation of categorization results. Part of the reason is that feature weighting is potentially irrefutable when it is used a posteriori to interpret patterns of data. Feature weighting is a form of curve fitting in cases when the weights of features are free parameters, given values that maximize categorization performance or fit to human results. Feature weighting therefore covers not one, but potentially infinitely many models of categorization, and can potentially accommodate any pattern of experimental data if its features are not pre-specified. Attempting to explain features through the history of categorization allows the theorist to ask an important question: What counts as a feature? Most concept learning programs do not address this issue, but they nonetheless demand new features in different situations. We want to provide an explanation of this fact by suggesting potential mechanisms that generate new features. We accept the need for generating different feature sets for different tasks, but we would like the theorist to explain how the features come to be generated rather than simply posit their existence.

2.7. Advantages of New Feature Learning

A system that allows for the creation of new features during concept learning offers several advantages over fixed feature set approaches.

2.7.1. The most basic advantage, as alluded to earlier, is that an ability to acquire new features allows flexible but constrained features. Unlike purely formal models of similarity and categorization, our approach places constraints on what can count as features: Features will be incorporated into a system to the extent that they distinguish between object categories. Unlike fixed feature set approaches, however, ours does not limit a componential theory to the finite set of a priori features designed by a particular researcher for a particular domain.

2.7.2. A learned set may be equivalent to, but not limited to, other proposed fixed feature sets. Fixed feature sets are motivated by design considerations and by psychological evidence. For example, Biederman (1987) suggests that evidence in favor of geons as primitive features comes from studies that delete line segments from objects. When line segments are deleted in a way that does not allow geons to be recovered, object recognition is particularly impaired. However, to the extent that geons are useful features for object categorization, it is reasonable to suppose that they might be generated from functional constraints applied to simpler building blocks such as line segments, or corners, or surfaces. Consequently, evidence in favor of a particular set of features does not entail that the set of features is hard-wired.

2.7.3. A learned set permits a near-optimal fit between categorization demands and the expressive vocabulary. New features are created to represent new categorical commonalities or contrasts, and they can be optimally adjusted in number to a wide variety of task demands (e.g., expert categorizations and subcategorizations). To the extent that each new feature accommodates at least the categorization for which the feature was created, the vocabulary should be free of useless features. A fixed feature approach is necessarily much less parsimonious: Many spurious features must exist in the feature vocabulary to foresee new categorizations. This, however, ensures that most features of the fixed set are never used--they keep waiting for their "Godot category." Suboptimal fitting necessarily characterizes fixed feature set theories--outside the scope of the stimuli they were hand-crafted to represent.

2.7.4. A flexible set of features tuned to specific categorizations reduces the necessity of complex categorization rules. To illustrate that good representations often carry most of the burden of categorization, consider the XOR problem in learning theory. XOR is a binary function categorizing the pairs (0, 0) and (1, 1) as members of the "-1" category and the pairs (1, 0) and (0, 1) as members of the "1" category. Categorization rules that separate the "-1" from the "1" category are complex nonlinear rules because no linear solution (a straight line) achieves the separation. Complex learning problems often become simpler with better representations. Consider the addition of another number, provided as a third input to XOR, which is 1 whenever the two input numbers are both 1 or both 0, and 0 otherwise. This simple recoding simplifies the problem, which now has a linear solution. Although XOR is only a simple formal problem, it nonetheless illustrates the general point that carefully crafted representations often reduce the complexity of categorization processes.
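
The recoding can be made concrete in a few lines. In this sketch (ours, not a model from the literature), no weights are learned: once the third input z = 1-when-the-inputs-agree is added, a single fixed linear rule with weights (0, 0, -2) and bias 1 separates the two categories, something no linear rule over the original two inputs can do.

```python
def xor_category(x, y):
    """Target categories: (0,0) and (1,1) -> -1; (1,0) and (0,1) -> 1."""
    return 1 if x != y else -1

def linear_rule(weights, bias):
    """A linear categorization rule: the sign of a weighted sum."""
    def rule(inputs):
        score = sum(w * v for w, v in zip(weights, inputs)) + bias
        return 1 if score > 0 else -1
    return rule

def recode(x, y):
    """The new representation: append a third input that is 1 when
    the two inputs agree, 0 otherwise."""
    return (x, y, 1 if x == y else 0)

# With the recoded inputs, this single linear rule solves XOR exactly.
classify = linear_rule(weights=(0, 0, -2), bias=1)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert classify(recode(x, y)) == xor_category(x, y)
```

The rule ignores x and y entirely and reads the category off the new feature, which is the point: the complexity has moved from the categorization rule into the representation.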

Concept learning theories have frequently stressed the importance of learning categorizations by discovering complex rules that integrate several distinct stimulus features (Bruner, Goodnow, & Austin, 1956; Nosofsky, Palmeri, & McKinley, 1994). Certainly, concept learning sometimes requires such integrations. However, these situations are characterized by effortful, strategic problem solving. They seem to be rather unnatural; people do not seem to be particularly adept at explicitly combining psychologically separated sources of information. The alternative suggested by our framework is that new categorizations can be based on relatively few, specially tailored features. Humans seem to be much more adept at creating coherent, useful features than they are at simultaneously attending to several unrelated sources of information.

2.7.5. In the flexible feature approach, task requirements can produce a decomposition of features into subfeatures. To illustrate, consider the example of glasses and cans. Early in conceptual development, it is conceivable that these two concepts are not distinguished. That is, the representation of glasses and cans in memory can be achieved by a single, undifferentiated conceptual unit. Assume that contingencies are such that the organism must learn to make a distinction within this broad category. This can be achieved by decomposing the undifferentiated feature into two specific features tailored to glasses and cans.

The acquisition of a new feature that segments an initially undifferentiated, unitary feature could account for conceptual differentiation. Differentiation phenomena could include, but are not limited to, the basic to subordinate shift (Tanaka & Taylor, 1991), the narrowing of children's lexical categories (Chapman, Leonard & Mervis, 1986) and the construction of conceptual hierarchies (Schyns & Murphy, 1994). In our view, there is little principled distinction between concepts and features. Features are often potentially decomposable concepts themselves. Cars may be usefully decomposed into features such as wheels, but wheels are themselves hardly elementary, nondecomposable features. Even features such as color that may appear unitary and unstructured may be decomposed into sub-units (hue, saturation, and brightness) in certain conditions (Foard & Kemler, 1984; Goldstone, 1994a).

3. COMPARISON TO OTHER APPROACHES

We have argued that the description space of objects is often created to reflect the specific categorization requirements of an organism. We described some of the advantages of a feature set grounded in the organism's history of categorization over the fixed feature sets proposed by many theories of object categorization and recognition. Our proposal for creating new features touches on several issues related to perceptual and conceptual change. The following sections discuss the similarities and contrasts between our proposal for feature creation and Fodor's innatist argument, chunking, constructive induction, developmental constraints on feature extraction, and formal models of feature extraction.

3.1. Fodor's Innatist Argument

The view that new perceptual vocabulary elements are created as a function of tasks and experiences will strike many as already having been definitively rejected by Fodor's arguments. In The Language of Thought (Fodor, 1975; also see Fodor, 1981), Fodor raises an argument against the possibility of learning a new concept that has expressive power not already present in the representational system. The argument runs roughly as follows: 1) new concept learning involves hypothesis formulation followed by hypothesis testing; 2) to formulate the hypothesis, one must already be able to represent the new concept in the old representation system; and 3) "So, either the conditions on applying a stage two new concept can be represented in terms of some stage one concept, in which case there is no obvious sense in which the stage two conceptual system is more powerful than the stage one conceptual system, or there are stage two concepts whose extension cannot be represented in the stage one vocabulary, in which case there is no way of the stage one child to learn them" (p. 90). What has been said here of concepts is equally applicable to the new functional features that we describe. Essentially, if new descriptions can be learned, then the system must already have had the representational wherewithal to generate them; if not, then the acquisition of new descriptions is not possible.

Compelling replies to Fodor's arguments have been raised elsewhere (e.g., Harnad, 1987). For our present purposes, three responses are sufficient. First, Fodor seems to underestimate the extent to which the environment, rather than the individual's representational system, drives the "hypothesis formulation" stage of new concept or feature learning. On being told that a complex "squiggle" belongs in a category, and that only this squiggle belongs in the category, people can formulate the hypothesis that the squiggle determines categorization after approximately one trial. The process of formulating the hypothesis is basically one of imprinting upon the input information, not one of combining primitive symbols within the existing representational system. New object descriptions do not solely rely on our ability to recombine symbols; the perceptual information impinging upon us provides a wealth of hints about what useful descriptors might look like.

Second, increases in representational power within a system can arise by allowing the system to have access to previously inaccessible representations. Given the strong possibility that humans have multiple representation systems (acknowledged by Fodor, 1983), it is possible for one or more systems to possess elements that, upon learning, become accessible to other systems. To take a purely symbolic example, a system that can represent multiplication can represent number series such as "32, 16, 8, 4, ..." and a system that can represent addition can represent series such as "2, 5, 8, 11, ..." Neither of these systems, by itself, has the representational capacity to represent series of the form "32, 20, 14, 11, ..." [new = (old/2)+4]. However, if these two systems gain access to each other's capacities, then the resulting new system with unrestricted access may have the power to represent this series. That is, the entire system can gain representational power by accessing representations that it was unable to access previously. Although it is farfetched to believe that an addition system exists separately from a multiplication system, it is highly plausible that perceptual properties that are processed by one system are not automatically accessible to other systems. Given the development of specialized perceptual systems for specific tasks, representations may initially be tied to particular systems or domains, and only become more widely available with learning. Gravitational information is picked up by the vestibular system and used to control eye position, but is not generally accessible to us, to the dismay of carpenters who still need their plumb lines. To explain how people develop the ability to respond to Gibson and Gibson's (1955) "squiggles," it is natural to propose that information which was initially only available in the early stages of visual processing becomes generally available to discrimination processes.
Certainly the information that is necessary to drive the discrimination is provided to us by the eye, or else it is hard to imagine how a person could ever come to learn the correct discrimination. Still, unless that information is accessible by processes that drive discrimination judgments, it is not represented by the entire system. Perceptual learning can operate by increasing the accessibility of information. In these cases, the representational capacity of parts of the system may be augmented by gaining access to other parts of the system.
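The number-series example can be made concrete with a minimal sketch (the function names are our own illustrative choices): two isolated "systems," one that can only halve and one that can only add a constant, and a composed system that gains representational power by accessing both.

```python
# Two isolated representational "systems" (names are illustrative).
def halving_system(x):      # can generate 32, 16, 8, 4, ...
    return x / 2

def adding_system(x, k=3):  # can generate 2, 5, 8, 11, ...
    return x + k

# Neither system alone can produce 32, 20, 14, 11, ...
# but a system with access to both can compose their capacities:
def composed_system(x):     # new = (old / 2) + 4
    return halving_system(x) + 4

series = [32]
for _ in range(3):
    series.append(composed_system(series[-1]))
print(series)  # [32, 20.0, 14.0, 11.0]
```

The point of the sketch is that `composed_system` adds no new primitive operations; the gain in power comes entirely from one part of the system gaining access to another.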

Third, cognitive processes that are not strictly within the bounds of the symbolic representation system can still exert an influence on the representations that are formed (Harnad, 1990). In order to explain how people have concepts at time t+1 that they did not have at time t, Fodor (1981) argues that the new concepts are "triggered" but are not built out of other concepts or data from the sensorium. Similarly, in The Language of Thought, Fodor argues that changes in representational power can come about through "trauma" (in his example, by being hit on the head) but not through computation. The use of "triggered" and "caused by trauma" to explain representational change insinuates that these changes are sudden, inexplicable, and haphazard. However, changes to a representational system may be orderly, purposeful, and adaptive even when they are produced by non-computational, physical means. A change is computationally caused if it results from the application of formal operators on symbols, and is physically caused if it results from mechanical means. A physical process may produce a "mutation" to a representational system, and thereby change the representational power of the system. Now, consider the case in which the mutation is not purely random, but is guided by the particular stimuli presented and their classifications. Physically based mutations to a representation seem less "traumatic" if they generally increase the system's power and occur predictably. Environmental influences on feature development may be considered by Fodor to be traumatic in the sense that they produce physical changes that are outside of the possibilities allowed within the representational system, but this does not prevent these changes from reliably increasing the representational system's fitness. In fact, evolution would be expected to steer physical processes toward creating useful representations.
Physically induced changes to representations may not be accidents or flukes; rather, they may reliably occur, and the organism may be designed precisely so that they do occur.

An example of this third idea occurs in the use of genetic algorithms for creating executable programs (Koza, 1994). Symbolic, Lisp-like codes are evolved for solving problems such as predicting numerical sequences. With mutation (replacing a symbol with a different symbol) and cross-over (recombining two expressions) operations that operate on initially random codes, the Fibonacci sequence {1, 1, 2, 3, 5, 8, 13, ...} can be captured if the system develops the code (+ (S (j-1)) (S (j-2))), where S(j) is a built-in function that outputs the jth number of the series S. Clearly, once this code is built, the system has the power to represent sequences that it was once unable to represent. If cross-overs and mutations are viewed as computational devices for syntactically rearranging components, then the system as a whole can be viewed as having the representational power necessary to represent the Fibonacci series, as Fodor would maintain. However, as the namesakes of these operations in biology suggest, these operations may be physical, concrete operations. An electronic "glitch" may change a "*" function to a "+" function. Viewed in this manner, external pressures may alter the representational capacity of the codes. These pressures, rather than being haphazard or "traumatic," are designed specifically so that the representational codes will, in the long run and probabilistically, have greater "fitness" -- greater similarity to the actual numerical series desired. The physical mutation process is present in the system because it tends to produce codes of increasing fitness; organisms that developed this method for increasing the variability of their codes tended to prosper and procreate. As indicated by our second response, these pressures need not be physical operations, but may simply be operations that come from a different representational system.
The point is: just because a representational change is caused by something (representations from different systems or non-symbolic perceptual information) outside of the representational system, does not decrease the potential usefulness of the change.
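The "glitch" example can be illustrated with a toy sketch (our own simplification, not Koza's actual system): a single-symbol "genotype" selects the operator in S(j) = S(j-1) op S(j-2), and a mutation that flips "*" to "+" changes which series the code can represent.

```python
import operator

# Toy version of the evolved code (+ (S (j-1)) (S (j-2))).
# The "genotype" is just the operator symbol; a physical glitch
# that flips "*" to "+" alters the code's representational power.
OPS = {"+": operator.add, "*": operator.mul}

def run_code(op_symbol, n):
    """Generate n terms of the series S(j) = S(j-1) op S(j-2)."""
    s = [1, 1]
    op = OPS[op_symbol]
    while len(s) < n:
        s.append(op(s[-1], s[-2]))
    return s

print(run_code("*", 7))  # [1, 1, 1, 1, 1, 1, 1] -- cannot express Fibonacci
print(run_code("+", 7))  # [1, 1, 2, 3, 5, 8, 13] -- Fibonacci, after the glitch
```

Nothing in `run_code` itself changed; the flip of one symbol, however it was physically produced, is what enlarged the set of representable series.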

While the evolutionary operators of mutation and cross-over can be viewed either as operators within the representational system or as external processes that affect representations, many cases of perceptual feature learning are more consistent with the latter view. Features can be learned by tracing the contour, or part of the contour, of an object; by transforming shapes by expansion, rotation, or selective stretching; or by segmenting an image into smaller regions defined by physical cues. A sufficient number of concrete proposals exist (e.g., Marr, 1982; Ullman, 1984) to make the idea of non-symbolic routines for creating feature representations plausible as an alternative to representational changes that solely rely on a system's initial symbolic capacity.

Genetic algorithms can operate in two types of search spaces. In a fixed search space, a population of genotypes initially spans the entire space. Reproduction, mutation and cross-over change the population so that it converges upon the optimal subspace with respect to a fitness function. These searches are called "fixed-length" because all genotypes of a population have the same size. Fixed-length searches are covered by Holland's Schema Theorem; they can be interpreted in terms of a function optimization seeking intersections of hyperplanes in a fixed dimensional space. "Novel" genotypes in fixed spaces are therefore members of a pre-defined space. Another conception of novelty, more interesting and probably closer to our proposal for flexible features, exists in other types of search spaces.

"Flexible" or "variable-length search spaces" allow the size of the individual genotypes to vary in length (e.g., Harvey, 1992; Harvey, Husbands & Cliff, 1994; Koza, 1990). Newly created genes are added to the genotype. This changes the dimensionality of the search space (each gene is one dimension of the search). With time, the search space grows to incorporate new structures in response to environmental contingencies. As summarized in Harvey (1992, p. 8) "When one allows the genotype to vary in length the search space is potentially infinite and it stops making sense to think of it as predefined." The idea of variable-length searches is that novel structures that were not a priori specified by the combination rules of a fixed grammar can be progressively incorporated in the genotype. These new structures affect the intrinsic complexity of the system by adding new functionalities. Novelty can occur in evolution and representational systems, but only if our computational metaphors evolve to permit this possibility.

In sum, the theoretic impossibility of changes in representational power follows only if a representational system 1) cannot imprint on external tutoring signals, 2) completely spans the entire organism without sub-systems, and 3) is completely divorced from adaptive, physical influences. Each of these premises seems unlikely to us. Given the strong evidence for developmental (and if not developmental, then at least evolutionary) increases in representational power, and the computational advantages of representational change, it seems reasonable to question the premises on which this unlearnability thesis depends.

3.2. Chunking and Perceptual Unitization

Research in the visual search literature has supported perceptual changes similar to the types of changes that we have discussed. Training (or automatization) effects occur when people actively search for a particular target shape (for example, the letter "A") in a visual array of distracter letters (for example, "M" and "W"). In Fisher's (1986) model of visual search, letters are represented by simple features such as horizontal, vertical and diagonal line segments. Similarity between featural descriptions generally makes it more difficult to find, for example, an "A" amongst "W"s than an "A" amongst "M"s; "A" and "W" share two diagonal bars, but "A" and "M" have no common feature. However, even when featural descriptions are quite similar, extensive training significantly speeds up search times (e.g., Fisher, 1986).
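The featural account of search difficulty can be illustrated with toy feature sets (our own simplified codes, not Fisher's actual representations):

```python
# Simplified featural descriptions of capital letters
# (illustrative feature sets only).
letters = {
    "A": {"left_diagonal", "right_diagonal", "horizontal_bar"},
    "W": {"left_diagonal", "right_diagonal"},   # shares 2 diagonals with A
    "M": {"left_vertical", "right_vertical"},   # no feature shared with A
}

def overlap(a, b):
    """Number of shared features -- a crude similarity measure."""
    return len(letters[a] & letters[b])

# More target/distracter overlap predicts slower search:
print(overlap("A", "W"))  # 2 -> "A" among "W"s is hard
print(overlap("A", "M"))  # 0 -> "A" among "M"s is easier
```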

Czerwinski, Lightfoot and Shiffrin (1992) recently suggested that a perceptual change called perceptual unitization could largely explain the training effects observed in visual search. Perceptual unitization is a mechanism which produces perceptual features from a set of more elementary components. These new features significantly speed up visual search because they recode objects with a more efficient vocabulary -- a vocabulary tailored to the specifics of the search task.

Our proposal of creating new features has both similarities and differences with unitization and chunking. It is similar in that visual search may be framed as a categorization task of distracters and targets. Chunking, then, can be viewed as a context-dependent process influenced by the contrasts and similarities between targets and distracters. This reformulation of visual search emphasizes functional constraints that the chunking process must satisfy; units will be formed that allow members of the target category to be distinguished from distracters. It also allows specific predictions to be derived. For example, in Fisher's (1986) and Czerwinski et al.'s (1992) models, chunked features could represent any subpart of the capital letters, the subpart that reliably unifies and distinguishes the categories. Perceptual chunking probably is an important mechanism of feature creation. The principles governing chunking cannot be fully understood without the notion of category contrasts and similarities.

The differences between unitization and functional feature creation are mostly consequences of using discrete vs. continuous stimuli. As its name indicates, unitization requires the stimuli to be discretized before being unitized. However, it is frequently difficult to assess exactly what discretization the visual system initially applies to a stimulus before unitization occurs. Czerwinski et al.'s stimuli are designed to bias processing according to a particular discretization: line segments (but the authors acknowledge that they can only hope for this segmentation). These stimuli could give the impression that our perceptual systems initially segment the environment into little line segments, and then construct complex task-dependent representations by unitization. However, the varieties of recognition tasks we face make it very likely that there is no single scale of description that is universal.

Multiple psychophysical and computational models converge on the observation that perception operates simultaneously at multiple spatial scales and that the coarser scales often are sufficient for effective processing of complex pictures (e.g., Burt & Adelson, 1983; De Valois & De Valois, 1990; Marr, 1982; Schyns & Oliva, 1994; Watt, 1987; Witkin, 1986). Multi-scale representations suggest that input stimuli are discretized at different scales, possibly using scale-specific feature vocabularies. While line segments may serve as the discrete elements at the finer spatial scales (though even here there are serious difficulties), "blobs" or other image measurements are more appropriate for discretizing the coarser scales. A conjunction of high resolution edges often maps to a single coarse-scale blob, suggesting that the input signal could initially be parsed into large components that do not result from fine scale unitizations. Therefore, efficient parsings of real-world stimuli could initially operate with scale-specific primitives closely corresponding to the relevant events of the input signal (e.g., Oliva & Schyns, 1995). These scale-specific primitives should be adjustable to scale-specific shapes and therefore should be sensitive to task contingencies. Scale-specific vocabularies could arise by applying our proposal for learning new features to the spatial scales made available by perception.

In summary, although chunking probably is an important mechanism for creating new perceptual features, we think there are alternatives. Chunking applies only to a priori discretized stimuli, but evidence suggests that stimuli are not unequivocally discretized into their smallest structures (or for that matter into a single, preferred scale). Large features may be registered without being composed out of smaller features, and small features may sometimes be created by decomposing larger features.

3.3. Constructive Induction

The idea of creating new featural descriptions has been a direct concern of a branch of machine learning called constructive induction (Matheus, 1991; Michalski, 1983). In constructive induction, new features are created by applying inductive operators to the existing set of features. For example, objects that belong to a category may originally be described as 74, 78 or 71 cm tall. With the "close interval" operator, a single new feature "any height between 70 and 80 cm" may be created. Generally, the operators that have been considered have been highly symbolic, including logical operators like "and" and "or," and hierarchical relations between category classes. As an example of the latter type of operator, a playing card that was originally represented as "diamond" may be recoded as "red" if the system knows that diamonds are red.
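The "close interval" operator can be sketched in a few lines; the function name and the rounding grain are our own illustrative choices, not drawn from the constructive induction literature.

```python
def close_interval(values, round_to=10):
    """Sketch of a constructive-induction 'close interval' operator:
    replace observed numeric values with an enclosing interval whose
    bounds are rounded outward to a convenient grain."""
    lo = (min(values) // round_to) * round_to
    hi = -((-max(values)) // round_to) * round_to  # ceiling to the grain
    return lo, hi

# Observed category members are 74, 78 and 71 cm tall.
lo, hi = close_interval([74, 78, 71])
print(lo, hi)  # 70 80

# The derived feature is a predicate applicable to new objects:
is_member_height = lambda h: lo <= h <= hi
print(is_member_height(75))  # True
print(is_member_height(85))  # False
```

The new feature ("any height between 70 and 80 cm") is built entirely from the existing symbolic description, which is precisely the property we question below.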

Hofstadter and his colleagues (French & Hofstadter, 1991; Mitchell, 1993) have also been concerned with computational systems that create new descriptions for input patterns. For example, Mitchell and Hofstadter's Copycat system, when processing the letter sequence "PPQRR," may develop either the description "P, followed by the series PQR, followed by R" or the description "a Q in the middle, flanked by a pair of Ps on the left and a pair of Rs on the right." The description that emerges will depend upon the other developing structures. Copycat creates new descriptions by establishing groups of related letters, and by relating these groups.

Wisniewski and Medin (1994) have recently provided empirical evidence indicating that people alter their descriptions of objects to fit the provided category labels (see also Medin, Goldstone & Gentner, 1993). The same figure in a child's drawing may be interpreted as a tie or buttons, depending on how the drawing is labeled. The authors argue that new descriptions are created when links are established between abstract background knowledge (e.g. "creative children should show more detail in their drawings") and concrete object information.

Our proposal for learning new features is consistent with the above proposals. Although many of the ideas are similar, our stress is different in several respects. We have stressed the need for relatively raw stimulus properties to be preserved for new features to be created. As argued earlier, if distilled, symbolic representations are used to create new features, then there will be severe limitations on the types of new object interpretations that are possible. Such is the case with typical constructive induction systems. Although they can produce an infinite number of new features by successive application of inductive operators, the new features are highly constrained by the object interpretation made by the primitive symbolic features. Both the original features and the new features in constructive induction algorithms are discrete symbols that are the product of an object interpretation process. Far greater flexibility in feature creation can be achieved by beginning with object representations in terms of raw features that have not undergone interpretation. Harnad (1990) has made a similar point with respect to the need for grounding symbols in terms of representations that are non-symbolic. The representation should be raw enough, for example, so that both symbolic interpretations of an "X" ("two crossing diagonal lines" and "a 'v' and an upside-down 'v' just touching") can be generated (McGraw, Rehling, & Goldstone, 1994).

By stressing the importance of raw inputs that implicitly preserve object properties, our approach to new feature creation also stresses perceptual constraints on feature extraction. Whereas constructive induction techniques can create arbitrarily complex features, features that are generated by humans are constrained by perceptual factors such as topology, spatial proximity and global coherence. Thus, features that are generated by standard constructive induction techniques may be improperly constrained in opposing ways. They may be too constrained by the initial symbolic representations, and they may not be sufficiently constrained by properties of our actual perceptual systems.

Another difference is that we have stressed the perceptual changes that accompany feature creation. In standard feature creation techniques, new features are added to the system's vocabulary, but there is little reason to suggest that the new features alter the appearance of the described objects. Rather, they alter the properties that will be inferred about the objects. There is a difference in immediacy between seeing and inferring that an object might be expressed in terms of a particular feature. The psychological evidence that we have reviewed suggests that the immediate appearance of objects (e.g., their discriminability and apparent organization) is altered by experience. Mitchell and Hofstadter's letter series may provide an intermediate case (see also Chalmers, French, & Hofstadter, 1992). When people interpret "PPQRR" in a particular way, it may be a cognitive inference, an immediate perceptual phenomenon, or something in between. The same ambiguity seems to exist for the high-level features (e.g., forks, traps, and pawn support structures) that are used by chess experts but not novices (De Groot, 1965).

In sum, work in constructive induction is certainly relevant to the current claims for feature creation. Our approach differs from much of this work in its focus on the perceptual constraints and consequences of feature creation, and on the importance of beginning with relatively raw object representations when developing novel interpretations of an object.

3.4. Developmental constraints on object feature extraction

There is in principle an infinite number of ways to describe real-world objects with features. This poses a serious problem for developmental psychologists, who must explain how children acquire a particular featural object description from a limited data set. Relatedly, when acquiring a new word meaning, children are "faced with an infinite set of possibilities about what a novel word might mean" (Markman, 1995, p. 199; see also Landau, 1994; Markman, 1989; Quine, 1960; Jones & Smith, 1993). To reduce the indeterminacy of featural descriptions, it has been proposed that young learners come equipped with biases towards particular aspects of stimuli that increase the speed and accuracy of learning (Landau, 1994; Markman, 1995; Eimas, 1994). These biases are of two sorts: they can arise from theories and beliefs about objects in the real world, or they can result from perceptual structures and processes. We discuss how these biases constrain the development of functional features, and we argue that they are not sufficient.

An influential account of conceptual development proposes that new features and concepts are direct consequences of the development of theories--i.e., naive mental explanations of phenomena (Carey, 1985; Gelman, 1988; Keil, 1989; Murphy & Medin, 1985). This conception of concepts proposes that perceptual features (e.g., body_shape, length_of_legs, number_of_legs, and so forth) lie at the periphery of concepts whereas our theories about the causes of category membership (e.g., a genetic code) are at the core of conceptual organization. It has been suggested that the conceptual core exists prior to experience with the world and that it could bias the features young infants preferentially notice in objects (Spelke, 1994; Carey, 1985).

To accommodate new experiences, the conceptual core develops either discontinuously or continuously. On the discontinuous view, the conceptual core develops through a differentiation process: new explanatory constructs (concepts and features) result from the differentiation of existing constructs of an earlier theory (Carey, 1991). Consequently, children's theories may be incommensurable with the corresponding theories in adults (Carey, 1985, 1991; Keil, 1989). For example, Smith, Carey, & Wiser (1985) showed that concepts like "weight" and "density", which are undifferentiated in 5- to 7-year-old children, are distinct concepts in older children. Interviews conducted with the children revealed that the theories covering these concepts differed between younger and older children.

In opposition to the discontinuous development theory, Spelke (1994) suggests that there is continuity with respect to the theories used during conceptual development. For Spelke, there is an innate constant core at the center of the (intuitive, naive) knowledge later used by older children and adults. Spelke argues that the constant core consists of general constraints that govern the way children perceive and reason about objects in different domains. Innate constraints would make it possible for the child to isolate objects from the environment and learn about them. As Spelke (1994, p. 439) puts it: "learning systems require perceptual systems that parse the world appropriately." Among other constraints, Spelke suggests that an innate cohesion principle biases children to group parts that move together into a single object (see also Eimas, 1994, for a related point of view).

On either a continuous or a discontinuous view of development, innate knowledge is important because it reduces the indeterminacy of featural descriptions to those dictated by pre-existing theories. For example, Spelke's cohesion principle could facilitate the parsing of objects from their background and bootstrap category learning. In general, however, there is a conceptual difficulty with the idea that innate knowledge constrains perceptual information: going from theories to perceptual data is underconstrained. To illustrate, if a categorizer is instructed that a set of objects with an unknown complex structure is a set of hammers, an existing theory of hammers should list the parts composing the objects. However, unless the theory also specifies all perceptual appearances of these parts, a segmentation procedure would still have difficulty locating the actual parts in a new object: the perceptual realization of the parts depends on the new stimulus itself. Note that this difficulty was already encountered when describing the fixed feature approach: it is difficult to predict a priori all possible perceptual appearances of a particular concept.

Recent work by Thibaut (1994) investigated the mapping of theories onto perceptual features. In a feature circling task, subjects were instructed to parse the stimuli of a category of unfamiliar objects that displayed the same overall shape and structure (see Figure 2, picture e). All subjects were given a category name so that the corresponding general knowledge could assist their segmentations. When asked to name the segmented parts, subjects did not use the same names (e.g., the same part could be called "head," "leg," or "body" by different subjects). Thus, even when a theory provides a listing of the parts to be searched for, the assignment of each part to a perceptual structure is not completely constrained by theories (Thibaut & Schyns, 1995).

3.4.1. The early role of perception in object parsing.

Theories are one source of constraints to reduce the perceptual indeterminacy of stimuli. However, it has recently been suggested that perception also biases children's predispositions towards objects. Experimental evidence has revealed that category inductions are guided by a bias for the shape of objects (see Jones & Smith, 1993; Landau, 1994, for reviews of the relevant data). In a typical design, children are presented with a novel three-dimensional object named with a novel name (a count noun). Children are then asked which objects (of a set of objects which have, or do not have, the same shape, texture, and size) should be called by the same name. This is compared to children who are simply asked to select objects that are like the novel object, with no name provided. Converging evidence suggests that children generalize from object names on the basis of shape and neglect large differences in other object aspects. This bias appears to develop until the age of two; later, the shape bias predominates only when children are given a count noun (see Jones & Smith, 1993).

The shape bias is intended to reduce some of the indeterminacy of category induction. However, if complex shapes are decomposable into many different (and sometimes mutually exclusive) sets of components, a bias towards shape is only a first necessary step. Other constraints are required to guide the decomposition of a particular shape into its features. In other words, it remains to be explained how children learn a particular object decomposition, given the large number of possibilities. Such an explanation of parsing could extend the shape bias to specify precisely which aspects of shape attract attention (and therefore bias segmentation) at different stages of development. Early biases for shape aspects could bootstrap simple conceptual systems, but it is conceivable, and even desirable, that these early biases are later superseded by biases resulting from experience with particular object classes. As argued earlier, segmentation routines for different classes of geometrical objects (e.g., continuous vs. discontinuous surfaces) could develop and become more adept at making the fine segmentations required by conceptual expertise. Given the importance of this topic for an understanding of early conceptual development, it is surprising that there are so few data on the development of segmentation skills.

Thibaut (1995a) explored the development of segmentation skills in different age groups. Adults and children aged 4 and 6 were instructed to learn a category of unknown stimuli and were later tested on their parsings of its exemplars. The stimuli shared a global shape and were composed of a common set of shape features that varied slightly across exemplars (see Figure 2, picture e). Children's parsings were very inconsistent compared to adults'. For example, although component parts kept the same relative locations across exemplars, children's parsings often violated topological coherence. That is, children's segmentations changed the location of the same part across exemplars, and the number of segmentations was not constant across stimuli. Together, these inconsistencies stress that when children attend to shape, they can be biased towards local similarities between shape aspects, at the expense of a consistent integration of shape aspects across instances. Consequently, the new shape features that children isolate are structurally different from those adults extract from identical materials.

This has important implications for the early extraction of features and category learning. Children's biases towards locally salient properties could impair, or even prevent, their learning of new categories when these are defined by comparatively less salient features. Recent evidence in Thibaut (1995b) showed that 6-year-old children could not learn a simple categorization (a first category defined by the perceptual cue "a-group-of-three-legs-plus-one" and a second category defined by "two-groups-of-two-legs") when the sizes and orientations of the legs, which were irrelevant to the categorization, varied across exemplars. However, children of the same age experiencing the categories without variations across exemplars had no difficulty learning them. These results emphasize the interaction between the development of a feature vocabulary and specific perceptual biases. Over the course of conceptual development, children must learn to neglect irrelevant perceptual characteristics of stimuli when they learn new categorizations. So far, the processing differences that could explain children's difficulties in segmenting stimuli consistently into their features remain unclear.

In summary, we have presented theories and perceptual biases as possible predispositions of children towards specific object aspects. We argued that these biases were not sufficiently specific to predict the actual segmentation of an object belonging to a category. The structure of the categorization problem itself could be an important constraining factor on the featural descriptions of objects, but it remains to be explained how young children utilize this structure to discover relevant object features.

3.5. Formal Models of Feature Extraction

The problem of finding relevant structures in data is not only the province of psychologists. For decades, mathematicians have been confronted with this issue. Mathematically, an object is often expressed as an n-dimensional feature vector. Each slot of the vector encodes the presence or absence, or the values, of the n attributes describing the object (e.g., its parts, their shapes, colors and textures). Geometrically, different points in n-dimensional space encode different objects, and categories of similar objects form clouds of points. There are many ways to encode objects, ranging from the raw pixel intensities of digitized pictures to sophisticated properties that are known to be diagnostic for classification--e.g., number_of_legs, has_wings, has_fur, has_feathers, and hibernates. Although the latter representation would describe animals in an appropriate feature space, pixel arrays would require extensive processing before diagnostic properties are captured. Our proposal for functional features concerns the extraction of new structures from perceptual data. How could has_feathers be discovered from a training set of pixel arrays, or similarly unstructured representations?
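
To make the vector encoding concrete, here is a minimal sketch; the attribute names and example animals are our own illustrations, not a committed feature set:

```python
import numpy as np

# Illustrative "high-level" attributes of the kind mentioned in the text.
ATTRIBUTES = ["number_of_legs", "has_wings", "has_fur", "has_feathers", "hibernates"]

def encode(**values):
    """Encode an object as an n-dimensional feature vector (absent = 0)."""
    return np.array([values.get(a, 0) for a in ATTRIBUTES], dtype=float)

robin = encode(number_of_legs=2, has_wings=1, has_feathers=1)
bat   = encode(number_of_legs=2, has_wings=1, has_fur=1, hibernates=1)
dog   = encode(number_of_legs=4, has_fur=1)

# Categories of similar objects form nearby "clouds" of points in this space.
def dist(a, b):
    return float(np.linalg.norm(a - b))

print(dist(robin, bat) < dist(robin, dog))  # -> True
```

On such hand-built features, geometric distance behaves sensibly; the hard problem discussed below is obtaining features like these from unstructured input in the first place.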

3.5.1. Properties of high dimensional spaces and the bias/variance dilemma.

Many models of concept learning have successfully shown that category representations can be learned from exemplars when there is a prespecified, small feature set (e.g., Gluck & Bower, 1988; Kruschke, 1992; Rumelhart, Hinton & Williams, 1986; Widrow & Hoff, 1960); their task is not to discover the feature set from high-dimensional raw data. However, it could be argued that the discovery of features from such high-dimensional spaces is not substantially different from standard mechanisms of category learning. Both concern the extraction of task-dependent invariants. An argument could be made that standard concept learning models operating in low-dimensional spaces could simply be scaled up to operate in high-dimensional spaces.

One of the problems with this idea is that high-dimensional spaces are mostly empty. To illustrate, imagine discretizing a line, a square, a cube and a hypercube with tiles of equal size (e.g., 10 tiles per side). There is a geometric increase (in the example, 10^1, 10^2, 10^3, 10^4) in the number of tiles that cover the objects. If each tile is represented by an n-dimensional data point, the example shows that one needs approximately 10^n tiles to cover an n-dimensional space. If the input distribution varies along many degrees of freedom, a learning problem in high-dimensional space may require an unrealistically large training set to discover robust features, even if an asymptotic solution exists in principle.
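
The tile-counting argument can be spelled out directly (k = 10 tiles per side, as in the example):

```python
# Tiles needed to cover an n-dimensional cube at k tiles per side: k**n.
def tiles_needed(n, k=10):
    return k ** n

# The line, square, cube, and hypercube of the example:
print([tiles_needed(n) for n in (1, 2, 3, 4)])  # -> [10, 100, 1000, 10000]
```

Even at this coarse resolution, a 20-dimensional input space would already require 10^20 tiles, which is why training sets that sample such spaces densely are unrealistic.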

This curse of dimensionality (Bellman, 1961) imposes severe limitations on the idea of directly applying simple supervised categorization models to discover perceptual features. Typical concept learning models learn category decision boundaries from a set of pairings of exemplars and their respective category labels. Formally, this consists of finding a function f which successfully approximates the desired category name y from an input x. Often, f is chosen so as to minimize a cost function. Popular concept learning networks minimize the sum of the squared errors between the estimated and the desired category labels (e.g., Rumelhart, Hinton & Williams, 1986; Widrow & Hoff, 1960).
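
As an illustration, here is a minimal error-minimizing learner in the spirit of the Widrow-Hoff delta rule; the toy data, learning rate, and number of passes are our own choices:

```python
import numpy as np

# Training pairs (x, y): x is a feature vector, y the desired category
# label (0 or 1). The model is a linear unit f(x) = w.x + b, trained by
# gradient descent on the squared error between f(x) and y.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 1., 1.])   # category depends on the first feature only

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):              # repeated passes through the exemplars
    for xi, yi in zip(X, y):
        err = yi - (w @ xi + b)   # desired minus estimated label
        w += lr * err * xi        # delta-rule weight update
        b += lr * err

print(np.round(w, 2))  # the weight on the diagnostic feature dominates
```

With a small, prespecified feature set the weights converge quickly; the text's point is that this scheme does not by itself tell us where the two input features came from.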

Generally speaking, "error-based" categorization models such as backpropagation are nonparametric statistical models (Geman, Bienenstock & Doursat, 1992). They are nonparametric because the networks are not biased towards particular classes of solutions. Instead, the architectures are unbiased so as to flexibly discover structures from data. Mathematical analysis has shown that the error term (specifically, the expected mean square error) of these networks can be algebraically decomposed into a bias and a variance term (see Geman et al., 1992, pp. 9-10). These two terms summarize the bias/variance dilemma (Geman et al., 1992). Networks make a bias error when they are dedicated to a class of solutions that is not appropriate for the categorizations at hand. Such networks may be too rigid; flexibility (low bias) would be needed to extract task-specific features. However, low bias comes at the cost of high variance, the second component of the error (where variance means the discrepancy between the correct teacher category and the categorization of the network). There is high variance because a flexible system is too sensitive to the data. That is, it learns many idiosyncrasies of the exemplars (e.g., differences in lighting conditions, rotation in depth, translation in the plane, and so forth) before learning the invariants of a category. Consequently, experience with many exemplars is necessary for the network to "forget" idiosyncrasies and learn relevant abstractions. Only with great experience is the system able to categorize accurately (keep the variance low). The curse of dimensionality is such that unbiased machines designed so as to flexibly discover many types of new perceptual features will often need implausibly large training sets to achieve good categorizations. Note that this problem does not greatly affect fixed feature models, which usually operate in smaller spaces for which sufficient exemplars can be generated. The bias/variance dilemma addresses practical computability, not principled limitations.
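
The dilemma can be simulated with a toy regression analogue of category learning (our own construction, not an example from Geman et al.): a rigid model (a constant) and a highly flexible model (a degree-9 polynomial) are each fit to many small training sets drawn from the same noisy regularity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Exemplars: an underlying regularity, sin(2x), plus per-exemplar
# idiosyncrasies (noise). The learner only ever sees small samples.
def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(2 * x) + rng.normal(0, 0.3, n)

def study(degree, n_train=10, x0=0.5, runs=200):
    """Fit many training sets; return (bias, spread) of predictions at x0.
    The median stands in, robustly, for the typical prediction."""
    preds = []
    for _ in range(runs):
        x, y = sample(n_train)
        preds.append(np.polyval(np.polyfit(x, y, degree), x0))
    preds = np.array(preds)
    return abs(np.median(preds) - np.sin(2 * x0)), preds.std()

bias_rigid, var_rigid = study(degree=0)   # constant model: rigid
bias_flex, var_flex = study(degree=9)     # interpolating model: flexible
print(bias_rigid > bias_flex)   # the rigid model misses the regularity
print(var_flex > var_rigid)     # the flexible model tracks idiosyncrasies
```

The rigid model makes a large, systematic error (bias) no matter how much data it sees; the flexible model can represent the regularity, but its answers swing wildly from one small training set to the next (variance), so it needs far more exemplars before it categorizes reliably.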

An ideally flexible system should be constructed so as to keep bias and variance low, using a reasonable training set. The bias/variance dilemma is somewhat analogous to the contrast between structured and unstructured features discussed earlier. By analogy, fixed sets of structured features make it difficult to learn new categorizations (and therefore raise the bias error). In contrast, unstructured systems will tend to capture irrelevant aspects of the input set that have little relation to the actual basis of categorization (and therefore raise the variance).

3.5.2. Dimensionality reduction

Complex supervised categorization problems in high-dimensional spaces would be simplified if it were possible to reduce the dimensionality of the input. Several linear and nonlinear dimensionality reduction techniques have been designed to achieve this goal. Underlying dimensionality reduction is the idea that information processing is divided into two distinct stages. A first stage constructs a representation of the environment, and a second stage uses this representation for higher-level cognition such as categorization and object recognition. It is hoped that the constructed representation in a lower-dimensional space is more useful than the raw input representation.

To illustrate, consider the popular technique called Principal Components Analysis (PCA). If redundancies exist in the input data, there should be fewer sources of variation (p) than there are dimensions (i.e., p << n). PCA finds the first k orthogonal directions of highest variation in a data set. If each input vector of a high-dimensional space is recoded in terms of a linear combination of the first k sources of variation, the intrinsic structure of the data will be preserved to a first approximation (see Oja, 1982; Sanger, 1989). In general, however, the featural interpretation of principal components is often difficult because orthogonal directions of highest variance have little connection to the best projections for categorization. That is, there are no psychological constraints on the principal components. Principal components need not be spatially or topologically coherent (perceptual constraints), or summarized by a single explanation (conceptual constraints).
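
A small simulation (with illustrative numbers of our own choosing) shows the limitation: PCA, computed here via the singular value decomposition, selects the direction of greatest variance even when the category-diagnostic direction carries little variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two categories separated along dimension 1 by a small offset, with much
# larger task-irrelevant variation along dimension 0.
a = rng.normal([0.0, -0.5], [3.0, 0.3], size=(100, 2))
b = rng.normal([0.0, +0.5], [3.0, 0.3], size=(100, 2))
X = np.vstack([a, b])

# PCA via SVD of the centered data: rows of Vt are the principal
# components, ordered by the variance they capture.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

# The first principal component follows the high-variance (irrelevant)
# dimension, not the direction that separates the two categories.
print(abs(pc1[0]) > abs(pc1[1]))  # -> True
```

A recoding onto the first component alone would discard exactly the dimension on which the categorization depends, which is the text's point about variance-based recoding lacking psychological constraints.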

Other dimensionality reduction techniques aim at reproducing the intrinsic structure of the input space. Examples range from Shepard's (1957) early multidimensional scaling and Sammon's (1969) nonlinear mapping, to more recent Kohonen maps (Kohonen, 1984) and a promising extension of Kohonen maps called Curvilinear Component Analysis (Demartines, 1994). These algorithms project an n-dimensional space onto a smaller p-dimensional space, while keeping most of the information about the organization of the input space. To illustrate, consider two distinct clouds of points forming two categories in a "high"-dimensional space composed of four dimensions (n = 4). Assume further that exemplars of the first category are identical on two dimensions, while exemplars of the second category have only one dimension in common. While the points of the first cloud lie on a plane (p = 2), the points of the other category have three degrees of freedom (p = 3) and therefore lie in a three-dimensional space. This simple example illustrates that data sets may have local distributions with different intrinsic dimensions of variation (in the example, 2 and 3). Projections of high-dimensional inputs onto lower-dimensional spaces should account for these intrinsic characteristics if they are to preserve the important degrees of freedom of the distribution. Unfortunately, techniques for discovering the intrinsic dimensionality of a data distribution are also plagued by high dimensionality. The number of data points necessary to reliably estimate the structure of a distribution may be enormous if the intrinsic dimensionality is high. Dimensionality reduction techniques also need to give up generality for biases, at the expense of possibly missing "important" structures in the data.
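
The four-dimensional example can be checked numerically; counting the directions of non-negligible variance (a local PCA rank, one simple estimator among many) recovers the two clouds' intrinsic dimensionalities:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two clouds in a 4-dimensional space (n = 4). Cloud A is constant on two
# dimensions (intrinsic p = 2); cloud B is constant on one (p = 3).
A = np.column_stack([rng.normal(size=(50, 2)),
                     np.full((50, 1), 1.0), np.full((50, 1), 2.0)])
B = np.column_stack([rng.normal(size=(50, 3)),
                     np.full((50, 1), 5.0)])

def intrinsic_dim(cloud, tol=1e-8):
    """Count directions with non-negligible variance (local PCA rank)."""
    s = np.linalg.svd(cloud - cloud.mean(axis=0), compute_uv=False)
    return int(np.sum(s**2 / len(cloud) > tol))

print(intrinsic_dim(A), intrinsic_dim(B))  # -> 2 3
```

This works only because the irrelevant dimensions are exactly constant; with noisy, curved, or high-dimensional local structure, reliable estimates of this kind require the enormous numbers of data points discussed above.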

Ideally, we would like the formal definition of "important" in lower-dimensional structures to be closer to the categorization task the system needs to solve. Recent approaches to dimensionality reduction have incorporated measures of "feature goodness" into the algorithm for determining good dimensions of recoding. For example, Intrator (1994; Intrator & Gold, 1993) discusses a technique in which input data are projected onto dimensions that have many distinct clusters of data points (multimodal distributions). This unsupervised technique is more likely to discover dimensions useful for distinguishing categories, under the assumption that different categories produce clusters within the data. Intrator (1994) reports that his technique worked on stimuli with 3969 and 5500 dimensions, and that few training data were necessary for extracting robust features. This and related techniques based on projection pursuit (Friedman & Stuetzle, 1981) provide methods with interesting biases for exploring high-dimensional data spaces.
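
The spirit of the multimodality bias can be sketched with a deliberately crude score of our own (far simpler than Intrator's BCM-based measure): standardized projections whose distributions dip in the middle, leaving little mass near the center, are flagged as candidate category-bearing dimensions, without ever consulting the labels.

```python
import numpy as np

rng = np.random.default_rng(4)

# 300 unlabeled stimuli in 10 dimensions; an unobserved category structure
# makes dimension 0 bimodal, while the other dimensions are unimodal noise.
hidden = rng.integers(0, 2, 300)
X = rng.normal(size=(300, 10))
X[:, 0] += np.where(hidden == 0, -2.0, 2.0)

def central_mass(z):
    """Fraction of a standardized projection lying near its center.
    Bimodal (two-cluster) projections score low; unimodal ones score high."""
    z = (z - z.mean()) / z.std()
    return float(np.mean(np.abs(z) < 0.25))

# Without using the hidden labels, the multimodal (category-bearing)
# dimension stands out as the one with the least central mass.
scores = [central_mass(X[:, j]) for j in range(10)]
print(int(np.argmin(scores)))  # -> 0
```

The bias does the work here: the score rewards exactly the cluster structure that categories are assumed to produce, which is why such techniques can get by with comparatively few training data.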

In the reviewed dimensionality reduction techniques, the feature extraction stage operates independently of higher-level processes, and thus there is no guarantee that the extracted features will be useful for higher-level processes (Mozer, 1994). The functionality principle suggests that the categorizations being learned should influence the features that are extracted. In other words, top-down information should constrain the search for relevant dimensions/features of categorization. Thus, we believe the serial process of (1) projecting the high-dimensional space onto a new lower-dimensional space, and then (2) determining categorizations with the new dimensions, will have to be modified so that the second process informs the first. However, computational considerations make it likely that different aspects of perceptual feature extraction need strong biases; biases that do not trivialize the categorization problem (as fixed features often do), but that are sufficiently constraining to allow the learning of general features from a reasonable number of data points (a similar opinion is defended in Anderson & Rosenfeld, 1988; Geman et al., 1992; Shepard, 1989; among others). It is, for example, conceivable that different constraints will be needed to model the categorization of intrinsically different object classes such as faces, man-made vs. natural objects and textures, natural and artificial scenes, and so forth. The empirical study of these psychological constraints and biases should explicitly account for the discussed interactions of categorization and perception, even if they significantly complicate the problem.

4. CONCLUSIONS

The function of a feature is to express commonalities between members of the same category, and differences between categories. Either people come equipped with a complete set of features that account for all present and future categorizations, or, working backwards, people sometimes create new features to represent new categorizations. We argued for an approach in which people create features in order to subserve the categorization and representation of objects. We presented psychological evidence and theoretical arguments for the necessity of flexible features in object categorization theories. Flexible features allow the learning of new but perceptually constrained features when new categorizations must be represented. Thus, provided an appropriate history of categorization, a learned set of features may be equivalent to, but not limited to, proposed sets of fixed features. As new features are created to represent new categorical contrasts and similarities, a learned set permits an efficient fit between categorization demands and the feature vocabulary, which should then be free of useless features. Flexible features are inherently linked to categorization tasks and therefore reduce the need for complex categorization rules by providing efficient representations. In addition, advantages can be accrued by decomposing features into subfeatures, without representing all possible decompositions of a holistic feature a priori. In our view, there is little difference between concepts and features: Someone's unitary concept may be someone else's decomposable structure, depending on the individuals' histories of categorization.

Experimental materials are more likely to promote feature creation when they are not designed around a priori diagnostic features that lead to obvious feature decompositions. These alternative materials (see Figure 2) do not limit the information that can be obtained from the input, have many distinct intrinsic structures, and are not exhausted by symbolic descriptions such as "has-legs," "square," "circle," and so forth. In short, alternative stimuli evoke a representation of their structure in a raw, analog form, a form that leaves open the possibility of reinterpreting a stimulus if new categorizations require it.

In our view, two types of category learning should be distinguished. Fixed space category learning occurs when new categorizations are representable with the available feature set. Flexible space category learning occurs when new categorizations are not representable with the available features. Whether fixed or flexible learning occurs depends on the requirements of a particular categorization situation. That is, it depends on the featural contrasts and similarities between the new category to be represented and the individual's concepts. Fixed feature approaches face one of two problems when they are confronted with tasks that require new features. If the fixed features are fairly high-level and directly useful for categorizations (such as Biederman's geons), then they will have insufficient flexibility to represent all objects that may be relevant for a new task. If the fixed features are small, subsymbolic fragments (such as pixels), then regularities at the level of functional features, regularities that are required to predict categorizations, will not be captured by these primitives.

Flexible features and the perceptual learning they involve have important similarities, differences and implications for various fields of Cognitive Science. Perceptual unitization similarly proposes that the recoding of stimuli with new features affects the perceptual appearance of the object. However, unitization assumes that stimuli are initially analyzed into components before being unitized, whereas evidence suggests that stimuli are not unequivocally discretized into their smallest structures (or, for that matter, into a single, preferred scale). Functional constraints influence the scale of discretization. The field of constructive induction in artificial intelligence is concerned with creating new object descriptions to assist categorization. In many cases, the new descriptions are simple symbolic transformations of existing symbolic descriptions. Instead, we have stressed the need to create object descriptions from relatively raw, unprocessed, perceptual representations, and to create new descriptions by incorporating perceptual rather than purely formal constraints. Developmental biases (both theory-based and perceptual) that could constrain feature extraction were reviewed. We argued that neither the shape bias nor a priori theories were sufficiently constraining to predict the actual perceptual features that are discovered in objects. These features are also provided by the structuring role of learned categories. Formal analogies with the principles we discuss are found in statistical techniques of dimensionality reduction and their network implementations. These techniques also attempt to reduce an initially high-dimensional categorization space to a lower-dimensional space representing important features. Supervised learning is closer to the principles we discuss since it explicitly provides top-down information to constrain the search for categorization features. However, it needs to be properly constrained to be practically feasible. We believe properly constrained dimensionality reduction techniques (techniques constrained by perceptual and categorical factors) come closest to the principles we discuss.

5. REFERENCES

Aha, D. W., and Goldstone, R. L. (1992). Concept learning and flexible weighting. Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society. (pp. 534-539). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Algom, D. (1992). Psychophysical approaches to cognition. Amsterdam: North Holland.

Anderson, J. A. & Rosenfeld E. (1988). Neurocomputing, Foundations of Research. MIT Press, Cambridge, MA.

Andrews, J., Livingston, K., Harnad, S., & Fisher, U. (in press). Learned categorical perception in human subjects: Implications for symbol grounding.

Barrett, S. E., & Shepp, B. E. (1988). Developmental changes in attention: The effect of irrelevant variations on encoding and response selection. Journal of Experimental Child Psychology, 45, 382-399.

Bellman, R. E. (1961). Adaptive Control Processes. Princeton University Press, Princeton, NJ.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.

Braunstein, M. L., Hoffman, D. D., & Saidpour, A. (1989). Parts of visual objects: An experimental test of the minima rule. Perception, 18, 817-826.

Bruner, J. S., Goodnow, J. J., & Austin, G. A. (1956). A study of thinking. New York: Wiley.

Burns, B., & Shepp, B. E. (1988). Dimensional interactions and the structure of psychological space: The representation of hue, saturation, and brightness. Perception and Psychophysics, 43, 494-507.

Burt, P., & Adelson, E. H. (1983). The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31, 532-540.

Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.

Carey, S. (1991). Knowledge acquisition: enrichment or conceptual change? In S. Carey & R. Gelman (Eds.). The epigenesis of mind. Hillsdale, NJ: Lawrence Erlbaum.

Chalmers, D. J., French, R. M., & Hofstadter, D. R. (1992). High-level perception, representation and analogy. Journal of Experimental and Theoretical Artificial Intelligence, 4, 185-211.

Chapman, K. L., Leonard, L. B., & Mervis, C. B. (1986). The effect of feedback on young children's inappropriate word usage. Journal of Child Language, 13, 101-107.

Clark, E. V. (1973). What's in a word? On the child's acquisition of semantics in his first language. In T.E. Moore (Ed.), Cognitive development and the acquisition of language (pp. 65-110). New York: Academic Press.

Czerwinski, M., Lightfoot, N., & Shiffrin, R. M. (1992). Automatization and training in visual search. American Journal of Psychology, 105, 271-315.

De Groot, A. D. (1965). Thought and choice in chess. The Hague: Mouton.

De Valois, R. L., & De Valois, K. K. (1990). Spatial Vision. Oxford University Press: New York.

Delk, J. L., & Fillenbaum, S. (1965). Differences in perceived color as a function of characteristic color. American Journal of Psychology, 78, 290-293.

Edelman, S., & Bülthoff, H. H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32, 2385-2400.

Eimas, P. (1994). Categorization in early infancy and the continuity of development. Cognition, 50, 83-93.

Elio, R., & Anderson, J. R. (1981). The effects of category generalizations and instance similarity on schema abstraction. Journal of Experimental Psychology: Human Learning and Memory, 7, 397-417.

Fisher, D. L. (1986). Hierarchical models of visual search: Serial and parallel processing. Paper presented at the meeting of the Society for Mathematical Psychology, Cambridge, MA.

Foard, C.F., & Kemler Nelson, D.G. (1984). Holistic and analytic modes of processing: The multiple determinants of perceptual analysis. Journal of Experimental Psychology: General, 113, 94-111.

Fodor, J. (1975). The Language of Thought. New York: Crowell.

Fodor, J. A. (1981). The present status of the innateness controversy, Chapter 10 in Representations. Cambridge: MIT Press. (pp. 257-316).

Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.

French, R. M., & Hofstadter, D. (1991). Tabletop: An emergent, stochastic model of analogy-making. In Proceedings of the Thirteenth Annual Cognitive Science Society Conference, 708-713. Hillsdale, NJ: Lawrence Erlbaum Associates.

Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Lawrence Erlbaum.

Gelman, S. A. (1988). The development of induction within natural kind and artifact categories. Cognitive Psychology, 20, 65-95.

Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1-58.

Gibson, E. J. (1969). Principles of perceptual learning and development. New York: Appleton-Century-Crofts.

Gibson, E. (1971). Perceptual learning and the theory of word perception. Cognitive Psychology, 2, 351-368.

Gibson, E. J. (1991). An odyssey in learning and perception. MIT Press: Cambridge.

Gibson, J. J., & Gibson, E. J. (1955). Perceptual learning: Differentiation or enrichment? Psychological Review, 62, 32-41.

Gibson, E. J., & Walk, R. D. (1956). The effect of prolonged exposure to visually presented patterns on learning to discriminate them. Journal of Comparative and Physiological Psychology, 49, 239-242.

Gluck, M. A., & Bower, G. H. (1988). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195.

Goldstone, R. L. (1994a). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123, 178-200.

Goldstone, R. L. (1994b). The role of similarity in categorization: Providing a groundwork. Cognition, 52, 125-157.

Goldstone, R. L. (1995). Effects of categorization on color perception. Psychological Science, 6, 298-304.

Goodman, N. (1965). Fact, Fiction, and Forecast (2nd ed.). Indianapolis: Bobbs-Merrill.

Harnad, S. (1987). Category induction and representation. in S. Harnad (ed.) Categorical perception: The groundwork of cognition. Cambridge: Cambridge University Press (pp. 535-565).

Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335-346.

Harvey, I. (1992). Species adaptation genetic algorithms: A basis for continuing SAGA. Proceedings of the First European Conference on Artificial Life. MIT Press: Cambridge, MA.

Harvey, I., Husbands, P., & Cliff, D. (1994). Seeing the light: Artificial evolution, real vision. Proceedings of the Third International Conference on Simulation of Adaptive Behaviour. MIT Press: Cambridge, MA.

Hill, H., Schyns P. G. & Akamatsu, S. (1995). Information and viewpoint dependence in face recognition. Submitted for publication.

Hinton, G., Williams, K., & Revow, M. (1992). Adaptive elastic models for handprinted character recognition. In Moody, J., Hanson, S., and Lippmann, R. (Eds.) Advances in Neural Information Processing Systems, IV, San Mateo, CA. Morgan Kaufmann. 341-376.

Hinton, G. E., & Zemel, R. S. (1994). Autoencoder, minimum description length, and Helmholtz free energy. In Cowan, Tesauro & Alspector (Eds.), Advances in Neural Information Processing, VI, San Francisco, CA: Morgan Kauffman.

Hoffman, D. D. & Richards, W. A. (1984). Parts of recognition. Cognition, 18, 65-96.

Intrator, N. (1994). Feature extraction using an unsupervised neural network. Neural Computation, 4, 98-107.

Intrator, N. & Gold, J. (1993). Three dimensional object recognition using an unsupervised BCM network: The usefulness of distinguishing features. Neural Computation, 5, 61-74.

Jakobson, R., Fant, G., & Halle, M. (1963). Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, MA: MIT Press.

Jones, S. & Smith, L. (1993). The place of perception in children's concepts. Cognitive Development, 8, 113-139.

Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170-210.

Keil, F. C. (1989). Concepts, kinds and cognitive development. Cambridge, MA: MIT Press.

Kohonen, T. (1984). Self-organization and associative memory. Berlin: Springer-Verlag.

Koza, J. R. (1994). Genetic Programming II. Cambridge, MA: MIT Press.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.

Kurbat, M. A. (in press). Structural description theories: Is RBC/JIM a general purpose theory of human entry-level object recognition? Perception.

Landau, B. (1994). Object shape, object name, and object kind: Representation and development. In Medin (Ed.). The Psychology of Learning and Motivation, 31, 253-304. Academic Press: San Diego, CA.

Lawrence, D. H. (1949). Acquired distinctiveness of cues: I. Transfer between discriminations on the basis of familiarity with the stimulus. Journal of Experimental Psychology, 39, 770-784.

Markman, E. (1989). Categorization and naming in children. Problems of induction. Cambridge, MA: MIT Press, Bradford Books.

Markman, E. (1995). Constraints on word meaning in early language acquisition. In L. Gleitman & B. Landau (Eds). The acquisition of the lexicon, 199-227. Cambridge, MA: MIT Press.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.

Matheus, C. J. (1991). The need for constructive induction. In L. Birnbaum and G. Collins (Eds.) Proceedings of the eighth international workshop on Machine Learning. Morgan Kaufmann: San Mateo, CA.

McGraw, G., Rehling, J., & Goldstone, R. L. (1994). Letter perception: Toward a conceptual approach. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. (pp. 613-618). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254-278.

Michalski, R. S. (1983) A theory and methodology of inductive learning. Artificial Intelligence, 20, 111-161.

Mitchell, M. (1993). Analogy-making as perception. Cambridge: MIT Press.

Moscovici, S., & Personnaz, B. (1991). Studies in social influence: VI. Is Lenin orange or red? Imagery and social influence. European Journal of Social Psychology, 21, 101-118.

Mozer, M. (1994). Computational approaches to functional feature learning. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. (pp. 975-976). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Murphy, G.L., & Medin, D. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316.

Murphy, G. L., & Ross, B. (1994). Predictions from uncertain categories. Cognitive Psychology, 27, 148-193.

Nosofsky, R. M. (1987). Attention and learning processes in the identification and categorization of integral stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 87-108.

Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53-79.

Oja, E. (1982). A simplified neuron model as a principal components analyzer. Journal of Mathematical Biology, 15, 267-273.

Oliva, A., & Schyns, P. G. (1995). Mandatory scale perception promotes flexible scene categorization. Proceedings of the XVII Meeting of the Cognitive Science Society, 159-163, Lawrence Erlbaum: Hillsdale, NJ.

Palmer, S. (1977). Hierarchical structure in perceptual representation. Cognitive Psychology, 9, 441-474.

Palmer, S., Rosch, E., & Chase, P. (1981). Canonical perspective and the perception of objects. In J. Long & A. Baddeley (Eds.), Attention and Performance IX. Hillsdale, NJ: Lawrence Erlbaum.

Peterson, M. A., & Gibson, B. S. (1994). Must figure-ground organization precede object recognition? An assumption in peril. Psychological Science, 5, 253-259.

Pevtzow, R., & Goldstone, R. L. (1994). Categorization and the parsing of objects. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. (pp. 717-722). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263-266.

Quine, W. (1960). Word and object. Cambridge, MA: MIT Press.

Rodet, L., & Schyns, P. G. (1994). Learning features of representation in conceptual context. Proceedings of the XVI Meeting of the Cognitive Science Society, 766-771, Lawrence Erlbaum: Hillsdale, NJ.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: Bradford Books.

Sammon, J. W. (1969). A nonlinear mapping algorithm for data structure analysis. IEEE Transactions on Computers, C-18, 401-409.

Sanger, T. D. (1989). Principal components, minor components, and linear neural networks. Neural Networks, 2, 459-473.

Schank, R. (1972). Conceptual dependency: A theory of natural language understanding. Cognitive Psychology, 3, 552-631.

Schyns, P. G. (1991). A neural network model of conceptual development. Cognitive Science, 15, 461-508.

Schyns, P. G., & Murphy, G. L. (1991). The ontogeny of units in object categories. Proceedings of the XIII Meeting of the Cognitive Science Society, 197-202, Hillsdale, NJ: Lawrence Erlbaum.

Schyns, P. G., & Murphy, G. L. (1994). The ontogeny of part representation in object concepts. In D. L. Medin (Ed.), The Psychology of Learning and Motivation, 31, 305-354. San Diego, CA: Academic Press.

Schyns, P. G., & Oliva, A. (1994). From blobs to boundary edges: Evidence for time and scale dependent scene recognition. Psychological Science, 5, 195-200.

Schyns, P. G., & Rodet, L. (1994). Categorization creates functional features. Submitted for publication.

Sekuler, A. B., Palmer, S. E., & Flynn, C. (1992). Local and global processes in visual completion. Psychological Science, 3, 260-267.

Selfridge, O. G. (1959). Pandemonium: A paradigm for learning. In Symposium on the Mechanization of Thought Processes. Proceedings of a Symposium held at the National Physical Laboratory, November 1958, Vol. 1. London: H.M. Stationery Office.

Shepard, R. N. (1957). Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325-345.

Shepard, R. N. (1989). Internal representations of universal regularities: A challenge for connectionism. In Neural Connections, Mental Computation, L. Nadel, L. A. Cooper, P. Culicover, and R. M. Harnish, (Eds.), 104-134. Bradford/MIT Press, Cambridge, MA, London, England.

Smith, C., Carey, S., & Wiser, M. (1985). On differentiation: a case study of the development of the concepts of size, weight and density. Cognition, 21, 177-237.

Smith, L. B. (1989). A model of perceptual classification in children and adults. Psychological Review, 96, 125-144.

Smith, L. B., & Kemler, D. G. (1978). Levels of experienced dimensionality in children and adults. Cognitive Psychology, 10, 502-532.

Spelke, E. (1994). Initial knowledge: six suggestions. Cognition, 50, 431-445.

Tanaka, J., & Taylor, M. (1991). Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology, 23, 457-482.

Tarr, M. J., & Pinker, S. (1989). Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21, 233-282.

Thibaut, J.P. (1994). Role of variation and knowledge on stimuli segmentation: developmental aspects. Paper Presented at the Sixteenth Annual Meeting of the Cognitive Science Society. Atlanta.

Thibaut, J. P. (1995a). The development of features in children and adults: The case of visual stimuli. Proceedings of the Seventeenth Annual Meeting of the Cognitive Science Society, 194-199, Hillsdale, NJ: Lawrence Erlbaum.

Thibaut, J.P., & Schyns, P.G. (1995). The development of feature spaces for similarity and categorization. Psychologica Belgica, 35, 167-185.

Treisman, A., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.

Ullman, S. (1984). Visual routines. Cognition, 18, 97-159.

Ullman, S. (1989). Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193-254.

Ward, T. B. (1983). Response tempo and separable-integral responding: Evidence for an integral-to-separable processing sequencing in visual perception. Journal of Experimental Psychology: Human Perception and Performance, 9, 103-112.

Watt, R. (1987). Scanning from coarse to fine spatial scales in the human visual system after the onset of a stimulus. Journal of the Optical Society of America A, 4, 2006-2021.

Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. IRE WESCON Convention Record, 4, 96-104.

Wisniewski, E. J., & Medin, D. L. (1994). On the interaction of theory and data in concept learning. Cognitive Science, 18, 221-281.

Witkin, A. (1986). Scale-space filtering. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 1019-1022. Los Altos, CA: Morgan Kaufmann.

6. TABLES

Table 1. Stimuli typically used in concept learning versus stimuli likely to give rise to encoding new features

Traditional Materials                    Alternative Materials
--------------------------------------------------------------------------

Properties of Dimensions in Isolation

Discrete                                 Analog/Continuous
Symbolic                                 Sub-symbolic
Parts easy to delineate                  Parts difficult to delineate
Few features                             Large number of potential features
Relevant features are salient            Relevant features are not salient
No emergent properties                   Emergent properties
Single level of analysis                 Multiple levels of analysis
Large dimension value differences        Small dimension value differences

Properties of Dimensions in Context

A priori diagnostic features             A priori nondiagnostic features
Features have constant instantiations    Features are variably instantiated

7. FOOTNOTES

Title Note: This work was funded in part by an NSERC grant awarded to Philippe Schyns and by National Science Foundation grant SBR-9409232 awarded to Robert Goldstone.

8. FIGURE CAPTIONS

Figure 1. This figure, adapted from Schyns and Murphy (1994), illustrates a possible interaction between perceptual and functional constraints in learning new features of object representation. The arrows in the target object indicate perceptual constraints on its segmentation. The target and Object 1 (or Object 2) constitute a category. The dashed lines on the bottom objects illustrate that the shape features extracted from the target also depend on its category membership.

Figure 2. This figure shows examples of the alternative materials used in our experiments. Picture (a) shows a Martian Rock (Schyns & Murphy, 1991, 1994), picture (b) some doodles (Goldstone, work in progress), picture (c) some Japanese hiragana characters (Ryner & Goldstone, work in progress), picture (d) shows, from left to right, an XY Martian cell and an X Martian cell (Schyns & Rodet, 1994), picture (e) a Martian Lobster (Thibaut, work in progress), and picture (f) a Martian landscape (Schyns & Thibaut, work in progress).