Thursday, 23 December 2010

Defining Goals: Minimalist Angst (II)

[This is a draft excerpt from Chapter 2 of a proposed monograph on Vietnamese, in which I try to tackle some general theoretical problems. As ever, I really would appreciate comments, and will incorporate feedback in future drafts. Thank you. PS. If you wish to cite this, please reference it as Duffield, Nigel. Particles and Projections in Vietnamese Syntax, draft ms., University of Sheffield.]

What is the goal of Linguistic Theory?

To begin, it is appropriate to consider how different linguists view the bigger picture: the overarching goals of grammatical theory. In a recent review article,[i] Cedric Boeckx, a leading advocate and practitioner of Mainstream Minimalism, responds to (frequently levelled) criticisms that the framework is imprecisely formalized and hence inadequate for grammatical description, as follows:
…the goal of the generative enterprise in linguistic theory is not to decide whether natural languages can be studied in terms of sets, proofs or models. The idea expressed in Chomsky (1957) that it is possible to bifurcate the set of sentences into the grammatical and ungrammatical and define theoretical adequacy on the basis of that distinction was quickly abandoned.[ii] Instead, as is made extremely clear in the first chapter of Aspects of the Theory of Syntax (Chomsky 1965), the goal of linguistic theory, once firmly placed in a cognitive, and ultimately biological, setting, is to give an account of how children are able to acquire their native languages. In such a setting, talk of models, proofs or sets is largely irrelevant (Boeckx 2006, [emphasis mine]).
To anyone familiar—and in agreement—with Chomsky’s more general writing on linguistics over the past 30 years, this response may appear unexceptionable: it is certainly a predictable reiteration of the ‘party line’ on Explanatory Adequacy. Undeniably, these comments highlight the valid point that a scientifically interesting theory of language should have more ambitious goals than to provide a formally precise description of the grammatical structures of a particular language (even though this is a formidable—and some would claim, largely unanswered—challenge: see amongst others, Johnson & Lappin 1999, Seuren 2004, Blevins 2009). No-one could claim that the Chomskyan programme, from Aspects through GB to current Minimalism, has been short on ambition: there can be few more exciting or complex scientific projects than to understand the nature of the human language faculty or to explain children’s capacity to acquire their native language: what Pinker (1994: Chapter 1) terms ‘an instinct to acquire an art.’ 

Nevertheless, what is remarkable about the way in which the generative notion of Explanatory Adequacy has come to be defined, at least to sceptics of the generative enterprise—and especially to researchers involved in child language development—is the obvious disconnect between the rhetoric and stated rationale of mainstream generativism on the one hand, and its empirical concerns, on the other. If “the goal of linguistic theory…is [in fact] to give an account of how children are able to acquire their native languages…” then the naïve observer might expect the core research agenda of Minimalism to be devoted to empirical issues in language acquisition research: for example, to determining what all children end up knowing about their language, and when they come to know it; to investigating the extent of true convergence on common grammatical principles (something that is assumed, but rarely tested); to explaining and reconciling the tension between the Logical Problem of Language Acquisition—how it is that children project beyond the variable input to which they are exposed to achieve relatively uniform and highly sophisticated grammatical knowledge—and the Developmental Problem—giving an account of how and why children’s early comprehension and production diverge from those of adults (see Atkinson 1990, cf. O’Grady 1995); to understanding the relationship between grammatical knowledge and language processing in language development, and so forth.

There are, of course, researchers both within and outside the generativist camp whose empirical work addresses precisely these kinds of questions: leading advocates of a generativist approach to language acquisition include Barbara Lust, Stephen Crain, Nina Hyams, Colin Phillips, Tom Roeper, William Snyder, Kenneth Wexler and their students and co-workers; significant alternative perspectives have also been offered by Elizabeth Bates, Melissa Bowerman, Eve Clark, Elena Lieven, William O’Grady, Mark Seidenberg, Dan Slobin and Michael Tomasello, amongst many others. However, the point is that acquisition research, far from driving developments in Minimalist theorizing, is usually regarded as (at best) being tangential to theoretical concerns.[iii] Perhaps the clearest indication of this neglect is the dearth of reference to (or use of) any empirical data from language acquisition in almost all of the core technical literature (e.g., Chomsky 1993, 1995, 1999, 2000, 2001, 2002): where acquisition data are advanced, it is generally only in the service of rhetorical theoretical arguments about the utility of negative evidence and/or Poverty of the Stimulus.

From one perspective, of course, this neglect is unsurprising: if one takes the innateness of language (I-language) to be a fact a priori, there is no logical reason to be concerned with the vagaries of E-language development. Moreover, it follows from innateness that there is little reason to investigate individual differences in development or grammar attainment (aside from pathological ones) nor, indeed, to be greatly concerned with cross-linguistic differences in language structure: from a Mainstream Minimalist perspective, innateness implies universality, hence surface structural differences are treated as peripheral to I-language, as “interface properties” at best, or as E-language properties.[iv] To draw on a frequently used analogy: if one is interested in understanding the genetic basis of avian flight, the developmental and cross-species differences between hummingbirds, sparrows and eagles are probably of limited interest, fascinating though they may be to amateur birders, ethologists, or veterinary surgeons: see Marr (1982).[v]

Upon reflection, though, there are significant difficulties with this line of argument. The first of these is the obvious point that not everyone accepts claims of a priori Knowledge of Language (and the concomitant notion of instantaneous acquisition; see Chomsky 1975, Dresher 1999, cf. Weinberg 1990, Penner & Roeper 1998):[vi] even those, like myself, who are willing to entertain innateness as an hypothesis, generally prefer to use the theory to generate and test relevant empirical predictions against a representative sample of language data—for example, using syntactic theory to try to uncover contentful (see below) formal universals, or to probe the relationship between proposed parameters of grammatical variation and the steady-state grammars of particular languages, or, within the field of first language acquisition, to explore the independence of grammatical knowledge from uncontroversially learned properties of lexical knowledge (cf. Bates & Goodman 1987)—rather than to take innateness as a starting point for empirical inquiry.

A significant problem here is that the standard characterization of explanatory adequacy conflates two quite separate research questions, viz. (i), determining the nature of the human language faculty (FL), and (ii), explaining children’s capacity to acquire their native languages (instances of L). Though these are intimately related—and though it may be reasonable to suppose under a particular presentation of the argument that understanding the first question is a prerequisite for progress with the second—the issues are logically and empirically separable.[vii] This is clearly demonstrated by the fact that it is possible to hold contrary positions with respect to the innateness of the two capacities—indeed, for there to be distinct empirical ‘facts of the matter’. For example, it may turn out that Knowledge of Language is part of our biological endowment, but that the capacity to deploy this knowledge to acquire a particular grammar is not—or rather, is not domain-specific, as generative theory generally insists: see e.g., Anderson & Lightfoot (2002). Alternatively, it could be that the capacity to acquire language is innately given and domain-specific,[viii] but that the steady-state grammatical systems that are actually acquired (LEnglish, LFrench, LChichewa, etc.) are externally determined, internalized theories of linguistic behavior, which are partly—or even largely—unconstrained by biological or domain-specific cognitive factors. Indeed, Chomsky (1975) considers such an idea quite plausible:
I have been assuming that UG suffices to determine particular grammars (where again, a grammar is a system of rules and principles that generates an infinite class of sentences with their formal and semantic properties). But this might not be the case. It is a coherent and perhaps correct proposal that the language faculty constructs a grammar only in conjunction with other faculties of mind. If so, the language faculty itself provides only an abstract framework, an idealization that does not suffice to determine a grammar (Chomsky 1975: 41) [emphasis mine: NGD].
This latter conception of grammar acquisition is one to which many might subscribe, even beyond the generativist camp. It will also be clear that these two options do not exhaust the possibilities; many other conceptions are possible. Whatever the truth of the matter, however, these are logically distinct research questions; hence, it is distracting to use one as the rationale for the other, as most mainstream generative syntacticians seem to have done recently, including Boeckx.[ix]



[i] Review of Postal (2003) Sceptical Linguistic Essays. Oxford: OUP. For a more positive assessment of Boeckx’ commentary, see Collins (2009).
[ii] Boeckx accurately represents Chomsky’s assertion that the basis of grammatical well-formedness of sentences is not the object of inquiry, something that is explicit in the following quote:
The class [of well-formed (grammatical) expressions of L] has no significance. The concepts ‘well-formed’ and ‘grammatical’ remain without characterization or known empirical justification; they played virtually no role in early work on generative grammar except in informal exposition, or since (Chomsky 1993: 44-45).
Several authors have questioned whether this remark—the last clause especially—has any basis in fact. As Geoff Pullum observes, uncompromisingly:
The concept of grammaticality not only played a role in early generative grammar, but the role it played was that of being the only data considered relevant in linguistics…the claim that the concept of grammaticality played no role in early generative grammar is certainly an untruth (Pullum 2006: 139 [Emphasis mine]).
In fact, both quotations contain independently valid statements. Pullum is surely correct to say that the concept of grammaticality has always played a crucial role in generative theory construction and practice, from Aspects to the present day (however poorly the notion may be understood (Chomsky 1977, Allen & Seidenberg 1999)). At the same time, Chomsky’s (and Boeckx’s) assertion is correct—in a very narrow, almost legalistic, sense—to the extent that grammaticality is taken to refer to classes or extensional sets of well-formed vs. ill-formed sentences: at least since Chomsky (1981), it has been clear that such sets of sentences belong to ‘E-language’, which Chomsky rejects as a legitimate, or even coherent, object of inquiry. Nevertheless, what is important is the set of mental states (Knowledge of Language) that occasions grammaticality, or rather, underlies the capacity to give “grammaticality judgments” about core data: it seems perverse to deny that this capacity has always been central to generative theorizing or practice (or indeed to think that it should not be).
[iii] (It should be) needless to say that there is nothing original about this observation: since the beginning of generative theory, not only psycholinguists, psychologists, developmentalists and typologists, but also dissenting voices within the generativist camp have repeatedly criticized the failure to incorporate—or even acknowledge—the results of empirical investigations of grammatical phenomena from other sources: see Eysenck (1984), Cutler (2005), or the following well-known quote from Tom Roeper, cited in Newmeyer (1983), Featherston (2007):
‘when psychological evidence has failed to conform to linguistic theory, psychologists have concluded that linguistic theory was wrong, while linguists have concluded that psychological theory was irrelevant (Roeper 1982).’
[iv] As discussed below, the change from GB to Minimalism marks a significant change in the imperviousness of core syntax to external language-particular factors, including word order.
[v] From this perspective, one might as well conclude that penguins and chickens ‘know’ how to fly (“I-flight”), or that whales know how to walk (“I-walk”), even though subsequent evolutionary changes have left them unable to implement this knowledge (“E-locomotion”).  Though such a conclusion may seem absurd to many, it is a logically consistent and rational one, reflecting an attitude that is, I think, not so distant from many syntacticians’ views on grammatical competence.
[vi] For example, Dresher (1999) writes: 
‘The early stages of acquisition, during which the grammars of language learners are most idiosyncratic and most different from the target adult language, have no effect upon the grammar eventually acquired. As far as the final result goes, these stages can be ignored for purposes of the logical problem of language acquisition, and acquisition is as if it were instantaneous.’
Contrast this with Weinberg (1990: 165): 
‘As is well known, current work in generative grammar makes the major idealization of instantaneous acquisition, the assumption that there is no ordering relationship between pieces of linguistic knowledge. This assumption is assuredly false.’ See Ayoun (2005) for alternative discussion.
[vii] Not everyone accepts this logical priority, of course: see, for example, Seidenberg & MacDonald (1999): 
Instead of asking how the child acquires competence grammar, we view acquisition in terms of how the child converges on adult-like performance in comprehending and producing utterances. This performance orientation changes the picture considerably with respect to classic issues about language learnability, and provides a unified approach to studying acquisition and processing (Seidenberg & MacDonald 1999: 570).
[viii] A capacity variously labeled Language Acquisition Device (LAD), the “language organ”, Faculty of Language (FL) etc.: see below.
[ix] To use a social science analogy, to claim that the goal of linguistic theory is to explain child language acquisition is not unlike claiming that the goal of economic theory is to explain the poverty gap in capitalist societies: whether intentional or not, the rhetorical significance of referencing a vulnerable social group (young children, the poor, respectively) should not be underestimated.

Thursday, 16 December 2010

The Kids Are Alright...aren't they?

[This is the pre-print version of a commentary article which appeared last year in Second Language Research. If you wish to formally respond to this article, it should be cited as  Duffield, Nigel. 2009. The Kids Are Alright…aren’t they?: Commentary on Lardiere. Second Language Research 25, 269-278.]

Lardiere’s reflections on Minimalist mechanisms of second language acquisition are as timely as they are thought-provoking. As is perhaps inevitable, empirical work in acquisition tends to lag behind the theory that drives it, and such articles are invaluable in helping to “reset default values” in our theorizing. Of course, there is some irony—and possibly more than coincidence—in the distinctively retro flavour of both revisions: just as the move from GB to Minimalism rehabilitates an earlier phase of generative theory (Chomsky 1957, 1964, 1965), so Lardiere steps back to the future in drawing out the valuable aspects of Lado’s (1957) proposals (while ‘putting aside the “behaviorist” baggage of contrastive analysis…’ (ms. p. 49)). Whether such retrospection is to be welcomed as a belated appreciation of past scholarship, or lamented as a failure of imagination, is something I cannot clearly decide on: either way, the sixties are clearly in again, and reflected in the anachronistic title of this note (with thanks to The Who).[1]

Lardiere’s paper raises (reawakens?) at least three concerns one might have about the assumptions underlying this type of approach to second language acquisition. Here, I shall briefly mention the first—which may be better addressed by other commentators—and elaborate further on the latter two, where my knowledge is somewhat more secure.


1. What does formalization buy us?

The first concern has to do with the implicit assumption that formalization of the problem of second language acquisition necessarily moves us closer to an explanation or deeper understanding. More specifically, the question is whether translation of pre-theoretical notions such as plurality, collectivity, specificity or definiteness into a calculus of feature values advances our understanding of acquisitional mechanisms. Note that this is not the more dog-eared question of whether or not formalization is desirable in general: there are good arguments going back to Suppes (1968) suggesting that in certain domains—including grammatical theory—this is the case. Rather, the question is whether formalization of this particular kind benefits either the SLA researcher in understanding language acquisition, or the second language learner in implementing it.[2] Concretely, when Lardiere asserts (ms. p. 27) that:
‘[Patty]…has acquired knowledge that English plural marking can co-occur with non-human, quantified, and indefinite nouns, and in this sense, she has successfully “reassembled” the features associated with English plural marking from the way they are organized in Chinese,’
it is reasonable to ask whether the second half of this sentence adds anything to our knowledge of Patty’s competence. It may be that it does, but it is not self-evident.

2. Do native speakers converge?

The second assumption necessary for Lardiere’s project to get off the ground is that naïve adult native-speakers show clear evidence of strong convergence on the same set of feature-values, as demonstrated through production and judgment data. By naïve speakers, I mean non-linguists from a variety of social and educational backgrounds; by strong convergence is intended something more stringent than the kind of ‘threshold convergence’ typically observed in generative SLA experiments, where the need to make stimuli accessible to beginning and intermediate language learners results in ceiling effects for native-speaker controls (who may or may not converge under a more fine-grained analysis).

There are actually two causes for concern here—or perhaps two aspects of the same worry, it’s difficult to be sure. The first is prompted by work showing that university-educated second language learners reliably outperform less-educated native-speakers in relatively straightforward judgment and comprehension tasks. For example, in experiments conducted by Dabrowska & Street (2006)—see also Dabrowska (1997)—less-educated native-speakers actually performed below chance (36% correct), when asked to identify the “doer” in implausible passive sentences such as The cat was chased by the mouse: this result compared with above-90% performance by two groups of non-native speakers in the same condition of the study. Now, one response to results such as these may be to adopt the position of Gleitman & Gleitman (1979), who, when faced with a very similar discrepancy between different groups of native-speakers,[3] concluded that:
‘Language-judgment functions [across native-speakers are] orthogonal to language functions…We suppose that individual differences in language behavior occur more severely at the judgmental level than at the speech and comprehension level...That is, we claim the differences in tacit knowledge are small in comparison to differences in the ability to make such knowledge explicit…(Gleitman & Gleitman, 1979:123)’.
In other words, substantial variability in native-speaker proficiency—that is to say, task-specific performance relative to some presumed target behaviour—need not necessarily reflect any difference in underlying competence.

Whatever one’s opinion of the validity of this ploy, it raises some awkward questions for second language research. On the one hand, if one wants to maintain that less-educated native-speakers are just as competent as more-educated speakers in spite of the behavioral evidence to the contrary, then that same kind of evidence cannot be used to draw any inferences about the underlying competence of second language learners (one way or the other). This then is a perfectly legitimate move, but it does sharply narrow the empirical base. Conversely, if one concludes that such tasks only tell us about language proficiency, and if proficiency rather than competence is the really important thing (a property that also distinguishes among native-speakers), then we should perhaps worry less about abstract features, and more about the acquisition of whatever it is that allows (first or second) language learners to achieve successful levels of performance in that language. At the very least, results like these should give us pause: if less-educated native-speakers cannot reliably interpret implausible passive sentences in a straightforward comprehension task, the prospects of their successfully distinguishing between, say, plural and collective readings, or direct vs. inverse scope interpretations in sentences with multiple quantifiers, using standard methods of elicitation, are doubtful at best.[4]

The other worry is that there may be no strong convergence—even among educated native-speakers—with respect to the sorts of subtle interpretive effects that Lardiere wishes to attribute to particular arrangements of underlying feature-values. To put it bluntly, we need to be sure that the intuitive judgments of individual linguists on core data are shared, at least by other educated speakers: if this is not the case, it is unreasonable to expect as much of second language learners.[5],[6] Once again, the available evidence is often less than secure. Two examples serve to illustrate the problem. The first comes from Lardiere’s own detailed discussion of the distribution and interpretation of various kinds of ‘plural’ markers in Korean, in which, in passing, she refers to E. Suh’s (2007) observation about a possible interaction with animacy:

Although E. Suh (2007) mentions that pluralization is dispreferred on nonhuman nouns, her own Korean L2 acquisition study apparently showed no significant difference among native Korean-speaking controls in producing plurals on animals vs. humans, and C.-S. Suh (1996) states that ‑tul can be attached to both animate and inanimate nouns (ms. p. 32).

The clear implication here is that E. Suh was mistaken in her initial judgment, and that Korean speakers’ use of -tul is unconstrained by animacy restrictions. But it could very easily have been otherwise: had not Suh carried out an acquisition study, or had the study confirmed her intuition, we might well be asking how second language learners come to reorganise their set of nominal features so as to respect this distinction.

This problem calls to mind the second example, which has to do with scope interactions in sentences containing multiple quantifiers. This is a phenomenon that has received a good deal of attention in generative SLA—see e.g. Miyamoto & Yamane (1996), Miyamoto & Takata (1998)—because it seems to neatly exemplify a subtle interpretive contrast at once underdetermined by the input, and at the same time parameterized. In the theoretical literature, it has been claimed that languages vary parametrically according to whether they observe scope rigidity effects, such that the surface word-order strictly determines the relative scope of quantifiers: see for example, May (1985), Aoun & Li (1993). Within the type of feature-based theory espoused by Lardiere, the difference between languages that exhibit scope rigidity and those that do not is cashed out in terms of different valuations of (uninterpretable) features. The problem, though, is that linguists who should know disagree strongly on the facts of the matter. The following excerpts from Kuno, Takami & Wu (2001) reveal the extent of the controversy:
In Kuno et al 1999 we pointed out that (i) there are ambiguous sentences that Aoun & Li 1993 predicts to be unambiguous, and (ii) there are unambiguous sentences that their analysis predicts to be ambiguous. Examples 18 and 19 illustrate the first point, and 20 the second [examples not shown]…Referring to other ambiguous examples in Kuno et al 1999, Aoun and Li say that ‘there is a disagreement about the data discussed’ (2000: 140). For example, taking up the Japanese sentence in 22 [not shown]…they write that ‘we...relied on Hoji (1985), which indicates sentences such as (22) are unambiguous’ (2000: 140). However it is important to note that the example Aoun & Li 1993 provides, attributed to Hoji, is not 22, but the following [23: not shown]…Sentence 23 is indeed unambiguous, but 22 is ambiguous for many speakers of Japanese. We attribute this difference in scope to pragmatic factors… (Kuno et al, 2001: 140).
The implications for SLA of such disagreements should be obvious: if theoreticians cannot agree on such relatively well-studied phenomena as scope interactions, the success (or otherwise) of second language learners in converging on subtle judgments may be relatively uninstructive. At a minimum, such disputes should force us to be much more circumspect about the assertions made in any single study—however well-regarded—than Lardiere appears to be, for instance, about the work of Kwon & Zribi-Hertz (2001). As Kuno et al (2001) state:
We cannot overemphasize the danger of building syntactic generalizations on the basis of a few unambiguous/unacceptable sentences that first come to mind. Some or all of these sentences may be unambiguous/unacceptable for nonsyntactic reasons, and sentences of the same pattern might be ambiguous/acceptable if they were free from the nonsyntactic factors that made the initial set unambiguous/ unacceptable (p. 142).
In short, the empirical base of feature-based acquisition theory is much less secure—and possibly more restricted—than is generally acknowledged: this must have significant consequences for acquisition theory.

3. What about the children?

Finally, I wish to consider one other assumption that underlies Lardiere’s proposal, and which seems to be crucial for her project. This is the assumption that monolingual children are fully competent with respect to the featural properties of the lexical items they know and use: that whatever the shortcomings of adult second language learners in studies of ultimate attainment, children acquiring their first language get it right…and get it early. That Lardiere subscribes to this view is reasonably clear from the following quotes:
In part because languages vary and because any normal child exposed over a few years in early childhood to any human natural language will acquire it equally well, it has been argued that there is a universal set or inventory of linguistic features available to the child as part of the human genetic endowment, along with a species-uniform computational mechanism that combines and interprets the relevant features in a highly constrained way (ms., p. 2)…Since relative or comparative ease of learning is not an issue in L1 acquisition—that is, young children learn the language of their community, whatever it is, equally “easily”… (ms., p. 16).
Of course, the idea that children acquire the grammar of their language perfectly, effortlessly, and early is by no means restricted to Lardiere: the following quotations from Hawkins (2001) and White (2003) are representative of the mainstream view in generative second language research:

‘Children typically acquire all the major structures of their language by the age of three-and-a-half, and by the age of five their understanding of complex and subtle structural distinctions is effectively adult-like (Hawkins 2001: 6)’
‘The arguments for some sort of biological basis to L1 acquisition are well-known …the ability to acquire language is independent of intelligence; the pattern of acquisition is relatively uniform across different children, different languages and different cultures; language is acquired with relative ease and rapidity and without the benefit of instruction; children show creativity which goes beyond the input that they are exposed to.  All of these observations point to an innate component to language acquisition (White 2003).’

No doubt, much hangs on the hedges in these statements (“effectively”, “typically”; “relatively”), but the implication is clear: young children have it all worked out by around five years of age. As someone who spends the greater part of his time in first language research, I am continually struck by the optimism displayed by second language researchers about young children’s language abilities. For the fact is that—barring a very few precocious exceptions—children do not perform like little adults either in terms of spoken language comprehension and production, or with respect to their performance in judgment tasks. Instead, they behave (unsurprisingly!) like children, deviating in a variety of interesting and systematic ways from the adults around them. Pace Hawkins, there is simply no empirical evidence for the claim that ‘children…acquire all the major structures of their language by the age of three-and-a-half [my emphasis: NGD]’; nor am I aware of any first language researcher who has advanced such a claim. There is of course evidence supporting the view that children show sensitivity to subtle abstract constraints of the adult target grammar considerably in advance of their own productive capacities, and that they project far beyond the input in ways that are consistent with nativist explanations, but those are entirely different matters.

Indeed, there is some irony in the fact that whereas second language researchers assume that children converge early on adult grammar specifications, the leading proponents of nativism in first language acquisition, namely, Crain & Pietroski (2001), make their most compelling case for innateness on the strength of empirical work showing divergence between child and adult grammars, precisely in the area of (abstract) feature-values. As Crain & Pietroski write (2001:2):
‘Children in monolingual English environments acquire English, and not Italian or Chinese. But nativists should not be surprised if such children exhibit some German or Romance or East Asian constructions, absent any evidence for these constructions in the primary linguistic data. Indeed, theory-driven mismatches between child and adult language may be the strongest argument for a universal grammar, and against models according to which children construct hypotheses based on linguistic experience…[my emphasis: NGD].’
This leaves SLA research in something of a quandary: if young monolingual children take their time in arriving at the correct set of feature-values—and this sets aside the two concerns discussed earlier—it becomes much less clear what the standard of comparison should be for second language learners. But things may be worse still, for it appears that even teenagers may not have acquired adult-like knowledge of grammatical feature-values. For reasons too involved to elaborate on here, there is a dearth of available data on the fine-grained syntactic knowledge of 9-18 year olds, but the studies that do exist reveal that development continues at least up to late adolescence. One particularly telling result comes from a recent (unpublished) dissertation by Tihana Kras (Kras 2008), investigating L2 acquisition of narrow syntax by child and adult Croatian learners of Italian. The specific phenomenon of interest is sensitivity to constraints on clitic-climbing and auxiliary selection in Italian restructuring constructions, with respect to which—in two separate judgment tasks—the judgments of 14-year-old Italian native-speakers were significantly less target-like than those of adult L2 learners. This phenomenon (obligatory clitic-climbing) is one that is directly accounted for in Minimalism in feature-based terms, yet it is reasonably clear that 14-year-old native-speakers know the lexical items, without (yet) knowing the associated features. Kras herself explains this discrepancy in terms of experience and exposure, and in the final analysis is forced to restrict the scope of her Interface Hypothesis to “phenomena that are highly represented in the input, as phenomena which occur rarely in the input might not be acquired for reasons independent of the type of knowledge they involve” [my emphasis: NGD] (Kras 2008: 194).

Once again, the implications of results like these for SLA in general, and for Lardiere’s project in particular, should be clear: the road to ultimate attainment may be a long one, even for native-speakers.

Summary
In summary, Lardiere’s ‘Thoughts’ are informed and inspiring, and certainly help to move the debate forward into the Minimalist age. At the same time, however, we need to bear in mind just how difficult second language research really is: as I have tried to suggest here, at each remove from pure theory, matters become more and more complicated. It’s the theoreticians who have it easy!

References

Aoun, Joseph; and Li, Y-H. Audrey (1993). The Syntax of Scope. Vol. 21: Linguistic Inquiry Monograph. Cambridge, MA: MIT Press.

Chomsky, Noam (1957). Syntactic structures: Janua linguarum, nr. 4. ’s-Gravenhage: Mouton.

— (1964). Current issues in linguistic theory: Janua linguarum. Series minor, nr. 38. The Hague: Mouton.

— (1965). Aspects of The Theory of Syntax. Cambridge, MA: MIT Press.

Crain, Stephen; and Pietroski, Paul (2001). Nature, Nurture and Universal Grammar. Linguistics and Philosophy 24, 139-186.

Dabrowska, Ewa (1997). The LAD goes to school: a cautionary tale for nativists. Linguistics 35, 735-766.

Dabrowska, Ewa; and Street, James (2006). Individual differences in language attainment: Comprehension of passive sentences by native and non-native English speakers. Language Sciences 28, 604-615.

Duffield, Nigel (2003). Measures of Competent Gradience. In The Lexicon-Syntax Interface in Second Language Acquisition, Van Hout; Hulk; Kuiken; and Towell (eds.), 97-127. Amsterdam & Philadelphia: John Benjamins Publishing Company.

Gleitman, Henry; and Gleitman, Lila R (1979). Language use and language judgment. In Individual differences in language ability and language behavior, Fillmore; Kempler; and Wang (eds.), 103-126. New York: Academic Press.

Hawkins, Roger (2001). Second Language Syntax: Blackwell.

Kras, Tihana (2008). L2 acquisition of the lexicon-syntax interface and narrow syntax by child and adult Croatian learners of Italian. Unpublished PhD dissertation, University of Cambridge.

Kuno, Susumu; Takami, Ken-Ichi; and Wu, Yuru (2001). Response to Aoun and Li. Language 77 (1), 134-143.

May, Robert (1985). Logical Form: its structure and derivation. Cambridge, MA: MIT Press.

Miyamoto, Yoichi; and Yamane, Maki (1996). L2 Rigidity: the Scope Principle in L2 Grammar. In Proceedings of the 20th annual Boston University Conference on Language Development, Stringfellow; Cahana-Amitay; Hughes; and Zukowski (eds.), 494-505. Somerville, Massachusetts: Cascadilla Press.

Miyamoto, Yoichi; and Takata, Yasuko (1998). Rigidity effects and the strong/weak features in SLA. In Proceedings of the 22nd Boston University Conference on Language Development, Greenhill; Hughes; Littlefield; and Walsh (eds.), 511-522. Somerville, Massachusetts: Cascadilla Press.

Suppes, Patrick (1968). The Desirability of Formalization in Science. The Journal of Philosophy 65 (20), 651-664.

White, Lydia (2003). Second language acquisition and Universal Grammar. Cambridge: Cambridge University Press.

Notes

[1] The eponymous movie and compilation album were released in 1979: however, the best-known tracks—My Generation, I Can See for Miles, Pinball Wizard etc—were all originally recorded in the nineteen sixties.

[2] To her great credit, Lardiere clearly distinguishes throughout the paper between “theory-as-linguist’s-construct” and “theory-as-learner’s-mental-state”, in particular, where she observes that the predictive value of a feature-based theory may be quite different for researchers vs. language learners. (See, for example, the discussion on ms, p. 21: ‘For the researcher…For the second language learner, on the other hand, …’.) In so doing, she avoids the “systematic ambiguity” first introduced to linguistic theory in Chomsky (1965), which—it may be argued—has had at least as many negative as positive consequences for understanding language acquisition (whatever its value may be for pure theory):

Using the term ‘grammar’ with a systematic ambiguity to refer, first, to the native speaker’s internally represented ‘theory of his language’ and, second, to the linguist’s account of this, we can say that the child has developed and internally represented a generative grammar in the sense described. […] we are again using the term ‘theory’ — in this case ‘theory of language’ rather than ‘theory of a particular language’ — with a systematic ambiguity to refer both to the child’s innate predisposition to learn a language of a certain type and to the linguist’s account of this (Chomsky 1965: 25).

[3] ‘…When taxed, the average group focused on meaning and plausibility, while the highly educated group focused on the syntax even when meaningfulness was thereby obscured…(Gleitman & Gleitman 1979: 125).’

[4] Once again, it is entirely possible that all native-speakers do in fact make such distinctions (‘correctly’) unconsciously: as a card-carrying generativist, I remain optimistic that this is the case. However, the point here is that if a significant group of native-speakers cannot adequately demonstrate this ability, it becomes unreasonable to expect any more of second language learners.

[5] This becomes particularly difficult to assess in the case of less familiar languages, where one is heavily reliant on the judgments of bilingual native-speaker linguists, whose formal training has been through English.

[6] Though such situations do arise, as discussed in Duffield (2003).

Monday, 13 December 2010

Do Asians really think differently from Westerners?

[This is an earlier version of an article that has now been accepted to Cognitive Linguistics. If you would like to formally respond to it, the article should be cited as Tajima Y. & N. Duffield, ms. Keio University/Konan University. The pre-print version of the article will appear shortly.]

Japanese Versus Chinese Differences in Picture Description and Recall: Implications for the Geography of Thought
Yayoi Tajima and Nigel Duffield
Keio University and University of Sheffield (now Konan University)


Authors’ Note
This research was supported in part by grants from the Mori Foundation. We would like to thank Mutsumi Imai, Yichun Ryo, Gary Wood and Samir Zarqane for their assistance in conducting this study.


Abstract
This study examined whether the grammatical structure of particular languages predisposes speakers to particular attentional patterns. We hypothesized that the holistic attentional bias of Japanese participants in a previous study (Masuda & Nisbett, 2001), which was attributed to pan-Asian cultural factors, is better interpreted as a consequence of specific linguistic properties: Japanese speakers’ bottom-up discourse strategy. In experiments involving Japanese, English, and Chinese native speakers, it was found that Japanese participants reported more contextual information before explaining the main point, mentioned more background details overall, and recalled background elements significantly more accurately than either English or Chinese participants. The ‘Asian response’ was thus split, as predicted by the Linguistic Relativity hypothesis, but contrary to the expectations of a Cultural Relativity account.

Keywords: field dependency, attention, linguistic relativity, head directionality


Japanese Versus Chinese Differences in Picture Description and Recall:
Implications for the Geography of Thought

The general aim of the present study1 is to contribute to the ongoing debate concerning the extent to which culture and/or language is able to penetrate core areas of cognition—especially visual attention and recall—that were previously viewed as largely impervious to social or linguistic experience. The theoretical impetus for this research is provided by work by Richard Nisbett and his colleagues (Nisbett, Peng, Choi, & Norenzayan, 2001; Nisbett, 2003; Nisbett & Masuda, 2003; Nisbett & Miyamoto, 2005, inter alia), in which Asians2 and Westerners (specifically, European Americans) are claimed to exhibit distinct cognitive styles—holistic versus analytic attention—this difference being reflected in markedly contrasting levels of field-dependence across a variety of experimental tasks. Nisbett and his colleagues argue that this inter-group difference is due to deep-seated cultural attitudes, beliefs and traditions: In the case of Asian groups, their holistic style is explained by reference to a collectivist, inter-dependent tradition and outlook, informed by Confucianism and relative subservience to societal institutions; by contrast, Westerners’ (Midwestern Americans, in the typical case) analytic style is an expression of a more individualistic impulse, informed by traditions of logical thought and self-determination having their origins in classical Athenian culture.

There are numerous prima facie objections to such claims. There is the observation, for instance, that these arguments gloss over any number of intermediate cases: what about, say, the attentional patterns of more collectivized, interdependent Western European groups (for example, contemporary Athenian citizens), or of highly individualistic Taiwanese MBAs? There is also the observation that they seem to grossly overstate the degree of intra-group homogeneity on either side of the Pacific. The most significant objection, however, is that in the final analysis they amount to little more than unexplained correlations (nowhere, for example, is it articulated what causal relationship there might be between a preference for syllogistic reasoning and a decrease in field dependence, or how cultural beliefs should effect changes in brain mechanisms implicated in spatial memory). In spite of this, Nisbett’s arguments appear to have gained some traction amongst cognitive anthropologists and psychologists, and for this reason they deserve serious consideration.

To ‘professional outsiders’ such as ourselves, coming from theoretical and applied linguistics, the appeal of Nisbett’s explanation is less obvious. However, it should be noted immediately that our purpose is not to challenge the data, but rather to question the interpretation in terms of immanent culture.

It should also be clear that it is impossible to tackle every part of Nisbett’s thesis at once: There are simply too many potentially related variables to control for. Instead, this paper focuses attention on the specific issues raised by a study reported in Masuda & Nisbett (2001), in which observed contrasts in field-dependence between Japanese and American participants are interpreted in terms of the culturally embedded attitudes and beliefs outlined above, rather than—as suggested below—in terms of formal grammatical differences between the languages spoken by the two participant groups. In brief, our claim will be that Japanese participants behave as they do in visual description and recall tasks primarily in virtue of being speakers of Japanese, rather than in virtue of any pan-Asian cultural affiliation. We shall support this contention by showing that, in three tasks very similar to those presented in Masuda & Nisbett (2001), the ‘Asian Response’ is split apart, with Chinese participants’ responses either patterning with those of the English group, rather than with the Japanese, or else revealing an intermediate response predicted by the linguistic typology articulated below.

Before presenting the study, something needs to be said about language and culture in the present context, since both terms are open to construals that limit—or even negate—the possibilities for empirical research aimed at teasing these factors apart. On the one hand, it is obvious that many substantive properties of language are dependent on the culture of their speakers. For instance, languages whose speakers live in social groups without governmental or religious institutions will not contain words for position-holders within those institutions, fishermen typically have a richer vocabulary of marine life than mountain herders, and so forth. It is plausible, though not yet conclusively demonstrated, that such cultural differences not only impact upon variation in lexical knowledge, but also upon perceptual and discrimination abilities: This is indeed what is claimed in more recent work by Nisbett and his colleagues (Uskul, Kitayama, & Nisbett, 2008).
Related to this issue is the Sapir-Whorf question, whether particular aspects of languages themselves exercise any determining influence on the non-linguistic cognitive capacities of their speakers. This is considerably more contentious, with proponents of stronger versions of the thesis—such as Boroditsky (2001), Bowerman (1996), Pederson et al. (1998)—opposed to those offering more universalist interpretations of similar data, including Malt, Sloman, Gennari, Shi, & Wang (1999), Li & Gleitman (2002); see Boroditsky (2003), for an overview. What this brief discussion highlights is the importance of identifying formal linguistic factors that can clearly be demonstrated to be orthogonal to cultural ones: In this paper, we propose one such variable (namely, Head-Directionality in phrase-structure).

A different set of problems surrounds the term culture. One major problem is that it can easily become weakened to the point at which any contingent property distinguishing two groups, however superficial or ephemeral, can be deemed “cultural”. It may seem to be one thing to use the term to refer to a group sharing a common set of distinctive familial, political and religious practices bound by agreed social norms, and whose distinct conventions and traditions are passed on from one generation to the next, and quite another to speak of groups bound by their contingent employment situation or geographical context—the “culture” of first year international students, for example, or of the 1960s, of high-density living, of fast-food workers in suburban strip-malls, of tabloid readers, and so forth. In practice, however, it proves problematic to decide when culture shades into community of practice, or something even less theoretically significant.3

In addition to this, there seems to be a lack of consensus among psychologists, anthropologists, and social scientists about the necessary or sufficient conditions for belonging to a culture, or acculturation. If an individual can be deemed to be influenced by a particular culture after only months, or even weeks, of contact, it again becomes very hard to tease apart the alleged effects of culture from other superficial contextual properties. This suggests that if one wishes to make interesting theoretical claims about the effects of culture on cognition, then the cultural factors called on should have some reasonable permanence and persistence in the life of the individual: Ideally, these should be attributes acquired in early childhood and shared by all eligible members of the cultural group in question.
Unless definitions are restricted in this way, it becomes nearly impossible to distinguish between the effects of environmental and/or occupational factors on attentional mechanisms—effects that are remarkable but not deeply surprising, where they are found—versus the effects of immanent culture. Consider, for example, the finding reported in Maguire et al. (2000), that the posterior hippocampal regions of London taxi-drivers were significantly larger than those of a control group, that the anterior portions were significantly smaller, and that this asymmetry increased with years of taxi-driving experience. Such results provide striking evidence of adult adaptations in brain regions that are associated with spatial memory and navigation. It seems entirely plausible—though this was not tested—that such physiological adaptation is also reflected in increased sensitivity to contextual factors, which in turn could be interpreted as a more holistic/field-dependent cognitive style. But it would be wholly misleading to attribute this occupationally actuated change to cultural factors—say, “the culture of taxi-drivers”.

The Original Study
Masuda & Nisbett (2001) conducted an experiment with Japanese and American participants, in which they first presented underwater scenes (termed ‘animated vignettes’) of 20 seconds’ duration, featuring a salient focal fish as well as other smaller objects such as smaller fish, bubbles, shells and rocks (see Figure 1).


Participants were first asked to describe what they had seen. Subsequently, they were presented with a different set of object scenes and were asked to judge whether or not the elements depicted in these new scenes were identical to those featured in the original vignette. Figure 2 provides an example of the Figure condition (in which the focal fish was held constant). Some objects were presented with the original background, and others were presented with a neutral or novel background.


The results showed that, with respect to basic description, the Japanese group made about 50% more statements concerning background information and around 70% more statements about inert objects than the American group. While American participants invariably began their descriptions with the salient (focal) object, Japanese participants were much more likely to begin their statements by mentioning background elements (e.g., “There was a pond, and . . . ”). In addition, Japanese participants’ performance in identifying the focal fish was more (adversely) affected by the change of backgrounds; conversely, the Japanese participants reliably outperformed Americans in correctly identifying Ground features of the original scene.

Masuda & Nisbett (2001) and Nisbett & Masuda (2003) interpret these results in terms of the aforementioned cultural dichotomy: It is the persisting social and philosophical values of ancient China that predispose Japanese perceivers to holistic attention. However, it is equally possible to interpret these particular findings as due to linguistic, rather than cultural factors, since in this instance there is a confound between language and culture: As we shall show directly, the grammatical and discourse structure of Japanese (i.e., the Japanese language) differs from that of English at least as much as pan-Asian culture differs from that of European Americans.

Towards an Alternative Interpretation: Thinking for Speaking
Dan Slobin’s Thinking For Speaking hypothesis is especially relevant to the present discussion. In a series of papers (Slobin, 1996, 1997, 2000, 2003), Slobin develops the idea that there exists a process of ‘thinking for speaking’, apart from general cognition. He argues that:

The activity of thinking assumes a distinct character when it takes place for speaking, because, in the process of speaking, one needs to adjust one’s thought to immediately available linguistic forms. Each language provides many, but a finite number, of particular words and grammatical constructions to encode reality. In consequence, when one thinks for speaking, one unconsciously focuses on those aspects of objects and events that are most readily encodable in one’s particular language. (Slobin, 2003, p. 157)
The paradigm case of a cross-linguistic difference in event construal concerns the encoding of motion events, and involves the semantic components of path and manner of motion. In research stemming from seminal work by Talmy (1975; see also Talmy, 1985, 2000), it has been repeatedly observed that languages may be classified into two types—verb-framed versus satellite-framed—according to how these two semantic components are lexically encoded. In verb-framed languages (V-languages), such as Spanish and Japanese, path is obligatorily expressed as a component of the verb, while manner of motion is (optionally) expressed as an adjunct phrase; by contrast, in predominantly satellite-framed languages (S-languages), such as English or Dutch, manner of motion is directly encoded on the verb, while path is expressed as a separate preposition (or particle). This contrast is illustrated in (1) and (2) below: In English and Dutch, path is expressed by the satellite element (across, over), while in French and Japanese, the same semantic notion is encoded in the main verb (traverse, wataru), with the manner component expressed as an (optional) adjunct phrase (en nageant, oyoi-de):

(1) a. He is swimming across the river. (English)

b. Hij zwemt de rivier over. (Dutch)
he swim-PRES the river over
‘He swims over the river.’

(2) a. Il traverse le fleuve en nageant. (French)
He cross-PRES the river in swim-GERUND
‘He crosses the river, swimming.’

b. 泳いで 川を 渡る (Japanese)
oyoi-de kawa-o wataru.
swim-BY river-ACC cross-PRES
‘Swimming, (He) crosses the river.’

The crucial point to observe about these examples is that this linguistic typology is orthogonal to broad-scale cultural, geographic, or indeed, genetic affiliation4: In this case, French and Japanese pattern together, in contrast to English or Dutch.

Slobin points out that this formal difference in language structure has important consequences for many aspects of language use, including—most relevantly—for narrative descriptions. For example, it is shown that S-language speakers use manner verbs significantly more often than V-language speakers when describing the same events (see Hsiao, 1999; Özçalışkan & Slobin, 1999); that S-language novels have greater type and token frequencies of manner verbs in situations in which human movement is described; and that S-language writers, overall, give their readers significantly more information—explicit and inferential—about the manners in which their protagonists move about (Özçalışkan & Slobin, 2000) than do V-language writers. Such observations suggest, at the very least, that one should be circumspect about ascribing differences in narrative description to cultural factors, since these typological groupings, which cross-cut cultural spheres of influence, also show clear correlations with narrative style.
The S-language/V-language parameter is not, of course, the only typological distinction to cross-cut genetic boundaries, nor—although it nicely illustrates our general point—is it the parameter that we consider best explains the Japanese-English contrast obtained in Masuda and Nisbett’s (2001) study. Instead, the typological linguistic variable that we believe to be at work here is the Head or Head-Directionality Parameter.

The Head Parameter: Overview
One of the most obvious ways in which languages vary syntactically is with respect to clausal word order—the position of phrasal constituents relative to one another. This type of variation at the clausal level results directly from the Head Parameter: Whether the head element of the phrases that make up a sentence appears to the left or right of its respective complement. In a consistently head-initial language, such as English or French, the verb precedes the direct object in the verb phrase, (temporal and modal) auxiliaries precede the verb phrase, clausal complementizers precede the embedded clause they introduce, and the language has prepositions, rather than postpositions. This is illustrated for English in (3), where in each example the relevant head(s) is/are indicated in bold, their complement phrases in italics:
(3) a. John [VP brokeV the vase ].
b. John [ModalP shouldM [NEGP notNEG [ASPP have [VP broken the vase]]]].
c. John said [CompP thatCOMP [S he hadn’tT brokenV the vase ]].
d. John danced [PP aroundP [NP the room [ inP [NP the palace]]]].

Exactly the opposite order is observed in a head-final language such as Japanese, Korean or Turkish: The verb follows its object; tense and mood affixes are invariably expressed as verbal suffixes (where these appear as auxiliaries, they follow the verb-phrase); complementizers appear to the right of complement clauses; the language is postpositional:5

(4) a. John-ga [VP kabin-o wattaV ].
John-NOM vase-ACC break-PAST
‘John broke the vase.’

b. [TP John-wa [[[VP kabin-o waruV ] bekide-waT ] na-NEG ] kattaT ].
John-NOM vase-ACC break should-NOM not-PAST
‘John should not have broken the vase.’

c. John-wa [ [ kabin-o watteV nai S] toCOMP CP] ittaV.
John-NOM vase-ACC broke not COMP say-PAST
‘John said that he hadn’t broken the vase.’

d. John-wa [[[[ kyuden NP] no P PP] hiroma NP] deP PP] odottaV.6
John-NOM palace of room in dance-PAST
‘John danced around the room in the palace.’
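The mirror-image relation between (3) and (4) can be made concrete with a toy linearization routine. The Python sketch below is purely illustrative: the `Phrase` class, the `linearize` function, and the simplified structure (which ignores subjects, case particles, and negation) are hypothetical names and assumptions introduced here for exposition, not part of any formal analysis.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Phrase:
    """A head together with its complement (hypothetical toy structure)."""
    head: str
    complement: Union["Phrase", str]


def linearize(node: Union[Phrase, str], head_initial: bool) -> List[str]:
    """Spell out a phrase with the head before or after its complement."""
    if isinstance(node, str):
        return [node]
    comp = linearize(node.complement, head_initial)
    # Head-initial: head precedes complement; head-final: head follows it.
    return [node.head] + comp if head_initial else comp + [node.head]


# Simplified skeleton of (3c)/(4c): said [that [broke [the-vase]]]
clause = Phrase("said", Phrase("that", Phrase("broke", "the-vase")))

print(" ".join(linearize(clause, head_initial=True)))   # English-like order
print(" ".join(linearize(clause, head_initial=False)))  # Japanese-like order
```

Under these assumptions, a single binary setting applied at every phrase yields the verb-object, complementizer-clause order of English in one case and the fully reversed order of Japanese in the other.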

In generative theory (e.g., Chomsky, 1981, 1995), only head-complement order is relevant to determining the head-parameter for a given phrase; see, in particular, Travis (1984). In other approaches, however—and especially within the typological framework initiated by Greenberg (1978)—all head-modifier relations are potentially relevant to determining the head-initial or head-final status of the language. Thus, the positions of attributive adjectives, relative clauses, possessor phrases, and subordinate adjunct clauses are also taken into account. By all of these measures also, Japanese is consistently head-final.

Not all languages display such consistent cross-categorical harmony in head-modifier order.7 Some languages, for example, project right-headed phrases for one syntactic category and left-headed phrases for another, so that it becomes harder to classify the language overall in terms of a single binary parameter. (Mandarin) Chinese is a case in point.

Huang (1994) provides a useful discussion of Chinese word order. The core facts are illustrated by the examples below, which reveal that Chinese is normally head-initial with respect to verbs and TAM auxiliaries (5a)/(5b)—including the position of clausal complements (5b)—and with respect to prepositional phrases (6), but head-final with respect to lexical noun phrases: Both nominal complements (7a) and nominal adjuncts precede the head-noun; relative clauses are internally headed by a right-peripheral head (7b).8 Once again, in each example the relevant head element is indicated in bold, the complement or modifier in italics:

(5) a. Zhangsan meiyou [VP kanjianV [NP Lisi]].
Zhangsan not-HAVE see Lisi
‘Zhangsan did not see Lisi.’

b. Zhangsan [VP zhidaoV [S Lisi [NEGP buNEG [AP chengshi]]]].
Zhangsan know Lisi not honest
‘Zhangsan knows that Lisi is not honest.’

(6) a. Zhangsan [VP zhuV [PP zaiP [NP Meiguo]]].
Zhangsan live at America
‘Zhangsan lives in the US.’

b. Zhangsan fang-le yi-ben shu [PP zaiP [NP zhuozi-shang]].
Zhangsan put-PERF one-CL book at table-top
‘Zhangsan put a book on the table.’

(7) a. [[[ yuyanxue NP] deP PP] yanjiuN NP]
linguistics DE research
‘the study of linguistics’

b. [[[ni zui xihuan S] deP PP] nei-ben shuN NP] mai-wan le.
you most like DE that-CL book sell-out PERF
‘The book that you like most has been sold out.’

Hence, with respect to purely syntactic properties, English and Japanese represent two ends of a grammatical continuum, with Chinese somewhere in the middle (though much more like English in terms of token frequency, and crucially, with respect to verbal projections).

The question that arises at this point is how this typological contrast—however interesting it may be from a linguistic perspective—should explain the attentional patterns observed in Masuda & Nisbett’s (2001) study: Why should head-finality predispose Japanese speakers to greater holistic attention or field-dependence? There are two responses to this question, the first relatively superficial, with few consequences for linguistic relativism, the second rather more complex, but with more interesting implications for the relationship between language and visual cognition.

Taking the superficial relationship first, notice that one of the dependent measures in the Masuda & Nisbett (2001) study was order of mention: Whether participants first mentioned the focal fish (Figure) or the background context (Ground) in their verbal descriptions. Masuda and Nisbett interpret the elements first mentioned as ‘more salient’ and conclude from the fact that Japanese participants consistently mentioned contextual information ahead of focal information that the Japanese group paid greater attention to the field than their American counterparts. Yet, as we have just seen, the grammatical and discourse structure of Japanese virtually guarantees this result: If contextual information is to be mentioned at all, it must be mentioned first, since the main predicate is canonically the final sentential constituent; conversely, the discourse structure of English affords American participants more opportunity to mention focal elements first in their verbal descriptions—wholly irrespective of the relative salience of Figure and Ground in their conceptual representations of the event.

If this observation is valid, then it follows that group differences in order of mention effects do not necessarily speak to the issue of holistic versus analytic attention. More importantly, this observation allows us to formulate clear predictions for a new study involving Japanese, English, and Chinese participants: If grammatical and discourse structure determine order of mention in visual descriptions, then the verbal reports of Chinese participants should be intermediate between the other two groups, patterning more closely with those of English speakers. That is, Chinese descriptions should follow the Figure-to-Ground order (from head to modifier) significantly more often than those of the Japanese group, who are expected to produce descriptions (almost exclusively) in the Ground-to-Figure order (from modifier to head). If, on the other hand, order of mention is determined by cognitive styles that are associated with cultural affiliation, then the Chinese and Japanese groups should pattern more closely with each other than either does with the English group. These predictions are explored in the first of the studies reported below.
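The logic of these competing predictions can be sketched in a few lines of Python. This is a minimal illustration only: the function name `supported_account` and the proportions in the usage example are hypothetical placeholders, not data from this or any other study, and the comparison deliberately ignores questions of statistical significance.

```python
def supported_account(figure_first: dict) -> str:
    """Compare the Chinese group's Figure-first rate with the other two groups.

    figure_first maps each group name to the proportion of descriptions
    that mention the focal object (Figure) before the background (Ground).
    """
    ja = figure_first["Japanese"]
    zh = figure_first["Chinese"]
    en = figure_first["English"]
    dist_to_english = abs(zh - en)
    dist_to_japanese = abs(zh - ja)
    if dist_to_english < dist_to_japanese:
        return "linguistic"   # Chinese patterns with English: Asian response is split
    if dist_to_japanese < dist_to_english:
        return "cultural"     # Chinese patterns with Japanese: pan-Asian response
    return "undecided"


# Hypothetical placeholder proportions, for illustration only.
example = {"Japanese": 0.1, "Chinese": 0.7, "English": 0.9}
print(supported_account(example))
```

With these invented values the Chinese rate lies closer to the English rate, which is the pattern the linguistic account predicts; the reverse configuration would favor the cultural account.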

A more interesting question, however, is whether it is possible to connect this typological difference to the other dependent measures in the Masuda & Nisbett (2001) study (besides order of mention). It will be recalled that order of mention was only one of three measures that distinguished Japanese from American participants. The other two were the number of contextual (Ground) features mentioned by each participant for each description, and—most interestingly still—the number of contextual features correctly recalled on re-presentation of still fragments: Japanese participants not only mentioned significantly more background details, they also remembered better which elements they had previously observed. This last measure, in particular, is not obviously related to linguistic typology.
And yet it might be. What is distinctive about Slobin’s (2003) thinking for speaking hypothesis is the extent to which linguistic structures are assumed to penetrate cognition in various ways. In contrast, for example, to Levelt (1989), who also entertains a form of the thinking for speaking hypothesis, but who supposes that effects of language are restricted to the time of utterance—that is, it is only when one prepares to speak that language affects conceptualization—Slobin (2003) speculates that thinking for speaking effects could extend beyond speech time and may induce speakers to form specific attentional patterns, even in the absence of language. In support of this speculation, Slobin cites the work of Pederson, Levinson et al. (1998):

Far more than developing simple habituation, use of the linguistic system, we suggest, actually forces the speaker to make computations he or she might otherwise not make. . . . That is, the linguistic system is far more than just an available pattern for creating internal representations: to learn to speak a language successfully requires speakers to develop an appropriate mental representation which is then available for non-linguistic purposes. (Pederson, Levinson et al., 1998, p. 586 [emphasis in original])
Thus, for example, when we speak English, we are forced to pay attention to the gender of third parties, because the language requires gender specification of (singular) pronouns. On the other hand, when we speak Japanese, we are forced to direct our attention to the asymmetric relationships between individuals: elder/younger, senior/junior, or close/remote, because of the language’s honorific systems. As a consequence, English speakers form a habit of attending to gender and Japanese speakers develop a habit of attending to human relations in non-linguistic contexts also. Slobin thus assumes that thinking for speaking effects induce language-specific attentional preferences beyond the linguistic domain.

In the present case, let us assume that the grammatical and discourse structure of Japanese also impacts on conceptual structures. This structure is known to have effects on syntactic processing and production, leading to bottom-up parsing routines, as opposed to the top-down strategies of English parsing mechanisms (see Fodor & Inoue, 1994; Nakayama, 1999, for discussion). Suppose that, as is the case for syntactic constituents, final conceptual representations (including representations of causality) are constructed, as they are reported, “from the bottom up”: from background context to focal elements/main arguments, as schematized in Figure 3.
Our conjecture, then, extending the Thinking for Speaking hypothesis, is that the (top-down vs. bottom-up) parsing mechanisms that implement different (head-initial vs. head-final) grammatical settings are reflected in the speakers’ discourse strategies, which in turn influence their attentional patterns. Japanese speakers build up phrase structure from complements and/or modifiers to heads; in other words, from semantically and syntactically peripheral elements to core elements, in a bottom-up fashion. It plausibly follows from this that Japanese speakers are predisposed to plan and interpret discourse by placing peripheral elements ahead of the main point. This notion is supported by anecdotal observation: When describing, explaining, excusing, arguing, or persuading, Japanese speakers tend to begin their statements with peripheral elements (reasons, situations, and contexts), before referring to the main points (effects, intentions, and conclusions). As a consequence, they may have developed a perceptual habit of attending to the entire field.

In short, our hypothesis is that Japanese speakers are more likely to attend to contextual information primarily because the grammatical and discourse structure of the language requires speakers to mention contextual information ahead of focal information.

As strained as it may appear at first, this hypothesis nevertheless generates rather clear predictions concerning the behavior of Chinese participants in the present study, with respect to the other two dependent variables at stake (viz., number of mentions of background elements/correct identification of background elements in subsequent presentation). If the head-directionality parameter plays a significant role accounting for the differences between Japanese and American participants in Masuda & Nisbett’s (2001) original study, then Chinese participants are expected to pattern with English participants with respect to these dependent measures, splitting the Asian response. It will be clear that Masuda & Nisbett’s cultural explanation predicts a quite different split.

General Method
In order to test the hypothesis outlined above, we constructed two linguistic tasks to compare the verbal descriptions of Japanese, English, and Chinese speakers with respect to foregrounded and backgrounded information. The first task (story-telling) involved explaining the events depicted in a pivotal scene in well-known children’s story books (Anno, 1977; Donaldson & Scheffler, 2007; Ungerer, 1975); the second (picture description) task involved more straightforward description of unfamiliar photographs. Subsequently, as in the Masuda & Nisbett (2001) experiment, fragments of the images previously shown were presented to participants for identification (fragments identification test); the participants were also asked information questions about the photographs they had seen (information recall test).

Participants
The same 120 participants took part in all of the experiments: 43 Japanese native speakers (21 women, 22 men) were recruited from the University of Keio, Japan, 33 English native speakers (23 women, 10 men) from the University of Sheffield, UK, and 44 Chinese native speakers (25 women, 21 men) from the International Study Institute Chukyo, Japan. All of the participants were undergraduate or postgraduate university students, or language school students preparing for university entrance, aged 18-30 years old. The Japanese and Chinese participants were tested in Japan, and the English participants were tested in the UK. All associated language materials were translated, and presented to each group in their own language, by native-speaker experimenters.

Experiment 1
We first conducted a story-telling task to examine the discourse preferences of Japanese, English, and Chinese speakers: Whether contextual information was mentioned before the main point or vice-versa. Given our hypotheses outlined above, it was predicted that the responses of the Japanese group should diverge significantly from those of the other two groups.

Procedure
Participants were presented with illustrations extracted from three well-known picture books for children (Anno, 1977; Donaldson & Scheffler, 2007; Ungerer, 1975) and asked to provide a written description of each of these in their native language, by clarifying when, where, and why the depicted events and situations were taking place. The responses were then coded in terms of the kinds of information included in each description, according to the following descriptors.
  • 1. Main events and situations: description of the main story, explaining what main characters are doing in each scene.
  • 2. Peripheral events and situations: description of the field, explaining what is shown in each scene but is not explicitly related to the main story or the main characters; for example, the moon is shining, the woods are covered with snow.
  • 3. Time: description of the time, explaining when the depicted events and situations were taking place, such as at night, in the daytime, in winter, during summer, and so on.
  • 4. Place: description of the place, explaining where the depicted events and situations were taking place, such as at the beach, in the woodland, in a small village.
  • 5. Cause: description of the reason, explaining what had caused the depicted events and situations to occur. Note that only constituents that were explicitly mentioned with markers for causal relations are included in this category, such as because, as, since, therefore, accordingly, so, and so forth.
  • 6. Inferred antecedent events and situations: description of the events and situations that were not shown in each scene but were inferred to have happened before the depicted events and situations occur.

    An example of the responses for the Mouse-Shadow picture (see Figure 4) is as follows:
    There is a full moon (2. peripheral events and situations)
    at night. (3. time)
    In a forest, (4. place)
    a rabbit is scared (1. main events and situations)
    because it sees the enlarged shadow of the mouse reflected in the moonlight against the snow. (5. cause)
    The mouse might have been trying to retaliate against the rabbit, who has always bullied him. (6. inferred antecedent events and situations)

    This sample response was thus coded as 234156. In this way, all of the responses were coded into these six categories and labeled with such serial numbers. The overall measure of interest was the average number of times (out of a possible total of three descriptions, one for each picture) that some type of contextual information (2-6) was mentioned ahead of the main point (1), for each language group (Measure 1). We also determined the quantity of contextual information mentioned before the main point, that is, how many contextual descriptions indicating time, place, cause, the field, or inferred antecedent events were made before the main events and situations were mentioned, for each language group (Measure 2), as well as the relative numbers of different contextual elements overall, in any order of mention (Measure 3).
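The three measures can be read directly off the serial codes. The following is a minimal Python sketch of one way to do so, not the authors' actual scoring procedure. The function names are hypothetical, and two choices are assumptions: a response containing no main-event segment is treated as having no context "ahead of the main point", and Measure 3 is taken to be a simple count of contextual segments in the code.

```python
MAIN = "1"               # category 1: main events and situations
CONTEXT = set("23456")   # categories 2-6: contextual information

def context_before_main(code: str) -> str:
    """Return the contextual categories that precede the first main-event segment.

    Assumption: if the response has no main-event segment, nothing counts
    as preceding the main point, so an empty string is returned.
    """
    idx = code.find(MAIN)
    return "" if idx == -1 else code[:idx]

def measure1(codes: list[str]) -> int:
    """Number of descriptions (one per picture) that open with contextual information."""
    return sum(1 for code in codes if context_before_main(code))

def measure2(code: str) -> int:
    """Number of contextual descriptions made before the main point."""
    return len(context_before_main(code))

def measure3(code: str) -> int:
    """Number of contextual segments in the response, in any order of mention."""
    return sum(1 for ch in code if ch in CONTEXT)

# The sample response from the text, coded as 234156:
sample = "234156"
print(context_before_main(sample))  # 234
print(measure2(sample))             # 3
print(measure3(sample))             # 5
```

For a participant whose three descriptions were coded as, say, 234156, 1234, and 41 (the last two are invented for illustration), `measure1` would return 2, since only the first and third descriptions place context ahead of the main point.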

    Results and Discussion
    All three measures revealed clear and reliable differences among the three language groups. First, Figure 5 shows the average number of times that some kind of contextual information (2-6) was mentioned ahead of the main point (Measure 1), out of a total of three responses (one per picture).

     As predicted, the Japanese group were considerably more likely to begin their descriptions with contextual information (M = 2.86) than the Chinese (M = 2.33) and the English group (M = 1.55). An analysis of variance revealed a reliable main effect of Language (F(2, 118) = 21.357, p < .01), with post-hoc comparisons (Bonferroni) showing significant differences between the English and Japanese group and between the English and Chinese group at the p < .01 level, and between the Japanese and Chinese group at the p < .05 level.
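For readers less familiar with the test, a one-way analysis of variance compares the variance between group means with the variance within groups. The following Python sketch shows that arithmetic under stated assumptions: the function name `one_way_anova` is hypothetical, and the scores in the usage example are invented toy values, not data from this study.

```python
def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way between-subjects design."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Toy illustration only (invented scores, three groups of three):
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value, df_b, df_w = one_way_anova(toy)
print(f_value, df_b, df_w)  # 3.0 2 6
```

A large F indicates that the group means differ by more than the spread of scores within groups would lead one to expect; the post-hoc comparisons then identify which pairs of groups differ.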

    We then investigated the quantity of contextual information mentioned ahead of the main point (Measure 2). Figure 6 shows how many contextual descriptions were made before mentioning the main events and situations by English, Chinese and Japanese participants. Again as predicted, the Japanese group reported the highest number of contextual descriptions before the main events (M = 2.87), followed by the Chinese (M = 2.09) and the English group (M = 1.19). The data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variable, and Picture (3 levels: Picture 1, Picture 2, Picture 3) as the within-subjects variable, the dependent variable being the number of contextual descriptions reported before the main events and situations. The analysis revealed a reliable main effect of Native Language, F(2, 110) = 19.019, p < .01, with no interaction found between Native Language and Picture. Post hoc tests (Bonferroni) revealed significant differences between the Japanese and English participants and between the Chinese and English participants at the p < .01 level, and also between the Chinese and Japanese participants at the p < .05 level.


    Further analysis revealed that the Japanese participants' tendency to report more contextual information before the main events and situations was especially pronounced in three areas: Time-, Place-, and Field-related information. Figure 7 shows the percentage of responses in which time-, place-, inferred antecedent event-, or field-related information was mentioned ahead of the main events and situations within each language group. Time-related information was reported ahead of the main point in 79.1% of the Japanese, 59.8% of the Chinese, and 42.9% of the English responses across all three pictures: Statistically significant differences were found only between the Japanese and English participants, χ²(4) = 40.187, p < .01. Place-related information was reported ahead of the main events in 82.2% of the Japanese, 51.2% of the Chinese, and 43.9% of the English responses, with significant differences being found in two contrasts: English versus Japanese, and Chinese versus Japanese, χ²(4) = 57.333, p < .01. Field-related information appeared prior to the mention of main events in 39.5% of the Japanese, 25.2% of the Chinese, and 11.2% of the English responses, with significant differences being found only between English and Japanese participants, χ²(4) = 34.844, p < .01. Concerning inferred antecedent events mentioned ahead of the main events, the obtained data were too few to be entered into statistical analysis.



    In addition, Figure 8 shows the relative percentages of cause-effect orders in the descriptions that included causal relations. As can be seen, both Japanese and Chinese participants showed a tendency to mention a cause ahead of its effect in explaining causal relations, while English participants clearly preferred to mention an effect prior to its cause, χ²(4) = 54.072, p < .01.

    Finally, Figure 9 shows the overall numbers of different contextual elements reported in participants’ responses, in any order of mention (Measure 3). The results also revealed Japanese participants’ context dependency, with the highest number of contextual elements reported across the three pictures. The data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variable, and Picture (3 levels: Picture 1, Picture 2, Picture 3) as the within-subjects variable. The analysis revealed a reliable main effect of Native Language, F(2, 110) = 13.579, p < .01, as well as a significant interaction between Native Language and Picture, F(4, 220) = 3.216, p < .05. Post hoc tests (Bonferroni) revealed significant differences between the Japanese and English participants in Pictures 1 and 2 at the p < .01 level, and between the Japanese and Chinese groups in Pictures 2 and 3 at the p < .01 level, but none between English and Chinese participants in any condition.


    Taken together, these results at once confirm the findings of the Masuda & Nisbett (2001) study (that is, that Japanese speakers clearly prefer to mention more peripheral elements, and earlier in their descriptions, than do English speakers) and challenge the previous interpretation, namely, that this difference is the result of some pan-Asian cultural predisposition. The reason that these results pose a challenge is that the Chinese group, as predicted by their linguistic typology, exhibits intermediate behaviour, sometimes patterning with the Japanese participants (for example, with respect to mention of Cause before Effect, Figure 8), sometimes with the English group (e.g., with respect to the total numbers of peripheral items mentioned, shown in Figure 9). In other words, the results show that the bottom-up parsers (Japanese speakers) prefer the bottom-up discourse style and the top-down parsers (English speakers) employ the top-down discourse style, with Chinese speakers located intermediate between the two language groups, as predicted by their grammatical typology.

    The following sample illustrates a typical response of Japanese participants for the Mouse-Shadow picture (Figure 4), with abundant contextual information mentioned before the main events and situations:
    ‘It is at night with a full moon shining. In a snowy forest, an animal is walking with a stick searching for food. Then, a little mouse standing on a tree branch, who has been bullied by the bigger animal, hit upon a good idea. With the use of the moonlight, he enlarged his own shadow and cast it onto the ground. The animal, seeing the enlarged shadow, thinks it is a monster and is scared.’

    At the very least, the results of this first experiment lend support to the idea that culture may not be the sole, or even key, determinant of the differences observed in the earlier study. As we shall see directly, the next two tasks offer even clearer reasons for scepticism: Not only do Chinese and Japanese participants behave differently, but the ‘Asian response’ is actually split.

    Experiment 2
    In the second experiment, we conducted two related tasks to determine whether there was any difference in attentional patterns toward the field when making visual descriptions. Our hypothesis was that top-down versus bottom-up parsing patterns embedded in participants’ discourse styles would exercise a larger effect on responses than any cultural affiliation. That is to say, it was anticipated that, as bottom-up parsers (Japanese speakers) need to refer to context first in their discourse, they should attend more to the field than would top-down parsers (English and Chinese speakers); as a consequence, Japanese speakers should mention disproportionately more peripheral elements than central elements—and may recall these better—than either English or Chinese speakers.
    The second experiment comprised two tasks, each with its own dependent measures: a picture description task and a visual recall task. In the picture description task, we presented participants with a set of three color photographs and asked for a written description. The description task served two purposes: first, to discover which visual features of shared scenes participants mention in written reporting; second, simply to show participants the photographs for subsequent recall. Participants were not aware that some parts of the photographs would be extracted and presented again in another set of pictures in the visual recall phase. In the subsequent visual recall task, participants were shown those extracted portions of photographs and asked to judge whether these formed parts of the pictures that they had already seen.

    Experiment 2, Phase 1: Picture Description

    Procedure
    The participants saw three photographs in turn and provided a written description for each of these. The photographs selected for description included both salient focal objects and smaller peripheral objects (see Figures 10, 11, and 12). Participants’ responses were then analyzed to determine which aspects of the photographs were more attended to: central objects or peripheral objects.9




    Results and Discussion
    The participants’ responses are charted in Figure 13. Notice that, as expected, the largest differences are observed in the mean number of peripheral items mentioned by each participant across groups. Here, the Japanese behave once again as predicted, with significantly higher mentions of peripheral items (M = 15.05) than the English group (M = 7.48). Notice in particular that this preference for peripheral items is not shared by Chinese participants, who actually mentioned fewer peripheral elements than even the English group (M = 3.43). The description data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variable, and Item Location (2 levels: Central, Peripheral) and Picture as within-subjects variables, the dependent variable being the number of items mentioned. The analysis revealed a reliable main effect of Native Language, F(2, 119) = 50.844, p < .01, as well as a significant interaction between Native Language and Location, F(2, 119) = 32.547, p < .01. The source of this interaction is clearly suggested by the plot in Figure 13.

    Post hoc tests (Bonferroni) revealed significant differences between the Chinese and Japanese participants and between the English and Japanese participants with respect to peripheral items mentioned at the p < .01 level, and also between the Chinese and English participants at the p < .05 level.

    However, the Chinese group's relatively low number of descriptions of the peripheral items might be attributed to the fact that Chinese participants’ total number of descriptions for each photograph was much smaller than that of the other two groups overall. Therefore, we also examined whether there was any difference among the three language groups in the ratio of peripheral items to central items mentioned. The results showed that this ratio was 83% (Chinese), 112% (English), and 213% (Japanese), respectively. The results were entered into a repeated measures ANOVA with Native Language as the between-subjects variable, Picture as a within-subjects variable, and the ratio of peripheral items to central items mentioned in each response of each participant as the dependent variable. The analysis again revealed a reliable main effect of Native Language, F(2, 106) = 25.168, p < .01, without any interaction with Picture. Post hoc tests (Bonferroni) revealed significant differences between the Chinese and Japanese participants and between the English and Japanese participants, both at the p < .01 level, but none between English and Chinese participants.
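To make the ratio measure concrete, the short Python sketch below computes the percentage of peripheral to central items for a single response. The function name `peripheral_ratio` is hypothetical, the item counts in the usage lines are invented for illustration, and the handling of a response with no central items is an assumption, since the text does not say how such cases were treated.

```python
def peripheral_ratio(peripheral: int, central: int) -> float:
    """Peripheral items mentioned as a percentage of central items mentioned.

    Assumption: a response with no central items has no defined ratio,
    so it is rejected here rather than scored.
    """
    if central == 0:
        raise ValueError("ratio is undefined when no central items are mentioned")
    return 100.0 * peripheral / central

# Invented counts: a value above 100 means more peripheral than central items.
print(round(peripheral_ratio(5, 6), 1))   # 83.3
print(round(peripheral_ratio(15, 7), 1))  # 214.3
```

Because the ratio is normalized by each response's own count of central items, it is not affected by how much a participant writes overall, which is the concern raised about the Chinese group's shorter descriptions.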

    It will be recalled that the focus of Masuda & Nisbett’s (2001) study was on differences between Japanese and American participants’ mention of peripheral items (the result, it was claimed, of East Asians’ superior attention to Ground over Figure). In the present study, however, the Chinese group mentioned the lowest number of peripheral items, both in absolute and relative terms. Statistically, the largest differences that were observed distinguished Chinese and Japanese participants, with no statistically significant difference found between Chinese and English participants. The Asian response was thus effectively split, divided by the responses of the English group. Whatever explains this pattern cannot plausibly be related to cultural affiliation.

    Experiment 2, Phase 2: Visual Recall Task
    The final task compared the three groups with respect to visual recall by testing participants’ accuracy in identifying peripheral fragments and recalling information about Ground details. Once again, our hypothesis predicted a split in the Asian response, whereas Masuda & Nisbett’s (2001) cultural interpretation predicted a pan-Asian advantage over the English participant group.

    Procedure
    To construct the visual recall task, peripheral portions were extracted from each of the three photographs shown in the picture description task and from one of the pictures used in the first story-telling task, five portions from each. These extracted fragments (n = 20) were combined with another 20 fragments extracted from novel pictures to create a set of 40 small pictures (2.5 × 2.5 cm): see Figure 14.

    Participants were presented with this set of extracted portions and asked to identify the parts of the pictures that they had already seen in the previous tasks. Each participant’s score was counted according to the number of correctly identified portions: one point for correct identification and equally one point for correct rejection, 40 points in all. Following this identification test, participants were also presented with a forced choice memory test, including questions of the following kind: “In the beach picture, is the balloon yellow or red?” or “In the windmill picture, was the man behind the boy wearing shorts or long trousers?” and so on. There were twelve such questions in all: three questions relating to each of the four pictures used in the identification phase. Thus, this part of the experiment had two dependent measures: (i) number of correctly identified or rejected fragments; (ii) number of correctly answered information questions.
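The scoring rule for the identification test (one point for each correct identification and one for each correct rejection) can be sketched in Python as follows. This is an illustration under assumptions rather than the scoring script used in the study: the function name, the fragment identifiers, and the example participant's answers are all hypothetical.

```python
def identification_score(said_seen: dict[int, bool], actually_seen: set[int]) -> int:
    """Score = correct identifications (hits) + correct rejections.

    said_seen maps each fragment id to the participant's judgment
    (True = "I have seen this before"); actually_seen holds the ids of
    fragments that really were taken from previously shown pictures.
    """
    hits = sum(1 for frag, said in said_seen.items()
               if said and frag in actually_seen)
    correct_rejections = sum(1 for frag, said in said_seen.items()
                             if not said and frag not in actually_seen)
    return hits + correct_rejections

# Hypothetical layout: fragments 0-19 come from seen pictures, 20-39 are novel.
old_fragments = set(range(20))

# Invented participant: recognizes 15 old fragments and wrongly accepts 3 novel ones.
answers = {frag: (frag < 15 or 20 <= frag < 23) for frag in range(40)}
print(identification_score(answers, old_fragments))  # 32
```

Counting correct rejections alongside hits means that a participant cannot reach a high score simply by answering "seen" to every fragment: that strategy would earn only the 20 points available for old fragments.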

    Results and Discussion

    Figure 15 displays the picture fragment identification scores for the three language groups. As predicted, the Japanese group recorded the highest average score (M = 31.40); this was followed by the English (M = 26.79) and then the Chinese group (M = 22.07). An analysis of variance revealed a reliable main effect of Language (F(2,118) = 45.164, p < .01): Post-hoc comparisons (Bonferroni) showed significant differences between all three language pairs, all ps < .01.


    The results for the subsequent forced-choice memory test were also along the same lines (see Figure 16). The Japanese group obtained the highest score (M = 8.02), followed by the Chinese (M = 6.18) and the English group (M = 5.70). Again, a reliable main effect of language was found, F(2,118) = 14.952, p < .01, and again, post-hoc tests (Bonferroni) showed significant differences between the scores for the Chinese and Japanese groups, and the English and Japanese groups both at the p < .01 level, but no difference between the Chinese and English groups.10

    These findings thus clearly demonstrate that there is no common Asian response in the visual recall task: Instead, the Chinese participants pattern to a large degree with the English group, separately from the Japanese group.

    General Discussion
    The experiments reported here examined the attentional biases of Japanese, English, and Chinese speakers across a range of linguistic and non-verbal tasks. Our hypothesis was that top-down versus bottom-up parsing mechanisms would be reflected in the speakers’ discourse strategies, with top-down parsers (English and Chinese speakers) having a preference for top-down discourse patterns—the main point being mentioned before contextual information—and bottom-up parsers (Japanese speakers) building a discourse in which contextual information precedes the main point. These discourse strategies were expected to exercise a larger effect on responses than any cultural affiliation.
    We tested this hypothesis by means of narrative description (story-telling), picture description, and visual recall tasks. In the story-telling task, it was clearly shown that Japanese speakers were more likely to report contextual information before the main point than either English or Chinese speakers, consistent with the linguistic typology. Next, in the picture description and visual recall experiment, it was clearly revealed that Japanese speakers not only reported more background detail, but also recalled details about peripheral information significantly more accurately, than English and Chinese speakers, as evidenced by reliably higher scores on both identification and information recall tasks.
    The present findings across three tasks are thus consistent with the hypothesis advanced here, namely, that the grammatical structure of particular languages predisposes speakers to particular attentional patterns: In other words, these results are consistent with a particular—and rather far-reaching—interpretation of the thinking for speaking hypothesis (Slobin 2003).

    Perhaps more significant than what these results speak for is what they speak against: They allow us to reject not only the null hypothesis (that a speaker’s native language has no reliable effect on visual attention and recall scores) but also the alternative hypothesis presented in Masuda & Nisbett (2001), that is, that Japanese participants’ enhanced ability to identify peripheral elements in visual scenes is due to cultural attributes common to Asian cultures. Our experiments show that across all tasks involving visual recall Chinese participants generally behave more like English participants than like Japanese, regardless of cultural affiliation. To reinforce the point: this is in spite of the fact that the test subjects were Chinese students studying Japanese in Japan. Whatever definition of culture one might employ, the expectation must surely be that if cultural factors determine response, these two groups should pattern together. The fact that they did not renders suspect any explanation in terms of immanent cultural values, at least with respect to these data.

    Three final points are worth mentioning. First, as noted above (footnote 1), these are not unprecedented results. The findings of the present study directly confirm those of an independent set of studies reported elsewhere (Anon, 2010), where again the Chinese group’s results patterned directly with those of the English group and separately from the Japanese across a similar set of description and recall tasks. Thus, in a total of six different tasks, we have found that the Asian response has been split in ways that challenge the cultural relativity explanation advanced by Nisbett and his colleagues.
    The second point to observe is that these results do not necessarily refute Masuda and Nisbett’s (2001) interpretation of their own data (though such an explanation would be untenable in the present case). Our results, at the data level, are entirely compatible with those obtained by Masuda & Nisbett for their Japanese and American participants: We also endorse their concluding suggestion that “[typical] Japanese may simply see far more of the world than do [typical] Americans” (Masuda & Nisbett, 2001, p. 933 [added by authors]). Where we disagree is in respect of the most plausible explanation of this difference. We claim that if the suggestion is true, then it is not primarily because they are East Asians, but because they speak a head-final language, that Japanese speakers may see far more of the world. Such an interpretation would cover both our and Masuda & Nisbett’s findings: The alternative, that a different explanation applies to our respective experiments, seems to be considerably more ad hoc.

    Note that we do not discount the view that cultural traditions may influence modes of perception or categorization in other cases. For example, in other studies, Nisbett and his colleagues have found cognitive (attitudinal and perceptual) differences between ethnic groups of participants sharing a common first language (e.g., monolingual speakers of Turkish in Uskul, Kitayama & Nisbett, 2008; second generation Korean vs. European Americans in Choi, Dalal & Kim-Prieto, 2000): unless the latter participants were also bilingual, then our hypothesis could not be applied to explain such results. All that is claimed here is that observed differences between English and Japanese speakers in these particular types of description and visual recall tasks are better explained in linguistic, rather than cultural, terms: If it were otherwise, Chinese and Japanese participants should pattern together.11

    Notice finally that our hypothesis generates further, testable predictions about cross-linguistic splits and groupings within and across cultural spheres. For instance, given our hypothesis, speakers of Korean, another head-final language, should behave like Japanese participants in visual recall, while Vietnamese and Thai speakers should pattern with the Chinese participants. These predictions will be tested in future experiments. If all effects of language structure, whether it is the grammatical parameter of head position or particular discourse strategy, can be shown to outweigh effects of culture for speakers of other languages, then our results will constitute a significant challenge to this part of the evidence base for cultural relativism.

    References
    Anno, M. (1977). Tabi no ehon. Tokyo: Fukuinkan Shoten.
    Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conception of time. Cognitive Psychology, 43, 1-22.
    Boroditsky, L. (2003). Linguistic relativity. In L. Nadel (Ed.), Encyclopedia of cognitive science (pp. 917-921). London: Macmillan Press.
    Bowerman, M. (1996). The origins of children’s semantic categories: Cognitive vs. linguistic determinants. In J. Gumperz, & S. Levinson (Eds.), Rethinking linguistic relativity (pp. 145-176). Cambridge: Cambridge University Press.
    Choi, I., Dalal, R., & Kim-Prieto, C. (2000). Information search in causal attribution: Analytic vs. holistic. Urbana-Champaign: University of Illinois.
    Chomsky, N. (1981). Lectures on government and binding. Dordrecht, Holland; Cinnaminson, N.J.: Foris Publications.
    Chomsky, N. (1995). The Minimalist program. Cambridge, MA: MIT Press.
    Donaldson, J., & Scheffler, A. (2007). The Gruffalo's child. London: Campbell Books.
    Duffield, N., & Tajima, Y. (2010). On the non-uniformity of Asian thinking (for speaking): A response to Masuda and Nisbett. In M. Iverson, I. Ivanov, T. Judy, J. Rothman, R. Slabakova, & M. Tryzna (Eds.), Proceedings of the 2009 Mind/Context Divide Workshop (pp. 28-39). Somerville, MA: Cascadilla Proceedings Project.
    Fodor, J. D., & Inoue, A. (1994). The diagnosis and cure of garden paths. Journal of Psycholinguistic Research, 23(5), 407-434.
    Greenberg, J. (1978). Some universals of grammar with particular reference to the order of meaningful elements. In J. Greenberg (Ed.), Universals of language. Cambridge, MA: MIT Press.
    Hawkins, J. (1990). A parsing theory of word order universals. Linguistic Inquiry, 21, 223-260.
    Hsiao, A. H.-H. (1999). Holding the frog in place: Linguistic typology of Mandarin Chinese. Unpublished senior honors thesis, University of California, Berkeley. [cited in Slobin (2003)].
    Huang, C.-T. J. (1994). More on Chinese word order and parametric theory. In B. Lust, M. Suñer, & J. Whitman (Eds.), Syntactic theory and first language acquisition: Cross-linguistic perspectives, Vol. 1: Heads, projections and learnability (pp. 15-35). Hillsdale, NJ: Lawrence Erlbaum Associates.
    Kitayama, & Nisbett, (2008)
    Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
    Li, P., & Gleitman, L. (2002). Turning the tables: language and spatial reasoning. Cognition, 83, 265-294.
    Maguire, E. A., Gadian, D. G., Johnsrude, I. S., Good, C. D., Ashburner, J., Frackowiak, R. S. J., & Frith, C. D. (2000). Navigation-related structural change in the hippocampi of taxi drivers. PNAS, 97(8), 4398-4403.
    Malt, B., Sloman, S., Gennari, S., Shi, M., & Wang, Y. (1999). Knowing vs. naming: Similarity and the linguistic categorization of artifacts. Journal of Memory and Language, 40, 230-262.
    Masuda, T., & Nisbett, R. E. (2001). Attending holistically vs. analytically: Comparing the context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology, 81, 922-934.
    Nakayama, M. (1999). Sentence processing. In N. Tsujimura (Ed.), Handbook of Japanese linguistics (pp. 398-424). Oxford: Blackwell.
    Nisbett, R. E. (2003). The geography of thought: How Asians and Westerners think differently . . . and why. New York: Free Press.
    Nisbett, R. E., & Masuda, T. (2003). Culture and point of view. PNAS, 100(19), 11163-11170.
    Nisbett, R. E., & Miyamoto, Y. (2005). The influence of culture: Holistic vs. analytic perception. Trends in Cognitive Sciences, 9(10), 467-473.
    Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic vs. analytic cognition. Psychological Review, 108, 291-310.
    Özçalışkan, Ş., & Slobin, D. I. (1999). Learning how to search for the frog: Expression of manner of motion in English, Spanish, and Turkish. In A. Greenhill, H. Littlefield, & C. Tano (Eds.), Proceedings of the 23rd Annual Boston University Conference on Language Development, Vol. 2 (pp. 541-552). Somerville, MA: Cascadilla Press.
    Özçalışkan, Ş., & Slobin, D. I. (2000). Climb up vs. ascend climbing: Lexicalization choices in expressing motion events with manner and path components. In S. C. Howell, S. A. Fish, & T. Keith-Lucas (Eds.), Proceedings of the 24th Annual Boston University Conference on Language Development, Vol. 2 (pp. 558-570). Somerville, MA: Cascadilla Press.
    Pederson, E., Danziger, E., Levinson, S., Kita, S., Senft, G., & Wilkins, D. (1998). Semantic typology and spatial conceptualization. Language, 74, 557-589.
    Slobin, D. I. (1996). Two ways to travel: Verbs of motion in English and Spanish. In M. Shibatani, & S. A. Thompson (Eds.), Grammatical constructions: Their form and meaning (pp. 195-219). Oxford: Oxford University Press.
    Slobin, D. I. (1997). Mind, code, and text. In J. Bybee, J. Haiman, & S. A. Thompson (Eds.), Essays on language function and language type (pp. 437-467). Amsterdam: John Benjamins Publishing Company.
    Slobin, D. I. (2000). Verbalized events: A dynamic approach to linguistic relativity and determinism. In S. Niemeier, & R. Dirven (Eds.), Evidence for linguistic relativity. Amsterdam: John Benjamins Publishing Company.
    Slobin, D. I. (2003). Language and thought online: Cognitive consequences of linguistic relativity. In D. Gentner, & S. Goldin-Meadow (Eds.), Language in mind: Advances in the study of language and thought (pp. 157-192). Cambridge, MA: MIT Press.
    Talmy, L. (1975). Semantics and syntax of motion. In J. P. Kimball (Ed.), Syntax and semantics, Vol. 4 (pp. 181-238). New York: Academic Press.
    Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description, Vol. 3: Grammatical categories and the lexicon (pp. 57-149). New York: Cambridge University Press.
    Talmy, L. (2000). Toward a cognitive semantics: Typology and process in concept structuring, Vol. 2. Cambridge, MA: MIT Press.
    Travis, L. (1984). Parameters and effects of word order variation. Unpublished PhD dissertation, MIT.
    Ungerer, T. (1975). Emile. Tokyo: Bunka Publishing Bureau.
    Uskul, A. K., Kitayama, S., & Nisbett, R. E. (2008). Ecocultural basis of cognition: Farmers and fishermen are more holistic than herders. PNAS, 105, 8552-8556.

    Footnotes
    1 As well as that which immediately precedes it (see Duffield & Tajima 2010). The current study offers a completely new set of experiments, in which we attempted to remedy some methodological shortcomings of the original task. This new experiment (with different participants, materials, and modes of analysis) provides even clearer support for our original hypothesis.
    2 Throughout, following Nisbett’s own practice, the term Asian is taken to refer to East Asian ethnic and national groups (especially Chinese, Japanese and Korean groups), rather than to South Asian groups (which is the British default usage of the term): it is unlikely that Nisbett’s claims are intended to extend to any groups beyond the (historical) Han Chinese sphere of influence.
    3 This problem comes to the fore in respect of Nisbett’s subsequent work on other cultural groups (Uskul, Kitayama & Nisbett 2008): see below.
    4 Other V-languages in Slobin’s survey include Turkish, Spanish, and Hebrew; other S-languages include Mandarin and Russian.
    5 It should be clear that the terms head-initial and head-final are completely independent of writing order: Arabic and Hebrew, for example, are head-initial languages that are written from right-to-left; Turkish is a predominantly head-final language written left-to-right.
    6 The constituent analysis proposed here simplifies, but does not fundamentally misrepresent, the phrase-structure of Japanese. (It may be, for example, that the genitive element no and the post-nominal marker de should bear other category labels, but this does not change the fact that adpositional phrases in Japanese are consistently head-final.)
    7 The term is due to Hawkins (1990).
    8 We are naturally aware of the fact that many generativist linguists, including Huang himself, would treat Mandarin Chinese as underlyingly head-final in the verb-phrase, with verb-movement deriving the overt head-initial order. Be that as it may, what is relevant here are the surface configurations that provide the instructions for parsing and syntactic production: at this level, Chinese patterns—on balance—more like English than like Japanese.
    9 The items classified as central and peripheral for each photograph in the picture description task are as follows:
    For the Windmill picture (Figure 10), central items are the boy in a yellow T-shirt and the green windmill, while peripheral items are restaurant, table, chair, patrons, trees, shade, European street, passers-by, sunny, summer season, basket, instrument, signboard, pillar, sack, posters, balcony, building, and fallen leaves.
    For the Beach picture (Figure 11), the central item is the boy smiling in the foreground, while peripheral items are balloon, trees, mountains, beach, pebble, sky, clouds, wind, air, buildings, restaurants, construction, sunny, holiday season, resort, and road.
    For the Bubble picture (Figure 12), the central item is the boy with a bubble-maker, while peripheral items are other children, buildings, shops, street, signboard, air, sunny, summer season, the man with a balloon, the girl sitting on the bench, sack, passers-by, floating bubble, windows, lamp, curtain, and wooden floor.
    10 Interestingly, there were only weak correlations for all of the groups concerned between their scores for the identification task and those on the information task (Chinese r = 0.26; English r = 0.13; Japanese r = 0.13). This may suggest that there is no necessary relationship between perceptual knowledge of an event and propositional knowledge about it.
    11 This is not to say that our results have no wider implications for Cultural Relativity arguments, but to stress that this is only a first step of a larger project: ultimately, it would seem to us desirable to account for all putative effects of broad culture in terms of more tangible and more plausible linguistic or local environmental factors. For example, the expectation that inhabitants of high-density urban environments should pay more attention to peripheral visual information than those who live in smaller communities is both plausible and measurable; indeed, this is established in Nisbett & Masuda (2003). However, we do not view this effect as ‘cultural’ in any interesting theoretical sense (see introductory discussion).

    List of Figures

    Figure 2. Sample scene fragments: Focal Fish Condition (from Nisbett and Masuda, 2003).

    Figure 3. Top-down versus bottom-up parsing mechanisms. This figure illustrates the way in which top-down parsers (e.g., English speakers) first decide on the structure of the whole sentence and then fill each slot with words, whereas bottom-up parsers (e.g., Japanese speakers) begin by laying down words and gradually construct the whole sentence.

    Figure 4. Sample picture used in the story-telling task: Mouse-Shadow picture (Donaldson & Scheffler, 2007).

    Figure 5. Experiment 1 Results (Measure 1): Average number of times that some type of contextual information was mentioned ahead of the main point in the story-telling task by English, Chinese and Japanese participants.

    Figure 6. Experiment 1 Results (Measure 2): Mean number of contextual descriptions mentioned before the main events and situations in the story-telling task by English, Chinese and Japanese participants.

    Figure 7. Experiment 1 Results: Percentage of responses in which time-, place-, inferred antecedent event-, or field-related information was mentioned ahead of the main events and situations in the story-telling task.

    Figure 8. Experiment 1 Results: Percentage of responses in which the cause was mentioned ahead of its effect (Cause-First) and in which the effect preceded its cause (Effect-First) in the story-telling task. Only constituents that were explicitly marked for causal relations are included in the data.

    Figure 9. Task 1 Results (Measure 3): Mean number of overall different contextual descriptions reported in the story-telling task, in any order of mention.

    Figure 10. Windmill Picture used in the picture description task: Picture 1.

    Figure 11. Beach Picture used in the picture description task: Picture 2.

    Figure 12. Soap Bubble Picture used in the picture description task: Picture 3.

    Figure 13. Mean number of central and peripheral items mentioned by Chinese, English, and Japanese participants in the picture description task.

    Figure 14. Extracted fragments of pictures used in the identification test of the visual recall task (sample).

    Figure 15. Picture fragments identification scores in the visual recall task (by language group).

    Figure 16. Mean scores for the forced-choice memory test in the visual recall task (by language group).