[This is an earlier version of an article that has now been accepted to Cognitive Linguistics. If you would like to formally respond to it, the article should be cited as Tajima Y. & N. Duffield, ms. Keio University/Konan University. The pre-print version of the article will appear shortly.]
Japanese Versus Chinese Differences in Picture Description and Recall: Implications for the Geography of Thought
Yayoi Tajima and Nigel Duffield
Keio University and University of Sheffield (now Konan University)
Authors’ Note
This research was supported in part by grants from the Mori Foundation. We would like to thank Mutsumi Imai, Yichun Ryo, Gary Wood and Samir Zarqane for their assistance in conducting this study.
Abstract
This study examined whether the grammatical structure of particular languages predisposes speakers to particular attentional patterns. We hypothesized that the holistic attentional bias of Japanese participants in a previous study (Masuda & Nisbett, 2001), which was attributed to pan-Asian cultural factors, is better interpreted as a consequence of specific linguistic properties: Japanese speakers’ bottom-up discourse strategy. In experiments involving Japanese, English, and Chinese native speakers, it was found that Japanese participants reported more contextual information before explaining the main point, mentioned more background details overall, and recalled background elements significantly more accurately than either English or Chinese participants. The ‘Asian response’ was thus split, as predicted by the Linguistic Relativity hypothesis, but contrary to the expectations of a Cultural Relativity account.
Keywords: field dependency, attention, linguistic relativity, head directionality
Japanese Versus Chinese Differences in Picture Description and Recall:
Implications for the Geography of Thought
The general aim of the present study1 is to contribute to the ongoing debate concerning the extent to which culture and/or language is able to penetrate core areas of cognition—especially visual attention and recall—that were previously viewed as largely impervious to social or linguistic experience. The theoretical impetus for this research is provided by work by Richard Nisbett and his colleagues (Nisbett, Peng, Choi, & Norenzayan, 2001; Nisbett, 2003; Nisbett & Masuda, 2003; Nisbett & Miyamoto, 2005, inter alia), in which Asians2 and Westerners (specifically, European Americans) are claimed to exhibit distinct cognitive styles—holistic versus analytic attention—this difference being reflected in markedly contrasting levels of field-dependence across a variety of experimental tasks. Nisbett and his colleagues argue that this inter-group difference is due to deep-seated cultural attitudes, beliefs and traditions: In the case of Asian groups, their holistic style is explained by reference to a collectivist, inter-dependent tradition and outlook, informed by Confucianism and relative subservience to societal institutions; by contrast, Westerners’ (Midwestern Americans, in the typical case) analytic style is an expression of a more individualistic impulse, informed by traditions of logical thought and self-determination having their origins in classical Athenian culture.
There are numerous prima facie objections to such claims. There is the observation, for instance, that these arguments gloss over any number of intermediate cases: What about, say, the attentional patterns of more collectivized, interdependent Western European groups (for example, contemporary Athenian citizens), or of highly individualistic Taiwanese MBAs? Or that they seem to grossly overstate the degree of intra-group homogeneity on either side of the Pacific. The most significant objection, however, is that in the final analysis they amount to little more than unexplained correlations (nowhere, for example, is it articulated what causal relationship there might be between a preference for syllogistic reasoning and a decrease in field dependence, or how cultural beliefs should effect changes in brain mechanisms implicated in spatial memory). In spite of this, Nisbett’s arguments appear to have gained some traction amongst cognitive anthropologists and psychologists, and for this reason they deserve serious consideration.
To ‘professional outsiders’ such as ourselves, coming from theoretical and applied linguistics, the appeal of Nisbett’s explanation is less obvious. However, it should be noted immediately that our purpose is not to challenge the data, but rather to question the interpretation in terms of immanent culture.
It should also be clear that it is impossible to tackle every part of Nisbett’s thesis at once: There are simply too many potentially related variables to control for. Instead, this paper focuses attention on the specific issues raised by a study reported in Masuda & Nisbett (2001), in which observed contrasts in field-dependence between Japanese and American participants are interpreted in terms of the culturally embedded attitudes and beliefs outlined above, rather than—as suggested below—in terms of formal grammatical differences between the languages spoken by the two participant groups. In brief, our claim will be that Japanese participants behave as they do in visual description and recall tasks primarily in virtue of being speakers of Japanese, rather than in virtue of any pan-Asian cultural affiliation. We shall support this contention by showing that, in three tasks very similar to those presented in Masuda & Nisbett (2001), the ‘Asian Response’ is split apart, with Chinese participants’ responses either patterning with those of the English group, rather than with the Japanese, or else revealing an intermediate response predicted by the linguistic typology articulated below.
Before presenting the study, something needs to be said about language and culture in the present context, since both terms are open to construals that limit—or even negate—the possibilities for empirical research aimed at teasing these factors apart. On the one hand, it is obvious that many substantive properties of language are dependent on the culture of their speakers. For instance, languages whose speakers live in social groups without governmental or religious institutions will not contain words for position-holders within those institutions, fishermen typically have a richer vocabulary of marine life than mountain herders, and so forth. It is plausible, though not yet conclusively demonstrated, that such cultural differences not only impact upon variation in lexical knowledge, but also upon perceptual and discrimination abilities: This is indeed what is claimed in more recent work by Nisbett and his colleagues (Uskul, Kitayama, & Nisbett, 2008).
Related to this issue is the Sapir-Whorf question, whether particular aspects of languages themselves exercise any determining influence on the non-linguistic cognitive capacities of their speakers. This is considerably more contentious, with proponents of stronger versions of the thesis—such as Boroditsky (2001), Bowerman (1996), and Pederson et al. (1998)—opposed to those offering more universalist interpretations of similar data, including Malt, Sloman, Gennari, Shi, & Wang (1999) and Li & Gleitman (2002); see Boroditsky (2003) for an overview. What this brief discussion highlights is the importance of identifying formal linguistic factors that can clearly be demonstrated to be orthogonal to cultural ones: In this paper, we propose one such variable (namely, Head-Directionality in phrase-structure).
A different set of problems surrounds the term culture. One major problem is that it can easily become weakened to the point at which any contingent property distinguishing two groups, however superficial or ephemeral, can be deemed “cultural”. It may seem to be one thing to use the term to refer to a group sharing a common set of distinctive familial, political and religious practices bound by agreed social norms, and whose distinct conventions and traditions are passed on from one generation to the next, and quite another to speak of groups bound by their contingent employment situation or geographical context—the “culture” of first year international students, for example, or of the 1960s, of high-density living, of fast-food workers in suburban strip-malls, of tabloid readers, and so forth. In practice, however, it proves problematic to decide when culture shades into community of practice, or something even less theoretically significant.3
In addition to this, there seems to be a lack of consensus among psychologists, anthropologists, and social scientists about the necessary or sufficient conditions for belonging to a culture, or acculturation. If an individual can be deemed to be influenced by a particular culture after only months, or even weeks, of contact, it again becomes very hard to tease apart the alleged effects of culture from other superficial contextual properties. This suggests that if one wishes to make interesting theoretical claims about the effects of culture on cognition, then the cultural factors called on should have some reasonable permanence and persistence in the life of the individual: Ideally, these should be attributes acquired in early childhood and shared by all eligible members of the cultural group in question.
Unless definitions are restricted in this way, it becomes nearly impossible to distinguish between the effects of environmental and/or occupational factors on attentional mechanisms—effects that are remarkable but not deeply surprising, where they are found—versus the effects of immanent culture. Consider, for example, the finding reported in Maguire et al. (2000), that the posterior hippocampal regions of London taxi-drivers were significantly larger than those of a control group, that the anterior portions were significantly smaller, and that this asymmetry increased with years of taxi-driving experience. Such results provide striking evidence of adult adaptations in brain regions that are associated with spatial memory and navigation. It seems entirely plausible—though this was not tested—that such physiological adaptation is also reflected in increased sensitivity to contextual factors, which in turn could be interpreted as a more holistic/field-dependent cognitive style. But it would be wholly misleading to attribute this occupationally actuated change to cultural factors—say, “the culture of taxi-drivers”.
The Original Study
Masuda & Nisbett (2001) conducted an experiment with Japanese and American participants, in which they first presented underwater scenes (termed ‘animated vignettes’) of 20 seconds’ duration, featuring a salient focal fish as well as other smaller objects such as smaller fish, bubbles, shells and rocks (see Figure 1).
Participants were first asked to describe what they had seen. Subsequently, they were presented with a different set of object scenes and were asked to judge whether or not the elements depicted in these new scenes were identical to those featured in the original vignette. Figure 2 provides an example of the Figure condition (in which the focal fish was held constant). Some objects were presented with the original background, and others were presented with a neutral or novel background.
The results showed that, with respect to basic description, the Japanese group made about 50% more statements concerning background information and around 70% more statements about inert objects than the American group. While American participants invariably began their descriptions with the salient (focal) object, Japanese participants were much more likely to begin their statements by mentioning background elements (e.g., “There was a pond, and . . . ”). In addition, Japanese participants’ performance in identifying the focal fish was more (adversely) affected by the change of backgrounds; conversely, the Japanese participants reliably outperformed Americans in correctly identifying Ground features of the original scene.
Masuda & Nisbett (2001) and Nisbett & Masuda (2003) interpret these results in terms of the aforementioned cultural dichotomy: It is the persisting social and philosophical values of ancient China that predispose Japanese perceivers to holistic attention. However, it is equally possible to interpret these particular findings as due to linguistic, rather than cultural factors, since in this instance there is a confound between language and culture: As we shall show directly, the grammatical and discourse structure of Japanese (i.e., the Japanese language) differs from that of English at least as much as pan-Asian culture differs from that of European Americans.
Towards an Alternative Interpretation: Thinking for Speaking
Dan Slobin’s Thinking For Speaking hypothesis is especially relevant to the present discussion. In a series of papers (Slobin, 1996, 1997, 2000, 2003), Slobin develops the idea that there exists a process of ‘thinking for speaking’, apart from general cognition. He argues that:
(1) a. He is swimming across the river. (English)
b. Hij zwemt de rivier over. (Dutch)
he swim-PRES the river over
‘He swims over the river.’
(2) a. Il traverse le fleuve en nageant. (French)
He cross-PRES the river in swim-GERUND
‘He crosses the river, swimming.’
b. 泳いで 川を 渡る (Japanese)
oyoi-de kawa-o wataru.
swim-BY river-ACC cross-PRES
‘Swimming, (He) crosses the river.’
The crucial point to observe about these examples is that this linguistic typology is orthogonal to broad-scale cultural, geographic, or indeed, genetic affiliation4: In this case, French and Japanese pattern together, in contrast to English or Dutch.
Slobin points out that this formal difference in language structure has important consequences for many aspects of language use, including—most relevantly—for narrative descriptions. For example, it is shown that S-language speakers use manner verbs significantly more often than V-language speakers when describing the same events (see Hsiao, 1999; Özçalışkan & Slobin, 1999); that S-language novels have greater type and token frequencies in situations in which human movement is described; S-language writers, overall, give their readers significantly more information—explicit and inferential—about the manners in which their protagonists move about (Özçalışkan & Slobin, 2000) than do V-language writers. Such observations suggest, at the very least, that one should be circumspect about ascribing differences in narrative description to cultural factors, since these typological groupings, which cross-cut cultural spheres of influence, also show clear correlations with narrative style.
The S-language/V-language parameter is not, of course, the only typological distinction to cross-cut genetic boundaries, nor—although it nicely illustrates our general point—is it the parameter that we consider best explains the Japanese-English contrast obtained in Masuda and Nisbett’s (2001) study. Instead, the typological linguistic variable that we believe to be at work here is the Head or Head-Directionality Parameter.
The Head Parameter: Overview
One of the most obvious ways in which languages vary syntactically is with respect to clausal word order—the position of phrasal constituents relative to one another. This type of variation at the clausal level results directly from the Head Parameter: Whether the head element of the phrases that make up a sentence appears to the left or right of its respective complement. In a consistently head-initial language, such as English or French, the verb precedes the direct object in the verb phrase, (temporal and modal) auxiliaries precede the verb phrase, clausal complementizers precede the embedded clause they introduce, and the language has prepositions, rather than postpositions. This is illustrated for English in (3), where in each example the relevant head(s) is/are indicated in bold, their complement phrases in italics:
(3) a. John [VP brokeV the vase ].
b. John [ModalP shouldM [NEGP notNEG [ASPP have [VP broken the vase]]]].
c. John said [CompP thatCOMP [S he hadn’tT brokenV the vase ]].
d. John danced [PP aroundP [NP the room [ inP [NP the palace]]]].
Exactly the opposite order is observed in a head-final language such as Japanese, Korean or Turkish: The verb follows its object; tense and mood affixes are invariably expressed as verbal suffixes (where these appear as auxiliaries, they follow the verb-phrase); complementizers appear to the right of complement clauses; the language is postpositional:5
(4) a. John-ga [VP kabin-o wattaV ].
John-NOM vase-ACC break-PAST
‘John broke the vase.’
b. [TP John-wa [[[VP kabin-o waruV ] bekide-waT ] na-NEG ] kattaT ].
John-NOM vase-ACC break should-NOM not-PAST
‘John should not have broken the vase.’
c. John-wa [ [ kabin-o watteV nai S] toCOMP CP] ittav.
John-NOM vase-ACC broke not COMP say-HAVE
‘John said that he hadn’t broken the vase.’
d. John-wa [[[[ kyuden NP] no P PP] hiroma NP] deP PP] odottav.6
John-NOM palace of room in dance-PAST
‘John danced around the room in the palace.’
In generative theory (e.g., Chomsky, 1981, 1995), only head-complement order is relevant to determining the head-parameter for a given phrase; see, in particular, Travis (1984). In other approaches, however—and especially within the typological framework initiated by Greenberg (1978)—all head-modifier relations are potentially relevant to determining the head-initial or head-final status of the language. Thus, the positions of attributive adjectives, relative clauses, possessor phrases, and subordinate adjunct clauses are also taken into account. By all of these measures also, Japanese is consistently head-final.
Not all languages display such consistent cross-categorical harmony in head-modifier order.7 Some languages, for example, project right-headed phrases for one syntactic category and left-headed phrases for another, so that it becomes harder to classify the language overall in terms of a single binary parameter. (Mandarin) Chinese is a case in point.
Huang (1994) provides a useful discussion of Chinese word order. The core facts are illustrated by the examples below, which reveal that Chinese is normally head-initial with respect to verbs and TAM auxiliaries (5a)/(5b)—including the position of clausal complements (5b)—and with respect to prepositional phrases (6), but head-final with respect to lexical noun phrases: Both nominal complements (7a) and nominal adjuncts precede the head-noun; relative clauses are internally headed by a right-peripheral head (7b).8 Once again, in each example the relevant head element is indicated in bold, the complement or modifier in italics:
(5) a. Zhangsan meiyou [VP kanjianV [NP Lisi]].
Zhangsan not-HAVE see Lisi
‘Zhangsan did not see Lisi.’
b. Zhangsan [VP zhidaoV [S Lisi [NEGP buNEG [AP chengshi]]]].
Zhangsan know Lisi not honest
‘Zhangsan knows that Lisi is not honest.’
(6) a. Zhangsan [VP zhuV [PP zaiP [NP Meiguo]]].
Zhangsan live at America
‘Zhangsan lives in the US.’
b. Zhangsan fang-le yi-ben shu [PP zaiP [NP zhuozi-shang]].
Zhangsan put- PERF one- CL book at table-top
‘Zhangsan put a book on the table.’
(7) a. [[[ yuyanxue NP] deP PP] yanjiuN NP]
linguistics DE research
‘the study of linguistics’
b. [[[ni zui xihuan S] deP PP] nei-ben shuN NP] mai-wan le.
you most like DE that-CL book sell-out PERF
‘The book that you like most has been sold out.’
Hence, with respect to purely syntactic properties, English and Japanese represent two ends of a grammatical continuum, with Chinese somewhere in the middle (though much more like English in terms of token frequency, and crucially, with respect to verbal projections).
The question that arises at this point is how this typological contrast—however interesting it may be from a linguistic perspective—should explain the attentional patterns observed in Masuda & Nisbett’s (2001) study: Why should head-finality predispose Japanese speakers to greater holistic attention or field-dependence? There are two responses to this question, the first relatively superficial, with few consequences for linguistic relativism, the second rather more complex, but with more interesting implications for the relationship between language and visual cognition.
Taking the superficial relationship first, notice that one of the dependent measures in the Masuda & Nisbett (2001) study was order of mention: Whether participants first mentioned the focal fish (Figure) or the background context (Ground) in their verbal descriptions. Masuda and Nisbett interpret the elements first mentioned as ‘more salient’ and conclude from the fact that Japanese participants consistently mentioned contextual information ahead of focal information that the Japanese group paid greater attention to the field than their American counterparts. Yet, as we have just seen, the grammatical and discourse structure of Japanese virtually guarantees this result: If contextual information is to be mentioned at all, it must be mentioned first, since the main predicate is canonically the final sentential constituent; conversely, the discourse structure of English affords American participants more opportunity to mention focal elements first in their verbal descriptions—wholly irrespective of the relative salience of Figure and Ground in their conceptual representations of the event.
If this observation is valid, then it follows that group differences in order of mention effects do not necessarily speak to the issue of holistic versus analytic attention. More importantly, this observation allows us to formulate clear predictions for a new study involving Japanese, English, and Chinese participants: If grammatical and discourse structure determine order of mention in visual descriptions, then the verbal reports of Chinese participants should be intermediate between the other two groups, patterning more closely with those of English speakers and showing the Figure-to-Ground order (from head to modifier) significantly more often than the Japanese group, who are expected to produce descriptions (almost exclusively) in the Ground-to-Figure order (from modifier to head). If, on the other hand, order of mention is determined by cognitive styles that are associated with cultural affiliation, then the Chinese and Japanese groups should pattern more closely with each other than either does with the English group. These predictions are explored in the first of the studies reported below.
A more interesting question, however, is whether it is possible to connect this typological difference to the other dependent measures in the Masuda & Nisbett (2001) study (besides order of mention). It will be recalled that order of mention was only one of three measures that distinguished Japanese from American participants. The other two were the number of contextual (Ground) features mentioned by each participant for each description, and—most interestingly still—the number of contextual features correctly recalled in representation of still fragments: Japanese participants not only mentioned significantly more background details, they also remembered better which elements they had previously observed. This last measure, in particular, is not obviously related to linguistic typology.
And yet it might be. What is distinctive about Slobin’s (2003) thinking for speaking hypothesis is the extent to which linguistic structures are assumed to penetrate cognition in various ways. In contrast, for example, to Levelt (1989), who also entertains a form of the thinking for speaking hypothesis, but who supposes that effects of language are restricted to the time of utterance—that is, it is only when one prepares to speak that language affects conceptualization—Slobin (2003) speculates that thinking for speaking effects could extend beyond speech time and may induce speakers to form specific attentional patterns, even in the absence of language. In support of this speculation, Slobin cites the work of Pederson, Levinson et al. (1998):
In the present case, let us assume that the grammatical and discourse structure of Japanese, which is known to have effects on syntactic processing and production, leading to bottom-up parsing routines (as opposed to the top-down strategies of English parsing mechanisms; see Fodor & Inoue, 1994; Nakayama, 1999, for discussion), also impacts on conceptual structures. Suppose that, as is the case for syntactic constituents, final conceptual representations (including representations of causality) are constructed—as they are reported—“from the bottom up”, from background context to focal elements/main arguments, as schematized in Figure 3.
Our conjecture, then—extending the Thinking for Speaking hypothesis—is that the (top-down vs. bottom-up) parsing mechanisms that implement different (head-initial vs. head-final) grammatical settings are reflected in the speakers’ discourse strategies, which in turn influence their attentional patterns. Japanese speakers build up phrase structure from complements and/or modifiers to heads; in other words, from semantically and syntactically peripheral elements to core elements, in a bottom-up fashion. It plausibly follows from this that Japanese speakers are predisposed to plan and interpret discourse by placing peripheral elements ahead of the main point. This notion is supported by anecdotal observation: When describing, explaining, excusing, arguing, or persuading, Japanese speakers tend to begin their statements with peripheral elements (reasons, situations, and contexts), before referring to the main points (effects, intentions, and conclusions). As a consequence, they may have developed a perceptual habit of attending to the entire field.
In short, our hypothesis is that Japanese speakers are more likely to attend to contextual information primarily because the grammatical and discourse structure of the language requires speakers to mention contextual information ahead of focal information.
As strained as it may appear at first, this hypothesis nevertheless generates rather clear predictions concerning the behavior of Chinese participants in the present study, with respect to the other two dependent variables at stake (viz., number of mentions of background elements/correct identification of background elements in subsequent presentation). If the head-directionality parameter plays a significant role in accounting for the differences between Japanese and American participants in Masuda & Nisbett’s (2001) original study, then Chinese participants are expected to pattern with English participants with respect to these dependent measures, splitting the Asian response. It will be clear that Masuda & Nisbett’s cultural explanation predicts a quite different split.
General Method
In order to test the hypothesis outlined above, we constructed two linguistic tasks to compare the verbal descriptions of Japanese, English, and Chinese speakers with respect to foregrounded and backgrounded information. The first task (story-telling) involved explaining the events depicted in a pivotal scene in each of three well-known children’s story books (Anno, 1977; Donaldson & Scheffler, 2007; Ungerer, 1975); the second (picture description) task involved more straightforward description of unfamiliar photographs. Subsequently, as in the Masuda & Nisbett (2001) experiment, fragments of the images previously shown were presented to participants for identification (fragments identification test); the participants were also asked information questions about the photographs they had seen (information recall test).
Participants
The same 120 participants took part in all of the experiments: 43 Japanese native speakers (21 women, 22 men) were recruited from Keio University, Japan, 33 English native speakers (23 women, 10 men) from the University of Sheffield, UK, and 44 Chinese native speakers (25 women, 21 men) from the International Study Institute Chukyo, Japan. All of the participants were undergraduate or postgraduate university students, or language school students preparing for university entrance, aged 18-30 years. The Japanese and Chinese participants were tested in Japan, and the English participants were tested in the UK. All associated language materials were translated, and presented to each group in their own language, by native-speaker experimenters.
Experiment 1
We first conducted a story-telling task to examine the discourse preferences of Japanese, English, and Chinese speakers: Whether contextual information was mentioned before the main point or vice-versa. Given our hypotheses outlined above, it was predicted that the responses of the Japanese group should diverge significantly from those of the other two groups.
Procedure
Participants were presented with illustrations extracted from three well-known picture books for children (Anno, 1977; Donaldson & Scheffler, 2007; Ungerer, 1975) and asked to provide a written description of each of these in their native language, by clarifying when, where, and why the depicted events and situations were taking place. The responses were then coded in terms of the kinds of information included in each description, according to the following descriptors.
There is a full moon (2. peripheral events and situations)
at night. (3. time)
In a forest, (4. place)
a rabbit is scared (1. main events and situations)
because it sees the enlarged shadow of the mouse reflected in the moonlight against the snow. (5. cause)
The mouse might have been trying to retaliate on the rabbit, who has always bullied him. (6. inferred antecedent events and situations)
This sample response was thus coded as 234156. In this way, all of the responses were coded into these six categories and labeled with such serial numbers. The overall measure of interest was the average number of times (out of a possible total of three descriptions, one for each picture) that some type of contextual information (2-6) was mentioned ahead of the main point (1), for each language group (Measure 1). We also determined the quantity of contextual information mentioned before the main point; that is, how many contextual descriptions indicating time, place, cause, the field, or inferred antecedent events were made before mentioning the main events and situations in each language group (Measure 2), as well as the relative numbers of different contextual elements overall, in any order of mention (Measure 3).
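The coding scheme lends itself to a mechanical scoring step, which can be sketched as follows. This is only an illustration of how the three measures might be derived from a coded string such as 234156; the function names are hypothetical, and two points are assumptions not stated in the text: that Measure 3 counts distinct contextual categories, and that a description with no main point (1) contributes nothing to Measures 1 and 2.

```python
# Hypothetical helper names; codes 1-6 follow the descriptors given in the text.
MAIN = "1"               # main events and situations
CONTEXT = set("23456")   # field, time, place, cause, inferred antecedents

def score_description(coded):
    """Score one coded description such as "234156".

    Returns (context_first, n_before_main, n_context_types):
      context_first    - True if any contextual code precedes the main point
      n_before_main    - number of contextual codes ahead of the main point
      n_context_types  - distinct contextual codes anywhere (assumed Measure 3)
    """
    before_main, found, _ = coded.partition(MAIN)
    if not found:
        # Assumption: with no main point, nothing counts as "ahead" of it.
        before_main = ""
    ahead = [c for c in before_main if c in CONTEXT]
    n_types = len(set(coded) & CONTEXT)
    return bool(ahead), len(ahead), n_types

def measure_1(codings):
    """Number of one participant's descriptions with context before the main point."""
    return sum(score_description(c)[0] for c in codings)

print(score_description("234156"))            # (True, 3, 5)
print(measure_1(["234156", "1523", "41"]))    # 2
```

For the sample response, three contextual elements (field, time, place) precede the main point and five distinct contextual categories occur overall. How the per-description counts were aggregated across the three pictures for Measures 2 and 3 is not specified in the text, so the sketch leaves them at the level of the single description.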
Results and Discussion
All three measures revealed clear and reliable differences among the three language groups. First, Figure 5 shows the average number of times that some kind of contextual information (2-6) was mentioned ahead of the main point (Measure 1), out of a total of three responses (one per picture).
As predicted, the Japanese group were considerably more likely to begin their descriptions with contextual information (M = 2.86) than the Chinese (M = 2.33) and the English group (M = 1.55). An analysis of variance revealed a reliable main effect of Language (F(2, 118) = 21.357, p < .01), with post-hoc comparisons (Bonferroni) showing significant differences between the English and Japanese group and between the English and Chinese group at the p < .01 level, and between the Japanese and Chinese group at the p < .05 level.
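For readers less familiar with the statistics, the following is a minimal sketch of a one-way between-groups ANOVA and of a Bonferroni-adjusted significance threshold for pairwise comparisons. It is not the authors' analysis script: the function names are hypothetical and the scores are toy values chosen only so that the arithmetic is easy to check. With k groups and N scores in total, the F ratio has k - 1 and N - k degrees of freedom.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

def bonferroni_alpha(alpha, k):
    """Per-comparison threshold when all k*(k-1)/2 group pairs are tested."""
    n_pairs = k * (k - 1) // 2
    return alpha / n_pairs

# Toy scores (NOT the study's data), one list per hypothetical group.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova(toy))          # (13.0, 2, 6)
print(bonferroni_alpha(0.05, 3))   # 0.05 / 3
```

With three language groups there are three pairwise contrasts, so a Bonferroni correction divides the nominal alpha by three (equivalently, multiplies each pairwise p value by three).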
We then investigated the quantity of contextual information mentioned ahead of the main point (Measure 2). Figure 6 shows how many contextual descriptions were made before mentioning the main events and situations by English, Chinese and Japanese participants. Again as predicted, the Japanese group reported the highest number of contextual descriptions before the main events (M = 2.87), followed by the Chinese (M = 2.09) and the English group (M = 1.19). The data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variable, and Picture (3 levels: Picture 1, Picture 2, Picture 3) as the within-subjects variable, the dependent variable being the number of contextual descriptions reported before the main events and situations. The analysis revealed a reliable main effect of Native Language, F(2, 110) = 19.019, p < .01, with no interaction found between Native Language and Picture. Post hoc tests (Bonferroni) revealed significant differences between the Japanese and English participants and between the Chinese and English participants at the p < .01 level, and also between the Chinese and Japanese participants at the p < .05 level.
Further analysis revealed that the Japanese participants' tendency to report more contextual information before the main events and situations was especially pronounced in these three areas: Time-, Place-, and Field-related information. Figure 7 shows each percentage of responses where time-, place-, inferred antecedent event-, or the field-related information was mentioned ahead of the main events and situations within each language group. Time-related information was reported ahead of the main point in 79.1% of the Japanese, 59.8% of the Chinese, and 42.9% of the English responses across all the three pictures: Statistically significant differences were found only between the Japanese and English participants,χ2 (4) = 40.187, p < .01. Place-related information was reported ahead of the main events in 82.2 % of the Japanese, 51.2% of the Chinese, and 43.9% of the English responses, with significant differences being found between the two contrasts: English versus Japanese, Chinese versus Japanese, χ2 (4) = 57.333, p < .01. Field-related information appeared prior to the mention of main events in 39.5% of the Japanese, 25.2% of the Chinese, and 11.2% of the English responses, with significant differences only being found between English and Japanese participants,χ2 (4) = 34.844, p < .01. Concerning inferred antecedent events mentioned ahead of the main events, the obtained data were too few to be entered into statistical analysis.
In addition, Figure 8 shows relative percentage of the cause-effect order in the descriptions including causal relations. As can be seen, both Japanese and Chinese participants showed tendency to mention cause ahead of its effect in explaining causal relations, while English participants clearly preferred to mention effect prior to its cause, χ2 (4) = 54.072, p < .01.
Finally, Figure 9 shows the overall numbers of different contextual elements reported in participants’ responses, in any order of mention (Measure 3). The results also revealed Japanese participants’ context dependency, with the highest number of contextual elements reported across the three pictures. The data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variables, and Picture (3 levels: Picture 1, Picture 2, Picture 3) as within-subjects variables. The analysis revealed a reliable main effect of Native Language, F(2, 110) = 13.579, p < .01, as well as a significant interaction between Native Language and Picture, F(4, 220) = 3.216, p < .05. Post hoc tests (Bonferroni) revealed significant differences between the Japanese and English participants in Picture 1 and 2 at the p < .01 level, and between the Japanese and Chinese groups in Picture 2 and 3 at the p < .01 level, but none between English and Chinese participants in any condition.
Taken together, these results at once confirm the findings of the Masuda & Nisbett (2001) study (that is, that Japanese speakers clearly prefer to mention more peripheral elements, and earlier in their descriptions, than do English speakers) and challenge the previous interpretation, namely, that this difference is the result of some pan-Asian cultural predisposition. The results pose this challenge because the Chinese group, as predicted by the typology of their language, exhibits intermediate behaviour, sometimes patterning with the Japanese participants (e.g., with respect to mention of Cause before Effect, Figure 8) and sometimes with the English group (e.g., with respect to the total numbers of peripheral items mentioned, Figure 9). In other words, the results show that the bottom-up parsers (Japanese speakers) prefer the bottom-up discourse style and the top-down parsers (English speakers) employ the top-down discourse style, with Chinese speakers falling between the two groups, as predicted by the grammatical typology of Chinese.
The following sample illustrates a typical response of Japanese participants to the Mouse-Shadow picture (Figure 4), with abundant contextual information mentioned before the main events and situations:
‘It is at night with a full moon shining. In a snowy forest, an animal is walking with a stick searching for food. Then, a little mouse standing on a tree branch, who has been bullied by the bigger animal, hit upon a good idea. With the use of the moonlight, he enlarged his own shadow and casted it onto the ground. The animal, seeing the enlarged shadow, thinks it is a monster and is scared.’
At the very least, the results of this first experiment lend support to the idea that culture may not be the sole, or even key, determinant of the differences observed in the earlier study. As we shall see directly, the next two tasks offer even clearer reasons for scepticism: Not only do Chinese and Japanese participants behave differently, but the ‘Asian response’ is actually split.
Experiment 2
In the second experiment, we conducted two related tasks to determine whether there was any difference in attentional patterns toward the field when making visual descriptions. Our hypothesis was that top-down versus bottom-up parsing patterns embedded in participants’ discourse styles would exercise a larger effect on responses than any cultural affiliation. That is to say, it was anticipated that, as bottom-up parsers (Japanese speakers) need to refer to context first in their discourse, they should attend more to the field than would top-down parsers (English and Chinese speakers); as a consequence, Japanese speakers should mention disproportionately more peripheral elements than central elements—and may recall these better—than either English or Chinese speakers.
The second experiment comprised two tasks, each with its own dependent measures: a picture description task and a visual recall task. In the picture description task, we presented participants with a set of three color photographs and asked for a written description. The description task served two purposes: first, to discover which visual features of shared scenes participants mention in written reporting; second, simply to show participants the photographs for subsequent recall. Participants were not aware that some parts of the photographs would be extracted and presented again in another set of pictures in the visual recall phase. In the subsequent visual recall task, participants were shown those extracted portions of photographs and asked to judge whether these formed parts of the pictures that they had already seen.
Experiment 2, Phase 1: Picture Description
Procedure
The participants saw three photographs in turn and provided a written description for each of these. The photographs selected for description included both salient focal objects and smaller peripheral objects (see Figures 10, 11, and 12). Participants’ responses were then analyzed to determine which aspects of the photographs were more attended to: central objects or peripheral objects.9
Results and Discussion
The participants’ responses are charted in Figure 13. Notice that, as expected, the largest differences across groups are observed in the mean number of peripheral items mentioned by each participant. Here, the Japanese behave once again as predicted, mentioning significantly more peripheral items (M = 15.05) than the English group (M = 7.48). Notice in particular that this preference for peripheral items is not shared by the Chinese participants (M = 3.43), who actually mentioned fewer peripheral elements than even the English group. The description data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variable, and Item Location (2 levels: Central, Peripheral) and Picture as within-subjects variables, the dependent variable being the number of items mentioned. The analysis revealed a reliable main effect of Native Language, F(2, 119) = 50.844, p < .01, as well as a significant interaction between Native Language and Location, F(2, 119) = 32.547, p < .01. The source of this interaction is clearly suggested by the plot in Figure 13.
Post hoc tests (Bonferroni) revealed significant differences between the Chinese and Japanese participants and between the English and Japanese participants with respect to peripheral items mentioned at the p < .01 level, and also between the Chinese and English participants at the p < .05 level.
However, the Chinese group's relatively low number of descriptions of peripheral items might be attributed to the fact that the Chinese participants’ total number of descriptions for each photograph was much smaller overall than that of the other two groups. We therefore also examined whether the three language groups differed in the ratio of peripheral items to central items mentioned. This ratio was 83% for the Chinese group, 112% for the English group, and 213% for the Japanese group. The results were entered into a repeated measures ANOVA with Native Language as the between-subjects variable, Picture as a within-subjects variable, and the ratio of peripheral items to central items mentioned in each response of each participant as the dependent variable. The analysis again revealed a reliable main effect of Native Language, F(2, 106) = 25.168, p < .01, without any interaction with Picture. Post hoc tests (Bonferroni) revealed significant differences between the Chinese and Japanese participants and between the English and Japanese participants, both at the p < .01 level, but none between English and Chinese participants.
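The ratio measure used in this analysis can be illustrated with a minimal Python sketch. The function name, the handling of responses with no central items, and the example counts are all assumptions made for illustration; the text does not say how such responses were treated, and the counts are not data from the study.

```python
from typing import Optional


def peripheral_to_central_ratio(n_peripheral: int, n_central: int) -> Optional[float]:
    """Peripheral items as a percentage of central items for one response.

    Returns None when no central item was mentioned, because the ratio is
    undefined in that case (assumption: such responses are set aside).
    """
    if n_peripheral < 0 or n_central < 0:
        raise ValueError("counts must be non-negative")
    if n_central == 0:
        return None
    return 100.0 * n_peripheral / n_central


# Hypothetical counts for a single response, not taken from the study.
example_ratio = peripheral_to_central_ratio(n_peripheral=6, n_central=3)
print(example_ratio)
```

On this scale, a value above 100% means that a response mentioned more peripheral than central items, and a value below 100% means the reverse.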
It will be recalled that the focus of Masuda & Nisbett’s (2001) study was on differences between Japanese and English mention of peripheral items (the result, it was claimed, of East Asians’ superior attention to Ground over Figure). In the present study, however, the Chinese group mentioned the lowest number of peripheral items, both in absolute and relative terms. Statistically, the largest differences that were observed distinguished Chinese and Japanese participants, with no statistically significant difference found between Chinese and English participants. The Asian response was thus effectively split, divided by the responses of the English group. Whatever explains this pattern cannot plausibly be related to cultural affiliation.
Experiment 2, Phase 2: Visual Recall Task
The final task compared the three groups with respect to visual recall by testing participants’ accuracy in identifying peripheral fragments and recalling information about Ground details. Once again, our hypothesis predicted a split in the Asian response, whereas Masuda & Nisbett’s (2001) cultural interpretation predicted a pan-Asian advantage over the English participant group.
Procedure
To construct the visual recall task, peripheral portions were extracted from each of the three photographs shown in the picture description task and from one of the pictures used in the first story-telling task, five portions from each. These extracted fragments (n = 20) were combined with another 20 fragments extracted from novel pictures to create a set of 40 small pictures (2.5 × 2.5 cm): see Figure 14.
Participants were presented with this set of extracted portions and asked to identify the parts of the pictures that they had already seen in the previous tasks. Each participant’s score was counted according to the number of correctly identified portions: one point for correct identification and equally one point for correct rejection, 40 points in all. Following this identification test, participants were also presented with a forced choice memory test, including questions of the following kind: “In the beach picture, is the balloon yellow or red?” or “In the windmill picture, was the man behind the boy wearing shorts or long trousers?” and so on. There were twelve such questions in all: three questions relating to each of the four pictures used in the identification phase. Thus, this part of the experiment had two dependent measures: (i) number of correctly identified or rejected fragments; (ii) number of correctly answered information questions.
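The scoring rule for the identification test (one point for each correct identification and one for each correct rejection) can be written as a short Python sketch. The function name and the example response pattern are hypothetical; only the 20 old plus 20 novel fragment design comes from the procedure described above.

```python
def identification_score(previously_seen, judged_seen):
    """Count correct identifications plus correct rejections.

    previously_seen: for each fragment, True if it came from a picture
        shown earlier, False if it came from a novel picture.
    judged_seen: the participant's yes/no judgement for each fragment.
    """
    if len(previously_seen) != len(judged_seen):
        raise ValueError("one judgement is required per fragment")
    return sum(
        1 for truth, answer in zip(previously_seen, judged_seen) if truth == answer
    )


# Design from the procedure: 20 old fragments and 20 novel fragments.
fragment_key = [True] * 20 + [False] * 20

# Hypothetical participant who answers "seen" for every fragment.
always_yes = [True] * 40
print(identification_score(fragment_key, always_yes))
```

Under this rule, uniform responding of this kind earns 20 of the 40 available points, which gives a reference point for reading the group means reported below.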
Results and Discussion
Figure 15 displays the picture fragment identification scores for the three language groups. As predicted, the Japanese group recorded the highest average score (M = 31.40); this was followed by the English (M = 26.79) and then the Chinese group (M = 22.07). An analysis of variance revealed a reliable main effect of Language, F(2, 118) = 45.164, p < .01; post hoc comparisons (Bonferroni) showed significant differences for all three pairwise comparisons, all ps < .01.
The results for the subsequent forced-choice memory test were also along the same lines (see Figure 16). The Japanese group obtained the highest score (M = 8.02), followed by the Chinese (M = 6.18) and the English group (M = 5.70). Again, a reliable main effect of language was found, F(2,118) = 14.952, p < .01, and again, post-hoc tests (Bonferroni) showed significant differences between the scores for the Chinese and Japanese groups, and the English and Japanese groups both at the p < .01 level, but no difference between the Chinese and English groups.10
These findings thus clearly demonstrate that there is no common Asian response in the visual recall task: Instead, the Chinese participants pattern to a large degree with the English group, separately from the Japanese group.
General Discussion
The experiments reported here examined the attentional biases of Japanese, English, and Chinese speakers across a range of linguistic and non-verbal tasks. Our hypothesis was that top-down versus bottom-up parsing mechanisms would be reflected in the speakers’ discourse strategies, with top-down parsers (English and Chinese speakers) having a preference for top-down discourse patterns—the main point being mentioned before contextual information—and bottom-up parsers (Japanese speakers) building a discourse in which contextual information precedes the main point. These discourse strategies were expected to exercise a larger effect on responses than any cultural affiliation.
We tested this hypothesis by means of narrative description (story-telling), picture description, and visual recall tasks. In the story-telling task, Japanese speakers were more likely to report contextual information before the main point than either English or Chinese speakers, consistent with the linguistic typology. Next, in the picture description and visual recall experiment, Japanese speakers not only reported more background detail but also recalled peripheral details significantly more accurately than English and Chinese speakers, as evidenced by reliably higher scores on both the identification and the information recall tasks.
The present findings across three tasks are thus consistent with the hypothesis advanced here, namely, that the grammatical structure of particular languages predisposes speakers to particular attentional patterns: In other words, these results are consistent with a particular—and rather far-reaching—interpretation of the thinking for speaking hypothesis (Slobin 2003).
Perhaps more significant than what these results speak for is what they speak against. They allow us to reject not only the null hypothesis (that a speaker’s native language has no reliable effect on visual attention and recall scores) but also the alternative hypothesis presented in Masuda & Nisbett (2001), namely, that Japanese participants’ enhanced ability to identify peripheral elements in visual scenes is due to cultural attributes common to Asian cultures. Our experiments show that, across all tasks involving visual recall, Chinese participants generally behave more like English participants than like Japanese participants, regardless of cultural affiliation. To reinforce the point, this is in spite of the fact that the Chinese test subjects were students studying Japanese in Japan. Whatever definition of culture one might employ, the expectation must surely be that if cultural factors determine response, these two groups should pattern together. The fact that they did not renders suspect any explanation in terms of immanent cultural values, at least with respect to these data.
Three final points are worth mentioning. First, as noted above (footnote 1), these are not unprecedented results. The findings of the present study directly confirm those of an independent set of studies reported elsewhere (Duffield & Tajima, 2010), where again the Chinese group’s results patterned directly with those of the English group and separately from the Japanese across a similar set of description and recall tasks. Thus, in a total of six different tasks, we have found that the Asian response has been split in ways that challenge the cultural relativity explanation advanced by Nisbett and his colleagues.
The second point to observe is that these results do not necessarily refute Masuda and Nisbett’s (2001) interpretation of their own data (though such an explanation would be untenable in the present case). Our results, at the data level, are entirely compatible with those obtained by Masuda & Nisbett for their Japanese and American participants. We also endorse their concluding suggestion that “[typical] Japanese may simply see far more of the world than do [typical] Americans” (Masuda & Nisbett, 2001, p. 933; bracketed material added by the present authors). Where we disagree is in respect of the most plausible explanation of this difference. We claim that if the suggestion is true, then it is not primarily because they are East Asians, but because they speak a head-final language, that Japanese speakers may see far more of the world. Such an interpretation would cover both our findings and Masuda & Nisbett’s. The alternative, that a different explanation applies to each set of experiments, seems to be considerably more ad hoc.
Note that we do not discount the view that cultural traditions may influence modes of perception or categorization in other cases. For example, in other studies, Nisbett and his colleagues have found cognitive (attitudinal and perceptual) differences between ethnic groups of participants sharing a common first language (e.g., monolingual speakers of Turkish in Uskul, Kitayama & Nisbett, 2008; second generation Korean vs. European Americans in Choi, Dalal & Kim-Prieto, 2000): unless the latter participants were also bilingual, then our hypothesis could not be applied to explain such results. All that is claimed here is that observed differences between English and Japanese speakers in these particular types of description and visual recall tasks are better explained in linguistic, rather than cultural, terms: If it were otherwise, Chinese and Japanese participants should pattern together.11
Notice finally that our hypothesis generates further, testable predictions about cross-linguistic splits and groupings within and across cultural spheres. For instance, given our hypothesis, speakers of Korean, another head-final language, should behave like Japanese participants in visual recall, while Vietnamese and Thai speakers should pattern with the Chinese participants. These predictions will be tested in future experiments. If effects of language structure, whether of the grammatical parameter of head position or of a particular discourse strategy, can be shown to outweigh effects of culture for speakers of other languages as well, then our results will constitute a significant challenge to this part of the evidence base for cultural relativism.
References
Anno, M. (1977). Tabi no ehon. Tokyo: Fukuinkan Shoten.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conception of time. Cognitive Psychology, 43, 1-22.
Boroditsky, L. (2003). Linguistic relativity. In L. Nadel (Ed.), Encyclopedia of cognitive science (pp. 917-921). London: Macmillan Press.
Bowerman, M. (1996). The origins of children’s semantic categories: Cognitive vs. linguistic determinants. In J. Gumperz, & S. Levinson (Eds.), Rethinking linguistic relativity (pp. 145-176). Cambridge: Cambridge University Press.
Choi, I., Dalal, R., & Kim-Prieto, C. (2000). Information search in causal attribution: Analytic vs. holistic. Urbana-Champaign: University of Illinois.
Chomsky, N. (1981). Lectures on government and binding. Dordrecht, Holland; Cinnaminson, N.J.: Foris Publications.
Chomsky, N. (1995). The Minimalist program. Cambridge, MA: MIT Press.
Donaldson, J., & Scheffler, A. (2007). The Gruffalo's child. London: Campbell Books.
Duffield, N., & Tajima, Y. (2010). On the non-uniformity of Asian thinking (for speaking): A response to Masuda and Nisbett. In M. Iverson, I. Ivanov, T. Judy, J. Rothman, R. Slabakova, & M. Tryzna (Eds.), Proceedings of the 2009 Mind/Context Divide Workshop (pp. 28-39). Somerville, MA: Cascadilla Proceedings Project.
Fodor, J. D., & Inoue, A. (1994). The diagnosis and cure of garden paths. Journal of Psycholinguistic Research, 23(5), 407-434.
Greenberg, J. (1978). Some universals of grammar with particular reference to the order of meaningful elements. In J. Greenberg (Ed.), Universals of language. Cambridge, MA: MIT Press.
Hawkins, J. (1990). A parsing theory of word order universals. Linguistic Inquiry, 21, 223-260.
Hsiao, A. H.-H. (1999). Holding the frog in place: Linguistic typology of Mandarin Chinese. Unpublished senior honors thesis, University of California, Berkeley. [cited in Slobin (2003)].
Huang, C.-T. J. (1994). More on Chinese word order and parametric theory. In B. Lust, M. Suñer, & J. Whitman (Eds.), Syntactic theory and first language acquisition: Cross-linguistic perspectives, Vol. 1: Heads, projections and learnability (pp. 15-35). Hillsdale, NJ: Lawrence Erlbaum Associates.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Li, P., & Gleitman, L. (2002). Turning the tables: language and spatial reasoning. Cognition, 83, 265-294.
Maguire, E. A., Gadian, D. G., Johnsrude, I. S., Good, C. D., Ashburner, J., Frackowiak, R. S. J., & Frith, C. D. (2000). Navigation-related structural change in the hippocampi of taxi drivers. PNAS, 97(8), 4398-4403.
Malt, B., Sloman, S., Gennari, S., Shi, M., & Wang, Y. (1999). Knowing vs. naming: Similarity and the linguistic categorization of artifacts. Journal of Memory and Language, 40, 230-262.
Masuda, T., & Nisbett, R. E. (2001). Attending holistically vs. analytically: Comparing the context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology, 81, 922-934.
Nakayama, M. (1999). Sentence processing. In N. Tsujimura (Ed.), Handbook of Japanese linguistics (pp. 398-424). Oxford: Blackwell.
Özçalışkan, Ş., & Slobin, D. I. (1999). Learning how to search for the frog: Expression of manner of motion in English, Spanish, and Turkish. In A. Greenhill, H. Littlefield, & C. Tano (Eds.), Proceedings of the 23rd annual Boston University Conference on Language Development, Vol. 2 (pp. 541-552). Somerville, MA: Cascadilla Press.
Özçalışkan, Ş., & Slobin, D. I. (2000). Climb up vs. ascend climbing: Lexicalization choices in expressing motion events with manner and path components. In S. C. Howell, S. A. Fish, & T. K-Lucas (Eds.), Proceedings of the 24th Annual Boston University Conference on Language Development, Vol. 2 (pp. 558-570). Somerville, MA: Cascadilla Press.
Nisbett, R. E. (2003). The geography of thought: How Asians and Westerners think differently . . . and why. New York: Free Press.
Nisbett, R.E., & Masuda, T. (2003). Culture and point of view. PNAS, 100(19), 11163-11170.
Nisbett, R.E., & Miyamoto, Y. (2005). The influence of culture: holistic vs. analytic perception. Trends in Cognitive Sciences, 9(10), 467-473.
Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic vs. analytic cognition. Psychological Review, 108, 291-310.
Pederson, E., Danziger, E., Levinson, S., Kita, S., Senft, G., & Wilkins, D. (1998). Semantic typology and spatial conceptualization. Language, 74, 557-589.
Slobin, D. I. (1996). Two ways to travel: Verbs of motion in English and Spanish. In M. Shibatani, & S. A. Thompson (Eds.), Grammatical constructions: Their form and meaning (pp. 195-219). Oxford: Oxford University Press.
Slobin, D. I. (1997). Mind, code, and text. In J. Bybee, J. Haiman, & S. A. Thompson (Eds.), Essays on language function and language type (pp. 437–467). Amsterdam: John Benjamins Publishing Company.
Slobin, D. I. (2000). Verbalized events: A dynamic approach to linguistic relativity and determinism. In S. Niemeier, & R. Dirven (Eds.), Evidence for linguistic relativity. Amsterdam: John Benjamins Publishing Company.
Slobin, D. I. (2003). Language and thought online: Cognitive consequences of linguistic relativity. In D. Gentner, & S. Goldin-Meadow (Eds.), Language in mind: Advances in the study of language and thought (pp. 157-192). Cambridge, MA: MIT Press.
Talmy, L. (1975). Semantics and syntax of motion. In J. P. Kimball (Ed.), Syntax and semantics, Vol. 4 (pp. 181-238). New York: Academic Press.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description, Vol. 3: Grammatical categories and the lexicon (pp. 57-149). New York: Cambridge University Press.
Talmy, L. (2000). Toward a cognitive semantics: Typology and process in concept structuring, Vol. 2. Cambridge, MA: MIT Press.
Travis, L. (1984). Parameters and effects of word order variation. Unpublished PhD dissertation, MIT.
Ungerer, T. (1975). Emile. Tokyo: Bunka Publishing Bureau.
Uskul, A. K., Kitayama, S., & Nisbett, R. E. (2008). Ecocultural basis of cognition: Farmers and fishermen are more holistic than herders. PNAS, 105, 8552-8556.
Footnotes
1 As well as that which immediately precedes it (see Duffield & Tajima 2010). The current study offers a completely new set of experiments, in which we attempted to remedy some methodological shortcomings of the original task. This new experiment (with different participants, materials, and modes of analysis) provides even clearer support for our original hypothesis.
2 Throughout, following Nisbett’s own practice, the term Asian is taken to refer to East Asian ethnic and national groups (especially Chinese, Japanese and Korean groups), rather than to South Asian groups (which is the British default usage of the term): it is unlikely that Nisbett’s claims are intended to extend to any groups beyond the (historical) Han Chinese sphere of influence.
3 This problem comes to the fore in respect of Nisbett’s subsequent work on other cultural groups (Uskul, Kitayama & Nisbett 2008): see below.
4 Other V-languages in Slobin’s survey include Turkish, Spanish, and Hebrew; other S-languages include Mandarin and Russian.
5 It should be clear that the terms head-initial and head-final are completely independent of writing order: Arabic and Hebrew, for example, are head-initial languages that are written from right-to-left; Turkish is a predominantly head-final language written left-to-right.
6 The constituent analysis proposed here simplifies, but does not fundamentally misrepresent, the phrase-structure of Japanese (It may be, for example, that the genitive element no and the post-nominal marker de should bear other category labels, but this does not change the fact that adpositional phrases in Japanese are consistently head-final).
7 The term is due to Hawkins (1990).
8 We are naturally aware of the fact that many generativist linguists, including Huang himself, would treat Mandarin Chinese as underlyingly head-final in the verb-phrase, with verb-movement deriving the overt head-initial order. Be that as it may, what is relevant here are the surface configurations that provide the instructions for parsing and syntactic production: at this level, Chinese patterns—on balance—more like English than like Japanese.
9 The items classified as central and peripheral for each photograph in the picture description task are as follows:
For the Windmill picture (Figure 10), central items are the boy in a yellow T-shirt and the green windmill, while peripheral items are restaurant, table, chair, patrons, trees, shade, European street, passers-by, sunny, summer season, basket, instrument, signboard, pillar, sack, posters, balcony, building, and fallen leaves.
For the Beach picture (Figure 11), the central item is the boy smiling in the foreground, while the peripheral items are balloon, trees, mountains, beach, pebble, sky, clouds, wind, air, buildings, restaurants, construction, sunny, holiday season, resort, and road.
For the Bubble picture (Figure 12), the central item is the boy with a bubble-maker, while the peripheral items are other children, buildings, shops, street, signboard, air, sunny, summer season, the man with a balloon, the girl sitting on the bench, sack, passers-by, floating bubble, windows, lamp, curtain, and wooden floor.
10 Interestingly, there were only weak correlations for all of the groups concerned between their scores for the identification task and those on the information task (Chinese r = 0.26; English r = 0.13; Japanese r = 0.13). This may suggest that there is no necessary relationship between perceptual knowledge of an event and propositional knowledge about it.
11 This is not to say that our results have no wider implications for Cultural Relativity arguments, but to stress that this is only a first step of a larger project: Ultimately, it would seem to us desirable to account for all putative effects of broad culture in terms of more tangible and more plausible linguistic or local environmental factors. For example, the expectation that inhabitants of high-density urban environments should pay more attention to peripheral visual information than those who live in smaller communities is both plausible and measurable; indeed, this is established in Nisbett & Masuda (2003). However, we do not view this effect as ‘cultural’ in any interesting theoretical sense; see introductory discussion.
List of Figures
Figure 2. Sample scene fragments: Focal Fish Condition (from Nisbett and Masuda, 2003).
Figure 3. Top-down versus Bottom-up parsing mechanisms. This figure illustrates the way in which top-down parsers (e.g., English speakers) first decide the whole sentence structure and then fill each slot with words, whereas bottom-up parsers (e.g., Japanese speakers) begin by laying down words and gradually construct the whole sentence.
Figure 4. Sample picture used in the story-telling task: Mouse-Shadow picture (Donaldson & Scheffler, 2007).
Figure 5. Experiment 1 Results (Measure 1): Average number of times that some type of contextual information was mentioned ahead of the main point in the story-telling task by English, Chinese and Japanese participants.
Figure 6. Experiment 1 Results (Measure 2): Mean number of contextual descriptions mentioned before the main events and situations in the story-telling task by English, Chinese and Japanese participants.
Figure 7. Experiment 1 Results: Percentage of responses in which time-, place-, inferred antecedent event-, or field-related information was mentioned ahead of the main events and situations in the story-telling task, by language group.
Figure 8. Experiment 1 Results: Percentage of responses in which the cause was mentioned ahead of its effect (Cause-First) and in which the effect preceded its cause (Effect-First) in the story-telling task. Only constituents explicitly marked for causal relations are included in the data.
Figure 9. Experiment 1 Results (Measure 3): Mean number of different contextual descriptions reported overall in the story-telling task, in any order of mention.
Figure 10. Windmill Picture used in the picture description task: Picture 1.
Figure 11. Beach Picture used in the picture description task: Picture 2.
Figure 12. Soap Bubble Picture used in the picture description task: Picture 3.
Figure 13. Mean number of central and peripheral items mentioned by Chinese, English, and Japanese participants in the picture description task.
Figure 14. Extracted fragments of pictures used in the identification test of the visual recall task (sample).
Figure 15. Picture fragments identification scores in the visual recall task (by language group).
Figure 16. Mean scores for the forced-choice memory test in the visual recall task (by language group).
Japanese Versus Chinese Differences in Picture Description and Recall:
Implications for the Geography of Thought
The general aim of the present study1 is to contribute to the ongoing debate concerning the extent to which culture and/or language is able to penetrate core areas of cognition—especially visual attention and recall—that were previously viewed as largely impervious to social or linguistic experience. The theoretical impetus for this research is provided by work by Richard Nisbett and his colleagues (Nisbett, Peng, Choi, & Norenzayan, 2001; Nisbett, 2003; Nisbett & Masuda, 2003; Nisbett & Miyamoto, 2005, inter alia), in which Asians2 and Westerners (specifically, European Americans) are claimed to exhibit distinct cognitive styles—holistic versus analytic attention—this difference being reflected in markedly contrasting levels of field-dependence across a variety of experimental tasks. Nisbett and his colleagues argue that this inter-group difference is due to deep-seated cultural attitudes, beliefs and traditions: In the case of Asian groups, their holistic style is explained by reference to a collectivist, inter-dependent tradition and outlook, informed by Confucianism and relative subservience to societal institutions; by contrast, Westerners’ (Midwestern Americans, in the typical case) analytic style is an expression of a more individualistic impulse, informed by traditions of logical thought and self-determination having their origins in classical Athenian culture.
There are numerous prima facie objections to such claims. There is the observation, for instance, that these arguments gloss over any number of intermediate cases: What about, say, the attentional patterns of more collectivized, interdependent Western European groups (for example, contemporary Athenian citizens), or of highly individualistic Taiwanese MBAs? There is also the observation that they seem to grossly overstate the degree of intra-group homogeneity on either side of the Pacific. The most significant objection, however, is that in the final analysis they amount to little more than unexplained correlations (nowhere, for example, is it articulated what causal relationship there might be between a preference for syllogistic reasoning and a decrease in field dependence, or how cultural beliefs should effect changes in brain mechanisms implicated in spatial memory). In spite of this, Nisbett’s arguments appear to have gained some traction amongst cognitive anthropologists and psychologists, and for this reason they deserve serious consideration.
To ‘professional outsiders’ such as ourselves, coming from theoretical and applied linguistics, the appeal of Nisbett’s explanation is less obvious. However, it should be noted immediately that our purpose is not to challenge the data, but rather to question the interpretation in terms of immanent culture.
It should also be clear that it is impossible to tackle every part of Nisbett’s thesis at once: There are simply too many potentially related variables to control for. Instead, this paper focuses attention on the specific issues raised by a study reported in Masuda & Nisbett (2001), in which observed contrasts in field-dependence between Japanese and American participants are interpreted in terms of the culturally embedded attitudes and beliefs outlined above, rather than—as suggested below—in terms of formal grammatical differences between the languages spoken by the two participant groups. In brief, our claim will be that Japanese participants behave as they do in visual description and recall tasks primarily in virtue of being speakers of Japanese, rather than in virtue of any pan-Asian cultural affiliation. We shall support this contention by showing that, in three tasks very similar to those presented in Masuda & Nisbett (2001), the ‘Asian Response’ is split apart, with Chinese participants’ responses either patterning with those of the English group, rather than with the Japanese, or else revealing an intermediate response predicted by the linguistic typology articulated below.
Before presenting the study, something needs to be said about language and culture in the present context, since both terms are open to construals that limit—or even negate—the possibilities for empirical research aimed at teasing these factors apart. On the one hand, it is obvious that many substantive properties of language are dependent on the culture of their speakers. For instance, languages whose speakers live in social groups without governmental or religious institutions will not contain words for position-holders within those institutions, fishermen typically have a richer vocabulary of marine life than mountain herders, and so forth. It is plausible, though not yet conclusively demonstrated, that such cultural differences not only impact upon variation in lexical knowledge, but also upon perceptual and discrimination abilities: This is indeed what is claimed in more recent work by Nisbett and his colleagues (Uskul, Kitayama, & Nisbett, 2008).
Related to this issue is the Sapir-Whorf question, whether particular aspects of languages themselves exercise any determining influence on the non-linguistic cognitive capacities of their speakers. This is considerably more contentious, with proponents of stronger versions of the thesis—such as Boroditsky (2001), Bowerman (1996), Pederson et al. (1998)—opposed to those offering more universalist interpretations of similar data, including Malt, Sloman, Gennari, Shi, & Wang (1999), Li & Gleitman (2002); see Boroditsky (2003) for an overview. What this brief discussion highlights is the importance of identifying formal linguistic factors that can clearly be demonstrated to be orthogonal to cultural ones: In this paper, we propose one such variable (namely, Head-Directionality in phrase-structure).
A different set of problems surrounds the term culture. One major problem is that it can easily become weakened to the point at which any contingent property distinguishing two groups, however superficial or ephemeral, can be deemed “cultural”. It may seem to be one thing to use the term to refer to a group sharing a common set of distinctive familial, political and religious practices bound by agreed social norms, and whose distinct conventions and traditions are passed on from one generation to the next, and quite another to speak of groups bound by their contingent employment situation or geographical context—the “culture” of first year international students, for example, or of the 1960s, of high-density living, of fast-food workers in suburban strip-malls, of tabloid readers, and so forth. In practice, however, it proves problematic to decide when culture shades into community of practice, or something even less theoretically significant.3
In addition to this, there seems to be a lack of consensus among psychologists, anthropologists, and social scientists about the necessary or sufficient conditions for belonging to a culture, or acculturation. If an individual can be deemed to be influenced by a particular culture after only months, or even weeks, of contact, it again becomes very hard to tease apart the alleged effects of culture from other superficial contextual properties. This suggests that if one wishes to make interesting theoretical claims about the effects of culture on cognition, then the cultural factors called on should have some reasonable permanence and persistence in the life of the individual: Ideally, these should be attributes acquired in early childhood and shared by all eligible members of the cultural group in question.
Unless definitions are restricted in this way, it becomes nearly impossible to distinguish between the effects of environmental and/or occupational factors on attentional mechanisms—effects that are remarkable but not deeply surprising, where they are found—versus the effects of immanent culture. Consider, for example, the finding reported in Maguire et al. (2000), that the posterior hippocampus regions of London taxi-drivers were significantly larger than those of a control group, that the anterior portions were significantly smaller, and that this asymmetry increased with years of taxi-driving experience. Such results provide striking evidence of adult adaptations in brain regions that are associated with spatial memory and navigation. It seems entirely plausible—though this was not tested—that such physiological adaptation is also reflected in increased sensitivity to contextual factors, which in turn could be interpreted as more holistic/field-dependent cognitive style. But it would be wholly misleading to attribute this occupationally actuated change to cultural factors—say, “the culture of taxi-drivers”.
The Original Study
Masuda & Nisbett (2001) conducted an experiment with Japanese and American participants, in which they first presented underwater scenes (termed ‘animated vignettes’) of 20 seconds’ duration, featuring a salient focal fish as well as other smaller objects such as smaller fish, bubbles, shells and rocks (see Figure 1).
Participants were first asked to describe what they had seen. Subsequently, they were presented with a different set of object scenes and were asked to judge whether or not the elements depicted in these new scenes were identical to those featured in the original vignette. Figure 2 provides an example of the Figure condition (in which the focal fish was held constant). Some objects were presented with the original background, and others were presented with a neutral or novel background.
The results showed that, with respect to basic description, the Japanese group made about 50% more statements concerning background information and around 70% more statements about inert objects than the American group. While American participants invariably began their descriptions with the salient (focal) object, Japanese participants were much more likely to begin their statements by mentioning background elements (e.g., “There was a pond, and . . . ”). In addition, Japanese participants’ performance in identifying the focal fish was more (adversely) affected by the change of backgrounds; conversely, the Japanese participants reliably outperformed Americans in correctly identifying Ground features of the original scene.
Masuda & Nisbett (2001) and Nisbett & Masuda (2003) interpret these results in terms of the aforementioned cultural dichotomy: It is the persisting social and philosophical values of ancient China that predispose Japanese perceivers to holistic attention. However, it is equally possible to interpret these particular findings as due to linguistic, rather than cultural factors, since in this instance there is a confound between language and culture: As we shall show directly, the grammatical and discourse structure of Japanese (i.e., the Japanese language) differs from that of English at least as much as pan-Asian culture differs from that of European Americans.
Towards an Alternative Interpretation: Thinking for Speaking
Dan Slobin’s Thinking For Speaking hypothesis is especially relevant to the present discussion. In a series of papers (Slobin, 1996, 1997, 2000, 2003), Slobin develops the idea that there exists a process of ‘thinking for speaking’, apart from general cognition. He argues that:
The activity of thinking assumes a distinct character when it takes place for speaking, because, in the process of speaking, one needs to adjust one’s thought to immediately available linguistic forms. Each language provides many, but a finite number, of particular words and grammatical constructions to encode reality. In consequence, when one thinks for speaking, one unconsciously focuses on those aspects of objects and events that are most readily encodable in one’s particular language. (Slobin, 2003, p. 157)
The paradigm case of a cross-linguistic difference in event construal concerns the encoding of motion events, and involves the semantic components of path and manner of motion. In research stemming from seminal work by Talmy (1975), see also Talmy (1985, 2000), it has been repeatedly observed that languages may be classified into two types—Verb-framed versus satellite-framed—according to how these two semantic components are lexically encoded. In verb-framed languages (V-languages), such as Spanish and Japanese, path is obligatorily expressed as a component of the verb, while manner of motion is (optionally) expressed as an adjunct phrase; by contrast, in predominantly satellite-framed languages (S-languages), such as English or Dutch, manner of motion is directly encoded on the verb, while path is expressed as a separate preposition (or particle). This contrast is illustrated in (1) and (2) below: In English and Dutch, path is expressed by the satellite element (across, over), while in French and Japanese, the same semantic notion is encoded in the main verb (traverse, wataru), with the manner component expressed as an (optional) adjunct phrase (en nageant, oyoi-de):
(1) a. He is swimming across the river. (English)
b. Hij zwemt de rivier over. (Dutch)
he swim-PRES the river over
‘He swims over the river.’
(2) a. Il traverse le fleuve en nageant. (French)
He cross-PRES the river in swim-GERUND
‘He crosses the river, swimming.’
b. 泳いで 川を 渡る (Japanese)
oyoi-de kawa-o wataru.
swim-BY river-ACC cross-PRES
‘Swimming, (He) crosses the river.’
The crucial point to observe about these examples is that this linguistic typology is orthogonal to broad-scale cultural, geographic, or indeed, genetic affiliation4: In this case, French and Japanese pattern together, in contrast to English or Dutch.
Slobin points out that this formal difference in language structure has important consequences for many aspects of language use, including—most relevantly—for narrative descriptions. For example, it is shown that S-language speakers use manner verbs significantly more often than V-language speakers when describing the same events (see Hsiao, 1999; Özçalışkan & Slobin, 1999); that S-language novels have greater type and token frequencies in situations in which human movement is described; S-language writers, overall, give their readers significantly more information—explicit and inferential—about the manners in which their protagonists move about (Özçalışkan & Slobin, 2000) than do V-language writers. Such observations suggest, at the very least, that one should be circumspect about ascribing differences in narrative description to cultural factors, since these typological groupings, which cross-cut cultural spheres of influence, also show clear correlations with narrative style.
The S-language/V-language parameter is not, of course, the only typological distinction to cross-cut genetic boundaries, nor—although it nicely illustrates our general point—is it the parameter that we consider best explains the Japanese-English contrast obtained in Masuda and Nisbett’s (2001) study. Instead, the typological linguistic variable that we believe to be at work here is the Head or Head-Directionality Parameter.
The Head Parameter: Overview
One of the most obvious ways in which languages vary syntactically is with respect to clausal word order—the position of phrasal constituents relative to one another. This type of variation at the clausal level results directly from the Head Parameter: Whether the head element of the phrases that make up a sentence appears to the left or right of its respective complement. In a consistently head-initial language, such as English or French, the verb precedes the direct object in the verb phrase, (temporal and modal) auxiliaries precede the verb phrase, clausal complementizers precede the embedded clause they introduce, and the language has prepositions, rather than postpositions. This is illustrated for English in (3), where in each example the relevant head(s) is/are indicated in bold, their complement phrases in italics:
(3) a. John [VP brokeV the vase ].
b. John [ModalP shouldM [NEGP notNEG [ASPP have [VP broken the vase]]]].
c. John said [CompP thatCOMP [S he hadn’tT brokenV the vase ]].
d. John danced [PP aroundP [NP the room [ inP [NP the palace]]]].
Exactly the opposite order is observed in a head-final language such as Japanese, Korean or Turkish: The verb follows its object; tense and mood affixes are invariably expressed as verbal suffixes (where these appear as auxiliaries, they follow the verb-phrase); complementizers appear to the right of complement clauses; the language is postpositional:5
(4) a. John-ga [VP kabin-o wattaV ].
John-NOM vase-ACC break-PAST
‘John broke the vase.’
b. [TP John-wa [[[VP kabin-o waruV ] bekide-waT ] na-NEG ] kattaT ].
John-NOM vase-ACC break should-NOM not-PAST
‘John should not have broken the vase.’
c. John-wa [ [ kabin-o watteV nai S] toCOMP CP] ittav.
John-NOM vase-ACC broke not COMP say-PAST
‘John said that he hadn’t broken the vase.’
d. John-wa [[[[ kyuden NP] no P PP] hiroma NP] deP PP] odottav.6
John-NOM palace of room in dance-PAST
‘John danced around the room in the palace.’
In generative theory (e.g., Chomsky, 1981, 1995), only head-complement order is relevant to determining the head-parameter for a given phrase; see, in particular, Travis (1984). In other approaches, however—and especially within the typological framework initiated by Greenberg (1978)—all head-modifier relations are potentially relevant to determining the head-initial or head-final status of the language. Thus, the positions of attributive adjectives, relative clauses, possessor phrases, and subordinate adjunct clauses are also taken into account. By all of these measures also, Japanese is consistently head-final.
Not all languages display such consistent cross-categorical harmony in head-modifier order.7 Some languages, for example, project right-headed phrases for one syntactic category and left-headed phrases for another, so that it becomes harder to classify the language overall in terms of a single binary parameter. (Mandarin) Chinese is a case in point.
Huang (1994) provides a useful discussion of Chinese word order. The core facts are illustrated by the examples below, which reveal that Chinese is normally head-initial with respect to verbs and TAM auxiliaries (5a)/(5b)—including the position of clausal complements (5b)—and with respect to prepositional phrases (6), but head-final with respect to lexical noun phrases: Both nominal complements (7a) and nominal adjuncts precede the head-noun; relative clauses are internally headed by a right-peripheral head (7b).8 Once again, in each example the relevant head element is indicated in bold, the complement or modifier in italics:
(5) a. Zhangsan meiyou [VP kanjianV [NP Lisi]].
Zhangsan not-HAVE see Lisi
‘Zhangsan did not see Lisi.’
b. Zhangsan [VP zhidaoV [S Lisi [NEGP buNEG [AP chengshi]]]].
Zhangsan know Lisi not honest
‘Zhangsan knows that Lisi is not honest.’
(6) a. Zhangsan [VP zhuV [PP zaiP [NP Meiguo]]].
Zhangsan live at America
‘Zhangsan lives in the US.’
b. Zhangsan fang-le yi-ben shu [PP zaiP [NP zhuozi-shang]].
Zhangsan put-PERF one-CL book at table-top
‘Zhangsan put a book on the table.’
(7) a. [[[ yuyanxue NP] deP PP] yanjiuN NP]
linguistics DE research
‘the study of linguistics’
b. [[[ni zui xihuan S] deP PP] nei-ben shuN NP] mai-wan le.
you most like DE that-CL book sell-out PERF
‘The book that you like most has been sold out.’
Hence, with respect to purely syntactic properties, English and Japanese represent two ends of a grammatical continuum, with Chinese somewhere in the middle (though much more like English in terms of token frequency, and crucially, with respect to verbal projections).
The question that arises at this point is how this typological contrast—however interesting it may be from a linguistic perspective—should explain the attentional patterns observed in Masuda & Nisbett’s (2001) study: Why should head-finality predispose Japanese speakers to greater holistic attention or field-dependence? There are two responses to this question, the first relatively superficial, with few consequences for linguistic relativism, the second rather more complex, but with more interesting implications for the relationship between language and visual cognition.
Taking the superficial relationship first, notice that one of the dependent measures in the Masuda & Nisbett (2001) study was order of mention: Whether participants first mentioned the focal fish (Figure) or the background context (Ground) in their verbal descriptions. Masuda and Nisbett interpret the elements first mentioned as ‘more salient’ and conclude from the fact that Japanese participants consistently mentioned contextual information ahead of focal information that the Japanese group paid greater attention to the field than their American counterparts. Yet, as we have just seen, the grammatical and discourse structure of Japanese virtually guarantees this result: If contextual information is to be mentioned at all, it must be mentioned first, since the main predicate is canonically the final sentential constituent; conversely, the discourse structure of English affords American participants more opportunity to mention focal elements first in their verbal descriptions—wholly irrespective of the relative salience of Figure and Ground in their conceptual representations of the event.
If this observation is valid, then it follows that group differences in order of mention effects do not necessarily speak to the issue of holistic versus analytic attention. More importantly, this observation allows us to formulate clear predictions for a new study involving Japanese, English, and Chinese participants: If grammatical and discourse structure determine order of mention in visual descriptions, then the verbal reports of Chinese participants should be intermediate between those of the other two groups, though patterning more with those of English speakers: They should follow the Figure-to-Ground order (from head to modifier) significantly more often than those of the Japanese group, who are expected to produce descriptions (almost exclusively) in the Ground-to-Figure order (from modifier to head). If, on the other hand, order of mention is determined by cognitive styles that are associated with cultural affiliation, then the Chinese and Japanese groups should pattern more closely with each other than either does with the English group. These predictions are explored in the first of the studies reported below.
A more interesting question, however, is whether it is possible to connect this typological difference to the other dependent measures in the Masuda & Nisbett (2001) study (besides order of mention). It will be recalled that order of mention was only one of three measures that distinguished Japanese from American participants. The other two were the number of contextual (Ground) features mentioned by each participant for each description, and—most interestingly still—the number of contextual features correctly recalled on re-presentation of still fragments: Japanese participants not only mentioned significantly more background details, they also remembered better which elements they had previously observed. This last measure, in particular, is not obviously related to linguistic typology.
And yet it might be. What is distinctive about Slobin’s (2003) thinking for speaking hypothesis is the extent to which linguistic structures are assumed to penetrate cognition in various ways. In contrast, for example, to Levelt (1989), who also entertains a form of the thinking for speaking hypothesis, but who supposes that effects of language are restricted to the time of utterance—that is, it is only when one prepares to speak that language affects conceptualization—Slobin (2003) speculates that thinking for speaking effects could extend beyond speech time and may induce speakers to form specific attentional patterns, even in the absence of language. In support of this speculation, Slobin cites the work of Pederson, Levinson et al. (1998):
Far more than developing simple habituation, use of the linguistic system, we suggest, actually forces the speaker to make computations he or she might otherwise not make. . . . That is, the linguistic system is far more than just an available pattern for creating internal representations: to learn to speak a language successfully requires speakers to develop an appropriate mental representation which is then available for non-linguistic purposes. (Pederson, Levinson et al., 1998, p. 586 [emphasis in original])
Thus for example, when we speak English, we are forced to pay attention to gender of third parties, because the language requires gender specification of (singular) pronouns. On the other hand, when we speak Japanese, we are forced to direct our attention to the asymmetric relationships between individuals: elder/younger, senior/junior, or close/remote, because of the language’s honorific systems. As a consequence, English speakers form a habit of attending to gender and Japanese speakers develop a habit of attending to human relations in non-linguistic contexts also. Slobin thus assumes that thinking for speaking effects induce language-specific attentional preferences beyond the linguistic domain.
In the present case, let us assume that the grammatical and discourse structure of Japanese, which is known to have effects on syntactic processing and production, leading to bottom-up parsing routines (as opposed to the top-down strategies of English parsing mechanisms; see Fodor & Inoue, 1994; Nakayama, 1999, for discussion), also impacts on conceptual structures. Suppose that, as is the case for syntactic constituents, final conceptual representations (including representations of causality) are constructed—as they are reported—“from the bottom up”, from background context to focal elements/main arguments, as schematized in Figure 3.
Our conjecture, then—extending the Thinking for Speaking hypothesis—is that the (top-down vs. bottom-up) parsing mechanisms that implement different (head-initial vs. head-final) grammatical settings are reflected in the speakers’ discourse strategies, which in turn influence their attentional patterns. Japanese speakers build up phrase structure from complements and/or modifiers to heads; in other words, from semantically and syntactically peripheral elements to core elements, in a bottom-up fashion. It plausibly follows from this that Japanese speakers are predisposed to plan and interpret discourse by placing peripheral elements ahead of the main point. This notion is supported by anecdotal observation: When describing, explaining, excusing, arguing, or persuading, Japanese speakers tend to begin their statements with peripheral elements (reasons, situations, and contexts), before referring to the main points (effects, intentions, and conclusions). As a consequence, they may have developed a perceptual habit of attending to the entire field.
In short, our hypothesis is that Japanese speakers are more likely to attend to contextual information primarily because the grammatical and discourse structure of the language requires speakers to mention contextual information ahead of focal information.
As strained as it may appear at first, this hypothesis nevertheless generates rather clear predictions concerning the behavior of Chinese participants in the present study, with respect to the other two dependent variables at stake (viz., number of mentions of background elements/correct identification of background elements in subsequent presentation). If the head-directionality parameter plays a significant role in accounting for the differences between Japanese and American participants in Masuda & Nisbett’s (2001) original study, then Chinese participants are expected to pattern with English participants with respect to these dependent measures, splitting the Asian response. It will be clear that Masuda & Nisbett’s cultural explanation predicts a quite different split.
General Method
In order to test the hypothesis outlined above, we constructed two linguistic tasks to compare the verbal descriptions of Japanese, English, and Chinese speakers with respect to foregrounded and backgrounded information. The first task (story-telling) involved explaining the events depicted in a pivotal scene in well-known children’s story books (Anno, 1977; Donaldson & Scheffler, 2007; Ungerer, 1975); the second (picture description) task involved more straightforward description of unfamiliar photographs. Subsequently, as in the Masuda & Nisbett (2001) experiment, fragments of the images previously shown were presented to participants for identification (fragments identification test); the participants were also asked information questions about the photographs they had seen (information recall test).
Participants
The same 120 participants took part in all of the experiments: 43 Japanese native speakers (21 women, 22 men) were recruited from Keio University, Japan, 33 English native speakers (23 women, 10 men) from the University of Sheffield, UK, and 44 Chinese native speakers (25 women, 21 men) from the International Study Institute Chukyo, Japan. All of the participants were undergraduate or postgraduate university students, or language school students preparing for university entrance, aged 18-30 years old. The Japanese and Chinese participants were tested in Japan, and the English participants were tested in the UK. All associated language materials were translated, and presented to each group in their own language, by native-speaker experimenters.
Experiment 1
We first conducted a story-telling task to examine the discourse preferences of Japanese, English, and Chinese speakers: Whether contextual information was mentioned before the main point or vice-versa. Given our hypotheses outlined above, it was predicted that the responses of the Japanese group should diverge significantly from those of the other two groups.
Procedure
Participants were presented with illustrations extracted from three well-known picture books for children (Anno, 1977; Donaldson & Scheffler, 2007; Ungerer, 1975) and asked to provide a written description of each of these in their native language, by clarifying when, where, and why the depicted events and situations were taking place. The responses were then coded in terms of the kinds of information included in each description, according to the following descriptors.
- 1. Main events and situations: description of the main story, explaining what main characters are doing in each scene.
- 2. Peripheral events and situations: description of the field, explaining what is shown in each scene but is not explicitly related to the main story or the main characters; for example, the moon is shining, the woods are covered with snow.
- 3. Time: description of the time, explaining when the depicted events and situations were taking place, such as at night, in the daytime, in winter, during summer, and so on.
- 4. Place: description of the place, explaining where the depicted events and situations were taking place, such as at the beach, in the woodland, in a small village.
- 5. Cause: description of the reason, explaining what had caused the depicted events and situations to occur. Note that only constituents that were explicitly mentioned with markers for causal relations are included in this category, such as because, as, since, therefore, accordingly, so, and so forth.
- 6. Inferred antecedent events and situations: description of the events and situations that were not shown in each scene but were inferred to have happened before the depicted events and situations occur.
There is a full moon (2. peripheral events and situations)
at night. (3. time)
In a forest, (4. place)
a rabbit is scared (1. main events and situations)
because it sees the enlarged shadow of the mouse reflected in the moonlight against the snow. (5. cause)
The mouse might have been trying to retaliate against the rabbit, who has always bullied him. (6. inferred antecedent events and situations)
This sample response was thus coded as 234156. In this way, all of the responses were coded into these six categories and labeled with such serial numbers. The overall measure of interest was the average number of times (out of a possible total of three descriptions, one for each picture) that some type of contextual information (2-6) was mentioned ahead of the main point (1), for each language group (Measure 1). We also determined the quantity of contextual information mentioned before the main point, that is, how many contextual descriptions indicating time, place, cause, the field, or inferred antecedent events were made before the main events and situations were mentioned, for each language group (Measure 2), as well as the relative numbers of different contextual elements overall, in any order of mention (Measure 3).
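The coding scheme and the three measures can be illustrated with a minimal Python sketch. This is not the procedure used in the study, where responses were coded by hand; the function names, the treatment of a response with no coded main point, the reading of Measure 3 as a count of distinct contextual categories, and the two extra code strings are all assumptions introduced purely for illustration. Only the string 234156 comes from the sample response above.

```python
# Category codes follow the descriptors listed above:
# 1 = main events; 2-6 = contextual information of various kinds.
MAIN_CODE = "1"
CONTEXT_CODES = set("23456")


def context_before_main(code_string):
    """Return the contextual codes that precede the first main-point code."""
    head, found, _ = code_string.partition(MAIN_CODE)
    # Assumption: if no main point was coded, nothing counts as "before" it.
    if not found:
        return []
    return [code for code in head if code in CONTEXT_CODES]


def measures(code_strings):
    """Compute the three measures for one participant's coded descriptions."""
    before = [context_before_main(s) for s in code_strings]
    # Measure 1: number of descriptions with any context ahead of the main point.
    measure_1 = sum(1 for codes in before if codes)
    # Measure 2: number of contextual descriptions ahead of the main point,
    # kept per description so that Picture can serve as a within-subjects factor.
    measure_2 = [len(codes) for codes in before]
    # Measure 3 (assumed reading): distinct contextual categories, in any order.
    measure_3 = [len(set(s) & CONTEXT_CODES) for s in code_strings]
    return measure_1, measure_2, measure_3


# "234156" is the sample response; the other two strings are hypothetical.
participant = ["234156", "1345", "416"]
print(measures(participant))
```

For this hypothetical participant, two of the three descriptions open with contextual information, so the Measure 1 score is 2 out of a possible 3.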
Results and Discussion
All three measures revealed clear and reliable differences among the three language groups. First, Figure 5 shows the average number of times that some kind of contextual information (2-6) was mentioned ahead of the main point (Measure 1), out of a total of three responses (one per picture).
As predicted, the Japanese group were considerably more likely to begin their descriptions with contextual information (M = 2.86) than the Chinese (M = 2.33) and the English group (M = 1.55). An analysis of variance revealed a reliable main effect of Language (F(2, 118) = 21.357, p < .01), with post-hoc comparisons (Bonferroni) showing significant differences between the English and Japanese group and between the English and Chinese group at the p < .01 level, and between the Japanese and Chinese group at the p < .05 level.
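The F ratio for a one-way between-groups design of this kind can be sketched as follows. This is a generic textbook computation in plain Python, not the authors’ analysis script (the statistical software used is not stated); the function name and the toy scores are invented for illustration and are not the study’s data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict mapping group -> scores."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (x - mean(scores)) ** 2
        for scores in groups.values()
        for x in scores
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented Measure 1 scores (0-3) for four hypothetical participants per group.
toy_scores = {
    "Japanese": [3, 3, 3, 2],
    "Chinese": [3, 2, 2, 2],
    "English": [2, 1, 2, 1],
}
f_ratio, df_b, df_w = one_way_anova(toy_scores)
print(f"F({df_b}, {df_w}) = {f_ratio:.3f}")
```

With these toy scores the sketch gives an F ratio of approximately 5.7 on 2 and 9 degrees of freedom. A Bonferroni correction of the kind mentioned above would then compare each pairwise p value against the chosen alpha level divided by the number of comparisons (three, for three groups).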
We then investigated the quantity of contextual information mentioned ahead of the main point (Measure 2). Figure 6 shows how many contextual descriptions were made before mentioning the main events and situations by English, Chinese and Japanese participants. Again as predicted, the Japanese group reported the highest number of contextual descriptions before the main events (M = 2.87), followed by the Chinese (M = 2.09) and the English group (M = 1.19). The data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variable and Picture (3 levels: Picture 1, Picture 2, Picture 3) as the within-subjects variable, the dependent variable being the number of contextual descriptions reported before the main events and situations. The analysis revealed a reliable main effect of Native Language, F(2, 110) = 19.019, p < .01, with no interaction found between Native Language and Picture. Post hoc tests (Bonferroni) revealed significant differences between the Japanese and English participants and between the Chinese and English participants at the p < .01 level, and also between the Chinese and Japanese participants at the p < .05 level.
Further analysis revealed that the Japanese participants' tendency to report more contextual information before the main events and situations was especially pronounced in three areas: Time-, Place-, and Field-related information. Figure 7 shows the percentage of responses in which time-, place-, inferred antecedent event-, or field-related information was mentioned ahead of the main events and situations within each language group. Time-related information was reported ahead of the main point in 79.1% of the Japanese, 59.8% of the Chinese, and 42.9% of the English responses across all three pictures; statistically significant differences were found only between the Japanese and English participants, χ2 (4) = 40.187, p < .01. Place-related information was reported ahead of the main events in 82.2% of the Japanese, 51.2% of the Chinese, and 43.9% of the English responses, with significant differences found for two contrasts (English versus Japanese and Chinese versus Japanese), χ2 (4) = 57.333, p < .01. Field-related information appeared prior to the mention of main events in 39.5% of the Japanese, 25.2% of the Chinese, and 11.2% of the English responses, with significant differences found only between English and Japanese participants, χ2 (4) = 34.844, p < .01. Concerning inferred antecedent events mentioned ahead of the main events, the data obtained were too few to be entered into statistical analysis.
In addition, Figure 8 shows the relative percentages of cause-effect orders in the descriptions that included causal relations. As can be seen, both Japanese and Chinese participants showed a tendency to mention a cause ahead of its effect in explaining causal relations, while English participants clearly preferred to mention an effect prior to its cause, χ2 (4) = 54.072, p < .01.
Finally, Figure 9 shows the overall numbers of different contextual elements reported in participants’ responses, in any order of mention (Measure 3). The results also revealed Japanese participants’ context dependency, with the highest number of contextual elements reported across the three pictures. The data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variable and Picture (3 levels: Picture 1, Picture 2, Picture 3) as the within-subjects variable. The analysis revealed a reliable main effect of Native Language, F(2, 110) = 13.579, p < .01, as well as a significant interaction between Native Language and Picture, F(4, 220) = 3.216, p < .05. Post hoc tests (Bonferroni) revealed significant differences between the Japanese and English participants in Pictures 1 and 2 at the p < .01 level, and between the Japanese and Chinese groups in Pictures 2 and 3 at the p < .01 level, but none between English and Chinese participants in any condition.
Taken together, these results at once confirm the findings of the Masuda & Nisbett (2001) study (that is, that Japanese speakers clearly prefer to mention more peripheral elements, and earlier in their descriptions, than do English speakers) and challenge the previous interpretation, namely, that this difference is the result of some pan-Asian cultural predisposition. The results pose this challenge because the Chinese group, as predicted by the typology of their language, exhibits intermediate behaviour, sometimes patterning with the Japanese participants (for example, with respect to mention of Cause before Effect, shown in Figure 8) and sometimes with the English group (for example, with respect to the total numbers of peripheral items mentioned, shown in Figure 9). In other words, the results show that the bottom-up parsers (Japanese speakers) prefer the bottom-up discourse style and the top-down parsers (English speakers) employ the top-down discourse style, with Chinese speakers located between the two language groups, as predicted by the grammatical typology of Chinese.
The following sample illustrates a typical response of Japanese participants for the Mouse-Shadow picture (Figure 4), with abundant contextual information mentioned before the main events and situations:
‘It is at night with a full moon shining. In a snowy forest, an animal is walking with a stick searching for food. Then, a little mouse standing on a tree branch, who has been bullied by the bigger animal, hit upon a good idea. With the use of the moonlight, he enlarged his own shadow and cast it onto the ground. The animal, seeing the enlarged shadow, thinks it is a monster and is scared.’
At the very least, the results of this first experiment lend support to the idea that culture may not be the sole, or even key, determinant of the differences observed in the earlier study. As we shall see directly, the next two tasks offer even clearer reasons for scepticism: Not only do Chinese and Japanese participants behave differently, but the ‘Asian response’ is actually split.
Experiment 2
In the second experiment, we conducted two related tasks to determine whether there was any difference in attentional patterns toward the field when making visual descriptions. Our hypothesis was that top-down versus bottom-up parsing patterns embedded in participants’ discourse styles would exercise a larger effect on responses than any cultural affiliation. That is to say, it was anticipated that, as bottom-up parsers (Japanese speakers) need to refer to context first in their discourse, they should attend more to the field than would top-down parsers (English and Chinese speakers); as a consequence, Japanese speakers should mention disproportionately more peripheral elements than central elements—and may recall these better—than either English or Chinese speakers.
The second experiment comprised two tasks, each with its own dependent measures: a picture description task and a visual recall task. In the picture description task, we presented participants with a set of three color photographs and asked for a written description. The description task served two purposes: first, to discover which visual features of shared scenes participants mention in written reporting; second, simply to show participants the photographs for subsequent recall. Participants were not aware that some parts of the photographs would be extracted and presented again in another set of pictures in the visual recall phase. In the subsequent visual recall task, participants were shown those extracted portions of photographs and asked to judge whether these formed parts of the pictures that they had already seen.
Experiment 2, Phase 1: Picture Description
Procedure
The participants saw three photographs in turn and provided a written description for each of these. The photographs selected for description included both salient focal objects and smaller peripheral objects (see Figures 10, 11, and 12). Participants’ responses were then analyzed to determine which aspects of the photographs were more attended to: central objects or peripheral objects.9
Results and Discussion
The participants’ responses are charted in Figure 13. Notice that, as expected, the largest differences are observed in the mean number of peripheral items mentioned by each participant across groups. Here, the Japanese behave once again as predicted, with significantly higher mentions of peripheral items (M = 15.05) than the English group (M = 7.48). Notice in particular that this preference for peripheral items is not shared by Chinese participants, who actually mentioned fewer peripheral elements than even the English group (M = 3.43). The description data were entered into a repeated measures ANOVA with Native Language (3 levels: English, Chinese, Japanese) as the between-subjects variable, and Item Location (2 levels: Central, Peripheral) and Picture as within-subjects variables, the dependent variable being the number of items mentioned. The analysis revealed a reliable main effect of Native Language, F(2, 119) = 50.844, p < .01, as well as a significant interaction between Native Language and Location, F(2, 119) = 32.547, p < .01. The source of this interaction is clearly suggested by the plot in Figure 13.
Post hoc tests (Bonferroni) revealed significant differences between the Chinese and Japanese participants and between the English and Japanese participants with respect to peripheral items mentioned at the p < .01 level, and also between the Chinese and English participants at the p < .05 level.
However, the Chinese group’s relatively low number of descriptions of the peripheral items might be attributed to the fact that Chinese participants’ total number of descriptions for each photograph was much smaller overall than that of the other two groups. Therefore, we also examined whether the three language groups differed in the ratio of peripheral items to central items mentioned. This ratio was 83% for the Chinese group, 112% for the English group, and 213% for the Japanese group. The results were entered into a repeated measures ANOVA with Native Language as the between-subjects variable, Picture as a within-subjects variable, and the ratio of peripheral items to central items mentioned in each response of each participant as the dependent variable. The analysis again revealed a reliable main effect of Native Language, F(2, 106) = 25.168, p < .01, without any interaction with Picture. Post hoc tests (Bonferroni) revealed significant differences between the Chinese and Japanese participants and between the English and Japanese participants, both at the p < .01 level, but none between English and Chinese participants.
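The ratio measure can be illustrated with a short Python sketch for a single response to the Beach picture. The item labels follow the classification given in footnote 9, but the short label "boy" for the central item, the function names, the counting of distinct items rather than repeated mentions, and the treatment of a response with no central item as undefined are all assumptions made for illustration; the sample response is hypothetical.

```python
# Central and peripheral items for the Beach picture, after footnote 9.
# "boy" is shorthand here for "the boy smiling in the foreground".
BEACH_CENTRAL = {"boy"}
BEACH_PERIPHERAL = {
    "balloon", "trees", "mountains", "beach", "pebble", "sky", "clouds",
    "wind", "air", "buildings", "restaurants", "construction", "sunny",
    "holiday season", "resort", "road",
}


def count_mentions(mentioned, central, peripheral):
    """Count distinct central and peripheral items in one coded response."""
    items = set(mentioned)
    return len(items & central), len(items & peripheral)


def ratio_percent(n_peripheral, n_central):
    """Peripheral-to-central ratio as a percentage; None if undefined."""
    # Assumption: a response with no central item has no defined ratio.
    if n_central == 0:
        return None
    return 100.0 * n_peripheral / n_central


# Hypothetical coded response, not taken from the study's data.
response = ["boy", "balloon", "sky", "beach"]
n_central, n_peripheral = count_mentions(response, BEACH_CENTRAL, BEACH_PERIPHERAL)
print(ratio_percent(n_peripheral, n_central))
```

A value above 100% indicates that more peripheral than central items were mentioned, as in the English and Japanese group ratios reported above.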
It will be recalled that the focus of Masuda & Nisbett’s (2001) study was on differences between Japanese and American participants’ mention of peripheral items (the result, it was claimed, of East Asians’ superior attention to Ground over Figure). In the present study, however, the Chinese group mentioned the lowest number of peripheral items, both in absolute and relative terms. Statistically, the largest differences that were observed distinguished Chinese and Japanese participants, with no statistically significant difference found between Chinese and English participants in the ratio measure. The Asian response was thus effectively split, divided by the responses of the English group. Whatever explains this pattern cannot plausibly be related to cultural affiliation.
Experiment 2, Phase 2: Visual Recall Task
The final task compared the three groups with respect to visual recall by testing participants’ accuracy in identifying peripheral fragments and recalling information about Ground details. Once again, our hypothesis predicted a split in the Asian response, whereas Masuda & Nisbett’s (2001) cultural interpretation predicted a pan-Asian advantage over the English participant group.
Procedure
To construct the visual recall task, peripheral portions were extracted from each of the three photographs shown in the picture description task and from one of the pictures used in the first story-telling task, five portions from each. These extracted fragments (n = 20) were combined with another 20 fragments extracted from novel pictures to create a set of 40 small pictures (2.5 × 2.5 cm): see Figure 14.
Participants were presented with this set of extracted portions and asked to identify the parts of the pictures that they had already seen in the previous tasks. Each participant’s score was counted according to the number of correctly identified portions: one point for correct identification and equally one point for correct rejection, 40 points in all. Following this identification test, participants were also presented with a forced choice memory test, including questions of the following kind: “In the beach picture, is the balloon yellow or red?” or “In the windmill picture, was the man behind the boy wearing shorts or long trousers?” and so on. There were twelve such questions in all: three questions relating to each of the four pictures used in the identification phase. Thus, this part of the experiment had two dependent measures: (i) number of correctly identified or rejected fragments; (ii) number of correctly answered information questions.
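The scoring rule for the identification test can be sketched in Python as follows. The fragment identifiers, the function name, and the choice to award no point for an unanswered fragment are assumptions for illustration only; the example uses four hypothetical fragments rather than the full set of 40.

```python
def identification_score(judgements, answer_key):
    """Score one participant on the fragment identification test.

    answer_key maps fragment id -> True if the fragment came from a
    previously shown picture, False if it came from a novel picture.
    judgements maps fragment id -> the participant's "seen before" answer.
    """
    score = 0
    for fragment_id, was_shown in answer_key.items():
        # One point for a correct identification (True/True) and equally
        # one point for a correct rejection (False/False).
        if judgements.get(fragment_id) == was_shown:
            score += 1
    return score


# Hypothetical key and judgements for four fragments.
answer_key = {"f01": True, "f02": False, "f03": True, "f04": False}
judgements = {"f01": True, "f02": True, "f03": False, "f04": False}
print(identification_score(judgements, answer_key))
```

Here the hypothetical participant earns one point for correctly identifying f01 and one for correctly rejecting f04, giving a score of 2 out of 4.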
Results and Discussion
Figure 15 displays the picture fragment identification scores for the three language groups. As predicted, the Japanese group recorded the highest average score (M = 31.40); this was followed by the English (M = 26.79) and then the Chinese group (M = 22.07). An analysis of variance revealed a reliable main effect of Language (F(2,118) = 45.164, p < .01): Post-hoc comparisons (Bonferroni) showed significant differences for all three pairwise comparisons, all ps < .01.
The results for the subsequent forced-choice memory test were also along the same lines (see Figure 16). The Japanese group obtained the highest score (M = 8.02), followed by the Chinese (M = 6.18) and the English group (M = 5.70). Again, a reliable main effect of language was found, F(2,118) = 14.952, p < .01, and again, post-hoc tests (Bonferroni) showed significant differences between the scores for the Chinese and Japanese groups, and the English and Japanese groups both at the p < .01 level, but no difference between the Chinese and English groups.10
These findings thus clearly demonstrate that there is no common Asian response in the visual recall task: Instead, the Chinese participants pattern to a large degree with the English group, separately from the Japanese group.
General Discussion
The experiments reported here examined the attentional biases of Japanese, English, and Chinese speakers across a range of linguistic and non-verbal tasks. Our hypothesis was that top-down versus bottom-up parsing mechanisms would be reflected in the speakers’ discourse strategies, with top-down parsers (English and Chinese speakers) having a preference for top-down discourse patterns—the main point being mentioned before contextual information—and bottom-up parsers (Japanese speakers) building a discourse in which contextual information precedes the main point. These discourse strategies were expected to exercise a larger effect on responses than any cultural affiliation.
We tested this hypothesis by means of narrative description (story-telling), picture description, and visual recall tasks. In the story-telling task, Japanese speakers were more likely to report contextual information before the main point than either English or Chinese speakers, consistent with the linguistic typology. In the picture description and visual recall experiment, Japanese speakers not only reported more background detail but also recalled details about peripheral information significantly more accurately than English and Chinese speakers, as evidenced by reliably higher scores on both identification and information recall tasks.
The present findings across three tasks are thus consistent with the hypothesis advanced here, namely, that the grammatical structure of particular languages predisposes speakers to particular attentional patterns: In other words, these results are consistent with a particular—and rather far-reaching—interpretation of the thinking for speaking hypothesis (Slobin 2003).
Perhaps more significant than what these results speak for is what they speak against: They allow us to reject not only the null hypothesis (that a speaker’s native language has no reliable effect on visual attention and recall scores) but also the alternative hypothesis presented in Masuda & Nisbett (2001), that is, that Japanese participants’ enhanced ability to identify peripheral elements in visual scenes is due to cultural attributes common to Asian cultures. Our experiments show that across all tasks involving visual recall Chinese participants generally behave more like English participants than like Japanese participants, regardless of cultural affiliation. To reinforce the point, this is in spite of the fact that the test subjects were Chinese students studying Japanese in Japan. Whatever definition of culture one might employ, the expectation must surely be that if cultural factors determine response, the Chinese and Japanese groups should pattern together. The fact that they did not renders suspect any explanation in terms of immanent cultural values, at least with respect to these data.
Three final points are worth mentioning. First, as noted above (footnote 1), these are not unprecedented results. The findings of the present study directly confirm those of an independent set of studies reported elsewhere (Anon, 2010), where again the Chinese group’s results patterned directly with those of the English group and separately from the Japanese across a similar set of description and recall tasks. Thus, in a total of six different tasks, we have found that the Asian response has been split in ways that challenge the cultural relativity explanation advanced by Nisbett and his colleagues.
The second point to observe is that these results do not necessarily refute Masuda and Nisbett’s (2001) interpretation of their own data (though such an explanation would be untenable in the present case). Our results, at the data level, are entirely compatible with those obtained by Masuda & Nisbett for their Japanese and American participants: We also endorse their concluding suggestion that “[typical] Japanese may simply see far more of the world than do [typical] Americans” (Masuda & Nisbett, 2001, p. 933 [added by authors]). Where we disagree is in respect of the most plausible explanation of this difference. We claim that if the suggestion is true, then it is not primarily because they are East Asians, but because they speak a head-final language, that Japanese speakers may see far more of the world. Such an interpretation would cover both our and Masuda & Nisbett’s findings: The alternative, that a different explanation applies to our respective experiments, seems to be considerably more ad hoc.
Note that we do not discount the view that cultural traditions may influence modes of perception or categorization in other cases. For example, in other studies, Nisbett and his colleagues have found cognitive (attitudinal and perceptual) differences between ethnic groups of participants sharing a common first language (e.g., monolingual speakers of Turkish in Uskul, Kitayama & Nisbett, 2008; second generation Korean vs. European Americans in Choi, Dalal & Kim-Prieto, 2000): unless the latter participants were also bilingual, then our hypothesis could not be applied to explain such results. All that is claimed here is that observed differences between English and Japanese speakers in these particular types of description and visual recall tasks are better explained in linguistic, rather than cultural, terms: If it were otherwise, Chinese and Japanese participants should pattern together.11
Notice finally that our hypothesis generates further, testable predictions about cross-linguistic splits and groupings within and across cultural spheres. For instance, given our hypothesis, speakers of Korean, another head-final language, should behave like Japanese participants in visual recall, while Vietnamese and Thai speakers should pattern with the Chinese participants. These predictions will be tested in future experiments. If effects of language structure, whether of the grammatical parameter of head position or of a particular discourse strategy, can be shown to outweigh effects of culture for speakers of other languages as well, then our results will constitute a significant challenge to this part of the evidence base for cultural relativism.
References
Anno, M. (1977). Tabi no ehon. Tokyo: Fukuinkan Shoten.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conception of time. Cognitive Psychology, 43, 1-22.
Boroditsky, L. (2003). Linguistic relativity. In L. Nadel (Ed.), Encyclopedia of cognitive science (pp. 917-921). London: Macmillan Press.
Bowerman, M. (1996). The origins of children’s semantic categories: Cognitive vs. linguistic determinants. In J. Gumperz, & S. Levinson (Eds.), Rethinking linguistic relativity (pp. 145-176). Cambridge: Cambridge University Press.
Choi, I., Dalal, R., & Kim-Prieto, C. (2000). Information search in causal attribution: Analytic vs. holistic. Urbana-Champaign: University of Illinois.
Chomsky, N. (1981). Lectures on government and binding. Dordrecht, Holland; Cinnaminson, N.J.: Foris Publications.
Chomsky, N. (1995). The Minimalist program. Cambridge, MA: MIT Press.
Donaldson, J., & Scheffler, A. (2007). The Gruffalo's child. London: Campbell Books.
Duffield, N., & Tajima, Y. (2010). On the non-uniformity of Asian thinking (for speaking): A response to Masuda and Nisbett. In M. Iverson, I. Ivanov, T. Judy, J. Rothman, R. Slabakova, & M. Tryzna (Eds.), Proceedings of the 2009 Mind/Context Divide Workshop (pp. 28-39). Somerville, MA: Cascadilla Proceedings Project.
Fodor, J. D., & Inoue, A. (1994). The diagnosis and cure of garden paths. Journal of Psycholinguistic Research, 23(5), 407-434.
Greenberg, J. (1978). Some universals of grammar with particular reference to the order of meaningful elements. In J. Greenberg (Ed.), Universals of language. Cambridge, MA: MIT Press.
Hawkins, J. (1990). A parsing theory of word order universals. Linguistic Inquiry, 21, 223-260.
Hsiao, A. H.-H. (1999). Holding the frog in place: Linguistic typology of Mandarin Chinese. Unpublished senior honors thesis, University of California, Berkeley. [cited in Slobin (2003)].
Huang, C.-T. J. (1994). More on Chinese word order and parametric theory. In B. Lust, M. Suñer, & J. Whitman (Eds.), Syntactic theory and first language acquisition: Cross-linguistic perspectives, Vol. 1: Heads, projections and learnability (pp. 15-35). Hillsdale, NJ: Lawrence Erlbaum Associates.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Li, P., & Gleitman, L. (2002). Turning the tables: language and spatial reasoning. Cognition, 83, 265-294.
Maguire, E. A., Gadian, D. G., Johnsrude, I. S., Good, C. D., Ashburner, J., Frackowiak, R. S. J., & Frith, C. D. (2000). Navigation-related structural change in the hippocampi of taxi drivers. PNAS, 97(8), 4398-4403.
Malt, B., Sloman, S., Gennari, S., Shi, M., & Wang, Y. (1999). Knowing vs. naming: Similarity and the linguistic categorization of artifacts. Journal of Memory and Language, 40, 230-262.
Masuda, T., & Nisbett, R. E. (2001). Attending holistically vs. analytically: Comparing the context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology, 81, 922-934.
Nakayama, M. (1999). Sentence processing. In N. Tsujimura (Ed.), Handbook of Japanese linguistics (pp. 398-424). Oxford: Blackwell.
Özçalışkan, Ş., & Slobin, D. I. (1999). Learning how to search for the frog: Expression of manner of motion in English, Spanish, and Turkish. In A. Greenhill, H. Littlefield, & C. Tano (Eds.), Proceedings of the 23rd annual Boston University Conference on Language Development, Vol. 2 (pp. 541-552). Somerville, MA: Cascadilla Press.
Özçalışkan, Ş., & Slobin, D. I. (2000). Climb up vs. ascend climbing: Lexicalization choices in expressing motion events with manner and path components. In S. C. Howell, S. A. Fish, & T. K-Lucas (Eds.), Proceedings of the 24th Annual Boston University Conference on Language Development, Vol. 2 (pp. 558-570). Somerville, MA: Cascadilla Press.
Nisbett, R. E. (2003). The geography of thought: How Asians and Westerners think differently . . . and why. New York: Free Press.
Nisbett, R. E., & Masuda, T. (2003). Culture and point of view. PNAS, 100(19), 11163-11170.
Nisbett, R. E., & Miyamoto, Y. (2005). The influence of culture: Holistic vs. analytic perception. Trends in Cognitive Sciences, 9(10), 467-473.
Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic vs. analytic cognition. Psychological Review, 108, 291-310.
Pederson, E., Danziger, E., Levinson, S., Kita, S., Senft, G., & Wilkins, D. (1998). Semantic typology and spatial conceptualization. Language, 74, 557-589.
Slobin, D. I. (1996). Two ways to travel: Verbs of motion in English and Spanish. In M. Shibatani, & S. A. Thompson (Eds.), Grammatical constructions: Their form and meaning (pp. 195-219). Oxford: Oxford University Press.
Slobin, D. I. (1997). Mind, code, and text. In J. Bybee, J. Haiman, & S. A. Thompson (Eds.), Essays on language function and language type (pp. 437–467). Amsterdam: John Benjamins Publishing Company.
Slobin, D. I. (2000). Verbalized events: A dynamic approach to linguistic relativity and determinism. In S. Niemeier, & R. Dirven (Eds.), Evidence for linguistic relativity. Amsterdam: John Benjamins Publishing Company.
Slobin, D. I. (2003). Language and thought online: Cognitive consequences of linguistic relativity. In D. Gentner, & S. Goldin-Meadow (Eds.), Language in mind: Advances in the study of language and thought (pp. 157-192). Cambridge, MA: MIT Press.
Talmy, L. (1975). Semantics and syntax of motion. In J. P. Kimball (Ed.), Syntax and semantics, Vol. 4 (pp. 181-238). New York: Academic Press.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description, Vol. 3: Grammatical categories and the lexicon (pp. 57-149). New York: Cambridge University Press.
Talmy, L. (2000). Toward a cognitive semantics: Typology and process in concept structuring, Vol. 2. Cambridge, MA: MIT Press.
Travis, L. (1984). Parameters and effects of word order variation. Unpublished PhD dissertation, MIT.
Ungerer, T. (1975). Emile. Tokyo: Bunka Publishing Bureau.
Uskul, A. K., Kitayama, S., & Nisbett, R. E. (2008). Ecocultural basis of cognition: Farmers and fishermen are more holistic than herders. PNAS, 105, 8552-8556.
Footnotes
1 As well as that which immediately precedes it (see Duffield & Tajima 2010). The current study offers a completely new set of experiments, in which we attempted to remedy some methodological shortcomings of the original task. This new experiment (with different participants, materials, and modes of analysis) provides even clearer support for our original hypothesis.
2 Throughout, following Nisbett’s own practice, the term Asian is taken to refer to East Asian ethnic and national groups (especially Chinese, Japanese and Korean groups), rather than to South Asian groups (which is the British default usage of the term): it is unlikely that Nisbett’s claims are intended to extend to any groups beyond the (historical) Han Chinese sphere of influence.
3 This problem comes to the fore in respect of Nisbett’s subsequent work on other cultural groups (Uskul, Kitayama & Nisbett 2008): see below.
4 Other V-languages in Slobin’s survey include Turkish, Spanish, and Hebrew; other S-languages include Mandarin and Russian.
5 It should be clear that the terms head-initial and head-final are completely independent of writing order: Arabic and Hebrew, for example, are head-initial languages that are written from right-to-left; Turkish is a predominantly head-final language written left-to-right.
6 The constituent analysis proposed here simplifies, but does not fundamentally misrepresent, the phrase-structure of Japanese (It may be, for example, that the genitive element no and the post-nominal marker de should bear other category labels, but this does not change the fact that adpositional phrases in Japanese are consistently head-final).
7 The term is due to Hawkins (1990).
8 We are naturally aware of the fact that many generativist linguists, including Huang himself, would treat Mandarin Chinese as underlyingly head-final in the verb-phrase, with verb-movement deriving the overt head-initial order. Be that as it may, what is relevant here are the surface configurations that provide the instructions for parsing and syntactic production: at this level, Chinese patterns—on balance—more like English than like Japanese.
9 The items classified as central and peripheral for each photograph in the picture description task are as follows:
For the Windmill picture (Figure 10), the central items are the boy in a yellow T-shirt and the green windmill, while the peripheral items are restaurant, table, chair, patrons, trees, shade, European street, passers-by, sunny, summer season, basket, instrument, signboard, pillar, sack, posters, balcony, building, and fallen leaves.
For the Beach picture (Figure 11), the central item is the boy smiling in the foreground, while the peripheral items are balloon, trees, mountains, beach, pebble, sky, clouds, wind, air, buildings, restaurants, construction, sunny, holiday season, resort, and road.
For the Bubble picture (Figure 12), the central item is the boy with a bubble-maker, while the peripheral items are other children, buildings, shops, street, signboard, air, sunny, summer season, the man with a balloon, the girl sitting on the bench, sack, passers-by, floating bubble, windows, lamp, curtain, and wooden floor.
10 Interestingly, there were only weak correlations for all of the groups concerned between their scores for the identification task and those on the information task (Chinese r = 0.26; English r = 0.13; Japanese r = 0.13). This may suggest that there is no necessary relationship between perceptual knowledge of an event and propositional knowledge about it.
11 This is not to say that our results have no wider implications for Cultural Relativity arguments, but to stress that this is only a first step of a larger project: Ultimately, it would seem to us desirable to account for all putative effects of broad culture in terms of more tangible and more plausible linguistic or local environmental factors. For example, the expectation that inhabitants of high-density urban environments should pay more attention to peripheral visual information than those who live in smaller communities is both plausible and measurable; indeed, this is established in Nisbett & Masuda (2003). However, we do not view this effect as ‘cultural’ in any interesting theoretical sense (see introductory discussion).
List of Figures
Figure 2. Sample scene fragments: Focal Fish Condition (from Nisbett and Masuda, 2003).
Figure 3. Top-down versus Bottom-up parsing mechanisms. This figure illustrates the way in which top-down parsers (e.g., English speakers) first decide on the whole sentence structure and then fill each slot with words, whereas bottom-up parsers (e.g., Japanese speakers) begin by laying down words and gradually construct the whole sentence.
Figure 4. Sample picture used in the story-telling task: Mouse-Shadow picture (Donaldson & Scheffler, 2007).
Figure 5. Experiment 1 Results (Measure 1): Average number of times that some type of contextual information was mentioned ahead of the main point in the story-telling task by English, Chinese and Japanese participants.
Figure 6. Experiment 1 Results (Measure 2): Mean number of contextual descriptions mentioned before the main events and situations in the story-telling task by English, Chinese and Japanese participants.
Figure 7. Experiment 1 Results: Percentage of responses in which time-, place-, inferred antecedent event-, or field-related information was mentioned ahead of the main events and situations in the story-telling task.
Figure 8. Experiment 1 Results: Percentage of responses in which a cause was mentioned ahead of its effect (Cause-First) and in which an effect preceded its cause (Effect-First) in the story-telling task. Only constituents explicitly marked for causal relations are included in the data.
Figure 9. Experiment 1 Results (Measure 3): Mean overall number of distinct contextual descriptions reported in the story-telling task, in any order of mention.
Figure 10. Windmill Picture used in the picture description task: Picture 1.
Figure 11. Beach Picture used in the picture description task: Picture 2.
Figure 12. Soap Bubble Picture used in the picture description task: Picture 3.
Figure 13. Mean number of central and peripheral items mentioned by Chinese, English, and Japanese participants in the picture description task.
Figure 14. Extracted fragments of pictures used in the identification test of the visual recall task (sample).
Figure 15. Picture fragments identification scores in the visual recall task (by language group).
Figure 16. Mean scores for the forced-choice memory test in the visual recall task (by language group).