The Development of Scoring Criteria for a New Picture Naming Task

Objective: The aim of this study is to develop a scoring system for a new naming task in order to assess naming performance in younger adults (18 to 30 years) and older adults (65+ years) who are monolingual English speakers, monolingual French speakers, or English-French bilinguals. This new naming task will serve as an important health service to help diagnose and assess older adults with cognitive impairment, while also serving as an educational tool for healthcare providers.


Introduction
Despite the substantial increase in bilingualism in Canada, there are no appropriate tools to assess language abilities in older English-French bilingual speakers. A new Naming Task will serve as a tool for healthcare providers to assess naming abilities in bilingual adults. This may be important when assessing older adults for medical conditions that impact language abilities, such as dementia and aphasia. The purpose of the present study is to develop a scoring system for a novel naming task that is suitable for assessing naming performance in monolingual English speakers, monolingual French speakers, and English-French bilinguals. Once the scoring criteria are developed, this novel naming task will serve as an important health service to help diagnose and assess cognitively impaired older individuals.
Two types of scoring criteria were developed for the Naming Task: strict and lenient. Strict scores credited only the formal name for an item, while lenient scores also credited acceptable synonyms and slang terms. The analysis presented in this paper determines which names are used most often for each item and establishes a clear set of guidelines for strict and lenient scoring in both English and French. Performance across groups is compared under the strict and lenient scoring criteria in order to examine the impact of language of administration on bilingual performance and to determine whether the test is suitable for all language groups.

Literature Review
Over the past decade, research has begun exploring the impact of bilingualism on cognition, especially in the areas of executive function and language. This research has demonstrated that, relative to monolinguals, bilingual individuals show superior performance on tasks of executive function (e.g., inhibition of task-irrelevant information) (Adesope, Lavin, Thompson, & Ungerleider, 2010; Bialystok, 2009; Bialystok, Craik, Green, & Gollan, 2009), but poorer performance on language tasks (e.g., picture naming tasks) (Gollan, Montoya, Fennema-Notestine, & Morris, 2005; Roberts, Garcia, Desrochers, & Hernandez, 2002). In addition, bilingualism can be seen as a protective factor, as research with an immigrant sample living in Toronto has suggested that bilingualism may delay the onset of dementia by five years in older adults (Bialystok, Craik, & Freedman, 2007; Craik, Bialystok, & Freedman, 2010).
The Boston Naming Test (BNT) is a widely used clinical picture-naming task in which patients are asked to name the image displayed (Kaplan, Goodglass, & Weintraub, 1983). Overall, individuals show a decline in naming ability as they age (Kaplan et al., 1983), specifically after the age of 70 (Brouillette et al., 2011). Research examining the utility of the BNT with bilinguals has shown that monolinguals tend to outperform bilinguals and that the level of difficulty of the test likely differs between languages (Roberts et al., 2002). For example, in a study comparing English-speaking monolinguals, bilingual Spanish-English speakers, and bilingual English-French speakers, both bilingual groups scored significantly worse than the monolingual English participants (Roberts et al., 2002). Furthermore, bilinguals have demonstrated difficulty with verbal fluency, frequent tip-of-the-tongue states, and longer picture naming latencies (Bialystok, 2009), even when completing the task in their dominant language (Gollan & Acenas, 2004). Additional studies have indicated that bilinguals perform worse on naming tasks such as the BNT, both in measures of accuracy (Bialystok, Craik, & Luk, 2008; Kohnert, Hernandez, & Bates, 1998) and response time (Gollan et al., 2005; Gollan, Fennema-Notestine, Montoya, & Jernigan, 2007; Ivanova & Costa, 2008; Roberts et al., 2002).
Research with French Canadians suggests that the French translation of the BNT does not account for cultural appropriateness, which is important when administering the test in a language other than the one in which it was originally developed (Roberts & Doucet, 2011). Specifically, research suggests that the French translation of the BNT is not acceptable for assessing naming abilities in English-French bilinguals or in monolingual French individuals (Roberts & Doucet, 2011; Sheppard, Kousaie, Monetta, & Taler, 2016). It has been suggested that when there is a large inconsistency in naming certain items, these items should be removed or repositioned in the order of difficulty (Roberts & Doucet, 2011). For example, research with older adults from Quebec City indicated that there were 13 BNT items with multiple acceptable synonyms (e.g., "seahorse" can be either "hippocampe" or "cheval de mer") and an additional six items that had no clear acceptable response (e.g., "globe"), as native speakers of French disagree on the name of the item (Roberts & Doucet, 2011). Additional research comparing monolingual English and French speakers to English-French bilinguals on the BNT demonstrated that a French administration of the task consistently yielded poorer scores, even in the French monolingual group (Sheppard et al., 2016). Furthermore, after matching for underlying naming ability, differential item functioning analyses suggested that a significant number of items functioned differently across the three participant groups and in different languages of administration (Sheppard et al., 2016), suggesting that the BNT is not equivalent in English and French.

Participants
Six groups of participants were included in this study: younger (n = 44) and older (n = 64) monolingual English speakers, younger (n = 30) and older (n = 30) monolingual French speakers, and younger (n = 48) and older (n = 52) bilingual English-French speakers. Younger adults were aged 18 to 30 and older adults were aged 65 or older. Monolingual English participants and bilingual English-French participants were recruited and tested in the Ottawa-Gatineau region, while monolingual French speakers were recruited and tested in Quebec City. Younger adults were recruited through word of mouth and local undergraduate populations, while older adults were recruited through advertisements in community centres, grocery stores, and newspapers. Monolingual participants had either limited or no exposure to languages other than their native language. Bilinguals had limited exposure to languages other than French and English. All bilingual participants were proficient in both English and French before the age of 13 and self-reported their proficiency in French and English using a 5-point Likert scale (see Table 1) on measures of auditory comprehension, reading, speaking, and writing.

Naming Task
The Naming Task consists of 120 images, 100 of which were selected from the coloured Snodgrass set (Rossion & Pourtois, 2004); the remaining 20 were developed by Dr. Taler, the lead researcher in this study. The Snodgrass images were selected for their range of difficulty and strong name agreement, while the additional images were created using the same colour scheme as the Snodgrass set, but with a higher level of naming difficulty. The images were presented in the same randomized order for all participants and were shown on a white background on a computer screen using PowerPoint. Participants were instructed to name the image on the screen, and the research assistant was instructed to record all answers given by the participant.

Neuropsychological Battery
Participants completed a neuropsychological battery, including the forward and backward digit span subtests of the Wechsler Adult Intelligence Scale-Third Edition (Wechsler, 1997); the Montreal Cognitive Assessment (Nasreddine et al., 2005); a version of the Stroop colour-word interference test (Stroop, 1935) in which the number of items produced in 45 seconds was recorded in each of the three conditions (word reading, colour naming, and incongruent colour naming); the 64-item Wisconsin Card Sorting Test (Grant & Berg, 1948); and category (animal) and letter (FAS) verbal fluency tasks (Benton & Hamsher, 1976). Monolingual participants completed the verbal fluency tasks in their native language, and bilingual participants completed the tasks in English, in French, and in an administration where they could respond in either language. The neuropsychological battery was administered to demonstrate that all study participants had normal cognitive function (see Table 2).

Procedure
All monolingual participants completed the testing in one two-hour session, while bilingual participants completed the testing in two sessions of two hours each. All bilingual participants completed the Naming Task in three administrations: English only, French only, and an either-language administration in which they could respond in either English or French. Two language administrations were completed in the first testing session, while the third administration was completed in the second testing session.
The study procedures adhered to federal guidelines for protection of human research participants and received ethical approval from the Research Ethics Board at the Bruyère Research Institute, Laval University, and the University of Ottawa. Participants provided informed consent prior to participating and were remunerated $10/hour for all testing completed.

Development of Scoring Criteria
Dr. Taler developed preliminary scoring criteria for the Naming Task in English and French; these scoring criteria formed the basis of the strict and lenient scoring protocol that was developed for this study. First, the data from each participant were scored based on the preliminary scoring criteria, wherein one point was awarded for each correct answer. Percentages were then calculated for each image based on the number of participants who named the image correctly. During this process, alternative answers provided by participants were recorded. Two independent reviewers went through each item to determine the strict and lenient scoring criteria. The strict scoring criteria were selected based on the most frequent response provided by participants (i.e., a minimum of 50%) and/or the most formal or widely known name for the item. Lenient responses were selected based on synonyms (e.g., "ironing board" vs. "ironing table"), clarity of the image (e.g., "violin" vs. "viola"), culturally relevant slang terms (e.g., "baby carriage" vs. "pram"), and shortened names for the image (e.g., "green pepper" vs. "pepper"). The two independent reviewers then met to discuss their findings. Discrepancies were resolved through discussion, and all established scoring criteria were verified by three additional researchers. See Appendix A for a list of strict and lenient responses for each item.
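The scoring steps described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study: the criteria dictionary, the function names, and the normalization rule (lower-casing and collapsing whitespace) are assumptions introduced for illustration, and the two example items simply treat the first name in each pair quoted above as the strict response. The study's actual response lists are those in Appendix A.

```python
from collections import Counter

# Hypothetical criteria for illustration only; the real lists are in Appendix A.
CRITERIA = {
    "ironing board": {"strict": {"ironing board"}, "lenient": {"ironing table"}},
    "green pepper": {"strict": {"green pepper"}, "lenient": {"pepper"}},
}


def normalize(response):
    """Lower-case a response and collapse repeated whitespace (an assumed rule)."""
    return " ".join(response.lower().split())


def score_response(item, response):
    """Return (strict_point, lenient_point) for one response to one item.

    A response that earns the strict point also earns the lenient point.
    """
    answer = normalize(response)
    strict_ok = answer in CRITERIA[item]["strict"]
    lenient_ok = strict_ok or answer in CRITERIA[item]["lenient"]
    return int(strict_ok), int(lenient_ok)


def percent_correct(item, responses):
    """Percentage of responses credited under strict and lenient scoring."""
    scores = [score_response(item, r) for r in responses]
    n = len(scores)
    strict_pct = 100 * sum(s for s, _ in scores) / n
    lenient_pct = 100 * sum(l for _, l in scores) / n
    return strict_pct, lenient_pct


def modal_response(responses, threshold=0.5):
    """Most frequent response if given by at least `threshold` of participants."""
    counts = Counter(normalize(r) for r in responses)
    name, count = counts.most_common(1)[0]
    return name if count / len(responses) >= threshold else None


# Invented responses from four hypothetical participants.
sample = ["Ironing board", "ironing table", "iron", "ironing  board"]
strict_pct, lenient_pct = percent_correct("ironing board", sample)
candidate = modal_response(sample)
```

In this sketch the 50% threshold only proposes a candidate strict response; in the study, the final strict and lenient lists were settled by reviewer discussion, which code of this kind does not replace.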

Items Recommended for Removal
Eight items were recommended for removal in English and French: stirrup, gavel, beetle, barn, blouse, and flute were removed due to the clarity and/or quality of the image; rickshaw was removed because no younger or older monolingual French participants could name the image; and necklace was removed because there were too many alternative names for this image (e.g., "pearls", "string of pearls", "pearl necklace", and "necklace").

Overall Task Performance
Figures 1 and 2 present an overall summary of task performance by age and language group according to strict and lenient scoring criteria. The largest difference in naming abilities between older and younger adults is seen in the bilingual French administration groups. Overall, older adults performed better than younger adults in every language category except one: in the monolingual French group, younger participants scored an average of one item higher than older participants (under both strict and lenient scoring).
For both younger and older adult groups, monolingual English participants had the highest overall score on the task, averaging 99 correct items under strict scoring and 106 correct items under lenient scoring, out of 120 items. Bilingual English-French participants correctly named an average of 92 and 94 items (strict and lenient scoring, respectively) when completing the test in English; however, this increased to 95 and 102 (strict and lenient scoring, respectively) when responses were accepted in either language. The majority of bilingual participants in the bilingual administration responded in English (i.e., 52% of older adults and 62% of younger adults). The average number of items named correctly did not improve by more than five items in any group when lenient scoring was added.

Results by Item
Table 3 presents the percentage of participants who correctly identified each item under strict and lenient scoring.
Analysis 1: Strict and Lenient Scoring Differences. Performance improved by one to five items once lenient criteria were taken into consideration. The following items showed improved percentages once lenient scoring was included, in both English and French for all language groups: spool of thread, ottoman, candelabra, leopard, eagle, ironing board, bow, coat, and salt shaker. Additionally, a number of items scored higher once lenient scores were included in English only: grasshopper, record player, beetle, light-switch, mitten, colander, and sled; and in French only: hippocampe, truelle, and poivron.
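One way to identify such items from per-item percentages is sketched below in Python. The table layout, the group labels, and every number in it are invented for illustration and are not values from Table 3; the function names are likewise hypothetical.

```python
# Invented percentages: item -> group -> (strict %, lenient %).
# These are NOT values from Table 3; group labels are hypothetical.
EXAMPLE_TABLE = {
    "ottoman": {"ME_YA": (60.0, 75.0), "MF_YA": (50.0, 55.0)},
    "sled": {"ME_YA": (80.0, 95.0), "MF_YA": (90.0, 90.0)},
}


def lenient_gain(groups):
    """Percentage-point gain from lenient scoring for each group of one item."""
    return {group: lenient - strict for group, (strict, lenient) in groups.items()}


def items_improved_in_all_groups(table):
    """Items whose lenient percentage exceeds the strict one in every group."""
    return sorted(
        item
        for item, groups in table.items()
        if all(gain > 0 for gain in lenient_gain(groups).values())
    )


def items_improved_in_some_groups(table):
    """Items that improve under lenient scoring in some, but not all, groups."""
    result = []
    for item, groups in table.items():
        gains = list(lenient_gain(groups).values())
        if any(g > 0 for g in gains) and not all(g > 0 for g in gains):
            result.append(item)
    return sorted(result)


improved_everywhere = items_improved_in_all_groups(EXAMPLE_TABLE)
improved_partially = items_improved_in_some_groups(EXAMPLE_TABLE)
```

With the invented data, "ottoman" improves in every group while "sled" improves only in the hypothetical English group, mirroring the distinction drawn above between items that improve in both languages and items that improve in one language only.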
Analysis 2: Language Group Differences. Bilingual participants performed more poorly on the task than monolingual participants in their respective languages. The difference was most extreme when comparing the monolingual French participants and the bilingual French administration. While a similar pattern of results was observed with the monolingual English participants and the bilingual English administration, the performance differences were not as great (i.e., smaller differences between groups) or as consistent (i.e., fewer items displaying group differences). It should be noted that there were a small number of items on which bilingual English-French speakers scored better than the monolingual groups. In English, these items include cannon, celery, and flute. In French, these items include cyclo-pousse, lèvres, wagon, and bec Bunsen.
Analysis 3: Age Differences. The following items showed large generational differences, with younger adults scoring higher than older adults: necklace, centaur, stroller, gorilla, tambourine, trumpet, and raccoon. However, overall, older adults scored higher than younger adults in all languages and language administration groups.

Discussion
The purpose of this study was to develop scoring criteria for a new bilingual naming task, as it will serve as an important health service for cognitively impaired older adults. Older and younger participants were tested using preliminary scoring criteria to determine whether the test was appropriate for both English- and French-speaking individuals. Although the task can easily be administered to all groups, there are differences in how each group of participants performs based on age group, language group, and, for the bilingual participants, language of administration.
Allowing lenient scoring improved the average number of correct responses by one to five items per group, with most groups improving by two items. An advantage of having both strict and lenient scoring criteria is that poorer performance on certain items is more likely to be related to item difficulty or language difficulty, as the lenient criteria take into consideration acceptable synonyms, culturally relevant slang terms, and shortened names for the item. Adding lenient scoring improves the quality of the Naming Task because it demonstrates that although participants may not use the formal name for the item, they still know what the image represents and can name the item using terms they are familiar with. Some items (e.g., cheetah and leopard) were given two strict responses because the image was representative of both names, and participants may not be able to distinguish accurately between them. Some items (e.g., necklace) were removed because there were too many possible responses, making the item difficult to score.
Based on the quality of the image, a number of items were recommended for removal. Removal criteria were determined based on the responses provided by the participants, which indicated that these items were ambiguous and thus not a good visual representation of the item in question. Furthermore, additional items were recommended for removal because they had a large number of alternate names, making them difficult to score.
There were also large language group differences, with monolingual English participants outperforming every other language group, and the bilingual French administration group performing the most poorly of all the groups. Interestingly, the monolingual French group vastly outperformed the bilinguals in the French administration. This difference might be related to the fact that the bilingual participants were selected from the Ottawa region, which is largely English-dominant. Even though all of the bilinguals had good self-reported proficiency in both languages, the environment in which they live and work may be more English-dominant than would be expected for bilinguals in Quebec City, where monolingual French participants were selected and tested.
Finally, there were a number of items where older adults outperformed the younger adults. This finding could be attributed to generational differences (Schmitter-Edgecombe, Vesneski, & Jones, 2000), or to the idea that older adults may have a greater vocabulary (Hawkins et al., 1993; Sheppard et al., 2016). There may have been a number of items that older adults, but not younger ones, have been exposed to (e.g., metronome), explaining the difference between age groups. The items with a very large difference between older and younger adults were not necessarily recommended for removal; however, further analysis of these items is required to determine whether the generational differences are significant enough to alter the results of the test for future participants.
Future research should seek to understand why certain language groups, primarily monolingual English individuals, outperform others, and to determine how these discrepancies can be resolved so that the Naming Task can serve as an appropriate tool for bilingual older adults. More analysis is required to determine which images should be removed as a consequence of the inequality between language groups and age groups. Research should further focus on data collection with monolingual and bilingual patients with mild cognitive impairment and Alzheimer's disease, to test the validity of the scoring criteria.

Conclusion
The present study established strict and lenient scoring criteria for an English-French picture-naming task. The Naming Task will serve as a health service for both English- and French-speaking individuals to assess cognitive impairment and can be used as a suitable alternative to the BNT. The Naming Task appears to be suitable for monolingual French and English individuals. However, results are unclear when comparing bilingual to monolingual participants. Results suggest that, when possible, a bilingual administration should be used when testing English-French speaking individuals, as responses will be given in the participant's dominant language, which is affected by their language environment.

Figure 1. Average number of images named under strict scoring criteria by age and language group.

Figure 2. Average number of images named under lenient scoring criteria by age and language group.

Table 2
Demographic and neuropsychological performance by participant group (mean ± standard deviation). Verbal fluency scores for bilingual groups are reported for the administration in which participants could answer in either language. MoCA = Montreal Cognitive Assessment; Digit Span = digit span subtests of the Wechsler Adult Intelligence Scale-Third Edition; WCST = 64-item Wisconsin Card Sorting Test; FAS = letter verbal fluency; Animals = category verbal fluency.

Table 3
Percentage of correct item responses for strict and lenient scoring for participants in monolingual and bilingual groups. ME = Monolingual English; MF = Monolingual French; YA = Younger adults; OA = Older adults; St = Strict; Len = Lenient; Eng = English Administration; Fre = French Administration; Bil = Bilingual Administration.

Appendix A
English and French strict and lenient scoring criteria.