[Repost] Midwest Association of Language Testing 2005

清风出袖 · 2005-09-03

MwALT 2005
Conference Abstracts

J Charles Alderson, Lancaster University

Computer-based diagnostic testing: the interface between learning and assessment

It is commonplace to assert that computer technology has the potential to revolutionise language assessment, but the evidence to date is that this revolution has yet to occur. The most frequently cited innovation is computer adaptive testing (CAT) yet CAT has hardly impacted on test methods or on the constructs that are assessed in language testing. Indeed, it has been argued that CAT is a force for conservatism, since it relies on IRT statistical analyses and multiple-choice test methods. It is also very labour-intensive since it relies on the creation and maintenance of large item banks. The lack of innovation in computer-based assessment is especially true of high-stakes, large-scale tests. Instead, I argue, innovations are more likely to be found elsewhere: in low-stakes or no-stakes testing, not in proficiency testing. In this talk I shall contend that it is in the area of diagnostic testing that we are most likely to see significant changes in the form, content and uses of assessment, and I shall support my argument by reference to and demonstration of a web-based suite of diagnostic tests in 14 European languages, based on the Common European Framework.

Jennifer Balogh, Ordinate Corporation
Robert Blake, Maria Cetto, University of California, Davis

Assessment of Spanish Using an Automated Spoken Spanish Test (SST)

The power of a test is in its ability to discriminate test takers. In the past, the problem of using a scale that was not sensitive to changes in performance at the low proficiency levels meant that the skills of college students in the first several years of language instruction could not be differentiated (Liskin-Gasparro, 1984). The question we posed in the current study was whether an automated Spoken Spanish Test (SST) could distinguish between students at different levels and potentially be used for placement in college courses. To answer this question, we report a case study in which the Spanish department at University of California, Davis used the SST to assess the spoken language performance of 137 students from eight different Spanish courses (one course was for heritage speakers who have not had formal instruction in the language). The SST is a 10-minute test that is automatically administered over the telephone and produces computer-generated scores. Test takers receive an Overall score on a scale from 20 to 80 and four subscores: Sentence Mastery, Vocabulary, Pronunciation, and Fluency. Analyses of the Overall scores and subscores of the Spanish students show statistically significant separation between the performance of students in lower- versus upper-division courses and between non-native and heritage speakers. The results suggest that language facility improves as courses advance. In addition, the SST appears to differentiate among the first two years of University Spanish study and, therefore, shows promise as a placement instrument.

Deborah Crusan, Wright State University, Dayton, OH

Implementing online directed self-placement for second language writers

Problems inherent in conventional writing placement systems often prompt institutions to move to directed self-placement (DSP). Some of these problems include what can be perceived as inaccurate placement decisions, student subversion of their placements, and significant lack of correlation between students’ placement and their performance in first-year writing courses. With DSP, many of these traditional problems vanish; often students feel empowered and perform well in their self-selected classes. However, when the writers involved are second language (L2) writers, the matter is further complicated. Gatekeepers have often barred L2 writers from DSP, arguing that L2 writers may not place themselves in classes appropriate for their writing abilities because of financial and other dilemmas. This argument, which can be construed as racist, is not without some justification. L2 writers often argue against their compulsory placement into non-credit bearing classes citing delayed graduation and further costs heaped onto already burgeoning tuition bills; consequently, when they are included in DSP, they might feel pressured to place themselves into credit-bearing classes that will count toward graduation and will not add to the costs of their education. Unfortunately, because of linguistic variables, these choices frequently backfire on the second language writer. One answer to this dilemma is online directed self-placement (ODSP), which differs from DSP by channeling students into courses most appropriate to their writing ability using data including online questionnaires and TOEFL scores. This presentation outlines one writing program’s quest for inclusion of second language writers into its innovative placement design.

April Ginther, Slobodanka Dimova, Rui Yang, Xiaoju Zheng, Purdue University

Temporal Measures of Fluency as Indices of Oral English Proficiency

Readily available computer programs now allow language testers to examine temporal characteristics of examinee performance with great accuracy and relative ease. While such examinations remain labor intensive, the rewards are attractive as they allow careful scrutiny of variability in temporal measures across holistic score levels on oral English proficiency tests and may be used to increase our understanding of examinee performance and rater behavior. While it seems reasonable to assume that more proficient speakers are perceived by raters as more ‘fluent’ because they pause less often, their pauses are of shorter duration, they speak at a faster rate, and their phrases display greater density, few studies of oral proficiency test performance have demonstrated that these assumptions are actually the case. In this descriptive study eight temporal measures common in psycholinguistic studies of fluency (total phonation time, total pausing time, total speech time, total number of syllables, mean length of run, pause/time ratio, articulation rate, and total number of runs) were examined in relation to holistic scores on a semi-direct test of oral English proficiency. Each of these measures was calculated for 20 responses by native Chinese speakers at three score levels (N=60) to a single item (Compare and Contrast). Moderate to strong significant correlations were found for speech rate (Spearman’s rho, .72), number of syllables (.63), pausing time (-.55) and overall score. Descriptive statistics indicate greater variability for these measures at level 4 than at levels 3 and 5, suggesting that their contribution to raters’ judgments most likely interacts with other variables by level.

So-young Jang, University of Illinois at Urbana-Champaign

The development of an effective rater training model for ESL teachers

A systematic and well-designed training program is one way to improve rater reliability. This paper describes the development of a rater-centered training program based on constructs that reflect the characteristics of the rater group. The purpose of this paper is to identify the characteristics of training for ESL teacher raters and to develop an effective and efficient training model. To do this, data were collected from six teacher-raters at the University of Illinois at Urbana-Champaign in order to establish constructs for a qualified ESL teacher rater. Response data and survey results before and after the training sessions were analyzed with FACETS and SPSS to identify the effects of training as well as the challenges and perceptions that raters have about the training program and the test result. Intra- and inter-rater reliability are compared before and after training in order to determine whether the effects of training are significant. Also, analyses of the criteria, the levels for the holistic scoring, the validity of the scales, and differences depending on individual background were carried out to explore how valid the test rating procedures are. If the training program fully considered each rater’s judgment processes, we could expect that rating performance would improve. A systematic and well-designed training program would not only be a useful tool for raters, but it would also provide implications for teacher training.

Jiyoung Kim, University of Illinois at Urbana-Champaign

Korean EFL students’ task representation of reading-to-write task and their writing performance

Task representation, the manner in which students interpret an assigned task, is considered an important factor affecting task performance. In an integrated reading and writing task especially, forming a clear representation of the task seems to be more challenging for students because of the complexity of the task. The present study is designed to investigate Korean EFL students’ task representation of a reading-to-write task. Specifically, this study attempts to answer the following questions: 1) how do Korean EFL students represent a reading-to-write task to themselves? 2) what is the relationship between students’ task representation and their writing performance? 3) how does students’ self-analysis of their task representation match a rater’s perceptions? Forty Korean college students with beginning and intermediate levels of English proficiency engage in a reading-to-write task, which requires students to produce writing as a response to reading. Then, they fill out a retrospective questionnaire designed to discover their task representation of the reading-to-write task. The study first discusses students’ task representation in two aspects: sources of information for writing and writing format. Next, the high and low writing ability groups, determined by their writing scores, are compared in terms of their task representation. Finally, the study compares students’ task representation to a rater’s task representation of the same writing task. The results of the study have implications for employing reading-to-write tasks to teach and assess writing.

Jeanne Yu-Chen Lee, Purdue University

Computerized Comparative Analysis of English Intonation Patterns for Reading Aloud

In recent years, suprasegmental aspects such as intonation have been recognized as an important factor in foreign accents. In the field of language teaching and testing, most research and analysis on pitch accent and the intonation patterns of speech production are based on human perception and judgment but have not been accompanied by reliable technical descriptions of how the sound frequency or pitch movements really occur (Brazil, 1997; Clennell, 1997; Hewings, 1995; Pickering, 2001; Taylor, 1993; Wennerstrom, 1998). This descriptive study uses Praat, a voice analytic computer program, to examine the pitch patterns of five native speakers of English from the Midwest, USA, and five Chinese ESL speakers on a reading aloud item in a semi-direct oral English proficiency test at Purdue. Preliminary results suggest that the native English speakers prefer using rising tones on stressed syllables followed by the level tone at phrasal boundaries, while the Chinese ESL speakers tend to use the level tone on stressed syllables followed by falling tones at phrasal boundaries. The native English speakers use falling tones only at sentence endings and for the occasional emphasis of words. This paper argues that the rising-level pattern at phrasal boundaries and the falling tone for sentence endings function as grammatical markers in native English speech production. With the pitch pattern of level-falling at phrasal boundaries, the Chinese speakers’ English speech production may be characterized as flat, monotonic, and even hard to understand because of the different patterns that mark the suprasegmental ‘grammar of cohesion’.

Li Li, Pamela Cowan, Steve Walsh, Queen’s University, Belfast

Internet Project-based Model Enhances Collaboration and Knowledge sharing

One of the claimed benefits of ICT is that it enhances collaboration and provides more opportunities for learners to communicate and share knowledge. This paper examines whether and how the Internet project-based model (NetPBL) helps second language learners collaborate and share knowledge in a Chinese social and cultural context. From a country-specific cultural perspective, collaboration and knowledge sharing are not considered as important as achievement. Forty-seven high school students participated in a one-year network-based project offered by their English teacher. Classroom observation and pre- and post-interviews with both students and teachers explored their attitudes towards collaboration and knowledge sharing, their perceptions of the benefits that NetPBL brings to second language learning, and how NetPBL enhances students’ collaboration and knowledge sharing locally. The findings indicate that students and teachers value collaboration and knowledge sharing more than before, and that they hold positive attitudes towards NetPBL and its proposed benefits, among which enhanced collaboration, more opportunities to share knowledge, and ease of communication ranked first, second, and third of the 10 benefits. At the same time, NetPBL fosters their learning at both group and individual levels and might promote higher-order thinking skills. The implications concern how to embed a new pedagogy in the curriculum to meet country-specific needs and how to implement NetPBL in the classroom successfully in order to improve learning.

Shizuka Murazumi, Masayuki Itomitsu, The Ohio State University

Computer-delivered and Interview speaking tests: Qualitative/quantitative analysis

This paper reports on the different functions of two speaking tests in an intensive Japanese language program in the U.S.: the Oral Interview Test (a direct prochievement test in face-to-face interview format) and the Speaking portion of the Japanese Skills Test (a single-skill, semi-direct, computer-delivered proficiency test). Both tests are designed to test learners’ ability to communicate appropriately in Japanese in a variety of settings. This paper examines the complementary roles of the two tests, analyzing data qualitatively and quantitatively. After a brief introduction to the program, we describe the Oral Interview Test and the Japanese Skills Test in detail, including test format, test specification, and grading criteria. This paper then presents data from two administrations of the tests to 2nd- and 4th-level learners of Japanese, once at the beginning of the intensive program and again seven weeks after the entry test. The data include quantitative analysis of inter-rater reliability, as well as correlation studies with other types of data (course grade, scores from other types of tests, etc.). Learners’ perceptions of these two tests are also analyzed qualitatively. The paper concludes that these two types of tests, prochievement and proficiency tests, are both useful (Bachman and Palmer 1996) indicators of learners’ abilities, and that both types of tests should be incorporated for evaluating a program and checking learners’ progress. Future directions for test development will also be discussed.

Chih-Min Shih, Ontario Institute for Studies in Education, University of Toronto

Washback: Some suggestions

This paper suggests future research directions and research methods for investigating the impact of tests, known as washback, and propounds methods for promoting favorable washback at the school level. After reviewing the washback literature, I found three gaps: a lack of research (1) investigating students’ learning, (2) examining the family’s impact on students’ test-preparation, and (3) employing interviews to elicit information from students. First, washback studies constantly examined the tests’ impact on teaching, but not on learning. To date, researchers have scant knowledge concerning the tests’ impact on learning. Second, researchers know little about the family’s impact on students’ learning. A previous study tangentially addressed this issue and indicated that parents had a significant impact on students’ learning. However, subsequent washback research did not probe this issue, which I contend should be examined to comprehend the gamut of the washback mechanisms, especially if students are in their early years. Third, previous empirical research predominantly adopted questionnaires to elicit information from students. I argue that interviews are a better approach for eliciting deep information from students. This paper also puts forward ways to promote washback at the school level, which had rarely been discussed. I suggest that the school change students’ viewpoints toward the test, offer more courses relevant to the tested skills, impose adequate, not undue pressure on students, create an English learning environment, involve students and parents in the decision-making process, familiarize teachers with the innovative tests, render sufficient assistance to teachers, and put forth categorical pertinent test policies.

Zia Tajeddin, Allameh Tabatabai University, Teheran

Less proficient vs. more proficient L2 learners’ preferences for compensation strategies: L1-based, L2-based, and non-linguistic

Compensation strategies constitute one of the three categories of direct language learning strategies. They enable learners to use L2 for comprehension and production despite missing knowledge of some kind. Previous research on compensation strategies has solely focused on the frequency of their use, concluding that high-proficiency L2 learners outperform low-proficiency ones in drawing on these strategies. These research findings have, however, failed to address the type of compensation strategies used by learners at the early and advanced stages of language learning. This study addressed this neglected issue by bringing to light the differential preferences of high-proficiency and low-proficiency learners for the two main types of compensation strategies: guessing the meaning (L2/L1-based strategies) and compensating for missing knowledge (non-linguistic, L1-based, L2-based, and avoidance strategies). To this end, an ETS version of the TOEFL (as a measure of language proficiency/stage of language learning) and a questionnaire (as a measure of preference for compensation strategies) were administered to 226 male and female Iranian EFL learners. The results showed that there was a simple pattern in the use of guessing strategies in that high-proficiency learners drew more frequently on both total guessing strategies and its subcategories (guessing the meaning of unknown words, not looking up every word, and guessing upcoming message). However, a curvilinear pattern emerged as to the use of compensation strategies of overcoming limitations in speaking and writing. Although more proficient learners manifested less preference for the strategies of overcoming limitations, they used them more effectively than less proficient learners on the grounds that they strongly favored L2-based strategies, while less proficient learners took recourse to L1-based and avoidance strategies to overcome limitations. 
The results of this study suggest that (1) both early and advanced stages of L2 learning are characterized by the use of compensation strategies; (2) the higher use of compensation strategies offers a distorted picture of significance of these strategies in the absence of data on the nature and categories of strategy use; and (3) as the use of L2-based compensation strategies marks the higher level of language proficiency, their use should be encouraged to speed up the attainment of language proficiency in L2 classrooms.

Joshua Thoms, University of Iowa

Measuring L2 Oral Proficiency of Spanish Learners: An Assessment Battery

Foreign language (FL) programs in a number of institutions across the U.S. (e.g., The University of Minnesota, The University of Pennsylvania) have implemented oral proficiency assessments in order to ensure that their students reach a minimum level of proficiency before satisfying the language requirement for graduation (Chalhoub-Deville, 1997; Chalhoub-Deville, 1998). Currently at The University of Iowa (UI), there are no exit exams or requirements in place within FL departments that are similar in scope to those mentioned above. Therefore, this paper presents an assessment battery intended to measure second language oral proficiency level(s) of students after approximately two years (i.e., four semesters) of Spanish language study at UI. The Communicative Language Ability model (Bachman, 1990) establishes the theoretical underpinnings on which the assessment battery rests. Details outlining the target language use domain in addition to test task specifications will be provided. The paper will highlight the various theoretical and practical considerations related to the construction of this tape-based assessment tool. Tasks used in the assessment battery as well as accompanying rubrics will be presented. Reliability and validity documentation procedures will also be delineated and discussed. Finally, issues concerning the use of technology in the administration of this tape-mediated assessment battery for piloting purposes will be addressed.

Hui-Jeong Woo, University of Illinois at Urbana-Champaign

Some psychometric validity issues on ELL assessment

English Language Learners (ELL) are one of the subgroups for which states are required to report adequate yearly progress (AYP). The sharp increase of the LEP population in the nation’s public schools (9.6% of total enrollment; National Center for Education Statistics [NCES], 2002) brought the challenge that administrators, teachers, and schools should provide those students with an educational opportunity equal to that of English-speaking students. The Adequate Yearly Progress (AYP) requirements of No Child Left Behind (NCLB) underscore both the mandate and the challenge of assuring that ELL subgroups achieve the same high standards of performance that are expected of their native-speaking peers. To assess AYP, ELL subgroups and native speakers take the same tests. However, NCLB results suggest that ELL subgroups are being left behind: schools and districts serving significant portions of ELL subgroups are less likely to meet their AYP goals. The purpose of this study is to identify some psychometric validity issues of the adequate yearly progress (AYP) requirements of the No Child Left Behind Act of 2001 (NCLB) for English Language Learners (ELL). I examined the possible factors affecting ELL subgroups’ academic performance on measures of adequate yearly progress. The factors include ELL subgroups’ length of participation in ESL programs, motivational orientation to learn English, and their environment, such as their parents’ highest level of education attained, parents’ SES, race, and/or school size. Along with measurement issues of the AYP requirements, I also performed a critical analysis of ELL-related measurement issues in the content and design of currently available commercial language proficiency tests.
