Examining Five Behaviors Conducted by Two Groups of Novice and Experienced Raters in Two Rating Processes

Saheb Mostofee, Nasim Ghanbari, Fateme Nemati


This study compares five rating behaviors of eight raters: four novice raters and four experienced raters. The five behaviors compared between the two groups are the number and frequency of references to the rating scale (Jacobs et al.’s ESL Composition Profile), the number of interpretations (justifications), total rating time, total score, and the number of pauses longer than 5 seconds. The eight raters were asked to rate two essays written by two B.A. students of English Literature in their fourth semester at Persian Gulf University, Bushehr, Iran. The raters’ think-aloud protocols (TAPs) were transcribed and then analyzed. Although both groups showed a similar pattern in the total scores they assigned to the two essays, neither the experienced nor the novice raters showed a consistent trend in the number of references to the rating scale. In addition, the novice raters referred to the rating scale and paused more often than the experienced raters, whereas the experienced raters produced more interpretations (justifications) and spent more total time rating than the novices. These findings support those of previous research and pave the way for future research in this area.
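Two of the measures named above, the number of pauses longer than 5 seconds and total rating time, can be derived from a timestamped transcript. The following is a minimal sketch under stated assumptions, not the authors’ procedure: it assumes each verbalization in a rater’s think-aloud transcript has hypothetical start and end times in seconds, treats a pause as the silent gap between two consecutive verbalizations, and counts only gaps strictly longer than 5 seconds. The function names and sample timestamps are hypothetical and are not data from the study.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical representation (assumption): one (start, end) pair in seconds
# per verbalization in a rater's think-aloud transcript, in chronological order.
Segment = Tuple[float, float]

# Only pauses longer than 5 seconds are counted, following the abstract.
PAUSE_THRESHOLD = 5.0


def count_long_pauses(segments: List[Segment], threshold: float = PAUSE_THRESHOLD) -> int:
    """Count silent gaps between consecutive verbalizations that exceed threshold."""
    return sum(
        1
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
        if next_start - prev_end > threshold
    )


def total_rating_time(segments: List[Segment]) -> float:
    """Elapsed time from the start of the first verbalization to the end of the last."""
    if not segments:
        return 0.0
    return segments[-1][1] - segments[0][0]


def group_mean_pauses(transcripts: List[List[Segment]]) -> float:
    """Mean number of long pauses per rater in one group (raises on an empty group)."""
    return mean(count_long_pauses(t) for t in transcripts)


if __name__ == "__main__":
    # Hypothetical timestamps for two raters in one group.
    rater_a = [(0.0, 4.0), (12.0, 20.0), (21.0, 30.0), (37.5, 40.0)]
    rater_b = [(0.0, 10.0), (11.0, 25.0), (32.0, 45.0)]
    print(count_long_pauses(rater_a), total_rating_time(rater_a))
    print(group_mean_pauses([rater_a, rater_b]))
```

Comparing the novice and experienced groups would then amount to calling `group_mean_pauses` once per group. A study that measured pauses differently, for example from silence in the audio recording rather than from transcript timestamps, would need a different input format.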



Keywords: Experienced raters, Novice raters, Rating behavior, Rating scale, Rating time

References



Abedi, J. (2010). Performance Assessments for English Language Learners. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.

Barkaoui, K. (2007). Rating scale impact on EFL essay marking: A mixed-method study. Assessing Writing, 12, 86–107.

Barkaoui, K. (2008). Effects of scoring method and rater experience on ESL rating outcomes and processes (Unpublished doctoral dissertation). University of Toronto, Toronto, Canada.

Barkaoui, K. (2010a). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28(1), 51–75.

Barkaoui, K. (2010b). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7(1), 54–74.

Behizadeh, N., & Engelhard, G. (2011). Historical view of the influences of measurement and writing theories on the practice of writing assessment in the United States. Assessing Writing, 16(3), 189–211. http://dx.doi.org/10.1016/j.asw.2011.03.001

Broad, B. (2003). What We Really Value: Beyond Rubrics in Teaching and Assessing Writing. Logan, UT: Utah State University Press.

Brown, J.D. (1991). Do English and ESL faculties rate writing samples differently? TESOL Quarterly, 25, 587-603.

Bukta, K. (2007). Processes and outcomes in L2 English written performance assessment: Raters’ decision-making processes and awarded scores in rating Hungarian EFL learners’ compositions. (Unpublished doctoral dissertation). Hungary.

Cohen, A. D. (1996). Verbal reports as a source of insights into second language learner strategies. Applied Language Learning, 7(1–2), 5–24.

Cohen, A. D. (1998). Strategies in learning and using a second language. London: Longman.

Congdon, P. J., & McQueen, J. (2000). The stability of rater severity in large-scale assessment programs. Journal of Educational Measurement, 37(2), 163-178.

Connor-Linton, J. (1995). Looking behind the curtain: What do L2 composition ratings really mean? TESOL Quarterly, 29, 762–765.

Connor-Linton, J., & Polio, C. (2014). Comparing perspectives on L2 writing: Multiple analyses of a common corpus. Journal of Second Language Writing, 26, 1–9.

Cumming, A. (1990). Expertise in evaluating second language compositions. Language Testing, 7, 31–51.

Cumming, A., Kantor, R., & Powers, D. (2002). Decision making while rating ESL/EFL writing tasks: A descriptive framework. The Modern Language Journal, 86(1), 67–96.

Diederich, P. B., French, J., & Carlton, S. (1961). Factors in judgments of writing ability. ETS Research Bulletin 61-15. Princeton, NJ: Educational Testing Service.

Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185.

Erdosy, U. (2004). Exploring variability in judging writing ability in a second language: A study of four experienced raters of ESL compositions (TOEFL Research Report No. RR-03-17). Princeton, NJ: Educational Testing Service.

Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.

Ericsson, K. A., & Simon, H. A. (1987). Verbal reports on thinking. In C. Faerch & G. Kasper (Eds.), Introspection in second language research (pp. 24–53). Clevedon: Multilingual Matters.

Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.). Cambridge, MA: MIT Press.

Faerch, C., & Kasper, G. (1987). From product to process: Introspective methods in second language research. In C. Faerch & G. Kasper (Eds.), Introspection in second language research (pp. 5–23). Clevedon: Multilingual Matters.

Gamaroff, R. (2000). Rater reliability in language assessment: The bug of all bears. System, 28, 31-53.

Ghanbari, B., Barati, H., & Moinzadeh, A. (2012a). Rating Scales Revisited: EFL Writing Assessment Context of Iran under Scrutiny. Language Testing in Asia, 2(1), 83–100.

Green, A.J.K. (1997). Verbal protocol analysis in language teaching research. Cambridge University Press and University of Cambridge Local Examinations Syndicate.

Green, A. (1998). Verbal protocol analysis in language testing research: A handbook. Cambridge: Cambridge University Press.

Hamp-Lyons, L. (1994). Rating non-native writing: The trouble with holistic scoring. TESOL Quarterly, 29(4), 759–762.

Hamp-Lyons, L., & Henning, G. (1991). Communicative writing profiles: An investigation of the transferability of a multiple-trait scoring instrument across ESL writing assessment contexts. Language Learning, 41(3), 337–373.

Huang, J. (2008). How accurate are ESL students’ holistic writing scores on large-scale assessments?—A generalizability theory approach. Assessing Writing, 13, 201-218.

Huang, J. (2009). Factors affecting the assessment of ESL students’ writing. International Journal of Applied Educational Studies, 5(1), 1-17.

Huang, J., & Foote, C. J. (2010). Grading between the lines: What really impacts professors’ holistic evaluation of ESL graduate student writing? Language Assessment Quarterly, 7(3), 219–233.

Huang, J. (2012). Using generalizability theory to examine the accuracy and validity of large-scale ESL writing assessment. Assessing Writing, 17, 123–139.

Huot, B. (1993). The influence of holistic scoring procedures on reading and rating student essays. In M. Williamson & B. Huot (Eds.), Validating holistic scoring for writing assessment (pp. 206–236). Cresskill, NJ: Hampton Press.

Jacobs, H. L., Zinkgraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House.

Janssen, G., Meier, V., & Trace, J. (2015). Building a better rubric: Mixed methods rubric revision. Assessing Writing. DOI:10.1016/j.asw.2015.07.002

Johnson, J. S., & Lim, G. S. (2009). The influence of rater language background on writing performance assessment. Language Testing, 26(4), 485-505.

Johnson, R. L., Penny, J., Gordon, B., Shumate, S. R., & Fisher, S. P. (2005). Resolving score differences in the rating of writing samples: Does discussion improve the accuracy of scores? Language Assessment Quarterly, 2(2), 117-146.

Kobayashi, T. (1992). Native and nonnative reactions to ESL compositions. TESOL Quarterly, 26(1), 81-111.

Lee, Y.-W., Gentile, C., & Kantor, R. (2010). Toward automated multi-trait scoring of essays: Investigating links among holistic, analytic, and text feature scores. Applied Linguistics, 31(3), 391–417.

Li, H., & He, L. (2015). A comparison of EFL raters’ essay-rating processes across two types of rating scales. Language Assessment Quarterly, 12, 178–212.

Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19(3), 246–276.

Lumley, T. (2005). Assessing second language writing: The rater’s perspective. New York: Peter Lang.

Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of ESL speaking skills of immigrants. Language Testing, 15(2), 158–188.

Maftoon, P., & Akef, K. (2010). Developing rating scale descriptors for assessing the stages of writing process: The constructs underlying students' writing performances. Journal of Language and Translation, 1(1), 1–18.

Matsumoto, K. (1993). Verbal-report data and introspective methods in second language research. RELC Journal, 24(1), 32–60.

McMillan, J. H., & Schumacher, S. (2001). Research in education: A conceptual introduction (5th ed.). New York: Longman.

Mendelsohn, D., & Cumming, A. (1987). Professors’ ratings of language use and rhetorical organization in ESL compositions. TESL Canada Journal, 5, 9-26.

Milanovic, M., Saville, N., & Shuhong, S. (1996). A study of the decision-making behavior of composition markers. In M. Milanovic & N. Saville (Eds.), Performance testing, cognition and assessment: Selected papers from the 15th Language Testing Research Colloquium, Cambridge and Arnhem (pp. 92–111). Cambridge: Cambridge University Press.

Parker, C. E., Louie, J., & O’Dwyer, L. (2009). New measures of English language proficiency and their relationship to performance on large-scale content assessments (Issues & Answers Report, REL 2009–No. 066). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast and Islands. Retrieved from http://ies.ed.gov/ncee/edlabs.

Russo, J. E., Johnson, E. J., & Stephens, D. L. (1989). The validity of verbal protocols. Memory & Cognition, 17, 759–769.

Sakyi, A. A. (2000). Validation of holistic scoring for ESL writing assessment: How raters evaluate composition. In M. Milanovic, & A. J. Kunnan (Eds.), Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium (pp. 129–152). Cambridge: Cambridge University Press.

Sakyi, A. A. (2003). A study of the holistic scoring behaviors of experienced and novice ESL instructors. Unpublished doctoral dissertation, University of Toronto, Toronto, Canada.

Santos, T. (1988). Professors' reactions to the writing of nonnative-speaking students. TESOL Quarterly, 22(1), 69-90.

Sasaki, T. (2003). Recipient orientation in verbal report protocols: Methodological issues in concurrent think-aloud. Second Language Studies, 22, 1–54.

Schoonen, R., Vergeer, M., & Eiting, M. (1997). The assessment of writing ability: Expert readers versus lay readers. Language Testing, 14, 157–184.

Smagorinsky, P. (1994). Think-aloud protocol analysis: Beyond the black box. In P. Smagorinsky (Ed.), Speaking about writing: Reflections on research methodology (pp. 3–19). Thousand Oaks, CA: Sage.

Smith, D. (2000). Rater judgments in the direct assessment of competency-based second language writing ability. In G. Brindley (Ed.), Studies in immigrant English language assessment (Vol. 1, pp. 159–189). Sydney: National Centre for English Language Teaching and Research, Macquarie University.

Song, B., & Caruso, I. (1996). Do English and ESL faculty differ in evaluating the essays of native English-speaking and ESL students? Journal of Second Language Writing, 5, 163-182.

Stratman, J. F., & Hamp-Lyons, L. (1994). Reactivity in concurrent think-aloud protocols: Issues for research. In P. Smagorinsky (Ed.), Potential problems and problematic potentials of using talk about writing as data about writing processes (pp. 89–114).

Swartz, C. W., Hooper, S. R., Montgomery, J. W., Wakely, M. B., De-Kruif, R. E. L., Reed, M., Brown, T. T., Levine, M. D., & White, K. P. (1999). Using generalizability theory to estimate the reliability of writing scores derived from holistic and analytical scoring methods. Educational and Psychological Measurement, 59, 492–506.

Vann, R., Meyer, D., & Lorenz, F. (1984). Error gravity: A study of faculty opinion of ESL errors. TESOL Quarterly, 18, 427-440.

Vaughan, C. (1991). Holistic assessment: What goes on in the raters’ minds? In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 111–126). Norwood, NJ: Ablex.

Weigle, S. C. (1994). Effects of training on raters of English as a second language compositions: Quantitative and qualitative approaches. Unpublished PhD dissertation, University of California, Los Angeles.

Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15, 263–287.

Weigle, S. C. (2002). Assessing writing. Cambridge, UK: Cambridge University Press.

Weigle, S. C., Boldt, H., & Valsecchi, M. I. (2003). Effects of task and rater background on the evaluation of ESL student writing: A pilot study. TESOL Quarterly, 37, 345-354.

Winke, P., & Lim, H. (2015). ESL essay raters’ cognitive processes in applying the Jacobs et al. rubric: An eye-movement study. Assessing Writing, 25, 37–53.

Wolfe, E. W. (2006). Uncovering rater’s cognitive processing and focus using think-aloud protocols. Journal of Writing Assessment, 2, 37–56.

Wolfe, E., Kao, C., & Ranney, M. (1998). Cognitive differences in proficient and nonproficient essay scorers. Written Communication, 15, 465–492.

DOI: http://dx.doi.org/10.7575/aiac.ijalel.v.5n.4p.199



This work is licensed under a Creative Commons Attribution 4.0 International License.

2012-2019 (CC-BY) Australian International Academic Centre PTY.LTD

International Journal of Applied Linguistics and English Literature
