Formative assessment: a critical review

Author
Randy Elliot Bennett
Source
Journal: Assessment in Education: Principles, Policy & Practice [Routledge]
Volume/Issue: 18(1): 5–25. Cited by: 1183
Identifier
DOI: 10.1080/0969594x.2010.513678
Abstract

This paper covers six interrelated issues in formative assessment (aka 'assessment for learning'). The issues concern the definition of formative assessment, the claims commonly made for its effectiveness, the limited attention given to domain considerations in its conceptualisation, the under‐representation of measurement principles in that conceptualisation, the teacher‐support demands formative assessment entails, and the impact of the larger educational system. The paper concludes that the term 'formative assessment' does not yet represent a well‐defined set of artefacts or practices. Although research suggests that the general practices associated with formative assessment can facilitate learning, existing definitions admit such a wide variety of implementations that effects should be expected to vary widely from one implementation and student population to the next. In addition, the magnitude of commonly made quantitative claims for effectiveness is suspect, deriving from untraceable, flawed, dated, or unpublished sources. To realise maximum benefit from formative assessment, new development should focus on conceptualising well‐specified approaches built around process and methodology rooted within specific content domains. Those conceptualisations should incorporate fundamental measurement principles that encourage teachers and students to recognise the inferential nature of assessment. The conceptualisations should also allow for the substantial time and professional support needed if the vast majority of teachers are to become proficient users of formative assessment. Finally, for greatest benefit, formative approaches should be conceptualised as part of a comprehensive system in which all components work together to facilitate learning.

Keywords: formative assessment; assessment for learning

Acknowledgements
I am grateful to Steve Chappuis, Joe Ciofalo, Terry Egan, Dan Eignor, Drew Gitomer, Steve Lazer, Christy Lyon, Yasuyo Sawaki, Cindy Tocci, Caroline Wylie, and two anonymous reviewers for their helpful comments on earlier drafts of this paper or the presentation upon which the paper was based; to Brent Bridgeman, Shelby Haberman, and Don Powers for their critique of selected effectiveness studies; to Dylan Wiliam, Jim Popham and Rick Stiggins for their willingness to consider differing points of view; and to Caroline Gipps for suggesting (however unintentionally) the need for a paper such as this one.

Notes
1. Influential members of the group have included Paul Black, Patricia Broadfoot, Caroline Gipps, Wynne Harlen, Gordon Stobart, and Dylan Wiliam. See http://www.assessment-reform-group.org/ for more information on the Assessment Reform Group.
2. How does formative assessment differ from diagnostic assessment? Wiliam and Thompson (2008, 62) consider an assessment to be diagnostic when it provides information about what is going amiss and formative when it provides guidance about what action to take. They note that not all diagnoses are instructionally actionable. Black (1998, 26) offers a somewhat different view, stating that: '… diagnostic assessment is an expert and detailed enquiry into underlying difficulties, and can lead to a radical re‐appraisal of a pupil's needs, whereas formative assessment is more superficial in assessing problems with particular classwork, and can lead to short‐term and local changes in the learning work of a pupil'.
3. Expected growth was calculated from the norms of the Metropolitan Achievement Test Eighth Edition (Harcourt Educational Measurement 2002), the Iowa Tests of Basic Skills Complete Battery (Hoover, Dunbar, and Frisbie 2001), and the Stanford Achievement Test Series Tenth Edition (Pearson 2004).
4. Stiggins is reported to no longer stand by the claims quoted here (S. Chappuis, personal communication, April 6, 2009). I have included them because they are published ones still frequently taken by others as fact. See Kahl (2007) for an example.
5. Cohen (1988, 25–7) considers effects of .2 to be small, .5 to be medium, and .8 to be large. (The effect‐size metric at issue is sketched after the reference list below.)
6. It is possible that these values represent Black and Wiliam's retrospective extraction, from the 1998 review (Black and Wiliam 1998a), of the range of mean effects found across multiple meta‐analytical studies done by other investigators on different topics (i.e., the mean effect found in a meta‐analysis on one topic was .4 and the mean effect found in a meta‐analysis on a second topic was .7). If so, the range of observed effects across individual studies would, in fact, be wider than the oft‐quoted .4 to .7 range of effects, as each meta‐analytic mean itself represents a distribution of study effects. But more fundamentally, the construction of any such range would seem specious according to Black and Wiliam's (1998c) very own critique – i.e., '… the underlying differences between the studies are such that any amalgamations of their results would have little meaning' (53).
7. A partial list of concerns includes: confusing association with causation in the interpretation of results; ignoring in the interpretation the finding that results could be explained by (irrelevant) method factors; seemingly computing effect sizes before coding the same studies for the extent of use of formative assessment (introducing the possibility of bias in coding); giving no information on the reliability of the coding; and including many dated studies (57 of the 86 included articles were 30 or more years old) without considering publication date as a moderator variable.
8. The replicability of inferences and adjustments may be challenging to evaluate. It would be easiest to assess in team‐teaching situations in which both teachers might be expected to have a shared understanding of their classroom context and students. Outside of team contexts, replicability might be evaluated through video recording of teachers' formative assessment practice; annotation of the recording by those teachers to indicate their inferences, adjustments, and associated rationales; and review of the recordings and annotations by expert teachers for reasonableness.
9. Kane (2006, 23) uses 'interpretive argument' to refer to claims and 'validity argument' to refer to the backing. For simplicity, I've used 'validity argument' to refer to both claims and backing.
10. One could certainly conceptualise the relationship between the validity and efficacy arguments the other way around; that is, with the efficacy argument being part of a broader validity argument, a formulation that would be consistent with Kane's (2006, 53–6) views. Regardless of which argument is considered to be overarching, there is no disagreement on the essential point: both arguments are needed.
11. As suggested, there are other possible underlying causes for student error, some of which may be cognitive and others of which may be affective (e.g., not trying one's hardest to respond). Black and Wiliam (2009, 17) suggest a variety of cognitive causes, including misinterpretation of language, question purpose or context, or the requirements of the task itself. Affective causes may be situational ones related, for instance, to the type of feedback associated with a particular task or teacher, or such causes may be more deeply rooted, as when a student's history of academic failure dampens motivation to respond even when he or she possesses the requisite knowledge. Boekaerts (as cited in Boekaerts and Corno 2005, 202–3) offers a model to explain how students attempt to balance achievement goals and emotional well‐being in classroom situations.

References
Black, P. 1998. Testing, friend or foe? The theory and practice of assessment and testing. London: Routledge/Falmer Press.
Black, P., and D. Wiliam. 1998a. Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan 80, no. 2: 139–48.
Black, P., and D. Wiliam. 1998c. Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice 5, no. 1: 7–74.
Black, P., and D. Wiliam. 2009. Developing a theory of formative assessment. Educational Assessment, Evaluation and Accountability 21, no. 1: 5–31.
Boekaerts, M., and L. Corno. 2005. Self‐regulation in the classroom: A perspective on assessment and intervention. Applied Psychology: An International Review 54, no. 2: 199–231.
Chappuis, J., S. Chappuis, and R. Stiggins. 2009. Formative assessment and assessment for learning. In Meaningful measurement: The role of assessments in improving high school education in the twenty‐first century, ed. L.M. Pinkus, 55–76. Washington, DC: Alliance for Excellent Education. http://www.all4ed.org/files/MeanMeasCh3ChappuisStiggins.pdf (accessed August 3, 2009).
Cohen, J. 1988. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.
Harcourt Educational Measurement. 2002. Metropolitan8: Technical manual. San Antonio, TX: Author.
Hoover, H.D., S.B. Dunbar, and D.A. Frisbie. 2001. Iowa Tests of Basic Skills Complete/Core Battery: Spring norms and score conversions with technical information. Itasca, IL: Riverside.
Kahl, S. 2007. Formative assessment: An overview. Presentation at the Montana Office of Public Instruction 'Assessment Toolkit' conference, April 23, Helena, MT. http://opi.mt.gov/PDF/Assessment/conf/Presentations/07MON_FormAssmt.ppt (accessed February 11, 2009).
Kane, M.T. 2006. Validation. In Educational measurement, 4th ed., ed. R.L. Brennan, 17–64. Westport, CT: American Council on Education/Praeger.
Pearson. 2004. Stanford Achievement Test Series Tenth Edition: Technical data report. Iowa City, IA: Author.
Wiliam, D., and M. Thompson. 2008. Integrating assessment with learning: What will it take to make it work? In The future of assessment: Shaping teaching and learning, ed. C.A. Dwyer, 53–82. New York: Erlbaum.
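As a supplement to notes 5 and 6, the following sketch spells out the effect‐size metric those notes take for granted. These are textbook formulations of the standardised mean difference (Cohen's d) and of why a range built from meta‐analytic means understates the spread of individual study effects; they are not drawn from Bennett's paper, and the symbols (treatment/control means, pooled standard deviation, per‐meta‐analysis effects d_{k,i}) are illustrative.

```latex
% Standardised mean difference ("Cohen's d") behind note 5's benchmarks
% (.2 small, .5 medium, .8 large). \bar{x}_T, \bar{x}_C: treatment and
% control group means; s_T, s_C: group standard deviations; n_T, n_C:
% group sizes.
\[
  d \;=\; \frac{\bar{x}_T - \bar{x}_C}{s_{\mathrm{pooled}}},
  \qquad
  s_{\mathrm{pooled}} \;=\;
  \sqrt{\frac{(n_T - 1)\,s_T^2 + (n_C - 1)\,s_C^2}{n_T + n_C - 2}} .
\]
% Note 6's range argument: a meta-analytic mean effect \bar{d}_k is a
% (possibly weighted) convex combination of that meta-analysis's m_k
% individual study effects d_{k,i}, so it is bracketed by their extremes:
\[
  \min_{i}\, d_{k,i} \;\le\;
  \bar{d}_k \;=\; \sum_{i=1}^{m_k} w_{k,i}\, d_{k,i}
  \;\le\; \max_{i}\, d_{k,i},
  \qquad w_{k,i} \ge 0,\; \sum_{i=1}^{m_k} w_{k,i} = 1 .
\]
% Hence a range constructed from the means of several meta-analyses
% (the oft-quoted .4 to .7) can only be narrower than, or equal to, the
% range of the underlying individual study effects.
```

The inequality is the formal version of note 6's observation: because each endpoint of the .4 to .7 range is itself an average over a distribution of study effects, quoting it as the range of effects necessarily understates the variability across individual studies.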