Formative assessment: a critical review

Author
Randy Elliot Bennett
Source
Journal: Assessment in Education: Principles, Policy & Practice [Informa]
Volume/Issue: 18(1): 5–25. Cited by: 1183
Identifier
DOI: 10.1080/0969594x.2010.513678
Abstract

This paper covers six interrelated issues in formative assessment (aka, ‘assessment for learning’). The issues concern the definition of formative assessment, the claims commonly made for its effectiveness, the limited attention given to domain considerations in its conceptualisation, the under‐representation of measurement principles in that conceptualisation, the teacher‐support demands formative assessment entails, and the impact of the larger educational system. The paper concludes that the term, ‘formative assessment’, does not yet represent a well‐defined set of artefacts or practices. Although research suggests that the general practices associated with formative assessment can facilitate learning, existing definitions admit such a wide variety of implementations that effects should be expected to vary widely from one implementation and student population to the next. In addition, the magnitude of commonly made quantitative claims for effectiveness is suspect, deriving from untraceable, flawed, dated, or unpublished sources. To realise maximum benefit from formative assessment, new development should focus on conceptualising well‐specified approaches built around process and methodology rooted within specific content domains. Those conceptualisations should incorporate fundamental measurement principles that encourage teachers and students to recognise the inferential nature of assessment. The conceptualisations should also allow for the substantial time and professional support needed if the vast majority of teachers are to become proficient users of formative assessment. Finally, for greatest benefit, formative approaches should be conceptualised as part of a comprehensive system in which all components work together to facilitate learning.
Keywords: formative assessment; assessment for learning

Acknowledgements
I am grateful to Steve Chappuis, Joe Ciofalo, Terry Egan, Dan Eignor, Drew Gitomer, Steve Lazer, Christy Lyon, Yasuyo Sawaki, Cindy Tocci, Caroline Wylie, and two anonymous reviewers for their helpful comments on earlier drafts of this paper or the presentation upon which the paper was based; to Brent Bridgeman, Shelby Haberman, and Don Powers for their critique of selected effectiveness studies; to Dylan Wiliam, Jim Popham and Rick Stiggins for their willingness to consider differing points of view; and to Caroline Gipps for suggesting (however unintentionally) the need for a paper such as this one.

Notes
1. Influential members of the group have included Paul Black, Patricia Broadfoot, Caroline Gipps, Wynne Harlen, Gordon Stobart, and Dylan Wiliam. See http://www.assessment-reform-group.org/ for more information on the Assessment Reform Group.
2. How does formative assessment differ from diagnostic assessment? Wiliam and Thompson (2008, 62) consider an assessment to be diagnostic when it provides information about what is going amiss and formative when it provides guidance about what action to take. They note that not all diagnoses are instructionally actionable. Black (1998, 26) offers a somewhat different view, stating that: ‘… diagnostic assessment is an expert and detailed enquiry into underlying difficulties, and can lead to a radical re‐appraisal of a pupil's needs, whereas formative assessment is more superficial in assessing problems with particular classwork, and can lead to short‐term and local changes in the learning work of a pupil’.
3. Expected growth was calculated from the norms of the Metropolitan Achievement Test Eighth Edition (Harcourt Educational Measurement 2002), the Iowa Tests of Basic Skills Complete Battery (Hoover, Dunbar, and Frisbie 2001), and the Stanford Achievement Test Series Tenth Edition (Pearson 2004).
4. Stiggins is reported to no longer stand by the claims quoted here (S. Chappuis, April 6, 2009, personal communication). I have included them because they are published ones still frequently taken by others as fact. See Kahl (2007) for an example.
5. Cohen (1988, 25–7) considers effects of .2 to be small, .5 to be medium, and .8 to be large.
6. It is possible that these values represent Black and Wiliam's retrospective extraction from the 1998 review (Black and Wiliam 1998a) of the range of mean effects found across multiple meta‐analytical studies done by other investigators on different topics (i.e., the mean effect found in a meta‐analysis on one topic was .4 and the mean effect found in a meta‐analysis on a second topic was .7). If so, the range of observed effects across individual studies would, in fact, be wider than the oft‐quoted .4 to .7 range of effects, as each meta‐analytic mean itself represents a distribution of study effects. But more fundamentally, the construction of any such range would seem specious according to Black and Wiliam's (1998c) very own critique – i.e., ‘… the underlying differences between the studies are such that any amalgamations of their results would have little meaning’ (53).
7. A partial list of concerns includes confusing association with causation in the interpretation of results, ignoring in the interpretation the finding that results could be explained by (irrelevant) method factors, seemingly computing effect sizes before coding the same studies for the extent of use of formative assessment (introducing the possibility of bias in coding), giving no information on the reliability of the coding, and including many dated studies (57 of the 86 included articles were 30 or more years old) without considering publication date as a moderator variable.
8. The replicability of inferences and adjustments may be challenging to evaluate. It would be easiest to assess in team‐teaching situations in which both teachers might be expected to have a shared understanding of their classroom context and students. Outside of team contexts, replicability might be evaluated through video recording of teachers' formative assessment practice; annotation of the recording by those teachers to indicate their inferences, adjustments, and associated rationales; and review of the recordings and annotations by expert teachers for reasonableness.
9. Kane (2006, 23) uses ‘interpretive argument’ to refer to claims and ‘validity argument’ to refer to the backing. For simplicity, I've used ‘validity argument’ to refer to both claims and backing.
10. One could certainly conceptualise the relationship between the validity and efficacy arguments the other way around; that is, with the efficacy argument being part of a broader validity argument, a formulation that would be consistent with Kane's (2006, 53–6) views. Regardless of which argument is considered to be overarching, there is no disagreement on the essential point: both arguments are needed.
11. As suggested, there are other possible underlying causes for student error, some of which may be cognitive and others of which may be affective (e.g., not trying one's hardest to respond). Black and Wiliam (2009, 17) suggest a variety of cognitive causes, including misinterpretation of language, question purpose or context, or the requirements of the task itself. Affective causes may be situational ones related, for instance, to the type of feedback associated with a particular task or teacher, or such causes may be more deeply rooted, as when a student's history of academic failure dampens motivation to respond even when he or she possesses the requisite knowledge. Boekaerts (as cited in Boekaerts and Corno 2005, 202–3) offers a model to explain how students attempt to balance achievement goals and emotional well‐being in classroom situations.

References
Black, P. 1998. Testing, friend or foe? The theory and practice of assessment and testing. London: Routledge/Falmer Press.
Black, P., and D. Wiliam. 1998a. Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan 80(2): 139–48.
Black, P., and D. Wiliam. 1998c. Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice 5(1): 7–74.
Black, P., and D. Wiliam. 2009. Developing a theory of formative assessment. Educational Assessment, Evaluation and Accountability 21(1): 5–31.
Boekaerts, M., and L. Corno. 2005. Self‐regulation in the classroom: A perspective on assessment and intervention. Applied Psychology: An International Review 54(2): 199–231.
Chappuis, J., S. Chappuis, and R. Stiggins. 2009. Formative assessment and assessment for learning. In Meaningful measurement: The role of assessments in improving high school education in the twenty‐first century, ed. L.M. Pinkus, 55–76. Washington, DC: Alliance for Excellent Education. http://www.all4ed.org/files/MeanMeasCh3ChappuisStiggins.pdf (accessed August 3, 2009).
Cohen, J. 1988. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.
Harcourt Educational Measurement. 2002. Metropolitan8: Technical manual. San Antonio, TX: Author.
Hoover, H.D., S.B. Dunbar, and D.A. Frisbie. 2001. Iowa Tests of Basic Skills Complete/Core Battery: Spring norms and score conversions with technical information. Itasca, IL: Riverside.
Kahl, S. 2007. Formative assessment: An overview. Presentation at the Montana Office of Public Instruction ‘Assessment Toolkit’ conference, April 23, Helena, MT. http://opi.mt.gov/PDF/Assessment/conf/Presentations/07MON_FormAssmt.ppt (accessed February 11, 2009).
Kane, M.T. 2006. Validation. In Educational measurement, 4th ed., ed. R.L. Brennan, 17–64. Westport, CT: American Council on Education/Praeger.
Pearson. 2004. Stanford Achievement Test Series Tenth Edition: Technical data report. Iowa City, IA: Author.
Wiliam, D., and M. Thompson. 2008. Integrating assessment with learning: What will it take to make it work? In The future of assessment: Shaping teaching and learning, ed. C.A. Dwyer, 53–82. New York: Erlbaum.
