
Effects of Scale Transformation and Test Termination Rule on the Precision of Ability Estimates in CAT


Qing Yi
Tianyou Wang
Jae-Chun Ban

January 2000

For additional copies write:
ACT Research Report Series
PO Box 168
Iowa City, Iowa 52243-0168

© 2000 by ACT, Inc. All rights reserved.

Abstract

Error indices (bias, standard error of estimation, and root mean square error) obtained on different scales of measurement under different test termination rules in a CAT context were examined. Four ability estimation methods (MLE, WLE, EAP, and MAP), three measurement scales (the θ scale, the number correct score scale, and the ACT score scale), and three test termination rules (fixed length, fixed standard error, and target information) were studied. The findings indicate that the amount and direction of the bias, standard error of estimation, and root mean square error obtained under the different ability estimation methods are influenced both by the measurement scale and by the test termination rule in a CAT environment. WLE performed best among the four ability estimation methods on the ACT score scale with a target information termination rule.

Effects of Scale Transformation and Test Termination Rule on the Precision of Ability Estimates in CAT

Computerized adaptive testing (CAT) is designed to construct a unique test for each examinee, so that the test targets the examinee's estimated ability level. Theoretically, CAT has many advantages over paper-and-pencil (P&P) tests. One often-mentioned advantage is its measurement efficiency, or its capability to deliver shorter tests. CAT also provides examinees with the benefits of testing on demand and immediate test scoring and reporting. With recent developments in computer technology and psychometric knowledge, the popularity of CAT is increasing. Several high-stakes testing programs have implemented CAT versions of P&P tests (Eignor, Way, Stocking, & Steffen, 1993; Sands, Waters, & McBride, 1997), and some others are moving toward using CAT as an alternative test-delivery method (Miller & Davey, 1999). Although CAT has many advantages, there are issues that need to be considered in its application. For example, the effects of scale transformations and test termination rules on the precision of ability estimation methods have not yet been fully investigated.

Most computerized adaptive tests (CATs) use item selection and scoring algorithms that depend on item response theory (IRT). However, it may be difficult for the general public, with limited psychometric knowledge, to understand the meaning of the θ scale. Thus, the measurement scales of CATs are often transformed from θ to more familiar scales, such as number correct (NC) score or reported score scales (Stocking, 1996). The test termination rule is another important factor that has to be considered when CATs are implemented. The choice of a stopping rule is affected by several factors: for example, the comparability between P&P tests and CATs, cost (e.g., the cost of computer seat time), and measurement efficiency, among others.
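As a rough, non-authoritative illustration of the two points above (this sketch is not from the report), the Python fragment below converts a provisional θ estimate to the NC metric through the test characteristic curve and applies a fixed-standard-error stopping check. The 3PL model, the scaling constant 1.7, the simulated item pool, and the 0.30 SE cutoff are all assumptions made for the example.

    import numpy as np

    def p_3pl(theta, a, b, c):
        """3PL probability of a correct response to each administered item."""
        return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

    def theta_to_nc(theta, a, b, c):
        """Nonlinear theta-to-NC conversion: the test characteristic curve."""
        return np.sum(p_3pl(theta, a, b, c))

    def item_information(theta, a, b, c):
        """3PL item information, used here for the fixed-SE stopping rule."""
        p = p_3pl(theta, a, b, c)
        return (1.7 * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

    def stop_fixed_se(theta_hat, a, b, c, se_cutoff=0.30):
        """Stop testing once the asymptotic SE of theta-hat falls below the cutoff."""
        se = 1.0 / np.sqrt(np.sum(item_information(theta_hat, a, b, c)))
        return se <= se_cutoff

    # Illustrative 10-item administered set and a provisional estimate of 0.5.
    rng = np.random.default_rng(0)
    a, b, c = rng.uniform(0.8, 2.0, 10), rng.normal(0.0, 1.0, 10), rng.uniform(0.1, 0.25, 10)
    print(theta_to_nc(0.5, a, b, c), stop_fixed_se(0.5, a, b, c))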
Because most CATs use IRT as the testing model, much of the previous research on CAT has been based on the IRT θ scale. For example, studies that compared different ability estimation methods in terms of error indices, such as bias, standard error of estimation (SE), and root mean squared error (RMSE), most often made the comparison on the θ scale (e.g., Bock & Mislevy, 1982; Crichton, 1981; Wang & Vispoel, 1998; Wang, Hanson, & Lau, 1999; Warm, 1989; Weiss & McBride, 1984). This tendency is quite natural because much of the basic engine of CAT, such as the item selection algorithm, is based on IRT parameters that are directly related to the θ scale. However, when CATs move to actual implementation, the final reported score scale is rarely a linear transformation of the θ scale. For example, the GRE CAT uses the same score scale as the P&P version of the test, which is a nonlinear transformation of the estimated θ to an NC score and then to a reported score (e.g., Eignor & Schaeffer, 1995; Eignor et al., 1993). One question that arises is whether previous research results regarding the properties of different ability estimation methods or other CAT components (e.g., the test termination rule) are scale specific.

A recent study indicated that the error indices obtained from the maximum likelihood estimation (MLE) method may be drastically different on the NC scale than on the θ scale (Yi, 1998). Lord (1980, Chapter 6) provided a theoretical derivation of the effects of scale transformation
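As a concrete, hedged illustration of the error indices discussed above (again not taken from the report), the Python sketch below computes bias, SE, and RMSE over simulated replications of an ability estimate and then recomputes them after a nonlinear rescaling. The exponential map is only a stand-in for a θ-to-NC or θ-to-reported-score conversion, chosen to show that the indices need not be preserved under such a transformation.

    import numpy as np

    def error_indices(estimates, true_value):
        """Bias, standard error of estimation (SE), and RMSE over replications."""
        estimates = np.asarray(estimates, dtype=float)
        bias = np.mean(estimates) - true_value
        se = np.std(estimates)  # spread of the estimates around their own mean
        rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
        return bias, se, rmse

    theta_true = 0.5
    theta_hats = np.random.default_rng(1).normal(theta_true, 0.3, 1000)
    print(error_indices(theta_hats, theta_true))                   # on the theta scale
    print(error_indices(np.exp(theta_hats), np.exp(theta_true)))   # after an illustrative nonlinear map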