Sensitivity and Accuracy of the Mantel-Haenszel Method and Standardization Method: Detection of Item Functioning Differential

Ahmad Rustam, Dali Santun Naga, Yetti Supriyati

Abstract


Detecting differential item functioning (DIF) is necessary in test development to obtain sound items. The Mantel-Haenszel method and the standardization method are DIF detection tools based on classical test theory assumptions. This study compares the sensitivity and accuracy of the Mantel-Haenszel method and the standardization method in DIF detection. The simulation design comprised (a) 1000 responses in both the reference and focal groups, (b) DIF proportions of 0.10, 0.25, 0.50, and 0.75, and (c) a multiple-choice test length of 40 items. The results show that the Mantel-Haenszel method is as sensitive as the standardization method at DIF proportions of 10% and 25%; when the DIF proportion exceeds 25%, however, the standardization method becomes less sensitive, whereas the sensitivity of the Mantel-Haenszel method increases. The standardization method is more accurate than the Mantel-Haenszel method at a DIF proportion of 10%; when the DIF proportion exceeds 10%, however, the accuracy of the standardization method decreases and the Mantel-Haenszel method becomes the more accurate of the two. Thus, if the proportion of DIF detected by the standardization method is ≤0.10, the results of the standardization method are preferred as the reference. Conversely, if the proportion of DIF detected by the standardization method is ≥0.10, the results of the Mantel-Haenszel method are preferred as the reference.
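
As a rough illustration of the quantities compared in the abstract, the following Python sketch computes a Mantel-Haenszel common odds ratio, a standardization index (the focal-weighted difference in proportion correct), and the sensitivity and accuracy of a set of DIF flags. It is a minimal sketch, not the authors' procedure: the function names (mh_odds_ratio, std_p_dif, sensitivity_accuracy), the group coding (0 = reference, 1 = focal), the use of the total score as the matching variable, and the standard diagnostic-test definitions of sensitivity and accuracy are all assumptions made here for illustration.

```python
import numpy as np

def mh_odds_ratio(item, group, total):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item:  0/1 responses; group: 0 = reference, 1 = focal (assumed coding);
    total: matching score used to stratify examinees (assumed: total score).
    """
    item, group, total = map(np.asarray, (item, group, total))
    num = den = 0.0
    for k in np.unique(total):
        s = total == k
        a = np.sum(s & (group == 0) & (item == 1))  # reference, correct
        b = np.sum(s & (group == 0) & (item == 0))  # reference, incorrect
        c = np.sum(s & (group == 1) & (item == 1))  # focal, correct
        d = np.sum(s & (group == 1) & (item == 0))  # focal, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    # A value near 1 suggests no DIF; den == 0 is not handled in this sketch.
    return num / den

def std_p_dif(item, group, total):
    """Standardization index: focal-weighted mean of (P_focal - P_reference)."""
    item, group, total = map(np.asarray, (item, group, total))
    diff_sum = weight_sum = 0.0
    for k in np.unique(total):
        ref = item[(total == k) & (group == 0)]
        foc = item[(total == k) & (group == 1)]
        if len(ref) == 0 or len(foc) == 0:
            continue  # skip score levels where one group is absent
        diff_sum += len(foc) * (foc.mean() - ref.mean())
        weight_sum += len(foc)
    return diff_sum / weight_sum

def sensitivity_accuracy(flagged, true_dif):
    """Sensitivity = TP / (TP + FN); accuracy = (TP + TN) / number of items."""
    flagged = np.asarray(flagged, dtype=bool)
    true_dif = np.asarray(true_dif, dtype=bool)
    tp = np.sum(flagged & true_dif)
    tn = np.sum(~flagged & ~true_dif)
    fn = np.sum(~flagged & true_dif)
    return tp / (tp + fn), (tp + tn) / flagged.size
```

In a simulation of the kind described above, each method would flag items by applying a threshold to its index (the threshold is a further assumption), and the flags would then be compared against the items generated with DIF via sensitivity_accuracy.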

Keywords


Mantel-Haenszel, Standardization, DIF, Sensitivity, Accuracy


DOI: http://dx.doi.org/10.7575/aiac.ijels.v.7n.3p.28

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

2013-2019 (CC-BY) Australian International Academic Centre PTY.LTD.

International Journal of Education and Literacy Studies  
