Analysis of the Quality of Arabic Language Tests in Indonesia: A Systematic Literature Review of Validity, Reliability, Difficulty Level, and Discrimination Power
Abstract
Evaluation is essential for measuring student achievement in Arabic language learning, yet the quality of the tests used in Indonesia still needs improvement. This study used a Systematic Literature Review (SLR) to analyze the validity, reliability, difficulty level, and discrimination power of Arabic test items, synthesizing six articles indexed in SINTA and Scopus and published between 2019 and 2024. The SLR approach contributes a systematic view of national trends and gaps in item quality, which previous studies have not analyzed comprehensively. On average, 66% of the items were valid, and most tests showed very high reliability (≥ 0.85), although some tests had low reliability (0.54). The distribution of difficulty levels was imbalanced: 50.83% of items were too easy and only 7.67% were classified as difficult, which deviates from the ideal distribution. In addition, 34% of the items had low discrimination power, which reduces the ability of the assessments to distinguish between students of different ability. These imbalances can lead to biased evaluations and hinder the development of students' competencies. The practical implications include teacher training in item analysis, the use of Bloom's Taxonomy to balance item difficulty, and the development of a standardized, data-driven item bank. The study provides an empirical foundation for improving Arabic language assessment policy in Indonesia and proposes a more accurate and fair evidence-based approach to evaluation.
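For readers less familiar with the item-quality indicators named in the abstract, the sketch below shows how the difficulty index, the discrimination index, and an internal-consistency reliability coefficient are commonly computed under classical test theory. It is illustrative only and is not the procedure of the reviewed studies: the response matrix is made up, and the 27% upper/lower grouping, the 0.30/0.70 difficulty cut-offs, and the choice of KR-20 as the reliability coefficient are all assumptions.

```python
from statistics import pvariance

# Hypothetical scored responses: one row per student, one column per item
# (1 = correct, 0 = incorrect). Invented data, not taken from any reviewed study.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def difficulty_index(rows):
    """Proportion of students answering each item correctly (higher = easier)."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def discrimination_index(rows, group_fraction=0.27):
    """Proportion correct in the top-scoring group minus the bottom-scoring group."""
    ranked = sorted(rows, key=sum, reverse=True)  # rank students by total score
    k = max(1, round(len(ranked) * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    return [
        sum(r[j] for r in upper) / k - sum(r[j] for r in lower) / k
        for j in range(len(rows[0]))
    ]


def kr20(rows):
    """Kuder-Richardson 20 reliability for dichotomously scored items."""
    n_items = len(rows[0])
    p = difficulty_index(rows)
    total_variance = pvariance([sum(row) for row in rows])
    if total_variance == 0:
        return float("nan")  # undefined when every student has the same total
    item_variance = sum(pi * (1 - pi) for pi in p)
    return n_items / (n_items - 1) * (1 - item_variance / total_variance)


def classify_difficulty(p, easy=0.70, hard=0.30):
    """Label an item using assumed cut-offs; studies may use other thresholds."""
    if p > easy:
        return "easy"
    if p < hard:
        return "difficult"
    return "moderate"


p_values = difficulty_index(responses)
d_values = discrimination_index(responses)
labels = [classify_difficulty(p) for p in p_values]
reliability = kr20(responses)

for item, (p, d, label) in enumerate(zip(p_values, d_values, labels), start=1):
    print(f"item {item}: p={p:.2f} ({label}), D={d:.2f}")
print(f"KR-20 = {reliability:.2f}")
```

Tallying the labels across all items of a test gives the kind of easy/moderate/difficult distribution that the abstract reports in aggregate.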
DOI: http://dx.doi.org/10.35329/fkip.v21i2.6040
Editorial and Distribution Address:
Pepatudzu : Media Pendidikan dan Sosial Kemasyarakatan
LPPM Office, Universitas Al Asyariah Mandar, Rectorate Building, 1st Floor. Jl. Budi Utomo No.2 Manding. Kec. Polewali, Kab. Polewali Mandar, Prov. Sulawesi Barat
Tel./Fax (0428) 21038
Email: pepatudzujurnal@gmail.com
Website: https://journal.lppm-unasman.ac.id/index.php/apepatudzu/index
Publisher:
Lembaga Penelitian dan Pengabdian Masyarakat Universitas Al Asyariah Mandar
Pepatudzu is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





