Item Response Theory Validation of Social Studies Aptitude Test
Morrison Jessa (teezednigeria@gmail.com)
Delta State University, Abraka Nigeria https://orcid.org/0009-0000-0559-9188
Patrick Osadebe
Delta State University, Abraka Nigeria
Kingsley Ashibuogwu
Delta State University of Science and Technology, Ozoro
Research Article
Keywords: Item Response Theory, Social Studies Aptitude Test, 3-parameter, Validation, Psychometrics
Posted Date: December 29th, 2023
DOI: https://doi.org/10.21203/rs.3.rs-3800666/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License. 
Additional Declarations: The authors declare no competing interests.
Abstract
This study assessed the psychometric properties of the Social Studies Aptitude Test (SSAT) using the 3-Parameter Item Response Theory Model. Four research questions guided the study. A 100-item multiple-choice SSAT, developed by Jessa et al. (2023), was used as the instrument for the study. The data were collated and analysed using chi-square goodness of fit and factor analysis. The findings revealed that all 100 items measured a single construct; that most of the items (94 out of 100) were satisfactory (needing no revision), good, or moderate (needing little or no revision); that most of the items (89 out of 100) were either very easy or easy; and that most of the items (73 out of 100) are not susceptible to guessing. The study recommended, amongst others, that the developed SSAT should be used by Social Studies teachers for the assessment of secondary school students, especially during mock examinations.
Introduction
Following the revision of the curriculum of basic education in the nation's education system between 2014 and 2022, test items for all subjects are expected to reflect the changes inherent in the revised curriculum. However, from available data, the researcher observed that the questions used in the social studies Basic Education Certificate Examination in Delta State from 2016 to 2019 did not measure all the skills identified in the revised basic education curriculum. This has serious implications for the achievement of the objectives guiding the revised curriculum. It also has implications for the achievement of the goals of Education for Sustainable Development (ESD), which emphasise the development of the knowledge, skills, values and attitudes needed for a more sustainable and just society for all. Given the above, the aim of this study was to assess the psychometric properties of the Social Studies Aptitude Test (SSAT) developed by Jessa et al. (2023).
Aligned with the study's objectives, the research was grounded in the Item Response Theory (IRT) of
Measurement. Item Response Theory, also called latent response theory, comprises a set of mathematical
models designed to elucidate the connection between unobservable traits (latent characteristics or
attributes) and their expressions (such as observed outcomes, responses, or performance). These models
establish a relationship among the features of items in an instrument, individuals responding to these
items, and the underlying trait being measured. IRT posits that both the latent construct (e.g., stress,
knowledge, attitudes) and the items of a measure are arranged along an imperceptible continuum.
Consequently, its primary aim is to determine an individual's position on that continuum (Osadebe &
Jessa, 2018).
IRT originated in the 1950s and 1960s through the efforts of Frederic Lord and other psychometricians
(Lord, 1952; Lord & Novick, 1968) with the objective of creating a methodology capable of assessing
respondents without relying on the inclusion of identical test items (Hambleton & Jodoin, 2003). As a
result, IRT evolved from classical measurement theory to address several of its limitations (Hambleton,
1994). This statistical theory encompasses various mathematical models characterized by: a) the ability
to predict individual scores based on latent traits or abilities, and b) the establishment of a connection
between an individual's item performance and the underlying traits through a function known as the "item
characteristic curve" (Hambleton, Swaminathan & Rogers, 1991). These features are achievable because
IRT models maintain item and ability parameter consistency for both test items and individuals when the
relevant IRT model accurately fits the available test data. In simpler terms, the same items used across different samples will retain their statistical properties (such as difficulty and discrimination), and individuals' scores representing the ability or latent traits on a specific construct will not be contingent on the specific test items administered.
Item response theory (IRT) rests on two basic postulates: (a) The performance of an examinee on a test
item can be predicted (or explained) by a set of factors called traits, latent traits, or abilities; and (b) the
relationship between examinees' item performance and the set of traits underlying item performance can
be described by a monotonically increasing function called an item characteristic function or item
characteristic curve (ICC). This function species that as the level of the trait increases, the probability of
a correct response to an item increases.
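To make the shape of an ICC concrete, the curve can be drawn directly in R. The item parameters below (a = 1.5, b = 0, guessing floor g = 0.2) are hypothetical values chosen purely for illustration, not estimates from the SSAT:

# Hypothetical item characteristic curve for a 3-parameter logistic item.
# Parameter values (a, b, g) are illustrative only.
theta <- seq(-4, 4, by = 0.1)   # ability scale
icc <- function(theta, a, b, g) {
  g + (1 - g) / (1 + exp(-a * (theta - b)))
}
plot(theta, icc(theta, a = 1.5, b = 0, g = 0.2), type = "l",
     xlab = "Ability (theta)", ylab = "Probability of correct response",
     main = "Item characteristic curve (hypothetical item)")

The curve rises monotonically from the lower asymptote g (the guessing floor) towards 1 as theta increases, which is exactly the property described above.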
IRT models include a set of assumptions about the data to which the model is applied. Although the viability of these assumptions cannot be determined directly, some indirect evidence can be collected and assessed, and the overall fit of the model to the test data can be assessed as well. Four distinct assumptions dominate the practice of Item Response Theory (IRT): the unidimensionality assumption, the assumption of local independence, the assumption of monotonicity, and the assumption of speededness.

Three common IRT models for binary data are the one-, two- and three-parameter logistic models. They differ in the number of item parameters estimated. Before explaining what the 3-Parameter Logistic Model entails, there is a need to provide a short examination of the 1-Parameter and 2-Parameter Logistic Models.
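For reference, the standard logistic forms of the three models can be written as follows (these are the standard formulations from the IRT literature; the preprint itself does not print the equations):

\begin{aligned}
\text{1-PLM:}\quad & P_i(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}} \\
\text{2-PLM:}\quad & P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}} \\
\text{3-PLM:}\quad & P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}
\end{aligned}

Here b_i is the item difficulty, a_i the item discrimination, and c_i the pseudo-guessing lower asymptote; each model nests the one before it, with the 2-PLM setting c_i = 0 and the 1-PLM additionally holding discrimination constant across items.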
Research Questions
The following research questions guided the study:
1. To what extent does the SSAT meet the requirement of unidimensionality?
2. To what extent does the SSAT meet the requirement of a-parameter?
3. To what extent does the SSAT meet the requirement of b-parameter?
4. To what extent does the SSAT meet the requirement of c-parameter?
Methods
Design
This study adopted an instrumentation research design. This design is suitable because the researcher's intention was to assess the psychometric properties of the Social Studies Aptitude Test.
Participants Selection
A total of 1,000 students participated in the study. The choice of sample size was based on the recommendation of Lord (1968) that a minimum of 50 items and 1,000 examinees are required to estimate an a-parameter with high accuracy. A total of 40 students in each Local Government Area of the state were selected, making a total of 1,000 students. This was done through simple random and cluster sampling techniques. In this case, the schools in each Local Government Area of the state were treated as clusters, such that the researcher randomly selected one school in each Local Government Area, making a total of 25 schools. This was done through the balloting method of simple random sampling. Using this procedure, the researcher wrote the names of all the schools in each Local Government Area on pieces of paper, folded them and poured them into a container. He then shuffled them and picked one piece of paper from the container. The school picked through this process became the selected school for that Local Government Area. This was repeated for all Local Government Areas until all 25 schools (one for each Local Government Area) were selected.

The above procedure produced 25 clusters, one for each Local Government Area. For each cluster, the researcher randomly selected one classroom out of the various classrooms in Basic 3. All the students in the selected classroom were used for the study, as there were up to 40 students in each selected classroom.
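As an illustration of the two-stage procedure (the school lists below are invented placeholders, not the actual sampling frame), the balloting step is equivalent to a single sample() draw per Local Government Area in R:

# Stage 1: simple random sampling (balloting) of one school per LGA.
# School names are hypothetical; the real frame came from the 25 LGAs.
set.seed(1)                                  # reproducible illustration
lgas <- paste0("LGA_", 1:25)
schools <- setNames(lapply(lgas, function(g) paste0(g, "_School_", 1:6)), lgas)
selected_schools <- sapply(schools, function(s) sample(s, 1))
length(selected_schools)                     # 25 clusters, one per LGA
# Stage 2: within each selected school, one intact Basic 3 classroom
# (about 40 students) is chosen at random, giving 25 x 40 = 1,000 examinees.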
Measure
The Social Studies Aptitude Test developed by Jessa et al. (2023) was used as the instrument for the study. The test comprised 100 multiple-choice items derived from the Basic 3 social studies syllabus. The syllabus was obtained from the Ministry of Basic and Secondary Education, Asaba, Delta State. Each item has five options: one key and four distractors.

The Social Studies Aptitude Test was administered to the students directly by the researcher, with the help of 5 research assistants who were trained on the purpose of the study. The research team visited the schools in person before the testing date to make their intention known to the principal or head of the school and to obtain permission. The research assistants were briefed on the purpose of the study and how to approach the testees. The data were collected on the spot from the respondents.
Data Analysis
The data obtained were collated, coded and entered into a computer system through the Statistical Package for the Social Sciences (SPSS), version 26. Factor analysis, using Principal Component Analysis with the varimax method, was used to answer research question 1. The a-, b- and c-parameters of the dichotomous item response theory models were used to answer research questions 2, 3 and 4 respectively. The researcher internally modified SPSS version 26 by installing and configuring R extensions and plug-ins and integrating the R software into SPSS. This extended SPSS beyond its original statistical analysis tools. The remodelled SPSS was then used to answer research questions 2-4. These research questions were
answered using three different Parameter Logistic Models: the 1-Parameter Logistic Model (1-PLM) for research question 2, the 2-Parameter Logistic Model (2-PLM) for research question 3, and the 3-Parameter Logistic Model (3-PLM) for research question 4.
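The preprint does not name the specific R routines that were wired into SPSS, so the sketch below is an assumption: one plausible plain-R equivalent using the ltm package, where responses is a hypothetical name for the 1,000 x 100 matrix of dichotomously scored (0/1) answers:

# install.packages("ltm")   # CRAN package for latent trait models
library(ltm)

# responses: 1000 x 100 data frame of 0/1 item scores (hypothetical name)
fit_1pl <- rasch(responses)      # 1-PLM: item difficulties, common discrimination
fit_2pl <- ltm(responses ~ z1)   # 2-PLM: adds a per-item discrimination (a)
fit_3pl <- tpm(responses)        # 3-PLM: adds a guessing parameter (c)

# For tpm() the coefficient columns are Gussng (c), Dffclt (b), Dscrmn (a)
head(coef(fit_3pl))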
Results
Research Question 1
To what extent does the SSAT meet the requirement of unidimensionality?
In answering the above research question, the data obtained from the field were subjected to a factor analysis using Principal Component Analysis with the varimax method. Using the Guttman-Kaiser rule, all factors with eigenvalues greater than 1 were retained. Analysis of the scree plot, shown in Fig. 2, reveals the underlying construct, or unidimensionality, of the SSAT. The SPSS statistical software was used in the analysis.

Figure 2 shows the scree plot for the SSAT. A careful examination of the scree plot shows that there is only one dominant factor before the breaking point, or elbow, of the plot. This demonstrates the unidimensionality of the underlying construct of the SSAT: all the items measure one construct.
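A minimal sketch of this check in plain R, under the same assumptions as above (responses is the hypothetical scored data matrix; with binary items a tetrachoric correlation matrix would be technically preferable, but Pearson correlations keep the sketch self-contained):

# Eigenvalues of the inter-item correlation matrix
ev <- eigen(cor(responses))$values

# Guttman-Kaiser rule: number of factors with eigenvalues > 1
sum(ev > 1)

# Scree plot: a single dominant eigenvalue before the "elbow"
# is the visual signature of unidimensionality
plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue",
     main = "Scree plot for the SSAT")
abline(h = 1, lty = 2)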
Research Question 2
To what extent does the SSAT meet the requirement of a-parameter?
In answering the above research question, the modied SPSS statistical software was used to estimate
the a-parameter. In analysing the data, the a-parameter item response theory dichotomous model was
utilised. The aim is to determine the discrimination level of the items in the SSAT. The result is shown in
Table3.
Table 3
a-Parameter Index for the SSAT
Item Value Standard Error Z Remark
SSAT40 9.620 12.574 .765 Satisfactory (No Revision Required)
SSAT56 4.999 .710 7.045 Satisfactory (No Revision Required)
SSAT88 4.819 .539 8.934 Satisfactory (No Revision Required)
SSAT55 3.829 .507 7.549 Satisfactory (No Revision Required)
SSAT68 3.529 .518 6.811 Satisfactory (No Revision Required)
SSAT23 3.429 .703 4.875 Satisfactory (No Revision Required)
SSAT48 3.031 .502 6.039 Satisfactory (No Revision Required)
SSAT78 3.029 .588 5.153 Satisfactory (No Revision Required)
SSAT61 2.900 .386 7.511 Satisfactory (No Revision Required)
SSAT77 2.874 .341 8.420 Satisfactory (No Revision Required)
SSAT4 2.797 .584 4.788 Satisfactory (No Revision Required)
SSAT46 2.774 .710 3.909 Satisfactory (No Revision Required)
SSAT70 2.763 .382 7.235 Satisfactory (No Revision Required)
SSAT5 2.705 .406 6.656 Satisfactory (No Revision Required)
SSAT94 2.653 .481 5.515 Satisfactory (No Revision Required)
SSAT85 2.627 .318 8.262 Satisfactory (No Revision Required)
SSAT62 2.623 .308 8.526 Satisfactory (No Revision Required)
SSAT50 2.599 .350 7.426 Satisfactory (No Revision Required)
SSAT57 2.556 .269 9.490 Satisfactory (No Revision Required)
SSAT72 2.503 .260 9.625 Satisfactory (No Revision Required)
SSAT96 2.498 .263 9.489 Satisfactory (No Revision Required)
SSAT83 2.463 .328 7.514 Satisfactory (No Revision Required)
SSAT74 2.408 .303 7.947 Satisfactory (No Revision Required)
SSAT67 2.373 .283 8.372 Satisfactory (No Revision Required)
SSAT81 2.360 .334 7.062 Satisfactory (No Revision Required)
SSAT24 2.349 .401 5.860 Satisfactory (No Revision Required)
SSAT41 2.310 .268 8.635 Satisfactory (No Revision Required)
SSAT59 2.273 .348 6.528 Satisfactory (No Revision Required)
SSAT97 2.256 .560 4.030 Satisfactory (No Revision Required)
SSAT52 2.178 .292 7.452 Satisfactory (No Revision Required)
SSAT63 2.150 .219 9.817 Satisfactory (No Revision Required)
SSAT45 2.124 .609 3.486 Satisfactory (No Revision Required)
SSAT80 2.084 .235 8.879 Satisfactory (No Revision Required)
SSAT98 2.046 .254 8.069 Satisfactory (No Revision Required)
SSAT36 2.021 .236 8.546 Satisfactory (No Revision Required)
SSAT99 1.996 .312 6.386 Satisfactory (No Revision Required)
SSAT39 1.949 .276 7.054 Satisfactory (No Revision Required)
SSAT35 1.917 .223 8.592 Satisfactory (No Revision Required)
SSAT93 1.899 .229 8.308 Satisfactory (No Revision Required)
SSAT92 1.872 .270 6.941 Satisfactory (No Revision Required)
SSAT71 1.817 .271 6.700 Satisfactory (No Revision Required)
SSAT82 1.788 .262 6.822 Satisfactory (No Revision Required)
SSAT18 1.751 .379 4.620 Satisfactory (No Revision Required)
SSAT60 1.744 .274 6.370 Satisfactory (No Revision Required)
SSAT47 1.699 .385 4.411 Good (Little or no Revision Required)
SSAT73 1.671 .193 8.665 Good (Little or no Revision Required)
SSAT27 1.666 .444 3.751 Good (Little or no Revision Required)
SSAT19 1.609 .355 4.537 Good (Little or no Revision Required)
SSAT49 1.591 .246 6.481 Good (Little or no Revision Required)
SSAT91 1.547 .225 6.867 Good (Little or no Revision Required)
SSAT53 1.513 .687 2.203 Good (Little or no Revision Required)
SSAT44 1.505 .118 12.708 Good (Little or no Revision Required)
SSAT95 1.477 .236 6.269 Good (Little or no Revision Required)
SSAT11 1.476 .129 11.462 Good (Little or no Revision Required)
SSAT29 1.435 .120 11.921 Good (Little or no Revision Required)
SSAT16 1.433 .126 11.415 Good (Little or no Revision Required)
SSAT2 1.408 .145 9.698 Good (Little or no Revision Required)
SSAT33 1.407 .116 12.113 Good (Little or no Revision Required)
SSAT30 1.389 .289 4.813 Good (Little or no Revision Required)
SSAT79 1.388 .228 6.086 Good (Little or no Revision Required)
SSAT66 1.385 .231 6.006 Good (Little or no Revision Required)
SSAT6 1.373 .191 7.190 Good (Little or no Revision Required)
SSAT89 1.363 .193 7.078 Good (Little or no Revision Required)
SSAT54 1.346 .196 6.856 Good (Little or no Revision Required)
SSAT31 1.332 .214 6.212 Moderate (Little or no Revision Required)
SSAT87 1.323 .280 4.721 Moderate (Little or no Revision Required)
SSAT26 1.321 .190 6.954 Moderate (Little or no Revision Required)
SSAT43 1.321 .252 5.245 Moderate (Little or no Revision Required)
SSAT34 1.320 .271 4.874 Moderate (Little or no Revision Required)
SSAT38 1.298 .487 2.666 Moderate (Little or no Revision Required)
SSAT75 1.298 .253 5.136 Moderate (Little or no Revision Required)
SSAT25 1.292 .311 4.153 Moderate (Little or no Revision Required)
SSAT84 1.290 1.491 .865 Moderate (Little or no Revision Required)
SSAT7 1.266 .242 5.228 Moderate (Little or no Revision Required)
SSAT51 1.257 .255 4.933 Moderate (Little or no Revision Required)
SSAT17 1.225 .107 11.419 Moderate (Little or no Revision Required)
SSAT9 1.182 .104 11.317 Moderate (Little or no Revision Required)
SSAT15 1.179 .121 9.766 Moderate (Little or no Revision Required)
SSAT10 1.166 .102 11.465 Moderate (Little or no Revision Required)
SSAT13 1.117 .098 11.348 Moderate (Little or no Revision Required)
SSAT3 1.074 .107 10.027 Moderate (Little or no Revision Required)
SSAT22 1.074 .201 5.355 Moderate (Little or no Revision Required)
SSAT37 1.049 .196 5.363 Moderate (Little or no Revision Required)
SSAT1 1.040 .313 3.322 Moderate (Little or no Revision Required)
SSAT28 1.032 .095 10.883 Moderate (Little or no Revision Required)
SSAT69 1.006 .205 4.914 Moderate (Little or no Revision Required)
SSAT20 .929 .099 9.370 Moderate (Little or no Revision Required)
SSAT21 .906 .203 4.469 Moderate (Little or no Revision Required)
SSAT12 .800 .159 5.036 Moderate (Little or no Revision Required)
SSAT100 .746 .224 3.336 Moderate (Little or no Revision Required)
SSAT86 .671 .893 .751 Moderate (Little or no Revision Required)
SSAT58 .624 .079 7.878 Marginal (Needs Revision)
SSAT32 .494 .075 6.594 Marginal (Needs Revision)
SSAT64 .453 .080 5.660 Marginal (Needs Revision)
SSAT8 .214 .074 2.897 Poor (Should be Eliminated or Revised)
SSAT90 .166 .074 2.251 Poor (Should be Eliminated or Revised)
SSAT14 .139 .068 2.051 Poor (Should be Eliminated or Revised)
SSAT42 − .051 .068 − .757 Poor (Should be Eliminated or Revised)
SSAT76 − .318 .116 -2.734 Poor (Should be Eliminated or Revised)
SSAT65 − .987 .896 -1.101 Poor (Should be Eliminated or Revised)
Criterion: See page 88
As shown in Table3, the item discrimination index ranged from − 987 to 9.620, with a higher index
indicating a satisfactory item and a lower index indicating a poor item. From the result, items 8, 90, 14,
42, 76 and 65 had a discriminatory index of .214, .166, .139, − .051, − .318 and − .987, indicating poor
items that need to be eliminated or revised. Items 58, 32 and 64 had a discriminatory index of .624, .494
and .453, indicating marginal items that needs to be revised before they can be used. All other items are
either satisfactory (needing no revision), good or moderate (needing little or no revision). The distribution
can be visualized in Fig.3.
Research Question 3
To what extent does the SSAT meet the requirement of b-parameter?
In answering the above research question, the modied SPSS statistical software was used to estimate
the b-parameter. In analysing the data, the b-parameter item response theory dichotomous model was
utilised. The aim is to determine the diculty level of the items in the SSAT. The result is shown in
Table4.
Table 4
b-Parameter Index for the SSAT
Item Value Standard Error Z Remark
SSAT76 -7.702 2.720 -2.832 Very Easy
SSAT65 -4.971 2.668 -1.863 Very Easy
SSAT2 -1.999 .149 -13.417 Very Easy
SSAT15 -1.939 .161 -12.061 Very Easy
SSAT20 -1.797 .176 -10.202 Very Easy
SSAT3 -1.678 .150 -11.187 Very Easy
SSAT16 -1.369 .105 -12.998 Very Easy
SSAT43 -1.328 .484 -2.743 Very Easy
SSAT11 -1.326 .102 -12.970 Very Easy
SSAT1 -1.108 .914 -1.213 Very Easy
SSAT9 -1.033 .103 -10.056 Very Easy
SSAT17 -1.010 .100 -10.136 Very Easy
SSAT29 − .974 .089 -10.918 Very Easy
SSAT26 − .872 .259 -3.371 Very Easy
SSAT21 − .828 .610 -1.356 Very Easy
SSAT52 − .824 .148 -5.579 Very Easy
SSAT28 − .818 .103 -7.938 Very Easy
SSAT33 − .814 .084 -9.708 Very Easy
SSAT60 − .804 .219 -3.670 Very Easy
SSAT10 − .703 .090 -7.774 Very Easy
SSAT61 − .675 .107 -6.327 Very Easy
SSAT36 − .672 .122 -5.499 Very Easy
SSAT31 − .670 .276 -2.430 Very Easy
SSAT13 − .633 .090 -7.033 Very Easy
SSAT35 − .618 .127 -4.864 Very Easy
SSAT67 − .606 .105 -5.758 Very Easy
SSAT39 − .587 .159 -3.685 Very Easy
SSAT55 − .570 .077 -7.388 Very Easy
SSAT56 − .549 .070 -7.810 Very Easy
SSAT44 − .545 .071 -7.632 Very Easy
SSAT79 − .541 .248 -2.183 Very Easy
SSAT57 − .536 .081 -6.594 Very Easy
SSAT62 − .473 .091 -5.194 Very Easy
SSAT70 − .469 .108 -4.350 Very Easy
SSAT63 − .460 .083 -5.558 Very Easy
SSAT82 − .457 .167 -2.736 Very Easy
SSAT34 − .444 .328 -1.354 Very Easy
SSAT6 − .429 .193 -2.224 Very Easy
SSAT7 − .408 .311 -1.309 Very Easy
SSAT25 − .401 .394 -1.018 Very Easy
SSAT54 − .388 .195 -1.985 Very Easy
SSAT72 − .364 .073 -4.979 Very Easy
SSAT73 − .363 .117 -3.108 Very Easy
SSAT24 − .235 .142 -1.653 Very Easy
SSAT91 − .222 .159 -1.395 Very Easy
SSAT49 − .184 .169 -1.084 Very Easy
SSAT98 − .153 .104 -1.476 Very Easy
SSAT12 − .137 .383 − .358 Very Easy
SSAT80 − .125 .089 -1.408 Very Easy
SSAT22 − .096 .271 − .354 Very Easy
SSAT92 − .087 .129 − .678 Very Easy
SSAT50 − .086 .084 -1.020 Very Easy
SSAT74 − .077 .088 − .876 Very Easy
SSAT93 − .077 .101 − .768 Very Easy
SSAT41 − .053 .084 − .631 Very Easy
SSAT89 .011 .144 .078 Easy
SSAT77 .024 .067 .365 Easy
SSAT96 .252 .057 4.396 Easy
SSAT81 .293 .085 3.459 Easy
SSAT48 .373 .077 4.828 Easy
SSAT88 .385 .040 9.676 Easy
SSAT19 .431 .174 2.474 Easy
SSAT30 .452 .169 2.672 Easy
SSAT85 .469 .053 8.828 Easy
SSAT58 .477 .131 3.644 Easy
SSAT71 .490 .089 5.529 Easy
SSAT4 .502 .099 5.070 Easy
SSAT37 .504 .172 2.936 Easy
SSAT32 .535 .166 3.219 Easy
SSAT59 .663 .074 8.988 Easy
SSAT75 .738 .131 5.650 Easy
SSAT68 .740 .055 13.526 Easy
SSAT27 .783 .161 4.875 Easy
SSAT83 .822 .063 13.005 Easy
SSAT66 .843 .106 7.977 Easy
SSAT100 .908 .324 2.799 Easy
SSAT42 .912 10.803 .084 Easy
SSAT23 .951 .071 13.392 Easy
SSAT51 .956 .126 7.566 Easy
SSAT5 .960 .066 14.612 Easy
SSAT69 .969 .164 5.926 Easy
SSAT47 1.010 .126 7.991 Difficult
SSAT95 1.057 .104 10.206 Difficult
SSAT97 1.142 .107 10.653 Difficult
SSAT99 1.174 .086 13.620 Difficult
SSAT94 1.243 .077 16.149 Difficult
SSAT87 1.248 .119 10.461 Difficult
SSAT18 1.316 .122 10.758 Difficult
SSAT14 1.455 .931 1.563 Difficult
SSAT78 1.594 .091 17.437 Difficult
SSAT46 1.780 .130 13.651 Difficult
SSAT64 1.829 .340 5.382 Difficult
SSAT40 2.108 .109 19.254 Very Difficult
SSAT45 2.240 .254 8.809 Very Difficult
SSAT38 2.464 .425 5.801 Very Difficult
SSAT53 2.682 .600 4.473 Very Difficult
SSAT8 3.515 1.243 2.828 Very Difficult
SSAT90 4.192 1.924 2.179 Very Difficult
SSAT84 4.298 3.102 1.385 Very Difficult
SSAT86 5.236 4.589 1.141 Very Difficult
Criterion: See page 89
As shown in Table4, the item diculty index ranged from − 7.702 to 5.236, with a higher index indicating
a very dicult item and a lower index indicating a very easy item. From the result, items 40, 45, 38, 53, 8,
90, 84 and 86 had a diculty index of 2.108, 2.240, 2.464, 2.682, 3.515, 4.192, 4.298 and 5.236
respectively, indicating that the items are very dicult. 47, 95, 97, 99, 94, 87, 18, 14, 78, 46 and 64 had a
diculty index of 1.010, 1.057, 1.142, 1.174, 1.243, 1.248, 1.316, 1.455, 1.594, 1.780 and 1.829,
respectively, indicating that the items are dicult. Other items were either very easy or easy. Having
obtained a diculty index that ranged from − 7.702 for item 76 to 0.969 for item 69. The distribution can
be visualized in Fig.4.
Research Question 4
To what extent does the SSAT meet the requirement of c-parameter?
In answering the above research question, the modied SPSS statistical software was used to estimate
the c-parameter. In analysing the data, the c-parameter item response theory dichotomous model was
utilised. The aim is to determine the guessing level of the items in the SSAT. The result is shown in
Table5.
Table 5
c-Parameter Index for the SSAT
Item Value Standard Error Z Remark
SSAT2 .000 .011 .009 Not Guessable
SSAT3 .000 .004 .006 Not Guessable
SSAT8 .000 .007 .018 Not Guessable
SSAT9 .000 .000 .001 Not Guessable
SSAT10 .000 .001 .003 Not Guessable
SSAT11 .000 .000 .002 Not Guessable
SSAT13 .000 .001 .003 Not Guessable
SSAT14 .000 .021 .017 Not Guessable
SSAT15 .000 .000 .002 Not Guessable
SSAT16 .000 .000 .001 Not Guessable
SSAT17 .000 .001 .004 Not Guessable
SSAT20 .000 .001 .004 Not Guessable
SSAT28 .000 .003 .007 Not Guessable
SSAT29 .000 .000 .001 Not Guessable
SSAT32 .000 .002 .005 Not Guessable
SSAT33 .000 .000 .001 Not Guessable
SSAT44 .000 .000 .003 Not Guessable
SSAT58 .000 .002 .005 Not Guessable
SSAT64 .000 .000 .002 Not Guessable
SSAT76 .000 .000 .001 Not Guessable
SSAT90 .001 .046 .025 Not Guessable
SSAT73 .010 .048 .201 Not Guessable
SSAT43 .011 .256 .042 Not Guessable
SSAT42 .013 .272 .047 Not Guessable
SSAT37 .024 .064 .379 Not Guessable
SSAT22 .025 .103 .242 Not Guessable
SSAT36 .031 .058 .535 Not Guessable
SSAT6 .033 .083 .394 Not Guessable
SSAT12 .039 .120 .327 Not Guessable
SSAT63 .039 .033 1.159 Not Guessable
SSAT72 .054 .031 1.732 Not Guessable
SSAT85 .057 .021 2.721 Not Guessable
SSAT26 .073 .113 .645 Not Guessable
SSAT45 .073 .013 5.498 Not Guessable
SSAT89 .073 .056 1.304 Not Guessable
SSAT66 .076 .038 2.013 Not Guessable
SSAT96 .077 .023 3.352 Not Guessable
SSAT78 .086 .014 6.233 Not Guessable
SSAT87 .089 .036 2.477 Not Guessable
SSAT71 .091 .037 2.501 Not Guessable
SSAT51 .093 .046 2.005 Not Guessable
SSAT57 .103 .037 2.816 Not Guessable
SSAT75 .103 .050 2.061 Not Guessable
SSAT21 .105 .209 .504 Not Guessable
SSAT88 .106 .019 5.701 Not Guessable
SSAT100 .106 .100 1.057 Not Guessable
SSAT69 .112 .051 2.174 Not Guessable
SSAT50 .117 .039 3.043 Not Guessable
SSAT65 .119 .019 6.388 Not Guessable
SSAT95 .119 .028 4.184 Not Guessable
SSAT35 .121 .058 2.090 Not Guessable
SSAT79 .124 .102 1.222 Not Guessable
SSAT91 .127 .065 1.951 Not Guessable
SSAT99 .127 .022 5.714 Not Guessable
SSAT67 .136 .051 2.675 Not Guessable
SSAT83 .139 .022 6.428 Not Guessable
SSAT93 .141 .043 3.247 Not Guessable
SSAT38 .142 .026 5.405 Not Guessable
SSAT84 .143 .019 7.722 Not Guessable
SSAT40 .145 .013 11.331 Not Guessable
SSAT80 .148 .038 3.892 Not Guessable
SSAT94 .151 .021 7.142 Not Guessable
SSAT5 .152 .022 6.987 Not Guessable
SSAT46 .158 .018 8.941 Not Guessable
SSAT54 .166 .074 2.233 Not Guessable
SSAT62 .167 .044 3.846 Not Guessable
SSAT86 .168 .044 3.857 Not Guessable
SSAT41 .175 .037 4.709 Not Guessable
SSAT77 .177 .033 5.430 Not Guessable
SSAT98 .177 .046 3.878 Not Guessable
SSAT7 .180 .116 1.556 Not Guessable
SSAT55 .184 .040 4.637 Not Guessable
SSAT59 .192 .029 6.588 Not Guessable
SSAT74 .201 .040 5.042 Guessable
SSAT92 .204 .054 3.802 Guessable
SSAT34 .206 .122 1.681 Guessable
SSAT81 .207 .036 5.696 Guessable
SSAT82 .214 .071 3.039 Guessable
SSAT49 .217 .065 3.317 Guessable
SSAT56 .221 .038 5.755 Guessable
SSAT53 .222 .023 9.841 Guessable
SSAT31 .230 .104 2.215 Guessable
SSAT68 .234 .023 10.223 Guessable
SSAT30 .235 .060 3.949 Guessable
SSAT52 .237 .070 3.384 Guessable
SSAT18 .241 .030 8.066 Guessable
SSAT39 .262 .069 3.830 Guessable
SSAT48 .278 .034 8.125 Guessable
SSAT60 .282 .090 3.125 Guessable
SSAT61 .282 .052 5.385 Guessable
SSAT25 .292 .130 2.244 Guessable
SSAT23 .318 .026 12.157 Guessable
SSAT19 .329 .058 5.662 Guessable
SSAT47 .331 .038 8.835 Guessable
SSAT70 .336 .049 6.859 Guessable
SSAT24 .368 .055 6.651 Guessable
SSAT97 .373 .031 12.142 Guessable
SSAT1 .375 .260 1.441 Guessable
SSAT27 .408 .048 8.508 Guessable
SSAT4 .444 .035 12.597 Guessable
Criterion: c > 0.20
As shown in Table5, the item guessing index ranged from .000 to .444, with a higher index indicating a
guessable item and a lower index indicating not guessable item. The recommended range should be
between 0.00 and 0.20. this is because the SSAT that has 5 alternatives, a low examinee should have 1/5 
= 0.20 chance of guessing the correct answer. Since c = 0.20 for this 5-alternative item, once the right key
is isolated, the examinees will be guessing among the remaining four options. From the result, 27 items
had a guessing index above the recommended 0.20, meaning that they are guessable. Other items (73
out of 100) had a guessing index that ranged from 0.000 to 0.192, indicating that they are not guessable.
The distribution can be visualized in Fig.5.
Discussion
Index of Unidimensionality of the Items in the Social Studies Aptitude Test (SSAT)
The rst nding revealed that allthe 100 itemsmeasured a single construct, as shown in the scree plot.
The model assumes that there is one dominant latent trait being measured by the test and that this trait
is the driving force for the responses observed for each item in the Social Studies Aptitude Test.It is
commonly assumed that only one ability or trait is necessary to "explain," or "account" for examinee test
performance. Item response models that assume a single latent ability are referred to as
unidimensional.Several researchers have used factor analysis to determine the unidimensionality of a
test and were successful. For instance, Kpolovie and Emekene (2016) validated the advanced progressive
matrices for Nigerian sample using Item Response Theory. They used factor analysis to determine the
unidimensionality of the scale and found that the unidimensionality of the underlining construct of the
APM scale, namely intelligence or uid ability and that all 36 items of the scale measure one construct,
the uid ability of the test taker as conrmed by the scree plot. They concluded that all the items APM
unquestionably measure just one general intelligence factor in Nigeria just as it does in all other countries
that the test is actively in use.
a-Parameter Index for each Item in the Social Studies Aptitude Test (SSAT)
The second nding revealed that most of the items (94 out of 100) were either satisfactory (need no
revision), good or moderate (needs little or no revision). This means that the items have the tendency to
discriminate well between high achievers and low achievers. The above nding agrees with the study of
Kpolovie and Emekene (2016), who used item response theory to validate the Raven's Advanced
Progressive Matrices (APM). The authors found that all items of the test yield favourable statistics under
3-Parameter Logistic IRT Model with regards to discrimination, diculty and guessing.
b-Parameter Index (Item Difficulty Parameter) for each Item of the Social Studies Aptitude Test (SSAT)
The third nding revealed that most of the items (89 out of 100) were either very easy or easy. The rest 19
are either very dicult (8 out of 100) or diculty (11 out of 100). This means that the test items are
within the ability level of the students.The term item diculty is used in the education eld to describe
how dicult it is to achieve a 0.5 probability of a correct response for a specic item given the
respondent’s level of the latent variable (theta). Therefore, the more dicult it is for a student to have a
50% chance of correctly answering an item, the higher the ability level needed to achieve this
goal.According to Kpolovie and Emekene (2016), the diculty index ranges in theory from negative to
positive innity, but in practice from -3.0 (very easy) to +3.0 (very dicult). The author reported a similar
nding, the b-parameter is related to the classical P statistic, as items with low P values will tend to have
higher (more positive) b-parameters and items with high P values will tend to have lower (more negative)
b-parameters.
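This inverse relation is easy to check on scored data: the classical P statistic is simply the proportion of examinees answering each item correctly, and it should correlate negatively with the estimated b-parameters (a sketch under the same assumptions as the earlier ltm example):

# Classical item difficulty: proportion correct per item
p_classical <- colMeans(responses)

# IRT difficulty estimates from the 3-PLM fit
b_hat <- coef(fit_3pl)[, "Dffclt"]

# High-P (easy) items should have low b and vice versa,
# so this correlation is expected to be strongly negative
cor(p_classical, b_hat)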
c-Parameter Index (Item Guessing Parameter) for each Item of the Social Studies Aptitude Test (SSAT)
The fourth nding revealed that most of the items (73 out of 100) are not guessable. Meaning that the
instrument is not subject to guessing tendency, as it meets the assumption of item response theory in
terms of the c-parameter. The c-parameter equals the probability of an examinee of innitely low θ
obtaining a correct response due to guessing. Thus, c is also the lower asymptote of the IRF. From the
nding of the study, most of the items (73 out of 100) had a guessing index that ranged from 0.000 to
0.192, indicating that they are not guessable. Therefore, the degree of guessing can be said to be low
amongst the students for whom the test was developed.
The above nding was supported by Kamiri (2010) that the lowest c-values, the better indicating a lower
probability of getting the answer correct by mere guessing of low ability examinees. Harris (2005)
concluded that the items with 0.30 or greater c-values are considered not very good, rather c-values of
0.20 or lower are desirable. In like manner, Akindele (2003) also noted that items do not have perfect c-
values because examinees do not guess randomly when they do not know the answer.
The nding is also in line with Ani (2014), who applied item response theory in the development and
validation of multiple-choice test in Economics, and found that 49 out of 50 items of the multiple-choice
question in Economics were reliable based on three parameter model (3pl) model. The nding also
agreed with Kpolovie and Emekene (2016), who applied item response theory to validate the advanced
progressive matrices in Nigeria, and found that all items of the test yield favourable statistics under 3-
Parameter Logistic IRT Model with regard to discrimination, diculty and guessing.
Conclusion and Recommendations
Based on the ndings, it can be concluded that the Social Studies Aptitude Test items have a good
psychometric property and can therefore be used for the assessment of Upper Basic School students in
the cognitive domain. The test is unidimensional in nature, hence, measure a single trait.Based on the
ndings from this study the following recommendations were made:
1. The developed Social Studies Aptitude Test should be used by Social Studies teachers for the
assessment of secondary school students, especially during mock examination, in preparation for
external examinations;
2. The test should be added to the already existing item bank domiciled in the Ministry of Basic and
Secondary Education, since the psychometric properties of the test has been shown to be sound;
3. Most of the items should be modied in terms of their diculty, discrimination and guessing power
so that they will be useful in the assessment of Upper Basic School students.
Declarations
The participants in the study were asked to give their informed consent to participate, and they willingly provided it.
References
1. Akindele, B. P. (2003). The development of an item bank for selection tests into Nigerian universities: An exploratory study. Unpublished doctoral dissertation, University of Ibadan, Nigeria.
2. Ani, E. N. (2014). Application of item response theory in the development and validation of multiple-choice test in Economics. Unpublished M.Ed. dissertation, University of Nigeria, Nsukka.
3. Bloom, B. S. (1956). Taxonomy of educational objectives. New York: David McKay.
4. Field, A. (2013). Discovering statistics using IBM SPSS Statistics: And sex and drugs and rock "n" roll (4th ed.). Los Angeles: Sage.
5. Hambleton, R. K. (1994). Item response theory: A broad psychometric framework for measurement advances. Psicothema, 6(3), 535–556.
6. Hambleton, R. K., & Jodoin, M. (2003). Item response theory: Models and features. In R. Fernández-Ballesteros (Ed.), Encyclopaedia of psychological assessment (pp. 509–514). London: Sage.
7. Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory (Vol. 2). Sage Publications.
8. Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied Psychological Measurement, 40(4), 289–301. https://doi.org/10.1177/0146621616631317
9. Harris, D. (2005). Educational measurement issues and practice: Comparison of 1-, 2-, and 3-parameter IRT models. DOI: 10.1111/j.1745-3992.1989.tb00313.x
10. Jessa, M. O., Odili, J. N., & Osadebe, P. U. (2023). Development of Social Studies Aptitude Test for testing critical thinking skills: Implication for the achievement of Education for Sustainable Development (ESD). Canadian Journal of Educational and Social Studies, 3(4), 99–119.
11. Kamiri, H. (2010). A Differential Item Functioning analysis of a language proficiency test: An investigation of background knowledge bias. Unpublished master's thesis, University of Tehran, Iran.
12. Kpolovie, P. J. (2014). Test, measurement and evaluation in education (2nd ed.). New Owerri: Spring Field Publishers Ltd.
13. Kpolovie, P. J., & Emekene, C. O. (2016). Psychometric advent of advanced progressive matrices: Smart version (APM-SV) for use in Nigeria. European Journal of Statistics and Probability, 4(3), 20–30.
14. Lord, F. M. (1952). A theory of test scores. Psychometric Monograph.
15. Lord, F. M. (1953). An application of confidence intervals and of maximum likelihood to the estimation of an examinee's ability. Psychometrika, 18, 57–75.
16. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
17. Lord, T., & Baviskar, S. (2007). Moving students from information recitation to information understanding: Exploiting Bloom's taxonomy in creating science questions. Journal of College Science Teaching, 5, 40–44.
18. Osadebe, P. U., & Jessa, M. O. (2018). Development of social studies achievement test for assessment of secondary school students. European Journal of Open Education and E-Learning Studies, 3(1), 104–124.
Figures
Figure 2: Scree plot for the SSAT
Figure 3: Histogram showing the discriminatory index of the SSAT
Figure 4: Histogram showing the difficulty index of the SSAT
Figure 5: Histogram showing the guessing index of the SSAT