Item Response Theory Validation of Social Studies
Aptitude Test
Morrison Jessa (teezednigeria@gmail.com)
Delta State University, Abraka Nigeria https://orcid.org/0009-0000-0559-9188
Patrick Osadebe
Delta State University, Abraka Nigeria
Kingsley Ashibuogwu
Delta State University of Science and Technology, Ozoro
Research Article
Keywords: Item Response Theory, Social Studies Aptitude Test, 3-parameter, Validation, Psychometrics
Posted Date: December 29th, 2023
DOI: https://doi.org/10.21203/rs.3.rs-3800666/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License
Additional Declarations: The authors declare no competing interests.
Abstract
This study assessed the psychometric properties of the Social Studies Aptitude Test (SSAT) using the 3-Parameter Item Response Theory Model. Four research questions guided the study. A 100-item multiple-choice SSAT, developed by Jessa et al. (2023), was used as the instrument for the study. The data were collated and analysed using the chi-square goodness-of-fit test and factor analysis. The findings revealed that all 100 items measured a single construct; that most of the items (94 out of 100) were satisfactory (no revision required), good, or moderate (little or no revision required); that most of the items (81 out of 100) were either very easy or easy; and that most of the items (73 out of 100) were not susceptible to guessing. The study recommended, amongst others, that the developed SSAT should be used by Social Studies teachers for the assessment of secondary school students, especially during mock examinations.
Introduction
Following the revision of the basic education curriculum in the nation's education system, it was expected that, between 2014 and 2022, test items for all subjects would reflect the changes inherent in the revised curriculum. However, from available data, the researcher observed that the questions used in the social studies Basic Education Certificate Examination in Delta State from 2016 to 2019 did not measure all the skills identified in the revised basic education curriculum. This has serious implications for the achievement of the objectives guiding the revised curriculum. It also has implications for the achievement of the goals of Education for Sustainable Development (ESD), which emphasise the development of the knowledge, skills, values and attitudes needed for a more sustainable and just society for all. Given the above, the aim of this study was to assess the psychometric properties of the Social Studies Aptitude Test (SSAT) developed by Jessa et al. (2023).
Aligned with the study's objectives, the research was grounded in the Item Response Theory (IRT) of
Measurement. Item Response Theory, also called latent response theory, comprises a set of mathematical
models designed to elucidate the connection between unobservable traits (latent characteristics or
attributes) and their expressions (such as observed outcomes, responses, or performance). These models
establish a relationship among the features of items in an instrument, individuals responding to these
items, and the underlying trait being measured. IRT posits that both the latent construct (e.g., stress,
knowledge, attitudes) and the items of a measure are arranged along an imperceptible continuum.
Consequently, its primary aim is to determine an individual's position on that continuum (Osadebe &
Jessa, 2018).
IRT originated in the 1950s and 1960s through the efforts of Frederic Lord and other psychometricians (Lord, 1952; Lord & Novick, 1968), with the objective of creating a methodology capable of assessing respondents without relying on the inclusion of identical test items (Hambleton & Jodoin, 2003). As a result, IRT evolved from classical measurement theory to address several of its limitations (Hambleton, 1994). This statistical theory encompasses various mathematical models characterized by: a) the ability to predict individual scores based on latent traits or abilities, and b) the establishment of a connection between an individual's item performance and the underlying traits through a function known as the "item characteristic curve" (Hambleton, Swaminathan & Rogers, 1991). These features are achievable because IRT models maintain item and ability parameter invariance for both test items and individuals when the relevant IRT model accurately fits the available test data. In simpler terms, the same items used across different samples will retain their statistical properties (such as difficulty and discrimination), and individuals' scores representing the ability or latent traits on a specific construct will not be contingent on the specific test items administered.
Item response theory (IRT) rests on two basic postulates: (a) The performance of an examinee on a test
item can be predicted (or explained) by a set of factors called traits, latent traits, or abilities; and (b) the
relationship between examinees' item performance and the set of traits underlying item performance can
be described by a monotonically increasing function called an item characteristic function or item
characteristic curve (ICC). This function specifies that as the level of the trait increases, the probability of a correct response to the item increases.
IRT models include a set of assumptions about the data to which the model is applied. Although the viability of the assumptions cannot be determined directly, some indirect evidence can be collected and assessed, and the overall fit of the model to the test data can be assessed as well. Four distinct assumptions dominate the practice of Item Response Theory (IRT): the assumption of unidimensionality, the assumption of local independence, the assumption of monotonicity and the assumption of speededness. Three common IRT models for binary data are the one-, two- and three-parameter logistic models, which differ in the number of item parameters estimated. Before explaining what the 3-Parameter Logistic Model entails, a short examination of the 1-Parameter and 2-Parameter Logistic Models is warranted.
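All three models share the same logistic core. A minimal Python sketch of the most general, 3-parameter form (of which the 1-PLM and 2-PLM are special cases) may make the parameters concrete; the study itself used SPSS with R integration, so this is purely illustrative:

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """3PL item characteristic curve: probability of a correct response
    at ability theta, with discrimination a, difficulty b and
    pseudo-guessing c. Setting c = 0 gives the 2-PLM; additionally
    fixing a common a across items gives the 1-PLM."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta = b, the probability is midway between c and 1:
p = icc_3pl(theta=0.0, a=1.5, b=0.0, c=0.2)  # (1 + 0.2) / 2 = 0.6
```

The function is monotonically increasing in theta, which is exactly the ICC property described above.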
Research Questions
The following research questions guided the study:
1. To what extent does the SSAT meet the requirement of unidimensionality?
2. To what extent does the SSAT meet the requirement of a-parameter?
3. To what extent does the SSAT meet the requirement of b-parameter?
4. To what extent does the SSAT meet the requirement of c-parameter?
Methods
Design
This study adopted an instrumentation research design. This design is suitable because the study set out to assess the psychometric properties of the Social Studies Aptitude Test.
Participants Selection
A total of 1,000 students participated in the study. The choice of sample size was based on the recommendation of Lord (1968) that a minimum of 50 items and 1,000 examinees is required to estimate the a-parameter with high accuracy. Forty students were selected in each Local Government Area of the state, giving a total of 1,000 students. This was done through simple random and cluster sampling techniques. The schools in each Local Government Area of the state were treated as clusters, and the researcher randomly selected one school in each Local Government Area, giving a total of 25 schools. Selection used the balloting method of simple random sampling: the researcher wrote the names of all the schools in each Local Government Area on pieces of paper, folded them, poured them into a container, shuffled them, and picked one piece of paper from the container. The school picked in this way became the selected school for that Local Government Area. This was repeated for all Local Government Areas until all 25 schools (one per Local Government Area) were selected.
The above procedure produced 25 clusters, one for each Local Government Area. For each cluster, the researcher randomly selected one classroom from the Basic 3 classrooms. All the students in the selected classroom were used for the study, as there were up to 40 students in each selected classroom.
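The two-stage selection described above can be sketched as follows. This is a hypothetical illustration only: the LGA labels and school names are placeholders, not the actual sampling frame used in the study.

```python
import random

# Placeholder sampling frame: 25 LGAs, each containing a few schools.
frame = {f"LGA_{i}": [f"School_{i}_{j}" for j in range(1, 6)]
         for i in range(1, 26)}

rng = random.Random(2023)  # seeded only so this sketch is reproducible

# Stage 1: one school per LGA by simple random sampling ("balloting").
schools = {lga: rng.choice(names) for lga, names in frame.items()}

# Stage 2: one intact Basic 3 classroom of about 40 students per school,
# giving 25 x 40 = 1,000 examinees in total.
sample_size = 25 * 40
```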
Measure
The Social Studies Aptitude Test developed by Jessa et al. (2023) was used as the instrument for the study. The test comprised 100 multiple-choice items derived from the Basic 3 social studies syllabus, which was obtained from the Ministry of Basic and Secondary Education, Asaba, Delta State. Each item has 5 options: one key and four distracters.
The social studies aptitude test was administered to the students directly by the researcher with the help of 5 research assistants, who were briefed on the purpose of the study and trained on how to approach the testees. The research team visited the schools in person before the testing date to make their intention known to the principal or head of the school and to obtain permission. The data were collected on the spot from the respondents.
Data Analysis
The data obtained were collated, coded and entered into a computer through the Statistical Package for the Social Sciences (SPSS), version 26. Factor analysis, using principal component analysis with varimax rotation, was used to answer research question 1. The a-, b- and c-parameter item response theory dichotomous models were used to answer research questions 2, 3 and 4 respectively. The researcher modified SPSS version 26 by installing and configuring R extensions and plug-ins and integrating the R software into SPSS, which extended SPSS beyond its built-in statistical routines. The modified SPSS was then used to answer research questions 2-4 through three Parameter Logistic Models: the 1-Parameter Logistic Model (1-PLM) for research question 2, the 2-Parameter Logistic Model (2-PLM) for research question 3, and the 3-Parameter Logistic Model (3-PLM) for research question 4.
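As a rough illustration of what estimating the a-, b- and c-parameters involves, the sketch below fits a single item's 3PL parameters by maximum likelihood on simulated responses. It assumes examinee abilities are known, which production IRT software (including the R routines integrated into SPSS here) does not; those typically use marginal maximum likelihood. All names and values are illustrative, not the study's data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
theta = rng.normal(size=5000)            # simulated examinee abilities
a_true, b_true, c_true = 1.5, 0.5, 0.20  # item parameters to recover

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Simulate dichotomous (0/1) responses to one item.
y = rng.random(theta.size) < p3pl(theta, a_true, b_true, c_true)

def neg_log_lik(params):
    a, b, c = params
    p = np.clip(p3pl(theta, a, b, c), 1e-9, 1 - 1e-9)
    return -np.where(y, np.log(p), np.log(1 - p)).sum()

res = minimize(neg_log_lik, x0=[1.0, 0.0, 0.1], method="L-BFGS-B",
               bounds=[(0.1, 10.0), (-5.0, 5.0), (0.0, 0.45)])
a_hat, b_hat, c_hat = res.x  # estimates close to the true values
```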
Results
Research Question 1
To what extent does the SSAT meet the requirement of unidimensionality?
In answering the above research question, the data obtained from the field were subjected to factor analysis using principal component analysis with varimax rotation. Using the Guttman-Kaiser rule, all factors with eigenvalues greater than 1 were retained as factors that the items measure. The scree plot, shown in Fig. 2, reveals the underlying construct, and hence the unidimensionality, of the SSAT. The SPSS statistical software was used in the analysis.
Figure 2 shows the scree plot for the SSAT. A careful examination of the scree plot shows that there is only one construct before the breaking point, or elbow. This clearly demonstrates the unidimensionality of the underlying construct of the SSAT: all the items measure one construct.
Research Question 2
To what extent does the SSAT meet the requirement of a-parameter?
In answering the above research question, the modified SPSS statistical software was used to estimate the a-parameter, applying the a-parameter item response theory dichotomous model. The aim was to determine the discrimination level of the items in the SSAT. The result is shown in Table 3.
Table 3
a-Parameter Index for the SSAT
Item Value Standard Error Z Remark
SSAT40 9.620 12.574 .765 Satisfactory (No Revision Required)
SSAT56 4.999 .710 7.045 Satisfactory (No Revision Required)
SSAT88 4.819 .539 8.934 Satisfactory (No Revision Required)
SSAT55 3.829 .507 7.549 Satisfactory (No Revision Required)
SSAT68 3.529 .518 6.811 Satisfactory (No Revision Required)
SSAT23 3.429 .703 4.875 Satisfactory (No Revision Required)
SSAT48 3.031 .502 6.039 Satisfactory (No Revision Required)
SSAT78 3.029 .588 5.153 Satisfactory (No Revision Required)
SSAT61 2.900 .386 7.511 Satisfactory (No Revision Required)
SSAT77 2.874 .341 8.420 Satisfactory (No Revision Required)
SSAT4 2.797 .584 4.788 Satisfactory (No Revision Required)
SSAT46 2.774 .710 3.909 Satisfactory (No Revision Required)
SSAT70 2.763 .382 7.235 Satisfactory (No Revision Required)
SSAT5 2.705 .406 6.656 Satisfactory (No Revision Required)
SSAT94 2.653 .481 5.515 Satisfactory (No Revision Required)
SSAT85 2.627 .318 8.262 Satisfactory (No Revision Required)
SSAT62 2.623 .308 8.526 Satisfactory (No Revision Required)
SSAT50 2.599 .350 7.426 Satisfactory (No Revision Required)
SSAT57 2.556 .269 9.490 Satisfactory (No Revision Required)
SSAT72 2.503 .260 9.625 Satisfactory (No Revision Required)
SSAT96 2.498 .263 9.489 Satisfactory (No Revision Required)
SSAT83 2.463 .328 7.514 Satisfactory (No Revision Required)
SSAT74 2.408 .303 7.947 Satisfactory (No Revision Required)
SSAT67 2.373 .283 8.372 Satisfactory (No Revision Required)
SSAT81 2.360 .334 7.062 Satisfactory (No Revision Required)
SSAT24 2.349 .401 5.860 Satisfactory (No Revision Required)
SSAT41 2.310 .268 8.635 Satisfactory (No Revision Required)
SSAT59 2.273 .348 6.528 Satisfactory (No Revision Required)
SSAT97 2.256 .560 4.030 Satisfactory (No Revision Required)
SSAT52 2.178 .292 7.452 Satisfactory (No Revision Required)
SSAT63 2.150 .219 9.817 Satisfactory (No Revision Required)
SSAT45 2.124 .609 3.486 Satisfactory (No Revision Required)
SSAT80 2.084 .235 8.879 Satisfactory (No Revision Required)
SSAT98 2.046 .254 8.069 Satisfactory (No Revision Required)
SSAT36 2.021 .236 8.546 Satisfactory (No Revision Required)
SSAT99 1.996 .312 6.386 Satisfactory (No Revision Required)
SSAT39 1.949 .276 7.054 Satisfactory (No Revision Required)
SSAT35 1.917 .223 8.592 Satisfactory (No Revision Required)
SSAT93 1.899 .229 8.308 Satisfactory (No Revision Required)
SSAT92 1.872 .270 6.941 Satisfactory (No Revision Required)
SSAT71 1.817 .271 6.700 Satisfactory (No Revision Required)
SSAT82 1.788 .262 6.822 Satisfactory (No Revision Required)
SSAT18 1.751 .379 4.620 Satisfactory (No Revision Required)
SSAT60 1.744 .274 6.370 Satisfactory (No Revision Required)
SSAT47 1.699 .385 4.411 Good (Little or no Revision Required)
SSAT73 1.671 .193 8.665 Good (Little or no Revision Required)
SSAT27 1.666 .444 3.751 Good (Little or no Revision Required)
SSAT19 1.609 .355 4.537 Good (Little or no Revision Required)
SSAT49 1.591 .246 6.481 Good (Little or no Revision Required)
SSAT91 1.547 .225 6.867 Good (Little or no Revision Required)
SSAT53 1.513 .687 2.203 Good (Little or no Revision Required)
SSAT44 1.505 .118 12.708 Good (Little or no Revision Required)
SSAT95 1.477 .236 6.269 Good (Little or no Revision Required)
SSAT11 1.476 .129 11.462 Good (Little or no Revision Required)
SSAT29 1.435 .120 11.921 Good (Little or no Revision Required)
SSAT16 1.433 .126 11.415 Good (Little or no Revision Required)
SSAT2 1.408 .145 9.698 Good (Little or no Revision Required)
SSAT33 1.407 .116 12.113 Good (Little or no Revision Required)
SSAT30 1.389 .289 4.813 Good (Little or no Revision Required)
SSAT79 1.388 .228 6.086 Good (Little or no Revision Required)
SSAT66 1.385 .231 6.006 Good (Little or no Revision Required)
SSAT6 1.373 .191 7.190 Good (Little or no Revision Required)
SSAT89 1.363 .193 7.078 Good (Little or no Revision Required)
SSAT54 1.346 .196 6.856 Good (Little or no Revision Required)
SSAT31 1.332 .214 6.212 Moderate (Little or no Revision Required)
SSAT87 1.323 .280 4.721 Moderate (Little or no Revision Required)
SSAT26 1.321 .190 6.954 Moderate (Little or no Revision Required)
SSAT43 1.321 .252 5.245 Moderate (Little or no Revision Required)
SSAT34 1.320 .271 4.874 Moderate (Little or no Revision Required)
SSAT38 1.298 .487 2.666 Moderate (Little or no Revision Required)
SSAT75 1.298 .253 5.136 Moderate (Little or no Revision Required)
SSAT25 1.292 .311 4.153 Moderate (Little or no Revision Required)
SSAT84 1.290 1.491 .865 Moderate (Little or no Revision Required)
SSAT7 1.266 .242 5.228 Moderate (Little or no Revision Required)
SSAT51 1.257 .255 4.933 Moderate (Little or no Revision Required)
SSAT17 1.225 .107 11.419 Moderate (Little or no Revision Required)
SSAT9 1.182 .104 11.317 Moderate (Little or no Revision Required)
SSAT15 1.179 .121 9.766 Moderate (Little or no Revision Required)
SSAT10 1.166 .102 11.465 Moderate (Little or no Revision Required)
SSAT13 1.117 .098 11.348 Moderate (Little or no Revision Required)
SSAT3 1.074 .107 10.027 Moderate (Little or no Revision Required)
SSAT22 1.074 .201 5.355 Moderate (Little or no Revision Required)
SSAT37 1.049 .196 5.363 Moderate (Little or no Revision Required)
SSAT1 1.040 .313 3.322 Moderate (Little or no Revision Required)
SSAT28 1.032 .095 10.883 Moderate (Little or no Revision Required)
SSAT69 1.006 .205 4.914 Moderate (Little or no Revision Required)
SSAT20 .929 .099 9.370 Moderate (Little or no Revision Required)
SSAT21 .906 .203 4.469 Moderate (Little or no Revision Required)
SSAT12 .800 .159 5.036 Moderate (Little or no Revision Required)
SSAT100 .746 .224 3.336 Moderate (Little or no Revision Required)
SSAT86 .671 .893 .751 Moderate (Little or no Revision Required)
SSAT58 .624 .079 7.878 Marginal (Needs Revision)
SSAT32 .494 .075 6.594 Marginal (Needs Revision)
SSAT64 .453 .080 5.660 Marginal (Needs Revision)
SSAT8 .214 .074 2.897 Poor (Should be Eliminated or Revised)
SSAT90 .166 .074 2.251 Poor (Should be Eliminated or Revised)
SSAT14 .139 .068 2.051 Poor (Should be Eliminated or Revised)
SSAT42 − .051 .068 − .757 Poor (Should be Eliminated or Revised)
SSAT76 − .318 .116 -2.734 Poor (Should be Eliminated or Revised)
SSAT65 − .987 .896 -1.101 Poor (Should be Eliminated or Revised)
Criterion: See page 88
As shown in Table 3, the item discrimination index ranged from −.987 to 9.620, with a higher index indicating a more satisfactory item and a lower index a poorer item. From the result, items 8, 90, 14, 42, 76 and 65 had discrimination indices of .214, .166, .139, −.051, −.318 and −.987 respectively, indicating poor items that need to be eliminated or revised. Items 58, 32 and 64 had discrimination indices of .624, .494 and .453, indicating marginal items that need to be revised before they can be used. All other items were either satisfactory (no revision required), good, or moderate (little or no revision required). The distribution can be visualized in Fig. 3.
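The remarks in Table 3 amount to a simple band classifier over the a-parameter. The cut-offs below are inferred from the boundaries visible in the table, since the study's own criterion appears on a page not reproduced here; treat them as an approximation.

```python
def classify_discrimination(a):
    """Map an a-parameter to a Table 3-style remark.
    Cut-offs are inferred from the table, not taken from the
    study's criterion page (which is not reproduced here)."""
    if a >= 1.70:
        return "Satisfactory (No Revision Required)"
    if a >= 1.34:
        return "Good (Little or no Revision Required)"
    if a >= 0.65:
        return "Moderate (Little or no Revision Required)"
    if a >= 0.35:
        return "Marginal (Needs Revision)"
    return "Poor (Should be Eliminated or Revised)"
```

These bands reproduce the boundary cases in the table, e.g. SSAT60 (1.744, Satisfactory) versus SSAT47 (1.699, Good), and SSAT54 (1.346, Good) versus SSAT31 (1.332, Moderate).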
Research Question 3
To what extent does the SSAT meet the requirement of b-parameter?
In answering the above research question, the modified SPSS statistical software was used to estimate the b-parameter, applying the b-parameter item response theory dichotomous model. The aim was to determine the difficulty level of the items in the SSAT. The result is shown in Table 4.
Table 4
b-Parameter Index for the SSAT
Item Value Standard Error Z Remark
SSAT76 -7.702 2.720 -2.832 Very Easy
SSAT65 -4.971 2.668 -1.863 Very Easy
SSAT2 -1.999 .149 -13.417 Very Easy
SSAT15 -1.939 .161 -12.061 Very Easy
SSAT20 -1.797 .176 -10.202 Very Easy
SSAT3 -1.678 .150 -11.187 Very Easy
SSAT16 -1.369 .105 -12.998 Very Easy
SSAT43 -1.328 .484 -2.743 Very Easy
SSAT11 -1.326 .102 -12.970 Very Easy
SSAT1 -1.108 .914 -1.213 Very Easy
SSAT9 -1.033 .103 -10.056 Very Easy
SSAT17 -1.010 .100 -10.136 Very Easy
SSAT29 − .974 .089 -10.918 Very Easy
SSAT26 − .872 .259 -3.371 Very Easy
SSAT21 − .828 .610 -1.356 Very Easy
SSAT52 − .824 .148 -5.579 Very Easy
SSAT28 − .818 .103 -7.938 Very Easy
SSAT33 − .814 .084 -9.708 Very Easy
SSAT60 − .804 .219 -3.670 Very Easy
SSAT10 − .703 .090 -7.774 Very Easy
SSAT61 − .675 .107 -6.327 Very Easy
SSAT36 − .672 .122 -5.499 Very Easy
SSAT31 − .670 .276 -2.430 Very Easy
SSAT13 − .633 .090 -7.033 Very Easy
SSAT35 − .618 .127 -4.864 Very Easy
SSAT67 − .606 .105 -5.758 Very Easy
SSAT39 − .587 .159 -3.685 Very Easy
SSAT55 − .570 .077 -7.388 Very Easy
SSAT56 − .549 .070 -7.810 Very Easy
SSAT44 − .545 .071 -7.632 Very Easy
SSAT79 − .541 .248 -2.183 Very Easy
SSAT57 − .536 .081 -6.594 Very Easy
SSAT62 − .473 .091 -5.194 Very Easy
SSAT70 − .469 .108 -4.350 Very Easy
SSAT63 − .460 .083 -5.558 Very Easy
SSAT82 − .457 .167 -2.736 Very Easy
SSAT34 − .444 .328 -1.354 Very Easy
SSAT6 − .429 .193 -2.224 Very Easy
SSAT7 − .408 .311 -1.309 Very Easy
SSAT25 − .401 .394 -1.018 Very Easy
SSAT54 − .388 .195 -1.985 Very Easy
SSAT72 − .364 .073 -4.979 Very Easy
SSAT73 − .363 .117 -3.108 Very Easy
SSAT24 − .235 .142 -1.653 Very Easy
SSAT91 − .222 .159 -1.395 Very Easy
SSAT49 − .184 .169 -1.084 Very Easy
SSAT98 − .153 .104 -1.476 Very Easy
SSAT12 − .137 .383 − .358 Very Easy
SSAT80 − .125 .089 -1.408 Very Easy
SSAT22 − .096 .271 − .354 Very Easy
SSAT92 − .087 .129 − .678 Very Easy
SSAT50 − .086 .084 -1.020 Very Easy
SSAT74 − .077 .088 − .876 Very Easy
SSAT93 − .077 .101 − .768 Very Easy
SSAT41 − .053 .084 − .631 Very Easy
SSAT89 .011 .144 .078 Easy
SSAT77 .024 .067 .365 Easy
SSAT96 .252 .057 4.396 Easy
SSAT81 .293 .085 3.459 Easy
SSAT48 .373 .077 4.828 Easy
SSAT88 .385 .040 9.676 Easy
SSAT19 .431 .174 2.474 Easy
SSAT30 .452 .169 2.672 Easy
SSAT85 .469 .053 8.828 Easy
SSAT58 .477 .131 3.644 Easy
SSAT71 .490 .089 5.529 Easy
SSAT4 .502 .099 5.070 Easy
SSAT37 .504 .172 2.936 Easy
SSAT32 .535 .166 3.219 Easy
SSAT59 .663 .074 8.988 Easy
SSAT75 .738 .131 5.650 Easy
SSAT68 .740 .055 13.526 Easy
SSAT27 .783 .161 4.875 Easy
SSAT83 .822 .063 13.005 Easy
SSAT66 .843 .106 7.977 Easy
SSAT100 .908 .324 2.799 Easy
SSAT42 .912 10.803 .084 Easy
SSAT23 .951 .071 13.392 Easy
SSAT51 .956 .126 7.566 Easy
SSAT5 .960 .066 14.612 Easy
SSAT69 .969 .164 5.926 Easy
SSAT47 1.010 .126 7.991 Difficult
SSAT95 1.057 .104 10.206 Difficult
SSAT97 1.142 .107 10.653 Difficult
SSAT99 1.174 .086 13.620 Difficult
SSAT94 1.243 .077 16.149 Difficult
SSAT87 1.248 .119 10.461 Difficult
SSAT18 1.316 .122 10.758 Difficult
SSAT14 1.455 .931 1.563 Difficult
SSAT78 1.594 .091 17.437 Difficult
SSAT46 1.780 .130 13.651 Difficult
SSAT64 1.829 .340 5.382 Difficult
SSAT40 2.108 .109 19.254 Very Difficult
SSAT45 2.240 .254 8.809 Very Difficult
SSAT38 2.464 .425 5.801 Very Difficult
SSAT53 2.682 .600 4.473 Very Difficult
SSAT8 3.515 1.243 2.828 Very Difficult
SSAT90 4.192 1.924 2.179 Very Difficult
SSAT84 4.298 3.102 1.385 Very Difficult
SSAT86 5.236 4.589 1.141 Very Difficult
Criterion: See page 89
As shown in Table 4, the item difficulty index ranged from −7.702 to 5.236, with a higher index indicating a more difficult item and a lower index an easier item. From the result, items 40, 45, 38, 53, 8, 90, 84 and 86 had difficulty indices of 2.108, 2.240, 2.464, 2.682, 3.515, 4.192, 4.298 and 5.236 respectively, indicating very difficult items. Items 47, 95, 97, 99, 94, 87, 18, 14, 78, 46 and 64 had difficulty indices of 1.010, 1.057, 1.142, 1.174, 1.243, 1.248, 1.316, 1.455, 1.594, 1.780 and 1.829 respectively, indicating difficult items. The remaining items were either very easy or easy, with difficulty indices ranging from −7.702 for item 76 to 0.969 for item 69. The distribution can be visualized in Fig. 4.
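Table 4's remarks follow a similarly simple banding of the b-parameter. The cut-offs below are inferred from the boundaries in the table (the study's criterion page is not reproduced here), so they are an approximation:

```python
def classify_difficulty(b):
    """Map a b-parameter to a Table 4-style remark.
    Cut-offs are inferred from the table's boundary values."""
    if b < 0.0:
        return "Very Easy"
    if b < 1.0:
        return "Easy"
    if b < 2.0:
        return "Difficult"
    return "Very Difficult"
```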
Research Question 4
To what extent does the SSAT meet the requirement of c-parameter?
In answering the above research question, the modified SPSS statistical software was used to estimate the c-parameter, applying the c-parameter item response theory dichotomous model. The aim was to determine the guessing level of the items in the SSAT. The result is shown in Table 5.
Table 5
c-Parameter Index for the SSAT
Item Value Standard Error Z Remark
SSAT2 .000 .011 .009 Not Guessable
SSAT3 .000 .004 .006 Not Guessable
SSAT8 .000 .007 .018 Not Guessable
SSAT9 .000 .000 .001 Not Guessable
SSAT10 .000 .001 .003 Not Guessable
SSAT11 .000 .000 .002 Not Guessable
SSAT13 .000 .001 .003 Not Guessable
SSAT14 .000 .021 .017 Not Guessable
SSAT15 .000 .000 .002 Not Guessable
SSAT16 .000 .000 .001 Not Guessable
SSAT17 .000 .001 .004 Not Guessable
SSAT20 .000 .001 .004 Not Guessable
SSAT28 .000 .003 .007 Not Guessable
SSAT29 .000 .000 .001 Not Guessable
SSAT32 .000 .002 .005 Not Guessable
SSAT33 .000 .000 .001 Not Guessable
SSAT44 .000 .000 .003 Not Guessable
SSAT58 .000 .002 .005 Not Guessable
SSAT64 .000 .000 .002 Not Guessable
SSAT76 .000 .000 .001 Not Guessable
SSAT90 .001 .046 .025 Not Guessable
SSAT73 .010 .048 .201 Not Guessable
SSAT43 .011 .256 .042 Not Guessable
SSAT42 .013 .272 .047 Not Guessable
SSAT37 .024 .064 .379 Not Guessable
SSAT22 .025 .103 .242 Not Guessable
SSAT36 .031 .058 .535 Not Guessable
SSAT6 .033 .083 .394 Not Guessable
SSAT12 .039 .120 .327 Not Guessable
SSAT63 .039 .033 1.159 Not Guessable
SSAT72 .054 .031 1.732 Not Guessable
SSAT85 .057 .021 2.721 Not Guessable
SSAT26 .073 .113 .645 Not Guessable
SSAT45 .073 .013 5.498 Not Guessable
SSAT89 .073 .056 1.304 Not Guessable
SSAT66 .076 .038 2.013 Not Guessable
SSAT96 .077 .023 3.352 Not Guessable
SSAT78 .086 .014 6.233 Not Guessable
SSAT87 .089 .036 2.477 Not Guessable
SSAT71 .091 .037 2.501 Not Guessable
SSAT51 .093 .046 2.005 Not Guessable
SSAT57 .103 .037 2.816 Not Guessable
SSAT75 .103 .050 2.061 Not Guessable
SSAT21 .105 .209 .504 Not Guessable
SSAT88 .106 .019 5.701 Not Guessable
SSAT100 .106 .100 1.057 Not Guessable
SSAT69 .112 .051 2.174 Not Guessable
SSAT50 .117 .039 3.043 Not Guessable
SSAT65 .119 .019 6.388 Not Guessable
SSAT95 .119 .028 4.184 Not Guessable
SSAT35 .121 .058 2.090 Not Guessable
SSAT79 .124 .102 1.222 Not Guessable
SSAT91 .127 .065 1.951 Not Guessable
SSAT99 .127 .022 5.714 Not Guessable
SSAT67 .136 .051 2.675 Not Guessable
SSAT83 .139 .022 6.428 Not Guessable
SSAT93 .141 .043 3.247 Not Guessable
SSAT38 .142 .026 5.405 Not Guessable
SSAT84 .143 .019 7.722 Not Guessable
SSAT40 .145 .013 11.331 Not Guessable
SSAT80 .148 .038 3.892 Not Guessable
SSAT94 .151 .021 7.142 Not Guessable
SSAT5 .152 .022 6.987 Not Guessable
SSAT46 .158 .018 8.941 Not Guessable
SSAT54 .166 .074 2.233 Not Guessable
SSAT62 .167 .044 3.846 Not Guessable
SSAT86 .168 .044 3.857 Not Guessable
SSAT41 .175 .037 4.709 Not Guessable
SSAT77 .177 .033 5.430 Not Guessable
SSAT98 .177 .046 3.878 Not Guessable
SSAT7 .180 .116 1.556 Not Guessable
SSAT55 .184 .040 4.637 Not Guessable
SSAT59 .192 .029 6.588 Not Guessable
SSAT74 .201 .040 5.042 Guessable
SSAT92 .204 .054 3.802 Guessable
SSAT34 .206 .122 1.681 Guessable
SSAT81 .207 .036 5.696 Guessable
SSAT82 .214 .071 3.039 Guessable
SSAT49 .217 .065 3.317 Guessable
SSAT56 .221 .038 5.755 Guessable
SSAT53 .222 .023 9.841 Guessable
SSAT31 .230 .104 2.215 Guessable
SSAT68 .234 .023 10.223 Guessable
SSAT30 .235 .060 3.949 Guessable
SSAT52 .237 .070 3.384 Guessable
SSAT18 .241 .030 8.066 Guessable
SSAT39 .262 .069 3.830 Guessable
SSAT48 .278 .034 8.125 Guessable
SSAT60 .282 .090 3.125 Guessable
SSAT61 .282 .052 5.385 Guessable
SSAT25 .292 .130 2.244 Guessable
SSAT23 .318 .026 12.157 Guessable
SSAT19 .329 .058 5.662 Guessable
SSAT47 .331 .038 8.835 Guessable
SSAT70 .336 .049 6.859 Guessable
SSAT24 .368 .055 6.651 Guessable
SSAT97 .373 .031 12.142 Guessable
SSAT1 .375 .260 1.441 Guessable
SSAT27 .408 .048 8.508 Guessable
SSAT4 .444 .035 12.597 Guessable
Criterion: c > 0.20
As shown in Table 5, the item guessing index ranged from .000 to .444, with a higher index indicating a guessable item and a lower index a not-guessable item. The recommended range is between 0.00 and 0.20: because each SSAT item has 5 alternatives, a low-ability examinee has a 1/5 = 0.20 chance of guessing the correct answer, so a c-value above 0.20 indicates that low-ability examinees are succeeding more often than blind guessing would allow. From the result, 27 items had a guessing index above the recommended 0.20, meaning that they are guessable. The other items (73 out of 100) had guessing indices ranging from 0.000 to 0.192, indicating that they are not guessable. The distribution can be visualized in Fig. 5.
Discussion
Index of Unidimensionality of the Items in the Social Studies Aptitude Test (SSAT)
The first finding revealed that all 100 items measured a single construct, as shown in the scree plot. The model assumes that there is one dominant latent trait being measured by the test and that this trait is the driving force behind the responses observed for each item in the Social Studies Aptitude Test. It is commonly assumed that only one ability or trait is necessary to "explain" or "account" for examinee test performance; item response models that assume a single latent ability are referred to as unidimensional. Several researchers have successfully used factor analysis to determine the unidimensionality of a test. For instance, Kpolovie and Emekene (2016) validated the Advanced Progressive Matrices (APM) for a Nigerian sample using Item Response Theory. They used factor analysis to determine the unidimensionality of the scale and found that the underlying construct of the APM, namely intelligence or fluid ability, is unidimensional, and that all 36 items of the scale measure one construct, the fluid ability of the test taker, as confirmed by the scree plot. They concluded that the APM items unquestionably measure just one general intelligence factor in Nigeria, just as they do in all other countries in which the test is actively used.
a-Parameter Index for each Item in the Social Studies Aptitude Test (SSAT)
The second finding revealed that most of the items (94 out of 100) were either satisfactory (no revision required), good, or moderate (little or no revision required). This means that the items tend to discriminate well between high achievers and low achievers. This finding agrees with the study of Kpolovie and Emekene (2016), who used item response theory to validate Raven's Advanced Progressive Matrices (APM). The authors found that all items of the test yielded favourable statistics under the 3-Parameter Logistic IRT Model with regard to discrimination, difficulty and guessing.
b-Parameter Index (Item Difficulty Parameter) for each Item of the Social Studies Aptitude Test (SSAT)
The third finding revealed that most of the items (81 out of 100) were either very easy or easy; the remaining 19 were either very difficult (8 out of 100) or difficult (11 out of 100). This means that the test items are within the ability level of the students. The term item difficulty is used in the education field to describe the level of the latent variable (theta) at which a respondent has a 0.5 probability of a correct response to a specific item. Therefore, the more difficult an item, the higher the ability level a student needs in order to have a 50% chance of answering it correctly. According to Kpolovie and Emekene (2016), the difficulty index ranges in theory from negative to positive infinity, but in practice from −3.0 (very easy) to +3.0 (very difficult). The authors reported a related observation: the b-parameter is linked to the classical P statistic, as items with low P values tend to have higher (more positive) b-parameters and items with high P values tend to have lower (more negative) b-parameters.
c-Parameter Index (Item Guessing Parameter) for each Item of the Social Studies Aptitude Test (SSAT)
The fourth finding revealed that most of the items (73 out of 100) are not susceptible to guessing, meaning that the instrument meets the assumption of item response theory with respect to the c-parameter. The c-parameter equals the probability that an examinee of infinitely low θ obtains a correct response by guessing; thus, c is also the lower asymptote of the item response function (IRF). Most of the items (73 out of 100) had a guessing index that ranged from 0.000 to 0.192, indicating that they are not guessable. Therefore, the degree of guessing can be said to be low among the students for whom the test was developed.
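The lower-asymptote interpretation of c can be seen directly in a short sketch (again with hypothetical values; 0.19 is chosen near the study's upper bound of 0.192): as ability decreases, the probability of a correct response does not fall to zero but levels off at c.

```python
import math

def p3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

a, b, c = 1.2, 0.0, 0.19  # hypothetical item parameters
# As theta falls far below b, the logistic term vanishes and P approaches c,
# the lower asymptote of the item response function.
for theta in (0.0, -2.0, -4.0, -8.0):
    print(f"theta={theta:+.1f}  P={p3pl(theta, a, b, c):.3f}")
```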
This finding is supported by Kamiri (2010), who noted that lower c-values are better, indicating a lower probability of low-ability examinees obtaining the correct answer by mere guessing. Harris (2005) concluded that items with c-values of 0.30 or greater are not very good, whereas c-values of 0.20 or lower are desirable. In like manner, Akindele (2003) noted that items do not have perfect c-values because examinees do not guess randomly when they do not know the answer.
The finding is also in line with Ani (2014), who applied item response theory in the development and validation of a multiple-choice test in Economics and found that 49 out of 50 items were reliable under the three-parameter logistic (3PL) model. It likewise agrees with Kpolovie and Emekene (2016), who applied item response theory to validate the Advanced Progressive Matrices in Nigeria and found that all items of the test yielded favourable statistics under the 3-Parameter Logistic IRT Model with regard to discrimination, difficulty and guessing.
Conclusion and Recommendations
Based on the findings, it can be concluded that the Social Studies Aptitude Test items have good psychometric properties and can therefore be used for the assessment of Upper Basic School students in the cognitive domain. The test is unidimensional in nature and hence measures a single trait. Based on the findings from this study, the following recommendations were made:
1. The developed Social Studies Aptitude Test should be used by Social Studies teachers for the assessment of secondary school students, especially during mock examinations, in preparation for external examinations;
2. The test should be added to the existing item bank domiciled in the Ministry of Basic and Secondary Education, since the psychometric properties of the test have been shown to be sound;
3. The items identified as weak in difficulty, discrimination or guessing should be modified so that they will be useful in the assessment of Upper Basic School students.
Declarations
Participants were asked to give informed consent to take part in the study, and all willingly provided it.
References
1. Akindele, B. P. (2003). The development of an item bank for selection tests into Nigerian universities: An exploratory study. Unpublished doctoral dissertation, University of Ibadan, Nigeria.
2. Ani, E. N. (2014). Application of item response theory in the development and validation of multiple-choice test in economics. Unpublished M.Ed. dissertation, University of Nigeria, Nsukka.
3. Bloom, B. S. (1956). Taxonomy of educational objectives. New York: David McKay.
4. Field, A. (2013). Discovering statistics using IBM SPSS statistics: And sex and drugs and rock "n" roll (4th ed.). Los Angeles: Sage.
5. Hambleton, R. K. (1994). Item response theory: A broad psychometric framework for measurement advances. Psicothema, 6(3), 535-556.
6. Hambleton, R. K., & Jodoin, M. (2003). Item response theory: Models and features. In R. Fernández-Ballesteros (Ed.), Encyclopaedia of psychological assessment (pp. 509-514). London: Sage.
7. Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory (Vol. 2). Sage Publications.
8. Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied Psychological Measurement, 40(4), 289-301. https://doi.org/10.1177/0146621616631317
9. Harris, D. (2005). Comparison of 1-, 2-, and 3-parameter IRT models. Educational Measurement: Issues and Practice. https://doi.org/10.1111/j.1745-3992.1989.tb00313.x
10. Jessa, M. O., Odili, J. N., & Osadebe, P. U. (2023). Development of Social Studies Aptitude Test for testing critical thinking skills: Implication for the achievement of Education for Sustainable Development (ESD). Canadian Journal of Educational and Social Studies, 3(4), 99-119.
11. Kamiri, H. (2010). A differential item functioning analysis of a language proficiency test: An investigation of background knowledge bias. Unpublished master's thesis, University of Tehran, Iran.
12. Kpolovie, P. J. (2014). Test, measurement and evaluation in education (2nd ed.). New Owerri: Spring Field Publishers Ltd.
13. Kpolovie, P. J., & Emekene, C. O. (2016). Psychometric advent of advanced progressive matrices-smart version (APM-SV) for use in Nigeria. European Journal of Statistics and Probability, 4(3), 20-30.
14. Lord, F. M. (1952). A theory of test scores. Psychometric Monograph.
15. Lord, F. M. (1953). An application of confidence intervals and of maximum likelihood to the estimation of an examinee's ability. Psychometrika, 18, 57-75.
16. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
17. Lord, T., & Baviskar, S. (2007). Moving students from information recitation to information understanding: Exploiting Bloom's taxonomy in creating science questions. Journal of College Science Teaching, 5, 40-44.
18. Osadebe, P. U., & Jessa, M. O. (2018). Development of social studies achievement test for assessment of secondary school students. European Journal of Open Education and E-Learning Studies, 3(1), 104-124.
Figures
Figure 2: Scree Plot for the SSAT
Figure 3: Histogram showing the discriminatory index of the SSAT
Figure 4: Histogram showing the difficulty index of the SSAT
Figure 5: Histogram showing the guessing index of the SSAT