February 22nd, 2015
Indian states: G and S factors
Emil O. W. Kirkegaard1
I reanalyze data published by Lynn and Yadav (2015) for Indian states. I find both G and S factors, which
correlate at .61. The method of correlated vectors was applied to both factors, yielding correlations of .87 and
.97. In an experimental variant, the method was applied to both factors combined, yielding a
correlation of .89.
Key words: intelligence, IQ, cognitive ability, S factor, general socioeconomic factor, India, Indian states
Richard Lynn and Prateek Yadav (2015) recently published both cognitive and socioeconomic data for 33 Indian
states. However, their analysis was limited, consisting only of a correlation matrix. In this study I
reanalyze their data to examine whether the dataset supports the existence of aggregate-level general cognitive
ability (G) and general socioeconomic status (S) factors (Kirkegaard, 2014; Rindermann, 2007).
Lynn and Yadav (2015) describe the data as follows:
Language Scores Class III (T1). These data consisted of the language scores of class III 11–12 year
old school students in the National Achievement Survey (NAS) carried out in Cycle-3 by the
National Council of Educational Research and Training (2013). The population sample comprised
104,374 students in 7046 schools across 33 states and union territories (UTs). The sample design for
each state and UT involved a three-stage cluster design which used a combination of two probability
sampling methods. At the first stage, districts were selected using the probability proportional to size
(PPS) sampling principle in which the probability of selecting a particular district depended on the
number of class 5 students enrolled in that district. At the second stage, in the chosen districts, the
requisite number of schools was selected. PPS principles were again used so that large schools had a
higher probability of selection than smaller schools. At the third stage, the required number of
students in each school was selected using the simple random sampling (SRS) method. In schools
where class 5 had multiple sections, an extra stage of selection was added with one section being
sampled at random using SRS.
1 Ulster Institute for Social Research. Email: email@example.com
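The first two sampling stages select units with probability proportional to size. A minimal sketch of one common way to do this, systematic PPS sampling, is shown below. The source does not say which PPS variant the NAS used, so the systematic procedure and the function name `systematic_pps` are assumptions for illustration only.

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n, rng=random):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: size measure per unit, e.g. class 5 enrolment per district
    (a hypothetical input; the actual NAS sampling frame is not given here).
    A unit whose size exceeds the sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n                  # distance between selection points
    start = rng.random() * interval       # random start in [0, interval)
    cumulative = list(accumulate(sizes))  # unit i covers [cum[i-1], cum[i])
    chosen, i = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative range contains this point.
        while i < len(sizes) - 1 and cumulative[i] <= point:
            i += 1
        chosen.append(i)
    return chosen
```

The same function could be reused at the second stage with school enrolments as sizes; the third stage, simple random sampling of students within a school, corresponds to `rng.sample(students, k)` from Python's standard `random` module.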
The language test consisted of reading comprehension and vocabulary, assessed by identifying the
word for a picture. The test contained 50 items and the scores were analyzed using both Classical
Test Theory (CTT) and Item Response Theory (IRT). The scores were transformed to a scale of 0–
500 with a mean of 250 and standard deviation of 50. There were two forms of the test, one in
English and the other in Hindi.
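Mapping scores onto a 0–500 scale with a mean of 250 and a standard deviation of 50 is a linear transformation. A minimal sketch, assuming the population standard deviation of the national sample is used and that values outside 0–500 are simply clipped (neither detail is stated in the source); `to_reporting_scale` is a hypothetical name:

```python
from statistics import fmean, pstdev


def to_reporting_scale(scores, mean=250.0, sd=50.0, lo=0.0, hi=500.0):
    """Linearly rescale raw or IRT ability scores to the given mean and SD.

    Clipping to [lo, hi] is an assumption; with SD 50 the bounds sit
    5 SDs from the mean, so clipping rarely changes anything.
    """
    m, s = fmean(scores), pstdev(scores)
    return [min(hi, max(lo, mean + sd * (x - m) / s)) for x in scores]
```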
Mathematics Scores Class III (T2). These data consisted of the mathematics scores of Class III
school students obtained by the same sample as for the Language Scores Class III described above.
The test consisted of identifying and using numbers, learning and understanding the values of
numbers (including basic operations), measurement, data handling, money, geometry and patterns.
The test consisted of 50 multiple-choice items, and scores were placed on a 0–500 scale with the mean set at 250
and a standard deviation of 50.
Language Scores Class VIII (T3). These data consisted of the language scores of class VIII students (14–
15 year olds) obtained in Cycle-3 of the National Achievement Survey (NAS), a program carried out by the
National Council of Educational Research and Training (2013). The sampling
methodology was the same as that for class III described above. The population sample comprised
188,647 students in 6722 schools across 33 states and union territories. The test was a more difficult
version of that for class III, and as for class III, scores were analyzed using both Classical Test
Theory (CTT) and Item Response Theory (IRT), and were transformed to a scale of 0–500 with a
mean of 250 and standard deviation of 50.
Mathematics Scores Class VIII (T4). These data consisted of the mathematics scores of Class VIII
(14–15 year olds) school students obtained by the same sample as for the Language Scores Class
VIII described above. As with the other tests, the scores were transformed to a scale of 0–500 with a
mean of 250 and standard deviation of 50.
Science Scores Class VIII (T5). These data consisted of the science scores of Class VIII (14–15 year
olds) school students obtained by the same sample as for the Language Scores Class VIII described
above. As with the other tests, the scores were transformed to a scale of 0–500 with a mean of 250 and
standard deviation of 50. The data were obtained in 2012.
Teachers’ Index (TI). This index measures the quality of the teachers and was taken from the
Elementary State Education Report compiled by the District Information System for Education
(DISE, 2013). The data were recorded in September 2012 for teachers of grades 1–8 in 35 states and
union territories. The sample consisted of 1,431,702 schools recording observations from 199.71
million students and 7.35 million teachers. The Teachers’ Index is constructed from the percentages
of schools with a pupil–teacher ratio in primary greater than 35, and the percentages of single-teacher
schools, teachers without professional qualification, and female teachers (in schools with 2 and more
teachers).
Infrastructure Index (II). These data were taken from the Elementary State Education Report 2012–
13 compiled by the District Information System for Education (2013). The sample was the same as
for the Teachers’ Index described above. This index measures the infrastructure for education and
was constructed from the percentages of schools with proper chairs and desks, drinking water, toilets
for boys and girls, and with kitchens.
GDP per capita (GDP per cap). These data are the net state domestic product of the Indian states in
2008–09 at constant prices given by the Reserve Bank of India (2013). Data are not available for the
Literacy Rate (LR). This consists of the percentage of the population aged 7 and above who are literate, as given in the
2011 census published by the Registrar General and Census Commission of India (2011).
Infant Mortality Rate (IMR). This consists of the number of deaths of infants less than one year of
age per 1000 live births in 2005–06, as given in the National Family Health Survey (Infant and Child
Mortality) published by the Indian Institute of Population Sciences (2006).
Child Mortality Rate (CMR). This consists of the number of deaths of children 1–4 years of age per
1000 live births in 2005–06, as given by the Indian Institute of Population Sciences (2006).
Life Expectancy (LE). This consists of the number of years an individual is expected to live after
birth, as given in a 2007 survey carried out by the Population Foundation of India (2008).
Fertility Rate (FR). This consists of the number of children born per woman in each state and union
territory in 2012, as given by the Registrar General and Census Commission of India (2012).
Latitude (LAT). This consists of the latitude of the center of the state.
Coast Line (CL). This records whether a state has a coastline or is landlocked and is included
to examine whether having a coastline is related to state IQ.
Percentage of Muslims (MS). This is included to examine a possible relation to the state IQs.
A single cell was found to have an incorrect value. The authors were contacted and the correct value was
A few cells had missing data (2.8%) and these were imputed using IRMI (Templ, Kowarik, & Filzmoser, 2011).
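IRMI (Templ, Kowarik, & Filzmoser, 2011) fills each incomplete variable by regressing it on the other variables and iterating until the imputations stabilize; it is implemented as `irmi()` in the R package VIM. The sketch below keeps only that core idea and is not IRMI itself: it uses ordinary least squares instead of IRMI's robust and stepwise regression, and a fixed number of iterations. The function names are hypothetical.

```python
from statistics import fmean


def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def _ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xd = [[1.0] + list(row) for row in X]
    k = len(Xd[0])
    xtx = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    return _solve(xtx, xty)


def regression_impute(rows, n_iter=10):
    """Fill None cells by iterated regression on all other columns.

    Simplified stand-in for IRMI: ordinary least squares, no robust
    fitting, no stepwise variable selection, fixed iteration count.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    missing = {(i, j) for i in range(n_rows) for j in range(n_cols) if rows[i][j] is None}
    # Initialise missing cells with the column mean of the observed values.
    means = [fmean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    data = [[means[j] if v is None else float(v) for j, v in enumerate(r)] for r in rows]
    for _ in range(n_iter):
        for j in sorted({j for _, j in missing}):
            miss = [i for i in range(n_rows) if (i, j) in missing]
            obs = [i for i in range(n_rows) if (i, j) not in missing]
            others = [c for c in range(n_cols) if c != j]
            beta = _ols([[data[i][c] for c in others] for i in obs], [data[i][j] for i in obs])
            for i in miss:
                x = [1.0] + [data[i][c] for c in others]
                data[i][j] = sum(b * v for b, v in zip(beta, x))
    return data
```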
Both the cognitive (variables T1–T5) and socioeconomic (variables 6–13) data were factor analyzed (loadings are
shown below). Both analyses revealed a general factor. The factors were then scored. Figure 1 shows the
scatterplot of the G and S factor scores.
3.1. Method of correlated vectors
This study is unusual in that it has two latent variables, each with its own set of indicator variables. This means
that we can use Jensen’s method of correlated vectors (MCV; Jensen, 1998) and also a new version, which I shall
creatively dub “double MCV” (DMCV), that uses both latent factors instead of only one.
The method consists of correlating the factor loadings of a set of indicator variables for a factor with the
correlations of each indicator variable with a criterion variable. Jensen used this with the general intelligence
factor (g factor) and its subtests, with criterion variables such as inbreeding depression in IQ scores and
Figures 2–4 show the results of MCV applied to G and S and of the experimental DMCV method, which correlates
S’s and G’s correlations with the indicators from both factors. The results are: .89, .97 and .87. In other
words, MCV gives a strong indication that it is the latent traits that are responsible for the observed
correlations.
Figure 1: Scatterplot of cognitive ability (G) and general socioeconomic factor (S) for 33 Indian states.
Figure 2: Method of correlated vectors for the G factor.
Figure 3: Method of correlated vectors for the S factor.
Figure 4: Double method of correlated vectors applied to G and S factors.
Supplementary material and acknowledgments
Supplementary materials including code, high quality figures and data can be found at https://osf.io/3uz4f/files/.
This paper was edited in March 2017 to improve readability and use higher quality figures. No numerical results were changed.
Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international
rankings. Open Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/09/the-
Lynn, R., & Yadav, P. (2015). Differences in cognitive ability, per capita income, infant mortality, fertility and
latitude across the states of India. Intelligence, 49, 179–185. https://doi.org/10.1016/j.intell.2015.01.009
Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity of results
in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of Personality, 21(5), 667–706.
Templ, M., Kowarik, A., & Filzmoser, P. (2011). Iterative stepwise regression imputation using standard and
robust methods. Computational Statistics & Data Analysis, 55(10), 2793–2806.