
Indian states: G and S factors


The Winnower
February 22nd, 2015
Emil O. W. Kirkegaard1
Abstract
I reanalyze data published by Lynn and Yadav (2015) for Indian states. I find both G and S factors, which correlate at .61. The method of correlated vectors was applied to both factors, yielding correlations of .87 and .97. In an experimental variant of the method, applied to both factors combined, the correlation was .89.
Key words: intelligence, IQ, cognitive ability, S factor, general socioeconomic factor, India, Indian states, inequality
1. Introduction
Richard Lynn and Prateek Yadav (2015) recently published both cognitive and socioeconomic data for 33 Indian
states. However, their analysis was quite limited and consisted entirely of a correlation matrix. In this study I
reanalyze their data to examine whether the dataset supports the existence of aggregate-level general cognitive
ability (G) and general socioeconomic status (S) factors (Kirkegaard, 2014; Rindermann, 2007).
2. Data
The data are described as:
1. Language Scores Class III (T1). These data consisted of the language scores of class III 11–12 year
old school students in the National Achievement Survey (NAS) carried out in Cycle-3 by the
National Council of Educational Research and Training (2013). The population sample comprised
104,374 students in 7046 schools across 33 states and union territories (UTs). The sample design for
each state and UT involved a three-stage cluster design which used a combination of two probability
sampling methods. At the first stage, districts were selected using the probability proportional to size
(PPS) sampling principle in which the probability of selecting a particular district depended on the
number of class 5 students enrolled in that district. At the second stage, in the chosen districts, the
requisite number of schools was selected. PPS principles were again used so that large schools had a
higher probability of selection than smaller schools. At the third stage, the required number of
students in each school was selected using the simple random sampling (SRS) method. In schools
where class 5 had multiple sections, an extra stage of selection was added with one section being sampled at random using SRS.
The language test consisted of reading comprehension and vocabulary, assessed by identifying the word for a picture. The test contained 50 items and the scores were analyzed using both Classical Test Theory (CTT) and Item Response Theory (IRT). The scores were transformed to a scale of 0–500 with a mean of 250 and standard deviation of 50. There were two forms of the test, one in English and the other in Hindi.
1 Ulster Institute for Social Research. Email: emil@emilkirkegaard.dk
2. Mathematics Scores Class III (T2). These data consisted of the mathematics scores of Class III school students obtained from the same sample as for the Language Scores Class III described above. The test covered identifying and using numbers, learning and understanding the values of numbers (including basic operations), measurement, data handling, money, geometry and patterns. It consisted of 50 multiple-choice items scored on a scale of 0 to 500, with the mean set at 250 and the standard deviation at 50.
3. Language Scores Class VIII (T3). These data consisted of the language scores of class VIII students (14–15 year olds) obtained in the National Achievement Survey (NAS) Class VIII (Cycle-3), a program carried out by the National Council of Educational Research and Training (2013). The sampling methodology was the same as that for class III described above. The population sample comprised 188,647 students in 6722 schools across 33 states and union territories. The test was a more difficult version of that for class III, and as for class III, scores were analyzed using both Classical Test Theory (CTT) and Item Response Theory (IRT), and were transformed to a scale of 0–500 with a mean of 250.
4. Mathematics Scores Class VIII (T4). These data consisted of the mathematics scores of Class VIII (14–15 year olds) school students obtained from the same sample as for the Language Scores Class VIII described above. As with the other tests, the scores were transformed to a scale of 0–500 with a mean of 250 and a standard deviation of 50.
5. Science Scores Class VIII (T5). These data consisted of the science scores of Class VIII (14–15 year olds) school students obtained from the same sample as for the Language Scores Class VIII described above. As with the other tests, the scores were transformed to a scale of 0–500 with a mean of 250 and a standard deviation of 50. The data were obtained in 2012.
6. Teachers’ Index (TI). This index measures the quality of the teachers and was taken from the Elementary State Education Report compiled by the District Information System for Education (DISE, 2013). The data were recorded in September 2012 for teachers of grades 1–8 in 35 states and union territories. The sample consisted of 1,431,702 schools recording observations from 199.71 million students and 7.35 million teachers. The Teachers’ Index is constructed from the percentages of schools with a primary pupil–teacher ratio greater than 35, and the percentages of single-teacher schools, teachers without professional qualifications, and female teachers (in schools with 2 or more teachers).
7. Infrastructure Index (II). These data were taken from the Elementary State Education Report 2012–13 compiled by the District Information System for Education (2013). The sample was the same as
for the Teachers’ Index described above. This index measures the infrastructure for education and
was constructed from the percentages of schools with proper chairs and desks, drinking water, toilets
for boys and girls, and with kitchens.
8. GDP per capita (GDP per cap). These data are the net state domestic product of the Indian states in
2008–09 at constant prices given by the Reserve Bank of India (2013). Data are not available for the
Union Territories.
9. Literacy Rate (LR). This consists of the percentage of the population aged 7 and above that is literate, as given in the 2011 census published by the Registrar General and Census Commission of India (2011).
10. Infant Mortality Rate (IMR). This consists of the number of deaths of infants less than one year of age per 1000 live births in 2005–06, as given in the National Family Health Survey (Infant and Child Mortality), published by the Indian Institute of Population Sciences (2006).
11. Child Mortality Rate (CMR). This consists of the number of deaths of children 1–4 years of age per 1000 live births in 2005–06, as given by the Indian Institute of Population Sciences (2006).
12. Life Expectancy (LE). This consists of the number of years an individual is expected to live after
birth, given in a 2007 survey carried out by Population Foundation of India (2008).
13. Fertility Rate (FR). This consists of the number of children born per woman in each state and union territory in 2012, as given by the Registrar General and Census Commission of India (2012).
14. Latitude (LAT). This consists of the latitude of the center of the state.
15. Coast Line (CL). This records whether a state has a coastline or is landlocked, and is included to examine whether having a coastline is related to state IQ.
16. Percentage of Muslims (MS). This is included to examine a possible relation to the state IQs.
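The 0–500 reporting scale shared by T1–T5 can be pictured as a linear rescaling of standardized scores. The Python sketch below is a hypothetical illustration, not the NCERT procedure: the helper name `to_reporting_scale` is mine, and I assume the scores being rescaled (for example IRT ability estimates) are simply standardized within the sample. The nominal 0–500 range is the mean ± 5 SD (250 ± 5 × 50), so the linear form itself does not enforce the bounds.

```python
import numpy as np

def to_reporting_scale(theta, mean=250.0, sd=50.0):
    """Hypothetical helper: map scores to a scale with the given mean and SD.

    Assumes a plain linear rescaling of within-sample z scores; the actual
    NAS scaling procedure may differ.
    """
    theta = np.asarray(theta, dtype=float)
    z = (theta - theta.mean()) / theta.std()  # population SD (ddof=0)
    # 0 and 500 correspond to z = -5 and z = +5; values beyond that are not clipped here
    return mean + sd * z

scaled = to_reporting_scale([-1.2, 0.3, 0.5, 2.0, -0.7])
```

Because the transformation is linear and increasing, it changes only the units, never the rank order of students or states.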
A single cell was found to have an incorrect value. The authors were contacted and the correct value was
received.
A few cells (2.8%) had missing data; these were imputed using IRMI, an iterative robust model-based imputation algorithm (Templ, Kowarik, & Filzmoser, 2011).
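IRMI itself uses robust regression and handles mixed variable types, and its authors provide an implementation in R. As a rough illustration of the underlying idea only, here is a simplified, non-robust Python sketch: each incomplete column is repeatedly regressed (ordinary least squares) on all other columns, and its missing cells are replaced by the fitted values until they stop changing. The function name and the column-mean initialization are my assumptions, not details of IRMI.

```python
import numpy as np

def iterative_regression_impute(X, n_iter=20, tol=1e-6):
    """Fill NaNs by regressing each incomplete column on all other columns.

    Simplified, non-robust (OLS) stand-in for IRMI, for illustration only.
    """
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)
    # Start from column means so every regression has complete predictors
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        X_old = X.copy()
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # column j is complete; nothing to impute
            design = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
            # Fit only on rows where column j was actually observed
            beta, *_ = np.linalg.lstsq(design[~rows], X[~rows, j], rcond=None)
            X[rows, j] = design[rows] @ beta
        if np.max(np.abs(X - X_old)) < tol:
            break  # imputed values have stabilized
    return X
```

Iterating matters when several columns have gaps: each pass uses the latest imputations of the other columns as predictors.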
3. Analyses
Both the cognitive (variables T1–T5) and socioeconomic (variables 6–13) data were factor analyzed (loadings are shown further below). Each analysis revealed a general factor, and factor scores were then computed for each state. Figure 1 shows the scatterplot of the G and S factor scores, which correlate at .61.
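The extraction method is not specified here. As a hedged stand-in, the Python sketch below takes the first principal component of the indicators' correlation matrix as the general factor; this is a common approximation to a one-factor solution, although PCA loadings tend to be somewhat inflated relative to common-factor loadings. The function name and the sign convention (flip so the loadings sum to a positive value) are my assumptions.

```python
import numpy as np

def general_factor(X):
    """First principal component of the correlation matrix as a general factor.

    X: array of shape (n_units, n_indicators).
    Returns (loadings, scores); scores have unit variance (ddof=1).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize indicators
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]  # largest component
    if v.sum() < 0:
        v = -v  # orient so that most indicators load positively
    loadings = v * np.sqrt(lam)
    scores = Z @ v / np.sqrt(lam)  # rescaled component scores
    return loadings, scores
```

With hypothetical arrays `cognitive` (T1–T5) and `socioeconomic` (variables 6–13), calling `general_factor` on each and then `np.corrcoef(g_scores, s_scores)[0, 1]` would give the G–S correlation. For S, one may still need to flip the sign so that, for example, GDP per capita loads positively.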
3.1. Method of correlated vectors
This study is unusual in that it has two latent variables, each with its own set of indicator variables. This means that we can use Jensen’s method of correlated vectors (MCV; Jensen, 1998), and also a new variant, which I shall creatively dub “double MCV” (DMCV), that uses both latent factors instead of only one.
The method consists of correlating the factor loadings of a set of indicator variables with the correlations of each indicator variable with a criterion variable. Jensen used it with the general intelligence factor (g factor) and its subtests, with criterion variables such as inbreeding depression in IQ scores and brain size.
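A minimal Python sketch of MCV as just described: compute each indicator's correlation with the criterion, then correlate that vector with the vector of factor loadings. Which criterion was used for each analysis is not spelled out here; pairing the G indicators with S scores (and vice versa) is my assumption, and the function name is hypothetical.

```python
import numpy as np

def correlated_vectors(loadings, indicators, criterion):
    """Method of correlated vectors (sketch).

    loadings:   factor loadings, one per indicator column
    indicators: array of shape (n_units, n_indicators)
    criterion:  array of shape (n_units,), e.g. the other factor's scores
    Returns the MCV correlation and the indicator-criterion correlations.
    """
    X = np.asarray(indicators, dtype=float)
    y = np.asarray(criterion, dtype=float)
    # Correlation of each indicator with the criterion variable
    crit_r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    # MCV: correlate the loading vector with the criterion-correlation vector
    mcv_r = np.corrcoef(np.asarray(loadings, dtype=float), crit_r)[0, 1]
    return mcv_r, crit_r
```

A high positive result means that the indicators most saturated with the factor are also the ones most strongly related to the criterion, which is the pattern MCV takes as evidence that the factor drives the association.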
Figures 2–4 show the results of MCV applied to G and S, and of the experimental DMCV method, which correlates S’s and G’s correlations with the indicators from both factors. The results are .89, .97 and .87. In other words, MCV gives a strong indication that it is the latent traits that are responsible for the observed correlations.
Figure 1: Scatterplot of cognitive ability (G) and general socioeconomic factor (S) for 33 Indian states.
Figure 2: Method of correlated vectors for the G factor.
Figure 3: Method of correlated vectors for the S factor.
Figure 4: Double method of correlated vectors applied to G and S factors.
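One reading of the DMCV description, sketched in Python: for every indicator from both sets (T1–T5 and variables 6–13), compute its correlation with the G scores and with the S scores, then correlate the two resulting vectors. This is my interpretation of the one-sentence description above, not the author's code (which is in the supplementary materials); the function name is hypothetical.

```python
import numpy as np

def double_mcv(indicators, g_scores, s_scores):
    """Double MCV (sketch of one interpretation).

    indicators: array (n_units, n_indicators) holding the cognitive AND
                socioeconomic indicators side by side
    Returns the correlation between the indicators' correlations with G
    and their correlations with S, plus both vectors.
    """
    X = np.asarray(indicators, dtype=float)

    def col_corrs(scores):
        # Correlation of every indicator column with the given factor scores
        scores = np.asarray(scores, dtype=float)
        return np.array([np.corrcoef(X[:, j], scores)[0, 1] for j in range(X.shape[1])])

    r_g = col_corrs(g_scores)
    r_s = col_corrs(s_scores)
    return np.corrcoef(r_g, r_s)[0, 1], r_g, r_s
```

Under this reading, a high value means that indicators which correlate strongly with G also tend to correlate strongly with S, across both indicator sets.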
Supplementary material and acknowledgments
Supplementary materials including code, high quality figures and data can be found at https://osf.io/3uz4f/files/.
This paper was edited in March 2017 to improve readability and use higher quality figures. No numerical results
were changed.
References
Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/
Lynn, R., & Yadav, P. (2015). Differences in cognitive ability, per capita income, infant mortality, fertility and
latitude across the states of India. Intelligence, 49, 179–185. https://doi.org/10.1016/j.intell.2015.01.009
Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity of results
in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of Personality, 21(5), 667–706.
https://doi.org/10.1002/per.634
Templ, M., Kowarik, A., & Filzmoser, P. (2011). Iterative stepwise regression imputation using standard and
robust methods. Computational Statistics & Data Analysis, 55(10), 2793–2806.
https://doi.org/10.1016/j.csda.2011.04.012