
An S factor among census tracts of Boston

Authors:
  • Ulster Institute for Social Research

Abstract

A factor analysis was carried out on 6 socioeconomic variables for 506 census tracts of Boston. An S factor was found with positive loadings for median value of owner-occupied homes and average number of rooms in these, and negative loadings for crime rate, pupil-teacher ratio, NOx pollution, and the proportion of the population of ‘lower status’. The S factor scores were negatively correlated with the estimated proportion of African Americans in the tracts, r = -.36 [CI95 -0.43; -0.28]. This estimate was biased downwards due to data error that could not be corrected for.
Introduction
The general socioeconomic factor (s/S)¹ is a construct similar to general cognitive ability (GCA; also
called the g factor, intelligence, etc.; Gottfredson, 2002; Jensen, 1998). For ability data, it has been
repeatedly found that performance on any cognitive test is positively related to performance on any
other test, no matter which format (pen and pencil, read aloud, computerized) or type (verbal, spatial,
mathematical, figural, or reaction time-based) has been tried. The S factor is similar. It has been
repeatedly found that desirable socioeconomic outcomes are positively related to other desirable
socioeconomic outcomes, and undesirable outcomes are positively related to other undesirable
outcomes. When this pattern is found, one can extract a general factor such that the desirable
outcomes have positive loadings and the undesirable outcomes have negative loadings. In a sense, this
is the latent factor that underlies the frequently used term “socioeconomic status”, except that it is
broader: it is not restricted to income, occupation and educational attainment, but also includes e.g.
crime and health.
So far, S factors have been found for country-level (Kirkegaard, 2014b), state/regional-level (e.g.
Kirkegaard, 2015), country of origin-level for immigrant groups (Kirkegaard, 2014a) and first name-
level data (Kirkegaard & Tranberg, In preparation). The S factors found have not always been strictly
general in the sense that sometimes an indicator loads in the 'wrong direction', meaning that either an
undesirable variable loads positively (typically crime rates), or a desirable outcome loads negatively.
These findings should not be seen as outliers to be explained away, but rather as results to be explained
in some coherent fashion. For instance, crime rates may load positively despite crime being undesirable
because the justice system may be better in the higher S states, or because urbanicity tends to create
crime and urbanicity usually has a positive loading. To understand why some indicators sometimes load in the
wrong direction, it is important to examine data at many levels. This paper extends the S factor to a
new level, that of census tracts in the US.
Data source
While taking a video course on statistical learning based on James, Witten, Hastie, & Tibshirani (2013),
I noted that a dataset used as an example would be useful for an S factor analysis. The dataset concerns
506 census tracts of Boston and includes the following variables (Harrison & Rubinfeld, 1978):
Median value of owner-occupied homes
Average number of rooms in owner units.
Proportion of owner units built before 1940.
Proportion of the population that is 'lower status'. “Proportion of adults without some high
school education and proportion of male workers classified as laborers”.
Crime rate.
Proportion of residential land zoned for lots greater than 25k square feet.
Proportion of nonretail business acres.
Full value property tax rate.
Pupil-teacher ratios for schools.
Whether the tract bounds the Charles River.
Weighted distance to five employment centers in the Boston region.
Index of accessibility to radial highways.
Nitrogen oxide concentration. A measure of air pollution.
Proportion of African Americans.
See the original paper for a more detailed description of the variables.
This dataset has become very popular as a demonstration dataset in machine learning and statistics,
which shows the benefits of data sharing (Wicherts & Bakker, 2012). As Gilley & Pace (1996) note,
“Essentially, a cottage industry has sprung up around using these data to examine alternative statistical
techniques.” However, when they re-checked the data, they found a number of errors. The corrected
data can be downloaded here, and this corrected version is the dataset used for this analysis.
The proportion of African Americans
The variable concerning African Americans has been transformed by the following formula:
$1000(x - .63)^2$, where x is the proportion of African Americans in the tract. Because one has to take
the square root to reverse the effect of taking the square, some information is lost. For example, if we
begin with the dataset {2, -2, 2, 2, -2, -2} and take the square of these and get {4, 4, 4, 4, 4, 4}, it is
impossible for someone to reverse this transformation and get the original because they cannot tell
whether 4 results from -2 or 2 being squared.
In case of the actual data, the distribution is shown in Figure 1.
Due to the transformation, the values around 400 actually mean that the proportion of blacks is around
0. The function for back-transforming the values is shown in Figure 2.
We can now see the problem of back-transforming the data. If the transformed data contains a value
between 0 and about 140, then we cannot tell with certainty what the original value was. For instance, a
transformed value of 100 might correspond to an original proportion of .31 or .95.
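The ambiguity can be checked numerically. The following is a minimal sketch in Python, assuming only the formula given above; the function names (`transform`, `candidate_proportions`) are hypothetical and are not taken from the supplementary R code.

```python
import math

CENTER = 0.63  # constant from the formula 1000(x - .63)^2
SCALE = 1000


def transform(p):
    """Apply the transformation to a proportion p in [0, 1]."""
    return SCALE * (p - CENTER) ** 2


def candidate_proportions(b, eps=1e-9):
    """Return every proportion in [0, 1] that maps to transformed value b."""
    d = math.sqrt(b / SCALE)
    if d == 0:
        return [CENTER]
    roots = []
    for r in (CENTER - d, CENTER + d):
        # Keep only roots that are valid proportions (small float tolerance).
        if -eps <= r <= 1 + eps:
            roots.append(min(max(r, 0.0), 1.0))
    return roots


for b in (100, 300):
    print(b, [round(r, 2) for r in candidate_proportions(b)])
# prints:
# 100 [0.31, 0.95]
# 300 [0.08]

# Largest transformed value that can come from the right side (p = 1.0):
print(round(transform(1.0), 1))  # prints 136.9
```

Any transformed value up to about 137 has two valid pre-images, which matches the "about 140" threshold in the text; larger values can only come from the left side of the function.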
To get a feel for the data, one can use the Racial Dot Map explorer and look at Boston. Figure 3 shows
the Boston area color-coded by racial groups.
As can be seen, the races tend to live rather separately, with large areas dominated by one group. From
looking at it, it seems that Whites and Asians mix more with each other than with the other groups, and
that African Americans and Hispanics do the same. One might expect this result based on the groups'
relative differences in S factor and GCA (Fuerst, 2014). However, this should be examined by numerical
analysis, a task which is left for another investigation.
Still, we are left with the problem of how to back-transform the data. The conservative choice is to use
only the left side of the function. This is conservative because any proportion above .63 will get back-
transformed to a lower value. E.g. .80 will become .46, a serious error. This is the method used for this
analysis.
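A short sketch of the conservative choice, again in Python and assuming only the formula above (the helper names are hypothetical), shows what happens to proportions on either side of .63 after a round trip through the transformation:

```python
import math


def back_transform_left(b):
    """Return the root at or below .63 (the left side of the function)."""
    return 0.63 - math.sqrt(b / 1000)


def round_trip(p):
    """Transform a true proportion p, then back-transform with the left root."""
    return back_transform_left(1000 * (p - 0.63) ** 2)


for p in (0.10, 0.63, 0.80, 1.00):
    print(p, round(round_trip(p), 2))
# prints:
# 0.1 0.1
# 0.63 0.63
# 0.8 0.46
# 1.0 0.26
```

Proportions at or below .63 are recovered exactly, while any proportion p above .63 comes back as 1.26 - p. A tract that was entirely African American would therefore be recorded as .26.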
Factor analysis
Of the variables in the dataset, there is the question of which to use for S factor analysis. In general
when doing these analyses, I have sought to include variables that measure something
socioeconomically important and which is not strongly influenced by the local natural environment.
For instance, the dummy variable concerning the River Charles fails on both counts. I chose the
following subset:
Median value of owner-occupied homes
Average number of rooms in owner units.
Proportion of the population that is 'lower status'.
Crime rate.
Pupil-teacher ratios for schools.
Nitrogen oxide concentration. A measure of air pollution.
These concern important but different things. Figure 4 shows the loadings plot for the factor analysis
(reversed).²
The S factor was confirmed for this data without exceptions, in that all indicator variables loaded in the
expected direction. The factor was moderately strong, accounting for 47% of the variance.
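To illustrate how a general factor is extracted and why its sign may need reversing, here is a minimal sketch in Python. It is not the method used in the paper: it uses the first principal component (found by power iteration) as a simple stand-in for factor analysis, and the correlation matrix is built from hypothetical loadings chosen only for illustration, not from the Boston data.

```python
import math

# Indicator names follow the subset listed above.
NAMES = ["home value", "rooms", "lower status", "crime",
         "pupil-teacher ratio", "NOx"]
# Hypothetical loadings used only to build a toy correlation matrix.
TOY_LOADINGS = [0.8, 0.6, -0.8, -0.5, -0.5, -0.6]


def toy_correlations(l):
    """Correlation matrix implied by a one-factor model with loadings l."""
    n = len(l)
    return [[1.0 if i == j else l[i] * l[j] for j in range(n)]
            for i in range(n)]


def first_component(R, iters=200):
    """Leading eigenvector of R by power iteration, scaled to loadings."""
    n = len(R)
    v = [1.0 / math.sqrt(n)] * n
    eigval = 0.0
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = math.sqrt(sum(x * x for x in w))  # norm of R v, v unit length
        v = [x / eigval for x in w]
    loadings = [x * math.sqrt(eigval) for x in v]
    return loadings, eigval / n  # loadings and share of total variance


def orient(loadings, anchor=0):
    """Multiply by -1 if needed so the anchor (desirable) indicator is positive."""
    sign = 1.0 if loadings[anchor] >= 0 else -1.0
    return [sign * x for x in loadings]


raw, share = first_component(toy_correlations(TOY_LOADINGS))
s_loadings = orient(raw)
for name, value in zip(NAMES, s_loadings):
    print(f"{name:22s} {value:+.2f}")
print(f"share of variance: {share:.2f}")
```

The sign of an extracted component is arbitrary, so `orient` fixes it by requiring a desirable indicator (here home value) to load positively, which is the same multiplication by -1 described for Figure 4.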
Relationship between S factor and proportions of African Americans
Figure 5 shows a scatter plot of the relationship between the back-transformed proportion of African
Americans and the S factor.
We see that there is a wide variation in S factor even among tracts with no or very few African
Americans. These low S scores may be due to Hispanics or simply reflect the wide variation within
Whites (there were few Asians back then). The correlation between the proportion of African Americans
and S is -.36 [CI95 -0.43; -0.28].
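The paper does not state how the confidence interval was computed. As a worked check, the sketch below (Python, with a hypothetical helper name) applies the standard Fisher z-transformation to the rounded r = -.36 with n = 506 tracts; that this was the method used is an assumption.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = fisher_ci(-0.36, 506)
print(round(lo, 2), round(hi, 2))  # prints -0.43 -0.28
```

Under these assumptions the interval agrees with the reported [CI95 -0.43; -0.28].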
We see that many points have very low S scores, in the range S = -3 to -1.5. Some of these points may
actually be census tracts with very high proportions of African Americans that were back-transformed
incorrectly.
Discussion
The value of r = -.36 should not be interpreted as an estimate of effect size of ancestry on S factor for
census tracts in Boston because the proportions of the other sociological races were not used. A
multiple regression or similar method with all sociological races as the predictors is necessary to
answer this question. Still, the result above is in the expected direction based on known data concerning
the mean GCA of African Americans, and the relationship between GCA and socioeconomic outcomes
(Gottfredson, 1997).
Limitations
The back-transformation process likely introduced substantial error in the results.
Data are relatively old and may not reflect reality in Boston as it is now.
Supplementary material
Data, high quality figures and R source code are available at the Open Science Framework repository.
References
Fuerst, J. (2014). Ethnic/Race Differences in Aptitude by Generation in the United States: An
Exploratory Meta-analysis. Open Differential Psychology. Retrieved from
http://openpsych.net/ODP/2014/07/ethnicrace-differences-in-aptitude-by-generation-in-the-united-states-an-exploratory-meta-analysis/
Gilley, O. W., & Pace, R. K. (1996). On the Harrison and Rubinfeld data. Journal of Environmental
Economics and Management, 31(3), 403–405.
Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132.
http://doi.org/10.1016/S0160-2896(97)90014-3
Gottfredson, L. S. (2002). Where and Why g Matters: Not a Mystery. Human Performance, 15(1-2),
25–46. http://doi.org/10.1080/08959285.2002.9668082
Harrison, D., & Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal
of Environmental Economics and Management, 5(1), 81–102.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (Eds.). (2013). An introduction to statistical
learning: with applications in R. New York: Springer.
Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
Kirkegaard, E. O. W. (2014a). Crime, income, educational attainment and employment among
immigrant groups in Norway and Finland. Open Differential Psychology. Retrieved from
http://openpsych.net/ODP/2014/10/crime-income-educational-attainment-and-employment-among-immigrant-groups-in-norway-and-finland/
Kirkegaard, E. O. W. (2014b). The international general socioeconomic factor: Factor analyzing
international rankings. Open Differential Psychology. Retrieved from
http://openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/
Kirkegaard, E. O. W. (2015). Examining the S factor in Mexican states. The Winnower. Retrieved from
https://thewinnower.com/papers/examining-the-s-factor-in-mexican-states
Kirkegaard, E. O. W., & Tranberg, B. (In preparation). What is a good name? The S factor in Denmark
at the name-level. Open Differential Psychology. Retrieved from https://osf.io/t2h9c/
Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity
of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of
Personality, 21(5), 667–706. http://doi.org/10.1002/per.634
Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish
your data too? Intelligence, 40(2), 73–76. http://doi.org/10.1016/j.intell.2012.01.004
1 Capital S is used when the data are aggregated, and small s is used when the data are at the individual level. This
follows the nomenclature of Rindermann (2007).
2 The plot is described as reversed because the analysis gave positive loadings for undesirable outcomes and negative
loadings for desirable outcomes. This happens because the data include more indicators of undesirable outcomes, and
the factor analysis chooses as positive the direction in which most indicators point. The direction can easily be
reversed by multiplying by -1.
Figure 1: Transformed data for the proportion of blacks by census
tract.
Figure 2: The transformation function.
Figure 3: Racial dot map of Boston area.
Figure 4: Loadings plot for the S factor.
Figure 5: Scatter plot of S scores and the back-transformed
proportion of African Americans by census tract in Boston.
... Median_house_price had a strong positive loading around .80 in a prior S analysis of Boston census districts (6), but in the present analysis the loading was near zero: -.11, to .12. ...
... Overall results were similar to previous S factor studies. Only one prior study has examined administrative divisions of a major city before and it too found similar results (6). The odd finding of a negative loading for female annual pay in the rank-order analysis deserves more attention in the future, as does the general case where gendered versions of a variable give markedly different results. ...
Article
Full-text available
A dataset of 30 diverse socioeconomic variables was collected covering 32 London boroughs. Factor analysis of the data revealed a general socioeconomic factor. This factor was strongly related to GCSE (General Certificate of Secondary Education) scores (r's .683 to .786) and and had weak to medium sized negative relationships to demographic variables related to immigrants (r's -.295 to -.558). Jensen's method indicated that these relationships were related to the underlying general factor, especially for GCSE (coefficients |.48| to |.69|). In multiple regression, about 60% of the variance in S outcomes could be accounted for using GCSE and one variable related to immigrants.
... The factor has been called the general socioeconomic factor (S factor) and is similar to the g factor of mental ability (Jensen, 1998;Kirkegaard, 2014b). The S factor has been replicated across numerous datasets at different levels of analysis (Kirkegaard, 2014a(Kirkegaard, , 2014b(Kirkegaard, , 2015aKirkegaard & Fuerst, 2014;Kirkegaard & Tranberg, 2015). ...
Article
Full-text available
Two datasets of Japanese socioeconomic data for Japanese prefectures (N=47) were obtained and merged. After quality control, there were 44 variables for use in a factor analysis. Indicator sampling reliability analysis revealed poor reliability (54% of the correlations were |r| > .50). Inspection of the factor loadings revealed no clear S factor with many indicators loading in opposite than expected directions. A cognitive ability measure was constructed from three scholastic ability measures (all loadings > .90). On first analysis, cognitive ability was not strongly related to 'S' factor scores, r = -.19 [CI95: -.45 to .19; N=47]. Jensen's method did not support the interpretation that the relationship is between latent 'S' and cognitive ability (r = -.15; N=44). Cognitive ability was nevertheless related to some socioeconomic indicators in expected ways. A reviewer suggested controlling for population size or population density. When this was done, a relatively clear S factor emerged. Using the best control method (log population density), indicator sampling reliability was high (93% |r|>.50). The scores were strongly related to cognitive ability r = .67 [CI95: .48 to .80]. Jensen's method supported the interpretation that cognitive ability was related to the S factor (r = .78) and not just to the non-general factor variance.
... When such socioeconomic variables are factor analyzed, though, a general socioeconomic factor (S factor) emerges such that, most of the time, desirable outcomes load positively and undesirable outcomes load negatively on it. Previous research has found S factors at the national level (Kirkegaard, 2014b), the state/region/department level (Carl, 2015;Kirkegaard, 2015bKirkegaard, , 2015d and the city district level (Kirkegaard, 2015a). Analyses of national and state level data showed that Human Development Index (HDI) scores correlated strongly with S factor scores at typically >.9. ...
Article
Full-text available
We conducted novel analyses regarding the association between continental racial ancestry, cognitive ability and socioeconomic outcomes across 6 datasets: states of Mexico, states of the United States, states of Brazil, departments of Colombia, sovereign nations and all units together. We find that European ancestry is consistently and usually strongly positively correlated with cognitive ability and socioeconomic outcomes (mean r for cognitive ability = .708; for socioeconomic well-being = .643) (Sections 3-8). In most cases, including another ancestry component, in addition to European ancestry, did not increase predictive power (Section 9). At the national level, the association between European ancestry and outcomes was robust to controls for natural-environmental factors (Section 10). This was not always the case at the regional level (Section 18). It was found that genetic distance did not have predictive power independent of European ancestry (Section 10). Automatic modeling using best subset selection and lasso regression agreed in most cases that European ancestry was a non-redundant predictor (Section 11). Results were robust across 4 different ways of weighting the analyses (Section 12). It was found that the effect of European ancestry on socioeconomic outcomes was mostly mediated by cognitive ability (Section 13). We failed to find evidence of international colorism or culturalism (i.e., neither skin reflectance nor self-reported race/ethnicity showed incremental predictive ability once genomic ancestry had been taken into account) (Section 14). The association between European ancestry and cognitive outcomes was robust across a number of alternative measures of cognitive ability (Section 15). It was found that the general socioeconomic factor was not structurally different in the American sample as compared to the worldwide sample, thus justifying the use of that measure. 
Using Jensen's method of correlated vectors, it was found that the association between European ancestry and socioeconomic outcomes was stronger on more S factor loaded outcomes, r = .75 (Section 16). There was some evidence that tourist expenditure helped explain the relatively high socioeconomic performance of Caribbean states (Section 17).
... The S factor has been found in numerous studies in the past two years at many different levels of analysis: between countries (Kirkegaard, 2014b), between regions within a country (Kirkegaard, 2015b, c, d, f, g, h, i, j, k), between districts within a city (Kirkegaard, 2015a), between people grouped by country of origin (Kirkegaard, 2014a;Kirkegaard & Fuerst, 2014) and between first names (Kirkegaard & Tranberg, 2015b). Substantial correlations with cognitive ability and demographic variables have often been reported as well. ...
Article
Full-text available
Two sets of socioeconomic data for 90-96 French departements were analyzed. One dataset was found in Lynn (1980) and contained four socioeconomic variables. Mixed results were found for this dataset, both with regards to the factor structure and the relationship to cognitive ability. Another dataset with 53 variables was created by compiling variables from the official French statistics bureau (Insee). This dataset contained an impure general socioeconomic (S) factor (some undesirable variables loaded positively), but after controlling for the presence of immigrants, the S factor became purer. This was especially salient for crime, unemployment and poverty variables. The two S factors correlated at r = 0.66 [CI95:0.52-0.76; N = 88]. The IQ scores from the 1950s dataset correlated at 0.33 [CI95:0.13-0.51, N = 88] with the S factor from the 2010-2015 dataset.
Article
Full-text available
A dataset of socioeconomic, demographic and geographic data for US counties (N≈3,100) was created by merging data from several sources. A suitable subset of 28 socioeconomic indicators was chosen for analysis. Factor analysis revealed a clear general socioeconomic factor (S factor) which was stable across extraction methods and different samples of indicators (absolute split-half sampling reliability = .85). Self-identified race/ethnicity (SIRE) population percentages were strongly, but non-linearly, related to cognitive ability and S. In general, the effect of White% and Asian% were positive, while those for Black%, Hispanic% and Amerindian% were negative. The effect was unclear for Other/mixed%. The best model consisted of White%, Black%, Asian% and Amerindian% and explained 41/43% of the variance in cognitive ability/S among counties. SIRE homogeneity had a non-linear relationship to S, both with and without taking into account the effects of SIRE variables. Overall, the effect was slightly negative due to low S, high White% areas. Geospatial (latitude, longitude, and elevation) and climatological (temperature, precipitation) predictors were tested in models. In linear regression, they had little incremental validity. However, there was evidence of non-linear relationships. When models were fitted that allowed for non-linear effects of the environmental predictors, they were able to add a moderate amount of incremental validity. LASSO regression, however, suggested that much of this predictive validity was due to overfitting. Furthermore, it was difficult to make causal sense of the results. Spatial patterns in the data were examined using multiple methods, all of which indicated strong spatial autocorrelation for cognitive ability, S and SIRE (k nearest spatial neighbor regression [KNSNR] correlations of .62 to .89). 
Model residuals were also spatially autocorrelated, and for this reason the models were re-fit controlling for spatial autocorrelation using KNSNR-based residuals and spatial local regression. The results indicated that the effects of SIREs were not due to spatially autocorrelated confounds except possibly for Black% which was about 50% weaker in the controlled analyses. Pseudo-multilevel analyses of both the factor structure of S and the SIRE predictive model showed results consistent with the main analyses. Specifically, the factor structure was similar across levels of analysis (states and counties) and within states. Furthermore, the SIRE predictors had similar betas when examined within each state compared to when analyzed across all states. It was tested whether the relationship between SIREs and S was mediated by cognitive ability. Several methods were used to examine this question and the results were mixed, but generally in line with a partial mediation model. Jensen's method (method of correlated vectors) was used to examine whether the observed relationship between cognitive ability and S scores was plausibly due to the latent S factor. This was strongly supported (r = .91, Nindicators=28). Similarly, it was examined whether the relationship between SIREs and S scores was plausibly due to the latent S factor. This did not appear to be the case.
Article
Full-text available
Some new methods for factor analyzing socioeconomic data are presented, discussed and illustrated with analyses of new and old datasets. A general socioeconomic factor (S) was found in a dataset of 47 French-speaking Swiss provinces from 1888. It was strongly related (r’s .64 to .70) to cognitive ability as measured by an army examination. Fertility had a strong negative loading (r -.44 to -.67). Results were similar when using rank-transformed data. The S factor of international rankings data was found to have a split-half factor reliability of .93, that of the general factor of personality extracted from 25 OCEAN items .55, and that of the general cognitive ability factor .68 based on 16 items from the International Cognitive Ability Resource.
Article
Full-text available
Cognitive ability differences between racial/ethnic groups are of interest to social scientists and policy makers. In many discussions of group differences, racial/ethnic groups are treated as monolithic wholes. However, subpopulations within these broad categories need not perform as the racial/ethnic groups do on average. Such subpopulation differences potentially have theoretical import when it comes to causal explanations of racial/ethnic differentials. As no meta-analysis has previously been conducted on the topic, we investigated the magnitude of racial/ethnic differences by migrant generations (first, second, and third+). We conducted an exploratory meta-analysis using 18 samples for which we were able to decompose scores by sociologically defined race/ethnicity and immigrant generation. For Blacks and Whites of the same generation, the first, second, and third+ generation B/W d-values were 0.79, 0.79, and 1.00. For Hispanics and Whites of the same generation, the first, second, and third+ generation H/W d-values were 0.76, 0.67, and 0.57. For Asians and Whites of the same generation, the first, second, and third+ generation d-values were-0.08,-0.21, and 0.00. Relative to third+ generation Whites, the average d-values were 0.99, 0.84, and 1.00 for first, second, and third+ generation Black individuals, 1.04, 0.71, and 0.57 for first, second, and third+ generation Hispanic individuals, 0.16,-0.18, and-0.01 for first, second, and third+ generation Asian individuals, and 0.24 and 0.04 for first and second generation Whites.
Article
Full-text available
We present and analyze data from a dataset of 2358 Danish first names and socioeconomic outcomes not previously made available to the public (“Navnehjulet”, the Name Wheel). We visualize the data and show that there is a general socioeconomic factor with indicator loadings in the expected directions (positive: income, owning your own place; negative: having a criminal conviction, being without a job). This result holds after controlling for age and for each gender alone. It also holds when analyzing the data in age bins. The factor loading of being married depends on analysis method, so it is more difficult to interpret. A pseudofertility is calculated based on the population size for the names for the years 2012 and 2015. This value is negatively correlated with the S factor score r = -.35 [95CI: -.39; -.31], but the relationship seems to be somewhat non-linear and there is an upward trend at the very high end of the S factor. The relationship is strongly driven by relatively uncommon names who have high pseudofertility and low to very low S scores. The n-weighted correlation is -.21 [95CI: -.25; -.17]. This dysgenic pseudofertility was mostly driven by Arabic and African names. All data and R code is freely available.
Article
Full-text available
Two datasets of socioeconomic data was obtained from different sources. Both were factor analyzed and revealed a general factor (S factor). These factors were highly correlated with each other (.79 to .95), HDI (.68 to .93) and with cognitive ability (PISA; .70 to .78). The federal district was a strong outlier and excluding it improved results. Method of correlated vectors was strongly positive for all 4 analyses (r’s .78 to .92 with reversing).
Article
Full-text available
I present new predictive analyses for crime, income, educational attainment and employment among immigrant groups in Norway and crime in Finland. Furthermore I show that the Norwegian data contains a strong general socioeconomic factor (S) which is highly predictable from country-level variables (National IQ .59, Islam prevalence -.71, international general socioeconomic factor .72, GDP .55), and correlates highly (.78) with the analogous factor among immigrant groups in Denmark. Analyses of the prediction vectors show very high correlations (generally > ±.9) between predictors which means that the same variables are relatively well or weakly predicted no matter which predictor is used. Using the method of correlated vectors shows that it is the underlying S factor that drives the associations between predictors and socioeconomic traits, not the remaining variance (all correlations near unity).
Article
Full-text available
Many studies have examined the correlations between national IQs and various country-level indexes of well-being. The analyses have been unsystematic and not gathered in one single analysis or dataset. In this paper I gather a large sample of country-level indexes and show that there is a strong general socioeconomic factor (S factor) which is highly correlated (.86-.87) with national cognitive ability using either Lynn and Vanhanen's dataset or Altinok's. Furthermore, the method of correlated vectors shows that the correlations between variable loadings on the S factor and cognitive measurements are .99 in both datasets using both cognitive measurements, indicating that it is the S factor that drives the relationship with national cognitive measurements, not the remaining variance.
Book
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform.Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
Article
Personnel selection research provides much evidence that intelligence (g) is an important predictor of performance in training and on the job, especially in higher level work. This article provides evidence that g has pervasive utility in work settings because it is essentially the ability to deal with cognitive complexity, in particular, with complex information processing. The more complex a work task, the greater the advantages that higher g confers in performing it well. Everyday tasks, like job duties, also differ in their level of complexity. The importance of intelligence therefore differs systematically across different arenas of social life as well as economic endeavor. Data from the National Adult Literacy Survey are used to show how higher levels of cognitive ability systematically improve individual's odds of dealing successfully with the ordinary demands of modern life (such as banking, using maps and transportation schedules, reading and understanding forms, interpreting news articles). These and other data are summarized to illustrate how the advantages of higher g, even when they are small, cumulate to affect the overall life chances of individuals at different ranges of the IQ bell curve. The article concludes by suggesting ways to reduce the risks for low-IQ individuals of being left behind by an increasingly complex postindustrial economy.
Article
g is a highly general capability for processing complex information of any type. This explains its great value in predicting job performance. Complexity is the major distinction among jobs, which explains why g is more important further up the occupational hierarchy. The predictive validities of g are moderated by the criteria and other predictors considered in selection research, but the resulting gradients of g's effects are systematic. The pattern provides personnel psychologists a road map for how to design better selection batteries. Despite much literature on the meaning and impact of g, there nonetheless remains an aura of mystery about where and why g cognitive tests might be useful in selection. The aura of mystery encourages false beliefs and false hopes about how we might reduce disparate impact in employee selection. It is also used to justify new testing techniques whose major effect, witting or not, is to reduce the validity of selection in the service of racial goals. The general mental ability factor, g, is the best single predictor of job performance. It is probably the best measured and most studied human trait in all of psychology. Much is known about its meaning, distribution, and origins thanks to research across a wide variety of disciplines (Jensen, 1998). Many questions about g remain unanswered, including its exact nature, but g is hardly the mystery that some people suggest. The totality, that is, the pattern, of evidence on g tells us a lot about where and why it is important in the real world. Theoretical obtuseness about g is too often used to justify so-called technical advances in personnel selection that minimize, for sociopolitical purposes, the use of g in hiring.