Sangit Chatterjee

Northeastern University, Boston, Massachusetts, United States

Publications (16) · 16.91 Total impact

  • Frederick Wiseman, Sangit Chatterjee
    ABSTRACT: Estimating the salaries of professional athletes has received a substantial amount of attention both in the press and in academic journals. A statistical technique that can be used to obtain an estimate of a player's salary with a given set of performance characteristics is the classical least squares regression analysis. This technique does not work well, however, if the data upon which the model is based contain outliers or are not normally distributed. In this paper we focus our attention on the salaries of American League baseball players in 2007 and demonstrate the usefulness of an alternative estimation approach – that of quantile regression analysis. Our results indicate that ordinary least squares regression overestimates the salaries of poor players, and underestimates the salaries of star players. This, we believe, is a compelling reason to apply quantile regression in the prediction of baseball player salaries.
    Journal of Quantitative Analysis in Sports 01/2010; 6(1):7-7. DOI:10.2202/1559-0410.1177
  • Sangit Chatterjee, Aykut Firat
    The American Statistician 08/2007; 61(3):248-254. DOI:10.1198/000313007X220057 · 0.88 Impact Factor
  • Aykut Firat, Sangit Chatterjee, Mustafa Yilmaz
    ABSTRACT: In the era of globalization, traditional theories and models of social systems are shifting their focus from isolation and independence to networks and connectedness. Analyzing these new complex social models is a growing and computationally demanding area of research. In this study, we investigate the integration of genetic algorithms (GAs) with a random-walk-based distance measure to find subgroups in social networks. We test our approach by synthetically generating realistic social network data sets. Our clustering experiments using random-walk-based distances reveal exceptionally accurate results compared with the experiments using Euclidean distances.
    Computational Statistics & Data Analysis 08/2007; DOI:10.1016/j.csda.2007.01.010 · 1.15 Impact Factor
  • Sangit Chatterjee, Frederick Wiseman
    ABSTRACT: Recent advances in statistical estimation theory have resulted in the development of new procedures, called robust methods, that can be used to estimate the coefficients of a regression model. Because such methods take into account the impact of discrepant data points during the initial estimation process, they offer a number of advantages over ordinary least squares and other analytical procedures (such as the analysis of outliers or regression diagnostics). This paper describes the robust method of analysis and illustrates its potential usefulness by applying the technique to two data sets. The first application uses artificial data; the second uses a data set analyzed previously by Tufte [15] and, more recently, by Chatterjee and Wiseman [6].
    Decision Sciences 06/2007; 16(4):333-342. DOI:10.1111/j.1540-5915.1985.tb01486.x · 1.36 Impact Factor
  • Sangit Chatterjee, Allen G. Greenwood
    ABSTRACT: Polynomial regression models have applications in the social sciences and in business research. Unfortunately, such models have a high degree of multicollinearity that creates problems with the statistical assessment of the model. In fact, the collinearity may be so severe that it could lead to an incorrect conclusion that some of the terms in the model are not statistically significant and should therefore be omitted from the model. This note provides a simple transformation to achieve orthogonality in polynomial models between the linear and quadratic terms, thereby eliminating the collinearity problem. It also shows that the same procedure does not achieve orthogonality for higher-order terms. An example data set is analyzed to show the benefits of such a procedure.
    Decision Sciences 06/2007; 21(1):241-245. DOI:10.1111/j.1540-5915.1990.tb00327.x · 1.36 Impact Factor
  • Sangit Chatterjee, Aykut Firat
    The American Statistician 02/2007; 61(August):248-254. · 0.88 Impact Factor
  • Frederick Wiseman, Sangit Chatterjee
    ABSTRACT: Researchers have investigated the relationship between different shot-making measures and performance on the PGA Tour. Prior studies have typically focused on a short period of time or used a restricted sample so long-term trends were not discernible. To remedy this situation, the present study looked at the longitudinal performance of professional golfers from 1990-2004. The findings indicated a remarkable stability in terms of the relative importance of Greens In Regulation and Putting Average in explaining the variability in Scoring Average. The findings also indicated a declining importance of driving in recent years due, in part, to a strengthening of the negative relationship between Driving Distance and Driving Accuracy.
    Perceptual and Motor Skills 03/2006; 102(1):109-17. DOI:10.2466/PMS.102.1.109-117 · 0.66 Impact Factor
  • Frederick Wiseman, Sangit Chatterjee
    ABSTRACT: This paper examines the relationship between team payroll and team performance in major league baseball from 1985 to 2002. The results indicate that the relationship has changed over time. Unlike the early years, there is now a much clearer relationship between payroll and performance. Specifically, in the latter part of the 1990s and continuing into the 21st century, the greater the team payroll and the more equally this payroll is distributed among team members, the better the on-field performance of the team. This is a problem of particular concern because of the growing disparity in team payrolls which, in turn, affects the competitive balance of the sport. This growing disparity was also at the heart of last year's contract negotiations between players and owners.
  • Sangit Chatterjee, Frederick Wiseman, Robert Perez
    ABSTRACT: The topic of improved performances by athletes in both team and individual sports has shown that each sport has its own unique set of characteristics and these have to be analysed accordingly. This paper presents an extensive analysis of the nature and extent of improvement in golf by analysing the performances of the top players in the Masters tournament throughout the entire history of the event. The results indicate that golfers are obtaining lower scores over time and that the variation of the scores has declined. Further, the distributions of scores are symmetric and display a monotonic reduction of peakedness (kurtosis). These findings are indicative of rapid and improved performance and increased competition.
    Journal of Applied Statistics 11/2002; 29(8):1219-1227. DOI:10.1080/0266476022000011283 · 0.45 Impact Factor
  • Frederick Wiseman, Sangit Chatterjee
    ABSTRACT: A dataset consisting of salaries of major league baseball players is published at the start of each season in USA Today, and is also made available on the Internet. It is argued that such an easily available dataset and those similar to it can be successfully used by students in a first statistics course for an interesting introduction to data analysis through summary measures and graphical displays. Such an approach is most natural for many students because of a strong interest in sports and economics. Other statistical ideas can be explored as a natural consequence of the discussions that ensue from such an analysis.
    The American Statistician 11/1997; 51(4):350-352. DOI:10.1080/00031305.1997.10474411 · 0.88 Impact Factor
  • ABSTRACT: The won-loss percentage for all games played in a season by NBA teams is modeled as a function of team performance statistics. Four variables (field goals, free throws, rebounds, and turnovers) are found to be statistically significant and account for more than 90% of the variation in the data. The regression coefficients also appear to be stable from year to year. Predictions for four years obtained from the estimates of one season show good agreement with the observed data. Forecasts for the 1993 season are also given based on mid-season statistics. Predictions for play-off winners are also briefly discussed. The analysis is carried out in the spirit of exploratory data analysis using graphical methods.
    Managerial and Decision Economics 09/1994; 15(5):521-535. DOI:10.1002/mde.4090150514
  • Sangit Chatterjee, A. Narayanan, Frederick Wiseman
    ABSTRACT: A new method for discriminating among multivariate populations, called the Hausdorff procedure, is introduced to the marketing literature. Rules for classification are defined and a limited simulation study is conducted. For the simulation, both the level of collinearity among the discriminating variables and the level of overlap among the populations are varied. The results indicate that this new procedure is particularly suitable when there is either a high degree of collinearity among the predictor variables or considerable overlap of the populations being investigated. The Hausdorff procedure is also applied to two sets of consumer data. In each instance, it is found to be superior to linear discriminant analysis with respect to the percentage of correct classifications.
    Marketing Letters 09/1993; 4(4):349-360. DOI:10.1007/BF00994353 · 0.63 Impact Factor
  • Sangit Chatterjee, Linda Jamieson, Frederick Wiseman
    ABSTRACT: At the mathematical level, a factor or principal component of a factor analysis is simply a linear combination of variables under some constraints. Therefore, as in regression analysis, there are conditions under which individual or joint observations can be influential in the sense that their presence or absence significantly influences the obtained values of the estimated factor loadings. The nature of these effects as well as potential effects due to “gross errors” in the data set should be investigated in order to determine which observations, if any, need to be analyzed separately or excluded entirely. The purpose of this paper is (1) to propose a new technique for identifying influential observations and observations containing “gross errors” and (2) to discuss situations under which each is likely to significantly alter the results of a factor analysis.
    Marketing Science 05/1991; 10(2):145-160. DOI:10.1287/mksc.10.2.145 · 2.36 Impact Factor
  • Ravi Sarathy, Sangit Chatterjee
    ABSTRACT: A comparison of the balance sheet structure of the largest U.S. and Japanese firms for 1979 shows that Japanese firms significantly differ from U.S. firms in their greater reliance on bank-funded short-term debt, low levels and composition of net working capital, parsimonious use of stockholder's equity, and greater commitment to long-term investments in other corporations.
    Journal of International Business Studies 09/1984; 15(3):75-89. DOI:10.1057/palgrave.jibs.8490496 · 3.56 Impact Factor
  • Sangit Chatterjee, Frederick Wiseman
    ABSTRACT: In a regression analysis there may be certain data points (which may or may not be outliers) that are influential in the sense that their presence or absence significantly influences the obtained values of the estimated regression coefficients. The nature of these effects needs to be analyzed in order to determine which, if any, data points should be removed from the data set in order to improve coefficient estimates. A relatively new technique for identifying influential data points is called regression diagnostics. In this presentation, the technique is discussed and its potential usefulness demonstrated by an application on a data set previously analyzed by Tufte (1974).
    American Journal of Political Science 08/1983; 27(3):601. DOI:10.2307/2110986 · 2.76 Impact Factor
  • Bruce Carson, Sangit Chatterjee, Frederick Wiseman
    ABSTRACT: Two simple statistics, a vector length and a vector angle, calculated from the Go board were investigated in order to determine whether they could be used to predict the type of game being played - a game between two amateurs or a game between two professionals. Results indicated the former variable was of predictive value, while the latter was not. Two classification schemes were used for prediction purposes -- linear discriminant analysis and Classification and Regression Trees (CART). The fact that the vector length was of predictive power should encourage others to calculate similar statistics from the Go board so as to bring about an increase in predictive power. This increase in predictive power may eventually lead to ideas in programming the game of Go.