Conference Paper

Implement of salary prediction system to improve student motivation using data mining technique


Abstract

This paper presents a salary prediction system that uses the profiles of graduated students as a model. A data mining technique is applied to generate a model that predicts the salary of an individual student whose attributes are similar to those in the training data. In this work, we also ran an experiment comparing five data mining techniques (Decision trees, Naive Bayes, K-Nearest neighbor, Support vector machines, and Neural networks) to find the most suitable technique for salary prediction. The experiment used 13,541 records of graduated student data with 10-fold cross-validation. The results showed that K-Nearest neighbor performed best as a model for salary prediction. For usage evaluation, a questionnaire survey was conducted with a sample of 50 users; the results showed that the system was effective in boosting students' motivation to study and gave them a positive outlook on their future. The survey also indicated that users were satisfied with the implemented system, since it was easy to use and the prediction results were simple to understand without requiring any background knowledge.
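The evaluation procedure named in the abstract, K-Nearest neighbor scored with 10-fold cross-validation, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two features (GPA and number of certificates), the "low"/"high" salary bands, the toy data, and the choice of k = 3 are all assumptions made for the example.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    # Each training row is (features, label); distance is squared Euclidean.
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(rows, k_folds=10, k=3, seed=0):
    """Return the mean accuracy over k_folds held-out folds."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows into k_folds roughly equal folds.
    folds = [rows[i::k_folds] for i in range(k_folds)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy data: (GPA, number of certificates) -> salary band.
rng = random.Random(42)
toy_rows = (
    [((rng.uniform(2.0, 2.8), rng.randint(0, 1)), "low") for _ in range(50)]
    + [((rng.uniform(3.2, 4.0), rng.randint(2, 4)), "high") for _ in range(50)]
)
mean_accuracy = cross_validate(toy_rows)
print(f"10-fold mean accuracy: {mean_accuracy:.3f}")
```

Each record is used for testing exactly once and for training in the other nine folds, which is why the averaged accuracy is a less optimistic estimate than scoring the model on its own training data.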


... Another goal is to detect several styles of learner behavior and forecast his performance [9]. One more goal is to forecast the student's salary after graduation based on the student's previous record and behavior during the study [10]. In general, services are considered products. ...
... k-Nearest Neighbors (k-NN), Naïve Bayes (NB), J48 Decision Tree (J48 DT), Multilayer Perceptron (MLP), and Support Vector Machine (SVM) were compared in terms of accuracy. As results show in [10], the k-NN algorithm is the most accurate algorithm with an 84.7 percent accuracy level. 297 students' records were used as a dataset in [18]. ...
... The Cross-Industry Standard Process for Data Mining (CRISP-DM) model was used. WEKA and (R tool) are data mining tools based on open-source language applied for statistical and data analysis. As an application of data mining techniques in the education field, Khongchai and Songmuang [10] created an incentive for students by predicting the learner's future salary. Learners are often bored with academic studies. ...
Article
Full-text available
Many business applications rely on their historical data to predict their business future. The marketing products process is one of the core processes for the business. Customer needs give a useful piece of information that helps to market the appropriate products at the appropriate time. Moreover, services have recently come to be considered products. The development of education and health services depends on historical data. Furthermore, reducing the problems and crimes of online social media networks requires a significant source of information. Data analysts need to use an efficient classification algorithm to predict the future of such businesses. However, dealing with a huge quantity of data requires a great deal of processing time. Data mining involves many useful techniques that are used to predict statistical data in a variety of business applications. The classification technique is one of the most widely used, with a variety of algorithms. In this paper, various classification algorithms are reviewed in terms of accuracy in different areas of data mining applications. A comprehensive analysis is made after delegated reading of 20 papers in the literature. This paper aims to help data analysts choose the most suitable classification algorithm for different business applications including business in general, online social media networks, agriculture, health, and education. Results show FFBPN is the most accurate algorithm in the business domain. The Random Forest algorithm is the most accurate in classifying online social networks (OSN) activities. The Naïve Bayes algorithm is the most accurate for classifying agriculture datasets. OneR is the most accurate algorithm for classifying instances within the health domain. The C4.5 Decision Tree algorithm is the most accurate for classifying students’ records to predict degree completion time.
... Machine learning implementations are growing in every organization for predicting their personnel working nature and the ability of the employee by calculating the time taken by the employee to complete the task [4]. A lot of studies [3, 5-8] have been published recently to evaluate how well different classification algorithms fare in forecasting employee salary classifications. Among these algorithms are Bayesian belief networks, naïve Bayes, support vector machines, decision trees, and neural networks [9]. ...
Article
Full-text available
Machine learning implementations are growing in every organization for forecasting their employees' working nature and competence by calculating the time taken by the employee to complete the task. Numerous recent studies have been published to evaluate the effectiveness of various classification algorithms for predicting employee salary classes. From the machine learning perspective, salary prediction is a difficult task due to the small sample size, relatively high dimensionality, and presence of noise. To address this and to find more useful features, deeper architectures are required. Additionally, more data analysis and data processing can be applied through feature extraction techniques to make the prediction model go beyond the correlation and precision standards. Hence, this study proposes an enhanced method for salary prediction that selects a subset of characteristics from all available data using a PCA system and a deep neural network (DNN) model for the classification process. When assessed against other classical machine learning methods such as DT and RF, the proposed DNN model achieves better classification accuracy, precision, recall, and F-score. Furthermore, the proposed DNN model achieves the highest MAE of 94.9% as compared to DT and RF, which attain an MAE score of 89.6% and 76.4% respectively. This result suggests that the proposed model has a prediction error of 5.1%, which is lower than that of DT and RF, which have prediction errors of as much as 10.4% and 23.6%, thereby signifying the dominance of the deep learning algorithm over conventional machine learning algorithms in the salary classification and prediction task.
... Following mandatory data cleansing, the final dataset included 1170 instances with ten features, resulting in a final response rate of 49.76%. Khongchai and Songmuang (2016) have suggested experimenting with machine learning techniques like Naive Bayes, K-Nearest neighbor, Support vector machines, and Neural networks to advance the prediction of salaries and enhance student motivation. To partition the train and test datasets, a resampling technique of 10-fold cross-validation was employed. ...
Article
India’s young and dynamic workforce is a boon to the economy, but the challenge of high employee turnover looms large. To sustain viability and attract top talent, organizations must prioritize the well-being and growth of their workforce. Providing opportunities for development and nurturing emerging talent ensures a continuous influx of fresh ideas. To retain employees who value fair remuneration, organizations must devise a comprehensive compensation strategy aligned with market trends and employee expectations. In the cut-throat business world, attracting and retaining top talent is a crucial differentiator, and companies must adapt to the evolving needs of their employees to provide an environment that fosters growth, innovation, and job satisfaction. The present study aims to apply five distinct machine learning algorithms to a sample of 1,170 IT workers from 61 enterprises in the NCR region to investigate the human capital characteristics that influence remuneration in Indian IT companies. The study’s results indicate that the Random Forest model performed better than other models in predicting IT compensation based on the selected performance metric. Specifically, the study highlights experience, the candidate’s alma mater, education, and the individual’s skill set as the most significant predictors of compensation design. The study has noteworthy implications for job seekers and firms seeking to attract top talent. However, the present research could not utilize a deep learning model due to a lack of data, and future research could investigate institutional factors. Finally, four agendas have been outlined to provide additional direction for future research in this area.
... Das et al. (2020) identified experience and job position as crucial factors that impact compensation. Khongchai and Songmuang (2016) experimented with several MLM such as naive Bayes, K-nearest neighbor, support vector machines and neural networks, eventually employing random forests to develop a salary prediction system aimed at enhancing student motivation. This system implemented the resampling technique of 10-fold cross-validation to divide the data set into training and testing subsets. ...
Article
Purpose: Amidst the turbulent tides of geopolitical uncertainty and pandemic-induced economic disruptions, the information technology industry grapples with alarming attrition and aggravating talent gaps, spurring a surge in demand for specialized digital proficiencies. Leveraging this imperative, firms seek to attract and retain top-tier talent through generous compensation packages. This study introduces a holistic, integrated theoretical framework integrating machine learning models to develop a compensation model, interrogating the multifaceted factors that shape pay determination.
Design/methodology/approach: Drawing upon a stratified sample of 2488 observations, this study determines whether compensation can be accurately predicted via constructs derived from the integrated theoretical framework, employing various cutting-edge machine learning models. This study culminates in discovering a random forest model, exhibiting 99.6% accuracy and 0.08° mean absolute error, following a series of comprehensive robustness checks.
Findings: The empirical findings of this study have revealed critical determinants of compensation, including but not limited to experience level, educational background, and specialized skill-set. The research also elucidates that gender does not play a role in pay disparity, while company size and type hold no consequential sway over individual compensation determination.
Practical implications: The research underscores the importance of equitable compensation to foster technological innovation and encourage the retention of top talent, emphasizing the significance of human capital. Furthermore, the model presented in this study empowers individuals to negotiate their compensation more effectively and supports enterprises in crafting targeted compensation strategies, thereby facilitating sustainable economic growth and helping to attain various Sustainable Development Goals.
Originality/value: The cardinal contribution of this research lies in the inception of an inclusive theoretical framework that persuasively explicates the intricacies of a machine learning-driven remuneration model, ennobled by the synthesis of diverse management theories to capture the complexity of compensation determination. However, the generalizability of the findings to other sectors is constrained as this study is exclusively limited to the IT sector.
... Martín et al. [4] report that Random Forest and "ensembles" perform better among several regression and classification algorithms for predicting the salaries of job offers in the information technology sector using sectoral features. Khongchai and Songmuang [5] study salary estimation from students' profiles, where the past salaries are normalized using linear equations. KNN is reported as the best-performing technique against Decision Tree (J48), Naïve Bayes, Multilayer Perceptron, and SVM. ...
Conference Paper
Knowing the salary range of a position is beneficial for both job seekers and employers. This work examines the performance of different machine learning methods on salary estimation using industrial variables. The methods are applied to a dataset obtained from Turkey’s largest employment platform, Kariyer.net. We perform various exploratory analyses of the data, then use feature engineering techniques to improve the quality of the training data. The effect of the heavy-tailed distribution of salaries is mitigated with various response variable transformations. Because the data come from different time periods, a timeliness standardization is performed using inflation rates. Analyses and experiments show that standardization does not have a significant effect on the performance of the model. In contrast, response variable transformation seems to have a significant effect. As for the models, we conclude that XGBoost and artificial neural networks achieve the highest success.
... Security plays a major role in such operations in the institutions. In today's world security of systems should not be restricted to only passwords [9]. ...
Article
Full-text available
In today's world, a large amount of data is generated in organizations. Analyzing and properly utilizing this data is essential. Financial data is the backbone of any educational institute or organization. It portrays a broader picture of the current status and the future prospects of the organization. Only if an organization is financially stable can it survive in the market for a longer period of time. To achieve the status of a financially strong organization, predicting financial trends for the coming future is a necessity. Predicting financial trends requires examining financial data sets to draw conclusions about the behavior of the information they contain, increasingly with the aid of specialized systems and software. All of this can be achieved by Data Analytics. Data Analytics predominantly refers to an assortment of applications, from basic intelligence, reporting and online analytical processing to various forms of analytics. In this paper we propose Data Analytics as a method for performance evaluation over an educational organization's financial data.
... Among all, Random forest performs best with 89% accuracy. It has been proved that Random Forest [10] generates the best efficiency model for prediction as compared to performing decision tree and C4.5. Evaluations have been done by confusion matrix. Confusion matrix is a table which divides and tells how many instances are correctly predicted and how many are not. ...
Article
Full-text available
One of the major concerns of the students after graduation is the job opportunities offered to them. Not only students, but also the universities are inclined towards maximizing the job offers for their students through campus recruitment drives. Against this background, the scope of this study is to gauge the performance of top four known classification techniques of data mining, which are, Decision tree, Random forest, Naive Bayes & KNN. These machine learning algorithms are applied on students’ data, collected from the university database of Manipal University Jaipur and student models are created which will predict the employability status of students in future and discover factors which will significantly contribute to their employability. After applying and studying the accuracies of these algorithms, we have found that Random forest behaves better than the rest of the algorithms with 89% accuracy.
... Khongchai P and Songmuang [5] illustrated the interface of salary prediction which contains several attributes like gender, job training, certification, and GPA which the system compared and displayed the predicted salary of 3 graduated people. They have collated the different data mining techniques in which the highest accuracy was predicted for K-nearest neighbor and lowest for Multilayer perceptron. ...
Article
Full-text available
Regression analysis is extensively used for prediction and prognostication, and its use has substantial overlap with the domain of machine learning. The main objective of this paper is to compare the performance of two regression techniques namely Simple Linear Regression (SLR) and Multiple Linear Regression (MLR) algorithms by two cases: predicting the salary of employees after certain years and predicting the prices of real estates. An employee’s salary depends on numerous factors, such as total employee experience, certifications, and overall experience as a lead and manager. The factors in predicting house prices are the area of land (sqft_living), condition, waterfront, number of bedrooms, and so on. The dataset used in this experiment is an open-source dataset from KaggleInc. The algorithms were compared using parameters like R-squared value, Mean absolute error (MAE), Mean Squared Error (MSE), Median Absolute Error (MDAE), Variance Score, and Root Mean Square Error (RMSE). Results have shown that MLR provides the better efficiency in comparison to SLR.
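The evaluation measures listed in this abstract can be computed directly from paired true and predicted values. The sketch below uses the standard textbook definitions of each metric; the function name `regression_metrics` and the sample numbers are illustrative assumptions and do not come from the paper or its Kaggle dataset.

```python
import math
from statistics import mean, median, pvariance


def regression_metrics(y_true, y_pred):
    """Compute common regression error measures from paired values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mse = mean(e * e for e in errors)
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    return {
        "MAE": mean(abs_errors),                      # mean absolute error
        "MSE": mse,                                   # mean squared error
        "RMSE": math.sqrt(mse),                       # root mean square error
        "MDAE": median(abs_errors),                   # median absolute error
        "R2": 1 - ss_res / ss_tot,                    # R-squared
        # Explained variance ("variance score"): 1 - Var(error) / Var(y).
        "VarianceScore": 1 - pvariance(errors) / pvariance(y_true),
    }


# Hypothetical values, for illustration only.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

R-squared and the variance score coincide whenever the errors average to zero, as they do in this sample; they diverge when the predictions are systematically biased.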
... For example, Lazar [2004] used support vector machine on survey data provided by the US Census Bureau for income prediction. Khongchai and Songmuang [2016] applied five methods including decision trees, K-nearest neighbor, neural networks, support vector machines, and naive Bayes for predicting salaries of individual students. Recently, LinkedIn salary prediction has been attracting increasing attention [Kenthapadi et al. 2017]. ...
Article
Employer Brand Evaluation (EBE) is to understand an employer’s unique characteristics to identify competitive edges. Traditional approaches rely heavily on employers’ financial information, including financial reports and filings submitted to the Securities and Exchange Commission (SEC), which may not be readily available for private companies. Fortunately, online recruitment services provide a variety of employers’ information from their employees’ online ratings and comments, which enables EBE from an employee’s perspective. To this end, in this article, we propose a method named Company Profiling–based Collaborative Topic Regression (CPCTR) to collaboratively model both textual (i.e., reviews) and numerical information (i.e., salaries and ratings) for learning latent structural patterns of employer brands. With identified patterns, we can effectively conduct both qualitative opinion analysis and quantitative salary benchmarking. Moreover, a Gaussian processes--based extension, GPCTR, is proposed to capture the complex correlation among heterogeneous information. Extensive experiments are conducted on three real-world datasets to validate the effectiveness and generalizability of our methods in real-life applications. The results clearly show that our methods outperform state-of-the-art baselines and enable a comprehensive understanding of EBE.
... The system uses profiles of graduated students as the training set. Also, they use a survey to evaluate the system and demonstrate its clarity and ease of use [9,10,11]. Using salary information mentioned in the job description as the training set, we propose a system that predicts the salary of automotive job openings based on the requirements of the jobs. Our system compares Decision trees, Gradient boosting and Random Forest regressors to predict the salary of a job. ...
... They do classification to predict the student's performance in math subject [6]. Classification has also been implemented in a salary prediction system to improve students' motivation [7]. A recommendation system built on prediction can also be used to help a student choose courses of interest [8]. ...
Chapter
Full-text available
This research was carried out in the Biology, Chemistry and Laboratory program of the Universidad Nacional de Chimborazo, Ecuador, with the aim of studying the impact and benefits of the laboratory as a teaching strategy for the teaching-learning process of macromolecules. The research design is quantitative and pre-experimental; the participating sample was drawn from students in the sixth and seventh semesters of the degree. The research was carried out in two phases. In phase 1, a diagnostic questionnaire on laboratory use was administered in order to learn about the frequency of the work, the relevance of the guides used, the characteristics of the experimental activities, the objectives pursued by the practical sessions, and the difficulties encountered at the start of the intervention. Phase 2 sought to develop general and specific competencies in the Biochemistry course on topics related to macromolecules; to this end, a scale recording the impact of the laboratory as a teaching strategy for the teaching-learning process of macromolecules was applied (pre-test), various types of laboratory activities were carried out, and the achievement of the competencies was then assessed again (post-test). The post-test results were significant: students' knowledge of macromolecules was strengthened and their skills for teamwork and laboratory work improved. The research concludes that laboratory practice improves the development of competencies in learning Biochemistry, specifically in relation to macromolecules.
Article
Full-text available
The number of employees increases as companies grow. Firms raise their employees' salaries primarily to avoid losing talent and, moreover, to attract more of it. Although deciding how to increase salaries poses little difficulty in small organizations, in large organizations the process must be carried out carefully with respect to many parameters and should not produce negative outcomes that may disrupt employee motivation. For companies with a large number of employees, creating a model in which market conditions are determined correctly and all economic parameters are taken into account requires a process that takes months of work. In this context, this study designed a machine learning-based salary increase prediction system. Specific attributes were determined and a specific scale was developed for the performance score in this study.
Book
Full-text available
This book deals precisely with 21st-century education, and each author, in each chapter, makes every effort to contribute, from the results of their research, to realizing the principles and guidelines that the term implies. The reader will notice that the objectives, theories and results of the works it contains have a single purpose: to contribute to achieving a revolution in the education system. The ultimate goal is for the new category of students to develop the skills, abilities and capacities needed to perform, first as people and then as professionals, in the New World Order. Otherwise we are putting the very survival of the species at risk, because this category of students will be responsible for solving the complex problems that we are bequeathing to them.
Chapter
Full-text available
The need to develop digital competencies and promote STEM careers has become evident as a response to the prevailing educational system, a product of the pandemic and of the inequalities in our country. STEM teaching, which since the beginning of the century has fostered interdisciplinary teaching, is characterized by the incorporation of ICT into teaching and learning processes. This work evaluates the entry-level digital competencies of students in initial training as Chemistry teachers, with an emphasis on STEM competencies, and the commitments made in the curricula to train teachers in this context. The mixed-methods research was carried out with first-semester students, applying the Cuestionario de Competencia Digital del Alumnado de Educación Superior (CDAES) to determine digital competencies, and a Matrix of Analysis of A Priori Categories contained in the programs that commit to training in this area was developed. The students reported having basic, rather instrumental competencies, with no major gender differences. The analysis of the first-year programs shows that the digital competencies diverge from the competencies stated by MINEDUC, but strengthen intermediate levels that would allow more complex cognitive levels to be achieved as the curriculum advances. In conclusion, the students reported a positive attitude toward digital competencies, which should be developed progressively during their initial teacher training (FID), so that future Chemistry teachers have the digital teaching competencies that enable them to address their classroom challenges and encourage their students to pursue STEM careers that allow progress toward sustainability.
Article
In this paper an attempt has been made to develop a quantitative approach for predicting the factors that affect the salary of an individual. The Aspiring Minds’ Employability Outcomes (AMEO-2015) dataset, consisting of Aspiring Minds’ Computer Adaptive Test (AMCAT) scores along with job seekers' personal and employment details for Indian students, has been considered for the study. It has been observed from the analysis that B.Tech is the most preferred course in India, with the Electronics and Communication engineering stream as the most preferred branch, with the highest package around 13 lakhs per annum and an average package around 5 lakhs, but with 50% of engineers underemployed. It is observed that there is no linear relation between college score and salary; many other factors play a role in determining the different salaries of students who have the same college scores. In order to analyze the effect of more than one independent variable on the dependent variable, a multiple linear regression model has been applied. The model has been used on the training data to predict dependent variables and to extract the features with the highest impact on salary prediction. It has been observed that quant and logical scores are the best predictors of salary. The developed model has a root relative squared error of 82.3056%. The study concludes that efforts are required for developing skills, with amendments to educational policies and course curricula.
Article
Full-text available
Using data from the 2006 Survey of Recent College Graduates, this study examines how education–job match and salary may explain recent college graduates’ job satisfaction in the public, non-profit, and for-profit sectors. The results imply that while education–job match increases job satisfaction in all three sectors, for-profit workers may compensate the loss in job satisfaction due to poor match with increased satisfaction from higher salary. The findings suggest that, in the public and non-profit sectors, increased salary cannot make up the loss in job satisfaction from poor education–job match as much as it does in the for-profit sector.
Article
Full-text available
This paper proposes to apply data mining techniques to predict school failure and dropout. We use real data on 670 middle-school students from Zacatecas, México, and employ white-box classification methods, such as induction rules and decision trees. Experiments attempt to improve their accuracy for predicting which students might fail or drop out by first using all the available attributes; next, selecting the best attributes; and finally, rebalancing data and using cost-sensitive classification. The outcomes have been compared and the models with the best results are shown.
Article
Full-text available
Educational data mining (EDM) is an emerging interdisciplinary research area that deals with the development of methods to explore data originating in an educational context. EDM uses computational approaches to analyze educational data in order to study educational questions. This paper surveys the most relevant studies carried out in this field to date. First, it introduces EDM and describes the different groups of user, types of educational environments, and the data they provide. It then goes on to list the most typical/common tasks in the educational environment that have been resolved through data-mining techniques, and finally, some of the most promising future lines of research are discussed.
Article
Full-text available
Feedforward neural networks trained by error backpropagation are examples of nonparametric regression estimators. We present a tutorial on nonparametric inference and its relation to neural networks, and we use the statistical viewpoint to highlight strengths and weaknesses of neural models. We illustrate the main points with some recognition experiments involving artificial data as well as handwritten numerals. In way of conclusion, we suggest that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues. Furthermore, we suggest that the fundamental challenges in neural modeling are about representation rather than learning per se. This last point is supported by additional experiments with handwritten numerals.
Article
Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison of twelve feature selection methods (e.g., Information Gain) evaluated on a benchmark of 229 text classification problem instances that were gathered from Reuters, TREC, OHSUMED, etc. The results are analyzed from multiple goal perspectives (accuracy, F-measure, precision, and recall), since each is appropriate in different situations. The results reveal that a new feature selection metric we call 'Bi-Normal Separation' (BNS) outperformed the others by a substantial margin in most situations. This margin widened in tasks with high class skew, which is rampant in text classification problems and is particularly challenging for induction algorithms. A new evaluation methodology is offered that focuses on the needs of the data mining practitioner faced with a single dataset who seeks to choose one (or a pair of) metrics that are most likely to yield the best performance. From this perspective, BNS was the top single choice for all goals except precision, for which Information Gain yielded the best result most often. This analysis also revealed, for example, that Information Gain and Chi-Squared have correlated failures, and so they work poorly together. When choosing optimal pairs of metrics for each of the four performance goals, BNS is consistently a member of the pair; e.g., for greatest recall, the pair BNS + F1-measure yielded the best performance on the greatest number of tasks by a considerable margin.
Conference Paper
This paper proposes an LMS (learning management system) with an intelligent agent that provides effective adaptive messages to a learner. The unique features of this paper are as follows. The proposed agent system has a learner model, which is automatically and continually constructed by applying a decision tree model built from the learning history data stored in the database. The constructed learner model predicts a learner's future final status (1. Failed, 2. Abandon, 3. Successful, 4. Excellent) using his/her current learning history data. The constructed learner model becomes more exact as the amount of data accumulated in the database increases. The agent system presents the optimal instructional message based on the learner's predicted future state. The agent provides attention cues, following Ueno (2004), at the moment when a learner begins to be bored with his/her learning. In addition, this paper demonstrates the effectiveness of this system through actual e-learning classes.
Article
The ability to predict students' marks could be useful in a great number of different ways associated with university-level learning. In this study, student mark prediction models have been developed using institutional internal databases and external open data sources. The results of an empirical study on predicting undergraduate students' first-year marks show that prediction models based on both institutional internal and external data sources are more accurate than models based on institutional internal data sources alone. Moreover, this study identifies external data sources (such as National Student Survey results) as among the best predictors of students' marks. Also, we found that students' first-semester performance is the most informative for their first-year performance. We envisage that results such as the ones described in this study may increasingly improve the design of future student predictive models to support students in performing better in their studies.
Article
The authors' purpose was to investigate the predictive value of faculty salaries on outgoing salaries of master of business administration (MBA) students when controlling for other student and program variables. Data were collected on 976 MBA programs using Barron's Guide to Graduate Business Schools over the years 1988–2005 and the Princeton Review's The Best 295 Business Schools 2014 edition. A hierarchical linear regression analysis was conducted with student and program characteristics as control variables, faculty salary as the predictor variable, and average outgoing salary as the dependent variable. In general, higher faculty salaries were associated with higher starting salaries for MBA students upon graduation. Potential explanations and limitations are discussed.
Chapter
In the history of research of the learning problem one can extract four periods that can be characterized by four bright events: (i) Constructing the first learning machines, (ii) constructing the fundamentals of the theory, (iii) constructing neural networks, (iv) constructing the alternatives to neural networks.
Article
Several studies have considered whether American college students hold ‘realistic’ wage expectations. The consensus is that they do not – overestimation of future earnings is in the region of 40–50%. But is it just college students who overestimate the success they will have in the labor market, or is this something common to all young adults? In this paper, I analyze National Educational Longitudinal Study (1988) data to consider whether 20-year-old college men are more realistic about their future income than their peers (of the same age) who are already in the labor force. My findings suggest that young people in employment actually make worse predictions of their future income (on average) than certain student groups, so long as the latter successfully obtain a university degree.
Article
Infants and young children appear to be propelled by curiosity, driven by an intense need to explore, interact with, and make sense of their environment. As one author puts it, "Rarely does one hear parents complain that their pre-schooler is 'unmotivated'" (James Raffini 1993). Unfortunately, as children grow, their passion for learning frequently seems to shrink. Learning often becomes associated with drudgery instead of delight. A large number of students--more than one in four--leave school before graduating. Many more are physically present in the classroom but largely mentally absent; they fail to invest themselves fully in the experience of learning. Awareness of how students' attitudes and beliefs about learning develop and what facilitates learning for its own sake can assist educators in reducing student apathy.
What Is Student Motivation?
Student motivation naturally has to do with students' desire to participate in the learning process. But it also concerns the reasons or goals that underlie their involvement or noninvolvement in academic activities. Although students may be equally motivated to perform a task, the sources of their motivation may differ. A student who is intrinsically motivated undertakes an activity "for its own sake, for the enjoyment it provides, the learning it permits, or the feelings of accomplishment it evokes" (Mark Lepper 1988). An extrinsically motivated student performs "in order to obtain some reward or avoid some punishment external to the activity itself," such as grades, stickers, or teacher approval (Lepper). The term motivation to learn has a slightly different meaning. It is defined by one author as "the meaningfulness, value, and benefits of academic tasks to the learner--regardless of whether or not they are intrinsically interesting" (Hermine Marshall 1987).
Another notes that motivation to learn is characterized by long-term, quality involvement in learning and commitment to the process of learning (Carole Ames 1990).
Article
Several techniques for discriminant analysis are applied to a set of data from patients with severe head injuries, for the purpose of prognosis. The data are such that multidimensionality, continuous, binary and ordered categorical variables and missing data must be coped with. The various methods are compared using criteria of prognostic success and reliability. In general, performance varies more with choice of the set of predictor variables than with that of the discriminant rule.
Book
Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Article
Neural network ensemble is a learning paradigm where many neural networks are jointly used to solve a problem. In this paper, the relationship between the ensemble and its component neural networks is analyzed from the context of both regression and classification, which reveals that it may be better to ensemble many instead of all of the neural networks at hand. This result is interesting because at present, most approaches ensemble all the available neural networks for prediction. Then, in order to show that the appropriate neural networks for composing an ensemble can be effectively selected from a set of available neural networks, an approach named GASEN is presented. GASEN trains a number of neural networks at first. Then it assigns random weights to those networks and employs genetic algorithm to evolve the weights so that they can characterize to some extent the fitness of the neural networks in constituting an ensemble. Finally it selects some neural networks based on the evolved weights to make up the ensemble. A large empirical study shows that, compared with some popular ensemble approaches such as Bagging and Boosting, GASEN can generate neural network ensembles with far smaller sizes but stronger generalization ability. Furthermore, in order to understand the working mechanism of GASEN, the bias-variance decomposition of the error is provided in this paper, which shows that the success of GASEN may lie in that it can significantly reduce the bias as well as the variance.
Conference Paper
We report on the use of various Machine Learning algorithms on an electronic database of breast cancer patients. The task was to predict breast cancer recurrence using a short subset of clinical attributes such as tumor presence, tumor size, invasive nature of tumor, number of lymph nodes involved, severity of lymphedema and stage of tumor. The predictive accuracies over fifty runs, employing test sets not used to build the model, were 63.4% (Cart), 63.9% (C45), 62.5% (C45rules), 66.4% (FOCL) and 68.3% (Naive Bayes). An extension of the model using additional features and larger datasets is contemplated.
Article
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the rankings with respect to a learning scheme to find the best attributes. Results are reported for a selection of standard data sets and two learning schemes, C4.5 and naive Bayes.
Article
Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant, and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes, C4.5 and naive Bayes.
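As a rough illustration of what an attribute ranking looks like, the sketch below scores each attribute by information gain and sorts the attributes by that score. Information gain is an assumption chosen here for simplicity; the abstract does not say which scoring methods the benchmark compares. The toy data and the names `entropy`, `information_gain`, and `rank_attributes` are hypothetical.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy after splitting the rows on one attribute."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return attribute names sorted from most to least informative."""
    attributes = list(rows[0].keys())
    return sorted(attributes, key=lambda a: information_gain(rows, labels, a), reverse=True)


# Hypothetical toy data: "outlook" determines the label, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
ranking = rank_attributes(rows, labels)
```

In a setup like the one the abstract describes, the top-k prefix of such a ranking would then be evaluated by cross-validation with a learner to choose k; that step is omitted here.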
Article
The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error $R$ of such a rule must be at least as great as the Bayes probability of error $R^{\ast}$, the minimum probability of error over all decision rules taking underlying probability structure into account. However, in a large sample analysis, we will show in the $M$-category case that $R^{\ast} \leq R \leq R^{\ast}(2 - MR^{\ast}/(M-1))$, where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
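The rule and the bound stated in this abstract can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and a tiny hand-made data set; the function names `nearest_neighbor_classify` and `nn_error_upper_bound` are hypothetical and do not come from the paper.

```python
import math


def nearest_neighbor_classify(train_points, train_labels, query):
    """Return the label of the training point closest to `query` (1-NN rule)."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),  # Euclidean distance (assumed)
    )
    return train_labels[best_index]


def nn_error_upper_bound(bayes_error, num_classes):
    """Large-sample upper bound R*(2 - M R*/(M - 1)) on the 1-NN error rate."""
    m = num_classes
    return bayes_error * (2 - m * bayes_error / (m - 1))


# Hypothetical toy data: two "low" points near the origin, one "high" point.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = ["low", "low", "high"]
predicted = nearest_neighbor_classify(points, labels, (4.0, 4.5))

# With a Bayes error of 0.1 and M = 2 classes, the bound is 0.1 * (2 - 0.2),
# which stays below twice the Bayes error.
bound = nn_error_upper_bound(0.1, 2)
```

The bound function only evaluates the right-hand side of the inequality; it does not estimate the Bayes error, which is generally unknown in practice.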
Predicting School Failure and Dropout by Using Data Mining Techniques
  • V Marquez
  • Carlos
Osiris V., "Feature Selection and Classification Methods for Decision Making: A Comparative Analysis", College of Engineering and Computing Nova Southeastern University, 2015.