
Halil Altay Güvenir
- Professor
- Professor at Bilkent University
About
129
Publications
28,556
Reads
2,691
Citations
Introduction
Halil Altay Güvenir currently works at the Department of Computer Engineering, Bilkent University. Halil Altay does research in Computer Engineering, Data Mining and Computing in Mathematics, Natural Science, Engineering and Medicine.
Current institution
Additional affiliations
January 1987 - September 2019
September 1987 - present
Education
August 1982 - July 1987
Publications (129)
In recent years, the growing popularity of cryptocurrencies has also attracted the attention of investors. Investors who want to put their money into cryptocurrencies, a speculative investment instrument, are investing in these highly volatile currencies. However, many of the cryptocurrencies, whose number has grown rapidly in recent years to reach the thousands,...
This paper presents a method for the detection of animated scenes in movie trailers. Regardless of the studio, artists, and the unique features of diverse animation creation techniques, machine learning tools can provide concise detection methods of animated scenes in movie components. A dataset is prepared by selecting scenes from trailers using s...
Background: Selecting the ideal code reviewer in modern code review is a crucial first step to perform effective code reviews. There are several algorithms proposed in the literature for recommending the ideal code reviewer for a given pull request. The success of these code reviewer recommendation algorithms is measured by comparing the recommende...
This study proposes a robust similarity score-based time series feature extraction method that is termed as Window-based Time series Feature ExtraCtion (WTC). Specifically, WTC generates domain-interpretable results and involves significantly low computational complexity thereby rendering itself useful for densely sampled and populated time series...
Aims
The aims of this study include (i) pursuing data-mining experiments on the Angiotensin II-Antagonist in Paroxysmal Atrial Fibrillation (ANTIPAF-AFNET 2) trial dataset containing atrial fibrillation (AF) burden scores of patients with many clinical parameters and (ii) revealing possible correlations between the estimated risk factors of AF and...
Aims
Data mining is the computational process to obtain information from a data set and transform it for further use. Herein, through data mining with supportive statistical analyses, we identified and consolidated variables of the Flecainide Short-Long (Flec-SL—AFNET 3) trial dataset that are associated with the primary outcome of the trial, recur...
Objective:
The aim of the presented study is to investigate the impact of progesterone change in the late follicular phase on the pregnancy rates of both agonist and antagonist protocols in normoresponders.
Study design:
A total of 201 normoresponder patients, who underwent embryo transfer were consecutively selected. 118 patients were stimulate...
In medicine, estimating the chance of success for treatment is important in deciding whether to begin the treatment or not. This paper focuses on the domain of in vitro fertilization (IVF), where estimating the outcome of a treatment is crucial in the decision to proceed with treatment for both the clinicians and the infertile couples. IVF tre...
Ex vivo recorded action potentials (APs) in human right atrial tissue from patients in sinus rhythm (SR) or atrial fibrillation (AF) display a characteristic spike-and-dome or triangular shape, respectively, but variability is huge within each rhythm group. The aim of our study was to apply the machine-learning algorithm ranking instances by maximi...
Objectives: Atrial fibrillation (AF) is the most common sustained arrhythmia in the general population. Although several studies reported a high incidence and prevalence of AF in haemodialysis patients, there is no substantial data on thromboembolic risk scales such as CHADS2 or CHADS-Vasc predicting stroke and mortality. The aim of this analysis was to...
Objective: Atrial fibrillation (AF) is the most prevalent sustained cardiac arrhythmia and constitutes a major public health problem. Patients with AF often have a variety of co-morbidities and need frequent hospitalizations. The present retrospective cohort study used medical claims data to evaluate the rates of hospitalization in patients with AF...
Many machine learning algorithms require the features to be categorical. Hence, they require all numeric-valued data to be discretized into intervals. In this paper, we present a new discretization method based on the area under the receiver operating characteristic (ROC) curve (AUC) measure. Maximum area under ROC curve-based discretization (MAD) is a global, s...
The aim of this study was to validate the additive and logistic European System for Cardiac Operative Risk Evaluation (EuroSCORE) models on the Turkish adult cardiac surgical population.
The TurkoSCORE project involves a reliable web-based database to build up Turkish risk stratification models. The current patient population consisted of 9443 adult patients who u...
Voting features based classifiers, VFC for short, have been shown to perform well on most real-world data sets. They are robust to irrelevant features and missing feature values. In this paper, we introduce an extension to VFC, called voting features based classifier with feature construction, VFCC for short, and show its application to the problem o...
In a typical application of association rule learning from market basket data, a set of transactions for a fixed period of time is used as input to rule learning algorithms. For example, the well-known Apriori algorithm can be applied to learn a set of association rules from such a transaction set. However, learning association rules from a set of...
Data mining is the efficient discovery of patterns in large databases, and classification rules are perhaps the most important type of patterns in data mining applications. However, the number of such classification rules is generally so large that selection of interesting ones among all discovered rules becomes an important task. In this paper, fa...
Classification learning is an important research topic in machine learning and data mining disciplines. In our study, CUFP (Classification by Using Feature Projections), a feature projection-based incremental classification-learning algorithm, was developed and tested on real world data sets, giving promising results. The training phase of CUFP con...
Data mining is the efficient discovery of patterns in large databases, and classification rules are perhaps the most important type of patterns in data mining applications. However, the number of such classification rules is generally so large that selection of interesting ones among all discovered rules becomes an important task. In this paper, fa...
Knowledge base verification, a part of the validation process in expert system development, includes checking the knowledge base for completeness and consistency to guard against a variety of errors that can arise during the process of transferring expertise from a human expert to a computer system. Regardless of how an expert system is developed,...
The cooperating experts approach attempts to integrate and coordinate the activities of multiple specialised problem solvers that come together to solve complex tasks such as design, medical diagnosis, business management and so on. Due to the different goals, knowledge and viewpoints of agents, conflicts may arise at any phase of the problem-solving p...
This paper proposes a learning mechanism to acquire structural correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond to the similar parts of the s...
Inducing classification rules on domains from which information is gathered at regular periods generally yields so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, using the newly gathered information from the domain, the new classification rule...
A new classification algorithm, called benefit maximizing classifier on feature projections (BCFP), is developed and applied to the problem of diagnosis of gastric carcinoma. The domain contains records of patients with known diagnosis through gastroscopy results. Given a training set of such records, the BCFP classifier learns how to differentiate...
A new instance-based learning method is presented for regression problems with high-dimensional data. As an instance-based approach, the conventional method, KNN, is very popular for classification. Although KNN performs well on classification tasks, it does not perform as well on regression problems. We have developed a new instance-based method,...
This paper describes a machine learning method, called Regression by Feature Projections (RFP), for predicting a real-valued target feature. In RFP, training is based on simply storing the projections of the training instances on each feature separately. Prediction is computed through two approximation procedures. The first approximation process is...
In some domains, the cost of a wrong classification may be different for all pairs of predicted and actual classes. Also the benefit of a correct prediction is different for each class. In this paper, a new classification algorithm, called BCFP (for Benefit Maximizing Classifier on Feature Projections), is presented. The BCFP classifier learns a se...
There is a great need for classification methods that can properly handle asymmetric cost and benefit constraints of classifications. In this study, we aim to emphasize the importance of classification benefits by means of a new classification algorithm, Benefit-Maximizing classifier with Feature Intervals (BMFI) that uses feature projection based...
Many researchers have worked on example-based machine translation and different techniques have been investigated in the area. In the literature, a method of using translation templates learned from bilingual example pairs was proposed. The paper investigates the possibility of applying the same idea for close languages where word order is preserved. I...
There is a great need for classification methods that can properly handle asymmetric cost and benefit constraints of classifications. In this study, we aim to emphasize the importance of classification benefits by means of a new classification algorithm, Benefit Maximizing classifier with Feature Intervals (BMFI) that uses feature projection based...
The expansion of on-line text with the rapid growth of the Internet calls for data mining techniques to reveal the information embedded in these documents. Text classification and text summarization are therefore two of the most important application areas. In this work, we attempt to integrate these two techniques to help the user to compi...
Due to the increase in data mining research and applications, selection of interesting rules among a huge number of learned rules is an important task in data mining applications. In this paper, the metrics for the interestingness of a rule are investigated and an algorithm that can classify the learned rules according to their interestingness is de...
A new classification algorithm, called BCFP (for Benefit Maximizing Classifier on Feature Projections), is developed and applied to the problem of diagnosis of gastric carcinoma. The domain contains records of patients with known diagnosis through gastroscopy results. Given a training set of such records, the BCFP classifier learns how to different...
Clustered linear regression (CLR) is a new machine learning algorithm that improves the accuracy of classical linear regression by partitioning the training space into subspaces. CLR makes some assumptions about the domain and the data set. First, the target value is assumed to be a function of the feature values. The second assumption is that there are some lin...
A new classification algorithm, called CFI (for Classification on Feature Intervals)
In this paper, we propose an algorithm for finding frequent itemsets in transaction databases. The basic idea of our algorithm is inspired by the Direct Hashing and Pruning (DHP) algorithm, which is in fact a variation of the well-known Apriori algorithm. In the DHP algorithm, a hash table is used in order to reduce the size of the candidate k+1...
A mechanism for learning lexical correspondences between two languages from sets of translated sentence pairs is presented. These lexical level correspondences are learned using analogical reasoning between two translation examples. Given two translation examples, the similar parts of the sentences in the source language must correspond to the simi...
This paper describes a machine learning method, called Regression by Selecting Best Feature Projections (RSBFP). In the training phase, RSBFP projects the training data on each feature dimension and aims to find the predictive power of each feature attribute by constructing simple linear regression lines, one per each continuous feature and number...
Most data mining algorithms produce a long list of rules, leaving it up to the user to find the ones that are really important and profitable. An automated process is needed to remove this burden. What we may call a second level of data mining is this process of finding interesting rules among already generated ones....
This paper proposes a new approach to classification based on a majority voting on individual classifications made by the projections of the training set on each feature. We have applied the k-nearest neighbor algorithm to determine the classifications made on individual feature projections. We called the resulting algorithm k-NNFP, for k-Nearest N...
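The voting idea described in this abstract can be illustrated with a minimal Python sketch. It assumes numeric features, absolute difference as the per-feature distance, and a simple pooled majority vote; the function name `knnfp_predict` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def knnfp_predict(train_X, train_y, query, k=3):
    """Classify `query` by pooling k-nearest-neighbor votes from each feature projection."""
    votes = Counter()
    for f in range(len(query)):
        # Project the training set onto feature f: keep only (value, class) pairs.
        projection = [(row[f], label) for row, label in zip(train_X, train_y)]
        # The k training values closest to the query on this single feature.
        nearest = sorted(projection, key=lambda p: abs(p[0] - query[f]))[:k]
        votes.update(label for _, label in nearest)
    # Majority vote over the classes collected from all features.
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated classes on two features.
train_X = [[1.0, 10.0], [1.2, 11.0], [5.0, 50.0], [5.5, 52.0]]
train_y = ["a", "a", "b", "b"]
prediction = knnfp_predict(train_X, train_y, [1.1, 10.5], k=2)
```

Because each feature is searched independently, the neighbors found on one feature need not be the same training instances as those found on another; that is the main difference from ordinary k-NN in this sketch.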
One of the application areas of genetic algorithms is parameter optimization. This paper addresses the problem of optimizing a set of parameters that represent the weights of criteria, where the sum of all weights is 1. A chromosome represents the values of the weights, possibly along with some cut-off points. A new crossover operation, called cont...
This paper describes a machine learning method, called Regression on Feature Projections (RFP), for predicting a real-valued target feature, given the values of multiple predictive features. In RFP, training is based on simply storing the projections of the training instances on each feature separately. Prediction of the target value for a query poi...
This paper presents an expert system for differential diagnosis of erythemato-squamous diseases incorporating decisions made by three classification algorithms: nearest neighbor classifier, naive Bayesian classifier and voting feature intervals-5. This tool enables doctors to differentiate six types of erythemato-squamous diseases using clinical an...
A new classification algorithm, called VFI5 (for Voting Feature Intervals), is developed and applied to the problem of differential diagnosis of erythemato-squamous diseases. The domain contains records of patients with known diagnosis. Given a training set of such records, the VFI5 classifier learns how to differentiate a new case in the domain. VFI5 r...
http://funapp.cs.bilkent.edu.tr/DataSets/
Citation:
H. Altay Guvenir and I. Uysal, Bilkent University Function Approximation Repository, http://funapp.cs.bilkent.edu.tr, 2000.
Predicting or learning numeric features is called regression in the statistical literature, and it is the subject of research in both machine learning and statistics. This paper reviews the important techniques and algorithms for regression developed by both communities. Regression is important for many applications, since lots of real life problem...
This paper is about the implementation of a visual tool for Differential Diagnosis of Erythemato-Squamous Diseases based on three classification algorithms: Nearest Neighbor Classifier (NN), Naive Bayesian Classifier using Normal Distribution (NBC) and Voting Feature Intervals-5 (VFI5). This tool enables the doctors to differentiate six types of Eryt...
This report is about the implementation of a visual tool for Differential Diagnosis of Erythemato-Squamous Diseases based on three classification algorithms: Nearest Neighbor Classifier (NN), Naive Bayesian Classifier using Normal Distribution (NBCN) and Voting Feature Intervals-5 (VFI5). This tool enables the doctors to perform all the necessary ope...
The development of bar-code technology provided accurate and large market databases for researchers who deal with datasets. Since the data is large both in dimension and size, most exploratory analysis techniques of statistics are not appropriate for such tasks. In this paper, we describe a high-level algorithm, and the application of it on a large...
Bankruptcy prediction has been an important decision-making process for financial analysts. One of the most common approaches for the bankruptcy prediction problem is the Discriminant Analysis. Also, the k-Nearest Neighbor classifier is very successful in such domains. This paper proposes a Feature Projection based classification algorithm, and exp...
A new classification algorithm called VFI (for Voting Feature Intervals) is proposed. A concept is represented by a set of feature intervals on each feature dimension separately. Each feature participates in the classification by distributing real-valued votes among classes. The class receiving the highest vote is declared to be the predicted class...
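As an illustration of per-feature interval voting, the following Python sketch lets every feature distribute one unit of vote among the classes and predicts the class with the highest total. The abstract does not say how intervals are built, so this sketch substitutes equal-width bins as a stand-in; `train_intervals`, `predict`, `n_bins`, and the toy data are hypothetical names and values, not the published algorithm.

```python
from collections import defaultdict


def train_intervals(X, y, n_bins=4):
    """For each feature, record class-normalised counts per interval.

    Equal-width bins are an assumption made for this sketch only.
    """
    classes = sorted(set(y))
    class_totals = {c: y.count(c) for c in classes}
    model = []
    for f in range(len(X[0])):
        values = [row[f] for row in X]
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features
        counts = defaultdict(lambda: defaultdict(float))
        for v, label in zip(values, y):
            b = min(int((v - lo) / width), n_bins - 1)
            counts[b][label] += 1.0 / class_totals[label]  # normalise by class size
        model.append((lo, width, counts))
    return classes, n_bins, model


def predict(trained, query):
    """Sum the real-valued votes of all features and return the top class."""
    classes, n_bins, model = trained
    total = dict.fromkeys(classes, 0.0)
    for (lo, width, counts), v in zip(model, query):
        b = min(max(int((v - lo) / width), 0), n_bins - 1)
        interval = counts.get(b, {})
        s = sum(interval.values())
        if s == 0:
            continue  # no training value fell in this interval: the feature abstains
        for c in classes:
            total[c] += interval.get(c, 0.0) / s  # votes of one feature sum to 1
    return max(classes, key=lambda c: total[c])


# Toy data (hypothetical).
X = [[1.0, 10.0], [1.2, 11.0], [5.0, 50.0], [5.5, 52.0]]
y = ["a", "a", "b", "b"]
trained = train_intervals(X, y)
label = predict(trained, [1.1, 10.5])
```

Since each feature votes on its own, a feature with no evidence for the query value simply contributes nothing in this sketch.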
Presence of irrelevant features is a fact of life in many real-world applications of classification learning. Although nearest-neighbor classification algorithms have emerged as a promising approach to machine learning tasks with their high predictive accuracy, they are adversely affected by the presence of such irrelevant features. In this paper, w...
Confidence Factor Assignment to Translation Templates. Zeynep Orhan, M.S. in Computer Engineering and Information Science. Supervisor: Asst. Prof. Ilyas Çiçekli. September 1998. The TTL (Translation Template Learner) algorithm learns lexical level correspondences between two translation examples by using analogical reasoning. The sentences used as transla...
This paper presents the results of applying an instance-based learning algorithm, the k-Nearest Neighbor Method on Feature Projections (k-NNFP), to text categorization and compares it with the k-Nearest Neighbor Classifier (k-NN). k-NNFP is similar to k-NN except that it finds the nearest neighbors according to each feature separately. Then it combine...
This paper proposes a mechanism for learning lexical level correspondences between two languages from a set of translated sentence pairs. The proposed mechanism is based on analogical reasoning between two translation examples. Given two translation examples, the similar parts of the sentences in the source language must correspond to the similar p...
A new classification algorithm, called VFI5 (for Voting Feature Intervals), is developed and applied to the problem of differential diagnosis of erythemato-squamous diseases. The domain contains records of patients with known diagnosis. Given a training set of such records, the VFI5 classifier learns how to differentiate a new case in the domain. VFI5...
This paper proposes an extension to the k Nearest Neighbor algorithm on Feature Projections, called kNNFP. The kNNFP algorithm has been shown to achieve comparable accuracy with the well-known kNN algorithm. However, the kNNFP algorithm has a very low time complexity compared to kNN. The extension to kNNFP introduced here assigns weights to features,...
This article presents a new form of exemplar-based learning method, based on overlapping feature intervals. In this model, a concept is represented by a collection of overlapping intervals for each feature and class. Classification with Overlapping Feature Intervals (COFI) is a particular implementation of this technique. In this incremental...
One of the application areas of genetic algorithms is parameter optimization. This paper addresses the problem of optimizing a set of parameters that represent the weights of criteria, where the sum of all weights is 1. A chromosome represents the values of the weights, possibly along with some cut-off points. A new crossover operation, called cont...
This paper proposes a mechanism for learning structural correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond to the similar parts of the sentence...
A new machine learning algorithm for the diagnosis of cardiac arrhythmia from standard 12-lead ECG recordings is presented. The algorithm is called VFI5, for Voting Feature Intervals. VFI5 is a supervised and inductive learning algorithm for inducing classification knowledge from examples. The input to VFI5 is a training set of records. Each record...
This paper proposes two methods for learning feature weights to improve the classification accuracy of the k-NNFP algorithm. In the k-NNFP algorithm, instances are stored as their projections on each feature dimension. The classification of unseen examples is made on the basis of feature projections by a majority voting among the k (k ≥ 1) predicti...
A decision tree induction algorithm using genetic programming (GP) is presented. The best decision tree is defined as the one which achieves maximum accuracy with minimum number of internal nodes. In this approach every individual is a decision tree candidate. The results are satisfactory in the sense that it can find the optimum solution, i.e. the...
In this paper we use genetic algorithms to learn feature weights for the Nearest Neighbor classification algorithm. We represent feature weights as real values in [0..1] and their sum is 1. A new crossover operation, called continuous uniform crossover, is introduced where the legality of chromosomes is preserved after the crossover operation. This...
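The legality constraint mentioned here (weights in [0, 1] that sum to 1) can be made concrete with a short Python sketch. The abstract does not spell out the operator, so the code below shows only one crossover that has the stated property: a convex combination of the two parents with a single random mixing factor. The name `convex_crossover` and the sample weights are hypothetical, and this should not be read as the paper's definition of continuous uniform crossover.

```python
import random


def convex_crossover(parent_a, parent_b, rng=random):
    """Blend two weight vectors with one mixing factor s drawn from [0, 1).

    If both parents have entries in [0, 1] that sum to 1, so do both children,
    because each child is a convex combination of the parents.
    """
    s = rng.random()
    child_1 = [s * a + (1 - s) * b for a, b in zip(parent_a, parent_b)]
    child_2 = [(1 - s) * a + s * b for a, b in zip(parent_a, parent_b)]
    return child_1, child_2


# Hypothetical parent chromosomes: feature weights that sum to 1.
random.seed(0)
parent_a = [0.2, 0.3, 0.5]
parent_b = [0.6, 0.1, 0.3]
child_1, child_2 = convex_crossover(parent_a, parent_b)
```

Using the same mixing factor for every gene is what keeps the sum fixed; mixing each gene with its own random factor would generally break the constraint and require renormalisation.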
This paper proposes a mechanism for learning pattern correspondences between two languages from a corpus of translated sentence pairs. The proposed mechanism uses analogical reasoning between two translations. Given a pair of translations, the similar parts of the sentences in the source language must correspond to the similar parts of the sentences...
This paper presents a new form of exemplar-based learning, based on a representation scheme called feature partitioning, and a particular implementation of this technique called CFP (for Classification by Feature Partitioning). Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint sets of values cal...
This paper proposes an extension to the k Nearest Neighbor algorithm on Feature Projections, called kNNFP. The kNNFP algorithm has been shown to achieve comparable accuracy with the well-known kNN algorithm. However, the kNNFP algorithm has a very low time complexity compared to kNN. The extension to kNNFP introduced here assigns weights to features, therefore...
Over the recent years several prototypes of intelligent tutoring systems for scientific subjects have been developed. Meanwhile, the object-oriented paradigm has become popular in the software engineering and artificial intelligence communities. The objective of the research presented in this paper is an application of the object-oriented paradigm...
This paper reports on the preliminary phase of our ongoing research towards developing an intelligent tutoring environment for Turkish grammar. One of the components of this environment is a corpus search tool which, among other aspects of the language, will be used to present the learner sample sentences along with their morphological analyses. Fo...
Systems for inducing concept descriptions from examples are valuable tools for assisting in the task of knowledge acquisition for expert systems. In this research three machine learning techniques are applied to the problem of predicting the daily changes in the index of Istanbul Stock Market, given the price changes in other investment instruments...
In genetic programming (GP) applications, programs are expressed as parse trees. A node of a parse tree is an element either from the function-set or terminal-set, and an element of a terminal set can be used in a parse tree more than once. However, when we attempt to use the elements in the terminal set at most once, we encounter problems in...
The paper presents an application of the Classification by Feature Partitioning (CFP) algorithm to the problem of soil classification. CFP is an exemplar based, incremental and supervised learning algorithm. Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint partitions of values. Application of t...
This report presents a new methodology of learning from examples, based on feature partitioning. Classification by Feature Partitioning (CFP) is a particular implementation of this technique, which is an inductive, incremental, and supervised learning method. Learning in CFP is accomplished by storing the objects separately in each feature dimensio...