Vedrana Vidulin

Jožef Stefan Institute | IJS · Department of Knowledge Technologies

PhD

About

42 Publications
6,965 Reads
457 Citations
Additional affiliations
November 2016 - December 2016
Jožef Stefan Institute
Position
  • PostDoc Position
March 2014 - November 2016
Ruđer Bošković Institute
Position
  • PostDoc Position
December 2005 - March 2014
Jožef Stefan Institute
Position
  • PhD Student

Publications (42)
Chapter
Hierarchical multi-label classification (HMC) is a supervised machine learning task, where each example can be assigned more than one label and the possible labels are organized in a hierarchy. HMC problems emerge in domains like functional genomics, habitat modelling, text and image categorization. They can be addressed with global model induction...
Article
Full-text available
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of...
Article
Full-text available
The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed...
Article
The growing number of web pages calls for improved search engines. One such improvement is letting users specify the desired web genre of the returned web pages. The prediction of web genres triggers expectations about the type of information contained in a given web page. More specifically, web genres can be seen as textual categories su...
Article
Full-text available
Background: The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner. Results: We evaluated if the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of g...
Chapter
In this work, we address the task of phenotypic traits prediction using methods for semi-supervised learning. More specifically, we propose to use supervised and semi-supervised classification trees as well as supervised and semi-supervised random forests of classification trees. We consider 114 datasets for different phenotypic traits referring to...
Conference Paper
Full-text available
In this work, we address the task of phenotypic traits prediction using methods for semi-supervised learning. More specifically, we propose to use supervised and semi-supervised classification trees as well as supervised and semi-supervised random forests of classification trees. We consider 114 datasets for different phenotypic traits referring to...
Conference Paper
Full-text available
We describe ProTraits, a machine learning pipeline that systematically annotates microbes with phenotypes using a large amount of textual data from scientific literature and other online resources, as well as genome sequencing data. Moreover, by relying on a multi-view non-negative matrix factorization approach, ProTraits pipeline is also able to d...
Article
Full-text available
Bacteria and Archaea display a variety of phenotypic traits and can adapt to diverse ecological niches. However, systematic annotation of prokaryotic phenotypes is lacking. We have therefore developed ProTraits, a resource containing ∼545 000 novel phenotype inferences, spanning 424 traits assigned to 3046 bacterial and archaeal species. These ann...
Conference Paper
Full-text available
In this work we compare different information fusion approaches (early fusion, late fusion, and hybrid fusion) in the context of large-scale multi-label classification problems, which are typical today in bio-domains. The experiments are performed on two novel large-scale classification datasets for gene function prediction and prokaryotic phenotype p...
Article
Full-text available
Motivation: The number of sequenced genomes rises steadily, but we still lack knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesize that AFP approaches which draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systemati...
Conference Paper
Full-text available
The growing number of web pages calls for improved search engines. One such improvement is to specify the desired web genre of the returned web pages. This creates the need for web genre prediction based on the information on the web page. Typically, this task is addressed as multi-class classification, with some recent studi...
Poster
Full-text available
We present a novel pipeline for annotating prokaryotic genes with Gene Ontology functions based on supervised machine learning in a hierarchical multi-label setting. 14,945,154 genes from 5,271 Bacteria/Archaea are used to form a singular learning dataset via mappings of the genes to 60,892 COG/NOG groups, predicting 8,005 different GO terms. Four...
Article
Full-text available
Can a model constructed using data mining (DM) programs be trusted? It is known that a decision-tree model can contain relations that are statistically significant, but, in reality, meaningless to a human. When the task is domain analysis, meaningless relations are problematic, since they can lead to wrong conclusions and can consequently undermine...
Poster
Full-text available
Poster at The 17th International Conference on Discovery Science (DS 2014)
Conference Paper
Full-text available
This paper presents an overview of intelligent algorithms for building models intended for reasoning from sensor data. Descriptions of such algorithms appear most often in the context of smart environments, where their main task is to reason about the state and behavior of the user. The overview is motivated by a considerable increase in research activity in...
Article
Full-text available
This paper presents a summary of the doctoral dissertation of the author on the topic of searching for credible relations in machine learning.
Thesis
Full-text available
Can a model constructed by machine learning or data mining programs be trusted? For example, it is known that a decision tree model can contain less-credible parts caused by pathologies in induction algorithms, noise and missing values in data, or simply because of the complexity of a domain. Such models typically contain relations that are statist...
Presentation
Full-text available
PhD thesis presentation
Article
Full-text available
This paper describes a novel algorithm for finding the most important relations with the use of data mining. As an example application, the impact of high-level knowledge on economic welfare was analyzed. Our approach, based on interactive data mining, not only helps to discover the most relevant models, but also enables an evaluation of their rele...
Article
Full-text available
A web page is a complex document which can share conventions of several genres, or contain several parts, each belonging to a different genre. To properly address the genre interplay, a recent proposal in automatic web genre identification is multi-label classification. The dominant approach to such classification is to transform one multi-label ma...
Conference Paper
Full-text available
We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their...
Conference Paper
Full-text available
Modern search engines are typically queried with keywords, which primarily convey the topic of the sought web page. Consequently, the top hits are often topically relevant but nonetheless not what the user wants. The premise of this paper is that the relevance of the hits can be improved when also searching by genre, classification crite...
Conference Paper
Full-text available
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected based on the literature and the observation of the corpus. Thirdly, these features were extracted from the corpus to obtain a data set. Finally, two machine learning algorit...
Conference Paper
Full-text available
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected based on the literature and the observation of the corpus. Thirdly, these features were extracted from the corpus to obtain a data set. Finally, three machine learn...
Article
Full-text available
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected based on the literature and the observation of the corpus. Thirdly, these features were extracted from the corpus to obtain a data set. Finally, two machine learning algorit...
Conference Paper
Full-text available
A genetic algorithm is an evolutionary search technique that is becoming increasingly popular for solving practical problems such as timetabling, scheduling, engineering design, and other optimization problems. In this paper we present a computer program implemented to perform basic experimentation with a simple genetic algorithm, with the intention to gain u...
Article
Full-text available
Modern search engines aim at classifying web pages not only according to topics, but also according to genres. This paper presents the results of an attempt to train a genre classifier. We present features extracted from a 20-genre corpus used for training the genre classifiers and the results of using different machine learning (ML) algorithms in...
Article
Full-text available
Does greater investment in education and research and development (R&D) have a positive impact on economic welfare? We analyzed this question using the Weka machine learning and data mining system. We collected data from the statistical databases for the year 2001. The obtained classification trees show that the level of participation in higher levels o...
