Dina Vishnyakova

Novartis

PhD


44
Publications
6,351
Reads
316
Citations
Citations since 2017
4 Research Items
135 Citations


Publications (44)
Conference Paper
Full-text available
Author name disambiguation (AND) in publication and citation resources is a well-known problem. Often, information about email address and other details in the affiliation is missing. In cases where such information is not available, identifying the authorship of publications becomes very challenging. Consequently, there have been attempts to resol...
Article
Full-text available
Objective: Author-centric analyses of fast-growing biomedical reference databases are challenging due to author ambiguity. This problem has been mainly addressed through author disambiguation using supervised machine-learning algorithms. Such algorithms, however, require adequately designed gold standards that reflect the reference database proper...
Preprint
Full-text available
Email is the primary means of communication for scientists. However, scientific authors change email address over time. Using a new method, we have calculated that approximately 18% of all authors' contact email addresses in MEDLINE are invalid. While an unfortunate number, it is, however, lower than previously estimated. To mitigate this problem,...
Article
OBJECTIVE: Author-centric analyses of fast-growing biomedical reference databases are challenging due to author ambiguity. This problem has been mainly addressed through author disambiguation using supervised machine-learning algorithms. Such algorithms, however, require adequately designed gold standards that reflect the reference database properl...
Article
Full-text available
Email is the primary means of communication for scientists. However, scientific authors change email address over time. Using a new method, we have calculated that approximately 18% of all authors’ contact email addresses in MEDLINE are invalid. While an unfortunate number, it is, however, lower than previously estimated. To mitigate this problem,...
Conference Paper
Full-text available
The reuse of clinical data in the research environment is becoming one of the important tasks in medical informatics. The automatic assignment of medical codes to pre-identified concepts is turning into a Sisyphean task. For the MedNLP task in NTCIR-12 a new approach to automatically enrich the dictionary using online data is proposed. We...
Conference Paper
Full-text available
To reuse data for clinical research, two main challenges must be overcome: formalizing data sources and increasing portability. Once these challenges are resolved, research applications will be able to reuse clinical data. In this paper, three data models such as entity-attribute-value, ontological and data-driven a...
Article
Full-text available
The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of S...
Article
To reuse data for clinical research, two main challenges must be overcome: formalizing data sources and increasing portability. Once these challenges are resolved, research applications will be able to reuse clinical data. In this paper, three data models such as entity-attribute-value, ontological and data-driven a...
Article
Full-text available
Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from th...
Article
Full-text available
We present an electronic capture tool to process informed consents, which must be recorded when running a clinical trial. This tool aims at the extraction of information expressing the duration of the consent given by the patient to authorize the exploitation of biomarker-related information collected during clinical trials. The system integr...
Article
Employing a bridge between the Clinical Information System (CIS) and the Clinical Research Environment (CRE) can provide functionality that is not easily implemented by a traditional legacy EHR system. In this paper, the experience of such an implementation at the University Hospitals of Geneva is described. General overview of the mapping of extracted fro...
Article
Full-text available
Gene function curation of the literature with Gene Ontology (GO) concepts is one particularly time-consuming task in genomics, and the help from bioinformatics is highly requested to keep up with the flow of publications. In 2004, the first BioCreative challenge already designed a task of automatic GO concepts assignment from a full te...
Article
The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies to improve the effectiveness of two search engines. O...
Article
Full-text available
With the vast amount of biomedical data, we face the necessity to improve information retrieval processes in the biomedical domain. The use of biomedical ontologies facilitated the combination of various data sources (e.g. scientific literature, clinical data repository) by increasing the quality of information retrieval and reducing the maintenance eff...
Article
Full-text available
The available curated data lag behind current biological knowledge contained in the literature. Text mining can assist biologists and curators to locate and access this knowledge, for instance by characterizing the functional profile of publications. Gene Ontology (GO) category assignment in free text already supports various applications, such as...
Article
Full-text available
The Khresmoi project is developing a multilingual multimodal search and access system for medical and health information and documents. This scientific demonstration presents the current state of the Khresmoi integrated system, which includes components for text and image annotation, semantic search, search by image similarity and machine translati...
Article
For the machine-reading task of biomedical texts about Alzheimer's disease we have used a Question-Answering approach by adapting functionalities of the Question-Answering (Q-A) engine EAGLi. We didn't involve any other Natural Language Processing method. As a knowledge store we used the biggest resource of biomedical literature, MEDLINE. Our final...
Conference Paper
Full-text available
Khresmoi is a European Integrated Project developing a multilingual multimodal search and access system for medical and health information and documents. It addresses the challenges of searching through huge amounts of medical data, including general medical information available on the internet, as well as radiology data in hospital archives. It i...
Article
Full-text available
Patent collections contain a substantial amount of medical-related knowledge, but existing tools were reported to lack useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve r...
Article
We present a new approach to perform biomedical documents classification and prioritization for the Comparative Toxicogenomics Database (CTD). This approach is motivated by needs such as literature curation, in particular applied to the human health environment domain. The unique integration of chemical, genes/proteins and disease data in the biome...
Conference Paper
We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer) to perform biomedical documents classification and prioritization in order to speed up curation of the Comparative Toxicogenomics Database (CTD). The task can be basically described as a binary classification task, whe...
Article
Full-text available
We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical documents classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be basically described as a binary cla...
Article
We present a new approach for pathogens and gene product normalization in the biomedical literature. The idea of this approach was motivated by needs such as literature curation, in particular applied to the field of infectious diseases thus, variants of bacterial species (S. aureus, Staphyloccocus aureus, ...) and their gene products (protein ArsC...
Article
Full-text available
Health-related information retrieval is complicated by the variety of nomenclatures available to name entities, since different communities of users will use different ways to name the same entity. We present in this report the development and evaluation of a user-friendly interactive Web application aiming at facilitating health-related patent searc...
Article
Full-text available
Objective: While the broad use of antibiotics has reached its limits with the emergence of bacterial resistance, it has become of major importance to regulate antibiotic prescriptions. In this paper, we present KART, a system to facilitate the creation of clinical guidelines in the context of infectious diseases. Methods: This system is composed of...
Conference Paper
The BiTeM group participated in the first TREC Medical Records Track in 2011 relying on a strong background in medical records processing and medical terminologies. For this campaign, we submitted a baseline run, computed with a simple free-text index in the Terrier platform, which achieved fair results (0.468 for P10). We also performed automatic...
Conference Paper
For the third year, the BiTeM group participated in the TREC Chemical IR Track. For this campaign, we applied strategies that had already shown their effectiveness, such as Citations Feedback, which takes advantage of the citations of the retrieved documents in order to re-arrange the ranking. But we also investigated a new inter-lingua model built wit...
Article
Full-text available
We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost,...
Data
GN annotation guidelines (AnnotationGuidelines.pdf)
Data
Full-text available
Introduction to TAP-k (What is TAP.pdf) The evaluation data and TAP-k software are freely available at the BioCreative III Website: http://www.biocreative.org.
Article
Full-text available
Optimal antibiotic prescriptions rely on evidence-based clinical guidelines, but creating such guidelines requires a time-consuming systematic review of the literature. We aim at facilitating this process by proposing an innovative tool to extract antibiotic treatments from the literature.
Article
Full-text available
Bacterial resistance to drugs has reached alarming levels but useful cross-site monitoring systems to track resistance evolution are lacking. In this paper we present the TrendMon surveillance system, a platform for querying, integrating and visualising antimicrobial resistance information.
Article
Full-text available
Background: We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high anno...
Article
We present exploratory investigations of multimodal mining to help design clinical guidelines for antibiotherapy. Our approach is based on the assumption that combining various sources of data, such as the literature, a clinical data warehouse, as well as information regarding costs will result in better recommendations. Compared to our baseline...
Conference Paper
For two years, the TREC Chemical Track has aimed at evaluating participant systems in chemical patent searching. In 2010, it continued with the two tasks from 2009: Prior Art search (PA) and Technology Survey (TS). The BiTeM group participated in both tasks and obtained satisfactory results, relying on a large panel of strategies which were evaluated wi...
Article
Full-text available
Accurate classification of patent documents according to the IPC system is vital for the interoperability between different patent offices and for the prior art search task involved in a patent application procedure. It is essential for companies and governments to track changes in technology in order to assess their investments and create new bra...


Projects (2)
Project
Author name disambiguation (AND) in citation databases is a well-known challenge. For instance, a significant share of MEDLINE queries are based on author names. However, author names can be highly ambiguous, and information about the affiliation (mainly contact details) is often missing, which complicates any author search and further analysis. The goal of our project is to foster the improvement of methodologies in the AND domain and to set a reference for future evaluations. We have already developed an effective AND methodology for MEDLINE based on machine learning techniques, which enables the understanding of contributions by individual biomedical scientists and the identification of experts in particular topics.