
Juan Antonio Lossio-Ventura
National Institutes of Health | NIH · National Institute of Mental Health (NIMH)
PhD in Computer Science
40 Publications
7,445 Reads
388 Citations
Introduction
- Research Scientist at NIH.
- Former Postdoc at Stanford University.
- Artificial Intelligence (NLP & ML) for Health & Biomedicine.
- Co-founder of SIMBig Conference.
Additional affiliations
January 2021 - May 2021
May 2019 - January 2021
June 2016 - April 2019
Publications (40)
Objective:
The adoption of electronic health records (EHRs) has produced enormous amounts of data, creating research opportunities in clinical data sciences. Several concept recognition systems have been developed to facilitate clinical information extraction from these data. While studies exist that compare the performance of many concept recogni...
The COVID-19 pandemic and associated restrictions have been a major stressor that has worsened mental health worldwide. Qualitative data play a unique role in documenting mental state, via both language features and content. Within an online longitudinal study on mental health during the early COVID-19 pandemic, we analyzed free responses to the...
Background
The national increase in opioid use and misuse has become a public health crisis in the U.S. To tackle this crisis, the systematic evaluation and monitoring of opioid prescribing patterns is necessary. Thus, opioid prescriptions from electronic health records (EHRs) must be standardized to morphine milligram equivalent (MME) to facilitat...
COVID-19 has presented an unprecedented challenge to human welfare. Indeed, we have witnessed people experiencing a rise of depression, acute stress disorder, and worsening levels of subclinical psychological distress. Finding ways to support individuals' mental health has been particularly difficult during this pandemic. An opportunity for interve...
Background
The Internet provides different tools for communicating with patients, such as social media (e.g., Twitter) and email platforms. These platforms provide new data sources that shed light on patient experiences with health care and improve our understanding of patient-provider communication. Several existing topic modeling and document cluster...
This book constitutes the refereed proceedings of the 7th International Conference on Information Management and Big Data, SIMBig 2020, held in Lima, Peru, in October 2020.
The 32 revised full papers and 7 revised short papers presented were carefully reviewed and selected from 122 submissions. The papers address topics such as natural language pr...
The COVID-19 crisis has produced worldwide changes, from people's lifestyles to travel restrictions imposed by the world's nations aiming to keep the virus out. Several countries have created digital information applications, such as contact tracing apps, to help control and manage the COVID-19 crisis. The Peruvian government in collabora...
Objective:
The development of machine learning (ML) algorithms to address a variety of issues faced in clinical practice has increased rapidly. However, questions have arisen regarding biases in their development that can affect their applicability in specific populations. We sought to evaluate whether studies developing ML models from electronic...
This book constitutes the refereed proceedings of the 6th International Conference on Information Management and Big Data, SIMBig 2019, held in Lima, Peru, in August 2019.
The 15 full papers and 16 short papers presented were carefully reviewed and selected from 104 submissions. The papers address issues such as data mining, artificial intelligence...
The adoption of electronic health records has increased the volume of clinical data, which has opened an opportunity for healthcare research. There are several biomedical annotation systems that have been used to facilitate the analysis of clinical data. However, there is a lack of clinical annotation comparisons to select the most suitable tool fo...
Twitter became the most popular form of social interactions in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare with the potential goal to improve their outcomes. Several existing topic modeling and document clustering applications have been adapted to...
Background: There is strong scientific evidence linking obesity and overweight to the risk of various cancers and to cancer survivorship. Nevertheless, the existing online information about the relationship between obesity and cancer is poorly organized, not evidence-based, of poor quality, and confusing to health information consumers. A formal k...
Background:
Rapid advancements in biomedical research have accelerated the number of relevant electronic documents published online, ranging from scholarly articles to news, blogs, and user-generated social media content. Nevertheless, the vast amount of this information is poorly organized, making it difficult to navigate. Emerging technologies s...
Obesity has been linked to several types of cancer. Access to adequate health information activates people's participation in managing their own health, which ultimately improves their health outcomes. Nevertheless, the existing online information about the relationship between obesity and cancer is heterogeneous and poorly organized. A formal know...
This book constitutes the refereed proceedings of the Second Annual International Symposium on Information Management and Big Data, SIMBig 2015, held in Cusco, Peru, in September 2015, and of the Third Annual International Symposium on Information Management and Big Data, SIMBig 2016, held in Cusco, Peru, in September 2016.
The 11 revised full pape...
Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre...
Polysemy is the capacity for a word to have multiple meanings. Polysemy detection is a first step for Word Sense Induction (WSI), which makes it possible to find different meanings for a term. Polysemy detection is also important for information extraction (IE) systems. In addition, polysemy detection is important for building/enriching terminologies...
We propose in this paper to handle the problem of overload in social interactions by grouping messages according to three important dimensions: (i) content (textual and hashtags), (ii) users, and (iii) time difference. We evaluated our approach on a Twitter data set and we compared it to other existing approaches and the results are promising and e...
Biomedical ontologies play an important role in information extraction in the biomedical domain. We present a four-step workflow for automatically updating biomedical ontologies. We detail two contributions concerning concept extraction and the semantic linkage of extracted terminology.
Big Data in the biomedical domain faces a major issue: the analysis of large volumes of heterogeneous data (e.g., video, audio, text, image). Ontologies, conceptual models of reality, can play a crucial role in biomedicine by automating data processing, querying, and the matching of heterogeneous data. Various English resources exist but there are consider...
With the large amounts of textual data related to agriculture now available, indexing becomes a crucial issue for research organizations. One way to index documents consists in extracting terminology. This paper investigates the use and combination of text mining methodologies to highlight and publish the most appropriate terms from documents in op...
Terminology extraction is an essential task in domain knowledge acquisition, as well as for information retrieval. It is also a mandatory first step aimed at building/enriching terminologies and ontologies. As often proposed in the literature, existing terminology extraction methods feature linguistic and statistical aspects and solve some problems...
Polysemy is the property of a term having several meanings. Polysemy prediction is a first step for Word Sense Induction (WSI), which makes it possible to find different meanings for a term, as well as for information extraction systems. In addition, polysemy detection is important for the...
Given the masses of textual data related to agriculture available today, indexing them is becoming a crucial issue for research organizations. One way to best index documents is to extract their terminology. This article explores the use and combination of text mining methodologies in order...
Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from a domain-specific text. The first measure is both linguistic and statistical based. The second measure is graph-based, allowing assessment of the importance of a multiword term of a domain. Existing measures often solve...
Term extraction is an essential task in domain knowledge acquisition. Although hundreds of terminologies and ontologies exist in the biomedical domain, the language evolves faster than our ability to formalize and catalog it. We may be interested in the terms and words explicitly used in our corpus in order to index or mine this corpus or just to e...
The Semantic Indexing of French Biomedical Data Resources project proposes to investigate the scientific and technical challenges in building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data.
Comprehensive terminology is essential for a community to describe, exchange, and retrieve data. In multiple domains, the explosion of text data produced has reached a level at which automatic terminology extraction and enrichment is mandatory. Automatic Term Extraction (or Recognition) methods use natural language processing to do so. Methods feat...
The objective of this paper is to present a methodology to automatically extract and rank biomedical terms from free text. The authors present new extraction methods taking into account linguistic patterns specialized for the biomedical domain, statistical term extraction measures such as C-value and statistical keyword extraction measures such as Okap...
Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from a domain-specific text. The first measure is both linguistic and statistical based. The second measure is graph-based, allowing assessment of the importance of a multiword term of a domain. Existing measures often solve...
The objective of this work is to extract and to rank biomedical terms from free text. We present new extraction methods that use linguistic patterns specialized for the biomedical field, and use term extraction measures, such as C-value, and keyword extraction measures, such as Okapi BM25, and TFIDF. We propose several combinations of these measure...
We propose a socio-semantic approach for building conversations from social interactions following three steps: (i) content linkage, (ii) participants (users) linkage, and (iii) temporal linkage. Preliminary evaluations on a Twitter dataset show promising and interesting results.
Projects (2)
The SIFR project proposes to investigate the scientific and technical challenges in building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data.