About
Publications: 19
Reads: 1,404
Citations: 304
Publications (19)
Misogynistic content in cyberspace is a problem highlighted in many previous studies and is potentially harmful in the context of women's equality efforts and the normalization of discrimination on online platforms. Many studies highlight the presence of misogynistic content and how this content may proliferate and be magnified through algorithms...
In this work, we introduce EILEEN (Efficient Inference for Language-based Extraction of EHR Notes), a novel multi-modal natural language processing (NLP) framework designed to extract various alcohol consumption patterns from unstructured clinical notes, particularly in bilingual and non-English contexts. Recent advances in NLP have significantly i...
This paper highlights the growing need for quantitative methods to capture and monitor malicious communication on social media. There has been a deliberate "weaponization" of messaging through social networks, including by politically oriented entities, both state-sponsored and privately run. The article identifies a use of AI/ML ch...
A rapidly developing threat to societal well-being comes from misinformation spread widely on social media. Even more concerning is "mal-info" (malicious information), which is amplified on certain social networks. Now there is an additional dimension to that threat: the use of generative AI to deliberately augment mis-info and mal-info. This paper hi...
The growing popularity of generative AI, particularly ChatGPT, has sparked both enthusiasm and caution among practitioners and researchers in education. To effectively harness the full potential of ChatGPT in educational contexts, it is crucial to analyze its impact and suitability for different educational purposes. This paper takes an initial ste...
As a key modifiable risk factor, alcohol consumption is clinically crucial information that allows medical professionals to further understand their patients’ medical conditions and suggest appropriate lifestyle-modifying interventions. However, identifying alcohol-related information from unstructured free-text clinical notes is often challenging....
Featured Application
The study presents an improved, easily obtainable method for automatic smoking classification from unstructured bilingual electronic health records.
Abstract
Smoking is an important variable for clinical research, but few studies have addressed automatically obtaining smoking classifications from unstructured bi...
BACKGROUND
Smoking is a major risk factor and an important variable for clinical research, but few studies have addressed automatically obtaining smoking classifications from unstructured bilingual electronic health records (EHR).
OBJECTIVE
We aim to develop an algorithm to classify smoking status based on unstructured EHRs using natural language...
Featured Application
With its term mapping capability, MARIE can be used to improve data interoperability between different biomedical institutions. It can also be applied to text data pre-processing or normalization in non-biomedical domains.
Abstract
With growing interest in machine learning, text standardization is becoming an increasingly impo...
Due to its simplicity and intuitive interpretability, spherical k-means is often used for clustering large numbers of documents. However, it has a number of drawbacks that need to be addressed for more effective document clustering. Without well-dispersed initial points, spherical k-means fails to converge quickly, which is critical for clust...
While a vehicle is being driven, data are collected from a huge number of sensors that generate both categorical and continuous variables with varying scales. To understand the status of the vehicles and the drivers’ behaviors, it is crucial to segment and identify different phases within this time series data. However, such data often lack labels to...
Two document representation methods are mainly used to solve text mining problems. Known for its simple and intuitive interpretability, the bag-of-words method represents a document as a vector of its word frequencies. However, this method suffers from the curse of dimensionality and fails to preserve accurate proximity information when the number of...
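The spherical k-means and bag-of-words abstracts above refer to two standard techniques: documents represented as word-frequency vectors, and spherical k-means, which clusters unit-length vectors by cosine similarity. The following Python sketch illustrates both under stated assumptions: a lowercase whitespace tokenizer, raw term counts, and k-means++-style seeding with cosine distance as one generic way to obtain well-dispersed initial points. It is not the method proposed in either paper (those abstracts are truncated here), and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Sketch only: tokenization and seeding are assumptions, not the papers' methods.

def bag_of_words(docs):
    """Hypothetical helper: term-frequency vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors

def normalize(v):
    """Scale to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def seed_centroids(X, k, rng):
    """Assumed k-means++-style seeding with cosine distance 1 - cos(x, c)."""
    centroids = [X[rng.randrange(len(X))]]
    while len(centroids) < k:
        # Distance from each point to its nearest already-chosen centroid.
        dists = [max(0.0, min(1.0 - dot(x, c) for c in centroids)) for x in X]
        total = sum(dists)
        if total <= 0:
            centroids.append(X[rng.randrange(len(X))])
            continue
        # Sample a point with probability proportional to its distance,
        # so far-away (well-dispersed) points are favored as seeds.
        r = rng.random() * total
        acc = 0.0
        for x, d in zip(X, dists):
            acc += d
            if acc > r:
                centroids.append(x)
                break
    return centroids

def spherical_kmeans(X, k, iters=20, seed=0):
    """Hypothetical helper: cluster vectors on the unit sphere by cosine similarity."""
    rng = random.Random(seed)
    X = [normalize(x) for x in X]
    centroids = seed_centroids(X, k, rng)
    labels = None
    for _ in range(iters):
        # Assignment step: pick the centroid with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in X]
        # Update step: mean direction of each cluster, projected back onto the sphere.
        for j in range(k):
            members = [x for x, lab in zip(X, new_labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

if __name__ == "__main__":
    docs = ["apple banana apple", "banana apple fruit",
            "car engine wheel", "engine car road"]
    vocab, vecs = bag_of_words(docs)
    labels, _ = spherical_kmeans(vecs, k=2)
    print(vocab)
    print(labels)
```

The usage example groups the two fruit documents together and the two vehicle documents together; the specific label values (0 or 1) depend on the random seeding. Normalizing both documents and centroids to unit length is what makes this "spherical": maximizing the dot product is then the same as maximizing cosine similarity, which ignores document length.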