April 2025
·
1 Read
April 2025
·
1 Read
April 2025
·
1 Read
February 2025
·
3 Reads
January 2025
·
55 Reads
This study investigates the dissemination of disinformation on social media platforms during the DANA event (DANA is a Spanish acronym for Depresión Aislada en Niveles Altos, translating to high-altitude isolated depression) that resulted in extremely heavy rainfall and devastating floods in Valencia, Spain, on October 29, 2024. We created a novel dataset of 650 TikTok and X posts, which was manually annotated to differentiate between disinformation and trustworthy content. Additionally, a Few-Shot annotation approach with GPT-4o achieved substantial agreement (Cohen's kappa of 0.684) with manual labels. Emotion analysis revealed that disinformation on X is mainly associated with increased sadness and fear, while on TikTok, it correlates with higher levels of anger and disgust. Linguistic analysis using the LIWC dictionary showed that trustworthy content utilizes more articulate and factual language, whereas disinformation employs negations, perceptual words, and personal anecdotes to appear credible. Audio analysis of TikTok posts highlighted distinct patterns: trustworthy audios featured brighter tones and robotic or monotone narration, promoting clarity and credibility, while disinformation audios leveraged tonal variation, emotional depth, and manipulative musical elements to amplify engagement. In detection models, SVM+TF-IDF achieved the highest F1-Score, excelling with limited data. Incorporating audio features into roberta-large-bne improved both Accuracy and F1-Score, surpassing its text-only counterpart and SVM in Accuracy. GPT-4o Few-Shot also performed well, showcasing the potential of large language models for automated disinformation detection. These findings demonstrate the importance of leveraging both textual and audio features for improved disinformation detection on multimodal platforms like TikTok.
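For readers unfamiliar with the agreement statistic reported in the abstract above, the following is a minimal sketch of how Cohen's kappa can be computed for two sets of labels. It is not the authors' code: the function name `cohen_kappa` and the toy label lists are hypothetical, and the data below is illustrative only, not drawn from the study's dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)

    # Observed agreement: share of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of marginal label frequencies, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels (1 = disinformation, 0 = trustworthy).
manual_labels = [1, 1, 0, 0]
model_labels = [1, 0, 0, 0]
kappa = cohen_kappa(manual_labels, model_labels)
```

On the toy lists, the annotators agree on three of four items while chance agreement is one half, so kappa comes out at 0.5. Under the commonly used Landis and Koch reading, values between 0.61 and 0.80 are described as substantial agreement, which is the range the reported 0.684 falls into.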
January 2025
·
22 Reads
Equitable access to reliable health information is vital for public health, but the quality of online health resources varies by language, raising concerns about inconsistencies in Large Language Models (LLMs) for healthcare. In this study, we examine the consistency of responses provided by LLMs to health-related questions across English, German, Turkish, and Chinese. We substantially expand the HealthFC dataset by categorizing health-related questions by disease type and broadening its multilingual scope with Turkish and Chinese translations. We reveal significant inconsistencies in responses that could spread healthcare misinformation. Our main contributions are 1) a multilingual health-related inquiry dataset with meta-information on disease categories, and 2) a novel prompt-based evaluation workflow that enables sub-dimensional comparisons between two languages through parsing. Our findings highlight key challenges in deploying LLM-based tools in multilingual contexts and emphasize the need for improved cross-lingual alignment to ensure accurate and equitable healthcare information.
January 2025
·
200 Reads
·
1 Citation
IEEE Access
Artificial intelligence is reshaping the legal landscape, with software tools now impacting various aspects of legal work. The intersection of Natural Language Processing (NLP) and law holds potential to transform how legal professionals, including lawyers and judges, operate, resolve disputes, and retrieve case information to formulate their decisions. To identify the current state of the applications of Transformers (also known as Large Language Models or LLMs) in the legal domain, we analysed the existing literature from 2017 to 2023 through a database search and snowballing method. From 61 selected publications, we identified key application categories such as legal document analysis, case prediction, and contract review, along with their main characteristics. We observed a discernible upsurge in the volume of scholarly publications, a diversification of tasks undertaken (e.g., legal research, contract analysis, and regulatory compliance), and an increased range of languages considered. There has been a notable enhancement in the methodological sophistication employed by researchers in practical applications. The performance of models grounded in the Generative Pre-trained Transformer (GPT) architecture has consistently improved across various legal domains, including contract review, legal document summarization, and case outcome prediction. This paper makes several significant contributions to the field. Firstly, it identifies emerging trends in the application of LLMs within the legal domain, highlighting the growing interest and investment in this area. Secondly, it pinpoints methodological gaps in current research, suggesting areas where further development and refinement are needed. Lastly, it discusses the broader implications of these advancements for real-world legal tasks, offering insights into how LLM-based AI can enhance legal practice while addressing the associated challenges.
October 2024
·
428 Reads
This is the third book in the Online Hate Speech Trilogy. It focuses on presenting methods for detecting, analysing, and combating toxic language on the Internet. Alongside the legal dilemmas born from a desire to punish hate speech disseminators, identifying online hate speech is one of the biggest challenges in the field of studies on violent narratives and virtual attacks. The authors analyse the challenges of identifying violent narratives through automation, the advantages of manually coding social media posts, and the opportunities offered by AI in this field of research.
September 2024
·
47 Reads
·
4 Citations
September 2024
·
41 Reads
·
3 Citations
September 2024
·
12 Reads
·
1 Citation
... Existing research has primarily focused on sentiment analysis, topic modeling, or keyword extraction approaches that may not fully capture the structural relationships between regulatory concepts and emerging risk patterns. Studies such as Correa and Correa (2022) have demonstrated the effectiveness of neural text classification for financial regulatory documents, but these approaches often lack the ability to represent the interconnected nature of regulatory concepts [2]. Current methodologies frequently treat regulatory documents as isolated texts rather than as components of a broader regulatory ecosystem. ...
January 2025
IEEE Access
... We share all code and data used in our experiment, as well as hyperparameter ranges and best hyperparameters for each dataset online. 4 ...
September 2023
... GPT-3 is generally considered to have benefited from OpenAI's carefully curated, even if largely undocumented, training dataset, whereas GPT-J was pretrained on an open data set called the Pile (Gao et al., 2020), which is presumably far less carefully curated. Another source of evidence for the importance of diversity in training data is the rapid degradation of model performance and breaks in information integrity that have been found to occur when LLMs are trained on data generated by other LLMs, which is inherently far less diverse than language produced by humans (Shumailov et al., 2023), as has been demonstrated repeatedly in recent research on LLM detection (Bevendorff et al., 2024;. ...
September 2024
... However, the lack of demographic metadata complicates a broader understanding of happiness [46]. Sentiments on altruistic behaviors were examined through a free pizza case study, using SVM and RF to classify users' requests for a free offer on their success, achievements, and goals. ...
January 2024
IEEE Transactions on Affective Computing
... In recent years, the advent of Artificial Intelligence Generated Content (AIGC) has injected new vitality into the education sector (Chen, 2024; Sun, 2024). AIGC technologies, leveraging natural language processing, can generate high-quality linguistic materials and dynamic scenarios, enriching digital teaching with enhanced interactivity and contextuality (Korenčić et al., 2024). By integrating AIGC with digital scenario-based teaching, authentic linguistic communication contexts can be created, effectively improving students' English proficiency and practical application skills (Sun & Han, 2023; Kong & Yang, 2024). ...
July 2024
Expert Systems
... Addressing this need, recent advancements have been made in depression identification using deep learning on both single-modal (5)(6)(7)(8) and multi-modal data (9)(10)(11)(12)(13)(14)(15). For instance, employing short-term speech segments (16) or integrating various acoustic features (17) through deep learning models has shown promise. ...
March 2024
Lecture Notes in Computer Science
... Identification of toxicity and other undesirable contents in user-generated texts is an active research area in NLP (Bevendorff et al., 2024). As a proactive combat (besides deletion), the task of automatic rewriting/rephrasing has received increasing attention from the NLP community (Villate-Castillo et al., 2024). ...
March 2024
Lecture Notes in Computer Science
... Rosso [11] introduced a unique author profile and text classification system. This study's application to a number of datasets, including Twitter feeds, shows the framework's versatility and effectiveness in identifying authors and categorizing texts by stylistic and content features. ...
February 2024
Expert Systems
... The definition of HS often encompasses a range of negative behaviors including cyberbullying, flaming, profanity, abusive language, expressions of toxicity, and acts of discrimination [8,31,32]. Each of these forms can lead to highly controversial discussions and escalate tensions, potentially resulting in serious social consequences such as violent crimes or physical attacks [17,41]. ...
January 2024
Expert Systems
... Moffitt et al. [2021] developed a classifier of conspiracy tweets and used it for propagation analysis. Two recent MediaEval challenges which focused on classification of conspiracy texts [Pogorelov et al., 2021 led to a number of approaches demonstrating that the state-of-the-art architecture is a multi-task classifier [Peskine et al., 2021, Korenčić et al., 2023 based on CT-BERT [Müller et al., 2023]. ...
December 2023