Mohamed Ali Hadj Taieb
Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Tunisia
Dr.
Assistant Professor at the Faculty of Sciences of Sfax, Computer Science Department
About
81
Publications
39,614
Reads
924
Citations
Additional affiliations
September 2013 - August 2016
HIAST (Sousse) - Higher Institute of Applied Sciences and Technology - Sousse, Tunisia
Position
- Professor (Assistant)
Publications (81)
This study presents a comprehensive framework to enhance Wikidata as an open and collaborative knowledge graph by integrating Open Biological and Biomedical Ontologies (OBO) and Medical Subject Headings (MeSH) keywords from PubMed publications. The primary data sources include OBO ontologies and MeSH keywords, which were collected and classified us...
Biomedical relation classification has been significantly improved by the application of advanced machine learning techniques on the raw texts of scholarly publications. Despite this improvement, the reliance on large chunks of raw text makes these algorithms suffer in terms of generalization, precision, and reliability. The use of the distinctive...
This paper investigates the role of text categorization in streamlining stopword extraction in natural language processing (NLP), specifically focusing on nine African languages alongside French. By leveraging the MasakhaNEWS, African Stopwords Project, and Masakha-POS datasets, our findings emphasize that text categorization effectively filters ou...
Tunisian (ISO 639-3: aeb) is a set of linguistic varieties spoken in Tunisia that evolved from the Arabic language. In this research work, we attempt to establish a standardized pronunciation that can serve as a communication tool for the diverse Tunisian population. This paper draws on prior linguistic research and uses a rich dataset maintained by...
Tunisian Arabic (ISO 639-3: aeb) is a distinct variety native to Tunisia, derived from Arabic and enriched by various historical influences. This research introduces the "Normalized Orthography for Tunisian Arabic" (NOTA), an adaptation of CODA* guidelines for transcribing Tunisian Arabic using Arabic script. The aim is to enhance language resource...
These slides outline the efforts of the Data Engineering and Semantics Research Unit alongside Masakhane, an African grassroots organization of Natural Language Processing researchers, and Derja Association, the non-profit organization promoting the daily use of Tunisian, to develop an Applied Linguistics background for research around Tunisian Ara...
In this presentation, we explore the use of Wikidata in the health domain. Owing to its flexibility and its triple-based data model, Wikidata offers considerable potential for improving the management of medical data. Funded by the Wikimedia Foundation, this project aims to enrich Wikidata with essential biomedical data...
In these presentation slides, we provide an overview of research and development efforts related to Wikidata, a large-scale open and collaborative FAIR knowledge graph. We begin by assessing the research productivity related to Wikidata using Scopus, a controlled bibliographic database maintained by Elsevier. Then, we continue by providing a...
This brief research report analyzes the availability of Digital Object Identifiers (DOIs) worldwide, highlighting the dominance of large publishing houses and the need for unique persistent identifiers to increase the visibility of publications from developing countries. The study reveals that a considerable amount of publications from developing c...
The proliferation of open knowledge graphs has led to a surge in scholarly research on the topic over the past decade. This paper presents a bibliometric analysis of the scholarly literature on open knowledge graphs published between 2013 and 2023. The study aims to identify the trends, patterns, and impact of research in this field, as well as the...
The volume and heterogeneity of data, and the speed at which it is generated, keep increasing, and current systems are not able to handle on-demand real-time data access. In traditional data integration approaches such as ETL, physically loading the data into data stores that use different technologies is becoming costly, time-consuming,...
In recent years, many computer systems have been developed to track and monitor COVID-19 social network interactions. However, these systems have been mainly based on robust probabilistic approaches like Latent Dirichlet Allocation (LDA). In another context, health recommender systems have always been personalized to the needs of single users...
Machine learning has seen enormous growth in the last decade, with healthcare being a prime application for advanced diagnostics and improved patient care. The application of machine learning for healthcare is particularly pertinent in Africa, where many countries are resource-scarce. However, it is unclear how much research on this topic is arisin...
This comment discusses the benefits of representing and reusing the information in Electronic Health Record databases as knowledge graphs in the RDF format based on the FHIR RDF specification. As a structured representation of clinical data, FHIR RDF-based electronic health records allow a simpler and more effective integration of biomedical inform...
Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides an ideal forum for exchanging structured data that can be verified and consolidated using validation schemas and bot edits. In this research article, we catalog an automatable task s...
In this research letter, we build upon recent studies about the sleeping beauties awakened by the COVID-19 pandemic. We prove that a peak of citations for sleeping beauties is associated with a sharp increase in the number of citations received by their references. This demonstrates the existence of a cascading activation of citation-based sleeping...
Purpose
The rapid growth of social media, specifically social networks, has pushed users to join more than one social network, and many new “cross-network” scenarios have therefore emerged, including cross-social-network content posting and recommendation systems. For this reason, it has become necessary to identify implicit brid...
Named Entity Recognition (NER) is currently a key technique in knowledge engineering, particularly in the context of biomedical informatics. In this context, Wikidata has been used as a knowledge graph to drive named entity recognition for various applications such as news tracking and question answering. Current Wikidata NER systems are mainly bas...
Online users are typically involved in multiple online social networks simultaneously to enjoy a variety of social network services, thus bringing about the interconnection of online social networks via bridge users called anchor users. Anchor users can be beneficial to a wide range of social network analysis applications such as cross-domain syste...
Beyond class imbalance and the sample size per class, this research paper studies the effect of knowledge-driven class generalization (KCG) on the accuracy of classical machine learning algorithms for mono-label classification. We apply our analysis on five classical machine learning models (Perceptron, Support Vectors, Random Forest, K-nearest nei...
In this position paper, we explain how the combination of Wikipedia Categories already in use for driving semantic applications with Wikidata statements related to the categories and their direct members is possible from a technical perspective thanks to the flexible data models of Wikipedia Categories and Wikidata statements and to the programmati...
Information related to the COVID-19 pandemic ranges from biological to bibliographic, from geographical to genetic and beyond. The structure of the raw data is highly complex, so converting it to meaningful insight requires data curation, integration, extraction and visualization, the global crowdsourcing of which provides both additional challenge...
Real-world information networks are increasingly occurring across various disciplines including online social networks and citation networks. These network data are generally characterized by sparseness, nonlinearity and heterogeneity bringing different challenges to the network analytics task to capture inherent properties from network data. Artif...
We have noticed that many scientists process scholarly publications with advanced neural network-driven machine learning techniques to extract semantic information as if they were an ordinary corpus of natural language texts. In this opinion article, we give more emphasis to the value of Bibliometric-Enhanced Information Retrieval as a major field...
Social data has played an important role in the tracking, monitoring and risk management of disasters. Indeed, several works have focused on the benefits of social data analysis for healthcare practices and the curing domain. Similarly, these data are now exploited for tracking the COVID-19 pandemic, but the majority of works have used Twitter as a source. In this...
In this letter, we explain how intuitive and explainable methods inspired by human physiology and computational biology can serve to simplify and improve the way we process and generate knowledge resources.
In recent years, several infectious diseases have caused widespread nationwide epidemics that affected information-seeking behaviours, people's mobility, economics and research trends. Examples of these epidemics are the 2003 severe acute respiratory syndrome (SARS) epidemic in mainland China and Hong Kong, the 2014–2016 Ebola epidemic in Guinea and Si...
This work is a companion reproducibility paper of the experiments and results reported in Lastra-Diaz et al. (2019a), which is based on the evaluation of a companion reproducibility dataset with the HESML V1R4 library and the long-term reproducibility tool called Reprozip. Human similarity and relatedness judgements between concepts underlie most o...
Co-citation analysis can be exploited as a bibliometric technique for mining information on the relationships between scientific papers. Existing methods rely, however, on co-citation counting techniques that barely take the semantic aspect into consideration. The present study proposes a new technique based on the measure of Semantic Simila...
This letter discusses the limitations of the use of filters to enhance the accuracy of the extraction of parenthetic abbreviations from scholarly publications and proposes the usage of the parentheses level count algorithm to efficiently extract entities between parentheses from raw texts as well as of machine learning-based supervised classificati...
Integrating social network data into the process of promoting business and marketing applications has been widely addressed by researchers. However, given the isolation between social network platforms, managing such data has become a challenging task for data scientists. In this respect, the present paper is designed to put forward a special...
This brief communication discusses the usefulness of semantic similarity measures for the evaluation and amelioration of the accuracy of supervised classification learning. It proposes a semantic similarity-based method to enhance the choice of adequate labels for the classification algorithm as well as two metrics (SS-Score and TD-Score) and a cur...
Semantic relatedness between words is a core concept in natural language processing. While countless approaches have been proposed, measuring which one works best is still a challenging task. Thus, in this article, we give a comprehensive overview of the evaluation protocols and datasets for semantic relatedness covering both intrinsic and extrinsi...
Social data has played an important role in the tracking, monitoring and risk management of disasters. Indeed, several works have focused on the benefits of social data analysis for healthcare practices and curing. Similarly, these data are now exploited for tracking the COVID-19 pandemic, but the majority of works have used Twitter as a source. In this paper, we...
Co-citation analysis can be exploited as a bibliometric technique for mining information on the relationships between scientific papers. Existing methods rely, however, on co-citation counting techniques that barely take the semantic aspect into consideration. The present study proposes a semantic-driven bibliometric technique for co-citati...
Nature and Science are two major multidisciplinary journals, well-known among the general public and highly-cited by scholarly communities. This article presents Google Trends, a web service providing detailed information on the Google search behavior of Internet users from all countries during the period 2004–2019 and illustrates the preference be...
Although the overall analysis of the citations received by Nobel laureates in the scientific background of their Nobel Prize gives an overview of how and when Nobel-awarded discoveries have been achieved and published, it will be interesting to consider the number of mentions of each work co-authored by a Nobel winner in his Nobel Prize scientific...
Scholia is an interface that uses scholarly metadata available in Wikidata, a free knowledge base, through a SPARQL endpoint to generate research assessment profiles for publications, scientists, institutions, journals, countries, topics and other entities. In these presentation slides, we evaluate this interface and identify its matters to be solv...
Wikibase is the software that enables MediaWiki to store structured data or access data that is stored in a structured data repository in RDF format. Since 2012, it has been used to establish Wikidata as a high-scale knowledge graph and has consequently benefited from several improvements allowing automatic ontology validation as well as a flexible...
Created in October 2012, Wikidata is a large-scale, human-readable, machine-readable, multilingual, multidisciplinary, centralized, editable, structured, and linked knowledge-base with an increasing diversity of use cases. Here, we raise awareness of the potential use of Wikidata as a useful resource for biomedical data integration and semantic int...
Human similarity and relatedness judgements between concepts underlie most cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, the proposal of methods for the estimation of the degree of similarity and relatedness between words and concepts has been a very active line of research in the fields...
This data article introduces a reproducibility dataset with the aim of allowing the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which introduces the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity re...
Social network websites are mainly constructed around the notion of user identities, as set up on the basis of their profiles, and online generated contents such as texts, videos and photos. Still, while some profiles gain an important position in the network, others do not. Similarly, some online generated contents appear to gain a great deal of atte...
This research letter discusses whether the Arab Spring explains the changes in the research productivity and impact of Arab countries by identifying non-sociopolitical factors that may lie behind the variations in the research performance of several Arab nations such as Qatar, the United Arab Emirates, Saudi Arabia, Tunisia, Lebanon and Algeria.
This letter demonstrates the interest of scientists in writing and citing letters to the editor and shows the importance of letters to the editor for the worldwide scientific community.
The ageing of the human population is a threat to many countries in the world and this fact creates new challenges for age-friendly living, recreational and working environments. Therefore, solutions that can support senior citizens (Longinos for the men and Longinas for the women) will be necessary, in order to help them stay actively involved in...
This letter explains how MeSH qualifiers can be used to enhance the precision and recall of sentence-level biomedical relation extraction from scientific publications. It then proposes that encyclopedic reviews allow a better sentence-level extraction of biomedical relations than any other type of scientific publication. Finally, it shows how th...
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered a task whose solution is at least as hard as the most difficult problems in artificial intelligence. It is used in applications such as information retrieval, machine translation, information extraction...
Social media analytics is a research axis focused on extracting useful insights from social media data, with the aim of helping individuals and organizations make the best decisions across several areas of life (business, marketing, politics, health, etc.). In this respect, social networks, microblogging, and media-sharing websites...
Semantic similarity and relatedness measures have increasingly become core elements of recent research within the semantic technology community. Nowadays, the search for efficient meaning-centered applications that exploit computational semantics has become a necessity. Researchers have therefore become increasingly interested in the developm...
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. It is used in applications such as information retrieval, machine tra...
Computing the semantic similarity/relatedness between terms is an important research area for several disciplines, including artificial intelligence, cognitive science, linguistics, psychology, biomedicine and information retrieval. These measures exploit knowledge bases to express the semantics of concepts. Some approaches, such as the information...
The measurement of the semantic relatedness between words has gained increasing interest in several research fields, including cognitive science, artificial intelligence, biology, and linguistics. The development of efficient measures is based on knowledge resources, such as Wikipedia, a huge and living encyclopedia supplied by net surfers. In this...
Quantifying the semantic relation between words is a key element in several applications that involve processing at the meaning level. A great variety of approaches have been proposed to quantify the semantic proximity between concepts or words. These approaches exploit computational models including the hierarchical and textual information of...
Web site: http://wnetss-api.smr-team.org/ WNetSS is a Java API allowing the use of a wide range of WordNet-based semantic similarity measures pertaining to different categories including taxonomic-based, features-based and IC-based measures. Determining the Semantic Similarity (SS) between word pairs is an important component in several research fields. It...
Knowledge acquisition still represents one of the main challenging obstacles to designing intelligent systems exhibiting human-level performance in complex intelligent tasks. The recent developments in crowdsourcing technologies have opened new promising opportunities to overcome this problem by exploiting large amounts of machine readable knowledg...
The exploitation of heterogeneous clinical sources and healthcare records is fundamental in clinical and translational research. The determination of semantic similarity between word pairs is an important component of text understanding that enables the processing and structuring of textual resources. Some of these measures have been adapted to the...
Computing Semantic Similarity (SS) between words is an important issue in many research fields. Several features from knowledge databases can be involved in the definition of semantic computing models. Gloss-based approaches exploit the short and precise description of a concept to better express its semantics. In this paper, we propose a new...
The goal of measuring Semantic Similarity (SS) between sentences is to find a method that can simulate the human thinking process. It has become an important task in several applications including Artificial Intelligence and Natural Language Processing. Though this task depends strongly on word SS, the latter is not the only im...
The challenge of measuring semantic similarity between words is to find a method that can simulate the human thinking process. The use of computers to quantify and compare semantic similarities has become an important area of research in various fields, including artificial intelligence, knowledge management, information retrieval and natural la...
Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguistics, cognitive science and artificial intelligence. In this paper, we propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource and have shown good performance. Th...
Computing semantic similarity/relatedness between concepts and words is an important issue in many research fields. Information-theoretic approaches exploit the notion of Information Content (IC), which provides a better understanding of a concept's semantics. In this paper, we present a complete IC metrics survey with a critical study. Then, w...
Computing semantic relatedness is a key component of information retrieval tasks and natural language processing applications. Wikipedia provides a knowledge base for computing word relatedness with more coverage than WordNet. In this paper we use a new intrinsic information content (IC) metric with Wikipedia category graph (WCG) to measure the sem...
Semantics constitute one of the major stakes in the evolution of information retrieval (IR) systems. Taking semantics into account requires the use of external semantic resources coupled with the initial documentation, for which semantic similarity measurements are needed to carry out comparisons between concepts. This paper presents a ne...