Mohamed Ali Hadj Taieb
Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Tunisia

Dr.
Assistant Professor at the Faculty of Sciences of Sfax, Computer Science Department

About

81
Publications
39,614
Reads
924
Citations
Additional affiliations
September 2013 - August 2016
HIAST (Sousse) - Higher Institute of Applied Sciences and Technology - Sousse, Tunisia
Position
  • Professor (Assistant)

Publications (81)
Article
Full-text available
This study presents a comprehensive framework to enhance Wikidata as an open and collaborative knowledge graph by integrating Open Biological and Biomedical Ontologies (OBO) and Medical Subject Headings (MeSH) keywords from PubMed publications. The primary data sources include OBO ontologies and MeSH keywords, which were collected and classified us...
Article
Full-text available
Biomedical relation classification has been significantly improved by the application of advanced machine learning techniques on the raw texts of scholarly publications. Despite this improvement, the reliance on large chunks of raw text makes these algorithms suffer in terms of generalization, precision, and reliability. The use of the distinctive...
Conference Paper
Full-text available
This paper investigates the role of text categorization in streamlining stopword extraction in natural language processing (NLP), specifically focusing on nine African languages alongside French. By leveraging the MasakhaNEWS, African Stopwords Project, and Masakha-POS datasets, our findings emphasize that text categorization effectively filters ou...
Conference Paper
Full-text available
Tunisian (ISO 639-3: aeb) is a set of linguistic varieties spoken in Tunisia and evolved from the Arabic language. In this research work, we attempt to establish a standardized pronunciation that can serve as a communication tool for the diverse Tunisian population. This paper draws on prior linguistic research and uses a rich dataset maintained by...
Conference Paper
Full-text available
Tunisian Arabic (ISO 639-3: aeb) is a distinct variety native to Tunisia, derived from Arabic and enriched by various historical influences. This research introduces the "Normalized Orthography for Tunisian Arabic" (NOTA), an adaptation of CODA* guidelines for transcribing Tunisian Arabic using Arabic script. The aim is to enhance language resource...
Presentation
Full-text available
These slides outline the efforts of the Data Engineering and Semantics Research Unit alongside Masakhane, an African grassroots organization of Natural Language Processing researchers, and Derja Association, the non-profit organization promoting the daily use of Tunisian, to develop an Applied Linguistics background for research around Tunisian Ara...
Presentation
Full-text available
In this presentation, we explore the use of Wikidata in the health domain. Owing to its flexibility and its triple-based model, Wikidata offers considerable potential for improving the management of medical data. Funded by the Wikimedia Foundation, this project aims to enrich Wikidata with essential biomedical data...
Presentation
Full-text available
In these presentation slides, we provide an overview of research and development efforts related to Wikidata, a large-scale open and collaborative FAIR knowledge graph. We begin by assessing the research productivity related to Wikidata using Scopus, a controlled bibliographic database maintained by Elsevier. Then, we continue by providing a...
Preprint
Full-text available
This brief research report analyzes the availability of Digital Object Identifiers (DOIs) worldwide, highlighting the dominance of large publishing houses and the need for unique persistent identifiers to increase the visibility of publications from developing countries. The study reveals that a considerable amount of publications from developing c...
Article
Full-text available
This brief research report analyzes the availability of Digital Object Identifiers (DOIs) worldwide, highlighting the dominance of large publishing houses and the need for unique persistent identifiers to increase the visibility of publications from developing countries. The study reveals that a considerable amount of publications from developing c...
Preprint
Full-text available
The proliferation of open knowledge graphs has led to a surge in scholarly research on the topic over the past decade. This paper presents a bibliometric analysis of the scholarly literature on open knowledge graphs published between 2013 and 2023. The study aims to identify the trends, patterns, and impact of research in this field, as well as the...
Chapter
The amount of data, its heterogeneity and the speed at which it is generated are increasingly diverse and the current systems are not able to handle on-demand real-time data access. In traditional data integration approaches such as ETL, physically loading the data into data stores that use different technologies is becoming costly, time-consuming,...
Chapter
In recent years, many computer systems have been developed to track and monitor COVID-19 social network interactions. However, these systems have been mainly based on robust probabilistic approaches like Latent Dirichlet Allocation (LDA). In another context, health recommender systems have always been personalized to the needs of single users...
Preprint
Full-text available
Machine learning has seen enormous growth in the last decade, with healthcare being a prime application for advanced diagnostics and improved patient care. The application of machine learning for healthcare is particularly pertinent in Africa, where many countries are resource-scarce. However, it is unclear how much research on this topic is arisin...
Preprint
Full-text available
Machine learning has seen enormous growth in the last decade, with healthcare being a prime application for advanced diagnostics and improved patient care. The application of machine learning for healthcare is particularly pertinent in Africa, where many countries are resource-scarce. However, it is unclear how much research on this topic is arisin...
Article
Full-text available
This comment discusses the benefits of representing and reusing the information in Electronic Health Record databases as knowledge graphs in the RDF format based on the FHIR RDF specification. As a structured representation of clinical data, FHIR RDF-based electronic health records allow a simpler and more effective integration of biomedical inform...
Article
Full-text available
Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides an ideal forum for exchanging structured data that can be verified and consolidated using validation schemas and bot edits. In this research article, we catalog an automatable task s...
Article
Full-text available
In this research letter, we build upon recent studies about the sleeping beauties awakened by the COVID-19 pandemic. We prove that a peak of citations for sleeping beauties is associated with a sharp increase in the number of citations received by their references. This demonstrates the existence of a cascading activation of citation-based sleeping...
Article
Purpose: The rapid growth of social media, specifically social networks, has pushed users to join more than one social network, and many new “cross-network” scenarios have therefore emerged, including cross-social-network content posting and recommendation systems. For this reason, it is necessary to identify implicit brid...
Conference Paper
Full-text available
Named Entity Recognition (NER) is currently a key technique in knowledge engineering, particularly in the context of biomedical informatics. In this context, Wikidata has been used as a Knowledge Graph to drive named entity recognition for various applications such as news tracking and question answering. Current Wikidata NER Systems are mainly bas...
Article
Online users are typically involved in multiple online social networks simultaneously to enjoy a variety of social network services, thus bringing about the interconnection of online social networks via bridge users called anchor users. Anchor users can be beneficial to a wide range of social network analysis applications such as cross-domain syste...
Chapter
Full-text available
Beyond class imbalance and the sample size per class, this research paper studies the effect of knowledge-driven class generalization (KCG) on the accuracy of classical machine learning algorithms for mono-label classification. We apply our analysis on five classical machine learning models (Perceptron, Support Vectors, Random Forest, K-nearest nei...
Conference Paper
Full-text available
In this position paper, we explain how the combination of Wikipedia Categories already in use for driving semantic applications with Wikidata statements related to the categories and their direct members is possible from a technical perspective thanks to the flexible data models of Wikipedia Categories and Wikidata statements and to the programmati...
Article
Full-text available
Information related to the COVID-19 pandemic ranges from biological to bibliographic, from geographical to genetic and beyond. The structure of the raw data is highly complex, so converting it to meaningful insight requires data curation, integration, extraction and visualization, the global crowdsourcing of which provides both additional challenge...
Preprint
Full-text available
Real-world information networks are increasingly occurring across various disciplines including online social networks and citation networks. These network data are generally characterized by sparseness, nonlinearity and heterogeneity bringing different challenges to the network analytics task to capture inherent properties from network data. Artif...
Article
Full-text available
Real-world information networks are increasingly occurring across various disciplines including online social networks and citation networks. These network data are generally characterized by sparseness, nonlinearity and heterogeneity bringing different challenges to the network analytics task to capture inherent properties from network data. Artif...
Article
Full-text available
We have noticed that many scientists process scholarly publications with advanced neural network-driven machine learning techniques to extract semantic information as if they were an ordinary corpus of natural language texts. In this opinion article, we give more emphasis to the value of Bibliometric-Enhanced Information Retrieval as a major field...
Article
Full-text available
Social data have shown an important role in the tracking, monitoring and risk management of disasters. Indeed, several works have focused on the benefits of social data analysis for healthcare practices and the curing domain. Similarly, these data are now exploited for tracking the COVID-19 pandemic, but the majority of works have used Twitter as a source. In this...
Article
Full-text available
In this letter, we explain how intuitive and explainable methods inspired from human physiology and computational biology can serve to simplify and ameliorate the way we process and generate knowledge resources.
Article
Full-text available
In recent years, several infectious diseases have caused widespread nationwide epidemics that affected information-seeking behaviours, people's mobility, economics and research trends. Examples of these epidemics are the 2003 severe acute respiratory syndrome (SARS) epidemic in mainland China and Hong Kong, the 2014–2016 Ebola epidemic in Guinea and Si...
Article
This work is a companion reproducibility paper of the experiments and results reported in Lastra-Diaz et al. (2019a), which is based on the evaluation of a companion reproducibility dataset with the HESML V1R4 library and the long-term reproducibility tool called Reprozip. Human similarity and relatedness judgements between concepts underlie most o...
Chapter
Full-text available
Co-citation analysis can be exploited as a bibliometric technique used for mining information on the relationships between scientific papers. Proposed methods rely, however, on co-citation counting techniques that slightly take the semantic aspect into consideration. The present study proposes a new technique based on the measure of Semantic Simila...
Article
Full-text available
This letter discusses the limitations of the use of filters to enhance the accuracy of the extraction of parenthetic abbreviations from scholarly publications and proposes the usage of the parentheses level count algorithm to efficiently extract entities between parentheses from raw texts as well as of machine learning-based supervised classificati...
Article
Full-text available
Integrating social network data into the process of promoting business and marketing applications has been widely addressed by researchers. However, given the isolation between social network platforms, managing such data has become a challenging task for data scientists. In this respect, the present paper is designed to put forward a special...
Preprint
Full-text available
This brief communication discusses the usefulness of semantic similarity measures for the evaluation and amelioration of the accuracy of supervised classification learning. It proposes a semantic similarity-based method to enhance the choice of adequate labels for the classification algorithm as well as two metrics (SS-Score and TD-Score) and a cur...
Article
Full-text available
Semantic relatedness between words is a core concept in natural language processing. While countless approaches have been proposed, measuring which one works best is still a challenging task. Thus, in this article, we give a comprehensive overview of the evaluation protocols and datasets for semantic relatedness covering both intrinsic and extrinsi...
Preprint
Full-text available
Social data have shown an important role in the tracking, monitoring and risk management of disasters. Indeed, several works have focused on the benefits of social data analysis for healthcare practices and curing. Similarly, these data are now exploited for tracking the COVID-19 pandemic, but the majority of works have used Twitter as a source. In this paper, we...
Article
Full-text available
Co-citation analysis can be exploited as a bibliometric technique used for mining information on the relationships between scientific papers. Proposed methods rely, however, on co-citation counting techniques that slightly take the semantic aspect into consideration. The present study proposes a semantic driven bibliometric techniques for co-citati...
Article
Full-text available
Nature and Science are two major multidisciplinary journals, well-known among the general public and highly-cited by scholarly communities. This article presents Google Trends, a web service providing detailed information on the Google search behavior of Internet users from all countries during the period 2004–2019 and illustrates the preference be...
Article
Full-text available
Although the overall analysis of the citations received by Nobel laureates in the scientific background of their Nobel Prize gives an overview of how and when Nobel-awarded discoveries have been achieved and published, it will be interesting to consider the number of mentions of each work co-authored by a Nobel winner in his Nobel Prize scientific...
Presentation
Full-text available
Scholia is an interface that uses scholarly metadata available in Wikidata, a free knowledge base, through a SPARQL endpoint to generate research assessment profiles for publications, scientists, institutions, journals, countries, topics and other entities. In these presentation slides, we evaluate this interface and identify its matters to be solv...
Presentation
Full-text available
Wikibase is the software that enables MediaWiki to store structured data or access data that is stored in a structured data repository in RDF format. Since 2012, it has been used to establish Wikidata as a high-scale knowledge graph and has consequently benefited from several improvements allowing automatic ontology validation as well as a flexible...
Article
Created in October 2012, Wikidata is a large-scale, human-readable, machine-readable, multilingual, multidisciplinary, centralized, editable, structured, and linked knowledge-base with an increasing diversity of use cases. Here, we raise awareness of the potential use of Wikidata as a useful resource for biomedical data integration and semantic int...
Article
Full-text available
Human similarity and relatedness judgements between concepts underlie most of cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, the proposal of methods for the estimation of the degree of similarity and relatedness between words and concepts has been a very active line of research in the fields...
Article
Full-text available
This data article introduces a reproducibility dataset with the aim of allowing the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which introduces the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity re...
Chapter
Full-text available
Social network websites are mainly constructed around the notion of user identities, as set up on the bases of their profiles, and online generated contents such as texts, videos, photos. Still, while some profiles gain an important position in the network, others do not. Similarly, some online generated contents appear to gain a great deal of atte...
Article
Full-text available
This research letter discusses whether Arab Spring explains the changes in research productivity and impact of Arab countries by identifying non-sociopolitical factors that can be behind the variations of the research performance of several Arab nations such as Qatar, United Arab Emirates, Saudi Arabia, Tunisia, Lebanon and Algeria.
Article
Full-text available
This letter proves the interest of scientists in writing and citing letters to the editor and shows the importance of letters to the editor for worldwide scientific community.
Chapter
Full-text available
The ageing of the human population is a threat to many countries in the world and this fact creates new challenges for age-friendly living, recreational and working environments. Therefore, solutions that can support senior citizens (Longinos for the men and Longinas for the women) will be necessary, in order to help them stay actively involved in...
Article
Full-text available
This letter explains how MeSH qualifiers can be used to enhance the precision and recall of sentence-level biomedical relation extraction from scientific publications. It then proposes that encyclopedic reviews allow a better sentence-level extraction of biomedical relations than any other type of scientific publication. Finally, it shows how th...
Chapter
Full-text available
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered a task whose solution is at least as hard as the most difficult problems in artificial intelligence. It is used in applications such as information retrieval, machine translation, information extraction...
Article
Full-text available
Social media analytics is a research axis focused on extracting useful insights from social media data, with the aim of helping individuals and organizations take the most optimum decisions regarding several disciplines of life (business, marketing, politics, health, etc.). In this respect, social networks, microblogging, and media-sharing websites...
Article
Full-text available
Semantic similarity and relatedness measures have increasingly become core elements of recent research within the semantic technology community. Nowadays, the search for efficient meaning-centered applications that exploit computational semantics has become a necessity. Researchers have therefore become increasingly interested in the developm...
Conference Paper
Full-text available
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. It is used in applications such as information retrieval, machine tra...
Article
Full-text available
Computing the semantic similarity/relatedness between terms is an important research area for several disciplines, including artificial intelligence, cognitive science, linguistics, psychology, biomedicine and information retrieval. These measures exploit knowledge bases to express the semantics of concepts. Some approaches, such as the information...
Article
The measurement of the semantic relatedness between words has gained increasing interest in several research fields, including cognitive science, artificial intelligence, biology, and linguistics. The development of efficient measures is based on knowledge resources, such as Wikipedia, a huge and living encyclopedia supplied by net surfers. In this...
Conference Paper
Full-text available
Quantifying the semantic relation between words is a key element in several applications including the treatments at the meaning level. A great variety of approaches are proposed in order to quantify the semantic proximity between concepts or words. These approaches exploit computational models including the hierarchical and textual information of...
Code
Web site: http://wnetss-api.smr-team.org/ WNetSS is a Java API allowing the use of a wide range of WordNet-based semantic similarity measures pertaining to different categories, including taxonomic-based, features-based and IC-based measures. Determining the Semantic Similarity (SS) between word pairs is an important component in several research fields. It...
Article
Knowledge acquisition still represents one of the main challenging obstacles to designing intelligent systems exhibiting human-level performance in complex intelligent tasks. The recent developments in crowdsourcing technologies have opened new promising opportunities to overcome this problem by exploiting large amounts of machine readable knowledg...
Article
Full-text available
The exploitation of heterogeneous clinical sources and healthcare records is fundamental in clinical and translational research. The determination of semantic similarity between word pairs is an important component of text understanding that enables the processing and structuring of textual resources. Some of these measures have been adapted to the...
Conference Paper
Full-text available
Computing Semantic Similarity (SS) between words is an important issue of many research fields. Several features from knowledge databases can be involved in the definition of semantic computing models. Gloss-based approaches exploit the short and precise description for a concept for better expressing its semantics. In this paper, we propose a new...
Conference Paper
Full-text available
The aim of measuring Semantic Similarity (SS) between sentences is to find a method that can simulate the human thinking process. In fact, it has become an important task in several applications, including Artificial Intelligence and Natural Language Processing. Though this task depends strongly on word SS, the latter is not the only im...
Article
Full-text available
The challenge of measuring semantic similarity between words is to find a method that can simulate the human thinking process. The use of computers to quantify and compare semantic similarities has become an important area of research in various fields, including artificial intelligence, knowledge management, information retrieval and natural la...
Article
Full-text available
Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguistics, cognitive science and artificial intelligence. In this paper, we propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource that showed good performances. Th...
Article
Full-text available
Computing semantic similarity/relatedness between concepts and words is an important issue of many research fields. Information theoretic approaches exploit the notion of Information Content (IC) that provides for a concept a better understanding of its semantics. In this paper, we present a complete IC metrics survey with a critical study. Then, w...
Conference Paper
Full-text available
Computing semantic relatedness is a key component of information retrieval tasks and natural language processing applications. Wikipedia provides a knowledge base for computing word relatedness with more coverage than WordNet. In this paper, we use a new intrinsic information content (IC) metric with the Wikipedia category graph (WCG) to measure the sem...
Conference Paper
Full-text available
Semantics constitute one of the major stakes in the information retrieval (IR) system evolution. Taking semantics into account passes by the use of external semantic resources coupled with the initial documentation on which it is necessary to have semantic similarity measurements to carry out comparisons between concepts. This paper presents a ne...