Wissam AntounAmerican University of Beirut | AUB · Department of Electrical and Computer Engineering
Wissam Antoun
Bachelor of Engineering in Computer and Communication Engineering
About
14
Publications
27,891
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
1,157
Citations
Introduction
AI, ML, NLP, Deep-Learning, Transformers
Publications
Publications (14)
French language models, such as CamemBERT, have been widely adopted across industries for natural language processing (NLP) tasks, with models like CamemBERT seeing over 4 million downloads per month. However, these models face challenges due to temporal concept drift, where outdated training data leads to a decline in performance, especially when...
Recent advances in natural language processing (NLP) have led to the development of large language models (LLMs) such as ChatGPT. This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes. The proposed meth...
Recent advances in NLP have significantly improved the performance of language models on a variety of tasks. While these advances are largely driven by the availability of large amounts of data and computational power, they also benefit from the development of better training methods and architectures. In this paper, we introduce CamemBERTa, a Fren...
Enabling empathetic behavior in Arabic dialogue agents is an important aspect of building human-like conversational models. While Arabic Natural Language Processing has seen significant advances in Natural Language Understanding (NLU) with language models such as AraBERT, Natural Language Generation (NLG) remains a challenge. The shortcomings of NL...
Advances in English language representation enabled a more sample-efficient pre-training task by Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). Which, instead of training a model to recover masked tokens, it trains a discriminator model to distinguish true input tokens from corrupted tokens that were replac...
Recently, pretrained transformer-based architectures have proven to be very efficient at language modeling and understanding, given that they are trained on a large enough corpus. Applications in language generation for Arabic is still lagging in comparison to other NLP advances primarily due to the lack of advanced Arabic language generation model...
The Arabic language is a morphologically rich language with relatively few resources and a less explored syntax compared to English. Given these limitations, Arabic Natural Language Processing (NLP) tasks like Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA), have proven to be very challenging to tackle. Recently...
This paper presents state of the art methods for addressing three important challenges in automated fake news detection: fake news detection, domain identification, and bot identification in tweets. The proposed solutions achieved first place in a recent international competition on fake news. For fake news detection, we present two models. The win...
The Arabic language is a morphologically rich and complex language with relatively little resources and a less explored syntax compared to English. Given these limitations, tasks like Sentiment Analysis (SA), Named Entity Recognition (NER), and Question Answering (QA), have proven to be very challenging to tackle. Recently, with the surge of transf...
Arabic is a complex language with limited resources which makes it challenging to produce accurate text classification tasks such as sentiment analysis. The utilization of transfer learning (TL) has recently shown promising results for advancing accuracy of text classification in English. TL models are pre-trained on large corpora, and then fine-tu...
In this research, a high-resolution real-time fire detection system was implemented. NVIDIA CUDA framework was used to parallelize a serial version that was implemented using OpenCV. The algorithm relies on color thresholding with other de-noising image processing techniques applied to track the fire. Both implementations of the fire detection algo...
Questions
Question (1)
The only datasets available are under LDC licenses, and I'm not sure if the authors are allowed to provide me the datasets for free if I asked them.
Also, any idea on where to scrap for Arabic chat datasets?