Antoine Doucet
La Rochelle Université · Laboratoire Informatique, Image et Interaction

PhD

About

227 Publications
32,295 Reads
2,416 Citations
Introduction
Antoine Doucet currently works at the Laboratoire Informatique, Image et Interaction, University of La Rochelle.

Publications (227)
Conference Paper
Full-text available
Query auto-completion (QAC) is one of the most recognizable and widely used services of modern search engines. Its goal is to assist a user in the process of query formulation. Current QAC systems are mainly reactive. They respond to the present request using past knowledge. Specifically, they mostly rely on query log analysis [11, 10, 12] or cor...
Article
Full-text available
In the age of big data, automatic methods for creating summaries of documents are becoming increasingly important. In this paper we propose a novel, unsupervised method for (multi-)document summarization. In an unsupervised and language-independent fashion, this approach relies on the strength of word associations in the set of documents to be summarized...
Conference Paper
Full-text available
This paper proposes a new approach for automatically dating a photograph, based solely on its content. Building on recent advances in computer vision, the images are first described by a set of features. Then, the age group of every image is predicted by a classifier trained with annotated data. The key strength of our approach -- which makes it pe...
Conference Paper
Full-text available
In this paper, we introduce a multilingual epidemiological news surveillance system. Its main contribution is its ability to extract epidemic events in any language, hence succeeding where state-of-the-art surveillance systems usually fail: the objective of reactivity. Most systems indeed focus on a selected list of languages, deemed important...
Article
Full-text available
In this paper we discuss the problem of discovering interesting word sequences in the light of two traditions: sequential pattern mining (from data mining) and collocations discovery (from computational linguistics). Smadja (1993) defines a collocation as "a recurrent combination of words that co-occur more often than chance and that correspond to...
Article
Full-text available
Automatic term extraction (ATE) is a natural language processing task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. In this paper, we treat ATE as a sequence-labeling task and explore the efficacy of XLMR in evaluating cross-lingual and multilingual learning against monoling...
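As a rough illustration of the sequence-labeling framing described above (not the paper's actual setup), the following minimal Python sketch loads XLM-RoBERTa as a token classifier with a hypothetical B/I/O term tag set; in practice the model would first be fine-tuned on annotated domain-specific corpora.

    # Minimal sketch: automatic term extraction framed as BIO sequence labeling
    # with XLM-RoBERTa. The tag set and the untrained classification head are
    # illustrative assumptions, not the paper's actual configuration.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    labels = ["O", "B-TERM", "I-TERM"]  # hypothetical tag set for candidate terms
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForTokenClassification.from_pretrained(
        "xlm-roberta-base", num_labels=len(labels)
    )  # would normally be fine-tuned on annotated domain corpora

    sentence = "Convolutional neural networks dominate image classification."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits        # shape: (1, seq_len, num_labels)
    predictions = logits.argmax(dim=-1)[0]     # one label id per subword token

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    for token, label_id in zip(tokens, predictions):
        print(token, labels[int(label_id)])    # untrained head -> arbitrary tags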
Article
Historical document processing (HDP) corresponds to the task of converting the physically bound form of historical archives into a web-based, centrally digitized form for their conservation, preservation, and ubiquitous access. Besides the conservation of these invaluable historical collections, the key agenda is to make these geographically...
Conference Paper
In this paper, we address the challenge of document image analysis for historical index table documents with handwritten records. Demographic studies can gain insight from the use of automatic document analysis in such documents through the study of population movements. To evaluate the efficacy of automatic layout analysis tools, we release the PA...
Conference Paper
The digitization of historical documents is a critical task for preserving cultural heritage and making vast amounts of information accessible to the wider public. One of the challenges in this process is separating individual articles from old newspaper images, which is significant for text analysis and information retrieval. In this work, we pres...
Conference Paper
The digitization of historical newspapers is a crucial task for preserving cultural heritage and making it accessible for various natural language processing and information retrieval tasks. One of the key challenges in digitizing old newspapers is article separation, which consists of identifying and extracting individual articles from scanned new...
Article
Full-text available
The ubiquity of social networks and the unprecedented growth in web data have generated an ample resource of information for researchers as well as for market analysts to generate user-oriented recommendations. While many social recommender systems have been designed for individual users, some have been proposed for a group of users intended to han...
Article
Microblogging site Twitter (re-branded to X since July 2023) is one of the most influential online social media websites, offering a platform for the masses to communicate, express their opinions, and share information on a wide range of subjects and products, resulting in the creation of a large amount of unstructured data. This has attract...
Chapter
This paper provides an overview of the DocILE 2023 Competition, its tasks, participant submissions, the competition results and possible future research directions. This first edition of the competition focused on two Information Extraction tasks, Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR). Both of these task...
Chapter
This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been...
Chapter
Pre-trained language models have been widely successful, particularly in settings with sufficient training data. However, achieving similar results in low-resource multilingual settings and specialized domains, such as epidemic surveillance, remains challenging. In this paper, we propose hypotheses regarding the factors that could impact the perfor...
Chapter
The widespread use of unsecured digital documents by companies and administrations as supporting documents makes them vulnerable to forgeries. Moreover, image editing software and the capabilities they offer complicate the tasks of digital image forensics. Nevertheless, research in this field struggles with the lack of publicly available realistic...
Chapter
In this paper, we tackle the task of document fraud detection. We consider that this task can be addressed with natural language processing techniques. We treat it as a regression-based approach, by taking advantage of a pre-trained language model in order to represent the textual content, and by enriching the representation with domain-specific on...
Chapter
Information Extraction plays a key role in the automation of auditing processes for administrative documents. However, variety in layout and language always makes this a challenging task. On the other hand, large public training datasets for administrative documents such as invoices are hard to find. In this work, we use Graph Attention Ne...
Article
After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining and the next fundamental challenge is...
Conference Paper
Full-text available
This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has bee...
Article
In this article, we introduce an Open Education Resource (OER) on digital historical research with historical newspapers (the URL will be given with the camera-ready version of this paper), intended to give students the means to understand the risks involved in working with large collections of digitised documents, as well as the keys to benefit fro...
Preprint
Full-text available
Large language models (LLMs) have been leveraged for several years now, obtaining state-of-the-art performance in recognizing entities from modern documents. For the last few months, the conversational agent ChatGPT has "prompted" a lot of interest in the scientific community and public due to its capacity of generating plausible-sounding answers....
Article
Full-text available
Research on graph representation learning (a.k.a. embedding) has received great attention in recent years and shows effective results for various types of networks. Nevertheless, few initiatives have been focused on the particular case of embeddings for bipartite graphs. In this paper, we first define the graph embedding problem in the case of bipa...
Chapter
Full-text available
In this paper, we address the detection of named entities in multilingual historical collections. We argue that, besides the multiple challenges that depend on the quality of digitization (e.g., misspellings and linguistic errors), historical documents can pose another challenge due to the fact that such collections are distributed over a long enou...
Chapter
In many documents, like receipts or invoices, textual information is constrained by the space and organization of the document. The document information has no natural language context, and expressions are often abbreviated to respect the graphical layout, both at word level and phrase level. In order to analyze the semantic content of these types...
Chapter
Due to the availability of cost-effective scanners, printers, and image processing software, document fraud is, unfortunately, quite common nowadays. The main challenges of this task are the lack of freely available annotated data and the predominance of approaches based mainly on computer vision. We consider that relying on the textual content of for...
Preprint
Full-text available
This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been...
Preprint
Archive collections are nowadays mostly available through search engine interfaces, which allow a user to retrieve documents by issuing queries. The study of these collections may be, however, impaired by some aspects of search engines, such as the overwhelming number of documents returned or the lack of contextual knowledge provided. New methods...
Article
Full-text available
Due to the expanding rate of scientific publications, it has become a necessity to summarize scientific documents to allow researchers to keep track of recent developments. In this paper, we formulate the scientific document summarization problem in a multi-view clustering (MVC) framework. Two views of the scientific documents, semantic and syntact...
Preprint
Full-text available
Identifying and exploring emerging trends in the news is becoming more essential than ever with many changes occurring worldwide due to the global health crises. However, most of the recent research has focused mainly on detecting trends in social media, thus, benefiting from social features (e.g. likes and retweets on Twitter) which helped the tas...
Preprint
Full-text available
Automatic term extraction (ATE) is a Natural Language Processing (NLP) task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. As units of knowledge in a specific field of expertise, extracted terms are not only beneficial for several terminographical tasks, but also support and...
Preprint
Full-text available
Automatic term extraction plays an essential role in domain language understanding and several natural language processing downstream tasks. In this paper, we propose a comparative study on the predictive power of Transformers-based pretrained language models toward term extraction in a multi-language cross-domain setting. Besides evaluating the ab...
Chapter
To prevent historical knowledge from fading, research in event detection could facilitate access to digitized collections. In this paper, we propose a method for annotating multilingual historical documents for event detection in an unsupervised manner by leveraging entities and semantic notions of event types. We automatically annotate the documents...
Conference Paper
Full-text available
Automatic term extraction plays an essential role in domain language understanding and several natural language processing downstream tasks. In this paper, we propose a comparative study on the predictive power of Transformers-based pretrained language models toward term extraction in a multi-language cross-domain setting. Besides evaluating the ab...
Conference Paper
This paper studies the dynamics between how the representation of terms changes through time and its potential emergence as a trending topic in the future. Previous research focused on contrasting directly two of the most recent representations of detected keywords to form a basis for predicting emerging topics. We, thus, propose the Term Context E...
Chapter
Automatic term extraction (ATE) is a natural language processing task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. In this paper, we experiment with XLM-RoBERTa to evaluate the abilities of cross-lingual and multilingual versus monolingual learning in the cross-domain ATE t...
Conference Paper
Full-text available
Natural Language Premise Selection (NLPS) is a mathematical Natural Language Processing (NLP) task that retrieves a set of applicable relevant premises to support the end-user finding the proof for a particular statement. This paper evaluates the impact of Transformer-based contextual information and different fundamental similarity scores toward N...
Conference Paper
Full-text available
Automatic term extraction (ATE) is a popular research task that eases the time and effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. In this paper, we treat terminology extraction as a sequence-labeling task and experiment with a Transformer-based model XLM-RoBERTa to evaluate the performance...
Article
Full-text available
Event detection is a crucial task for natural language processing and it involves the identification of instances of specified types of events in text and their classification into event types. The detection of events from digitised documents could enable historians to gather and combine a large amount of information into an integrated whole, a pan...
Chapter
Tracking news stories in documents is a way to deal with the large amount of information that surrounds us every day, to reduce the noise and to detect emergent topics in the news. Since the Covid-19 outbreak, the world has faced a new problem: the infodemic. News article titles are massively shared on social networks and the analysis of trends and growing...
Chapter
This paper presents an overview of the second edition of HIPE (Identifying Historical People, Places and other Entities), a shared task on named entity recognition and linking in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, HIPE-2022 confronts systems with the challenges of dealing with more l...
Preprint
Full-text available
This paper summarizes the joint participation of the Trading Central Labs and the L3i laboratory of the University of La Rochelle on both sub-tasks of the Shared Task FinSim-4 evaluation campaign. The first sub-task aims to enrich the 'Fortia ESG taxonomy' with new lexicon entries while the second one aims to classify sentences to either 'sustainab...
Article
Full-text available
Digital libraries have a key role in cultural heritage as they provide access to our culture and history by indexing books and historical documents (newspapers and letters). Digital libraries use natural language processing (NLP) tools to process these documents and enrich them with meta-information, such as named entities. Despite recent advances...
Chapter
Results of digitisation projects sometimes suffer from the limitations of optical character recognition software which is mainly designed for modern texts. Prior work has examined the impact of OCR errors on information retrieval (IR) and downstream natural language processing (NLP) tasks. However, questions remain open regarding the actual readabi...
Chapter
We present the HIPE-2022 shared task on named entity processing in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, this edition confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. HIPE-2022 is part o...
Chapter
In this paper, we approach a recent and under-researched paradigm for the task of event detection (ED) by casting it as a question-answering (QA) problem with the possibility of multiple answers and the support of entities. The extraction of event triggers is, thus, transformed into the task of identifying answer spans from a context, while also fo...
Article
Full-text available
Named entities (NEs) are among the most relevant type of information that can be used to properly index digital documents and thus easily retrieve them. It has long been observed that NEs are key to accessing the contents of digital library portals as they are contained in most user queries. However, most digitized documents are indexed through the...
Conference Paper
Full-text available
In this paper, we present a collection of five flexible background linking models created for the News Track in TREC 2021 that generate ranked lists of articles to provide contextual information. The collection is based on the use of sentence embeddings indexes, created with Sentence BERT and Open Distro for ElasticSearch. For each model, we explor...
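As a loose sketch of the idea behind such background linking models (the model name, toy texts, and in-memory ranking below are assumptions standing in for the Sentence BERT and Open Distro for ElasticSearch indexes used in the paper), candidate articles can be ranked by embedding similarity to the query article:

    # Minimal sketch: rank candidate background articles by cosine similarity
    # of sentence embeddings to the query article. Model name and texts are
    # illustrative; the TREC systems used embedding indexes in ElasticSearch.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any Sentence-BERT model

    query_article = "New satellite data show Arctic sea ice at a record low."
    candidates = [
        "Scientists report accelerating ice loss in Greenland.",
        "Local elections draw record turnout in several states.",
        "Climate models underestimated polar warming, study finds.",
    ]

    query_emb = model.encode(query_article, convert_to_tensor=True)
    cand_embs = model.encode(candidates, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, cand_embs)[0]  # similarity to each candidate

    for text, score in sorted(zip(candidates, scores.tolist()), key=lambda p: -p[1]):
        print(f"{score:.3f}  {text}")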
Chapter
Full-text available
The role of Human Resources (HR) in an organization is significant, and the world of education plays an essential role in producing and educating qualified human resources. In this paper, 20 students were evaluated using Simple Additive Weighting (SAW) and the Analytic Hierarchy Process (AHP), which applied six criteria...
Preprint
Full-text available
Named entity recognition (NER) is an information extraction technique that aims to locate and classify named entities (e.g., organizations, locations,...) within a document into predefined categories. Correctly identifying these phrases plays a significant role in simplifying information access. However, it remains a difficult task because named en...
Chapter
Named entity recognition (NER) is an information extraction technique that aims to locate and classify named entities (e.g., organizations, locations, ...) within a document into predefined categories. Correctly identifying these phrases plays a significant role in simplifying information access. However, it remains a difficult task because named e...
Chapter
Unsupervised topic models such as Latent Dirichlet Allocation (LDA) are popular tools to analyse digitised corpora. However, the performance of these tools has been shown to degrade with OCR noise. Topic models that incorporate word embeddings during inference have been proposed to address the limitations of LDA, but these models have not seen muc...
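For context only, here is a minimal plain-LDA baseline with gensim, the kind of model whose topics fragment when OCR noise splits the vocabulary; the embedding-aware topic models discussed in the chapter are not implemented here, and the toy corpus and parameters are assumptions.

    # Minimal sketch: a plain LDA baseline (gensim) on a toy corpus where
    # OCR-noise variants ("harb0ur", "carg0") split the vocabulary and thus
    # weaken the topics. Corpus, topic count and passes are illustrative.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    docs = [
        ["ship", "harbour", "cargo", "voyage"],
        ["parliament", "law", "vote", "minister"],
        ["ship", "harb0ur", "carg0", "voyage"],  # OCR-noisy duplicate of doc 1
    ]

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=2, passes=10, random_state=0)
    for topic_id, topic in lda.print_topics(num_words=4):
        print(topic_id, topic)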
Chapter
In this paper, we focus on epidemic event extraction in multilingual and low-resource settings. The task of extracting epidemic events is defined as the detection of disease names and locations in a document. We experiment with a multilingual dataset comprising news articles from the medical domain with diverse morphological structures (Chinese, En...
Preprint
Full-text available
After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining and the next fundamental challenge is...
Chapter
In this paper, we present a dataset and a baseline evaluation for multilingual epidemic event extraction. We experiment with a multilingual news dataset which we annotate at the token level, a common tagging scheme utilized in event extraction systems. We approach the task of extracting epidemic events by first detecting the relevant documents from...
Chapter
In this paper, we present an efficient and accurate method to represent events from numerous public sources, such as Wikidata or more specific knowledge bases. We focus on events happening in the real world, such as festivals or assassinations. Our method merges knowledge from Wikidata and Wikipedia article summaries to gather entities involved in...
Chapter
The present paper is focused on information extraction from key fields of invoices using two different methods based on sequence labeling. Invoices are semi-structured documents in which data can be located based on the context. Common information extraction systems are model-driven, using heuristics and lists of trigger words curated by domain exp...
Article
Full-text available
Melanoma, one of the most dangerous types of skin cancer, results in a very high mortality rate. Early detection and resection are two key points for a successful cure. Recent research has used artificial intelligence to classify melanoma and nevus and to compare the assessment of these algorithms to that of dermatologists. However, training neu...
Article
Full-text available
The growth of information technology goes hand in hand with the use of algorithms. One of the most well-known algorithms is the Memetic Algorithm (MA). MA is part of the family of evolutionary algorithms and has been applied to highly complex computational challenges. MA can be applied in many fields of research, such as optimization, scheduling, prediction, i...
Article
Full-text available
This article considers the interdisciplinary opportunities and challenges of working with digital cultural heritage, such as digitized historical newspapers, and proposes an integrated digital hermeneutics workflow to combine purely disciplinary research approaches from computer science, humanities, and library work. Common interests and motivation...
Article
Full-text available
The fingerprint is one kind of biometric. This unique biometric data has to be processed efficiently and securely. The problem gets more complicated as the data grows. This work processes fingerprint image data with a memetic algorithm, a simple and reliable algorithm. In order to achieve the best result, we run this algorithm in a parallel envir...
Article
Optical character recognition (OCR) is one of the most popular techniques used for converting printed documents into machine-readable ones. While OCR engines can do well with modern text, their performance is unfortunately significantly reduced on historical materials. Additionally, many texts have already been processed by various out-of-date digi...
Preprint
Full-text available
In this paper, we propose a recent and under-researched paradigm for the task of event detection (ED) by casting it as a question-answering (QA) problem with the possibility of multiple answers and the support of entities. The extraction of event triggers is, thus, transformed into the task of identifying answer spans from a context, while also foc...
Preprint
Full-text available
This paper summarizes the participation of the Laboratoire Informatique, Image et Interaction (L3i laboratory) of the University of La Rochelle in the Recognizing Ultra Fine-grained Entities (RUFES) track within the Text Analysis Conference (TAC) series of evaluation workshops. Our participation relies on two neural-based models, one based on a pre...
Chapter
Breast cancer subtypes, which play a significant role in breast cancer prognosis and targeted therapy selection, can be identified with gene expression profiling. It is also beneficial for personalized treatment to know bio-markers that impact the development of cancer cells from studying gene expression. Therefore, this study uses recursive featur...
Conference Paper
We present a collection of Named Entity Recognition (NER) systems for six Slavic languages: Bulgarian, Czech, Polish, Slovenian, Russian and Ukrainian. These NER systems have been trained using different BERT models and Frustratingly Easy Domain Adaptation (FEDA). FEDA allows us to create NER systems using multiple datasets without having to worry...
Conference Paper
We explore three different methods for improving Named Entity Recognition (NER) systems based on BERT, each responding to one of three potential issues: the processing of uppercase tokens, the detection of entity boundaries and low generalization. Specifically, we first explore the marking of uppercase tokens for providing extra casing information....
Article
We investigate the effectiveness of a successful model in Visual-Question-Answering (VQA) problems as the core component in a cross-modal retrieval system that can accept images or text as queries, in order to retrieve relevant data from a multimodal document collection. To this end, we adapt the VQA model for deep multimodal learning to combine vi...
Chapter
Event detection involves the identification of instances of specified types of events in text and their classification into event types. In this paper, we approach the event detection task as a relation extraction task. In this context, we assume that the clues brought by the entities participating in an event are important and could improve the pe...
Article
Full-text available
Personal identification has become one of the most important concerns in our society with regard to access control, crime and forensic identification, banking, and computer systems. The fingerprint is the most widely used biometric feature due to its uniqueness, universality and stability. The fingerprint is widely used as a security feature for forensic recogn...
Article
Full-text available
Swarm intelligence refers to meta-heuristic algorithms inspired by the natural behavior of groups of animals (such as dragonflies, ants, or ducks) striving for survival. One of them is the Dragonfly Algorithm, which has been used to solve real-world nonlinear problems in engineering. In this paper, we review the Dragonfly...
Conference Paper
Named entities (NEs) are among the most relevant type of information that can be used to efficiently index and retrieve digital documents. Furthermore, the use of Entity Linking (EL) to disambiguate and relate NEs to knowledge bases, provides supplementary information which can be useful to differentiate ambiguous elements such as geographical loca...
Chapter
In recent decades, a huge number of documents have been digitised, before undergoing optical character recognition (OCR) to extract their textual content. This step is crucial for indexing the documents and making the resulting collections accessible. However, the fact that documents are indexed through their OCRed content is posing a number of p...
