Mohamed Khemakhem's scientific contributions

Publications (5)

Thesis
Dictionaries could be considered the most comprehensive reservoir of human knowledge: they carry not only the lexical description of words in one or more languages, but also the common awareness of a certain community about every known piece of knowledge in a time frame. Print dictionaries are the principal resources which enable the documentati...
Conference Paper
Full-text available
In this article, we will introduce two of the new parts of the new multi-part version of the Lexical Markup Framework (LMF) ISO standard, namely Part 3 of the standard (ISO 24613-3), which deals with etymological and diachronic data, and Part 4 (ISO 24613-4), which consists of a TEI serialisation of all of the prior parts of the model. We will demo...
Preprint
Full-text available
Lexical Markup Framework (LMF) or ISO 24613 [1] is a de jure standard that provides a framework for modelling and encoding lexical information in retrodigitised print dictionaries and NLP lexical databases. An in-depth review is currently underway within the standardisation subcommittee, ISO-TC37/SC4/WG4, to find a more modular, flexible and durab...
Conference Paper
Full-text available
In this paper, we present a generic workflow for retro-digitizing and structuring large entry-based documents, using as an example the 33,000 entries of the Internationale Bibliographie der Lexikographie by Herbert Ernst Wiegand (published in four volumes, Wiegand 2006-2014). The goal is to convert the large bibliography, at present available as col...

Citations

... The first task of automatically structuring dictionaries had already been partially covered by the work of Khemakhem et al. (2017, 2018), who developed GROBID-dictionaries, a submodule of GROBID (Grobid contributors, 2008-2018) implementing a Java machine learning library for structuring digitized lexical resources in TEI format (TEI Consortium, eds, 2018), to enable analysis, extraction and structuring of textual information in such resources. GROBID-dictionaries had already obtained promising results and performances (Khemakhem, 2020), so much so that it was used to make a first annotation of the Dictionnaire Universel macrostructure. ...
... ILC is active on this front through its participation in projects and initiatives aimed at publishing, as "linked data", the data held in important cultural heritage resources (e.g. lexicographic resources, Khan et al., 2017, 2020). ...
... Transkribus offered OCR capability, providing an inbuilt ABBYY FineReader function, before licensing issues in 2021. Out of the 10 indexed materials which mentioned OCR, only 2 abstracts included a description of using this function within Transkribus (Lindemann et al. 2018; Ströbel and Clematide 2019) while others used OCR externally through self-built platforms or those supplied by ABBYY (n = 2), suggesting that the licensing issue did not impact users greatly. Others mentioned OCR in comparison to HTR, comparing their accuracy rates in deciphering text (n = 6). ...
... In Khemakhem et al., 2018b, we present promising preliminary results of first experiments carried out to extract macro-structures of the entries of these directories. The OCR quality was a serious obstacle for us to pursue more advanced experiments. ...