EVENT REPORT
Report on CLEF 2020
Avi Arampatzis
Democritus University of Thrace, Greece
avi@ee.duth.gr
Linda Cappellato
University of Padua, Italy
cappellato@dei.unipd.it
Carsten Eickhoff
Brown University, USA
carsten@brown.edu
Nicola Ferro
University of Padua, Italy
ferro@dei.unipd.it
Hideo Joho
University of Tsukuba, Japan
hideo@slis.tsukuba.ac.jp
Evangelos Kanoulas
University of Amsterdam, The Netherlands
e.kanoulas@uva.nl
Christina Lioma
University of Copenhagen, Denmark
c.lioma@di.ku.dk
Aurélie Névéol
LIMSI, CNRS, France
neveol@limsi.fr
Theodora Tsikrika
Information Technologies Institute, CERTH, Greece
theodora.tsikrika@iti.gr
Stefanos Vrochidis
Information Technologies Institute, CERTH, Greece
stefanos@iti.gr
Abstract
This is a report on the eleventh edition of the Conference and Labs of the Evaluation Forum (CLEF 2020), held virtually from September 22 to 25, 2020, in Thessaloniki, Greece. CLEF was a four-day event combining a Conference and an Evaluation Forum. The Conference featured keynotes by Ellen Voorhees and Yiannis Kompatsiaris, and the presentation of peer-reviewed research papers covering a wide range of topics, in addition to many posters. The Evaluation Forum consisted of twelve Labs: ARQMath, BioASQ, CheckThat!, ChEMU, CLEF eHealth, eRisk, HIPE, ImageCLEF, LifeCLEF, LiLAS, PAN, and Touché, addressing a wide range of tasks, media, languages, and ways to go beyond standard test collections.
1 Introduction
The 2020 edition of the Conference and Labs of the Evaluation Forum1 (CLEF) was jointly organized by the Center for Research and Technology Hellas (CERTH), the University of Amsterdam, and the Democritus University of Thrace. It was originally planned to be hosted by CERTH, and in particular by the Multimedia Knowledge and Social Media Analytics Laboratory of its Information Technologies Institute, at the premises of CERTH in Thessaloniki, Greece, from 22 to 25 September 2020.
The outbreak of the COVID-19 pandemic in early 2020 affected the organization of CLEF 2020. After detailed discussions, the CLEF steering committee and the CLEF 2020 organizers decided to run the conference fully virtually. The conference format remained the same as in past years and consisted of keynotes, contributed papers, lab sessions, and poster sessions, including reports from other benchmarking initiatives from around the world. All sessions were organized and run online.
CLEF was established in 2000 as a spin-off of the TREC Cross-Language Track with a focus
on stimulating research and innovation in multimodal and multilingual information access and
retrieval [1,2]. Over the years, CLEF has fostered the creation of language resources in many
European and non-European languages, promoted the growth of a vibrant and multidisciplinary
research community, provided sizable improvements in the performance of monolingual, bilingual,
and multilingual information access systems [3], and achieved a substantial scholarly impact [4,
5,6].
In its first 10 years, CLEF hosted a series of experimental labs that reported their results
at an annual workshop held in conjunction with the European Conference on Digital Libraries
(ECDL). In 2010, now a mature and well-respected evaluation forum, CLEF expanded to include
a complementary peer-reviewed conference for discussion of advancing evaluation methodologies
and reporting the evaluation of information access and retrieval systems regardless of data type,
format, language, etc. Moreover, the scope of the evaluation labs was broadened, to comprise
not only multilinguality but also multimodality in information access. Multimodality here is
intended not only as the ability to deal with information coming in multiple media but also
in different modalities, e.g. the Web, social media, news streams, specific domains and so on.
Since 2010, the CLEF conference has established a format with keynotes, contributed papers, lab
sessions, and poster sessions, including reports from other benchmarking initiatives from around
the world. Since 2013, CLEF has been supported by an association, a lightweight not-for-profit legal entity that, thanks to the financial support of the CLEF community, takes care of the small central coordination needed to operate CLEF on an ongoing basis and makes it a self-sustaining activity [1].
CLEF 2020 continued the initiative introduced in the 2019 edition, in which the European Conference on Information Retrieval (ECIR) and CLEF joined forces: ECIR 2020 hosted a special session dedicated to the CLEF Labs, where lab organizers presented the major outcomes of their Labs and their plans for ongoing activities, followed by a poster session to favour discussion during the conference. This was reflected in the ECIR 2020 proceedings, where CLEF Lab activities and results were reported as short papers. The goal was not only to engage the ECIR community in CLEF activities but also to disseminate the research results achieved during CLEF evaluation cycles through the submission of papers to ECIR.
1http://clef2020.clef-initiative.eu/
CLEF 2020 ran as an online, free-of-charge event, thanks to the sponsorship of the BCS-IRSG and the ACM SIGIR Friends. This gave researchers around the globe the opportunity to participate remotely. In total, 673 individuals registered to attend the conference, with approximately 11.4% coming from Asia, 21.5% from the Americas, and 67.1% from Europe and Africa (only the timezone of participants was known to the organizers). The online program was run using Zoom Webinar for the plenary sessions and Zoom Meetings for the Lab sessions. The number of attendees per plenary was approximately 80 individuals, while the number of attendees for the lab sessions varied. The organizers scheduled all Zoom sessions ahead of time, assigning different coordinators to the different sessions. Unfortunately, it was not possible to organize social activities (random encounters, social events, etc.), mainly due to the limited solutions available at the time. Several options, however, have sprung up in recent months that future online editions of conferences could adopt.
2 The CLEF Conference
CLEF 2020 continued the focus of the CLEF conference on “experimental IR”, as carried out
at evaluation forums (CLEF Labs, TREC, NTCIR, FIRE, MediaEval, RomIP, TAC, etc.), with
special attention to the challenges of multimodality, multilinguality, and interactive search. We
invited submissions on significant new insights demonstrated on the resulting IR test collections,
on analysis of IR test collections and evaluation measures, as well as on concrete proposals to push
the boundaries of the Cranfield/TREC/CLEF paradigm [7].
Keynotes The following scholars were invited to give a keynote talk at the CLEF 2020 confer-
ence.
Ellen Voorhees (NIST, USA) delivered a talk entitled “Building Reusable Test Collections”, which focused on reviewing various approaches for building fair, reusable test collections with large document sets.
Yiannis Kompatsiaris (CERTH-ITI, Greece) gave a keynote on “Social media mining for sensing and responding to real-world trends and events”, presenting the unique opportunity social media offer to discover, collect, and extract relevant information that provides useful insights in areas ranging from news to environmental and security topics, while addressing key challenges and issues, such as fighting misinformation and analysing multimodal and multilingual information.
Other Evaluation Initiatives Ellen Voorhees (NIST, USA) briefly introduced TREC2 (Text REtrieval Conference), whose purpose is to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. She then presented in detail the recent TREC-COVID3 initiative, whose goals are: (i) to evaluate search algorithms and systems for helping scientists, clinicians, policy makers, and others manage the existing and rapidly growing corpus of scientific literature related to COVID-19, and (ii) to discover methods that will assist with managing scientific information in future global biomedical crises. Makoto P. Kato (University of Tsukuba, Japan) presented NTCIR4 (NII Testbeds and Community for Information access Research), which promotes research in information access technologies with a special focus on East Asian languages and English. Prasenjit Majumder (DA-IICT, India) introduced FIRE5, which fosters the development of multilingual information access systems for the Indian sub-continent and explores new domains like plagiarism detection, legal information access, mixed script information retrieval, and spoken document retrieval. Finally, Gareth Jones (Dublin City University, Ireland) presented MediaEval6, the benchmarking initiative for multimedia evaluation, including speech, audio, visual content, tags, users, and context.
2https://trec.nist.gov/
3https://ir.nist.gov/covidSubmit/
4http://research.nii.ac.jp/ntcir/
5http://fire.irsi.res.in/
6http://multimediaeval.org/
Technical Program CLEF 2020 received nine submissions, of which seven papers (five long, two short) were accepted. Each submission was reviewed by three program committee members, and the program chairs oversaw the reviewing and follow-up discussions. Seven countries were represented in the accepted papers, many of which were the product of international collaboration. This year, researchers addressed the following important challenges in the community: a large-scale evaluation of translation effects in academic search; advancement of assessor-driven aggregation methods for efficient relevance assessments; development of new test collections or datasets for 1) missing data detection methods in knowledge bases, 2) Russian reading comprehension, and 3) under-resourced languages such as Amharic (Ethiopia); revisiting the concept of session boundaries with fresh eyes; and development of argumentative document retrieval methods.
As in previous editions since 2015, CLEF 2020 invited CLEF lab organizers to nominate a “best of the labs” paper, which was reviewed as a full paper submission to the CLEF 2020 conference according to the same review criteria and by the same program committee. Seven full papers were accepted for this section.
3 The CLEF Lab Sessions
Fifteen lab proposals were received and evaluated by peer review based on their innovation potential and the quality of the resources created. To identify the best proposals, well-established criteria from previous editions of CLEF were applied, such as topical relevance, novelty, potential impact on future world affairs, likely number of participants, and the quality of the organizing consortium. This year we further stressed the connection to real-life usage scenarios, and we tried as much as possible to avoid overlaps among labs in order to promote synergies and integration.
The 12 selected labs represented scientific challenges based on new data sets and real-world problems in multimodal and multilingual information access. These data sets provide unique opportunities for scientists to explore collections, develop solutions for these problems, receive feedback on the performance of their solutions, and discuss the issues with peers at the workshops.
The 12 labs running as part of CLEF 2020 comprised new labs (ARQMath, ChEMU, HIPE, LiLAS and Touché) as well as seasoned labs that had offered previous editions at CLEF (CheckThat!, CLEF eHealth, eRisk, ImageCLEF, LifeCLEF and PAN) or on other platforms (BioASQ). Details of the individual labs are described by the lab organizers in the CLEF Working Notes [8]; we only provide a brief overview of them here.
ARQMath: Answer Retrieval for Mathematical Questions7 considers the problem of
finding answers to new mathematical questions among posted answers on the community
question answering site Math Stack Exchange. The goals of the lab are to develop methods
for mathematical information retrieval based on both text and formula analysis [9].
BioASQ8 challenges researchers with large-scale biomedical semantic indexing and question
answering (QA). The challenges include tasks relevant to hierarchical text classification,
machine learning, information retrieval, QA from texts and structured data, multi-document
summarization and many other areas. The aim of the BioASQ workshop is to push the
research frontier towards systems that use the diverse and voluminous information available
online to respond directly to the information needs of biomedical scientists [10].
CheckThat!: Identification and Verification of Political Claims9 aims to foster the development of technology capable of both spotting and verifying check-worthy claims in political debates in English, Arabic and Italian. The concrete tasks were to assess the check-worthiness of a claim in a tweet, check if a (similar) claim has been previously verified, retrieve evidence to fact-check a claim, and verify the factuality of a claim [11].
ChEMU: Information Extraction from Chemical Patents10 proposes two key informa-
tion extraction tasks over chemical reactions from patents. Task 1 aims to identify chem-
ical compounds and their specific types, i.e. to assign the label of a chemical compound
according to the role which it plays within a chemical reaction. Task 2 requires identifi-
cation of event trigger words (e.g. “added” and “stirred”) which all have the same type of
“EVENT TRIGGER”, and then determination of the chemical entity arguments of these
events [12].
CLEF eHealth11 aims to support the development of techniques to aid laypeople, clinicians
and policy-makers in easily retrieving and making sense of medical content to support their
decision making. The goals of the lab are to develop processing methods and resources in
a multilingual setting to enrich difficult-to-understand eHealth texts and provide valuable
documentation [13].
7https://www.cs.rit.edu/~dprl/ARQMath/
8http://www.bioasq.org/workshop2020
9https://sites.google.com/view/clef2020-checkthat
10http://chemu.eng.unimelb.edu.au/
11http://clef-ehealth.org/
eRisk: Early Risk Prediction on the Internet12 explores challenges of evaluation methodol-
ogy, effectiveness metrics and other processes related to early risk detection. Early detection
technologies can be employed in different areas, particularly those related to health and
safety. The 2020 edition of the lab focused on texts written in social media for the early
detection of signs of self-harm and depression [14].
HIPE: Named Entity Processing on Historical Newspapers13 aims at fostering named
entity recognition on heterogeneous, historical and noisy inputs. The goals of the lab are to
strengthen the robustness of existing approaches on non-standard input; to enable perfor-
mance comparison of named entity processing on historical texts; and, in the long run, to
foster efficient semantic indexing of historical documents in order to support scholarship on
digital cultural heritage collections [15].
ImageCLEF: Multimedia Retrieval14 provides an evaluation forum for visual media analysis, indexing, classification/learning, and retrieval in medical, nature, security and lifelogging applications, with a focus on multimodal data, that is, data from a variety of sources and media [16].
LifeCLEF: Biodiversity Identification and Prediction15 aims at boosting research on the
identification and prediction of living organisms in order to solve the taxonomic gap and im-
prove our knowledge of biodiversity. Through its biodiversity informatics related challenges,
LifeCLEF is intended to push the boundaries of the state-of-the-art in several research di-
rections at the frontier of multimedia information retrieval, machine learning and knowledge
engineering [17].
LiLAS: Living Labs for Academic Search16 aims to bring together researchers interested in
the online evaluation of academic search systems. The long term goal is to foster knowledge
on improving the search for academic resources like literature, research data, and the inter-
linking between these resources in fields from the Life Sciences and the Social Sciences. The
immediate goal of this lab is to develop ideas, best practices, and guidelines for a full online
evaluation campaign at CLEF 2021 [18].
PAN: Digital Text Forensics and Stylometry17 is a networking initiative for digital text forensics, where researchers and practitioners study technologies that analyze texts with regard to originality, authorship, and trustworthiness. PAN provides evaluation resources consisting of large-scale corpora, performance measures, and web services that allow for meaningful evaluations. The main goal is to provide sustainable and reproducible evaluations and to get a clear view of the capabilities of state-of-the-art algorithms [19].
12http://erisk.irlab.org/
13https://impresso.github.io/CLEF-HIPE-2020/
14https://www.imageclef.org/2019
15http://www.lifeclef.org/
16https://clef-lilas.github.io/
17http://pan.webis.de/
Touché: Argument Retrieval18 is the first shared task on the topic of argument retrieval. Decision making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one’s stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval [20].
18https://events.webis.de/touche-20/
As a group, the 71 lab organizers were based in 14 countries, with Germany and France leading the distribution. Despite CLEF’s traditionally Europe-based audience, 18 (25.4%) organizers were affiliated with institutions outside of Europe. The gender distribution was skewed, with 81.3% of the organizers being male.
More information on the CLEF 2020 conference, the CLEF initiative and the CLEF Association
is provided on the Web:
CLEF 2020: http://clef2020.clef-initiative.eu/
CLEF initiative: http://www.clef-initiative.eu/
CLEF Association: http://www.clef-initiative.eu/association
4 CLEF 2021 and Beyond
CLEF 2021 will be hosted by the University “Politehnica” of Bucharest, Romania, 21–24 September 2021.
More information on CLEF 2021, the call for papers, and the ongoing labs is available at:
http://clef2021.clef-initiative.eu/
As far as labs are concerned, CLEF 2021 will run 12 evaluation activities out of 15 proposals
received: 11 will be a continuation of the labs running during CLEF 2020 and 1 will be a new
pilot lab.
The continued activities are:
ARQMath: Answer Retrieval for Questions on Math (https://www.cs.rit.edu/~dprl/
ARQMath/);
BioASQ: Large-scale Biomedical Semantic Indexing and Question Answering (http://www.
bioasq.org/workshop2021);
CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and
Fake News (https://sites.google.com/view/clef2021-checkthat);
ChEMU: Cheminformatics Elsevier Melbourne University lab (http://chemu.eng.unimelb.
edu.au/);
CLEF eHealth: Retrieving and Making Sense of Medical Content (https://clefehealth.
imag.fr/);
eRisk: Early Risk Prediction on the Internet (http://early.irlab.org/);
ImageCLEF: Multimedia Retrieval Challenge in CLEF (https://www.imageclef.org/2021);
LifeCLEF: Multimedia Life Species Identification (https://www.imageclef.org/LifeCLEF2021);
LiLAS: Living Labs for Academic Search (https://clef-lilas.github.io);
PAN: Lab on Digital Text Forensics and Stylometry (https://pan.webis.de/);
Touché: Argument Retrieval (http://touche.webis.de/).
The new activity is:
SimpleText-2021: (Re)Telling Right Scientific Stories to Non-specialists via Text Simplifica-
tion (https://www.irit.fr/simpleText/).
CLEF 2022 will be hosted by University of Bologna, Italy, in early September 2022.
CLEF 2023 will be hosted by CERTH-ITI, Greece, in early September 2023.
Finally, bids for hosting CLEF 2024 are now open and will close around July 2021. Proposals
can be sent to the CLEF Steering Committee Chair at chair@clef-initiative.eu.
Acknowledgments
The success of CLEF 2020 would not have been possible without the huge effort of several people
and organizations, including the CLEF Association19, the program committee, the lab organizing
committee, the local organization committee in Thessaloniki, the reviewers, and the many students
and volunteers who contributed along the way.
We gratefully acknowledge the support we received from our sponsors: ACM SIGIR20 and the
Information Retrieval Specialist Group (IRSG)21 of BCS (The Chartered Institute for IT).
Last but not least, without the important and tireless effort of the enthusiastic and creative authors, the organizers of the selected labs, the colleagues and friends involved in running them, and the participants who contribute their time to making the labs and the conference a success and who financially support them through the CLEF Association, CLEF would not be possible. Thank you all very much!
19http://www.clef-initiative.eu/association
20http://sigir.org/
21https://irsg.bcs.org/
References
[1] N. Ferro. What Happened in CLEF... For a While? In F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D. E. Losada, G. Heinatz Bürki, L. Cappellato, and N. Ferro, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Tenth International Conference of the CLEF Association (CLEF 2019), pages 3–45. Lecture Notes in Computer Science (LNCS) 11696, Springer, Heidelberg, Germany, 2019.
[2] N. Ferro and C. Peters, editors. Information Retrieval Evaluation in a Changing World –
Lessons Learned from 20 Years of CLEF, volume 41 of The Information Retrieval Series,
2019. Springer International Publishing, Germany.
[3] N. Ferro and G. Silvello. 3.5K runs, 5K topics, 3M assessments and 70M measures: What
trends in 10 years of Adhoc-ish CLEF? Information Processing & Management, 53(1):175–
202, January 2017.
[4] B. Larsen. The Scholarly Impact of CLEF 2010-2017. In Ferro and Peters [2], pages 547–554.
[5] T. Tsikrika, A. García Seco de Herrera, and H. Müller. Assessing the Scholarly Impact of ImageCLEF. In P. Forner, J. Gonzalo, J. Kekäläinen, M. Lalmas, and M. de Rijke, editors, Multilingual and Multimodal Information Access Evaluation. Proceedings of the Second International Conference of the Cross-Language Evaluation Forum (CLEF 2011), pages 95–106. Lecture Notes in Computer Science (LNCS) 6941, Springer, Heidelberg, Germany, 2011.
[6] T. Tsikrika, B. Larsen, H. Müller, S. Endrullis, and E. Rahm. The Scholarly Impact of CLEF
(2000–2009). In P. Forner, H. M¨uller, R. Paredes, P. Rosso, and B. Stein, editors, Information
Access Evaluation meets Multilinguality, Multimodality, and Visualization. Proceedings of the
Fourth International Conference of the CLEF Initiative (CLEF 2013), pages 1–12. Lecture
Notes in Computer Science (LNCS) 8138, Springer, Heidelberg, Germany, 2013.
[7] A. Arampatzis, E. Kanoulas, T. Tsikrika, S. Vrochidis, H. Joho, C. Lioma, C. Eickhoff, A. Névéol, L. Cappellato, and N. Ferro, editors. Experimental IR Meets Multilinguality,
Multimodality, and Interaction. Proceedings of the Eleventh International Conference of the
CLEF Association (CLEF 2020), 2020. Lecture Notes in Computer Science (LNCS) 12260,
Springer, Heidelberg, Germany.
[8] L. Cappellato, C. Eickhoff, N. Ferro, and A. Névéol, editors. CLEF 2020 Working Notes,
2020. CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.
org/Vol-2696/.
[9] R. Zanibbi, D. W. Oard, A. Agarwal, and B. Mansouri. Overview of ARQMath 2020: CLEF
Lab on Answer Retrieval for Questions on Math. In Arampatzis et al. [7], pages 169–193.
[10] A. Nentidis, A. Krithara, K. Bougiatiotis, M. Krallinger, C. Rodríguez Penagos, M. Villegas,
and G. Paliouras. Overview of BioASQ 2020: The Eighth BioASQ Challenge on Large-
Scale Biomedical Semantic Indexing and Question Answering. In Arampatzis et al. [7], pages
194–214.
[11] A. Barrón-Cedeño, T. Elsayed, P. Nakov, G. Da San Martino, M. Hasanain, R. Suwaileh,
F. Haouari, N. Babulkov, B. Hamdan, A. Nikolov, S. Shaar, and Z. Sheikh Ali. Overview of
CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media. In
Arampatzis et al. [7], pages 215–236.
[12] J. He, D. Q. Nguyen, S. A. Akhondi, C. Druckenbrodt, C. Thorne, R. Hoessel, Z. Afzal,
Z. Zhai, B. Fang, H. Yoshikawa, A. Albahem, L. Cavedon, T. Cohn, T. Baldwin, and K. Ver-
spoor. Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chem-
ical Reactions from Patents. In Arampatzis et al. [7], pages 237–254.
[13] L. Goeuriot, H. Suominen, L. Kelly, A. Miranda-Escalada, M. Krallinger, Z. Liu, G. Pasi,
G. Gonzalez Saez, M. Viviani, and C. Xu. Overview of the CLEF eHealth Evaluation Lab
2020. In Arampatzis et al. [7], pages 255–271.
[14] D. E. Losada, F. Crestani, and J. Parapar. Overview of eRisk 2020: Early Risk Prediction
on the Internet. In Arampatzis et al. [7], pages 272–287.
[15] M. Ehrmann, M. Romanello, A. Flückiger, and S. Clematide. Overview of CLEF HIPE 2020:
Named Entity Recognition and Linking on Historical Newspapers. In Arampatzis et al. [7],
pages 288–310.
[16] B. Ionescu, H. Müller, R. Péteri, A. Ben Abacha, V. V. Datla, S. A. Hasan, D. Demner-Fushman, S. Kozlovski, V. Liauchuk, Y. Dicente Cid, V. Kovalev, O. Pelka, C. M. Friedrich, A. García Seco de Herrera, V.-T. Ninh, T.-K. Le, L. Zhou, L. Piras, M. Riegler, P. Halvorsen,
M.-T. Tran, M. Lux, C. Gurrin, D.-T. Dang-Nguyen, J. Chamberlain, A. Clark, A. Campello,
D. Fichou, R. Berari, P. Brie, M. Dogariu, L.-D. Stefan, and M. G. Constantin. Overview
of the ImageCLEF 2020: Multimedia Retrieval in Medical, Lifelogging, Nature, and Internet
Applications. In Arampatzis et al. [7], pages 311–341.
[17] A. Joly, H. Goëau, S. Kahl, B. Deneu, M. Servajean, E. Cole, L. Picek, R. L. Ruiz De Castañeda, I. Bolon, A. Durso, T. Lorieul, C. Botella, H. Glotin, J. Champ, I. Eggel, W.-P. Vellinga, P. Bonnet, and H. Müller. Overview of LifeCLEF 2020: A System-Oriented
Evaluation of Automated Species Identification and Species Distribution Prediction. In Aram-
patzis et al. [7], pages 342–363.
[18] P. Schaer, J. Schaible, and L. Jael García Castro. Overview of LiLAS 2020 - Living Labs for
Academic Search. In Arampatzis et al. [7], pages 364–371.
[19] J. Bevendorff, B. Ghanem, A. Giachanou, M. Kestemont, E. Manjavacas, I. Markov, M. May-
erl, M. Potthast, F. M. Rangel Pardo, P. Rosso, G. Specht, E. Stamatatos, B. Stein, M. Wieg-
mann, and E. Zangerle. Overview of PAN 2020: Authorship Verification, Celebrity Profiling,
Profiling Fake News Spreaders on Twitter, and Style Change Detection. In Arampatzis et al.
[7], pages 372–383.
[20] A. Bondarenko, M. Fröbe, M. Beloucif, L. Gienapp, Y. Ajjour, A. Panchenko, C. Biemann, B. Stein, H. Wachsmuth, M. Potthast, and M. Hagen. Overview of Touché 2020: Argument
Retrieval - Extended Abstract. In Arampatzis et al. [7], pages 384–395.