Implications of the New Regulation Proposed by the European Commission on Automatic Content Moderation
Vera Schmitt1, Veronika Solopova2, Vinicius Woloszyn1, Jessica de Jesus de Pinho Pinhal1
1Technische Universität Berlin, Germany
2Freie Universität Berlin, Germany
In April 2021 the European Commission (EC) proposed a new regulation to establish a regulatory structure for the risk assessment of Artificial Intelligence (AI) systems and applications. The intended goal of initiating a harmonised legal framework for the European Union (EU) poses new challenges for developing countermeasures for hate speech and fake news detection. This analysis investigates the implications of the proposed regulations for different automatic content moderation approaches such as flagging, blocking and filtering. The fuzzy nature of the risk categories causes major challenges for the risk categorisation task and leaves room for future improvements of the proposed regulations.
1. Introduction
Fake news and hate speech have been around for centuries, but the emergence of the internet and social platforms has facilitated the global spread of false information and hate speech [1]. Whereas the drivers of hate speech are multifaceted and range from personal insults to the politically motivated spread of certain ideologies [2, 3], among the most important motivations for the distribution of false information are financial gain [4] and influencing elections [5]. The profit comes primarily from advertisement services such as Google's AdSense. Moreover, the spread of false information has created a more profound concern that the prevalence of fake news has increased political polarisation, undermined democracy and decreased trust in public institutions [6, 7].
The impact of hate speech is less immediately visible: frequent exposure to online hate speech desensitises bystanders, fuels prejudice against the victimised groups and decreases public sympathy towards them, strengthening the victim-blaming phenomenon [8, 9]. Moreover, social psychology explains the emergence of hate and prejudice through the concept of out-groups perceived as a potential existential and cultural threat; the motivation is to defend and preserve the existing social norms and keep the in-group itself intact [10]. Thus, both hate speech and false information have major implications for social and news platforms and online communication.
Over the past years, industry, governmental institutions and civil society have worked on developing policies, automated detection tools and enforcement frameworks to tackle deceptive actors and false content online. Researchers have also proposed techno-centric solutions to detect fake news [11] and hate speech content [12, 13, 14]. Although AI has evolved drastically over the past years, it is still prone to errors [15, 16], attacks [17, 18, 19] and biases [20, 21]. For example, [18] has shown that state-of-the-art technologies are vulnerable to simple adversarial attacks such as character-swap-based methods (e.g. changing "Trump" to "Trupm"). In response, in 2019, the EU released a report about automatic moderation of content on the internet which discourages countermeasures based purely on AI without any human supervision. Further regulations for tackling fake news and hate speech were proposed in September 2020, when the EU Commission announced its aim to expand the list of EU crimes to cover hate speech on the grounds of race, religion and national or ethnic origin. Later that year, the EC planned to release a common definition of hate speech, even though some member states have expressed concerns that, under a common definition, cultural differences may pose a threat to the right to freedom of expression [22].
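The character-swap attack mentioned above can be sketched in a few lines. The helper below is a hypothetical illustration, not code from [18]: it transposes two adjacent characters in a word, the kind of minimal perturbation that can evade keyword-sensitive classifiers while remaining readable to humans.

```python
def swap_adjacent(word: str, i: int) -> str:
    """Swap the characters at positions i and i+1: a simple
    character-swap adversarial perturbation."""
    if not 0 <= i < len(word) - 1:
        raise ValueError("i must index a character with a right neighbour")
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# The example from the text: "Trump" -> "Trupm"
print(swap_adjacent("Trump", 3))  # Trupm
```

A robust detector would need to normalise or otherwise withstand such perturbations, which is one reason the cited report discourages purely automated moderation.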
More recently, in April 2021, the EU proposed the first legal framework on AI [23]. The proposal intends to create a uniform legal framework for AI within the EU. Safety-critical applications are addressed by a risk-based approach that classifies AI systems and applications into four different risk categories. Each category has its own set of obligations which providers of AI systems have to follow to protect users' fundamental rights. However, the categorisation procedure is not very transparent, as the risk categories are not distinct and no independent entity is defined to supervise the classification procedure. Therefore, the contribution of this paper is an analysis of the applicability and consequences of the proposed regulations on AI in the domain of automatic content moderation (ACM), in order to assess the feasibility of the proposed risk-based approach.
2. Terminology and legal background
In the following, the terms used in this analysis will be defined
and an overview of the existing ACM methods will be given.
2.1. Legal definition of fake news and hate speech
The definition of fake news and hate speech is a non-trivial task, as both phenomena depend on each country's constitutional and legal structure, culture, political situation and level of public awareness of the problem. The definition of fake news is especially challenging, as fake news is often situated within a grey area of political expression encompassing both mis- and disinformation [24]. Moreover, the term is often abused to label opinions and information as fake when they do not comply with one's own viewpoint. Therefore, it is challenging to find appropriate countermeasures to tackle fake news while protecting fundamental rights, including freedom of expression (Article 11 of the EU Charter of Fundamental Rights), data protection (Articles 7 and 8) and media pluralism.
2021 ISCA Symposium on Security and Privacy in Speech Communication, 10-12 November 2021, Online. DOI: 10.21437/SPSC.2021-10

Broadly, the term fake news can be divided into disinformation and misinformation, where the latter is unintentionally false and inaccurate information shared by private persons [25]. In line with the EC High-Level Expert Group (HLEG), the term disinformation is defined as "verifiable false or misleading information that is created, presented and disseminated for economic gain or to intentionally deceive the public, and may cause public harm" [26]. This definition is taken as the basis for further analysis.
The only binding instrument for hate speech is the Counter-Racism Framework Decision [27], which obliges member states to make racist and xenophobic speech punishable under criminal law [28]. However, it does not consider online platforms and, according to the Treaty on the Functioning of the European Union, online speech only falls into the EU's legislative scope if it constitutes terrorist or child sexual abuse content [29]. The EU Code of Conduct on countering illegal hate speech online is a non-binding document that online platforms may sign voluntarily. It presupposes self- and co-regulatory monitoring measures, and statistical information on hate speech successfully taken down on different platforms is published routinely. Hate speech and disinformation are mainly regulated through individual legislation in the respective member states. For hate speech, Austria, the Czech Republic, France, Italy, Spain and Germany have passed specialised legislation to tackle hate speech online. Legal frameworks address fake news only in France, Spain and Germany. Germany's Network Enforcement Act (NetzDG) [30] states that service providers are responsible for illegal content shared on their platforms and are obliged to remove it within 24 hours of receiving a complaint. Disinformation and hate speech are both covered by the concept of illegal content: content is deemed illegal if it can be classified as one of the many offences listed in the Penal Code, covering domains such as state security, public order, sexual freedom and incitement to hatred or the dissemination of unconstitutional symbols or groups. NetzDG also inspired the French "Online Hate Observatory" [31] and the Austrian Communications Platform Law (KoPl-G) [32].
One positive outcome is that the EU and other countries have acknowledged that online disinformation and hate speech are severe matters that need to be addressed with respect to human rights. Even within the EU, the approaches different countries take to tackle false information and hate speech online have few similarities, owing to the cultural context, political situation, legal structure and level of public awareness of the respective member states. This emphasises how difficult it is to find common ground even at the EU level. Therefore, the risk-based approach is one promising direction to assess the harm an AI system can cause on an individual level, irrespective of cultural differences and the legal context.
2.2. Automatic content moderation
Within machine learning (ML) approaches advancing towards AI, ACM technologies are audio-visual and textual analysis programs trained to moderate suspicious content online [24]. The recognition process itself is crucial for detecting hate speech and false information and assisting human judgement, but it does not imply any further action. Content recognition is an important component of content moderation tasks, but it is not considered in the following analysis, as it is mainly the type of moderation that bears on the infringement of human rights. Thus, the focus is on ACM approaches and the classification and assessment of the respective risk category. The main ACM approaches encompass filtering, blocking, (de)prioritisation, flagging and disabling of spreaders, which are described in the following [24].
Filtering or removing content is the most effective but also the most invasive countermeasure to tackle disinformation and hate speech. Filtering content online is an ex ante countermeasure that providers adopt to cope with the upload and posting of suspicious content; for this purpose, the content is scanned prior to the upload or post. YouTube's Content ID is one example that deploys ex ante filtering to protect copyright and give owners the possibility to block, monetise or track a video containing their work. Removing content, on the other hand, is an ex post countermeasure, often a reaction to user requests, also known as the notice and action procedure [33].
Blocking of content is a widely used countermeasure that can be applied by users, email providers, search engines and social media platforms alike. It differs from ex ante filtering in that the content is not removed; only the user's access is blocked. Blocking can be applied both ex ante and ex post, where the user's awareness and ability to review the content and information provided is often a key component. Blocking can be applied in manifold ways, such as browser-based blocking, which is often used to block advertisements or cookies [24].
(De)prioritisation includes internal algorithmic down-ranking of users' content (e.g. shadow banning), reduction of advertising and of the prominence of content in users' feeds (e.g. Facebook's options to hide certain ads), demonetisation (YouTube's and Twitch's policies against certain keywords in videos), and certain internet protocols (e.g. P2P). In the context of disinformation, (de)prioritisation gives less prominence to content that contains false information, e.g. when content from media organisations or fact-checkers is given preference or shown next to false information [24]. For instance, deprioritisation is often used by platforms to tackle false information concerning COVID-19 and vaccine hesitancy.
Flagging of content can be understood as the process of reporting doubtful or insulting content by other users, trusted flaggers, moderators and algorithms to the system, which issues the flag after having verified the validity of the content. Another approach is visual tagging or blurring of potentially harmful content. Facebook, Twitter and YouTube have implemented both machine- and human-driven flagging systems at the core of their content moderation policies.
Disabling and suspension of accounts are temporary and permanent solutions to deal with users' abuse of terms of service and legislation (e.g. on email providers, social media platforms, cloud services and multi-player games). This is known as jamming; it also applies to public and private groups and is often applied on Reddit and Facebook. The process is usually gradual, with users first receiving warnings and the possibility to appeal; the reaction then becomes more punitive and permanent, reflecting the severity, frequency and persistence of the violation.
Alongside the many proposals to use AI to tackle false narratives and hate speech online, there is increasing concern about the potential risks of automatic content moderation. For example, a recent study from the European Parliament [24] examines the trade-offs of using AI. The authors raise concerns about techno-centric solutions that propose automated detection, (de)prioritisation and removal by online intermediaries without human intervention. Moreover, they suggest that more independent, transparent and effective appeals and oversight mechanisms are necessary to minimise the inevitable inaccuracies of AI. In the following, we verify whether these concerns are addressed in the risk-based approach of the proposal to regulate AI applications.
3. Risk assessment of countermeasures for fake news and hate speech detection
Although the opportunities and potential benefits of AI are manifold, certain AI systems can lead to harm and the infringement of rights, especially in the domains of recruitment, education, healthcare and law enforcement. Therefore, the EC formulated a proposal for a new regulatory framework on AI. This framework adopts a human-centric approach with the aim of facilitating the development of AI that ensures the protection of fundamental rights and user safety as well as trust and transparency. In order to adequately assess the influence of the proposed regulations on ACM, the following section introduces the main principles of the regulations and their impact on ACM concerning hate speech and fake news detection.
3.1. Proposal of harmonized rules on AI
The proposed regulatory framework applies to AI applications of both the public and the private sector, for all systems placed on the EU market or affecting EU citizens. It aims to provide guidance for AI developers, deployers and users alike by defining clear requirements and obligations regarding specific uses of AI systems. The risk-based approach was chosen after extensive consultation with multiple stakeholders, such as the High-Level Expert Group on AI. It recognises the benefits and potential of AI but at the same time addresses possible dangers and risks of new AI applications and systems. The regulation gives a broad definition of AI in Article 3 (Definitions), which includes any AI system generating outputs such as content, predictions, recommendations or decisions influencing the environment it interacts with. More concretely, it applies to ML components, including supervised, unsupervised, reinforcement and deep learning, but also to logic- and knowledge-based approaches related to inductive logic programming, knowledge representation, inference and deductive engines. Furthermore, statistical approaches such as Bayesian estimation and search optimisation methods also fall under the definition of AI [34]. Therefore, the previously mentioned countermeasures against disinformation and hate speech fall under the broad definition of AI in the regulation proposed by the EC.
3.2. Risk-based approach
For the assessment of AI systems, a risk-based approach has
been developed including four levels of risk:
1. Unacceptable risk includes all AI systems that pose a clear threat to the safety, rights and livelihoods of people. Such AI systems, ranging from social scoring by governments to toys that use voice assistance to encourage dangerous behaviour, will be banned from the European market.
2. High risk includes AI systems mapping to one of the following application areas: critical infrastructure with the potential to put the life and health of citizens at risk; educational or vocational training that might determine access to education; safety components of products, such as robot-assisted surgery; employment, including CV-sorting software for recruitment procedures; essential private and public services, e.g. credit scoring systems; law enforcement interfering with humans' fundamental rights, including the reliability of evidence; migration, asylum and border control, including verification of the authenticity of travel documents; and administration of justice and democratic processes. Moreover, all remote biometric identification systems fall into the high-risk class and are subject to strict obligations before they are allowed to be put on the market.
3. Limited risk includes AI systems that require specific transparency obligations. One example mentioned in the proposal is chat-bots, where it is required to show transparently whether the user is communicating with a bot or a human, such that users have the opportunity to make an informed choice to continue or stop the activity.
4. Minimal risk: according to the proposal, most AI systems fall into this category. Given examples range from AI-enabled video games to spam filters.
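The four-tier structure above can be summarised as a small lookup. This is purely our own illustrative sketch, not part of the regulation; the enum, the example labels and the helper function are hypothetical constructs built from the proposal's examples.

```python
from enum import IntEnum

class Risk(IntEnum):
    """The proposal's four risk levels, ordered by severity."""
    MINIMAL = 1
    LIMITED = 2
    HIGH = 3
    UNACCEPTABLE = 4

# Illustrative examples taken from the category descriptions above.
EXAMPLES = {
    "government social scoring": Risk.UNACCEPTABLE,
    "CV-sorting recruitment software": Risk.HIGH,
    "remote biometric identification": Risk.HIGH,
    "chat-bot": Risk.LIMITED,
    "spam filter": Risk.MINIMAL,
}

def is_banned(system: str) -> bool:
    """Unacceptable-risk systems are prohibited on the EU market."""
    return EXAMPLES[system] is Risk.UNACCEPTABLE

print(is_banned("government social scoring"))  # True
```

Using an ordered `IntEnum` makes the escalation of obligations explicit: any comparison such as `Risk.HIGH > Risk.LIMITED` mirrors the proposal's graduated structure.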
3.3. Obligations
The different risk categories are tied to certain obligations. AI applications falling into the unacceptable-risk category are prohibited. If they map to the high-risk category, they need to follow a list of obligations defined in Chapter 3 of the proposal before they are put on the market. This list includes, among others, compliance with the requirements defined in Chapter 2, which demand the development of a risk management system, a data governance structure, technical documentation, record-keeping in the development phase, transparency and provision of information to users, human oversight, and accuracy, robustness and cybersecurity measures. Furthermore, the obligations address the development of a quality management system and technical documentation of high-risk systems, and require a conformity assessment and compliance with the registration obligations defined in Article 51. Moreover, providers of high-risk AI need to collaborate with national competent authorities and demonstrate conformity with the requirements defined in Chapter 2 of the proposal by affixing the CE marking to AI systems, indicating conformity with the regulation in accordance with Article 49. AI applications falling into the category of limited risk need to follow only four obligations defined in Title IV, Article 52, namely: AI applications interacting with natural persons, as well as emotion recognition systems, are obliged to inform users that they are interacting with an AI system, and users of AI systems that generate or manipulate images, audio or video content (deep fakes) must be informed about the artificial generation or manipulation of the displayed content.
3.4. Impact of risk categories on ACM
Applying the risk-based approach to the different ACM methods is not straightforward, as the different categories overlap and the examples given in Annex III of the proposal are not very concrete.
Filtering can be assigned to limited risk when it is applied ex ante and the content is never shown to potential users or consumers. In that case, transparency obligations need to be followed to communicate the filtering procedure as comprehensibly as possible. For the ex post removal of content according to the notice and action procedure, the reasons for the removal must be clearly stated; therefore, it falls at least under the category of limited risk with transparency obligations. Nonetheless, removing content can also fall into the high-risk category when algorithms or humans make false removals and thus affect freedom of expression. In such scenarios, removing content needs to follow the obligations provided in Chapter 3 of the proposal. Similarly, blocking can map to different risk categories depending on the scenario. When users deploy ad blockers, blocking certainly does not match any of the proposed risk categories. But when search engine filters prevent access to certain content, as with Google's rules against hate speech, blocking falls into the high-risk category, as it can negatively affect media pluralism and free speech. Thus, the blocked content needs to be assessed carefully.
Algorithmic (de)prioritisation is key to the user experience that search engines and social media platforms offer, yet its implications for media pluralism and freedom of expression are not straightforward [24]. Hence, (de)prioritisation can be mapped to the limited-risk category with transparency obligations. Nowadays, many social media platforms are used as a primary income source by bloggers and advertisers; for this reason, an algorithmic mistake can lead to unjustified substantial revenue loss (demonetisation). It is vital to transparently communicate whether content (de)prioritisation has been initiated by a human or a machine, and the reasons behind the decision, especially when negative effects on users can be expected. The risk categorisation of flagging is clearer. Flagging can be assigned to the limited-risk category with transparency obligations, as it does not block or remove any content and, therefore, does not affect fundamental rights such as freedom of expression. Nevertheless, users should be able to easily find out why content was flagged and whether the message was flagged by a human or by a machine. The ACM method of disabling spreaders can be clearly assigned to the high-risk category and needs to follow the obligations stated in Chapter 3 of the proposal, as the countermeasure of disabling has significant implications for fundamental human rights such as freedom of expression and freedom of assembly [35], and for the democratic process.
Overall, we argue that the ACM methods (de)prioritisation and flagging can be mapped clearly onto the limited-risk category with transparency obligations, and that disabling users from further participation on a platform falls into the high-risk category. For the ACM methods blocking and filtering, the risk categorisation is not so clear, as it depends heavily on the scenario. Moreover, considering the current performance of automated systems, there is an obvious need for human oversight. This is addressed by the obligations listed in Chapter 3 of the proposal, which demand, among others, human oversight, transparency and the provision of robustness and cybersecurity for high-risk AI systems. According to Article 54 [23], regulatory sandboxes should provide a controlled environment facilitating the development, testing and validation of innovative AI systems for a limited time. Regulatory sandboxes can be useful for testing further AI applications for different ACM methods. However, when such applications affect fundamental rights and are prone to biased results, they need to be tested under the direct supervision and guidance of the competent authorities defined in Chapter 4 of the proposal before they can be applied in a broader scope.
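The mapping argued for in this section can be condensed into a sketch. This is our own illustration, not part of the proposal; the function and flag names are hypothetical, and the scenario dependence of blocking and filtering is modelled by a parameter rather than a fixed label.

```python
from enum import IntEnum

class Risk(IntEnum):
    LIMITED = 2
    HIGH = 3

# Fixed mappings argued above: (de)prioritisation and flagging map to
# limited risk; disabling of spreaders maps to high risk.
FIXED = {
    "deprioritisation": Risk.LIMITED,
    "flagging": Risk.LIMITED,
    "disabling": Risk.HIGH,
}

def categorise(method: str, affects_fundamental_rights: bool = False) -> Risk:
    """Blocking and filtering depend on the scenario: they escalate to
    high risk when false removals or blocked access can affect freedom
    of expression or media pluralism."""
    if method in FIXED:
        return FIXED[method]
    if method in ("blocking", "filtering"):
        return Risk.HIGH if affects_fundamental_rights else Risk.LIMITED
    raise ValueError(f"unknown ACM method: {method}")

print(categorise("flagging").name)                                    # LIMITED
print(categorise("filtering", affects_fundamental_rights=True).name)  # HIGH
```

The point of the sketch is that no static table suffices: for two of the five ACM methods, the risk category is a function of the deployment scenario, which is precisely what makes the ex ante self-assessment difficult.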
4. Discussion
Risk management, human oversight and ex ante testing should facilitate respect for fundamental rights by minimising the risk of erroneous or biased AI-assisted decisions in critical areas such as education and essential services. In case of an infringement of fundamental rights, the proposal on harmonised rules for AI mentions effective redress for affected persons, made possible by ensuring the transparency and traceability of AI systems coupled with strong ex post controls. Yet, what these strong ex post controls can look like for different areas of application is not clear. Moreover, in Annex III of the proposal, social media is not considered under the high-risk category, and it is not clear whether ACM would also fall under the definition of social media or whether it can be analysed as an independent component that can be applied in various forms. Furthermore, the regulation aims at protecting users but does not consider the humans in the loop within these AI systems: the current working conditions of human content moderation labourers are dangerous for their mental well-being and health, as found by [36]. Thus, a broader perspective on the users of such AI systems needs to be considered.
Another source of concern is the ex ante risk self-assessment by the providers of AI systems themselves and the ex post enforcement for high-risk AI. Considering the level of generalisation of the definitions of the risk categories, they remain very open to interpretation. This could lead to a situation in which most providers of AI systems aim to classify their applications and systems into the limited-risk or minimal-risk category even though fundamental rights might be affected. Additionally, bridging the gap between legal principles and technical implementation is a major barrier to developing ethically aligned AI systems. Major difficulties could be observed when the General Data Protection Regulation was enforced, and yet privacy infringements can still be detected [37]. This is not necessarily caused by bad intentions of developers but also results from the difficulty of fuzzy regulations from which no clear guidance can be inferred in concrete scenarios. For example, "training, validation and testing data sets should be sufficiently relevant, representative, free of errors and complete in view of the intended purpose of the system" (Recital 44), which is very difficult to achieve; essentially, none of the ACM methods and AI systems trained on user data comply with this requirement.
Moreover, the very broad definition of AI, which would classify most existing and future software as AI covered by the proposed regulation, might hinder the future development of AI systems that fall under the high-risk category and result in over-regulation [34]. The new legislation might not improve risky AI systems but instead push the development of critical systems, together with their related ethical and legal problems, outside of EU borders [38]. Companies might be more willing to develop their products and services in other countries where legal constraints are less stringent or even absent. Technically, the proposed regulations will be challenging to translate into concrete guidelines; furthermore, compliance is expensive for service providers and sometimes even internationally problematic. On the contrary, the challenge nowadays is no longer digital innovation but the governance of the digital sphere and the shaping of digital sovereignty. These normative challenges have not been tackled so far, and here the EU is not simply ahead; it has no competition.
5. Conclusion
The analysis of the risk-based approach with respect to ACM methods shows that the proposed regulations suffer from major limitations. The regulation lacks clarity about when fundamental rights are affected and about which risk category different applications fall into, which will have a major impact on the ex ante risk self-assessment of AI providers. Nevertheless, the proposed regulations are a significant step towards a digital constitutionalism [39] in which an infosphere [40] can create a space where citizens may live and work better and more sustainably.
6. References
[1] J. Suler, “The online disinhibition effect,” Cyberpsychology & Behavior, vol. 7, pp. 321–326, 2004.
[2] E. Barendt, “What is the harm of hate speech?” Ethical Theory
and Moral Practice, vol. 22, no. 3, pp. 539–553, 2019.
[3] U. M. Ananthakrishnan and C. E. Tucker, “The drivers and virality
of hate speech online,” Available at SSRN 3793801, 2021.
[4] J. A. Braun and J. L. Eklund, “Fake news, real money: Ad tech platforms, profit-driven hoaxes, and the business of journalism,” Digital Journalism, vol. 7, no. 1, pp. 1–21, 2019.
[5] F. Davey-Attlee and I. Soares, “The fake news machine,” 2020. [Online]. Available:
[6] J. A. Tucker, A. Guess, P. Barberá, C. Vaccari, A. Siegel, S. Sanovich, D. Stukal, and B. Nyhan, “Social media, political polarization, and political disinformation: A review of the scientific literature,” March 19, 2018.
[7] J. Allen, B. Howland, M. Mobius, D. Rothschild, and D. J. Watts,
“Evaluating the fake news problem at the scale of the information
ecosystem,” Science Advances, vol. 6, no. 14, p. eaay3539, 2020.
[8] W. Soral, M. Bilewicz, and M. Winiewski, “Exposure to hate speech increases prejudice through desensitization,” Aggressive Behavior, vol. 44, pp. 136–146, 2018.
[9] L. Patterson, A. Allan, and D. Cross, “Adolescent bystander behavior in the school and online environments and the implications for interventions targeting cyberbullying,” Journal of School Violence, vol. 16, no. 4, pp. 361–375, Oct. 2017.
[10] D. Gadd, “Aggravating racism and elusive motivation,” British Journal of Criminology, vol. 49, 2009.
[11] X. Zhou and R. Zafarani, “A survey of fake news: Fundamental theories, detection methods, and opportunities,” ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–40, 2020.
[12] B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, and A. Mukherjee, “HateXplain: A benchmark dataset for explainable hate speech detection,” 2020.
[13] S. Frenda, B. Ghanem, M. Montes, and P. Rosso, “Online hate speech against women: Automatic identification of misogyny and sexism on Twitter,” Journal of Intelligent & Fuzzy Systems, vol. 36, pp. 4743–4752, 2019.
[14] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, and N. Farra, “SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval),” 2019.
[15] T. Scheffler, V. Solopova, and M. Popa-Wyatt, “The Telegram chronicles of online harm,” 2021.
[16] “AI Incident Database,” 2021. [Online]. Available:
[17] F. Monti, F. Frasca, D. Eynard, D. Mannion, and M. M. Bronstein, “Fake news detection on social media using geometric deep learning,” arXiv preprint arXiv:1902.06673, 2019.
[18] Z. Zhou, H. Guan, M. M. Bhat, and J. Hsu, “Fake news detection via NLP is vulnerable to adversarial attacks,” arXiv preprint arXiv:1901.09657, 2019.
[19] T. Le, S. Wang, and D. Lee, “MALCOM: Generating malicious comments to attack neural fake news detection models,” arXiv preprint arXiv:2009.01048, 2020.
[20] A. Arango, J. Pérez, and B. Poblete, “Hate speech detection is not as easy as you may think: A closer look at model validation (extended version),” Information Systems, p. 101584, 2020.
[21] K. Shu, S. Wang, and H. Liu, “Beyond news contents: The role
of social context for fake news detection,” in Proceedings of the
Twelfth ACM International Conference on Web Search and Data
Mining, 2019, pp. 312–320.
[22] C. Goujard, “Hate speech & hate crime – inclusion
on list of EU crimes,” 2021. [Online]. Available:
list-of-EU-crimes_en
[23] “Proposal for a regulation laying down harmonised rules on
artificial intelligence (Artificial Intelligence Act),” 2021. [Online].
Available:
[24] C. Marsden and T. Meyer, Regulating disinformation with artifi-
cial intelligence: effects of disinformation initiatives on freedom
of expression and media pluralism. European Parliament, 2019.
[25] C. Wardle and H. Derakhshan, “Information disorder: Toward
an interdisciplinary framework for research and policy making,”
Council of Europe report, vol. 27, pp. 1–107, 2017.
[26] High Level Expert Group on Fake News and Online Disinformation,
“Report to the European Commission on a multi-dimensional
approach to disinformation,” 2018. [Online]. Available:
[27] Council of the European Union, “Council Framework Decision
2008/913/JHA,” 2008. [Online]. Available: https://eur-
[28] “Tackling disinformation and online hate speech: EU and member
state approaches, so far,” Democracy Reporting International,
2020. [Online]. Available:
[29] The European Parliament, “Consolidated version of the Treaty
on the Functioning of the European Union, Part Three: Union
policies and internal actions, Title V: Area of freedom, security
and justice, Chapter 4: Judicial cooperation in criminal matters,
Article 83,” 2008. [Online]. Available: 2008/art_83/oj
[30] Bundesministerium der Justiz und für Verbraucherschutz, “Gesetz
zur Verbesserung der Rechtsdurchsetzung in sozialen Netzwerken
(Netzwerkdurchsetzungsgesetz – NetzDG),” 2017.
[31] Conseil supérieur de l’audiovisuel, “Décision n° 2020-435 du
8 juillet 2020 relative à la composition et aux missions de
l’observatoire de la haine en ligne,” Jul. 2020.
[32] Bundesministerium für Digitalisierung und Wirtschaftsstandort,
“Bundesgesetz über Maßnahmen zum Schutz der Nutzer auf
Kommunikationsplattformen,” Oct. 2020.
[33] R. Barnes, A. Cooper, O. Kolkman, D. Thaler, and E. Nordmark,
“Technical considerations for internet service blocking and
filtering,” Request for Comments (RFC), vol. 7754, 2016.
[Online]. Available:
[34] P. Glauner, “An assessment of the ai regulation proposed by the
european commission,” arXiv preprint arXiv:2105.15133, 2021.
[35] I. Siatitsa, “Freedom of assembly under attack: General and indis-
criminate surveillance and interference with internet communica-
tions,” International Review of the Red Cross, vol. 102, no. 913,
pp. 181–198, 2020.
[36] M. Steiger, T. J. Bharucha, S. Venkatagiri, M. J. Riedl, and
M. Lease, “The psychological well-being of content moderators,”
in Proceedings of the 2021 CHI Conference on Human Factors in
Computing Systems (CHI ’21), 2021.
[37] M. Hatamian, “Engineering privacy in smartphone apps: A
technical guideline catalog for app developers,” IEEE Access,
vol. 8, pp. 35429–35445, 2020.
[38] L. Floridi, “The European legislation on AI: a brief analysis of
its philosophical approach,” Philosophy & Technology, pp. 1–8,
2021.
[39] G. De Gregorio, “The rise of digital constitutionalism in the euro-
pean union,” International Journal of Constitutional Law, vol. 19,
no. 1, pp. 41–70, 2021.
[40] L. Floridi, The fourth revolution: How the infosphere is reshaping
human reality. OUP Oxford, 2014.