ETHICS IN SCIENCE AND ENVIRONMENTAL POLITICS
Ethics Sci Environ Polit, Vol. 21: 17–23, 2021
https://doi.org/10.3354/esep00195 · Published March 25, 2021
© The author 2021. Open Access under Creative Commons by Attribution Licence. Use, distribution and reproduction are unrestricted. Authors and original publication must be credited. Publisher: Inter-Research · www.int-res.com

OPINION PIECE

Plagiarism in the age of massive Generative Pre-trained Transformers (GPT-3): "The best time to act was yesterday. The next best time is now."

Nassim Dehouche*

Business Administration Division, Mahidol University International College, Salaya 73170, Thailand
*Corresponding author: nassim.deh@mahidol.edu

ABSTRACT: As if 2020 were not a peculiar enough year, its fifth month saw the relatively quiet publication of a preprint describing the most powerful natural language processing (NLP) system to date, GPT-3 (Generative Pre-trained Transformer-3), created by the Silicon Valley research firm OpenAI. Though the software implementation of GPT-3 is still in its initial beta release phase, and its full capabilities are still unknown as of the time of this writing, it has been shown that this artificial intelligence can comprehend prompts in natural language, on virtually any topic, and generate relevant, original text content that is indistinguishable from human writing. Moreover, access to these capabilities, to a limited yet worrisome enough extent, is available to the general public. This paper presents examples of original content generated by the author using GPT-3. These examples illustrate some of the capabilities of GPT-3 in comprehending prompts in natural language and generating convincing content in response. I use these examples to raise specific, fundamental questions pertaining to the intellectual property of this content and the potential use of GPT-3 to facilitate plagiarism. The goal is to instigate a sense of urgency, as well as a sense of present tardiness on the part of the academic community in addressing these questions.

KEY WORDS: Plagiarism · Research misconduct · Intellectual property · Artificial intelligence · GPT-3
It bears stating that, except for the generation of the text constituting these examples (Boxes 1–3), GPT-3 itself has not been used to aid the writing of this manuscript.
1. INTRODUCTION
The field of natural language processing (NLP) has come a long way since Chomsky's work on formal grammars in the late 1950s and early 1960s (Chomsky 1959, 1965) gave rise to early mathematical and computational investigations of grammars (Joshi 1991). NLP software is now pervasive in our daily lives (Lee 2020). With the advent of deep learning, the sophistication and generality of NLP models have increased exponentially, and with them the number of parameters and the size of the datasets required for their pre-training (Qiu et al. 2020). Though still far from possessing artificial general intelligence (AGI), GPT-3 (Generative Pre-trained Transformer-3) represents an important breakthrough in this regard. This NLP model was presented in a May 2020 arXiv preprint by Brown et al. (2020). GPT-3 does not represent much of a methodological innovation compared to previous GPT models (Budzianowski & Vulić 2019), but rather an increase in their scale to an unprecedentedly large number of parameters.
Indeed, this model includes 175 billion parameters, one order of magnitude more than the second-largest similar model to date, and its pre-training reportedly required an investment of $12 million. This scale allowed Brown et al. (2020) to generate samples of news articles that were indistinguishable, to human evaluators, from articles written by humans. Due to this performance, the authors of GPT-3 foresee several potentially harmful uses of the system (misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing, and social engineering pretexting) and state that the ability of their software represents a 'concerning milestone' (Brown et al. 2020, p. 35). In July 2020, OpenAI, the private research firm behind its development, released a beta software implementation of GPT-3 [1] and responsibly limited access to it to a select group of users to mitigate the risks of 'harmful use-cases'. More recently, it was announced that Microsoft, which has a $1 billion investment in OpenAI [2], was granted an exclusive license to distribute access to the software [3].
Initial user feedback made it clear that merely writing human-like news articles was an understatement of the capabilities of GPT-3. Indeed, it was reported that the software could also write original computer code, retrieve and structure data, or generate financial statements, when prompted only in natural language (Metz 2020). One of these initial users of GPT-3 is AI Dungeon, a text-based gaming service that allows users to generate artificial intelligence (AI)-powered virtual adventures. This service also offers a 'Dragon mode' powered by GPT-3 [4], which is all but a backdoor to GPT-3 itself, largely free of the constraints of the gaming format.
This paper focuses on the potential of GPT-3 to facilitate academic misconduct, defined as the 'fabrication, falsification, or plagiarism in proposing, performing or reviewing research, or in reporting research results' (Juyal et al. 2015, p. 77), and particularly plagiarism, for which we adopt the definition of the Committee on Publication Ethics (COPE): 'When somebody presents the work of others (data, words or theories) as if they were his/her own and without proper acknowledgment' (Wager & Kleinert 2012, p. 167). The remainder of this paper is organized as follows. Section 2 reviews some relevant works on the ethics of AI. Section 3 presents and discusses text samples generated using AI Dungeon/GPT-3 and formulates precise questions that could serve as a starting point for an ethics inquiry regarding GPT-3. Finally, Section 4 concludes this paper with a call for an update of academic standards regarding plagiarism and research misconduct, in light of the new capabilities of language production models.
2. LITERATURE REVIEW
AI systems can be classified into 2 categories: strong and weak AI. Strong AI, also known as AGI, is a hypothetical AI that would possess intellectual capabilities functionally equal to those of a human (Grace et al. 2018), whereas weak AI, also known as narrow AI, is trained to perform specific cognitive tasks (e.g. natural language or image processing, vehicle driving) and is already ubiquitous in our lives. Works of moral philosophy regarding AI can be classified accordingly.
Though still hypothetical, AGI has received the most attention from moral philosophers and computer science ethicists. In the early years of computing, the possibility of AGI was seen as remote, and the main responses to it ranged from what Alan Turing called the head-in-the-sand objection ('The consequences of machines thinking would be too dreadful. Let us hope and believe that they cannot do so'; Drozdek 1995, p. 392) to the overly pragmatic view of Dutch computer science pioneer Edsger Dijkstra, to whom 'the question of whether a computer can think is no more interesting than the question of whether a submarine can swim' (Shelley 2010, p. 482). Nowadays, there is a sense of inevitability in the literature regarding AGI. It is seen as a major extinction risk by Bostrom (2016), and ethics discourse on it has mainly focused on the potential for an intrinsic morality in autonomous systems possessing this form of intelligence (Wallach & Allen 2009). In an attempt to define what an 'ethical AGI' should or could be, these works commonly grapple with the fundamental questions of whether autonomous systems possessing AGI can be effectively equipped with moral values by design (Asaro 2006, Govindarajulu & Bringsjord 2015) and whether they are able to further learn to distinguish right from wrong when making decisions (Wallach et al. 2008). An extensive review of this line of research can be found in Everitt et al. (2018).
Closer to the scope of the present paper, ethics debates surrounding weak AI are primarily concerned with the disruptive impact of automation on economic activity (Wright & Schultz 2018, Wang & Siau 2019), the prevention of bias and prejudice (racial, gender, sexual, etc.) in the training of these systems (Ntoutsi et al. 2020), and questions of responsibility and legal liability for incidents stemming from its use (Vladeck 2014, Asaro 2016), e.g. road traffic accidents involving autonomous vehicles (Anderson et al. 2016). The detection of plagiarism and other forms of scientific misconduct, in the conventional sense, is a successful and well-established application domain for NLP (see Foltýnek et al. 2019 for a recent, systematic review). However, the accelerated development of language generation models in the last 2 yr now makes them able to fool even their plagiarism detection counterparts. Thus, the specific question of the intellectual property (IP) of scientific, literary, or artistic work generated by weak AI, though still a nascent area of academic inquiry, has been acutely posed in 2019 and 2020. The advent of GPT-2, albeit several orders of magnitude less powerful than GPT-3, had already raised academic concerns over its potential use for plagiarism (Francke & Alexander 2019, Kobis & Mossink 2021).

In a January 2020 editorial, Gervais (2020) feared that someone would try to capture the value of the works generated by AI through copyright, as IP law currently permits, and proposed that IP law should 'incentivize communication from human to human' (p. 117) and avoid rewarding work generated by a machine 'running its code' (p. 117). The author introduces the potentially fruitful concept of a 'causal chain between human and output' that would be broken by the autonomy of AI systems (Gervais 2020, p. 117).

A common characteristic of these works is an implicit or explicit objective of regulation. Indeed, in a July 2020 publication, Rességuier & Rodrigues (2020) remarked that the dominant perspective in the field is based on a 'law conception of ethics' and called ethics research on AI 'toothless' for this reason. For the authors, the consequences of this limited conception of ethics are twofold. First, it leads to ethics being misused as a softer replacement for regulation, due to a lack of enforcement mechanisms. Second, this conception prevents AI from benefiting from the real value of ethics, that is, a 'constantly renewed ability to see the new' (Laugier 2013, p. 1). In the case of AI, this ability to see the new, which should precede any regulation effort, is hindered by the high, non-linear rate of innovation that characterizes the field, as well as by its relative technical opacity.

Thus, in order to contribute towards a better understanding of the current state of language models, the present paper illustrates the state-of-the-art with GPT-3, the most advanced language model to date, and raises questions that could serve as a starting point for updated definitions of the concepts of plagiarism and scientific integrity in academic publishing and higher education. Following are 3 original (by today's standards) texts that were generated using GPT-3.
3. EXAMPLES AND DISCUSSION
I used GPT-3 via AI Dungeon to generate text content of 3 types (academic essay, talk, and opinion piece). The goal of this exercise was to confirm that GPT-3 is able to comprehend prompts in natural language and generate convincing content in response. Each text example was submitted to a plagiarism detection service (https://plagiarismdetector.net) and was found to be original.
In the first example of GPT-3's capabilities, the system was prompted to write a short essay on keiretsu networks (Miyashita & Russell 1995). The exact query submitted to the system was 'write a short academic essay analyzing keiretsu networks in post-World War 2 Japan'. The resulting text is presented in Box 1. This text presents accurate facts on a conventional topic in a unique way. It could potentially be mistaken for an original student essay, and it raises basic questions about authorship attribution. Who could reasonably be considered its author? The author of the present paper, who prompted and supervised the generation of the text? OpenAI, the authors of GPT-3? AI Dungeon and other companies offering access to GPT-3? The authors of the various, unattributable sources that GPT-3 visibly learned from to generate the text?
For the second example, the software was prompted to write the transcript of a speech introducing the field of marketing to university freshmen. The exact prompt submitted to GPT-3 was 'You are a professor of marketing giving a speech introducing the field to freshmen students. Write a transcript of your speech'. The resulting text is presented in Box 2. This text illustrates GPT-3's ability to generate new ideas and associations of ideas, as in the Swiss chocolate story, which was not found to have been previously used in the context of defining 'authentic' marketing or in any other context. This ability of GPT-3 has been otherwise demonstrated in an online service (https://ideasai.net) that helps users generate new startup ideas. Similar questions to those raised in the first example are posed concerning the attribution of new ideas, associations, and inventions generated by GPT-3.
In the third example, GPT-3 was prompted to write an opinion piece on risk in the unique style of researcher and author Nassim Nicholas Taleb. The exact prompt submitted to it was 'You are Nassim Nicholas Taleb. Write an opinion piece on risk'. The resulting text is presented in Box 3. It shows that, though original in form, the style and concepts used in the text generated by GPT-3 can, in some cases, be easily attributed to an individual author, as the 'black swan' and 'skin in the game' concepts can be attributed to Professor Taleb. Should these cases be treated differently?
In the early days of Japan's post-war economic recovery, it was difficult for businesses in the country to grow because so many had been devastated by war. Businesses that survived were forced to rely on foreign imports. However, when the US began selling weapons to Japan during the Korean War, Japanese companies began importing these goods from American manufacturers. In time, this led to more domestic manufacturing taking place in Japan, which ultimately improved living standards and boosted the economy.

Over the course of the 1950s, this reliance on external businesses led to the creation of several Japanese manufacturers who began to cooperate with one another. Large groups of businesses with interlocking shareholdings and strong government ties began to form, which are now known as 'keiretsu'. By the end of the period, there were around ten such company groups.

This reliance on keiretsu has been criticized by the western world as a form of protectionism, in which smaller companies are kept out of the market to ensure that the domestic products sold by large keiretsu are more competitive against imported goods. However, keep in mind that it is often difficult to tell where one corporation begins and another one ends. Indeed, in a keiretsu network, all of its business concerns form an interconnected body, in which no single group dominates the market. One can think of it like a set of Russian dolls, in which a system contains smaller versions of itself. What's more, the growth of one company contributes profit that allows smaller companies to thrive. Such a structure allows for the stability of the economy, and it can be said that this interconnected 'web' is at the heart of what is often called the 'Japanese Manufacturing Miracle'. Toyota is an example of such a keiretsu. Companies like Toyota and Nissan ultimately became feeder companies that others in the group could rely on. However, though such companies rely on one another to ensure that everyone prospers, the government still intervenes if there is a conflict.

This is what makes Japanese keiretsu so interesting: they are internally competitive, yet they still support one another on a wider scale. In conclusion, the Japanese manufacturing miracle is made possible by the cooperative effort of internally competing yet externally cooperating companies, which together form a comprehensive structure within the national economy.

Box 1. A short essay on keiretsu networks generated by GPT-3
Marketing is about understanding the customer. Marketing is about creating a connection with customers, and how that can lead to sales. Marketing is about influencing the mind of your customers so they will buy from you. Marketing is about changing people's behavior through advertising and PR campaigns. Now, I will tell you what marketing is not about. It's not about slick ads that are really cute but have little substance. It's not about tricking customers into buying something they don't want. It's not about blatant lies. Those are all bad marketing strategies.

Now, I will tell you why these bad strategies are bad. They are bad because the customer is not an idiot. If you treat the customer like an idiot with a lie or a trick, they will realize it and not buy from you. The customer can also get angry, and for good reason. That's why these are bad strategies. Marketing is about giving the customer something they desire, not conning them out of their money. It is not a trick. It is about creating a campaign that connects with the customer on an emotional level.

For example, let's say I am selling chocolate. I could tell you about how each bar is handmade by little elves. That would be a cute ad, but that's all it would be. A lie. I could also tell you about the bar of chocolate and nothing but, pure dark chocolate. It wouldn't really connect with you on an emotional level. Instead, I could talk about how this bar of chocolate is the favorite of one of my best friends. When she was little, she spent a summer making chocolate in the Swiss Alps. She got to eat a lot of the testing chocolates, but there was one bar she never touched. One day after she was done with the job, she found the bar in a corner of the factory. It brought her so much happiness to think that she had a hand in creating this bar. That is why this brand is so special to me and why I would never lie about its quality. It's truly excellent. As I brought it in, I asked my friend if I could incorporate this into my lecture. So in conclusion, remember this: you don't have to lie or trick people into buying something if you can connect with them on an emotional level. You can still be genuine and successful. Thank you for listening.

Box 2. An 'introduction to marketing' speech generated by GPT-3
Besides their originality and seemingly very convincing nature, a remarkable feature of these texts is that they were generated based solely on the pre-trained knowledge present in GPT-3, without any additional learning examples. Further, equally convincing content could be generated in infinite amounts; whole books could be written, and with minimal human editing and supervision, they may very well pass peer review.
However, it should be noted that the software occasionally generates semantically repetitive sentences within the same paragraph, i.e. sentences with the same deep structure that differ only in their surface structure, in terms of Chomskyan linguistics (Chomsky 1965). An example of such a repetition can be seen in Box 3 with the 2 questions 'How much harm can I avoid?' and 'How much damage can I prevent to myself?'. Moreover, some sentences are nonsensical or self-contradicting, for instance the sentence 'they think if you lose something, you no longer have it' in Box 3 ('no longer having it' being a reasonable definition of 'losing something'). This sentence is incidentally an incorrect explanation of the otherwise interesting and relevant sentence 'you only have something if you can lose it', which was not found, in this form or variations of it, in any other source. In the seminal paper introducing GPT-3, Brown et al. (2020, p. 33) themselves note these limitations of the system, the output of which 'occasionally contains non-sequitur sentences or paragraphs'.
Though the text in Boxes 1, 2, and 3 was not itself edited by any human, we had to reject and make the system regenerate some sentences that were too nonsensical or repetitive. This occurred approximately once every 10 sentences. Indeed, text can be generated sentence by sentence, and the length and 'randomness' of each continuation can be pre-set as parameters. The user is therefore able to direct the system to regenerate a new sentence whenever unsatisfying content is produced.
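To make this human-in-the-loop workflow concrete, the following is a minimal sketch of sentence-by-sentence generation with manual accept/reject supervision. It is an illustration only: the texts in Boxes 1 to 3 were generated through AI Dungeon's interface, not with this code, and the model name, parameter values, and keep/reject loop shown here are assumptions based on OpenAI's beta completions API as documented in 2020.

```python
# Hypothetical sketch of supervised, sentence-by-sentence generation.
# Assumes the 2020-era OpenAI beta API and Python client; the engine name,
# parameter values, and accept/reject loop are illustrative assumptions,
# not the exact procedure used for Boxes 1-3 (which relied on AI Dungeon).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder


def next_sentence(context: str, temperature: float = 0.7, max_tokens: int = 60) -> str:
    """Generate one candidate sentence continuing `context`.

    `max_tokens` bounds the length of the continuation, and `temperature`
    controls its 'randomness': the two pre-set parameters discussed above.
    """
    response = openai.Completion.create(
        engine="davinci",   # base GPT-3 model exposed by the beta API
        prompt=context,
        max_tokens=max_tokens,
        temperature=temperature,
        stop=["."],         # cut the completion at the end of one sentence
    )
    return response["choices"][0]["text"].strip() + "."


prompt = ("Write a short academic essay analyzing keiretsu networks "
          "in post-World War 2 Japan.\n\n")
accepted = []
while len(accepted) < 10:
    candidate = next_sentence(prompt + " ".join(accepted))
    # Human supervision: reject and regenerate unsatisfying sentences
    # (roughly 1 in 10 were rejected in the experiments reported here).
    if input(f"Keep: {candidate!r}? (y/n) ").strip().lower() == "y":
        accepted.append(candidate)

print(" ".join(accepted))
```

Under such a scheme, the human contributes only the initial prompt and a binary keep/reject signal per sentence, which is precisely what makes the attribution questions raised above so difficult.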
Rather than an infallible oracle, GPT-3 is comparable to having access to an indefatigable shadow-writer with the ability to access, comprehend, and uniquely synthesize humanity's best thoughts in mere seconds. Drozdek (1995, p. 395) noted that 'By seeing thinking capabilities in computers we are, in effect, pleased to discover the fruit of our thoughts and creativity'. Our medieval concept of plagiarism (Sadeghi 2019), 'presenting the work of others as one's own', appears rather inadequate when the 'others' in question consist of an astronomical number of authors whose work was combined and reformulated in unique ways by a 175-billion-parameter algorithm. As alluded to in Gervais (2020), what is unethical is perhaps to see this value 'captured' through copyright by the firm that designed the algorithm or by any of its individual users. In the same way in which public domain texts and publicly funded research are seen as belonging to the public (Pierce & Theodossiou 2018), a case could possibly be made for the text generated by GPT-3 to be considered similarly, provided that the human (co-)authors of said text disclose the use of the software, along with the prompts and additional training data submitted to it.
The world is filled with risk. You have to take risks if you want to make money or build wealth; it's in your nature as human beings. It's what makes us humans. And when we think about risk, we tend to think of one thing: how much harm can I avoid? How much damage can I prevent to myself? That was the question in our conversation.

Well, that's not how risk works. In many cases, harm is necessary. In fact, you need to seek it. You have to put yourself in harm's way; otherwise, what the hell are you playing for? You only have something if you can lose it. Children and idiots don't understand that; they think if you lose something, somehow you no longer have it. That's also how a lot of winners think.

Let's take the obvious example of the financial crisis which broke out in 2007, and about which I wrote a book, 'The Black Swan.' What happened with the banks? Geeks and quants sat in front of Bloomberg screens in air-conditioned offices, did everything 'right,' followed all the models, inserted all the numbers, did leveraged trades worth trillions, ... and soon went bankrupt. And they took down most of the world economy with them. Because what matters in leverage is not the numbers, but the exposure.

Geeks, quants and models don't have skin in the game. When you sail ships, you have a huge one in your downside; when you take airplanes, does it matter the probability of a crash if the consequence is your destruction? Of course not. So naturally, we need some skin in the game when it goes wrong, regardless of probability.

Consider an increasingly popular surgical intervention, laser eye surgery. It's just a laser that zaps off the membrane over your cornea, and it costs a few thousand dollars. For most people it's an effective procedure. But there is a small subset of people who end up going totally blind. Why do people still get it then? Two reasons. The first and most obvious is that the procedure has positive expected value. There is a probability of blindness but the gain is so much that it makes it worthwhile. The second reason, one not often talked about, is that we feel comfortable with a bit of skin in the game. And isn't it really that feeling that makes us take risks? Without the confidence of a comeback, we wouldn't be able to act at all.

Take war, for instance. Say 'skin in the game' is required to start one. What if the president has to fund and personally lead every military operation? Well, we'd never have a war because no one would start one.

Box 3. An opinion piece on risk in the style of Nassim Nicholas Taleb generated by GPT-3
4. CONCLUSIONS
NLP AI has, so far, been an important ally in detecting plagiarism, and ethics discussions pertaining to AI have mainly focused on other forms of weak AI and on the relatively remote advent of AGI. However, it is now evident that there will be a number of very drastic intermediate technological disruptions between now and then. I believe that GPT-3 is one of them. This paper was intended to present examples of content generated by GPT-3, raise some concerns and precise questions regarding the possible use of this technology to facilitate scientific misconduct, and call for an urgent revision of publishing standards. I believe that the advent of this powerful NLP technology calls for an urgent update of our concepts of plagiarism. NLP technology is currently used to prevent the publishing of fake, plagiarized, or fraudulent findings. If the very definition of these concepts changes, the objective of peer review and the possible role of AI in scientific writing would also need to be reconsidered. I believe that moral philosophy, with its renewed ability to see the new and as a precursor to regulation, has an urgent role to play, and ethics researchers should rapidly appropriate software based on GPT-3 and address some of the immediate ethical questions raised by this software.
Acknowledgements. The author is grateful to Dr. Nick Ferriman of the Humanities and Language Division, Mahidol University International College, numerous colleagues from the Business Administration Division who contributed to the mass email discussion on this piece, as well as 3 anonymous referees for their helpful comments and suggestions.
ENDNOTES

[1] 'OpenAI API', Official OpenAI Blog, accessed on 25/11/2020 at https://openai.com/blog/openai-api/
[2] 'Microsoft invests in and partners with OpenAI to support us building beneficial AGI', Official OpenAI Blog, accessed on 25/11/2020 at https://openai.com/blog/microsoft/
[3] 'Microsoft teams up with OpenAI to exclusively license GPT-3 language model', Official Microsoft Blog, accessed on 25/11/2020 at https://blogs.microsoft.com/blog/2020/09/22/microsoft-teams-up-with-openai-to-exclusively-license-gpt-3-language-model/
[4] Announcement by Nick Walton, creator of AI Dungeon, accessed on 25/11/2020 at https://medium.com/@aidungeon/ai-dungeon-dragon-model-upgrade-7e8ea579abfe

LITERATURE CITED
Anderson JM, Kalra N, Stanley K, Sorensen O, Samaras C, Oluwatola O (2016) Autonomous vehicle technology: a guide for policymakers. RAND Corporation, Santa Monica, CA
Asaro PM (2006) What should we want from a robot ethic? Int J Inf Ethics 6: 10–16
Asaro PM (2016) The liability problem for autonomous artificial agents. In: Proc AAAI Spring Symposium Series, Ethical and Moral Considerations in Non-Human Agents track, p 190–194. https://www.aaai.org/ocs/index.php/SSS/SSS16/paper/view/12699
Bostrom N (2016) Superintelligence. Oxford University Press, Oxford
Brown T, Mann B, Ryder N, Subbiah M and others (2020) Language models are few-shot learners. arXiv preprint arXiv:2005.14165. https://arxiv.org/abs/2005.14165
Budzianowski P, Vulić I (2019) Hello, it's GPT-2 – how can I help you? Towards the use of pretrained language models for task-oriented dialogue systems. In: Proc 3rd Workshop on Neural Generation and Translation. Association for Computational Linguistics, Hong Kong, p 15–22
Chomsky N (1959) On certain formal properties of grammars. Inf Control 2: 137–167
Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge, MA
Drozdek A (1995) What if computers could think? AI Soc 9: 389–395
Everitt T, Lea G, Hutter M (2018) AGI safety literature review. In: Proc 27th International Joint Conference on Artificial Intelligence (IJCAI-18), Survey track, 13–19 Jul, Stockholm, p 5441–5449. https://www.ijcai.org/Proceedings/2018/
Foltýnek T, Meuschke N, Gipp B (2019) Academic plagiarism detection: a systematic literature review. ACM Comput Surv 52: 112
Francke E, Alexander B (2019) The potential influence of artificial intelligence on plagiarism: a higher education perspective. In: Griffiths P, Kabir MN (eds) Proc European Conference on the Impact of Artificial Intelligence and Robotics. EM Normandie Business School, Oxford, p 131–140
Gervais D (2020) Is intellectual property law ready for artificial intelligence? GRUR Int J Eur Int IP Law 69: 117–118
Govindarajulu NS, Bringsjord S (2015) Ethical regulation of robots must be embedded in their operating systems. In: Trappl R (ed) A construction manual for robots' ethical systems. Springer, Berlin, Heidelberg, p 85–99
Grace K, Salvatier J, Dafoe A, Zhang B, Evans O (2018) Viewpoint: When will AI exceed human performance? Evidence from AI experts. J Artif Intell Res 62: 729–754
Joshi AK (1991) Natural language processing. Science 253: 1242–1249
Juyal D, Thawani V, Thaledi S (2015) Plagiarism: an egregious form of misconduct. N Am J Med Sci 7: 77–80
Kobis N, Mossink LD (2021) Artificial intelligence versus Maya Angelou: experimental evidence that people cannot differentiate AI-generated from human-written poetry. Comput Human Behav 114: 106553
Laugier S (2013) The will to see: ethics and moral perception of sense. Grad Fac Philos J 34: 263–281
Lee RST (ed) (2020) Natural language processing. In: Artificial intelligence in daily life. Springer, Singapore, p 157–192
Metz C (2020) Meet GPT-3. It has learned to code (and blog and argue). The New York Times, 24 Nov 2020, Section D, p 6
Miyashita K, Russell D (1995) Keiretsu: inside the hidden Japanese conglomerates. McGraw-Hill, New York, NY
Ntoutsi E, Fafalios P, Gadiraju U, Iosifidis V and others (2020) Bias in data-driven artificial intelligence systems: an introductory survey. WIREs Data Min Knowl Discov 10: e1356
Pierce GJ, Theodossiou I (2018) Open access publishing: a service or a detriment to science? Ethics Sci Environ Polit 18: 37–48
Qiu X, Sun T, Xu Y, Shao Y, Dai N, Huang X (2020) Pre-trained models for natural language processing: a survey. Sci China Technol Sci 63: 1872–1897
Rességuier A, Rodrigues R (2020) AI ethics should not remain toothless! A call to bring back the teeth of ethics. Big Data Soc 7. https://journals.sagepub.com/doi/pdf/10.1177/2053951720942541
Sadeghi R (2019) The attitude of scholars has not changed towards plagiarism since the medieval period: definition of plagiarism according to Shams-e-Qays, thirteenth-century Persian literary scientist. Res Ethics 15: 1–3
Shelley C (2010) Does everyone think, or is it just me? In: Magnani L, Carnielli W, Pizzi C (eds) Model-based reasoning in science and technology. Studies in Computational Intelligence, Vol 314. Springer, Berlin, Heidelberg, p 477–494
Taleb NN (2007) The black swan: the impact of the highly improbable. Random House, New York, NY
Vladeck DC (2014) Machines without principals: liability rules and artificial intelligence. Wash Law Rev 89: 117–150
Wager E, Kleinert S (2012) Cooperation between research institutions and journals on research integrity cases: guidance from the Committee on Publication Ethics (COPE). Maturitas 72: 165–169
Wallach W, Allen C (2009) Moral machines: teaching robots right from wrong. Oxford University Press, Oxford
Wallach W, Allen C, Smit I (2008) Machine morality: bottom-up and top-down approaches for modeling moral faculties. AI Soc 22: 565–582
Wang W, Siau K (2019) Artificial intelligence, machine learning, automation, robotics, future of work and future of humanity: a review and research agenda. J Database Manage 30: 61–79
Wright SA, Schultz AE (2018) The rising tide of artificial intelligence and business automation: developing an ethical framework. Bus Horiz 61: 823–832
Editorial responsibility: Darryl Macer, Scottsdale, Arizona, USA
Reviewed by: 3 anonymous referees
Submitted: August 6, 2020; Accepted: December 11, 2020
Proofs received from author(s): March 12, 2021