NLP for Requirements Engineering:
Tasks, Techniques, Tools, and Technologies
University of Manchester
Al-Imam Muhammad ibn Saud Islamic University
Riyadh, Saudi Arabia
Abstract—Requirements engineering (RE) is one of the most
natural language-intensive fields within the software engineering
area. Consequently, many works have been developed over the
years to automate the analysis of natural language artifacts that
are relevant for RE, including requirements documents, but also
app reviews, privacy policies, and social media content related
to software products. Furthermore, the recent spread of game-changing
natural language processing (NLP) techniques and platforms
has also boosted the interest of RE researchers. However,
a reference framework that provides a holistic understanding of the
field of NLP for RE is currently missing. Based on the results of a
recent systematic mapping study, and stemming from a previous
ICSE tutorial by one of the authors, this technical briefing
gives an overview of NLP for RE tasks, available techniques,
supporting tools, and NLP technologies. It is oriented to both
researchers and practitioners, and will gently guide the audience
towards a clearer view of how NLP can empower RE, providing
pointers to representative works and specialised tools.
I. TECHNICAL BRIEFING PROPOSAL
a) Topic: The topic of the proposed technical briefing is
natural language processing (NLP) for requirements engineering
(RE). This is a specialised, yet highly multi-faceted
research topic that has received attention since the ’90s, with
seminal contributions mostly focused on generating models
from requirements. Later, NLP was applied to
automate typical time-consuming RE tasks, such as defect
detection, requirements tracing, and categorization.
With the emergence of novel RE-relevant formats and sources,
such as user stories, but also app reviews and privacy
policies, the field has expanded its scope of application to
tackle issues specifically related to these types of artifacts. In
parallel, NLP, and artificial intelligence techniques in general,
have made a tremendous leap forward in the last decade after
the development of game-changing technologies generally
classified under the umbrella of deep learning. These two
trends have created further synergies between NLP and RE,
and the proponents of this technical briefing performed a
mapping study to provide a holistic understanding of the
field. The study analyses 404 papers on the topic, and
provides statistics on multiple relevant facets (e.g., documents
used, techniques, NLP tools, resources), giving not only a
report on the state of the art, but also a conceptual reference
framework to make sense of current research in the field and
its possible evolution.
This technical briefing stems from the results of the mapping
study, and aims to provide a blueprint of the landscape of
NLP for RE in terms of (1) tasks that can be automated,
(2) NLP-based techniques that are available to support these
tasks, (3) tools that support the techniques, and (4) established
and novel NLP technologies that are relevant for RE and
can further advance the field. In particular, part of the talk
will present the concept of transfer learning and the BERT
language model. Transfer learning consists of reusing models
pre-trained on one task to address new and unseen problems,
which can be particularly useful in RE, where datasets to train
models are scarce. The specific merit of BERT is
its versatility across different downstream NLP tasks,
such as text classification, question answering, and named-entity
recognition, which are useful for several RE tasks, as
reported in the mapping study. For example, in the case
of requirements classification, taking a pre-trained model such
as BERT and fine-tuning it with a small set of labelled requirements
yields promising results, as shown by Hey et al.
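To make the transfer-learning idea more concrete, the following Python sketch separates a frozen, "pre-trained" encoder from a small classification head trained on a handful of labelled requirements. It is a simplified illustration under explicit assumptions, not the NoRBERT approach of Hey et al.: instead of fine-tuning BERT end to end, it uses the feature-extraction variant of transfer learning (the encoder is never updated), and BERT itself is replaced by a hypothetical table of invented two-dimensional word vectors (`PRETRAINED_VECTORS`) so that the example runs with the Python standard library alone. The functions `encode`, `train_head`, and `classify`, as well as the four labelled requirements, are likewise hypothetical.

```python
import math
import re

# HYPOTHETICAL stand-in for a pre-trained language model such as BERT.
# BERT-base produces contextual 768-dimensional token vectors; these invented
# 2-dimensional vectors ("behaviour", "quality") only mimic the idea of
# reusing knowledge learned elsewhere. They stay frozen during training.
PRETRAINED_VECTORS = {
    "display": (0.9, 0.1),
    "report": (0.8, 0.0),
    "store": (0.7, 0.1),
    "send": (0.8, 0.1),
    "fast": (0.1, 0.9),
    "seconds": (0.0, 0.8),
    "secure": (0.1, 0.9),
    "encrypted": (0.2, 0.8),
}


def encode(requirement):
    """Frozen 'encoder': average the pre-trained vectors of the known words."""
    words = re.findall(r"[a-z]+", requirement.lower())
    vectors = [PRETRAINED_VECTORS[w] for w in words if w in PRETRAINED_VECTORS]
    if not vectors:
        return (0.0, 0.0)  # no known word: neutral representation
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=500, lr=0.5):
    """Train only a new logistic-regression head by batch gradient descent.

    Labels: 1 = non-functional, 0 = functional. The encoder is never updated,
    so the only learned parameters are two weights and one bias.
    """
    weights, bias = [0.0, 0.0], 0.0
    features = [(encode(text), label) for text, label in examples]
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in features:
            # Gradient of the log loss: (predicted probability - label) * input
            error = sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias) - y
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def classify(requirement, weights, bias):
    x = encode(requirement)
    p = sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias)
    return "non-functional" if p >= 0.5 else "functional"


# Hypothetical labelled requirements (1 = non-functional, 0 = functional)
LABELLED = [
    ("The system shall display the monthly report.", 0),
    ("The system shall send a notification to the user.", 0),
    ("Pages shall load in under two seconds.", 1),
    ("All stored passwords shall be encrypted.", 1),
]

if __name__ == "__main__":
    w, b = train_head(LABELLED)
    print(classify("The system shall store the report.", w, b))     # functional
    print(classify("The service shall be secure and fast.", w, b))  # non-functional
```

Reusing the frozen encoder means that only three parameters must be learned from the labelled data, which is why four examples suffice in this toy case. With a real pre-trained model, `encode` would be replaced by the model's sentence representation (for instance, obtained through a library such as Hugging Face Transformers), and either only the head or, as in full fine-tuning, the whole network would be trained on the labelled requirements.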
The topic of NLP for RE is particularly relevant and timely,
given that recent NLP developments have not yet been absorbed
by the RE community. It is also the right time for technology
transfer, as it has been observed that only 16% of companies use
some form of automation in requirements analysis, and only
7% of the works retrieved in the systematic mapping study
concern industrial applications, whereas solution proposals
abound (67% of the studies).
b) Format: The technical briefing consists of 90 minutes
of presentation, divided into three main parts. The
first part provides an overview of the different
RE tasks that can be supported by NLP techniques, such
as traceability, defect detection, and requirements classification.
This is followed by an overview of the results of the
mapping study, in which representative contributions are presented
to illustrate the techniques and tools available to support
each RE task. Particular attention will be given to
enabling technologies and platforms from the NLP field.
The mapping study has identified a clear lack of adoption of
cutting-edge NLP technologies in RE. Therefore, to trigger
interest and open the way to further developments, the final
slot will focus on the presentation of one specific NLP technique
that we consider of potential interest for both RE and software
engineering (SE) practitioners, namely the BERT model.
c) Interest for the Community: The first author gave
a technical briefing on the topic at ICSE’18, and the
presentation attracted considerable interest, with 20
to 30 participants attending. While the previous briefing
was mostly focused on consolidated NLP technologies that
could be used in daily RE practice, this new one aims to
give a holistic view of the field, based on
empirical data from the mapping study and on recent
developments. This can be of interest to researchers willing to
address current gaps, but also to practitioners who want to
know where the field is now, what tools are available, and
who the reference experts are to involve in research-industry
collaborations. Furthermore, we believe that the final slot,
dedicated to the presentation of BERT, can raise the interest of
the whole SE community, as NLP has also seen several applications
in software testing (see, e.g., the recent mapping study
by Garousi et al.) and maintenance. Overall,
our goal is also to raise awareness of current research on
NLP for RE within the SE community, suggest NLP-centered
synergies, and pave the way for more integration between
different SE perspectives.
Alessio Ferrari is a research scientist at CNR-ISTI (Consiglio
Nazionale delle Ricerche - Istituto di Scienza e Tecnologia
dell’Informazione “A. Faedo”, Pisa, Italy -
http://www.isti.cnr.it), where he has worked since 2011. He received
his Ph.D. in Computer Engineering from the University of
Florence, Italy, in 2011. His current research interests are
applications of NLP techniques to RE, requirements elicitation,
and RE teaching. In particular, his main focus is natural
language ambiguity detection and mistake identification in
requirements elicitation interviews and requirements documents.
Ferrari has participated in several European projects, including
Learn PAd, ASTRail, and DESIRA. He is the author of more than
70 papers in relevant conferences (RE, ICSE) and journals
(REJ, EMSE, IEEE Software). He has served on the PCs of ICSE,
IEEE RE, and REFSQ, has been co-organiser of two editions of
the NLP4RE workshop, and was local organiser of REFSQ 2020.
Liping Zhao is an Associate Professor in the Department of
Computer Science, the University of Manchester. Her current
research focuses on using NLP and machine learning to support
RE. From 2004 to 2014 she collaborated with
IBM on Pattern Language for the Design and Development of
E-business Applications, and received three IBM Faculty Awards
(2004, 2005, and 2008) for her contributions. From 2007 to
2012, she co-founded and led a multidisciplinary academic
network in the UK on service science (SSMEnetUK), funded
by the UK Research Council, BT, HP, and IBM. She is
an Associate Editor for Requirements Engineering (Springer)
and Expert Systems (Wiley). She has served on the committees of
numerous conferences and workshops, and has been co-organiser of
the IEEE International Workshop on Requirements Patterns,
co-located with the RE conference (from 2012 to 2016), and of the
International Workshop on Advances and Applications of Problem
Orientation (IWAAPO), co-located with ICSE (2010).
Waad Alhoshan is an Assistant Professor in the Department
of Computer Science, IMSIU. She received her PhD degree
in Computer Science in 2020 from the University of Manchester,
where she studied corpus-based and language modeling
techniques to investigate approaches for detecting semantic
relationships between software requirements. During her PhD,
Waad published several papers in peer-reviewed conferences
such as LREC, RE, and ESEM. Currently, she is cooperating
on multiple research projects on designing NLP-based systems
to support software in Arabic and English. One of
these projects is a collaboration between IMSIU and
the Saudi Authority for Intellectual Property (SAIP) to design
AI-driven systems for processing legal documents.
[1] A. Ferrari, F. Dell’Orletta, A. Esuli, V. Gervasi, and S. Gnesi, “Natural
language requirements processing: a 4D vision,” IEEE Software, vol. 34,
no. 6, pp. 28–35, 2017.
[2] L. Zhao, W. Alhoshan, A. Ferrari, K. J. Letsholo, M. A.
Ajagbe, E.-V. Chioasca, and R. T. Batista-Navarro, “Natural
Language Processing (NLP) for Requirements Engineering: A
Systematic Mapping Study,” 2020 (Under Review). [Online]. Available:
[3] C. Rolland and C. Proix, “A natural language approach for requirements
engineering,” in International Conference on Advanced Information
Systems Engineering. Springer, 1992, pp. 257–277.
[4] L. Mich, “NL-OOPS: from natural language to object oriented
requirements using the natural language processing system LOLITA,” Natural
Language Engineering, vol. 2, no. 2, pp. 161–187, 1996.
[5] H. Femmer, D. M. Fernández, S. Wagner, and S. Eder, “Rapid quality
assurance with requirements smells,” JSS, vol. 123, pp. 190–213, 2017.
[6] J. Guo, J. Cheng, and J. Cleland-Huang, “Semantically enhanced
software traceability using deep learning techniques,” in ICSE’17. IEEE,
2017, pp. 3–14.
[7] M. Binkhonain and L. Zhao, “A review of machine learning algorithms
for identification and classification of non-functional requirements,”
Expert Systems with Applications: X, vol. 1, p. 100001, 2019.
[8] M. Robeer, G. Lucassen, J. M. E. van der Werf, F. Dalpiaz, and
S. Brinkkemper, “Automated extraction of conceptual models from user
stories via NLP,” in RE’16. IEEE, 2016, pp. 196–205.
[9] W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman, “A survey of
app store analysis for software engineering,” TSE, vol. 43, no. 9, pp.
[10] D. Torre, S. Abualhaija, M. Sabetzadeh, L. Briand, K. Baetens, P. Goes,
and S. Forastier, “An AI-assisted approach for checking the completeness
of privacy policies against GDPR,” in RE’20. IEEE, 2020, pp. 136–146.
[11] T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recent trends in
deep learning based natural language processing,” IEEE Computational
Intelligence Magazine, vol. 13, no. 3, pp. 55–75, 2018.
[12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training
of deep bidirectional transformers for language understanding,” arXiv
preprint arXiv:1810.04805, 2018.
[13] T. Hey, J. Keim, A. Koziolek, and W. F. Tichy, “NoRBERT: Transfer
learning for requirements classification,” in RE’20. IEEE, 2020, pp. 169–179.
[14] D. M. Fernández, S. Wagner, M. Kalinowski, M. Felderer, P. Mafra,
A. Vetrò, T. Conte, M.-T. Christiansson, D. Greer, C. Lassenius et al.,
“Naming the pain in requirements engineering,” EMSE, vol. 22, no. 5,
pp. 2298–2338, 2017.
[15] A. Ferrari, “Natural language requirements processing: from research to
practice,” in ICSE’18. IEEE, 2018, pp. 536–537.
[16] V. Garousi, S. Bauer, and M. Felderer, “NLP-assisted software testing:
A systematic mapping of the literature,” IST, p. 106321, 2020.
[17] X. Hu, G. Li, X. Xia, D. Lo, and Z. Jin, “Deep code comment generation
with hybrid lexical and syntactical information,” EMSE, vol. 25, no. 3,
pp. 2179–2217, 2020.
[18] S. Gupta and S. Gupta, “Natural language processing in mining
unstructured data from software repositories: a review,” Sādhanā,
vol. 44, no. 12, p. 244, 2019.