
A Comparative Analysis of European Integrated and Stand-Alone Sustainability Reports: Evidence From LDA


Abstract

I employ an unsupervised learning method to investigate the thematic content of a large set of integrated and separate sustainability reports. Subject to a non-financial reporting mandate, companies in the EU are required to disclose their non-financial performance but are free to decide how they do so. I compare 2,248 integrated and 3,567 stand-alone sustainability reports to identify and examine the topics that firms disclose when preparing non-financial information. Using Latent Dirichlet Allocation (LDA), a topic modeling technique, I show that when text corpora are constructed of integrated reports, they are less likely to contain topics concerning environmental and social matters and have lower readability scores as measured by common readability metrics. I argue that the differences in disclosure content and textual characteristics are the result of firms aiming to address specific audiences when preparing their reports. The analysis is of potential interest to regulators and academics wishing to understand the intricacies that arise when firms choose a specific reporting format over the other.
A comparative analysis of European integrated and stand-alone
sustainability reports: Evidence from LDA
Felipe Sicka
aDepartment of Accounting, Control and Auditing, University of St.Gallen, St. Gallen, Switzerland
ACA-HSG, Office 57-104, Tigerbergstraße 9,9000 St.Gallen, Switzerland,
E-mail:, phone: +4915207326221
Acknowledgments: I thank Prof. Dinh and A. Stenzel at the University of St.Gallen for their
valuable guidance and support on this project. In addition, I would like to thank participants of the
fifth edition of the Early Researcher Consortium (ERC) by the Accounting Research Groups at
the UPF-BSM (Spain), Free University of Bolzano (Italy) and University of Padua (Italy).
Electronic copy available at:
ABSTRACT: I employ an unsupervised learning algorithm to investigate the thematic
content of a large set of integrated and stand-alone sustainability reports. Subject to a non-
financial reporting mandate, companies in the EU can disclose their non-financial
performance together in their financial statement or in a separate sustainability report. In
this paper, I compare 2,248 integrated reports and 3,567 stand-alone reports to identify and
examine the thematic content that firms disclose when preparing their sustainability
information. To do so, I employ a topic model, Latent Dirichlet Allocation (LDA), to
examine the topics disclosed by firms issuing a report between 2015 and 2021. Comparing
the outcomes of both types of report, I find that when text corpora are constructed of
integrated reports, they are less likely to contain topics concerning ESG matters,
especially those addressing social issues. In line with previous research, I show that the readability
is lower for firms including their non-financial information in their financial statements as
compared to when they issue a separate report. The analysis provides insights for regulators
and preparers of sustainability information and contributes to the debate on the implications
of the recently approved proposal to extend non-financial reporting regulation to listed
small- and medium-sized enterprises.
Keywords: sustainability reporting, topic modeling, ESG, European corporate reporting,
integrated reporting
1. Introduction
The disclosure of non-financial information has become a relevant subject for preparers, users
and regulators alike, the latter increasingly looking to harmonize reporting practices. Under
current EU regulations, companies with more than 500 employees have the flexibility to disclose
their non-financial information together with financial information in an integrated report or in
a separate sustainability report. The Non-Financial Reporting Directive (NFRD 2014/95/EU)
essentially creates a situation in which companies either issue an integrated report or voluntarily
disclose their non-financial information in a stand-alone report. How firms decide to disclose
their sustainability information has resulted in considerable heterogeneity in firms’ reporting
practices as well as significant challenges for measurement, comparability and standardization
(Christensen et al., 2019). At the same time, companies may wish to understand current and
future regulatory interventions in order to align their reporting practices with upcoming
requirements. Recent developments, especially the consolidation of standard setters
such as the creation of the International Sustainability Standards Board (ISSB), highlight the need
for making sustainability information comparable, reliable, and relevant.1
In many cases, the information presented in multiple reports can differ with direct and
indirect implications for the end-users of sustainability reports (Rupley et al., 2017). While the
NFRD highlights the importance of how to assess materiality in the context of non-financial
information, it does not standardize sustainability practices and is generally interpreted as
a call for more integrated reporting practices as set forth by the International Integrated
Reporting Council, IIRC (Milla & Haberl-Arkhurst, 2018). Because an integrated report (IR)
should communicate ‘concisely’ how a firm’s strategy, governance, performance, and prospects,
in the context of its external environment, lead to the creation of sustainable value (IIRC, 2013),
an integrated report is said to be effective when it goes beyond addressing solely providers of
1 With the recent merger of the International Integrated Reporting Council (IIRC) and the Sustainability Accounting Standards Board (SASB) into the Value Reporting
Foundation, the regulatory environment in Europe can be said to be highly dynamic. For a detailed review of the CSRD and current consolidations in Europe, I
refer to Dinh et al. (2021).
financial resources2. However, it may also leave out information that may be relevant to
stakeholders beyond providers of financial resources. In fact, contents in sustainability reports
often seem to have been selective, mainly stressing positive aspects of the reporting company’s
performance (Baumüller, 2018). Using machine learning, I examine the thematic content of a
large dataset of integrated and stand-alone reports to gain a better insight into what managers
of large European companies disclose when they prepare non-financial information. Methods
that rely on artificial intelligence are being used increasingly in the accounting and finance
literature and are yet to find more applications in sustainability accounting research. My
motivation to study the differences in the thematic content is to make use of computationally
efficient methods that are suitable to identify underlying or hidden topics that are disclosed in
different reporting types. In particular, I show how the thematic content, in terms of ESG
disclosure differs when a company issues an integrated report, relative to preparing a stand-
alone sustainability report. In addition, I investigate the differences in textual attributes when
text corpora are constructed of integrated reports. I find that when a company produces an
integrated report, it is less likely to include topics around ESG matters and especially leaves
out subjects around social issues. In line with previous research (especially du Toit, 2017; Stone
& Lodhia, 2019), I find that the readability of integrated reports is generally lower relative to stand-
alone reports. The result seems to be pronounced when a firm changes to an integrated reporting
framework. Throughout this paper, I argue that the difference in disclosure content generally
arises because of external pressures and firms wanting to tailor their information to their target
audience. For integrated reports, this means shareholders. Companies that primarily focus on
information that is financially material to investors may cover ESG-related topics, but may
exclude topics for which the firm does not bear the full cost (Christensen et al., 2021)3. This can
have implications for the report in which a firm decides to disclose its non-financial information, the
2 The case for combining non-financial information with financial information was made prominently by Eccles and Krzus (2010) who emphasized that a
company should embed sustainability topics in the fabric of its business operations to demonstrate a commitment to CSR and therefore contribute to a more
sustainable society. Combining financial with non-financial information in a single report is said to improve corporate disclosure and transparency by eliminating
the artificial and unhelpful analytical distinctions between shareholders and stakeholders (Eccles, 2010).
3 This is especially the case if a firm’s operations result in externalities such as air pollution.
thematic content of the report and the information available to end-users. Preparing reports that
not only incorporate how sustainability topics affect a corporation, but also how a firm’s operations
affect the environment, is likely to shape the regulatory environment in the EU4.
My paper foremost contributes to text-based analytics in the sustainability accounting
literature. By explicitly examining the thematic content of sustainability reports and contrasting
topics by report type, I add to the understanding of how textual information differs in the
fast-changing regulatory environment that characterizes the EU. From fiscal year 2024 onwards,
large public entities in the EU will have to disclose their sustainability information under the
new Corporate Sustainability Reporting Directive (CSRD). Non-financial reports currently rely, and
will increasingly rely, on judgements about materiality (Baumüller & Schaffhauser-Linzatti,
2018a). The challenge I aim to address is in analyzing trends in corpora far too large for humans
to manually review and summarize in a way that is easily interpretable. The insights are of
potential interest to regulators and standard setters in fostering awareness on how firms are able
to communicate their ESG activity more effectively to stakeholders. Understanding the thematic
content is also important for corporations wishing to understand this information and improve
stakeholder communication. It may help them to strengthen their relationships with customers,
employees and local communities (Du & Bhattacharya, 2010).
The remainder of this article is organized as follows. Section two reviews the literature
and explains the current and changing regulatory environment in the EU. Section three
introduces topic modeling and explains how I apply LDA to sustainability reports. In
this section I elaborate on how I generally carry out the analysis. To make the large body of
integrated and stand-alone reports useful for the model, I carry out an array of pre-processing
steps to the data that I describe in section four. Section five presents the LDA results separately
for stand-alone and integrated reports. In the same section I focus on firms that switch from
4 Regulators in the EU are currently aiming to promote the concept of ‘double materiality’, which requires businesses to consider the firm’s economic, environmental,
or social impact on society (Garst, J., Maas, K., & Suijs, J. (2022). Materiality Assessment Is an Art, Not a Science: Selecting ESG Topics for Sustainability Reports.
California Management Review, 00081256221120692).
disclosing non-financial information in a financial statement to an integrated report type. Finally,
in section six I offer my concluding remarks.
2. Background and Literature Review
2.1. Institutional setting
The non-financial reporting directive 2014/95/EU (hereinafter: NFRD) came into effect in
2017 as a response to the lack of transparency and comparability of non-financial information,
previously subject to directive 2013/34/EU. The previous directive (2013/34/EU) was
primarily concerned with ensuring general comparability between firms operating in European
member states in terms of their financial statements, management reports and other general
financial reporting practices (European Union, 2013). In contrast to the superseded directive,
the NFRD stipulates that companies in European member states not only disclose how
sustainability issues affect them, but also how their activities affect society and the environment
at large (EC, 2021). This principle, referred to as double materiality, suggests that stakeholders
can have preferences beyond shareholder value maximization. While 2013/34/EU required
public interest entities to disclose a non-financial statement, it did not include the materiality of
disclosures, one of the fundamental principles of the non-financial reporting directive (Milla &
Haberl-Arkhurst, 2018). The NFRD requires large public interest entities to disclose
information on how they operate and manage social and environmental challenges5.
Fundamentally, the main target of the NFRD is to allow a sufficient level of comparability and
transparency to meet the needs of investors and other stakeholders and to provide consumers
with easy access to information on the impact of businesses on society (European Union, 2014)6.
The disclosure should help the measuring, monitoring, and managing of a company’s
performance towards environmental targets and their impact on society. Transparency is
considered key for companies to deliver better results and is expected to enhance the trust
5 The EU generally considers non-financial information as environmental, social and governance (ESG) information (EC, 2013).
6 The disclosure of non-financial information is considered vital for managing change towards a sustainable global economy by combining long-term profitability
with social justice and environmental protection (European Union, 2014).
citizens have in business and in markets and enable a more efficient allocation of capital (EU,
2014; European Union, 2014). The provisions of the directive, together with guidelines set forth
by the EU Commission generally show a reference to the principle of materiality (Baumüller &
Schaffhauser-Linzatti, 2018b; EC, 2017).
Information, according to the NFRD, can be disclosed in the form of a statement in the
annual report or a separate sustainability report (Baumüller, 2018; Dinh et al., 2021; EC, 2017).
Allowing the choice between a disclosure of non-financial information in the annual report with
a voluntary decision on how detailed the information is presented in a stand-alone report has
been considered to result in a satisfactory increase in transparency, while keeping the
administrative burden low (Commission, 2017; European Commission, 2017). In addition, the
material non-financial information would be made publicly available on a regular basis and could
be used by stakeholders such as social organisations or local communities to assess the impact
and risks related to the operations of a company. Even though companies choosing to provide
a separate report might have to sustain higher costs (European Commission, 2017) the benefits,
for instance in enhancing the efficiency of capital markets, outweigh the downsides. Cuomo et al.
(2022) find that the directive generally has led to an increase in transparency and sustainability
performance. Before the NFRD was made public, the usefulness of an integrated framework
was made easier for companies voluntarily issuing an integrated report (Stawinoga & Velte,
Baumüller and Schaffhauser-Linzatti (2018b) show that despite the expectations of
many, with regards to materiality, the reporting requirements of the NFRD are closer to
integrated reporting than they are to sustainability reporting. This finding is in line with Stawinoga
et al. (2017), who argue that integrated reporting depicts the more realistic implementation
because of the quantitative and qualitative requirements of the directive and companies aligning
their non-financial information into their financial reporting practices. Given the need to
reconcile financial with non-financial information, Neumann et al. (2012) argue that the increase
in total information available might have an adverse effect and hamper the intended positive
effects of reporting, from its users’ perspective as a whole. According to Eccles and Krzus (2019),
there are two main reasons why a company should adopt an integrated report. The first is that
integrated thinking is a key element of taking sustainability seriously, because once the company
has created a truly sustainable strategy, it can better respond to the risks and opportunities
created by the need to ensure a sustainable society (Eccles, 2010). Second, the reader gains a
better understanding of the relationships between financial and non-financial performance,
urging managers to provide more specific examples of how the firm is doing well (for
shareholders) by doing good (for stakeholders). An integrated report, according to the latter
view puts pressure on a company to be as precise as possible about the relationship between
strong ESG results on the one side and financial results on the other. Maniora (2017) finds that
integrated reports can differ from stand-alone reports because integrated reports generally
include financial and non-financial information and therefore result in a firm's tendency to cater
to a specific audience (investors). This view is supported by Baumüller and his colleague (2018)
who highlight the main target group for integrated reporting to be providers of financial
resources. While Lai et al. (2018) state that integrated reporting establishes a meaningful dialogue
with a growing variety of stakeholders through broader and plainer messages they, too, argue
that the primary addressees of an integrated report are shareholders. Should companies decide
to voluntarily provide a non-financial report, the level of detail of information disclosed would
necessarily increase (European Commission, 2017). I would then expect to see topics being
clearly identifiable as environmental, governance and social.
Generally, the guidelines set forth by the EU allow companies to decide how to use
international, European or national guidelines according to their own characteristics or business
environment (Dinh et al., 2021; Goloshchapova et al., 2019). Two noteworthy guidelines for the
disclosure of non-financial information in the EU are the Global Reporting Initiative (GRI) and
the International Integrated Reporting Council (IIRC). The GRI can be understood as an
initiative providing non-binding guidance for companies (European Commission, 2017).
Striving to help companies communicate their impact on critical sustainability issues it
developed into a global standard for sustainability reporting (Christensen et al., 2021). The
GRI defines material topics as those topics that can reasonably be considered important for
reflecting the organization’s economic, environmental and social impacts, or influencing the
decisions of stakeholders (GRI, 2016). The IIRC, in contrast, attempts to institutionalize
integrated reporting as a practice that is critical to the relevance and value of corporate reporting
(Humphrey et al., 2017). The objective of the IIRC is to change business actors' perspectives to
further integrate sustainability activities and impacts into strategic planning and decision-making
(IIRC, 2013). In other words, an integrated report should disclose matters that
substantively affect the organization’s ability to create value over the short, medium and long
term (IIRC, 2013a). Simnett and Huggins (2015) provide insights into salient issues in the
development of the IIRC and explain that much of what is expected to be included in an
integrated report could be described as qualitative rather than quantitative information. In other
words, companies are free to write about how they combine the non-financial information with
their financial information. This makes the assessment in terms of materiality of topics difficult.
According to the IIRC, a topic is material if it is of such relevance and importance that it could
substantively influence the assessments of providers of financial capital with regards to the
organization’s ability to create value over the short, medium and long term (IIRC, 2013b).
Prior studies have conjectured on the differences in information content across different
channels and mainly argued from an institutional and stakeholder perspective (Vitolla & Raimo,
2018). External ‘tensions’ between a firm and its stakeholders can influence managers to release
only certain information, especially when the information is proprietary (Dye, 1985). While
Melloni and Stacchezzini (2014) demonstrate that companies do not adopt integrated reporting
as a legitimation strategy, integrated reporting is generally associated with higher ESG disclosure
ratings and performance (Velte, 2017)7. Flower (2015) paints a different picture and argues that
7 A legitimate explanation for the nondisclosure of management’s information generally arises from the fact that stakeholders are often unsure whether the
manager has any such information—or they are uncertain about the kind of information held by management (Dye R., 1985).
the <IR> framework (i) focuses exclusively on impacts that are directly related to the firm, and
(ii) generally lacks enforceability, resulting in little impact on reporting practices. Similarly,
Vitolla et al. (2018) conduct a systematic literature review and conclude that the <IR>
framework lacks specificity, leading to problems of standardization and homogenization of the
content. Furthermore, a stream of literature raises concerns regarding the divergence of
information for different report types because of a lack of specificity inherent in the <IR>
framework (Beck et al., 2017; Mio and Fasan, 2016). Because of its multi-dimensionality it is
said to make comparison across firms and industries difficult (Eccles et al., 2019). Finally,
integrated reports that follow IIRC’s guidelines have been shown to be low in readability by
common readability measures, primarily because of the complex nature of the language used in
integrated reports (du Toit, 2017; Stone & Lodhia, 2019).
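The passage above refers to common readability measures without naming one. As a minimal sketch, assuming the Flesch Reading Ease score as a representative example (an assumption; the source does not state which metric the cited studies use), readability can be approximated from sentence length and syllables per word. The syllable counter below is a rough vowel-group heuristic, and both sample texts are invented for illustration.

```python
import re

def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Hypothetical example texts, not taken from any report in the sample
plain = "We cut waste. We save energy."
dense = ("Organizational sustainability materiality considerations "
         "necessitate comprehensive stakeholder communication.")

plain_score = flesch_reading_ease(plain)
dense_score = flesch_reading_ease(dense)
```

Under this heuristic, the short, simple sentences score higher (more readable) than the single long sentence built from polysyllabic words, which is the pattern the cited studies associate with integrated reports.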
Taking the characteristics of integrated reporting into consideration, it is initially unclear
to what extent material topics are adequately communicated to users of sustainability reports.
To study the extent to which the thematic content differs by report type, I rely on textual analysis
and an unsupervised topic modeling algorithm known as Latent Dirichlet Allocation (LDA). I review the literature
of similar applications in the next subsection.
2.2. Topic modeling in Accounting Research
Text mining is the large-scale, automated processing of plain text language in digital form to
extract data that is converted into useful quantitative or qualitative information (Das, 2014;
Antweiler and Frank, 2004). In topic modeling, a document is considered a collection of words
containing multiple topics in different proportions (Chae and Park, 2018). Latent Dirichlet
Allocation (LDA) was first introduced by Blei et al. (2001) and is based on the idea that a corpus
of a collection of disclosures can be represented by a set of common topics and the content of
a document described by the weights placed on these topics (Hoberg & Lewis, 2017). The basic
assumption underlying LDA is that each document is generated by drawing content from a
common set of topics, or clusters of words8. LDA has been described as superior to some
alternative methods for several reasons. Because it is an unsupervised learning technique,
researchers do not have to prepare, ex-ante, dictionaries for the analysis (Tirunillai & Tellis,
2014). In addition, it is not necessary to know in advance what the topics will look like. By tuning
the LDA parameters to fit different dataset shapes, one can explore topic formation and
resulting document clusters in an exploratory fashion (Nguyen, 2014). Finally, topic models are
particularly useful when corpora are large and when there are many documents to be classified.
Outcomes of topic models have been evaluated by probability-based metrics such as coherence
and perplexity, or by visual inspection (Hagen, 2018; Israelsen, 2014)9. The most popular method used
for feature extraction is the bag-of-words (word frequency) approach. This technique breaks the
text into word-level units, and treats these units as features, while ignoring the order and co-
occurrence of words (Chen et al., 2017). LDA is predicated on the idea that documents are built
as mixtures of latent subjects, each of which is basically a probability distribution over words.
Textual analysis using bag-of-words, however, faces the risk of becoming overly conditioned by
the usefulness and timeliness of the word dictionary used. In contrast, LDA is suitable to
compare and assess the topics of a sizable collection of European sustainability reports as in the
case of this study. LDA, which is but one application of text mining, has several benefits over
manual coding, including the ability to process many documents that would be inefficient to
code manually and the redundancy of applying dictionaries or interpretation guidelines before the
analysis (DiMaggio et al., 2013). Because of the large component of subjective judgement,
manual coding may not capture the hidden features that LDA may detect. By fitting the
presumptive statistical model to the full textual corpus, the topic model can identify topics and
their probabilistic relationships that may otherwise remain unobserved (Mohr & Bogdanov,
2013). A textual corpus' overall subjects, keywords used to identify each topic, and the
8 The process of LDA can be compared to cluster analysis or principal component analysis (PCA) as applied to quantitative data (Huang, A. H., Lehavy,
R., Zang, A. Y., & Zheng, R. (2018). Analyst Information Discovery and Interpretation Roles: A Topic Modeling Approach. Manag. Sci., 64, 2833-2855).
PCA is similar in that it extracts the important information from the data to express this information as a set of new orthogonal variables called principal
components or vectors that are used as input to the model(s).
9 Perplexity is calculated by taking the log likelihood of unseen text documents given the topics defined by a topic model. A good model will have a high
likelihood and, as a result, low perplexity. Coherence generally can be understood as how well a topic is ‘supported’ by a (reference) text set and combines
the score using topic overlap.
probabilistic relationship between keywords and topics cannot be determined beforehand
(Huang et al., 2018). Alternatively put, a fitted LDA model recovers the set of subjects that most
accurately capture the empirical distribution of word groupings across the documents (Bellstam
et al., 2021).
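The bag-of-words representation described above can be illustrated with a minimal sketch. The tokenizer (lowercase alphabetic runs) and the two toy sentences are assumptions made for illustration only; they are not the pre-processing used in this study.

```python
from collections import Counter
import re

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token.

    Word order is discarded: only the frequency of each word is kept.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

# Two hypothetical sentences with the same words in a different order
doc_a = "Emissions fell while energy use fell"
doc_b = "Energy use fell while emissions fell"

bow_a = bag_of_words(doc_a)
bow_b = bag_of_words(doc_b)

# Because order is ignored, both documents map to the same feature counts
same_representation = bow_a == bow_b
```

Both sentences yield identical counts even though they make different statements, which shows what is lost and what is retained when documents are reduced to word frequencies before a topic model is fitted.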
Blei et al. (2018) provide an overview of how topic modeling works in general and Latent
Dirichlet Allocation in particular. The latent structure of text refers to two types of information
from the data: document-topic distribution and topic-term distribution. The document-topic
distribution informs as to how each document is composed, in terms of topics. The topic-term
distribution provides different lists of semantically coherent words, where each list of terms
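represents a topic or theme (Chae and Park, 2018). To generate each word, a topic is first drawn
from the document-topic distribution and a term is then drawn from that topic’s term distribution. These two
procedures are repeated word for word to create the document. By fitting this two-step generative model to the observed words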
in the documents until it finds the best collection of variables that define the topic and word
distributions, the LDA algorithm iteratively determines the topic distribution for each document
and the word distribution of each subject (Blei et al., 2001).
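The two-step generative view of a document can be sketched as follows. This is an illustrative simulation under stated assumptions, not the estimation procedure used in this study: the vocabulary, the two topic-term distributions, the Dirichlet parameter `alpha`, and the helper names are all hypothetical, and fitting LDA means inferring these distributions from observed text rather than sampling from known ones.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_term, vocab, alpha, n_words, rng):
    """Generate one document by repeating the two steps for every word."""
    theta = sample_dirichlet(alpha, rng)  # document-topic distribution
    topic_ids = list(range(len(topic_term)))
    words = []
    for _ in range(n_words):
        # Step 1: draw a topic from the document-topic distribution
        z = rng.choices(topic_ids, weights=theta)[0]
        # Step 2: draw a term from that topic's topic-term distribution
        w = rng.choices(vocab, weights=topic_term[z])[0]
        words.append(w)
    return theta, words

# Hypothetical vocabulary and topics (illustrative values only)
vocab = ["emissions", "energy", "employees", "safety"]
topic_term = [
    [0.5, 0.5, 0.0, 0.0],  # an "environmental" topic
    [0.0, 0.0, 0.5, 0.5],  # a "social" topic
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
theta, words = generate_document(topic_term, vocab, alpha=[0.5, 0.5], n_words=8, rng=rng)
```

Here `theta` plays the role of the document-topic distribution and each row of `topic_term` the role of a topic-term distribution, which are the two latent structures discussed above.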
In accounting and finance research, LDA has been used to identify financial
misreporting (Brown et al., 2019), analyze company disclosure (Israelsen, 2014) and investigate
conference calls (Huang et al., 2017). Hoberg and Lewis (2017) use LDA to analyze 10-K
Management’s Discussion and Analysis (MD&A) of firms that are suspected of committing fraud.
Highlighting that managers may under- or over-disclose certain topics, the authors find that
firms committing fraud less frequently discuss topics that link the CEO with participation in
actual firm plans and financial strategies and rather discuss acquisitions, hedging transactions,
derivative instruments, and business opportunities (Hoberg and Lewis, 2017)10. In addition,
fraudulent managers discuss fewer details explaining the sources of the firm’s performance,
while disclosing more information about positive aspects of firm performance11. Huang et al.
10 The authors rely on two distinct approaches to accurately interpret the identified topics. The first is a list of the most frequent key phrases that are
associated with each topic. The second is a representative paragraph, which best represents the content that is typical among firms that use the topic.
11 Israelsen (2014), Tell It Like It Is: Disclosed Risks and Factor Portfolios, SSRN Electronic Journal,
also applies LDA to 10-K disclosures and focuses primarily on the frequency of topics which are disclosed in each individual report. Primarily interested
(2017) use LDA to compare the thematic content of a large sample of analyst reports with
transcripts of conference calls that precede the disclosure of the analysis. To compare the topics
of analyst reports and conference calls, the authors use the topic distribution of each document
and the word distribution of each topic and conduct a Pearson’s chi-square test for the
homogeneity of the distributions between the two text corpora12. Huang et al. (2018) conduct
the LDA for each industry separately, arguing that many topics are industry specific. LDA was
also used in Dyer et al. (2017) to study trends in 10-K disclosure over time. They find that three
of the 150 topics (fair value, internal controls, and risk factor disclosures) required by the
Financial Accounting Standards Board (FASB) and Securities and Exchange Commission (SEC)
explain most of the increase in length of the disclosures (Dyer et al., 2017).
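The chi-square test of homogeneity mentioned above can be sketched for two vectors of per-topic sentence counts. This is a simplified illustration, not the cited authors' code: the function name and the toy counts are hypothetical, and only the test statistic and the degrees of freedom are computed (a p-value would additionally require the chi-square distribution function).

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic for two vectors of per-topic counts.

    Returns the statistic and the degrees of freedom (number of topics
    that occur in at least one corpus, minus one).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both count vectors must cover the same topics")
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    statistic = 0.0
    used_topics = 0
    for obs_a, obs_b in zip(counts_a, counts_b):
        topic_total = obs_a + obs_b
        if topic_total == 0:
            continue  # a topic absent from both corpora carries no information
        used_topics += 1
        # Expected counts if both corpora shared the same topic proportions
        exp_a = total_a * topic_total / grand_total
        exp_b = total_b * topic_total / grand_total
        statistic += (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return statistic, used_topics - 1

# Hypothetical counts: identical topic proportions give a statistic of zero
stat_same, dof_same = chi_square_homogeneity([10, 20, 30], [20, 40, 60])

# Hypothetical counts: completely disjoint topics give a large statistic
stat_diff, dof_diff = chi_square_homogeneity([10, 0], [0, 10])
```

A statistic near zero is consistent with homogeneous topic distributions, while a large value relative to the degrees of freedom suggests that the two corpora emphasize different topics.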
Székely and Vom Brocke (2017) apply topic modeling to 9,514 sustainability reports
published between 1999 and 2015 to identify the most common topics in the reports. By
manually labeling the topics, they identify forty-two topics that reflect sustainability,
environmental, and social topics. Based on these results, Székely and Vom Brocke (2017) then
derive ten specific propositions to guide future research and practitioners. Importantly, Székely
and Vom Brocke (2017) show that topics related to environmental sustainability consist mainly
of emissions and consumption, particularly related to energy. They also highlight that
biodiversity and renewable energy do not appear in their results. In addition to providing a broad
picture of sustainability practices over several industries, they highlight the necessity of balancing
all three dimensions of CSR. My analysis is different from theirs in that I contrast the topic
model’s outcomes across differing report types. The study perhaps closest to this one is by
Goloshchapova et al. (2019) who use LDA for sustainability reports of 15 European countries
and the UK. They use 4,156 sustainability reports by 2,685 unique firms and focus on the
in the association of the frequency of the words with firm characteristics, he finds that disclosing risks associated with borrowing, with the ability to pay
dividends, with legal issues, and with the ability to develop new products or technology have higher total, and idiosyncratic stock-return volatility
(Israelsen, 2014).
12 If the two documents are homogeneous, the proportion of sentences in the topic will be equal, i.e., the observed number of sentences in each topic
will be equal to the expected number of sentences for the two documents. The degree of freedom of the chi-square test between the two documents is
the vector length minus one. Details can be found on
disclosed topics relative to their industries. Goloshchapova et al. (2019) find that industrial firms
emphasize ‘employee safety’, whereas the utility sector concentrates on ‘energy’ and ‘efficient
power’, while consumer products firms focus on ‘food waste’ when reporting on their CSR. The
authors manually label the topics and do not distinguish between different reporting formats.
Here again, my study distinguishes itself from theirs in that I compare the thematic content of
stand-alone and integrated reports. In addition, I use the popular topic modeling library
MALLET which has been shown to outperform alternative methods (McCallum, 2002; Yao et
al., 2009).
3. Methodology
3.1. Data collection and preparation
I obtain both stand-alone and integrated sustainability reports from Corporate Register,
a data provider that specializes in collecting corporate sustainability reports13. The sample period
ranges from 2015 to 2021 and covers all EU member states, the UK and
Switzerland. The dataset contains a total of 5,815 unique reports that are written in English. I
obtain a meta-file that contains additional information on the observations, such as industry, the
country in which the firm is headquartered and whether the firm issues an integrated or
standalone report. Each report in the dataset is one observation14. I provide an overview of the
variables used in the study in Table 1 in the appendix. The data is initially stored by a company
number that Corporate Register provides. After querying the reports by report type, I obtain
two datasets with 3,567 stand-alone reports and 2,248 integrated reports, respectively. In the
dataset, I observe 158 instances where a firm switches from a stand-alone to an integrated report.
I define a switch in report type as a change in the binary variable that I previously assign to each
observation. If a company produces a stand-alone report in one year (type=0) and an integrated
report in the following year (type=1), I recognize this as one instance in which a firm switches
13 I am not paid by, or otherwise affiliated with, Corporate Register.
14 An observation in the dataset is defined as an instance where one unique report exists in a year by a company.
report type. While this approach allows me to identify the unique switch from one year to the
next, it does not allow me to separately analyze a situation in which a company produces several
reports in one year. It can be that a company produces both an integrated and a stand-alone
report in one year, especially when the firm is in the process of switching report type and wishes to
include a ‘bridge year’, i.e. producing both types of reports. Because I am interested in thematic
content around firms that switch from a stand-alone to an integrated report, I initially exclude
firms that produce more than one report that can be clearly attributed to an integrated or
stand-alone report. I convert all the available reports to text files using a function to decrypt files for
which the text cannot be readily extracted. Italian and German firms make up the largest proportion
in the dataset (12% and 11%, respectively), followed by Swedish (10%) and French firms (9%).
Interestingly, firms headquartered in Scandinavian countries, on average, report their
sustainability information primarily using an integrated reporting framework as compared to
Eastern European countries such as Poland. According to Mitchell et al. (1997), regional
variation in reporting format can generally be explained by management practices and strategic
responses to the corporate environment, in particular to the expectations and pressures of
corporate stakeholders. Governments in Western European countries, particularly Anglo-Saxon ones, are
significantly more active in promoting ESG than governments in other EU countries (Steurer
et al., 2012). A breakdown of the frequency of integrated and stand-alone sustainability reports
by country can be found in table 2 in the appendix. In total, there are 1,218 unique firms
represented in the sample.
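The switch definition used above (type=0 in one year, type=1 in the following year) can be sketched as below. The tuple layout, the function name and the sample records are hypothetical; the sketch assumes one report per company and year, i.e. firms with several reports in a year have already been excluded.

```python
def find_switches(observations):
    """Find stand-alone (type=0) to integrated (type=1) switches.

    observations: iterable of (company_id, year, report_type) tuples,
    with at most one report per company and year.
    Returns a list of (company_id, year_before, year_after).
    """
    by_company = {}
    for company_id, year, report_type in observations:
        by_company.setdefault(company_id, {})[year] = report_type

    switches = []
    for company_id, type_by_year in by_company.items():
        for year in sorted(type_by_year):
            # A switch requires type=0 in one year and type=1 in the next.
            if type_by_year[year] == 0 and type_by_year.get(year + 1) == 1:
                switches.append((company_id, year, year + 1))
    return switches


# Hypothetical sample: only company A switches in consecutive years.
sample = [
    ("A", 2016, 0), ("A", 2017, 1), ("A", 2018, 1),
    ("B", 2016, 1), ("B", 2017, 1),
    ("C", 2018, 0), ("C", 2020, 1),  # gap year, so not counted
]
switches = find_switches(sample)
```

Requiring consecutive years is a design choice of this sketch; a gap between the two reports, as for company C, is not treated as a switch.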
3.2. Latent Dirichlet allocation and model selection
In this analysis, I rely on a technique known as Latent Dirichlet allocation
(LDA) to investigate how the topics in sustainability reports differ by report type. LDA has been
described by Blei et al. (2001) and generally calculates topics as a probability distribution over
words given a text corpus and assumes that topics are latent patterns of words in the corpus.
Any document can be categorized as a combination of these topics15. The likelihood that a given
topic appears in any document, and the distribution of the resulting topics for each document
in the corpus, then form the outcome, which is derived probabilistically. Using an
automatic categorization procedure such as LDA is suitable because it is otherwise difficult to
analyze thousands of reports using traditional manual approaches or qualitative coding methods
such as wordlists. In addition, LDA can automatically discover hidden structures in texts and
provide a relatively efficient way to examine large texts. These models have been used in a variety
of applications, including finance and accounting research (Chen et al., 2017)16. By
recognizing each document, or sustainability report, as a collection of words, I am able to map the
content of a variety of themes that managers of both integrated and stand-alone sustainability
reports cover. Each topic (or subject) can then be understood as a distribution across all
observed words in the corpus, and words highly related with the document's dominant topics
can even be traced back to their original reports. The major distinguishing characteristic of topic
models from other machine learning approaches is that they offer an automated process for
classifying the content of texts into several significant topics. Applying LDA to
different report types is appealing because the algorithm can complete the classification with very
little assistance from the researcher. Specifically, the number of topics is the only material input
the researcher needs to specify when running LDA (Hoberg and Lewis, 2017).
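To make the mechanics concrete, the following toy collapsed Gibbs sampler estimates document-topic and topic-word distributions for a handful of hypothetical token lists. It is a sketch for intuition only and not the implementation used in this study; the hyperparameter values (alpha, beta), the number of iterations and the example documents are assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] and topic_word[k][v] are smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]              # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # token totals per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k_old = word_id[w], assignments[d][i]
                # Remove the token's current assignment from all counts.
                n_dk[d][k_old] -= 1
                n_kw[k_old][v] -= 1
                n_k[k_old] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][v] + beta)
                    / (n_k[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k_new
                n_dk[d][k_new] += 1
                n_kw[k_new][v] += 1
                n_k[k_new] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + vocab_size * beta)
         for v in range(vocab_size)]
        for k in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


# Hypothetical mini-corpus with an environmental and a social theme.
toy_docs = [
    ["emissions", "weather", "forest", "climate", "emissions"],
    ["employee", "health", "work", "employee"],
    ["climate", "forest", "emissions"],
    ["health", "work", "employee", "work"],
]
doc_topic, topic_word, vocab = lda_gibbs(toy_docs, num_topics=2)
```

Each row of `doc_topic` is a document's mixture over topics and each row of `topic_word` is a topic's distribution over the vocabulary, which mirrors the two outputs described above.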
For the computation of the topics, I rely on the MAchine Learning for LanguagE
Toolkit (MALLET). MALLET uses an implementation of Gibbs sampling, a statistical technique
meant to quickly construct a sample distribution, to create its topic models (Graham et al., 2012).
An overview of Gibbs sampling can be found in exhibit 1 in the appendix. I use an iterative
approach and build several models in which I adjust the number of topics and select the model
that describes the corpus best. In each iteration, I vary the number of topics using
15 For instance, an LDA analysis of stand-alone reports might uncover two topics: one with the terms ‘emissions’, ‘weather’, ‘forest,’ and the other with the words
‘employee’, ‘health’ and ‘work’. LDA calculates the likelihood that the term ‘climate’ will be related to the term ‘emissions’, ‘weather’, ‘forest’ resulting in a topic
that can be labeled as an environmental topic.
16 The method has a long pre-history that includes early Latent Semantic Indexing (LSI) research by Deerwester et al. (1990) and Hofmann's (1999) probabilistic
Latent Semantic Indexing (pLSI) approach (Mohr & Bogdanov, 2013). I provide more information in part two.
MALLET’s built-in function with standard hyperparameters. The challenge in selecting the
optimal number of topics is usually to find the optimal trade-off between topic relevance and
topic specificity while maintaining analytical interpretability (Yao et al., 2009). The topics in
models with fewer topics are usually wide and blend words from clearly distinct themes into a
single topic. On the other hand, topic models with more topics provide more distinguishable
subjects, but they can also be too narrowly focused. While several evaluation procedures for
topic models have been established to determine the optimal number of topics, I rely on a
coherence measure as proposed by Röder et al. (2015) and used in previous studies (Mimno et
al., 2011)17.
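Coherence-based model selection can be illustrated with the UMass coherence of Mimno et al. (2011), which scores a topic by how often its top words co-occur in documents. This is a simplified stand-in, not the exact measure computed for this study; the candidate models and the mini-corpus below are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: the topic's words ordered from most to least probable.
    documents: list of token lists (the pre-processed corpus).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        # i < j, so top_words[i] is the higher-ranked word of the pair.
        higher, lower = top_words[i], top_words[j]
        denominator = doc_freq(higher)
        if denominator == 0:
            raise ValueError(f"word {higher!r} does not occur in the corpus")
        score += math.log((doc_freq(higher, lower) + 1) / denominator)
    return score


def mean_coherence(topics, documents):
    """Average coherence over all topics of one candidate model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


def select_num_topics(candidates, documents):
    """candidates maps a number of topics to that model's top-word lists."""
    return max(candidates, key=lambda k: mean_coherence(candidates[k], documents))


# Hypothetical corpus and two hypothetical candidate models.
corpus = [
    ["carbon", "emission", "target"],
    ["carbon", "emission"],
    ["employee", "health"],
    ["employee", "health", "work"],
]
candidates = {
    1: [["carbon", "health"]],
    2: [["carbon", "emission"], ["employee", "health"]],
}
best_k = select_num_topics(candidates, corpus)
```

Scores closer to zero indicate topics whose top words tend to appear in the same documents, so the candidate with the highest mean score is retained.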
In line with Atkins et al. (2012) and Bao & Datta (2014) I provide an intuitive label for
the topic outcomes by reading the high-probability words in key topics. I provide the label based
on whether a topic can be considered to relate to environmental, social or governance matters.
I also mark topics with ‘no clear assignment’ when I cannot readily identify an ESG topic18. In
general, I can label topics on their E, S, and G dimensions after reading through the terms in the
topics. Although this requires qualitative human judgment, it does not involve extensive
labelling since, very often, the terms ‘social’, ‘governance’ or ‘environmental’ appear within the 15
highest-ranked words of a topic. As is common practice when using topic models, it
is helpful to create a variety of candidate topic models before using a validation procedure to
choose the model that best fits the particular research question (Griffiths et al., 2007). Unlike
Goloshchapova et al. (2019), who translate all non-English words into English, I follow
Huang et al. (2018) and remove non-English reports from my dataset entirely.
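The observation that the terms ‘environmental’, ‘social’ or ‘governance’ often appear among a topic's 15 highest-ranked words suggests a simple first-pass aid for labelling. The sketch below is hypothetical and does not replace reading the topic words manually; the keyword mapping and the example topics are assumptions.

```python
# Hypothetical keyword-to-dimension mapping for a first-pass suggestion.
ESG_KEYWORDS = {"environmental": "E", "social": "S", "governance": "G"}


def suggest_label(top_words, n=15):
    """Suggest an ESG label from the first n top words of a topic.

    Returns 'E', 'S' or 'G' when exactly one dimension is named and
    'no clear assignment' otherwise, so that ambiguous topics are left
    for manual inspection.
    """
    hits = {ESG_KEYWORDS[w] for w in top_words[:n] if w in ESG_KEYWORDS}
    if len(hits) == 1:
        return hits.pop()
    return "no clear assignment"


# Hypothetical topics.
label_env = suggest_label(["climate", "carbon", "environmental", "target"])
label_mixed = suggest_label(["customer", "employee", "environmental", "governance"])
label_none = suggest_label(["building", "construction", "property", "office"])
```

Topics that name several dimensions, or none, are deliberately returned as ‘no clear assignment’ so that the final decision stays with the researcher.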
17 While I primarily use coherence as the measure to gauge the model, I follow previous research and use a visual tool (LDAvis) developed by Sievert and Shirley
(Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics.) to get an impression of the overlap of
the topics. Chang et al. (2009) caution against relying solely on this approach when selecting the optimal topic parameters.
LDAvis therefore serves validation purposes only, and I primarily rely on coherence to select the model.
I also check the top words generated for each topic to make sure they are comprehensible.
18 AlSumait et al. (2009) follow a similar approach and label as ‘junk topics’ those topics that are overly broad themes made up of generic terms or common adjectives
(AlSumait, L., Barbará, D., Gentle, J. E., & Domeniconi, C. (2009). Topic Significance Ranking of LDA Generative Models. ECML/PKDD).
3.3. Pre-processing
The words in the text corpora, which are generally a series of (unicode) letter characters,
are the fundamental units of a text. For the use of LDA, it is necessary to pre-process each
corpus before obtaining results. For the pre-processing I use a commonly used library (spaCy),
which is open-source software for professional text processing (Jugran et al., 2021). The
pre-processing generally includes tokenizing the text, which includes lowercasing the words and
removing punctuation and letter accents. I also lemmatize each word, which is the process of
converting a word to its base form, or lemma, by removing just the inflectional endings (Khyani
& B S, 2021). Words like ‘used,’ ‘using,’ and ‘uses’ are then transformed to ‘use.’ Both integrated
and stand-alone reports may include words that, when used together, carry a meaning distinct
from their individual meanings (Jurafsky & Martin, 2021)19. I therefore allow these words to be treated
as one instance in the resulting topics20. I next remove all other stop words that are
used for connection and grammar but have no or little meaning. Examples are
‘the’, ‘which’, ‘on’ and ‘in’, which contain little informational value. I rely on pre-existing libraries and
follow common practice to remove these words for the analysis (Maier et al., 2018). In addition,
I compute word frequencies for the ten most frequent words separately for integrated and
stand-alone reports and add them to extend the list of potential stop words. For example, words
such as ‘report’ or ‘page’ may apply to all reports in the sample and convey little meaning. I
therefore add words such as ‘report’, ‘page’ or ‘year’ to the list of stop words. An overview
of the high-frequency words can be found in the appendix. Furthermore, I add all company names to
the list of stop words for the same reason as connecting words. As in Goloshchapova et
al. (2019), some common but non-recognizable words, such as ‘ve,’ ‘re,’ etc., appear as words in
the topic results, so I remove them in an iterative process, too. Finally, I specify word attributes
19 An example of a trigram would be ‘chief financial officer’.
20 These groups of 'n' words can then appear as one single instance in the topic output. For implementation I use Gensim’s Phrases model to identify
and build the bi- and trigrams, respectively. The two important arguments to the Phrases model are min_count and threshold. The higher the
values of these parameters, the harder it is for words to be combined to bi-/trigrams. I initially leave these parameters at their standard
that should be allowed as input parameters for the topic model. In particular, I only allow nouns,
adjectives and verbs to be considered as input data. After these steps I create a dictionary with
the remaining words. With the dictionary I am able to compute the topic results.
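The pre-processing steps can be sketched with the Python standard library alone. This is a simplified illustration and not the spaCy and Gensim pipeline used in the study: lemmatization and part-of-speech filtering are omitted, the phrase detection is a plain count threshold rather than a scored phrase model, and the stop-word list, function names and example sentences are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately short stop-word list.
BASE_STOP_WORDS = {"the", "which", "on", "in", "of", "and", "our", "ve", "re"}


def tokenize(text):
    """Lowercase, strip letter accents, drop punctuation, digits and single letters."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.findall(r"[a-z]{2,}", text)


def merge_pairs(docs, stop_words, min_count=2):
    """Join adjacent non-stop-word pairs seen at least min_count times."""
    pair_counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    merged_docs = []
    for doc in docs:
        merged, i = [], 0
        while i < len(doc):
            pair = tuple(doc[i:i + 2])
            if (len(pair) == 2 and pair_counts[pair] >= min_count
                    and not stop_words.intersection(pair)):
                merged.append("_".join(pair))
                i += 2
            else:
                merged.append(doc[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


def preprocess(raw_docs, extra_stop_words=(), min_count=2):
    """Tokenize, build bi- and trigrams, then remove stop words."""
    stop_words = BASE_STOP_WORDS | set(extra_stop_words)
    docs = [tokenize(text) for text in raw_docs]
    docs = merge_pairs(docs, stop_words, min_count)  # bigrams
    docs = merge_pairs(docs, stop_words, min_count)  # second pass yields trigrams
    return [[w for w in doc if w not in stop_words] for doc in docs]


def most_frequent_words(docs, n=10):
    """Candidate corpus-specific stop words: the n most frequent tokens."""
    return [w for w, _ in Counter(w for doc in docs for w in doc).most_common(n)]


raw_reports = [
    "The chief financial officer signed the report.",
    "Our chief financial officer's review of emissions.",
    "Report on emissions in 2021.",
]
processed = preprocess(raw_reports, extra_stop_words={"report"})
frequent = most_frequent_words(processed, n=2)
```

Running the pair merge twice is what allows a trigram such as ‘chief financial officer’ to emerge from a previously merged bigram.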
4. Results
4.1. Results of the descriptive analysis
Table 1 shows both integrated and stand-alone reports based on their framework
adherence. A detailed breakdown of the guideline adherence in the dataset can be found in
Panel A and B of table 1. As can be seen in the table, firms that include their non-financial
information in their financial statement generally adhere to GRI standards. When the company
issues a separate stand-alone report, it always follows GRI standards21. When a firm issues an
integrated report, the firm is more likely to follow IIRC guidelines (36.6%) than when it issues
a stand-alone report (4.5%). This result may come as a surprise since companies following IIRC
standards would not be expected to issue a separate sustainability report. However, on closer
inspection of these reports, a company may explicitly declare that the separate report includes
some aspects of the IIRC. Corporate Register therefore marks them as following IIRC
guidelines accordingly.
Table 1: Integrated and standalone reports by GRI and IIRC adherence
Panel A: GRI adherence
Report type          No adherence     Adherence        Total
Integrated Report    464 (20.6%)      1,784 (79.4%)    2,248 (100%)
Stand-alone Report   0 (0%)           3,567 (100%)     3,567 (100%)
Panel B: IIRC adherence
Report type          No adherence     Adherence        Total
Integrated Report    1,425 (63.4%)    823 (36.6%)      2,248 (100%)
Stand-alone Report   3,405 (95.5%)    162 (4.5%)       3,567 (100%)
21 This result may stem in part from Corporate Register’s past affiliation with the official GRI Register.
The company started its operations by collecting only stand-alone reports that follow GRI guidelines. More
information can be found on: About the GRI Register
Notes: The table provides information on the GRI and IIRC adherence of a company in the entire dataset. The
adherence is defined by a company indicating its compliance in the report and is categorized accordingly by Corporate Register.
Figure 1 shows the proportion of integrated reports in relation to the total number of
reports over time and by industry. As can be seen, the share of companies that issue an integrated
report steadily increases, starting at 36.9% in 2015 and surpassing 50% at the end of 2021 and
the beginning of 2022. Panel B of figure 1 presents both integrated and stand-alone reports that
are issued by companies in their industries. As can be inferred from the graph, the industries
most represented in the sample are industrial firms, followed by
financial companies and consumer goods. Firms in the health care, basic materials,
telecommunications and financial sectors on average disclose their sustainability information
using an integrated framework. More than half of the companies (53%) issue a report that is
assured by an independent third party.
Figure 1: Integrated and standalone reports over time and industry
Panel A Panel B
Notes: Figure 1 shows the absolute number of integrated and stand-alone sustainability reports over time (Panel A). At the time
of the data collection not all companies had issued an integrated (stand-alone) report which explains the drop in year 2022 in panel
A. Panel B shows the number of reports by report type and industry. The largest proportion of industries presented in the sample are
industrial firms, followed by financial companies and consumer goods. Telecommunications and Health Care are represented the
least. The figure is based on 5,815 observations from which 2,248 reports are categorized by Corporate Register as integrated reports
and 3,567 reports are labeled as stand-alone sustainability reports.
A company may issue more than one report in a given year. This can be, for instance, a
firm that in one year produces an integrated report and voluntarily issues a separate
sustainability report. Table 2 indicates, in absolute terms, whether companies produce several
reports. In total, I observe 340 companies (5.85%) that issue two reports and 27 firms which
produce more than two reports in a single year. A company that issues several reports may for
instance issue a separate sustainability report and a GRI content index. While the constellations
may differ in terms of the content, this shows the general heterogeneity of companies in the EU,
which are able to issue an array of different reports. It also happens that a company
issues two similar stand-alone reports that both contain ESG information. One
Italian company in the Oil and Gas industry, for instance, has two separate reports, labeling one
‘sustainability performance’ and the other ‘sustainability report’, with some, but marginal,
differences in the information presented. An overview of companies that produce one, two or
three reports is presented in table 2.
Table 2: Companies issuing more than one report
Notes: The table shows observations at the company level of firms issuing only one, two or three reports in a single year.
It indicates how many reports are issued by a company. For instance, there are 9 instances in which a firm discloses an
integrated report after having issued another integrated report. It is to be understood in absolute terms. In summary, there
are a total of 367 firms that issue more than one report in the same year.
Table three reports textual attributes of integrated and stand-alone reports. In particular, I compute for
each observation the total number of pages as well as the word count and readability scores. Table three
shows that, on average, integrated reports are longer and generally show lower readability scores, as measured
by a higher (lower) Gunning Fog (Flesch Reading Ease) index. This result is generally in line with
previous literature (du Toit, 2017; Stone & Lodhia, 2019). For firms switching the report type from one
year to the next, I report results of the descriptive analysis in Panel B of Table three. While the average
page length does not seem to be very different, the differences in readability indices seem to be larger.
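The two readability indices can be computed from word, sentence and syllable counts using their standard published formulas. The sketch below is an approximation for illustration: the vowel-group syllable counter and the sentence splitter are rough heuristics of my own, and dedicated readability software may count differently.

```python
import re


def count_syllables(word):
    """Rough syllable count: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Higher values indicate text that is easier to read."""
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)


def gunning_fog(n_words, n_sentences, n_complex_words):
    """Higher values indicate text that is harder to read."""
    return 0.4 * ((n_words / n_sentences) + 100 * (n_complex_words / n_words))


def readability(text):
    """Compute both indices for a non-empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = [count_syllables(w) for w in words]
    # Words with three or more syllables count as complex.
    complex_words = sum(1 for s in syllables if s >= 3)
    return {
        "flesch_reading_ease": flesch_reading_ease(
            len(words), len(sentences), sum(syllables)),
        "gunning_fog": gunning_fog(len(words), len(sentences), complex_words),
    }


simple = readability("The cat sat. The dog ran.")
dense = readability(
    "Sustainability disclosures necessitate comprehensive institutional accountability."
)
```

The opposite direction of the two scales explains the "higher (lower)" wording above: a less readable report has a higher Gunning Fog index and a lower Flesch Reading Ease score.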
Table 3: Descriptive analysis of integrated and standalone reports
Panel A
Word count
Report type Total
Panel B
Notes: Panel A in table 3 shows descriptive statistics for all 2,248 (3,567) integrated (standalone) reports in the sample. The Gunning
Fog and Flesch Reading Ease indices are used to measure the readability of the reports before pre-processing. Exhibit 1 provides background on
the readability indices. Panel B shows textual characteristics for all instances when a firm switches report type. The last two columns report the
character counts before pre-processing and after pre-processing, respectively. Word counts are in millions
of words. The null hypothesis in the two-sided t-test is that there are no differences. *** indicates a statistically significant difference at
the 1% level. P-values are in parentheses.
4.2. Results of the topic model
Tables 8 and 9 in the appendix report the outcomes of the topic model of stand-alone and
integrated reports, separately. The topic order has no significance. I start with the interpretation
of the topics for stand-alone reports before discussing the outcomes for integrated reports. For
stand-alone reports the optimal number of topics proves to be 36. I can assign an ESG label to
28 topics (78%) estimated from all 3,567 standalone reports. In general, some topics discuss an ESG theme
more prominently than other topics. After labeling the topics across the ESG dimension I obtain
17 topics that I can link to environmental matters, 7 topics to social and 4 topics that are governance
related. In other words, I obtain a rather large proportion of environmental themes that run
through the reports. Topic two, for instance, is a good example of how firms disclose
information on their environmental matters. The topic includes words such as ‘climate’,
‘sustainable’, ‘carbon’ and ‘target’, which can be clearly linked to a company discussing its
climate-related targets. Some of the environmental topics appear to be disclosed in relation to packaging,
as topic 17 shows. Words such as ‘paper’, ‘recycle’, ‘waste’, ‘renewable’ and ‘plastic’ appear
together with words such as ‘circular’ or ‘sustainable’. Yet other environmental topics are
disclosed in relation to natural resources, as topic 31 illustrates. Words such as ‘food’, ‘water’ and
‘plant’ appear together with ‘biodiversity’, ‘supply’ and ‘sourcing’. In terms of topics that are
related to governance I observe words such as ‘compliance’ and ‘policies’ that appear in
conjunction with gender (topic 36) or employee wellbeing (topic 1). While governance-related
topics may include terms regarding responsibility, they also have a high likelihood of appearing
with words such as ‘performance’, ‘targets’ or ‘results’ (topic 24). This tendency can also be seen,
albeit to a lesser extent, in topics around social matters. In general, topics on societal matters
include words such as ‘community’, ‘human’ and ‘people’ but may also appear in
conjunction with marketing-related words such as ‘brand’, ‘market’ or ‘sale’. Yet other topics
hardly contain any ESG-related terms at all. Examples are topic 9, which is on real estate and has words such as
‘building’, ‘construction’, ‘property’ and ‘office’, and topic 22, which is on infrastructure,
containing words such as ‘networks’, ‘communication’, ‘security’ and ‘platform’. Finally, some
topics may at first be attributed to one of the three ESG categories.
I next turn to the topic outcomes of 2,248 integrated reports and report the results in
table 9. This time, I add for each of the identified topics the dominant file that contributes the
largest percentage to one single topic. For integrated reports the optimal number of topics is 35.
Unlike for stand-alone reports, for which I could clearly provide a label for 28 topics, I am
only able to label 16 topics (46%). After labeling the topics across the three ESG dimensions
I obtain 6 topics for environmental matters, 4 for social and 6 for governance. In general,
I note that the topic outcomes for integrated reports are at first sight more specific. In particular,
the LDA algorithm seems to better identify themes on individual subjects, albeit subjects other
than ESG topics. There are several examples: Topic 14 discusses mobility and includes words
such as ‘car’, ‘rental’ or ‘vehicle’. Topic 28 discusses mining operations and has words such as
‘gold’ and ‘mineral’, but also ‘african’ and ‘reserve’. In terms of topics that I label as
environment-related, I find that they are predominantly constructed of words on energy (topic
27) and resource sourcing (topic 13), but also of performance- and investment-related terms
(topic 11). I find that topics on societal matters are relatively underrepresented and, if they
appear, they make up a rather low proportion of the overall topic. One example is topic 2, which
includes words such as ‘director’, ‘executive’ and ‘chairman’. In general, ESG topics cannot be
identified as readily as for the outcomes of stand-alone reports. The topic which features the
social dimension the most is perhaps topic 12. While the topic includes words such as ‘customer’
and ‘employee’, it also features ‘environmental’ and ‘governance’ and can therefore not
specifically be allocated to social topics only. In summary, topics on social matters seem to be
underrepresented in the results for integrated reports, relative to stand-alone reports.
Table 9 also shows, for each topic, the document that contributes the largest proportion to it.
This allows me to analyze further to which industry and company the report belongs and in
which year it was published. The contribution in percentage can be understood as the proportion
of the words within the report that influence the establishment of the topic. Because any report
in LDA can be described as a combination of topics, this approach is useful in investigating a
report that predominantly features one specific topic. A topic will appear as the main topic in a
document more frequently the higher its contribution. The dominant document for topic 5 stems from an energy company that
focuses on renewable energy and is therefore said to contribute the most to words such as
‘electricity’, ‘power’, ‘renewable’, etc. While tracing back topics to their original documents can
be used to drill down on the determinants of topics, I am more interested in the change of topics
when a firm switches from a stand-alone report to an integrated report.
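Identifying the dominant document of each topic amounts to a column-wise maximum over the document-topic proportions. The sketch below assumes a hypothetical matrix in which each row is one report's topic mixture; the report identifiers and values are invented for illustration.

```python
def dominant_documents(doc_topic, doc_ids):
    """For each topic, return the report with the largest contribution.

    doc_topic: list of rows, one per report, holding topic proportions.
    doc_ids: report identifiers in the same order as the rows.
    Returns {topic_index: (doc_id, contribution)}.
    """
    if len(doc_topic) != len(doc_ids):
        raise ValueError("need one identifier per report")
    num_topics = len(doc_topic[0])
    result = {}
    for k in range(num_topics):
        # Index of the report whose share of topic k is largest.
        best = max(range(len(doc_topic)), key=lambda d: doc_topic[d][k])
        result[k] = (doc_ids[best], doc_topic[best][k])
    return result


# Hypothetical topic mixtures for three reports and three topics.
mixtures = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.20, 0.30, 0.50],
]
dominant = dominant_documents(mixtures, ["report_a", "report_b", "report_c"])
```

The identifier returned for each topic can then be matched against the meta-file to look up the industry, company and publication year of the report.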
4.3. Results of firms switching report type
I next report the results of the reports in which I observe a change in report type. In total I
observe 158 instances in which firms change from a stand-alone to an integrated report. Of the
topic outcomes before a company switches (stand-alone) I obtain 31 topics as the ideal number.
For integrated reports that follow a stand-alone report I compute 21 topics to be the
optimum. I report the results of the topic model before and after the switch in tables 10 and 11
in the appendix.
For stand-alone reports that precede an integrated report in the following year I can
label five topics as social, 10 topics as environmental and 8 topics as governance topics. I am
not able to assign 6 of the topics to an ESG dimension and mark them accordingly. After the
switch I can only label 6 topics as environmental, 3 related to governance and 3 to social. Based
on the words that appear for the remaining 9 topics, I am not able to readily assign an ESG
label. Analogous to the results in tables 8 and 9, some topics can be considered highly distinctive
(topic 22, topic 16). While I am not able to provide a distinct ESG label for some of the topics,
they are more likely to be attributable to an ESG topic prior to a switch. One example is the
topic on corporate social responsibility, topic 13. While the topic includes words such as ‘energy’
or ‘production’, they cannot be directly related to the environmental dimension, primarily because
of other words such as ‘ethic’ or ‘risk’. In table 10 I also report the individual reports with
their largest contribution. Topic number 16, which I label as an environmental topic, contains the
words ‘emission’ and ‘carbon’ but also ‘fuel’ and ‘aircraft’. The dominant report contributes 36.39% to this
topic. Looking into this report reveals that the company issuing it before the switch is
a Swedish company in the airline industry. Similarly, topic number 22, with words such as ‘sea’,
‘farming’, ‘fish’ and ‘salmon’, is dominated by the report of a Norwegian firm operating in the fishing sector. In summary,
it can be noted that the corpus, when constructed of stand-alone reports is more likely to contain
topics across the ESG dimensions, whereas it is less likely to contain topics around social issues
when the firm issues an integrated report.
5. Discussion and Conclusion
Although the NFRD aims to address the lack of materiality that users of sustainability
reports had previously criticized, it does not specify to what extent thematic contents have to
be represented in either one of the two reporting types. Focusing on materiality in the reporting
of non-financial information is considered an important and effective remedy against
the threat of information overload and managers cherry picking information. In some cases,
especially when ESG is understood as a corporate response to societal demands, corporate
sustainability reporting takes on a voluntary nature, creating larger differences in the information
available to users of the disclosure (Steurer et al., 2012). While combining financial with
non-financial information in a single report is said to improve corporate disclosure and
transparency, there is little empirical evidence, particularly from a textual analysis perspective,
supporting this argument. This study uses topic modeling to address this issue and contributes
to the literature by analyzing 2,248 integrated and 3,567 stand-alone reports in terms of their thematic content
and textual attributes. In addition, I identify 158 instances where a company switches from
voluntarily providing a stand-alone sustainability report to issuing an integrated report. I find
that for a set of stand-alone sustainability reports, companies disclose relatively more topics
across all three ESG dimensions than when they decide to issue their non-financial information
with their financial statements (i.e. produce an integrated report). When a company combines
non-financial with financial information, it is possible that the report is tailored to providers of
financial resources and therefore more likely to be read by investors than other stakeholders
(Alessandro et al., 2018). In addition, managers may be unaware or ignorant about a particular
piece of non-financial information (Matsumoto et al., 2011). This may lead companies to decide
not to disclose on specific topics, such as societal challenges.
5.1 Limitations and future research
Despite its advantages, the use of topic modeling is not without limitations. A primary
limitation, shared with other unsupervised learning methods, is that of optimal model selection
(Lewis and Grossetti, 2022). I rely on a single measure (coherence) to determine the optimal
number of topics. While visualizing the topics using LDAvis can help improve model selection,
I use it only for exploratory purposes. The manual label I provide for each topic is not without
limitations either, since this manual procedure is subject to my personal judgement (and error).
In some instances, it is difficult to provide a label based on an ESG dimension. When a topic
contains words such as ‘employee’ and ‘community’ but also ‘water’ and ‘climate’, it is unclear
whether to categorize the topic as an environmental or social topic. In the cases where I have
been unsure, I relied on the first words, which generally carry a higher weight in describing
the topic. This limitation may be overcome by having other researchers label the topics. I also
accept that the topics may not capture all the available information present in the reports. By its
nature, LDA takes only textual information as input and does not allow for tables and
figures to be used as input data. Figures that stand out with colors, however, may ensure that
key non-financial information is communicated in a visually appealing way (Eccles, 2010). Images
and graphs may also influence the perception of end-users and improve the usefulness for
stakeholders relying on this information. In addition, companies may use their corporate
websites to make information easily accessible to a broader audience. Here, too, I am unable to
provide insight into how this information is presented and used by stakeholders. To better
understand the textual differences between report types, future studies should
consider all available disclosure channels when judging the effectiveness of corporate sustainability
information. In addition, finding ways to better interpret topic outcomes by applying word lists
to compare the output seems promising. Baier et al. (2018) offer such a word list, which
can be a starting point for further research.
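To illustrate how a word list could support topic labelling, the following minimal Python sketch scores a topic against three small ESG word lists and gives earlier (higher-ranked) terms more weight, in the spirit of relying on the first words of a topic. The word lists, the function name `label_topic`, the rank weighting 1/(rank + 1) and the tie rule are hypothetical choices made for illustration only; they are neither the word list of Baier et al. (2018) nor the labelling procedure used in this study.

```python
# Hypothetical mini word lists; a real application would rely on a validated
# dictionary rather than these illustrative sets.
ESG_WORDS = {
    "environmental": {"water", "climate", "carbon", "waste", "energy", "emission"},
    "social": {"employee", "community", "health", "training", "diversity", "human"},
    "governance": {"board", "compliance", "audit", "remuneration", "code", "policy"},
}


def label_topic(terms, word_lists=ESG_WORDS):
    """Return the ESG label whose word list best matches the ranked topic terms.

    Terms are assumed to be ordered by decreasing weight within the topic, so
    the term at rank r (0-based) contributes 1 / (r + 1) to its dimension.
    """
    scores = {label: 0.0 for label in word_lists}
    for rank, term in enumerate(terms):
        for label, words in word_lists.items():
            if term in words:
                scores[label] += 1.0 / (rank + 1)
    best = max(scores.values())
    winners = [label for label, s in scores.items() if s == best]
    # No match at all, or an exact tie between dimensions: leave unlabelled.
    if best == 0.0 or len(winners) > 1:
        return "No clear assignment", scores
    return winners[0], scores


# The ambiguous example from the text: social and environmental terms mixed.
mixed_topic = ["employee", "community", "water", "climate"]
label, scores = label_topic(mixed_topic)
print(label, scores)
```

Because the social terms appear first in this example, the sketch assigns the social label. A different weighting scheme could change the outcome, which is why such a rule would complement rather than replace labelling by several researchers.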
5.2 Concluding remarks
Under current EU regulations, firms of a certain size are required to disclose their
sustainability information in their management reports but are generally free to decide how they
do so and which guidelines to follow. Companies can either issue their non-financial information
together with their financial statements or publish a stand-alone sustainability report. Sustainability
reports in the EU can vary on a variety of dimensions, including length and information content
such as the topics discussed. Analyzing differences in sustainability disclosure is important
because they can have direct and indirect effects on stakeholders (Roberts, 1992). The
requirement to disclose material issues related to environmental policies and risk management
is of particular relevance, not least because of its potential to trigger better internal sustainability
awareness, better resource management and greater usefulness for end-users with little prior familiarity
with reading corporate disclosures (European Commission, 2017). Using topic modeling, a generative approach that
goes beyond the conventional boundaries of qualitative analysis, this study
provides insights into topics that, given their large number, might otherwise go undetected. Corporate
sustainability reports cover a wide range of topics, such as the company’s current and future
ESG activities, adherence to regulatory guidelines and their environmental and societal impact.
In this analysis, I show that firms producing an integrated report disclose fewer topics on ESG
matters, especially on social issues. In addition, I show that the effect could potentially
be stronger when a firm switches from a stand-alone report to an integrated reporting format.
Finally, readability generally decreases after the switch, which may leave users of sustainability
reports without the information they need or desire.
References
Alessandro, L., Melloni, G., & Stacchezzini, R. (2018). Integrated reporting and narrative
accountability: the role of preparers. Accounting, Auditing & Accountability
Journal, 31.
AlSumait, L., Barbará, D., Gentle, J. E., & Domeniconi, C. (2009). Topic Significance
Ranking of LDA Generative Models. ECML/PKDD.
Atkins, D., Rubin, T., Steyvers, M., Doeden, M., Baucom, B., & Christensen, A. (2012).
Topic Models: A Novel Method for Modeling Couple and Family Text Data.
Journal of family psychology : JFP : journal of the Division of Family Psychology
of the American Psychological Association (Division 43), 26, 816-827.
Baier, P., Berninger, M., & Kiesel, F. (2018). Environmental, Social and Governance
Reporting in Annual Reports: A Textual Analysis. SSRN Electronic Journal.
Bao, Y., & Datta, A. (2014). Simultaneously Discovering and Quantifying Risk Types
from Textual Risk Disclosures. Management Science, 60, 1371-1391.
Baumüller, J. (2018). Ziele und Inhalte der nichtfinanziellen Berichterstattung. 2, 94.
Baumüller, J., & Schaffhauser-Linzatti, M.-M. (2018a). In search of materiality for
nonfinancial information—reporting requirements of the Directive 2014/95/EU.
NachhaltigkeitsManagementForum | Sustainability Management Forum, 26(1).
Baumüller, J., & Schaffhauser-Linzatti, M.-M. (2018b). In search of materiality for
nonfinancial information—reporting requirements of the Directive 2014/95/EU.
NachhaltigkeitsManagementForum | Sustainability Management Forum, 26.
Bellstam, G., Bhagat, S., & Cookson, J. A. (2021). A Text-Based Analysis of Corporate
Innovation. SPGMI: Compustat Fundamentals (Topic).
Blei, D., Ng, A., & Jordan, M. (2001). Latent Dirichlet Allocation (Vol. 3).
Brown, N., Crowley, R., & Elliott, W. (2019). What Are You Saying? Using topic to
Detect Financial Misreporting. Journal of Accounting Research, 58.
Casella, G., & George, E. I. (1992). Explaining the Gibbs Sampler. The American
Statistician, 46(3), 167-174.
Chae, B., & Park, E. (2018). Corporate Social Responsibility (CSR): A Survey of Topics
and Trends Using Twitter Data and Topic Modeling. Sustainability, 10(7), 2231.
Chen, Y., Rabbani, M., Gupta, A., & Zaki, M. (2017). Comparative text analytics via topic
modeling in banking.
Christensen, H., Hail, L., & Leuz, C. (2019). Adoption of CSR and Sustainability
Reporting Standards: Economic Analysis and Review. SSRN Electronic Journal.
Christensen, H. B., Hail, L., & Leuz, C. (2021). Mandatory CSR and sustainability
reporting: economic analysis and literature review. Review of Accounting Studies,
26(3), 1176-1248.
Commission, E. (2017). Non-financial Reporting Directive. Retrieved from Non-financial
Reporting Directive (
Cuomo, F., Gaia, S., Girardone, C., & Piserà, S. (2022). The effects of the EU non-
financial reporting directive on corporate social responsibility. The European
Journal of Finance, 1-27.
DiMaggio, P., Nag, M., & Blei, D. M. (2013). Exploiting affinities between topic modeling
and the sociological perspective on culture: Application to newspaper coverage of
U.S. government arts funding. Poetics, 41, 570-606.
Dinh, T., Husmann, A., & Melloni, G. (2021). The role of non-financial performance
indicators and integrated reporting in achieving sustainable value creation. In.
European Parliament: European Union, 2021.
Du, S., & Bhattacharya, C. B. (2010). Maximizing Business Returns to Corporate Social
Responsibility (CSR): The Role of CSR Communication. International Journal of
Management Reviews, 12.
du Toit, E. (2017). The readability of integrated reports. Meditari Accountancy Research,
25, 00-00.
Dye, R. A. (1985). Disclosure of Nonproprietary Information. Journal of Accounting
Research, 23(1), 123-145.
Dyer, T., Lang, M., & Stice-Lawrence, L. (2017). The Evolution of 10-K Textual
Disclosure: Evidence from Latent Dirichlet Allocation. Journal of Accounting and
Economics, 64.
EC. (2017). Guidelines on non-financial reporting: Supplement on reporting climate-
related information. In: Official Journal of the European Union.
EC. (2021). Study on the non-financial reporting directive : final report. Publications
Eccles, R., Krzus, M., & Solano, C. (2019). A Comparative Analysis of Integrated
Reporting in Ten Countries. SSRN Electronic Journal.
Eccles, R. G. (2010). One report : integrated reporting for a sustainable strategy. John
Wiley & Sons.
EU. (2014). Directive 2014/95/EU of the European Parliament and of the Council of 22
October 2014 amending Directive 2013/34/EU as regards disclosure of non-
financial and diversity information by certain large undertakings and groups.
Official Journal of the European Union, 57.
European Commission, T. (2017). Directive 2014/95/EU: Impact assessment
accompanying the original proposal from the Commission. In: European
European Union, T. (2013). Directive 2013/34/EU of the European Parliament and of the
Council of 26 June 2013 on the annual financial statements, consolidated financial
statements and related reports of certain types of undertakings. In (Vol. 56). Official
Journal of the European Union: European Union.
European Union, T. (2014). Directive 2014/95/EU of the European Parliament and of the
Council of 22 October 2014 amending Directive 2013/34/EU as regards disclosure
of non-financial and diversity information by certain large undertakings and groups.
Official Journal of the European Union, 57.
Flower, J. (2015). The International Integrated Reporting Council: A story of failure.
Critical Perspectives on Accounting, 27, 1-17.
Garst, J., Maas, K., & Suijs, J. (2022). Materiality Assessment Is an Art, Not a Science:
Selecting ESG Topics for Sustainability Reports. California Management Review.
Goloshchapova, I., Poon, S.-H., Pritchard, M., & Reed, P. (2019). Corporate social
responsibility reports: topic analysis and big data approach. The European Journal
of Finance, 25, 1-18.
Graham, S., Weingart, S., & Milligan, I. (2012). Getting Started with Topic Modeling and
MALLET. The Programming Historian.
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic
representation. Psychological review, 114 2, 211-244.
Hagen, L. (2018). Content analysis of e-petitions with topic modeling: How to train and
evaluate LDA models? Information Processing & Management, 54(6), 1292-1307.
Hoberg, G., & Lewis, C. (2017). Do fraudulent firms produce abnormal disclosure?
Journal of Corporate Finance, 43, 58-85.
Huang, A., Lehavy, R., Zang, A., & Zheng, R. (2017). Analyst Information Discovery and
Interpretation Roles: A Topic Modeling Approach. Management Science, 64.
Huang, A. H., Lehavy, R., Zang, A. Y., & Zheng, R. (2018). Analyst Information
Discovery and Interpretation Roles: A Topic Modeling Approach. Management Science, 64.
Humphrey, C., O’Dwyer, B., & Unerman, J. (2017). Re-theorizing the configuration of
organizational fields: the IIRC and the pursuit of ‘Enlightened’ corporate reporting.
Accounting and Business Research, 47(1), 30-63.
IIRC. (2013). The International <IR> Framework. In: International Integrated Reporting
Israelsen, R. (2014). Tell It Like It Is: Disclosed Risks and Factor Portfolios. SSRN
Electronic Journal.
Jugran, S., Kumar, A., Tyagi, B., & Anand, V. (2021). Extractive Automatic Text
Summarization using SpaCy in Python & NLP.
Jurafsky, D., & Martin, J. H. (2021). N-gram Language Models. In (3 ed.).
Khyani, D., & B S, S. (2021). An Interpretation of Lemmatization and Stemming in
Natural Language Processing. Shanghai Ligong Daxue Xuebao/Journal of
University of Shanghai for Science and Technology, 22, 350-357.
Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., . . . Adam,
S. (2018). Applying LDA Topic Modeling in Communication Research: Toward a
Valid and Reliable Methodology. Communication Methods and Measures, 12, 1-
Maniora, J. (2017). Is Integrated Reporting Really the Superior Mechanism for the
Integration of Ethics into the Core Business Model? An Empirical Analysis.
Journal of Business Ethics, 140.
Matsumoto, D., Pronk, M., & Roelofsen, E. (2011). What Makes Conference Calls Useful?
The Information Content of Managers' Presentations and Analysts' Discussion
Sessions. The Accounting Review, 86(4), 1383-1414.
McCallum, A. K. (2002). MALLET: A Machine Learning for Language Toolkit.
Melloni, G., & Stacchezzini, R. (2014). Corporate Sustainable Development: Is “Integrated
Reporting” a Legitimation Strategy? Business Strategy and the Environment,
accepted for publication on the.
Milla, A., & Haberl-Arkhurst, B. (2018). Wesentlichkeitsanalyse in der nichtfinanziellen
Berichterstattung. In. RWZ.
Mimno, D., Wallach, H. M., Talley, E. M., Leenders, M., & McCallum, A. (2011).
Optimizing Semantic Coherence in Topic Models. Conference on Empirical
Methods in Natural Language Processing.
Mitchell, R. K., Agle, B. R., & Wood, D. J. (1997). Toward a Theory of Stakeholder
Identification and Salience: Defining the Principle of who and What Really Counts.
Academy of Management Review, 22, 853-886.
Mohr, J., & Bogdanov, P. (2013). Introduction-Topic Models: What They Are and Why
They Matter. Poetics, 41, 545–569.
Neumann, B. R., Cauvin, E., & Roberts, M. L. (2012). Management Control Systems
Dilemma: Reconciling Sustainability with Information Overload. In M. J. Epstein
& J. Y. Lee (Eds.), Advances in Management Accounting (Vol. 20, pp. 1-28).
Emerald Group Publishing Limited.
Nguyen, E. (2014). Text Mining and Network Analysis of Digital Libraries in R.
Rupley, K. H., Brown, D., & Marshall, S. (2017). Evolution of corporate reporting: From
stand-alone corporate social responsibility reporting to integrated reporting.
Research in Accounting Regulation, 29(2), 172-176.
Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics.
Simnett, R., & Huggins, A. (2015). Integrated reporting and assurance: Where can research
add value? Sustainability Accounting, Management and Policy Journal, 6, 29-53.
Stawinoga, M. (2017). Die Richtlinie 2014/95/EU und das CSR-Richtlinie-
Umsetzungsgesetz – Eine normative Analyse des Transformationsprozesses sowie
daraus resultierender Implikationen für die Rechnungslegungs- und Prüfungspraxis.
uwf UmweltWirtschaftsForum, 25, 213-227.
Stawinoga, M., & Velte, P. (2017). Empirical evidence of the disclosure and assurance of
Integrated Reporting - A content analysis of the IIRC Examples Database.
Zeitschrift für Umweltpolitik und Umweltrecht (ZfU), 40, 59-84.
Steurer, R., Martinuzzi, A., & Margula, S. (2012). Public Policies on CSR in Europe:
Themes, Instruments, and Regional Differences. Corporate Social Responsibility
and Environmental Management, 19.
Stone, G. W., & Lodhia, S. (2019). Readability of integrated reports: an exploratory global
study. Accounting, Auditing & Accountability Journal, 32(5), 1532-1557.
Székely, N., & Brocke, J. v. (2017). What can we learn from corporate sustainability
reporting? Deriving propositions for research and practice from over 9,500
corporate sustainability reports published between 1999 and 2015 using topic
modelling technique. PLOS ONE, 12.
Tirunillai, S., & Tellis, G. (2014). Mining Marketing Meaning from Online Chatter:
Strategic Brand Analysis of Big Data Using Latent Dirichlet Allocation. Journal of
Marketing Research, 51, 463-479.
Vitolla, F., & Raimo, N. (2018). Adoption of Integrated Reporting: Reasons and Benefits-
A Case Study Analysis. International Journal of Business and Management, Vol.
13, 244-250.
Yao, L., Mimno, D., & McCallum, A. (2009). Efficient methods for topic model inference
on streaming document collections. KDD.
Řehůřek, R., & Sojka, P. (2010). Software Framework for Topic Modelling with Large
Table 1: Variable definition
Variable name
Company number of the data provider that is used
A report number provided by Corporate Register that identifies each report.
The company name of the firm that issues a report
The country in which the reporting entity is headquartered.
The International Securities Identification Number of a company that I use to
match each observation with fundamental information of the issuing firm.
Industry Classification Benchmark (ICB) of the firm provided by Corporate Register
The title of the report
Whether a report is an integrated or a stand-alone report.
The year in which the report was published.
Binary variable equal to one if a firm adheres to the Global Reporting Initiative
(GRI) standard and zero otherwise
Binary variable equal to one if a firm adheres to the international integrated
reporting (IIRC) standard and zero otherwise
Whether the report was audited by an independent third party
The file name of the report used to identify each report.
The file version of the textual file.
The language of the report
Notes: The table provides an overview of the variables that are used in the study, together with a description.
Table 2: Integrated and standalone sustainability reports by country
Notes: The table shows the absolute number of integrated and standalone reports in the sample, grouped by country. For
the countries I use three-character codes defined by the International Organization for Standardization (ISO).
Table 2.2: Summary Statistics – fundamental firm information
Panel A: integrated
T. Assets
Panel B: stand-alone
T. Assets
Notes: The table provides fundamental firm information for the sample. Panel A shows the fundamental annual
information for integrated reports and Panel B shows the information for stand-alone reports. Fundamental annual
information is taken from Compustat Global by matching the firms using their ISINs. Total assets are in millions.
Table 3: Frequently occurring words by report type
Integrated Reports
Standalone Reports
financial 79,931
report 121,380
value 46,752
year 59,615
board 44,263
risk 57,787
statement 43,719
use 57,533
Notes: The table shows the ten most frequently occurring words in the sample, separated for integrated and stand-alone
reports. Computing the most frequently occurring words helps me to extend the list of stop-words and improve the results. As
expected, the most frequent words in integrated reports are largely financial terms such as ‘value’, ‘risk’,
‘statement’ or ‘assets’. In contrast, words such as ‘employee’, ‘sustainability’ and ‘use’ feature most prominently
in stand-alone reports. I compute the high frequency words
Table 4: Integrated and stand-alone sustainability reports by industry
Industry | Integrated | Stand-alone | Externally assured | Total
 | 583 (38%) | 949 (62%) | 800 (52%) |
 | 486 (43%) | 655 (57%) | 589 (52%) |
Consumer Goods | 293 (35%) | 539 (65%) | 464 (56%) |
Basic Materials | 238 (42%) | 328 (38%) | 292 (52%) | 566
Consumer Services | 208 (40%) | 316 (60%) | 271 (51%) |
 | 105 (35%) | 196 (65%) | 164 (54%) |
Oil & Gas | 76 (32%) | 162 (68%) | 136 (57%) |
 | 82 (31%) | 181 (69%) | 138 (52%) |
Health Care | 95 (42%) | 130 (58%) | 107 (48%) |
 | 79 (42%) | 107 (58%) | 135 (73%) | 186
 | 3 (75%) | 4 (25%) | 6 (86%) |
Total | | | 3102 (53%) |
Notes: The table shows the absolute number of integrated and standalone reports by industry. In addition, column
four shows the absolute number of reports that are verified by an external assurer. A visual representation can be
found in the accompanying appendix 1.
Exhibit 1: Latent Dirichlet, Gibbs Sampling and MALLET
Latent Dirichlet Allocation (LDA) is a topic modeling technique and, unlike Latent Semantic Analysis, a fully
generative model in which documents are assumed to have been generated according to a per-document
topic distribution (with a Dirichlet prior) and a per-topic word distribution (Řehůřek & Sojka, 2010).
Instead of using these distributions to generate random documents, the objective is to infer the
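distributions from the observed documents. A document is a sequence of N words denoted by
$\mathbf{w} = (w_1, w_2, \ldots, w_N)$, where $w_n$ is the nth word in the sequence. A corpus is a collection of M documents
denoted by $D = (\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_M)$. LDA is generally defined as follows:
\[
P(W, Z, \theta, \varphi; \alpha, \beta) = \prod_{i=1}^{K} P(\varphi_i; \beta) \prod_{j=1}^{M} P(\theta_j; \alpha) \prod_{t=1}^{N} P(Z_{j,t} \mid \theta_j) \, P(W_{j,t} \mid \varphi_{Z_{j,t}})
\]
where $\alpha$ and $\beta$ define Dirichlet distributions and $\theta$ and $\varphi$ define multinomial distributions. $Z$ is the
vector with the topics of all words in all documents, $W$ is the vector of all words in all documents, and $K$ is the
number of topics. The left-hand side represents the total probability of the LDA
model. The first product, over $P(\varphi_i; \beta)$, denotes the Dirichlet distribution of topics over terms; the second
product, over $P(\theta_j; \alpha)$, the Dirichlet distribution of documents over topics. The third product denotes the probability of a topic
appearing in a given document, $P(Z_{j,t} \mid \theta_j)$, and the probability of a word appearing given a topic,
$P(W_{j,t} \mid \varphi_{Z_{j,t}})$.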
The Gibbs sampler, as used in the MALLET implementation, is a technique for generating random
variables from a (marginal) distribution indirectly, without having to calculate the density (Casella &
George, 1992). In essence, Gibbs sampling is used to avoid difficult computations by replacing them
with a series of simpler calculations. When a Dirichlet prior is integrated out, all categorical variables
that depend on it become mutually dependent, and their joint distribution is a Dirichlet-
multinomial distribution. In this distribution, the conditional distribution of a particular categorical
variable, conditioned on the others, takes a very simple form, which makes Gibbs sampling much
simpler than it would be otherwise.
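The simple conditional form mentioned above can be made concrete with a minimal collapsed Gibbs sampler for LDA in plain Python. This is a sketch for illustration only and not the MALLET implementation, which is Java-based and considerably more optimized. The function name `gibbs_lda`, the toy corpus, and the hyperparameter values `alpha = 0.1` and `beta = 0.01` are assumptions. For each token, the sampler removes the token from the counts, computes the probability of each topic as (document-topic count + alpha) × (topic-word count + beta) / (topic total + V × beta), and draws a new topic from that distribution.

```python
import random


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenized documents.

    Returns (doc_topic, topic_word, vocab): the count matrices after the
    final sweep and the sorted vocabulary that indexes topic_word columns.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]       # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens of word v in topic k
    topic_total = [0] * num_topics                     # all tokens in topic k
    assignments = []                                   # topic of every token

    # Random initialisation of the topic assignment of each token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k_old = assignments[d][n]
                # Remove the current token from all counts.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][v] -= 1
                topic_total[k_old] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][v] += 1
                topic_total[k_new] += 1
    return doc_topic, topic_word, vocab


# Toy corpus (hypothetical) with an environmental and a governance theme.
docs = [
    ["water", "climate", "carbon", "water", "waste"],
    ["board", "audit", "compliance", "board", "policy"],
    ["climate", "carbon", "waste", "water"],
    ["audit", "policy", "compliance", "board"],
]
doc_topic, topic_word, vocab = gibbs_lda(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(zip(counts, vocab), reverse=True)[:3]
    print(k, [w for _, w in top])
```

Each update needs only a handful of counts, which is the practical benefit of integrating out the Dirichlet priors: no density has to be evaluated explicitly.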
MALLET is based on the idea of document streaming, that is, the processing of corpora document
after document in a memory-independent fashion (Řehůřek & Sojka, 2010). In addition to the output
of the topic model, the Java-based framework offers diagnostic data, such as the topic-specific word
distribution in relation to the corpus distribution (AlSumait et al., 2009). These quantitative traits
supply extra information that helps trace topics back to their documents and are used in subsequent
analyses22. In other words, from the output it can be inferred which document contributed the largest
proportion to a specific topic. This is also helpful for inspecting reports that have a disproportionately
large influence on a topic and for labelling topics that would otherwise be ambiguous.
Exhibit 1 in the appendix provides an overview of the technical definition of how topics
are computed.
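Tracing a topic back to its documents can be sketched in a few lines of Python. The example assumes that a document-topic matrix is already available, for instance the number of tokens each report assigns to each topic. The matrix, the report names and the function `top_documents_per_topic` are hypothetical and do not reproduce MALLET's actual output format.

```python
def top_documents_per_topic(doc_topic, doc_names, top_n=1):
    """For every topic, rank documents by the share of the topic they contribute."""
    num_topics = len(doc_topic[0])
    result = {}
    for k in range(num_topics):
        total = sum(row[k] for row in doc_topic)
        shares = [
            (row[k] / total if total else 0.0, name)
            for row, name in zip(doc_topic, doc_names)
        ]
        # Largest share first; keep only the top_n contributors.
        result[k] = sorted(shares, reverse=True)[:top_n]
    return result


# Hypothetical token counts: rows are reports, columns are topics.
doc_topic = [[40, 10], [5, 45], [5, 5]]
doc_names = ["report_a", "report_b", "report_c"]
result = top_documents_per_topic(doc_topic, doc_names)
print(result)
```

A report whose share of a topic is far larger than that of any other report is a natural candidate for manual inspection before the topic is labelled.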
Exhibit 2: Readability indices
Gunning Fog index: A standard readability test to determine how easily a document can be read for its
target audience is the Gunning Fog Index. The Gunning fog index measures the readability of English
writing. The index estimates the years of formal education needed to understand the text on a first
reading. A fog index of 12 requires the reading level of a U.S. high school senior (around 18 years old).
The Gunning Fog index is defined as follows:
\[
\text{Gunning Fog} = 0.4 \left[ \frac{\text{words}}{\text{sentences}} + 100 \cdot \frac{\text{complex words}}{\text{words}} \right]
\]
where complex words are those that have three or more syllables but are not compound words.
Compound words are made up of two or more individual words. For instance, the term
"multinational" combines the words "multi" and "national".
Flesch Reading Ease: The Flesch Reading Ease gives a text a score between 1 and 100, with 100 being
the highest readability score. The Flesch readability tests work by considering sentence and word
counts. The mathematical formula underlying the test is as follows:
\[
\text{Flesch Reading Ease} = 206.835 - 1.015 \cdot \frac{\text{total words}}{\text{total sentences}} - 84.6 \cdot \frac{\text{total syllables}}{\text{total words}}
\]
22 I use this tracing back of topics primarily for exploratory purposes.
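As a worked illustration of the two readability indices defined in Exhibit 2, the following Python sketch computes both scores for a piece of text. The syllable counter is a rough vowel-group heuristic and the sketch does not exclude compound words from the complex-word count, so the numbers are approximations. The function names and example sentences are hypothetical and are not the tooling used in this study.

```python
import re


def count_syllables(word):
    """Rough syllable count: number of vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return (Gunning Fog index, Flesch Reading Ease) for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = [count_syllables(w) for w in words]
    # Simplification: every word with three or more syllables counts as
    # complex; compound words are not filtered out.
    complex_words = sum(1 for s in syllables if s >= 3)
    words_per_sentence = len(words) / len(sentences)
    fog = 0.4 * (words_per_sentence + 100 * complex_words / len(words))
    flesch = (
        206.835
        - 1.015 * words_per_sentence
        - 84.6 * sum(syllables) / len(words)
    )
    return fog, flesch


simple = "The cat sat on the mat. The dog ran to the park."
dense = "Sustainability disclosures communicate environmental responsibilities."
fog_simple, flesch_simple = readability(simple)
fog_dense, flesch_dense = readability(dense)
print(fog_simple, flesch_simple)
print(fog_dense, flesch_dense)
```

The dense sentence receives a higher Fog index and a lower Flesch score than the simple one. For very short snippets such as these, the scores can fall outside the nominal range of the indices.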
Figure 2: Inter-topic Distance Map
Panel A
Panel B
Notes: The figure shows a visual representation of the topic overlap used to identify the optimal number of topics (Panel
A). The circles represent the marginal topic distribution. Panel B shows the 30 most relevant terms for a topic and
includes a representation of the overall term frequency and the estimated term frequency within the selected topic.
Figure 3: Coherence scores for stand-alone and integrated reports
Panel A
Panel B
Notes: The figure shows coherence scores for integrated (Panel A) and stand-alone reports (Panel B) for 85 LDA models
with a varying number of topics. The optimal number of topics identified for stand-alone reports is 39 and for integrated
reports 54. To better gauge the relative degree of coherence across the topic results, I add a marker at a coherence score of
0.42 for comparison.
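Choosing the number of topics by coherence can be sketched as follows. The coherence measure behind the figure is not specified here, so the sketch uses the UMass measure of Mimno et al. (2011) as an assumption, because it can be computed from document co-occurrence counts alone; its scores are on a different scale from those shown in the figure. The toy corpus, the candidate models and the function names are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, docs):
    """UMass coherence of one topic, given its top words in ranked order.

    Assumes every top word occurs in at least one document of the corpus.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        # i < j, so top_words[i] is the higher-ranked word of the pair.
        w_i, w_j = top_words[i], top_words[j]
        score += math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_i))
    return score


def best_num_topics(models, docs):
    """models maps a number of topics to its topics (ranked word lists)."""
    mean_coherence = {
        k: sum(umass_coherence(t, docs) for t in topics) / len(topics)
        for k, topics in models.items()
    }
    return max(mean_coherence, key=mean_coherence.get), mean_coherence


# Hypothetical corpus and two candidate models with 2 and 3 topics.
docs = [
    ["water", "carbon", "waste"],
    ["water", "carbon"],
    ["board", "audit"],
    ["board", "audit", "policy"],
]
models = {
    2: [["water", "carbon"], ["board", "audit"]],
    3: [["water", "board"], ["carbon", "audit"], ["waste", "policy"]],
}
best_k, mean_scores = best_num_topics(models, docs)
print(best_k, mean_scores)
```

The model whose topics group words that actually co-occur in documents obtains the higher mean coherence and is selected. With a real corpus, the same loop would run over all candidate numbers of topics.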
Figure 4: Coherence scores for switching firms
Panel A
Panel B
Notes: The figure shows coherence scores for switching firms pre-switch (Panel A) and post-switch (Panel B) for 100 different LDA
models and 158 instances. A pre-switch firm generally issues a stand-alone report, while a post-switch report is recognized as an integrated
report. To better gauge the relative degree of coherence across the topic results, I add a marker at a coherence score of 0.42 for comparison.
Table 8: Topic outcomes for 3,567 stand-alone reports
Topic Terms per Topic ESG label
policy, compliance, process, ensure, conduct, standard, assessment, training, health, human, code, approach, performance, provide,
operation, review, key, site, local, supply, stakeholder, level, engagement, practice, base, requirement, support, identify, set, team, chain,
internal, principle, anti, water, issue, development, responsible, regulation
people, support, climate, target, sustainable, carbon, strategy, change, make, continue, community, ensure, focus, provide, approach,
deliver, create, develop, future, progress, reduce, action, opportunity, build, goal, lead, performance, issue, increase, commitment,
challenge, engage, achieve, team, enable, reduction, set, positive, area
conduct, responsibility, operation, service, reduce, development, good, target, corporate, number, base, environment, code, increase,
country, solution, consumption, area, principle, goal, focus, important, sustainable, develop, make, percent, responsible, personnel, sale,
unit, improve, offer, financial, part, efficiency, cent, chain, term, aim
approach, topic, annual, governance, disclosure, index, standard, information, financial, performance, statement, stakeholder, social,
content, corporate, human, explanation, economic, compliance, evaluation, reporting, key, component, relate, assurance, section, esg,
assessment, climate, principle, responsible, sustainable, policy, board, number, strategy, engagement, operation, website
No clear assignment
key, environment, figure, people, waste, meet, market, social, close, chain, number, external, performance, culture, raw, overview, integrity,
mercede, sector, training, corporate, introduction, indicator, generate, project, stakeholder, topic, benz, online, consumption,
responsibility, community, basis, daimler, sale, methodology, document
vehicle, road, mobility, plant, production, car, market, fuel, development, area, design, project, process, target, system, model, engine,
develop, training, drive, brand, improve, stakeholder, service, approach, worldwide, reduce, component, waste, electric, traffic, accident,
provide, base, local, infrastructure, construction, level, solution
assure, applicable, corporate, aspect, compliance, responsibility, ist, shop, basis, mit, metsoo, ber, hat, nachhaltigkeit, rahmen, oberbank,
dabei, governance, thema, download, nachhaltige, durch, fileadmin, indikatoren, man, checkout, sich
No clear assignment
patient, health, quality, pharmaceutical, healthcare, site, country, care, number, medical, disease, medicine, treatment, waste, water,
research, clinical, responsibility, activity, production, study, base, environment, drug, improve, service, supply, corporate, pharmacy,
develop, animal, relevant, sector, plant, people, information, human, home, science
building, construction, consumption, portfolio, property, estate, office, tenant, project, real, number, area, development, home, colonial,
build, approach, term, service, unit, water, housing, sustainable, performance, residential, stakeholder, social, heating, high, measure, long,
site, space, key, board, client, intensity, electricity, increase
No clear assignment
program, site, organization, labor, water, environment, corporate, reduce, waste, performance, local, development, initiative, end, chain,
manufacturing, ton, supply, process, improve, design, social, people, recognize, health, quality, responsibility, increase, consumption,
solution, standard, center, innovation, rate, efficiency, high, sustainable, assessment, number
centre, waste, asset, water, consumption, portfolio, shopping, performance, carbon, electricity, retail, tonne, scope, corporate, positive,
target, number, community, intensity, tenant, development, table, area, fuel, local, office, continue, reduction, annual, section, park,
engagement, change, manage, net, landlord, project, place, base
flight, fuel, aircraft, airport, service, clh, airline, passenger, engine, section, security, operation, aviation, number, fleet, system, quality,
social, ground, project, traffic, noise, term, maintenance, environment, diversity, route, base, efficiency, increase, plan, long, mtu,
stakeholder, cargo, human, goal, lufthansa, government
oil, operation, performance, water, operate, local, production, carbon, environment, appendix, tonne, contractor, community, offshore,
activity, project, manage, health, asset, operational, spill, low, produce, industry, reduce, human, ship, programme, potential, vessel,
number, area, climate, site, sea, petroleum, process, provide, fuel
exhibition, event, site, service, waste, security, space, soltec, procedure, avio, exhibitor, activity, visitor, number, water, system, relate,
information, manage, sector, plan, provide, fiera, female, organise, stand, area, control, generate, male, development, order, follow, design,
client, corporate, mobility, lead, health
disclosure, aspect, significant, standard, number, principle, organization, approach, indicator, percentage, operation, note, social,
governance, human, high, economic, activity, body, category, relate, performance, specific, policy, manage, community, type, service,
practice, gender, country, concern, grievance, stakeholder, mechanism, information, water, criterion, local
tower, totale, informazioni, ruo, persone, responsabilit, genere, due, modello, servizi, rapporto, dal, allo, sociale, rete, sistema, essere, era,
propri, possono, governance, presente, tale, time, turnover, relativi, sede, diverse, agam, standard, questi, salute, processo, aree, clienti,
fonti, procedure, contratto, doof
No clear assignment
packaging, sustainable, chain, paper, water, circular, recycle, make, production, waste, store, policy, supply, change, renewable, fair,
standard, strategy, brand, plastic, factory, raw, social, good, goal, design, base, industry, source, equal, lead, approach, process, operation,
performance, textile, fibre, forest
people, store, brand, hear, quality, care, design, hearing, make, pattern, production, market, offer, experience, consumption, initiative,
social, high, consumer, approach, relate, support, promote, order, sale, create, country, good, talent, world, stakeholder, fashion, base,
manufacturer, professional, activity, retail, aspect, start
fish, production, forest, feed, salmon, focus, important, water, reduce, good, produce, high, increase, facility, area, project, result,
development, health, make, raw, industry, farm, quality, food, large, operation, environment, develop, farming, day, consumer, ensure,
effort, part, number, target, pulp, process
registration, document, home, service, communication, csr, connect, externally, pour, technicoloro, site, information, assure, odd,
compact, entertainment, travail, division, universal, box, annexe, performance, directive, mission, rse, rapport, vigilance, film, party,
condition, durable, plan, talent, politique, ment, challenge
No clear assignment
community, water, local, operation, development, mining, project, mine, health, stakeholder, programme, environment, plan, site, social,
number, area, contractor, level, performance, training, continue, provide, government, engagement, economic, biodiversity, workforce,
increase, incident, waste, process, sustainable, relate, standard, manage, closure, change, operate
service, digital, security, network, solution, information, base, offer, market, internet, mobile, innovation, communication, experience,
platform, privacy, development, number, people, performance, user, team, key, provide, stakeholder, training, consumption, office,
protection, program, share, skill, organization, create, project, infrastructure, time, device
No clear assignment
plant, water, electricity, power, production, responsibility, waste, renewable, project, process, indicator, social, fuel, relate, economic, base,
target, governance, generation, supply, country, development, performance, source, sustainable, unit, efficiency, produce, change,
industrial, climate, local, natural, area, site, generate, level, system
compliance, measure, board, corporate, process, protection, standard, training, system, responsibility, requirement, part, information,
development, topic, time, financial, addition, location, make, figure, basis, strategy, activity, consumption, offer, area, key, social, ensure,
conduct, base, important, relevant, order, support, target, result
production, site, chemical, raw, process, water, groupos, section, development, performance, base, sustainable, waste, reduce, innovation,
number, financial, environment, substance, industrial, market, plant, increase, progress, result, high, relate, sale, develop, application,
information, health, measure, communication, internal, target, area, social, document
corporate, social, responsibility, policy, commitment, information, action, plan, director, activity, compliance, training, csr, chapter, area,
development, project, responsible, model, environment, process, main, measure, good, professional, ethic, committee, promote, service,
quality, make, communication, internal, governance, health, financial, remuneration, woman, implement
appendix, supply, people, network, chain, service, country, mobile, programme, sustainable, market, health, number, woman, introduction,
information, operate, provide, skill, local, human, principle, integrity, practice, base, job, government, communication, privacy, policy,
digital, conduct, reduce, high, time, law, site, enable, code
service, project, activity, area, system, carry, involve, aim, economic, plan, initiative, regard, make, order, result, increase, internal, level,
term, event, issue, objective, training, meeting, network, structure, follow, provide, board, local, process, aspect, social, communication,
time, sector, offer, specific, director
investment, financial, esg, client, service, asset, policy, banking, sustainable, responsible, fund, market, principle, corporate, credit,
portfolio, information, support, climate, social, base, insurance, provide, term, loan, change, relate, capital, offer, sector, tax, strategy,
board, private, number, governance, framework, activity, annual
No clear assignment
production, water, plant, site, solution, raw, reduce, high, target, consumption, process, waste, steel, performance, increase, building,
accident, machine, result, cement, specific, local, quality, training, level, unit, reduction, market, health, measure, recycle, number, concrete,
indicator, efficiency, pipe, stakeholder, time, ton
food, water, source, waste, sustainable, consumer, oil, palm, supply, programme, production, crop, ingredient, plant, chain, raw, packaging,
reduce, area, good, farmer, natural, operation, development, tonne, make, practice, healthy, site, agricultural, nutrition, biodiversity, local,
agriculture, produce, health, commitment, animal, sourcing
plan, tobacco, farmer, progress, store, aim, assurance, sustainable, achieve, programme, standard, cocoa, continue, bat, food, supply,
social, consumer, performance, source, reduction, chain, commitment, harm, community, provide, support, governance, stakeholder,
develop, operation, health, maison, child, review, human, independent, policy
No clear assignment
medium, sport, content, paper, woman, man, digital, information, responsibility, gaming, book, scope, advertising, protection, sale, game,
social, figure, corporate, online, editorial, responsible, gambling, magazine, manager, brand, event, female, bet, prisa, male, user, relate,
radio, print, office, publish
No clear assignment
corporate, responsibility, people, community, support, programme, social, provide, make, continue, environment, initiative, education,
performance, project, good, responsible, annual, skill, school, child, staff, training, opportunity, governance, young, award, develop, team,
world, issue, local, day, student, office, ensure, charity, create, development
production, commitment, facility, client, quality, chain, passion, supply, make, reduce, waste, food, packaging, tomato, process, market,
fruit, number, base, creval, water, produce, issue, line, ensure, raw, economic, aspect, control, time, source, complaint, pulse, system,
worker, order, term
financial, activity, relate, approach, training, process, compliance, policy, woman, man, decree, statement, information, topic, adopt,
model, corruption, control, procedure, specific, reference, regulation, base, standard, system, hour, main, code, regard, legislative, worker,
director, health, consumption, issue, aim, order, personnel
Notes: The table shows the topic outcomes for 3,567 stand-alone reports for the sample period 2015 to 2022. In the last
column I manually label each topic with respect to its environmental, social or governance affiliation. I can link 17 topics
to ‘Environmental’, 7 topics to ‘Social’ and 3 topics to ‘Governance’. For 18 topics I am not able to readily assign an ESG
label. In general, if the first 20 words contain one of the three ESG dimensions, I label the topic accordingly. For all
other topics, which do not contain one of the three keywords, I look for words close to the three pillars and assign the label accordingly.
The order of the topics has no significance.
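The first step of the labelling rule in the notes can be illustrated with a minimal sketch. It is not the procedure actually used for the tables, which was manual. The keyword lists (including the variant "environment"), the function name `label_topic`, and the tie-break of taking the keyword that appears earliest are all assumptions made for illustration. Topics without a keyword return `None` and would still require the manual judgement described above.

```python
# Hypothetical keyword lists: the notes only speak of "the three keywords",
# so the exact word forms below are an assumption.
ESG_KEYWORDS = {
    "Environmental": ("environment", "environmental"),
    "Social": ("social",),
    "Governance": ("governance",),
}


def label_topic(terms, window=20):
    """Return the ESG label of the first keyword found among the first
    `window` topic terms, or None if no keyword occurs there."""
    for term in terms[:window]:
        for label, keywords in ESG_KEYWORDS.items():
            if term in keywords:
                return label
    # No keyword in the window: left to manual assignment.
    return None


governance_like = ["director", "remuneration", "corporate", "annual", "governance"]
unclear = ["fish", "salmon", "feed", "farm"]
print(label_topic(governance_like))
print(label_topic(unclear))
```

Because the earliest keyword wins, a topic that lists both "environment" and "social" within its first 20 terms is labelled by whichever appears first; a different tie-break would be equally defensible.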
Table 9: Topic outcomes for 2,248 integrated reports
Topic  Dominant file  Topic Terms  ESG label
1 96270-17In-
solution, business, growth, employee, service, customer, digital, integrate, offer, strategy,
market, support, investment, innovation, create, make, network, term, develop,
development, transformation, work, model, long, project, director, sector, country, revenue,
major, team, shareholder, performance, csr, expertise, provide, design, key
No clear ESG assignment
2 74451-15In-
16379220O2994568122 0X-
director, document, registration, information, meeting, compensation, executive, product,
officer, issue, committee, employee, plan, sale, corporate, capital, general, number, chief,
control, chairman, security, hold, set, grant, article, annual, agreement, social, resolution, net,
performance, account, term, site, concern, reference, note
3 162933-22In-
35682327W34329983 1T-
director, remuneration, corporate, annual, governance, information, independent, strategic,
line, customer, structure, verification, identity, social, consolidate, committee, responsible,
policy, member, include, banking, shareholder, general, impact, section, plan, business,
service, activity, cmr, meeting, environmental, promote, consolidated, commitment,
principle, relate, executive, hold
4 79849-15In-
water, customer, performance, service, annual, regulatory, work, supply, measure, include,
business, continue, deliver, plan, provide, cost, charge, expenditure, level, improve, tax,
support, reduce, network, end, area, treatment, programme, household, utility, underlie,
impact, sewer, approach, debt, target, scheme, environment
5 163938-22In-
energy, electricity, power, project, market, plant, renewable, service, offshore, generation,
annual, cost, price, supply, recognise, percent, increase, customer, capacity, turbine,
distribution, revenue, activity, programme, grid, expect, development, joint, construction,
network, net, production, continue, tax, contract, change, investment, solution
No clear ESG assignment
6 103588-18In-
bam, performance, project, construction, executive, cent, amount, result, recognise,
information, supervisory, property, include, contract, cash, joint, tax, rate, interest,
development, member, cost, integrate, benefit, relate, base, remuneration, ppp, creation, part,
safety, plan, material, venture, governance, liability, current, net, general
7 103718-18In-
24166294F195155788 8H-
store, ahold, continue, food, sale, performance, brand, annual, business, review, income,
product, sustainable, sustainability, net, plan, governance, customer, local, associate, include,
operation, member, retail, retailing, world, operate, information, lease, support, merger,
shareholder, number, underlie, base, common, program, relate, cash
8 108083-18In-
insurance, life, investment, policy, business, interest, product, liability, contract, market,
equity, solvency, customer, performance, pension, include, property, benefit, fair, loss, fund,
executive, premium, portfolio, rate, claim, result, level, capital, employee, real, change,
operating, recognise, profit, disability, relate, service, segment
No clear ESG assignment
9 70894-15In-
20559260D134500096 80W-
loan, customer, capital, credit, interest, income, amount, loss, cent, operation, market,
branch, instrument, liability, security, net, fund, equity, annual, change, term, profit, tier,
balance, cost, corporate, institution, business, requirement, pension, exposure, local, note,
sheet, employee, rate, service, provision
No clear ESG assignment
10 162805-22In-
35003075P149103331 20A-
supervisory, annual, service, tax, customer, remuneration, governance, revenue, information,
employee, result, audit, include, impact, policy, business, adjust, solution, cash, base,
performance, operation, sale, note, relate, cost, content, compliance, social, dutch, target,
lease, corporate, table, part, recognize, product, position, shareholder
11 142651-21In-
36518656W24055238 130L-
note, chapter, create, climate, liability, insurance, content, investment, income, relate, letter,
measure, commitment, result, claim, employee, operation, forsikre, figure, base, chair,
customer, add, ceo, performance, loss, key, consolidate, expense, capital, engage, alternative,
responsible, pension, provision, account, equity, pass, society
12 92599-17In-
24538735T2667962388 T-
customer, business, corporate, service, social, economic, environmental, gri, governance,
activity, plan, director, model, challenge, euro, commitment, dimension, popular, main,
banking, policy, channel, employee, information, action, performance, product, integrate,
result, compliance, capital, area, training, initiative, supplier, programme, good, quality, sector
13 73447-15In-
18802432W22293661 698L-
production, oil, activity, price, operation, project, reserve, exploration, development, net,
result, natural, due, sale, supply, cash, field, operating, business, annual, profit, market, plan,
expenditure, increase, capital, cost, corporate, energy, flow, continue, term, include,
approximately, segment, start, low, operate
31346168T2299658620 0S-
car, mobility, solution, recognise, vehicle, rental, csr, service, passenger, workshop, bed,
repair, brand, good, lease, fleet, body, miljoner, auto, light, commercial, dealer, man,
discontinue, branch, ret, kapital, anst
No clear ESG assignment
15 71717-15In-
20295911R28441384 426F-
performance, product, annual, production, sale, steel, plant, growth, crop, business, review,
market, site, program, good, water, stock, continue, increase, target, solution, raw, cement,
work, local, development, employee, cash, seed, focus, energy, research, environmental, high,
safety, future, plan, impact, improve
16 125037-20In-
33759990Y752372636 4N-
director, executive, committee, annual, business, performance, audit, continue, include, cash,
tax, policy, cost, remuneration, review, strategic, groupos, recognise, profit, shareholder, net,
note, scheme, provide, interest, governance, award, period, term, rate, plan, liability, key, fair,
set, account, benefit, ensure, employee
17 146476-21In-
cent, loan, pension, customer, annual, market, rate, interest, income, loss, equity, change,
profit, capital, liability, cost, fair, investment, credit, note, bond, sustainability, account,
guarantee, net, director, tax, corporate, insurance, fund, include, scheme, cash, security,
portfolio, return, relate, base
No clear ESG assignment
18 88126-16nn-
24234650C1244277431 8R-
bus, rail, ahead, franchise, contract, cost, passenger, service, operating, change, performance,
revenue, shareholder, relate, pension, information, liability, profit, end, customer, increase,
operate, account, payment, plan, kingdom, regional, make, local, key, future, award, interest,
provide, provision, improve, lease, review
No clear ESG assignment
19 81117-16In-
21901590F152175492V -
performance, business, product, sustainability, program, committee, growth, plan, market,
member, include, chemical, continue, coating, process, percent, benefit, material,
supervisory, safety, level, energy, paint, note, audit, income, site, review, carbon, key,
supplier, improvement, ton, chain, operation, raw, million, strategic
No clear ESG assignment
20 148398-21In-
42441828Y558762989 40L-
usd, annual, interest, vessel, current, income, tax, liability, note, account, cash, colour,
expense, lease, rate, service, relate, operating, derivative, net, market, profit, currency, parent,
debt, fair, equity, include, recognise, pension, remuneration, flow, plan, balance, base,
business, bear, impairment
No clear ESG assignment
21 71373-15In-
16273044E251375706 0Q-
medium, service, business, development, content, revenue, advertising, digital, operation,
responsibility, employee, director, project, growth, customer, corporate, meur, information,
change, segment, member, develop, responsible, sale, increase, online, finnish, team,
marketing, target, market, operating, cent, relate, unit, programme, base, figure, work
No clear ESG assignment
22 110434-18In-
22859838B6739566152G -
service, director, annual, employee, cost, remuneration, customer, mail, benefit, recognize,
eur, increase, parcel, committee, impact, amount, income, cash, operating, business, activity,
ceo, net, acquisition, pay, revenue, subsidiary, corporate, consolidate, gain, logistic, decrease,
trade, day, plan, relate, loss
23 152569-21In-
service, euro, satellite, concession, follow, communication, director, end, road, amount,
information, indicator, result, income, contract, activity, note, tax, investment, mobility,
operation, infrastructure, business, client, vehicle, network, traffic, associate, record, relate,
impact, area, provide, change, revenue, universal, remuneration, current, capacity
No clear ESG assignment
24 161091-21In-
47199663F119159012 70O-
employee, result, make, change, information, term, time, number, investment, corporate,
include, base, high, capital, income, provide, work, market, long, due, business, member,
follow, process, increase, position, issue, governance, reporting, interest, meeting, internal,
development, conduct, focus, shareholder, give, period, external
25 150386-21ea-
euro, director, annual, store, product, lease, rate, supplier, brand, turnover, cost, material, gri,
impact, net, change, recognize, production, taxis, refer, plan, groupos, exchange, separate,
thousand, liability, cash, follow, result, relate, current, period, trade, revenue, option,
represent, raw, good, income
No clear ESG assignment
26 114686-19In-
24198746E138724185 6R-
property, interest, investment, tax, income, annual, portfolio, rate, change, development,
rental, office, logistic, rent, lease, net, area, project, cost, energy, tenant, work, building, note,
valuation, liability, city, yield, ratio, increase, expense, derivative, market, percent, account,
location, acquisition, term, space
No clear ESG assignment
27 157512-21In-
43630824U5742099960 0O-
employee, project, energy, integrate, environmental, emission, material, sustainability, safety,
supplier, work, environment, stakeholder, process, area, operation, waste, community, water,
impact, development, activity, include, health, approach, business, gri, training, social,
people, sustainable, action, standard, governance, local, reduce, initiative, consumption,
28 155224-21In-
mining, mine, production, operation, annual, mineral, cost, gold, african, project, increase,
capital, price, integrate, metal, groupos, strategic, reserve, evander, ore, performance, plant,
barberton, cash, tonne, safety, term, grade, zar, impact, coal, stakeholder, platinum, result,
life, ebitda, high, development, alloy
29 110140-18In-
27535000D146871690 00X-
annual, director, product, service, business, chf, performance, executive, compensation,
software, employee, usd, client, member, recognize, market, revenue, audit, governance,
operate, growth, solution, continue, customer, security, committee, office, responsibly,
target, overview, shareholder, program, base, banking, officer, organization, region, meeting,
30 143637-21In-
40505634F473283915G -
consumer, director, brand, option, stock, growth, annual, beer, inbev, include, program,
member, performance, grant, water, market, shareholder, volume, officer, remuneration,
note, continue, business, support, restrict, unit, executive, grow, chief, country, world,
beverage, launch, base, hold, term, community, packaging, local
No clear ESG assignment
31 148633-21In-
41171341R32577380 94E-
patient, health, product, care, healthcare, disease, medical, medicine, program, sale,
treatment, country, research, development, pharmaceutical, version, augment, science,
clinical, organization, compensation, change, performance, support, image, include,
approach, quality, life, business, site, safety, impact, strategy, combine, innovation, child,
No clear ESG assignment
32 145399-21In-
43183503Y376496170 60R-
customer, business, support, review, banking, service, continue, include, client, increase,
annual, digital, strategic, colleague, focus, provide, experience, impact, strategy, performance,
change, work, deliver, product, make, key, stakeholder, create, cent, market, low, executive,
strong, responsible, sustainable, plan, ensure, model, growth
33 81319-16In-
22688001B7239830570S -
product, food, executive, tax, cash, market, annual, sustainability, organic, brand, plan, net,
trade, operation, performance, fair, recognise, supervisory, healthy, feed, income, quality,
result, base, information, review, relate, commodity, business, amount, acquisition, supplier,
foreign, policy, audit, part, integrate, expense
No clear ESG assignment
34 119123-19In-
32163210Y358393457 8N-
annual, cash, account, liability, sustainability, profit, parent, customer, work, market, note,
director, number, operation, tax, employee, audit, sale, cost, meeting, expense, amount,
business, flow, term, acquisition, remuneration, balance, current, base, loss, general, policy,
shareholder, net, earning
No clear ESG assignment
35 71860-15In-
16240360C2286290574 0J-
liability, amount, cash, income, loss, cost, interest, tax, rate, include, fair, relate, profit, net,
expense, note, current, sale, base, term, impairment, consolidated, plan, change, period,
contract, date, benefit, flow, equity, end, payment, currency, transaction, follow, provision,
instrument, material, hedge
No clear ESG assignment
Notes: The table shows the topic outcomes for 2,248 integrated reports for the sample period 2015 to 2022. In the last
column I manually label each topic with respect to its environmental, social or governance dimension. I can link 6 topics to
environmental, 5 topics to social and 6 topics to governance matters. For 18 topics I am not able to readily assign an ESG
label. In general, if the first 20 words contain one of the three ESG dimensions, I label the topic accordingly. For all
other topics, which do not contain one of the three keywords, I look for words close to the three pillars and assign the label
accordingly. The order of the topics has no significance.
Table 10: Topic outcomes (standalone) before switching to an integrated report
Topic Dominant
Topic Terms ESG label
product, tobacco, programme, responsible, natural, society, introduction, respect, rewarding,
workplace, reinveste, brand, assessment, provide, datum, farmer, approach, study, trade, market,
illicit, labour, child, reduce, work, continue, case, improve, ensure, performance
production, product, sustainability, sustainable, energy, emission, plant, waste, risk, base, reduce,
material, raw, increase, safety, renewable, important, source, business, achieve, area, main, supplier,
climate, facility, society, responsibility, wood, certify, innovation, goal
insurance, customer, risk, sustainability, social, fund, investment, financial, agency, claim, activity,
amount, relation, manage, policy, corporate, sale, system, network, director, pension, carry, datum,
direct, aim, start, function, board, sector, main
customer, employee, plastic, sustainability, carbon, standard, emission, risk, energy, anti, datum,
volume, part, goal, percent, impact, cash, injury, number, material, policy, business, safety, rating,
consumption, workplace, matter, decrease, fleet, conduct
communication, document, product, registration, assure, externally, sustainability, service, policy,
program, energy, business, approach, plan, corporate, assessment, customer, material, site, csr,
environmental, home, ethic, talent, connect, principle, performance, development, regulation,
training, diversity
product, responsibility, corporate, customer, supplier, business, conduct, energy, emission, waste,
core, risk, end, focus, include, responsible, chain, csr, increase, female, code, work, ethic, bed,
safety, cent, return, target, material, area
gri, site, prime, building, energy, datum, portfolio, key, staff, real, estate, market, investment,
capital, property, client, make, business, tenant, topic, target, maintain, rate, sustainability, district,
future, generation, long, guest, working, approach, adjust
No clear ESG assignment
medium, responsibility, service, emission, digital, relate, corporate, employee, datum, target,
customer, responsible, information, content, code, financial, quality, scope, operation, business,
development, survey, covid, security, tax, competence, personnel, provide, conduct, statement
No clear ESG assignment
product, forest, ton, consumer, water, target, supplier, hygiene, customer, fiber, people, code, site,
conduct, accident, supply, scaos, program, life, include, pulp, paper, care, wood, tissue, solution,
nature, solid, organization, innovation, unit
risk, water, operation, mine, mining, approach, plan, social, programme, site, community, closure,
incident, socio, energy, work, matter, government, identify, biodiversity, american, target,
economic, area, employee, include, change, operational, deliver, climate, ensure
employee, emission, energy, market, change, internal, compliance, process, provide, consumption,
area, action, external, increase, focus, scope, team, reporting, manage, part, aspect, long, content,
end, target, law, relevant, account, country, general, solution
annual, sustainability, corporate, governance, datum, board, risk, society, executive, employee,
investment, financial, ratosos, impact, operation, high, customer, statement, travel, state, chain,
efficient, responsible, information, deliver, supply, resource, client, material, topic, profit
chapter, corporate, gri, responsibility, annual, risk, energy, production, principle, appendix,
include, ethic, policy, oil, indicator, area, compliance, supplier, environment, chemical, tonne,
business, change, action, social, carry, unit, professional, climate, establish, director
No clear ESG assignment
patient, number, site, health, corporate, healthcare, risk, product, program, csr, clinical,
responsibility, medicine, disease, approach, quality, treatment, country, social, center, trial, sanofio,
vaccine, animal, research, professional, ethic, datum, study, ensure, organization
section, gri, approach, sustainability, disclosure, topic, information, standard, stakeholder,
environmental, material, employee, compliance, business, social, statement, sustainable,
governance, risk, explanation, annual, index, supplier, emission, engagement, principle, figure, tax,
policy, general, reporting
sustainability, emission, fuel, aircraft, environmental, passenger, flight, tonne, relate, employee,
approach, cargo, customer, program, carbon, make, cost, airline, sick, include, organization,
number, weight, biofuel, efficient, travel, kilometer, ground, freight, airport
business, include, support, make, information, impact, standard, policy, environmental, human,
people, supplier, performance, develop, development, activity, review, continue, local, stakeholder,
responsible, community, conduct, strategy, governance, initiative, practice, progress, opportunity,
respect, follow, product
No clear ESG assignment
colonial, tele, building, consumption, corporate, social, material, client, responsibility, office,
energy, governance, sfl, property, high, employee, follow, certification, training, include, water,
scope, market, groupos, control, aspect, make, consume, director, shareholder, risk
csr, customer, document, annexe, registration, corporate, applicable, responsibility, program, hear,
service, worldlineo, employee, datum, solution, performance, materiality, social, payment, hearing,
people, improve, business, innovation, work, consumer, overview, interview, center, auditor,
gri, disclosure, bekaert, approach, customer, semiconductor, material, topic, table, component,
plant, boundary, supply, energy, product, chain, steel, incident, explanation, team, evaluation,
standard, power, materiality, wire, program, supplier, high, labour, covid, worldwide, ethic
No clear ESG assignment
fish, salmon, feed, farm, important, focus, aquaculture, production, processing, facility, food,
quality, produce, make, cent, area, escape, sea, lice, work, salmaro, passion, farming, effort,
harvesting, large, licence, day, welfare, good
product, food, store, climate, sustainability, impact, work, animal, supplier, label, organic, reduce,
issue, sale, private, customer, electricity, organization, make, certify, meat, tion, area, material,
aspect, solar, target, offer, condition, late
road, sustainability, customer, indicator, impact, service, environmental, area, safety, information,
operation, mobility, risk, traffic, accident, stage, system, verde, project, infrastructure, organization,
network, rate, concession, activity, performance, biodiversity, construction, improvement, main
safety, sustainability, employee, health, gri, risk, environmental, local, impact, quality, community,
supplier, economic, training, compliance, system, issue, standard, stakeholder, environment,
operation, continue, waste, material, performance, topic, ensure, key, assessment, operate,
gri, material, approach, source, sustainability, product, natural, supplier, water, supply, emission,
raw, sustainable, chain, goal, energy, customer, consumer, waste, principle, reduce, programme,
oil, tonne, issue, site, change, key, target, scope
gri, sustainability, make, material, woman, promote, store, fashion, order, man, groupos, aim,
florence, italian, obtain, addition, initiative, customer, adopt, principle, work, exhibition, brand,
culture, product, commitment, certification, approach, main, scope, contract, respect
product, sustainability, chemical, make, material, customer, raw, site, production, safety, base, oil,
program, performance, process, offer, packaging, additive, produce, pigment, application, coating,
important, industry, substance, german, accident, measure, ton, time, innovative
work, employee, operation, health, good, base, environment, development, risk, ensure, high,
conduct, environmental, include, create, issue, reduce, stakeholder, develop, measure, sustainable,
goal, key, woman, increase, code, accordance, supplier, climate, corruption, time
Notes: The table shows the topic outcomes for 158 standalone reports of companies before switching to an integrated
report. For each topic I report the file that contributes the most to the topic (contribution). In the last
column I manually label each topic with respect to its environmental, social or governance affiliation. Five topics can be
linked to ‘Social’, 11 topics to ‘Environmental’ and 7 topics to ‘Governance’; 5 topics cannot be clearly assigned. In
general, if the first 20 words contain one of the three ESG pillars, I label the topic accordingly. For all other topics, which
do not contain one of the three keywords, I look for words close to the three pillars and assign the label accordingly. The
order of the topics has no significance.
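As an illustration of how a dominant file per topic could be identified, the following minimal sketch assumes that "contribution" means the topic weight a fitted model assigns to each document, and that these weights are available as one row per document. The function name `dominant_documents`, the input layout and the toy values are hypothetical and are not taken from the analysis itself.

```python
def dominant_documents(doc_topic, doc_ids):
    """For each topic (column), return the id of the document (row)
    with the largest topic weight. Assumes a non-empty, rectangular
    list of per-document topic weights."""
    n_topics = len(doc_topic[0])
    dominant = []
    for k in range(n_topics):
        # Index of the document whose weight on topic k is largest.
        best = max(range(len(doc_topic)), key=lambda i: doc_topic[i][k])
        dominant.append(doc_ids[best])
    return dominant


# Toy example: three documents, two topics (values are made up).
weights = [
    [0.7, 0.3],
    [0.2, 0.8],
    [0.9, 0.1],
]
ids = ["report_a", "report_b", "report_c"]
print(dominant_documents(weights, ids))
```

If two documents tie on a topic, `max` returns the one that appears first in the input, so the result depends on the document ordering in that case.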
Table 11: Topic outcomes (integrated) after switching from a stand-alone report
Topic Terms per Topic ESG label
percent, share, market, service, turbine, recognise, energy, project, director, power, board, policy, accounting,
cost, contract, development, joint, provision, loss, meur, increase, venture, order
development, business, process, risk, activity, safety, project, environmental, service, provide, change, economic,
training, governance, include, material, quality, initiative, action, objective, reduce, key, social
amount, financial, current, expense, employee, rate, risk, account, base, investment, due, customer, number,
corporate, follow, relate, activity, contract, present, contribution, member, trade, carry
No clear assignment
financial, asset, liability, loss, net, rate, period, statement, derivative, term, market, make, impairment, credit,
instrument, note, impact, payable, material, interest, current, provision, transaction
No clear assignment
customer, organization, integrate, supplier, people, relationship, indicator, innovation, approach, chip, aspect,
theme, machine, program, material, sustainable, knowledge, make, ethic, survey, safety, develop, score
project, power, plant, integrate, work, system, construction, contract, infrastructure, facility, electricity, operation,
sector, solar, aspect, line, water, director, term, transmission, people, main, maintenance
note, board, kluwer, share, tax, profit, supervisory, adjust, cash, revenue, risk, recognize, policy, service, annual,
plan, asset, lease, solution, contract, table, liability, include
No clear assignment
energy, market, area, emission, service, high, supply, work, capacity, site, power, construction, long, time, phase,
addition, environmental, maintenance, develop, generation, opportunity, security, water
employee, performance, product, include, business, result, strategy, support, term, improve, impact, relate,
number, supplier, high, time, industry, issue, corporate, standard, ensure, position, operation
include, performance, risk, price, joint, policy, strategic, dividend, item, expenditure, equity, attributable, relate,
benefit, vest, capital, deliver, note, earning, average, operational, reserve
No clear assignment
share, increase, plan, committee, executive, business, base, information, shareholder, continue, board, capital,
level, audit, remuneration, sale, basis, work, measure, key, future, target, significant
No clear assignment
employee, sustainability, integrate, system, model, aim, approach, network, capital, site, service, people, health,
organizational, plan, digital, risk, tower, woman, generate, infrastructure, process, standard
financial, statement, cost, annual, cash, tax, risk, income, flow, profit, item, expect, amount, revenue, include,
consolidate, share, asset, fair, sale, relate, measure, base
No clear assignment
euro, lease, separate, recognize, gri, thousand, director, refer, change, store, product, contract, order, taxis,
charge, sustainability, reference, amendment, table, art, consolidate, main
No clear assignment
board, director, integrate, customer, freight, service, continue, share, executive, program, compensation, solution,
logistic, gri, member, standard, organization, office, chain, region, end, meeting, impact
No clear assignment
director, service, cost, benefit, remuneration, annual, review, csr, cash, recognize, loss, acquisition, mail, net,
parcel, eur, liability, tax, operating, radial, committee
patient, healthcare, disease, health, csr, program, organization, approach, medicine, create, support, launch,
sanofio, develop, vaccine, medical, impact, ethic, market, factsheet, stakeholder, country, clinical
waste, plant, energy, design, day, solution, environment, industrial, sector, treatment, firm, security, french,
innovation, build, chief, equipment, model, unit, client, expertise
statement, asset, financial, interest, fair, result, end, income, board, pay, follow, operate, purchase, price,
accordance, gain, change, social, plant, record, assess, foreign, subsidiary
No clear assignment
operation, production, director, cost, coal, mining, ore, mine, cash, mineral, tax, tonne, investment, underlie,
copper, platinum, recognise, beer, iron, diamond, project, metal, ebitda
electricity, plant, interest, power, project, gri, price, grid, equity, verbundo, result, generation, board, liability,
supervisory, measure, change, renewable, loss, balance, section, austrian, accordance
Notes: The table shows the topic outcomes for 158 stand-alone reports of companies before they switched to an integrated report.
For each topic, I report the file that contributes most to the topic (contribution). In the last column, I
manually label each topic according to its environmental, social, or governance affiliation. Five topics can be linked to
‘Social’, 10 topics to ‘Environmental’, 8 topics to ‘Governance’, and 6 topics cannot be clearly assigned.
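As an illustration of the notes above, the following is a minimal sketch of how the most prominent file per topic and the label tally could be computed, assuming a document-topic weight matrix is already available from a fitted LDA model. The function name, file names, weights, and labels are hypothetical and are not taken from the paper's implementation or data.

```python
from collections import Counter


def most_prominent_files(doc_topic, file_names):
    """For each topic, return (file name, contribution) of the document
    with the largest weight on that topic."""
    n_topics = len(doc_topic[0])
    result = []
    for k in range(n_topics):
        # Index of the document whose weight on topic k is largest
        best = max(range(len(doc_topic)), key=lambda d: doc_topic[d][k])
        result.append((file_names[best], doc_topic[best][k]))
    return result


# Hypothetical document-topic weights (rows: reports, columns: topics)
doc_topic = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.25, 0.15, 0.60],
]
file_names = ["report_a.pdf", "report_b.pdf", "report_c.pdf"]

prominent = most_prominent_files(doc_topic, file_names)

# Hypothetical manual labels, one per topic, tallied by category
manual_labels = ["Governance", "Environmental", "No clear assignment"]
label_counts = Counter(manual_labels)
```

The labelling step itself remains manual; the tally only summarises labels that a researcher has already assigned.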