All content in this area was uploaded by Felipe Sick on Feb 14, 2023
A comparative analysis of European integrated and stand-alone
sustainability reports: Evidence from LDA
Felipe Sick (a)
(a) Department of Accounting, Control and Auditing, University of St.Gallen, St. Gallen, Switzerland
ACA-HSG, Office 57-104, Tigerbergstraße 9, 9000 St.Gallen, Switzerland
E-mail: felipe.sick@unisg.ch, phone: +4915207326221
Acknowledgments: I thank Prof. Dinh and A. Stenzel at the University of St.Gallen for their
valuable guidance and support on this project. In addition, I would like to thank participants of the
fifth edition of the Early Researcher Consortium (ERC) by the Accounting Research Groups at
the UPF-BSM (Spain), Free University of Bolzano (Italy) and University of Padua (Italy).
Electronic copy available at: https://ssrn.com/abstract=4283860
ABSTRACT: I employ an unsupervised learning algorithm to investigate the thematic
content of a large set of integrated and stand-alone sustainability reports. Subject to a non-
financial reporting mandate, companies in the EU can disclose their non-financial
performance together in their financial statement or in a separate sustainability report. In
this paper, I compare 2,248 integrated reports and 3,567 stand-alone reports to identify and
examine the thematic content that firms disclose when preparing their sustainability
information. To do so, I employ a topic model, Latent Dirichlet Allocation (LDA), to
examine the topics disclosed by firms issuing a report between 2015 and 2021. Comparing
the outcomes for both report types, I find that text corpora constructed from
integrated reports are less likely to contain topics concerning ESG matters,
especially those addressing social issues. In line with previous research, I show that readability
is lower for firms including their non-financial information in their financial statements than
for firms issuing a separate report. The analysis provides insights for regulators
and preparers of sustainability information and contributes to the debate on the implications
of the recently approved proposal to extend non-financial reporting regulation to listed
small- and medium-sized enterprises.
Keywords: sustainability reporting, topic modeling, ESG, European corporate reporting,
integrated reporting
1. Introduction
The disclosure of non-financial information has become a relevant subject for preparers, users
and regulators alike, the latter increasingly looking to harmonize reporting practices. Under
current EU regulations, companies with more than 500 employees have the flexibility to disclose
their non-financial information together with financial information in an integrated report or in
a separate sustainability report. The Non-Financial Reporting Directive (NFRD 2014/95/EU)
essentially creates a situation in which companies either issue an integrated report or voluntarily
disclose their non-financial information in a stand-alone report. How firms decide to disclose
their sustainability information has resulted in considerable heterogeneity in firms’ reporting
practices as well as significant challenges for measurement, comparability and standardization
(Christensen et al., 2019). At the same time, companies may wish to understand current and
future regulatory interventions so that they can align their reporting practices with upcoming
changes. Recent developments, especially the consolidation of standard setters,
such as the creation of the International Sustainability Standards Board (ISSB), highlight the need
for making sustainability information comparable, reliable, and relevant.1
In many cases, the information presented in multiple reports can differ with direct and
indirect implications for the end-users of sustainability reports (Rupley et al., 2017). While the
NFRD highlights the importance of how to assess materiality in the context of non-financial
information, it does not standardize sustainability practices and is generally interpreted as
a call for more integrated reporting practices as set forth by the International Integrated
Reporting Council, IIRC (Milla & Haberl-Arkhurst, 2018). Because an integrated report (IR)
should communicate ‘concisely’ how a firm’s strategy, governance, performance, and prospects,
in the context of its external environment, lead to the creation of sustainable value (IIRC, 2013),
an integrated report is said to be effective when it goes beyond addressing solely providers of
1 With the recent merger of the International Integrated Reporting Council (IIRC) and the Sustainability Accounting Standards Board (SASB) into the Value Reporting
Foundation, the regulatory environment in Europe can be said to be highly dynamic. For a detailed review of the CSRD and current consolidations in Europe, I
refer to Dinh et al. (2021).
financial resources2. However, it may also leave out information that may be relevant to
stakeholders beyond providers of financial resources. In fact, contents in sustainability reports
often seem to have been selective, mainly stressing positive aspects of the reporting company’s
performance (Baumüller, 2018). Using machine learning, I examine the thematic content of a
large dataset of integrated and stand-alone reports to gain a better insight into what managers
of large European companies disclose when they prepare non-financial information. Methods
that rely on artificial intelligence are being used increasingly in the accounting and finance
literature but have yet to find wider application in sustainability accounting research. My
motivation to study the differences in the thematic content is to make use of computationally
efficient methods that are suitable for identifying underlying or hidden topics that are disclosed in
different reporting types. In particular, I show how the thematic content, in terms of ESG
disclosure differs when a company issues an integrated report, relative to preparing a stand-
alone sustainability report. In addition, I investigate the differences in textual attributes when
text corpora are constructed of integrated reports. I find that when a company produces an
integrated report, it is less likely to include topics around ESG matters and especially leaves
out subjects around social issues. In line with previous research (especially du Toit, 2017; Stone
& Lodhia, 2019), I find that the readability of integrated reports is generally lower relative to
stand-alone reports. The result seems to be pronounced when a firm changes to an integrated reporting
framework. Throughout this paper I argue that the difference in disclosure content generally
arises because of external pressures and firms wanting to tailor their information to their target
audience. For integrated reports, this means shareholders. Companies that primarily focus on
information that is financially material to investors may cover ESG-related topics, but may
exclude topics for which the firm does not bear the full cost (Christensen et al., 2021)3. This can
have implications for the report in which a firm decides to disclose its non-financial information, the
2 The case for combining non-financial information with financial information was made prominently by Eccles and Krzus (2010) who emphasized that a
company should embed sustainability topics in the fabric of its business operations to demonstrate a commitment to CSR and therefore contribute to a more
sustainable society. Combining financial with non-financial information in a single report is said to improve corporate disclosure and transparency by eliminating
the artificial and unhelpful analytical distinctions between shareholders and stakeholders (Eccles, 2010).
3 This is especially the case if a firm’s operations result in externalities such as air pollution.
thematic content of the report and the information available to end-users. Preparing reports that
not only incorporate how sustainability topics affect a corporation, but also how a firm’s operations
affect the environment, is likely to shape the regulatory environment in the EU4.
My paper foremost contributes to text-based analytics in the sustainability accounting
literature. By explicitly examining the thematic content of sustainability reports and contrasting
topics by report type, I add to the understanding of how textual information differs in the
fast-changing regulatory environment that characterizes the EU. From fiscal year 2024 onwards,
large public entities in the EU will have to disclose their sustainability information under the
new Corporate Sustainability Reporting Directive (CSRD). Non-financial reports currently rely, and
will increasingly rely, on judgements about materiality (Baumüller & Schaffhauser-Linzatti,
2018a). The challenge I aim to address is in analyzing trends in corpora far too large for humans
to manually review and summarize in a way that is easily interpretable. The insights are of
potential interest to regulators and standard setters in fostering awareness of how firms are able
to communicate their ESG activity more effectively to stakeholders. Understanding the thematic
content is also important for corporations wishing to understand this information and improve
stakeholder communication. It may help them to strengthen their relationships with customers,
employees and local communities (Du & Bhattacharya, 2010).
The remainder of this article is organized as follows. Section two reviews the literature
and explains the current and changing regulatory environment in the EU. Section three
introduces topic modeling and explains how I apply LDA to sustainability reports. In
this section I elaborate on how I generally carry out the analysis. To make the large body of
integrated and stand-alone reports useful for the model, I carry out an array of pre-processing
steps to the data that I describe in section four. Section five presents the LDA results separately
for stand-alone and integrated reports. In the same section I focus on firms that switch from
4 Regulators in the EU are currently aiming to promote the concept of ‘double materiality’, which requires businesses to consider the firm’s economic, environmental,
or social impact on society (Garst, J., Maas, K., & Suijs, J. (2022). Materiality Assessment Is an Art, Not a Science: Selecting ESG Topics for Sustainability Reports.
California Management Review, 00081256221120692. https://doi.org/10.1177/00081256221120692).
disclosing non-financial information in a financial statement to an integrated report type. Finally,
in section six I offer my concluding remarks.
2. Background and Literature Review
2.1. Institutional setting
The Non-Financial Reporting Directive 2014/95/EU (hereinafter: NFRD) came into effect in
2017 as a response to the lack of transparency and comparability of non-financial information,
previously subject to directive 2013/34/EU. The previous directive (2013/34/EU) was
primarily concerned with ensuring general comparability between firms operating in European
member states in terms of their financial statements, management reports and other general
financial reporting practices (European Union, 2013). In contrast to the superseded directive,
the NFRD stipulates that companies in European member states not only disclose how
sustainability issues affect them, but also how their activities affect society and the environment
at large (EC, 2021). This principle, referred to as double materiality, suggests that stakeholders
can have preferences beyond shareholder value maximization. While 2013/34/EU required
public interest entities to disclose a non-financial statement, it did not treat the materiality of
disclosures as a fundamental principle in the way the NFRD does (Milla &
Haberl-Arkhurst, 2018). The NFRD requires large public interest entities to disclose
information on how they operate and manage social and environmental challenges5.
Fundamentally, the main target of the NFRD is to allow a sufficient level of comparability and
transparency to meet the needs of investors and other stakeholders and to provide consumers
with easy access to information on the impact of businesses on society (European Union, 2014)6.
The disclosure should help in measuring, monitoring, and managing a company’s
performance towards environmental targets and its impact on society. Transparency is
considered key for companies to deliver better results and is expected to enhance the trust
5 The EU generally considers non-financial information to be environmental, social and governance (ESG) information (EC, 2013).
6 The disclosure of non-financial information is considered vital for managing change towards a sustainable global economy by combining long-term profitability
with social justice and environmental protection (European Union, 2014).
citizens have in business and in markets and enable a more efficient allocation of capital (EU,
2014; European Union, 2014). The provisions of the directive, together with guidelines set forth
by the EU Commission, generally show a reference to the principle of materiality (Baumüller &
Schaffhauser-Linzatti, 2018b; EC, 2017).
Information, according to the NFRD, can be disclosed in the form of a statement in the
annual report or a separate sustainability report (Baumüller, 2018; Dinh et al., 2021; EC, 2017).
Allowing the choice between a disclosure of non-financial information in the annual report with
a voluntary decision on how detailed the information is presented in a stand-alone report has
been considered to result in a satisfactory increase in transparency, while keeping the
administrative burden low (Commission, 2017; European Commission, 2017). In addition, the
material non-financial information would be made publicly available on a regular basis and could
be used by stakeholders such as social organisations or local communities to assess the impact
and risks related to the operations of a company. Even though companies choosing to provide
a separate report might have to sustain higher costs (European Commission, 2017), the benefits,
for instance in enhancing the efficiency of capital markets, outweigh the downsides. Cuomo et al.
(2022) find that the directive generally has led to an increase in transparency and sustainability
performance. Before the NFRD was made public, the usefulness of an integrated framework
was made easier for companies voluntarily issuing an integrated report (Stawinoga & Velte,
2017).
Baumüller and Schaffhauser-Linzatti (2018b) show that despite the expectations of
many, with regards to materiality, the reporting requirements of the NFRD are closer to
integrated reporting than they are to sustainability reporting. This finding is in line with Stawinoga
et al. (2017), who argue that integrated reporting depicts the more realistic implementation
because of the quantitative and qualitative requirements of the directive and companies aligning
their non-financial information with their financial reporting practices. Given the need to
reconcile financial with non-financial information, Neumann et al. (2012) argue that the increase
in total information available might have an adverse effect and hamper the intended positive
effects of reporting, from its users’ perspective as a whole. According to Eccles and Krzus (2019),
there are two main reasons why a company should adopt an integrated report. The first is that
integrated thinking is a key element of taking sustainability seriously, because once the company
has created a truly sustainable strategy, it can better respond to the risks and opportunities
created by the need to ensure a sustainable society (Eccles, 2010). Second, the reader gains a
better understanding of the relationships between financial and non-financial performance,
urging managers to provide more specific examples of how the firm is doing well (for
shareholders) by doing good (for stakeholders). An integrated report, according to the latter
view, puts pressure on a company to be as precise as possible about the relationship between
strong ESG results on the one side and financial results on the other. Maniora (2017) finds that
integrated reports can differ from stand-alone reports because integrated reports generally
include financial and non-financial information and therefore result in a firm's tendency to cater
to a specific audience (investors). This view is supported by Baumüller and his colleague (2018)
who highlight the main target group for integrated reporting to be providers of financial
resources. While Lai et al. (2018) state that integrated reporting establishes a meaningful dialogue
with a growing variety of stakeholders through broader and plainer messages, they, too, argue
that the primary addressees of an integrated report are shareholders. Should companies decide
to voluntarily provide a non-financial report, the level of detail of information disclosed would
necessarily increase (European Commission, 2017). I would then expect to see topics being
clearly identifiable as environmental, governance and social.
Generally, the guidelines set forth by the EU allow companies to decide how to use
international, European or national guidelines according to their own characteristics or business
environment (Dinh et al., 2021; Goloshchapova et al., 2019). Two noteworthy guidelines for the
disclosure of non-financial information in the EU are the Global Reporting Initiative (GRI) and
the International Integrated Reporting Council (IIRC). The GRI can be understood as an
initiative providing non-binding guidance for companies (European Commission, 2017).
Striving to help companies communicate their impact on critical sustainability issues, it
has developed into a global standard for sustainability reporting (Christensen et al., 2021). The
GRI defines material topics as those topics that can reasonably be considered important for
reflecting the organization’s economic, environmental and social impacts, or influencing the
decisions of stakeholders (GRI, 2016). The IIRC, in contrast, attempts to institutionalize
integrated reporting as a practice that is critical to the relevance and value of corporate reporting
(Humphrey et al., 2017). The objective of the IIRC is to change business actors' perspectives to
further integrate sustainability activities and impacts into strategic planning and decision-making
(IIRC, 2013). In other words, an integrated report should disclose matters that
substantively affect the organization’s ability to create value over the short, medium and long
term (IIRC, 2013a). Simnett and Huggins (2015) provide insights into salient issues in the
development of the IIRC and explain that much of what is expected to be included in an
integrated report could be described as qualitative rather than quantitative information. In other
words, companies are free to write about how they combine the non-financial information with
their financial information. This makes the assessment in terms of materiality of topics difficult.
According to the IIRC, a topic is material if it is of such relevance and importance that it could
substantively influence the assessments of providers of financial capital with regards to the
organization’s ability to create value over the short, medium and long term (IIRC, 2013b).
Prior studies have conjectured on the differences in information content across different
channels and mainly argued from an institutional and stakeholder perspective (Vitolla & Raimo,
2018). External ‘tensions’ between a firm and its stakeholders can influence managers to release
only certain information, especially when the information is proprietary (Dye, 1985). While
Melloni and Stacchezzini (2014) demonstrate that companies do not adopt integrated reporting
as a legitimation strategy, integrated reporting is generally associated with higher ESG disclosure
ratings and performance (Velte, 2017)7. Flower (2015) paints a different picture and argues that
7 A legitimate explanation for the nondisclosure of management’s information generally arises from the fact that stakeholders are often unsure whether the
manager has any such information, or they are uncertain about the kind of information held by management (Dye, 1985).
the <IR> framework (i) focuses exclusively on impacts that are directly related to the firm, and
(ii) generally lacks enforceability, resulting in little impact on reporting practices. Similarly,
Vitolla et al. (2018) conduct a systematic literature review and conclude that the <IR>
framework lacks specificity, leading to problems of standardization and homogenization of the
content. Furthermore, a stream of literature raises concerns regarding the divergence of
information for different report types because of a lack of specificity inherent in the <IR>
framework (Beck et al., 2017; Mio and Fasan, 2016). Because of its multi-dimensionality, it is
said to make comparison across firms and industries difficult (Eccles et al., 2019). Finally,
integrated reports that follow IIRC’s guidelines have been shown to be low in readability by
common readability measures, primarily because of the complex nature of the language used in
integrated reports (du Toit, 2017; Stone & Lodhia, 2019).
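Readability measures of the kind referred to above are typically simple functions of sentence length and word length. The following is a minimal sketch, assuming the Flesch Reading Ease score as the measure; the text does not specify which measures the cited studies rely on, so this choice is an assumption. The syllable counter is a rough vowel-group heuristic, and the two example sentences and function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease score; higher values indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("Text must contain at least one sentence and one word.")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


# Hypothetical example sentences, not taken from any report in the sample.
plain = "We cut waste. We save energy."
dense = ("Organizational sustainability considerations necessitate "
         "comprehensive materiality determinations.")

# Short sentences with short words score higher (easier to read).
print(flesch_reading_ease(plain) > flesch_reading_ease(dense))
```

Because the score depends only on average sentence length and average syllables per word, long sentences built from polysyllabic terminology pull it down, which matches the explanation given above for the low readability of integrated reports.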
Taking the characteristics of integrated reporting into consideration, it is initially unclear
to what extent material topics are adequately communicated to users of sustainability reports.
To study the extent to which the thematic content differs by report type, I rely on textual analysis
and an unsupervised topic-modeling algorithm known as Latent Dirichlet Allocation (LDA). I review the literature
of similar applications in the next subsection.
2.2. Topic modeling in Accounting Research
Text mining is the large-scale, automated processing of plain text language in digital form to
extract data that is converted into useful quantitative or qualitative information (Das, 2014;
Antweiler and Frank, 2004). In topic modeling, a document is considered a collection of words
containing multiple topics in different proportions (Chae and Park, 2018). Latent Dirichlet
Allocation (LDA) was first introduced by Blei et al. (2001) and is based on the idea that a corpus
of a collection of disclosures can be represented by a set of common topics and the content of
a document described by the weights placed on these topics (Hoberg & Lewis, 2017). The basic
assumption underlying LDA is that each document is generated by drawing content from a
common set of topics, or clusters of words8. LDA has been described as superior to some
alternative methods for several reasons. Because it is an unsupervised learning technique,
researchers do not have to prepare, ex-ante, dictionaries for the analysis (Tirunillai & Tellis,
2014). In addition, it is not necessary to know in advance what the topics will look like. By tuning
the LDA parameters to fit different dataset shapes, one can explore topic formation and
resulting document clusters in an exploratory fashion (Nguyen, 2014). Finally, topic models are
particularly useful when corpora are large and when there are many documents to be classified.
Outcomes of topic models have been evaluated using probability-based metrics such as coherence
and perplexity, or by visual inspection (Hagen, 2018; Israelsen, 2014)9. The most popular method used
for feature extraction is the bag-of-words (word frequency) approach. This technique breaks the
text into word-level units, and treats these units as features, while ignoring the order and co-
occurrence of words (Chen et al., 2017). LDA is predicated on the idea that documents are built
as mixtures of latent subjects, each of which is basically a probability distribution over words.
Dictionary-based textual analysis using bag-of-words, however, faces the risk of becoming overly conditioned by
the usefulness and timeliness of the word dictionary used. In contrast, LDA is suitable to
compare and assess the topics of a sizable collection of European sustainability reports as in the
case of this study. LDA, which is but one application of text mining, has several benefits over
manual coding, including the ability to process many documents that would be inefficient to
code manually and the fact that no dictionaries or interpretation guidelines need to be applied before the
analysis (DiMaggio et al., 2013). Because of the large component of subjective judgement,
manual coding may not capture the hidden features that LDA may detect. By fitting the
presumptive statistical model to the full textual corpus, the topic model can identify topics and
their probabilistic relationships that may otherwise remain unobserved (Mohr & Bogdanov,
2013). A textual corpus' overall subjects, keywords used to identify each topic, and the
8 The process of LDA can be compared to cluster analysis or principal component analysis (PCA) as applied to quantitative data (Huang, A. H., Lehavy,
R., Zang, A. Y., & Zheng, R. (2018). Analyst Information Discovery and Interpretation Roles: A Topic Modeling Approach. Manag. Sci., 64, 2833-2855).
PCA is similar in that it extracts the important information from the data to express this information as a set of new orthogonal variables called principal
components or vectors that are used as input to the model(s).
9 Perplexity is calculated from the log-likelihood of unseen text documents given the topics defined by a topic model. A good model will have a high
likelihood and consequently low perplexity. Coherence can generally be understood as how well a topic is ‘supported’ by a (reference) text set, combining
the scores using topic overlap.
probabilistic relationship between keywords and topics cannot be determined beforehand
(Huang et al., 2018). Alternatively put, a fitted LDA model recovers the set of subjects that most
accurately capture the empirical distribution of word groupings across the documents (Bellstam
et al., 2021).
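As a minimal illustration of the bag-of-words representation described above, the following sketch counts word frequencies per document while discarding word order. The two example sentences and the helper name `bag_of_words` are hypothetical and are not taken from the data or the pre-processing pipeline of this study.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word counts for one document, ignoring word order."""
    # Lowercase and keep alphabetic tokens only (a simplifying assumption).
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical toy documents, not drawn from the report sample.
doc_a = "Emissions fell while energy use fell"
doc_b = "Energy use fell while emissions fell"

bow_a = bag_of_words(doc_a)
bow_b = bag_of_words(doc_b)

# Word order differs, yet both documents map to the same representation.
print(bow_a == bow_b)
# Each distinct word is a feature whose value is its frequency.
print(bow_a["fell"])
```

The two sentences are indistinguishable under this representation, which is what is meant by ignoring the order of words.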
Blei et al. (2018) provide an overview of how topic modeling works in general and Latent
Dirichlet Allocation in particular. The latent structure of text refers to two types of information
from the data: document-topic distribution and topic-term distribution. The document-topic
distribution informs as to how each document is composed, in terms of topics. The topic-term
distribution provides different lists of semantically coherent words, where each list of terms
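represents a topic or theme (Chae and Park, 2018). In the generative process assumed by LDA, each word is produced by first drawing a topic from the document-topic distribution and then drawing a term from that topic’s topic-term distribution. These two procedures are repeated word for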
word to create the document. By fitting this two-step generative model to the observed words
in the documents until it finds the best collection of variables that define the topic and word
distributions, the LDA algorithm iteratively determines the topic distribution for each document
and the word distribution of each subject (Blei et al., 2001).
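The two-step generative process described above can be sketched as a forward simulation. This is an illustration only and not the estimation procedure used in this paper: fitting LDA inverts this process to recover the distributions from observed words. The vocabulary, the topic-term probabilities, the Dirichlet parameter and all function names below are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet distribution."""
    # Normalised independent Gamma draws follow a Dirichlet distribution.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_term, vocab, alpha, n_words, rng):
    """Generate one document with the two-step process assumed by LDA."""
    # Document-topic distribution: the weight this document places on each topic.
    doc_topic = sample_dirichlet(alpha, rng)
    topics = list(range(len(topic_term)))
    words = []
    for _ in range(n_words):
        # Step 1: draw a topic from the document-topic distribution.
        z = rng.choices(topics, weights=doc_topic, k=1)[0]
        # Step 2: draw a term from that topic's topic-term distribution.
        w = rng.choices(vocab, weights=topic_term[z], k=1)[0]
        words.append(w)
    return doc_topic, words


# Hypothetical vocabulary and two hypothetical topics.
vocab = ["emissions", "energy", "employees", "safety"]
topic_term = [
    [0.5, 0.5, 0.0, 0.0],  # an "environmental" topic
    [0.0, 0.0, 0.5, 0.5],  # a "social" topic
]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
doc_topic, words = generate_document(topic_term, vocab, [0.5, 0.5], 10, rng)
print(doc_topic)
print(words)
```

A document whose document-topic vector places most weight on the first topic will consist mostly of words such as "emissions" and "energy"; estimation works backwards from such word patterns to the two distributions.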
In accounting and finance research, LDA has been used to identify financial
misreporting (Brown et al., 2019), analyze company disclosure (Israelsen, 2014) and investigate
conference calls (Huang et al., 2017). Hoberg and Lewis (2017) use LDA to analyze the 10-K
Management’s Discussion and Analysis (MD&A) of firms that are suspected of committing fraud.
Highlighting that managers may under- or over-disclose certain topics, the authors find that
firms committing fraud less frequently discuss topics that link the CEO with participation in
actual firm plans and financial strategies and rather discuss acquisitions, hedging transactions,
derivative instruments, and business opportunities (Hoberg and Lewis, 2017)10. In addition,
fraudulent managers discuss fewer details explaining the sources of the firm’s performance,
while disclosing more information about positive aspects of firm performance11. Huang et al.
10 The authors rely on two distinct approaches to accurately interpret the identified topics. The first is a list of the most frequent key phrases that are
associated with each topic. The second is a ‘representative paragraph’ which best represents the content that is typical among firms that use the topic.
11 Israelsen (Israelsen, R. (2014). Tell It Like It Is: Disclosed Risks and Factor Portfolios. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2504522)
also applies LDA to 10-K disclosures and focuses primarily on the frequency of topics which are disclosed in each individual report. Primarily interested
(2017) use LDA to compare the thematic content of a large sample of analyst reports with
transcripts of conference calls that precede the disclosure of the analysis. To compare the topics
of analyst reports and conference calls, the authors use the topic distribution of each document
and the word distribution of each topic and conduct a Pearson’s chi-square test for the
homogeneity of the distributions between the two text corpora12. Huang et al. (2018) conduct
the LDA for each industry separately, arguing that many topics are industry specific. LDA was
also used in Dyer et al. (2017) to study trends in 10-K disclosure over time. They find that three
of the 150 topics (fair value, internal controls, and risk factor disclosures) required by the
Financial Accounting Standards Board (FASB) and Securities and Exchange Commission (SEC)
explain most of the increase in length of the disclosures (Dyer et al., 2017).
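The chi-square test for homogeneity mentioned above in connection with the comparison of analyst reports and conference calls can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the function name and the sentence counts per topic are hypothetical, and only the test statistic and the degrees of freedom are computed (obtaining a p-value would additionally require the chi-square distribution function from a statistics library).

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic for two vectors of per-topic counts."""
    if len(counts_a) != len(counts_b):
        raise ValueError("Both corpora need counts for the same topics.")
    # Drop topics that appear in neither corpus; they carry no information.
    pairs = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    total_a = sum(a for a, _ in pairs)
    total_b = sum(b for _, b in pairs)
    if total_a == 0 or total_b == 0:
        raise ValueError("Each corpus needs at least one counted sentence.")
    grand_total = total_a + total_b
    statistic = 0.0
    for obs_a, obs_b in pairs:
        topic_total = obs_a + obs_b
        # Expected counts if both corpora shared the same topic proportions.
        exp_a = total_a * topic_total / grand_total
        exp_b = total_b * topic_total / grand_total
        statistic += (obs_a - exp_a) ** 2 / exp_a
        statistic += (obs_b - exp_b) ** 2 / exp_b
    # Degrees of freedom: length of the (non-empty) topic vector minus one.
    dof = len(pairs) - 1
    return statistic, dof


# Hypothetical numbers of sentences assigned to two topics in two corpora.
same_stat, same_dof = chi_square_homogeneity([30, 50, 20], [60, 100, 40])
diff_stat, diff_dof = chi_square_homogeneity([10, 10], [20, 0])
print(same_stat, same_dof)
print(diff_stat, diff_dof)
```

When the two corpora share the same topic proportions, observed and expected counts coincide and the statistic is zero; the further the proportions diverge, the larger the statistic becomes.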
Székely and Vom Brocke (2017) apply topic modeling to 9,514 sustainability reports
published between 1999 and 2015 to identify the most common topics in the reports. By
manually labeling the topics, they identify forty-two topics that reflect sustainability,
environmental, and social topics. Based on these results, Székely and Vom Brocke (2017) then
derive ten specific propositions to guide future research and practitioners. Importantly, Székely
and Vom Brocke (2017) show that topics related to environmental sustainability consist mainly
of emissions and consumption, particularly related to energy. They also highlight that
biodiversity and renewable energy do not appear in their results. In addition to providing a broad
picture of sustainability practices over several industries, they highlight the necessity of balancing
all three dimensions of CSR. My analysis is different from theirs in that I contrast the topic
model’s outcomes across differing report types. The study perhaps closest to this one is by
Goloshchapova et al. (2019) who use LDA for sustainability reports of 15 European countries
and the UK. They use 4,156 sustainability reports by 2,685 unique firms and focus on the
in the association of the frequency of the words with firm characteristics, he finds that firms disclosing risks associated with borrowing, with the ability to pay
dividends, with legal issues, and with the ability to develop new products or technology have higher total and idiosyncratic stock-return volatility
(Israelsen, 2014).
12 If the two documents are homogeneous, the proportion of sentences in each topic will be equal, i.e., the observed number of sentences in each topic
will be equal to the expected number of sentences for the two documents. The degrees of freedom of the chi-square test between the two documents equal
the vector length minus one. Details can be found at https://pubsonline.informs.org/doi/suppl/10.1287/mnsc.2017.2751
disclosed topics relative to their industries. Goloshchapova et al. (2019) find that, when firms report on their CSR, industrial firms
emphasize ‘employee safety’, the utility sector concentrates on ‘energy’ and ‘efficient
power’, and consumer products firms focus on ‘food waste’. The
authors manually label the topics and do not distinguish between different reporting formats.
Here again, my study distinguishes itself from theirs in that I compare the thematic content of
stand-alone and integrated reports. In addition, I use the popular topic modeling library
MALLET, which has been shown to outperform alternative methods (McCallum, 2002; Yao et
al., 2009).
3. Methodology
3.1. Data collection and preparation
I obtain both stand-alone and integrated sustainability reports from Corporate Register,
a data provider that specializes in collecting corporate sustainability reports13. The sample period
ranges from 2015 to 2021 and covers all EU member states, the UK and
Switzerland. The dataset contains a total of 5,815 unique reports that are written in English. I
obtain a meta-file that contains additional information on the observations, such as industry, the
country in which the firm is headquartered and whether the firm issues an integrated or
standalone report. Each report in the dataset is one observation14. I provide an overview of the
variables used in the study in Table 1 in the appendix. The data is initially stored by a company
number that Corporate Register provides. After querying the reports by report type, I obtain
two datasets with 3,567 stand-alone reports and 2,248 integrated reports, respectively. In the
dataset, I observe 158 instances where a firm switches from a stand-alone to an integrated report.
I define a switch in report type as a change in the binary variable that I previously assign to each
observation. If a company produces a stand-alone report in one year (type=0) and an integrated
report in the following year (type=1), I recognize this as one instance in which a firm switches
13 https://www.corporateregister.com/ I am not paid by, nor otherwise affiliated with, Corporate Register.
14 An observation in the dataset is defined as an instance where one unique report exists in a year by a company.
report type. While this approach allows me to identify a unique switch from one year to the
next, it does not allow me to separately analyze cases in which a company produces several
reports in one year. A company may produce both an integrated and a stand-alone
report in the same year, especially when it is in the process of switching report type and wishes to
include a ‘bridge year’, i.e. a year in which it produces both types of report. Because I am interested in the thematic
content around firms that switch from a stand-alone to an integrated report, I initially exclude
firms that produce more than one report that can be clearly attributed to an integrated or stand-alone
report. I convert all available reports to text files, using a function to decrypt files for
which the text cannot be readily extracted. Italian and German firms make up the largest proportions
of the dataset (12% and 11%, respectively), followed by Swedish (10%) and French firms (9%).
Interestingly, firms headquartered in Scandinavian countries, on average, report their
sustainability information primarily using an integrated reporting framework, in contrast to firms in
Eastern European countries such as Poland. According to Mitchell et al. (1997), regional
variation in reporting format can generally be explained by management practices and strategic
responses to the corporate environment, in particular to the expectations and pressures of
corporate stakeholders. Governments in Western European countries, particularly Anglo-Saxon ones, are
significantly more active in promoting ESG than governments in other EU countries (Steurer
et al., 2012). A breakdown of the frequency of integrated and stand-alone sustainability reports
by country can be found in Table 2 in the appendix. In total, there are 1,218 unique firms
represented in the sample.
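The switch definition used in this section (a stand-alone report, type=0, followed by an integrated report, type=1, in the next year) can be illustrated with a minimal sketch. The company identifiers, the record layout and the function name `find_switches` are hypothetical and do not reflect the structure of the Corporate Register meta-file; company-years with both report types are simply skipped here, which is a simplification of the exclusion rule described above.

```python
from collections import defaultdict

# Hypothetical records: (company_id, year, report_type); 0 = stand-alone, 1 = integrated.
reports = [
    ("A", 2016, 0), ("A", 2017, 1), ("A", 2018, 1),
    ("B", 2016, 0), ("B", 2017, 0),
    ("C", 2018, 0), ("C", 2018, 1), ("C", 2019, 1),  # both types in 2018 ('bridge year')
    ("D", 2015, 0), ("D", 2017, 1),                  # years are not consecutive
]

def find_switches(records):
    """Return (company, year_before, year_after) for stand-alone -> integrated switches."""
    types_by_company_year = defaultdict(lambda: defaultdict(set))
    for company, year, report_type in records:
        types_by_company_year[company][year].add(report_type)

    switches = []
    for company, years in sorted(types_by_company_year.items()):
        for year in sorted(years):
            before = years[year]
            after = years.get(year + 1)  # None if no report in the following year
            # Count a switch only if each of the two consecutive years has a single report type.
            if before == {0} and after == {1}:
                switches.append((company, year, year + 1))
    return switches

switches = find_switches(reports)
print(switches)
```

In this toy data only company A is counted: company C is skipped because of its bridge year, and company D because its two reports are not in consecutive years.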
3.2. Latent Dirichlet allocation and model selection
In this analysis, I rely on a technique known as Latent Dirichlet allocation
(LDA) to investigate how the topics in sustainability reports differ by report type. LDA was
introduced by Blei et al. (2001). Given a text corpus, it models each topic as a probability distribution over
words and assumes that topics are latent patterns of co-occurring words in the corpus.
Any document can be characterized as a mixture of these topics15. The model output, which is
derived probabilistically, consists of the distribution of words within each topic and the distribution
of topics within each document in the corpus. Using an
automatic categorization procedure such as LDA is suitable because it is otherwise difficult to
analyze thousands of reports using traditional manual approaches, such as qualitative coding or
wordlists. In addition, LDA can automatically discover hidden structures in texts and
provides a relatively efficient way to examine large texts. These models have been used in a variety
of applications, including finance and accounting research (Chen et al., 2017)16. By
treating each document, or sustainability report, as a collection of words, I am able to map the
variety of themes that managers cover in both integrated and stand-alone sustainability
reports. Each topic (or subject) can then be understood as a distribution across all
observed words in the corpus, and words strongly associated with a document's dominant topics
can even be traced back to their original reports. The major characteristic distinguishing topic
models from other machine learning approaches is that they offer an automated process for
classifying the content of texts into several significant topics. Applying LDA to
different report types is appealing because the algorithm can complete the classification with very
little assistance from the researcher. Specifically, the number of topics is the only material input
the researcher needs to specify when running LDA (Hoberg and Lewis, 2017).
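For readers who prefer a formal statement, the description above corresponds to the standard generative view of LDA, sketched below. The notation is the conventional one from the topic modeling literature and is an assumption on my part, not notation used elsewhere in this paper: K is the number of topics, D the number of reports, N_d the number of words in report d, and α and β are Dirichlet hyperparameters.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K
  && \text{(word distribution of topic } k\text{)} \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D
  && \text{(topic mixture of report } d\text{)} \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d
  && \text{(topic of the } n\text{-th word)} \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
  &&&& \text{(observed word)}
\end{align*}
```

Only the words w are observed. Estimation recovers the latent φ (topics as distributions over words) and θ (reports as mixtures of topics), with K chosen by the researcher.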
For the computation of the topics, I rely on the MAchine Learning for LanguagE
Toolkit (MALLET). To create its topic models, MALLET uses an implementation of Gibbs sampling, a statistical technique
designed to quickly construct a sample distribution (Graham et al., 2012).
An overview of Gibbs sampling can be found in Exhibit 1 in the appendix. I use an iterative
approach and build several models in which I adjust the number of topics, and select the model
that describes the corpus best. In each iteration, I vary the number of topics using
15 For instance, an LDA analysis of stand-alone reports might uncover two topics: one with the terms ‘emissions’, ‘weather’, ‘forest,’ and the other with the words
‘employee’, ‘health’ and ‘work’. LDA calculates the likelihood that the term ‘climate’ will be related to the term ‘emissions’, ‘weather’, ‘forest’ resulting in a topic
that can be labeled as an environmental topic.
16 The method has a long pre-history that includes early Latent Semantic Indexing (LSI) research by Deerwester et al. (1990) and Hofmann's (1999) probabilistic
Latent Semantic Indexing (pLSI) approach (Mohr & Bogdanov, 2013). I provide more information in part two.
MALLET's built-in function with standard hyperparameters. The challenge in selecting the
number of topics is usually to find the best trade-off between topic relevance and
topic specificity while maintaining analytical interpretability (Yao et al., 2009). The topics in
models with fewer topics are usually broad and blend words from clearly distinct themes into a
single topic. On the other hand, topic models with more topics provide more distinguishable
subjects, but they can also be too narrowly focused. While several evaluation procedures for
topic models have been established to determine the optimal number of topics, I rely on a
coherence measure as proposed by Röder et al. (2015) and used in previous studies (Mimno et
al., 2011)17.
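To make the idea of coherence-based model selection concrete, the following sketch scores candidate models with the UMass coherence of Mimno et al. (2011), which relies only on document co-occurrence counts. This is a deliberately simpler measure than those compared in Röder et al. (2015) and is not necessarily the variant used in this paper; the toy documents, the candidate topics and all function names are hypothetical.

```python
import math

def umass_coherence(top_words, documents):
    """UMass coherence: sum over word pairs of log((D(w_i, w_j) + 1) / D(w_j))."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words that never occur in the reference corpus
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score

def mean_coherence(topics, documents):
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)

# Toy tokenized corpus and the top words of two hypothetical candidate models.
docs = [
    ["emission", "carbon", "climate"],
    ["emission", "carbon", "energy"],
    ["employee", "health", "safety"],
    ["employee", "health", "carbon"],
]
candidates = {
    2: [["carbon", "emission"], ["employee", "health"]],
    3: [["carbon", "health"], ["employee", "emission"], ["climate", "safety"]],
}

scores = {k: mean_coherence(topics, docs) for k, topics in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Higher (less negative) values indicate that a topic's top words tend to appear in the same documents. In practice the candidate topics would come from fitted models with different numbers of topics rather than being typed in by hand.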
In line with Atkins et al. (2012) and Bao & Datta (2014), I provide an intuitive label for
the topic outcomes by reading the high-probability words in key topics. I assign the label based
on whether a topic can be considered to relate to environmental, social or governance matters.
I also mark topics with ‘no clear assignment’ when I cannot readily identify an ESG topic18. In
general, I can label topics on their E, S, and G dimensions after reading through the terms in the
topics. Although this requires subjective human judgement, it does not involve extensive
labelling, since very often the terms ‘social’, ‘governance’ or ‘environmental’ appear among the 15
highest-ranked words of a topic. As is common practice when using topic models, it
is helpful to create a variety of candidate topic models before using a validation procedure to
choose the model that best fits the particular research question (Griffiths et al., 2007). Unlike
Goloshchapova et al. (2019), who translate all non-English words into English, I follow
Huang et al. (2018) and remove non-English reports from my dataset entirely.
17 While I primarily use coherence to gauge the model, I follow previous research and use a visual tool (LDAvis) developed by Sievert and Shirley
(Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. https://doi.org/10.13140/2.1.1394.3043) to get an impression of the overlap of
the topics. Chang et al. (2009) caution against selecting the optimal topic parameters
solely by this approach. LDAvis therefore serves for validation purposes only, and I rely primarily on coherence to select the model.
I also check the top words generated for each topic to make sure they are comprehensible.
18 AlSumait et al. (2009) follow a similar approach and label as ‘junk topics’ those topics that are overly broad themes made up of generic terms or common adjectives
(AlSumait, L., Barbará, D., Gentle, J. E., & Domeniconi, C. (2009). Topic Significance Ranking of LDA Generative Models. ECML/PKDD).
3.3. Pre-processing
The words in the text corpora, which are generally sequences of (Unicode) letter characters,
are the fundamental units of a text. Before LDA can be applied, each
corpus must be pre-processed. For the pre-processing I use a commonly used library (spaCy),
an open-source software package for professional text processing (Jugran et al., 2021). The pre-processing
includes tokenizing the text, lowercasing the words, and
removing punctuation and letter accents. I also lemmatize each word, which is the process of
converting a word to its base form, or lemma, by removing just the inflectional endings (Khyani
& B S, 2021). Words like ‘used,’ ‘using,’ and ‘uses’ are then transformed to ‘use.’ Both integrated
and stand-alone reports may include words that, when used together, take on a distinct
meaning (Jurafsky & Martin, 2021)19. I therefore allow such word groups to be treated
as one instance in the resulting topics20. I next remove stop words that are
used for connection and grammar but carry little or no meaning. Such words are, for instance,
‘the’, ‘which,’ ‘on,’ and ‘in’. I rely on pre-existing libraries and
follow common practice to remove these words for the analysis (Maier et al., 2018). In addition,
I compute word frequencies for the ten most frequent words separately for integrated and
stand-alone reports and add them to the list of potential stop words. For example, words
such as ‘report’ or ‘page’ may apply to all reports in the sample and convey little meaning. I
therefore add words such as ‘report’, ‘page’ or ‘year’ to the list of stop words. An overview
of the high-frequency words can be found in the appendix. I also add all company names to
the list of stop words for the same reason. As in Goloshchapova et
al. (2019), some common but non-recognizable words, such as ‘ve,’ ‘re,’ etc., appear as words in
the topic results, so I remove them in an iterative process, too. Finally, I specify word attributes
19 An example of a trigram would be ‘chief financial officer’.
20 These groups of 'n' words can then appear as one single instance in the topic output. For implementation I use Gensim’s Phrases model to identify
and build the bi- and trigrams, respectively. The two important arguments to the Phrases model are min_count and threshold. The higher the
values of these parameters, the harder it is for words to be combined into bi-/trigrams. I initially leave these parameters at their standard
setting.
that should be allowed as input parameters for the topic model. In particular, I only allow nouns,
adjectives and verbs to be considered as input data. After these steps I create a dictionary with
the remaining words. With the dictionary I am able to compute the topic results.
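Part of the pre-processing chain described in this section can be sketched with the Python standard library alone. The sketch covers lowercasing, accent and punctuation removal, stop-word filtering and the extension of the stop list by the most frequent corpus words. Lemmatization, n-gram detection and part-of-speech filtering are omitted because they require spaCy and Gensim. The stop list, the sample sentences and the parameter `top_k` are hypothetical (the paper adds the ten most frequent words per report type).

```python
import re
import unicodedata
from collections import Counter

# Tiny illustrative stop list; a real analysis would use a library-provided list.
BASE_STOP_WORDS = {"the", "which", "on", "in", "and", "of", "a", "to", "our"}

def tokenize(text):
    """Lowercase, strip letter accents, and keep alphabetic tokens only."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z]+", without_accents)

def preprocess(raw_docs, top_k=2):
    """Remove base stop words, then drop the top_k most frequent corpus words as well."""
    tokenized = [[t for t in tokenize(doc) if t not in BASE_STOP_WORDS] for doc in raw_docs]
    counts = Counter(token for doc in tokenized for token in doc)
    corpus_stop_words = {word for word, _ in counts.most_common(top_k)}
    cleaned = [[t for t in doc if t not in corpus_stop_words] for doc in tokenized]
    return cleaned, corpus_stop_words

raw_docs = [
    "The report covers emissions in Zürich.",
    "Our report on employee health and the café.",
    "Emissions report: page 3 of the report.",
]
cleaned, extra_stop_words = preprocess(raw_docs)
print(extra_stop_words)
print(cleaned)
```

Deriving the additional stop words from corpus frequencies, rather than fixing them in advance, mirrors the reasoning in the text: words such as ‘report’ occur in nearly every document and therefore do not help to separate topics.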
4. Results
4.1. Results of the descriptive analysis
Table 1 shows both integrated and stand-alone reports based on their framework
adherence. A detailed breakdown of the guideline adherence in the dataset can be found in
Panels A and B of Table 1. As can be seen in the table, firms that include their non-financial
information in their financial statement generally adhere to GRI standards. When a company
issues a separate stand-alone report, it always follows GRI standards21. When a firm issues an
integrated report, it is more likely to follow IIRC guidelines (36.6%) than when it issues
a stand-alone report (4.5%). The latter figure may come as a surprise, since companies following IIRC
standards would not be expected to issue a separate sustainability report. However, on closer
inspection of these reports, a company may explicitly declare that the separate report includes
some aspects of the IIRC framework. Corporate Register therefore marks them as following IIRC
guidelines.
Table 1: Integrated and standalone reports by GRI and IIRC adherence

Panel A: GRI adherence
                      No               Yes              Total
Integrated Report     464 (20.6%)      1,784 (79.4%)    2,248 (100%)
Stand-alone Report    0 (0%)           3,567 (100%)     3,567 (100%)
Total                 464              5,351            5,815

Panel B: IIRC adherence
                      No               Yes              Total
Integrated Report     1,425 (63.4%)    823 (36.6%)      2,248 (100%)
Stand-alone Report    3,405 (95.5%)    162 (4.5%)       3,567 (100%)
Total                 4,830            985              5,815

Notes: The table provides information on the GRI and IIRC adherence of a company in the entire dataset. The
adherence is defined by a company indicating their compliance in the report and categorized accordingly by Corporate
Register.
21 This result may stem in part because of Corporate Register’s past affiliation with the official GRI Register.
The company started its operations, by collecting only stand-alone reports that follow GRI guidelines. More
information can be found on: CorporateRegister.com About the GRI Register
Figure 1 shows the proportion of integrated reports in relation to the total number of
reports over time and by industry. As can be seen, the share of companies that issue an integrated
report steadily increases, starting at 36.9% in 2015 and surpassing 50% at the end of 2021 and
the beginning of 2022. Panel B of Figure 1 presents both integrated and stand-alone reports
issued by companies across industries. As can be inferred from the graph, the industries
most strongly represented in the sample are industrial firms, followed by
financial companies and consumer goods. Firms in the health care, basic materials, telecommunications
and financial sectors, on average, disclose their sustainability information
using an integrated framework. More than half of the companies (53%) issue a report that is
assured by an independent third party.
Figure 1: Integrated and standalone reports over time and industry
[Figure 1 image: Panel A (reports over time) and Panel B (reports by industry)]
Notes: Figure 1 shows the absolute number of integrated and stand-alone sustainability reports over time (Panel A). At the time
of the data collection, not all companies had issued an integrated (stand-alone) report, which explains the drop in 2022 in Panel
A. Panel B shows the number of reports by report type and industry. The industries most strongly represented in the sample are
industrial firms, followed by financial companies and consumer goods. Telecommunications and Health Care are represented the
least. The figure is based on 5,815 observations, of which 2,248 reports are categorized by Corporate Register as integrated reports
and 3,567 reports are labeled as stand-alone sustainability reports.
A company may issue more than one report in a given year. For instance, a firm
may produce an integrated report and voluntarily issue a separate
sustainability report in the same year. Table 2 indicates, in absolute terms, whether companies produce several
reports. In total, I observe 340 companies (5.85%) that issue two reports and 27 firms that
produce more than two reports in a single year. A company that issues several reports may, for
instance, issue a separate sustainability report and a GRI content index. While the constellations
may differ in terms of content, this shows the general heterogeneity among companies in the EU,
which are able to issue an array of different reports. It also happens that a company
issues two similar stand-alone reports that both contain ESG information. One
Italian company in the Oil and Gas industry, for instance, has two separate reports, labeling one
“sustainability performance” and the other “sustainability report”, with some, but only marginal,
differences in the information presented. An overview of companies that produce one, two or
three reports is presented in Table 2.
Table 2: Companies issuing more than one report

Reports per year    Integrated    Stand-alone    Total    Percent
1                   2,099         3,349          5,448    94.38
2                   140           200            340      5.85
3                   9             18             27       0.46
Total               2,248         3,567          5,815    100.00
Notes: The table shows observations at the company level of firms issuing one, two or three reports in a single year.
It indicates how many reports are issued by a company. For instance, there are 9 instances in which a firm discloses an
integrated report after having issued another integrated report. Figures are in absolute terms. In summary, there
are a total of 367 firms that issue two or more reports in the same year.
Table 3 reports textual attributes of integrated and stand-alone reports. In particular, I compute for
each observation the total number of pages as well as the word count and readability scores. Table 3
shows that, on average, integrated reports are longer and generally show lower readability, as measured
by a higher (lower) Gunning Fog (Flesch Reading Ease) index. This result is generally in line with
previous literature (du Toit, 2017; Stone & Lodhia, 2019). For firms switching report type from one
year to the next, I report results of the descriptive analysis in Panel B of Table 3. While the average
page length does not seem to be very different, the differences in readability indices seem to be larger.
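As background to the two readability indices, the sketch below applies the textbook formulas: Gunning Fog = 0.4 × (words per sentence + 100 × share of complex words) and Flesch Reading Ease = 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The syllable counter is a crude vowel-group heuristic and the sentence splitter is naive; both are assumptions made for illustration, so the scores will not match those of dedicated readability software or the values reported in Table 3.

```python
import re

def count_syllables(word):
    """Crude heuristic: one syllable per group of consecutive vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    """Return (gunning_fog, flesch_reading_ease) for a non-empty English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    # 'Complex' words are approximated as words with three or more syllables.
    complex_words = sum(count_syllables(w) >= 3 for w in words)

    words_per_sentence = len(words) / len(sentences)
    gunning_fog = 0.4 * (words_per_sentence + 100 * complex_words / len(words))
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables / len(words)
    return gunning_fog, flesch

simple_text = "We cut waste. We save water."
dense_text = ("Stakeholder materiality considerations necessitate "
              "comprehensive sustainability disclosures.")

fog_simple, flesch_simple = readability(simple_text)
fog_dense, flesch_dense = readability(dense_text)
print(fog_simple, flesch_simple)
print(fog_dense, flesch_dense)
```

The two indices move in opposite directions: longer sentences and longer words raise the Gunning Fog score and lower the Flesch Reading Ease score, which is why the text speaks of a higher (lower) index for less readable reports.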
Table 3: Descriptive analysis of integrated and standalone reports

Panel A
                                                  Readability                           Word count
Report type      Total reports   Average pages    Gunning Fog   Flesch Reading Ease    Before    After
Integrated       2248            211              218.05        -431.05                2.756     0.048
Stand-alone      3567            87               73.31         -72.18                 1.350     0.075
Difference                                        144.74***     358.57***
                                                  (<0.001)      (<0.001)
Total            5815            135                                                   4,106     124,094

Panel B
                                                  Readability                           Word count
Report type      Total reports   Average pages    Gunning Fog   Flesch Reading Ease    Before    After
Pre-switch       158             85               69.2          -73.1                  59.92     0.015
(stand-alone)
Post-switch      158             215              304.02        -649.17                189.97    0.011
(integrated)
Difference                                        234.82***     576.07***
                                                  (<0.001)      (<0.001)
Total            316             150                                                   249.89    26,210
Notes: Panel A shows descriptive statistics for all 2,248 (3,567) integrated (standalone) reports in the sample. The Gunning
Fog and Flesch Reading Ease indices are used to measure the readability of the reports before pre-processing. Exhibit 1 provides background on
the readability indices. Panel B shows textual characteristics for all instances in which a firm switches report type. The last two columns report the
word counts before and after pre-processing, respectively. Word counts are in millions
of words. The null hypothesis of the two-sided t-test is that there are no differences. *** indicates a statistically significant difference at
the 1% level. P-values are in parentheses.
4.2. Results of the topic model
Tables 8 and 9 in the appendix report the outcomes of the topic model for stand-alone and
integrated reports, respectively. The topic order has no significance. I start with the interpretation
of the topics for stand-alone reports before discussing the outcomes for integrated reports. For
stand-alone reports, the optimal number of topics proves to be 36. I can assign an ESG label to
28 topics (77%) for the 3,567 stand-alone reports. In general, some topics discuss an ESG theme
more prominently than others. After labeling the topics along the ESG dimensions, I obtain
17 topics that I can link to environmental matters, 7 topics that are social and 4 topics that are governance
related. In other words, a rather large proportion of environmental themes runs
through the reports. Topic 2, for instance, is a good example of how firms disclose
information on environmental matters. The topic includes words such as ‘climate’,
‘sustainable’, ‘carbon’ and ‘target’, which can be clearly linked to a company discussing its climate-related
targets. Some of the environmental topics appear to be disclosed in relation to packaging,
as topic 17 shows. Words such as ‘paper’, ‘recycle’, ‘waste’, ‘renewable’ and ‘plastic’ appear
together with words such as ‘circular’ or ‘sustainable’. Yet other environmental topics are
disclosed in relation to natural resources, as topic 31 illustrates. Words such as ‘food’, ‘water’ and
‘plant’ appear together with ‘biodiversity’, ‘supply’ and ‘sourcing’. In terms of topics that are
related to governance, I observe words such as ‘compliance’ and ‘policies’ that appear in
conjunction with gender (topic 36) or employee wellbeing (topic 1). While governance-related
topics may include terms regarding responsibility, they also have a high likelihood of appearing
with words such as ‘performance’, ‘targets’ or ‘results’ (topic 24). This tendency can also be seen,
albeit to a lesser extent, in topics around social matters. Topics on societal matters
generally include words such as ‘community’, ‘human’ and ‘people’, but may also appear in
conjunction with marketing-related words such as ‘brand’, ‘market’ or ‘sale’. Yet other topics
hardly contain any ESG-related terms at all. Examples are topic 9, which is on real estate and has words such as
‘building’, ‘construction’, ‘property’, ‘office’, etc., or topic 22, which is on infrastructure
containing words such as ‘networks’, ‘communication’, ‘security’ and ‘platform’. Finally, some
topics may at first be attributed to one of the three ESG categories.
I next turn to the topic outcomes of the 2,248 integrated reports and report the results in
Table 9. This time, I add for each identified topic the dominant file that contributes the
largest percentage to that topic. For integrated reports, the optimal number of topics is 35.
Unlike for stand-alone reports, for which I could clearly provide a label for 28 topics, I am
only able to label 16 topics (46%). After labeling the topics along the three ESG dimensions,
I obtain 6 environmental topics, 4 social topics and 6 governance topics. In general,
I note that the topic outcomes for integrated reports are at first sight more specific. In particular,
the LDA algorithm seems to better identify themes on individual subjects, albeit subjects other
than ESG topics. There are several examples: Topic 14 discusses mobility and includes words
such as ‘car’, ‘rental’ or ‘vehicle’. Topic 28 discusses mining operations and has words such as
‘gold’ and ‘mineral’, but also ‘african’ and ‘reserve’. As for the topics that I label as
environmental, I find that they predominantly consist of words on energy (topic
27) and resource sourcing (topic 13), but also of performance- and investment-related terms
(topic 11). I find that topics on societal matters are relatively underrepresented and, if they
appear, they make up a rather low proportion of the overall topic. One example is topic 2, which
includes words such as ‘director’, ‘executive’, ‘chairman’. In general, ESG topics cannot be
identified as readily as in the outcomes for stand-alone reports. The topic that features the
social dimension the most is perhaps topic 12. While the topic includes words such as ‘customer’
and ‘employee’, it also features ‘environmental’ and ‘governance’ and can therefore not
be allocated specifically to social topics only. In summary, topics on social matters seem to be
underrepresented in the results for integrated reports, relative to stand-alone reports.
Table 9 also shows the document that contributes the largest proportion to each topic.
This allows me to further analyze to which industry and company the report belongs and in
which year it was published. The contribution in percent can be understood as the proportion
of the words within the report that influence the establishment of the topic. Because any report
in LDA can be described as a combination of topics, this approach is useful for investigating a
report that predominantly features one specific topic. The higher a topic's contribution, the more frequently it will appear as the main topic in a
document. The dominant report for topic 5, for example, is from an energy company that
focuses on renewable energy and therefore contributes the most to words such as
‘electricity’, ‘power’, ‘renewable’ etc. While tracing topics back to their original documents can
be used to drill down on the determinants of topics, I am more interested in the change of topics
when a firm switches from a stand-alone report to an integrated report.
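Tracing a topic back to its dominant report, as described above, amounts to a column-wise maximum over the document-topic proportions. The sketch below uses made-up report names and proportions; it does not reproduce the values in Table 9, and the function name `dominant_documents` is hypothetical.

```python
# Hypothetical document-topic proportions (each row sums to 1), as a topic model might export them.
doc_topics = {
    "energy_co_2019": [0.70, 0.20, 0.10],
    "bank_2020":      [0.10, 0.15, 0.75],
    "retailer_2018":  [0.25, 0.60, 0.15],
    "utility_2021":   [0.55, 0.30, 0.15],
}

def dominant_documents(doc_topics):
    """For each topic index, return the report with the largest proportion and that proportion."""
    n_topics = len(next(iter(doc_topics.values())))
    result = {}
    for k in range(n_topics):
        # Pick the report whose k-th topic proportion is largest.
        doc, weights = max(doc_topics.items(), key=lambda item: item[1][k])
        result[k] = (doc, weights[k])
    return result

dominant = dominant_documents(doc_topics)
for topic, (doc, share) in dominant.items():
    print(f"topic {topic}: {doc} ({share:.0%})")
```

Joining the resulting report identifiers with report metadata (industry, country, year) then gives the kind of attribution discussed in the text.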
4.3. Results of firms switching report type
I next report the results for the reports in which I observe a change in report type. In total, I
observe 158 instances in which firms change from a stand-alone to an integrated report. For the
reports before a company switches (stand-alone), I obtain 31 topics as the ideal number.
For the integrated reports that follow a stand-alone report, I compute 21 topics to be the
optimum. I report the results of the topic model before and after the switch in Tables 10 and 11
in the appendix.
For stand-alone reports that precede an integrated report in the following year, I can
label 5 topics as social, 10 topics as environmental and 8 topics as governance topics. I am
not able to assign 6 of the topics to an ESG dimension and mark them accordingly. After the
switch, I can only label 6 topics as environmental, 3 as governance and 3 as social. Based
on the words that appear in the remaining 9 topics, I am not able to readily assign an ESG
label. Analogous to the results in Tables 8 and 9, some topics can be considered highly distinctive
(topic 22, topic 16). While I am not able to provide a distinct ESG label for some of the topics,
they are more likely to be attributable to an ESG topic prior to a switch. One example is the
topic on corporate social responsibility, topic 13. While the topic includes words such as ‘energy’
or ‘production’, they cannot be directly related to the environmental dimension, primarily because
of other words such as ‘ethic’ or ‘risk’. In Table 10, I also report the individual reports with
the largest contribution. Topic 16, which I label as an environmental topic, contains the
words ‘emission’ and ‘carbon’, but also ‘fuel’ and ‘aircraft’. The dominant report contributes 36.39% to this
topic. A closer look reveals that the company issuing this report before the switch is
a Swedish company in the airline industry. Similarly, the dominant report for topic 22, with words such as ‘sea’,
‘farming’, ‘fish’ and ‘salmon’, is by a Norwegian firm operating in the fishing sector. In summary,
the corpus, when constructed of stand-alone reports, is more likely to contain
topics across the ESG dimensions, whereas it is less likely to contain topics around social issues
when the firm issues an integrated report.
5. Discussion and Conclusion
Although the NFRD aims to address the lack of materiality that users of sustainability
reports had previously criticized, it does not specify to what extent thematic contents have to
be represented in either of the two reporting types. Focusing on materiality in the reporting
of non-financial information is considered an important as well as effective remedy against
the threat of information overload and managers cherry-picking information. In some cases,
especially when ESG is understood as a corporate response to societal demands, corporate
sustainability reporting takes on a voluntary nature, creating larger differences in the information
available to users of the disclosure (Steurer et al., 2012). While combining financial with non-financial
information in a single report is said to improve corporate disclosure and
transparency, there is little empirical evidence, particularly from a textual analysis perspective,
supporting this argument. This study uses topic modeling to address this issue and contributes
to the literature by analyzing 2,248 integrated and 3,567 stand-alone reports in terms of their thematic content
and textual attributes. In addition, I identify 158 instances where a company switches from
voluntarily providing a stand-alone sustainability report to issuing an integrated report. I find
that, in a set of stand-alone sustainability reports, companies disclose relatively more topics
across all three ESG dimensions than when they decide to issue their non-financial information
with their financial statements (i.e. produce an integrated report). When a company combines
non-financial with financial information, it is possible that the report is tailored to providers of
financial resources and therefore more likely to be read by investors than other stakeholders
(Alessandro et al., 2018). In addition, managers may be unaware or ignorant about a particular
piece of non-financial information (Matsumoto et al., 2011). This may lead companies to decide
not to disclose on specific topics, such as societal challenges.
5.1. Limitations and future research
Despite its advantages, the use of topic modeling is not without limitations. A primary
limitation, shared with other unsupervised learning methods, is that of optimal model selection
(Lewis and Grossetti, 2022). I rely on a single measure (coherence) to determine the optimal
number of topics. While visualizing the topics using LDAvis can help improve model selection,
I use it only for exploratory purposes. The manual label I provide for each topic is not without
limitations either, since this manual procedure is subject to my personal judgement (and error).
In some instances, it is difficult to provide a label based on an ESG dimension. When a topic
contains words such as ‘employee’ and ‘community’ but also ‘water’ and ‘climate’, it is unclear
whether to categorize the topic as an environmental or a social topic. In the cases where I was
unsure, I relied on the first words, which generally carry a higher weight in describing
the topic. This limitation may be overcome by having other researchers label the topics. I also
acknowledge that the topics may not capture all the information present in the reports.
LDA takes only textual information as input and does not allow for tables and
figures to be used as input data. Figures that stand out with colors, however, may ensure that
key non-financial information is communicated in a visually appealing way (Eccles, 2010). Images
and graphs may also influence the perception of end-users and improve the usefulness for
stakeholders relying on this information. In addition, companies may use their corporate
websites to make information easily accessible to a broader audience. Here, too, I am unable to
provide insight into how this information is presented and used by stakeholders. To better
understand the textual differences between report types, further studies should aim at
taking all available disclosure channels into account to judge the effectiveness of corporate sustainability
information. In addition, finding ways to better interpret topic outcomes by applying wordlists
to compare the output seems promising. Baier et al. (2018) offer such a word list, which
can be a starting point for further research.
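The word-list idea can be illustrated with a minimal Python sketch. It is not the labelling procedure used in this study, and the three mini word lists, the function name `label_topic`, the rank weighting 1/(r+1) and the `min_margin` threshold are all hypothetical choices made for illustration; a real application would substitute a published list such as the one by Baier et al. (2018).

```python
from collections import defaultdict

# Hypothetical mini word lists, for illustration only. They are NOT the
# word list of Baier et al. (2018).
ESG_WORDLISTS = {
    "Environmental": {"water", "climate", "carbon", "waste", "fuel", "emission"},
    "Social": {"employee", "community", "health", "patient", "training", "diversity"},
    "Governance": {"policy", "compliance", "board", "code", "conduct", "audit"},
}


def label_topic(terms, wordlists=ESG_WORDLISTS, min_margin=0.1):
    """Assign an ESG label to a ranked list of topic terms.

    Terms are assumed to be sorted by importance, so the term at rank r
    (0-based) receives weight 1 / (r + 1). The label with the highest
    weighted share wins, unless its lead over the runner-up is smaller
    than `min_margin`, in which case the topic is left unassigned.
    """
    scores = defaultdict(float)
    for rank, term in enumerate(terms):
        for label, words in wordlists.items():
            if term in words:
                scores[label] += 1.0 / (rank + 1)
    total = sum(scores.values())
    if total == 0:
        return "No clear assignment"
    ranked = sorted(((s / total, label) for label, s in scores.items()), reverse=True)
    if len(ranked) > 1 and ranked[0][0] - ranked[1][0] < min_margin:
        return "No clear assignment"
    return ranked[0][1]


# A mixed topic of the kind described above: social and environmental terms.
print(label_topic(["employee", "water", "community", "climate"]))
# A stricter margin leaves the same topic unassigned.
print(label_topic(["employee", "water", "community", "climate"], min_margin=0.5))
```

Weighting by rank mirrors the rule of relying on the first words of a topic, while the margin makes explicit when a topic is too mixed to label.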
5.2 Concluding remarks
Under current EU regulations, firms of a certain size are required to disclose their
sustainability information in their management reports but are generally free to decide how they
do so or which guidelines to follow. Companies can either issue their non-financial information
with their financial statement or disclose a stand-alone sustainability report. Sustainability
reports in the EU can vary on a variety of dimensions, including length and information content
such as the topics discussed. Analyzing differences in sustainability disclosure is important
because they can have direct and indirect effects on stakeholders (Roberts, 1992). The
requirement to disclose material issues related to environmental policies and risk management
is of particular relevance, not least because of its potential to trigger better internal sustainability
awareness, better resource management and usefulness for end-users with little prior familiarity
with reading corporate disclosures (European Commission, 2017). Using topic modeling, a generative
approach that can go beyond the conventional boundaries of qualitative analysis, this study
provides insights on topics that, given the large number of topics, may otherwise go undetected.
Corporate sustainability reports cover a wide range of topics, such as the company’s current and
future ESG activities, adherence to regulatory guidelines and their environmental and societal impact.
In this analysis, I show that firms producing an integrated report disclose fewer topics on ESG
matters, especially on social issues. In addition, I show that the effect could potentially
be stronger when a firm switches from a stand-alone report to an integrated reporting format.
Finally, readability generally decreases after the switch, which may leave users of sustainability
reports without the information they need or desire.
References
Alessandro, L., Melloni, G., & Stacchezzini, R. (2018). Integrated reporting and narrative
accountability: the role of preparers. Accounting, Auditing & Accountability
Journal, 31. https://doi.org/10.1108/AAAJ-08-2016-2674
AlSumait, L., Barbará, D., Gentle, J. E., & Domeniconi, C. (2009). Topic Significance
Ranking of LDA Generative Models. ECML/PKDD,
Atkins, D., Rubin, T., Steyvers, M., Doeden, M., Baucom, B., & Christensen, A. (2012).
Topic Models: A Novel Method for Modeling Couple and Family Text Data.
Journal of family psychology : JFP : journal of the Division of Family Psychology
of the American Psychological Association (Division 43), 26, 816-827.
https://doi.org/10.1037/a0029607
Baier, P., Berninger, M., & Kiesel, F. (2018). Environmental, Social and Governance
Reporting in Annual Reports: A Textual Analysis. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.3206751
Bao, Y., & Datta, A. (2014). Simultaneously Discovering and Quantifying Risk Types
from Textual Risk Disclosures. Management Science, 60, 1371-1391.
https://doi.org/10.1287/mnsc.2014.1930
Baumüller, J. (2018). Ziele und Inhalte der nichtfinanziellen Berichterstattung. 2, 94.
Baumüller, J., & Schaffhauser-Linzatti, M.-M. (2018a). In search of materiality for
nonfinancial information—reporting requirements of the Directive 2014/95/EU.
NachhaltigkeitsManagementForum | Sustainability Management Forum, 26(1),
101-111. https://doi.org/10.1007/s00550-018-0473-z
Baumüller, J., & Schaffhauser-Linzatti, M.-M. (2018b). In search of materiality for
nonfinancial information—reporting requirements of the Directive 2014/95/EU.
NachhaltigkeitsManagementForum | Sustainability Management Forum, 26.
https://doi.org/10.1007/s00550-018-0473-z
Bellstam, G., Bhagat, S., & Cookson, J. A. (2021). A Text-Based Analysis of Corporate
Innovation. SPGMI: Compustat Fundamentals (Topic).
Blei, D., Ng, A., & Jordan, M. (2001). Latent Dirichlet Allocation (Vol. 3).
Brown, N., Crowley, R., & Elliott, W. (2019). What Are You Saying? Using topic to
Detect Financial Misreporting. Journal of Accounting Research, 58.
https://doi.org/10.1111/1475-679X.12294
Casella, G., & George, E. I. (1992). Explaining the Gibbs Sampler. The American
Statistician, 46(3), 167-174. https://doi.org/10.1080/00031305.1992.10475878
Chae, B., & Park, E. (2018). Corporate Social Responsibility (CSR): A Survey of Topics
and Trends Using Twitter Data and Topic Modeling. Sustainability, 10(7), 2231.
Chen, Y., Rabbani, M., Gupta, A., & Zaki, M. (2017). Comparative text analytics via topic
modeling in banking. https://doi.org/10.1109/SSCI.2017.8280945
Christensen, H., Hail, L., & Leuz, C. (2019). Adoption of CSR and Sustainability
Reporting Standards: Economic Analysis and Review. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.3427748
Christensen, H. B., Hail, L., & Leuz, C. (2021). Mandatory CSR and sustainability
reporting: economic analysis and literature review. Review of Accounting Studies,
26(3), 1176-1248. https://doi.org/10.1007/s11142-021-09609-5
Commission, E. (2017). Non-financial Reporting Directive. Retrieved from Non-financial
Reporting Directive (europa.eu)
Cuomo, F., Gaia, S., Girardone, C., & Piserà, S. (2022). The effects of the EU non-
financial reporting directive on corporate social responsibility. The European
Journal of Finance, 1-27. https://doi.org/10.1080/1351847X.2022.2113812
DiMaggio, P., Nag, M., & Blei, D. M. (2013). Exploiting affinities between topic modeling
and the sociological perspective on culture: Application to newspaper coverage of
U.S. government arts funding. Poetics, 41, 570-606.
Dinh, T., Husmann, A., & Melloni, G. (2021). The role of non-financial performance
indicators and integrated reporting in achieving sustainable value creation. In.
European Parliament: European Union, 2021.
Du, S., & Bhattacharya, C. B. (2010). Maximizing Business Returns to Corporate Social
Responsibility (CSR): The Role of CSR Communication. International Journal of
Management Reviews, 12. https://doi.org/10.1111/j.1468-2370.2009.00276.x
du Toit, E. (2017). The readability of integrated reports. Meditari Accountancy Research,
25, 00-00. https://doi.org/10.1108/MEDAR-07-2017-0165
Dye, R. A. (1985). Disclosure of Nonproprietary Information. Journal of Accounting
Research, 23(1), 123-145. https://doi.org/10.2307/2490910
Dyer, T., Lang, M., & Stice-Lawrence, L. (2017). The Evolution of 10-K Textual
Disclosure: Evidence from Latent Dirichlet Allocation. Journal of Accounting and
Economics, 64. https://doi.org/10.1016/j.jacceco.2017.07.002
EC. (2017). Guidelines on non-financial reporting: Supplement on reporting climate-
related information. In: Official Journal of the European Union.
EC. (2021). Study on the non-financial reporting directive : final report. Publications
Office. https://doi.org/10.2874/229601
Eccles, R., Krzus, M., & Solano, C. (2019). A Comparative Analysis of Integrated
Reporting in Ten Countries. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.3345590
Eccles, R. G. (2010). One report : integrated reporting for a sustainable strategy. John
Wiley & Sons.
EU. (2014). Directive 2014/95/EU of the European Parliament and of the Council of 22
October 2014 amending Directive 2013/34/EU as regards disclosure of non-
financial and diversity information by certain large undertakings and groups.
Official Journal of the European Union, 57.
European Commission, T. (2017). Directive 2014/95/EU: Impact assessment
accompanying the original proposal from the Commission. In: European
Commission.
European Union, T. (2013). Directive 2013/34/EU of the European Parliament and of the
Council of 26 June 2013 on the annual financial statements, consolidated financial
statements and related reports of certain types of undertakings. In (Vol. 56). Official
Journal of the European Union: European Union.
European Union, T. (2014). Directive 2014/95/EU of the European Parliament and of the
Council of 22 October 2014 amending Directive 2013/34/EU as regards disclosure
of non-financial and diversity information by certain large undertakings and groups.
Official Journal of the European Union, 57.
Flower, J. (2015). The International Integrated Reporting Council: A story of failure.
Critical Perspectives on Accounting, 27, 1-17.
https://doi.org/10.1016/j.cpa.2014.07.002
Garst, J., Maas, K., & Suijs, J. (2022). Materiality Assessment Is an Art, Not a Science:
Selecting ESG Topics for Sustainability Reports. California Management Review,
00081256221120692. https://doi.org/10.1177/00081256221120692
Goloshchapova, I., Poon, S.-H., Pritchard, M., & Reed, P. (2019). Corporate social
responsibility reports: topic analysis and big data approach. The European Journal
of Finance, 25, 1-18. https://doi.org/10.1080/1351847X.2019.1572637
Graham, S., Weingart, S., & Milligan, I. (2012). Getting Started with Topic Modeling and
MALLET. The Programming Historian. https://doi.org/10.46430/phen0017
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic
representation. Psychological review, 114 2, 211-244.
Hagen, L. (2018). Content analysis of e-petitions with topic modeling: How to train and
evaluate LDA models? Information Processing & Management, 54(6), 1292-1307.
https://doi.org/10.1016/j.ipm.2018.05.006
Hoberg, G., & Lewis, C. (2017). Do fraudulent firms produce abnormal disclosure?
Journal of Corporate Finance, 43, 58-85.
https://doi.org/10.1016/j.jcorpfin.2016.12.007
Huang, A., Lehavy, R., Zang, A., & Zheng, R. (2017). Analyst Information Discovery and
Interpretation Roles: A Topic Modeling Approach. Management Science, 64.
https://doi.org/10.1287/mnsc.2017.2751
Huang, A. H., Lehavy, R., Zang, A. Y., & Zheng, R. (2018). Analyst Information
Discovery and Interpretation Roles: A Topic Modeling Approach. Manag. Sci., 64,
2833-2855.
Humphrey, C., O’Dwyer, B., & Unerman, J. (2017). Re-theorizing the configuration of
organizational fields: the IIRC and the pursuit of ‘Enlightened’ corporate reporting.
Accounting and Business Research, 47(1), 30-63.
https://doi.org/10.1080/00014788.2016.1198683
IIRC. (2013). The International <IR> Framework. In: International Integrated Reporting
Council.
Israelsen, R. (2014). Tell It Like It Is: Disclosed Risks and Factor Portfolios. SSRN
Electronic Journal. https://doi.org/10.2139/ssrn.2504522
Jugran, S., Kumar, A., Tyagi, B., & Anand, V. (2021). Extractive Automatic Text
Summarization using SpaCy in Python & NLP.
https://doi.org/10.1109/ICACITE51222.2021.9404712
Jurafsky, D., & Martin, J. H. (2021). N-gram Language Models. In (3 ed.).
Khyani, D., & B S, S. (2021). An Interpretation of Lemmatization and Stemming in
Natural Language Processing. Shanghai Ligong Daxue Xuebao/Journal of
University of Shanghai for Science and Technology, 22, 350-357.
Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., . . . Adam,
S. (2018). Applying LDA Topic Modeling in Communication Research: Toward a
Valid and Reliable Methodology. Communication Methods and Measures, 12, 1-
26. https://doi.org/10.1080/19312458.2018.1430754
Maniora, J. (2017). Is Integrated Reporting Really the Superior Mechanism for the
Integration of Ethics into the Core Business Model? An Empirical Analysis.
Journal of Business Ethics, 140. https://doi.org/10.1007/s10551-015-2874-z
Matsumoto, D., Pronk, M., & Roelofsen, E. (2011). What Makes Conference Calls Useful?
The Information Content of Managers' Presentations and Analysts' Discussion
Sessions. The Accounting Review, 86(4), 1383-1414.
McCallum, A. K. (2002). MALLET: A Machine Learning for Language Toolkit.
http://mallet.cs.umass.edu.
Melloni, G., & Stacchezzini, R. (2014). Corporate Sustainable Development: Is “Integrated
Reporting” a Legitimation Strategy? Business Strategy and the Environment,
accepted for publication on the. https://doi.org/10.1002/bse.1863
Milla, A., & Haberl-Arkhurst, B. (2018). Wesentlichkeitsanalyse in der nichtfinanziellen
Berichterstattung. In. RWZ.
Mimno, D., Wallach, H. M., Talley, E. M., Leenders, M., & McCallum, A. (2011).
Optimizing Semantic Coherence in Topic Models. Conference on Empirical
Methods in Natural Language Processing,
Mitchell, R. K., Agle, B. R., & Wood, D. J. (1997). Toward a Theory of Stakeholder
Identification and Salience: Defining the Principle of who and What Really Counts.
Academy of Management Review, 22, 853-886.
Mohr, J., & Bogdanov, P. (2013). Introduction-Topic Models: What They Are and Why
They Matter. Poetics, 41, 545–569. https://doi.org/10.1016/j.poetic.2013.10.001
Neumann, B. R., Cauvin, E., & Roberts, M. L. (2012). Management Control Systems
Dilemma: Reconciling Sustainability with Information Overload. In M. J. Epstein
& J. Y. Lee (Eds.), Advances in Management Accounting (Vol. 20, pp. 1-28).
Emerald Group Publishing Limited.
https://doi.org/10.1108/S1474-7871(2012)0000020007
Nguyen, E. (2014). Text Mining and Network Analysis of Digital Libraries in R.
Rupley, K. H., Brown, D., & Marshall, S. (2017). Evolution of corporate reporting: From
stand-alone corporate social responsibility reporting to integrated reporting.
Research in Accounting Regulation, 29(2), 172-176.
https://doi.org/10.1016/j.racreg.2017.09.010
Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics.
https://doi.org/10.13140/2.1.1394.3043
Simnett, R., & Huggins, A. (2015). Integrated reporting and assurance: Where can research
add value? Sustainability Accounting, Management and Policy Journal, 6, 29-53.
https://doi.org/10.1108/SAMPJ-09-2014-0053
Stawinoga, M. (2017). Die Richtlinie 2014/95/EU und das CSR-Richtlinie-
Umsetzungsgesetz – Eine normative Analyse des Transformationsprozesses sowie
daraus resultierender Implikationen für die Rechnungslegungs- und Prüfungspraxis.
uwf UmweltWirtschaftsForum, 25, 213-227.
https://doi.org/10.1007/s00550-017-0463-6
Stawinoga, M., & Velte, P. (2017). Empirical evidence of the disclosure and assurance of
Integrated Reporting - A content analysis of the IIRC Examples Database.
Zeitschrift für Umweltpolitik und Umweltrecht (ZfU), 40, 59-84.
Steurer, R., Martinuzzi, A., & Margula, S. (2012). Public Policies on CSR in Europe:
Themes, Instruments, and Regional Differences. Corporate Social Responsibility
and Environmental Management, 19. https://doi.org/10.1002/csr.264
Stone, G. W., & Lodhia, S. (2019). Readability of integrated reports: an exploratory global
study. Accounting, auditing and accountability., 32(5), 1532-1557.
https://doi.org/10.1108/AAAJ-10-2015-2275
Székely, N., & Brocke, J. v. (2017). What can we learn from corporate sustainability
reporting? Deriving propositions for research and practice from over 9,500
corporate sustainability reports published between 1999 and 2015 using topic
modelling technique. PLOS ONE, 12. https://doi.org/10.1371/journal.pone.0174807
Tirunillai, S., & Tellis, G. (2014). Mining Marketing Meaning from Online Chatter:
Strategic Brand Analysis of Big Data Using Latent Dirichlet Allocation. Journal of
Marketing Research, 51, 463-479. https://doi.org/10.1509/jmr.12.0106
Vitolla, F., & Raimo, N. (2018). Adoption of Integrated Reporting: Reasons and Benefits-
A Case Study Analysis. International Journal of Business and Management, Vol.
13, 244-250. https://doi.org/10.5539/ijbm.v13n12p244
Yao, L., Mimno, D., & McCallum, A. (2009). Efficient methods for topic model inference
on streaming document collections. KDD,
Řehůřek, R., & Sojka, P. (2010). Software Framework for Topic Modelling with Large
Corpora. https://doi.org/10.13140/2.1.2393.1847
Appendix
Table 1: Variable definition
Variable name   Description
company_no      Company number of the data provider that is used.
report_no       A report number provided by Corporate Register that identifies each report.
name            The company name of the firm that issues a report.
country         The country in which the reporting entity is headquartered.
isin            The International Securities Identification Number of a company that I use to
                match each observation with fundamental information of the issuing firm.
industry        Industry Classification Benchmark (ICB) of the firm provided by Corporate
                Register.
Title           The title of the report.
type            Whether a report is an integrated or a stand-alone report.
Year            The year in which the report was published.
gri             Binary variable equal to one if a firm adheres to the Global Reporting Initiative
                (GRI) standard and zero otherwise.
iirc            Binary variable equal to one if a firm adheres to the international integrated
                reporting (IIRC) standard and zero otherwise.
assurance       Whether the report was audited by an independent third party.
file            The file name of the report used to identify each report.
file_text       The file version of the textual file.
language        The language of the report.
Notes: The table provides an overview of the variables that are used in the study, together with a description.
Table 2: Integrated and standalone sustainability reports by country
Country   Integrated   Stand-alone   Total
ITA              101           598     699
GER              149           488     637
SWE              378           229     607
GBR              136           437     573
FRA              403           132     535
ESP              206           237     443
CHE              116           314     430
NLD              225           124     349
FIN              132           190     322
NOR               91           125     216
AUS               69           105     174
GRC               21           138     159
BEL               79            72     151
POL               32            68     100
PRT               27            61      88
DNK               25            51      76
LUX               17            54      71
IRL                3            47      50
ROU                0            25      25
HUN                1            18      19
HRV                8            10      18
CYP                4            13      17
CZE                0            13      13
LTU                4             7      11
MLT                7             2       9
LIE                6             2       8
EST                5             2       7
SVN                2             4       6
ISL                1             1       2
Total          2,248         3,567   5,815
Notes: The table shows the absolute number of integrated and standalone reports in the sample, grouped by country. For
the countries I use three-character codes defined by the International Organization for Standardization (ISO).
Table 2.2: Summary Statistics – fundamental firm information
Variables    Obs     Mean         Sd.          Min          Max          p1           p99
Panel A: integrated
T. Assets    2,011   61,608.157   432,000      100.127      6,730,000    109.57       1,780,000
Cash         2,011   3,989.657    25,446.6     0            359,518      .088         109,000
EBIT         2,011   2,562.479    33,440.033   -454,000     909,000      -3,063.798   58,787
ROA          2,011   .054         .158         -1.891       .901         -.553        .349
Panel B: stand-alone
T. Assets    3,189   771,000      10,500,000   0            3.34e+08     .14          5210,000
Cash         3,189   60,958.486   962,000      -.002        2.79e+07     0            293,000
EBIT         3,189   10,260.403   161,000      -454,000     5799042      -2016        189,058
ROA          3,189   -1.33        46.412       -2,525.309   3817861      -4.944       .329
Notes: The table provides fundamental firm information for the sample. Panel A shows the fundamental annual
information for integrated reports and Panel B shows the information for stand-alone observations. Fundamental annual
information is taken from Compustat Global by matching the firms using their ISINs. Total assets are in millions.
Table 3: Frequently occurring words by report type
Integrated Reports          Standalone Reports
Words        Frequency      Words             Frequency
financial       79,931      report              121,380
report          66,400      employee             91,758
year            64,745      management           87,214
risk            52,632      sustainability       70,981
share           51,084      business             67,822
group           50,929      group                67,629
company         50,773      company              62,278
value           46,752      year                 59,615
board           44,263      risk                 57,787
statement       43,719      use                  57,533
Notes: The table shows the ten most frequently occurring words in the sample, separately for integrated and stand-alone
reports. Computing the most frequent words helps me to extend the list of stop-words and improve the results. As
expected, the most frequent words in integrated reports are largely financial terms such as ‘value’, ‘risk’,
‘statement’ or ‘assets’. In contrast, words such as ‘employee’, ‘sustainability’ and ‘use’ feature most prominently
in stand-alone reports. I compute the high frequency words
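A frequency count of this kind can be sketched in a few lines of Python. The sketch below is only an illustration and not the pre-processing pipeline of this study: the function name `top_words`, the simple lowercase tokenizer and the two toy documents are assumptions.

```python
import re
from collections import Counter


def top_words(documents, n=10, stop_words=frozenset()):
    """Return the n most frequent tokens across documents, skipping stop-words."""
    counts = Counter()
    for text in documents:
        # Naive tokenizer (assumption): lowercase alphabetic runs only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


# Toy documents standing in for report texts (hypothetical).
docs = [
    "The group reports financial risk and the financial statement.",
    "Employee training and sustainability management in the group.",
]
stop = {"the", "and", "in"}

for word, freq in top_words(docs, n=3, stop_words=stop):
    print(word, freq)
```

Inspecting such a list for each report type is one way to spot uninformative high-frequency terms that could be added to the stop-word list before the topic model is estimated.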
Table 4: Integrated and stand-alone sustainability reports by industry
Description          Integrated    Stand-alone   Assured        Total
Industrials          583 (38%)     949 (62%)     800 (52%)      1532
Financials           486 (43%)     655 (57%)     589 (52%)      1141
Consumer Goods       293 (35%)     539 (65%)     464 (56%)      832
Basic Materials      238 (42%)     328 (58%)     292 (52%)      566
Consumer Services    208 (40%)     316 (60%)     271 (51%)      524
Utilities            105 (35%)     196 (65%)     164 (54%)      301
Oil & Gas            76 (32%)      162 (68%)     136 (57%)      238
Technology           82 (31%)      181 (69%)     138 (52%)      263
Health Care          95 (42%)      130 (58%)     107 (48%)      225
Telecommunications   79 (42%)      107 (58%)     135 (73%)      186
Other                3 (43%)       4 (57%)       6 (86%)        7
Total                2248          3567          3102 (53%)     5815
Notes: The table shows the absolute number of integrated and standalone reports by industry. In addition, column
four shows the absolute number of reports that are verified by an external assurer. A visual representation can be
found in the accompanying appendix 1.
Exhibit 1: Latent Dirichlet, Gibbs Sampling and MALLET
Latent Dirichlet Allocation is a topic modeling technique and, unlike Latent Semantic Analysis, a fully
generative model, in which documents are assumed to have been generated according to a per-document
topic distribution (with a Dirichlet prior) and a per-topic word distribution (Řehůřek & Sojka, 2010).
Instead of using these distributions to generate random documents, the objective is to infer the
distributions from the observed documents. A document is a sequence of N words denoted by
$\mathbf{w} = (w_1, w_2, \ldots, w_N)$, where $w_n$ is the nth word in the sequence. A corpus is a
collection of M documents denoted by $D = (\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_M)$. LDA is generally defined as follows:
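$$P(W, Z, \theta, \varphi; \alpha, \beta) = \prod_{i=1}^{K} P(\varphi_i; \beta) \prod_{j=1}^{M} P(\theta_j; \alpha) \prod_{t=1}^{N} P(Z_{j,t} \mid \theta_j)\, P(W_{j,t} \mid \varphi_{Z_{j,t}})$$

where $\alpha$ and $\beta$ define Dirichlet distributions, $\theta$ and $\varphi$ define multinomial distributions, and K is
the number of topics. $Z$ is the vector with topics of all words in all documents. The LHS represents the
total probability of the LDA model. The first product over $P(\varphi_i; \beta)$ denotes the Dirichlet
distribution of topics over terms, the second product over $P(\theta_j; \alpha)$ the distribution over topics.
The third product denotes the probability of a topic appearing in a given document,
$P(Z_{j,t} \mid \theta_j)$, and the probability of a word appearing given a topic,
$P(W_{j,t} \mid \varphi_{Z_{j,t}})$.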
The Gibbs sampler, as used in the MALLET implementation, is a technique for generating random
variables from a (marginal) distribution indirectly, without having to calculate the density (Casella &
George, 1992). In essence, Gibbs sampling is used to avoid difficult computations by replacing them
with a series of simpler calculations. All categorical variables that are dependent on a certain Dirichlet
prior are brought into dependence, and the resulting joint distribution of these variables is a Dirichlet-
multinomial distribution. In this distribution, the conditional distribution of a particular categorical
variable, conditioned on the others, acquires a very basic form, which makes Gibbs sampling much
simpler than it would be otherwise.
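To make the "very basic form" of the conditional distribution concrete, the following is a minimal collapsed Gibbs sampler for LDA in plain Python. It is a textbook-style sketch and not MALLET's optimized implementation; the function name `lda_gibbs`, the hyperparameter defaults and the toy corpus are assumptions. For each word, the topic is resampled with probability proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), where n_dk, n_kw and n_k are the current count statistics excluding that word and V is the vocabulary size.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # n_dk
    topic_word = [[0] * V for _ in range(num_topics)]   # n_kw
    topic_total = [0] * num_topics                      # n_k
    assignments = []

    # Random initialization of the topic assignment of every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the count statistics.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Simple conditional of the Dirichlet-multinomial model.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


# Toy corpus (hypothetical): two environmental and two governance snippets.
toy_docs = [
    ["water", "carbon", "water"],
    ["board", "policy", "board"],
    ["water", "carbon"],
    ["policy", "board"],
]
theta, topic_word, vocab = lda_gibbs(toy_docs, num_topics=2, iterations=50, seed=1)
for row in theta:
    print([round(p, 2) for p in row])
```

Each update only touches three count arrays, which is why the sampler avoids evaluating the full joint density.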
MALLET is used based on the idea of document streaming, i.e., processing corpora document
by document in a memory-independent fashion (Řehůřek & Sojka, 2010). In addition to the output
of the topic model, the Java-based framework offers diagnostic data, such as the topic-specific word
distribution in relation to the corpus distribution (AlSumait et al., 2009). These quantitative traits can
supply extra information to help trace topics back to their documents and are used in subsequent
analyses.22 In other words, from the output it can be inferred which document contributed the largest
proportion to a specific topic. This is also helpful for inspecting reports that have a disproportionately
large influence on a topic, or that appear inconsistent, so that ambiguous topics can be labelled
accordingly. Exhibit 1 in the appendix provides an overview of the technical definition of how topics
are computed.
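Tracing a topic back to its most influential reports can be sketched as follows. The sketch assumes a document-topic matrix with one row of topic proportions per report, which is the kind of output a topic model provides; the function name `top_documents_per_topic`, the report identifiers and the numbers are hypothetical.

```python
def top_documents_per_topic(doc_topic, doc_ids, n=1):
    """For each topic, list the n documents with the largest topic proportion."""
    num_topics = len(doc_topic[0])
    result = {}
    for k in range(num_topics):
        ranked = sorted(
            zip(doc_ids, doc_topic),
            key=lambda pair: pair[1][k],
            reverse=True,
        )
        result[k] = [(doc_id, row[k]) for doc_id, row in ranked[:n]]
    return result


# Hypothetical proportions for three reports and three topics.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
doc_ids = ["report_a", "report_b", "report_c"]

for topic, docs_for_topic in top_documents_per_topic(doc_topic, doc_ids).items():
    print(topic, docs_for_topic)
```

A topic whose top document carries a far larger proportion than the next ones is a candidate for the kind of manual inspection described above.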
Exhibit 2: Readability indices
Gunning Fog index: A standard readability test to determine how easily a document can be read for its
target audience is the Gunning Fog Index. The Gunning fog index measures the readability of English
writing. The index estimates the years of formal education needed to understand the text on a first
reading. A fog index of 12 requires the reading level of a U.S. high school senior (around 18 years old).
The Gunning Fog index is defined as follows:
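$$\text{Fog index} = 0.4 \left[ \left( \frac{\text{words}}{\text{sentences}} \right) + 100 \left( \frac{\text{complex words}}{\text{words}} \right) \right]$$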
Where complex words are those that have three or more syllables but are not compound words.
Compound words are made up of two or more individual words. For instance, the term
"multinational" combines the words "multi" and "national."
Flesch Reading Ease: The Flesch Reading Ease gives a text a score between 1 and 100, with 100 being
the highest readability score. The Flesch readability tests work by considering sentence and word
counts. The mathematical formula underlying the test is as follows:
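$$\text{Flesch Reading Ease} = 206.835 - 1.015 \left( \frac{\text{total words}}{\text{total sentences}} \right) - 84.6 \left( \frac{\text{total syllables}}{\text{total words}} \right)$$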
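Both indices can be computed with a short Python sketch. It is an approximation under stated assumptions rather than the implementation used in this study: syllables are estimated by counting groups of consecutive vowels, sentences are split on terminal punctuation, compound words are not excluded from the complex-word count, and the input text is assumed to be non-empty. All function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def split_text(text):
    """Split text into sentences and alphabetic words with simple regular expressions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return sentences, words


def gunning_fog(text):
    """Gunning Fog index: 0.4 * (words per sentence + 100 * share of complex words)."""
    sentences, words = split_text(text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015 * words/sentence - 84.6 * syllables/word."""
    sentences, words = split_text(text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / len(sentences) - 84.6 * syllables / len(words)


simple_text = "The cat sat. The dog ran."
dense_text = "Sustainability reporting requires considerable organizational coordination."

for sample in (simple_text, dense_text):
    print(round(gunning_fog(sample), 2), round(flesch_reading_ease(sample), 2))
```

Because the syllable count is heuristic, scores from this sketch can deviate from those of dedicated readability packages, and very short or very simple texts can fall outside the usual score range.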
22 I use this tracing back of topics primarily for exploratory purposes.
Figure 2: Inter-topic Distance Map
Panel A
Panel B
Notes: The figure shows a visual representation of the topic overlap used to identify the optimal number of topics (Panel
A). The circles represent the marginal topic distribution. Panel B shows the top-30 most relevant terms for a topic and
includes a representation of the overall term frequency and the estimated term frequency within the selected topic.
Figure 3: Coherence scores for stand-alone and integrated reports
Panel A
Panel B
Notes: The figure shows coherence scores for integrated (Panel A) and stand-alone reports (Panel B) for 85 LDA models
with a varying number of topics. The optimal number of topics identified for stand-alone reports is 39 and for integrated
reports 54. To better gauge the relative degree of coherence across the topic results, I add a marker at a coherence score of
0.42 for comparison.
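Selecting the number of topics by coherence can be sketched as follows. The coherence variant behind the figure is not specified here, so the sketch uses a UMass-style document co-occurrence measure in the spirit of Mimno et al. (2011), averaged over word pairs, purely as an illustration; its values need not be on the same scale as the scores in the figure. The function names, the toy documents and the candidate topics are assumptions.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic, averaged over ordered word pairs."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score, pairs = 0.0, 0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
            pairs += 1
    return score / max(pairs, 1)


def select_num_topics(models, documents):
    """models maps a topic count K to that model's topics (lists of top words)."""
    mean_coherence = {
        k: sum(umass_coherence(t, documents) for t in topics) / len(topics)
        for k, topics in models.items()
    }
    best_k = max(mean_coherence, key=mean_coherence.get)
    return best_k, mean_coherence


# Toy corpus and two hypothetical candidate models.
documents = [
    ["water", "waste", "carbon"],
    ["water", "carbon", "climate"],
    ["board", "policy", "code"],
    ["board", "code", "audit"],
]
models = {
    2: [["water", "carbon"], ["board", "code"]],
    3: [["water", "board"], ["carbon", "code"], ["waste", "audit"]],
}
best_k, scores = select_num_topics(models, documents)
print(best_k, {k: round(v, 3) for k, v in scores.items()})
```

In practice the candidate topics would come from LDA models estimated with different numbers of topics, and the model with the highest mean coherence would be retained.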
Figure 4: Coherence scores for switching firms
Panel A
Panel B
Notes: The figure shows coherence scores for switching firms pre-switch (Panel A) and post-switch (Panel B) for 100 different LDA
models and 158 instances. A pre-switch firm generally issues a standalone report, while a post-switch firm issues an integrated
report. To better gauge the relative degree of coherence across the topic results, I add a marker at a coherence score of 0.42 for
comparison.
Table 8: Topic outcomes for 3,567 stand-alone reports
Topic Terms per Topic ESG label
1
policy, compliance, process, ensure, conduct, standard, assessment, training, health, human, code, approach, performance, provide,
operation, review, key, site, local, supply, stakeholder, level, engagement, practice, base, requirement, support, identify, set, team, chain,
internal, principle, anti, water, issue, development, responsible, regulation
Governance
2
people, support, climate, target, sustainable, carbon, strategy, change, make, continue, community, ensure, focus, provide, approach,
deliver, create, develop, future, progress, reduce, action, opportunity, build, goal, lead, performance, issue, increase, commitment,
challenge, engage, achieve, team, enable, reduction, set, positive, area
Environmental
3
conduct, responsibility, operation, service, reduce, development, good, target, corporate, number, base, environment, code, increase,
country, solution, consumption, area, principle, goal, focus, important, sustainable, develop, make, percent, responsible, personnel, sale,
unit, improve, offer, financial, part, efficiency, cent, chain, term, aim
Environmental
4
approach, topic, annual, governance, disclosure, index, standard, information, financial, performance, statement, stakeholder, social,
content, corporate, human, explanation, economic, compliance, evaluation, reporting, key, component, relate, assurance, section, esg,
assessment, climate, principle, responsible, sustainable, policy, board, number, strategy, engagement, operation, website
No clear assignment
5
key, environment, figure, people, waste, meet, market, social, close, chain, number, external, performance, culture, raw, overview, integrity,
mercede, sector, training, corporate, introduction, indicator, generate, project, stakeholder, topic, benz, online, consumption,
responsibility, community, basis, daimler, sale, methodology, document
Environmental
6
vehicle, road, mobility, plant, production, car, market, fuel, development, area, design, project, process, target, system, model, engine,
develop, training, drive, brand, improve, stakeholder, service, approach, worldwide, reduce, component, waste, electric, traffic, accident,
provide, base, local, infrastructure, construction, level, solution
Environmental
7
assure, applicable, corporate, aspect, compliance, responsibility, ist, shop, basis, mit, metsoo, ber, hat, nachhaltigkeit, rahmen, oberbank,
dabei, governance, thema, download, nachhaltige, durch, fileadmin, indikatoren, man, checkout, sich
No clear assignment
8
patient, health, quality, pharmaceutical, healthcare, site, country, care, number, medical, disease, medicine, treatment, waste, water,
research, clinical, responsibility, activity, production, study, base, environment, drug, improve, service, supply, corporate, pharmacy,
develop, animal, relevant, sector, plant, people, information, human, home, science
Social
9
building, construction, consumption, portfolio, property, estate, office, tenant, project, real, number, area, development, home, colonial,
build, approach, term, service, unit, water, housing, sustainable, performance, residential, stakeholder, social, heating, high, measure, long,
site, space, key, board, client, intensity, electricity, increase
No clear assignment
10
program, site, organization, labor, water, environment, corporate, reduce, waste, performance, local, development, initiative, end, chain,
manufacturing, ton, supply, process, improve, design, social, people, recognize, health, quality, responsibility, increase, consumption,
solution, standard, center, innovation, rate, efficiency, high, sustainable, assessment, number
Environmental
11
centre, waste, asset, water, consumption, portfolio, shopping, performance, carbon, electricity, retail, tonne, scope, corporate, positive,
target, number, community, intensity, tenant, development, table, area, fuel, local, office, continue, reduction, annual, section, park,
engagement, change, manage, net, landlord, project, place, base
Environmental
12
flight, fuel, aircraft, airport, service, clh, airline, passenger, engine, section, security, operation, aviation, number, fleet, system, quality,
social, ground, project, traffic, noise, term, maintenance, environment, diversity, route, base, efficiency, increase, plan, long, mtu,
stakeholder, cargo, human, goal, lufthansa, government
Environmental
13
oil, operation, performance, water, operate, local, production, carbon, environment, appendix, tonne, contractor, community, offshore,
activity, project, manage, health, asset, operational, spill, low, produce, industry, reduce, human, ship, programme, potential, vessel,
number, area, climate, site, sea, petroleum, process, provide, fuel
Environmental
14
exhibition, event, site, service, waste, security, space, soltec, procedure, avio, exhibitor, activity, visitor, number, water, system, relate,
information, manage, sector, plan, provide, fiera, female, organise, stand, area, control, generate, male, development, order, follow, design,
client, corporate, mobility, lead, health
Environmental
15
disclosure, aspect, significant, standard, number, principle, organization, approach, indicator, percentage, operation, note, social,
governance, human, high, economic, activity, body, category, relate, performance, specific, policy, manage, community, type, service,
practice, gender, country, concern, grievance, stakeholder, mechanism, information, water, criterion, local
Social
16
tower, totale, informazioni, ruo, persone, responsabilit, genere, due, modello, servizi, rapporto, dal, allo, sociale, rete, sistema, essere, era,
propri, possono, governance, presente, tale, time, turnover, relativi, sede, diverse, agam, standard, questi, salute, processo, aree, clienti,
fonti, procedure, contratto, doof
No clear assignment
17
packaging, sustainable, chain, paper, water, circular, recycle, make, production, waste, store, policy, supply, change, renewable, fair,
standard, strategy, brand, plastic, factory, raw, social, good, goal, design, base, industry, source, equal, lead, approach, process, operation,
performance, textile, fibre, forest
Environmental
18
people, store, brand, hear, quality, care, design, hearing, make, pattern, production, market, offer, experience, consumption, initiative,
social, high, consumer, approach, relate, support, promote, order, sale, create, country, good, talent, world, stakeholder, fashion, base,
manufacturer, professional, activity, retail, aspect, start
Social
19
fish, production, forest, feed, salmon, focus, important, water, reduce, good, produce, high, increase, facility, area, project, result,
development, health, make, raw, industry, farm, quality, food, large, operation, environment, develop, farming, day, consumer, ensure,
effort, part, number, target, pulp, process
Environmental
20
registration, document, home, service, communication, csr, connect, externally, pour, technicoloro, site, information, assure, odd,
compact, entertainment, travail, division, universal, box, annexe, performance, directive, mission, rse, rapport, vigilance, film, party,
condition, durable, plan, talent, politique, ment, challenge
No clear assignment
21
community, water, local, operation, development, mining, project, mine, health, stakeholder, programme, environment, plan, site, social,
number, area, contractor, level, performance, training, continue, provide, government, engagement, economic, biodiversity, workforce,
increase, incident, waste, process, sustainable, relate, standard, manage, closure, change, operate
Environmental
22
service, digital, security, network, solution, information, base, offer, market, internet, mobile, innovation, communication, experience,
platform, privacy, development, number, people, performance, user, team, key, provide, stakeholder, training, consumption, office,
protection, program, share, skill, organization, create, project, infrastructure, time, device
No clear assignment
23
plant, water, electricity, power, production, responsibility, waste, renewable, project, process, indicator, social, fuel, relate, economic, base,
target, governance, generation, supply, country, development, performance, source, sustainable, unit, efficiency, produce, change,
industrial, climate, local, natural, area, site, generate, level, system
Environmental
24
compliance, measure, board, corporate, process, protection, standard, training, system, responsibility, requirement, part, information,
development, topic, time, financial, addition, location, make, figure, basis, strategy, activity, consumption, offer, area, key, social, ensure,
conduct, base, important, relevant, order, support, target, result
Governance
25
production, site, chemical, raw, process, water, groupos, section, development, performance, base, sustainable, waste, reduce, innovation,
number, financial, environment, substance, industrial, market, plant, increase, progress, result, high, relate, sale, develop, application,
information, health, measure, communication, internal, target, area, social, document
Environmental
26
corporate, social, responsibility, policy, commitment, information, action, plan, director, activity, compliance, training, csr, chapter, area,
development, project, responsible, model, environment, process, main, measure, good, professional, ethic, committee, promote, service,
quality, make, communication, internal, governance, health, financial, remuneration, woman, implement
Social
27
appendix, supply, people, network, chain, service, country, mobile, programme, sustainable, market, health, number, woman, introduction,
information, operate, provide, skill, local, human, principle, integrity, practice, base, job, government, communication, privacy, policy,
digital, conduct, reduce, high, time, law, site, enable, code
Social
28
service, project, activity, area, system, carry, involve, aim, economic, plan, initiative, regard, make, order, result, increase, internal, level,
term, event, issue, objective, training, meeting, network, structure, follow, provide, board, local, process, aspect, social, communication,
time, sector, offer, specific, director
Social
29
investment, financial, esg, client, service, asset, policy, banking, sustainable, responsible, fund, market, principle, corporate, credit,
portfolio, information, support, climate, social, base, insurance, provide, term, loan, change, relate, capital, offer, sector, tax, strategy,
board, private, number, governance, framework, activity, annual
No clear assignment
30
production, water, plant, site, solution, raw, reduce, high, target, consumption, process, waste, steel, performance, increase, building,
accident, machine, result, cement, specific, local, quality, training, level, unit, reduction, market, health, measure, recycle, number, concrete,
indicator, efficiency, pipe, stakeholder, time, ton
Environmental
31
food, water, source, waste, sustainable, consumer, oil, palm, supply, programme, production, crop, ingredient, plant, chain, raw, packaging,
reduce, area, good, farmer, natural, operation, development, tonne, make, practice, healthy, site, agricultural, nutrition, biodiversity, local,
agriculture, produce, health, commitment, animal, sourcing
Environmental
32
plan, tobacco, farmer, progress, store, aim, assurance, sustainable, achieve, programme, standard, cocoa, continue, bat, food, supply,
social, consumer, performance, source, reduction, chain, commitment, harm, community, provide, support, governance, stakeholder,
develop, operation, health, maison, child, review, human, independent, policy
No clear assignment
33
medium, sport, content, paper, woman, man, digital, information, responsibility, gaming, book, scope, advertising, protection, sale, game,
social, figure, corporate, online, editorial, responsible, gambling, magazine, manager, brand, event, female, bet, prisa, male, user, relate,
radio, print, office, publish
No clear assignment
34
corporate, responsibility, people, community, support, programme, social, provide, make, continue, environment, initiative, education,
performance, project, good, responsible, annual, skill, school, child, staff, training, opportunity, governance, young, award, develop, team,
world, issue, local, day, student, office, ensure, charity, create, development
Social
35
production, commitment, facility, client, quality, chain, passion, supply, make, reduce, waste, food, packaging, tomato, process, market,
fruit, number, base, creval, water, produce, issue, line, ensure, raw, economic, aspect, control, time, source, complaint, pulse, system,
worker, order, term
Environmental
36
financial, activity, relate, approach, training, process, compliance, policy, woman, man, decree, statement, information, topic, adopt,
model, corruption, control, procedure, specific, reference, regulation, base, standard, system, hour, main, code, regard, legislative, worker,
director, health, consumption, issue, aim, order, personnel
Governance
Notes: The table shows the topic outcomes for 3,567 stand-alone reports for the sample period 2015 to 2022. In the last
column, I manually label each topic with respect to its environmental, social or governance affiliation. I can link 17 topics
to ‘Environmental’, 7 topics to ‘Social’ and 3 topics to ‘Governance’. For 9 topics I am not able to readily assign an ESG
label. In general, if the first 20 words contain one of the three ESG dimensions, I label the topic accordingly. For all
other topics that do not contain one of the three keywords, I look for words close to the three pillars and assign the label accordingly.
The order of the topics has no significance.
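The keyword step of the labelling rule described in the notes can be sketched in Python as follows. This is only an illustration under stated assumptions: the keyword sets, the function name `label_topic`, and the tie-break (the first keyword encountered wins) are hypothetical and not taken from the paper, and the sketch does not cover the manual second step, so it will not reproduce every label in the tables.

```python
# Hypothetical keyword sets for the three ESG pillars (an assumption, not the
# author's list); the forms mirror the lemmatized terms shown in the tables.
PILLARS = {
    "Environmental": {"environment", "environmental"},
    "Social": {"social"},
    "Governance": {"governance"},
}


def label_topic(terms, window=20):
    """Return the pillar of the first keyword among the top `window` terms.

    Returns None when no keyword is found, i.e. the topic would go on to the
    manual step (words close to the pillars, or no clear assignment).
    """
    for term in terms[:window]:
        for pillar, keywords in PILLARS.items():
            if term in keywords:
                return pillar
    return None


# Leading terms of two topics from the integrated-report table.
print(label_topic(["director", "remuneration", "corporate", "annual", "governance"]))
print(label_topic(["liability", "amount", "cash", "income", "loss"]))
```

The first call finds "governance" among the leading terms, while the second finds no keyword and returns `None`, leaving the topic for manual inspection.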
Table 9: Topic outcomes for 2,248 integrated reports
Topic  Dominant file  Contribution (%)  Topic Terms  ESG label
1 96270-17In-
22430910S13439965890X-
Gl.txt
66.23
solution, business, growth, employee, service, customer, digital, integrate, offer, strategy,
market, support, investment, innovation, create, make, network, term, develop,
development, transformation, work, model, long, project, director, sector, country, revenue,
major, team, shareholder, performance, csr, expertise, provide, design, key
No clear ESG assignment
2 74451-15In-
16379220O29945681220X-
Gl.txt
54.24
director, document, registration, information, meeting, compensation, executive, product,
officer, issue, committee, employee, plan, sale, corporate, capital, general, number, chief,
control, chairman, security, hold, set, grant, article, annual, agreement, social, resolution, net,
performance, account, term, site, concern, reference, note
Social
3 162933-22In-
35682327W343299831T-
Gl.txt
67.56
director, remuneration, corporate, annual, governance, information, independent, strategic,
line, customer, structure, verification, identity, social, consolidate, committee, responsible,
policy, member, include, banking, shareholder, general, impact, section, plan, business,
service, activity, cmr, meeting, environmental, promote, consolidated, commitment,
principle, relate, executive, hold
Governance
4 79849-15In-
19962250K7589487752E-
UK.txt
79.34
water, customer, performance, service, annual, regulatory, work, supply, measure, include,
business, continue, deliver, plan, provide, cost, charge, expenditure, level, improve, tax,
support, reduce, network, end, area, treatment, programme, household, utility, underlie,
impact, sewer, approach, debt, target, scheme, environment
Environmental
5 163938-22In-
35902422M18112362054V-
Gl.txt
35.87
energy, electricity, power, project, market, plant, renewable, service, offshore, generation,
annual, cost, price, supply, recognise, percent, increase, customer, capacity, turbine,
distribution, revenue, activity, programme, grid, expect, development, joint, construction,
network, net, production, continue, tax, contract, change, investment, solution
No clear ESG assignment
6 103588-18In-
30558460J6876689380B-
Gl.txt
60.69
bam, performance, project, construction, executive, cent, amount, result, recognise,
information, supervisory, property, include, contract, cash, joint, tax, rate, interest,
development, member, cost, integrate, benefit, relate, base, remuneration, ppp, creation, part,
safety, plan, material, venture, governance, liability, current, net, general
Governance
7 103718-18In-
24166294F1951557888H-
Gl.txt
42.87
store, ahold, continue, food, sale, performance, brand, annual, business, review, income,
product, sustainable, sustainability, net, plan, governance, customer, local, associate, include,
operation, member, retail, retailing, world, operate, information, lease, support, merger,
shareholder, number, underlie, base, common, program, relate, cash
Governance
8 108083-18In-
26696501H35720458753Q-
Th.txt
53.0
insurance, life, investment, policy, business, interest, product, liability, contract, market,
equity, solvency, customer, performance, pension, include, property, benefit, fair, loss, fund,
executive, premium, portfolio, rate, claim, result, level, capital, employee, real, change,
operating, recognise, profit, disability, relate, service, segment
No clear ESG assignment
9 70894-15In-
20559260D13450009680W-
Gl.txt
73.41
loan, customer, capital, credit, interest, income, amount, loss, cent, operation, market,
branch, instrument, liability, security, net, fund, equity, annual, change, term, profit, tier,
balance, cost, corporate, institution, business, requirement, pension, exposure, local, note,
sheet, employee, rate, service, provision
No clear ESG assignment
10 162805-22In-
35003075P14910333120A-
Gl.txt
43.81
supervisory, annual, service, tax, customer, remuneration, governance, revenue, information,
employee, result, audit, include, impact, policy, business, adjust, solution, cash, base,
performance, operation, sale, note, relate, cost, content, compliance, social, dutch, target,
lease, corporate, table, part, recognize, product, position, shareholder
Governance
11 142651-21In-
36518656W24055238130L-
No.txt
68.46
note, chapter, create, climate, liability, insurance, content, investment, income, relate, letter,
measure, commitment, result, claim, employee, operation, forsikre, figure, base, chair,
customer, add, ceo, performance, loss, key, consolidate, expense, capital, engage, alternative,
responsible, pension, provision, account, equity, pass, society
Environmental
12 92599-17In-
24538735T2667962388T-
Gl.txt
77.79
customer, business, corporate, service, social, economic, environmental, gri, governance,
activity, plan, director, model, challenge, euro, commitment, dimension, popular, main,
banking, policy, channel, employee, information, action, performance, product, integrate,
result, compliance, capital, area, training, initiative, supplier, programme, good, quality, sector
Social
13 73447-15In-
18802432W22293661698L-
Gl.txt
71.91
production, oil, activity, price, operation, project, reserve, exploration, development, net,
result, natural, due, sale, supply, cash, field, operating, business, annual, profit, market, plan,
expenditure, increase, capital, cost, corporate, energy, flow, continue, term, include,
approximately, segment, start, low, operate
Environmental
14 112756-19In-
31346168T22996586200S-
Sw.txt
97.38
car, mobility, solution, recognise, vehicle, rental, csr, service, passenger, workshop, bed,
repair, brand, good, lease, fleet, body, miljoner, auto, light, commercial, dealer, man,
discontinue, branch, ret, kapital, anst
No clear ESG assignment
15 71717-15In-
20295911R28441384426F-
Gl.txt
63.32
performance, product, annual, production, sale, steel, plant, growth, crop, business, review,
market, site, program, good, water, stock, continue, increase, target, solution, raw, cement,
work, local, development, employee, cash, seed, focus, energy, research, environmental, high,
safety, future, plan, impact, improve
Environmental
16 125037-20In-
33759990Y7523726364N-
UK.txt
62.91
director, executive, committee, annual, business, performance, audit, continue, include, cash,
tax, policy, cost, remuneration, review, strategic, groupos, recognise, profit, shareholder, net,
note, scheme, provide, interest, governance, award, period, term, rate, plan, liability, key, fair,
set, account, benefit, ensure, employee
Governance
17 146476-21In-
39402044J1832707712Q-
Gl.txt
79.04
cent, loan, pension, customer, annual, market, rate, interest, income, loss, equity, change,
profit, capital, liability, cost, fair, investment, credit, note, bond, sustainability, account,
guarantee, net, director, tax, corporate, insurance, fund, include, scheme, cash, security,
portfolio, return, relate, base
No clear ESG assignment
18 88126-16nn-
24234650C12442774318R-
UK.txt
46.14
bus, rail, ahead, franchise, contract, cost, passenger, service, operating, change, performance,
revenue, shareholder, relate, pension, information, liability, profit, end, customer, increase,
operate, account, payment, plan, kingdom, regional, make, local, key, future, award, interest,
provide, provision, improve, lease, review
No clear ESG assignment
19 81117-16In-
21901590F152175492V-
Gl.txt
54.76
performance, business, product, sustainability, program, committee, growth, plan, market,
member, include, chemical, continue, coating, process, percent, benefit, material,
supervisory, safety, level, energy, paint, note, audit, income, site, review, carbon, key,
supplier, improvement, ton, chain, operation, raw, million, strategic
No clear ESG assignment
20 148398-21In-
42441828Y55876298940L-
Gl.txt
56.93
usd, annual, interest, vessel, current, income, tax, liability, note, account, cash, colour,
expense, lease, rate, service, relate, operating, derivative, net, market, profit, currency, parent,
debt, fair, equity, include, recognise, pension, remuneration, flow, plan, balance, base,
business, bear, impairment
No clear ESG assignment
21 71373-15In-
16273044E2513757060Q-
Fi.txt
72.8
medium, service, business, development, content, revenue, advertising, digital, operation,
responsibility, employee, director, project, growth, customer, corporate, meur, information,
change, segment, member, develop, responsible, sale, increase, online, finnish, team,
marketing, target, market, operating, cent, relate, unit, programme, base, figure, work
No clear ESG assignment
22 110434-18In-
22859838B6739566152G-
Be.txt
51.66
service, director, annual, employee, cost, remuneration, customer, mail, benefit, recognize,
eur, increase, parcel, committee, impact, amount, income, cash, operating, business, activity,
ceo, net, acquisition, pay, revenue, subsidiary, corporate, consolidate, gain, logistic, decrease,
trade, day, plan, relate, loss
Social
23 152569-21In-
37989681L74437804824O-
Po.txt
51.78
service, euro, satellite, concession, follow, communication, director, end, road, amount,
information, indicator, result, income, contract, activity, note, tax, investment, mobility,
operation, infrastructure, business, client, vehicle, network, traffic, associate, record, relate,
impact, area, provide, change, revenue, universal, remuneration, current, capacity
No clear ESG assignment
24 161091-21In-
47199663F11915901270O-
Gl.txt
17.30
employee, result, make, change, information, term, time, number, investment, corporate,
include, base, high, capital, income, provide, work, market, long, due, business, member,
follow, process, increase, position, issue, governance, reporting, interest, meeting, internal,
development, conduct, focus, shareholder, give, period, external
Social
25 150386-21ea-
40904992K32172076980E-
Gl.txt
40.11
euro, director, annual, store, product, lease, rate, supplier, brand, turnover, cost, material, gri,
impact, net, change, recognize, production, taxis, refer, plan, groupos, exchange, separate,
thousand, liability, cash, follow, result, relate, current, period, trade, revenue, option,
represent, raw, good, income
No clear ESG assignment
26 114686-19In-
24198746E1387241856R-
Eu.txt
49.06
property, interest, investment, tax, income, annual, portfolio, rate, change, development,
rental, office, logistic, rent, lease, net, area, project, cost, energy, tenant, work, building, note,
valuation, liability, city, yield, ratio, increase, expense, derivative, market, percent, account,
location, acquisition, term, space
No clear ESG assignment
27 157512-21In-
43630824U57420999600O-
Eu.txt
63.13
employee, project, energy, integrate, environmental, emission, material, sustainability, safety,
supplier, work, environment, stakeholder, process, area, operation, waste, community, water,
impact, development, activity, include, health, approach, business, gri, training, social,
people, sustainable, action, standard, governance, local, reduce, initiative, consumption,
principle
Environmental
28 155224-21In-
39582120K25528759936C-
Gl.txt
48.94
mining, mine, production, operation, annual, mineral, cost, gold, african, project, increase,
capital, price, integrate, metal, groupos, strategic, reserve, evander, ore, performance, plant,
barberton, cash, tonne, safety, term, grade, zar, impact, coal, stakeholder, platinum, result,
life, ebitda, high, development, alloy
Environmental
29 110140-18In-
27535000D14687169000X-
Gl.txt
43.49
annual, director, product, service, business, chf, performance, executive, compensation,
software, employee, usd, client, member, recognize, market, revenue, audit, governance,
operate, growth, solution, continue, customer, security, committee, office, responsibly,
target, overview, shareholder, program, base, banking, officer, organization, region, meeting,
compliance
Governance
30 143637-21In-
40505634F473283915G-
Gl.txt
65.57
consumer, director, brand, option, stock, growth, annual, beer, inbev, include, program,
member, performance, grant, water, market, shareholder, volume, officer, remuneration,
note, continue, business, support, restrict, unit, executive, grow, chief, country, world,
beverage, launch, base, hold, term, community, packaging, local
No clear ESG assignment
31 148633-21In-
41171341R3257738094E-
Gl.txt
51.9
patient, health, product, care, healthcare, disease, medical, medicine, program, sale,
treatment, country, research, development, pharmaceutical, version, augment, science,
clinical, organization, compensation, change, performance, support, image, include,
approach, quality, life, business, site, safety, impact, strategy, combine, innovation, child,
activity
No clear ESG assignment
32 145399-21In-
43183503Y37649617060R-
UK.txt
79.02
customer, business, support, review, banking, service, continue, include, client, increase,
annual, digital, strategic, colleague, focus, provide, experience, impact, strategy, performance,
change, work, deliver, product, make, key, stakeholder, create, cent, market, low, executive,
strong, responsible, sustainable, plan, ensure, model, growth
Social
33 81319-16In-
22688001B7239830570S-
Th.txt
59.1
product, food, executive, tax, cash, market, annual, sustainability, organic, brand, plan, net,
trade, operation, performance, fair, recognise, supervisory, healthy, feed, income, quality,
result, base, information, review, relate, commodity, business, amount, acquisition, supplier,
foreign, policy, audit, part, integrate, expense
No clear ESG assignment
34 119123-19In-
32163210Y3583934578N-
Sw.txt
59.79
annual, cash, account, liability, sustainability, profit, parent, customer, work, market, note,
director, number, operation, tax, employee, audit, sale, cost, meeting, expense, amount,
business, flow, term, acquisition, remuneration, balance, current, base, loss, general, policy,
shareholder, net, earning
No clear ESG assignment
35 71860-15In-
16240360C22862905740J-
Gl.txt
71.3
liability, amount, cash, income, loss, cost, interest, tax, rate, include, fair, relate, profit, net,
expense, note, current, sale, base, term, impairment, consolidated, plan, change, period,
contract, date, benefit, flow, equity, end, payment, currency, transaction, follow, provision,
instrument, material, hedge
No clear ESG assignment
Notes: The table shows the topic outcomes for 2,248 integrated reports for the sample period 2015 to 2022. In the last
column, I manually label each topic with respect to its environmental, social or governance dimension. I can link 6 topics to
environmental, 5 topics to social and 6 topics to governance matters. For 18 topics I am not able to readily assign an ESG
label. In general, if the first 20 words contain one of the three ESG dimensions, I label the topic accordingly. For all
other topics that do not contain one of the three keywords, I look for words close to the three pillars and assign the label
accordingly. The order of the topics has no significance.
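The paper does not spell out how the "Dominant file" and "Contribution (%)" columns are derived. One plausible reading, sketched below as an assumption only, is that each topic's dominant file is the report in which that topic has the largest share of the document-topic distribution, with the contribution being that share in percent. The function name `dominant_files` and the file names and shares in the example are hypothetical.

```python
def dominant_files(doc_topics):
    """Map each topic number to (file name, contribution in percent).

    doc_topics: dict mapping a file name to its list of topic shares
    (one share per topic, summing to 1). Assumed input format, not the
    author's data structure.
    """
    best = {}
    for name, shares in doc_topics.items():
        for topic, share in enumerate(shares, start=1):
            # Keep the document with the highest share seen so far.
            if topic not in best or share > best[topic][1]:
                best[topic] = (name, share)
    return {t: (n, round(100 * s, 2)) for t, (n, s) in best.items()}


# Hypothetical three-report corpus with three topics.
example = {
    "report_a.txt": [0.70, 0.20, 0.10],
    "report_b.txt": [0.15, 0.60, 0.25],
    "report_c.txt": [0.30, 0.30, 0.40],
}
for topic, (name, pct) in sorted(dominant_files(example).items()):
    print(topic, name, pct)
```

Under this reading, a high contribution (for example above 90%) would indicate a topic driven almost entirely by one report, which may help explain topics whose terms are firm-specific.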
Table 10: Topic outcomes (standalone) before switching to an integrated report
Topic  Dominant file  Contribution (%)  Topic Terms  ESG label
1
102712-17Su-
23212912C3380
560056J-Gl.txt
38.41
product, tobacco, programme, responsible, natural, society, introduction, respect, rewarding,
workplace, reinveste, brand, assessment, provide, datum, farmer, approach, study, trade, market,
illicit, labour, child, reduce, work, continue, case, improve, ensure, performance
Social
2
113563-19Su-
30094195A1928
2997400X-
No.txt
44.40
production, product, sustainability, sustainable, energy, emission, plant, waste, risk, base, reduce,
material, raw, increase, safety, renewable, important, source, business, achieve, area, main, supplier,
climate, facility, society, responsibility, wood, certify, innovation, goal
Environmental
3
84801-16Su-
17808210T1545
1251006S-It.txt
34.92
insurance, customer, risk, sustainability, social, fund, investment, financial, agency, claim, activity,
amount, relation, manage, policy, corporate, sale, system, network, director, pension, carry, datum,
direct, aim, start, function, board, sector, main
Governance
4
109498-18Su-
25184540X1824
455676K-Gl.txt
47.77
customer, employee, plastic, sustainability, carbon, standard, emission, risk, energy, anti, datum,
volume, part, goal, percent, impact, cash, injury, number, material, policy, business, safety, rating,
consumption, workplace, matter, decrease, fleet, conduct
Social
5
123175-19Su-
32887725V1456
5197400J-Gl.txt
43.14
communication, document, product, registration, assure, externally, sustainability, service, policy,
program, energy, business, approach, plan, corporate, assessment, customer, material, site, csr,
environmental, home, ethic, talent, connect, principle, performance, development, regulation,
training, diversity
Governance
6
101278-17Co-
22382438J19425
120400H-Gl.txt
45.03
product, responsibility, corporate, customer, supplier, business, conduct, energy, emission, waste,
core, risk, end, focus, include, responsible, chain, csr, increase, female, code, work, ethic, bed,
safety, cent, return, target, material, area
Environmental
7
103372-18Su-
25843000D1837
954160X-Sw.txt
36.59
gri, site, prime, building, energy, datum, portfolio, key, staff, real, estate, market, investment,
capital, property, client, make, business, tenant, topic, target, maintain, rate, sustainability, district,
future, generation, long, guest, working, approach, adjust
No clear ESG assignment
8
148431-21Co-
34881285K2808
908244B-Eu.txt
48.83
medium, responsibility, service, emission, digital, relate, corporate, employee, datum, target,
customer, responsible, information, content, code, financial, quality, scope, operation, business,
development, survey, covid, security, tax, competence, personnel, provide, conduct, statement
No clear ESG assignment
9
92530-17Su-
26926230I29510
407840O-Gl.txt
30.77
product, forest, ton, consumer, water, target, supplier, hygiene, customer, fiber, people, code, site,
conduct, accident, supply, scaos, program, life, include, pulp, paper, care, wood, tissue, solution,
nature, solid, organization, innovation, unit
Environmental
10
104075-18Su-
23104650P7461
553050P-Gl.txt
35.02
risk, water, operation, mine, mining, approach, plan, social, programme, site, community, closure,
incident, socio, energy, work, matter, government, identify, biodiversity, american, target,
economic, area, employee, include, change, operational, deliver, climate, ensure
Environmental
11
87605-16Su-
21901250E4430
0096400L-Gl.txt
20.79
employee, emission, energy, market, change, internal, compliance, process, provide, consumption,
area, action, external, increase, focus, scope, team, reporting, manage, part, aspect, long, content,
end, target, law, relevant, account, country, general, solution
Governance
12
119044-19Su-
31665704W1701
733980G-Th.txt
55.88
annual, sustainability, corporate, governance, datum, board, risk, society, executive, employee,
investment, financial, ratosos, impact, operation, high, customer, statement, travel, state, chain,
efficient, responsible, information, deliver, supply, resource, client, material, topic, profit
Governance
13
130332-20Co-
29976360H1029
9486300N-
Gl.txt
40.75
chapter, corporate, gri, responsibility, annual, risk, energy, production, principle, appendix,
include, ethic, policy, oil, indicator, area, compliance, supplier, environment, chemical, tonne,
business, change, action, social, carry, unit, professional, climate, establish, director
No clear ESG assignment
14
84021-16Co-
24198048S4020
40485Q-Gl.txt
34.94
patient, number, site, health, corporate, healthcare, risk, product, program, csr, clinical,
responsibility, medicine, disease, approach, quality, treatment, country, social, center, trial, sanofio,
vaccine, animal, research, professional, ethic, datum, study, ensure, organization
Governance
15
162595-21Su-
36746470C4102
8084135J-Gl.txt
82.80
section, gri, approach, sustainability, disclosure, topic, information, standard, stakeholder,
environmental, material, employee, compliance, business, social, statement, sustainable,
governance, risk, explanation, annual, index, supplier, emission, engagement, principle, figure, tax,
policy, general, reporting
Governance
16
111260-19Su-
30485240F9134
891040I-Gl.txt
36.39
sustainability, emission, fuel, aircraft, environmental, passenger, flight, tonne, relate, employee,
approach, cargo, customer, program, carbon, make, cost, airline, sick, include, organization,
number, weight, biofuel, efficient, travel, kilometer, ground, freight, airport
Environmental
17
106382-18on-
23404040J21971
606370I-Fi.txt
25.57
business, include, support, make, information, impact, standard, policy, environmental, human,
people, supplier, performance, develop, development, activity, review, continue, local, stakeholder,
responsible, community, conduct, strategy, governance, initiative, practice, progress, opportunity,
respect, follow, product
No clear ESG assignment
18
95994-17Co-
27166302N6985
003410F-Sp.txt
41.11
colonial, tele, building, consumption, corporate, social, material, client, responsibility, office,
energy, governance, sfl, property, high, employee, follow, certification, training, include, water,
scope, market, groupos, control, aspect, make, consume, director, shareholder, risk
Social
19
74373-15Co-
20750067B1589
1427656S-Gl.txt
42.54
csr, customer, document, annexe, registration, corporate, applicable, responsibility, program, hear,
service, worldlineo, employee, datum, solution, performance, materiality, social, payment, hearing,
people, improve, business, innovation, work, consumer, overview, interview, center, auditor,
digital
Social
21
151537-21Su-
44248804V6846
441660C-Gl.txt
31.15
gri, disclosure, bekaert, approach, customer, semiconductor, material, topic, table, component,
plant, boundary, supply, energy, product, chain, steel, incident, explanation, team, evaluation,
standard, power, materiality, wire, program, supplier, high, labour, covid, worldwide, ethic
No clear ESG assignment
22
121063-19Su-31355317W29055483189I-Gl.txt
40.08
fish, salmon, feed, farm, important, focus, aquaculture, production, processing, facility, food,
quality, produce, make, cent, area, escape, sea, lice, work, salmaro, passion, farming, effort,
harvesting, large, licence, day, welfare, good
Environmental
24
94266-17Su-24132096T19152494550F-Gl.txt
39.96
product, food, store, climate, sustainability, impact, work, animal, supplier, label, organic, reduce,
issue, sale, private, customer, electricity, organization, make, certify, meat, tion, area, material,
aspect, solar, target, offer, condition, late
Environmental
25
84276-16Su-17192304K802981728H-Po.txt
36.90
road, sustainability, customer, indicator, impact, service, environmental, area, safety, information,
operation, mobility, risk, traffic, accident, stage, system, verde, project, infrastructure, organization,
network, rate, concession, activity, performance, biodiversity, construction, improvement, main
Environmental
26
111439-19In-27859750E11029675025I-Eu.txt
33.14
safety, sustainability, employee, health, gri, risk, environmental, local, impact, quality, community,
supplier, economic, training, compliance, system, issue, standard, stakeholder, environment,
operation, continue, waste, material, performance, topic, ensure, key, assessment, operate,
disclosure
Governance
28
104027-18Su-29127560I34416292680T-Gl.txt
50.10
gri, material, approach, source, sustainability, product, natural, supplier, water, supply, emission,
raw, sustainable, chain, goal, energy, customer, consumer, waste, principle, reduce, programme,
oil, tonne, issue, site, change, key, target, scope
Environmental
29
128522-20Su-28660406X48172101952L-Gl.txt
34.80
gri, sustainability, make, material, woman, promote, store, fashion, order, man, groupos, aim,
florence, italian, obtain, addition, initiative, customer, adopt, principle, work, exhibition, brand,
culture, product, commitment, certification, approach, main, scope, contract, respect
Social
30
82740-16Su-21843360U17010020160F-Gl.txt
30.33
product, sustainability, chemical, make, material, customer, raw, site, production, safety, base, oil,
program, performance, process, offer, packaging, additive, produce, pigment, application, coating,
important, industry, substance, german, accident, measure, ton, time, innovative
Environmental
31
83654-16Su-20662538H17836705880Q-Gl.txt
32.45
work, employee, operation, health, good, base, environment, development, risk, ensure, high,
conduct, environmental, include, create, issue, reduce, stakeholder, develop, measure, sustainable,
goal, key, woman, increase, code, accordance, supplier, climate, corruption, time
Environmental
Notes: The table shows the topic outcomes for 158 stand-alone reports of companies before switching to an integrated
report. For each topic, I report the file that contributes the most to the topic, together with its contribution. In the last
column, I manually label each topic according to its environmental, social or governance affiliation. Five topics can be
linked to ‘Social’, 11 topics to ‘Environmental’, 7 topics to ‘Governance’ and 5 topics cannot be clearly assigned. In
general, if the first 20 words contain one of the three ESG pillars, I label the topic accordingly. For all other topics, which
do not contain one of the three keywords, I look for words close to the three pillars and assign the label accordingly. The
order of the topics has no significance.
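The labelling rule described in the notes can be illustrated with a minimal sketch. The labels in the table are assigned manually, so the code below only approximates the rule and is not the procedure used in the paper. The function name `candidate_labels` and the related-word lists in `RELATED` are hypothetical and chosen purely for illustration. When several pillars appear among the first 20 terms, the sketch returns all of them in rank order and leaves the final choice to manual judgment.

```python
# Pillar keywords named in the notes, mapped to the label used in the table.
PILLARS = {
    "environmental": "Environmental",
    "social": "Social",
    "governance": "Governance",
}

# Hypothetical lists of words "close to" each pillar (an assumption, not from the paper).
RELATED = {
    "Environmental": {"emission", "climate", "energy", "waste", "water"},
    "Social": {"employee", "community", "health", "training", "woman"},
    "Governance": {"board", "compliance", "audit", "remuneration", "corruption"},
}


def candidate_labels(terms, window=20):
    """Return candidate ESG labels for a ranked list of topic terms.

    Step 1: look for a pillar keyword among the first `window` terms.
    Step 2: if none is found, fall back to the related-word lists.
    An empty list corresponds to 'No clear ESG assignment'.
    """
    head = terms[:window]
    found = [PILLARS[t] for t in head if t in PILLARS]
    if found:
        # Drop duplicates while keeping rank order.
        return list(dict.fromkeys(found))

    fallback = []
    for term in terms:
        for label, words in RELATED.items():
            if term in words and label not in fallback:
                fallback.append(label)
    return fallback


# Usage with the leading terms of the airline topic (topic 16 in the table).
airline = ["sustainability", "emission", "fuel", "aircraft", "environmental", "passenger"]
print(candidate_labels(airline))
```

A topic for which the function returns an empty list or more than one candidate is one where the manual step in the notes remains necessary.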
Table 11: Topic outcomes (integrated) after switching from a stand-alone report
Topic Terms per Topic ESG label
1
percent, share, market, service, turbine, recognise, energy, project, director, power, board, policy, accounting,
cost, contract, development, joint, provision, loss, meur, increase, venture, order
Environmental
2
development, business, process, risk, activity, safety, project, environmental, service, provide, change, economic,
training, governance, include, material, quality, initiative, action, objective, reduce, key, social
Governance
3
amount, financial, current, expense, employee, rate, risk, account, base, investment, due, customer, number,
corporate, follow, relate, activity, contract, present, contribution, member, trade, carry
No clear assignment
4
financial, asset, liability, loss, net, rate, period, statement, derivative, term, market, make, impairment, credit,
instrument, note, impact, payable, material, interest, current, provision, transaction
No clear assignment
5
customer, organization, integrate, supplier, people, relationship, indicator, innovation, approach, chip, aspect,
theme, machine, program, material, sustainable, knowledge, make, ethic, survey, safety, develop, score
Social
6
project, power, plant, integrate, work, system, construction, contract, infrastructure, facility, electricity, operation,
sector, solar, aspect, line, water, director, term, transmission, people, main, maintenance
Environmental
7
note, board, kluwer, share, tax, profit, supervisory, adjust, cash, revenue, risk, recognize, policy, service, annual,
plan, asset, lease, solution, contract, table, liability, include
No clear assignment
8
energy, market, area, emission, service, high, supply, work, capacity, site, power, construction, long, time, phase,
addition, environmental, maintenance, develop, generation, opportunity, security, water
Environmental
9
employee, performance, product, include, business, result, strategy, support, term, improve, impact, relate,
number, supplier, high, time, industry, issue, corporate, standard, ensure, position, operation
Governance
10
include, performance, risk, price, joint, policy, strategic, dividend, item, expenditure, equity, attributable, relate,
benefit, vest, capital, deliver, note, earning, average, operational, reserve
No clear assignment
11
share, increase, plan, committee, executive, business, base, information, shareholder, continue, board, capital,
level, audit, remuneration, sale, basis, work, measure, key, future, target, significant
No clear assignment
12
employee, sustainability, integrate, system, model, aim, approach, network, capital, site, service, people, health,
organizational, plan, digital, risk, tower, woman, generate, infrastructure, process, standard
Social
13
financial, statement, cost, annual, cash, tax, risk, income, flow, profit, item, expect, amount, revenue, include,
consolidate, share, asset, fair, sale, relate, measure, base
No clear assignment
14
euro, lease, separate, recognize, gri, thousand, director, refer, change, store, product, contract, order, taxis,
charge, sustainability, reference, amendment, table, art, consolidate, main
No clear assignment
15
board, director, integrate, customer, freight, service, continue, share, executive, program, compensation, solution,
logistic, gri, member, standard, organization, office, chain, region, end, meeting, impact
No clear assignment
16
director, service, cost, benefit, remuneration, annual, review, csr, cash, recognize, loss, acquisition, mail, net,
parcel, eur, liability, tax, operating, radial, committee
Governance
17
patient, healthcare, disease, health, csr, program, organization, approach, medicine, create, support, launch,
sanofio, develop, vaccine, medical, impact, ethic, market, factsheet, stakeholder, country, clinical
Social
18
waste, plant, energy, design, day, solution, environment, industrial, sector, treatment, firm, security, french,
innovation, build, chief, equipment, model, unit, client, expertise
Environmental
19
statement, asset, financial, interest, fair, result, end, income, board, pay, follow, operate, purchase, price,
accordance, gain, change, social, plant, record, assess, foreign, subsidiary
No clear assignment
20
operation, production, director, cost, coal, mining, ore, mine, cash, mineral, tax, tonne, investment, underlie,
copper, platinum, recognise, beer, iron, diamond, project, metal, ebitda
Environmental
21
electricity, plant, interest, power, project, gri, price, grid, equity, verbundo, result, generation, board, liability,
supervisory, measure, change, renewable, loss, balance, section, austrian, accordance
Environmental
Notes: The table shows the topic outcomes for 158 stand-alone reports of companies before switching to an integrated report.
For each topic, I report the file that contributes the most to the topic, together with its contribution. In the last column, I
manually label each topic according to its environmental, social or governance affiliation. Five topics can be linked to
‘Social’, 10 topics to ‘Environmental’, 8 topics to ‘Governance’ and 6 topics cannot be clearly assigned.
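The selection of the file that contributes the most to each topic can be sketched as follows. This is a minimal illustration, assuming that "contribution" means the share of a document's topic mixture assigned to the topic, expressed in percent; the paper does not spell out the exact definition. The function name `top_document_per_topic`, the toy matrix and the file names are hypothetical.

```python
import numpy as np


def top_document_per_topic(doc_topic, file_names):
    """For each topic, return (file name, contribution in percent) of the
    document with the largest share of that topic.

    doc_topic: array of shape (n_documents, n_topics) with non-negative
    topic weights per document, for example the output of a fitted LDA model.
    """
    weights = np.asarray(doc_topic, dtype=float)
    # Normalise each row so that a document's topic shares sum to one.
    shares = weights / weights.sum(axis=1, keepdims=True)
    # Index of the document with the highest share, per topic (column).
    best_docs = shares.argmax(axis=0)
    return [
        (file_names[doc], round(float(100 * shares[doc, topic]), 2))
        for topic, doc in enumerate(best_docs)
    ]


# Toy example with three documents and two topics (made-up numbers).
toy_weights = [[7, 3], [2, 8], [5, 5]]
toy_files = ["report_a.txt", "report_b.txt", "report_c.txt"]
result = top_document_per_topic(toy_weights, toy_files)
print(result)
```

Under this assumption, a contribution of 36.39 would mean that about 36% of the most prominent file's topic mixture is attributed to the topic in question.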