FinancialBERT - A Pretrained Language Model for Financial Text Mining
Ahmed Rachid Hazourli
ahmedrachidhazourli@yahoo.fr
Abstract
Textual data in the financial domain is becoming increasingly important as the number of financial documents grows rapidly. With progress in natural language processing (NLP), extracting valuable information from text has gained popularity among researchers, and deep learning has boosted the development of effective financial text mining models, leading to significant breakthroughs in various NLP tasks.
State-of-the-art models such as BERT (Devlin et al., 2019), developed by Google and pre-trained on a large collection of unlabeled text from Wikipedia, have shown their effectiveness by achieving good results on general-domain data. However, these models are not effective enough on finance-specific language and semantics, limiting the accuracy that financial data scientists can expect from their NLP models. In this paper, we introduce FinancialBERT, a domain-specific language representation model pre-trained on large-scale financial corpora that can enhance NLP research in the financial sector. With almost the same architecture across tasks, FinancialBERT largely outperforms BERT and other state-of-the-art models on the Sentiment Analysis task when pre-trained on financial corpora.
Our pre-trained model FinancialBERT is freely available at: https://huggingface.co/ahmedrachid/FinancialBERT.
Keywords: Natural Language Processing, BERT, Language Model, Pretrained Model, Sentiment Analysis, Financial Language
Modelling
1. Introduction
In recent years, Deep Neural Networks have revolutionized the development of intelligent systems in many fields, especially in Natural Language Processing, where state-of-the-art neural architectures have significantly improved many NLP tasks. These results are achieved thanks to unsupervised pre-training of language models on large text collections using deep learning techniques such as Long Short-Term Memory (LSTM) networks and Transformers.
As the amount of textual content generated in the financial domain grows at an exponential rate, natural language processing is becoming a strategic tool for financial analysis. Such textual data is a valuable source of knowledge; however, applying state-of-the-art models to financial text mining has limitations. Firstly, word embeddings or representations such as ELMo (Peters et al., 2018), Word2Vec (Mikolov et al., 2013) and BERT (Devlin et al., 2019) are trained on general-domain texts, so it is hard to estimate their performance on financial datasets. In addition, word distributions differ between the general and financial domains.
BERT achieves great results on various NLP tasks, and adapting it to the financial domain could yield high performance by building a model that understands financial language, produces more accurate word embeddings and ultimately improves the performance of downstream tasks such as text classification, topic modelling, automatic summarization and sentiment analysis.
2. Related Work
Unlike traditional word embeddings, where a word is represented by a single vector, language models such as BERT (Devlin et al., 2019) and ELMo (Peters et al., 2018) return contextualized embeddings for each word token, which can be fed into downstream tasks. These models are trained on general-domain corpora and are easy to fine-tune for downstream tasks.
The goal of this work is to test the hypothesized advantages of fine-tuning pre-trained language models. Thus, we pre-train FinancialBERT, a finance domain-specific BERT model, on a large financial communication corpus including financial news, corporate reports and earnings calls.
The main contributions of this paper are the following:
1. Introduce and release FinancialBERT, a new finance domain-specific BERT-base model. We achieve state-of-the-art results on the Financial PhraseBank dataset.
2. Perform extensive experimentation to investigate the
performance of fine-tuning versus task-specific archi-
tectures atop frozen embeddings, and the effect of hav-
ing an in-domain vocabulary. Then, evaluate on a fi-
nancial corpus for sentiment analysis to show the ef-
fectiveness of our approach.
3. Most importantly, we make publicly available both the pre-trained FinancialBERT and our fine-tuned Sentiment Analysis model. We expect these resources to boost NLP research and applications in finance, since fine-tuning pre-trained Transformer-based language models for particular downstream tasks is the current state of the art.
3. Methods
In this section, we will present our FinancialBERT imple-
mentation that has the same structure as BERT, after giving
a brief background on relevant neural architectures. Then,
we describe in detail the pre-training and fine-tuning pro-
cess of FinancialBERT.
3.1. Background
With the advent of deep learning and its application in NLP,
researchers began applying Recurrent Neural Networks
(RNNs) and Convolutional Neural Networks (CNNs) for
text classification. The current state-of-the-art in text classi-
fication typically involves a purely attentional architecture,
the Transformer architecture (Vaswani et al., 2017).
3.2. BERT
The Bidirectional Encoder Representations from Transformers (BERT) model architecture (Devlin et al., 2019) is based on a multilayer bidirectional Transformer. It is pre-trained on large textual corpora in an unsupervised way. The attention mechanism (Vaswani et al., 2017) of the Transformer allows it to obtain contextual word embeddings. BERT (Devlin et al., 2019) was trained on two tasks in parallel:
1. Masked Language Modeling (MLM): instead of predicting the next word given the previous ones, BERT (Devlin et al., 2019) masks a randomly selected 15% of all tokens and learns to predict them, and hence can be used for learning bidirectional representations. Thus, it learns to produce token-level embeddings (a fill-mask sketch follows this list).
2. Next Sentence Prediction (NSP): given a pair of sentences, the model predicts whether the second sentence actually follows the first in the original text. It learns this from the embedding of the special token [CLS] (class) and thereby produces sentence-level embeddings.
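To make the MLM objective concrete, the short sketch below probes the masked-token predictions of the released checkpoint through the Hugging Face fill-mask pipeline; the model name is the one published with this paper, while the input sentence is only illustrative.

# Probe the MLM head of FinancialBERT: the pipeline returns the most likely
# tokens for the [MASK] position together with their scores.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="ahmedrachid/FinancialBERT")

# Illustrative input sentence (not taken from the training corpora).
for prediction in fill_mask("The company reported a quarterly [MASK] of $2.3 billion."):
    print(prediction["token_str"], round(prediction["score"], 3))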
3.3. Model Architecture
The original English BERT was pre-trained on two generic corpora, English Wikipedia and BooksCorpus, with a total of 3.3B words. BERT (Devlin et al., 2019) has two versions:
1. BERT-BASE: with 12 layers of stacked Transform-
ers, each of 768 hidden units, 12 attention heads,
110M parameters (L=12, H=768, A=12, Total Param-
eters=110M)
2. BERT-LARGE: with 24 layers, each of 1024 hidden
units, 16 attention heads, 340M parameters (L=24,
H=1024, A=16, Total Parameters=340M).
Both architectures were trained either on "cased" text, which keeps character casing, or on "uncased" text, in which everything is converted to lowercase.
4. Pre-training FinancialBERT
In this section, we first describe our financial corpora, the
details of the BERT training procedure, and finally the spe-
cific task we examine.
4.1. Financial Corpora
As a general purpose language representation model, BERT
was pre-trained on English Wikipedia and BooksCorpus.
However, financial domain texts contain a consider-
able number of new domain-specific terms. In this
work, we pre-train FinancialBERT on a large corpora of
representative financial texts:
1. TRC2-financial 1: the Thomson Reuters Text Research Collection (TRC2) corpus comprises 1,800,370 news stories published by Reuters between 2008 and 2010.
2. Bloomberg News 2: 400,000 financial articles published by Bloomberg between 2006 and 2013.
3. Corporate Reports 3: a rich source of information, as these filings often disclose important new statements and provide a comprehensive overview of a company's business and financial condition. The documents are available on the EDGAR database, since the Securities and Exchange Commission (SEC) mandates all publicly traded companies to file annual reports (10-K) and quarterly reports (10-Q). We retrieved 154,354 10-K reports from 1996 to 2015 and 37,646 10-Q quarterly reports. We then filtered by section and used only "Risk Factors" (Section 1A) and "Management's Discussion and Analysis of Financial Conditions and Results of Operations" (Section 7).
4. Earnings Call Transcripts: we obtained 42,156 earnings call transcripts. These are teleconferences or webcasts between the management of a public company, analysts, investors, and the media to discuss the company's financial results during a given reporting period, such as a quarter or a fiscal year. An earnings call is usually preceded by an earnings report, which contains summary information on financial performance for the period.
Corpus Number of words Domain
English Wikipedia 2.5B General
BooksCorpus 0.8B General
TRC2-financial 0.29B Financial
Bloomberg News 0.2B Financial
Corporate Reports 2.2B Financial
Earnings Call Transcripts 0.7B Financial
Table 1: Size of text corpora.
The text corpora used for pre-training FinancialBERT have a total size of 3.39 billion tokens and are listed in Table 1. The pre-training corpora of each model are summarized in Table 2.
For better performance, we initialized FinancialBERT with the pre-trained BERT model 4 provided by Devlin et al. (2019), which was trained on the Wikipedia + BooksCorpus corpora with a total of 3.3 billion tokens.
1 https://trec.nist.gov/data/reuters/reuters.html
2 https://www.bloomberg.com/
3 https://www.sec.gov/edgar.shtml
4 The pre-trained weights are made public by the creators of BERT. The code and weights can be found here: https://github.com/google-research/bert
Model Corpus
BERT Wikipedia + BooksCorpus
FinancialBERT TRC2 + Bloomberg News + Corporate Reports + Earnings Call Transcripts
Table 2: Description of pre-training text corpora.
4.2. Vocabulary
BERT uses WordPiece (Wu et al., 2016) with a 30,000 to-
ken vocabulary for unsupervised tokenization of the input
text. With WordPiece tokenization, any new words can be
represented by frequent subwords.
We found that using the uncased vocabulary results in slightly better performance on downstream tasks.
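As an illustration, the sketch below inspects the WordPiece segmentation produced by the released tokenizer, loaded through the Transformers AutoTokenizer; the input phrase is illustrative and the exact split depends on the learned vocabulary.

# Inspect WordPiece tokenization: words missing from the 30,000-token
# vocabulary are decomposed into frequent subword pieces prefixed with "##".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ahmedrachid/FinancialBERT")

# Illustrative finance phrase; rare terms are split into subwords.
print(tokenizer.tokenize("EBITDA margins contracted despite deleveraging"))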
4.3. Implementation Details
In our work, we use the Transformers library from Hug-
gingface on Python. For pre-training we used mainly the
BERT recommended parameters. We used the default
BERT optimizer, AdamWeight decay optimizer, the
recommended learning rate of 5e-5, a batch size of 32, a
dropout rate of 0.1 and a maximum sequence length of 512.
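The fragment below is a hedged sketch of how such a continued MLM pre-training run can be configured with the Transformers Trainer API using the hyperparameters listed above; the corpus preparation and the exact training schedule are omitted, and the weight-decay value is an assumption, so this is an outline rather than the original training script.

# Sketch of continued MLM pre-training from BERT-base with the reported
# hyperparameters (batch size 32, learning rate 5e-5, max length 512).
from transformers import (AutoTokenizer, BertForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")  # start from BERT-base weights

# 15% of tokens are masked dynamically in each batch.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="financialbert-pretraining",
    per_device_train_batch_size=32,  # reported batch size
    learning_rate=5e-5,              # recommended BERT learning rate
    weight_decay=0.01,               # assumed AdamW weight decay
)

# train_dataset would hold the tokenized financial corpora (sequences
# truncated to 512 tokens); its construction is omitted here.
# trainer = Trainer(model=model, args=args, data_collator=collator,
#                   train_dataset=train_dataset)
# trainer.train()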
Data preprocessing and training BERT on financial corpora took significant computational resources. Our entire pre-training procedure took 23 days of computational runtime on a single Nvidia GeForce RTX 2060 6GB GPU. We believe that releasing our pre-trained model FinancialBERT will be useful to financial researchers, who can apply it to downstream tasks without needing such significant computational resources.
5. Experimental Evaluation
In this section, we describe experiments on Sentiment
Analysis task to evaluate the effectiveness of our pre-
trained language model.
5.1. Sentiment Analysis
Sentiment analysis and opinion mining is the field of study
that analyzes people’s opinions, sentiments, evaluations,
attitudes, and emotions from written language. It is one
of the most active research areas in natural language
processing and is also studied in the financial domain.
Financial sentiment analysis differs from general sentiment analysis: the goal is to estimate how the market will react to news and other textual data.
It can be performed with one of two approaches: unsupervised or supervised NLP models. Sentiments are typically labeled as positive, negative or neutral, and NLP algorithms are used to evaluate whether a sequence of words reflects a positive or negative sentiment. The unsupervised approach is rule-based: it counts the number of positive and negative words according to a dictionary such as Loughran and McDonald (2011); a toy sketch of this idea is given below. The supervised approach treats the problem as a classification task using traditional machine learning or deep learning methods.
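The following toy sketch illustrates the rule-based counting idea; the tiny word lists are placeholders standing in for a full lexicon such as Loughran and McDonald (2011).

# Toy dictionary-based sentiment: count positive and negative lexicon hits.
POSITIVE = {"gain", "profit", "growth", "improve"}          # placeholder word lists,
NEGATIVE = {"loss", "decline", "impairment", "litigation"}  # not the actual lexicon

def lexicon_sentiment(text: str) -> str:
    tokens = text.lower().split()
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(lexicon_sentiment("Quarterly profit and revenue growth beat expectations"))  # -> positive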
5.2. Dataset
The main sentiment analysis dataset used in this paper is the Financial PhraseBank 5 from (Malo et al., 2014).
The Financial PhraseBank consists of 4,845 English articles that were categorised by sentiment class and annotated by 16 researchers with a financial background. The sentiment label is either positive, neutral or negative. The dataset is available in four configurations depending on the percentage of annotator agreement, as shown in Table 3.
Agreement Level Positive Negative Neutral # of articles
100% 25.2% 13.4% 61.4% 2262
75% - 99% 26.6% 9.8% 63.6% 1191
66% - 74% 36.7% 12.3% 50.9% 765
50% - 65% 31.1% 14.4% 54.5% 627
Total 28.1% 12.4% 59.4% 4845
Table 3: Description of Financial PhraseBank dataset.
We chose to use the whole dataset (at least 50% agreement), with 80% as training set, 10% as validation set and 10% as test set, as shown in Table 4 (a loading and splitting sketch is given after the table).
Dataset Metric Train Dev Test
Financial PhraseBank Accuracy + F1 3876 484 485
Table 4: Sentiment Analysis task evaluation metrics, and
train, dev, test sets sizes.
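For reproducibility, the sketch below loads the 50%-agreement configuration and derives an 80/10/10 split; the Hugging Face datasets identifier and configuration name are assumptions on our part, since the paper links the original ResearchGate release instead.

# Load the Financial PhraseBank (>= 50% annotator agreement) and split it
# into 80% train, 10% validation and 10% test.
from datasets import load_dataset

phrasebank = load_dataset("financial_phrasebank", "sentences_50agree", split="train")

split = phrasebank.train_test_split(test_size=0.2, seed=42)
dev_test = split["test"].train_test_split(test_size=0.5, seed=42)
train, dev, test = split["train"], dev_test["train"], dev_test["test"]
print(len(train), len(dev), len(test))  # roughly 3876 / 484 / 485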
5.3. Fine-tuning FinancialBERT
Sentiment analysis is a natural language processing classification task: we train a model that predicts a sentiment label for an input article.
Typically, there are two successive steps: the FinancialBERT pre-training phase and the fine-tuning phase. We first conducted unsupervised pre-training on the large financial corpus and then applied supervised fine-tuning on the downstream NLP task.
In our work, we use the same fine-tuning architecture used
in (Devlin et al., 2019) by adding a dense layer after the last
hidden state of the [CLS] token. This is the recommended
practice for using BERT for any classification task. Then,
the classifier network is trained on the labeled sentiment
dataset. We also use cross-entropy loss as the loss function.
We used a batch size of 32, a maximum sequence length of 512, a learning rate of 2e-5 and 5 epochs to fine-tune our model.
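The fragment below sketches this fine-tuning step with the Transformers Trainer API, adding a three-class classification head on top of the pre-trained encoder and using the hyperparameters reported above; dataset preparation is omitted, so this is an outline rather than the exact training script.

# Sketch of fine-tuning FinancialBERT for three-way sentiment classification
# (dense layer over the [CLS] representation, cross-entropy loss by default).
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("ahmedrachid/FinancialBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "ahmedrachid/FinancialBERT", num_labels=3)  # negative / neutral / positive

args = TrainingArguments(
    output_dir="financialbert-sentiment",
    per_device_train_batch_size=32,  # reported batch size
    learning_rate=2e-5,              # reported learning rate
    num_train_epochs=5,              # reported number of epochs
)

# train_dataset / dev_dataset would be the tokenized PhraseBank splits,
# e.g. tokenizer(texts, truncation=True, max_length=512); omitted here.
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=dev_dataset)
# trainer.train()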
5.4. Results
Table 5 presents the sentiment analysis results as a classification report on the test set.
Our fine-tuned FinancialBERT 6 clearly outperforms two common baselines, BERT-base (Devlin et al., 2019) and FinBERT (Yang et al., 2020), a financial domain-specific BERT.
5 The dataset can be found here: https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10
6 Our fine-tuned model is available at: https://huggingface.co/ahmedrachid/FinancialBERT-Sentiment-Analysis
class precision recall f1-score support
negative 0.96 0.97 0.97 58
neutral 0.98 0.99 0.98 279
positive 0.98 0.97 0.97 148
macro avg 0.97 0.98 0.98 485
weighted avg 0.98 0.98 0.98 485
Table 5: Experimental results on the Financial PhraseBank test set.
FinancialBERT achieved better performance than the state-of-the-art model on the Financial PhraseBank, which demonstrates its effectiveness in sentiment analysis. We obtained an Accuracy 0.12 higher and an F1 score 0.13 higher than the previous state-of-the-art model FinBERT, as shown in Table 6.
Model Accuracy F1-score
BERT-base (Devlin et al., 2019) 0.84 0.83
FinBERT (Yang et al., 2020) 0.87 0.85
FinancialBERT (ours) 0.99 0.98
Table 6: Performance of different BERT models on the financial sentiment analysis task.
These results highlight the importance of pre-training on financial corpora, which improves performance on the downstream financial sentiment classification task.
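As a usage note, the released fine-tuned checkpoint (footnote 6) can be queried directly through the text-classification pipeline, as in the sketch below; the example sentence is illustrative.

# Query the released fine-tuned sentiment model; the pipeline returns the
# predicted label (negative / neutral / positive) with a confidence score.
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="ahmedrachid/FinancialBERT-Sentiment-Analysis")

print(classifier("Operating profit rose to EUR 13.1 million from EUR 8.7 million."))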
6. Conclusion
We presented FinancialBERT, a new pre-trained language model for financial communications, which has been trained on large financial corpora and can be fine-tuned for multiple NLP tasks. Requiring minimal task-specific architectural modification, our model achieves state-of-the-art performance on the Sentiment Analysis task, significantly outperforming the other compared models.
With the release of FinancialBERT, we hope financial practitioners and researchers can benefit from our model without needing the significant computational resources required to train it.
Future directions include: further exploration of domain-
specific pre-training strategies and incorporating more
tasks in financial NLP such as Named Entity Recognition
(NER) and Question-Answering tasks.
7. Bibliographical References
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2019). Bert: Pre-training of deep bidirectional trans-
formers for language understanding.
Malo, P., Sinha, A., Korhonen, P., Wallenius, J., and Takala,
P. (2014). Good debt or bad debt: Detecting semantic
orientations in economic texts. Journal of the Associa-
tion for Information Science and Technology, 65.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark,
C., Lee, K., and Zettlemoyer, L. (2018). Deep contextu-
alized word representations.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017).
Attention is all you need.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M.,
Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey,
K., Klingner, J., Shah, A., Johnson, M., Liu, X., Łukasz
Kaiser, Gouws, S., Kato, Y., Kudo, T., Kazawa, H.,
Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C.,
Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado,
G., Hughes, M., and Dean, J. (2016). Google’s neural
machine translation system: Bridging the gap between
human and machine translation.
Yang, Y., UY, M. C. S., and Huang, A. (2020). Finbert:
A pretrained language model for financial communica-
tions.