
Scientific Reports | (2023) 13:1481 | https://doi.org/10.1038/s41598-023-28179-x

www.nature.com/scientificreports

Product progression: a machine learning approach to forecasting industrial upgrading

Giambattista Albora 1,2, Luciano Pietronero 2, Andrea Tacchella 3 & Andrea Zaccaria 2,4*

Economic complexity methods, and in particular relatedness measures, lack a systematic evaluation and comparison framework. We argue that out-of-sample forecast exercises should play this role, and we compare various machine learning models to set the prediction benchmark. We find that the key object to forecast is the activation of new products, and that tree-based algorithms clearly outperform both the quite strong auto-correlation benchmark and the other supervised algorithms. Interestingly, we find that the best results are obtained in a cross-validation setting, in which data about the predicted country are excluded from the training set. Our approach has direct policy implications, providing a quantitative and scientifically tested measure of the feasibility of introducing a new product in a given country.

In her essay The Impact of Machine Learning on Economics, Susan Athey states: "Prediction tasks [...] are typically not the problems of greatest interest for empirical researchers in economics, who instead are concerned with causal inference" and "economists typically abandon the goal of accurate prediction of outcomes in pursuit of an unbiased estimate of a causal parameter of interest"1. This situation is mainly due to two factors: the need to ground policy prescriptions2,3 and the intrinsic difficulty of making correct predictions in complex systems4,5. The immediate consequence of this behavior is the flourishing of different or even contrasting economic models, whose concrete application largely relies on the specific skills, or biases, of the scholar or the policymaker6. This horizontal view, in which models are each time aligned and selected, contrasts with the vertical view of the hard sciences, in which models are selected by comparing them with empirical evidence, and leads to the challenging issue of distinguishing which models are wrong. While this situation can be viewed as a natural feature of economic and, more generally, complex systems6, a number of scholars coming from the hard sciences have recently tackled these issues, trying to introduce concepts and methods from their disciplines, in which models' falsifiability, tested against empirical evidence, is the key element. This innovative approach, called Economic Fitness and Complexity7–12 (EFC), combines statistical physics and complex-network-based algorithms to investigate macroeconomics with the aim of providing testable and scientifically valid results. The EFC methodology pursues essentially two lines of research: indices for the competitiveness of countries and relatedness measures.

The first line aims at assessing the industrial competitiveness of countries by applying iterative algorithms to the bipartite network connecting countries to the products they competitively export13. Two examples are the Economic Complexity Index (ECI)14 and the Fitness7. In this case, the scientific soundness of either approach can be assessed by accumulating pieces of evidence: by analyzing the mathematical formulation of the algorithm and the plausibility of the resulting rankings15–18, and by using the indicator to predict other quantities. In particular, the Fitness index, when used in the so-called Selective Predictability Scheme19, provides GDP growth predictions that outperform those provided by the International Monetary Fund10,20. All these elements concur towards the plausibility of the Fitness approach; however, a direct way to test the predictive performance of these indicators21 is still lacking. This naturally leads to the consideration of further indices, which can mix the existing ones22 or use new concepts such as information theory23. We argue that, on the contrary, the scientific validity of relatedness indicators can be univocally assessed, and this is the purpose of the present work.

The second line of research in EFC investigates the concept of Relatedness24, the idea that two human activities are, in a sense, similar if they share many of the capabilities needed to be competitive in them25. Practical applications are widespread and include international trade11,26, firm technological diversification27,28, regional smart specialization29,30, and the interplay among the scientific, technological, and industrial layers31. Most of these contributions use relatedness not to forecast future quantities, but as an independent variable in a


1Dipartimento di Fisica, Università Sapienza, Rome, Italy. 2Centro Ricerche Enrico Fermi, Rome, Italy. 3Joint Research Centre, Seville, Spain. 4Istituto dei Sistemi Complessi-CNR, UOS Sapienza, Rome, Italy. *email: andrea.zaccaria@cnr.it

Content courtesy of Springer Nature, terms of use apply. Rights reserved


regression, and so the proximity (or quantities derived from it) is used to explain some observed simultaneous behavior. We point out, moreover, that no shared definition of relatedness exists, despite the widespread use of co-occurrences, since different scholars use different normalizations, null models, and data, so the problem of deciding "which model is wrong" persists. For instance, Hidalgo et al.26 base the goodness of their measure on its correlation with the probability that a country starts to export a product. O'Clery et al.32 test the goodness of their relatedness measure through an in-sample logit regression; in this way, models with a greater number of parameters (as provided, for instance, by the addition of fixed effects on countries and products) tend to have greater scores. Finally, Gnecco et al.33 propose an approach to assess relatedness based on matrix completion. Note that their test of the goodness of their approach is based on the reconstruction of country-product pairs that have been removed from the data; the approach used here, instead, consists in looking at how good the proposed model is at guessing the new exports of countries after 5 years. So once again the performances are not comparable, as is evident by looking, for instance, at the respective magnitudes of the reported F1 scores.

The examples just discussed clarify why we believe it is fundamental to introduce elements of falsifiability in order to compare the different existing models, and that such a comparison should be made by looking at the performances in out-of-sample forecasting, which is the focus of the present paper. We will consider export as the economic quantity to forecast both because most of the indicators used in economic complexity are derived from export data, which are regarded as a global, summarizing quantity of countries' capabilities10,34, and for the immediate policy implications of being able, for instance, to predict in which industrial sector a country will be competitive, say, in five years.

In this paper, we propose a procedure to systematically compare different prediction approaches and, as a consequence, to scientifically validate or falsify the underlying models. Indeed, some attempts to use complex networks or econometric approaches to predict exports exist32,35–37, but these methodologies are practically impossible to compare, precisely because of the lack of a common framework to choose how to preprocess data, how to build the training and the test set, or even which indicator to use to evaluate the predictive performance. In the following, we will systematically scrutinize the steps needed to build a scientifically sound testing procedure to predict the evolution of the export basket of countries. In particular, we will forecast the presence or the activation of a binary matrix element M_cp, which indicates whether country c competitively exports product p in the Revealed Comparative Advantage sense38 (see "Methods" for a detailed description of the export data).

Given the simultaneous presence in the literature of different approaches to measure relatedness, it is natural to ask whether machine learning algorithms might play a role and build comparable or even better measures. In particular, given the present ubiquitous and successful use of artificial intelligence in many different contexts, it is natural to use machine learning algorithms to set the benchmark. A relevant by-product of this analysis is the investigation of the statistical properties of the database (namely, the strong auto-correlation and class imbalance), which has deep consequences for the choice of the most suitable algorithms, testing exercises, and performance indicators.

Applying these methods we find two interesting results:

1. The best performing models for this task are based on decision trees. A fundamental property that separates these algorithms from the main approaches used in the literature26 is the fact that here the presence of a product in the export basket of a country can have a negative effect on the probability of exporting the target product; i.e., decision trees are able to combine Relatedness and Anti-Relatedness signals to provide strong improvements in the accuracy of predictions39.
2. Our best model performs better in a cross-validation setting where we exclude data from the predicted country from the training set. We interpret this finding by arguing that in cross-validation the model is better able to learn the actual Relatedness relationships among products, rather than focusing on the very strong self-correlation of the trade data.

In the "Methods" section we show a detailed comparison between our machine learning based approach and some of the other definitions of relatedness we mentioned.

The present investigation of the predictability of the time evolution of export baskets has a number of practical and theoretical applications. First, predicting the time evolution of the export basket of a country requires, as an intermediate step, an assessment of the likelihood that a single product will be competitively exported by the country in the next years. This likelihood can be seen as a measure of the feasibility of that product, given the present situation of that country. The possibility to investigate with such a great level of detail which product is relatively close to a country and which one is out of reach has immediate implications in terms of strategic policies40. Second, the study of the time evolution of the country-product bipartite network is key to validating the various attempts to model it41,42. Finally, the present study represents one of the first attempts to systematically investigate how machine learning techniques can be applied in development economics, something still little discussed in the literature, with the exception of very recent works33,39,43.

Results
Statistical properties of the country-product network. A key result of the present investigation is a clear-cut methodology to compare different models or predictive approaches in Economic Complexity. In order to understand the reasons behind some of the choices we made in building the framework, we first discuss some statistical properties of the data we will analyze.

Our database is organized in a set of matrices V whose element V_cp is the amount, expressed in US dollars, of product p exported by country c in a given year. When not otherwise specified, the number of countries is 169, the number of products is 5040, and the time range covered by our analysis is 1996-2018. We use the HS1992,


6-digit classification. The data are obtained from the UN-COMTRADE database and suitably cleaned in order to take into account possible disagreements between importers' and exporters' declarations (see "Methods"). We compute the Revealed Comparative Advantage38 to obtain a set of RCA matrices R and, by applying a threshold equal to 1, a set of matrices M whose binary elements are equal to 1 if the given country competitively exports the given product. Here and in the following we use "competitively" in the Balassa sense, that is, R_cp > 1. In this paper we will discuss the prediction of two different situations: the unconditional presence of a "1" element in the M matrix, and the appearance of such an element requiring that the RCA values were below a non-significance threshold t = 0.25 in all the previous years. We will refer to the first case as the full matrix and to the new product event as an activation. The definition of the activation is somewhat arbitrary: one could think, for instance, of changing the threshold t or the number of inactive years. We find, however, that our choice is a good trade-off between having a good numerosity of the test set and avoiding the influence of trivial 0/1 flips. We point out that our final aim is to detect, as much as possible, the appearance of genuinely new products in the export basket of countries.
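The RCA binarization and the activation criterion described above can be sketched as follows; this is our own minimal illustration (variable names and the dict-of-matrices layout are ours, not the paper's):

```python
import numpy as np

def rca_matrix(V):
    """Balassa's Revealed Comparative Advantage.
    V: (countries x products) array of export values in US dollars."""
    V = np.asarray(V, dtype=float)
    num = V / V.sum(axis=1, keepdims=True)        # country's export share of each product
    den = V.sum(axis=0, keepdims=True) / V.sum()  # product's share of world trade
    return num / den

def activations(R_past, M_future, t=0.25):
    """Boolean mask of 'activation' events: elements whose RCA stayed below
    the non-significance threshold t in ALL past years and that are
    competitively exported (M = 1) in the future year.
    R_past: list of RCA matrices for the preceding years."""
    never_active = np.all(np.stack(R_past) < t, axis=0)
    return never_active & (M_future == 1)

# toy example: 3 countries x 2 products
V = np.array([[10.0, 90.0],
              [50.0, 50.0],
              [80.0, 20.0]])
R = rca_matrix(V)
M = (R > 1).astype(int)   # binarize with the standard threshold RCA > 1
```

Note that a real pipeline would have to guard against countries (or products) with zero total exports, which this toy sketch does not.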

In Fig. 1, left, we plot the probability that a matrix element M_cp in 1996 will change or not change its binary value in future years. One can easily see that even after 5 years the probability that a country remains competitive in a product is relatively high (∼ 0.64); since the probability that a country remains not competitive is ∼ 0.95, we conclude that there is a very strong auto-correlation: this is a reflection of the persistent nature of both the capabilities and the market conditions that are needed to competitively export a product. Moreover, the appearance of a new product in the export basket of a country is a rare event: the empirical frequency is about 0.047 after 5 years. A consequence of this persistence is that we can safely predict the presence of a 1 in the M matrices by simply looking at the previous years, while the appearance of a new product that was not previously exported by a country is much more difficult to predict and, in a sense, more interesting from an economic point of view, since it depends more on the presence of suitable, but unrevealed, capabilities in the country; these capabilities can be traced by looking at the other products that country exports. Not least, an early detection of the future activation of a new product has a number of practical policy implications. Note in passing that, since machine learning based smoothing procedures10,44 may introduce extra spurious correlations, they should be avoided in prediction exercises, and so only the RCA values directly computed from the raw export data are considered.
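The persistence probabilities just discussed can be estimated from any two snapshots of the binary matrix; a minimal sketch (toy matrices stand in for M(1996) and M(2001)):

```python
import numpy as np

def persistence(M_start, M_end):
    """Empirical transition probabilities between the binary states of M.
    Returns P(1 -> 1) and P(0 -> 0), i.e. the probability that a competitive
    (resp. non-competitive) country-product pair keeps its state after the
    time lag separating the two matrices."""
    M_start, M_end = np.asarray(M_start), np.asarray(M_end)
    p_stay_1 = M_end[M_start == 1].mean()        # fraction of 1s that remain 1
    p_stay_0 = 1.0 - M_end[M_start == 0].mean()  # fraction of 0s that remain 0
    return p_stay_1, p_stay_0

# toy matrices: 2 countries x 4 products
M0 = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0]])
M1 = np.array([[1, 0, 0, 0],
               [1, 0, 1, 0]])
p11, p00 = persistence(M0, M1)   # here p11 = 2/3 and p00 = 4/5
```

On the real data this yields the ∼ 0.64 and ∼ 0.95 values quoted in the text.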

On the right side of Fig. 1 we plot the density of the matrices M, that is, the number of nonzero elements with respect to the total number of elements. This ratio is roughly 10%. This means that both the prediction of the full, unconditional matrix elements and the prediction of the so-called activations (i.e., conditioning on that element being 0 and with RCA below 0.25 in all the previous years) show a strong class imbalance. This has deep consequences regarding the choice of the performance indicators used to compare the different predictive algorithms. For instance, the ROC-AUC score45, one of the most used measures of performance for binary classifiers, is well known to suffer from strong biases when a large class imbalance is present46. More details are provided in the "Methods" section.

Recognize the country vs. learning the products' relations. In this section we present the results concerning the application of different supervised learning algorithms. The training and test procedures are fully described in the "Methods" section. Here we just point out that the training set is composed of the matrices R(y) with y ∈ [1996 ... 2013], and the test is performed against M(2018), so we try to predict the export basket of countries after 5 years.

The algorithms we tested are XGBoost47,48, a basic neural network implemented using the Keras library49, and the following algorithms implemented using the scikit-learn library50: Random Forest51, Support Vector Machines52, Logistic Regression53, a Decision Tree54, ExtraTreesClassifier55, AdaBoost56, and Gaussian Naive Bayes57. For reasons of space, we cannot discuss all these methods here. However, a detailed description can be found in58 and references therein and, in the following sections, we will elaborate more on the algorithms based on decision trees, which turn out to be the best performing ones.
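The paper trains one model per product (see "Methods"); a minimal sketch of that per-product loop using scikit-learn's Random Forest on synthetic stand-in data (in the paper the features are the 5040-dimensional RCA rows and XGBoost is trained analogously; the toy data here are purely illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# toy stand-ins for the stacked training data: 200 country-year samples,
# 20 "products"; features are RCA-like values, targets a binary export flag
X = rng.random((200, 20))
Y = (X + 0.3 * rng.random((200, 20)) > 0.8).astype(int)

# one classifier per target product, each trained on the full feature rows
models = {}
for p in range(Y.shape[1]):
    models[p] = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, Y[:, p])

# prediction scores ("progression probabilities") for a new country profile
x_new = rng.random((1, 20))
scores = np.array([models[p].predict_proba(x_new)[0, 1] for p in range(20)])
```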

Figure 1. Left: transition probabilities between the binary states of the export matrix M. The strong persistence implies the importance of studying the appearance of new products (called activations) with respect to the unconditional presence of one matrix element (in the following, full matrix). Right: the fraction of nonzero elements in M as a function of time. A strong class imbalance is present.


In Fig. 2 we show an example of the dynamics that our approach is able to unveil. On the left we show the RCA of Bhutan for the export of Electrical Transformers as a function of time. The RCA is zero from 1996 to 2016, when a sharp increase begins. Was it possible to predict the activation of this matrix element? Let us train our machine learning algorithm XGBoost using the data from 1996 to 2012 to predict which products Bhutan will likely export in the future. The result is a set of scores, or progression probabilities, one score for each possible product. Each of these scores measures the feasibility, or relatedness, between Bhutan and each of the products it does not export. The distribution of these scores is depicted in Fig. 2 on the right. The progression probability for Electrical Transformers was much higher than average, as shown by the arrow: this means that, already in 2012, Bhutan was very close to this product. Indeed, as shown by the figure on the left, Bhutan will start to export that specific product in about 5 years. Obviously, this is just an example, so we need a set of quantitative tools to measure the prediction performance on the whole test set on a statistical basis.

In order to quantitatively assess the goodness of the prediction algorithms, a number of performance indicators are available from the machine learning literature on binary classifiers. Here we focus on three of them, and we show the results in Fig. 3, with a different indicator in each row, while the two columns refer to the two prediction tasks: full matrix (i.e., the presence of a matrix element equal to one) and activations (a zero matrix element, with RCA below 0.25 in previous years, possibly becoming higher than one, that is, the appearance of a new product in the export basket of a country). AUC-PR46 gives a parameter-free, comprehensive assessment of the prediction performance. The F1 score59,60 is the harmonic mean of the Precision and Recall measures61, and so takes into account both False Positives and False Negatives. Finally, mean Precision@10 considers each country separately and computes how many products, on average, are actually exported out of the top 10 predicted. All the indicators we used are discussed in more detail in the "Methods" section.

We highlight in red the RCA benchmark model, which simply uses the RCA values in 2013 to predict the export matrix in 2018. From the analysis of Fig. 3 we can infer the following points:

1. The performance indicators are much higher for the full matrix. This means that predicting the unconditional presence of a product in the export basket of a country is a relatively simple task, being driven by the strong persistence of the M matrices through the years.
2. On the contrary, the performance on the activations is relatively poor: for instance, on average, less than one new product out of the top ten is correctly predicted.
3. Algorithms based on ensembles of trees perform better than the benchmark and the other algorithms on all the indicators.
4. Thanks to the strong autocorrelation of the matrices, the RCA-based prediction represents a very strong benchmark, also in the case of the activations. However, Random Forest, ExtraTreesClassifier and XGBoost perform better both in the full matrix prediction task and in the activations prediction task.

We speculate that the machine learning algorithms perform much better in the full matrix case because, in a sense, they recognize the single country and, when fed a similar export basket, they correctly reproduce the strong auto-correlation of the export matrices. We can deduce that using this approach we are not learning the complex interdependencies among products, as we should, and, as a consequence, we do not correctly predict the new products. In order to overcome this issue, we use a k-fold Cross Validation (CV): we separately train our models to predict the outcome of k countries using the remaining C − k, where in our case C = 169 and k = 13. In this way, we prevent the algorithm from recognizing the country, since the learning is performed on disjoint sets; as a consequence, the algorithm learns the relations among the products and is expected to improve its performance on the activations.

Using the cross-validation procedure, we retrained the three best performing algorithms, namely Random Forest, ExtraTreesClassifier, and XGBoost. The result is that only the XGBoost algorithm improves

Figure 2. An example of successful prediction. On the left, the RCA of Bhutan in electrical transformers as a function of time. Already in 2012, with RCA stably below 1, the progression probability of that matrix element was well above its country average, as shown by the histogram in the figure on the right. Bhutan will start to competitively export electrical transformers after 5 years.


[Figure 3: six bar-chart panels comparing ten approaches (ExtraTrees Classifier, XGBoost, Random Forest, Gaussian Naive Bayes, Decision Tree, Logistic Regression, Dense Neural Network, SVM, AdaBoost, and the RCA benchmark) on three indicators (PR-AUC, Best F1, Mean Precision@10), one row per indicator, with the two columns referring to the Full Matrix and Activations prediction tasks.]

Figure 3. Comparison of the prediction performance of different algorithms using three performance indicators. Tree-based approaches perform better; the prediction of the activations is a harder task with respect to the simple future presence of a product.


its scores, which means that in the cross-validation setting it is better able to learn the inter-dependencies among products. So what is happening is that, if we do not perform the cross-validation, the Random Forest tends to recognize the countries better than XGBoost, but if we perform the cross-validation, XGBoost learns the inter-dependencies among products better than the Random Forest. This step is crucial if one wants to build a representation of such interdependencies that also has good forecasting power39.

In Fig. 4 (left) we show the relative improvements of various performance indicators when CV is used to train the XGBoost model and the test is performed on the activations. All indicators improve; in particular, the F1 score and mean Precision@10 increase by more than 10%. On the right, we compare the cross-validated XGBoost predictions with the RCA benchmark, showing a remarkable performance despite the previously noted strength of the benchmark.

In Table 1 we report the values of the performance indicators for the non cross-validated Random Forest, the cross-validated XGBoost, and the RCA benchmark model, once again tested on the activations. The last four rows before the computational cost represent the confusion matrix, where the threshold on the prediction scores is computed by optimizing the F1 score.
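The F1-optimal threshold and the resulting confusion matrix can be obtained with a simple sweep; a sketch of our reading of the procedure (function name and toy data are ours):

```python
import numpy as np

def best_f1_threshold(y_true, scores):
    """Sweep the unique scores as candidate thresholds and return the best
    F1 together with the threshold and the confusion matrix
    (TP, FP, FN, TN) achieving it."""
    best_f1, best_t, best_cm = 0.0, None, None
    for t in np.unique(scores):
        pred = scores >= t
        tp = int(np.sum(pred & (y_true == 1)))
        fp = int(np.sum(pred & (y_true == 0)))
        fn = int(np.sum(~pred & (y_true == 1)))
        tn = int(np.sum(~pred & (y_true == 0)))
        f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
        if f1 > best_f1:
            best_f1, best_t, best_cm = f1, t, (tp, fp, fn, tn)
    return best_f1, best_t, best_cm

# toy example: the threshold 0.35 separates the classes best
y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
f1, t, cm = best_f1_threshold(y, s)   # f1 = 0.8 at t = 0.35, cm = (2, 1, 0, 1)
```

On real data one would sweep a grid or use scikit-learn's precision_recall_curve instead of every unique score.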

The cross-validated XGBoost gives the best scores except for the AUC-ROC and the accuracy, which are influenced by the high class imbalance because of the large number of True Negatives, making these metrics unsuitable for evaluating the goodness of the predictions. However, the non cross-validated Random Forest is comparable and in any case shows better scores than the RCA benchmark, so it represents a good alternative, especially because of its much lower computational cost. Indeed, the inclusion of the cross-validation procedure increases the computational cost by about a factor of 13; moreover, when compared with the same number of trees, Random Forest is 7.7 times faster than XGBoost. So, even if the cross-validated XGBoost model is the best performing, the non cross-validated Random Forest is a good compromise for obtaining good predictions in less time.

In general, a desirable output of a classification task is not only a correct prediction, but also an assessment of the likelihood of the label, in this case, the activation. This likelihood provides a sort of confidence in the prediction. In order to test whether the scores are correlated or not with the actual probability of activations we

Figure 4. Left: relative improvement of the prediction performance of XGBoost when the training is cross-validated. The algorithm can no longer recognize the country, and so all the performance indicators improve. Right: relative improvement of the cross-validated XGBoost algorithm with respect to the RCA benchmark.

Table 1. Comparison of the predictive performance of XGBoost with cross-validation, Random Forest without cross-validation, and the RCA benchmark for the activations, using different indicators. The last row indicates the computational cost with respect to the non cross-validated Random Forest; XGBoost is about 100 times slower. The highest values of each indicator are in bold.

Algorithm                 | XGBoost-CV | Random Forest | RCA
AUC-ROC                   | 0.698      | 0.724         | 0.592
F1 score                  | 0.0479     | 0.0476        | 0.0369
Mean Precision@10         | 0.059      | 0.045         | 0.039
Precision                 | 0.034      | 0.035         | 0.023
Recall                    | 0.079      | 0.073         | 0.103
MCC                       | 0.043      | 0.042         | 0.035
AUC-PR                    | 0.018      | 0.017         | 0.011
Accuracy                  | 0.981      | 0.982         | 0.967
Negative predictive value | 0.994      | 0.994         | 0.994
TP                        | 202        | 186           | 263
FP                        | 5663       | 5063          | 11413
FN                        | 2359       | 2375          | 2298
TN                        | 403767     | 404367        | 398017
Computational cost        | 100        | 1             | –


build a calibration curve. In Fig. 5 we show the fraction of positive elements as a function of the output (i.e., the scores) of the XGBoost and Random Forest algorithms in the activations prediction task. We divide the scores into logarithmic bins and then compute the mean and the standard deviation inside each bin. In both cases a clear correlation is present, pointing out that a higher prediction score corresponds to a higher empirical probability that the activation of a new product will actually occur. Moreover, we note that the greater the score produced by the model, the greater the error on the y axis; the reason is that the models tend to assign higher scores to the products already exported by a country, so if we look at the activations the values start to fluctuate, and the statistics become poorer.
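A sketch of the logarithmic-bin calibration curve just described, run on synthetic well-calibrated scores (the number of bins and the synthetic data are our own choices):

```python
import numpy as np

def calibration_curve_log(scores, outcomes, n_bins=10):
    """Fraction of positive outcomes per logarithmic score bin, with the
    standard deviation of the binary outcomes inside each bin."""
    scores, outcomes = np.asarray(scores), np.asarray(outcomes)
    edges = np.geomspace(scores[scores > 0].min(), scores.max() * 1.001, n_bins + 1)
    means, stds = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        mask = (scores >= a) & (scores < b)
        if mask.any():
            means.append(outcomes[mask].mean())
            stds.append(outcomes[mask].std())
    return np.array(means), np.array(stds)

# synthetic, well-calibrated scores: higher score, higher activation rate
rng = np.random.default_rng(3)
scores = rng.random(20_000) ** 3               # skewed towards small values
outcomes = (rng.random(20_000) < scores).astype(int)
means, stds = calibration_curve_log(scores, outcomes)
```

For calibrated scores the per-bin fraction of positives grows with the bin's score, which is what Fig. 5 shows for the real models.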

We close this section by mentioning the possibility of training our algorithms by explicitly taking into account the class imbalance, as suggested in62,63. The results of this investigation are reported in Section 2 of the Supplementary Information. We observe a mild decrease of the prediction performance.

Opening the black box. In order to qualitatively motivate the better performance of tree-based algorithms, in this paragraph we elaborate on the operation of Random Forests. As specified in the "Methods" section, in these prediction exercises we train one Random Forest model for each product, and each Random Forest contains 100 decision trees. In Fig. 6 we show one representative decision tree. This tree is obtained by setting the number of features available for each tree equal to P = 5040: this means that we are bootstrap aggregating, or bagging64, the trees, instead of building an actual Random Forest, which instead considers a random subset of the products51 (the decision trees may differ also in this case, since the bagging procedure extracts the features with replacement). Moreover, the training procedure is cross-validated, so the number of input samples is 156 × 7 (156 countries and 7 years, from 2007 to 2013).

The decision tree we show refers to the product with HS1992 code 854089; the description is valves and tubes not elsewhere classified in heading no. 8540, where 8540 stands for cold cathode or photo-cathode valves and tubes such as vacuum tubes, cathode-ray tubes, and the like.

The color represents the class imbalance of the leaf (dark orange, many zeros; dark blue, many ones, quantified in the square brackets). The root product, the one providing the best split, is chromium, which is used,

Figure 5. Calibration curves: fraction of positive elements as a function of the scores produced by XGBoost (left) and Random Forest (right) for the activations prediction task. In both cases a clear positive correlation is present, indicating that higher scores are associated with higher empirical probabilities that the activation will actually occur.

[Figure 6: the representative decision tree. The root splits on chromium, wrought other than waste and scrap (value = [1006, 86]); the no-export branch leads to microscopes and diffraction apparatus (value = [994, 44]) and then to rail locomotives and narrow woven fabrics, while the export branch leads to non-threaded washers of iron and steel (value = [12, 42]) and then to particle accelerators and tanned or crust hides and skins of goats; the no-export link is always on the left.]

Figure6. A representative decision tree to forecast the export of the product valves and tubes. e root product,

chromium, has a well known technological relation with the target product, and in fact is able to discriminate

against future exporters with high precision.

Content courtesy of Springer Nature, terms of use apply. Rights reserved

8

Vol:.(1234567890)

Scientic Reports | (2023) 13:1481 | https://doi.org/10.1038/s41598-023-28179-x

www.nature.com/scientificreports/

for instance, in the cathode-ray tubes to reduce X-ray leaks. So the Random Forest found a nontrivial connection between chromium and these types of valves and tubes: out of the 1006 country-year pairs that do not export valves and tubes, 994 do not export chromium either (note the negative association). We can explore the network considering that the no-export link is always on the left. Looking at the export direction we find the cut on washers of iron and steel, which works very well: only 2 of the 12 country-year pairs that do not export valves and tubes export washers, and only 2 of the 42 that export valves and tubes do not export washers. Looking at the other splits we find some of them more reasonable, like the one on particle accelerators, and some that seem coincidental, like the one on hides and skins of goats.

From this example it is clear that the decision tree is a natural framework to deal with a set of data in which some features (i.e., products) may be by far more informative than others, and so a hierarchical structure is needed to take into account this heterogeneous feature importance.

Feature importance may be evaluated by looking at the normalized average reduction of the impurity at each split that involves that feature50. In our case, we are considering the Gini impurity. In Fig. 7 we plot this assessment of the feature importance to predict the activation of valves and tubes. One can easily see that the average over the different decision trees is even more meaningful than the single decision tree shown before, even if each one of the former sees fewer products than the latter: all the top products are reasonably connected with the target product, and so it is natural to expect them to be key elements to decide whether the given country will export valves and tubes or not.
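This impurity-based importance is what scikit-learn exposes as feature_importances_. A minimal sketch on synthetic data, in which the target depends only on feature 3 by construction, so that feature should dominate the ranking:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.random((500, 20))            # hypothetical export features
y = (X[:, 3] > 0.7).astype(int)      # activation driven by feature 3 only

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Mean decrease of the Gini impurity per feature, normalized to sum to 1.
importances = model.feature_importances_
top = np.argsort(importances)[::-1][:3]   # indices of the most important features
```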

Time dependence. In the procedure discussed above we used a time interval Δ_model equal to 5 years for the training, and we tested our out-of-sample forecasts using the same time interval Δ. Here we investigate how the choice of the forecast horizon Δ affects the quality of the predictions. To make this analysis we used XGBoost models trained with the cross validation method and a lower Δ_model = 3. The machine learning algorithms are trained using data in the range y ∈ [1996 ... 2008] and their output, obtained giving RCA(2008) as input, is compared with the various M(2008+Δ) by varying Δ. Since 2018 is the last year of available data, we can explore a range of Δs from 1 to 10. All details about the training procedure of the machine learning algorithms are given in the "Methods" section.

The quality of the predictions as a function of the forecast horizon Δ is summarized in Fig. 8, where we normalized the indicators in such a way that they are all equal to 1 at Δ = 1. In the left figure we show the plot for the activations prediction task: both precision and precision@10 increase with Δ, while the negative predictive value decreases and accuracy shows an erratic behavior. This means that our ability to guess positive values improves or, in other words, the longer one waits, the higher the probability that a country sooner or later activates the products we predict. This improvement on positive values, however, corresponds to a worsening on negative values, which can be interpreted as the fact that countries develop new capabilities over time and start to export products we cannot predict when the Δ interval is too large. If we look at a score that includes both the performance on positive values and the performance on negative values, like accuracy, we observe a (noisy) worsening as Δ increases.

In the figure on the right we show instead the full matrix prediction task. In this case all the scores decrease with Δ because the algorithm can no longer leverage the strong auto-correlation of the RCA matrix.

[Figure 7: bar chart of feature importance (x axis: Feature Importance, 0 to 0.1) for the target product valves and tubes. The products shown are: Microwave tubes; Chromium wrought other than waste and scrap; Hydroxide and peroxide of magnesium; Navigational instruments and appliances; Isobutene-isoprene rubber; Valves and tubes receiver or amplifier; Threaded screws of iron and steel; Microscopes and diffraction apparatus; Halogenated derivatives of acyclic hydrocarbons; Television camera tubes.]

Figure 7. Feature importance is a measure of how much a product is useful to predict the activation of the target product. Here we use the average reduction of the Gini impurity at each split. All important products are reasonably connected with the target.


Note that the steepness of the decreasing curves is higher when we look at precision scores, the reason being the high class imbalance and the large number of true negatives with respect to true positives, as shown in Table 1.

Discussion

One of the key issues in economic complexity and, more in general, in complexity science is the lack of systematic procedures to test, validate, and falsify theoretical models and empirical, data-driven methodologies. In this paper we focus on export data, and in particular on the country-product bipartite network, which is the basis of most literature on economic complexity, and on the likewise widespread concept of relatedness, which is usually associated to an assessment of the proximity between two products or the density or closeness of a country with respect to a target product. As detailed in the Introduction, many competing approaches exist to quantify these concepts; however, a systematic framework to evaluate which approach works better is lacking, and the result is the flourishing of different methodologies, each one tested in a different way and with different purposes. We believe that this situation can be discussed in a quantitative and scientifically sound way by defining a concrete framework to compare the different approaches systematically; the framework we propose is out-of-sample forecasting, and in particular the prediction of the presence or the appearance of products in the future export baskets of countries. This approach has the immediate advantage of avoiding a number of recognized issues65 such as the mathiness of microfounded models66 and the p-hacking in causal inference and regression analyses1,67.

In this paper we systematically compare different machine learning algorithms in the framework of a supervised classification task. We find that the statistical properties of the export data, namely the strong auto-correlation and the class imbalance, imply that the appearance, or activation, of new products should be investigated, and that some indicators of performance, such as ROC–AUC and accuracy, should be considered with extreme care. On the contrary, indicators such as the mean Precision@k have an immediate policy interpretation. In the prediction tasks tree-based models, such as Random Forest and Boosted Trees, clearly outperform the other algorithms and the quite strong benchmark provided by the simple RCA measure. The prediction performance of Boosted Trees can be further improved by training them in a cross validation setting, at the cost of a higher computational effort. The calibration curves, which show a high positive correlation between the machine learning scores and the actual probability of the activation of a new product, provide further support to the correctness of these approaches. A first step towards opening this black box is provided by the visual inspection of a sample decision tree and by the feature importance analysis, which shows that the hierarchical organization of the decision tree is a key element to provide correct predictions but also insights about which products are more useful in this forecasting task.

From a theoretical perspective, this exercise points out the relevance of context for the appearance of new products, in the spirit of the New Structural Economics68, but it has also immediate policy implications: each country comes with its own endowments and should follow a personalized path, and machine learning approaches are able to efficiently extract this information. In particular, the output of the Random Forest or the Boosted Trees algorithm provides scores, or progression probabilities, that a product will be soon activated by the given country. This represents a quantitative and scientifically tested measure of the feasibility of a product in a country. This measure can be used in very practical contexts of investment design and industrial planning, a key issue after the COVID-related economic crisis69,70.

Conclusion

Measuring the relatedness between countries and products is one of the main topics in the economic complexity literature71, given its importance to assess the feasibility of investments and strategic policies72,73. Starting from 2007 with the Product Space26, many different approaches to measure the relatedness have been proposed11,32,35–37,39,43. With all these models in the literature, a big issue is the absence of a scientifically sound procedure to compare them and to quantify how good they are at measuring the relatedness.

Figure 8. In the plot on the left we show the performance indicators in the case of the activations prediction task. The performance on positive values improves, while the one on negative values gets worse. On the right we show the same performance indicators in the case of the full matrix prediction task. All the scores get worse due to the vanishing auto-correlation of the matrices.


The first contribution of this work is the proposal of out-of-sample forecasts of new exported products as a method to compare different relatedness models. In this way, the problem of measuring the relatedness can be cast as a binary classification exercise and, by using standard performance indicators, one can assess the goodness of a measure and compare different measures quantitatively. The second contribution of the present paper is the use of machine learning algorithms to measure the relatedness. We show that decision tree-based algorithms like Random Forest51 and XGBoost48 provide the best assessment and represent the benchmark for possible new measures of relatedness.

This paper opens up a number of research lines in various directions. One critical issue of the machine learning algorithms with respect to traditional network-based approaches is the explainability of the results, so an important direction of research is the construction of a model that is fully explainable and does not lose quality with respect to the measures provided by machine learning algorithms. Another possible direction for future research is the application of this framework to different bipartite networks using different databases. Finally, one could use statistically validated projections31 to build density-based predictions and compare them within our testing framework. All these studies will be presented in future works.

Methods

Data description. The data we use in this analysis are obtained from the UN-COMTRADE database, Harmonized System 1992 classification (HS 1992), and include the volumes of the export flows between countries. The raw database, however, presents internal inconsistencies: for instance, the import declaration of the buying country might not coincide with the corresponding export declaration of the selling country. The correct exchanged volumes may be inferred using a Bayesian approach10. The data used in this work are obtained from this cleaning procedure. The time range covered is 1996–2018 and for each year we have a matrix V whose element Vcp is the amount, expressed in US dollars, of product p exported by country c. The total number of countries is 169 and the total number of products is 5040.

To binarize the data we determine if a country competitively exports a product by computing the Revealed Comparative Advantage (RCA) introduced by Balassa38. The RCA of a country c in product p in year y is given by Eq. (1). R^(y)_cp is a continuous value and represents the ratio between the weight of product p in the export basket of country c and the total weight of that product in the international trade. Alternatively, the RCA can be seen as the ratio between the market share of country c relative to product p and the weight of country c with respect to the total international trade. This is the standard way, in the economic complexity literature, to remove trivial effects due to the size of the country and the size of the total market of the product. In this way, a natural threshold equal to 1 can be used to establish whether country c exports product p in a competitive way or not. As a consequence, we define the matrix M whose binary element Mcp tells us whether country c is competitive in the export of product p or not, as in Eq. (2).

In this work we will try to predict future values of Mcp using past values of RCA. In Table 2 we report the main features of the country-export bipartite network described by the biadjacency matrix M (in different years). The minimum country degree is zero from 1996 to 2011 due to South Sudan, which gained its independence in 2011. The minimum degree of the products is always zero because there are some products for which, in some years, no country has an RCA value greater than 1.

A detailed description of the dataset we used is available at74.
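Eq. (1) and the binarization of Eq. (2) can be sketched as follows; the toy volume matrix stands in for the real 169 × 5040 export data, and the zero-export edge case (a country with an empty row) is ignored here:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((6, 8)) * 1e6   # hypothetical export volumes V_cp (countries x products)

def rca(V):
    """Balassa's Revealed Comparative Advantage, Eq. (1)."""
    num = V / V.sum(axis=1, keepdims=True)   # weight of p in c's export basket
    den = V.sum(axis=0) / V.sum()            # weight of p in total international trade
    return num / den

R = rca(V)
M = (R >= 1).astype(int)                     # binarization of Eq. (2)
```

On a uniform volume matrix every RCA equals 1, which is a quick sanity check of the normalization.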

Supervised machine learning and relatedness. Before describing our approach to measure the relatedness, here we want to give a quick and intuitive description of how supervised machine learning works. A simple example consists in the construction of a binary classifier that predicts whether a patient is healthy or has contracted COVID-19 starting from their symptoms (called features). A simple approach consists in drawing a hyperspace with dimension equal to the number of features (N). Here a patient identifies a specific point in this space. A binary classifier could be a simple hyperplane with dimension N−1 splitting the space in two distinct areas. A patient is then classified as healthy or sick depending on which of the two areas he belongs to. The learning part consists in the definition of the hyperplane. During the training phase we provide to the model some patients with their symptoms and the information whether they contracted COVID-19 or not. By minimizing a suitable loss function the model finds the optimal hyperplane that separates the healthy from the sick.

This is a very simple example of the functioning of a supervised machine learning binary classifier (that usually does not perform well, except in trivial cases where the positive and negative classes can be linearly separated). The functioning of more complex architectures like the ones we present in this paper is not so different: what we have is always a classifier that learns its task by looking at a set of training samples and their correct output.
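The hyperplane classifier described above can be sketched with a logistic regression on toy, linearly separable "patients" (all data here is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two symptom features (N = 2); the sick group is shifted so a hyperplane separates it.
healthy = rng.normal(0.0, 1.0, size=(100, 2))
sick = rng.normal(4.0, 1.0, size=(100, 2))
X = np.vstack([healthy, sick])
y = np.array([0] * 100 + [1] * 100)

# Minimizing the cross-entropy loss yields the separating hyperplane (dimension N-1).
clf = LogisticRegression().fit(X, y)
accuracy = clf.score(X, y)
```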

In our case, we first fix a target product. Thus a sample is a country, and its exported products are the features. Looking at past data, we show the algorithm whether a country will export the target product after 5 years and, once the training phase is ended, the algorithm can be used to predict whether a country will export that product after 5 years or not, given its present exports. This procedure is then repeated for all products, each of which thus needs a different training.

$$R^{(y)}_{cp} = \frac{V^{(y)}_{cp} \Big/ \sum_{p'} V^{(y)}_{cp'}}{\sum_{c'} V^{(y)}_{c'p} \Big/ \sum_{c'p'} V^{(y)}_{c'p'}} \tag{1}$$

$$M^{(y)}_{cp} = \begin{cases} 1 & \text{if } R^{(y)}_{cp} \geq 1 \\ 0 & \text{if } R^{(y)}_{cp} < 1 \end{cases} \tag{2}$$

In Fig. 9 we show a schematic diagram with the general functioning of the machine

learning algorithms discussed here. As a first step, the algorithm is trained by receiving the matrix of the RCAs of countries (Xtrain) and the information whether these countries will export a product or not (Ytrain). Once the algorithm is trained, it receives in input the exports of countries in a year y (not used during the training stage) and its output is the relatedness of countries with a product.

Training and testing procedure. We want to guess which products will be exported by a country after Δ years. To do this, we exploit machine learning algorithms with the goal of (implicitly) understanding the capabilities needed to export a product from the analysis of the export baskets of countries. Since each product requires a different set of capabilities, we need to train different models: in this work, we train 5040 different Random Forests, one for each product.

The training procedure is analogous for all the models: they have to connect the RCA values of the products exported by a country in year y with the element M^(y+Δ)_cp, which tells us whether country c in year y+Δ is competitive in the export of product p.

In the general case we have export data that covers a range of years [y0, ylast]. The last year is used for the test of the model and so the training set is built using only the years [y0, ylast − Δ]. In this way, no information about the Δ years preceding ylast is given.

The input of the training set, that we call Xtrain, is the vertical stack of the R^(y) matrices from y0 to ylast − 2Δ (see Fig. 10). In such a way we can consider all countries and all years of the training set, and these export baskets will be compared with the corresponding presence or absence of the target product p after Δ years; this is because our machine learning procedure is supervised, that is, during the training we provide a set of answers Ytrain corresponding to each export basket in Xtrain. While Xtrain is the same for all the models (even if they refer to different products), the output of the training set Ytrain changes on the basis of the product we want to predict. If we consider the model associated to product p, to build Ytrain we aggregate the columns corresponding to the target product, C^(y)_p, of the M matrices from y0 + Δ to ylast − Δ (so we use the same number of years, all shifted by Δ years with respect to Xtrain). This is graphically represented on the extreme left side of Fig. 10.

Once the model is trained, in order to perform the test we give as input Xtest the matrix R^(ylast − Δ). Each model will give us its prediction for the column p of the matrix M^(ylast) and, putting all the results relative to the single products together, we reconstruct the whole matrix of scores M^(ylast)_pred, which we compare with the empirical one. There are various ways to compare the predictions with the actual outcomes, and these performance metrics are discussed in the following section.
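The construction of Xtrain, Ytrain and Xtest for one target product can be sketched as follows; R and M are toy stand-ins for the RCA and binarized matrices, and delta and p are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_countries, n_products, delta = 10, 20, 5
years = range(1996, 2014)                              # y0 ... ylast
R = {y: rng.random((n_countries, n_products)) for y in years}
M = {y: (R[y] >= 0.5).astype(int) for y in years}      # toy binarization

y0, ylast, p = 1996, 2013, 7                           # p: target product index

# X_train: vertical stack of the RCA matrices from y0 to ylast - 2*delta;
# Y_train: column p of the M matrices, shifted forward by delta years.
train_years = range(y0, ylast - 2 * delta + 1)
X_train = np.vstack([R[y] for y in train_years])
Y_train = np.concatenate([M[y + delta][:, p] for y in train_years])

X_test = R[ylast - delta]                              # input for the final test
```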

As already mentioned, the same models can be tested against two different prediction tasks: either we can look at the full matrix M^(ylast), or we can concentrate only on the possible activations, that is, products that were not present in an export basket and that countries possibly start exporting. The set of possible activations is defined in Eq. (3).

Table 2. Main properties of the country-export bipartite network over the years between 1996 and 2018.

Year | Countries | Products | Links | Min country degree | Max country degree | Avg country degree | Min product degree | Max product degree | Avg product degree
1996 | 169 | 5040 | 83,754 | 0 | 2082 | 496 | 0 | 64 | 16.6
1997 | 169 | 5040 | 83,666 | 0 | 2059 | 495 | 0 | 61 | 16.6
1998 | 169 | 5040 | 84,976 | 0 | 2023 | 503 | 0 | 64 | 16.9
1999 | 169 | 5040 | 86,071 | 0 | 2089 | 509 | 0 | 66 | 17.1
2000 | 169 | 5040 | 90,327 | 0 | 2171 | 534 | 0 | 67 | 17.9
2001 | 169 | 5040 | 89,242 | 0 | 2138 | 528 | 0 | 71 | 17.7
2002 | 169 | 5040 | 88,849 | 0 | 2114 | 526 | 0 | 73 | 17.7
2003 | 169 | 5040 | 88,153 | 0 | 2089 | 522 | 0 | 73 | 17.5
2004 | 169 | 5040 | 88,662 | 0 | 2148 | 525 | 0 | 69 | 17.6
2005 | 169 | 5040 | 90,807 | 0 | 2171 | 537 | 0 | 74 | 18.0
2006 | 169 | 5040 | 90,429 | 0 | 2162 | 535 | 0 | 69 | 17.9
2007 | 169 | 5040 | 90,152 | 0 | 2155 | 533 | 0 | 72 | 17.9
2008 | 169 | 5040 | 90,505 | 0 | 2230 | 536 | 0 | 69 | 18.0
2009 | 169 | 5040 | 89,388 | 0 | 2157 | 529 | 0 | 72 | 17.7
2010 | 169 | 5040 | 88,742 | 0 | 2195 | 525 | 0 | 71 | 17.6
2011 | 169 | 5040 | 87,801 | 0 | 2286 | 520 | 0 | 68 | 17.4
2012 | 169 | 5040 | 88,368 | 8 | 2253 | 523 | 0 | 73 | 17.5
2013 | 169 | 5040 | 87,482 | 5 | 2222 | 518 | 0 | 79 | 17.4
2014 | 169 | 5040 | 85,724 | 7 | 2236 | 507 | 0 | 80 | 17.0
2015 | 169 | 5040 | 83,151 | 10 | 2236 | 492 | 0 | 81 | 16.5
2016 | 169 | 5040 | 83,012 | 11 | 2260 | 491 | 0 | 78 | 16.5
2017 | 169 | 5040 | 82,992 | 13 | 2202 | 491 | 0 | 81 | 16.5
2018 | 169 | 5040 | 81,059 | 12 | 2256 | 480 | 0 | 91 | 16.0


In other words, a pair (c, p) is a possible activation if country c has never been competitive in the export of product p until year ylast − Δ, that is, its RCA values never exceeded 0.25. This selection of the test set may look too strict; however, it is key to test our algorithms against situations in which countries really start exporting new products. Because of the RCA binarization, there are numerous cases in which a country noisily oscillates around RCA = 1 and, de facto, that country is already competitive in that product; in these cases the RCA benchmark is more than enough for a correct prediction.
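The possible-activations filter of Eq. (3) reduces to a boolean mask over the country-product pairs; a minimal sketch on hypothetical RCA data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_countries, n_products = 8, 10, 20
# Hypothetical RCA values over the training years [y0, ylast - delta].
R_history = rng.random((n_years, n_countries, n_products)) * 2

# Eq. (3): (c, p) is a possible activation iff its RCA stayed below 0.25 in every year.
activations_mask = (R_history < 0.25).all(axis=0)
candidate_pairs = np.argwhere(activations_mask)       # the (c, p) pairs to predict
```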

The way to train the models we just described performs better on the full matrix than on the activations. The reason is probably that the machine learning algorithms recognize the countries, because the ones in the training set and the ones in the test set are the same. When the algorithms receive as input the export basket of a country they have already seen in the training data, they tend to reproduce the strong autocorrelation of the export matrices. To avoid this problem we used a k-fold cross validation, which means that we split the countries into k groups. Since the number of countries is 169, the natural choice is to use k = 13, so we randomly extract a group α of 13 countries from the training set, which is then composed of the remaining 156 countries, and we use only the countries contained in α for the test. In this way each model is meant to make predictions only on the countries of the group α, so to cover all the 169 countries we need to repeat the procedure 13 times, every time changing the countries in the group α. This different training procedure is depicted on the right part of Fig. 10. So there will be 13 models associated to a single product and, for this reason, the time required for the training is 13 times longer. Like in the previous case, in the training set we aggregate the years in the range [y0, ylast − Δ]. Xtrain is the aggregation of the RCA matrices from y0 to ylast − 2Δ and Ytrain is the aggregation of the column p of the M matrices from y0 + Δ to ylast − Δ. In both cases, the countries in the group α are removed.

When we perform the test, each model takes as Xtest the matrix RCA^(ylast − Δ) with only the rows corresponding to the 13 countries in group α and gives as output scores the elements of the matrix M^(ylast)_pred. All the 5040 × 13 models together give as output the whole matrix of scores M^(ylast)_pred, which will be compared to the actual Ytest = M^(ylast).
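The country-level 13-fold split can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)
countries = np.arange(169)
rng.shuffle(countries)
folds = np.array_split(countries, 13)        # 13 disjoint groups of 13 countries

sizes = []
for alpha in folds:                          # alpha: the countries held out for the test
    train_countries = np.setdiff1d(countries, alpha)
    sizes.append((len(train_countries), len(alpha)))
    # here one would train the per-product models on train_countries
    # and compute the scores only for the countries in alpha
```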

Since the output of the machine learning algorithms is a probability, and most of the performance indicators require a binary prediction, in order to establish if we predict a value of 0 or 1 we have to introduce a threshold. The threshold we use is the one that maximizes the F1-score. We note that the only performance measures that do not require a threshold are the ones that consider areas under curves, since these curves are built precisely by varying the threshold value.
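The F1-maximizing threshold can be found by scanning the precision-recall curve; a sketch on separable toy scores (all data hypothetical):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
# Hypothetical model scores: positives fall in [0.6, 1.0), negatives in [0, 0.4).
scores = 0.6 * y_true + 0.4 * rng.random(1000)

precision, recall, thresholds = precision_recall_curve(y_true, scores)
# F1 at each candidate threshold (the last PR point carries no threshold).
f1 = 2 * precision[:-1] * recall[:-1] / np.maximum(precision[:-1] + recall[:-1], 1e-12)
best_threshold = thresholds[np.argmax(f1)]
y_pred = (scores >= best_threshold).astype(int)
```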

Figure 10 schematically shows the training procedures with and without cross validation.

$$(c, p) \in \text{activations} \iff R^{(y)}_{cp} < 0.25 \quad \forall \, y \in [y_0, \, y_{last} - \Delta] \tag{3}$$

Figure 9. Schematic diagram with the functioning of machine learning algorithms to assess the relatedness between countries and a target product. During the training phase the model receives an Xtrain matrix with the training samples (countries) and their features (products) for the years from 1996 to 2008; they are compared with the Ytrain vector that contains the corresponding possible exports of the target product in 2001–2013 (that is, a binary label for each sample). Once the model is trained, it can receive in input new data (that is, an export basket) and will provide the probability that the label (the possible export of the target product) is 1. This progression score is our assessment of the relatedness.


Performance indicators. The choice of the performance indicators is a key issue of supervised learning61,75 and, in general, strongly depends on the specific problem under investigation. Here we discuss the practical meaning of the performance indicators we used to compare the ML algorithms. For all the scores but the areas under curves, we need to define a threshold above which the output scores of the ML algorithms are associated with a positive prediction. For this purpose we choose the threshold that maximizes the F1 score76.

• Precision Precision is defined as the ratio between true positives and positives61. In our case, we predict that a number of products will be competitively exported by some countries; these are the positives. The precision is the fraction that counts how many of these predicted products are actually exported by the respective countries after Δ years. A high value of precision is associated to a low number of false positives, that is, products that are predicted to appear usually do so.

• mean Precision@k (mP@k) This indicator usually corresponds to the fraction of the top k positives that are correctly predicted. We instead consider only the first k predicted products separately for each country, and then we average over the countries. This is of practical relevance from a policy perspective, because many new products appear in already highly diversified countries, while we would like to be precise also in low and medium income countries. By using mP@k we quantify the correctness of our possible recommendations of k products, on average, for a country.

• Recall Recall is defined as the ratio between true positives and the sum of true positives and false negatives or, in other words, the total number of products that a country will export after Δ years61. So a high recall is associated with a low number of false negatives: that is, if we predict that a country will not start exporting a product, that country will usually not export that product. A negative recommendation is somehow less usual in strategic policy choices.

• F1 Score The F1 score or F-measure59,60 is defined as the harmonic mean of precision and recall. As such, it is possible to obtain a high value of F1 only if both precision and recall are relatively high, so it is a very frequent choice to assess the general behavior of the classifier. As mentioned before, both precision and recall can be trivially varied by changing the scores' binarization threshold; however, the threshold that maximizes the F1 score is far from trivial, since precision and recall quantify different properties and are linked here in a nonlinear way. The Best F1 Score is computed by finding the threshold that maximizes the F1 score.

Figure 10. The training and testing procedure with (right) and without (left) cross validation. See the text for a detailed explanation.

• Area under the PR curve It is possible to build a curve in the plane defined by precision and recall by varying the threshold that identifies the value above which the scores are associated to positive predictions. The area under this curve is not misled by the class imbalance46.
• ROC–AUC The Area Under the Receiver Operating Characteristic Curve77,78 is a widespread indicator that aims at measuring the overall predictive power, in the sense that the user does not need to specify a threshold, like for Precision and Recall. On the contrary, all the scores are considered and ranked, and for each possible threshold both the True and the False Positive Rate (TPR and FPR, respectively) are computed. This procedure allows to define a curve in the TPR/FPR plane, and the area under this curve represents the probability that a randomly selected positive instance will receive a higher score than a randomly selected negative instance45. For a random classifier, AUC = 0.5. It is well known46,79 that in the case of highly imbalanced data the AUC may give too optimistic results. This is essentially due to its focus on the overall ranking of the scores: in our case, misordering even a large number of not exported products does not affect the prediction performance; one makes correct true negative predictions only because there are a lot of negative predictions to make.
• Matthews coefficient Matthews' correlation coefficient80 takes into account all the four classes of the confusion matrix and the class imbalance issue81,82.
• Accuracy Accuracy is the ratio between correct predictions (true positives and true negatives) and the total number of predictions (true positives, false positives, false negatives and true negatives)61. In our prediction exercise we find relatively high values of accuracy essentially because of the overwhelming number of (trivially) true negatives (see Table 1).
• Negative predictive value Negative predictive value is defined as the ratio between true negatives and negatives, that is, the products we predict will not be exported by a country61. Also in this case, a major role is played by the very large number of true negatives, which are however less significant from a policy perspective.
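The mP@k indicator as described above (top-k products per country, then averaged over countries) can be sketched as:

```python
import numpy as np

def mean_precision_at_k(scores, actual, k=10):
    """Average over countries of the precision among each country's k best-scored products."""
    per_country = []
    for s, a in zip(scores, actual):
        top_k = np.argsort(s)[::-1][:k]      # indices of the k highest scores
        per_country.append(a[top_k].mean())  # fraction of them actually activated
    return float(np.mean(per_country))

rng = np.random.default_rng(0)
scores = rng.random((5, 30))                 # hypothetical scores: 5 countries, 30 products

# Ideal case: the realized activations are exactly each country's top-10 products.
actual = np.zeros_like(scores, dtype=int)
for i, s in enumerate(scores):
    actual[i, np.argsort(s)[::-1][:10]] = 1

mp10 = mean_precision_at_k(scores, actual, k=10)
```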

Libraries for the ML models. Most of the models are implemented with scikit-learn 0.24.0 and, as described in the Supplementary Information, we performed a careful hyperparameter optimization; in particular we used (the unspecified hyperparameter values are the default ones):

• sklearn.ensemble.RandomForestClassifier(n_estimators = 100, min_samples_leaf = 7)
• sklearn.svm.SVC(kernel = "rbf")
• sklearn.linear_model.LogisticRegression(solver = "newton-cg")
• sklearn.tree.DecisionTreeClassifier()
• sklearn.ensemble.ExtraTreesClassifier(n_estimators = 100, min_samples_leaf = 8)
• sklearn.ensemble.AdaBoostClassifier(n_estimators = 3)
• sklearn.naive_bayes.GaussianNB()
• xgboost.XGBClassifier(n_estimators = 15, min_child_weight = 45, reg_lambda = 1.5)

XGBoost is implemented using the library xgboost 1.3.1.
Finally, the neural network is implemented using keras 2.4.3. It consists of two layers with 64 neurons and ReLU activation function and a final layer with a single neuron and sigmoid activation. We used rmsprop as optimizer, binary_crossentropy as loss function, accuracy as metric, and we stopped the training at 10 epochs.
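The architecture just described can be sketched as follows; note that we write it against the tf.keras API rather than standalone keras 2.4.3, and `n_features` is a placeholder for the actual input dimensionality:

```python
from tensorflow import keras

n_features = 10  # placeholder: the real number of input features

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of activation
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
# Training would then be: model.fit(X_train, y_train, epochs=10)
```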

For a detailed explanation of the choice of the hyperparameters the reader is referred to the Supplementary Information. Note that in our case tree-based models perform better, and it is known in the literature that the random forest default values already provide very good results79,83,84. In our case, the hyperparameter optimization improved our prediction performance by about 10%; in particular, it decreased the number of false positives.

Comparison with other works. Here we compare our Random Forest model with the other approaches presented in the literature that we cited in the introduction section, using a consistent testing framework (4-digit classification, comparison between the relatedness computed in 2013 and the actual new exported products in 2018 that had RCA < 0.25 from 1996 to 2013).
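As an illustration, the test target can be built from a year-indexed RCA array as follows (a sketch with random placeholder data; we assume "new exported" means RCA ≥ 1 in 2018, while the product stayed below RCA = 0.25 in every year from 1996 to 2013):

```python
import numpy as np

rng = np.random.default_rng(0)
n_countries, n_products = 50, 200
years = list(range(1996, 2019))
# rca[t, c, p]: RCA of product p in country c in year t (random placeholder)
rca = rng.exponential(0.5, size=(len(years), n_countries, n_products))

i2013 = years.index(2013)
i2018 = years.index(2018)

# Candidate pairs: RCA stayed below 0.25 over the whole 1996-2013 window
inactive = (rca[: i2013 + 1] < 0.25).all(axis=0)
# Positive labels: candidates actually exported (RCA >= 1) in 2018
target = inactive & (rca[i2018] >= 1)

print(target.shape)  # (50, 200)
```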

• Hidalgo et al. in 2007 define the Product Space26, which is still widely used to measure relatedness37. It is a projection of the country-product bipartite network onto the layer of the products (thus defining a proximity network of the products). The relatedness between a country and a product is defined as the density of the former around the latter in the Product Space;
• O'Clery et al. in 2021 introduce a new approach to define the proximity network of the products, called EcoSpace32. From this network they define the Ecosystem density (the likelihood of the appearance of a product in a country) as a relatedness measure;
• Medo et al. compare different approaches to perform link prediction on bipartite nested networks, finding that the two best-performing techniques are the number of violations of the nestedness property (NViol)85 and preferential attachment (prefA), where the relatedness is the product of the diversification of the country with the ubiquity of the product36.
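For instance, starting from a binary country-product matrix M, the prefA score is just the outer product of diversification and ubiquity, and the Product Space density can be sketched as follows (our reading of the definitions above, run here on random placeholder data):

```python
import numpy as np

rng = np.random.default_rng(1)
M = (rng.random((40, 120)) < 0.3).astype(float)  # M[c, p] = 1 if c exports p

# Preferential attachment: diversification of the country times ubiquity
# of the product
diversification = M.sum(axis=1)          # d_c
ubiquity = M.sum(axis=0)                 # u_p
prefA = np.outer(diversification, ubiquity)

# Product Space proximity (Hidalgo et al.): co-export counts normalized by
# the larger of the two ubiquities
co = M.T @ M                             # co[p, q] = countries exporting both
with np.errstate(divide="ignore", invalid="ignore"):
    phi = co / np.maximum.outer(ubiquity, ubiquity)
phi = np.nan_to_num(phi)

# Density: share of the proximity mass around product p already covered by
# country c
density = (M @ phi) / phi.sum(axis=0)

print(prefA.shape, density.shape)  # (40, 120) (40, 120)
```

By construction the density is a weighted average of the entries of M, so it lies in [0, 1]; pairs with high density (or high prefA) are the ones these baselines predict as the next activations.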

Content courtesy of Springer Nature, terms of use apply. Rights reserved


Scientific Reports | (2023) 13:1481 | https://doi.org/10.1038/s41598-023-28179-x

www.nature.com/scientificreports/

In Table 3 we show the AUC-PR, AUC-ROC, Best F1 and mean precision@5 of the different models. We find that the Random Forest outperforms the other approaches independently of the specific performance metric used in the comparison.

Data availability
The data that support the findings of this study are available from UN-COMTRADE, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request to the corresponding author and with permission of UN-COMTRADE. An anonymized and processed version of the data is available at https://github.com/giamba95/SaplingSimilarity/tree/main/data/RCA to permit the full replicability of our study.

Received: 21 June 2022; Accepted: 13 January 2023

References
1. Athey, S. The impact of machine learning on economics. In The Economics of Artificial Intelligence: An Agenda. 507–547 (University of Chicago Press, 2018).
2. Rodrik, D. Diagnostics before prescription. J. Econ. Perspect. 24, 33–44 (2010).
3. Hausmann, R., Rodrik, D. & Velasco, A. Growth diagnostics. In The Washington Consensus Reconsidered: Towards a New Global Governance. 324–355 (2008).
4. Baldovin, M., Cecconi, F., Cencini, M., Puglisi, A. & Vulpiani, A. The role of data in model building and prediction: A survey through examples. Entropy 20, 807 (2018).
5. Hosni, H. & Vulpiani, A. Forecasting in light of big data. Philos. Technol. 31, 557–569 (2018).
6. Rodrik, D. Economics Rules: The Rights and Wrongs of the Dismal Science (WW Norton & Company, 2015).
7. Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A. & Pietronero, L. A new metrics for countries' fitness and products' complexity. Sci. Rep. 2, 723 (2012).
8. Cristelli, M., Gabrielli, A., Tacchella, A., Caldarelli, G. & Pietronero, L. Measuring the intangibles: A metrics for the economic complexity of countries and products. PloS ONE 8, e70726 (2013).
9. Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A. & Pietronero, L. Economic complexity: Conceptual grounding of a new metrics for global competitiveness. J. Econ. Dyn. Control 37, 1683–1691 (2013).
10. Tacchella, A., Mazzilli, D. & Pietronero, L. A dynamical systems approach to gross domestic product forecasting. Nat. Phys. 14, 861–865 (2018).
11. Zaccaria, A., Cristelli, M., Tacchella, A. & Pietronero, L. How the taxonomy of products drives the economic development of countries. PloS ONE 9, e113770 (2014).
12. Zaccaria, A., Cristelli, M., Kupers, R., Tacchella, A. & Pietronero, L. A case study for a new metrics for economic complexity: The Netherlands. J. Econ. Interact. Coord. 11, 151–169 (2016).
13. Gaulier, G. & Zignago, S. BACI: International trade database at the product level (the 1994–2007 version). In CEPII Working Paper 2010–2023 (2010).
14. Hidalgo, C. A. & Hausmann, R. The building blocks of economic complexity. Proc. Natl. Acad. Sci. 106, 10570–10575 (2009).
15. Albeaik, S., Kaltenberg, M., Alsaleh, M. & Hidalgo, C. Improving the Economic Complexity Index. arXiv preprint arXiv:1707.05826 (2017).
16. Gabrielli, A. et al. Why we like the ECI+ algorithm. arXiv preprint arXiv:1708.01161 (2017).
17. Albeaik, S., Kaltenberg, M., Alsaleh, M. & Hidalgo, C. 729 new measures of economic complexity (addendum to Improving the Economic Complexity Index). arXiv preprint arXiv:1708.04107 (2017).
18. Pietronero, L. et al. Economic complexity: "Buttarla in caciara" vs a constructive approach. arXiv preprint arXiv:1709.05272 (2017).
19. Cristelli, M., Tacchella, A. & Pietronero, L. The heterogeneous dynamics of economic complexity. PloS ONE 10, e0117174 (2015).
20. Cristelli, M., Tacchella, A., Cader, M., Roster, K. & Pietronero, L. On the Predictability of Growth (The World Bank, 2017).
21. Liao, H. & Vidmer, A. A comparative analysis of the predictive abilities of economic complexity metrics using international trade network. Complexity (2018).
22. Sciarra, C., Chiarotti, G., Ridolfi, L. & Laio, F. Reconciling contrasting views on economic complexity. Nat. Commun. 11, 1–10 (2020).
23. Frenken, K., Van Oort, F. & Verburg, T. Related variety, unrelated variety and regional economic growth. Region. Stud. 41, 685–697 (2007).
24. Hidalgo, C. A. et al. The principle of relatedness. In International Conference on Complex Systems. 451–457 (Springer, 2018).
25. Teece, D. J., Rumelt, R., Dosi, G. & Winter, S. Understanding corporate coherence: Theory and evidence. J. Econ. Behav. Organ. 23, 1–30 (1994).
26. Hidalgo, C. A., Klinger, B., Barabási, A.-L. & Hausmann, R. The product space conditions the development of nations. Science 317, 482–487 (2007).
27. Breschi, S., Lissoni, F. & Malerba, F. Knowledge-relatedness in firm technological diversification. Res. Policy 32, 69–87 (2003).

Table 3. Comparison between our Random Forest model and other approaches proposed in the literature. The Random Forest provides a better assessment of the relatedness with all the performance indicators. The highest values of each indicator are in bold.

Algorithm        AUC-PR   Best F1   AUC-ROC   mean Precision@5
Random Forest    0.015    0.042     0.689     0.049
Product Space    0.010    0.022     0.637     0.032
EcoSpace         0.013    0.035     0.663     0.042
prefA            0.011    0.024     0.645     0.046
NViol            0.011    0.025     0.607     0.046


28. Pugliese, E., Napolitano, L., Zaccaria, A. & Pietronero, L. Coherent diversification in corporate technological portfolios. PloS ONE 14 (2019).
29. Neffke, F., Henning, M. & Boschma, R. How do regions diversify over time? Industry relatedness and the development of new growth paths in regions. Econ. Geogr. 87, 237–265 (2011).
30. Boschma, R. et al. Technological relatedness and regional branching. In Beyond Territory. Dynamic Geographies of Knowledge Creation, Diffusion and Innovation. 64–68 (2012).
31. Pugliese, E. et al. Unfolding the innovation system for the development of countries: Coevolution of science, technology and production. Sci. Rep. 9, 1–12 (2019).
32. O'Clery, N., Yıldırım, M. A. & Hausmann, R. Productive ecosystems and the arrow of development. Nat. Commun. 12, 1–14 (2021).
33. Gnecco, G., Nutarelli, F. & Riccaboni, M. A machine learning approach to economic complexity based on matrix completion. Sci. Rep. 12, 1–10 (2022).
34. Hausmann, R., Hwang, J. & Rodrik, D. What you export matters. J. Econ. Growth 12, 1–25 (2007).
35. Bustos, S., Gomez, C., Hausmann, R. & Hidalgo, C. A. The dynamics of nestedness predicts the evolution of industrial ecosystems. PloS ONE 7, e49393 (2012).
36. Medo, M., Mariani, M. S. & Lü, L. Link prediction in bipartite nested networks. Entropy 20, 777 (2018).
37. Zhang, W.-Y., Chen, B.-L., Kong, Y.-X., Shi, G.-Y. & Zhang, Y.-C. Industry upgrading: Recommendations of new products based on world trade network. Entropy 21, 39 (2019).
38. Balassa, B. Trade liberalisation and "revealed" comparative advantage 1. Manchester Sch. 33, 99–123 (1965).
39. Tacchella, A., Zaccaria, A., Miccheli, M. & Pietronero, L. Relatedness in the era of machine learning. arXiv preprint arXiv:2103.06017 (2021).
40. Hausmann, R. et al. A Roadmap for Investment Promotion and Export Diversification: The Case of Jordan (Technical Report, Center for International Development at Harvard University, 2019).
41. Saracco, F., Di Clemente, R., Gabrielli, A. & Pietronero, L. From innovation to diversification: A simple competitive model. PloS ONE 10, e0140420 (2015).
42. Tacchella, A., Di Clemente, R., Gabrielli, A. & Pietronero, L. The build-up of diversity in complex ecosystems. arXiv preprint arXiv:1609.03617 (2016).
43. Che, N. X. Intelligent Export Diversification: An Export Recommendation System with Machine Learning (Technical Report, International Monetary Fund, 2020).
44. Angelini, O. & Di Matteo, T. Complexity of products: The effect of data regularisation. Entropy 20, 814 (2018).
45. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).
46. Saito, T. & Rehmsmeier, M. The precision–recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PloS ONE 10, e0118432 (2015).
47. Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 1189–1232 (2001).
48. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794 (2016).
49. Gulli, A. & Pal, S. Deep Learning with Keras (Packt Publishing Ltd, 2017).
50. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
51. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
52. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
53. Hosmer Jr, D. W., Lemeshow, S. & Sturdivant, R. X. Applied Logistic Regression. Vol. 398 (Wiley, 2013).
54. Quinlan, J. R. Induction of decision trees. Mach. Learn. 1, 81–106 (1986).
55. Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42 (2006).
56. Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
57. John, G. H. & Langley, P. Estimating continuous distributions in Bayesian classifiers. arXiv preprint arXiv:1302.4964 (2013).
58. Shalev-Shwartz, S. & Ben-David, S. Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, 2014).
59. Dice, L. R. Measures of the amount of ecologic association between species. Ecology 26, 297–302 (1945).
60. Van Rijsbergen, C. J. Foundation of evaluation. J. Docum. (1974).
61. Powers, D. M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. (2011).
62. Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).
63. Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (O'Reilly Media, Inc., 2019).
64. Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
65. Romer, P. The trouble with macroeconomics. Am. Econ. (2016).
66. Romer, P. M. Mathiness in the theory of economic growth. Am. Econ. Rev. 105, 89–93 (2015).
67. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T. & Jennions, M. D. The extent and consequences of p-hacking in science. PLoS Biol. 13, e1002106 (2015).
68. Lin, J. Y. New Structural Economics: A Framework for Rethinking Development and Policy (The World Bank, 2012).
69. Fernandes, N. Economic effects of coronavirus outbreak (COVID-19) on the world economy. Available at SSRN 3557504 (2020).
70. Nana, I. & Starnes, S. When trade falls: Effects of COVID-19 and outlook (Technical Report, International Finance Corporation, World Bank Group, 2020).
71. Hidalgo, C. A. Economic complexity theory and applications. Nat. Rev. Phys. 3, 92–113 (2021).
72. Lin, J., Cader, M. & Pietronero, L. What African industrial development can learn from east Asian successes. In EM Compass 88 (2020).
73. Pugliese, E. & Tacchella, A. Economic Complexity for Competitiveness and Innovation: A Novel Bottom-Up Strategy Linking Global and Regional Capacities (Technical Report, Joint Research Centre (Seville site), 2020).
74. Patelli, A., Pietronero, L. & Zaccaria, A. Integrated database for economic complexity. Sci. Data 9, 1–13 (2022).
75. Caruana, R. & Niculescu-Mizil, A. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning. 161–168 (2006).
76. Lipton, Z. C., Elkan, C. & Naryanaswamy, B. Optimal thresholding of classifiers to maximize F1 measure. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 225–239 (Springer, 2014).
77. Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36 (1982).
78. Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36 (1982).
79. Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36 (1982).


80. Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta (BBA)-Protein Struct. 405, 442–451 (1975).
81. Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020).
82. Boughorbel, S., Jarray, F. & El-Anbari, M. Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PloS ONE 12, e0177678 (2017).
83. Genuer, R., Poggi, J.-M. & Tuleau, C. Random forests: Some methodological insights. arXiv preprint arXiv:0811.3619 (2008).
84. Probst, P., Wright, M. N. & Boulesteix, A.-L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Mining Knowl. Discov. 9, e1301 (2019).
85. Grimm, A. & Tessone, C. J. Analysing the sensitivity of nestedness detection methods. Appl. Netw. Sci. 2, 1–19 (2017).

Author contributions
Conceptualization: A.Z., A.T.; Methodology: all; Investigation, Software: G.A.; Validation: A.Z., A.T., G.A.; Writing, Review and editing: all; Supervision: L.P.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-023-28179-x.
Correspondence and requests for materials should be addressed to A.Z.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2023
