SN Applied Sciences (2023) 5:354 | https://doi.org/10.1007/s42452-023-05554-x
Research
Mapping theglobal free expression landscape using machine learning
SandraOrtega‑Martorell1· RyanA.A.Belleld1· SteveHarrison2· DreweryDyke3· NikWilliams4· IvanOlier1
Received: 5 September 2023 / Accepted: 30 October 2023
© The Author(s) 2023 OPEN
Abstract
Freedom of expression is a core human right, yet the forces that seek to suppress it have intensied, increasing the need
to develop tools that can measure the rates of freedom globally. In this study, we propose a novel freedom of expression
index to gain a nuanced and data-led understanding of the level of censorship across the globe. For this, we used an
unsupervised, probabilistic machine learning method, to model the status of the free expression landscape. This index
seeks to provide legislators and other policymakers, activists and governments, and non-governmental and intergov-
ernmental organisations, with tools to better inform policy or action decisions. The global nature of the proposed index
also means it can become a vital resource/tool for engagement with international and supranational bodies.
Article highlights
• We propose a novel methodology using machine learning to model freedom of expression on a global scale.
• The proposed approach is in nature less prone to subjective interpretation and possibly more rigorous than previous rankings.
• The resulting freedom of expression indices can be used as a powerful tool to better inform policy or action decisions.
Keywords Generative topographic mapping · Machine learning · Data visualisation · Freedom of expression · Human
rights, censorship, media freedom, academic freedom, digital freedom
1 Introduction
In an increasingly atomised, polarised world, the free
expression of ideas is more important than ever. But
while the need for free expression has increased, so have
the forces which seek to suppress it and the technolo-
gies which enable its suppression. Freedom of expres-
sion is among the core human rights set out in the United
Nations (UN) Universal Declaration of Human Rights [1],
the International Covenant on Civil and Political Rights
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s42452-023-05554-x.
* Ivan Olier, I.A.OlierCaparroso@ljmu.ac.uk; Sandra Ortega-Martorell, S.OrtegaMartorell@ljmu.ac.uk; Ryan A. A. Bellfield,
R.A.Bellfield@2019.ljmu.ac.uk; Steve Harrison, S.Harrison1@ljmu.ac.uk; Drewery Dyke, drewery@indexoncensorship.org; Nik Williams,
nik@indexoncensorship.org | 1Data Science Research Centre, Liverpool John Moores University, Liverpool L3 3AF, UK. 2Liverpool Screen
School, Liverpool John Moores University, Liverpool L3 5RF, UK. 3Rights Realization Centre, London N16 7NL, UK. 4Index on Censorship,
London EC2A 3BA, UK.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
(1966; entered into force in 1976) and subsequent treaties,
including those in Europe, the Americas and Africa:
for example, the European Convention on Human Rights
[2], which entered into force in 1953; the American Convention on
Human Rights [3], which entered into force in 1978; and the African
Charter on Human and Peoples’ Rights [4], which entered into force
in 1986.
These global mechanisms uphold the principle that
“everyone has the right to freedom of expression.” Unfortunately,
in today’s world, this right is facing numerous
challenges. Rapid advancements in technology have
provided new avenues for those who wish to suppress
freedom of expression. Censorship and surveillance tools
are becoming more sophisticated and readily available,
enabling governments and other entities to monitor and
control the flow of information.
Censorship continues to operate across the globe, using
several diverse tactics and drivers, including state laws or
practices that restrict expression beyond what is included
in international instruments [5]. Examples of this include
the mixture of technological and legislative mechanisms
deployed by the Chinese state to block access to online
resources (colloquially called the Great Firewall of China
- see for example [6]), the reduction of civil space for pro-
tests and other acts of civic participation, and the use of
strategic lawsuits against public participation (SLAPPs)¹ [7]
to prevent journalists and other public watchdogs from
being able to report in the public interest.
With the entry into force of such standards, the UN and
regional inter-governmental organisations established
bodies or mechanisms to assess state adherence to the
standards. This required techniques of assessment and
measurement which had been developed by scholars
starting in the 1930s with Greer’s study into the Reign of
Terror in revolutionary France [8] and which have become
increasingly sophisticated in terms of data sources and
statistical techniques – see, for example, [9–11]. The purpose
of measuring human rights is to assess the extent
to which these rights are upheld in theory, manifested in
reality, and advanced through effective policies [12]. By
conducting such measurements, we aim to identify areas
where human rights are being violated or neglected so
that appropriate solutions to address these challenges can
be developed.
This research introduces the Index Index, an innovative
analysis of global censorship practices, and proposes a
novel methodological approach to calculate it. Specifically,
the Index Index focuses on academic, digital, and media/press
freedom. It uses Generative Topographic Mapping
(GTM, [13, 14]), an unsupervised Machine Learning algorithm,
to cluster and visualise countries in terms of their
levels of freedom of expression. By utilizing established
and robust indices and metrics, this research offers a comprehensive
and nuanced assessment of the international
landscape of free expression. It sheds light on the various
threats that impede, curtail, suppress, or manipulate the
public’s right to access information, express themselves,
and engage with others². Unlike recent studies that solely
rely on data related to internet accessibility, such as
[15–17], the Index Index integrates a wide range of existing
analyses and expertise to provide a comprehensive
ranking of the free expression environment in all countries
or nations where sufficient data is available.
2 Materials andmethods
2.1 Data andresources thatinformed
thedevelopment oftheIndex Index
As this is an index of indices, the raw data comprises existing
indices and metrics developed by a range of different
national and international bodies such as research institutes,
as well as international non-governmental organisations.
Each pre-existing index has been selected based
on several criteria, including its usage and reference by
the wider community of practitioners, the robustness of its
methodology, and its geographic scope. Individually, they
are the product of internal testing and iterative development
and as a result are used in a range of public advocacy
and campaigning initiatives, including being referenced
by international bodies, such as European institutions and
UN bodies. For instance, V-Dem is funded by, among others,
the European Commission, the Swedish Ministry of
Foreign Affairs and the World Bank [18]; the World Press
Freedom Index is cited by the European Parliament in its
Normandy Index 2023 [19]; and the Committee to Protect
Journalists has submitted evidence to the UN Special Rapporteur
on the promotion and protection of the right to
freedom of opinion and expression [20].
We selected these indices on account of their robust-
ness and completeness. The datasets were collated after
in-depth conversations between the project team. Several
other sources were explored and ultimately discounted.
Further details about the selected and discounted data
¹ SLAPPs are vexatious lawsuits targeting journalists and other
whistleblowers whereby powerful individuals and institutions use
civil lawsuits to intimidate and financially threaten critics [7].
² For the purpose of measurement, the term ‘country’ refers to a
state or political entity, including Kosovo, Palestine, and Taiwan,
which are not recognized as states by the UN.
sources are provided below and in the Supplementary
Materials.
2.1.1 V‑Dem (Varieties of Democracy)
The Varieties of Democracy (V-Dem) Research Project [18]
offers a nuanced and extensive analysis of democratisation,
examining various dimensions and subcomponents.
The data forming the foundation of V-Dem’s component
variables are collected through surveys administered to a
network of over 3,500 Country Experts. The project aims
to ensure a minimum of five experts for each indicator
per country, facilitating a robust and diverse perspective.
By employing a wide range of indicators and involving a
substantial number of experts, V-Dem strives to provide a
comprehensive understanding of democracy’s complexi-
ties and variations across countries.
The V-Dem database offers a comprehensive range
of democratic measures, surpassing the scope of the
Index Index. Recognising this, the research team
extracted 171 variables from the dataset that held
significance for the model. These variables covered
not only the three freedoms emphasised in the Index Index
(academic, digital, and media/press freedom) but also
broader contextual concerns, such as corruption and
accountability measures, alongside various civil liberties.
2.1.2 World press freedom index
The World Press Freedom Index, compiled by Reporters
Without Borders (RSF), serves the purpose of comparing
the level of press freedom across 180 countries and territories
[21]. It provides a snapshot of the press freedom
situation in these locations during the calendar
year preceding its publication. The Index utilises a scoring
system ranging from 0 to 100 to rank each country or ter-
ritory. This score is derived from two key components: a
quantitative assessment of abuses against journalists and
media outlets, and a qualitative analysis of the overall situ-
ation within each country or territory.
To obtain the qualitative analysis, RSF distributes a
questionnaire in 23 languages to press freedom specialists,
including journalists, researchers, academics, and human
rights defenders. Following the calculation of scores, the
countries and territories are arranged in an ordinal list from
1 to 180, with 1 indicating the highest level of press free-
dom. It is this raw score calculated for each country that
we have utilised as a variable in our model’s development.
2.1.3 Committee toprotect journalists (CPJ)
The Committee to Protect Journalists (CPJ) collects com-
prehensive data [22] on the imprisonment, killing, and
disappearance of journalists. The CPJ’s annual imprison-
ment census provides a snapshot of incarcerated journal-
ists each year. However, this census does not account for
the numerous journalists who are imprisoned and released
throughout the year. Additionally, journalists who go miss-
ing or are abducted by non-state entities such as criminal
gangs or militant groups are not included in the prison
census.
Since 1992, the CPJ has maintained detailed records
of journalist fatalities. Their researchers independently
investigate and verify the circumstances surrounding each
death. The CPJ’s database encompasses both “confirmed”
cases, where it is evident that a journalist was killed
as a direct reprisal for their work, during combat or crossfire,
or while undertaking a hazardous assignment, as well
as “unconfirmed” cases that involve unclear motives but
may have a potential link to journalism. Ongoing research
allows for the reclassification of cases. It is important to
note that while both “confirmed” and “unconfirmed” cases
are included in the CPJ’s database, targeted statistical analyses
only include the “confirmed” cases.
For the development of our model, we extracted the fol-
lowing information from the CPJ database for each coun-
try: the number of journalists and media workers killed,
the number of journalists imprisoned, and the number
of missing journalists. These variables serve as valuable
inputs in our model development process.
2.1.4 UNESCO observatory of killed journalists
The Observatory of Killed Journalists, managed by UNE-
SCO [23], serves as a visual representation of the institu-
tion’s strategic commitment to combating impunity and
addressing crimes against journalists. This initiative aligns
with the General Conference 36C/Resolution 53 (2011),
which urges UNESCO to collaborate with other United
Nations bodies in monitoring the state of press freedom
and the safety of journalists. In order to provide compre-
hensive insights, the study analyses information supplied
by UN Member States, which is then categorised as either
Resolved or Ongoing/Unresolved, shedding light on the
progress of investigations into journalist deaths. To conduct
this analysis, we extracted data from the Observatory,
specifically the number of journalists killed in each country,
which was used as a variable in our model.
2.1.5 Cost ofshutdown tool (COST)
COST [24], developed by NetBlocks, is an invaluable data-
driven online service that empowers a wide range of users,
including journalists, researchers, advocates, policymakers,
businesses, and others, to swiftly and eortlessly generate
approximate assessments of the economic impact caused
by Internet disruptions. By leveraging established meth-
odologies pioneered by esteemed institutions such as
the Brookings Institution and the Collaboration on Inter-
national ICT Policy for East and Southern Africa (CIPESA),
COST accurately gauges the potential economic conse-
quences of internet shutdowns, mobile data blackouts,
and social media restrictions. This powerful tool utilises
publicly available economic indicators that pertain to the
global digital economy. We utilised the COST platform to
construct an additional variable for model development,
specically capturing the hourly cost of shutdown in each
country, expressed in USD.
2.1.6 Global cybersecurity index
The Global Cybersecurity Index (GCI) [25] is a reputable
source that evaluates countries’ dedication to cybersecurity
on a global scale, with the aim of raising awareness
about the significance and diverse aspects of the issue.
Given that cybersecurity encompasses a wide range of
applications spanning multiple industries and sectors,
each country’s level of development and engagement
is assessed across five pillars: Legal Measures, Technical
Measures, Organisational Measures, Capacity Development,
and Cooperation. These pillars are then combined
to form an overall score.
The GCI adopts a multi-stakeholder approach and relies
on the expertise and capabilities of various organisations.
Its objectives include enhancing the survey’s quality,
fostering international cooperation, and promoting knowledge
exchange in the field of cybersecurity. The initiative
is built upon the foundation and framework provided by
the ITU Global Cybersecurity Agenda (GCA). To develop
the model, the GCI score for each country was utilised
as a variable.
2.2 Data notincluded inthedevelopment
oftheIndex Index
The model does not include metrics which have no immediate
bearing on, or do not serve as a proxy indication of,
issues relating to free expression. We nevertheless provide
socio-economic data and broader contextual information
in a hover-over box that appears when a specific country
is selected on the online map that accompanies this project.
The interactive map is included in the Supplementary
Materials.
We included this information to provide broadly related
metrics that add texture and depth to the metrics featured.
This first revived iteration of the
Index Index is provided alongside contextual data on the
UN Human Development Index (HDI), the Gross Domestic
Product (GDP) per capita as compiled by the UN, and
population data as compiled by the United Nations
Population Fund (UNFPA), enabling the reader to explore
links, if any, between this data and the core metrics.
2.3 A note onthepolitical entities included
Our modelling and visualization are inuenced by the
indices comprising the dataset. This inuence becomes
evident through the inclusion and exclusion of various
countries and political entities in the Index Index. The
Index Index incorporates both UN and non-UN member
states, countries with observer status, and other nations
or regions that may be autonomous parts of other states.
For example, Kosovo and Taiwan are included in the Index
Index despite not being recognized as UN member states,
while Greenland, an autonomous part of Denmark, lacks
available data.
Moreover, the rankings of the British Overseas Ter-
ritories, which are autonomous parts of the UK, and the
overseas parts of France and the Netherlands, are attributed
to their respective states. However, it is important to
note that the nature of these overseas territories varies
significantly.
Unfortunately, due to gaps in the datasets, the Index
Index lacks data for several countries, including (but not
limited to) Liberia, Papua New Guinea, Federated States
of Micronesia, Kiribati, Palau, Tonga, Tuvalu, Samoa, Domi-
nica, Saint Kitts and Nevis, Saint Lucia, Saint Vincent and
the Grenadines, Grenada, Andorra, Liechtenstein, San
Marino, and the Holy See.
2.4 Machine learning method used
We modelled the data using an unsupervised, probabilistic
machine learning method, namely Generative Topographic
Mapping (GTM) [13, 14]. The GTM is a machine
learning algorithm designed for clustering, data stratification
and visualisation, which has sound foundations in
probability theory and provides a principled alternative to
the Self-Organising Map (SOM) algorithm [26]. Rather than
predicting whether two countries should be allocated the
same cluster, the GTM predicts the probability of belong-
ing to the same cluster. With this method, we created data
clusters, where each of them represents a group of one or
more countries that share similar characteristics. The GTM
performs a soft assignment of countries to clusters. This
is a robust approach that considerably reduces the risk of
countries being assigned to the wrong clusters.
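The soft assignment described above can be illustrated with a minimal sketch. It assumes that every cluster is an isotropic Gaussian with the same inverse variance `beta` and the same prior weight; the function name `soft_assign` and the toy values are hypothetical and are not taken from the study.

```python
import numpy as np

def soft_assign(X, centres, beta):
    """Return an (N, M) matrix of cluster-membership probabilities.

    X: (N, D) data, centres: (M, D) cluster centres, beta: inverse variance.
    Assumes an equal prior weight of 1/M for every cluster.
    """
    # Squared Euclidean distance between every data point and every centre.
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    log_p = -0.5 * beta * sq_dist
    # Subtract the row maximum before exponentiating for numerical stability.
    log_p -= log_p.max(axis=1, keepdims=True)
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)

# Toy data: three "countries" described by two standardised indicators.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
centres = np.array([[0.0, 0.0], [1.0, 1.0]])
resp = soft_assign(X, centres, beta=4.0)
hard = resp.argmax(axis=1)  # mode of each membership distribution
```

The third toy point lies midway between the two centres, so its membership is split evenly; a hard assignment would have to pick one cluster arbitrarily, which is the risk the soft assignment is meant to reduce.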
The GTM assumes that the observed data is generated
through a nonlinear and topology-preserving mapping
from a low-dimensional latent space in $\mathfrak{L}$ onto a manifold
embedded in the high-dimensional space, $\mathfrak{D}$, where the
observed data reside. The function used to generate this
embedding takes the form:

$$\mathbf{y} = \mathbf{W}\Phi(\mathbf{u}) \tag{1}$$

where $\mathbf{u}$ is a point in the L-dimensional latent space,
$\mathbf{W}$ is a matrix containing parameters that govern the
mapping, and $\Phi$ consists of $S$ basis functions $\varphi_s$, which
for the standard GTM are radially symmetric Gaussians.
If a prior probability distribution of $\mathbf{u}$ is defined for the
latent space, then the distribution of data $\mathbf{x}$, for a given
$\mathbf{u}$ and $\mathbf{W}$, is chosen to be a radially-symmetric Gaussian
centred on $\mathbf{y} = \mathbf{W}\Phi(\mathbf{u})$ having a variance of $\beta^{-1}$ so that:

$$p(\mathbf{x} \mid \mathbf{u}, \mathbf{W}, \beta) = \left(\frac{\beta}{2\pi}\right)^{D/2} \exp\left\{-\frac{\beta}{2}\,\|\mathbf{y} - \mathbf{x}\|^{2}\right\} \tag{2}$$

where $\mathbf{y}$ is as defined in (1). The GTM latent space is constrained
to form a uniform discrete grid of $M$ centres, analogous
to the distribution of SOM units, in the form:

$$p(\mathbf{u}) = \frac{1}{M}\sum_{i=1}^{M} \delta(\mathbf{u} - \mathbf{u}_i) \tag{3}$$

Each of these centres is responsible for generating a
spherical Gaussian density function in the D-dimensional
data space. In this sense, the GTM can be understood as
a special case of a Gaussian mixture model in which each
component in the mixture defines the probability of an
observable data point (e.g., a country) given a latent
centre. Therefore, assuming the observed data points $\mathbf{x}_n$
are independent and identically distributed (i.i.d.), the
parameter matrix $\mathbf{W}$ and the inverse variance $\beta$ can be
determined by maximising the log-likelihood given by:

$$\mathbf{L}(\mathbf{W}, \beta \mid \mathbf{X}) = \sum_{n=1}^{N} \ln p(\mathbf{x}_n \mid \mathbf{W}, \beta) = \sum_{n=1}^{N} \ln\left\{\frac{1}{M}\sum_{i=1}^{M} p(\mathbf{x}_n \mid \mathbf{u}_i, \mathbf{W}, \beta)\right\} \tag{4}$$

where

$$p(\mathbf{x}_n \mid \mathbf{u}_i, \mathbf{W}, \beta) = \left(\frac{\beta}{2\pi}\right)^{D/2} \exp\left\{-\frac{\beta}{2}\,\|\mathbf{y}_i - \mathbf{x}_n\|^{2}\right\} \tag{5}$$

In Eq. (5), $\mathbf{y}_i$ is defined using Eq. (1) and is a D-dimensional
point on the manifold embedded in the data
space for the point $\mathbf{u}_i$ in the latent space. The adaptive
parameters of the model are optimised using the
expectation-maximisation (EM) algorithm. Matrix $\mathbf{W}$
is updated as the solution to the following system of
equations:

$$\Phi^{T}\mathbf{G}_{\mathrm{old}}\Phi\,\mathbf{W}_{\mathrm{new}}^{T} - \Phi^{T}\mathbf{R}_{\mathrm{old}}\mathbf{X} = 0 \tag{6}$$

where $\Phi$ is an $M \times S$ matrix with elements $\varphi_s(\mathbf{u}_i)$; $\mathbf{X}$ is
the observed $N \times D$ data matrix with elements $x_{nm}$;
$\mathbf{R}$ is the matrix of responsibilities that define the
probability of the data point $\mathbf{x}_n$ being generated by the
latent point $\mathbf{u}_i$, defined as
$R_{in} = p(\mathbf{u}_i \mid \mathbf{x}_n, \mathbf{W}_{\mathrm{old}}, \beta_{\mathrm{old}})$; and
$\mathbf{G}$ is a diagonal matrix with elements $\sum_{n=1}^{N} R_{in}$. Finally, the
$\beta$ parameter is updated according to the following:

$$\left(\beta_{\mathrm{new}}\right)^{-1} = \frac{1}{ND}\sum_{n=1}^{N}\sum_{i=1}^{M} R_{in}\,\|\mathbf{y}_i - \mathbf{x}_n\|^{2} \tag{7}$$

Note that the observed data $\mathbf{X}$ needs to be normalised
before training (e.g. by centring the data around zero
and scaling the data so that the new standard deviation
becomes 1). For the full details on the calculations, please
refer to the original publication [13].
The GTM can not only assign data points to clusters but
also can visualise them in a cluster membership map by
projecting the latent centres. The GTM latent space can
serve for visualisation purposes if its number of dimensions
is 1 or 2, in which case the mode probability (i.e. the highest
cluster probability) is used to decide the country’s cluster
membership.
For the trained GTM, each cluster centre $\mathbf{y}_i$, henceforth
named as a reference vector, is a prototype of the data.
Reference maps associated with each of the variables
were generated based on the reference vector components.
These reference maps can be visualised in the form
of heatmaps and the high and low values can be used
to interpret the relationship between each variable and
each country cluster. This can provide further information/
interpretation about the role of each variable used in the
model.
2.5 Index Index ranking
A ranking was then generated by leveraging aggregated,
normalised information from the reference maps that
represent the relevant extracted variables. In this sense,
a country will be given a score, which is calculated as
follows:

$$\mathrm{Score}_n = \sum_{i=1}^{M} R_{in}\,\bar{y}_i \tag{8}$$

where $\bar{y}_i$ is the normalised reference vector or centre
$\mathbf{y}_i$. Countries are ranked according to their calculated
score. This ranking is not a direct ranking of countries,
but instead, it is a ranking of the different country clusters
that were automatically identified from the data using
GTM. This means that more than one country may share
a single position in the ranking. The developed ranking
was then divided into 10 groups according to its
distribution of scores to form the 10 deciles of the scale
of free expression, where lower deciles represent higher
levels of free expression and
higher deciles represent lower levels.
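As a rough illustration of the scoring and grouping steps, the sketch below computes a responsibility-weighted score per country and then splits the scores into ten groups. Several points are assumptions rather than details given in the text: each cluster centre is summarised by one normalised scalar (`centre_values`), larger scores are taken to mean less free expression, and the ten groups are formed from the empirical deciles of the scores. The function names are hypothetical.

```python
import numpy as np

def country_scores(R, centre_values):
    """Score_n = sum_i R[i, n] * centre_values[i] for every country n.

    R: (M, N) responsibilities whose columns sum to 1.
    centre_values: (M,) one normalised scalar per cluster centre (assumed).
    """
    return R.T @ centre_values

def decile_groups(scores):
    """Map scores to groups 1..10 using the empirical deciles (assumed rule)."""
    edges = np.quantile(scores, np.linspace(0.1, 0.9, 9))
    # searchsorted yields 0..9; a score equal to an edge goes to the lower group.
    return np.searchsorted(edges, scores, side="left") + 1

# Toy example: two clusters, three countries.
R = np.array([[0.9, 0.2, 0.0],
              [0.1, 0.8, 1.0]])
centre_values = np.array([0.0, 1.0])
scores = country_scores(R, centre_values)

# One hundred evenly spread scores fall into ten groups of equal size.
groups = decile_groups(np.arange(100.0))
```

Because countries assigned to the same cluster have very similar responsibilities, they receive near-identical scores, which is why several countries can share one position in the ranking.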
3 Results
3.1 Country clustering visualisation
The visualisation in Fig.1 (representing the cluster
membership map in the GTM latent space of the devel-
oped model) shows a representation of a different kind
of world map, where every circle represents a cluster,
and each cluster is representing one or more coun-
tries. In accordance with the original GTM publication
[13], we set the number of clusters to 100 (arranged in
a grid of
10 ×10
) and the number of basis functions to
16 (arranged in a grid of
4×4
). The GTM regularisation
term was optimised, and the one resulting in the lowest
error (negative log-likelihood) was selected (TableS1,
Supplementary Material). As discussed earlier, the GTM
predicted the probability of countries belonging to the
same clusters in the below visualisation. The top-left-
hand side of the visualisation represents the highest
deciles of free expression, while the top right represents
the lowest. This visualisation of the data is intended to
help identify commonalities or differences and related
factors to better understand the changing free expres-
sion landscape. Figure1 shows the countries allocated
to a selection of clusters. The full allocation of countries
per cluster can be found in the Supplementary Materials.
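The configuration described above (a 10 × 10 latent grid, a 4 × 4 grid of Gaussian basis functions, and a regularisation term chosen by negative log-likelihood) can be sketched with NumPy as follows. This is an illustrative sketch rather than the code used in the study: the basis-function width, the principal-component initialisation, the starting value of `beta`, the candidate regularisation values, and the random stand-in data are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def grid(k):
    """k x k regular grid in [-1, 1]^2, returned as a (k*k, 2) array."""
    axis = np.linspace(-1.0, 1.0, k)
    gx, gy = np.meshgrid(axis, axis)
    return np.column_stack([gx.ravel(), gy.ravel()])

def sqdist(A, B):
    """Pairwise squared Euclidean distances, shape (len(A), len(B))."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def responsibilities(Y, X, beta):
    """R[i, n] = p(u_i | x_n); every column sums to 1."""
    logp = -0.5 * beta * sqdist(Y, X)
    logp -= logp.max(axis=0, keepdims=True)  # numerical stability
    R = np.exp(logp)
    return R / R.sum(axis=0, keepdims=True)

def fit_gtm(X, k_latent=10, k_basis=4, lam=1e-3, n_iter=30):
    """EM for a GTM; returns centres Y (M, D), responsibilities R (M, N), beta."""
    N, D = X.shape
    U, C = grid(k_latent), grid(k_basis)
    width = 2.0 / (k_basis - 1)                       # assumed: width = basis spacing
    Phi = np.exp(-sqdist(U, C) / (2.0 * width ** 2))  # (M, S) basis activations
    S = Phi.shape[1]
    # Assumed initialisation: lay the latent grid on the first two principal axes.
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    target = mean + (U * s[:2] / np.sqrt(N)) @ Vt[:2]
    W = np.linalg.lstsq(Phi, target, rcond=None)[0].T  # (D, S)
    beta = 1.0                                          # assumed starting value
    for _ in range(n_iter):
        R = responsibilities(Phi @ W.T, X, beta)        # E-step
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + (lam / beta) * np.eye(S)  # Eq. (6) plus regulariser
        W = np.linalg.solve(A, Phi.T @ R @ X).T
        Y = Phi @ W.T
        beta = N * D / (R * sqdist(Y, X)).sum()         # Eq. (7)
    return Y, responsibilities(Y, X, beta), beta

def neg_log_likelihood(Y, X, beta):
    """Negative of Eq. (4), evaluated with a log-sum-exp for stability."""
    M, D = Y.shape
    a = -0.5 * beta * sqdist(Y, X)
    amax = a.max(axis=0)
    log_mix = amax + np.log(np.exp(a - amax).sum(axis=0)) - np.log(M)
    return -(log_mix + 0.5 * D * np.log(beta / (2.0 * np.pi))).sum()

# Random stand-in data: 60 "countries" with 5 indicators, z-scored before training.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical candidate regularisation values; keep the fit with the lowest error.
fits = {lam: fit_gtm(X, lam=lam) for lam in (1e-3, 1e-1, 10.0)}
best_lam = min(fits, key=lambda k: neg_log_likelihood(fits[k][0], X, fits[k][2]))
Y, R, beta = fits[best_lam]
cluster = R.argmax(axis=0)  # most probable cluster (mode) per country
```

The rows of `Y` play the role of the reference vectors: reshaping one column of `Y` to 10 × 10 gives the kind of per-variable heatmap described as a reference map, and `cluster` gives each country's position on the membership map.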
Fig. 1 Country clustering visualisation (cluster membership map) colour-coded by the cluster ranking. The countries allocated to a selection
of clusters are displayed. Cluster separation indicates similarity (i.e. closer clusters are more similar than further clusters)
3.2 Visualisation ofthereference maps
A selection of reference maps is presented in Fig. 2,
showing the distribution of the clusters (and therefore
countries) against the selected variables. They are organ-
ised by IoC freedom index areas: academic, media and
digital freedom. The reference maps corresponding to all
the variables used can be found in the Supplementary
Materials.
3.3 Global ranking ofcountries/nations—deciles
The Index Index groups states’ free expression rank-
ing into ten categories - deciles - intended to convey
the complexity and nuance of the global practice of
Fig. 2 Selected reference maps for 15 of the variables used to produce the GTM model
Table 1 Global ranking of free expression by deciles. Lower ranks represent higher levels of free expression while higher ranks represent
lower levels of freedom
Global rank | Countries and nations
1 | Austria, Belgium, Canada, Denmark, Estonia, Finland, Germany, Iceland, Ireland, Latvia, Lithuania, Luxembourg, Netherlands, New Zealand, Norway, Sweden, Switzerland
2 | Australia, Barbados, Cape Verde, Chile, Costa Rica, Cyprus, Dominican Republic, France, Israel, Italy, Jamaica, Japan, Malta, Portugal, Slovakia, Spain, Trinidad and Tobago, Uruguay
3 | Czechia, Greece, Moldova, Namibia, Panama, Romania, South Africa, South Korea, Suriname, Taiwan, Tunisia, United Kingdom, United States of America, Vanuatu
4 | Argentina, Armenia, Benin, Botswana, Bulgaria, Croatia, Georgia, Ghana, Guyana, Hungary, Kosovo, Mongolia, Montenegro, Peru, Poland, Sao Tome and Principe, Senegal, Seychelles, Slovenia, Solomon Islands, Timor-Leste
5 | Albania, Ecuador, Guatemala, Guinea-Bissau, Honduras, Madagascar, Malawi, Maldives, Mauritius, Mozambique, Niger, Nigeria, Paraguay, Sierra Leone, The Gambia
6 | Angola, Bhutan, Bolivia, Bosnia and Herzegovina, Brazil, Indonesia, Ivory Coast, Jordan, Kenya, Kyrgyzstan, Lesotho, Mexico, Nepal, North Macedonia, Philippines, Serbia, Singapore
7 | Burkina Faso, Central African Republic, Colombia, Comoros, Democratic Republic of the Congo, El Salvador, Fiji, Gabon, Haiti, India, Kuwait, Lebanon, Malaysia, Mali, Morocco, Pakistan, Sri Lanka, Tanzania, Togo, Ukraine, Zambia
8 | Algeria, Bangladesh, Cameroon, Chad, Djibouti, Ethiopia, Guinea, Iraq, Kazakhstan, Libya, Mauritania, Rwanda, Thailand, Uganda, Zimbabwe
9 | Afghanistan, Azerbaijan, Egypt, Hong Kong, Oman, Palestine, Qatar, Republic of the Congo, Russia, Somalia, Sudan, Türkiye, Uzbekistan, Venezuela, Vietnam
10 | Bahrain, Belarus, Burma/Myanmar, Burundi, Cambodia, China, Cuba, Equatorial Guinea, Eritrea, Eswatini, Iran, Laos, Nicaragua, North Korea, Saudi Arabia, South Sudan, Syria, Tajikistan, Turkmenistan, United Arab Emirates, Yemen
The countries within each grouping are listed alphabetically; the order does not represent a ranking within the groupings
Fig. 3 World map showing the global free expression ranking
censorship, see Table1. The deciles ensure the eventual
ranking does not erase distinctions between countries/
nations, but also presents a clear picture of the global
free expression environment. A world map representa-
tion showing the global ranking of censorship by deciles
is shown in Fig.3, with the highest deciles of free expres-
sion represented in green (lowest values), and the lowest
levels in red (highest values). The rankings per area of
freedom (academic, digital and media) can be found in
the Supplementary Materials.
4 Discussion
4.1 Creating meaningful representations using
GTM
Due to the challenges of data collection and data rep-
resentation, there exist high levels of uncertainty that
could potentially have a negative impact on the model-
ling process. GTM, being a robust probabilistic algorithm,
calculates the probability of a cluster being responsible
for a country while accounting for this uncertainty. In
this analysis, the GTM cluster centres or prototypes serve
as representations of freedom of expression, effectively
stratifying the landscape of freedom of expression. A
crucial property of GTM is the preservation of data topol-
ogy, signifying that similar clusters will be positioned
closer together in the latent space. Consequently, even
if the most probable cluster assigned to a country does
not precisely correspond to the actual one, it is expected
to be close to the correct one. In contrast, popular clustering
techniques such as k-means, lacking probabilistic
foundations, are not specifically designed to handle such
levels of uncertainty.
In addition, GTM is particularly useful for crafting
meaningful data representations by transforming high-
dimensional information into a lower-dimensional space
while retaining the intrinsic structure of the data. Alter-
native visualisation algorithms such as t-SNE [27] and
UMAP [28] have gained popularity for data visualisa-
tion through dimensionality reduction. However, these
techniques do not possess the capability to extract data
prototypes in the manner that GTM does, which poses a
challenge when it comes to stratifying countries based
on freedom of expression. In contrast, GTM creates a
visualisation (the cluster membership map) that cap-
tures the underlying patterns, relationships, and clus-
ters within the data by mapping data points to these
prototypes. This process allows for a more comprehen-
sible and interpretable depiction of complex data, aid-
ing in knowledge extraction and facilitating insights
that might otherwise remain hidden in the original
high-dimensional space. GTM has found applications in
various real-world scenarios across different domains.
In bioinformatics, it has been used to model protein
structures and understand their conformational spaces,
providing insights into protein folding and function,
which is crucial for drug design [29], disease understanding
[30], and other biomedical applications [31–33]. It
has also been used to model species distributions and
understand ecological patterns, e.g., to understand spe-
cies composition of a forest to assess biodiversity [34],
and to study the ecological status of streams [35]. It has
also been used in the financial sector, e.g., for early iden-
tification of business opportunities [36]. These examples
highlight the versatility of the GTM in addressing real-
world challenges across diverse fields. However, to the
best of our knowledge, GTM has not been used before to
study censorship or freedom of expression, hence mak-
ing this a positional article in the application of GTM
within this field.
4.2 Interpreting thevisualisations (membership
andreference maps)
The country clustering visualisation (cluster membership
map) presented in Fig. 1 provides another way to examine
the data. It can be used to show: (i) the details
of the individual countries within each cluster, indicating
that they share very similar characteristics; (ii) the location
of the countries across all clusters, allowing for the
representation of a certain degree of similarity if they are
allocated to neighbouring clusters; and (iii) the colour-coded
ranking assigned to each of the clusters, and therefore
to the countries that these clusters represent.
The reference maps provide further information/interpretation
about the role played by each variable in the
development of the GTM model, with high values representing
areas of the maps where the variables had a higher
influence, and low values representing otherwise. When
exploring the reference maps of the academic freedom
variables from Fig. 2, which include freedom of academic
exchange and dissemination (Fig. 2A), freedom of discussion
(Fig. 2B), and freedom to research and teach (Fig. 2C),
we can see that the higher values for those variables are on
the left-hand side of the reference maps, which coincide
with the areas with better rankings of freedom (see Fig. 1).
Regarding the media freedom variables, we can also
see high values on the left-hand side of Fig. 2D and F,
which represent civil liberties and political civil liberties,
respectively. In the case of the public sector corruption
index (Fig. 2I), we see high values in the top right quadrant,
where the clusters represent countries such as Nicaragua,
Yemen, Somalia, and Eswatini, among others. Also in this
area, we see high values in the reference map of regime
corruption (Fig. 2J).
In the case of the digital freedom variables, we can see
high levels of online media fractionalization (Fig. 2N) in the
cluster of Eritrea, North Korea, and the United Arab Emirates,
and high levels of internet censorship effort (Fig. 2M,
with higher values meaning that the governments allow
generally unrestricted Internet access) in countries represented
by a higher level of freedom (left-hand side of
Fig. 1). These examples illustrate how the role of each of
the variables used to produce the GTM model can be stud-
ied by visualising their respective reference maps.
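As an illustration of how such a reference map can be produced, the sketch below uses a common construction in which the map for one variable is the corresponding component of every prototype, arranged on the latent grid. This construction, the row-major ordering of the latent nodes, the function name `reference_map` and the toy prototype values are all assumptions made for illustration and are not taken from the implementation used in this study.

```python
import numpy as np


def reference_map(Y, variable_index, grid_shape):
    """Arrange one variable of the prototypes on the latent grid.

    Y: (K, D) prototypes, one row per latent node (assumed row-major order).
    variable_index: column of Y to display.
    grid_shape: (rows, cols) of the latent grid, with rows * cols == K.
    """
    rows, cols = grid_shape
    if Y.shape[0] != rows * cols:
        raise ValueError("grid_shape does not match the number of prototypes")
    return Y[:, variable_index].reshape(rows, cols)


# Hypothetical prototypes for a 2 x 3 latent grid and two variables.
Y = np.array([
    [0.9, 0.1], [0.5, 0.2], [0.1, 0.8],
    [0.8, 0.1], [0.4, 0.3], [0.2, 0.9],
])
ref = reference_map(Y, variable_index=0, grid_shape=(2, 3))
```

Here the first variable takes its highest values in the left-hand column of the grid, which is the kind of pattern read off the reference maps in the discussion above. The resulting array can be passed to any heat-map plotting routine.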
4.3 Insights fromtheglobal ranking ofcountries/
nations
A closer inspection of the global ranking in Table 1 and the
rankings in the different areas of freedom (academic, digital,
and media/press) shows that Europe dominates the list
of countries that were in the 1st decile (least censorship/
greatest freedom) for all three freedoms. These include
Austria, Belgium, Denmark, Estonia, Finland, Germany, Iceland,
Ireland, Latvia, Lithuania, Luxembourg, Netherlands,
Norway, Sweden and Switzerland. The G20 Member States
are spread across the full Index Index. Using the global
ranking, Australia, Canada and Germany are the highest-placed
members (1st decile), with Saudi Arabia and China
being the lowest (10th decile).
For the global ranking, G7 Member States are placed:
Canada = 1st, France = 2nd, Germany = 1st, Italy = 2nd,
Japan = 2nd, United Kingdom = 3rd and USA = 3rd decile.
Much like G20 Members, UN Security Council members,
including both permanent and non-permanent members,
are spread across the full Index. Using the global ranking,
Ireland and Norway are the highest-placed members (1st
decile) and China and the United Arab Emirates are the
lowest (10th decile). Out of the Permanent members,
France (2nd decile) is the highest-ranking member, with
Russia (9th decile) the lowest. Across the three freedoms,
the United Kingdom is consistently found in the 3rd decile.
This is similar to the United States of America. However,
the latter is in the 4th decile for academic freedom.
The countries that were in the 10th decile for all three
freedoms are Bahrain, Belarus, Burma/Myanmar, Cuba,
Equatorial Guinea, Eritrea, Iran, Laos, North Korea, Syria,
Turkmenistan, United Arab Emirates and Yemen.
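For readers who wish to reproduce a decile grouping of this kind, the following is a minimal sketch of one way to turn a ranking score into deciles, with the 1st decile holding the lowest scores (least censorship). The function name `assign_deciles` and the scores are hypothetical, and the exact procedure used to derive the deciles reported here may differ.

```python
import numpy as np


def assign_deciles(scores):
    """Map scores to deciles 1..10, where decile 1 holds the lowest scores."""
    scores = np.asarray(scores, dtype=float)
    # Rank 0 is the lowest score; a stable sort keeps ties in input order.
    order = scores.argsort(kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(len(scores))
    # Split the ranks into ten equally sized groups.
    return ranks * 10 // len(scores) + 1


# Hypothetical censorship scores for ten unnamed countries.
scores = [0.12, 0.95, 0.40, 0.71, 0.05, 0.33, 0.88, 0.56, 0.27, 0.64]
deciles = assign_deciles(scores)
```

With ten scores each country falls into its own decile; with a larger set of countries each decile would hold roughly one tenth of them.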
4.4 Use andpotential impact oftheIndex Index
By making available indices that provide objectively
veriable, clearly ranked data about rates of freedom of
expression, in contrast to or perhaps as linked to academic
freedom, the Index Index seeks to provide legislators and
other policymakers, activists and governments, and non-
governmental and intergovernmental organisations, with
tools to better inform policy or action decisions. Develop-
ing a wide range of campaigning and advocacy tools that
can benet from emergent and innovative technologies
and research approaches to synthesize and present com-
pelling and data-rich information is vital to ensure rights
advocacy is underpinned by all available expertise that can
be accessed easily and clearly. As seen in previous met-
rics, including those that are incorporated into the data-
set for the Index Index, empirical data generated by this
pilot project can be highly effective when communicated
with policymakers to encourage more affirmative action
when it relates to free expression, including more robust
protection for journalists [37, 38], the formulation of rights
policies for educational institutions and ensuring all sur-
veillance policies deployed for policing or national security
purposes are rights-respecting. These are a few examples
of how the Index Index can be used but should not be
assumed to limit how it can be used by a wide range of
stakeholders.
While the Index Index abstracts from the particular
experiences of writers, journalists and academics facing
daily repression across the globe, the overall ranking hints
at what is at stake. It constitutes a call, directing the atten-
tion of those with a voice to denounce it, to where free
expression is at greatest risk and providing insights into
the granular policy areas needing attention. The global
nature of the proposed index also means it can become a
vital resource and tool for engagement with international
and supranational bodies such as the United Nations, as
well as other regional mechanisms such as the European
Union, Council of Europe, African Union and the Inter-
American Commission on Human Rights, whose work
requires country-by-country, regional and global data
sources.
As the Index Index is an index of existing respected and
trusted indices and metrics, it depends on robust and accurate
data produced by the wider community of experts.
The process of compiling and producing the Index has
demonstrated its own use-case, as it has identified the
need for increased monitoring, verification and sharing of
granular country-by-country level data on a wide range of
markers against free expression more broadly, as well as
academic, artistic, digital and media/press freedom. Meeting
this need would strengthen further iterations of this pilot
project as well as the global movement to protect
free expression.
In this, too, the study provides the basis for developing
insights into the political economy of censorship and freedom
which shine a light - not always flattering - on human
conduct towards others in our midst. Objective data and
analysis provided by the Index Index encourage us to ask,
simply, what will it take for us to live less censored lives
and what must we do to achieve greater respect towards
human dignity.
5 Conclusions
This project collected and collated pre-existing, robust
data on the status of the free expression landscape on
a global scale. We modelled the data using the GTM, an
unsupervised, probabilistic machine learning method, to
explore whether the model produced new insights into
state conduct, human rights, and governance. The use of
such a model removes an element of subjective interpreta-
tion from the modelling process and provides the resulting
Index Index with a greater degree of rigour than previous
rankings.
On close examination, the reader can be expected to
find unexpected outcomes that call into question correlation
or causality. The Index Index provides a powerful
policy tool for all those seeking a clear picture of the
health of the free expression environment, as well as what
needs to happen to change the rankings.
Author contributions Conceptualization, I.O. and S.O.M.; methodol-
ogy, I.O., S.O.M. and R.A.A.B; software, R.A.A.B, I.O. and S.O.M.; vali-
dation, N.W., D.D., and S.H.; formal analysis, I.O., S.O.M. and R.A.A.B;
resources, N.W., D.D; data curation, R.A.A.B and D.D.; writing—origi-
nal draft preparation, all authors; writing—review and editing, all
authors; visualization, R.A.A.B, I.O. and S.O.M.; supervision, I.O. and
S.O.M.; project administration, I.O.; funding acquisition, I.O. and N.W.
All authors have read and agreed to the published version of the
manuscript.
Funding R.A.A.B. received funding from Index on Censorship to work
on the analysis.
Data availability The data will be made available with the paper.
Declarations
Conflict of interest The authors declare no conflict of interest.
Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adap-
tation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate
if changes were made. The images or other third party material in
this article are included in the article’s Creative Commons licence,
unless indicated otherwise in a credit line to the material. If material
is not included in the article’s Creative Commons licence and your
intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/.
References
1. United Nations. Universal Declaration of Human Rights (1948)
https://www.un.org/en/about-us/universal-declaration-of-human-rights.
Accessed December 2, 2022
2. European Court of Human Rights (1950) European Convention
on Human Rights.
3. Organization of American States (1969)
4. African States members of the Organisation of African Unity
(1979) African Commission on Human and Peoples’ Rights
Legal instruments.
5. Bollinger LC, Callamard A (2021) Regardless of frontiers: global
freedom of expression in a troubled world.
6. Stevenson C (2007) Breaching the great firewall: China’s internet
censorship and the Quest for Freedom of Expression in a Connected
World. Boston Coll Int Comp Law Rev 30:531–558
7. Pring GW, Canan P (1996) SLAPPs: getting sued for speaking out.
Temple University Press
8. Greer D (1935) The incidence of the terror during the French
Revolution: a statistical interpretation. Harv Hist Monogr 196
9. Landman T, Schwarz K (2022) Human rights indicators and
implementation. In: Murray R, Rachel H, Long D (eds) Human
rights indicators and implementation. Edward Elgar Publishing,
pp 309–326
10. Restrepo JA, Spagat M, Vargas JF (2016) Special data feature;
the severity of the Colombian conflict: cross-country datasets
versus new micro-data. J Peace Res 43:99–115.
https://doi.org/10.1177/0022343306059924
11. Ball P, Asher J, Sulmont D, Manrique D, For DM (2003) How many
Peruvians have died? An estimate of the total number of victims
killed or disappeared in the armed internal conflict between
1980 and 2000. American Association for the Advancement of
Science
12. Landman T, Carvalho E (2010) Measuring human rights.
Routledge
13. Bishop C, Svensén M, Williams C (1998) GTM: the generative
topographic mapping. Neural Comput 10:215–234
14. Olier I, Vellido A (2008) Advances in clustering and visualization
of time series using GTM through time. Neural Netw 21:904–913.
https://doi.org/10.1016/J.NEUNET.2008.05.013
15. Aryan S, Aryan H, Halderman JA (2013) Internet censorship
in Iran: a first look. 3rd USENIX Workshop on Free and Open
Communications on the Internet.
16. Dainotti A, Squarcella C, Aben E, Clay KC, Chiesa M, Russo M
et al (2014) Analysis of country-wide internet outages caused by
censorship. IEEE/ACM Trans Netw (TON) 22:1964–1977.
https://doi.org/10.1109/TNET.2013.2291244
17. Niaki AA, Cho S, Weinberg Z, Hoang NP, Razaghpanah A, Christin N
et al (2020) ICLab: a global, longitudinal internet censorship
measurement platform. Proc IEEE Symp Secur Priv 2020–May:135–151.
https://doi.org/10.1109/SP40000.2020.00014
18. Coppedge M, Gerring J, Knutsen CH, Lindberg SI, Teorell J, Altman D
et al (2022) V-Dem codebook v12 varieties of Democracy
(V-Dem) project. https://www.v-dem.net/data/the-v-dem-dataset/
19. Lazarou E, Stanicek B (2023) Mapping threats to peace and
democracy worldwide.
https://www.europarl.europa.eu/RegData/etudes/STUD/2023/751422/EPRS_STU(2023)751422_EN.pdf
20. Radsch CC, Paterson K (2021) Submission from the Committee to
Protect Journalists to the Special Rapporteur on the promotion
and protection of the right to freedom of opinion and expression.
https://www.ohchr.org/sites/default/files/Documents/Issues/Expression/disinformation/2-Civil-society-organisations/Committee-to-Protect-Journalists.pdf
21. Reporters Without Borders. World Press Freedom Index (2022)
https://rsf.org/en/index?year=2022
22. Committee to Protect Journalists. CPJ’s database of attacks on
the press (2022) https://cpj.org/data/
23. UNESCO (2022) UNESCO observatory of killed journalists.
https://en.unesco.org/themes/safety-journalists/observatory
24. NetBlocks (2022) COST: The NetBlocks Cost of Shutdown Tool.
https://netblocks.org/projects/cost
25. International Telecommunication Union (ITU). Global Cybersecurity
Index (2022)
https://www.itu.int/en/ITU-D/Cybersecurity/Pages/global-cybersecurity-index.aspx
26. Kohonen T (2001) Self-Organizing maps. vol. 30. Third exte.
Springer, New York
27. van der Maaten L (2008) Visualizing data using t-SNE. J Mach
Learn Res. https://doi.org/10.1007/s10479-011-0841-3
28. McInnes L, Healy J, Saul N, Großberger L (2018) UMAP: Uniform
Manifold approximation and projection. J Open Source Softw
3:861. https://doi.org/10.21105/JOSS.00861
29. Horvath D, Marcou G, Varnek A (2019) Generative topographic
mapping in drug design. Drug Discov Today Technol.
https://doi.org/10.1016/j.ddtec.2020.06.003
30. Orlov AA, Khvatov EV, Koruchekov AA, Nikitina AA, Zolotareva
AD, Eletskaya AA et al (2019) Getting to know the Neighbours
with GTM: the Case of Antiviral compounds. Mol Inf.
https://doi.org/10.1002/minf.201800166
31. Bathen TF, Engan T, Krane J, Axelson D (2000) Analysis and
classification of proton NMR spectra of lipoprotein fractions from
healthy volunteers and patients with cancer or CHD. Anticancer
Res 20
32. Gaspar HA, Hübel C, Breen G (2019) Biological pathways and
Drug Gene-Sets: analysis and visualization. Eur Neuropsychopharmacol
29. https://doi.org/10.1016/j.euroneuro.2017.08.095
33. Olier I, Amengual J, Vellido A (2011) A variational bayesian
approach for the robust analysis of the cortical silent period
from EMG recordings of brain Stroke patients. Neurocomputing
74. https://doi.org/10.1016/j.neucom.2010.12.006
34. Polyakova A, Mukharamova S, Yermolaev O, Shaykhutdinova
G (2023) Automated Recognition of Tree species Composition
of Forest communities using Sentinel-2 Satellite Data. Remote
Sens (Basel). https://doi.org/10.3390/rs15020329
35. Vellido A, Martí E, Comas J, Rodríguez-Roda I, Sabater F (2007)
Exploring the ecological status of human altered streams
through Generative Topographic Mapping. Environ Model Softw
22. https://doi.org/10.1016/j.envsoft.2006.06.005
36. Feng J, Liu Z, Feng L (2021) Identifying opportunities for sustainable
business models in manufacturing: application of patent
analysis and generative topographic mapping. Sustain Prod
Consum 27. https://doi.org/10.1016/j.spc.2021.01.021
37. Irum SA, Laila AS (2015) Media censorship: Freedom versus
responsibility. J Law Con Resolution 7:21–24.
https://doi.org/10.5897/JLCR2015.0207
38. Clark M, Grech A (2017) Council of Europe. Journalists under
pressure: unwarranted interference, fear and self-censorship in
Europe.
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.