Springer Nature 2021 LaTeX template
Weak signal detection and identification in
large data sets: a review of methods and
applications
Pauline Rousseau Author1,2*, Dimitris Kotzinos Author2,3
and Daniel Camara Author1,2
1*Department, CY Cergy Paris University, ENSEA, CNRS,
Street, City, 100190, State, Country.
2Department, Pole Judiciaire Gendarmerie Nationale, Street,
City, 10587, State, Country.
3Department, Organization, Street, City, 610101, State, Country.
*Corresponding author(s). E-mail(s): pauline.rousseau@ensea.fr;
Contributing authors: iiauthor@gmail.com;iiiauthor@gmail.com;
These authors contributed equally to this work.
Abstract
The detection of anomalies, or event forecast, relies on understanding
past and present events. Evaluating the dynamics of the received sig-
nals and determining which ones may have a relevant future is a crucial
task. It is an important task for different levels of data treatment, from a
simple log analysis tool to the definition of the company’s strategic plan-
ning for the next few years. In recent years, there has been an increasing
demand for anticipating emerging issues or so-called weak signals. However,
as these signals are small precursors, usually lost in the middle of a large
amount of noisy information, determining what is essential and what is not
remains a complex problem. Different researchers have
applied weak signal detection in a broad set of fields, using distinct
methods. This survey provides a comprehensive and ordered state-of-the-art
review of the weak signal and early warning detection field.
We focus on methods and techniques that could be applied to a broad
range of domains. We also propose a taxonomy of the methods and
discuss and compare the characteristics of different approaches. This
survey presents a theoretical background of the field and some of the
most interesting applications in the domain of weak signal detection.
Keywords: weak signal, early warning, event detection, anomaly detection
1 Introduction
Data mining is the process of finding anomalies, patterns and correlations
within colossal data sets. This process supports decision-making by analyzing
and investigating historical data in order to predict events and outcomes.
We witness a tremendous increase in the volume and diversity of available
information, so much so that we are sometimes unable to see the small but
significant clues that can warn us of important events to come. Finding these
clues is even harder if the events we are interested in are not known beforehand.
In this study, an event can be considered as any fact, information, or even a
series of data that can be individually identified and isolated, and that has a
meaning and a perceived impact. We consider as a "weak signal" any piece of
information that can be identified as a precursor warning of something that
will subsequently turn into an event. Identifying such signals, especially
assuming no prior knowledge, is a challenging and interesting task. The concept
of early warning signals is used in organizations and companies to adapt to
emerging changes. Based on the recognition of changes, this continually
evolving process requires a global view of current circumstances and the
identification of the factors leading to profound changes.
The field of weak signal identification has applications in diverse areas,
for example identifying signals related to terrorism or attacks on mass
transport [1], strategic decision-making [2], predicting tipping points from the
London riots in 2011 [3], and finding predictive early biomarkers [4]. In this
review, we collect the different methods used to detect and isolate weak signals
over diverse data sets, regardless of the application area, focusing on methods
that are reproducible and not tied to one specific use case.
Note that this paper focuses on weak signal detection in Big Data analysis
and not on weak signals in the signal processing sense, as handled for example
by stochastic resonance.
1.1 Background
[5] describes a weak signal as "sudden, urgent, unfamiliar changes in the
firm's perspective which threaten either a major profit reversal or loss of a
major opportunity" and considers that weak signal detection allows for better
decision-making and foresight planning. Ansoff, who pioneered the weak signal
concept, considers weak signals to be "warning (external or internal), events
and developments that are still too incomplete to allow for an accurate
estimate of their impact and/or to determine a full adapted response". However,
even though it has strong roots and is intuitively sound, the weak signal
concept is not precisely defined in the literature. A weak signal is both an
event announcer
and an informal alert, even if it does not point to exact information. It
provides some information about the future, but its impact cannot be quantified
accurately, as its estimation relies upon the actions that follow the signal [6].
Interpreting such a signal is challenging, as it is hidden under the rest of
the data. Scanning for weak signals and using them in scenario work is
considered a successful way to look towards the future.
However, defining the weak signal is itself a problem, as various authors
define the concept differently and use different terms for it: [7] employs
"Future signs", [8] "emerging issues", [9] "seeds of changes", [10] "fact
emerging", [11] "wild cards", and [12] and [13] "early warning signals". Even
if these concepts vary slightly, we treat all of them as weak signals.
1.2 Challenges
What makes it difficult to detect weak signals is that they are generally hidden
in the ”noise of the daily produced data” [14]. This means that one of the main
challenges that research in this area faces is to be able to distinguish something
that can become an event from background noise that will eventually disappear
in a large, continuously published data set. One of the traditional ways to
deal with weak signal identification is to rely on human experts, who would
analyze data sets and, based on experience, would make decisions on whether
something qualifies as a weak signal or not. These services are often costly
and not widely available. Even then, the increase in the volume of information
may represent a challenge to the experts. On the other hand, early warning
signals cannot be identified with standard anomaly detection techniques, because
they are usually not strong enough to be "accepted" as anomalies or, even
worse, they are sometimes part of the mainstream information in the data set.
In that respect, anomaly detection techniques cannot be applied directly, but
some can be adapted and explored as research in the area evolves.
1.3 Contributions
There have been numerous attempts to detect weak signals, even though this
research challenge is relatively recent. Summarizing these approaches is the
motivation of this work, whose contributions are:
1. We propose a taxonomy of weak signal detection methods based on the
techniques used.
2. We present a qualitative panel of several definitions and techniques under
different categories.
3. We compare the different detection techniques for weak signal detection.
1.4 Outline of the paper
This paper is divided into four main sections. Section 2 introduces the weak
signal detection field, while Section 3 describes the theoretical framework
behind the representation of a weak signal as a future sign. Section 4 presents
how different methods fit into a standard data pipeline. Section 5 presents a
taxonomy of the works in the field and discusses the state-of-the-art methods
for weak signal detection and identification. Conclusions are drawn in
Section 6.
2 Early warnings landscape
2.1 The origins
[15] first introduced the concept of the weak signal in the 1970s to identify
early strategic signals. He observed that, in general, companies do not react
fast enough to early evidence of threats and opportunities affecting their
strategic planning. Strategic planning is the process of "converting
environmental information about strategic discontinuities into concrete action
plans, programs and budgets".
According to Ansoff, weak signals call for a "graduated response through
amplification". The proposed method contrasts with traditional strategic
planning, which relies on strong signals. For Ansoff, a weak signal carries
high uncertainty, given that some signals point in inconsistent directions.
In general, weak signals can be detected with three filters [5]:
1. a surveillance filter to obtain the information
2. a mentality filter to indicate the relevance of the signal based on existing
experience
3. a power filter to apply the learned knowledge in the decision-making process
One of the challenges of this review is to state what a weak signal is. The
intuition behind most definitions is that weak signals represent unknown,
unexpected or rare data over a specific period. However, this rarity makes it
hard to identify the data as relevant, especially within a large volume of
information that is continuously being received [16]. Weak signals may be early
warnings of future changes and may forecast new possibilities. Predictive
research using weak signals is indeed an active field [17-24].
In brief, weak signal detection involves assessing the signal's nature
(opportunity or threat), its potential impact, its probability, and the delay
before its occurrence (urgency). A weak signal is in itself neither good nor
bad; it may threaten the current status quo or provide new opportunities for
strategic activities. Weak signals are warnings that a new idea will change the
established environment; they are a novelty from the point of view of experts,
and often contrary to their present opinion.
2.2 Definitions
2.2.1 Context monitoring and foresight
Context monitoring means scanning the environment and gathering information
about events and their relationships [25, 26]. For [27], this also means
providing foresight, that is, the capacity to plan actions based on the
acquired knowledge. For [28], the concept of weak signal detection only makes
sense inside a defined context. The context provides complementary information
while narrowing the focus of attention, which allows for a reduction of noise.
2.2.2 Data collection and organization
Several approaches have been proposed to collect information, some relying on
an automated process using a crawler [29, 30, 31, 32, 33], others on human
knowledge [34]. After collection, a clustering approach is commonly used to
group the identified signals and then identify weak signal clusters. Clustering
may be semantic [35] or based on the knowledge structures of the data sources
[32].
Adding a context to the collected data helps to improve its intensity, degree
of precision, number, and reliability [36]. [37] proposes a future signals
sense-making framework (FSSF) based on environmental scanning and pattern
management principles for collecting and organizing information.
2.2.3 Weak and strong signals
Ansoff categorizes signals into two groups: weak signals and strong signals.
For [18], a weak signal is an alert warning "less visible and having a larger
relative distance to the impact on future changes". Previous studies [38, 39]
have also indicated that the incompleteness of weak signals prevents an
accurate estimate of their impact. In contrast, a strong signal is easier to
identify, as its impact is clear and stays consistent over time. A strong
signal has the potential to become a trend, whereas a weak signal is a clue
that may unveil how trends will evolve. A strong signal allows for a more
accurate assessment of the situation, like tangible proof of a change [18].
It is essential to note that a weak signal may become a strong one. Indeed,
the dynamics of different signals and their future impacts are non-negligible
factors [20]. [16] characterizes a weak signal as "a factor of change hardly
perceptible at present, but which will constitute a strong trend in the
future".
The possibility that weak signals become strong ones creates a paradox [40].
As the information about a weak signal increases over time, the signal can be
regarded as strong once the information permits an accurate estimation of its
impact. However, not all weak signals become strong ones, and even when they
do, it may be too late for strategic decision-making [41]. The strategic
paradox lies between waiting to have enough information
to make better decisions or accepting the information's incompleteness and
acting, accommodating the uncertainty that comes with it. According to [15],
strategic issue management may help to solve this paradox. By continually
monitoring the evolution of weak signals, decision-makers can plan a graduated
and evolving response.
So far, we have described weak and strong signals from the management point of
view. It should be possible to relate this view to the treatment of "weak
signals" in the signal processing field. In this sense, the signal-to-noise
ratio could be a suitable way to describe strong and weak signals: a high
signal-to-noise ratio indicates a strong signal, while a low one could point to
a weak one.
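The signal-to-noise analogy can be made concrete with a minimal sketch. It assumes that the signal and noise components are available separately as numeric sequences, which is rarely the case in practice; the function name `snr_db` and the sample values are illustrative and do not come from the surveyed works.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    # Power is taken as the mean of the squared samples.
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

# Illustrative samples (assumed values).
noise = [0.5, -0.4, 0.6, -0.5]
strong = [2.0, -2.0, 2.0, -2.0]
weak = [0.1, -0.1, 0.1, -0.1]

print(snr_db(strong, noise) > 0)  # the strong signal dominates the noise
print(snr_db(weak, noise) < 0)    # the weak signal is buried in the noise
```

A positive value in decibels means the signal carries more power than the noise, a negative value the opposite.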
2.3 Key features for understanding weak signals
2.3.1 Characteristics
[19] delves into the definition proposed by Ansoff and describes a weak signal
through the following characteristics:
1. an idea or trend that will affect how we do business, what business we do,
and the environment in which we will work
2. new and surprising from the signal receiver's vantage point (although
others may already perceive it)
3. difficulty in tracking down amid other noise and signals
4. various ways of interpretations of the same signal by the observer
5. improvement of its ability to thrive
6. often scoffed at by people who ”know”
7. lag time before it will mature and become mainstream
8. representation of an opportunity to learn, grow and evolve
[21] uses six aspects to describe a weak signal:
1. its nature as a transition phenomenon
2. the duration of a weak signal
3. its objectivity and subjectivity
4. various ways of interpretations of the same signal by the observer
5. strengthening of the signal
6. issues related to the receiver and analyst of the signal
These six characteristics help classify and understand the weak/strong
signal dynamics.
2.3.2 Weak signals dynamics
[39] provides a deeper theoretical understanding of weak signals and proposes
a three-dimensional model (signal, issue and interpretation) to define them
(Fig. 1). The model is based on Peirce's model [7].
Signal is the number of signals, i.e., their visibility: the number of times
the future sign appears.
Issue is the number of events in which the signal appears. It indicates
how far the future sign spreads.
Interpretation represents the degree of recognition of future signals by
information users, i.e., how many of them the receivers understand.
The magnitude of a future sign can be measured by a combination of these
three dimensions [20]. A weak sign is a future sign with a low level of signal,
issue, and interpretation (the left, lower, front part of Fig. 1): the future
issue has started to diffuse to some extent but has yet to be much interpreted
by receivers. According to Hiltunen, such signs concern rarely exposed topics.
For this reason, they have a small representation, even if they can become
strong. A weak sign becomes a strong sign when the interpretation of its
meaning becomes clear. This definition is popular and extensively used by other
researchers [34, 42, 43, 29, 44, 45, 46, 47]. This semiotic model has become
the standard for identifying the dimensions of future signals.
Fig. 1 Spatial model proposed by [39]
2.3.3 Weak signals correlation
Many studies have examined how weak signals can be combined to reinforce
their value and, in this way, identify specific events. [18] observed that,
despite the incompleteness and imprecision of the information, the combination
of weak signals could lead to the identification of an event. [20] describe
weak signals as "current oddities (...) that are thought to be in a key
position in anticipating future changes" and that can be combined with other
signals. The combination of multiple weak signals provides more information for
the identification of future events. A weak signal may gain expressiveness when
combined with other signals: the resulting signal becomes more complete, which
allows better planning by the firm [5, 35]. [48] characterizes weak signals as
parts of a jigsaw puzzle that build a holistic view of future changes.
2.3.4 Association with signal processing field
This paper covers weak signal detection from the fundamental theory of
Hiltunen and Ansoff to applications in different fields. Weak signals are
studied in mechanical systems, signal processing and Big Data analysis. To the
best of our knowledge, no survey combining weak signals in signal processing
and weak signals in Big Data has been reported yet. In signal processing, this
research area attempts to extract weak signals of interest from noisy and
complex signals in various fields such as wind energy [49], sonar [50],
deep-space exploration and fault diagnosis of machinery [51]. Among signal
processing techniques, stochastic resonance (SR) is known to enhance weak
unknown signals. SR can use the noise embedded in signals to extract weak fault
characteristics from them [51, 52, 53]. SR is an advanced signal detection
method used for the early fault diagnosis of electromechanical equipment, to
ensure its reliable operation and prevent malfunction [52].
2.3.5 Signal vs. Noise
Another challenge in detecting weak signals is distinguishing between signal
and noise, as the boundary between the two can be unclear. For [54], a weak
signal may be seen as "noise" whose value is not understood before the
appropriate treatment. This kind of signal is difficult to monitor. Mendonça
points out that "weak signals can be conceived as outliers (anomalies) that do
not easily fit with the understanding embedded in those coherent futures
constructs of foresight project" [18].
Searching for relevant data for event anticipation presents different chal-
lenges: detection complexity, identification and interpretation. Generally, the
procedure for forecasting or early warning scanning consists of four steps [55]:
1. exploring the weak signals
2. assessing the weak signals
3. transforming the signals into issues
4. interpreting the issues for new futures
In this regard, this paper presents the different methods to detect and identify
weak signals in textual data sets.
3 Early warning signals’ theoretical framework
Analysts may tend to ignore information that does not conform to their own
biases or beliefs and, at the same time, underweight the risks resulting from
information they consider certain [56]. As a result of analyst bias, companies
may miss, or refuse to see, changes in the market, as they cannot envision a
future based on different paradigms. This self-defeating denial is not a new
phenomenon; for example, at the beginning of the last century, ice traders
already failed to forecast the decline of the ice trading market [57]. A more
recent example is Blockbuster, which had the opportunity to buy Netflix and
refused
it, as it did not anticipate the changes in the video market [58]. Furthermore,
Kodak invented the first digital camera in 1975 but ultimately lost the digital
camera revolution, leading it to file for bankruptcy in 2012 [59]. Kodak's case
is intriguing, as it shows that even when analysts have the information, they
may choose to base decisions on the parts of it that support their own
opinions, which may be averse to change.
One of the early motivations for the study of weak signals is the development
of strategic planning for companies. Having tools that help forecast changes in
the market in an unbiased way is crucial. For this reason, the weak signal
concept is popular in business environments [22]. Examples of applications
range from predicting future energy trends [47] and monitoring the use of
health resources for treating chronic diseases [60] to forecasting the
evolution of nanotechnology [43]. It is also used to follow and forecast the
evolution of other fields, such as school bullying [46] and tourist volume and
flow [61].
3.1 Weak signals dimensions
Different dimensions may be used to measure the importance of a signal. For
example, we can measure a signal regarding its frequency, temporality, impact,
rarity and paradigm unrelatedness. In this section, we will see some examples
of how different researchers have quantified signal dimensions.
3.1.1 Visibility
In general, weak signals present low visibility [8]; they are the kind of
signal that may pass unnoticed even by specialists. In research based on text
analysis, visibility can be measured by term frequency. Term frequency is the
rate of occurrence of a word in a series of texts and can be measured by
counting the number of occurrences of a term in a document [62]. [63] use
betweenness centrality to measure convergence and a minimum spanning tree (MST)
to highlight the essential relations.
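As an illustration of visibility measured through term frequency, the following minimal sketch counts word occurrences over a small corpus and lists the terms that appear only once. The tokenization rule, the sample headlines and the "appears once" criterion are assumptions made for this example; they are not taken from [62] or [63].

```python
import re
from collections import Counter

def term_frequencies(documents):
    """Count how often each lowercase word occurs across a list of texts."""
    counts = Counter()
    for text in documents:
        # Naive tokenization: runs of ASCII letters only (an assumption).
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

# Illustrative corpus (invented headlines).
docs = [
    "Solar storage costs fall again",
    "Grid operators test solar storage",
    "Perovskite cells enter pilot production",
]

tf = term_frequencies(docs)
# Terms seen only once are low-visibility candidates, not yet weak signals.
low_visibility = sorted(term for term, count in tf.items() if count == 1)
print(tf["solar"], low_visibility)
```

Low frequency alone does not make a term a weak signal; it only marks a candidate that the other dimensions discussed in this section have to confirm.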
Based on Hiltunen's concept, [29] proposes new metrics to quantify the signal
dimensions: the degree of visibility (DoV) and the degree of diffusion (DoD).
To evaluate the degree of visibility of a signal, Yoon proposes measuring the
frequencies of keywords: DoV measures the frequency of a given keyword in a set
of documents. The author also proposes two maps for evaluating the emergence
and the issue impact of a signal, called the keyword emergence map (KEM) and
the keyword issue map (KIM). Yoon focuses on two main principles:
Proposition 1 The number of occurrences of a keyword is linked to its influence
Proposition 2 Recent occurrences of a keyword present greater interest than older
ones
For [29], weak signal detection amounts to searching for keywords with a low
occurrence frequency and a high rate of increase. These two factors are markers
that can be used to detect weak signals: the first relates to the visibility of
the signal, and the second to its diffusion. Keywords exhibiting a high
occurrence frequency and a high rate of increase are markers of strong signals.
While both weak and strong signals show growing visibility, the total number of
occurrences is still lower for weak signals than for strong ones.
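The two propositions and the low-frequency, high-growth criterion can be sketched as follows. This is one possible time-weighted normalization, not a reproduction of the formulas in [29]: the time weight `tw`, the geometric growth rate, the two thresholds and all function names are assumptions of this example.

```python
def degree_of_visibility(term_counts, docs_per_period, tw=0.05):
    """Time-weighted keyword frequency per period (oldest period first).

    term_counts[j]     -- occurrences of the keyword in period j
    docs_per_period[j] -- number of documents collected in period j
    tw                 -- assumed time weight; older periods are discounted
    """
    n = len(term_counts)
    return [
        (tf / nn) * (1.0 - tw * (n - 1 - j))
        for j, (tf, nn) in enumerate(zip(term_counts, docs_per_period))
    ]

def growth_rate(series):
    """Geometric mean growth rate between the first and last period.

    Assumes the first value is strictly positive.
    """
    periods = len(series) - 1
    return (series[-1] / series[0]) ** (1.0 / periods) - 1.0

def classify(series, freq_threshold, growth_threshold):
    """Label a keyword from its average level and growth (assumed thresholds)."""
    average = sum(series) / len(series)
    if growth_rate(series) < growth_threshold:
        return "not emerging"
    return "weak signal" if average < freq_threshold else "strong signal"

# Illustrative counts over four periods of 100 documents each.
docs = [100, 100, 100, 100]
rare_rising = degree_of_visibility([1, 2, 4, 8], docs)
common_rising = degree_of_visibility([40, 50, 65, 80], docs)
flat = degree_of_visibility([30, 30, 30, 30], docs)

for name, series in [("rare", rare_rising), ("common", common_rising), ("flat", flat)]:
    print(name, classify(series, freq_threshold=0.1, growth_threshold=0.2))
```

With these illustrative thresholds, the rare but fast-growing keyword is labeled a weak signal, the frequent and growing one a strong signal, and the flat one is discarded. In practice the thresholds would have to be chosen from the data, for example from the median frequency and growth over all keywords.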
3.1.2 Diffusion
Diffusion is how Yoon names Hiltunen's issue axis; it represents the spreading
of the signal. Yoon's metric for detecting an emerging issue is the degree of
diffusion (DoD) of a keyword [29], which is complementary to the DoV metric.
DoD focuses on the issue level of a future sign and relies on the document
frequencies of keywords: in the portfolio maps method, Yoon measures each
keyword's document frequency relative to the total number of documents [29].
Note that [8] also highlights that the signal strength increases with the
number of issues.
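The difference between term frequency and document frequency can be shown with a short sketch. As before, the time weight `tw`, the whitespace tokenization and the sample headlines are assumptions made for illustration and are not taken from [29].

```python
def document_frequency(keyword, period_docs):
    """Number of documents in one period that mention the keyword at least once."""
    return sum(1 for doc in period_docs if keyword in doc.lower().split())

def degree_of_diffusion(keyword, periods, tw=0.05):
    """Time-weighted share of documents mentioning the keyword, per period.

    periods -- list of periods (oldest first), each a non-empty list of texts
    tw      -- assumed time weight; older periods are discounted
    """
    n = len(periods)
    return [
        document_frequency(keyword, docs) / len(docs) * (1.0 - tw * (n - 1 - j))
        for j, docs in enumerate(periods)
    ]

# Illustrative corpus: two periods of four invented headlines each.
periods = [
    ["drone delivery trial", "city budget vote", "rail strike ends", "new park opens"],
    ["drone rules debated", "drone drone drone show", "rail fares rise", "harvest report"],
]

dod = degree_of_diffusion("drone", periods)
print(dod)
```

A document that repeats the keyword several times still counts once, so the measure reflects how widely the keyword spreads across documents rather than how often it is repeated.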
3.1.3 Interpretation
The theoretical foundations of the weak signal definition proposed by [39]
include an interpretation dimension. For Hiltunen, this dimension covers the
context in which people forecast possible future events. At first glance, a
weak signal is an incomplete and fragmented piece of information, and thus
cannot be interpreted as it is. Strong signals, on the other hand, are more
precise and of higher quality. Unlike Hiltunen, [45] tried to address this
feature by employing keyword clustering and topic selection based on keyword
co-occurrences.
3.1.4 Rarity & paradigm unrelatedness
The main characteristics of a weak signal, as proposed by Hiltunen, constitute
a weak signal representation. [64] emphasize the fresh nature of signals with
an approach to novelty-focused weak signal detection.
3.2 Challenges
3.2.1 Weak signals detection
Without a doubt, the main challenge for all the methods discussed in this
article is identifying a piece of information as a weak signal. Fundamental
characteristics of a weak signal are its rarity and incompleteness, which make
it hard to detect, as it can be confused with noise. The difficulty also comes
from the complexity of characterizing small and scattered pieces of information
as relevant, and from recognizing their forecasting properties, as they
announce future events. Mining the relevant information buried under
excessive and contradictory noise, consistently and reliably, is the very essence
of the weak signals detection field.
3.2.2 Database consistency
According to [65], [66] and [67], newspapers, the internet and especially blogs
are good sources for finding weak signals. Moreover, correlating some of these
sources can help reveal trends that are not clear when looking at a single
source of information. [64] used two databases: a referral database and a
study database. The choice of the databases is crucial, as it significantly
affects the results of the analysis. The databases have to be coherent and
meaningful with respect to each other. Moreover, they need to be complementary
and provide additional information. If these conditions are not met, one can
hardly highlight any relevant information, even when correlating different
data sources.
3.2.3 Impact
For Ansoff, weak signals are "warnings (...), events and developments which are
still too incomplete to permit an accurate estimation of their impact". Impact
estimation consists of identifying the potential consequences of a signal and
estimating its importance. [34] provide a method for cross-impact reasoning
that infers the likelihood of future events and measures the strength of
influence. This method is based on an inference model using a Bayesian network,
where a node represents an event and a link between two nodes represents the
impact probability of a state transition from one node to the other.
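To give an intuition of cross-impact reasoning, the sketch below handles the smallest possible network: a parent event A linked to a child event B. It is not the model of [34]; it only applies the law of total probability and Bayes' rule, and the probabilities and function names are assumed for illustration.

```python
def marginal(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) by the law of total probability over the parent event A."""
    return p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)

def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) by Bayes' rule: how much observing B raises belief in A."""
    return p_b_given_a * p_a / marginal(p_a, p_b_given_a, p_b_given_not_a)

# Assumed probabilities for two hypothetical events.
p_a = 0.2              # prior probability of the precursor event A
p_b_given_a = 0.7      # probability of event B if A occurs
p_b_given_not_a = 0.1  # probability of event B if A does not occur

likelihood_b = marginal(p_a, p_b_given_a, p_b_given_not_a)
belief_a = posterior(p_a, p_b_given_a, p_b_given_not_a)
# One simple (assumed) measure of the strength of the link from A to B.
influence = p_b_given_a - p_b_given_not_a

print(round(likelihood_b, 2), round(belief_a, 3), round(influence, 1))
```

A full Bayesian network generalizes this computation to many events and links, with one conditional probability table per node.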
3.2.4 Interpretation
In general, it is difficult to interpret a weak signal at first glance.
However, the interpretation may become more straightforward and explicit when a
weak signal is combined with context and/or other weak signals. According to
Hiltunen, the "interpretation" dimension is the assumption or sense of people
regarding future events (a subjective matter). The authors of [68] emphasized
the limitations of the approach of Yoon [29] (presented in Section 5.1.1),
namely the disappearance of some signals and the complexity of interpretation.
This shows how difficult this dimension is to measure.
3.2.5 Weak signals duration
[21] raises some questions regarding weak signal duration. For him, studies that
consider this dimension of the problem can be divided into two categories: (1)
the weak signal only lasts for a moment, or (2) the weak signal lasts longer.
The underlying assumption is that the weak signal is either the phenomenon
(Assumption 1) or a sign of a change (Assumption 2).
4 Weak signal detection pipeline
This section places weak signal detection within a standard data treatment
pipeline, a simplified version of the CRISP-DM (CRoss-Industry Standard
Process for Data Mining) process [69]. The data treatment phases taken into
account are data collection, preparation, modeling and evaluation, as shown in
Fig. 2.
Fig. 2 Data Process for data treatment pipeline produced
4.1 Data collection
Data acquisition is the process of gathering data and measuring information
on variables of interest. This step is the most basic one; without data, no
signal can be processed or detected. The quality of the data collection phase
directly impacts the overall result of the proposed system [70, 71]. It entails
defining the kind of data needed, evaluating the reliability of the data
sources, acquiring the data and verifying the quality of the received
information.
Authors use different methods to perform data collection. Traditional data
sources are newspapers, scientific journals, patents, specialized data sets,
social networks and the Internet in general, especially blogs. Some authors
propose the use of web crawlers and search queries (29; 47; 46; 44; 30; 64; 72).
[34] propose the Global Trends Briefing (GTB) 1. GTB is a network of experts,
from different domains and spread internationally, who propose documents of
futuristic interest. Documents are collected, analyzed and validated before
being integrated into the futuristic dataset.
1 Korea Institute of Science and Technology Information, http://www.kisti.re.kr/english/.
4.2 Data preparation
Data preparation is the technique used to transform the acquired data into
a useful dataset [73]. It covers an extensive range of treatments applied to
the raw input to fix errors and create a valuable dataset for the next phase.
Raw data is often imperfect, inconsistent and redundant. Data preparation, or
preprocessing, methods address the main problems linked to the data itself
and adapt the data to the requirements of the modeling phase, improving its
effectiveness.
Examples of such treatments are handling missing or incomplete information
[74], handling noise [75], removing duplicates and formatting data. The
operations vary according to the type of raw data acquired and the main
objectives of the whole pipeline. For textual data, a common step is removing
stopwords and performing general data formatting. Several authors also use
feature selection [76], feature scaling and feature engineering to identify
data characteristics such as "name", "age" or "gender" [77]. Missing values
can be replaced with, for example, the average or the median value, and
numerical features can be rescaled with normalization techniques such as
min-max scaling.
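As a minimal sketch of the treatments mentioned in this subsection, the following Python code replaces missing values with the median, rescales a numerical feature with min-max scaling and removes stopwords from a tokenized text. The function names, the sample values and the stopword list are illustrative assumptions and do not come from any of the surveyed works.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] interval."""
    low, high = min(values), max(values)
    if high == low:  # constant feature: avoid a division by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def remove_stopwords(tokens, stopwords):
    """Drop stopwords from a tokenized document (case-insensitive match)."""
    return [t for t in tokens if t.lower() not in stopwords]


# Hypothetical "age" feature with one missing value.
ages = [20, None, 40, 30]
filled = impute_median(ages)
scaled = min_max_scale(filled)

# Hypothetical tokenized sentence and stopword list.
tokens = remove_stopwords(["The", "signal", "is", "weak"], {"the", "is"})
print(filled, scaled, tokens)
```

The order matters in this sketch: imputing before scaling keeps the filled value inside the observed range, so it cannot distort the minimum or maximum used for the rescaling.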
4.3 Data Modeling
Modeling consists of choosing the best-suited models and their parameters to
reach the expected result. The number of possibilities is significant, and it is
in this phase that the primary intelligence resides. It is where the signals are
recognized as real signals and the relationships between them are revealed.
Modeling and data preparation are closely related: understanding the data
often consists of finding problems in the modeling phase and returning to the
preparation phase to create, or reformat, data for the modeling phase [69].
Examples of techniques used in the modeling phase are interpretation and
pattern recognition methods such as linear discriminant analysis, support
vector machines, neural networks, decision trees and the k-nearest neighbors
method.
4.4 Evaluation
Having the results of the modeling phase is not enough. The weak signals found
need to be evaluated to ensure the quality and efficiency of the deployed
models. This phase verifies that the results fit the task description, checks
whether parts of the answer were overlooked and confirms that the process
meets the required quality standards regarding false positives and false
negatives.
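The notions of false positives and false negatives can be illustrated with a short sketch. It assumes that a reference set of signals judged relevant (for instance by experts) is available, which the surveyed works do not always have; the function name and the sample sets are hypothetical.

```python
def evaluate_detection(detected, relevant):
    """Compare detected signals with a reference set of relevant signals."""
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    false_positives = len(detected - relevant)   # detected but not relevant
    false_negatives = len(relevant - detected)   # relevant but missed
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical keywords flagged by a model and keywords validated by experts.
report = evaluate_detection(
    detected={"perovskite", "tandem", "wafer"},
    relevant={"perovskite", "tandem", "bifacial", "agrivoltaics"},
)
print(report)
```

Precision penalizes false positives and recall penalizes false negatives, so reporting both makes the trade-off between the two error types explicit.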
5 A categorization of weak signal detection
techniques
This section presents some methods proposed for weak signal detection and
organizes the strategies into different categories, as shown in Fig. 3. The
objective is to highlight the different techniques used by some representative
works. For example, several authors use the portfolio maps methodology, with
two portfolio maps, to extract the potential weak signals. Furthermore, [63]
and [31] use methods based on graph theory. [64] propose to observe the rarity
and the paradigm unrelatedness of a weak signal. [64] and [42] consider the
interpretation dimension of a future sign. [43] have a more text mining-oriented
approach, based on word frequencies. All the discussed methods contribute
information on weak signal detection, but the way they characterize and
handle a signal varies.
Fig. 3 A taxonomy of different methods of weak signal detection
Table 1 and Table 2 summarise the main characteristics of the evaluated
methods. They expose the essence of the studied methods, providing an easy
way to compare the methods and their strategies. The columns of Table 1 and
Table 2 consist of:
Aim: The objective of the work, i.e., the problem it is trying to solve
Data used: Kind of data used for the analysis
Applications: Field of application of the work
Scoring metrics: The indicator used to quantify the interest of a signal
Keywords selection: Uses keyword search to find documents
Anomaly detection: Considers a weak signal as an outlier
Clustering: Uses machine learning techniques to group unlabeled examples
Information granularity: The granularity of the treated data, whole document
or keywords
Temporality dimension: Considers the temporality dimension during the
weak signal detection
Finding relations and patterns: Tries to discover links between elements
and to identify patterns that are repeated at regular intervals
Table 1 Comparison of some classic community weak signals detection
Papers | Aims | Data used | Applications | Scoring metrics
[30] | Prediction of future organised criminal threats | web sources, social media and LEA reports | C | TF
[77] | Modeling crime problems | Crime data from a sheriff's office | C | ×
[31] | Prediction of future organised criminal threats | web, social media | C | ×
[78] | Prediction of future lone wolf | web sites and forums | C | ×
[63] | Detection industry service trends | existing database | E | co-occurrence matrix between phrases (K)
[79] | Detecting changes for businesses | Not given | E | ×
[33] | Investigation for renewable energy | storage technologies for renewable energy | En | ×
[47] | Forecasting nuclear energy events | news | En | DoV & DoD
[80] | Forecasting | Academic papers | IT | ×
[29] | Detecting business opportunities | Web news articles | IT | DoV & DoD
[72] | Smart grids problems | Academic papers | IT | DoV & DoD
[45] | Foreseeing the advent of new technologies | Academic papers | IT | DoV & DoD
[42] | Forecasting changes for remote sensing | newspaper, paper, social media | IT | DoV + DoD
[64] | Technology foresight | social media content, news | IT | rarity & paradigm unrelatedness
[43] | Predicting the innovation | Academic papers | IT | annual growth rate and standard deviation
[46] | School bullying issue forecasting | social media | S | DoV & DoD
[44] | Land administration sector issue detection | annual | Admn | DoV & DoD
[81] | Forecasting | Academic papers | Med | intra indicator and inter indicator
[34] | Foresighting future technologies | academic papers | GF | Knowledge Matrix
Table 2 Comparison of some classic community weak signals detection
Papers, followed by the features in column order: K-Selection, Anomaly Detection, Classifier, Clustering, Information granularity, Temporality dimension, Finding relations, Pattern recognition, Interpretation dimension, Prioritisation Ranking, Semantic consideration, Text-mining based approach (NLP), Topic Modeling, Topic Assignation, Data Fusion, Statistical model, Graph theory, Bayesian Networks, Multi-words analysis, Primary metrics, Maps
[30] Y N N N K Y Y Y N Y Y Y N N N Y Y N N TF N
[77] N N N Y K N Y Y N N N Y N N N N N N N N N
[31] Y N N N K Y Y Y N N Y Y N N N N Y N N N Y
[78] Y N Y N K N Y Y N N Y Y Y N N N N Y Y TF N
[63] N N N N K N Y Y N Y Y Y N N N N Y N N TF N
[79] N N Y N N N N N N Y N N N NA NA NA N N N TF N
[33] Y N N Y K + D Y Y Y Y N Y Y N N N N N N Y TF + DF N
[47] Y N N N K + D Y N Y N Y N Y N C.A N Y N N N TF + DF Y
[80] NA N Y N K N N N N N N N N N N N N N NA N
[29] Y N Y N K + D Y N N N N N Y N C.A N Y N N N TF + DF Y
[72] Y N N N K + D Y N N N Y N Y N C.A N Y N N N TF + DF Y
[45] Y N N N K + D Y N N Y N N Y N TI N Y N N Y TF + DF Y
[42] Y N N N K + D Y N N Y Y N Y N N N Y N N Y TF + DF Y
[64] Y Y N Y K + D N N N N N N Y N N Y Y N N N TF + DF Y
[43] Y Y N N K N N N N Y N Y N N N Y N N N TF N
[46] Y N N N K + D Y N N N Y N N N C.A N Y N N N TF + DF Y
[44] Y N N Y K + D N N N Y Y Y Y Y N N Y N N Y TF + DF Y
[81] N N N Y K N Y Y Y Y Y Y Y N N Y N N Y TF + DF N
[34] N N N Y K Y Y Y Y Y N Y N N N N N Y N TF Y
Dimension Interpretation: The way the method expresses the receiver's
meaning
Ranking data: Way to sort data based on frequency criteria
Semantic Consideration: Considers the meaning of knowledge using
semantic contexts and background knowledge
Text-mining techniques: Uses natural language methods to exploit
unstructured documents
Topic Modeling: Uses automatic topic modeling to classify documents
Topics Assignation: Assigns a topic name to a set of words
Data fusion: Integrates multiple data sources
Statistical method: Uses models and techniques based on the statistical
analysis of raw research data to extract information
Graph Theory: Makes use of graph theory-based approaches
Bayesian Networks: Uses a probabilistic graphical model based on Bayesian
inference
Multi words Analysis: Considers groups of words instead of single
keywords
Primary metrics: The basic metric used in the work, e.g. term frequency
and document frequency
Scoring metrics: The indicator used to quantify the interest of a signal
Maps: Makes use of the portfolio maps technique to classify the keywords
5.1 Statistics-based method
This section presents propositions that use statistical approaches to detect
weak signals. The works in this class are further grouped into sub-classes,
presented in Fig. 4.
5.1.1 Portfolio maps method
The methods in this group are characterized by the construction of portfolio
maps and the identification of topics during the weak signal analysis, as shown
in Fig. 5. Most initial methods base their weak signal detection on statistics
linked to the keywords of the initial data. One of the first proposals is the
portfolio maps methodology proposed by [29], followed by other authors such
as [68], [72], [46], [45], [44], [47], [43] and [42]. These works, in general,
consider Hiltunen's signal classification (Section 2.3.2) as a way to measure
the importance of the detected signals.
Initial works
In the original work, [29] suggests a quantitative approach based on keyword
text mining to identify weak signal topics, applied to business opportunities
for solar cells. The objective of the work is to minimize the intervention of
experts in the detection and identification of weak signal topics. For the data
collection, Yoon relies on news websites and the ProQuest database
(http://search.proquest.com), which include various information sources on
different subjects (political, economic, social and business, technological).
A total of 28270 English Web news articles were located using the keywords
"solar cells" and "photovoltaic", for the period from Jan 1, 2006 to
Mar 31, 2011.
Fig. 4 Taxonomy for works using statistical methods produced
Fig. 5 Taxonomy for works using portfolio maps methods
Fig. 6 Keyword emergence map. Area A contains the keywords connected with
the weak signals, as presented in [29]; area B contains the keywords connected
with the strong signals.
For Yoon, weak signals are imprecise indicators, rarely considered by people,
that nevertheless announce future events. The method proposes two new
metrics, Degree of Visibility (DoV) and Degree of Diffusion (DoD), as
indicators to analyze the detected signals. DoV, in equation (2), corresponds
to the frequency of a given keyword in a set of documents (signal axis); DoD,
in equation (1), represents the document frequency of each keyword.
\[ \mathrm{DoD}_{ij} = \frac{DF_{ij}}{NN_j} \times \{1 - tw \times (n - j)\} \tag{1} \]
\[ \mathrm{DoV}_{ij} = \frac{TF_{ij}}{NN_j} \times \{1 - tw \times (n - j)\} \tag{2} \]
where $TF_{ij}$ is the frequency of keyword $i$ in period $j$, $DF_{ij}$ is the
document frequency of keyword $i$ in period $j$, $NN_j$ is the number of
articles considered in period $j$, $n$ is the number of periods, and $tw$ is a
time weight linked to the speed of change of the target topic.
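The two metrics can be computed with a few lines of Python. The sketch below reads equations (1) and (2) as (frequency / NN_j) × {1 − tw × (n − j)} with periods numbered from 1 to n; the default time weight of 0.05 and the sample counts are illustrative assumptions, and the function names are hypothetical.

```python
def time_weighted_degree(freq, n_articles, period, n_periods, tw=0.05):
    """Return (freq / NN_j) * {1 - tw * (n - j)} for period j (1-based)."""
    return (freq / n_articles) * (1 - tw * (n_periods - period))


def degree_series(freq_by_period, articles_by_period, tw=0.05):
    """Apply the time-weighted degree to every period of one keyword."""
    n = len(articles_by_period)
    return [
        time_weighted_degree(freq, articles_by_period[j], j + 1, n, tw)
        for j, freq in enumerate(freq_by_period)
    ]


# Hypothetical counts for one keyword over two periods.
articles = [10, 20]   # NN_j: articles considered in each period
term_freq = [2, 6]    # TF_ij: occurrences of the keyword in each period
doc_freq = [1, 3]     # DF_ij: documents containing the keyword in each period

dov = degree_series(term_freq, articles)  # Degree of Visibility per period
dod = degree_series(doc_freq, articles)   # Degree of Diffusion per period
print(dov, dod)
```

With this reading, the most recent period (j = n) is not discounted, while older periods are weighted down linearly by tw per period.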
Fig. 7 Keyword issue map. Area A contains the keywords connected with the
weak signals, as presented in [29]; area B contains the keywords connected
with the strong signals.
Fig. 8 Quadrant maps as presented in [46]
The DoD and DoV metrics are plotted on two keyword portfolio maps (KEM,
the Keyword Emergence Map, Fig. 6, and KIM, the Keyword Issue Map, Fig. 7
and Fig. 8) to gauge the importance of the keywords extracted by text mining.
Each map is composed of four quadrants that group keywords into weak
signals, strong signals, latent signals and well-known signals, first mentioned
in [68] and followed by [72], [45], [44] and [46]. The vertical dimension refers
to the average growth rate of DoV (or of DoD) and the horizontal dimension
to the average term (or document) frequency. The identification of weak
signal topics is based on this representation. Weak signal words have a low
current frequency but a high growth rate, which suggests that they can
increase quickly. The keywords related to weak signal topics are situated in
area A of the KIM in Fig. 7 and of the KEM in Fig. 6. One advantage of this
method is that it categorizes the keywords automatically and dynamically.
Based on the maps, experts can then interpret the results (82 and 83).
According to Yoon, the proposed method performs better than human
experts when dealing with massive textual information.
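The quadrant rule can be sketched as follows. Only the weak signal quadrant (low frequency, high growth) is stated explicitly in the text; the placement of the other three labels, the use of mean values as thresholds and the simple average of period-over-period growth rates are assumptions made for illustration, as are the sample series.

```python
from statistics import mean


def average_growth(series):
    """Mean period-over-period growth rate; periods starting at zero are skipped."""
    rates = [(b - a) / a for a, b in zip(series, series[1:]) if a > 0]
    return mean(rates) if rates else 0.0


def classify_keywords(series_by_keyword):
    """Place every keyword in one quadrant of a portfolio map."""
    stats = {
        kw: (mean(series), average_growth(series))
        for kw, series in series_by_keyword.items()
    }
    freq_cut = mean(freq for freq, _ in stats.values())      # horizontal threshold
    growth_cut = mean(growth for _, growth in stats.values())  # vertical threshold
    labels = {}
    for kw, (freq, growth) in stats.items():
        if growth >= growth_cut:
            labels[kw] = "strong" if freq >= freq_cut else "weak"
        else:
            labels[kw] = "well-known" if freq >= freq_cut else "latent"
    return labels


# Hypothetical DoV series (one value per period) for four keywords.
dov_series = {
    "silicon": [0.50, 0.52, 0.51],
    "perovskite": [0.01, 0.03, 0.09],
    "module": [0.30, 0.60, 1.20],
    "wafer": [0.05, 0.05, 0.04],
}
labels = classify_keywords(dov_series)
print(labels)
```

Running the same function on DoV series and on DoD series gives the KEM and KIM labels respectively; a keyword that receives different labels on the two maps is exactly the ambiguous case discussed in the following paragraphs.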
The authors in [68], [72], [45] and [44] raise the problem of uncertainty and
lack of interpretability that occurs when a keyword appears in different
quadrants of the KIM and the KEM.
The keyword portfolio map is a popular method that authors use in different
ways. For instance, [45], [44] and [42] introduce multi-word analysis. Their
hypothesis is that working on a single keyword can hinder the understanding
of a specific subject. [47] seek to verify the validity of the portfolio maps
method by analyzing the evolution of signals from weak to strong.
Study based on multi-words
[45] explain that the use of a single keyword may lead to a loss of objectivity.
Two factors explain this. The first is the lack of distinction when the signals
are mixed on each map; indeed, in the original proposal in [29], the objective
was to let experts decide whether a keyword should be discarded or not. The
second is the multiple meanings that a given keyword can have. According to
[68], [72], [45] and [44], interpretability is lost when performing the analysis.
Moreover, the keywords related to weak signals are generally isolated terms,
and in a deeper analysis the lack of relations and context around the keywords
limits the information they convey. One of the suggestions of these works to
improve interpretability is to include semantic analysis in future signal
detection frameworks. To reduce the uncertainty problem, [45] propose
working with groups of words and concepts instead of isolated keywords,
which should increase the accuracy of the whole system.
The analyses of [45] and [44] present some similarities. For example, in both
cases the weak signal detection pipeline collects and prepares data, builds the
maps and identifies topics. The data are collected from the EBSCOhost
database 1 and from Scopus, respectively, using queries related to their
fields: ethical issues in AI for Lee and Park, and land administration for [44].
Both select relevant keywords by applying a criterion based on the frequency
of appearance of a word. [45] employed a handmade list as the main criterion
for keyword selection, while [44] used a combination of automated and manual
selection. [45] assess the degree of interpretation by employing keyword
clustering. They build the emergence and issue maps, focusing on sets of
keywords located in the same quadrant in both the KIM and the KEM. They
inspect the keywords of each cluster and confirm whether the group should be
a topic. [44] use the Latent Dirichlet Allocation method to identify the topics.
The majority of the keywords are classified as either strong or latent signals,
because the average values are used as thresholds. Intuitively, the number of
topics considered is a trade-off between the simplicity of interpretation and
coherence with the knowledge domain. They pay attention to the fifteen most
common terms of each topic.
1 https://www.ebsco.com/fr-fr
Then, they verify the status of the keywords in the two portfolio maps. After
this step, a name and a group of keywords are associated with each topic.
However, some topics share words and were therefore sometimes merged. In
the end, nineteen topics were considered. The assignment of topics to the four
signal categories followed this rule: the category of a topic is the quadrant in
which its most important keywords landed. [45] find three weak signal topics
out of eighteen and [44] obtain one weak signal topic out of nineteen.
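The rule that assigns a topic to a signal category can be sketched as a vote among its most important keywords. Interpreting "the quadrant in which the most important keywords landed" as a simple majority is an assumption, as are the function name and the sample keywords.

```python
from collections import Counter


def topic_category(top_keywords, keyword_labels):
    """Return the most frequent quadrant label among the topic's top keywords."""
    votes = Counter(
        keyword_labels[kw] for kw in top_keywords if kw in keyword_labels
    )
    if not votes:
        return None  # none of the keywords was placed on the map
    # In case of a tie, the label encountered first among the keywords wins.
    return votes.most_common(1)[0][0]


# Hypothetical quadrant labels obtained from a portfolio map.
keyword_labels = {
    "cadastre": "strong",
    "blockchain": "weak",
    "drone": "weak",
    "registry": "latent",
}
category = topic_category(["blockchain", "drone", "cadastre"], keyword_labels)
print(category)
```

A stricter variant could require the keyword to hold the same label in both the KIM and the KEM before it is allowed to vote, which would mirror the focus of [45] on keywords located in the same quadrant of both maps.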
These approaches do not consider the evolution of weak signals into strong
ones. A more systematic analysis over different, successive periods of time
could reveal the evolution of topics. Another potential approach would be to
discover semantic networks in patterns of clustered keywords and to study
their evolution (33 and 84).
[42] focus on the analysis of heterogeneous and unstructured information.
The authors consider weak signals as small and seemingly insignificant issues
that can nevertheless bring about changes in the future. They vary the number
and types of sources (ScienceDirect, New York Times and Twitter) and
automatically extract the keywords of the documents. The authors use the
Degree of Transmission (DoT) metric to help evaluate the importance of the
terms. This degree sums the h-index values of the journals of all the texts in
which the word $i$ can be found, as shown in Equation (3).
\[ \mathrm{DoT}_i = \sum \text{H-index}_{\text{journal}} \tag{3} \]
For the final selection of terms related to weak signals, every DoD and DoV
is multiplied by the corresponding DoT. The authors observe that some word
expressions are identified as potential weak signals. To provide more accurate
information, they apply multi-word analysis with Natural Language
Processing: they compute the percentage of times a word appears beside a
potential weak signal term across all the documents of the dataset. They
illustrate this analysis with the term "desertification" and show other terms
related to it, for instance "processes" with 36.76%, "areas" with 26.37% and
"rates" with 9.58%. A possible evolution of this method would be to consider
word embeddings [85], which can help to identify related and similar terms
across the analyzed documents.
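A minimal sketch of these two steps is given below. It assumes that every document is available as a list of tokens with the h-index of its source attached, and it interprets "appears beside" as "directly follows the term"; both choices, like the field names and the sample documents, are illustrative assumptions rather than the implementation of [42].

```python
from collections import Counter


def degree_of_transmission(word, documents):
    """Sum the journal h-index of every document that contains the word."""
    return sum(doc["h_index"] for doc in documents if word in doc["tokens"])


def following_word_shares(term, documents):
    """Percentage of occurrences of `term` directly followed by each word."""
    followers = Counter()
    occurrences = 0
    for doc in documents:
        tokens = doc["tokens"]
        for pos, token in enumerate(tokens):
            if token == term:
                occurrences += 1
                if pos + 1 < len(tokens):
                    followers[tokens[pos + 1]] += 1
    if occurrences == 0:
        return {}
    return {word: 100 * count / occurrences for word, count in followers.items()}


# Hypothetical tokenized documents with the h-index of their journal.
documents = [
    {"tokens": ["desertification", "processes", "increase"], "h_index": 40},
    {"tokens": ["desertification", "areas", "and", "desertification", "processes"],
     "h_index": 10},
    {"tokens": ["soil", "erosion"], "h_index": 25},
]

dot = degree_of_transmission("desertification", documents)
weighted_dov = 0.3 * dot  # a hypothetical DoV of 0.3 multiplied by its DoT
shares = following_word_shares("desertification", documents)
print(dot, weighted_dov, shares)
```

Because DoT is a sum over documents, a term that appears in many low-ranked sources can outweigh a term that appears in a single highly ranked journal, which is worth keeping in mind when interpreting the weighted scores.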
The authors use two assessments for the evaluation: the identification of
strong signals and the judgment of a group of experts. With these two
assessments, the methodology shows interesting and promising results. More
generally, this work shows, on the one hand, the validity of the portfolio maps
method and, on the other hand, an improvement in the integration of the
interpretation dimension. Another point is that the system is independent of
its input dataset of documents; in other words, the work can be applied to any
field. However, it is crucial to note that the h-index does not necessarily give
an accurate ranking of impact. It only reflects how many papers a scientist has
published and how many times they have been cited. The h-index puts
newcomers at a disadvantage, since both their publication output and their
observed citation rates will be relatively low. A young scientist will score a
low h-index because they do not have time on their side: the h-index is based
on rather long-term observations. Another drawback of the h-index is the
temptation for researchers and journals to boost it through self-citations and
arranged cross-citations.
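The disadvantage for newcomers follows directly from the usual definition of the h-index, the largest h such that h publications have at least h citations each. The sketch below uses this standard definition; the citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts per publication.
established = h_index([10, 8, 5, 4, 3])  # five papers, steadily cited
newcomer = h_index([50])                 # one highly cited paper
print(established, newcomer)
```

In this example the single highly cited paper of the newcomer cannot raise the index above 1, whatever its impact, because the index is capped by the number of publications.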
Other works focus on topic modeling rather than on the categorization of
signals. For instance, [86] work from topics and [81] use Latent Dirichlet
Allocation (LDA) topic modeling, discussed in Section 5.1.4.
Signal transformation study
This section presents methods to evaluate the possible evolution of weak
signals into strong ones. Not all weak signals become strong, but detecting
early the ones that may do so is an advantage for strategic planning, as it may
support foresight [27]. However, the dynamics may also create a strategic
paradox: either one waits to obtain sufficient information to act better, or one
accepts the incomplete nature of the information to act quickly. Weak signals
are early indicators of change, but they are not sufficiently reliable for
strategic decisions. Acting only when they become strong signals may mean
that it is too late for them to represent a strategic advantage, since at this
point the signal has already become general knowledge.
[47] propose a method for detecting the evolution of a sign. They want to
prove the validity of the portfolio maps method for detecting the
transformation of future signs. [42], in turn, also consider the transformation
of weak signals into strong ones, in the evaluation step. These studies thus
take two different approaches to showing the transformation of the weak
signal: [42] highlight the process in the evaluation stage, whereas [47] develop
a robust analysis. [42] use two assessments. The first is based on detecting
strong signals in a new input dataset with more recent documents. The second
is based on a group of experts. In the first assessment, strong signals in recent
documents that were detected as weak signals in earlier documents confirm
the process.
[47] target the prediction of Korea's nuclear situation post-2019. The input
dataset consists of news articles containing the term "nuclear energy" from
the site "Naver News" (https://news.naver.com/), from 2005 to 2018. Articles
are grouped into six-month periods. The frequency of occurrence determines
the pertinence of a word: a word is considered a keyword if it appears at least
once in each six-month interval. For the evaluation of the results, [47] raise
two hypotheses to verify Yoon's methodology:
1. Can the weak-signal methodology predict the UAE nuclear contract by
using social media data from the first half of 2005 to the first half of 2010?
2. Is it possible to predict the Korean government's nuclear phase-out
declaration using social media data from the first half of 2011 to the first
half of 2017?
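The keyword selection rule described before the two hypotheses (articles grouped into six-month periods, a word kept only if it appears in every period) can be sketched as follows. The function names and the sample articles are hypothetical, and tokenization is reduced to a whitespace split for brevity.

```python
from collections import defaultdict
from datetime import date


def half_year(day):
    """Map a date to its six-month period, e.g. (2005, 1) for Jan-Jun 2005."""
    return (day.year, 1 if day.month <= 6 else 2)


def select_keywords(articles):
    """Keep the words that occur at least once in every six-month period."""
    vocabulary_by_period = defaultdict(set)
    for published, text in articles:
        vocabulary_by_period[half_year(published)].update(text.lower().split())
    if not vocabulary_by_period:
        return set()
    return set.intersection(*vocabulary_by_period.values())


# Hypothetical articles: (publication date, text).
articles = [
    (date(2005, 3, 1), "nuclear plant export"),
    (date(2005, 9, 15), "nuclear export safety"),
    (date(2006, 2, 20), "export nuclear"),
]
keywords = select_keywords(articles)
print(sorted(keywords))
```

This rule favors persistent words: a term that first appears late in the observation window is excluded, even though such late arrivals may be exactly the emerging terms of interest.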
They apply the portfolio maps method over two different time intervals for
each hypothesis. The first one allows the detection of weak signals, while the
second is used to evaluate the evolution of the signal from weak to strong. For
the first hypothesis, the chosen intervals are from 2005 to 2009 and from 2005
to 2010. The presented results suggest that using portfolio maps is a reliable
method, although no definition of the expected time for the transformation
from weak to strong signal is provided. For the authors, a determining factor
is the increase in appearance rate, even more than the term frequency. Their
work argues that an increasing rate of 0.3% indicates that the signal may
become a strong one in the future, although this is not verified with
experimental data.
Other applications
Some studies apply portfolio maps to gain understanding and follow the evo-
lution of a series of distinct topics. For example, [46] target the identification
of school bullying in South Korea through weak signals detection. [72] look for
smart grid issues using the future sign searching technique.
Traditional social-based methods are not enough to detect new trends in
school bullying. For this reason, [46] rely on big data analytics to understand
its evolution. Their method relies on online data (blogs, Twitter, social
networking websites and 257 news sites) in which the term "school bullying"
appears. On the portfolio maps (KIM and KEM), some topics show a high
increasing rate compared to other words; these have a high likelihood of
becoming more socially impactful. The work concludes that bullying, group
assault incidents and physical violence cases are perceived as extremely
serious among youth.
[72] look for smart grid issues using the future sign searching technique
through text mining. To validate the proposed method, the authors collect
documents at different points in time by using the search query 'smart grids'
on Radian6 (a social media monitoring platform for marketers). In this
investigation, the authors do not consider the transition from a weak signal to
a strong one. Indeed, working on this problem means finding a suitable time
for a weak signal to turn into a strong signal. It is important to note that the
conversion time (from weak signal to strong signal) is uncertain and may vary
depending on the subject of the study. The work uses the portfolio maps
approach to track changes during the transitional period. It would be possible
to find the pattern by which a specific signal converts into another over a time
series (for instance, a weak signal into a strong signal) and how general
keywords move through the quadrants. A further study could change the time
window. Here, the period considered is one year; with a different window size,
it would be possible to monitor a group of keywords.
5.1.2 Keywords-occurrence based method
The fundamental characteristic shared by (64; 29; 46; 44; 45; 47; 42 and 43) is
the use of term frequency. The following studies (43 and 42) present additional
contributions to frequency-based weak signal detection.
[43] propose a study to predict innovation and explore the dynamic behavior
of weak signals using the agent-based simulation tool NetLogo. The authors
present a time series analysis of words based on word frequencies obtained
through text mining. The technical challenge of this study is identifying the
concepts that have a strong possibility of being weak signals. The authors
consider weak signals as emerging topics related to keywords that were not
selected. In other words, a word with an unusual growth rate could relate
strongly to future unfamiliar and unusual issues. They collect 50 articles from
the Journal of the Korean Ceramic Society with search queries, covering 2008
to 2012. The proposed method computes, for each word, its occurrence, its
annual occurrence and its annual growth rate. They use the frequency of a
term as the visibility of a signal (the "signal" axis of Hiltunen's definition) and
the growth rate of a term as a description of its diffusion (the "issue" axis).
Their concept of a weak signal is consistent with Hiltunen's definition. The
hypothesis is that a weak signal appears rarely and can become stronger when
merged. Their research focuses on informetric analysis, which reinforces the
expert-based detection method. Then, by assigning these terms to the values
of each agent of the agent-based model, the authors could extract the future
strong signals. The model represents another method of identifying weak
signals, but it would be interesting to determine an optimal threshold.
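The growth-rate part of this analysis can be sketched as follows. The text does not give the threshold used to call a growth rate unusual, so the rule below (mean growth above the corpus mean plus k standard deviations) is an assumption, as are the function names and the yearly counts; the agent-based simulation itself is not reproduced.

```python
from statistics import mean, pstdev


def annual_growth_rates(counts):
    """Year-over-year growth rates; years that start from zero are skipped."""
    return [(b - a) / a for a, b in zip(counts, counts[1:]) if a > 0]


def unusual_growth_words(counts_by_word, k=1.0):
    """Words whose mean growth exceeds the corpus mean by k standard deviations."""
    avg_growth = {}
    for word, counts in counts_by_word.items():
        rates = annual_growth_rates(counts)
        if rates:
            avg_growth[word] = mean(rates)
    values = list(avg_growth.values())
    if not values:
        return []
    threshold = mean(values) + k * pstdev(values)
    return sorted(word for word, growth in avg_growth.items() if growth > threshold)


# Hypothetical annual occurrences of four words between 2008 and 2012.
annual_counts = {
    "zirconia": [10, 11, 10, 11, 10],
    "alumina": [20, 20, 21, 21, 22],
    "graphene": [1, 2, 4, 8, 16],
    "sintering": [30, 29, 30, 31, 30],
}
candidates = unusual_growth_words(annual_counts)
print(candidates)
```

The parameter k plays the role of the threshold whose optimal value remains an open question: a lower k flags more candidate words, a higher k fewer.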
5.1.3 Analytical comparison of two databases
Another work that has provided evidence for weak signal detection is [64].
The authors explore a way to determine the degree of novelty of the documents
in a dataset, and consider a weak signal as a novel future signal (document or
keyword). The target field of this work is futuristic analysis of Augmented
Reality. Their technique uses different types of futuristic data to determine
which documents could represent a significant future trend. As a proof of
concept, they use two datasets: an online dataset used as the source of weak
signals and a patent dataset used to assess the existing paradigms of
technological innovation. The online futuristic dataset is a collection of
articles from different technology foresight websites; the dataset that works as
a technology proxy is a collection of patents linked to the Augmented Reality
field. The method consists of standard steps. The first is the data collection.
As documents are unstructured, the second step extracts the keywords of the
documents and arranges them into keyword-document matrices, one for each
dataset. The third step classifies the documents according to their rarity. The
fourth step builds a signal-portfolio map to assess document patterns and find
weak signals. Documents in both datasets are ranked according to two
parameters: rarity and paradigm unrelatedness. The authors define rarity as
the rate at which technology foresighters do not commonly mention the
signal. It represents how unusual or infrequently seen the idea is compared to
others in the same set, i.e., the infrequency of an idea [87]. Paradigm
unrelatedness is the rate at which a signal is unrelated to previous technology
innovations.
Document-level and keyword-level analyses are implemented, as the technique
considers both documents and keywords as signals. The method proposes
using the Local Outlier Factor (LOF) [88] to organize the articles and find
similar clusters. One of the advantages of LOF is that it identifies local
outliers, as it compares the density of a document with that of its
neighborhood. LOF is based on the k-nearest neighbors. In this context, an
outlier is an individual with a lower density than its neighbors. The intuition
is that a new item will be seen as an outlier. LOF is calculated at both levels,
documents and keywords. Then, rarity and paradigm unrelatedness are
assessed with a signal-portfolio map. The map is built using rarity and
paradigm unrelatedness as its two axes (Fig. 9), and the signals, i.e.,
documents or keywords, are located on it.
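The density comparison behind LOF can be sketched in pure Python. This simplified version takes exactly k neighbors per point (the original definition keeps all points tied at the k-th distance) and assumes that no two points coincide; the two-dimensional vectors stand in for rows of a keyword-document matrix and are invented for illustration.

```python
from math import dist


def local_outlier_factors(points, k):
    """Return the LOF score of every point (scores well above 1 suggest outliers)."""
    n = len(points)
    # For each point, its k nearest neighbours as (distance, index) pairs.
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(ranked[:k])
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_distance = [nbrs[-1][0] for nbrs in neighbours]

    def local_reachability_density(i):
        # Reachability distance of i with respect to neighbour j:
        # max(k_distance(j), d(i, j)).
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: average ratio between the neighbours' density and the point's own.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical feature vectors: four clustered items and one isolated item.
vectors = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = local_outlier_factors(vectors, k=2)
print(scores)
```

Items inside the cluster obtain a score close to 1, while the isolated item obtains a much larger score because its neighbors are far denser than it is. The sketch compares every pair of points, so its cost grows quadratically with the number of items.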
This model presents a new approach for the detection of weak signals. It
uses two distinct datasets to cross-check the importance and impact of the
themes in both. One of the main advantages of this work is that it can be
completely automatic: all the steps, from the data collection to the
signal-portfolio map, can be done without human intervention. The keyword
extraction takes care of defining the available themes. The most important
part of the method is the application of LOF to assess the rarity and paradigm
unrelatedness of the themes. On the one hand, the definition proposed by Kim
and Lee adds information to the standard definition; on the other hand, the
LOF detection method reveals the outliers, which can be weak signals.
This work relies on the complementarity of the two chosen datasets. One is
used to assess the technology (online data: posts from weblogs, news,
databases, discussion forums, wikis, ...) and the other to evaluate the
futuristic potential of these technologies (patent database, USPTO). These
datasets must be carefully chosen, as the whole method relies on their
characteristics. The first step of LOF consists of determining the number K of
neighbors. Choosing the value of K is not straightforward and directly
influences the final result. In fact, the best value of K may vary with the size
of the collection or the quality of the considered documents. The method does
not create a forecasting model per se; it relies on the entire input datasets to
produce a result. It needs to keep all the analyzed data in memory, which may
be a problem for large datasets.
5.1.4 Topic Modelling
[81] and [44] have investigated the use of topic modeling to detect weak
signals, as shown in Fig. 10. [44] question some of the hypotheses behind
Yoon’s method, which is why they combine Latent Dirichlet Allocation (LDA)
with Yoon’s categorization.
[81] propose a methodology for weak signal detection using LDA and verify
its feasibility. For the authors, a weak signal is characterized by a small
number of words per document, present in a few documents, meaning rarity and
abnormality. Words that are closely connected within a given theme are
unrelated to other themes but appear in similar contexts. Note that the
authors do not consider the temporal aspect of documents. Their approach is
based on a multi-level clustering of the topics and on extracting significant
Fig. 9 Signal-portfolio map based on the rarity and paradigm unrelatedness proposed by
[64]
Fig. 10 Taxonomy of the works on topic modelling methods
descriptors (weighted lists of words). In the study, [81] combine Latent
Dirichlet Allocation and Word2Vec pre-trained on the French Wikipedia corpus.
The efficient use of LDA requires a correct setting of the number of topics
(K), and LDA does not capture links between topics. If K is low, the
categorization is too broad and lacks the granularity needed to find weak
signals. If K is high, the words may be over-categorized, leading to a loss
of information due to excessive precision. The authors vary the number of
clusters K to obtain a set of partitions linked together in the form of a tree
structure. The method proposes two optimization procedures, one for finding
the optimal K for LDA and one for getting the optimal clusters from the
different cluster levels. The technique relies on two indicators:
Remark 1 Intra-cluster criteria measure the similarity within a cluster
(similarity criterion) based on Word2Vec. The higher the similarity indicator
(4), the more similar the words within the cluster.
\[ I_1 = \sum_{w_i, w_j \in E} \mathrm{w2vSim}(w_i, w_j) \tag{4} \]
where $I_1$ represents the sum of the similarity values of all combinations
of word pairs in each cluster, and w2vSim is the similarity measure defined in
Word2Vec, i.e., the cosine similarity between two vectors.
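As an illustration of indicator (4), the following sketch sums the cosine similarity over all word pairs of a cluster. The three-word vocabulary and its two-dimensional vectors are hypothetical stand-ins for Word2Vec embeddings, which in [81] come from a model pre-trained on the French Wikipedia corpus.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def intra_cluster_similarity(cluster_words, vectors):
    """Sum of the pairwise cosine similarities of the words of one cluster."""
    return sum(
        cosine(vectors[a], vectors[b]) for a, b in combinations(cluster_words, 2)
    )


# Hypothetical embeddings: "drone" and "uav" point the same way, "tax" does not.
toy_vectors = {"drone": (1.0, 0.0), "uav": (2.0, 0.0), "tax": (0.0, 1.0)}

coherent = intra_cluster_similarity(["drone", "uav"], toy_vectors)
mixed = intra_cluster_similarity(["drone", "uav", "tax"], toy_vectors)
print(coherent, mixed)
```

Because the indicator is a sum and not a mean, it grows with the number of word pairs, so it is best compared between clusters of similar size.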
Remark 2 Inter-cluster criteria evaluate the similarity between clusters
(consistency criterion). The likeness indicator, the Bhattacharyya distance
[89], identifies the most pertinent clusters, those more likely to represent
a weak signal.
The evaluation shows that the method is robust compared to each single LDA
with K = 2, ..., 8. The work demonstrates the value of using LDA and Word2Vec
jointly.
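The Bhattacharyya distance used as the inter-cluster criterion can be sketched as follows. The sketch assumes that each cluster is summarized by a discrete probability distribution over a shared vocabulary; this representation and the toy distributions are assumptions, as the survey does not detail how [81] apply the distance.

```python
import math


def bhattacharyya_distance(p, q):
    """Distance between two discrete distributions defined over the same support."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # the two distributions do not overlap at all
    return -math.log(coefficient)


# Hypothetical word distributions of three clusters over a four-word vocabulary.
cluster_a = [0.70, 0.20, 0.05, 0.05]
cluster_b = [0.65, 0.25, 0.05, 0.05]
cluster_c = [0.05, 0.05, 0.20, 0.70]

print(bhattacharyya_distance(cluster_a, cluster_b))  # small: similar clusters
print(bhattacharyya_distance(cluster_a, cluster_c))  # larger: dissimilar clusters
```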
The primary interest in using LDA is inferring the themes from a corpus of
documents. This model has the advantage of proposing a method for detecting
weak signals based on multi-level tree clustering, unlike other works that
are essentially based on the portfolio map method. Neither [81] nor [44]
considers the temporal aspect; it would be interesting to study how a cluster
evolves over a period of time and across different time cycles (e.g., daily,
weekly, monthly).
5.2 Graph Theory
Fig. 11 Taxonomy for the works on graph theory methods
In several recent studies, weak signals have been detected using graph
theory. These papers ([30], [31] and [63]) explore how graph theory can
support weak signal detection.
5.2.1 Formal Concept Analysis
[30] and [31] present an approach for early warning in predicting organized
crime. In concrete terms, this technique supports decision-makers in fighting
against current and emergent organized crime threats. This research is part
of the early Pursuit against Organized crime using environmental scanning,
the Law and IntelligenCE systems Project (ePOOLICE). [31] illustrate their
approach using human trafficking and modern slavery as an example.
The authors question the need for assessing and predicting the potential for
future threats with weak signals. They allude to “the presence and/or
emergence of criminality in citizen-generated content” [90]. For them, a weak
signal is intuitively perceived as “a creative inference from a piece of data
that is assumed to suggest a link, a potentially meaningful relation between
emerging ideas” [18]. Following the definitions of the Canadian Criminal
Intelligence Service (CISC), the authors also describe weak signals as
“little tangible value (..) isolated indicators” with “potential for them to
be symptomatic of a variety of phenomena”. In this way, the authors consider
the influence of parameters such as temporal or geographic proximity to a
specific location and the type of activity. [30] and [31] use a similar data
pipeline. The ePOOLICE project includes environmental crawling, via the
open-source scanning application, of social media and of structured and
unstructured OSINT repositories.
In the use case of [30] and [31], experts know what they seek: they establish
a taxonomy with the PESTLE environmental scanning method [91, 92] and the
United Nations Office on Drugs and Crime (UNODC) model
(www.unodc.org/unodc/fr/index.html). For the analysis step, both works use
Formal Concept Analysis (FCA) to organize information and search for weak
signals.
FCA is a mathematical method for deriving relationships between
elements [93]. The FCA approach has a number of attractive features:
discernment of hierarchy between objects, relationship identification and
classification. This structured approach makes it possible to visualize the
concept hierarchies in real time and to find relations and patterns between
objects.
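To make the notion of a formal concept concrete, the sketch below enumerates every (objects, attributes) pair of a toy formal context by brute force. It is an illustrative sketch only and not the tooling used in [30] or [31]; the documents and attributes are hypothetical, and enumerating every attribute subset is only practical for very small contexts.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that hold every attribute in attrs."""
    return frozenset(obj for obj, held in context.items() if attrs <= held)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), size):
            objs = extent(context, frozenset(attrs))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts


# Hypothetical context: which attribute was observed in which document.
toy_context = {
    "doc1": {"cash_payment", "travel"},
    "doc2": {"cash_payment"},
    "doc3": {"travel", "visa"},
}

for objs, attrs in sorted(formal_concepts(toy_context), key=lambda c: -len(c[0])):
    print(sorted(objs), sorted(attrs))
```

Ordering the concepts by the inclusion of their extents gives the concept hierarchy mentioned above; a concept with a very small extent and an unusual intent is the kind of atypical pattern an analyst would inspect.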
It has been suggested by [94] and [95] that FCA makes it possible to identify
atypical behaviors, which is an innovative approach to looking for signals.
Moreover, the case studies proposed by [95] and by [94] demonstrate an
additional advantage of FCA: identifying atypical behaviors dynamically.
Even if FCA presents a series of advantages, it also has a critical
limitation: the human intervention required in under-understanding or
over-understanding situations. The first is caused by missing or insufficient
knowledge, resulting in false positives. The second is the result of
over-knowledge of the field and of the parameters’ influence, which may
prevent new trends from emerging. Some elements cannot hold certain
properties, and this increases noise. Where human guidance is needed, the
person who guides the process must have the domain knowledge required to
determine the objects and their attributes. Another limitation of this method
is related to the choice of features and properties. These influence the
interpretation, and a wrong set of properties may lead to a false
interpretation. FCA is also computationally intensive, as it presents a
polynomial computational complexity [96].
5.2.2 Minimum Spanning Tree
[63] use a Minimum Spanning Tree (MST) to detect weak signals in the product
and service information of global listed companies. They collect data from
the ORBIS database (https://orbis.bvdinfo.com). The authors compute the
co-occurrence matrix between phrases (keywords) that represent products and
services. The analysis step is based on keyword network analysis and MST
graph analysis. Betweenness centrality computed over the MST shows the
importance of a node’s mediating role in the whole network. As a consequence,
a weak signal appears as the smallest node. The MST allows visualizing the
edges that connect all the nodes of the network with the minimum total
weight. Applying betweenness centrality to the MST is justified, as it shows
the influence of a node on the network or the diffusion of the event. The
proposed methodology is advantageous, as detecting weak signals is less
time-consuming and less complex than with other methods.
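The combination of an MST and betweenness centrality can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions and not the pipeline of [63]: the keyword graph and its edge weights are hypothetical, the weights are read as distances (for instance, inverse co-occurrence counts), and the betweenness is the unnormalized number of node pairs whose tree path passes through a node.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm; edges is an iterable of (weight, u, v) tuples."""
    parent = {node: node for node in nodes}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


def tree_betweenness(nodes, tree):
    """Number of node pairs whose unique tree path passes through each node."""
    adjacency = {node: [] for node in nodes}
    for u, v, _ in tree:
        adjacency[u].append(v)
        adjacency[v].append(u)

    def branch_size(start, removed):
        # Size of the component containing start once `removed` is deleted.
        seen, stack, size = {start, removed}, [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        return size

    scores = {}
    for node in nodes:
        sizes = [branch_size(neighbour, node) for neighbour in adjacency[node]]
        total = sum(sizes)
        scores[node] = (total * total - sum(s * s for s in sizes)) // 2
    return scores


# Hypothetical keyword network; weights are distances between keywords.
keywords = ["battery", "charger", "sensor", "drone"]
distances = [
    (1, "battery", "charger"),
    (2, "charger", "sensor"),
    (3, "battery", "sensor"),
    (1, "sensor", "drone"),
    (4, "charger", "drone"),
]

mst = minimum_spanning_tree(keywords, distances)
print(mst)
print(tree_betweenness(keywords, mst))
```

In this toy network the leaves of the tree obtain a betweenness of 0, which corresponds to the small nodes that the method flags as candidate weak signals.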
Their approach combines different techniques to extract the outliers so that
each technique’s strengths are used to identify weak signals. More precisely,
by calculating the inverse values, the keywords with low frequency are
highlighted and become candidates for weak signals. The study also
differentiates node sizes and expresses them in the MST visualization.
Together, betweenness centrality and the MST provide an attractive
visualization tool. The study provides excellent results in its field, but it
would be interesting to use other business information, such as patent
information or information on the products and services of unlisted
companies.
5.3 Machine Learning approach
This section presents Bayesian network-based, clustering-based and gradient
descent-based approaches, as shown in Fig. 12.
5.3.1 Bayesian networks-based approach
[34] present a system called NEST (New and Emerging Signals of Trends),
which uses clustering, pattern recognition, and cross-impact analysis based
on a Bayesian network. The goal is to detect weak signals that provide
supplementary information for future technology researchers. The system is
based on Hiltunen’s concept [20], where the strength of visibility and the
significance of the impact of a particular topic increase over time.
Fig. 12 Works on machine learning approach
The earliest events of a given topic that later becomes stronger are its weak
signals. [34] compute the visibility of a signal with a conceptual evaluation
index and cross-impact estimation. The diffusion of an event is computed
through a Knowledge Matrix. The proposed method uses a Bayesian network to
quantify the interpretation of the results. Each document is considered a
signal in the approach, and it is subdivided into a set of events. Detecting
a weak signal means investigating the events of a document cluster and
detecting the emerging trends. An emerging trend is characterized by a rise
in the frequency of occurrence, the strength of visibility, and the
significance of a specific topic’s impact. In other words, in this
network-based trend detection model, related signals are connected. By
tracing back the sub-network representing the future trends, weak signals can
be found.
NEST has four different goals:
1. creating a systematic process for the identification of weak signals
2. building a reference system supporting futuristic studies (e.g. technology
researchers and decision-makers)
3. dealing with unstructured data (for instance, mass media, news, confer-
ences, workshops, academic papers, the Internet) and
4. employing quantitative methods as well as qualitative methods
The approach comprises four steps: Global Monitoring, NEST-clipping,
NEST-signal detection and Trend detection. Unlike other proposals, NEST data
collection is based on a group of experts called Global Trend Briefing
(GTB [97]). The group of experts proposes documents and checks the
correctness, timeliness and non-duplication of the information. This process
guarantees a certain quality and coherence of the evaluated signals.
NEST-clipping consists of finding similar information by building a Knowledge
Matrix of co-words and clustering them. Each cluster is considered as one
topic and is named manually. Then, the NEST-signal detection phase consists
of detecting and analysing the topics, trying to identify them with a weak
signal tracking board. The number of generated maps is linked to the number
of clusters, which means the method gets more complex as the number of
clusters and their size grow.
One of the main strengths of NEST is the mechanism put in place to guarantee
the quality of the created dataset. Standard procedures are put in place to
collect data from different expert sources, reduce noise and ensure the
non-duplication of collected information. However, the collection of
documents relies on the referral of experts, which is a potential source of
bias: the system can only observe what the experts consider potentially
relevant. Specialists often overlook weak signals [56], and if the collection
of topics is not broad enough, the system may be blind to some critical
future changes.
[78] and [98] propose two different ways to detect weak signals: a weighted
average model and Bayesian belief networks. The weighted average requires
predetermined weights assigned by an expert, so that each parameter has its
own degree of importance. This method is interesting, but the expert’s
intervention could lead to a distorted assessment. The authors’ objective is
to determine, from posts on web forums, the interest of different users
(aliases) in the same topic. However, the way the authors propose to apply
Bayesian networks would imply the creation of one network for each alias,
which could be hard to implement and handle for large forums.
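The sensitivity of a weighted average model to the expert-provided weights can be shown with a short sketch. The indicator names, values and weights below are hypothetical and are not taken from [78] or [98].

```python
def weighted_average(indicators, weights):
    """Combine indicator values using expert-provided weights."""
    total_weight = sum(weights[name] for name in indicators)
    return sum(weights[name] * value for name, value in indicators.items()) / total_weight


def rank(aliases, weights):
    """Aliases ordered from the highest to the lowest weighted score."""
    return sorted(
        aliases, key=lambda alias: weighted_average(aliases[alias], weights), reverse=True
    )


# Hypothetical indicators in [0, 1] for two aliases.
aliases = {
    "alias_a": {"activity": 0.9, "radical_content": 0.2},
    "alias_b": {"activity": 0.3, "radical_content": 0.7},
}

# Two hypothetical experts who disagree on what matters most.
expert_1 = {"activity": 3.0, "radical_content": 1.0}
expert_2 = {"activity": 1.0, "radical_content": 3.0}

print(rank(aliases, expert_1))
print(rank(aliases, expert_2))
```

The two experts produce opposite rankings from the same observations, which is the distortion risk discussed above.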
5.3.2 Gradient descent-based approach
[80] apply machine learning techniques to identify weak signals. Gradient
descent is a machine learning technique where a convex function is used to
verify the progress of a method as the parameters are iteratively changed.
The lower the value, the better the result, as we want to minimize the
objective function. This idea is used in [80] to find weak signals through
supervised learning. For the authors, the detection of weak signals usually
relies largely on tacit human knowledge. Their study focuses on weak signals
that are “the information that a human reader is able to extract from a piece
of news which is no more than a hint that a certain event is going to
happen”. The dataset is composed of 40,000 scientific articles published in
the last 50 years. For the data analysis step, a team of people annotated
each piece of news with a binary label, and they selected the documents on
which the whole team agreed. The second round of annotation restricts the
scope to paragraphs (containing 100 to 250 words). In this phase, they
evaluate the presence of weak signals in the corpus for the short term
(referring to paragraphs) and the long term (referring to full documents).
Then, they report the evolution of the average number of documents on which
there was a strong disagreement.
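The principle of gradient descent on a convex objective can be sketched with a small logistic regression trained on binary annotations. The model choice, the two features per document and the labels are assumptions made for illustration; the survey does not specify the model or the features used in [80].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def log_loss(weights, bias, X, y):
    """Mean logistic loss: the convex objective being minimized."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(weights, bias, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def train(X, y, learning_rate=0.5, epochs=200):
    """Batch gradient descent; returns the parameters and the loss per epoch."""
    weights, bias = [0.0] * len(X[0]), 0.0
    history = []
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            error = predict(weights, bias, xi) - yi
            for j, value in enumerate(xi):
                grad_w[j] += error * value
            grad_b += error
        # Move the parameters against the averaged gradient.
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
        history.append(log_loss(weights, bias, X, y))
    return weights, bias, history


# Hypothetical features per document and binary "weak signal" annotations.
features = [(0.0, 0.2), (0.1, 0.9), (0.9, 0.1), (1.0, 0.8)]
labels = [0, 0, 1, 1]

weights, bias, history = train(features, labels)
print(history[0], history[-1])
```

The recorded loss decreases from one epoch to the next, which is the progress check described in the paragraph above.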
As mentioned by the authors, the use of TF-IDF alone provides negligible
accuracy. According to them, searching for a weak signal with TF-IDF alone is
not sufficient. To overcome this problem, they use several approaches: QDA
(scikit library²), SVM (Weka library³) and gradient descent. Their study
highlights that machine learning methods can be used efficiently for tasks
where tacit human knowledge plays an important role, provided that annotated
datasets can be created.
5.3.3 Clustering-based approach
[77] proposes a clustering-based method with two objectives: first, detecting
crime patterns and, second, improving the productivity of detectives and
other law enforcement officers. The main challenge faced in crime analysis is
treating fields such as the modus operandi, a free-text field. Finding ways
to structure these fields so that regular methods can treat them is not an
easy task. The proposed method consists of four steps: (1) collecting data
(public information from sheriff’s office sites), (2) defining the clusters,
(3) clustering the collected information and (4) identifying crime patterns.
In the presented work, a cluster is a group of crimes in a geographical
region.
The clustering method explored is k-means. Attributes are dynamically
weighted and clustered according to the different types of crimes. The
clusters can then be represented as geospatial plots on a map (describing the
hotspots of crime), and the detective can analyze the crime patterns. K-means,
together with the scheme for weighting the significant attributes, supports
decision-making. However, k-means requires setting the number K of clusters
in advance, and therefore knowing approximately the input data and the number
of classes available. Also, the initial K cluster seeds may profoundly affect
the distribution of the classes.
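The influence of the initial seeds can be reproduced with a minimal k-means sketch. The four coordinates below are hypothetical crime locations, and the code is a plain implementation of Lloyd's iterations, not the weighted scheme of [77].

```python
import math


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, seeds, max_iter=100):
    """Lloyd's algorithm started from the given seeds."""
    centroids = [tuple(float(x) for x in s) for s in seeds]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # converged: the centroids no longer move
            break
        centroids = updated
    return centroids, assign(points, centroids)


def inertia(centroids, clusters):
    """Sum of squared distances between the points and their centroid."""
    return sum(
        math.dist(p, c) ** 2 for c, cluster in zip(centroids, clusters) for p in cluster
    )


# Hypothetical crime locations: two hotspots, four units apart.
locations = [(0, 0), (0, 1), (4, 0), (4, 1)]

good = kmeans(locations, seeds=[(0, 0), (4, 0)])
poor = kmeans(locations, seeds=[(0, 0), (0, 1)])
print(good[0], inertia(*good))
print(poor[0], inertia(*poor))
```

With one seed in each hotspot the algorithm recovers the two hotspots, while two seeds taken from the same hotspot converge to a stable but much worse partition that mixes both hotspots.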
5.4 Semantic approaches
Previous attempts to scan weak signals from quantitative data focus on
earliness but neglect the evolving nature of signals. [33], [78] and [98]
propose to take semantics into account, as shown in Fig. 13. [33] consider
the semantics of signals during their evolution through time.
In their work, the authors use Latent Semantic Indexing (LSI) [99], based on
Singular Value Decomposition (SVD), to create clusters of documents. Then
they investigate these clusters and measure their evolution at successive
points in time. The authors focus on the development of the discourse, as
different people use different writing styles to describe the same event. The
objective is to understand the evolution of signals through the evolution of
the semantic textual patterns of the identified clusters. With this, they
intend to evaluate whether a signal will stay weak or may become strong. For
the data collection, [33] perform environmental scanning by crawling
different websites in the German language at different points in time. LSI
considers term dependencies to identify semantic generalizations. Note that
the number of clusters is found empirically (k = 25).
² https://scikit-learn.org/0.16/modules/generated/sklearn.qda.QDA.html
³ https://waikato.github.io/weka-wiki/use_weka_in_your_java_code/
Fig. 13 Works on semantic-based approaches
By considering the semantics of the words, [33] address the problems of
polysemy and homonymy that affect traditional lexical matching. Even though
it is computationally intensive, SVD enables the calculation of the most
significant number of relevant weak signals. Using topic modeling techniques
to detect clusters of documents is interesting; however, the authors do not
present any study of the optimal number of clusters for a given topic. The
value k = 25 is empirical. If k is too small, the clusters will be too broad,
and if k is too large, groups will capture what could be considered noise at
first. The detection of weak signals would require the overlapping of
different clusters.
[78] and [98] propose a method to collect information about people who are on
the verge of preparing and committing lone violent acts (i.e., lone wolves).
Their approach involves breaking down this complex problem into sub-problems
and fusing the results into a global solution. They treat terrorist potential
as a weak signal and establish relations between aliases (usernames employed
on websites) to find authors that use different aliases. The challenge of the
work is determining the degree to which an individual is, or will become, a
lone wolf attacker. Their definition of a weak signal is related to the
context and the field of knowledge: a weak signal is a subtle clue (an alias)
that has a significant impact. In this study, the authors propose a semantic
analysis (WordNet⁴) to analyze individuals’ speech. However, this method does
not consider the tone of a sentence (irony, jokes, etc.), which may lead to
false positives.
The authors gather data using a crawler and hyperlink analysis. Starting from
the URLs of well-known terrorist websites, it is possible to follow the
hyperlinks of each source page to find more web pages [100]. The result is an
extensive network graph, where the nodes are the web pages and the edges are
the hyperlinks found. To extract relevant websites, the authors suggest using
topic-filtered web harvesting. All aliases are extracted, and a model is
created for each alias from this organized text. In this proposal, each alias
receives
⁴ https://wordnet.princeton.edu/
a score based on the presence of hateful or violent comments [101], which
affects the analysis. An alias model is composed of all relevant information
related to the alias. For the analysis, using these indicators requires the
classification of radical and non-radical content, semantic relations
(WordNet lexical database) and the identification of similarities (Linguistic
Inquiry and Word Count, LIWC⁵). [78] assume that when an author posts on a
discussion board, this reveals the author’s interest in the topic, which can
be used to identify aliases. The evaluation of the alias is close to the
event’s diffusion (i.e., the issue axis of Hiltunen’s concept). This task
could make it possible to quantify the importance of an alias or the
diffusion of its actions on the network. Two methods are proposed: a weighted
average model and a Bayesian belief network. The weighted average model
requires experts to provide the weights. There is a risk of adding a
cognitive bias, as the weights determine the importance of the parameters.
The Bayesian network is immune to this bias, as it may be created without the
expert’s intervention, but the complexity is higher, as a new network is
required for each alias.
\[ I(x) = f\left(\{J(w) : w \in W \wedge A(x, w) = 1\},\; C(x)\right) \tag{5} \]
where I in (5) is the result of the interestingness function for each alias
x, J is the interestingness function for a web site w, C(x) is a content
analysis function, and A(x, w) is equal to 1 if x is active on w. The content
of posts may reveal which alias is behind a specific post, compared to other
aliases. A matching step is then performed on the ranking to identify authors
who use several aliases. Different methods are proposed for this matching,
such as the Jaro-Winkler metric [102] and the Levenshtein distance, leading
to a list of potential extremists [98]. The work does not present results to
evaluate the efficiency of the proposed methods. However, the concepts and
ideas are interesting, particularly the use of techniques such as
topic-filtered web harvesting and content analysis based on natural language
processing. Moreover, the recognition of a document’s author through content
analysis is another interesting contribution.
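Alias matching based on string similarity can be sketched with the Levenshtein edit distance. The aliases and the distance threshold below are hypothetical; the survey does not report the thresholds or the exact procedure used in [98].

```python
from itertools import combinations


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similar_aliases(aliases, max_distance=2):
    """Pairs of aliases whose lower-cased forms differ by few edits."""
    return [
        (a, b)
        for a, b in combinations(sorted(aliases), 2)
        if levenshtein(a.lower(), b.lower()) <= max_distance
    ]


# Hypothetical aliases collected from two forums.
observed = ["nightowl", "night_owl", "bluefox"]
print(similar_aliases(observed))
```

String similarity only catches aliases that look alike; linking aliases that share an author but not a spelling requires the content analysis described above.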
5.5 Expert-based approach
[79] focus on organizing the information and on identifying weak signals to
enable efficient innovation for companies. For the authors, a weak signal is
a piece of information that does not appear to be of immediate interest at
first sight. However, in a given context, it may represent a fundamental
piece of information [36]. To find such signals, past and present need to be
evaluated [103]. They propose the following method: (1) fixing a subject of
research, (2) defining a source of signs, (3) gathering data, (4) classifying
with keywords or areas of activity and interpreting the results, and
(5) identifying relevant opportunities and selecting the relevant topics.
Fixing a research subject means mapping topics with X-Mind. The
identification step is based on the principle of scoring (Eq. (6)), which
combines the likelihood of occurrence of the theme in the company
⁵ https://liwc.wpengine.com/
Fig. 14 Evaluation of occurrence probability
Fig. 15 Evaluation of the impact
noted P in Fig. 14, and the impact on the business, noted I in Fig. 15. P is
assessed on a likelihood scale and I on an impact scale.
The X-Mind maps are then ranked by experts, who assign a priority: “not
urgent”, “to treat secondarily”, or “to treat as a priority”. Then, analysts
can plan their actions according to the score of importance. The idea of
assigning a priority level is interesting but relies on the expertise and
evaluation of a group of experts. The result is mainly based on the experts’
intuition, experience and commitment, and the work may be long and tedious.
We can therefore expect the final result to be affected by cognitive bias and
by a loss of performance from the experts.
The score of importance is:
\[ \mathrm{Score} = I \times P \tag{6} \]
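The scoring of Eq. (6) can be sketched as follows. The themes, the 1 to 4 scales for I and P and the two thresholds that map a score to a priority label are assumptions made for illustration; in [79] the priority is assigned by experts and the scales are those of Fig. 14 and Fig. 15.

```python
def importance_score(impact, probability):
    """Eq. (6): the score is the product of the impact and the likelihood."""
    return impact * probability


def priority(score, secondary_from, priority_from):
    """Map a score to one of the three priority labels (thresholds are hypothetical)."""
    if score >= priority_from:
        return "to treat as a priority"
    if score >= secondary_from:
        return "to treat secondarily"
    return "not urgent"


# Hypothetical themes rated as (impact I, likelihood P), both on a 1 to 4 scale.
themes = {
    "new regulation": (4, 3),
    "supplier change": (3, 2),
    "niche competitor": (1, 2),
}

for name, (impact, probability) in themes.items():
    score = importance_score(impact, probability)
    print(name, score, priority(score, secondary_from=4, priority_from=9))
```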
6 Conclusions
This survey reviewed different methods for the detection of early warning
signals. Early warning signals can be used in a relatively large range of
situations, from anomaly detection to criminal detection, including
decision-making and anticipation in the context of companies’ strategic
planning. This survey discusses the difficulties of the field and the basic
principles on which researchers rely. Section 4 presents how different
methods fit a standard data treatment pipeline, and Section 5 a
categorization of the available methods. The proposed taxonomy divides the
reviewed methods into statistics-based, Bayesian networks-based,
semantic-based, gradient descent-based, clustering-based and expert-based.
Research on weak signals is largely influenced by Ansoff’s initial works in
the 1970s and Hiltunen’s concepts in the 2000s. Hiltunen’s works introduce
the importance of a signal and the gradation of a signal from weak to strong.
Some recent studies extended this concept using the portfolio map method.
Conversely, [64] focused on the rarity of a signal and its discordance with
the context. Kim’s work highlights the importance of the complementarity of
data sources by merging two relevant databases. For [29], changes in the
appearance rate of keywords may reveal the evolution of weak signals into
strong ones. Some works try to take temporal aspects into account to track
the evolution of signals [33, 46, 47], although there is still room for work
in this field. Reliable methods for evaluating the probability that a signal
turns from weak to strong are fundamental in forecasting.
The growth in the volume, variety, and quality of available data drives the
development of innovative approaches to dealing with it. Companies,
governmental agencies, and politicians, among others, can get a real-time
perspective on what has the potential to become important through sources
such as social networks, where people can freely express themselves. This
real-time information flow and the possibility of detecting early signals of
change may represent a considerable advantage for different actors. Weak
signal research is an active topic, and we hope to see even more advances in
the future, considering all of the potential interests.
Appendix A Weak signal definitions
Table A1: Weak signal definition
References Quotes
[5] “Early indications about impactful events (...) all that is known (of them) is that some Threats and Opportunities will undoubtedly arise, but their shape and nature and source are not yet known”
[15] “sudden, urgent, unfamiliar changes in the firm’s perspective which threaten either a major profit reversal or loss of a major opportunity”
[27] “Weak signals are first symptoms of strategic discontinuities; they are symptoms of possible change in the future”
[104] “seemingly innocuous ’data’ but with an interpretation that can trigger an alert. This alert indicates that an event may occur that could have a significant impact (in terms of opportunities or threats). After interpreting, the signal is no longer qualified as a weak signal, but as an early warning signal”
[18] “Gross, unstructured, fragmented, incomplete and inadvertent environmental data that may be refined into valuable information regarding context and further be articulated into strategically actionable knowledge” and “premature, imperfect information (...) obfuscated by confounding factors”
[105] “as information on potential change of a system to an unknown direction.”
[106] “the early signs of possible but not confirmed changes that may later become more significant indicators of critical forces for development, (...) the first signs of paradigm shifts, or future trends, drivers or discontinuities”
[78] “the relevant clues”
[19] “An idea or trend that will affect how we do business, what business we do, and the environment in which we will work”
[16] “Hardly perceivable at present but constitutes a strong trend in the future”
[39] “Current oddities, strange issues that are thought to be in key position in anticipating future changes in organizational environments”
[29] “Imprecise and early indicators of impending important events or trends, which are considered key to formulating new potential business items”
[64] “they are anomalies or strange issues that are not compatible with the prevailing sensemaking paradigm (...). novel future signals (documents or keywords), in futuristic data that are not only rare but also have paradigm unrelatedness”
[46] “imprecise and non-mainstream signals with a potential to change in the future”
[43] “These indicators of change can be advanced, somewhat noisy, and generally socially situated trends and systems that constitute raw information to enable anticipatory action. (...) represent an unknown, unexpected, or rare change, which makes them hard to distinguish as relevant.”
References
[1] Koivisto, R., Kulmala, I., Gotcheva, N.: Weak signals and damage sce-
narios — Systematics to identify weak signals and their sources related
to mass transport attacks. Technological Forecasting and Social Change
104, 180–190 (2016). https://doi.org/10.1016/j.techfore.2015.12.010
[2] Lesca, H., Lesca, N.: Strategic Decisions and Weak Signals: Anticipation
for Decision-Making. ISTE Ltd and John Wiley and Sons (2014).
https://doi.org/10.1002/9781118959152
[3] Charitonidis, C., Rashid, A., Taylor, P.J.: Weak Signals as Predictors
of Real-World Phenomena in Social Media. In: Proceedings of the 2015
IEEE/ACM International Conference on Advances in Social Networks Analysis
and Mining 2015, pp. 864–871. ACM, Paris, France (2015).
https://doi.org/10.1145/2808797.2809332. Accessed 2021-09-19
[4] Grissa, D., Pétéra, M., Brandolini, M., Napoli, A., Comte, B.,
Pujos-Guillot, E.: Feature Selection Methods for Early Predictive Biomarker
Discovery Using Untargeted Metabolomic Data. Frontiers in Molecular
Biosciences 3 (2016). https://doi.org/10.3389/fmolb.2016.00030.
Accessed 2021-10-12
[5] Ansoff, H.I.: Implanting Strategic Management, p. 510.
Prentice/Hall International, Englewood Cliffs, NJ (1984)
[6] Decker, R., Wagner, R., Scholz, S.W.: An internet-based approach
to environmental scanning in marketing planning. Marketing Intel-
ligence & Planning 23(2), 189–199 (2004). https://doi.org/10.1108/
02634500510589930. Accessed 2021-02-22
[7] Peirce, C.S.: Some consequences of four incapacities. Journal of
Speculative Philosophy 2(3), 140–157 (1868)
[8] Hiltunen, E.: Weak signals in organizational futures learning. Doctoral
thesis, Aalto University. School of Business (2010).
http://urn.fi/URN:ISBN:978-952-60-1022-9
[9] Molitor, G.T.T.: Molitor forecasting model: key dimension for plotting
the patterns of change. Journal of Futures Studies 8(1), 61–72 (2003)
[10] Dator, J.: Universities without “quality” and quality without
“universities”. On the Horizon 13(4), 199–215 (2005).
https://doi.org/10.1108/10748120510627321
[11] Hiltunen, E.: Was it a wild card or just our blindness to gradual change.
Journal of Futures Studies 11, 61–74 (2006)
[12] Nikander, I.O.: A phenomenon in project management, p. 207 (2002)
[13] Scheffer, M., Bascompte, J., Brock, W., Brovkin, V., Carpenter, S.,
Dakos, V., Held, H., Nes, E., Rietkerk, M., Sugihara, G.: Early-warning
signals for critical transitions. Nature 461, 53–59 (2009).
https://doi.org/10.1038/nature08227
[14] Welz, K., Brecht, L., Pengl, A., Kauffeldt, J., Schallmo, D.: Weak signals
detection: Criteria for social media monitoring tools (2018)
[15] Ansoff, H.I.: Managing strategic surprise by response to weak
signals. California Management Review 18(2), 21–33 (1975).
https://doi.org/10.2307/41164635
[16] Godet, M.: From Anticipation to Action: A Handbook of Strategic
Prospective. UNESCO Publishing (1993)
[17] Blanco, S., Lesca, N.: From weak signals to anticipative information:
learning from the implementation of an information selection method
(2003)
[18] Mendonça, S., Cardoso, G., Caraça, J.: The strategic strength of weak
signal analysis. Futures 44 (2012).
https://doi.org/10.1016/j.futures.2011.10.004
[19] Coffman, B.: Weak signal research, part I–V. Journal of Transition
Management 2 (1997)
[20] Kuusi, O., Hiltunen, E.: The signification process of the future sign, vol.
16 (2012)
[21] Moijanen, M.: Heikot signaalit tulevaisuuden tutkimuksessa. Futura 4,
38–60 (2003)
[22] Ilmola, L., Kuusi, O.: Filters of weak signals hinder foresight: Monitoring
weak signals efficiently in corporate decision-making. Futures 38,
908–924 (2006). https://doi.org/10.1016/j.futures.2005.12.019
[23] Kaivo-oja, J.: Weak signals analysis, knowledge management theory and
systemic socio-cultural transitions. Futures 44(3), 206–217 (2012).
https://doi.org/10.1016/j.futures.2011.10.003. Special Issue: Weak Signals
[24] Ponomareva, J.V., Sokolova, A.: The identification of weak signals and
wild cards in foresight methodology: Stages and methods. SSRN Journal
(2015). https://doi.org/10.2139/ssrn.2655520. Accessed 2021-02-22
[25] Kosala, R., Blockeel, H.: Web mining research: A survey. ACM
SIGKDD Explorations Newsletter 2 (2001).
https://doi.org/10.1145/360402.360406
[26] Teo, T., Choo, W.: Assessing the impact of using the internet for
competitive intelligence. Information & Management 39, 67–83 (2001).
https://doi.org/10.1016/S0378-7206(01)00080-5
[27] Holopainen, M., Toivonen, M.: Weak signals: Ansoff today. Futures 44
(2012). https://doi.org/10.1016/j.futures.2011.10.002
[28] Brizon, A.: Compréhension et gestion des signaux faibles dans le domaine
de la santé-sécurité. PhD thesis, Ecole des Mines ParisTech (2009). Thèse
de doctorat dirigée par Wybo, Jean-Luc, Sciences et génie des activités
à risques, Paris, ENMP 2009. http://www.theses.fr/2009ENMP1626
[29] Yoon, J.: Detecting weak signals for long-term business opportunities
using text mining of web news. Expert Systems with Applications 39,
12543–12550 (2012). https://doi.org/10.1016/j.eswa.2012.04.059
[30] Brewster, B., Andrews, S., Polovina, S., Hirsch, L., Akhgar, B.:
Environmental Scanning and Knowledge Representation for the Detection
of Organised Crime Threats. In: Hernandez, N., Jäschke,
R., Croitoru, M. (eds.) Graph-Based Representation and Reasoning,
vol. 8577, pp. 275–280. Springer, Cham (2014).
https://doi.org/10.1007/978-3-319-08389-6_22. Series Title: Lecture Notes in
Computer Science. http://link.springer.com/10.1007/978-3-319-08389-6_22
Accessed 2021-02-27
[31] Andrews, S., Brewster, B., Day, T.: Organised crime and social media:
a system for detecting, corroborating and visualising weak signals of
organised crime online. Secur Inform 7(1), 3 (2018).
https://doi.org/10.1186/s13388-018-0032-8. Accessed 2021-02-22
[32] Tabatabaei, N.: Detecting weak signals by internet-based environmental
scanning. PhD thesis, University of Waterloo (2011)
[33] Thorleuchter, D., Scheja, T., Van den Poel, D.: Semantic weak signal
tracing. Expert Systems with Applications 41, 5009–5016 (2014).
https://doi.org/10.1016/j.eswa.2014.02.046
[34] Kim, S., Kim, Y.-E., Bae, K.-J., Choi, S.-B., Park, J.-K., Koo, Y.-D.,
Park, Y.-W., Choi, H.-K., Kang, H.-M., Hong, S.-W.: NEST: A
quantitative model for detecting emerging trends using a global monitoring
expert network and Bayesian network. Futures 52, 59–73 (2013).
https://doi.org/10.1016/j.futures.2013.08.004. Accessed 2021-02-22
[35] Thorleuchter, D., Van Den Poel, D.: Protecting research and technology
from espionage. Expert Syst. Appl. 40(9), 3432–3440 (2013).
https://doi.org/10.1016/j.eswa.2012.12.051
[36] Barbaud, F., Mousnier, G.: Innovation par la maitrise
de la Data : comment discerner les signaux faibles ? (2015).
https://www.marketing-professionnel.fr/tribune-libre/innovation-maitrise-data-comment-discerner-signaux-faibles-201504.html
Accessed 2021-02-23
[37] Kuosa, T.: Futures signals sense-making framework (FSSF): A start-up
tool to analyse and categorise weak signals, wild cards, drivers, trends
and other types of information. Futures 42(1), 42–48 (2010).
https://doi.org/10.1016/j.futures.2009.08.003. Accessed 2020-07-17
[38] Ansoff, H.I.: Strategic response in turbulent environments. European
Institute for Advanced Studies in Management Brussels (1982)
[39] Hiltunen, E.: The future sign and its three dimensions. Futures 40,
247–260 (2008). https://doi.org/10.1016/j.futures.2007.08.021
[40] Gutsche, T.: Automatic weak signal detection and forecasting. PhD
thesis, University of Twente (2018)
[41] Mühlroth, C., Grottke, M.: A systematic literature review of mining weak
signals and trends for corporate foresight. Journal of Business Economics
88(5), 643–687 (2018). https://doi.org/10.1007/s11573-018-0898-4
[42] Griol-Barres, I., Milla, S., Cebrián, A., Fan, H., Millet, J.: Detecting
Weak Signals of the Future: A System Implementation Based on Text
Mining and Natural Language Processing. Sustainability 12(19), 1–1
(2020)
[43] Yoo, S.H., Won, D.: Simulation of weak signals of nanotechnology
innovation in complex system. Sustainability (Switzerland) 10 (2018).
https://doi.org/10.3390/su10020486
[44] Krigsholm, P., Riekkinen, K.: Applying text mining for identifying future
signals of land administration. Land 8(12) (2019).
https://doi.org/10.3390/land8120181
[45] Lee, Y.-J., Park, J.-Y.: Identification of future signal based on the
quantitative and qualitative text mining: a case study on ethical issues
in artificial intelligence. Quality and Quantity: International Journal
of Methodology 52(2), 653–667 (2018).
https://doi.org/10.1007/s11135-017-0582-8
[46] Kim, H., Han, Y., Song, J., Song, T.M.: Application of Social Big Data
to Identify Trends of School Bullying Forms in South Korea. IJERPH
16(14), 2596 (2019). https://doi.org/10.3390/ijerph16142596. Accessed
2021-02-22
[47] Roh, S., Choi, J.: Exploring signals for a nuclear future using social
big data. Sustainability 12, 5563 (2020).
https://doi.org/10.3390/su12145563
[48] Kuosa, T.: Heikko signaali vai merkityksetön kohina : Pattern
management - ontologisesti uusi lähestymistapa heikkojen signaalien
tarkasteluun ja tulkintaan (2005)
[49] Breen, B.J., Rix, J.G., Ross, S.J., Yu, Y., Lindner, J.F., Mathewson, N.,
Wainwright, E.R., Wilson, I.: Harvesting wind energy to detect weak
signals using mechanical stochastic resonance. Phys. Rev. E 94, 062205
(2016). https://doi.org/10.1103/PhysRevE.94.062205
[50] Dong, H., Wang, H., Shen, X., He, K.: Parameter matched stochastic
resonance with damping for passive sonar detection. Journal of Sound
and Vibration 458, 479–496 (2019).
https://doi.org/10.1016/j.jsv.2019.06.021. Accessed 2021-11-17
[51] Cui, L., Yang, J., Wang, L., Liu, H.: Theory and Application of Weak
Signal Detection Based on Stochastic Resonance Mechanism. Security and
Communication Networks 2021, 1–9 (2021).
https://doi.org/10.1155/2021/5553490. Accessed 2021-11-07
[52] Qiao, Z., Lei, Y., Li, N.: Applications of stochastic resonance to
machinery fault detection: A review and tutorial. Mechanical Systems
and Signal Processing 122, 502–536 (2019).
https://doi.org/10.1016/j.ymssp.2018.12.032. Accessed 2021-11-07
[53] Qiao, Z., Liu, J., Ma, X., Liu, J.: Double stochastic resonance induced
by varying potential-well depth and width. Journal of the Franklin
Institute 358(3), 2194–2211 (2021).
https://doi.org/10.1016/j.jfranklin.2020.12.028. Accessed 2021-09-24
[54] Mevel, O.: Du rôle des signaux faibles sur la reconfiguration des
processus de la chaîne de valeur de l’organisation : l’exemple d’une centrale
d’achats de la grande distribution française. Theses, Université de
Bretagne occidentale - Brest (Dec 2004).
https://tel.archives-ouvertes.fr/tel-00009025
[55] Ansoff, H.I.: Strategic issue management. Strategic
Management Journal 1(2), 131–148 (1980).
https://onlinelibrary.wiley.com/doi/pdf/10.1002/smj.4250010204.
https://doi.org/10.1002/smj.4250010204
[56] Kahneman, D., Tversky, A.: Prospect Theory: An Analysis of Decision
under Risk. Econometrica 47(2), 263 (1979).
https://doi.org/10.2307/1914185. Accessed 2021-02-23
[57] Blain, B.B.: Melting markets: the rise and decline of the
Anglo-Norwegian ice trade, 1850-1920. Economic History Working Papers
22471, London School of Economics and Political Science, Department
of Economic History (February 2006).
https://ideas.repec.org/p/ehl/wpaper/22471.html
[58] Graser, M.: Epic Fail: How Blockbuster Could
Have Owned Netflix (2013).
https://variety.com/2013/biz/news/epic-fail-how-blockbuster-could-have-owned-netflix-1200823443/
Accessed 2021-02-23
[59] In Kodak Bankruptcy, Another Casualty of the Digital
Revolution (2012).
https://business.time.com/2012/01/20/in-kodak-bankruptcy-another-casualty-of-the-digital-revolution/
[60] Sidhom, S., Lambert, P.: Information design for weak signal detection
and processing in economic intelligence: A case study on health resources.
Journal of Intelligence Studies in Business (JISIB) 1(1), 40–48 (2011)
[61] Li, X., Wu, Q., Peng, G., Lv, B.: Tourism forecasting by search engine
data with noise-processing. Afr. J. Bus. Manage. 10(6), 114–130 (2016).
https://doi.org/10.5897/AJBM2015.7945. Accessed 2021-02-23
[62] Sammut, C., Webb, G.I. (eds.): TF–IDF, pp. 986–987. Springer,
Boston, MA (2010). https://doi.org/10.1007/978-0-387-30164-8_832
[63] Kwon, L.-N., Park, J.-H., Moon, Y.-H., Lee, B., Shin, Y., Kim, Y.-K.:
Weak signal detecting of industry convergence using information of
products and services of global listed companies - focusing on growth
engine industry in South Korea. J. open innov. 4(1), 10 (2018).
https://doi.org/10.1186/s40852-018-0083-6
[64] Kim, J., Lee, C.: Novelty-focused weak signal detection in futuristic data:
Assessing the rarity and paradigm unrelatedness of signals. Technological
Forecasting and Social Change 120(C), 59–76 (2017).
https://doi.org/10.1016/j.techfore.2017.04.006
[65] Day, G., Schoemaker, P.: Scanning the periphery. Harvard Business
Review 83, 135–40, 142, 144 (2005)
[66] Hiltunen, E.: Where do Future-Oriented People Find Weak Signals.
Transcultural Futurist Magazine 6(2) (2007)
[67] Choo, C.: Information management for the intelligent organization: The
art of scanning the environment. Inf. Res. 8 (2003)
[68] Park, C., Kim, H.-j.: A study on the development direction of the
new energy industry through the internet of things - searching for
future signals using text mining. Technical report, KOREA ENERGY
ECONOMICS INSTITUTE (2015). www.keei.re.kr
[69] Wirth, R., Hipp, J.: CRISP-DM: Towards a standard process model for
data mining. Proceedings of the 4th International Conference on the
Practical Applications of Knowledge Discovery and Data Mining (2000)
[70] Pelánek, R., Řihák, J., Papoušek, J.: Impact of data collection
on interpretation and evaluation of student models. In: Proceedings
of the Sixth International Conference on Learning Analytics
& Knowledge - LAK ’16, pp. 40–47. ACM Press, Edinburgh,
United Kingdom (2016). https://doi.org/10.1145/2883851.2883868.
http://dl.acm.org/citation.cfm?doid=2883851.2883868 Accessed 2021-04-07
[71] Prokopowicz, D., Gwoździewicz, S.: The big data technologies as an
important factor of electronic data processing and the development of
computerized analytical platforms, business intelligence. International
Journal of Small and Medium Enterprises and Business Sustainability
2, 27–42 (2017)
[72] Park, C., Cho, S.: Future sign detection in smart grids through text
mining. Energy Procedia 128 (2017).
https://doi.org/10.1016/j.egypro.2017.09.018. Accessed 2021-02-24
[73] García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J.M., Herrera, F.:
Big data preprocessing: methods and prospects. Big Data Anal 1(1),
9 (2016). https://doi.org/10.1186/s41044-016-0014-0. Accessed 2021-04-08
[74] Wang, H., Wang, S.: Mining incomplete survey data through
classification. Knowl Inf Syst 24(2), 221–233 (2010).
https://doi.org/10.1007/s10115-009-0245-8. Accessed 2021-04-08
[75] Frenay, B., Verleysen, M.: Classification in the Presence of Label Noise:
A Survey. IEEE Trans. Neural Netw. Learning Syst. 25(5), 845–869
(2014). https://doi.org/10.1109/TNNLS.2013.2292894
[76] Hall, M.: Correlation-based feature selection for machine learning.
Department of Computer Science 19 (2000)
[77] Nath, S.V.: Crime pattern detection using data mining. 2006
IEEE/WIC/ACM International Conference on Web Intelligence and
Intelligent Agent Technology Workshops, 41–44 (2006)
[78] Brynielsson, J., Horndahl, A., Johansson, F., Kaati, L., Martenson, C.,
Svenson, P.: Analysis of Weak Signals for Detecting Lone Wolf
Terrorists. In: 2012 European Intelligence and Security Informatics Conference,
pp. 197–204. IEEE, Odense, Denmark (2012).
https://doi.org/10.1109/EISIC.2012.20.
http://ieeexplore.ieee.org/document/6298831/ Accessed 2021-02-22
[79] Munier, M., Jean, C., Aoussat, A., Peignon, A.: LES SIGNAUX
FAIBLES : DE FORTES SOURCES D’OPPORTUNITES INNOVANTES
A SAVOIR RECHERCHER, 10 (2016)
[80] Irimia, A., Punguta, P., Gheorghiu, R.: Tacit Knowledge - Weak Signal
Detection, p. 4 (2018)
[81] Maitre, J., Menard, M., Chiron, G., Bouju, A.: Détection de signaux
faibles dans des masses de données faiblement structurées. RIDoWS 3(1)
(2019). https://doi.org/10.21494/ISTE.OP.2020.0463. Accessed 2021-02-22
[82] Tseng, Y.-H., Lin, C.-J., Lin, Y.-I.: Text mining techniques for patent
analysis. Information Processing & Management 43(5), 1216–1247
(2007). https://doi.org/10.1016/j.ipm.2006.11.011
[83] Wang, H., Ohsawa, Y., Nishihara, Y.: Innovation support system for
creative product design based on chance discovery. Expert Systems with
Applications 39(5), 4890–4897 (2012).
https://doi.org/10.1016/j.eswa.2011.10.002. Accessed 2021-02-24
[84] Thorleuchter, D., Van den Poel, D.: Idea mining for web-based weak
signal detection. Futures 66, 25–34 (2015).
https://doi.org/10.1016/j.futures.2014.12.007
[85] Jurafsky, D., Martin, J.H.: Speech and Language Processing: An
Introduction to Natural Language Processing, Computational Linguistics, and
Speech Recognition. Pearson Prentice Hall, Upper Saddle River, N.J.
(2009)
[86] Kim, H., Ahn, S.-J., Jung, W.-S.: Horizon scanning in policy research
database with a probabilistic topic model. Technological Forecasting and
Social Change 146, 588–594 (2019).
https://doi.org/10.1016/j.techfore.2018.02.007
[87] Kahai, S.S., Sosik, J.J., Avolio, B.J.: Effects of leadership style,
anonymity, and rewards on creativity-relevant processes and outcomes
in an electronic meeting system context. The Leadership Quarterly
14(4), 499–524 (2003). https://doi.org/10.1016/S1048-9843(03)00049-3.
Accessed 2021-02-25
[88] Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: Identifying
density-based local outliers. In: Proceedings of the 2000
ACM SIGMOD International Conference on Management of Data.
SIGMOD ’00, pp. 93–104. Association for Computing Machinery,
New York, NY, USA (2000). https://doi.org/10.1145/342009.335388
[89] Bhattacharyya, A.: On a measure of divergence between two statistical
populations defined by their probability distributions. Bulletin of the
Calcutta Mathematical Society 35, 99–109 (1943)
[90] Bureau, C., Ottawa: Strategic early warning for criminal intelligence:
theoretical framework and sentinel methodology (2007)
[91] Rastogi, N., Trivedi, D.M.K.: PESTLE TECHNIQUE – A TOOL TO
IDENTIFY EXTERNAL RISKS IN CONSTRUCTION PROJECTS.
International Research Journal of Engineering and Technology (IRJET)
03(01), 5 (2016)
[92] Yuksel, I.: Developing a multi-criteria decision making model for PESTEL
analysis. International Journal of Business and Management 7 (2012).
https://doi.org/10.5539/ijbm.v7n24p52
[93] Saidi, O.B.B., Tebourski, W.: Formal concept analysis based association
rules extraction. CoRR abs/1209.3943 (2012) arXiv:1209.3943
[94] Menguy, T.: Utilisation d’analyse de concepts formels pour la gestion
de variabilité d’un logiciel configuré dynamiquement. PhD thesis,
École Polytechnique de Montréal (June 2014)
[95] Messai, N.: Formal Concept Analysis guided by Domain Knowledge:
Application to genomic resources discovery on the Web. Theses,
Université Henri Poincaré - Nancy I (March 2009).
https://tel.archives-ouvertes.fr/tel-00446548
[96] De Alburquerque Melo, C.: Real-time distributed computation of formal
concepts and analytics. Theses, Ecole Centrale Paris (July 2013).
https://tel.archives-ouvertes.fr/tel-00966184
[97] KISTI, G.T.B. GTB: (2010). http://radar.ndsl.kr/
[98] Brynielsson, J., Horndahl, A., Johansson, F., Kaati, L., Mårtenson, C.,
Svenson, P.: Harvesting and analysis of weak signals for detecting lone
wolf terrorists. Security Informatics 2, 11 (2013).
https://doi.org/10.1186/2190-8532-2-11
[99] Papadimitriou, C.H., Raghavan, P., Tamaki, H., Vempala, S.: Latent
Semantic Indexing: A Probabilistic Analysis. Journal of Computer and
System Sciences 61(2), 217–235 (2000).
https://doi.org/10.1006/jcss.2000.1711
[100] Henzinger, M.R.: Hyperlink analysis for the Web. IEEE Internet
Comput. 5(1), 45–50 (2001). https://doi.org/10.1109/4236.895141. Accessed
2021-02-25
[101] Abbasi, A., Chen, H.: Affect intensity analysis of dark web forums. In:
2007 IEEE Intelligence and Security Informatics, pp. 282–288 (2007).
https://doi.org/10.1109/ISI.2007.379486
[102] Friendly, F.: Jaro–Winkler distance improvement for approximate string
search using indexing data for multiuser application. Journal of
Physics: Conference Series 1361, 012080 (2019).
https://doi.org/10.1088/1742-6596/1361/1/012080
[103] Cahen, P.: Signaux Faibles, Mode D’emploi: Déceler les Tendances,
Anticiper les Ruptures. Eyrolles-Éd. d’Organisation, Paris (2010)
[104] Humbert, L.: Les Signaux Faibles et la Veille Anticipative Pour les
Décideurs : Méthodes et Applications. Collection Business, économie et
société. Hermès science publ. Lavoisier, Paris (2011)
[105] Mendonça, S., Cunha, M.P.e., Kaivo-oja, J., Ruff, F.: Wild cards, weak
signals and organisational improvisation. Futures 36(2), 201–218 (2004).
https://doi.org/10.1016/S0016-3287(03)00148-4
[106] Saritas, O., Smith, J.E.: The big picture trends, drivers, wild cards,
discontinuities and weak signals. Futures 43(3), 292–312 (2011).
https://doi.org/10.1016/j.futures.2010.11.007. Accessed 2021-03-14
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
Article
Effective risk management can bring greater rewards to project performance by enhancing productivity and reducing the impact of threats. Risk Identification is thus the first step in Risk Management Process. The objective of this article is to identify external risks associated with any construction project by using " PESTLE Technique " thus minimizing their impact on project objectives before they actually occur. PESTLE Technique is a strategic management technique which can be used effectively in external risk identification process of Risk Management Plan. It identifies risks under various subgroups under broad headings of Political , Economical , Social , Technological , Legal , Environmental. Internal Risks are easy to identify as plenty of past data for similar type of projects is available with company. It is the external risks which are the beyond the control of company and about which less data is available that makes construction projects vulnerable to failure or incurring heavy losses in monetary terms. As every stakeholder of project want high returns on the invested money it is necessary to identify the risks associated and to take the appropriate measures to mitigate them , so the project will meet all objectives and give better returns on the money invested. PESTLE Analysis works well in conjunction with SWOT Analysis as it helps in identifying internal risks associated with project. This article discusses PESTLE meaning , its historical background , its various forms , step by step method to identify the risks , its advantages and disadvantages , and finally conclusion .
Full-text available
Article
Stochastic resonance is a new type of weak signal detection method. Compared with traditional noise suppression technology, stochastic resonance uses noise to enhance weak signal information, and there is a mechanism for the transfer of noise energy to signal energy. The purpose of this paper is to study the theory and application of weak signal detection based on stochastic resonance mechanism. This paper studies the stochastic resonance characteristics of the bistable circuit and conducts an experimental simulation of its circuit in the Multisim simulation environment. It is verified that the bistable circuit can achieve the stochastic resonance function very well, and it provides strong support for the actual production of the bistable circuit. This paper studies the stochastic resonance phenomenon of FHN neuron model and bistable model, analyzes the response of periodic signals and nonperiodic signals, verifies the effect of noise on stochastic resonance, and lays the foundation for subsequent experiments. It proposes to feedback the link and introduces a two-layer FHN neural network model to improve the weak signal detection performance under a variable noise background. The paper also proposes a multifault detection method based on the total empirical mode decomposition of sensitive intrinsic mode components with variable scale adaptive stochastic resonance. Using the weighted kurtosis index as the measurement index of the system output can not only maintain the similarity between the system output signal and the original signal but also be sensitive to impact characteristics, overcoming the missed or false detection of the traditional kurtosis index. Experimental research shows that this method has better noise suppression ability and a clear reproduction effect on details. 
Especially for images contaminated by strong noise (D = 500), compared with traditional restoration methods, it has better performance in subjective visual effects and signal-to-noise ratio evaluation. 1. Introduction The principle of stochastic resonance applied to weak signal enhancement detection has practical application value, and it is also a new technology, which can realize the state monitoring and early fault diagnosis of electromechanical equipment, and has important economic significance for ensuring the reliable operation of electromechanical equipment and preventing malfunction. However, the working environment of electromechanical equipment is very harsh. The early weak faults are accompanied by complex mechanical interference and environmental noise, making early fault detection difficult. Therefore, it can be said that the monitoring of the operation of mechanical equipment has always been the current fault detection, being one of the hotspots and difficulties. The application of adaptive stochastic resonance principle for weak signal enhancement detection is a new technology with practical application value, and many foreign scholars have studied it. For example, Zhi-Hui adopts adaptive stochastic resonance technology under short data set conditions. The weak sine signal in strong noise is extracted, and it is also proposed that this method can also detect chirp, pulse amplitude modulation, frequency shift keying, and pulse width modulation signals under white noise [1]. Hongyan proposed the concept of adaptive stochastic resonance. Its essence is to adopt the method and theory of noise adaptive adjustment of stochastic resonance. Its goal is to solve the best noise level and which signal and which type of noise produce the best. “Stochastic resonance” problem, noise adjustment method, and theory of adaptive stochastic resonance have greatly promoted the development of stochastic resonance theory and application [2]. 
Arathi and Rajasekar analyzed the stochastic resonance phenomenon of a symmetric three-well system at different potential well depths. Huiqing Zhang et al. studied a three-potential well logic SR system in a non-Gaussian noise environment. Without changing the system characteristics, the system can generate all logic states, and whether it is multiplicative noise or additive noise, it will affect the potential. Function shape has a big impact [2]. Compared with the traditional method, the difference is that stochastic resonance uses noise to enhance useful signals in a nonlinear system and then realizes weak signal enhancement detection, so this detection method can well retain the details of the signal, and there is no damage to the characteristic signal. It provides a new method for detecting fault characteristics. In the research carried out by Chinese scholars, Blankenburg proposed an adaptive stochastic resonance method based on ant colony algorithm, which uses the multiparameter parallel ability of ant colony algorithm to obtain the best matching state of the system, and has powerful processing functions. The faint signal in the noise has been validated by simulation test and engineering application of early fault diagnosis of locomotive bearings [3]. Park designed a bistable adaptive stochastic resonance system. The system uses a linear random search algorithm to change the barrier height by fixing the system parameter b and adjusting the system parameter to obtain the best state of the stochastic resonance system. To extract weak periodic signals, the author also successfully used this method to extract weak fault signals from switching power supplies [4]. Dettmer designed the bistable stochastic resonance based on parameter adjustment and its adaptive control hardware circuit system, which verified the feasibility of the adaptive stochastic resonance method in practical engineering measurement. 
The above adaptive stochastic resonance method mainly optimizes a single parameter, but this ignores the interaction between the parameters. When the fixed parameter selection is not appropriate, the optimal resonance state cannot be achieved [5]. This paper studies the influence of nonlinear bistable system parameters on stochastic resonance and the characteristics of bistable stochastic resonance circuit and its Multisim simulation realization. Based on this research, an adaptive control bistable system parameter system is realized through the microcontroller AVR single-chip microcomputer to realize stochastic resonance; the hardware circuit of the adaptive stochastic resonance system is built, including bistable stochastic resonance circuit, A/D conversion circuit, voltage polarity conversion circuit, and power supply circuit. Finally, the signal detection system is in the laboratory. Experimental tests were carried out in the environment, which verified that the signal detection system can effectively realize weak signal detection under a specific noise background. 2. Weak Signal Detection Theory and Application Based on Stochastic Resonance Mechanism 2.1. Basic Theory of Stochastic Resonance Stochastic resonance is a nonlinear dynamic phenomenon produced by the synergy of the three basic elements of weak periodic signal, noise, and nonlinear system. It can effectively realize the transfer of noise energy to signal energy instead of simply suppressing noise. This phenomenon provides a powerful means for people to use the stochastic resonance theory to obtain weak signal information from the background of noise [6, 7]. 2.1.1. 
Stochastic Resonance Model of Bistable System In the study of stochastic resonance, the nonlinear bistable system described by the Langevin equation is a classic model for studying stochastic resonance, which can be expressed as follows: In formula (1), is the weak periodic signal to be measured with amplitude A and frequency and is the potential function of the bistable system: In the formula, a and b are the structural parameters of the bistable system and a>0 and b>0. 2.1.2. Immersion and Elimination Theory Since the adiabatic approximation condition is the hypothesis of the theory, the theory is also called the adiabatic approximation theory. The adiabatic approximation condition is that the input weak periodic signal and noise intensity are both very small; that is, A ≪ 1, f≪ 1, D ≪ 1. At this time, x = 0 divides the entire x area into two parts (−∞, 0) and (0, ∞). The probability distribution in the corresponding area is In the formula, . 2.2. Basic Bistable Stochastic Resonance Circuit and Its Multisim Simulation Since the operational amplifier can easily implement some common mathematical operations with peripheral resistance-capacitance components, the bistable circuit system can be built through the operational amplifier. For stochastic resonance systems, the following Langevin equation can be used to describe In the formula, is the periodic signal and is the white noise. The sum of the two can be regarded as the measured engineering signal. In order to realize the abovementioned first-order differential equation, the operational amplifier integrator is used to realize it; that is, formula (4) becomes It can be seen from formula (5) that Langevin’s equation is composed of an integral part, a multiplication part, and an addition part. The design of electronic circuits can be realized by the corresponding integration circuit, proportional multiplication circuit, and addition circuit. 2.3. 
Overall Hardware Design Scheme

To realize adaptive control and adjustment of the bistable system parameters, this article uses the AVR single-chip microcomputer to drive the digital potentiometer AD7376. The code written to the AD7376 adjusts the resistance divider ratio in the bistable stochastic resonance circuit, which in turn changes the bistable system parameters a and b until the system produces stochastic resonance [8, 9]. The system is mainly composed of the AD7376 bistable stochastic resonance circuit, an A/D conversion circuit, a voltage polarity conversion circuit, a power supply circuit, and the AVR single-chip microcomputer module. The specific implementation steps of the system are as follows:

(1) Input the noisy signal to be measured into the stochastic resonance circuit system.
(2) The signal output by the system is bipolar and therefore requires voltage polarity conversion.
(3) The voltage polarity conversion circuit converts it into a 0∼5 V voltage signal.
(4) The conditioned signal undergoes A/D conversion.
(5) The AVR single-chip microcomputer samples and analyzes the converted signal.
(6) The AVR single-chip microcomputer determines whether the system is in resonance through the resonance determination program.
(7) The adaptive search algorithm is enabled to drive and adjust the AD7376 until the system reaches resonance.

2.4. Stochastic Resonance Mechanism in the Nervous System

The Hodgkin-Huxley (H-H) neuron model was the first to satisfy the electrophysiological characteristics of the neuron. It describes the relationship between the membrane potential and the membrane current in the axon well, conforms to the law of generation and transmission of action potentials, and is a classic model of excitable cells. The FitzHugh–Nagumo (FHN) model is a simplification of the H-H neuron model; it is a typical model of excitable neurons and has received extensive attention in the field of neural engineering [10, 11].

2.4.1.
FitzHugh–Nagumo Neuron Model

This model can describe many characteristics of the electrical impulses of nerve and myocardial fibers, such as the existence of an excitation threshold, relative and absolute recovery periods, and the generation of pulse trains under external currents [12, 13]. Because the FHN neuron model is a simplification of the H-H neuron model, it has a simple two-variable form that retains the main characteristics of the regenerative excitation mechanism of excitable nerve cells. It is widely used to study the properties of excitable systems and the behavior of spiral waves. In the model equations, the fast variable represents the rapidly changing neuron membrane voltage and the slow variable is the slowly changing recovery variable; ε is the time constant, which determines the firing rate of the neuron; a critical value prompts the neuron to fire regularly, and B is the difference between the average signal level and this critical value; a and b are the equation constants. The external current input equals S(t) + ξ(t), where S(t) is the input signal and ξ(t) is the noise.

2.4.2. Stochastic Resonance Evaluation Method

The evaluation indicators of stochastic resonance currently in use are mainly the signal-to-noise ratio, the mutual information rate, and the cross-correlation coefficient. The traditional signal-to-noise ratio measures the stochastic resonance effect under periodic excitation. The mutual information rate, a quantitative tool from information theory that describes the degree of association between pieces of information, has been widely used to study both periodic and nonperiodic stochastic resonance. The cross-correlation coefficient is used to study nonperiodic stochastic resonance by measuring how well the output signal of the system matches its input [14, 15]. Each of these methods quantifies stochastic resonance from its own perspective.
Although the evaluation results are not identical, each is reasonable in its own terms. Therefore, to evaluate stochastic resonance from multiple angles, this paper selects the signal-to-noise ratio and the mutual information rate as the evaluation indicators under periodic excitation, and the mutual information rate and the cross-correlation coefficient as the evaluation indicators under nonperiodic excitation [16, 17].

The signal-to-noise ratio is a traditional method of evaluating stochastic resonance, based on the spectrum of the neuron response signal. It is defined as

$$\mathrm{SNR} = 10\log_{10}\frac{S}{B}$$

where S and B are, respectively, the peak value of the response signal and the intensity of the floor noise in the power spectral density at the frequency of the input periodic excitation. The unit of the signal-to-noise ratio is the decibel (dB). The power spectral density is obtained by averaging: several data segments of the same duration are recorded under the same input stimulus and noise intensity, the power spectral density of each segment is calculated separately, and the results are accumulated and averaged. This weakens the irregular fluctuations in the power spectral density and reflects the average frequency characteristics of the neuron response signal over a long period of time [18, 19].

2.5.
Adaptive Stochastic Resonance Process of Weak Shock Signal Based on Knowledge-Based Particle Swarm Algorithm

By combining knowledge of the basic characteristics of the system with the particle swarm algorithm, a knowledge-based particle swarm adaptive stochastic resonance method is constructed for detecting weak shock signals. The core of the improvement is as follows. When the stochastic resonance particles do not transition, the fitness value of each individual particle in the particle swarm algorithm is evaluated with the kurtosis index. When the stochastic resonance particles do transition, the fitness value of the individual particle is set directly to zero. This eliminates the time-consuming process of calculating the output of the stochastic resonance system with the Runge–Kutta method and then evaluating the fitness value with the kurtosis index [20, 21]. For the detection of weak pulse signals, the knowledge-based particle swarm algorithm optimizes the structural parameters a and b of the nonlinear bistable stochastic resonance system simultaneously, with the kurtosis index as the fitness function. The key points of the method are described below.

(1) Judgment of transition of resonant particles: when the particles of the bistable system transition, the system output x(t) changes from positive to negative or from negative to positive. Assuming that the number of sampling points is N, the number of positive output values is accumulated and recorded as Np. When Np satisfies the given condition, the system is considered to have transitioned.
(2) Reference barrier Ul: when the potential barrier is equal to this value, the particles of the stochastic resonance system can transition.
Obviously, when the potential barrier is lower than the reference barrier value, the particles of the stochastic resonance system can also transition [22].
(3) The study found that, when detecting a shock signal, stochastic resonance appears near the lowest point of one of the potential wells. The detected shock signal therefore changes abruptly near the initial point, until the system keeps up with the signal and settles near that lowest point. For this reason the values near the initial point should be excluded when calculating the relevant indicators; in this article the first 50 points (i = 50) are excluded.

2.6. Measures of Stochastic Resonance

2.6.1. SNR and SNR Gain

The signal-to-noise ratio and the signal-to-noise ratio gain occupy an extremely important position among the measures of stochastic resonance. The signal-to-noise ratio is defined as the ratio of the system output at the signal frequency to the background noise at the same frequency, that is, the ratio of the signal power spectral density to the noise power spectral density near the signal frequency.

2.6.2. Symbol Sequence Entropy Method

The signal-to-noise ratio and the signal-to-noise ratio gain require a good estimate of the characteristic signal when describing stochastic resonance, and in practice this estimate is difficult to obtain. Moreover, because the signal-to-noise ratio is the ratio of the amplitude at the characteristic frequency to the background noise at the same frequency, it is a local measure and cannot characterize the output signal as a whole [23, 24]. The residence time distribution, in turn, is very complicated, not easy to quantify, and difficult to use in engineering applications. To solve these problems, symbol sequence entropy is proposed as a measure of the stochastic resonance output, and an adaptive stochastic resonance system is designed to detect weak signals.
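As a rough illustration of the symbol sequence entropy measure, the following Python sketch binarizes a time series against a threshold, groups the symbols into overlapping words of length L, and computes the Shannon entropy of the word distribution. The threshold of zero and L = 5 follow the values quoted in the article; the overlapping-window scheme, the function names, and the normalization by L (the maximum entropy in bits of a binary word of length L) are assumptions made here for illustration and are not taken from the article.

```python
import math
from collections import Counter


def symbolize(x, threshold=0.0):
    """Map each sample to 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in x]


def symbol_sequence_entropy(symbols, L=5):
    """Normalized Shannon entropy of overlapping length-L binary words.

    Each word is encoded as an integer code; the entropy of the code
    distribution (in bits) is divided by L so the result lies in [0, 1].
    The normalization is an assumption of this sketch.
    """
    n_words = len(symbols) - L + 1
    if n_words <= 0:
        raise ValueError("sequence is shorter than the word length L")
    counts = Counter()
    for i in range(n_words):
        code = 0
        for s in symbols[i:i + L]:
            code = code * 2 + s  # binary word -> integer code
        counts[code] += 1
    entropy_bits = -sum(
        (c / n_words) * math.log2(c / n_words) for c in counts.values()
    )
    return entropy_bits / L
```

Under these assumptions, a strictly periodic symbol stream yields a low value because only a few distinct words occur, while a noise-dominated stream spreads over many words and approaches 1.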
Symbol sequence analysis is a “coarse-graining” process that turns a data sequence with many possible values into a symbol sequence with only a few distinct values. This process can capture signals with a deterministic structure, especially periodic or modulated signals, so it is suitable for analyzing mechanical failure signals. The above comparison shows that the sequence entropy of a signal with a deterministic structure takes a definite value, whereas the entropy of a noisy signal or a pure noise signal is greater than or close to 1. It follows that, when the system parameters a and b are adjusted during weak signal detection so that the system output tends toward the best output, the entropy of the system output signal tends toward the entropy of the original pure signal. At that point the output signal-to-noise ratio of the system reaches its maximum, as does the amplitude of the detected weak signal; that is, the conversion of noise energy to the frequency of the periodic driving force reaches its maximum.

2.7. Stochastic Resonance Dynamic Behavior Mechanism

Under certain conditions, increasing the noise intensity makes the output of the stochastic resonance system better highlight the components of the original input signal. This article discusses the dynamic behavior of stochastic resonance systems under the combined action of signal and noise and then analyzes the mechanism of stochastic resonance from a qualitative perspective.

2.7.1. Transition Behavior Led by Large-Value Incentives

A transition occurs because the excitation signal contains a portion that exceeds the transition threshold of the system.
We summarize this process as the behavior of “large value leading the transition”: if the signal applied to the system contains a portion beyond the transition threshold, that portion leads the subsequent portion that lies within the threshold, and the output of the latter follows the former to the same side of the attractor curve. This property is very interesting and can be illustrated with a metaphor: two otherwise identical teams behave differently if their leaders have different characters. Image processing based on stochastic resonance uses this property effectively to highlight the information of certain characteristic patterns.

2.7.2. Subinterference and Major Interference Caused by Noise

When the value of the noisy signal lies between the horizontal lines A and B, the system has two attractor curves. Each attractor firmly pulls only the moving points that belong to its own domain of attraction and cannot attract a moving point in the other domain to its side, so the state of the system tends to be stable. Because the movement of the moving point remains consistent with its attractor at all times, this type of interference does not induce a transition. In other words, when the amplitude of the noise is within the system threshold, these interferences cannot induce transition behavior.
We call this type of interference “subinterference.” Similarly, because Gaussian white noise has zero mean, in the absence of transitions these disturbances mostly produce oscillations that move the moving point back and forth near the equilibrium point. Even during a transition, their amplitude is too small to counter the major interference, so subinterference does not change the transition trend. We should also point out that, in systems with high noise intensity, a large proportion of the interference has a large amplitude and is likely to make the moving point transition easily and directly. We call this type of interference “overinterference”; it can easily lead to the phenomenon of “over-stochastic resonance.”

3. Experimental Research on Weak Signal Detection Based on Stochastic Resonance Mechanism

3.1. Design and Manufacture of Noise Sources

The stochastic resonance circuit experiment requires a noise source with variable intensity to supply noise to the system and simulate the background noise of actual engineering signals. This article uses an AT89C4051 single-chip microcomputer system to realize a Gaussian white noise source that meets the experimental requirements. The minimum system module provides the platform on which the program of the single-chip microcomputer runs. The single-chip microcomputer generates pseudorandom numbers that obey the Gaussian distribution, and the D/A conversion module converts them into Gaussian noise voltages with fixed variance. The waveform adjustment module then adjusts the DC component and the amplitude before output.

3.2. Experimental Method

3.2.1.
Initialization

Estimate the system parameters according to the measured periodic signal and the background noise intensity, and set initial values for a and b, keeping the barrier height as small as possible to leave room for subsequent adjustment. Set the division threshold for the symbol sequence conversion to prepare for the subsequent symbolization. This article uses the most common threshold, which is zero.

3.2.2. Symbolization of Time Series

The mechanical vibration signal collected by the sensor is used as the input of the bistable stochastic resonance system, and the fourth-order Runge–Kutta method is used to solve the Langevin equation and obtain the output signal. The calculation step is taken as the reciprocal of the sampling frequency of the signal (that is, the sampling period). The resulting time series is discretized into a sequence; an appropriate time interval Δt is set, and a threshold function with the chosen threshold converts the time series into a symbol sequence s(n).

3.2.3. Calculating the Improved Shannon Information Entropy

Determine the symbol sequence length L, which is generally 3 to 8 in engineering. This paper uses L = 5 and Δt = 0.001 to generate the short sequences; each short sequence is then encoded as a base-10 number to form the decimal sequence s(L). Calculate the probability of each symbol sequence code, and calculate the Shannon information entropy Hs according to the formula.

3.3. Establishing a Model Evaluation Index System

An evaluation index is a specific evaluation item determined according to the evaluation goals, and it reflects some basic characteristics of the object being evaluated. An index is specific and measurable; it is the observation point of the goal, and definite conclusions can be drawn by actually observing the object. Generally speaking, the evaluation index system includes three levels of evaluation indexes, related by gradual decomposition and refinement.
Among them, the first-level and second-level evaluation indexes are relatively abstract and cannot be used as a direct basis for evaluation. The third-level evaluation indicators should be specific, measurable, and behavior-oriented, and can be used as a direct basis for teaching evaluation.

3.4. Determining the Evaluation Weight

The index weight is a number that indicates the importance and function of an index. In the indicator system of the evaluation plan, each indicator has a different weight, even among indicators at the same level. The index weight is usually denoted by a. It is a number greater than zero and less than 1, and the weights of all first-level indicators must sum to 1; that is, 0 < a < 1 and ∑a = 1.

4. Experimental Research and Analysis of Weak Signal Detection Based on Stochastic Resonance Mechanism

4.1. Relationship between System Structural Parameters and Stochastic Resonance

Suppose the input signal has a sampling frequency of 5 Hz, a sampling time t = 500 s, a signal frequency of 1 Hz, and a signal amplitude A = 0.06; the noise is Gaussian white noise with intensity D = 2, and the number of sampling points is 3000. We next analyze how the system enters the stochastic resonance state as the system parameters change, and use simulation experiments to illustrate how the time domain waveform and the frequency spectrum of the output of the bistable stochastic resonance system change with the system parameters a and b. The experimental results are shown in Figure 1.
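A simulation of this kind can be sketched in a few lines of Python. The sketch below assumes the commonly used bistable Langevin form dx/dt = a·x − b·x³ + s(t) + ξ(t), integrates it with a fixed-step fourth-order Runge–Kutta scheme, and holds each input sample constant across the step. The function names, the discrete noise scaling sqrt(2·D·fs), and the values in the usage note are illustrative assumptions, not the article's implementation or its experimental settings.

```python
import math
import random


def rk4_bistable(drive, h, a=1.0, b=1.0, x0=0.0):
    """Integrate dx/dt = a*x - b*x**3 + drive[n] with fixed-step RK4.

    drive: input samples (signal plus noise), one per step
    h:     integration step, assumed equal to the sampling period
    The input sample is held constant within each step (a simplification).
    """
    def f(x, u):
        return a * x - b * x ** 3 + u

    out = []
    x = x0
    for u in drive:
        k1 = f(x, u)
        k2 = f(x + 0.5 * h * k1, u)
        k3 = f(x + 0.5 * h * k2, u)
        k4 = f(x + h * k3, u)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out.append(x)
    return out


def noisy_sine(amplitude, freq, noise_d, fs, duration, seed=0):
    """Sine wave plus Gaussian white noise of intensity noise_d.

    The per-sample noise standard deviation sqrt(2 * D * fs) is one common
    discretization of white noise and is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = int(round(fs * duration))
    sigma = math.sqrt(2.0 * noise_d * fs)
    return [
        amplitude * math.sin(2.0 * math.pi * freq * i / fs)
        + sigma * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]
```

For example, `rk4_bistable(noisy_sine(0.06, 0.01, 0.3, 100.0, 50.0), h=0.01, a=1.0, b=1.0)` returns the output series for one choice of a and b; repeating the call over a grid of a and b values is one way to observe how the output changes with the structural parameters. A step that is large relative to the dynamics can make the explicit scheme unstable, so the step and noise level here were chosen conservatively.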
Article
Organizations, companies and start-ups need to cope with constant changes on the market which are difficult to predict. Therefore, the development of new systems to detect significant future changes is vital to make correct decisions in an organization and to discover new opportunities. A system based on business intelligence techniques is proposed to detect weak signals, that are related to future transcendental changes. While most known solutions are based on the use of structured data, the proposed system quantitatively detects these signals using heterogeneous and unstructured information from scientific, journalistic and social sources, applying text mining to analyze the documents and natural language processing to extract accurate results. The main contributions are that the system has been designed for any field, using different input datasets of documents, and with an automatic classification of categories for the detected keywords. In this research paper, results from the future of remote sensors are presented. Remote sensing services are providing new applications in observation and analysis of information remotely. This market is projected to witness a significant growth due to the increasing demand for services in commercial and defense industries. The system has obtained promising results, evaluated with two different methodologies, to help experts in the decision-making process and to discover new trends and opportunities.
Article
Since the start of the new Korean government in 2017, the Korean nuclear energy system has undergone a major change. This change in national energy policy can be forecasted by analyzing social big data. This study verifies whether future forecasting methodologies using weak signals can be applied to Korean nuclear energy through text mining the data of web news between 2005 and 2018, comparing and applying the methodology to notable events (i.e., the UAE nuclear power plant (NPP) contract and nuclear phase-out). In addition, we predict what changes will be made in the Korean nuclear energy system post-2019. Keywords extracted through text mining were quantitatively classified into a weak signal or a strong signal using a Keyword Emergence Map (KEM) and a Keyword Issue Map (KIM). The extracted keywords predicted the contract of the UAE NPPs in 2009 and nuclear phase-out in 2017. Furthermore, keywords revealing future signals beyond 2019 were found to be ‘nuclear phase-out’ and ‘wind energy’. The weak-signal methodology can be applied as a tool to predict future energy trends during the current circumstance of the rapidly changing world energy market.
Article
In this paper, we focus on data-driven approaches to human activity recognition (HAR). Data-driven approaches rely on good quality data during training, however, a shortage of high quality, large-scale, and accurately annotated HAR datasets exists for recognizing activities of daily living (ADLs) within smart environments. The contributions of this paper involve improving the quality of an openly available HAR dataset for the purpose of data-driven HAR and proposing a new ensemble of neural networks as a data-driven HAR classifier. Specifically, we propose a homogeneous ensemble neural network approach for the purpose of recognizing activities of daily living within a smart home setting. Four base models were generated and integrated using a support function fusion method which involved computing an output decision score for each base classifier. The contribution of this work also involved exploring several approaches to resolving conflicts between the base models. Experimental results demonstrated that distributing data at a class level greatly reduces the number of conflicts that occur between the base models, leading to an increased performance prior to the application of conflict resolution techniques. Overall, the best HAR performance of 80.39% was achieved through distributing data at a class level in conjunction with a conflict resolution approach, which involved calculating the difference between the highest and second highest predictions per conflicting model and awarding the final decision to the model with the highest differential value.
Article
Word searching methods have been developed in many forms, including Hamming Distance, Jaccard Distance, Jaro Distance, Jaro-Winkler Distance, and Levenshtein Distance. These methods are used for lexicographic comparison, finding words according to their similarity to the word searched. Searching with these word distance methods takes time and can cause overhead when different users search for the same words again and again. If such a method is used in a multi-user application where users generally search for some keywords repeatedly, users may experience a longer searching time than with exact search. To address this problem, we propose a method in which the first search result of a previous user is recorded in the database for future use by indexing the search keywords. To test this method, we use the Jaro-Winkler Distance method to search words. The test results show that combining indexing with similarity word searching using the Jaro-Winkler Distance method can decrease the searching time to 90-92% compared with using the Jaro-Winkler Distance method alone. As the amount of searched data increases, the processing time can be shortened.
Article
Companies and governmental agencies are increasingly seeking ways to explore emerging trends and issues that have the potential to shape up their future operational environments. This paper exploits text mining techniques for investigating future signals of the land administration sector. After a careful review of previous literature on the detection of future signals through text mining, we propose the use of topic models to enhance the interpretation of future signals. Findings of the study highlight the large spectrum of issues related to land interests and their recording, as nineteen future signal topics ranging from climate change mitigation and the use of satellite imagery for data collection to flexible standardization and participatory land consolidations are identified. Our analysis also shows that distinguishing weak signals from latent, well-known, and strong signals is challenging when using a predominantly automated process. Overall, this study summarizes the current discourses of the land administration domain and gives an indication of which topics are gaining momentum at present.
Article
As the contemporary phenomenon of school bullying has become more widespread, diverse, and frequent among adolescents in Korea, social big data may offer a new methodological paradigm for understanding the trends of school bullying in the digital era. This study identified Term Frequency-Inverse Document Frequency (TF-IDF) and Future Signals of 177 school bullying forms to understand the current and future bullying experiences of adolescents from 436,508 web documents collected between 1 January 2013, and 31 December 2017. In social big data, sexual bullying rapidly increased, and physical and cyber bullying had high frequency with a high rate of growth. School bullying forms, such as “group assault” and “sexual harassment”, appeared as Weak Signals, and “cyber bullying” was a Strong Signal. Findings considering five school bullying forms (verbal, physical, relational, sexual, and cyber bullying) are valuable for developing insights into the burgeoning phenomenon of school bullying.