Actas de las V Jornadas Nacionales de Investigación en Ciberseguridad (JNIC 2019)
(Proceedings of the 5th National Conference on Cybersecurity Research)
Cáceres, Spain, June 5–7, 2019
Editors: Andrés Caro Lindo, Luis Javier García Villalba, Ana Lucila Sandoval Orozco
Universidad de Extremadura, Servicio de Publicaciones
OSINT is the next Internet goldmine:
Spain as an unexplored territory
Javier Pastor-Galindo, Pantaleone Nespoli, Félix Gómez Mármol, and Gregorio Martínez Pérez
Department of Information and Communications Engineering, University of Murcia, 30100 Murcia, Spain
Email: {javierpg, pantaleone.nespoli, felixgm, gregorio}
Abstract—Phenomena like social networks, cloud computing or the Internet of Things are unknowingly generating unimaginable quantities of data. In this context, Open Source Intelligence (OSINT) exploits such information to extract knowledge that is not readily apparent to the human eye. Apart from the political, economic or social applications OSINT may bring, there are also serious global concerns that could be addressed by this paradigm, such as cybercrime and cyber threats. This paper presents the current state of OSINT, the opportunities and limitations it poses, and the challenges to be faced in the future. Furthermore, we particularly study Spain as a potential beneficiary of this powerful methodology.
Index Terms—OSINT, Cyber security, Cyber defense, Cyber intelligence, Spain, Law Enforcement Agencies, Threat Intelligence
I. INTRODUCTION
Open Source Intelligence (OSINT) embraces a set of techniques for collecting information from different open sources (e.g., legally available documents, social networks, public activities of states, companies and society, etc.) in order to infer knowledge to be used for a specific purpose [24]. Although it might seem to be a novel paradigm, it has actually been around for a long time. For instance, during World War II radio broadcasts were monitored to spy on adversaries. As early as 1941, the USA created the Foreign Broadcast Information Service (FBIS) to gather public information from other countries. During the Cold War, the Soviet Union, China and other countries also used OSINT by exploring public documents and technical information about foreign developments [18].
Traditional OSINT was conducted manually, in the sense that analysts were needed to collect public data and analyze it in order to extract knowledge. However, the current information age has made a huge and growing amount of data available on the Internet [23]. As a consequence, the original OSINT processes have become ineffective under these demanding modern conditions. This issue motivates the development of innovative tools that automate the collection and analysis of data.
[Fig. 1. OSINT principal use cases: cybercrime and organized crime (spot illegal actions, retrieve suspicious traces, monitor malicious groups); cyber security and cyber defense (footprinting, forensic analysis, proactive auditing); social opinion and sentiment analysis (political campaigns, disaster management).]
Nowadays, OSINT is widely used by governments and intelligence services to conduct their investigations [1]. Nevertheless, it is not only used for state affairs: new research lines are also taking advantage of this paradigm for many goals. Indeed, current research works tend to follow three main applications, which are represented in Figure 1 and described next:
Social opinion and sentiment analysis: With the boom of social networks, it is possible to collect users' interactions, messages, interests and preferences to extract non-explicit knowledge. Such collection and analysis can be applied to marketing, political campaigns, disaster management or even cyber defense [3].
Cybercrime and organized crime: Open data are continuously analyzed and matched by OSINT processes in order to spot criminal intentions at an early stage. Taking into account adversaries' patterns and the relationships between felonies gives security forces an opportunity to promptly detect illegal actions [16].
Cyber security and cyber defense: Information and Communication Technology systems are continuously attacked by criminals [12]. Research is therefore crucial to defend ourselves from cyber attackers, in particular by facing the challenges that are still open in the field of cyber security [10]. In this sense, data science is being applied not only to footprinting in penetration tests, but also to the preventive protection of organizations and companies: analyzing daily attacks, correlating them and supporting decision making for an effective defense as well as a prompt reaction [19]. In the same way, OSINT can also be considered in this context as a source of information for tracebacks and […]
Sesión II: Monitorización de eventos de seguridad (Session II: Security event monitoring)
102 JNIC 2019
Additionally, it is important to note that the use of public data also raises compromising issues. There is a strong ethical component linked to users' privacy. In particular, the profiling of people [15] could reveal personal details such as their political preferences, sexual orientation or religious beliefs.
This article addresses the current state of OSINT since, to the best of our knowledge, there is no published work that integrates the recent advances of this paradigm, the opportunities it offers and the existing facilities to support OSINT processes. In particular, we study the spread and use of OSINT techniques in Spain.
Furthermore, our purpose is to stimulate research and advances in OSINT. As we have seen so far, OSINT is a promising mechanism that improves the traditional cyber intelligence and cyber defense fields. However, there is still a long way to go in this topic, and this article presents some future lines of research.
The remainder of this paper is organized as follows. Section II reviews recent research works in the field of OSINT. Section III discusses the motivation for developing OSINT, as well as its pros and cons. Then, Sections IV and V describe some techniques and tools that facilitate searches through open data sources. Section VI contextualizes OSINT in Spain by describing some evidence of its usage and presenting certain Spanish public databases. Section VII poses some open challenges related to research in OSINT. Finally, Section VIII concludes with some key remarks, as well as future research lines.
II. RELATED WORK
In recent years, with the advances of big data and data mining techniques, the research community has noticed that open data is a powerful source for analyzing social behaviors and obtaining relevant information [4].
With regard to the use of OSINT for extracting social opinion and emotions, Santarcangelo et al. [22] proposed a model for determining user opinions about a given keyword through social networks, specifically studying the adjectives, intensifiers and negations used in tweets. Unfortunately, it is a simple keyword-based solution, designed only for the Italian language, that does not take semantic issues into account. On the other hand, Kandias et al. [14] were able to relate people's usage of social networks (in particular, Facebook) to their stress level. However, their experiments were carried out with only 405 users, whereas much larger amounts of data can be processed nowadays.
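As a rough illustration of how adjectives, intensifiers and negations can be combined into an opinion score, the following Python sketch uses a tiny, hypothetical English lexicon. It is not the model of Santarcangelo et al.: the word lists, the weights and the two-token look-back window are assumptions, and, like the keyword-based approach criticized above, it ignores semantics.

```python
"""Toy lexicon-based opinion scoring with intensifiers and negations (illustrative only)."""

# Hypothetical lexicons; the weights are arbitrary assumptions.
ADJECTIVES = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATIONS = {"not", "never", "no"}


def score_text(text: str) -> float:
    """Sum adjective polarities, scaled by nearby intensifiers and flipped by nearby negations."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in ADJECTIVES:
            continue
        value = ADJECTIVES[token]
        window = tokens[max(0, i - 2):i]  # assumption: modifiers appear at most two tokens before
        for word in window:
            if word in INTENSIFIERS:
                value *= INTENSIFIERS[word]
        if any(word in NEGATIONS for word in window):
            value = -value
        total += value
    return total


def opinion_about(keyword: str, posts: list) -> float:
    """Average score of the posts mentioning `keyword`; 0.0 when none does."""
    relevant = [p for p in posts if keyword.lower() in p.lower()]
    if not relevant:
        return 0.0
    return sum(score_text(p) for p in relevant) / len(relevant)
```

For example, opinion_about("metro", posts) averages the scores of only those posts that mention the keyword, which mirrors the keyword-driven setting described above.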
In the context of cybercrime and organized crime, several works explore the application of OSINT to criminal investigations. For example, OSINT could increase the accuracy of the pursuit and arrest of culprits with frameworks like the one proposed by Quick et al. [21]. Concretely, the authors apply OSINT to digital forensic data from a variety of devices to enhance criminal intelligence analysis.
In this field, another opportunity that OSINT offers is the detection of illegal actions and the prevention of future crimes such as terrorist attacks, murders or sexual assaults. In fact, the European projects ePOOLICE [20] and CAPER [2] were designed to develop effective models for automatically scanning open data in order to analyze society and detect emerging organized crime. In contrast to these projects, whose proposals were not used in real cases, Delavallade et al. [6] describe a model based on social network data that is able to extract future crime indicators. This model is then applied to two use cases: copper theft and jihadist propaganda.
From the point of view of cyber security and cyber defense, OSINT represents a valuable tool for improving our protection mechanisms against cyber attacks. Pinto et al. [11] propose the use of OSINT in the Colombian context to prevent attacks and even to allow strategic anticipation. Their proposal includes not only plugins for collecting information, but also machine learning models to perform sentiment analysis. Moreover, the first goal of the European DiSIEM project [7] is the integration of diverse OSINT data sources into current SIEMs (Security Information and Event Management systems) to help react to recently discovered vulnerabilities in the infrastructure or even predict emerging threats. Lee et al. [17] also designed an OSINT-based framework to inspect cyber security threats to critical infrastructure networks. However, none of these approaches has been applied to real-world scenarios, so their effectiveness remains unproven.
III. MOTIVATION
The rapid growth of new technologies, services and social networks based on the Internet is placing information at the center of the world. In fact, a large part of it is publicly available, which means that anyone, at any time and in any place, has access to these data.
Another ongoing phenomenon is the evolution from traditional criminal techniques to cybercrime. Extortion, fraud, identity theft and child exploitation are now carried out through the network, burglary has become hacking, and fraudulent calls have turned into phishing.
Fortunately, almost every cybercriminal uses the Internet not only for illegal actions, but also for personal purposes. Leveraging this fact, OSINT seeks to connect both sides through the analysis of public data to produce cyber intelligence.
TABLE I
Pros | Cons
Huge amount of public information | Complexity of data management
High computing capacity | Unstructured information
Big data and machine learning | Misinformation
Complementary types of data | Reliability of data sources
Flexible purpose and wide scope | Strong ethical/legal considerations
From a technical point of view, as TABLE I shows, OSINT offers a number of benefits, although it also has to deal with some restrictions. Regarding the positive points, we could highlight the following:
Huge amount of worthwhile open source data to be analyzed, crossed and linked [23]. It includes social networks, public government documents and reports, online multimedia content, newspapers and even the Deep Web and the Dark Web, among others. These last two sources are especially interesting for OSINT [13]. The Deep Web, which contains the Dark Web, holds even more information than the Surface Web (the part of the Internet known by most users). Since their contents are not indexed by traditional search engines, specific tools are needed to access these networks.
Unlike the Surface Web and most of the Deep Web, the Dark Web offers anonymity and privacy to its users. This property leads criminals to use this network to browse, conduct searches and publish content for illegitimate purposes while hiding their identity. Therefore, the Dark Web is an ideal source for applying OSINT to fight cybercrime, organized crime and cyber threats. On the other hand, tracking down and de-anonymizing these users remains a challenge for OSINT.
Powerful computing capacity to combine large sets of data, relationships and patterns from different types of open sources. In particular, it allows the creation of complex inferences that humans could hardly anticipate [9].
Emerging proliferation of big data and data mining techniques, as well as machine learning algorithms, which can automate investigation processes and make them, along with decision making, more intelligent and efficient [9].
Possibility of complementing OSINT with other types of information [5]. The system's inherent structure is open enough to also accept classified data or citizen collaboration within its engine.
Generic implementation that allows different kinds of targets and several paths of exploration. As a consequence, OSINT applications could monitor suspicious people or dangerous groups, detect influential profiles related to radicalization, study worrying trends in society, support the attribution of cyber attacks and crimes, etc. [1].
However, using open source data also presents disadvan-
tages which need to be considered as well:
The quantity of data is immeasurable and, logically, it is
challenging to handle it efficiently and effectively [8].
The public information available on the Internet is inherently unstructured. This means that the data collected by OSINT are so heterogeneous that it is difficult to classify, link and examine them in order to extract relationships and knowledge [3].
Social networks and communication media are flooded with subjective opinions, fake news and hoaxes [3]. For this reason, the existence of misinformation has to be considered when implementing OSINT mechanisms. The reliability and authority of the information are indeed the key to success.
Ideally, the collected data should come from authoritative and reviewed sources (official documents, scientific reports, reliable communication media) [8]. In practice, OSINT will also deal with subjective or non-authoritative sources, such as social network content or manipulated media.
Ethical and legal considerations are fundamental in the development of OSINT. The results should respect users' privacy and not reveal intimate and personal issues [15]. In fact, the scope of the searches should be, by definition, limited to open data sources.
Since it is impossible to place police officers in every communication channel in the world, public data still offer an opportunity to detect anomalous patterns and malicious behaviors. How calm would cybercriminals be if not only every single step of their online actions, but also their daily life, were relevant clues for investigators?
IV. OSINT TECHNIQUES
As has been shown, OSINT is quite promising and powerful, but its implementation is also challenging. For instance, there are several manual techniques that provide public data to the end user, as we will see next.
A. Search engines
Everyone knows search engines such as Google, Bing or Yahoo, among others. Using them in the traditional way is the simplest way of applying OSINT.
Moreover, services like Google support filters to refine searches. For instance, double quotes (“”) force exact matches, OR and AND act as logical operators, and * acts as a wildcard. Google also allows conditions such as filetype, to specify a certain file type; site, to limit results to those from a specific website; or intitle, to find pages with certain keywords in their title.
It is worth noting that, for example, a Google search for DNIs (i.e., Spanish national identity card numbers) within the Region of Murcia website outputs more than 15,000 results in less than half a second, using a query that restricts the search to that website and adds filetype:pdf intext:dni.
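Because these operators are plain text, queries like the one above can also be assembled programmatically, for instance to generate many variants. The helper below is a hypothetical sketch, not part of any search engine API; example.org stands in for a target website. Operator support differs between engines and changes over time, and automated querying is usually restricted by the engines' terms of service.

```python
"""Compose Google-style search queries ("dorks") from individual operators."""


def build_dork(terms=(), exact=None, any_of=(), site=None, filetype=None,
               intext=None, intitle=None) -> str:
    """Return a query string; every argument is optional and omitted when empty.

    terms    -- plain keywords (may contain the * wildcard)
    exact    -- phrase wrapped in double quotes for an exact match
    any_of   -- alternatives joined with OR
    site, filetype, intext, intitle -- operator:value restrictions
    """
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')
    if any_of:
        parts.append(" OR ".join(any_of))
    for operator, value in (("site", site), ("filetype", filetype),
                            ("intext", intext), ("intitle", intitle)):
        if value:
            parts.append(f"{operator}:{value}")
    return " ".join(parts)
```

For instance, build_dork(site="example.org", filetype="pdf", intext="dni") yields site:example.org filetype:pdf intext:dni, the same pattern as the query discussed above.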
B. Social networks
Nowadays, services like Facebook or Twitter are everywhere in our society. Any curious person has realized that a lot of personal information can be found without advanced knowledge of these platforms. Thus, these applications offer precise search possibilities in the context of OSINT.
Facebook permits specific queries by visiting crafted URLs of the form …/facebook-id/search-token, where facebook-id is the user identifier and search-token defines the search criterion (namely, pages-liked, photos-liked, places-visited, etc.). Twitter not only supports advanced searches through URLs, but also provides a user-friendly advanced search interface.
To some extent, these characteristics extend to the rest of the social networks.
C. Other OSINT services
There are other specific websites that offer relevant infor-
mation given a certain kind of input:
Email address: One website returns whether an email address is valid or not, another reports whether an email address has been compromised, and a third finds information related to the owner of such an email address.
Username: A dedicated service checks the availability of a given username in social networks or domains.
Real name: Apart from social networks, there are genealogy sites like FamilySearch or Geni that provide information on kinship.
Location: Google Maps or Wikimapia are well-known sites to find out locations from GPS coordinates. Conversely, other sites return the GPS coordinates of a given location name.
IP Address: One service gets the location of a given IP address, whereas another provides more technical information (whois, reverse IP lookup, traceroute, etc.).
Domain: Some services visualize domain connections, and others check DNS and mail servers. There are also services that calculate traffic statistics and others that search for subdomains. Finally, a web archive site allows exploring content within a number of archived domains.
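The username check described above can be approximated by building the profile URL a given username would have on each platform and then probing it. The sketch below assumes three URL templates that may change at any time; it is not how any particular service works, and an HTTP status code is only a weak hint, since many sites block automated requests or answer 200 for any path.

```python
"""Enumerate the profile URLs a username would have on several platforms."""
import urllib.error
import urllib.request
from typing import Dict, Optional
from urllib.parse import quote

# Assumed URL templates; platforms change their schemes, so these need maintenance.
PROFILE_TEMPLATES = {
    "github": "https://github.com/{}",
    "twitter": "https://twitter.com/{}",
    "reddit": "https://www.reddit.com/user/{}",
}


def candidate_profiles(username: str,
                       templates: Dict[str, str] = PROFILE_TEMPLATES) -> Dict[str, str]:
    """Map each platform name to the URL the profile would have there."""
    safe = quote(username, safe="")  # percent-encode spaces, slashes, etc.
    return {name: template.format(safe) for name, template in templates.items()}


def profile_exists(url: str, timeout: float = 5.0) -> Optional[bool]:
    """Heuristic check: True on HTTP 200, False on 404, None when inconclusive."""
    request = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "osint-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:  # server answered with an error status
        return False if error.code == 404 else None
    except OSError:  # URLError, timeouts and connection failures
        return None
```

Separating URL generation from probing keeps the offline part testable and makes it easy to rate-limit or skip the network step entirely.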
It is clear that, by combining different techniques, it is possible to produce extremely useful knowledge about any connected target. Nevertheless, these resources are general-purpose and each one is limited to a specific field. For that reason, researchers and developers have implemented more precise solutions for gathering better-quality information.
V. OSINT TOOLS
Fortunately for people conducting OSINT activities, there are also more sophisticated tools that automate the collection of public information and infer interesting relationships. TABLE II presents the main features of the most popular and relevant OSINT tools today. Nevertheless, a more complete view of the variety of OSINT resources can be found in the OSINT Framework.
A. FOCA (Fingerprinting Organizations with Collected Archives)
This product, designed by ElevenPaths, analyzes the metadata of documents (Microsoft Office, PDF, OpenOffice, etc.) available on the Internet. The software finds the hidden information, unifies it, and recognizes the files that have been created on the same computer, as well as the servers and clients that could be related to them. The server discovery module also includes further functionalities such as web search, DNS search and IP resolution.
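FOCA is a Windows application, so the following Python sketch only illustrates the underlying idea for Office Open XML files (.docx, .xlsx, .pptx). These are ZIP packages whose docProps/core.xml and docProps/app.xml parts store properties such as the author, the last editor and the producing application. Grouping documents by author is a crude, assumed stand-in for FOCA's correlation of files created on the same computer; PDF and legacy Office formats are not covered.

```python
"""Read OOXML metadata (.docx/.xlsx/.pptx) and group files by author (illustrative)."""
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def office_metadata(data: bytes) -> Dict[str, str]:
    """Extract creator, last editor and producing application from an OOXML package."""
    meta: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            meta["creator"] = core.findtext("dc:creator", default="", namespaces=NS)
            meta["last_modified_by"] = core.findtext("cp:lastModifiedBy", default="", namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(package.read("docProps/app.xml"))
            meta["application"] = app.findtext("ep:Application", default="", namespaces=NS)
    return meta


def group_by_author(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group file names by their creator field, a rough hint that they share a user or machine."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, data in files.items():
        author = office_metadata(data).get("creator") or "<unknown>"
        groups[author].append(name)
    return dict(groups)


def make_sample_docx(creator: str) -> bytes:
    """Build a minimal synthetic package (metadata parts only) for trying the functions above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(
            "docProps/core.xml",
            f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
            f"<dc:creator>{creator}</dc:creator><cp:lastModifiedBy>admin</cp:lastModifiedBy>"
            "</cp:coreProperties>",
        )
        package.writestr(
            "docProps/app.xml",
            f'<Properties xmlns="{NS["ep"]}"><Application>Microsoft Office Word</Application></Properties>',
        )
    return buffer.getvalue()
```

In practice the input bytes would come from documents downloaded for the target domain; make_sample_docx only exists so the sketch can be tried without real files.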
B. IntelTechniques
IntelTechniques is a website, created by Michael Bazzell, which offers hundreds of online search utilities. Its modules are organized by target data and allow searching by email, social network profile, real name and username, among others, in order to present the collected public information to the end user. It is a comprehensive tool that makes use of other, simpler techniques, such as those discussed above.
C. Maltego
Maltego is a well-known application that finds public information about a certain target within different sources and presents it as a directed graph for analysis. Specifically, this tool automatically infers advanced relationships (from data X to data Y) with the so-called transforms. Although Maltego ships with its own generic transforms, it is also possible to implement and include custom ones for more specific purposes. For example, it would be very interesting to develop OSINT transforms for the Spanish context in order to take advantage of Spain's existing open sources.
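The transform idea can be sketched in a few lines: a transform is a function from one entity to a list of related entities, and each run adds directed edges to a graph. The two transforms below are hypothetical offline stand-ins that derive guessed values from their input instead of querying real sources, and the code does not use Maltego's own transform API.

```python
"""Maltego-like transforms: map one entity to related entities and record a directed graph."""
from typing import Callable, Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (entity type, value), e.g. ("domain", "example.org")
Transform = Callable[[Entity], List[Entity]]
Graph = Dict[Entity, Set[Entity]]


# Hypothetical offline transforms; real ones would query DNS, WHOIS, search engines, etc.
def domain_to_emails(entity: Entity) -> List[Entity]:
    kind, value = entity
    if kind != "domain":
        return []
    return [("email", f"info@{value}"), ("email", f"admin@{value}")]


def email_to_username(entity: Entity) -> List[Entity]:
    kind, value = entity
    if kind != "email":
        return []
    return [("username", value.split("@", 1)[0])]


def apply_transform(graph: Graph, entity: Entity, transform: Transform) -> List[Entity]:
    """Run one transform on one entity and add the directed edges entity -> result."""
    results = transform(entity)
    graph.setdefault(entity, set()).update(results)
    for result in results:
        graph.setdefault(result, set())  # every result becomes a node, even without edges
    return results


if __name__ == "__main__":
    graph: Graph = {}
    # The analyst chooses which transform to run on which node, as in Maltego's interface.
    for email in apply_transform(graph, ("domain", "example.org"), domain_to_emails):
        apply_transform(graph, email, email_to_username)
    for source, targets in graph.items():
        for target in sorted(targets):
            print(source, "->", target)
```

Custom transforms for Spanish open sources, as suggested above, would simply be further functions with the same signature.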
D. Metagoofil
Metagoofil is an information gathering tool that extracts the metadata of the files found for a specific target domain or URL. It is commonly used for penetration testing, as it can reveal usernames, software versions, and server or machine names.
E. Recon-NG
Recon-NG is a web reconnaissance framework, similar to Metasploit, whose searches depend on the loaded modules and the given input. It can obtain the organization's email addresses, locations, information about administrators and users, whois information, etc.
[TABLE II. Main features of the most popular OSINT tools (FOCA, IntelTechniques, Maltego, Metagoofil, Recon-NG, Shodan, Spiderfoot and The Harvester): input (data, file), output, extensibility, interface, platform and other features.]
F. Shodan
Shodan is a search engine that provides public information about Internet-connected nodes, including IoT devices. Information is collected through protocols such as HTTP or SSH, and searches can be filtered by IP address, country name or even keywords. In general, it is used for network purposes, such as monitoring network security or exploring network topologies.
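Shodan searches combine free text with filter:value pairs; country, port, org and net are among its documented filters. The helper below is a hypothetical sketch that only composes the query string, quoting multi-word values; submitting it through the Shodan website or API (which requires an account and an API key) is deliberately left out.

```python
"""Compose Shodan search strings from free text and filter:value pairs (illustrative)."""


def shodan_query(keywords: str = "", **filters) -> str:
    """Return e.g. 'openssh country:ES port:22'.

    Values containing spaces are wrapped in double quotes. The resulting string could be
    pasted into the Shodan website or, with an API key, passed to the official Python
    client (shodan.Shodan(API_KEY).search(query)); no such call is made here.
    """
    parts = [keywords] if keywords else []
    for name, value in filters.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{name}:{text}")
    return " ".join(parts)
```

For example, shodan_query("openssh", country="ES", port=22) targets SSH services located in Spain, the kind of network-monitoring search mentioned above.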
G. Spiderfoot
Similar to Maltego, Spiderfoot is a reconnaissance tool that automatically goes through many public data sources to compile intelligence related to IP addresses, domain names, email addresses, names and more. Given the target, Spiderfoot uses the selected modules (equivalent to transforms) to perform its analysis. The results are represented in a node graph with all the entities and relationships found. In this case, it is also possible to define our own modules.
H. The Harvester
This software allows the collection of public information related to a domain or company name. In particular, it can list the company's email addresses or the hosts related to the domain. It can also present the results in user-friendly HTML/XML reports.
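Once result pages have been downloaded from search engines or other sources, the core step of such a harvester is pattern matching. The sketch below assumes the text is already available (no network access) and uses simplified regular expressions that will miss or over-match some unusual addresses; it is not The Harvester's actual code.

```python
"""Extract e-mail addresses and host names belonging to a domain from raw text."""
import re
from typing import Dict, List


def harvest(text: str, domain: str) -> Dict[str, List[str]]:
    """Return sorted, de-duplicated e-mails and hosts under `domain` found in `text`."""
    escaped = re.escape(domain.lower())
    # local-part @ optional subdomains + domain, e.g. jdoe@mail.example.org
    email_re = re.compile(r"[a-z0-9._%+-]+@(?:[a-z0-9-]+\.)*" + escaped + r"\b", re.IGNORECASE)
    # one or more labels followed by the domain, e.g. www.example.org
    host_re = re.compile(r"\b(?:[a-z0-9-]+\.)+" + escaped + r"\b", re.IGNORECASE)
    emails = sorted({m.group(0).lower() for m in email_re.finditer(text)})
    hosts = sorted({m.group(0).lower() for m in host_re.finditer(text)})
    return {"emails": emails, "hosts": hosts}
```

Mail servers that only appear inside addresses (e.g. the part after the @ in sales@mail.example.org) are also reported as hosts, which is usually desirable when mapping an organization's infrastructure.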
I. OSINT tools comparison
Depending on the user's needs (see TABLE II), some tools will be more suitable than others for a given task.
If we want to extract hidden information from files, FOCA and Metagoofil are specific tools designed for this purpose. In particular, FOCA seems to be more complete than Metagoofil, in the sense that it is able to infer more information from the metadata.
If we are looking for network-focused information, Shodan and The Harvester are interesting options. However, we would recommend Shodan, as it accepts a wider variety of inputs, offers a user-friendly interface and does not require installation.
Finally, if the aim of the search is to gather as much information as possible for a given input, IntelTechniques, Maltego, Recon-NG and Spiderfoot will return diverse data and relationships. Among them, the most sophisticated ones would be IntelTechniques and Maltego. The former offers different types of search that operate through a very large number of data sources, but it is not as integrated as Maltego. In fact, Maltego implements automated inference processes between inputs and outputs that widen the scope of the original search. Moreover, it is extensible with custom discovery procedures.
Logically, although this comparison has been made accord-
ing to the desired output, in practice the user will be limited by
the available input and the data type accepted by OSINT tools.
Finally, note that these tools are complementary, meaning that
a deep OSINT investigation could profit from all of them.
VI. OSINT IN SPAIN
Intelligence services have traditionally been associated with the work of Law Enforcement Agencies (LEAs) and military bodies. In the same way, OSINT is nowadays considered a key element of classified investigations and secret operations in state affairs [1].
As far as we were able to determine from official websites, reports and documentation, government organizations seem to implement internal mechanisms that basically consist of gathering raw information and transforming it into useful knowledge. Representative examples include the United States Federal Bureau of Investigation (FBI), the United States Central Intelligence Agency (CIA), the Canadian Security Intelligence Service (CSIS), EUROPOL, the North Atlantic Treaty Organization (NATO), the US Department of the Army (DA) and the National Policing Improvement Agency (NPIA) of England.
In Spain, the situation is quite similar. It is not easy to find clear evidence of the application of OSINT by the state forces. The confidentiality of this type of agency makes it difficult to discover their internal operating mode and the impact of OSINT on their current investigations. Nevertheless, as a result of a deep search, we have found some subtle evidence confirming that OSINT is indeed currently used by Spanish LEAs:
As early as 2007, the director of the CNI (i.e., Spanish National Intelligence Agency) said that open sources were “fundamental to the elaboration and work of Intelligence”.
CIFAS (i.e., Spanish Military Intelligence Agency) also seems to use OSINT as a way of obtaining information. We have found some slides confirming this, dated 2008, which are uploaded on the Spanish Defense Staff website.
In 2010, when the director of the CNI announced the
creation of an ethical code for special agents, he also
insisted on the fact that modern intelligence was not just
based on physical presence, as today “you might get more
information sitting on a computer, exploring messages
from the bad guys”.
In 2017, the Spanish Ministry of Defense opened a public call for the contract called “Development of OSINT tool based on IDOL HAVEN platform”.
Currently, the Spanish Army is designing a new model called Brigade 2035, which incorporates innovative technological advances for enhancing operations. In this project, one of the defined combat functions is Intelligence, which clearly states OSINT as a key responsibility: “Other facilities of growing importance will be open source obtainment (including social networking)...”.
The Spanish Ministry of the Interior has published, in its Annual Procurement Plan for 2019, some investments in systems for obtaining OSINT in cyberspace.
Bearing in mind all these points, it seems that currently
OSINT is indeed relevant in the internal affairs of Spain. In
addition, note that to be effective, this paradigm depends on
the public data available on the Internet, among other sources.
In this regard, apart from social networks and other open
data sources, there are authoritative Spanish sites where public
information is published.
According to the European Data Portal and its official reports about Open Data maturity across Europe, Spain is one of the most advanced countries in transparency and open data. In fact, it has held the first or second position in the Open Data Maturity ranking over the last four years. As these reports indicate, the Spanish Government has promoted more than 160 open data initiatives and has over 23,800 public information catalogues. For example, the Open Data Initiative of the Government of Spain is a clear proof of how Spain encourages transparency. OSINT could benefit from this, although it would have to deal with aggregated, statistical information, linking it to infer new knowledge.
However, it would be more interesting to analyze governmental platforms whose data are not anonymized. For instance, the Spanish Ministry of the Treasury, the Spanish Ministry of the Interior and the Spanish Ministry of Defense usually publish documents with personal information (as the query filetype:pdf intext:dni, restricted to their websites, reveals). The same could also be applied to the websites of the Spanish Autonomous Communities.
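When scanning documents for exposed identifiers, for example to audit what an institution has published, many 8-digit-plus-letter strings turn out not to be DNIs at all. The control letter of a DNI is the remainder of the number divided by 23, mapped onto a fixed 23-letter table, so candidates can be checked offline. The sketch below handles only the 8-digit DNI format, not NIE numbers, and its regular expression is a simplifying assumption.

```python
"""Validate candidate Spanish DNI numbers found in text via their control letter."""
import re
from typing import List

# The control letter is DNI_LETTERS[number % 23].
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"
# Assumed format: 8 digits, optional space or hyphen, one letter.
DNI_RE = re.compile(r"\b(\d{8})[ -]?([A-Za-z])\b")


def dni_is_valid(number: str, letter: str) -> bool:
    """True when `letter` matches the control letter of the 8-digit `number`."""
    return DNI_LETTERS[int(number) % 23] == letter.upper()


def find_valid_dnis(text: str) -> List[str]:
    """Return the DNI-like strings in `text` whose control letter is consistent.

    NIE numbers (foreigners' IDs starting with X, Y or Z) are deliberately not handled.
    """
    return [number + letter.upper()
            for number, letter in DNI_RE.findall(text)
            if dni_is_valid(number, letter)]
```

Discarding strings with an inconsistent control letter removes most accidental matches (invoice numbers, codes) before any manual review.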
Moreover, Europe also has a public data platform where a lot of public information can be found. For instance, in the context of foreign policy and security, an updated list of financial sanctions is presented in the “European Union Consolidated Financial Sanctions List” document. In particular, it reveals personal information about individuals, groups and entities.
All the aforementioned facts demonstrate that Europe, and
especially Spain, are adopting strong Open Data policies. As a
direct consequence, the amount of objective data available on
the Internet is rapidly increasing. OSINT should, in addition
to other open sources of information, take advantage of
this powerful opportunity to collect, analyze, link and infer
knowledge from reliable and official sources.
VII. OPEN CHALLENGES
After reviewing the existing OSINT techniques, tools and status, it is also necessary to enumerate some gaps of this paradigm. Although OSINT seems ideal, in reality it needs to become more sophisticated and applicable to uncontrolled, real-world scenarios. As far as we know, there are some challenges that are not solved yet and should be faced by the research community:
A. Propagation of the gathering process
With the development of big data and data mining techniques, manual data collection should no longer be necessary. Although OSINT techniques (Section IV) and tools (Section V) improve this, they work with single, basic explorations. In this sense, it would be appealing to implement refeed mechanisms that chain searches by turning outputs into new inputs. As a consequence, the original search would yield not only direct inferences, but also indirect and implicit relationships.
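One way to realize such a refeed mechanism is a breadth-first expansion in which every newly found entity becomes the input of another round of searches, with a depth limit and de-duplication to keep the process finite. In the sketch below, the search function is a hypothetical offline lookup table standing in for real OSINT sources.

```python
"""Refeed sketch: outputs of each search become inputs of the next round (breadth-first)."""
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity type, value)
Search = Callable[[Entity], List[Entity]]


def refeed(seed: Entity, searches: Iterable[Search], max_depth: int = 2) -> Dict[Entity, int]:
    """Expand `seed` by chaining searches; map every discovered entity to its hop distance."""
    searches = list(searches)
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue  # do not search further from entities at the depth limit
        for search in searches:
            for found in search(current):
                if found not in depth:  # skip duplicates so cycles cannot loop forever
                    depth[found] = depth[current] + 1
                    queue.append(found)
    return depth


# Hypothetical offline "search results" standing in for real OSINT sources.
TOY_RESULTS: Dict[Entity, List[Entity]] = {
    ("domain", "example.org"): [("email", "jdoe@example.org")],
    ("email", "jdoe@example.org"): [("username", "jdoe"), ("domain", "example.org")],
    ("username", "jdoe"): [("profile", "https://github.com/jdoe")],
}


def toy_search(entity: Entity) -> List[Entity]:
    return TOY_RESULTS.get(entity, [])
```

With max_depth=2 the profile URL is not reached; raising the limit to 3 finds it, illustrating how each extra round adds indirect relationships at the cost of more queries.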
B. Integration of several open data sources
OSINT activities should consult as many sources as possible in order to cover the widest possible spectrum. This means that the system has to normalize the gathered information, which is typically unstructured, in order to perform an effective analysis. In this context, it is important to filter out repeated items.
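A minimal form of this normalization maps each source's field names onto a common schema, canonicalizes values and merges records that share a key. In the sketch below, the source names, the field mappings and the choice of e-mail address as merge key are all illustrative assumptions.

```python
"""Normalize records from heterogeneous sources into one schema and merge duplicates."""
from typing import Dict, Iterable, List, Tuple

# Hypothetical field mappings: each source names the same attribute differently.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "source_a": {"mail": "email", "fullName": "name"},
    "source_b": {"e-mail": "email", "nombre": "name"},
}


def normalize(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename fields to the common schema, collapse whitespace and lowercase e-mails."""
    mapping = FIELD_MAP.get(source, {})
    out: Dict[str, str] = {}
    for key, value in record.items():
        field = mapping.get(key, key)
        clean = " ".join(str(value).split())
        out[field] = clean.lower() if field == "email" else clean
    return out


def merge_sources(batches: Iterable[Tuple[str, List[Dict[str, str]]]]) -> List[dict]:
    """Merge normalized records that share an e-mail; remember which sources reported each."""
    merged: Dict[object, dict] = {}
    for source, records in batches:
        for record in records:
            norm = normalize(source, record)
            # Records without an e-mail are only merged when they are identical.
            key = norm.get("email") or tuple(sorted(norm.items()))
            entry = merged.setdefault(key, {"sources": []})
            for field, value in norm.items():
                entry.setdefault(field, value)  # keep the first value seen for each field
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return list(merged.values())


SAMPLE_BATCHES = [
    ("source_a", [{"mail": "JDoe@Example.org", "fullName": "John  Doe"}]),
    ("source_b", [{"e-mail": "jdoe@example.org", "nombre": "J. Doe"},
                  {"e-mail": "ana@example.org", "nombre": "Ana"}]),
]
```

Keeping the list of reporting sources per merged record also preserves provenance, which later helps when judging the reliability of each item.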
C. Detection of irrelevant data and misinformation
Due to the huge amount of data publicly available, an
OSINT process needs to be capable of assessing the relevance
of each piece of information and discarding data that do not
add value. Furthermore, it is crucial to detect misinformation
that would corrupt the results. In fact, it would also be
interesting to analyze the fake information itself with the aim
of extracting intelligence from it.
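A very simple way to combine both ideas is sketched below: items are first filtered by a crude bag-of-words relevance score, and the remaining claims are then split by how many distinct sources publish them. Claims with too little corroboration are not thrown away but set apart, so they can still be studied as possible misinformation. The scoring rule, the thresholds `min_relevance` and `min_sources`, and the sample items are all assumptions for illustration; query terms are assumed to be lowercase. Corroboration counting is only a heuristic, since coordinated campaigns can repeat the same false claim across many accounts.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Item:
    text: str
    source: str  # publisher identifier, e.g. a domain or account


def relevance(item: Item, query_terms: Set[str]) -> float:
    """Share of (lowercase) query terms that appear in the item."""
    if not query_terms:
        return 0.0
    words = set(item.text.lower().split())
    return len(words & query_terms) / len(query_terms)


def triage(items: List[Item], query_terms: Set[str],
           min_relevance: float = 0.5, min_sources: int = 2) -> Tuple[List[Item], List[Item]]:
    """Return (corroborated, suspicious) items, discarding irrelevant ones.

    A claim is corroborated when at least `min_sources` distinct sources
    publish it; otherwise it is kept apart as possible misinformation.
    """
    relevant = [it for it in items if relevance(it, query_terms) >= min_relevance]
    sources_per_claim: Dict[str, Set[str]] = {}
    for it in relevant:
        sources_per_claim.setdefault(it.text.lower(), set()).add(it.source)
    corroborated: List[Item] = []
    suspicious: List[Item] = []
    for it in relevant:
        n_sources = len(sources_per_claim[it.text.lower()])
        (corroborated if n_sources >= min_sources else suspicious).append(it)
    return corroborated, suspicious


# Hypothetical collected items.
items = [
    Item("port of valencia closed", "news-a.example"),
    Item("port of valencia closed", "news-b.example"),
    Item("port of valencia reopened", "account-c.example"),
    Item("football results", "news-d.example"),
]
kept, flagged = triage(items, {"port", "valencia"})
```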
D. Extension across the whole world
One of the main drawbacks of existing OSINT resources
is that they are usually oriented to specific countries. As
a result, they leave aside other interesting open data
sources from different territories. Taking this limitation
into account, interoperability is a desirable property to be
considered in OSINT design, since it would increase both the
scope of the searches and the usage by end-users.
In Spain, for instance, we use tools that are designed in (and
for) foreign countries. However, there are no OSINT solutions
that include Spanish public repositories (such as government
open data platforms) in the gathering phase. In this sense,
we are not benefiting from the goldmine that comes with
being one of the most transparent countries in Europe.
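One possible reading of the interoperability requirement is a plug-in architecture in which each territory contributes its own connectors behind a common interface, so that adding Spanish repositories does not require changing the search logic. The sketch below assumes such a design; `register`, `search` and both connector functions are hypothetical, and the stubs return placeholder records instead of querying any real portal.

```python
from typing import Callable, Dict, Iterable, List

Connector = Callable[[str], List[dict]]
_REGISTRY: Dict[str, List[Connector]] = {}


def register(country_code: str) -> Callable[[Connector], Connector]:
    """Decorator: attach a connector to the country whose sources it queries."""
    def decorator(fn: Connector) -> Connector:
        _REGISTRY.setdefault(country_code.upper(), []).append(fn)
        return fn
    return decorator


def search(query: str, countries: Iterable[str]) -> List[dict]:
    """Fan a query out to every connector of the selected countries."""
    results: List[dict] = []
    for code in countries:
        for connector in _REGISTRY.get(code.upper(), []):
            for record in connector(query):
                results.append({**record, "country": code.upper()})
    return results


# Hypothetical stubs: a real connector would query, e.g., a national
# open data portal and map its response to plain dicts.
@register("ES")
def spanish_open_data(query: str) -> List[dict]:
    return [{"title": f"ES dataset matching {query!r}"}]


@register("PT")
def portuguese_open_data(query: str) -> List[dict]:
    return [{"title": f"PT dataset matching {query!r}"}]
```

Countries without a registered connector simply contribute no results, so coverage can grow one territory at a time.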
E. Enhancement of the analysis process
OSINT analysis is not as intelligent as it could be. The
existing tools limit themselves to dumping all the information
found together with its relationships. However, the analysis
process should incorporate semantic analysis, pattern detection,
and correlation with other events, occurrences or datasets.
Ideally, the OSINT of the future should be able to provide the
end user with the specific piece of information he/she is
searching for, as well as to return convincing answers in
investigations.
F. Awareness of ethical and legal considerations
The use of OSINT should be restricted to legal activities
and non-malicious purposes. To achieve this, OSINT has to
be designed respecting user privacy and data protection
laws. Furthermore, OSINT tool developers should also take
into account that the end-user could be an offender trying to
commit a crime. For this reason, the use of the most powerful
tools should be limited to LEAs and Intelligence Services.
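At the software level, limiting the most powerful capabilities to LEAs and Intelligence Services could take the form of a role check in front of each sensitive function, plus an audit trail of every attempt. The sketch below assumes that design: the role names, the `restricted` decorator, `AUDIT_LOG` and the `link_identities` capability are hypothetical, and in practice such a check would sit alongside strong authentication and legal oversight rather than replace them.

```python
from functools import wraps
from typing import Callable, List, Tuple

# Assumed policy: only these roles may run the most intrusive capabilities.
AUTHORIZED_ROLES = {"lea", "intelligence_service"}
AUDIT_LOG: List[Tuple[str, str, bool]] = []  # (role, capability, granted)


class AccessDenied(PermissionError):
    pass


def restricted(func: Callable) -> Callable:
    """Gate a capability behind a role check and record every attempt."""
    @wraps(func)
    def wrapper(role: str, *args, **kwargs):
        granted = role in AUTHORIZED_ROLES
        AUDIT_LOG.append((role, func.__name__, granted))
        if not granted:
            raise AccessDenied(f"role {role!r} may not use {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@restricted
def link_identities(handle: str) -> str:
    # Hypothetical powerful capability (e.g. cross-platform identity linking).
    return f"profile report for {handle}"
```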
G. Summary
All the abovementioned challenges pave the way from
the Second Generation to the Third Generation of OSINT.
As presented in [24], the Second Generation started with
the rise of the Internet and Social Media, and its challenges
were “technical expertise, virtual accessibility and constant
acquisition”. In contrast, the Third Generation is expected
to emerge nowadays and will have to include “direct and
indirect machine processing of data, machine learning, and
automated reasoning”.
OSINT is transforming traditional intelligence processes
into automated procedures capable of extending investigations
to all parts of the world. In fact, it is available not only to Law
Enforcement Agencies (LEAs) and Intelligence Services, but
also to curious people without technical training. However,
serious approaches for turning OSINT into a robust,
self-managed solution are still lacking.
This paper has described the current status of this paradigm.
It revealed that the effectiveness of current works is questionable
due to their poor application in real scenarios. The article
also presented some OSINT techniques for basic searches and
described the most sophisticated OSINT tools for advanced
In the context of Spain, we pointed out some indications
that might confirm that Spanish LEAs use OSINT in their
internal procedures. Furthermore, we categorized Spain as a
goldmine due to its Open Data maturity, which is one of the
highest in Europe according to the European Data
Finally, the article outlined some open challenges related to
gathering, analyzing and extracting real knowledge from the
immensity of the Internet. Future research could address
such challenges by incorporating advanced techniques into OSINT
processes in order to improve their current performance.
Ultimately, the goal of OSINT is to reliably deliver the
desired findings for a given purpose in an automated,
self-driven way.
This work has been supported by a Leonardo Grant 2017
for Researchers and Cultural Creators awarded by the BBVA
Foundation; and by a Ramón y Cajal research contract
(RYC-2015-18210) granted by the MINECO (Spain) and co-funded
by the European Social Fund.
[1] B. Akhgar, P. S. Bayerl, and F. Sampson. Open Source Intelligence Investigation: From Strategy to Implementation. Springer Publishing Company, Incorporated, 1st edition, 2017.
[2] C. Aliprandi, J. Arraiza Irujo, M. Cuadros, S. Maier, F. Melero, and M. Raffaelli. Caper: Collaborative information, acquisition, processing, exploitation and reporting for the prevention of organised crime. In C. Stephanidis, editor, HCI International 2014 - Posters’ Extended Abstracts, pages 147–152, Cham, 2014. Springer International Publishing.
[3] G. Bello-Orgaz, J. J. Jung, and D. Camacho. Social big data: Recent achievements and new challenges. Information Fusion, 28:45–59, 2016.
[4] H. Chen, R. H. L. Chiang, and V. C. Storey. Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4):1165–1188,
[5] T. Day, H. Gibson, and S. Ramwell. Fusion of OSINT and Non-OSINT Data, pages 133–152. Springer International Publishing, Cham, 2016.
[6] T. Delavallade, P. Bertrand, and V. Thouvenot. Extracting Future Crime Indicators from Social Media. In Using Open Data to Detect Organized Crime Threats, pages 167–198. Springer International Publishing, Cham,
[7] DiSIEM project. Diversity Enhancements for Security Information and Event Management Project:
[8] C. S. Fleisher. Using open source data in developing competitive and marketing intelligence. European Journal of Marketing, 42(7/8):852–866, 2008.
[9] A. Gandomi and M. Haider. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2):137–144, 2015.
[10] F. Gómez Mármol, M. Gil Pérez, and G. Martínez Pérez. I don’t trust ICT: Research challenges in cyber security. In S. M. Habib, J. Vassileva, S. Mauw, and M. Mühlhäuser, editors, Trust Management X, pages 129–136, Cham, 2016. Springer International Publishing.
[11] M. J. Hernández, C. C. Pinzón, D. O. Díaz, J. C. C. García, and R. A. Pinto. Open source intelligence (OSINT) as support of cybersecurity operations. Use of OSINT in a Colombian context and sentiment analysis. Revista Vínculos: Ciencia, tecnología y sociedad, 15:29–40,
[12] J. Jang-Jaccard and S. Nepal. A survey of emerging threats in cybersecurity. Journal of Computer and System Sciences, 80(5):973–993, 2014.
[13] G. Kalpakis, T. Tsikrika, N. Cunningham, C. Iliou, S. Vrochidis, J. Middleton, and I. Kompatsiaris. OSINT and the Dark Web, pages 111–132. Springer International Publishing, Cham, 2016.
[14] M. Kandias, D. Gritzalis, V. Stavrou, and K. Nikoloulis. Stress level detection via OSN usage pattern and chronicity analysis: An OSINT threat intelligence module. Computers & Security, 69:3–17, Aug 2017.
[15] M. Kandias, L. Mitrou, V. Stavrou, and D. Gritzalis. Which side are you on? A new Panopticon vs. privacy. In 2013 International Conference on Security and Cryptography (SECRYPT), pages 1–13, July 2013.
[16] H. L. Larsen, J. M. Blanco, R. Pastor Pastor, and R. R. Yager, editors. Using Open Data to Detect Organized Crime Threats. Springer International Publishing, Cham, 2017.
[17] S. Lee and T. Shon. Open source intelligence base cyber threat inspection framework for critical infrastructures. In 2016 Future Technologies Conference (FTC), pages 1030–1033. IEEE, Dec 2016.
[18] S. C. Mercado. Sailing the Sea of OSINT in the Information Age. Journal of the American Intelligence Professional, 48(3), 2004.
[19] P. Nespoli, D. Papamartzivanos, F. Gómez Mármol, and G. Kambourakis. Optimal Countermeasures Selection against Cyber Attacks: A Comprehensive Survey on Reaction Frameworks. IEEE Communications Surveys and Tutorials, 20(2):1361–1396, 2018.
[20] R. P. Pastor and H. L. Larsen. Scanning of Open Data for Detection of Emerging Organized Crime Threats: The ePOOLICE Project. In Using Open Data to Detect Organized Crime Threats, pages 47–71. Springer International Publishing, Cham, 2017.
[21] D. Quick and K.-K. R. Choo. Digital forensic intelligence: Data subsets and Open Source Intelligence (DFINT+OSINT): A timely and cohesive mix. Future Generation Computer Systems, 78:558–567, Jan 2018.
[22] V. Santarcangelo, G. Oddo, M. Pilato, F. Valenti, and C. Fornaro. Social opinion mining: An approach for Italian language. In 2015 3rd International Conference on Future Internet of Things and Cloud, pages 693–697, Aug 2015.
[23] B. L. William Wong. Fluidity and Rigour: Addressing the Design Considerations for OSINT Tools and Processes, pages 167–185. Springer International Publishing, Cham, 2016.
[24] H. J. Williams and I. Blum. Defining Second Generation Open Source Intelligence (OSINT) for the Defense Enterprise, 2018.