Proceedings of the V Jornadas Nacionales de Investigación en Ciberseguridad (JNIC 2019)
Cáceres, Spain, June 5-7, 2019
Editors: Andrés Caro Lindo, Luis Javier García Villalba, Ana Lucila Sandoval Orozco
Universidad de Extremadura. Servicio de Publicaciones
OSINT is the next Internet goldmine:
Spain as an unexplored territory
Javier Pastor-Galindo*, Pantaleone Nespoli, Félix Gómez Mármol, and Gregorio Martínez Pérez
Department of Information and Communications Engineering, University of Murcia, 30100 Murcia, Spain
Email: {javierpg, pantaleone.nespoli, felixgm, gregorio}@um.es
Abstract—Phenomena like Social Networks, Cloud Computing
or the Internet of Things are unknowingly generating unimaginable
quantities of data. In this context, Open Source Intelligence
(OSINT) exploits such information to extract knowledge that is
not easily appreciable beforehand by the human eye. Apart from
the political, economic or social applications OSINT may bring,
there are also serious global concerns that could be covered by
this paradigm such as cyber crime and cyber threats. The paper
at hand presents the current state of OSINT, the opportunities
and limitations it poses, and the challenges to be faced in the
future. Furthermore, we particularly study Spain as a potential
beneficiary of this powerful methodology.
Index Terms—OSINT, Cyber security, Cyber defense, Cyber
intelligence, Spain, Law Enforcement Agencies, Threat Intelli-
gence
I. INTRODUCTION
Open Source Intelligence (OSINT) embraces a set of tech-
niques collecting information from different open sources
(e.g., legally available documents, social networks, public
activities of states, companies and society, etc.) in order
to infer knowledge to be used for a specific purpose [24].
Although it might seem to be a novel paradigm, it has actually
been around for a long time. For instance, during World
War II radio broadcasts were monitored to spy on adversaries. Already
in the year 1941, the Foreign Broadcast Information Service
(FBIS) was created by the USA to gather public information
from other countries. Even during the Cold War the Soviet
Union, China and other countries used OSINT through the
exploration of public documents and technical information of
foreign developments [18].
Traditional OSINT was conducted in a manual fashion in
the sense that it was necessary to have analysts in charge
of collecting public data and analyzing it in order to extract
knowledge. However, in the current information era a huge
and growing amount of data is available on the Internet [23].
As a consequence, the original OSINT processes become ineffective
under these demanding modern conditions.
This issue motivates the development of innovative
tools for automating the collection and analysis of data.
*Corresponding author
Fig. 1. OSINT principal use cases. Cybercrime and Organized Crime: spot illegal actions, retrieve suspicious traces, monitor malicious groups. Cyber Security and Cyber Defense: footprinting, forensic analysis, proactive auditing. Social Opinion and Sentiment Analysis: marketing, political campaigns, disaster management.
Nowadays, OSINT is widely used by governments and
intelligence services to conduct their investigations [1].
Nevertheless, it is not only used for state affairs; new
research lines are also taking advantage of this paradigm for
many purposes. Indeed, current research works tend to follow three
main applications, which are represented in Figure 1 and
described next:
• Social opinion and sentiment analysis: Driven by the
boom of social networks, it is possible to collect users'
interactions, messages, interests and preferences to extract
non-explicit knowledge. Such collection and analysis
could be applied to marketing, political campaigns,
disaster management or even cyber defense [3].
• Cybercrime and organized crime: Open data is
continuously analyzed and matched by OSINT processes in
order to spot criminal intentions at an early stage. Taking
into account adversaries' patterns and relationships between
felonies gives security forces the opportunity
to promptly detect illegal actions [16].
• Cyber security and cyber defense: Information and
Communication Technology systems are continuously attacked
by criminals [12]. Research is hence crucial
to defend ourselves from cyber attackers, specifically by
facing the challenges that are still open in the field of
cyber security [10]. In this sense, data science is being
applied not only to footprinting in penetration tests, but
also to the preventive protection of organizations and
companies, specifically by analyzing daily
attacks, correlating them and supporting decision making
for an effective defense and a prompt reaction
[19]. In the same way, OSINT can also be considered in
this context as a source of information for tracebacks and
investigations.
Sesión II: Monitorización de eventos de seguridad (JNIC 2019)
Additionally, it is important to note that the use of
public data also raises sensitive issues. There is a strong
ethical component linked to users' privacy. In
particular, the profiling of people [15] could reveal personal
details such as their political preference, sexual orientation or
religious beliefs.
This article addresses the current state of OSINT since, to
the best of our knowledge, there is no published work that
integrates the recent advances of this paradigm, the opportu-
nities it offers and the existing facilities to support OSINT
processes. In particular, we study the spread and use of
OSINT techniques in Spain.
Furthermore, our purpose is to stimulate research and
advances in OSINT. As we have seen so far, OSINT is a
promising mechanism that specifically improves the traditional
cyber intelligence and cyber defense fields. However, there is
still a long way to go in this topic, and this article
presents some future lines of research.
The remainder of this paper is organized as follows. Section
II offers a review of recent research works in the field of
OSINT. Section III discusses the motivation, pros and cons of
the development of OSINT. Then, Sections IV and V describe
some techniques and tools that facilitate searches through
open data sources. Section VI contextualizes OSINT in Spain
by describing some evidences of its usage and presenting
certain Spanish public databases. Section VII poses some open
challenges relative to research in OSINT. Finally, Section VIII
concludes with some key remarks, as well as future research
directions.
II. STATE OF THE ART
In recent years, with the advances of big data and data
mining techniques, the research community has noticed that
open data is a powerful source for analyzing social behaviors
and obtaining relevant information [4].
With regard to the use of OSINT for extracting social
opinion and emotions, Santarcangelo et al. [22] proposed a
model for determining user opinions about a given keyword
through social networks, specifically studying the adjectives,
intensifiers and negations used in tweets. Unfortunately, it is
a simple keyword-based solution designed only for the Italian
language, and it does not take semantic issues into account. On the
other hand, Kandias et al. [14] could relate people’s usage
of social networks (in particular, Facebook) to their stress
level. However, the experiments were carried out only with
405 users, while nowadays there is a chance of processing
much larger amounts of data.
In the context of cybercrime and organized crime, there
are several works that explore the application of OSINT for
criminal investigations. For example, OSINT could increase
the accuracy of prosecutions and arrests of culprits with
frameworks like the one proposed by Quick et al. in [21]. Specifically,
the authors apply OSINT to digital forensic data from a variety of
devices to enhance criminal intelligence analysis.
In this field, another opportunity that OSINT offers is the
detection of illegal actions as well as the prevention of future
crimes such as terrorist attacks, murders or rapes. In fact,
the European projects ePOOLICE [20] and CAPER [2] were
designed to develop effective models for scanning open data
automatically in order to analyze society and detect emerging
organized crime. In contrast to the previously mentioned
projects, whose proposals were not used in real
cases, Delavallade et al. [6] describe a model based on social
network data that is able to extract future crime indicators.
Such a model is then applied to copper theft and to jihadist
propaganda use cases.
From the point of view of cyber security and cyber
defense, OSINT represents a valuable tool for improving our
protection mechanisms against cyber attacks. Pinto et al. [11]
propose the use of OSINT in the Colombian context to prevent
attacks and even to allow strategic anticipation. It includes
not only plugins for collecting information, but also machine
learning models to perform sentiment analysis. Moreover, the
first goal of the DiSIEM European project [7] is the
integration of diverse OSINT data sources into current SIEM
(Security Information and Event Management) systems, to help react
to recently discovered vulnerabilities in the infrastructure or
even predict possible emerging threats. Lee et al. [17] also
designed an OSINT-based framework to inspect cyber security
threats to critical infrastructure networks. However, none of these
approaches has been applied to real-world scenarios, so
their effectiveness remains questionable.
III. BACKGROUND
The incredible growth of new technologies, services and
social networks based on the Internet is putting information
on the central axis of the world. In fact, a large part of it is
publicly available, which means that anyone at any time in
any place has access to this data.
Another ongoing phenomenon is the
evolution from traditional criminal techniques to cybercrime.
Extortion, fraud, identity theft or child exploitation are now
carried out through the network; burglary has become hacking
and fraudulent calls have become phishing.
Fortunately, almost every cybercriminal
uses the Internet not only for illegal actions, but also
for personal purposes. Leveraging this fact, OSINT seeks to
connect both facets through the analysis of public data to
produce cyber intelligence.
Pros | Cons
Huge amount of public information | Complexity of data management
High capacity of computing | Unstructured information
Big data and machine learning | Misinformation
Complementary types of data | Data sources reliability
Flexible purpose and wide scope | Strong ethical/legal considerations
TABLE I
OSINT PROS AND CONS
From a technical point of view, as we can see in TABLE I,
OSINT offers a number of benefits, although it also has to
deal with some restrictions. Regarding the positive points, we
could highlight the following:
• Huge amount of worthwhile open source data to be
analyzed, crossed and linked [23]. It includes social
networks, public government documents and reports, online
multimedia content, newspapers and even the Deep Web
and Dark Web, among others. The last two
sources are especially interesting for OSINT [13]. Both
the Deep Web and the Dark Web (the latter circumscribed
within the former) contain even more information than
the Surface Web (the Internet known by most users). In
order to be able to access these networks, it is necessary
to use specific tools since their contents are not indexed
by traditional search engines.
Unlike the Surface Web and most of the Deep Web, the
Dark Web offers anonymity and privacy to users who
utilize it. This property makes criminals use this network
to surf, conduct their searches and publish for illegitimate
purposes while hiding their identity. Therefore, the Dark
Web is an ideal source to apply OSINT and fight against
cybercrime, organized crime or cyber threats. On the
other hand, the pursuit and de-anonymization of
these users remains a challenge for OSINT.
• Powerful computing capacity to mix large sets of data,
relationships and patterns from different types of open
sources. In particular, it allows the creation of complex
inferences that are naturally unpredictable to humans [9].
• Emerging proliferation of big data and data mining
techniques, as well as machine learning algorithms, which can
automate investigation processes and decision making
and make them more intelligent and efficient [9].
• Possibility of complementing OSINT with other types of
information [5]. The system's inherent structure is open
enough to also accept classified data or citizen
collaboration within its engine.
• Generic implementation that allows different kinds of
targets and several paths of exploration. As a consequence,
OSINT applications could monitor suspicious
people or dangerous groups, detect influential profiles
related to radicalization, study worrying trends in
society, support the relative attribution of cyber attacks
and crimes, etc. [1].
However, using open source data also presents disadvan-
tages which need to be considered as well:
• The quantity of data is immeasurable and, logically, it is
challenging to handle it efficiently and effectively [8].
• The public information available on the Internet is
inherently unstructured. This means that the data collected by
OSINT is so heterogeneous that it is difficult to classify,
link and examine it in order to extract relationships
and knowledge [3].
• Social networks and communication media are flooded
with subjective opinions, fake news and canards [3]. For
this reason, the existence of misinformation has to be
considered in the implementation of OSINT mechanisms.
The reliability and authority of the information are indeed
the key to success.
• Ideally, the collected data should come from authoritative
and reviewed sources (official documents, scientific
reports, reliable communication media) [8]. In practice,
OSINT will also deal with subjective or non-authoritative
sources, such as the content of social networks or
manipulated media.
• Ethical and legal considerations are fundamental in the
development of OSINT. The results should respect users'
privacy and not reveal intimate and personal issues [15].
In fact, the scope of the searches should be, by definition,
limited to open data sources.
Since we cannot place police officers in every possible
communication channel in the world, there is still an opportunity
to use public data to detect anomalous patterns and malicious
behaviours. How calm would cybercriminals be if not only
every single step of their online actions, but also their daily
life, were relevant clues for investigators?
IV. OSINT TECHNIQUES
As has been shown, OSINT is quite promising and
powerful, but its implementation is also challenging. To begin
with, there are several manual techniques that provide
public data to the end user, as we will see next.
A. Search engines
Search engines such as Google, Bing or Yahoo are widely
known. Their traditional use is
the simplest way of applying OSINT.
Moreover, services like Google support operators to refine
searches (footnote 1). For instance, double quotes ("") force exact matches,
OR and AND act as logical operators, and * acts as a wildcard. Conditions
can also be added, such as filetype to specify a
certain file type, site to limit results to those from a specific
website, or intitle to find pages with certain keywords within
their title.
It is worth noting that, for example, a search in Google for
DNIs (i.e., Spanish ID cards) within the Region of Murcia
website outputs more than 15,000 results in less than half a
second through the following query:
site:carm.es filetype:pdf intext:dni
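As a minimal sketch of how such queries could be assembled programmatically, the following Python helper composes the operators described above into a single query string. The function names build_query and search_url are illustrative assumptions, and the Google search endpoint is used only to show how the query would be URL-encoded; no request is sent.

```python
from urllib.parse import urlencode


def build_query(exact=None, site=None, filetype=None, intext=None, intitle=None):
    """Compose a search-engine query from the operators described in the text."""
    parts = []
    if exact:
        parts.append(f'"{exact}"')  # double quotes force an exact match
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intext:
        parts.append(f"intext:{intext}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return the URL that would carry the query (assumed endpoint, nothing is fetched)."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query(site="carm.es", filetype="pdf", intext="dni")
print(query)
print(search_url(query))
```

Keeping the query construction separate from the URL encoding makes it easy to reuse the same query with a different search engine.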
B. Social networks
Nowadays, services like Facebook or Twitter are pervasive in
our society. Any curious person will have noticed that a lot of
personal information can be found without advanced knowledge
of these platforms. In addition, these applications offer precise
search possibilities in the context of OSINT.
1 https://support.google.com/websearch/answer/2466433?hl=en
Facebook permits specific queries by visiting crafted
URLs. For example, www.facebook.com/search/facebook-id/search-token,
where facebook-id is the user identifier and
search-token defines the criterion of the search (namely,
pages-liked, photos-liked, places-visited, etc.). Twitter not only
supports advanced searches through URLs, but also provides a
user-friendly interface at https://twitter.com/search-advanced.
Similar capabilities can be found, in some form, in the rest
of social networks.
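The URL pattern quoted above can be illustrated with a short Python sketch. It only formats the string: the https scheme, the function name and the sample identifier are assumptions, and the sketch does not check whether the platform still honors this pattern.

```python
from urllib.parse import quote

# Search tokens named in the text; the platform may accept others.
KNOWN_TOKENS = ("pages-liked", "photos-liked", "places-visited")


def facebook_search_url(facebook_id, search_token):
    """Format www.facebook.com/search/<facebook-id>/<search-token> as a URL."""
    if search_token not in KNOWN_TOKENS:
        raise ValueError(f"unknown search token: {search_token}")
    return f"https://www.facebook.com/search/{quote(facebook_id, safe='')}/{search_token}"


# "1234567890" is a made-up identifier used only for illustration.
print(facebook_search_url("1234567890", "photos-liked"))
```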
C. Other OSINT services
There are other specific websites that offer relevant infor-
mation given a certain kind of input:
• Email address: The website hunter.io reports whether
an email address is valid, haveibeenpwned.com
reports whether an email address appears in a known data breach, and
pipl.com finds information related to the owner of the
address.
• Username: The service knowem.com checks the availability
of a given username in social networks or domains.
• Real name: Apart from social networks, there are genealogy
sites like FamilySearch or GENi that provide kinship
information.
• Location: Google Maps or Wikimapia are well-known
sites to find locations from GPS coordinates. Conversely,
it is also possible to get the GPS coordinates
from a location name at www.gps-coordinates.net.
• IP address: The service www.iplocation.net gets the
location of a given IP address, whereas viewdns.info
provides more technical information (whois, reverse IP
lookup, traceroute, etc.).
• Domain: It is possible to visualize domain
connections through www.threatcrowd.org or
www.visualsitemapper.com. Furthermore, checking
DNS and mail servers is also useful, by visiting
www.domaincrawler.com or who.is/dns. There are also
services like www.alexa.com and www.similarweb.com,
which calculate traffic statistics, and others like
findsubdomains.com, which search for subdomains.
Finally, the site web.archive.org gives access to archived
content of a number of domains.
It is clear that, by combining different techniques, it is
possible to produce extremely useful knowledge about any
connected target. Nevertheless, each of these resources
is general-purpose and its scope is limited to a specific field.
For that reason, researchers and developers have implemented
more precise solutions for gathering better quality information.
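To illustrate how these services could be combined, the sketch below guesses the kind of a given input and returns the sites listed above for that kind. The classification rules are simple heuristics chosen for illustration (for example, a username containing a dot would be taken for a domain), and the names SERVICES, classify and suggest_services are assumptions rather than part of any existing tool.

```python
import ipaddress
import re

# Services mentioned in the text, grouped by the kind of input they accept.
SERVICES = {
    "email": ["hunter.io", "haveibeenpwned.com", "pipl.com"],
    "username": ["knowem.com"],
    "ip": ["www.iplocation.net", "viewdns.info"],
    "domain": ["www.threatcrowd.org", "www.visualsitemapper.com",
               "www.domaincrawler.com", "who.is/dns",
               "findsubdomains.com", "web.archive.org"],
}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DOMAIN_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)


def classify(value):
    """Guess whether the input is an IP address, email, domain or username."""
    value = value.strip()
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if EMAIL_RE.match(value):
        return "email"
    if DOMAIN_RE.match(value):
        return "domain"
    return "username"  # fallback when nothing else matches


def suggest_services(value):
    """Return the services from the list above that accept this kind of input."""
    return SERVICES[classify(value)]


print(classify("javierpg@um.es"), suggest_services("javierpg@um.es"))
```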
V. OSINT TOOLS
Fortunately for people conducting OSINT activities, there
are also more sophisticated tools that automate the collection
of public information and infer interesting relationships.
TABLE II presents the main features of the most popular
and relevant open source OSINT tools today. Nevertheless,
a complete view of the variety of OSINT resources can be
found in the OSINT Framework (footnote 2).
A. FOCA (Fingerprinting Organizations with Collected Archives)
This product (footnote 3), designed by ElevenPaths, analyzes the
metadata of documents (Microsoft Office, PDF, OpenOffice,
etc.) available on the Internet. The software finds the hidden
information, unifies it, and recognises the files that have been
created on the same computer, as well as servers and clients that
could be related to them. The server discovery module also
includes further functionality such as web search, DNS search or
IP resolution.
B. IntelTechniques
IntelTechniques is a website (footnote 4), created by Michael
Bazzell, which offers hundreds of online search utilities. There
are several modules, divided by target data, that allow
searching by email, social network profile, real name and user
name, among others, in order to present the collected public
information to the end user. It is a comprehensive tool that
makes use of other simpler techniques, such as those described
previously.
C. Maltego
Maltego is a well-known application (footnote 5) that finds public
information about a certain target in different sources and
presents it in the form of a directed graph for analysis.
Specifically, this tool automatically infers advanced relationships (from data
X to data Y) with the so-called transforms.
Although Maltego implements its generic transforms, it is
also possible to implement and include custom ones for more
specific purposes. For example, it would be very interesting
to develop OSINT transforms for the Spanish context in order
to take advantage of the existing open sources of Spain.
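The idea of a transform (a function from data X to data Y whose results become edges of a directed graph) can be sketched in plain Python. This is not Maltego's API: the Entity class, the email_to_domain transform and the apply_transform helper are hypothetical names used only to show the concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str   # e.g. "email", "domain"
    value: str


def email_to_domain(entity):
    """Hypothetical transform: derive the mail domain from an email address."""
    if entity.kind != "email" or "@" not in entity.value:
        return []
    return [Entity("domain", entity.value.rsplit("@", 1)[1].lower())]


def apply_transform(transform, entity):
    """Return the directed edges (source, target) produced by one transform."""
    return [(entity, target) for target in transform(entity)]


edges = apply_transform(email_to_domain, Entity("email", "javierpg@um.es"))
for source, target in edges:
    print(f"{source.kind}:{source.value} -> {target.kind}:{target.value}")
```

A custom transform for a specific context would follow the same shape: take one entity, query a source, and return the related entities it finds.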
D. Metagoofil
Metagoofil is an information gathering tool (footnote 6) that extracts
metadata from the files found for a specific domain or URL
target. It is usually used for pentesting as it is able to reveal
usernames, software versions and server or machine names.
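To give an idea of the kind of hidden information that document metadata carries, the following sketch reads the author fields of an Office Open XML file (.docx, .xlsx, .pptx) with the Python standard library. It is not how FOCA or Metagoofil are implemented; the helper names are assumptions, and the sample archive is built in memory with made-up usernames so that the example runs without any external file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def core_properties(data):
    """Read author-related metadata from the bytes of an OOXML document."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    fields = {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"}
    found = {}
    for name, path in fields.items():
        node = root.find(path, NS)
        if node is not None and node.text:
            found[name] = node.text
    return found


def make_sample():
    """Build a tiny in-memory archive that mimics the metadata part of a .docx."""
    xml = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:creator>jdoe</dc:creator>"
        "<cp:lastModifiedBy>asmith</cp:lastModifiedBy>"
        "</cp:coreProperties>" % (NS["cp"], NS["dc"])
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", xml)
    return buffer.getvalue()


print(core_properties(make_sample()))
```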
E. Recon-NG
Recon-NG is a web reconnaissance framework (footnote 7), similar to
Metasploit (footnote 8), which focuses its search depending on the loaded
modules and the input provided. It can obtain emails of
the organization, locations, information about the administrator
and users, whois information, etc.
2 osintframework.com
3 https://www.elevenpaths.com/es/labstools/foca-2
4 https://inteltechniques.com
5 https://www.paterva.com/web7/buy/maltego-clients.php
6 https://github.com/laramies/metagoofil
7 https://bitbucket.org/LaNMaSteR53/recon-ng/wiki/browse
8 https://www.metasploit.com/
OSINT tool | Input: identity data | Input: network data | Input: file | Input: data source | Output | Extensibility | Interface | Platform | Other features
FOCA | ✗ | Domain | File type | Google, Bing, Exalead | Metadata | ✗ | Program | Linux, Windows | Web, DNS and IP refeed
IntelTechniques | Personal information, company, community | Domain, IP address | File name, file type, file URL | Several | Multiple results | ✗ | Web interface | Online | Location, public records, OSINT virtual machine
Maltego | Personal information, company, community | Domain | File URL | ✗ | Multiple results | Custom transforms | Program | Linux, Windows, MAC | Location, auto input/output refeed, results in oriented graph
Metagoofil | ✗ | Domain | File type | ✗ | Metadata | ✗ | Command line | Linux, Windows | Limit of results
Recon-NG | Personal information | Domain | ✗ | Several | Multiple results | ✗ | Command line | Linux | Location, modules for discovery and exploitation
Shodan | Country, city, keyword, hostname | Operating system, IP address, port | ✗ | ✗ | Network info | ✗ | Web interface | Online | Location, webcam captures
Spiderfoot | Email | Domain, IP address, subnet | ✗ | Several | Multiple results | Custom modules | Web interface | Linux, Windows | Modules for discovery, results in oriented graph
The Harvester | Company | Domain, DNS server | ✗ | Several | Network info | ✗ | Command line | Linux, Windows, MAC | Results in reports, limit of results
TABLE II
MAIN FEATURES OF THE SELECTED OSINT TOOLS
F. Shodan
Shodan is a search engine (footnote 9) that provides public information
about Internet-connected nodes, including IoT devices. The
information is collected through protocols like HTTP
or SSH, and searches can be filtered by IP address, country
name or even keywords. In general, it is used for network
purposes, such as monitoring network security
or exploring network topologies.
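The filters mentioned above are typically written as name:value pairs next to free-text keywords. The sketch below only builds such a query string and does not contact Shodan; the filter names in ALLOWED_FILTERS reflect our understanding of Shodan's search syntax and should be checked against its current documentation, and shodan_query is an illustrative name.

```python
# Filter names assumed from Shodan's search syntax; verify before relying on them.
ALLOWED_FILTERS = ("country", "city", "hostname", "os", "port", "net")


def shodan_query(keywords="", **filters):
    """Combine free-text keywords with name:value filters into one query string."""
    parts = [keywords] if keywords else []
    for name, value in filters.items():
        if name not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {name}")
        text = str(value)
        if " " in text:
            text = f'"{text}"'  # quote values that contain spaces
        parts.append(f"{name}:{text}")
    return " ".join(parts)


print(shodan_query("webcam", country="ES", port=80))
```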
G. Spiderfoot
Similar to Maltego, Spiderfoot is a reconnaissance tool (footnote 10)
that automatically goes through lots of public data sources to
compile intelligence related to IP addresses, domain names, e-
mail addresses, names and more. Given the target, Spiderfoot
uses the selected modules (equivalent to transforms) to per-
form its analysis. The results are represented in a node graph
with all the found entities and relationships. In this case it is
also possible to define our own modules.
H. The Harvester
This software (footnote 11) allows the collection of public information
related to a domain or company name. In particular, it is
capable of listing emails of the company or hosts related
to the domain. It can also export the results as user-friendly
HTML/XML reports.
9 https://www.shodan.io
10 https://www.spiderfoot.net
11 https://github.com/laramies/theharvester
I. OSINT tools comparison
Depending on the user needs (see TABLE II), some tools
will be more suitable than others for a given task.
If we want to extract hidden information from files, FOCA
and Metagoofil are specific tools designed for this purpose.
In particular, the former seems to be more complete
than the latter, in the sense that it is able to infer more
information from the metadata.
If we are looking for network-focused information,
Shodan and The Harvester are interesting options.
However, we would recommend Shodan, as it permits
a wider variety of inputs, offers a user-friendly interface and
does not require installation.
Finally, if the aim of the search is to gather as much
information as possible for a given input, the resources
IntelTechniques, Maltego, Recon-NG and Spiderfoot will return
diverse data and relationships. Among them, the most
sophisticated ones would be IntelTechniques and Maltego. The
former offers different types of search that operate
over a very large number of data sources, but it is not
as integrated as Maltego. In fact, the latter implements
automated inference processes between inputs and outputs that
widen the scope of the original search. Moreover, it is extensible
with custom discovery procedures.
Logically, although this comparison has been made accord-
ing to the desired output, in practice the user will be limited by
the available input and the data type accepted by OSINT tools.
Finally, note that these tools are complementary, meaning that
a deep OSINT investigation could profit from all of them.
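The selection reasoning of this comparison can be sketched as a lookup over a simplified encoding of TABLE II. Only two attributes are kept (the input categories each tool accepts and its main kind of output), and the names TOOLS and candidates are assumptions made for this illustration.

```python
# Simplified encoding of TABLE II: accepted input categories and main output.
TOOLS = {
    "FOCA": {"inputs": {"network", "file"}, "output": "metadata"},
    "IntelTechniques": {"inputs": {"identity", "network", "file"}, "output": "multiple results"},
    "Maltego": {"inputs": {"identity", "network", "file"}, "output": "multiple results"},
    "Metagoofil": {"inputs": {"network", "file"}, "output": "metadata"},
    "Recon-NG": {"inputs": {"identity", "network"}, "output": "multiple results"},
    "Shodan": {"inputs": {"identity", "network"}, "output": "network info"},
    "Spiderfoot": {"inputs": {"identity", "network"}, "output": "multiple results"},
    "The Harvester": {"inputs": {"identity", "network"}, "output": "network info"},
}


def candidates(available_input, wanted_output=None):
    """List the tools that accept the available input and give the wanted output."""
    return [
        name
        for name, spec in TOOLS.items()
        if available_input in spec["inputs"]
        and (wanted_output is None or spec["output"] == wanted_output)
    ]


print(candidates("file", "metadata"))
print(candidates("network", "network info"))
```

Filtering first by the available input mirrors the remark that, in practice, the user is limited by the input at hand before the desired output can be considered.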
VI. OSINT IN SPAIN
Intelligence services have been traditionally associated with
the labour of Law Enforcement Agencies (LEAs) and Military
Bodies. In the same way, OSINT is considered nowadays as an
important key of classified investigations and secret operations
in state affairs [1].
As far as we were able to explore in the official websites,
reports and documentation, government organizations seem
to implement internal mechanisms which basically consist
of gathering raw information and transforming it into useful
knowledge. As representative examples, we could mention the
United States Federal Bureau of Investigation (FBI), the United
States Central Intelligence Agency (CIA), the Canadian Security
Intelligence Service (CSIS), EUROPOL, the North Atlantic Treaty
Organization (NATO), the US Department of the Army (DA) or the
National Policing Improvement Agency (NPIA) of England.
In Spain, the situation is quite similar. It is not easy to find
clear evidence of the application of OSINT by the state forces.
The confidentiality of this type of agency makes it difficult
to discover their internal operating mode and the impact of
OSINT on their current investigations. Nevertheless, as a result
of a deep search, we have some subtle findings that confirm
that OSINT is indeed currently used by Spanish LEAs:
• As early as 2007, the director of the CNI (i.e., Spanish
National Intelligence Agency) said (footnote 12) that open sources
were “fundamental to the elaboration and work of Intelligence”.
• CIFAS (i.e., Spanish Military Intelligence Agency) also
seems to use OSINT as a way of obtaining information.
We have found some slides that confirm this, dated
2008, which are available on the Spanish Defense Staff
website (footnote 13).
• In 2010, when the director of the CNI announced (footnote 14) the
creation of an ethical code for special agents, he also
insisted on the fact that modern intelligence was not just
based on physical presence, as today “you might get more
information sitting on a computer, exploring messages
from the bad guys”.
• In 2017, the Spanish Ministry of Defense opened a public
call (footnote 15) for the contract called “Development of OSINT tool
based on IDOL HAVEN platform”.
• At present, the Spanish Army is designing a new
model called Brigade 2035 (footnote 16), which incorporates
innovative technological advances for enhancing operations.
In this project, one of the defined combat functions is
Intelligence, which clearly states OSINT as a key
responsibility: “Other facilities of growing importance will be
open source obtainment (including social networking)...”.
• The Spanish Ministry of the Interior has published in the
Annual Recruitment Plan for 2019 (footnote 17) some investments in
“systems for obtaining OSINT in the cyberspace”.
12 https://www.elconfidencialdigital.com/articulo/vivir/CNI-califica-fundamental-abiertas-contradice/20071023000000049386.html
13 http://www.emad.mde.es/Galerias/EMAD/novemad/fichero/EMD-CIFAS-esp.pdf
14 https://www.lavanguardia.com/politica/20100624/53951898847/el-director-del-cni-anuncia-un-codigo-etico-para-los-agentes-secretos.html
15 https://contrataciondelestado.es/wps/wcm/connect/ff96fa82-7fd6-40bd-be5b-36ef3fd4e65b/DOC CN2017-498874.pdf?MOD=AJPERES
16 http://www.ejercito.mde.es/estructura/briex 2035/principal.html
Bearing in mind all these points, it seems that currently
OSINT is indeed relevant in the internal affairs of Spain. In
addition, note that to be effective, this paradigm depends on
the public data available on the Internet, among other sources.
In this regard, apart from social networks and other open
data sources, there are authoritative Spanish sites where public
information is published.
According to the European Data Portal and its official
reports (footnote 18) about Open Data maturity across Europe, Spain is
one of the most advanced countries in transparency and open
data. In fact, it has been in first or second position in the
Open Data Maturity ranking in the last four years. As
indicated there, the Spanish Government has promoted more
than 160 open data initiatives and has over 23,800 public
information catalogues. For example, the Open Data Initiative
of the Government of Spain (footnote 19) is clear proof of how Spain
encourages transparency. OSINT could benefit from that, but
it would have to deal with aggregated and statistical information by
linking it and inferring new knowledge.
However, it would be more interesting to analyze govern-
mental platforms which are not anonymized. For instance, the
Spanish Ministry of the Treasury, the Spanish Ministry of the
Interior or the Spanish Ministry of Defense usually publish
documents with personal information (site:hacienda.gob.es
filetype:pdf intext:dni, for example). In the same way, this
could be also applied to Spanish Autonomous Communities
websites.
Moreover, Europe also has a public data platform (footnote 20) where
we could find a lot of public information. For instance, in
the context of foreign policy and security, an updated list of
financial sanctions is presented in the “European Union Con-
solidated Financial Sanctions List” document. In particular,
it reveals personal information about individuals, groups and
entities.
All the aforementioned facts demonstrate that Europe, and
especially Spain, are adopting strong Open Data policies. As a
direct consequence, the amount of objective data available on
the Internet is rapidly increasing. OSINT should, in addition
to other open sources of information, take advantage of
this powerful opportunity to collect, analyze, link and infer
knowledge from reliable and official sources.
17 http://www.defensa.gob.es/Galerias/gabinete/ficheros docs/2019/PACDEF 2019 Documento Pxblico.pdf
18 https://www.europeandataportal.eu/en/dashboard#2018
19 https://datos.gob.es/es
20 http://data.europa.eu/euodp/en/data
VII. OPEN CHALLENGES
After a review of the existing OSINT techniques, tools and
status, it is also necessary to enumerate some gaps of this
paradigm. Although OSINT seems to be ideal, in reality it
needs to become more sophisticated and applicable to
uncontrolled real-world scenarios. As far as we know,
there are some challenges that are not solved yet and should
be faced by the research community:
A. Propagation of the gathering process
With the development of big data and data mining techniques,
manual data collection should no longer be necessary.
Although OSINT techniques (Section IV)
and tools (Section V) improve on this, they are limited to
single, basic explorations. In this sense, it would be appealing
to implement feedback mechanisms that chain searches, feeding
the outputs of one search back in as the inputs of the next.
As a consequence, the original search would yield not only
direct inferences, but also indirect and implicit relationships.
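The idea of chaining searches from outputs to inputs can be
sketched as a breadth-first expansion. The following Python
snippet is only an illustration under stated assumptions: the
names chained_search, Transform and toy_transform are
hypothetical, and the toy index stands in for real lookups
(search engines, social networks, registries) that an actual
OSINT tool would perform.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set, Tuple

# A "transform" runs one search: it takes a term and returns related terms.
Transform = Callable[[str], Iterable[str]]


def chained_search(seed: str, transform: Transform,
                   max_depth: int = 2) -> Dict[str, Tuple[int, str]]:
    """Feed every search output back in as the input of a new search.

    Returns a mapping from each discovered term to (depth, parent).
    Depth 1 entries are direct findings; deeper entries are indirect ones.
    """
    found: Dict[str, Tuple[int, str]] = {}
    visited: Set[str] = {seed}
    queue = deque([(seed, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the configured depth
        for result in transform(term):
            if result in visited:
                continue  # avoid loops and repeated searches
            visited.add(result)
            found[result] = (depth + 1, term)
            queue.append((result, depth + 1))
    return found


# Toy data (an assumption for illustration), replacing real lookups.
TOY_INDEX = {
    "alice@example.org": ["alice_dev", "example.org"],
    "alice_dev": ["github.com/alice_dev"],
    "example.org": ["192.0.2.10"],
}


def toy_transform(term: str) -> Iterable[str]:
    return TOY_INDEX.get(term, [])


links = chained_search("alice@example.org", toy_transform, max_depth=2)
```

Bounding the depth and tracking visited terms keeps the
expansion finite; without them, chained searches could loop or
grow without limit.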
B. Integration of several open data sources
OSINT activities should consult as many sources as possible
in order to cover the widest possible spectrum. This means
that the system has to normalize the gathered information,
which is typically unstructured, in order to perform an effective
analysis. In this context, it is important to filter out repeated items.
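A minimal sketch of this normalization and filtering step is
given below in Python. The Record type, the function names and
the sample findings are assumptions made for illustration;
lowercasing and whitespace collapsing are a deliberately simple
normalization that a real system would refine per data type.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """One normalized finding: its type, its value and its origin."""
    kind: str
    value: str
    source: str


def normalize(kind: str, raw_value: str, source: str) -> Record:
    """Collapse whitespace and lowercase the value so that equivalent
    items gathered from different sources compare equal."""
    cleaned = " ".join(raw_value.split()).lower()
    return Record(kind.strip().lower(), cleaned, source)


def merge_records(records: Iterable[Record]) -> Dict[Tuple[str, str], List[str]]:
    """Filter repeated items, keeping the sources that reported each one."""
    merged: Dict[Tuple[str, str], List[str]] = {}
    for record in records:
        sources = merged.setdefault((record.kind, record.value), [])
        if record.source not in sources:
            sources.append(record.source)
    return merged


# Hypothetical raw findings from two sources.
raw_findings = [
    ("email", "  Alice@Example.org ", "source_a"),
    ("email", "alice@example.org", "source_b"),
    ("name", "Alice   Doe", "source_a"),
]
merged = merge_records(normalize(*item) for item in raw_findings)
```

Keeping the list of sources for each merged item, instead of
simply dropping duplicates, preserves how many independent
sources reported the same finding.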
C. Detection of irrelevant data and misinformation
Due to the huge amount of publicly available data, an
OSINT process must be able to assess the relevance of each
piece of information and discard data that does not add
value. Furthermore, it is crucial to detect misinformation
that would corrupt the results. In fact, it would also be
interesting to analyze fake information with the aim
of extracting intelligence from it.
D. Extension across the whole world
One of the main drawbacks of the existing OSINT resources
is that they are usually oriented to specific countries. As
a result, they leave aside other interesting open data
sources from different territories. Taking this drawback
into account, interoperability is a desirable property to be
considered in OSINT design. Note that it would widen the
scope of the searches and increase adoption by end users.
In Spain, for instance, we use tools that are designed in (and
for) foreign countries. However, there are no OSINT solutions
that include Spanish public repositories (such as government
open data platforms) in the gathering phase. In this
sense, we are not benefiting from the goldmine that comes with
being one of the most transparent countries in Europe.
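One possible way to make the gathering phase interoperable is to
describe every open data source through a common interface and
to select sources by territory. The Python sketch below is an
assumption about how this could look, not an existing tool: the
class names and region codes are hypothetical, and only the two
portal URLs are taken from the footnotes of this paper. No
network access is performed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OpenDataSource:
    """Description of one open data portal available to the gathering phase."""
    name: str
    region: str  # country or region code chosen for this sketch, e.g. "ES"
    portal_url: str


class SourceRegistry:
    """Groups sources by region so that a search can span several territories."""

    def __init__(self) -> None:
        self._by_region: Dict[str, List[OpenDataSource]] = {}

    def register(self, source: OpenDataSource) -> None:
        self._by_region.setdefault(source.region.upper(), []).append(source)

    def for_regions(self, *regions: str) -> List[OpenDataSource]:
        """Return the sources of the requested regions, in request order."""
        selected: List[OpenDataSource] = []
        for region in regions:
            selected.extend(self._by_region.get(region.upper(), []))
        return selected


registry = SourceRegistry()
registry.register(OpenDataSource("Spanish Open Data Initiative", "ES",
                                 "https://datos.gob.es/es"))
registry.register(OpenDataSource("EU Open Data Portal", "EU",
                                 "http://data.europa.eu/euodp/en/data"))

spanish_and_european = registry.for_regions("es", "eu")
```

Adding a new territory then amounts to registering further
sources, without changing the code that runs the searches.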
E. Enhancement of the analysis process
OSINT analysis is not as intelligent as it could be. The
existing tools merely present all the information found
and its relationships. However, the analysis process should
incorporate semantic analysis, the study of patterns, and
correlation with other events, occurrences or datasets.
Ideally, the OSINT of the future should be able to provide
end users with the specific piece of information they are
searching for, as well as to return convincing answers in
investigations.
F. Awareness of ethical and legal considerations
The use of OSINT should be restricted to legal activities
and non-malicious purposes. To achieve this, OSINT has to
be designed to respect user privacy and data protection
laws. Furthermore, OSINT tool developers should also bear
in mind that the end user could be a criminal trying to
commit a crime. For this reason, the use of the most powerful
tools should be limited to LEAs and Intelligence Services.
G. Summary
All the abovementioned challenges pave the way from
the Second Generation to the Third Generation of OSINT.
As presented in [24], the Second Generation started with
the rise of the Internet and Social Media, and its challenges
were “technical expertise, virtual accessibility and constant
acquisition”. In contrast, the evolution to the Third Generation
is expected to be taking place now and will have to include
“direct and indirect machine processing of data, machine
learning, and automated reasoning”.
VIII. CONCLUSIONS AND FUTURE WORK
OSINT is turning traditional intelligence processes
into an automated procedure capable of taking investigations
to all parts of the world. In fact, it is available not only to Law
Enforcement Agencies (LEAs) and Intelligence Services, but
also to curious people without technical training. However,
there is still a lack of serious approaches for transforming
OSINT into a robust and self-managed solution.
This paper described the current status of this paradigm. It
revealed that the effectiveness of current work is questionable
due to its limited application in real scenarios. The article
also presented some OSINT techniques for basic searches and
described the most sophisticated OSINT tools for advanced
investigations.
In the context of Spain, we pointed out some indications
suggesting that Spanish LEAs use OSINT in their
internal procedures. Furthermore, we characterized Spain as a
goldmine due to its Open Data maturity, which is in fact one
of the highest in Europe according to the European Data
Portal.
Finally, the article outlined some open challenges related to
gathering, analyzing and extracting real knowledge from the
immensity of the Internet. Future work could address
such challenges by including advanced techniques in OSINT
processes in order to improve current performance. To this
end, the ultimate goal of OSINT is to be able to guarantee the
desired finding for a given purpose, in an automated and
self-driven way.
ACKNOWLEDGMENT
This work has been supported by a Leonardo Grant 2017
for Researchers and Cultural Creators awarded by the BBVA
Foundation; and by a Ramón y Cajal research contract
(RYC-2015-18210) granted by the MINECO (Spain) and co-funded
by the European Social Fund.
REFERENCES
[1] B. Akhgar, P. S. Bayerl, and F. Sampson. Open Source Intelligence
Investigation: From Strategy to Implementation. Springer Publishing
Company, Incorporated, 1st edition, 2017.
[2] C. Aliprandi, J. Arraiza Irujo, M. Cuadros, S. Maier, F. Melero, and
M. Raffaelli. Caper: Collaborative information, acquisition, processing,
exploitation and reporting for the prevention of organised crime. In
C. Stephanidis, editor, HCI International 2014 - Posters’ Extended Ab-
stracts, pages 147–152, Cham, 2014. Springer International Publishing.
[3] G. Bello-Orgaz, J. J. Jung, and D. Camacho. Social big data: Recent
achievements and new challenges. Information Fusion, 28:45–59, 2016.
[4] H. Chen, R. H. L. Chiang, and V. C. Storey. Business intelligence and
analytics: From big data to big impact. MIS Quarterly, 36(4):1165–1188,
2012.
[5] T. Day, H. Gibson, and S. Ramwell. Fusion of OSINT and Non-OSINT
Data, pages 133–152. Springer International Publishing, Cham, 2016.
[6] T. Delavallade, P. Bertrand, and V. Thouvenot. Extracting Future Crime
Indicators from Social Media. In Using Open Data to Detect Organized
Crime Threats, pages 167–198. Springer International Publishing, Cham,
2017.
[7] DiSIEM project. Diversity Enhancements for Security Information and
Event Management Project: http://disiem-project.eu/.
[8] C. S. Fleisher. Using open source data in developing competitive and
marketing intelligence. European Journal of Marketing, 42(7/8):852–
866, 2008.
[9] A. Gandomi and M. Haider. Beyond the hype: Big data concepts,
methods, and analytics. International Journal of Information Management,
35(2):137–144, 2015.
[10] F. Gómez Mármol, M. Gil Pérez, and G. Martínez Pérez. I don’t trust
ICT: Research challenges in cyber security. In S. M. Habib, J. Vassileva,
S. Mauw, and M. Mühlhäuser, editors, Trust Management X, pages 129–
136, Cham, 2016. Springer International Publishing.
[11] M. J. Hernández, C. C. Pinzón, D. O. Díaz, J. C. C. García, and R. A.
Pinto. Open source intelligence (OSINT) as support of cybersecurity
operations. Use of OSINT in a Colombian context and sentiment
analysis. Revista Vínculos: Ciencia, tecnología y sociedad, 15:29–40,
2018.
[12] J. Jang-Jaccard and S. Nepal. A survey of emerging threats in
cybersecurity. Journal of Computer and System Sciences, 80(5):973–
993, 2014.
[13] G. Kalpakis, T. Tsikrika, N. Cunningham, C. Iliou, S. Vrochidis,
J. Middleton, and I. Kompatsiaris. OSINT and the Dark Web, pages
111–132. Springer International Publishing, Cham, 2016.
[14] M. Kandias, D. Gritzalis, V. Stavrou, and K. Nikoloulis. Stress level
detection via OSN usage pattern and chronicity analysis: An OSINT
threat intelligence module. Computers & Security, 69:3–17, aug 2017.
[15] M. Kandias, L. Mitrou, V. Stavrou, and D. Gritzalis. Which side are you
on? A new Panopticon vs. privacy. In 2013 International Conference
on Security and Cryptography (SECRYPT), pages 1–13, July 2013.
[16] H. L. Larsen, J. M. Blanco, R. Pastor Pastor, and R. R. Yager,
editors. Using Open Data to Detect Organized Crime Threats. Springer
International Publishing, Cham, 2017.
[17] S. Lee and T. Shon. Open source intelligence base cyber threat inspec-
tion framework for critical infrastructures. In 2016 Future Technologies
Conference (FTC), pages 1030–1033. IEEE, dec 2016.
[18] S. C. Mercado. Sailing the Sea of OSINT in the Information Age.
Journal of the American Intelligence Professional, 48(3), 2004.
[19] P. Nespoli, D. Papamartzivanos, F. Gómez Mármol, and G. Kambourakis.
Optimal Countermeasures Selection against Cyber Attacks: A
Comprehensive Survey on Reaction Frameworks. IEEE Communications
Surveys and Tutorials, 20(2):1361–1396, 2018.
[20] R. P. Pastor and H. L. Larsen. Scanning of Open Data for Detection of
Emerging Organized Crime Threats: The ePOOLICE Project. In Using
Open Data to Detect Organized Crime Threats, pages 47–71. Springer
International Publishing, Cham, 2017.
[21] D. Quick and K.-K. R. Choo. Digital forensic intelligence: Data subsets
and Open Source Intelligence (DFINT+OSINT): A timely and cohesive
mix. Future Generation Computer Systems, 78:558–567, jan 2018.
[22] V. Santarcangelo, G. Oddo, M. Pilato, F. Valenti, and C. Fornaro.
Social opinion mining: An approach for italian language. In 2015 3rd
International Conference on Future Internet of Things and Cloud, pages
693–697, Aug 2015.
[23] B. L. William Wong. Fluidity and Rigour: Addressing the Design Con-
siderations for OSINT Tools and Processes, pages 167–185. Springer
International Publishing, Cham, 2016.
[24] H. J. Williams and I. Blum. Defining Second Generation Open Source
Intelligence (OSINT) for the Defense Enterprise, 2018.