Semantic Tools for Forensics: A Highly Adaptable
Framework
Michael Spranger, Stefan Schildbach, Florian Heinke, Steffen Grunert and Dirk Labudde
University of Applied Sciences Mittweida
Bioinformatics Group
Department of MNI
Germany, 09691 Mittweida
Email: {name.surname}@hs-mittweida.de
Abstract—Textual information, and data annotated with textual information (meta-information), are regular targets when relevant material is secured or confiscated in criminal proceedings. The evaluation of such material is generally complex: the amount of data keeps growing as storage becomes cheaper, so manual searching and the identification of valid relations are enormously laborious, error-prone and slow. In addition, time limits and data privacy protection make searching even more difficult. This work presents the development of a (semi-)automatic, highly modular solution for exploring this kind of data using methods and technologies from computational linguistics. From a scientific perspective, the biggest challenge is the automatic handling of fragmented or defective texts and of hidden semantics. A domain-specific language has been defined using the model-driven approach of the Eclipse Modeling Framework for the purpose of developing forensic taxonomies and ontologies. Based on this, role-based editors have been developed that allow the definition of case-based ontologies and taxonomies and the manual annotation of texts. The next development steps will include the comparison of several back-end frameworks, e.g., for indexing, information extraction and querying, and the provision of a graphical representation of relations as a knowledge map. Finally, the overall process needs to be optimized and automated.
Keywords—forensic; ontology; taxonomy; querying; framework.
I. INTRODUCTION
The analysis of texts retrieved from a variety of sources,
e.g., secured or confiscated storage devices, computers and
social networks, as well as the extraction of information, are
two of the main tasks in criminal proceedings for agents or
other parties involved in forensic investigations. However, the
heterogeneity of data and the fast changeover of communi-
cation forms and technologies make it difficult to develop
one single tool covering all possibilities. In order to address
this problem, a domain framework is presented in this paper
applying computer linguistic methods and technologies on
forensic texts.
In this context, the term forensics relates to all textual information which may be used during the taking of evidence in a particular criminal proceeding. In particular, it refers to the hidden information and the relations between entities that can be uncovered by applying computational linguistic processing to potentially relevant texts.
Generally, there is a variety of tasks which need to be addressed:
- Recognition of texts with case-based criminalistic relevance
- Recognition of relations in these texts
- Uncovering of relationship networks
- Uncovering of planned activities
- Identification or tracking of destructive texts
- Identification or tracking of hidden semantics
In this context, the term hidden semantics is synonymous with a kind of linguistic steganography in which texts are "...made to appear innocent in an open code" [1]. Each of these tasks can be processed and solved by combining several highly specialized services, each encapsulating a problem solver based on a specific text mining technology. These problem solvers can be combined and recombined like a toolkit to achieve polymorphic behaviour depending on the kind of texts and the particular question under investigation.
Basic structural concepts of an application framework suitable to deal with these problems are presented in this paper. The previous development steps, outlined in the following sections, comprise:
- Development of criminalistic ontologies
- Development of criminalistic corpora
- Development of the framework's architecture
- Implementation of a prototype for manual evaluation
Specific ontologies and taxonomies are not introduced in this paper. Case-based ontologies and taxonomies are currently being evaluated using the generic ontology editor developed in this work and will be released soon together with the basic structures.
II. DEVELOPMENT OF CRIMINALISTIC ONTOLOGIES
The term ontology is commonly understood as a formal and explicit specification of a shared conceptualization. In particular, it defines commonly classified terms and symbols with respect to a syntax and a network of associated relations [2][3]. Developing ontologies for criminalistic purposes is a precondition for annotating texts and raising queries in this particular domain. The term taxonomy, as a subset of ontology, is used for the classification of terms (concepts) in ontologies
27Copyright (c) IARIA, 2012. ISBN: 978-1-61208-227-1
IMMM 2012 : The Second International Conference on Advances in Information Mining and Management
and documents. On the one hand, a criminalistic ontology is characterised by its case-based polymorphic structure, and on the other by the special terms used in criminal proceedings.
At the outset, a domain-specific language is necessary to describe taxonomies and ontologies for the development of a criminalistic ontology. The domain ontologies considered need to be highly specialized, taking into account the individual nuances of the particular criminal proceeding and the legal requirements of privacy protection. For these reasons, one vast ontology covering all areas of crime is not practicable. Special case-based ontologies, derived from a suitable predefined ontology, are necessary and are preferably developed by the person heading the investigation. Thus, it is important that the predefined ontology is easily adaptable to a specific case.
The Eclipse Modeling Framework (EMF) [4][5] has been chosen for this work, mainly because of its seamless integration into the Eclipse environment, but also to benefit from the manifold advantages of model-driven software development. Following this paradigm, the next step is the definition of an abstract syntax (meta-model) for describing such taxonomies and ontologies. The meta-model created in this way is used for generating a concrete syntax, in particular source code that provides all required model and utility classes.
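To make the model-driven step concrete, the following sketch shows roughly the shape of a generated model class for a taxonomy concept. The names Concept, getSubConcepts and find are our illustrative assumptions; actual EMF-generated code would additionally extend EObjectImpl and carry notification and reflection support.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for an EMF-generated taxonomy model class.
// Only the containment structure is mirrored here.
class Concept {
    private final String name;
    private final List<Concept> subConcepts = new ArrayList<>();

    Concept(String name) { this.name = name; }

    String getName() { return name; }

    List<Concept> getSubConcepts() { return subConcepts; }

    // Creates and attaches a child concept, returning it for chaining.
    Concept addSubConcept(String childName) {
        Concept child = new Concept(childName);
        subConcepts.add(child);
        return child;
    }

    // Depth-first search over the taxonomy tree, analogous to
    // traversing a model via EMF's eAllContents().
    Concept find(String query) {
        if (name.equals(query)) return this;
        for (Concept c : subConcepts) {
            Concept hit = c.find(query);
            if (hit != null) return hit;
        }
        return null;
    }
}
```

The utility traversal shown here is the kind of functionality EMF provides out of the box, which is one reason the meta-model only needs to define the syntactical elements themselves.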
In the literature, different approaches for representing semantics are under discussion, among which Topic Maps has proven to be one of the most expressive. Topic Maps is an ISO-standardized technology for the representation of knowledge and its connection to other relevant information. It enables multiple, concurrent, structurally unconstrained views on sets of information objects and is especially useful for filtering and structuring unstructured texts [2][6]. Therefore, the Topic Maps standard has been chosen as the starting point of the meta-model development. Since EMF already includes options for persistence as well as model searching and (strategic) traversal, only the necessary syntactical elements and paradigms have been adopted from the ISO standard. These syntactical elements provide a complete description of semantic relationships. Note that the specification given in this work takes into account the specifics of the domain with respect to slang, multilingualism and underlying hidden semantics. The syntactical elements used for further development are defined below and exemplified in Figure 1.
- Subject (Topic): (red circle) represents an abstract or concrete entity in the domain to be analyzed.
- Instance (Topic): (yellow circle) the concrete manifestation of a subject.
- Descriptor (Topic): (orange circle) typifies any other syntactical element, i.e., adds further related details.
- Association: (blue rectangle) a relation between two topics, usually subject and instance.
- Association Role: specifies the roles of the topics in an association (optional).
- Occurrence: corresponds to the concrete manifestation of a topic in a resource, usually related to an Instance.
- Topic Name: the name representation of topics (container).
- Name Item: denotes the name of a specific topic, associated with a Scope.
- Facet: names a class of attributes of a topic and can include several Facet Values.
- Facet Value: a particular attribute as a distinct value; can be a topic or another Facet.
- Scope: defines semantic layers, e.g., causing the system to focus by filtering particular syntactical elements.
[Figure 1: Use case tax fraud, an application of the Topic Maps derivative developed in this work for modeling a criminalistic ontology. Legend: Subject (Topic), Instance (Topic), Descriptor (Topic), Association, Association Role, Occurrence, Facet, Facet Value, Topic Name, Name Item, Reference Value. Scopes: deutsch (A), english (B). Example topics: Person/person, Unternehmer/businessman, Paul, Steuer/tax, Umsatzsteuer/sales tax, unterschlägt/embezzles (symmetrisch/symmetric: false), Bilanz/balance sheet, balance_sheet_2011.pdf.]
Fig. 1. Sample extract of the application "tax fraud" demonstrating a typical interaction of the different elements. The Subject person is described more specifically by adding the Instance Paul. The person called Paul is then related to the Subject tax, more specifically to the Instance sales tax, by the Association, which is described in more detail by the Descriptor embezzles. Paul's role in this relationship is specified by the Descriptor of the Association Role businessman. The Instance sales tax creates an Occurrence specified by the Descriptor balance sheet, referring to the Reference Value balance_sheet_2011.pdf attached as evidence.
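The interaction shown in Fig. 1 can be sketched in code. The minimal Java model below builds the "Paul embezzles sales tax" example, including bilingual name items distinguished by the scopes deutsch and english; all class and field names are our illustrative assumptions, not the actual API generated from the DSL.

```java
import java.util.HashMap;
import java.util.Map;

// A topic carries scope-dependent name items (here: scopes
// "deutsch" and "english").
class Topic {
    final Map<String, String> namesByScope = new HashMap<>();
    Topic name(String scope, String name) {
        namesByScope.put(scope, name);
        return this;
    }
    String nameIn(String scope) { return namesByScope.get(scope); }
}

// An association relates two topics via a descriptor topic and an
// optional association role.
class Association {
    final Topic descriptor;   // e.g. "embezzles"
    final Topic subject;      // e.g. the person Paul
    final Topic object;       // e.g. the tax instance
    final String role;        // e.g. "businessman"
    Association(Topic descriptor, Topic subject, Topic object, String role) {
        this.descriptor = descriptor;
        this.subject = subject;
        this.object = object;
        this.role = role;
    }
}

// An occurrence ties an instance topic to a concrete resource,
// such as an evidence file.
class Occurrence {
    final Topic instance;
    final String referenceValue;
    Occurrence(Topic instance, String referenceValue) {
        this.instance = instance;
        this.referenceValue = referenceValue;
    }
}
```

A usage sketch mirrors the figure: the sales tax instance is related to Paul via the embezzles descriptor, and an occurrence points at the attached balance sheet file.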
After developing the Domain-Specific Language (DSL), the user interface of the ontology and taxonomy editor has been designed in this work.
At this stage, the actual development of a criminalistic classification and ontology has been initiated. The basis for comprehending forensic data and its relationship to case-based information has been established in cooperation with the local criminal investigation department. In this way, a set of metadata could be compiled that is as close to reality as possible.
III. DEVELOPMENT OF CRIMINALISTIC CORPORA
An extensive corpus is needed to evaluate the implemented functionalities and to develop powerful algorithms that detect more semantic details, especially in fragmented and defective texts, and that detect hidden semantics. Building the required extensive corpus from original data of prior preliminary investigations is not feasible because of the legal requirements of data privacy protection; such data is available exclusively during the current proceedings.
An alternative method is to explore the significant characteristics of forensic texts and to generate corpora artificially, up to a completely artificial creation of text. This can be realized in two ways. The first is character-based, which causes the text to be alienated by non-words and unsuitable, yet proper, names [7]. The second way, superior from our point of view, is based on morphemes. While the occurrence of non-words can be eliminated this way, the target language, in the present case German, as a non-agglutinative language, raises problems for this method in inflection and word formation [8]. In summary, the basic problem with both approaches is the possible semantic disruption of text units.
A further method is to generate texts by modifying existing sources. In this case, the Internet holds numerous potential domain-specific corpora. Analyzing relevant websites, e-books or expert forums is just one option for generating suitable texts.
In conclusion, the Internet-based concept is more valuable for the project presented here. Therefore, a method for transforming texts is necessary. Common approaches, such as lookup-based exchanges of words (via free dictionaries), introducing typing errors (missing, wrong or twisted letters) and manipulating the orthography of words, are suitable in this case.
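The typo-injection approach just mentioned (missing, wrong or twisted letters) can be sketched as a simple seeded procedure. The function below is our illustrative example, not the corpus generator actually used in the project: for each sufficiently long word it either swaps two adjacent letters ("twisted") or drops one letter ("missing").

```java
import java.util.Random;

// Sketch of a corpus-alienation step mimicking typing errors found
// in informal forensic texts. A fixed seed keeps the output
// reproducible for corpus construction.
class TypoInjector {
    static String alienate(String text, long seed) {
        Random rnd = new Random(seed);
        String[] words = text.split(" ");
        StringBuilder out = new StringBuilder();
        for (int w = 0; w < words.length; w++) {
            String word = words[w];
            if (word.length() > 3) {          // leave very short words intact
                int i = 1 + rnd.nextInt(word.length() - 2);
                if (rnd.nextBoolean()) {
                    // twisted letters: swap positions i and i+1
                    char[] c = word.toCharArray();
                    char t = c[i]; c[i] = c[i + 1]; c[i + 1] = t;
                    word = new String(c);
                } else {
                    // missing letter: drop the character at position i
                    word = word.substring(0, i) + word.substring(i + 1);
                }
            }
            if (w > 0) out.append(' ');
            out.append(word);
        }
        return out.toString();
    }
}
```

A real generator would combine this with dictionary-based word exchanges and orthography manipulation, and would control the error rate instead of alienating every long word.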
IV. THE FRAMEWORK'S ARCHITECTURE
Java has been used for the development, especially because of its platform independence. High modularity is ensured by employing the Eclipse RCP as a basis. Its OSGi [9] implementation Equinox allows constructing service-oriented architectures (SOA) within the Java Virtual Machine. The framework conceptually consists of three main modules (see Figure 2):
- Ontology Machine: includes all functionalities for developing criminalistic taxonomies and ontologies.
- Indexing Machine: includes functions and methods for the extraction and annotation of forensic data.
- Querying Machine: includes the functions for searching and visualizing semantic coherences.
[Figure 2 diagram: (a) administrator, (b) division head, (c) agent; Ontology Machine, Indexing Machine, Querying Machine; original data, index data and metadata stores.]
Fig. 2. A black-box view of the new multiple-role framework. (a) The administrator defines at least one taxonomy using the Ontology Machine, in order to enable the classification of texts. This data is stored on the metadata server. (b) The person heading the investigation (division head) defines a case-based ontology using the same machine. In addition, he/she can annotate the original data using the Indexing Machine, which combines original data and metadata and transforms them into index data. (c) The agent can access the system using the Querying Machine, which has read-only access to the index data.
The framework is developed following the OSGi paradigm, benefiting from its progressive service-oriented architecture concepts such as loose coupling, reusability, composability and statelessness. Each service encapsulates a single computer
linguistic method or technology. In this way, it is ensured that new functionalities based on current research insights can be added without adapting the framework's architecture. A qualitative scheme of the service landscape is depicted in Figure 3. The core of the framework is split into three service tiers:
a) Persistence: In addition to the index and metadata servers, the persistence-tier includes the original data server, which keeps sensitive and evidentiary data strictly separated from the other parts of the system; its interface permits read-only access. The index server provides access to the processed and annotated documents in their intermediate form. The metadata server manages the ontology and taxonomy data in addition to user accounts. In contrast to the original data server, the interfaces of the index and metadata servers provide full access to the system.
b) Logic: Four low-level services compose the core of the logic-tier. The extracting service is mainly responsible for extracting text from numerous data types, such as .doc, .pdf, .jpg, etc. In addition, several filters for morphological analysis can be applied. The document provider service transforms the extracted data into the document-based intermediate form. It collaborates with the extracting service as a service composition, so service consumers only need to utilize this one service. The main task of the index service is to provide CRUD operations (Create, Read, Update, Delete) for accessing index data. It is used for annotating and
querying the document's index by the high-level services of the access-tier (see c).
[Figure 3 diagram: access-tier with Annotation Service, Query Service, Ontology Provider Service and Permission Service; logic-tier with Document Provider Service, Extracting Service, Index Service and Model Service; persistence-tier with Metadata Server, Index Server and Original Data Server.]
Fig. 3. Architecture overview. The framework's service landscape is divided into three tiers. Each machine (see Fig. 2) can use services from the access-tier directly. The logic-tier provides atomic services and service compositions, each solving a single problem (the figure shows only a few services as examples). Accessing these services is only possible by utilizing services of the lower-level tier. The persistence-tier is responsible for keeping data and contains the three servers described in Fig. 2. It is only accessible by services of the logic-tier.
In the same way, the model service
provides CRUD operations for accessing metadata. This service is used by higher-level services working with ontologies and user permissions.
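The CRUD surface shared by the index and model services can be sketched as a plain Java contract with an in-memory stand-in. The interface and names below are invented for illustration; the production services would delegate to the index or metadata server rather than to a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative CRUD contract, as exposed by the index service for
// index data (and analogously by the model service for metadata).
interface IndexService {
    void create(String docId, String indexedText);
    Optional<String> read(String docId);
    void update(String docId, String indexedText);
    void delete(String docId);
}

// In-memory stand-in used here only to make the contract concrete.
class InMemoryIndexService implements IndexService {
    private final Map<String, String> index = new HashMap<>();
    public void create(String docId, String text) { index.put(docId, text); }
    public Optional<String> read(String docId) { return Optional.ofNullable(index.get(docId)); }
    public void update(String docId, String text) { index.put(docId, text); }
    public void delete(String docId) { index.remove(docId); }
}
```

Keeping the tier behind such a narrow interface is what allows higher-level services, such as the annotation and query services, to stay independent of the concrete back-end chosen later.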
c) Access: The access-tier contains the high-level services through which the low-level services are used from outside the core; subsequently, the data is bound to the user interface. The function of the high-level services is similar to the facade pattern [10]. The annotation service takes the documents from the document provider service and enriches them with additional user-specified data or data derived by other automatic information extraction services. The index service is used for transforming the data into the document-based intermediate form and pushing it to the index server. The query service fetches index data satisfying various filter criteria from this server via the index service. The ontology provider service performs two tasks: on the one hand, it controls the collaborative access to the ontology model; on the other hand, it provides CRUD operations on it at a higher level than the model service. Finally, the permission service controls the access permissions of each user for the well-defined data types (see I). Because the user data model is developed in a model-driven way analogous to the data model of ontologies and taxonomies, this service collaborates with the low-level model service; thus, the same infrastructure as for the ontology provider service can be used.
The logic-tier in particular is designed to accommodate new functionality, since its services have an open architecture for extending their capabilities. For example, they provide interfaces for adding further services, such as text extraction methods, machine learning algorithms, etc.
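The role-based access control behind the permission service can be sketched as follows. The roles and permissions shown are our illustrative reading of the black-box view in Fig. 2 (administrator, division head, agent), not the actual user data model generated in the framework.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative permissions derived from Fig. 2: the administrator
// defines taxonomies, the division head defines case-based
// ontologies and annotates original data, the agent only queries
// index data.
enum Permission { DEFINE_TAXONOMY, DEFINE_ONTOLOGY, ANNOTATE, QUERY }

enum Role {
    ADMINISTRATOR(EnumSet.of(Permission.DEFINE_TAXONOMY)),
    DIVISION_HEAD(EnumSet.of(Permission.DEFINE_ONTOLOGY, Permission.ANNOTATE)),
    AGENT(EnumSet.of(Permission.QUERY));

    private final Set<Permission> permissions;
    Role(Set<Permission> permissions) { this.permissions = permissions; }

    // The permission service would consult a check like this before
    // dispatching a request to the underlying machine.
    boolean may(Permission p) { return permissions.contains(p); }
}
```

In the framework itself these checks would sit in front of the Ontology, Indexing and Querying Machines, enforcing the read-only access of agents described above.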
V. CONCLUSION
The development of a highly modular framework for applying methods of natural language processing to forensic data is discussed in this work. Its service-oriented architecture is particularly suitable for including new functionality based on current research insights. In this way, new knowledge becomes available to the points of interest in shorter time.
The main task of the new framework is to support criminal proceedings in the evaluation of forensic data. The concept discussed in this paper is schematically summarized and illustrated in Figure 4. As elucidated, the structure presented has the advantage that accessing and working with the framework is reliably ensured by exclusively using the few high-level services. In contrast, the service compositions on the lower level can be as complex as needed and can be adapted at any time to achieve improved problem solving.
[Figure 4 diagram: task matrix relating tasks, responsible services and servers. Task 1 (Text Extraction, Annotation Service): original data; Task 2 (Annotation, Annotation Service): index data; Task 3 (Querying, Query Service): index data, e.g., offender, object: car, date: 2012/01/06.]
Fig. 4. Task matrix. Task 1) Texts can be extracted from the original data using the high-level Annotation Service. Task 2) Subsequently, extracted texts can be annotated (semi-)automatically and indexed by the same service. To ensure proper processing of this task, an ontology and taxonomy have to be created beforehand using the Ontology Provider Service (schematically depicted in Figure 3). Task 3) At this point, each agent can access the indexed data and create knowledge maps using the high-level Query Service.
Currently, the first prototype for manual annotation and for the development of criminalistic taxonomies and ontologies is being evaluated in practice. The next steps will emphasize the development of powerful algorithms for automation. In particular, ways to extract information from defective texts and hidden semantics will be evaluated and revised.
ACKNOWLEDGMENT
The authors would like to thank the members of the criminal
investigation department Chemnitz/Erzgebirge (Germany). We
acknowledge funding by the "Europäischer Sozialfonds" (ESF), the Free State of Saxony and the University of Applied Sciences Mittweida.
REFERENCES
[1] Friedrich L. Bauer, Decrypted Secrets - Methods and maxims of Cryptol-
ogy, 1st ed. Berlin, Heidelberg, Germany: Springer, 1997.
[2] Andreas Dengel, Semantische Technologien, 1st ed. Heidelberg, Ger-
many: Spektrum Akademischer Verlag, 2012.
[3] Thomas R. Gruber, Toward Principles for the Design of Ontologies
Used for Knowledge Sharing. In Nicola Guarino and Roberto Poli (Eds),
Formal Ontology in Conceptual Analysis and Knowledge Representation,
Kluwer Academic Publishers, 1993.
[4] The Eclipse Foundation 2012, Eclipse Modeling Framework Project
(EMF), viewed 03 August 2012, http://www.eclipse.org/emf.
[5] Dave Steinberg, Frank Budinsky, Marcelo Paternostro and Ed Merks,
EMF Eclipse Modeling Framework, 3rd ed. Boston : Addison-Wesley,
2009.
[6] JTC 1/SC 34/WG 3, ISO/IEC 13250 - Topic Maps, Information Technol-
ogy, Document Description and Processing Languages, 2nd ed. 2002.
[7] Ilya Sutskever, James Martens, and Geoffrey Hinton, Generating Text
with Recurrent Neural Networks, In Proceedings of the International
Conference on Machine Learning (ICML), pp. 1017-1024, 2011.
[8] Kai-Uwe Carstensen, Christian Ebert, Cornelia Ebert, Susanne Jekat, Ralf Klabunde and Hagen Langer, Computerlinguistik und Sprachtechnologie - Eine Einführung, 3rd ed. Heidelberg, Germany: Spektrum Akademischer Verlag, 2010.
[9] OSGi Alliance 2012, Technology, viewed 03 August 2012,
http://www.osgi.org/Technology/HomePage.
[10] Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides,
Design Patterns. Elements of Reusable Object-Oriented Software., 1st ed.
Amsterdam : Addison-Wesley Longman, 1994.
[11] Jeff McAffer, Paul VanderLei and Simon Archer, OSGi and Equinox
- Creating Highly Modular Java Systems, 1st ed. Boston : Addison-
Wesley, 2010.
31Copyright (c) IARIA, 2012. ISBN: 978-1-61208-227-1
IMMM 2012 : The Second International Conference on Advances in Information Mining and Management
... In order to meet the special needs and challenges of forensics, especially with regard to the dynamics of language in social networks, it is necessary to resort to expert knowledge. This knowledge can be represented in the form of a Forensic Topic Map (FoTM) as explained in detail in [20]. In particular, abstract threats are modeled here, which form the basis for the assessment and evaluation of the communication content. ...
... A prerequisite for the assessment of the probability that the danger occurs is once again the experience-based knowledge of the investigator, which must be available for each individual risk type, for example, in the FoTM as discussed in [20] [41]. ...
... The aim of the prototype's architecture is to implement the frameworks described in [20] as well as the sections above. It was programmed with Java and built as an Eclipse Rich Client Platform (RCP). ...
Article
Full-text available
Major incidents can disturb the state of balance of a society and it is important to increase the resilience of the society against such disturbances. There are different causes for major incidents, one of which are groups of individuals, for example at demonstrations. The ideal way to handle such events would be to prevent them, or at least provide information to ensure the appropriate security services are prepared. Nowadays, a lot of communication, even criminal, takes place in social networks, which, hence, provide the ideal ground to gain the necessary information, by monitoring such groups. In the present paper, we propose an application framework for knowledge-based social network monitoring. The ultimate goal is the prediction of shortterm activities, as well as the long-term development of potentially dangerous groups, based on sentiment and topic analysis and the identification of opinion-leaders. Here, we present the first steps to reach this goal, which include the assessment of the risk for a major incident caused by a group of individuals based on the sentiment in the social network groups and the topics discussed.
... The complexity of the evaluation makes it difficult to develop one single tool covering all fields of application. In order to address this problem, a domain framework is currently under development (see [5] for further discussions). ...
... In order to define the extraction tasks as well as to introduce case-based knowledge the first of all is the creation of the criminological ontology in its specialized form as Topic Map, which we have developed in an earlier work [5]. This step may be supported by using existing ontologies created in similar previous cases. ...
... The ontology model we used is based on the Topic Map standard. In our previous work [5], we stated that each topic can contain a set of facets. These facets are used beside others to model rules that an inference machine can use to reason the appropriate role of an entity within a post-process. ...
Article
Full-text available
The analysis of digital media and particularly texts acquired in the context of police securing/seizure is currently a very time-consuming, error-prone and largely manual process. Nevertheless, such analysis are often crucial for finding evidential information in criminal proceedings in general as well as fulfilling any judicial investigation mandate. Therefore, an integrated and knowledge-based computational solution for supporting the analysis and subsequent evaluation process is currently developed by the authors. In this work, we outline the main ideas of this framework and present an approach for categorizing texts with adjustable precision combining rule-based decision formula and machine learning techniques. Furthermore, we introduce a text processing pipeline for deep analysis of forensic texts as well as approaches towards solving domain specific problems like detection and understanding of hidden semantics as well as the automatic assignment of forensic roles.
... The complexity of the evaluation makes it difficult to develop one single tool covering all fields of application. In order to address this problem, a domain framework is currently under development (see [4] for further discussions). ...
... In order to define the extraction tasks as well as to introduce case-based knowledge the first of all is the creation of the criminological ontology in its specialized form as Topic Map we have developed in an earlier work [4]. This step may be supported by using existing ontologies created in similar previous cases. ...
... The ontology model we used is based on the Topic Map standard. In our previous work [4] we stated that each topic can contain a set of facets. These facets are used beside others to model rules that an inference machine can use to reason the appropriate role of an entity within a post-process. ...
Conference Paper
Full-text available
The analysis of digital media and particularly texts acquired in the context of police securing/seizure is currently a very time-consuming, error-prone and largely manual process. Nevertheless, such analysis are often crucial for finding evidential information in criminal proceedings in general as well as fulfilling any judicial investigation mandate. Therefore, an integrated computational solution for supporting the analysis and subsequent evaluation process is currently developed by the authors. In this work, we present an approach for categorizing texts with adjustable precision combining rule-based decision formula and machine learning techniques. Furthermore, we introduce a text processing pipeline for deep analysis of forensic texts as well as an approach for the identification of criminological roles.
... This semantic storage is considered to be utilized in future development of reports and relevant case suggestions based on collected facts and data. The frameworks described by Spranger [13] is compatible and would allow semantic evaluation of forensic data. The search in external system is handled by the system integration component with custom adapters to all registered systems. ...
Conference Paper
Full-text available
The paper introduces a blueprint for system that automates and facilitates the process of medico-legal exams and inquests. The system responsibilities are to mediate the communication between institutions and forensic doctors and to organize, store and archive tasks and data based on the legal regulations. Currently in Bulgaria and the Balkans there is no fully computerized solution which is the main reason for this paper. The system would allow examiners to focus on the actual medical work and not on the paper work. Key advantages of the system are the automation in the process, the communication with external systems and participants, the dynamic modelling of the processes and the security model that would keep patient privacy and data integrity.
... This manual task is very time consuming and therefore not economically justifiable in the cases of small and medium crimes. Analysing and evaluating forensic texts in an automatic way is generally challenging, as shown by the authors in previous work [1] [2]. ...
Conference Paper
Full-text available
Mobile devices are a popular means for planning, appointment and conduct of criminal offences. In particular, short messages and chats therefore often contain evidential information. Due to the terms of their use these types of messages are fun- damentally different from other forms of written communication regarding their grammatical and syntactic structure. The decline in prices of storage media leads to removing messages rarely. On the one hand this fact is quite positive as possible evidential information is not lost. On the other hand, considering only short messages, 15,000 and more on only one cell phone is not uncommon. In the most cases of organized or gang crime there is not one but many devices in use. Analysing this huge amount of messages manually is time consuming and therefore just at small and medium crime not economically justifiable. In this work we propose an process chain which enables to de- crease the analysis and evaluation time dramatically by reducing the messages which needs to be examined manually.
Conference Paper
Digital forensics is the digital equivalent of traditional crime investigation, leveraging digital technologies to facilitate criminal investigations. The expertise and previous experience of an investigator or any security agent in traditional forensic investigations play an important role in the success, efficiency, and effectiveness of the investigation. Similarly, employing the power of intelligence in current computational resources in the digital investigation process will lead to more efficient and effective digital investigation results. This study builds on existing works in the literature to propose the use of clustering and classification machine learning algorithms on evolving database files to achieve intelligence in the digital investigation process, and finally presents a comprehensive block diagram as a general-purpose framework for intelligent forensic investigation. Keywords: cybercrime, crime variables, digital forensics, intelligence, machine learning
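The clustering half of that proposal can be illustrated with a minimal k-means pass over toy per-file feature vectors. This is a generic sketch of the technique, not the study's actual pipeline; the feature names and data are assumptions:

```python
# Illustrative only: plain k-means over invented (size, entropy)-style
# file features, sketching how evidence files might be grouped before
# classification. Not the cited study's implementation.
import math

def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute each
    centroid as the mean of its cluster; repeat a fixed number of times."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two visually obvious groups of toy feature vectors
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 9.0), (8.5, 8.5)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)  # the small-feature files and the large-feature files
```

In an investigative setting, the resulting groups would then be labelled by a classifier or an examiner, so that whole clusters rather than single files can be triaged.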
Conference Paper
Efficient and sensitive monitoring of social networks has become increasingly important for criminal investigations and crime prevention in recent years. However, with the growing amount of data and increasing complexity to be considered, monitoring tasks are difficult to handle, up to a point where manual observation is infeasible in most cases and, thus, automated systems are very much needed. In this paper a system of adaptive agents is proposed which aims at monitoring publicly accessible parts of a given social network for malign actions, such as propaganda, hate speech or other malicious posts and comments made by groups or individuals. Subsequently, some of these agents try to gain access to crime-relevant information exchanged in closed environments said individuals or groups are potentially part of. The presented monitoring and investigation processes are implemented by mimicking central aspects of the human immune system. The monitoring processes are realized by network-traversing informant units similar to the pathogen-sensing macrophages which initialize the human immune response. The subsequent investigation process commences by automatically gathering information about the targeted individual or group. Furthermore, based on the gathered information, closed and usually inaccessible environments in the social network (e.g. private groups) can be identified. Using so-called endoceptor units (automatically generated social bots imitating environment-typical appearance and communication), closed environments are accessed through individuals susceptible to the bot's strategy. Once part of the closed network, an endoceptor aims to intercept crime-relevant communications and information and report them back to the investigators.
Chapter
The analysis of digital forensic texts is a regular task in many areas of criminal investigation. In this context, digital forensic texts are all texts secured on digital storage media because they are either incriminating themselves or otherwise suited to reconstructing or proving, even partially, a committed criminal act in the sense of substantive criminal law, and have thus become the subject of criminal investigations aimed at securing evidence. Their criminalistic relevance ranges from computer crimes, such as computer fraud or computer espionage, to classical offences in the areas of fraud, economic crime or drug crime that were committed with the support of computers, mobile devices or the Internet.
Conference Paper
In the field of criminal proceedings, a large quantity of textual material is frequently confiscated or secured by criminologists to evaluate and preserve evidential information or to fulfil a judicial investigation mandate. The search for specific information or the finding of correlations between virtually countless documents is currently time-consuming hand-crafted work. The difficulties lie in the identification of evidential documents and valid relations between entities on the one hand, and the adherence to time limits and data privacy protection on the other. In this work, an integrated computational solution developed by the authors for supporting the evaluation process of forensic texts using computer linguistic technologies is outlined. The application framework under construction is designed as a QA-system, especially one able to solve a specific criminal issue and to visualize issue-centred, case-relevant relationships. For this purpose, several state-of-the-art techniques in the fields of text categorization and information/event extraction are analysed with respect to their suitability for the peculiarities of the considered domain. Subsequently, several approaches for solving domain-specific problems are introduced. The results of this study will form the basis for constituent parts of the currently developed framework.
Article
Recent work in Artificial Intelligence (AI) is exploring the use of formal ontologies as a way of specifying content-specific agreements for the sharing and reuse of knowledge among software entities. We take an engineering perspective on the development of such ontologies. Formal ontologies are viewed as designed artifacts, formulated for specific purposes and evaluated against objective design criteria. We describe the role of ontologies in supporting knowledge sharing activities, and then present a set of criteria to guide the development of ontologies for these purposes. We show how these criteria are applied in case studies from the design of ontologies for engineering mathematics and bibliographic data. Selected design decisions are discussed, and alternative representation choices are evaluated against the design criteria.
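The "ontology as designed artifact" view can be made concrete with a toy fragment: shared concept definitions encoded as data that multiple tools can query. This sketch is illustrative only; the triple store, the relation names and the forensic concepts are invented here, and real systems would use a standard formalism such as OWL:

```python
# Illustrative only: a toy forensic ontology fragment as plain
# (subject, relation, object) triples, with a simple superconcept query.
# All concept and relation names are hypothetical.

TRIPLES = [
    ("ShortMessage", "is_a", "Text"),
    ("ChatLog", "is_a", "Text"),
    ("Text", "is_a", "Evidence"),
    ("Evidence", "belongs_to", "Case"),
]

def ancestors(concept, relation="is_a"):
    """Walk 'is_a' edges upward and collect all superconcepts."""
    result = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, r, o in TRIPLES:
            if s == current and r == relation:
                result.append(o)
                frontier.append(o)
    return result

print(ancestors("ShortMessage"))  # ['Text', 'Evidence']
```

The point of the engineering perspective is exactly this kind of reuse: once the taxonomy is agreed upon, any tool (an annotator, an indexer, a query front end) can answer "is a short message evidence?" from the same shared definitions.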
Conference Paper
Recurrent Neural Networks (RNNs) are very powerful sequence models that do not enjoy widespread use because they are extremely difficult to train properly. Fortunately, recent advances in Hessian-free optimization have overcome the difficulties associated with training RNNs, making it possible to apply them successfully to challenging sequence problems. In this paper we demonstrate the power of RNNs trained with the new Hessian-free optimizer (HF) by applying them to character-level language modeling tasks. The standard RNN architecture, while effective, is not ideally suited for such tasks, so we introduce a new RNN variant that uses multiplicative (or "gated") connections which allow the current input character to determine the transition matrix from one hidden state vector to the next. After training the multiplicative RNN with the HF optimizer for five days on 8 high-end Graphics Processing Units, we were able to surpass the performance of the best previous single method for character-level language modeling: a hierarchical nonparametric sequence model. To our knowledge this represents the largest recurrent neural network application to date.
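For readers unfamiliar with the task itself, character-level language modeling reduces in its simplest form to predicting the next character from context. The sketch below is a plain bigram count model, deliberately nothing like the paper's multiplicative RNN, just the task it solves; the training string is invented:

```python
# Illustrative only: the character-level language modeling task, reduced
# to a bigram count model. This sketches the prediction problem, not the
# cited paper's RNN architecture or training method.
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each character, which characters follow it."""
    model = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        model[a][b] += 1
    return model

def most_likely_next(model, ch):
    """Return the most frequently observed successor of `ch`."""
    return model[ch].most_common(1)[0][0]

model = train_bigrams("the theory of the thing")
print(most_likely_next(model, "t"))  # 'h'
```

An RNN replaces the fixed one-character context with a learned hidden state summarizing the whole prefix, which is what makes the training problem hard enough to need the Hessian-free machinery the paper discusses.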
Book
Cryptology, for millennia a "secret science", is rapidly gaining in practical importance for the protection of communication channels, databases, and software. Beside its role in computerized information systems (public key systems), more and more applications within computer systems and networks are appearing, which also extend to access rights and source file protection. The first part of this book treats secret codes and their uses (cryptography). The second part deals with the process of covertly decrypting a secret code (cryptanalysis), where in particular advice on assessing methods is given. The book presupposes only elementary mathematical knowledge. Spiced with a wealth of exciting, amusing, and sometimes personal stories from the history of cryptology, it will also interest general readers. "Decrypted Secrets" has become a standard book on cryptology. This 3rd edition has again been revised and extended in many technical and biographical details. "The best single book on cryptology today" (David Kahn, Cryptologia). "For those who work actively with cryptology this book is a must. For amateurs it is an important dictionary which in many cases will guide them to make their ciphers more secure." (Arne Fransen, International Intelligence History Study Group). © Springer-Verlag Berlin Heidelberg 1997, 2000, 2002, 2007. All rights reserved.
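The cryptography/cryptanalysis split the book's two parts correspond to can be shown with the smallest classical example, a Caesar shift cipher and its brute-force break. This is a textbook illustration, not material from the book itself:

```python
# Illustrative only: a Caesar shift cipher (cryptography) and a
# brute-force break using a known word, or 'crib' (cryptanalysis).
import string

ALPHA = string.ascii_lowercase

def encrypt(plaintext, key):
    """Shift each lowercase letter forward by `key` positions."""
    return "".join(
        ALPHA[(ALPHA.index(c) + key) % 26] if c in ALPHA else c
        for c in plaintext
    )

def crack(ciphertext, crib):
    """Try all 26 keys; return the key and plaintext containing the crib."""
    for key in range(26):
        candidate = encrypt(ciphertext, -key)
        if crib in candidate:
            return key, candidate
    return None

cipher = encrypt("attack at dawn", 3)
print(cipher)                   # 'dwwdfn dw gdzq'
print(crack(cipher, "attack"))  # (3, 'attack at dawn')
```

The 26-key search space is of course why such classical ciphers fall instantly, which is the kind of method assessment the book's second part teaches to do systematically.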
Article
The book is an introduction to the idea of design patterns in software engineering, and a catalog of twenty-three common patterns. The nice thing is, most experienced OOP designers will find out they've known about patterns all along. It's just that they've never considered them as such, or tried to centralize the idea behind a given pattern so that it will be easily reusable.
Jeff McAffer, Paul VanderLei and Simon Archer, OSGi and Equinox: Creating Highly Modular Java Systems, 1st ed. Boston: Addison-Wesley, 2010.
Dave Steinberg, Frank Budinsky, Marcelo Paternostro and Ed Merks, EMF: Eclipse Modeling Framework, 3rd ed. Boston: Addison-Wesley, 2009.
OSGi™ Alliance, Technology, 2012, viewed 03 August 2012, http://www.osgi.org/Technology/HomePage.
JTC 1/SC 34/WG 3, ISO/IEC 13250: Topic Maps, Information Technology, Document Description and Processing Languages, 2nd ed., 2002.