Problems of Privacy, Security, Identity, Integrity, Legality and Confidentiality in Internet Crime
investigation and evidence collection
Simon Prior, Brian Tompsett,
Department of Computer Science, University of Hull
Hull, HU6 7RX, UK
S.Prior@hull.ac.uk, B.C.Tompsett@dcs.hull.ac.uk
Abstract. The investigation of cybercrime usually starts from data
collected from a suspect's computer and, in the case of networked
computers, from network monitoring logs. There also exist a large number
of sources of information that can both help build a picture of a
network device in the context of other similar activity and provide
evidence of wider criminal interconnections. Many of these data sources
are widely used outside of criminal investigation, and in earlier work
[1] we summarised the more significant such data collection exercises to
show their value in network forensics. Such data sources are usually
published by neutral and independent publishers, but they may contain
errors and/or forgeries, which calls their integrity into question when
they are used for evidence collection and prosecution. The methods
employed to gather data from various sources to create public databases,
as well as the usage of these databases, have security and privacy
implications for internet users. There are also legal issues with
respect to creating, publishing and using such databases in evidence
collection and prosecution. This paper discusses problems of integrity,
privacy, security and confidentiality in using various data sources in
cybercrime investigation and evidence collection.
1. Introduction
The Internet has become a large and prominent part of the lives of the majority of the
world's population. As the Internet has become such a large part of everyday life, a
growing proportion of crime involves the Internet in some way, and there is a
correspondingly greater demand for digital evidence that can be used in the
prosecution of offenders. Many issues may affect the integrity and the legality of the
data being used. It is therefore important that the data is collected in such a way that
the offender can raise no complaint about how it was collated.
Various techniques can be used to collect the data needed for evidence; those
explored here include private and public Internet databases and honeypots. These can
help to create cyberprofiles of the suspects involved and give a more rounded picture
of how they think and what kinds of activity they engage in with respect to Internet
crime. The Cyberprofiling project [2] aims to combine criminological profiling
techniques with the study of internet abuse and computer forensics. It applies the
expertise of lawyers, criminologists, and computer forensics and Internet specialists,
spanning the Universities of Hull, Sheffield and Teesside. The idea is to show that
many of the techniques used to detect 'traditional' criminals can also be used to
create profiles of 'cyber-criminals'; these are the techniques of offender profiling [3]
and geographic profiling [4].
2. Nature of Collected Data
The data collected from system logs, honeypots and the Internet databases can be
seen as quite similar: all of it could be useful in crime detection, criminal profiling or
forensics, in that it could potentially incriminate someone of malicious intent.
Earlier work [1] outlined the data that appears in the internet databases. It includes
information such as the IP address where the traffic originated, plus further detail
that varies with the nature of the particular database but might include ownership
information, information on a particular attack, and so forth. The databases also
differ in how this data originates and is collected: some invite people to enter data,
while in others records are generated automatically by software. The different
databases serve different functions, such as detecting the origin of spam or holding
information on abusive hosts, but ultimately they all exist for the same reason: to
collect and hold data for public viewing and sharing. This data is generally used to
understand the nature of problematic network traffic, so that users who have read and
analysed the information in these databases can develop new ways of protecting their
systems from future dangers.
Like the public Internet databases, system log files vary from system to system;
however, the data is similar because it serves a similar purpose: monitoring system
usage. This can aid the detection of malicious access and give the administrators of
these machines enough information to learn about the activities the attacker has been
attempting. When it comes to using this data, the formats it comes in are similar:
lists of IP addresses and the details that relate to them.
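As a purely illustrative sketch, assuming a hypothetical field layout (timestamp,
source IP, port, protocol, free-text detail) rather than the format of any specific
system, such a line might be parsed as follows:

```python
from datetime import datetime

def parse_log_line(line: str) -> dict:
    """Parse one hypothetical log line of the form:
    '2006-08-07 14:32:01 192.0.2.45 22 TCP connection-attempt'."""
    date, time, ip, port, proto, detail = line.split(maxsplit=5)
    return {
        "timestamp": datetime.fromisoformat(f"{date} {time}"),
        "ip": ip,
        "port": int(port),
        "protocol": proto,
        "detail": detail,
    }

record = parse_log_line("2006-08-07 14:32:01 192.0.2.45 22 TCP connection-attempt")
print(record["ip"], record["port"])  # -> 192.0.2.45 22
```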
Reliance on public databases or on internal databases such as system logs is not
sufficient to provide all the data on suspect internet behaviour that is necessary to
detect or profile crime in this area. There is a wide variety of types of crime that
utilise the internet [5], and a more specific or specialised form of data collection tool
is needed, one that relies less on human intervention. One of the major techniques is
the honeypot [6]. This places a computer or piece of software on the internet that
serves no purpose other than to log access and monitor activity, generating logs of
suspicious behaviour. These logs are most valuable in performing profiling and in
supplementing the data from other sources.
Honeypots collect data and create logs of all access to the ports and sockets on that
particular machine. The data generally includes the full details of each connection:
the time and date, the IP address the connection came from, the port it was recorded
on, the protocol, and the data sent in the packets. The packet payloads are useful
when it comes to establishing the reasons behind an attack, and so help to create the
profile. This data is formed into a log for the session during which the honeypot was
running; such logs may be very hard to read, especially those that have had many hits
from potential attackers. The logs need to be searched in great detail to find the
significant hits, those that could actually have been intended to cause damage. The
kinds of attack detected also depend on the level of interaction the honeypot
supports. Some honeypots can detect every interaction that comes through any port
and will also interact with the attacker to fool them into thinking that they have
attacked a genuinely vulnerable machine. Honeypot logs can be filtered to display
only the information that is necessary. It may be possible to find a certain IP address
in a honeypot log file and then use it to search the internet databases for other
matches, relating to fields of computer crime other than malicious access to a system;
this helps build up a profile of that particular IP address.
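To make this workflow concrete, the following sketch filters parsed honeypot records
for one IP address of interest and checks it against several public databases. The
database names, record shapes and simple dictionary lookups are hypothetical
placeholders; the real public databases each have their own query interfaces:

```python
def filter_by_ip(log_records: list[dict], ip: str) -> list[dict]:
    """Keep only the honeypot hits originating from one IP of interest."""
    return [r for r in log_records if r["ip"] == ip]

def cross_reference(ip: str, databases: dict) -> dict:
    """Look up the same IP in several (hypothetical) public databases
    and collect whatever reason each one lists against it."""
    return {name: entries[ip]
            for name, entries in databases.items() if ip in entries}

# Hypothetical database snapshots, keyed by IP address.
databases = {
    "spam-sources": {"192.0.2.45": "bulk mail relay, listed 2006-05"},
    "abusive-hosts": {"192.0.2.45": "SSH brute-force reports"},
}
print(cross_reference("192.0.2.45", databases))
```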
3. Integrity of Collected Data
With the growing concern that criminals are using the Internet as a tool for their
crimes, it is important to have Internet databases and intrusion detection systems
such as honeypots available to collect data. This allows everyone to feel somewhat
more secure about their computers, knowing that any hacker with malicious intent
will be identified by these systems and their activity logged. This applies equally to
the malicious email spammer and to the hacker intent on damaging important
computer systems.
The Internet databases store vast numbers of IP addresses that have been marked as
suspect for particular reasons, and with so many different databases, a suspicious IP
address that is absent from one may well appear in another. Because the databases
store data for different reasons, certain IP addresses may be marked for more than
one reason. This could be used to start forming a profile of that IP address, making
people more aware of what the person behind it may be up to, so that the next
activity attempted from that IP address could be predicted or, at best, thwarted. The
data in these databases is generally either uploaded by users or generated by specially
developed software, but with this data come integrity issues. The publishers are often
concerned about incorrect, or even malicious, records that falsely implicate an
internet location. This is why many of the databases monitor submissions and have
mechanisms to identify particular data so that it can be checked for coming from a
reliable source. All of this is intended to stop 'poisoned' data from becoming
available, which could compromise systems if users tried to protect themselves
against it, and equally compromise a criminal investigation if the false information
were used in evidence. It is therefore important that the data in these databases is
correct and accurate, as it affects both the users who rely on it for guidance and the
reputation of the database itself. Low-reputation databases suffer by having less data
submitted, and eventually become irrelevant and unused.
Using data from these databases can be risky, as there is a chance that the person
being investigated will find out; they can then attempt to divert the investigation by
submitting large amounts of false and trivial data. Fortunately, the databases
generally have a filtering system attached, which provides the opportunity to separate
serious attacks from pointless, trivial attempts.
With the honeypots used on the project so far [7], it has been seen that they pick up
all network traffic that comes into contact with them. Some show the data in more
detail than others, and where one honeypot may treat particular traffic as a threat,
another will simply record that a connection occurred and think nothing more of it. It
is therefore important that the user can get as much information from the honeypot
as possible, so that they can make their own judgement as to whether the data
represents a danger. Even honeypots that do not pick up general network traffic will
find that not all the data collected is malicious; there may be accesses by a network
administrator who is simply trying to find out what sits at that particular network
address.
The problem with the created data files is that, in the wrong hands, they could be
doctored fairly easily by someone with the knowledge. The only way of knowing for
certain that a file is unaltered and correct is to ensure that the data is sent straight to
the team putting the profile together. That way the team knows that the data is
completely valid and that there was no opportunity for it to be tampered with in
transit.
3.1 Issues with Collection and Conglomeration
In creating the profile, there are several issues regarding the collection and merging
of data from different and diverse sources. Collection is an important procedure, as
we need to be sure that only information useful for the profiles is collected; this
eliminates data that has no relevance to profiling. When putting the data together, it
is important to place it in some kind of order, such as chronological order, although
this will not always be appropriate: the data will have come from so many kinds of
sources that it may be difficult to distinguish which record came first, especially for
data that arose at similar or identical times.
The data from these logs and databases may not be enough on its own to create the
profile; there may also be a need to draw on other databases for personal details.
This stage would not be necessary until there had been so much malicious activity
that there was a clear desire to stop the person responsible; otherwise, more highly
sensitive material would be circulating in the profiles than is needed.
As stated in the Data Protection Act [8], all data must be fairly and lawfully
processed, which means the profile has to give an accurate representation of the data
that is actually there; the data cannot be used to paint a worse picture than really
exists. This may in any case be difficult to do with the data available from the log
files and the databases.
When it comes to sharing information, it is highly important that certain highly
sensitive data is hidden from all parties; this is where anonymisation is needed. It
ensures that no personal data is on view: only the suspicious data would be seen,
with no link to the people suspected of sending it. This can be seen in the recent
news case of AOL releasing its search logs [9]. The personal data was anonymised
so that nobody could be incriminated by their search queries, but there were still
concerns over some of the released data, since many searches were for people's
names, which may have been egotistical searches by users for their own names. This
would not be an issue with our logs, as the only information in them would be
technical information about the ports and protocols used.
Issues have arisen from the sharing of the honeypot and database logs, and the
Information Commissioner has indicated that by sharing the logs we risk sharing
personal information, on the view that IP addresses may link directly to individuals.
The position taken within the Cyberprofiling project is that an IP address cannot be
directly related to a human being; it relates to a certain machine or network and is
not itself direct personal information. It could potentially be related to personal
information by linking it to databases of other details, but in its raw form it is not
regarded as personal information.
4. Handling of Data as Evidence
The data collected from the honeypots and from the Internet databases will be in a
very general format, as it will include all the data in a log. This would not be very
useful on its own as evidence, as the logging would need to have been set up within
a corporate environment for everyday use, not merely because malicious attacks on
the system were suspected. In the Cyberprofiling project, the data would be used to
help create a cyberprofile of the attacker, mapping all of their activity so as to predict
when and where they would attack next and what they would do in that attack. This
can help the prosecution in court, giving them a more rounded understanding of an
individual because they have a profile of everything the individual is accountable
for.
If the logs alone were used in court, there would have to be guidelines on how the
data should be handled, and if the logs were gained improperly they might not be
admissible. As mentioned above, the logging would have to be set up as part of the
day-to-day running of a company and not merely because of suspicions of malicious
attacks. The main requirement is for the prosecution to be able to provide a solid
chain of custody from the day the log was completed to the start of the trial,
accounting for every access to the log since its completion: who had access, why,
and what they did while they had it. With the profiles being created from these logs,
it is important that the relevant data is pulled from the log files to make the profiles
as accurate as possible. This highlights the first major difference between using log
files and cyberprofiles: log files contain a great deal of data that is not relevant,
whereas profiles contain only information dedicated purely to the purpose at hand.
5. Human Rights and Issues of Privacy
We must also be sensitive to issues of human rights and privacy in relation to the
handling and storing of this data. Users may feel that they should always have access
to data that involves them; with information gained from honeypots and Internet
databases, this is not always possible. With honeypots, hackers will not even know
that their access has been recorded, because the machine they are accessing appears
to be a normal machine, responding exactly as they would expect. When they finally
discover that they have been caught, the surprise comes precisely because they were
unaware of the log, and they may argue that the people who set up the log are in the
wrong. Yet doing something illegal always carries the risk of being recorded: there is
a risk of being caught every time someone attempts to hack into a machine, and
those who are worried about their data being held should not hack in the first place.
The hacker keeps their freedom while they stay on the correct side of the law; as
soon as the line is crossed, that freedom is no longer there.
Storing data can be useful, but some may have privacy concerns when the data is
released for public viewing. One example is the release of the AOL search logs:
although they were released to researchers investigating the profiling of people by
their search keywords, there was an uproar that sensitive personal data about users
had been made available. The same reaction could be expected if some of the
profiling data were released to the public once the profiles have been created.
If hackers have been logged in different kinds of databases and a cyberprofile has
been created of them, they will feel that their privacy has been taken away, as
anything they now do on the Internet will be monitored. Some may say this is
deserved; others would claim it is wrong, creating a 'big brother' effect and yet
another case of people being monitored.
6. Data Storage and Retention
When creating the cyberprofile, it is important that only relevant data is stored, and
only for the purpose at hand; it must not be released to people who are not involved
with it. This matters because a leak of this data could cause major problems,
especially for the people whose data had been released, and might lead them to take
legal action against the organisation. If the data is to be used in a court of law, it is
imperative that it is seen only by the parties who will use the evidence and reach a
verdict in court; otherwise the data becomes void and cannot be authorised for use.
It is also important that the data is not retained for longer than necessary: data that
has gone unused for a long period should be discarded.
7. Law Enforcement vs. Private Investigator
The police, other law enforcement agencies and private investigators will all be
investigating for the same cause when it comes to detecting cybercrime, but their
methods will differ according to their status and their legal restrictions.
Law enforcement will not be able to routinely collect evidence from logs, as it is
regulated in the way it can routinely collect evidence and conduct surveillance of the
public. We do not live in a 'police state', so the police are not allowed to monitor
everyone's movements constantly. This makes many of the data collection
techniques we have outlined problematic when used for profiling by a law
enforcement agency. Conversely, however, a honeypot only records accesses made
to systems voluntarily; simply recording those who attempt to access a police system
would not constitute the result of a search or seizure. The dilemma between these
two extreme positions shows the ambiguous nature of the use of data logging in
criminal profiling, which continues to require further resolution by the law.
Private investigators, on the other hand, have more leeway over their investigation
techniques; they face fewer legal restrictions than the police, can pursue more
avenues of research, and are mainly restricted by privacy legislation. This means
they would be able to make much greater use of honeypots and similar tools. Some
jurisdictions do have legal limitations on the handling of evidence by unregistered
investigators, and this can cause a problem for the private use of honeypots: if the
monitoring shows that an offence has taken place, and this evidence is not passed to
the authorities, a legal problem can arise.
This shows that there are currently legal problems with the global use of honeypots
and the collection of information for cyberprofiling purposes by both law
enforcement and other agencies. This is a situation that will require a greater degree
of clarity: with the expansion of digital crime will come a greater need for
capabilities such as profiling.
Law enforcement bodies do have one advantage in investigations, in that they have
powers to access information that the private investigator does not: they can use
search warrants, subpoenas and other legal means to force disclosure.
8. Suggested Approach
The main problem with the data is the sensitivity of some of the information in the
logs, namely the IP addresses; whether or not these are classed as personal
information, something has to be put in place to ensure that this information is not
released. Others have started to address the issue of log anonymisation [10], but the
available methods still require further refinement. Several techniques have been
suggested for anonymising IP addresses. The first is 'black marker anonymisation',
which simply replaces the IP address with a constant. This removes the data from
public view, but the data loses its network structure and the procedure is totally
irreversible.
Another technique is to truncate the values, choosing the point at which to truncate,
such as at the network boundary. This may be acceptable, but the addresses lose
their individuality and it becomes impossible to distinguish between two addresses
in the same subnet. Information is lost, and the process is again non-reversible.
The third technique is to use a random permutation of the IP addresses, giving a
one-to-one correspondence between the original IP and the anonymised IP. This
process is reversible, but only by those who know the permutation. Data about the IP
address is again lost, but actions by a given IP can still be grouped together.
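A minimal sketch of these first three techniques (black marker, truncation and
random permutation), assuming IPv4 addresses and using only Python's standard
library:

```python
import ipaddress
import random

def black_marker(ip: str) -> str:
    """Replace the address with a constant: irreversible, no structure kept."""
    return "0.0.0.0"

def truncate(ip: str, prefix_len: int = 24) -> str:
    """Keep only the network part: hosts in one subnet become identical."""
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(net.network_address)

class RandomPermutation:
    """One-to-one random mapping, reversible only with the stored table."""
    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self._forward: dict[str, str] = {}
        self._reverse: dict[str, str] = {}

    def anonymise(self, ip: str) -> str:
        if ip not in self._forward:
            while True:
                candidate = str(ipaddress.IPv4Address(self._rng.getrandbits(32)))
                if candidate not in self._reverse:
                    break
            self._forward[ip] = candidate
            self._reverse[candidate] = ip
        return self._forward[ip]

    def deanonymise(self, anon_ip: str) -> str:
        return self._reverse[anon_ip]
```

The permutation table plays the role of a secret: whoever holds it can reverse the
mapping, which anticipates the reversal-authority question discussed below.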
The fourth and final technique previously tried is pseudonymisation, of which a
more advanced version is 'prefix-preserving pseudonymisation'. Pseudonymisation
involves mapping the IP address to a pseudonym that does not have to be another IP
address. With the prefix-preserving variant, the original IP is mapped to a
pseudo-random anonymised IP address by a specific function, so that the network
structure and all the subnets are preserved while the data remains totally
anonymised.
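A simplified illustration of the prefix-preserving idea (not the specific function used
in [10]) derives each output bit from a keyed hash of the input bits that precede it, so
two addresses sharing a k-bit prefix also share a k-bit anonymised prefix; the key
shown is a placeholder:

```python
import hashlib
import hmac
import ipaddress

KEY = b"secret-anonymisation-key"  # placeholder key for illustration

def prefix_preserving(ip: str, key: bytes = KEY) -> str:
    """Flip each bit based on a keyed hash of the bits before it."""
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    out = []
    for i, bit in enumerate(bits):
        prefix = bits[:i]
        digest = hmac.new(key, prefix.encode(), hashlib.sha256).digest()
        flip = digest[0] & 1  # pseudo-random bit derived from the prefix
        out.append(str(int(bit) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))
```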
The idea suggested on this project is to use anonymisation similar to that used in a
Network Address Translator (NAT). A NAT acts as a router between two IP
networks with different internal and external addresses, mapping IP addresses from
one numbering to another. Each IP address in the logs would be reassigned into the
private network 10.0.0.0. Although this involves mapping the whole 32-bit IPv4
address space into the 24 bits available, this is likely to be ample for the subset of
addresses used in profiling, and each Internet network can be mapped onto an
internal subnet number for the purposes of anonymisation; for example,
150.237.92.11 could be assigned to 10.128.0.1. As each new network address is
encountered, it would be allocated a new dynamic address, much as a DHCP server
would allocate one. If one original IP is in the same subnet as another, the two will
also share a subnet in the newly generated private network, so the network structure
of the original addresses, including the structure of the subnets, is retained. The
problem of placing the boundary between subnet numbers and host numbers can be
resolved, as recommended by internet best practice, by retaining the largest run of
binary zeroes between the subnet numbers and the host numbers: subnet numbers
count down from 128 and host numbers count up from one, and the space between
them is preserved for as long as possible. The first network anonymised would be
128, the second 64, the third 192, and so on; the first host on a network would be 1,
the second 2, the third 3, and so on.
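The following sketch illustrates this NAT-style scheme under simplifying
assumptions: 'network' is taken to mean a /24, the mirrored subnet counter occupies
the second octet (so at most 255 networks) and the host counter the fourth octet,
rather than using the full 24-bit space. The class name and layout are ours, for
illustration only:

```python
import ipaddress

def _mirrored(n: int, bits: int = 8) -> int:
    """Bit-reverse a counter so allocations spread from the top:
    1 -> 128, 2 -> 64, 3 -> 192, 4 -> 32, ..."""
    return int(format(n, f"0{bits}b")[::-1], 2)

class NatStyleAnonymiser:
    """Map real networks onto 10.x.0.0 subnets and real hosts onto
    sequential host numbers, preserving who-shares-a-subnet structure."""
    def __init__(self, prefix_len: int = 24):
        self.prefix_len = prefix_len
        self._subnets: dict[str, int] = {}    # real network -> mirrored subnet no.
        self._hosts: dict[str, int] = {}      # real IP -> host no. in its subnet
        self._next_host: dict[str, int] = {}  # real network -> next host counter

    def anonymise(self, ip: str) -> str:
        net = str(ipaddress.ip_network(f"{ip}/{self.prefix_len}",
                                       strict=False).network_address)
        if net not in self._subnets:
            self._subnets[net] = _mirrored(len(self._subnets) + 1)
            self._next_host[net] = 1
        if ip not in self._hosts:
            self._hosts[ip] = self._next_host[net]
            self._next_host[net] += 1
        return f"10.{self._subnets[net]}.0.{self._hosts[ip]}"

# Example: the first network seen becomes 10.128.0.x, the second 10.64.0.x.
anon = NatStyleAnonymiser()
print(anon.anonymise("150.237.92.11"))  # -> 10.128.0.1
print(anon.anonymise("150.237.92.20"))  # -> 10.128.0.2 (same subnet preserved)
print(anon.anonymise("198.51.100.7"))   # -> 10.64.0.1
```

The bit-reversed counter reproduces the 128, 64, 192, ... allocation order described
above, keeping the largest possible run of zero bits between subnet and host numbers.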
The issue remains of who has the authority to reverse the anonymisation. The
scheme requires a master database of the mappings from non-anonymised to
anonymised addresses. If this database is too widely available, the mapping does not
retain its anonymity; if it is entirely secure, data cannot be extracted from it for the
purposes of criminal detection and prosecution, which runs counter to the whole
crime-reduction ethos of the project, although it does provide the necessary degree
of privacy and data protection.
The database that maps from public to anonymised IPs can be managed centrally,
with a single authority, but there is also the opportunity to implement it in a
distributed manner, with the various data collectors controlling their own segment of
the anonymisation. This may actually be better for data protection purposes, as there
is then no single authority that can extract or map any particular address.
In addition to IP addresses, the databases are likely to contain other fields or
components of the evidence that will also require automatic anonymisation,
including web addresses such as URLs, email addresses, and so on. Original URLs
can be hidden by encoding every section of the URL, so that it is not recognisable
when the data is released but is still structured in a useful way: if two URLs are
identical apart from their very last part, the anonymised URLs will display the same
relationship. This idea works very much like the 'TinyURL' [11] system, where a
long URL is encoded so that it can be displayed as a much smaller URL; an alias is
created to the original URL, so the same data remains accessible.
The anonymisation would first divide the URL into the part signifying the address of
the site and the part giving the path to the individual page being viewed; these parts
could be anonymised separately. The encoding could be arranged so that the URL
retains its mapping, meaning that pages from the same set of websites would still be
recognised as coming from the same site. An example is the BBC's internet sites:
'news.bbc.co.uk' and 'www.bbc.co.uk' link to different pages, yet both are
obviously part of the BBC system, so pages would be anonymised according to this
mapping. An issue arises for two sites that have the same path to a file but different
initial URLs: if the paths are encoded identically and attached to the anonymised
URLs, would they then map to the same value? This could give hackers the
opportunity to deduce the file from either site. There is therefore possibly a need for
the anonymisation to be more random, perhaps varying the path encoding so that it
depends on the initial URL, using some detail from the URL in encoding the path.
With the need for more randomness comes the chance of losing information; this
could be termed 'The Anonymiser's Dilemma'.
Email addresses would also be anonymised, encoded similarly to an anonymous
relay, where mail is sent with the sender's address hidden. The approach would be
similar to that for URLs: the sender's identification and the provider part of the
address are anonymised separately. The most important part is the address provider,
as this holds the most information about where the email came from; two addresses
from the same provider would therefore be automatically linked for profiling
purposes, with the provider part encoded identically and only the personal
identification encoded separately. The mapping problem may arise again: if two
addresses at different providers have the same identification, should they be encoded
to exactly the same code, or should they differ? This is the same dilemma as before,
because making them differ again risks a loss of information.
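A matching sketch for email addresses, again with an assumed HMAC-based
encoding and a placeholder key: the provider is encoded on its own, while the local
part is keyed on the provider, so identical names at different providers encode
differently:

```python
import hashlib
import hmac

KEY = b"email-anonymisation-key"  # placeholder key for illustration

def anonymise_email(address: str) -> str:
    local, domain = address.rsplit("@", 1)
    # Provider encoded on its own: same provider -> same token, so
    # addresses can still be linked by provider for profiling purposes.
    domain_token = hmac.new(KEY, domain.lower().encode(),
                            hashlib.sha256).hexdigest()[:8]
    # Local part keyed on the domain: identical names at different
    # providers diverge (trading linkability for safety, as above).
    local_token = hmac.new(KEY + domain.lower().encode(), local.encode(),
                           hashlib.sha256).hexdigest()[:8]
    return f"{local_token}@{domain_token}.anon"

print(anonymise_email("alice@example.com"))
print(anonymise_email("bob@example.com"))    # same provider token
print(anonymise_email("alice@example.org"))  # different local token too
```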
9. Conclusion
We are part way through our funded research programme, and we hope to be able to
show results from profiling using data gathered with the above techniques. We aim
to show that profiling and detection can be performed on anonymised data in a way
that satisfies legal and regulatory constraints; we are confident that our improved
forms of anonymisation permit this. We look forward to reporting on our progress
when our evaluations are complete.
10. References
[1] A Desai, B C Tompsett: "The use of Internet databases in analysis and evidence
collection in Cybercrime", Conference on Advances in Computer Security and
Forensics, Liverpool, UK, 2006.
[2] B C Tompsett, A M Marshall, N C Semmens: "Cyberprofiling: Offender Profiling
and Geographic Profiling of Crime on the Internet", IEEE Securecomm, Athens,
Greece, 2005.
[3] C Andrews: "Offender Profiling Techniques", Cyberprofiling Internal Report,
2005.
[4] C Andrews: "The Geographical Profiling of Crime", Cyberprofiling Internal
Report, 2005.
[5] A M Marshall, B C Tompsett: "Silicon Pathology", Science and Justice 44(1),
2004.
[6] L Spitzner: Honeypots: Tracking Hackers, Addison-Wesley, 2003.
[7] S Prior: "Survey of Honeypot Software and Systems", Cyberprofiling Internal
Report, 2006.
[8] The Data Protection Act 1998, available at:
http://www.opsi.gov.uk/ACTS/acts1998/19980029.htm
[9] A Orlowski: "AOL Publishes database of user's intentions", 2006, available at:
http://www.theregister.co.uk/2006/08/07/aol_search_logs/
[10] A J Slagell, Y Li, K Luo: "Sharing Network Logs for Computer Forensics: A
New Tool for the Anonymization of NetFlow Records", IEEE Securecomm, Athens,
Greece, 2005.
[11] TinyURL, available at: http://tinyurl.com