Conference Paper

Detection and Analysis of Tor Onion Services


Abstract

Tor onion services can be accessed and hosted anonymously on the Tor network. We analyze the protocols, software types, popularity, and uptime of these services by collecting a large number of .onion addresses. Websites are crawled and clustered based on their respective language. In order to also determine the number of unique websites, a de-duplication approach is implemented. To achieve this, we introduce a modular system for the real-time detection and analysis of onion services. Address resolution of onion services is realized via descriptors that are published to and requested from servers on the Tor network that volunteer for this task. We place a set of 20 volunteer servers on the Tor network in order to collect .onion addresses. The analysis of the collected data and its comparison to previous research provides new insights into the current state of Tor onion services and their development. The service scans show a wide variety of protocols, with a significant increase in the popularity of anonymous mail servers and Bitcoin clients since 2013. The popularity analysis shows that the majority of Tor client requests are performed for only a small subset of addresses. The overall data further reveals that a large share of permanent services provides no actual content for Tor users; a significant part consists instead of bots, services offered via multiple domains, or duplicated websites used for phishing attacks. The total number of onion services is thus significantly smaller than current statistics suggest.
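As an illustration of the kind of de-duplication step the abstract mentions, the following Python sketch groups crawled pages by a hash of normalized content. It is not the authors' implementation; the normalization rules and data layout are assumptions.

# Hypothetical sketch of a content-based de-duplication step for crawled onion pages.
# Not the authors' implementation; normalization rules and data layout are assumptions.
import hashlib
import re
from collections import defaultdict

def normalize_html(html: str) -> str:
    """Strip volatile parts (whitespace, session tokens) so mirrored pages hash identically."""
    text = re.sub(r"\s+", " ", html.lower())
    text = re.sub(r"(csrf|session)[a-z_]*=[0-9a-f]+", "", text)  # crude token removal
    return text.strip()

def group_duplicates(pages: dict) -> dict:
    """Map a content fingerprint to all .onion addresses serving that content."""
    groups = defaultdict(list)
    for onion_address, html in pages.items():
        fingerprint = hashlib.sha256(normalize_html(html).encode()).hexdigest()
        groups[fingerprint].append(onion_address)
    return {fp: addrs for fp, addrs in groups.items() if len(addrs) > 1}

Addresses that end up in the same group would then count as one unique website rather than several.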


... Some studies have utilized machine learning models, such as Sparse Composite Document Vector, Random Forest, Support Vector Machines, Naive Bayes, and LightGBM, to infer the categories of Tor onion services [11]. Others have employed probabilistic models based on word distributions or third-party software that automatically extracts topics without human intervention [12]. However, the challenge of accurately and efficiently categorizing Tor onion services persists, even though advanced Natural Language Processing (NLP) techniques are available today, based on Hidden Markov Models, Neural Networks and Bidirectional Encoder Representations from Transformers (BERT) [13], or on the recent paradigm of prompt learning [14]. ...
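To make the categorization idea concrete, here is a minimal, hedged sketch using the Hugging Face transformers zero-shot classification pipeline; the model choice and candidate labels are illustrative assumptions, not taken from the cited works.

# Illustrative sketch only: categorizing onion-page text with an off-the-shelf
# zero-shot classifier from Hugging Face transformers. The model choice and the
# category labels are assumptions, not those of the cited studies.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

page_text = "Buy untraceable coins here, escrow available, PGP required."
labels = ["marketplace", "forum", "search engine", "hosting", "scam/phishing"]

result = classifier(page_text, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))  # most likely category and its score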
... Reported numbers of total and active onion services in prior studies:

Study   Total onions   Active onions   Portion
[25]          7,000           1,450      20.7%
[26]        198,050           7,257       3.7%
[12]         47,439          14,232        30%
[27]        250,000           7,000       2.8%
[28]        124,589           3,536       2.8%
[29]         12,882           4,509        35%
[30]         25,742           6,227      24.2%
[31]         15,503           4,089      26.4%
[19]         25,261           2,527        10%

The number of active services measured was around 1,450 out of more than 7,000 identified addresses [25], 7,257 onion links were active out of 198,050 [26], 30% were online for at least 90% of the experiment with 47,439 onions identified [12], 7 K Tor pages were alive out of more than 250 K addresses [27], and other strategies returned 124,589 addresses with only 3,536 active [28]. ...
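As a quick sanity check, the "Portion" column above can be recomputed as active/total; the throwaway sketch below does exactly that (the dictionary keys are the bracketed reference numbers from the excerpt).

# Recompute the "Portion" column of the excerpted table as active / total.
surveys = {
    "[25]": (7_000, 1_450), "[26]": (198_050, 7_257), "[12]": (47_439, 14_232),
    "[27]": (250_000, 7_000), "[28]": (124_589, 3_536), "[29]": (12_882, 4_509),
    "[30]": (25_742, 6_227), "[31]": (15_503, 4_089), "[19]": (25_261, 2_527),
}
for study, (total, active) in surveys.items():
    print(f"{study}: {active / total:.1%} of {total:,} collected onions were active")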
... A list of hidden services can be obtained by collecting data from hidden service directories (HSDirs). Hidden services are investigated by extracting onion addresses from HSDirs in [8], [9], [10]. In [8], the authors collected hidden service descriptors by exploiting flaws in the protocol and implementation of Tor and using a shadowing technique; the expanse of Tor hidden services is explored and analyzed. ...
... In [13], [14], and [15], hidden services are studied to analyze product prices and supplies on the sites, to rank hidden services with a link-based approach, and to analyze the structure and privacy of hidden services, respectively. In [10], onion addresses are extracted from descriptors and crawled to determine the types of services available on them, their languages, popularity, uptime, and the share of services protected by a descriptor cookie. Continuing their previous work in [16], the authors of [14] contributed a new dataset, "Darknet Usage Text Addresses" (DUTA-10K), and a ranking algorithm for hidden services, and analyzed the activities, content distribution, and languages of the web pages. ...
Article
Full-text available
The only way to access onion services is via the Tor browser, which provides anonymity and privacy to the client as well as the server. Information about these hidden services and the contents available on them cannot be gathered in the same way as for websites on the surface web, so they become a fertile ground for illegal content dissemination and hosting by cybercriminals. There is a persistent need to classify and block such content from onion sites. In this paper, we investigate data requested from onion services to help law enforcement agencies collect traces of cybercrime on these hidden services. We propose a system using a fuzzy-encoded LSTM to analyze contents retrieved from these sites and raise alerts if they are found to be illegal. The accuracy of the fuzzy-encoded LSTM is found to be 81.04% and it outperforms other classifiers.
... LSTM networks are now a leading way to analyze sequential data because they can capture long-term relationships and timing trends. LSTM networks have been used in digital investigations to model and predict behavior in a wide range of areas, such as financial transactions, hacking events, and now activities on onion sites [8]. Using them requires training models on sets of anonymous user interactions and content updates so that they can spot patterns that could indicate illegal activity, such as rapid changes in access patterns or content updates that do not make sense. ...
Article
When it comes to digital investigations, the anonymity provided by onion sites makes investigative research much harder. For multidimensional forensic analysis of onion sites, this study proposes a new method that combines Long Short-Term Memory (LSTM) networks with fuzzy encoding techniques. Accessing onion sites over the Tor network anonymizes both users and services, which complicates conventional investigations. LSTM networks, which are well suited to modeling sequential data, are used to analyze trends in how a site is used and when material is updated over time. Fuzzy encoding complements the LSTM by handling the uncertainty inherent in analyzing anonymized data, making investigative results more accurate. The method keeps time logs of activity on onion sites and converts them into fuzzy sets that capture how user interactions and content change over time. LSTM models are trained on these encoded sequences to find patterns that point to suspicious activity, such as sudden content changes or unusual patterns of user access. This approach can be used to detect illegal activities, identify hidden services hosting illegal content, and profile user behavior in an anonymous network. By combining the LSTM's ability to capture temporal dependencies with fuzzy encoding's tolerance of imprecise data, investigators can better track actions across onion sites. Experiments on real-world onion site data show that the proposed method can uncover hidden behaviors and patterns that are difficult for traditional investigative methods to find, with markedly better detection accuracy and efficiency than traditional methods.
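The cited works do not publish their architecture here, so the following PyTorch sketch only illustrates the general idea of feeding fuzzy-encoded activity sequences to an LSTM classifier; the membership functions, layer sizes, and two-class output are assumptions.

# Hedged sketch of the general idea (fuzzy-encoded sequences fed to an LSTM classifier).
# Layer sizes, membership functions and the two-class output are our assumptions.
import torch
import torch.nn as nn

def fuzzy_encode(values: torch.Tensor) -> torch.Tensor:
    """Map each scalar feature in [0, 1] to memberships in three fuzzy sets (low/medium/high)."""
    low = torch.clamp(1.0 - values * 2.0, 0.0, 1.0)
    medium = torch.clamp(1.0 - torch.abs(values - 0.5) * 2.0, 0.0, 1.0)
    high = torch.clamp(values * 2.0 - 1.0, 0.0, 1.0)
    return torch.stack([low, medium, high], dim=-1)

class FuzzyLSTMClassifier(nn.Module):
    def __init__(self, hidden_size: int = 64, num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_length) of normalized activity features
        encoded = fuzzy_encode(x)            # (batch, seq, 3)
        _, (hidden, _) = self.lstm(encoded)  # final hidden state summarizes the sequence
        return self.head(hidden[-1])         # logits over {benign, suspicious}

# Example: classify a batch of 8 activity sequences of length 50
logits = FuzzyLSTMClassifier()(torch.rand(8, 50))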
... The onion addresses are extracted from an existing dark web database generated by the Panda Projekt using a keyword list. This database is generated by regular automatic crawls of the Tor network [70]-[72]. Fig. 5 visualizes this process. ...
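For context, automated crawls of onion services typically route HTTP requests through a local Tor client's SOCKS proxy. The sketch below shows this common pattern (requests plus PySocks); it is illustrative and not the Panda Projekt's actual crawler, and the onion address is a placeholder.

# Minimal sketch of fetching an onion page through a locally running Tor client's
# SOCKS proxy (default port 9050); requires the requests and PySocks packages.
# The example address is a placeholder, not a real service.
from typing import Optional
import requests

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h: resolve .onion names inside Tor
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_onion(url: str, timeout: int = 60) -> Optional[str]:
    try:
        response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
        return response.text if response.ok else None
    except requests.RequestException:
        return None  # service offline, unreachable, or refusing the request

html = fetch_onion("http://exampleonionaddressxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion/")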
Article
Full-text available
In today's world, cyber-attacks are becoming more frequent and thus proactive protection against them is becoming more important. Cyber Threat Intelligence (CTI) is a possible solution, as it collects threat information from various information sources and derives stakeholder intelligence to protect one's infrastructure. The current focus of CTI research is the clear web, but the dark web may contain further information [1]. To further advance protection, this work analyzes the dark web as an Open Source Intelligence (OSINT) data source to complement current CTI information. The underlying assumption is that hackers use the dark web to exchange, develop, and share information and assets. This work aims to understand the structure of the dark web and identify the amount of openly available CTI-related information. We conducted a comprehensive literature review of dark web research and CTI. To follow this up, we manually investigated and analyzed 65 dark web forums (DWFs), 7 single-vendor shops, and 72 dark web marketplaces (DWMs). We documented the content and relevance of DWFs and DWMs for CTI, as well as challenges during the extraction, and provide mitigations. During our investigation we identified IT-security-relevant information in both DWFs and DWMs, ranging from malware toolboxes to hacking-as-a-service. Among the most prominent challenges during our manual analysis were the interactions necessary to access information and anti-crawling measures, i.e., CAPTCHAs. This analysis showed that 88% of marketplaces and 53% of forums contained relevant data. Our complementary semi-automated analysis of 1,186,906 onion addresses indicates that the necessary interaction makes it difficult to treat the dark web as an open data source; it should rather be treated as a specialized information source when clear web information does not suffice.
... vice for Tor users [9]. English is the most dominant language of Tor hidden services. ...
Article
Full-text available
Content on the World Wide Web that is not indexable by standard search engines defines a category called the deep Web. Dark networks are a subset of the deep Web. They provide services of great interest to users who seek online anonymity during their search on the Internet. Tor is the most widely used dark network around the world. It requires unique application layer protocols and authorization schemes to access. The present evidence reveals that, in spite of great efforts to investigate Tor, our understanding is limited to work on either the information or the structure of this network. Moreover, the interplay between information and structure, which plays an important role in evaluating socio-technical systems such as Tor, has not been given the attention it deserves. In this article, we review and classify the present work on Tor to improve our understanding of this network and shed light on new directions to evaluate Tor. The related work can be categorized into proposals that (1) study the security and privacy of Tor, (2) characterize Tor's structure, (3) evaluate the information hosted on Tor, and (4) review the related work on Tor from 2014 to the present.
... Beyond the works referenced in the previous section, we have also discussed a number of additional aspects of the (Tor) darknet in previous papers. In [27] we show how to detect and analyze Tor onion services. In [33] we compare cyber attacks in the clearnet and darknet, while in [28] we focus on phishing. ...
Article
Full-text available
Darknet marketplaces in the Tor network are popular places to anonymously buy and sell various kinds of illegal goods. Previous research on marketplaces ranged from analyses of type, availability and quality of goods to methods for identifying users. Although many darknet marketplaces exist, their lifespan is usually short, especially for very popular marketplaces that are in focus of law enforcement agencies. We built a data acquisition architecture to collect data from White House Market, one of the largest darknet marketplaces in 2021. In this paper we describe our architecture and the problems we had to solve, and present findings from our analysis of the collected data.
... Due to their anonymity, darknets are popular for botnet infrastructures [3]. Some works show that at certain times up to 50% of all existing onion services belonged to botnet command and control (C&C) services [11,49]. Botnets can therefore represent a large fraction of all darknet services. ...
Article
Full-text available
The darknet terminology is not used consistently among scientific research papers. This can lead to difficulties regarding the applicability and the significance of the results and also facilitates their misinterpretation. As a consequence, comparisons between different works are complicated. In this paper, we conduct a review of previous darknet research papers in order to determine how widespread the inconsistent usage of darknet terminology is. Overall, inconsistencies in darknet terminology were observed in 63 out of 97 papers. The most common statement indicated that the dark web is a part of the deep web. Nineteen papers equate the terms darknet and dark web. Others do not distinguish between dark web and deep web, or between deep web and darknet.
... Other works employ the aforementioned method together with complementary sources, such as crawlers seeded with Tor-specific search engines to obtain 13,145 onion addresses [37], and repositories (e.g., Pastebin, Reddit or The Hidden Wiki) to collect 46,562 Tor links [2]. Even onions obtained through the vulnerability have been used as seeds for a crawler, ultimately analyzing 53,466 persistent sites [61]. There is a strong trend toward analyzing the content of sites to categorize the topics, themes, or legality of the dark web. ...
... They utilize the anonymity of the network to stay safe from legal prosecution. While our own research on the frequency of visited hidden services [38] indicates that such offerings are second only to command and control infrastructures in the darknet, other research sees a more prominent role: Al Nabki et al. point out that the sale of illegal products and services is the most commonly observable suspicious activity in the darknet [26]. Many large marketplaces, designed similarly to eBay or Amazon, are hosted in the Tor network. ...
Article
Full-text available
Single-vendor shops are darknet marketplaces where individuals offer their own goods or services on their own darknet website. There are many single-vendor shops with a wide range of offers in the Tor-network. This paper presents a method to find similarities between these vendor websites to discover possible operational structures between them. In order to achieve this, similarity values between the darknet websites are determined by combining different features from the categories content, structure and metadata. Our results show that the features HTML-Tag, HTML-Class, HTML-DOM-Tree as well as File-Content, Open Ports and Links-To proved to be particularly important and very effective in revealing commonalities between darknet websites. Using the similarity detection method, it was found that only 49% of the 258 single-vendor marketplaces were unique, meaning that there were no similar websites. In addition, 20% of all vendor shops are duplicates. 31% of all single-vendor marketplaces can be sorted into seven similarity groups.
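As an illustration of one content/structure feature of the kind described above, the sketch below computes the Jaccard overlap of HTML tag names and CSS class names between two pages using BeautifulSoup; the equal weighting and the omission of the remaining features (File-Content, Open Ports, Links-To) are our assumptions, not the paper's method.

# Illustrative sketch of one similarity feature in the spirit of the described approach:
# Jaccard overlap of HTML tag names and class names between two vendor pages.
from bs4 import BeautifulSoup

def tag_and_class_sets(html: str):
    soup = BeautifulSoup(html, "html.parser")
    tags = {element.name for element in soup.find_all(True)}
    classes = {cls for element in soup.find_all(True) for cls in element.get("class", [])}
    return tags, classes

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

def html_similarity(html_a: str, html_b: str) -> float:
    tags_a, classes_a = tag_and_class_sets(html_a)
    tags_b, classes_b = tag_and_class_sets(html_b)
    # Equal weighting of the two features is our assumption.
    return 0.5 * jaccard(tags_a, tags_b) + 0.5 * jaccard(classes_a, classes_b)

Pairs of vendor shops whose combined similarity value exceeds a chosen threshold would then be grouped together as candidates for a common operator.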
Conference Paper
This study analyzes the availability of Tor onion services over time, as they have a limited lifetime and can be activated and deactivated intermittently. The various protocols used by onion services, 15 in total, are also analyzed, as is the presence of Tor links in online resources used for advertising. Having collected 54,602 onion addresses over 6 months, the experimental analysis shows that 23.65% were dead and 76.35% were alive, of which 32.74% were deactivated and reactivated at some point within a window of 31 days. In terms of protocols, HTTP was observed to be predominant, followed by SSH and SMTP. Regarding advertising, 49.72% of the links are found on the surface web, 5.79% on the dark web, and 44.49% on both, with a large number of link aggregator repositories identified.
Article
Full-text available
Drawing on Raymond Williams's concept of the ‘magic system,’ this article argues that advertising on the Dark Web is a ‘dark magic system’. The article first defines ‘Dark Web’ and then analyzes over 300 banner advertisements appearing on Tor onion service search engines. The advertisements are categorized into navigation, individual vendors, services, and markets. Next, the article traces the associations the advertisements make between advertised objects and values. The predominant values the advertisements invoke include navigation, OPSEC politics, and a justification to exploit the openness of others. The article then traces the limits of the dark magic system, including the system’s inability to offer metrics, the problems of ‘onion cloners,’ and the constant threat of scams. The article concludes with an argument that the dark magic system is incapable of addressing the sort of anonymized political communication the Dark Web might afford.
Conference Paper
Descriptions of the darknet, and in particular of the Tor network, appear inconsistent and implausible. In order to gain insight into how these conflicting results are produced, the goal of this study is to review previous research on the matter with regard to the terminology used, the methodology of sample collection, and the analysis of the data. Our results indicate six critical aspects that in particular pertain to (A) an inconsistent use of terminology, (B) the methodology with which the sample was gathered, as well as the handling of (C) short-lived services, (D) botnet command and control servers, (E) web services with undetermined content and (F) duplicates of onion services. Further, we include a small case study on darknet marketplaces to demonstrate how reports concerning the number of services in a certain category can easily mislead. Through the implications of these aspects, the presented description of Tor does not necessarily reflect the actual nature of Tor.
Article
Full-text available
The World Wide Web (WWW) consists of the surface web, the deep web, and the Dark Web, depending on the content shared and how these network layers are accessed. The Dark Web is built on Dark Net overlay networks that can be accessed only through specific software and authorization schemes. The Dark Net has become a growing community whose users focus on keeping their identities, personal information, and locations secret, owing to its diverse population base and well-known cyber threats. Furthermore, little is known about the Dark Net from the user perspective, and usage strategies are often misunderstood. To understand this further, we conducted a systematic analysis of research relating to Dark Net privacy and security on N=200 academic papers, in which we also explored the user side. An evaluation of secure end-user experience on the Dark Net establishes the motives for account initialization in overlay networks such as Tor. This work delves into the evolution of Dark Net intelligence for improved cybercrime strategies across jurisdictions. The evaluation of the developing network infrastructure of the Dark Net raises meaningful questions about how to resolve the issue of increasing criminal activity on the Dark Web. We further examine the security features afforded to users, motives, and anonymity revocation. We also evaluate more closely nine user-study-focused papers, revealing the importance of conducting more research in this area. Our detailed systematic review of Dark Net security clearly shows the apparent research gaps, especially in the user-focused studies emphasized in the paper.
Chapter
Full-text available
Abstract: The goal of this work is the creation of an anonymity matrix, with a particular focus on connecting the technical and psychological components of the analysis. The starting point is the use of a privacy enhancing technology, specifically the Tor Browser. The aim is to study the group of Tor users with regard to their online privacy literacy, usage patterns, and degree of anonymity. To this end, an online survey (N = 120) and a guided interview with an expert from IT security research were conducted.
Chapter
Cyber attacks on clearnet services are discussed widely in the research literature. However, a systematic comparison of cyber attacks on clearnet and darknet services has not been performed. This chapter describes an approach for setting up and simultaneously running honeypots with vulnerable services in the clearnet and darknet to collect information about attacks and attacker behavior. Key observations are provided and the similarities and differences regarding attacks and attacker behavior are discussed.
Article
Full-text available
Purpose of the study: to compile a set of features that allow detecting and identifying the establishment of a connection between a client and the anonymous Tor network when the data stream is encrypted with the TLS v1.3 protocol. Method: software analysis of the data flow, frequency methods, decomposition of the content of data packets according to their number, sequence, frame placement within a packet and sizes, and a comparative method with respect to different versions of the encryption protocol and the resources making the connection. Results: a set of features of a Tor network connection established using TLS v1.3 encryption was compiled, allowing the "handshake" between the client and the Tor network to be detected and identified in the data stream so that the connection can be lawfully blocked; a comparative analysis of Tor network data and data of the VKontakte social network during the establishment of an encrypted connection was carried out; the structure and differences of the TLS v1.2 and v1.3 "handshakes" were studied and described; and the structure, size and arrangement of frames and data packets of the Tor network and of another connection type, both using TLS v1.3 encryption, were revealed.
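One of the described feature types, the size and arrangement of TLS records, can be extracted from a captured TCP payload as in the sketch below. The offsets follow the standard TLS record header layout (1 byte content type, 2 bytes version, 2 bytes length); how these features distinguish Tor from other TLS 1.3 traffic is the paper's analysis and is not reproduced here.

# Sketch: extract the sequence of TLS record types, versions and lengths from raw bytes.
def tls_record_sizes(payload: bytes):
    records, offset = [], 0
    while offset + 5 <= len(payload):
        content_type = payload[offset]                          # 22 = handshake, 23 = application data
        version = int.from_bytes(payload[offset + 1:offset + 3], "big")
        length = int.from_bytes(payload[offset + 3:offset + 5], "big")
        records.append((content_type, version, length))
        offset += 5 + length
    return records

# A ClientHello appears as a record with content_type 22 (0x16); the pattern of record
# lengths across the first packets is one kind of fingerprinting feature.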
Method
Full-text available
Since the advent of darknet markets, or illicit cryptomarkets, there has been a sustained interest in studying their operations: the actors, products, payment methods, and so on. However, this research has been limited by a variety of obstacles, including the difficulty in obtaining reliable and representative data, which present challenges to undertaking a complete and systematic study. The Australian National University’s Cybercrime Observatory has developed tools that can be used to collect and analyse data obtained from darknet markets. This paper describes these tools in detail. While the proposed methods are not error-free, they provide a further step in providing a transparent and comprehensive solution for observing darknet markets tailored for data scientists, social scientists, criminologists and others interested in analysing trends from darknet markets.
Conference Paper
Full-text available
Tor is the most popular volunteer-based anonymity network consisting of over 3000 volunteer-operated relays. Apart from making connections to servers hard to trace to their origin it can also provide receiver privacy for Internet services through a feature called "hidden services". In this paper we expose flaws both in the design and implementation of Tor's hidden services that allow an attacker to measure the popularity of arbitrary hidden services, take down hidden services and deanonymize hidden services. We give a practical evaluation of our techniques by studying: (1) a recent case of a botnet using Tor hidden services for command and control channels; (2) Silk Road, a hidden service used to sell drugs and other contraband; (3) the hidden service of the DuckDuckGo search engine.
Conference Paper
Full-text available
Tor hidden services allow running Internet services while protecting the location of the servers. Their main purpose is to enable freedom of speech even in situations in which powerful adversaries try to suppress it. However, providing location privacy and client anonymity also makes Tor hidden services an attractive platform for every kind of imaginable shady service. The ease with which Tor hidden services can be set up has spurred a huge growth of anonymously provided Internet services of both types. In this paper we analyse the landscape of Tor hidden services. We have studied Tor hidden services after collecting 39,824 hidden service descriptors on 4 February 2013 by exploiting protocol and implementation flaws in Tor: we scanned them for open ports; in the case of HTTP services, we analysed and classified their content. We also estimated the popularity of hidden services by looking at the request rate for hidden service descriptors by clients. We found that while the content of Tor hidden services is rather varied, the most popular hidden services are related to botnets.
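Port scanning of onion services, as mentioned in the abstract, has to be tunneled through Tor. A hedged sketch of such a check using the PySocks package is shown below; timeouts and the example port list are illustrative choices, not the study's parameters.

# Check whether a given port of an onion service accepts connections through Tor's
# SOCKS proxy. Requires a local Tor client on 127.0.0.1:9050 and the PySocks package.
import socks  # PySocks

def onion_port_open(onion_host: str, port: int, timeout: float = 30.0) -> bool:
    s = socks.socksocket()
    s.set_proxy(socks.SOCKS5, "127.0.0.1", 9050, rdns=True)  # rdns: resolve .onion inside Tor
    s.settimeout(timeout)
    try:
        s.connect((onion_host, port))
        return True
    except (socks.ProxyError, OSError):
        return False
    finally:
        s.close()

# e.g. probe a small set of common ports for each collected address
COMMON_PORTS = [22, 25, 80, 443, 8333]  # SSH, SMTP, HTTP, HTTPS, Bitcoin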
Article
Tor hidden services allow someone to host a website or other transmission control protocol (TCP) service whilst remaining anonymous to visitors. The collection of all Tor hidden services is often referred to as the 'darknet'. In this study, the authors describe results from what they believe to be the largest study of Tor hidden services to date. By operating a large number of Tor servers for a period of 6 months, the authors were able to capture data from the Tor distributed hash table to collect the list of hidden services, classify their content and count the number of requests. Approximately 80,000 hidden services were observed in total, of which around 45,000 were present at any one time. Abuse and botnet C&C servers were the most frequently requested hidden services, although there was a diverse range of services on offer.