Marco Mellia

Politecnico di Torino, Torino, Piedmont, Italy

Publications (241) · 96.42 Total Impact


  • Article · Jan 2016

  • Article · Jan 2016 · IEEE Communications Magazine
  • Article: CrowdSurf

    Article · Sep 2015 · ACM SIGCOMM Computer Communication Review
  • ABSTRACT: Malicious activities on the Web increasingly threaten Internet users. Home networks are among the prime targets of attackers for hosting malware, which is commonly exploited as a stepping stone to launch a variety of further attacks. Due to the diversification of attacks, existing security solutions often fail to detect malicious activities that remain hidden and pose threats to users' security and privacy. Characterizing the behavioral patterns of known malware can help improve the classification accuracy of threats. More importantly, as different malware might share commonalities, studying the behavior of known malware could help detect previously unknown malicious activities. We pose the research question of whether it is possible to characterize such behavioral patterns by analyzing the traffic of known infected clients. We present our quest to discover such characterizations. Results show that commonalities do arise, but their identification may require some ingenuity. We also present our discovery of malicious activities that were left undetected by a commercial IDS.
    Article · Jul 2015
  • ABSTRACT: Hypertext transfer protocol (HTTP) has become the main protocol used to carry out malicious activities. Attackers typically use HTTP for communication with command-and-control servers, click fraud, phishing and other malicious activities, as they can easily hide among the large amount of benign HTTP traffic. The user-agent (UA) field in the HTTP header carries information on the application, operating system (OS), device, and so on, and adversaries fake UA strings as a way to evade detection. Motivated by this, we propose a novel grammar-guided UA string classification method for HTTP flows. We leverage the fact that a number of 'standard' applications, such as web browsers and iOS mobile apps, have well-defined syntaxes that can be specified using context-free grammars, and we extract OS, device and other relevant information from them. We develop association heuristics to classify UA strings generated by 'non-standard' applications, which do not contain OS or device information. We provide a proof-of-concept system that demonstrates how our approach can be used to identify malicious applications that generate fake UA strings to engage in fraudulent activities. Copyright © 2015 John Wiley & Sons, Ltd.
    Article · Jul 2015 · International Journal of Network Management
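The grammar-guided idea above lends itself to a small illustration. In the sketch below, a single regular expression stands in for one of the paper's context-free grammars for 'standard' user-agent strings; the grammar, keyword table and function names are invented for the example and are not taken from the paper.

```python
# Illustrative sketch only: a regex stands in for a context-free grammar of a
# "standard" user-agent syntax, just to show the classify-then-extract idea.
import re

# Hypothetical, simplified "browser UA" grammar: product/version followed by a
# parenthesised comment that usually carries OS/device information.
BROWSER_UA = re.compile(r'^(?P<product>[\w.]+)/(?P<version>[\w.]+)\s*\((?P<comment>[^)]*)\)')

OS_KEYWORDS = {'Windows NT': 'Windows', 'Mac OS X': 'macOS',
               'Android': 'Android', 'iPhone OS': 'iOS', 'Linux': 'Linux'}

def classify_ua(ua: str):
    """Return (class, extracted info) for a user-agent string."""
    m = BROWSER_UA.match(ua)
    if m is None:
        # Does not follow the "standard" syntax: candidate for the
        # association heuristics (or for closer inspection).
        return 'non-standard', {}
    comment = m.group('comment')
    os_name = next((v for k, v in OS_KEYWORDS.items() if k in comment), 'unknown')
    return 'standard', {'product': m.group('product'),
                        'version': m.group('version'),
                        'os': os_name}

if __name__ == '__main__':
    print(classify_ua('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'))
    print(classify_ua('evil-bot-123'))
```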
  • ABSTRACT: Network measurements are of high importance both for the operation of networks and for the design and evaluation of new management mechanisms. Therefore, several approaches exist for running network measurements, ranging from analyzing live traffic traces from campus or Internet Service Provider (ISP) networks to performing active measurements on distributed testbeds, e.g., PlanetLab, or involving volunteers. However, each method falls short, offering only a partial view of the network. For instance, the scope of passive traffic traces is limited to an ISP's network and customers' habits, whereas active measurements might be biased by the population or node location involved. To complement these techniques, we propose to use (commercial) crowdsourcing platforms for network measurements. They permit a controllable, diverse and realistic view of the Internet and provide better control than do measurements with voluntary participants. In this study, we compare crowdsourcing with traditional measurement techniques, describe possible pitfalls and limitations, and present best practices to overcome these issues. The contribution of this paper is a guideline for researchers to understand when and how to exploit crowdsourcing for network measurements.
    Article · Jul 2015
  • ABSTRACT: Security tools have evolved dramatically in recent years to combat the increasingly complex nature of attacks, but to be effective these tools need to be configured by experts who understand network protocols thoroughly. In this paper we present FieldHunter, which automatically extracts fields and infers their types, providing this much-needed information to security experts so they can keep pace with the increasing rate of new network applications and their underlying protocols. FieldHunter collects application messages from multiple sessions and, by applying statistical correlations, is able to infer the types of the fields. These correlations can be between different messages or with meta-data such as message length or client and server IPs. Our system is designed to extract and infer fields from both binary and textual protocols. We evaluated FieldHunter on real network traffic collected in ISP networks from three different continents. FieldHunter was able to extract security-relevant fields and infer their nature for well-documented network protocols (such as DNS and MSNP), for protocols whose specifications are not publicly available (such as SopCast), and for malware such as Ramnit.
    Article · Jun 2015
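One of the inferences sketched in the abstract, correlating a candidate field with message meta-data such as length, can be illustrated with a minimal example. The snippet below is our own toy version of that statistical test (Pearson correlation against message length), not the FieldHunter implementation; the function names and the 0.95 threshold are assumptions.

```python
# Minimal sketch: infer whether a candidate numeric field is a "message length"
# field by correlating its values with observed message lengths across many
# messages of the same protocol.
from statistics import mean

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def looks_like_length_field(field_values, message_lengths, threshold=0.95):
    """Heuristic: a strong linear correlation with message length suggests the
    candidate field encodes (part of) the message length."""
    return pearson(field_values, message_lengths) >= threshold

if __name__ == '__main__':
    # Hypothetical observations from five messages of the same protocol.
    lengths = [120, 340, 97, 512, 64]
    candidate = [116, 336, 93, 508, 60]   # length minus a fixed 4-byte header
    print(looks_like_length_field(candidate, lengths))  # True
```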
  • ABSTRACT: Anycast routing is an IP solution that allows packets to be routed to the topologically nearest server. Over the last few years it has been commonly adopted to manage services running on top of UDP, e.g., public DNS resolvers, multicast rendez-vous points, etc. Recently, however, the Internet has witnessed the growth of new Anycast-enabled Content Delivery Networks (A-CDNs), such as CloudFlare and EdgeCast, which provide their web services (i.e., TCP traffic) entirely through anycast. To the best of our knowledge, little is known in the literature about the nature and the dynamics of such traffic. For instance, since anycast depends on routing, one question is how stable the paths toward the nearest server are. To shed some light on this question, in this work we provide a first look at A-CDN traffic by combining active and passive measurements. In particular, building upon our previous work, we use active measurements to identify and geolocate A-CDN caches starting from a large set of IP addresses related to the top-100k Alexa websites. We then look at the traffic of those caches in the wild using a large passive dataset collected from a European ISP. We find that several A-CDN servers are encountered on a daily basis when browsing the Internet. Routes to A-CDN servers are very stable, with the few changes observed on a monthly basis (in contrast to the more dynamic traffic policies of traditional CDNs). Overall, A-CDNs are a reality worth further investigation.
    Article · May 2015
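The abstract only says that caches are identified and geolocated with active measurements, building on prior work. A common way to establish that an address is anycast is the speed-of-light test sketched below; this is an illustrative assumption about the kind of check involved, not the paper's exact method, and the vantage points and RTTs are made up.

```python
# Hedged sketch of the speed-of-light test often used to detect anycast:
# if two distant vantage points both measure RTTs too small to be explained
# by any single server location, the address must be anycast.
from math import radians, sin, cos, asin, sqrt

FIBER_KM_PER_MS = 100.0  # ~200,000 km/s in fibre, halved because RTT is a round trip

def great_circle_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def likely_anycast(vp1, rtt1_ms, vp2, rtt2_ms):
    """vp = (lat, lon). If the two RTT-derived disks cannot intersect,
    no single server location can explain both measurements -> anycast."""
    max_reach = (rtt1_ms + rtt2_ms) * FIBER_KM_PER_MS
    return great_circle_km(*vp1, *vp2) > max_reach

if __name__ == '__main__':
    turin, sydney = (45.07, 7.69), (-33.87, 151.21)
    # Hypothetical: both probes see a ~5 ms RTT to the same IP address.
    print(likely_anycast(turin, 5.0, sydney, 5.0))  # True: must be anycast
```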
  • ABSTRACT: YouTube relies on a massively distributed Content Delivery Network (CDN) to stream the billions of videos in its catalogue. Unfortunately, very little information about the design of such a CDN is available. This, combined with the pervasiveness of YouTube, poses a big challenge for Internet Service Providers (ISPs), which are compelled to optimize end-users' Quality of Experience (QoE) while having no control over the CDN decisions. This paper presents YouLighter, an unsupervised technique to identify changes in the YouTube CDN. YouLighter leverages only passive measurements to cluster co-located identical caches into edge-nodes. This automatically unveils the structure of YouTube's CDN. Further, we propose a new metric, called Constellation Distance, that compares the clustering obtained from two different time snapshots to pinpoint sudden changes. While several approaches allow comparison between clustering results from the same dataset, no technique allows measuring the similarity of clusters from different datasets. Hence, we develop a novel methodology, based on the Constellation Distance, to solve this problem. By running YouLighter over 10-month-long traces obtained from two ISPs in different countries, we pinpoint both sudden changes in edge-node allocation and small alterations to the cache allocation policies that actually impair the QoE that end-users perceive.
    Article · Mar 2015
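As a rough illustration of comparing two clustering snapshots, the sketch below computes an average nearest-centroid distance between the edge-node centroids of two snapshots. The Constellation Distance itself is not specified in the abstract, so this metric, like the toy feature vectors, is only a stand-in of our own.

```python
# Illustrative sketch only: YouLighter clusters co-located caches into
# edge-nodes and compares two snapshots; the metric below (average
# nearest-centroid distance between the two sets of cluster centroids)
# is a simple stand-in, not the paper's Constellation Distance.
from math import dist
from statistics import mean

def centroids(clusters):
    """clusters: list of lists of feature vectors (e.g., RTT, path length)."""
    return [tuple(mean(c) for c in zip(*cluster)) for cluster in clusters]

def snapshot_distance(clusters_a, clusters_b):
    ca, cb = centroids(clusters_a), centroids(clusters_b)
    d_ab = mean(min(dist(a, b) for b in cb) for a in ca)
    d_ba = mean(min(dist(b, a) for a in ca) for b in cb)
    return max(d_ab, d_ba)  # symmetric: a large value suggests the CDN layout changed

if __name__ == '__main__':
    march = [[(10.0, 3), (11.0, 3)], [(40.0, 7), (42.0, 7)]]
    april = [[(10.5, 3), (11.5, 3)], [(90.0, 12), (88.0, 12)]]  # one edge-node moved
    print(snapshot_distance(march, april))
```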
  • ABSTRACT: When surfing the Internet, individuals leak personal and corporate information to third parties whose (legitimate or not) businesses revolve around the value of collected data. The implications are serious, from a person unwillingly exposing private information to an unknown third party, to a company unable to manage the flow of its information to the outside world. The point is that individuals and companies are more and more kept out of the loop when it comes to controlling private data. With the goal of empowering informed choices about information leakage through the Internet, we propose CROWDSURF, a system for comprehensive and collaborative auditing of data that flows to Internet services. Similarly to open-source efforts, we enable users to contribute to building awareness and control over privacy and communication vulnerabilities. CROWDSURF provides the core infrastructure and algorithms to let individuals and enterprises regain control over the information exposed on the web. We advocate CROWDSURF as a data processing layer positioned right below HTTP in the host protocol stack. This enables the inspection of clear-text data even when HTTPS is deployed, and the application of processing rules that are customizable to fit any need. Preliminary results obtained by executing a prototype implementation on ISP traffic traces demonstrate the feasibility of CROWDSURF.
    Article · Feb 2015
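To make the idea of customizable processing rules applied below HTTP concrete, here is a hypothetical sketch of a tiny rule engine run on outgoing requests before encryption. The rule predicates, actions and field names are invented for illustration and do not come from the CROWDSURF prototype.

```python
# Hypothetical illustration of customizable rules applied to outgoing HTTP
# requests right below HTTP (i.e., before TLS); rules and fields are invented.
from dataclasses import dataclass, field

@dataclass
class HttpRequest:
    host: str
    path: str
    headers: dict = field(default_factory=dict)

RULES = [
    # (predicate, action) pairs evaluated in order on each outgoing request.
    (lambda r: r.host.endswith('tracker.example'), 'drop'),
    (lambda r: 'Cookie' in r.headers and 'analytics' in r.path, 'strip-cookie'),
]

def apply_rules(req: HttpRequest):
    for predicate, action in RULES:
        if predicate(req):
            if action == 'drop':
                return None                      # request never leaves the host
            if action == 'strip-cookie':
                req.headers.pop('Cookie', None)  # leak less, keep the request working
    return req

if __name__ == '__main__':
    r = HttpRequest('site.example', '/analytics/collect', {'Cookie': 'uid=42'})
    print(apply_rules(r))                                               # cookie removed
    print(apply_rules(HttpRequest('ads.tracker.example', '/img.gif')))  # None (dropped)
```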
  • Enrico Bocchi · Idilio Drago · Marco Mellia

    Article · Jan 2015 · IEEE Transactions on Cloud Computing
  • ABSTRACT: This paper considers an approach to identify previously undetected malicious clients in Internet Service Provider (ISP) networks by combining flow classification with a graph-based score propagation method. Our approach represents all HTTP communications between clients and servers as a weighted, near-bipartite graph, where the nodes correspond to the IP addresses of clients and servers, while the links are their interconnections, weighted according to the output of a flow-based classifier. We employ a two-phase alternating score propagation algorithm on the graph to identify suspicious clients in a monitored network. Using a symmetrized weighted adjacency matrix as its input, we show that our algorithm is less vulnerable to inflating the malicious scores of popular Web servers with high in-degrees than the normalization used in PageRank. Experimental results on a 4-hour network trace collected by a large Internet service provider show that incorporating flow information into score propagation significantly improves the precision of the algorithm.
    Article · Jan 2015
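A minimal sketch of alternating score propagation on a client-server graph follows, seeded with known-bad clients and using a flow classifier's output as edge weights. The plain weighted averaging, the fixed iteration count and the toy graph are our simplifications for illustration, not the paper's exact update rules.

```python
# Toy two-phase alternating score propagation on a near-bipartite graph.
from collections import defaultdict

# edges[(client, server)] = weight, e.g. the flow classifier's maliciousness score
edges = {('c1', 's1'): 0.9, ('c2', 's1'): 0.8, ('c2', 's2'): 0.1,
         ('c3', 's2'): 0.2, ('c3', 's3'): 0.1, ('c4', 's1'): 0.7}

known_bad_clients = {'c1'}

client_score = {c: (1.0 if c in known_bad_clients else 0.0) for c, _ in edges}

def weighted_avg(scores, neighbours):
    total = sum(w for _, w in neighbours)
    return sum(scores[n] * w for n, w in neighbours) / total if total else 0.0

by_server, by_client = defaultdict(list), defaultdict(list)
for (c, s), w in edges.items():
    by_server[s].append((c, w))
    by_client[c].append((s, w))

for _ in range(10):
    # Phase 1: servers inherit suspicion from the clients contacting them.
    server_score = {s: weighted_avg(client_score, nbrs) for s, nbrs in by_server.items()}
    # Phase 2: clients inherit suspicion from the servers they contact;
    # known-bad seeds keep their score.
    client_score = {c: (1.0 if c in known_bad_clients
                        else weighted_avg(server_score, nbrs))
                    for c, nbrs in by_client.items()}

# Seed c1 stays at 1.0; c2 and c4 (sharing servers with c1) rank above c3.
print(sorted(client_score.items(), key=lambda kv: -kv[1]))
```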
  • ABSTRACT: Increased user concern over security and privacy on the Internet has led to widespread adoption of HTTPS, the secure version of HTTP. HTTPS authenticates the communicating end points and provides confidentiality for the ensuing communication. However, as with any security solution, it does not come for free. HTTPS may introduce overhead in terms of infrastructure costs, communication latency, data usage, and energy consumption. Moreover, given the opaqueness of the encrypted communication, any in-network value-added services requiring visibility into application layer content, such as caches and virus scanners, become ineffective. This paper attempts to shed some light on these costs. First, taking advantage of datasets collected from large ISPs, we examine the accelerating adoption of HTTPS over the last three years. Second, we quantify the direct and indirect costs of this evolution. Our results show that, indeed, security does not come for free. This work thus aims to stimulate discussion on technologies that can mitigate the costs of HTTPS while still protecting the user's privacy.
    Conference Paper · Dec 2014
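The longitudinal adoption analysis boils down to aggregations such as the HTTPS share of web traffic per period. The toy example below, with made-up flow records and port 443 as the HTTPS indicator, only illustrates that kind of aggregation; it is not the paper's dataset or pipeline.

```python
# Toy aggregation: share of web bytes carried over port 443 per month,
# computed from (hypothetical) per-flow records.
from collections import defaultdict

# Hypothetical flow records: (month, server_port, bytes)
flows = [('2012-04', 80, 9_000), ('2012-04', 443, 1_000),
         ('2013-04', 80, 7_000), ('2013-04', 443, 3_000),
         ('2014-04', 80, 5_000), ('2014-04', 443, 5_000)]

bytes_by_month = defaultdict(lambda: {'http': 0, 'https': 0})
for month, port, nbytes in flows:
    key = 'https' if port == 443 else 'http'
    bytes_by_month[month][key] += nbytes

for month in sorted(bytes_by_month):
    total = sum(bytes_by_month[month].values())
    share = bytes_by_month[month]['https'] / total
    print(f'{month}: HTTPS carries {share:.0%} of web bytes')
```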
  • ABSTRACT: Clouds and CDNs are systems that tend to separate the content being requested by users from the physical servers capable of serving it. From the network point of view, monitoring and optimizing performance for the traffic they generate are challenging tasks, given that the same resource can be located in multiple places, which can, in turn, change at any time. The first step in understanding cloud and CDN systems is thus the engineering of a monitoring platform. In this paper, we propose a novel solution that combines passive and active measurements and whose workflow has been tailored to specifically characterize the traffic generated by cloud and CDN infrastructures. We validate our platform by performing a longitudinal characterization of the very well known cloud and CDN infrastructure provider Amazon Web Services (AWS). By observing the traffic generated by more than 50 000 Internet users of an Italian Internet Service Provider, we explore the EC2, S3, and CloudFront AWS services, unveiling their infrastructure, the pervasiveness of the web services they host, and their traffic allocation policies as seen from our vantage points. Most importantly, we observe their evolution over a two-year-long period. The solution provided in this paper can be of interest for the following: 1) developers aiming at building measurement tools for cloud infrastructure providers; 2) developers interested in failure and anomaly detection systems; and 3) third-party service-level agreement certifiers who can design systems to independently monitor performance. Finally, we believe that the results about AWS presented in this paper are interesting as they are among the first to unveil properties of AWS as seen from the operator point of view.
    Article · Dec 2014 · IEEE Transactions on Network and Service Management
  • Enrico Bocchi · Marco Mellia · Sofiane Sarni
    ABSTRACT: Data storage is one of today's fundamental services, with companies, universities and research centers needing to store large amounts of data every day. Cloud storage services are emerging as a strong alternative to local storage, allowing customers to save the costs of buying and maintaining expensive hardware. Several solutions are available on the market, the most famous being Amazon S3. However, it is rather difficult to access information about each service's architecture, performance, and pricing. To shed light on storage services from the customer perspective, we propose a benchmarking methodology, apply it to four popular offers (Amazon S3, Amazon Glacier, Windows Azure Blob and Rackspace Cloud Files), and compare their performance. Each service is analysed as a black box and benchmarked through crafted workloads. We take the perspective of a customer located in Europe, looking for possible service providers and the optimal data center in which to deploy its applications. Finally, we complement the analysis by comparing the actual and forecast costs faced when using each service. According to the collected results, every service shows weaknesses under some workload, with no clear all-round winner, e.g., some offers provide excellent or poor performance when exchanging large or small files. For all services, it is of paramount importance to accurately select the data center where applications are deployed, as throughput varies by factors from 2x to 10x. The methodology (and the tools implementing it) presented here is instrumental for potential customers to identify the most suitable offer for their needs.
    Article · Nov 2014
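The black-box benchmarking approach can be sketched as timing PUT/GET operations on crafted workloads and deriving throughput. In the example below the storage client is an in-memory placeholder standing in for any provider SDK; the class, method names and workload sizes are assumptions for illustration, not code from the paper.

```python
# Sketch of black-box benchmarking: time PUT/GET of crafted workloads against
# a storage service and derive throughput.
import os
import time

class InMemoryClient:
    """Placeholder for a real cloud-storage client (S3, Azure Blob, ...)."""
    def __init__(self):
        self.store = {}
    def put(self, key, data):
        self.store[key] = data
    def get(self, key):
        return self.store[key]

def benchmark(client, sizes_bytes):
    results = []
    for size in sizes_bytes:
        payload = os.urandom(size)          # crafted workload of a given size
        t0 = time.perf_counter(); client.put('bench', payload)
        t1 = time.perf_counter(); client.get('bench')
        t2 = time.perf_counter()
        results.append({'size': size,
                        'put_MBps': size / (t1 - t0) / 1e6,
                        'get_MBps': size / (t2 - t1) / 1e6})
    return results

if __name__ == '__main__':
    for r in benchmark(InMemoryClient(), [16_000, 1_000_000, 16_000_000]):
        print(r)
```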
  • ABSTRACT: Network measurements are a fundamental pillar for understanding network performance and performing root-cause analysis when problems arise. Traditionally, either active or passive measurements are considered. While active measurements make it possible to know exactly the workload injected by the application into the network, passive measurements can offer a more detailed view of transport- and network-layer effects. In this paper, we present a hybrid approach in which active throughput measurements are run regularly while a passive measurement tool monitors the generated packets. This allows us to correlate the application-layer measurements obtained by the active tool with the more detailed view offered by the passive monitor. The proposed methodology has been implemented following the mPlane reference architecture, the tools have been deployed in the Fastweb network, and we have collected measurements for more than three months. We then report a subset of results that show the benefits obtained when correlating active and passive measurements. Among the results, we pinpoint cases of congestion, ADSL misconfiguration, and modem issues that impair the throughput obtained by users.
    Article · Nov 2014
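The core of the hybrid approach is joining each active throughput test with the passive flow records captured for the same client around the same time. The sketch below shows one such join; the record fields and the 30-second window are illustrative assumptions, not the mPlane or Tstat schema.

```python
# Sketch of the correlation step: join active tests with passive flow records
# for the same client within a time window (field names are illustrative).
from datetime import datetime, timedelta

active_tests = [  # produced by the active measurement agent
    {'client': '10.0.0.1', 'start': datetime(2014, 9, 1, 10, 0, 0), 'goodput_mbps': 3.1},
]
passive_flows = [  # produced by the passive monitor on the same link
    {'client': '10.0.0.1', 'start': datetime(2014, 9, 1, 10, 0, 1),
     'retransmissions': 412, 'min_rtt_ms': 95.0},
    {'client': '10.0.0.2', 'start': datetime(2014, 9, 1, 10, 0, 2),
     'retransmissions': 3, 'min_rtt_ms': 18.0},
]

def correlate(tests, flows, window=timedelta(seconds=30)):
    for t in tests:
        t['flows'] = [f for f in flows
                      if f['client'] == t['client']
                      and abs(f['start'] - t['start']) <= window]
    return tests

for t in correlate(active_tests, passive_flows):
    # High retransmission counts or large RTTs help explain a low goodput.
    print(t['goodput_mbps'], [(f['retransmissions'], f['min_rtt_ms']) for f in t['flows']])
```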
  • ABSTRACT: This deliverable presents an extended set of Analysis Modules, including both the improvements made to those presented in deliverable D4.1 and the new analysis algorithms designed and developed to address the use-cases. The deliverable also describes a complete workflow description for the different use-cases, including both stream processing for real-time monitoring applications and batch processing for "off-line" analysis. This workflow description specifies the iterative interaction loop between WP2, WP3, T4.1, and T4.2, thereby allowing for a cross-checking of the analysis modules and the reasoner interactions.
    Technical Report · Nov 2014
  • ABSTRACT: The complexity of the Internet has rapidly increased, making it more important and challenging to design scalable network monitoring tools. Network monitoring typically requires rolling data analysis, i.e., continuously and incrementally updating (rolling over) various reports and statistics over high-volume data streams. In this paper, we describe DBStream, an SQL-based system that explicitly supports incremental queries for rolling data analysis. We also present a performance comparison of DBStream with a parallel data processing engine (Spark), showing that, in some scenarios, a single DBStream node can outperform a cluster of ten Spark nodes on rolling network monitoring workloads. Although our performance evaluation is based on network monitoring data, our results can be generalized to other big data problems with high volume and velocity.
    Conference Paper · Oct 2014
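The abstract does not detail DBStream's internals, but the notion of incremental queries for rolling analysis can be illustrated with a toy example: each new batch of records updates a per-minute aggregate in SQL instead of recomputing it from scratch. The snippet below uses SQLite purely as a stand-in.

```python
# Toy "incremental rolling analytics in SQL": new batches update the aggregate.
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE flows (minute INTEGER, bytes INTEGER)')
db.execute('CREATE TABLE per_minute (minute INTEGER PRIMARY KEY, bytes INTEGER)')

def ingest(batch):
    """Store the raw records and incrementally roll them into the aggregate."""
    db.executemany('INSERT INTO flows VALUES (?, ?)', batch)
    db.executemany(
        'INSERT INTO per_minute VALUES (?, ?) '
        'ON CONFLICT(minute) DO UPDATE SET bytes = bytes + excluded.bytes',
        batch)

ingest([(0, 500), (0, 300), (1, 100)])
ingest([(1, 250), (2, 900)])            # only touches minutes 1 and 2
print(db.execute('SELECT * FROM per_minute ORDER BY minute').fetchall())
# [(0, 800), (1, 350), (2, 900)]
```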
  • S. Traverso · C. Kiraly · E. Leonardi · M. Mellia
    ABSTRACT: The goal of this paper is to investigate rate control mechanisms for unstructured P2P-TV applications adopting UDP as the transport protocol. We focus on a novel class of Hose Rate Controllers (HRC), which aim at regulating the aggregate upload rate of each peer. This choice is motivated by the peculiar needs of P2P-TV: video content is not elastic but is subject to real-time constraints, so that the epidemic chunk exchange mechanism is much more bursty for P2P-TV than for file-sharing applications. Furthermore, the peer up-link (e.g., ADSL/Cable) is typically shared among flows in real scenarios. We compare two classes of aggregate rate control mechanisms: Delay-Based (DB) less-than-best-effort mechanisms, which aim at tightly controlling the chunk transfer delay, and loss-based Additive Increase Multiplicative Decrease (AIMD) rate controllers, which are designed to be more aggressive and can compete with other AIMD congestion controls, i.e., TCP. Both families of mechanisms are implemented in a full-fledged P2P-TV application that we use to collect performance results. Only actual experiments, conducted both in a controlled test-bed and over the wild Internet and involving up to 1800 peers, are presented to assess performance in realistic scenarios. Results show that DB-HRC tends to outperform AIMD-HRC when tight buffering time constraints are imposed on the application, while AIMD-HRC tends to be preferable in severely congested scenarios, especially when the buffering time constraints are relaxed.
    Article · Aug 2014 · Computer Networks
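As an illustration of the AIMD-HRC idea, the toy controller below adjusts the aggregate upload cap of a peer: additive increase while chunks are delivered cleanly, multiplicative decrease upon loss. The constants, units and interface are illustrative assumptions, not the controller used in the paper.

```python
# Toy AIMD controller for the *aggregate* peer upload rate (the "hose").
class AimdHoseRateController:
    def __init__(self, initial_kbps=500, alpha_kbps=50, beta=0.5,
                 floor_kbps=100, ceiling_kbps=10_000):
        self.rate = initial_kbps
        self.alpha = alpha_kbps    # additive increase per feedback interval
        self.beta = beta           # multiplicative decrease factor on loss
        self.floor, self.ceiling = floor_kbps, ceiling_kbps

    def on_feedback(self, chunk_lost: bool) -> float:
        """Called once per control interval; returns the new aggregate cap."""
        if chunk_lost:
            self.rate = max(self.floor, self.rate * self.beta)
        else:
            self.rate = min(self.ceiling, self.rate + self.alpha)
        return self.rate

if __name__ == '__main__':
    hrc = AimdHoseRateController()
    for lost in [False, False, False, True, False, False]:
        print(f'{hrc.on_feedback(lost):.0f} kbit/s')   # 550 600 650 325 375 425
```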
  • ABSTRACT: YouTube is the most popular service in today's Internet. Its success forces Google to constantly evolve its operation to cope with the ever-growing number of users watching YouTube. Understanding the characteristics of YouTube's traffic, as well as the way YouTube flows are served from the massive Google CDN, is paramount for ISPs, especially for mobile operators, who must handle the huge surge of traffic with the capacity constraints of mobile networks. This paper presents a characterization of the YouTube traffic accessed through mobile and fixed-line networks. The analysis especially considers YouTube content provisioning, studying the characteristics of the hosting servers as seen from both types of networks. To the best of our knowledge, this is the first paper presenting such a simultaneous characterization from mobile and fixed-line vantage points.
    Conference Paper · Jun 2014

Publication Stats

4k Citations
96.42 Total Impact Points

Institutions

  • 1970-2015
    • Politecnico di Torino
      • DET - Department of Electronics and Telecommunications
      Torino, Piedmont, Italy
  • 2012
    • Consorzio Nazionale Interuniversitario per le Telecomunicazioni
      Genova, Liguria, Italy
  • 2009-2011
    • Carnegie Mellon University
      Pittsburgh, Pennsylvania, United States