Article

Internet Traffic Tends Toward Poisson and Independent as the Load Increases


Abstract

Network devices put packets on an Internet link and multiplex, or superpose, the packets from different active connections. Extensive empirical and theoretical studies of packet traffic variables — arrivals, sizes, and packet counts — demonstrate that the number of active connections has a dramatic effect on traffic characteristics. At low connection loads on an uncongested link — that is, with little or no queueing on the link-input router — the traffic variables are long-range dependent, creating burstiness: large variation in the traffic bit rate. As the load increases, the laws of superposition of marked point processes push the arrivals toward Poisson, push the sizes toward independence, and reduce the variability of the counts relative to the mean. This begins a reduction in the burstiness; in network parlance, there are multiplexing gains. Once the connection load is sufficiently large, the network begins pushing back on the attraction to Poisson and independence by causing queueing on the link-input router. But if the link speed is high enough, the traffic can get quite close to Poisson and independence before the push-back begins in force; while some of the statistical properties are changed in this high-speed case, the push-back does not resurrect the burstiness. These results reverse the commonly held presumption that Internet traffic is everywhere bursty and that multiplexing gains do not occur. Very simple statistical time series models — fractional sum-difference (FSD) models — describe the statistical variability of the traffic variables and their change toward Poisson and independence before significant queueing sets in, and can be used to generate open-loop packet arrivals and sizes for simulation studies. Both science and engineering are affected. The magnitude of multiplexing needs to become part of the fundamental scientific framework that guides the study of Internet traffic, and the engineering of Internet devices and networks needs to reflect the multiplexing gains.
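The superposition effect described in the abstract is easy to illustrate numerically. Below is a minimal sketch of ours (not the authors' code; the stream count, Weibull shape, and sample sizes are arbitrary): merging independent renewal streams with non-exponential interarrivals drives the merged interarrivals toward exponential, as the Palm-Khintchine theorem predicts.
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def merged_interarrivals(n_streams, n_events=2000, shape=0.5):
    """Superpose n_streams renewal processes with Weibull(shape) gaps."""
    arrivals = [np.cumsum(rng.weibull(shape, n_events)) for _ in range(n_streams)]
    horizon = min(a[-1] for a in arrivals)         # common observation window
    merged = np.sort(np.concatenate(arrivals))
    return np.diff(merged[merged <= horizon])

for n in (1, 10, 100):
    gaps = merged_interarrivals(n)
    gaps = gaps / gaps.mean()                      # normalize the rate to 1
    ks = stats.kstest(gaps, "expon")               # distance from exponential
    print(f"{n:4d} streams: cv = {gaps.std():.2f}, KS = {ks.statistic:.3f}")
```
As the number of superposed streams grows, the coefficient of variation of the merged interarrivals approaches 1 and the KS distance from the exponential shrinks, mirroring the paper's "attraction to Poisson" at high connection loads.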

... We consider the arrivals of segments from different servers as a Markovian batch arrival process having Poisson distributions that are independent of each other. A Markovian batch arrival is a stochastic point process that generalizes the standard Poisson process by allowing for batches of arrivals, dependent inter-arrival times, non-exponential inter-arrival time distributions, and correlated batch sizes [10]. Cao et al. [10] have shown that packet arrivals in a congested network follow a Poisson distribution. Based on the splitting and addition properties of Poisson distributions, the segment arrivals can be modeled with a Poisson distribution as well. ...
Article
Full-text available
The significant popularity of HTTP adaptive video streaming (HAS), such as Dynamic Adaptive Streaming over HTTP (DASH), over the Internet has led to a stark increase in user expectations in terms of video quality and delivery robustness. This situation creates new challenges for content providers who must satisfy the Quality-of-Experience (QoE) requirements and demands of their customers over a best-effort network infrastructure. Unlike traditional single-server DASH, we developed a Distributed Queuing theory bitrate adaptation algorithm for DASH (DQ-DASH) that leverages the availability of multiple servers by downloading segments in parallel. DQ-DASH uses M^x/D/1/K queuing-theory-based bitrate selection in conjunction with a request scheduler to download subsequent segments of the same quality through parallel requests, reducing quality fluctuations. DQ-DASH facilitates the aggregation of bandwidth from different servers and increases fault-tolerance and robustness through path diversity. The resulting resilience prevents clients from suffering QoE degradations when some of the servers become congested. DQ-DASH also helps to fully utilize the aggregate bandwidth from the servers and download the imminently required segment from the server with the highest throughput. We have also analyzed the effect of buffer capacity and segment duration for multi-source video streaming.
... The size (in bytes) of a video segment is rd, and the segment arrival rate λ is b/(rd). Cao et al. [5] have shown that packet arrivals in a congested network follow a Poisson distribution. Based on the splitting and addition properties of the Poisson distribution, the segment arrivals can be modeled with a Poisson distribution as well. ...
... QoE. Figure 5 compares the QoE for all five rate adaptation methods over three different buffer capacities, computed using Equation 5 and averaged over all 28 combinations of video samples and network profiles. Compared to the other methods, QUETRA exhibits a QoE that is higher by 13% to 140%, 7% to 97%, and 9% to 51% for the 30/60 s, 120 s, and 240 s buffers, respectively. ...
Conference Paper
Full-text available
DASH, or Dynamic Adaptive Streaming over HTTP, relies on a rate adaptation component to decide which representation to download for each video segment. A plethora of rate adaptation algorithms has been proposed in recent years. The bitrate decisions made by these algorithms largely depend on several factors: estimated network throughput, buffer occupancy, and buffer capacity. Yet, these algorithms are not informed by a fundamental relationship between these factors and the chosen bitrate, and as a result, we found that they do not perform consistently in all scenarios and require parameter tuning to work well under different buffer capacities. In this paper, we model a DASH client as an M/D/1/K queue, which allows us to calculate the expected buffer occupancy given a bitrate choice, network throughput, and buffer capacity. Using this model, we propose QUETRA, a simple rate adaptation algorithm. We evaluated QUETRA under a diverse set of scenarios and found that, despite its simplicity, it leads to better quality of experience (7% to 140%) than existing algorithms.
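The M/D/1/K abstraction above is easy to probe by simulation. The sketch below is our own construction with illustrative parameters (not QUETRA itself): it estimates the occupancy and blocking seen by Poisson arrivals of rate rho at a unit-rate deterministic server with system capacity K, relying on the PASTA property.
```python
import numpy as np

def md1k(rho, K, n=100_000, seed=1):
    """Mean occupancy and blocking probability of an M/D/1/K system."""
    rng = np.random.default_rng(seed)
    t, last_dep, in_system = 0.0, 0.0, []
    seen = blocked = 0
    for gap in rng.exponential(1.0 / rho, n):
        t += gap
        in_system = [d for d in in_system if d > t]  # finished services leave
        seen += len(in_system)                       # occupancy seen at arrival
        if len(in_system) < K:
            last_dep = max(t, last_dep) + 1.0        # FIFO, unit service time
            in_system.append(last_dep)
        else:
            blocked += 1
    return seen / n, blocked / n

print(md1k(rho=0.9, K=10))   # (mean occupancy, blocking probability)
```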
... The book by Basseville et al. [TNB14b] may be consulted for more details on the model. Finally, packets in Internet traffic are often modeled by a Poisson process (see the articles [PT12], [CCLS03], [KMFB04], [VSO09] or [SKR08]): detecting an anomaly is then interpreted as a problem of detecting a change in the intensity of a Poisson process. ...
Thesis
Motivated by applications in cybersecurity and epidemiology, the results presented in this thesis concern the detection and localization of an abrupt change in a Poisson process. We first consider the detection of a change characterized by a jump (non-transitory change) or a segment (transitory change) in the intensity of a Poisson process, starting from a constant baseline. We propose a complete non-asymptotic minimax study of the associated testing problems, when the baseline intensity is known or unknown. Minimax adaptation with respect to each parameter of the change (height, location, length) provides an exhaustive overview of the different orders of the minimax separation rates. Second, we address the question of simultaneously detecting and localizing a jump in the intensity of a Poisson process when the baseline intensity is known, and of constructing estimators of the change point. This problem, formulated as a multiple testing problem, is studied from a non-asymptotic minimax point of view. Minimax adaptation with respect to the jump height and the change point is considered, and two regimes for the per-family minimax separation rates are established. Moreover, a correspondence between minimax multiple testing procedures and confidence intervals for the change point is made explicit. The minimax or minimax-adaptive single and multiple tests are based on counting statistics inspired by Neyman-Pearson tests, or on quadratic statistics, which can be combined with aggregation strategies.
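A minimal version of the jump-detection problem the thesis studies can be written in a few lines. The scan below is our own illustrative construction (dyadic windows, reading off the smallest p-value), not the thesis's minimax procedure; all rates are made up.
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam0, lam1, tau = 50.0, 90.0, 0.6          # intensity jumps 50 -> 90 at t = 0.6

# simulate a Poisson process on [0, 1] with a jump in its intensity at tau
pts = np.concatenate([
    rng.uniform(0.0, tau, rng.poisson(lam0 * tau)),
    rng.uniform(tau, 1.0, rng.poisson(lam1 * (1.0 - tau))),
])

best = (1.0, None)
for k in range(1, 6):                      # scan dyadic windows at 5 scales
    for j in range(2 ** k):
        a, b = j / 2 ** k, (j + 1) / 2 ** k
        n_obs = np.sum((pts >= a) & (pts < b))
        p = stats.poisson.sf(n_obs - 1, lam0 * (b - a))   # H0: rate lam0
        if p < best[0]:
            best = (p, (a, b))
print("smallest window p-value:", best)    # flags a window beyond tau
```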
... Earlier analyses of network traffic lacked accuracy and robustness (Karagiannis et al. 2004). In fact, many research works have shown that the assumption of a Poisson distribution, or a derived version of it, is in accordance with real Internet packet arrivals (Cao et al. 2003; Karagiannis et al. 2004; Sukhov et al. 2016; Yu et al. 2006). More specifically, at the edges of the internet (i.e. ...
Article
Full-text available
Failure detectors (FDs) are fundamental building blocks for distributed systems. An FD detects whether a process has crashed or not based on the reception of heartbeats’ messages sent by this process over a communication channel. A key challenge of FDs is to tune their parameters to achieve optimal performance which satisfies the desired system requirements. This is challenging due to the complexities of large-scale networks. Existing FDs ignore such optimisation and adopt ad-hoc parameters. In this paper, we propose a new Mixed Integer Linear Programming (MILP) optimisation-based FD algorithm. We obtain the MILP formulation via piecewise linearisation relaxations. The MILP involves obtaining optimal FD parameters that meet the optimal trade-off between its performance metrics requirements, network conditions and system parameters. The MILP maximises our FD’s accuracy under bounded failure detection time while considering network and system conditions as constraints. The MILP’s solution represents optimised FD parameters that maximise FD’s expected performance. To adapt to real-time network changes, our proposed MILP-based FD fits the probability distribution of heartbeats’ inter-arrivals. To address our FD scalability challenge in large-scale systems where the MILP model needs to compute approximate optimal solutions quickly, we also propose a heuristic algorithm. To test our proposed approach, we adopt Amazon Cloud as a realistic testing environment and develop a simulator for robustness tests. Our results show consistent improvement of overall FD performance and scalability. To the best of our knowledge, this is the first attempt to combine the MILP-based optimisation modelling with FD to achieve performance guarantees.
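One step of the paper, fitting the distribution of heartbeat inter-arrivals to set FD parameters, can be sketched as follows. This is a simplified stand-in of ours (a gamma model, synthetic data, a made-up false-positive target), not the MILP formulation.
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
inter_ms = rng.gamma(shape=4.0, scale=25.0, size=5000)   # stand-in heartbeats

shape, loc, scale = stats.gamma.fit(inter_ms, floc=0.0)  # fit the inter-arrivals
target_fp = 1e-3                                         # tolerated wrong suspicions
timeout = stats.gamma.ppf(1.0 - target_fp, shape, loc=loc, scale=scale)
print(f"gamma(k={shape:.2f}, theta={scale:.1f}); timeout = {timeout:.0f} ms")
# A heartbeat later than `timeout` is suspected; under the fitted model this
# falsely accuses a healthy process with probability ~1e-3 per heartbeat.
```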
... Packet-byte count shows the phenomenon of long-range dependence. Cao et al. [10] find that the marginal distribution of the interarrival times is best matched by a Weibull distribution. The research in [11] focuses entirely on traffic generated by an individual source. ...
... N_t = max{n : t_n ≤ t} is the number of burst arrivals (or, at the packet level, of IP packet arrivals) in [0, t]. From current empirical studies (cf. [2]) it is well known that at both small and large observation time scales a stationary packet or burst stream can be taken to constitute a recurrent arrival process with independently marked burst or packet lengths Z. By Wald's equation we obtain, for all t > 0, the mean input E[Σ_{n≤N_t} Z_n] = E[N_t]·E[Z]. [Figure 5: Exceedance e_n(u) against the threshold u for the duration of sub-sessions. Figure 6: Exceedance e_n(u) against the threshold u for the size of sub-sessions.] ...
... An algorithmic procedure for estimating the parameters of LFSN processes was given by A. Karasaridis and D. Hatzinakos [30]. Other researches suggest a Poisson assumption in short time scale and a non-stationary view along with long-range dependence in long time scale behavior of network traffic [31], [32]. Through observing the long-term behavior of measured traffic traces, nonstationarity seems to be another remarkable feature in those data. ...
... Internet traffic is such a complex phenomenon that no model is entirely satisfactory, and recent studies have even proposed a return to Poisson models for the smallest time scales in high-aggregation environments (see [4], [21], [43]). It is clear that much work and new modeling techniques are needed, and this fertile ground has created a growing interest among network researchers, applied probabilists and statisticians. ...
Article
Full-text available
In this paper we provide a framework for analyzing network traffic traces through trace-driven queueing. We also introduce several queueing metrics together with the associated visualization tools that provide insight into the traffic features and facilitate comparisons between traces. Some techniques for non-stationary data are discussed. Applying our framework to both real and synthetic traces we i) illustrate how to compare traces using trace-driven queueing, and ii) show that traces that look 'similar' under various statistical measures (such as the Hurst index) can exhibit rather different behavior under queueing simulation.
... Assuming that the network operator provisions enough resources to support T, we propose to use a simple "opportunistic" PDU insertion scheme: a node can insert a PDU on the first free timeslot present on a convenient data channel. As traffic arriving at a node is aggregated, the insertion process can be modelled by a simple discrete-time Geo/Geo/1 queue [5], where PDU generation at a node is modelled by a Bernoulli process and the number of slots between two free timeslots is geometrically distributed [6]. As long as client layers conform with T (conformance can be enforced at each node by a simple access control procedure), the performance delivered by POADM rings can be shown to comply with QoS objectives set for Carrier Ethernet in a MAN [7]. ...
Conference Paper
Full-text available
Metro networks support increasing traffic volumes and evolving traffic profiles. Revisiting the metro network architecture, this paper shows that both optical transparency and sub-wavelength granularity can be achieved while still ensuring transport-network QoS levels.
... In any case, it must also be stressed that Poisson-based models were largely discredited in early studies of Internet traffic that identified longterm statistical dependence. However, in recent years, as the number of interconnected hosts, the amount of data transmitted, and the speed of Internet links have exponentially increased, current studies suggest that network traffic can be well represented by the Poisson model for subsecond time scales [37] or approximately for large-scale traffic aggregation [7]. Before proceeding, we characterize ∆ by breaking it down into two components. ...
Article
The provision of content confidentiality via message encryption is by no means sufficient when facing the significant privacy risks present in online communications. Indeed, the privacy literature abounds with examples of traffic analysis techniques aimed to reveal a great deal of information, merely from the knowledge, even if probabilistic, of who is communicating with whom, when, and how frequently. Anonymous-communication systems emerge as a response against such traffic analysis threats. Mixes, and in particular threshold pool mixes, are a building block of anonymous communications systems. These are nodes that receive, store, reorder and delay messages in batches. However, the anonymity gained from the statistical difficulty to link incoming and outgoing messages comes at the expense of introducing a potentially costly delay in the delivery of those messages. In this paper we address the design of such mixes in a systematic fashion, by defining quantitative measures of both anonymity and delay, and by mathematically formalizing practical design decisions as a multiobjective optimization problem. Our extensive theoretical analysis finds the optimal mix parametrization and characterizes the optimal trade-off between the contrasting aspects of anonymity and delay, for two information-theoretic measures of anonymity. Experimental results show that mix optimization may lead to substantial delay reductions for a desirable level of anonymity.
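The anonymity/delay trade-off for a threshold pool mix can be made concrete with a simplified model of ours (not the paper's exact measures): if the mix fires on n fresh messages while retaining a pool of f, a uniformly chosen message leaves each round with probability n/(n+f), so the round delay is geometric and sender anonymity decomposes into the delay entropy plus log2(n).
```python
import numpy as np

def pool_mix(n, f, rmax=100_000):
    """Sender anonymity (bits) and mean delay (rounds) of a threshold pool mix."""
    p = n / (n + f)                        # chance a pooled message departs
    r = np.arange(1, rmax + 1)
    pd = p * (1.0 - p) ** (r - 1)          # geometric round-delay law
    h_delay = -np.sum(pd * np.log2(pd))
    return h_delay + np.log2(n), 1.0 / p   # anonymity bits, expected delay

for f in (0, 10, 50):
    bits, delay = pool_mix(20, f)
    print(f"pool={f:3d}: anonymity={bits:.2f} bits, delay={delay:.1f} rounds")
```
Larger pools buy anonymity bits at the price of longer expected delay, which is exactly the tension the paper formalizes as a multiobjective optimization.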
... ● When the system has very short delays—for example when the sensors send few data, as is the case with typical range sensors or the odometry of a mobile robot, or when the communications and/or software run really fast—the shape of the statistical distribution of the delays is very close to an exponential [38]. Actually, exponential distributions are used for modeling inter-arrival times in many situations [39]. ...
Article
Full-text available
Networked telerobots are remotely controlled through general-purpose networks and components, which are highly heterogeneous and exhibit stochastic response times; however, their correct teleoperation requires a timely flow of information from sensors to remote stations. In order to guarantee these time requirements, a good on-line probabilistic estimation of the sensory transmission delays is needed. In many modern applications this estimation must be computationally highly efficient, e.g., when the system includes a web-based client interface. This paper studies marginal probability distributions that, under mild assumptions, can be a good approximation of the real distribution of the delays without using knowledge of their dynamics, are efficient to compute, and require only minor modifications to the networked robot. Since sequences of delays exhibit strong non-linearities in these networked applications, to satisfy the iid hypothesis required by the marginal approach we apply a change detection method. The results reported here indicate that some parametric models explain many more real scenarios when using this change detection method, while some non-parametric distributions have a very good rate of successful modeling when non-linearity detection is not possible and the total delay is split into its three basic terms: server, network and client times.
Article
Full-text available
This study focuses on evaluating how well a wireless voice over internet protocol (VOIP) system works for 4G and 5G communications by using wavelets. It proposes a transmission system for wireless VOIP based on wavelet-transformed data (both image and audio). Each wavelet's performance is assessed using several metrics, including signal-to-noise ratio, peak signal-to-noise ratio, root mean squared error, percentage of retained signal energy, and compression ratio. Other factors such as packet loss, delay, and throughput are also analyzed, with both raw data (without wavelets) and data processed using different wavelets. The simulation results show that the proposed model performs better with various wavelets than the Markov model. Notably, the Daubechies wavelet family yields superior results compared to other types. This model could be beneficial for high-speed IP-based cellular systems where minimizing delay (latency) is crucial. For 5G systems, parallel computing techniques may be used to implement this model effectively.
Article
Full-text available
Problem Definition: Managers in ad agencies are responsible for delivering digital ads to viewers on behalf of advertisers, subject to the terms specified in the ad campaigns. They need to develop bidding policies to obtain viewers on an ad exchange and allocate them to the campaigns to maximize the agency's profits, subject to the goals of the ad campaigns. Academic/Practical Relevance: Determining a rigorous solution methodology is complicated by uncertainties in the arrival rates of viewers and campaigns, as well as uncertainty in the outcomes of bids on the ad exchange. In practice, ad hoc strategies are often deployed. Our methodology jointly determines optimal bidding and viewer allocation strategies and obtains insights about the characteristics of the optimal policies. Methodology: New ad campaigns and viewers are treated as Poisson arrivals and the resulting model is a Markov decision process where the state of the system is the number of undelivered impressions in queue for each campaign type in each period. We develop solution methods for bid optimization and viewer allocation and perform a sensitivity analysis with respect to the key problem parameters. Results: We solve for the optimal dynamic, state dependent bidding and allocation policies as a function of the number of ad impressions in queue, for both the finite horizon and steady state cases. We show that the resulting optimization problems are strictly concave in the decision variables and develop and evaluate a heuristic method that can be applied to large problems. Managerial Implications: Numerical analysis of our heuristic solution shows that its errors are generally small, and that the optimal dynamic, state dependent bidding policies obtained by our model are significantly better than optimal static policies. Our proposed approach is managerially attractive because it is easy to implement in practice. We identify the capacity of the impression queue as an important managerial control lever, and show that it can be more effective than using higher bids to reduce delay penalties. We quantify potential operational benefits from the consolidation of ad campaigns as well as merging ad exchanges.
Article
With the rapid development of big data and cloud computing, data management has become increasingly challenging. Over the years, a number of frameworks for data management have become available. Most of them are highly efficient, but ultimately create data silos. It becomes difficult to move and work coherently with data as new requirements emerge. A possible solution is to use an intelligent hierarchical (multi-tier) storage system (HSS). An HSS is a meta solution that consists of different storage frameworks organized as a jointly constructed storage pool. A built-in data migration policy that determines the optimal placement of the datasets in the hierarchy is essential. Placement decisions are non-trivial since they should be made according to the characteristics of the dataset, the tier status in the hierarchy, and access patterns. This paper presents an open-source hierarchical storage framework with a dynamic migration policy based on reinforcement learning (RL). We present a mathematical model, a software architecture, and implementations based on both simulations and a live cloud-based environment. We compare the proposed RL-based strategy to a baseline of three rule-based policies, showing that the RL-based policy achieves significantly higher efficiency and optimal data distribution in different scenarios.
Article
Cyber security is an important concern for all individuals, organisations and governments globally. Cyber attacks have become more sophisticated, frequent and dangerous than ever, and traditional anomaly detection methods have proved less effective when dealing with these new classes of cyber threats. In order to address this, both classical and Bayesian models offer a valid and innovative alternative to the traditional signature-based methods, motivating the increasing interest in statistical research that has been observed in recent years. In this review, we provide a description of some typical cyber security challenges, typical types of data and statistical methods, paying special attention to Bayesian approaches for these problems.
Article
Full-text available
As indispensable components of intelligent transportation systems, traffic detection and surveillance technologies deliver speed monitoring, traffic counting, and vehicle identification and classification. This paper proposes a normal distribution transform (NDT) algorithm to improve the speed accuracy and robustness of a laser-based detector. This method can deliver more accurate estimation of vehicle speed, enabling computation of the parameters of length and height. The results of simulation with different detector update rates suggest that the average estimation errors of vehicle parameters can be reduced using the NDT matching method, especially for low detector update rates. The study also implemented a series of field experiments using the proposed detector prototype to verify the detector's measurements of vehicle parameters. The proposed method is a promising way to improve the laser-based traffic detector. In simulation tests, initial experiments show that the accuracy of speed estimation can reach 95% given an update rate of 1000 Hz for the detector, and the average length error can be reduced by approximately 60%. Even for speeding vehicles traveling at 150 km/h, the estimated speed error is limited to 10 km/h. In field tests, for a vehicle traveling at 80 km/h, the estimation errors are within the thresholds of the maximum simulation errors, that is, 32 cm for length and 5.71 km/h for speed.
Article
Full-text available
Time synchronization among sensor devices connected through non-deterministic media is a fundamental requirement for sensor fusion and other distributed tasks that need a common time reference. In many of the time synchronization methods existing in the literature, the estimation of the relation between pairs of clocks is a core concept; moreover, in applications that do not have general connectivity among their devices but a simple pairwise topology, such as embedded systems, mobile robots or home automation, two-clock synchronization is actually the basic form of the time estimation problem. In these kinds of applications, especially critical ones, not only is the quality of the estimation of the relation between two clocks important, but also the bounds the methods provide for the estimated values, and their computational effort (since many are small systems). In this paper, we characterize, with a thorough parameterization, the possible scenarios where two-clock synchronization is to be solved, and then conduct a rigorous statistical study of both scenarios and methods. The study is based on exhaustive simulations run on a supercomputer. Our aim is to provide a sound basis for selecting the best clock synchronization algorithm depending on the application requirements and characteristics, and also to deduce which of these characteristics are most relevant, in general, when solving the problem. For our comparisons we have considered several representative methods for clock synchronization according to a novel taxonomy that we also propose in the paper, and in particular, a few geometrical ones that have especially desirable characteristics for the two-clock problem. We illustrate the method selection procedure with practical use-cases of sensory systems where two-clock synchronization is essential.
Article
Numerous time synchronization methods have been proposed during the last decades, targeted at devices transmitting in general-topology networks, e.g. wireless sensor networks. Interestingly enough, there are still sensing applications that, from the data flow point of view, just consist of pairs of devices (typically with one of them being a central controller shared among all pairs); this is common in embedded systems, mobile robots, factory cells, domotic installations, etc. In these cases, time synchronization takes the form of estimating the relative drifts and offsets of these pairs of clocks, and thus the quality, guarantees and computational cost of the methods become crucial. Under that specific perspective, it is still possible to propose novel approaches that are computationally efficient while providing guarantees on the resulting estimates of the relative clock relation. In particular, solving the synchronization problem through a geometrical interpretation is especially suitable, since it provides both efficiency and sound estimates with hard bounds. In this paper we analyze a direct geometrical approach for pairwise systems that, through a more direct formulation of the common geometrical setting, improves efficiency and adapts better to diverse stochastic transmissions, while providing good estimates. We demonstrate its advantages with a real low-cost, embedded test bed that can be appropriately instrumented for measuring times, comparing its performance against previous synchronization approaches suitable for pairwise systems.
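The kind of guaranteed estimate a geometrical pairwise method provides can be sketched with the classical interval argument: if one-way delays are non-negative, each request/reply exchange bounds the remote-local clock offset from both sides, and intersecting the intervals tightens the bound. The code below is a toy version of ours (drift neglected, synthetic delays), not the paper's algorithm.
```python
import numpy as np

rng = np.random.default_rng(4)
true_offset = 12.5                      # ms; remote clock ahead of local

lo, hi = -np.inf, np.inf
for _ in range(200):
    d1, d2 = rng.exponential(3.0, 2)    # stochastic one-way delays, ms
    t1 = rng.uniform(0.0, 1e6)          # local send time (local clock)
    t2 = t1 + d1 + true_offset          # remote timestamp (remote clock)
    t4 = t2 - true_offset + d2          # local receive time (local clock)
    lo = max(lo, t2 - t4)               # offset >= t2 - t4 since d2 >= 0
    hi = min(hi, t2 - t1)               # offset <= t2 - t1 since d1 >= 0
print(f"offset in [{lo:.2f}, {hi:.2f}] ms; midpoint {(lo + hi) / 2:.2f}")
```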
Article
In this paper, we present the case of utilizing interference temperature (IT) as a dynamic quantity rather than as a fixed quantity in an orthogonal frequency division multiple access (OFDMA) based spectrum sharing systems. The fundamental idea here is to reflect the changing capacity demand of primary user (PU) over time in setting the interference power threshold for secondary user (SU). This type of dynamic IT will allow the SU to opportunistically have higher transmit power during relaxed IT period, thereby resulting in higher network capacity. The cognitive radio network (CRN) considered in this paper has an underlay network configuration in which the available spectrum of the PU is accessed concurrently by SU provided that the interference power at the PU receiver from SU is under a certain power threshold. This power threshold is set to maintain and guarantee a certain level of quality of service (QoS) for PU network. Theoretical expressions for outage probability and mean capacity for SU network are derived, and validated with simulation results, and it is observed that utilizing dynamic IT results in high network performance gains as compared to utilizing a fixed IT in cognitive radio system.
Article
Full-text available
With the extensive use of peer-to-peer applications in recent years, network traffic becomes more dynamic and less predictable, which leads to the decline of network resource utilization and the degradation of network performance. Aiming at the above problems, we explore how to strengthen the cooperation between peer-to-peer applications and networks, making the application adjust its own traffic mode according to the current network traffic status to enhance the stability of network traffic. We improve two key algorithms of peer selection and choking/unchoking in the protocol and introduce traffic relaxation to characterize the traffic state, taking the currently most popular peer-to-peer application (the BitTorrent protocol) as an example. In our improved method, peers are selected probabilistically according to their traffic relaxation, and the double-parameter selection problem that simultaneously considers the traffic relaxation and transfer rate of peers is also solved. Finally, we conduct simulation experiments in two real network topologies with real traffic matrix data and different sizes of BitTorrent swarms; the experimental results show that our method can significantly improve the stability of the network traffic without sacrificing, or even while improving, the performance of the BitTorrent protocol compared with the original protocol.
Article
Full-text available
Nowadays, the traffic over networks is changing because of new protocols, devices and applications. Therefore, it is necessary to analyze the impact over services and resources. Traffic classification is a very important prerequisite for tasks such as traffic engineering and provisioning quality of service. In this paper, we analyze the variable packet size of the traffic in a university campus network through data collected using a novel sniffer that ensures user data privacy. We separate the collected data by type of traffic, protocols and applications. Finally, we estimate the traffic model that represents this traffic by means of a Poisson process and compute its associated numerical parameters.
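The Poisson fitting step described above reduces, in its simplest form, to a rate MLE plus a goodness-of-fit check on the interarrivals. A minimal sketch of ours, with synthetic timestamps standing in for the sniffer's capture:
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
ts = np.sort(rng.uniform(0.0, 60.0, 3000))    # toy capture: 3000 pkts in 60 s

lam = len(ts) / (ts[-1] - ts[0])              # MLE of the Poisson rate, pkt/s
gaps = np.diff(ts)
ks = stats.kstest(gaps, "expon", args=(0.0, 1.0 / lam))
print(f"lambda = {lam:.1f} pkt/s, KS = {ks.statistic:.3f}, p = {ks.pvalue:.2f}")
```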
Article
Full-text available
The aim of this paper is to present some preliminary results and non-extensive statistical properties of selected operating system counters related to hard drive behaviour. A number of experiments have been carried out in order to generate the workload and analyse the behaviour of computers during man–machine interaction. All analysed computers were personal ones running Windows operating systems. The research was conducted to demonstrate how the concept of non-extensive statistical mechanics can be helpful in the description of computer systems behaviour, especially in the context of statistical properties with scaling phenomena, long-term dependencies and statistical self-similarity. The studies have been made on the basis of the perfmon tool, which allows the user to trace operating system counters during processing.
Conference Paper
The statistical properties of traffic in Internet access networks have long been of interest to networking researchers and practitioners. In this paper, we analyse network traffic originating and terminating in various types of Internet access networks (Ethernet, Digital Subscriber Line, wireless hotspot and their next-tier Internet Service Provider's core network) and show that renewal processes having heavy-tail distributed interarrival times (also known as fractal renewal processes) have great potential for capturing the statistical properties of traffic in access networks.
Article
In the mid-90's, it was shown that the statistics of aggregated time series from Internet traffic departed from those of traditional short range dependent models, and were instead characterized by asymptotic self-similarity. Following this seminal contribution, over the years, many studies have investigated the existence and form of scaling in Internet traffic. This contribution aims first at presenting a methodology, combining multiscale analysis (wavelet and wavelet leaders) and random projections (or sketches), permitting a precise, efficient and robust characterization of scaling which is capable of seeing through non-stationary anomalies. Second, we apply the methodology to a data set spanning an unusually long period: 14 years, from the MAWI traffic archive, thereby allowing an in-depth longitudinal analysis of the form, nature and evolutions of scaling in Internet traffic, as well as network mechanisms producing them. We also study a separate 3-day long trace to obtain complementary insight into intra-day behavior. We find that a biscaling (two ranges of independent scaling phenomena) regime is systematically observed: long-range dependence over the large scales, and multifractal-like scaling over the fine scales. We quantify the actual scaling ranges precisely, verify to high accuracy the expected relationship between the long range dependent parameter and the heavy tail parameter of the flow size distribution, and relate fine scale multifractal scaling to typical IP packet inter-arrival and to round-trip time distributions.
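A stripped-down cousin of the paper's wavelet analysis can be written with plain Haar transforms: for self-similar data, the log2 variance of the detail coefficients grows linearly with scale, and the slope encodes the scaling exponent. The sketch below is ours (no wavelet leaders, no random projections) and recovers H ≈ 0.5 for an ordinary random walk.
```python
import numpy as np

def haar_logscale(x, jmax):
    pts = []
    for j in range(1, jmax + 1):
        n = (len(x) // 2) * 2
        d = (x[1:n:2] - x[0:n:2]) / np.sqrt(2.0)   # Haar detail coefficients
        pts.append((j, np.log2(np.mean(d ** 2))))
        x = (x[1:n:2] + x[0:n:2]) / np.sqrt(2.0)   # approximation -> scale j+1
    return np.array(pts)

rng = np.random.default_rng(6)
walk = np.cumsum(rng.standard_normal(2 ** 16))     # Brownian-like path, H = 0.5
pts = haar_logscale(walk, jmax=8)
slope = np.polyfit(pts[:, 0], pts[:, 1], 1)[0]     # ~ 2H + 1 for fBm-type data
print("estimated H:", (slope - 1.0) / 2.0)
```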
Conference Paper
The problem of accurately and parsimoniously characterizing random series of events (RSEs) seen in the Web, such as Yelp reviews or Twitter hashtags, is not trivial. Reports found in the literature reveal two apparently conflicting visions of how RSEs should be modeled. On one side, the Poissonian processes, in which consecutive events follow each other at relatively regular times and should not be correlated. On the other side, the self-exciting processes, which are able to generate bursts of correlated events. The existence of many, sometimes conflicting, approaches to modeling RSEs is a consequence of the unpredictability of the aggregated dynamics of our individual and routine activities, which sometimes show simple patterns but sometimes result in irregular rising and falling trends. In this paper we propose a parsimonious way to characterize general RSEs, namely the Burstiness Scale (BuSca) model. BuSca views each RSE as a mix of two independent processes: a Poissonian and a self-exciting one. Here we describe a fast method to extract the two parameters of BuSca that, together, give the burstiness scale ψ, which represents how much of the RSE is due to bursty and viral effects. We validated our method on eight diverse and large datasets containing real random series of events seen in Twitter, Yelp, e-mail conversations, Digg, and online forums. Results showed that, even using only two parameters, BuSca is able to accurately describe RSEs seen in these diverse systems, which can support many applications.
Article
A multifractal fractional sum-difference model (MFSD) is a monotone transformation of a Gaussian fractional sum-difference model (GFSD). The GFSD is the sum of two independent components: a moving sum of length two of discrete fractional Gaussian noise (fGn); and white noise. Internet traffic packet interarrival times are very well modeled by an MFSD in which the marginal distribution is Weibull; this is validated by extensive model checking for 715,665,213 measured arrival times on three Internet links. The simplicity of the model provides a mathematical tractability that results in a foundation for understanding the statistical properties of the arrival process. The current foundation is time scaling: properties of aggregate arrivals in successive equal-length time intervals and how the properties change with the interval length. This scaling is also the basis for the widely discussed multifractal wavelet models. The MFSD provides a more fundamental foundation that is based on how changes in the fGn and white noise components result in changes in the arrival process as various factors change such as the aggregation time length or the traffic packet rate. Logistic models relate the MFSD model parameters to the packet rate, so only the rate needs to be specified in using the MFSD model to generate synthetic packet arrivals for network engineering simulation studies.
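Under our own simplifications, the generative recipe this abstract describes can be sketched end to end: synthesize exact fGn by circulant embedding, form the Gaussian FSD as a weighted moving sum of length two plus white noise, and map the marginal to a Weibull. The mixture weight, Hurst parameter, and Weibull shape below are arbitrary placeholders, not the paper's fitted logistic models.
```python
import numpy as np
from scipy import stats

def fgn(n, H, rng):
    """Fractional Gaussian noise via Davies-Harte circulant embedding."""
    k = np.arange(n + 1.0)
    g = 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))
    ev = np.fft.fft(np.concatenate([g, g[-2:0:-1]])).real   # eigenvalues >= 0
    z = rng.standard_normal(2 * n) + 1j * rng.standard_normal(2 * n)
    w = np.fft.fft(np.sqrt(ev / (4 * n)) * z)
    return np.sqrt(2.0) * w[:n].real

rng = np.random.default_rng(7)
n, H, theta = 2 ** 14, 0.85, 0.6                  # placeholders, not fitted
g = fgn(n + 1, H, rng)
gfsd = np.sqrt(theta) * (g[1:] + g[:-1]) / np.sqrt(2.0) \
     + np.sqrt(1.0 - theta) * rng.standard_normal(n)        # Gaussian FSD
u = stats.norm.cdf(gfsd / gfsd.std())                       # Gaussian -> uniform
gaps = stats.weibull_min.ppf(u, 0.6)                        # Weibull marginal
print(f"mean gap {gaps.mean():.2f}, cv {gaps.std() / gaps.mean():.2f}")
```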
Conference Paper
The Named Data Networking (NDN) and Content-Centric Networking (CCN) architectures advocate Interest aggregation as a means to reduce end-to-end latency and bandwidth consumption. To enable these benefits, Interest aggregation must be realized through Pending Interest Tables (PIT) that grow in size at the rate of incoming Interests to an extent that may eventually defeat their original purpose. A thorough analysis is provided of the Interest aggregation mechanism using mathematical arguments backed by extensive discrete-event simulation results. We present a simple yet accurate analytical framework for characterizing Interest aggregation in a CCN router, and use our model to develop an iterative algorithm to analyze the benefits of Interest aggregation in a network of interconnected routers. Our findings reveal that, under realistic assumptions, an insignificant fraction of Interests in the system benefit from aggregation, compromising the effectiveness of using PITs as an integral component of Content-Centric Networks.
Article
Full-text available
The problem of accurately and parsimoniously characterizing random series of events (RSEs) present in the Web, such as e-mail conversations or Twitter hashtags, is not trivial. Reports found in the literature reveal two apparently conflicting visions of how RSEs should be modeled. On one side, the Poissonian processes, in which consecutive events follow each other at relatively regular times and should not be correlated. On the other side, the self-exciting processes, which are able to generate bursts of correlated events and periods of inactivity. The existence of many, sometimes conflicting, approaches to modeling RSEs is a consequence of the unpredictability of the aggregated dynamics of our individual and routine activities, which sometimes show simple patterns but sometimes result in irregular rising and falling trends. In this paper we propose a highly parsimonious way to characterize general RSEs, namely the Burstiness Scale (BuSca) model. BuSca views each RSE as a mix of two independent processes: a Poissonian and a self-exciting one. Here we describe a fast method to extract the two parameters of BuSca that, together, give the burstiness scale, which represents how much of the RSE is due to bursty and viral effects. We validated our method on eight diverse and large datasets containing real random series of events seen in Twitter, Yelp, e-mail conversations, Digg, and online forums. Results showed that, even using only two parameters, BuSca is able to accurately describe RSEs seen in these diverse systems, which can support many applications.
Article
Full-text available
The population of Internet users has exceeded 2.4 billion. Data centers provide Internet services to fulfill the demand from these users. Many data centers adopt cluster-based server systems to host the required Internet services. These server systems consume a significant amount of energy, but much of the power is used to maintain service capacity during idle or low-workload periods. This paper surveys some recent approaches addressing this issue. As learned from traditional telephone call center planning processes, a queueing model is adopted to model server clusters with a homogeneous architecture. By analyzing the model, a set of factors affecting the energy cost is identified. Based on the identified factors, an on-line energy management technique is then designed. The proposed approach is simulated with a real-world workload trace. The simulation result shows that approximately 70 to 75% of the originally consumed energy can be saved.
Article
Optical buffering is one major challenge in realizing all-optical packet switching. In this paper we focus on a delay-line buffer architecture, called a Multiple-Input Single-Output (MISO) buffer, which is realized by cascaded fiber delay lines (FDLs). We consider the MISO buffers in a network scenario where the incoming packets are asynchronous and of fixed length. Instead of matching the fixed packet length, the length of basic FDL is adjusted to minimize the packet loss under different scenarios. A novel Markov model is proposed to analyze the performance of our buffering scheme, in terms of packet loss ratio, average packet delay and the output link utilization. Both simulation and analytical results show that the optimal basic FDL length is equal to the fixed packet length under low system load. However, under high system load, the basic FDL length is much smaller than fixed packet length for minimum packet loss. In addition, this paper gives clear guidelines for designing optimal basic FDL lengths under different network scenarios. It is noticeable that this optimal length value is independent of the buffer sizes.
Article
Optical buffering is one major challenge in realizing all-optical packet switching. In this paper we focus on a delay-line buffer architecture, called a Multiple-Input Single-Output FIFO (MISO-FIFO) optical buffer. This architecture reduces the physical size of a buffer by up to an order of magnitude or more by allowing reuse of its basic optical delay line elements. We consider the MISO-FIFO optical buffers in a network scenario where the incoming packets are asynchronous and of variable length. A simple Markov model is developed to analyze the performance of our buffering scheme, in terms of packet loss ratio, average packet delay and the output link utilization. Both simulation and analytical results show that increasing the buffer size will significantly improve the performance of this optical buffer under low system load. However, under high system load, its performance will deteriorate when increasing the buffer size. In addition, this paper gives clear guidelines for designing the optimal basic delay line lengths under different system loads, in order to get the minimal packet loss. It is noticeable that this optimal basic length value is independent of the buffer sizes.
Article
In this paper, we present results of applying Tsallis entropy to the detection of denial-of-service attacks. Two detectors, one based on Tsallis entropy and the other on Shannon's entropy, have been applied in several attack simulations, and their properties have been compared. The simulated attack is a SYN (synchronize packet) flood. A simple packet distribution, namely the entropy of source addresses, is considered. In both cases, the cumulative sum control chart (CUSUM) algorithm is used for change-point detection. The properties compared for the two detectors are detection delay and the rates of true and false positives. The results show that the Tsallis entropy-based detector can outperform the Shannon-based one with respect to false positive rate, but that requires careful tuning of the Tsallis q parameter, which depends on the characteristics of the network traffic. The detection delay of the two detectors is approximately the same.
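The detector pair the paper compares can be prototyped in a few lines. Below is an illustrative reconstruction of ours (window size, q, and CUSUM constants are made up): compute the per-window Tsallis entropy of source addresses, then run a one-sided CUSUM on the entropy series, which jumps upward when spoofed SYN sources flood in.
```python
import numpy as np

def tsallis(p, q):
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def window_entropy(window, q):
    _, counts = np.unique(window, return_counts=True)
    return tsallis(counts / counts.sum(), q)

rng = np.random.default_rng(8)
normal = [rng.integers(0, 50, 500) for _ in range(60)]     # 50 regular sources
attack = [rng.integers(0, 5000, 500) for _ in range(20)]   # spoofed sources
series = np.array([window_entropy(w, q=1.5) for w in normal + attack])

mu, k, h, s = series[:60].mean(), 0.05, 0.5, 0.0           # CUSUM constants
for t, x in enumerate(series):
    s = max(0.0, s + (x - mu - k))                         # one-sided CUSUM
    if s > h:
        print("change flagged at window", t)               # ~ window 60+
        break
```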
Chapter
There are currently several applications and frameworks to simulate networks, network equipment, applications, services and protocols. These tools play an important role not only in the research and development of new solutions on the computer networking area, but also on other areas of knowledge, because they enable researchers to abstract from the underlying network complexity by providing a prepackaged set of simulation primitives in a concise and useful manner. These tools implement several models and network protocols, some of them being able to produce realistic network traffic, apart from simulating the behavior of the protocols or applications. It is well known that real network traffic in aggregation points like routers should exhibit the self-similarity property for the amount of information per time unit, which reflects the burstiness of the traffic and affects the functioning of the devices and protocols. It is thus important to assess if network simulators generate this property. This chapter presents a study to assess if self-similarity is indeed embedded in traffic produced by popular network simulators, namely NS3 and OMNeT++, and discusses the values for the Hurst parameter obtained using different estimators and for the autocorrelation structure under various network scenarios.
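One of the classic Hurst estimators such a study relies on is the variance-time plot, easy to apply to the packet-count series a simulator produces. A minimal sketch of ours, checked against an i.i.d. control series whose true H is 0.5:
```python
import numpy as np

def hurst_variance_time(counts, ms=(1, 2, 4, 8, 16, 32, 64, 128)):
    xs, ys = [], []
    for m in ms:
        n = (len(counts) // m) * m
        agg = counts[:n].reshape(-1, m).mean(axis=1)   # m-aggregated series
        xs.append(np.log(m))
        ys.append(np.log(agg.var()))
    slope = np.polyfit(xs, ys, 1)[0]                   # ~ 2H - 2
    return 1.0 + slope / 2.0

rng = np.random.default_rng(9)
iid_counts = rng.poisson(100, 2 ** 16).astype(float)
print("H for iid counts:", hurst_variance_time(iid_counts))   # ~ 0.5
```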
Conference Paper
The growth of Internet traffic and the many different traffic classes that exist make network performance control extremely difficult for operators. The methods available rely on complex or costly hardware. However, recent research on bandwidth sharing has introduced methods that require only basic statistics of aggregated link utilization, such as mean and variance. This data can be easily obtained through SNMP calls, lowering the cost of monitoring systems. Unfortunately, to the best of our knowledge, no tools have yet been developed to implement these methods. This paper presents the implementation of a plugin for the monitoring environment Nagios and the validation of a degradation detection tool from link utilization traces. The plugin does not require complex or costly hardware for acquiring data. Instead, it employs basic SNMP data about link utilization.
Article
Full-text available
To increase wireless capacity, the concurrent use of multiple wireless interfaces on different frequency bands, called aggregation, can be considered. In this paper, we focus on aggregation of multiple Wi-Fi interfaces with packet-level traffic spreading between the interfaces. Two aggregation schemes, link bonding and multipath TCP (MPTCP), are tested and compared in a dualband Wi-Fi radio system with their Linux implementation. Various test conditions such as traffic types, network delay, locations, interface failures and configuration parameters are considered. Experimental results show that aggregation increases throughput performance significantly over the use of a single interface. Link bonding achieves lower throughput than MPTCP due to duplicate TCP acknowledgements (ACKs) resulting from packet reordering and filtering such duplicate ACKs out is considered as a possible solution. However, link bonding is fast responsive to links' status changes such as a link failure. It is shown that different combinations of interface weights for packet spread in link bonding result in different throughput performance, envisioning a spatio-temporal adaptation of the weights. We also develop a mathematical model of power consumption and compare the power efficiency of the schemes applying different power consumption profiles.
Article
Full-text available
Two main approaches exist today for providing quality of service (QoS) in IP backbones. One approach relies on scheduling/queuing; the other relies on the presence of capacity/bandwidth. The two schools of thought arise from different understandings of traffic characteristics. If traffic were bursty and self-similar, then sophisticated packet scheduling would be necessary to manage the inevitable traffic peaks that exceed capacity and cause queue buildup. However, if packet arrival times were independent and traffic were to smooth with aggregation, there would be no queue buildups and ensuring QoS would be a function of long-term capacity planning rather than short-term queue management. In this paper, we present the results of an empirical study of Internet traffic characteristics. We use packet traces from a Tier-1 IP backbone network and introduce a non-parametric approach to study latency characteristics at high utilization levels. This approach requires minimal assumptions and has broad applicability, in contrast with previous efforts that make difficult-to-verify traffic distribution assumptions. We find that even though self-similarity exists at some timescales, the Internet traffic we observed is neither heavy-tailed nor highly correlated at the timescales critical to QoS.
Article
Internet traffic at the various tiers of service providers is essentially a superposition, or active mixture, of traffic from various sources. Statistical properties of this superposition and the resulting phenomenon of scaling are important for network performance (queuing), traffic engineering (routing) and network dimensioning (bandwidth provisioning). In this article, the authors study the processes of superposition and scaling jointly in a non-asymptotic framework so as to better understand the point-process nature of the cumulative input traffic process arriving at telecommunication devices (e.g., switches, routers). The authors further assess the scaling dynamics of the structural components (packets, flows and sessions) of the cumulative input process and their relation to the superposition of point processes. Classical and new results are discussed with their applicability in access and core networks. The authors propose that renewal-theory-based approximate point process models, that is, Pareto renewal process superposition and Weibull renewal process superposition, can model the second-order scaling observed in traffic data of access and backbone core networks, respectively.
Conference Paper
Recent advances in information and communication technology have resulted in increased use of multimedia applications. Multimedia transmission over Mobile Ad hoc Networks (MANETs) with high Quality of Service (QoS) is a challenging task due to the mobility of the nodes. The problems associated with multimedia transmission are loss of packets or frames and delay in receiving packets at the destination. To maintain QoS, all multimedia applications require that packets be delivered in sequence and order, and that delay stay within tolerable limits. Existing works mainly use packet loss information at the source station to determine whether the transmission window size should be increased or decreased. In existing approaches, the source station decides about increasing or decreasing the window size based on the reception or non-reception of acknowledgements. The magnitude of the increment/decrement in window size is predefined rather than adapted to the actual scenario in the network, which leads to poor utilization of resources. Since multimedia applications are loss-tolerant, it is important to compute the packet loss accurately, as the QoS requirements for multimedia services accept the loss of packets up to some threshold. In this paper, we model the active stations as a Poisson random process and compute the packet loss, idle period and busy period at the front end of the route. This packet loss estimation is useful for precise specification of buffer size, resulting in improved QoS. Our experimental results demonstrate the usefulness in designing optimal control in order to improve QoS for multimedia applications in MANETs.
Conference Paper
In communication networks data packets are often transmitted via multiple hops on certain destined source-destination paths. The topological structure of such a scenario can be modeled by multi-node tandem queueing networks, but the standard independence assumption of queueing theory does not hold, that is, modeling the packet processing times at the successive nodes as independent and identically distributed random variables is not appropriate. While the packet sizes may vary among different packets, each single packet retains its size on its route through the network, also referred to as static packet size. For this realistic scenario, we study buffer occupancies in terms of the average number of packets buffered in each queue of a tandem network topology and investigate the model failure introduced by the independence assumption. In order to efficiently obtain accurate numerical results we use a recursive estimation scheme that does not require the independence assumption. Numerical studies reveal that buffer occupancies for size-retaining packets significantly differ from those obtained under the independence assumption, so that results obtained from analytical models working with the independence assumption should be taken with much care, in particular with regard to buffer dimensioning and related QoS guarantees.
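The model failure the paper quantifies is easy to reproduce in miniature. Our toy version below feeds a two-node tandem with Poisson arrivals and size-proportional service: the "static" run reuses each packet's service time at both hops, while the "independent" run redraws it, as the classical independence assumption would; the load and distributions are arbitrary.
```python
import numpy as np

def tandem_sojourn(static, n=200_000, lam=0.7, seed=10):
    """Mean end-to-end sojourn time in a 2-node FIFO tandem."""
    rng = np.random.default_rng(seed)
    arr = np.cumsum(rng.exponential(1.0 / lam, n))   # Poisson arrivals
    s1 = rng.exponential(1.0, n)                     # size-driven service, hop 1
    s2 = s1 if static else rng.exponential(1.0, n)   # reuse vs. redraw at hop 2
    d1 = d2 = total = 0.0
    for i in range(n):
        d1 = max(arr[i], d1) + s1[i]                 # departure from node 1
        d2 = max(d1, d2) + s2[i]                     # departure from node 2
        total += d2 - arr[i]
    return total / n

print("static sizes     :", tandem_sojourn(True))
print("independent sizes:", tandem_sojourn(False))
```
With static sizes, each packet reaches the second node no sooner than its own transmission time after its predecessor, so downstream queueing differs markedly from the independence prediction, which is the point about buffer dimensioning made above.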
Conference Paper
An Internet traffic trace over gigabit Ethernet contains information on arrival time and size of every packet over the collection period. An important problem is to model the latency and packet losses within the trace. The Lindley equation is a general form of describing the evolution of queueing delay processes and queue length processes, both of which are tightly associated with the latency and packet losses. We report on our use of the Lindley equation to analyse publicly-available link-layer data from a gigabit Ethernet gateway and discuss the performance of packet queueing delay and queue length based on the data.
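The core of such an analysis is the Lindley recursion W_{n+1} = max(0, W_n + S_n - A_{n+1}), which maps a trace's arrival times and packet sizes to per-packet queueing delays at a link of given capacity. A self-contained sketch with synthetic stand-ins for the gigabit Ethernet trace:
```python
import numpy as np

def lindley_delays(times, sizes, capacity_bps):
    """Per-packet queueing delay at a FIFO link, via the Lindley recursion."""
    service = 8.0 * np.asarray(sizes, float) / capacity_bps   # tx times, s
    delays = np.zeros(len(times))
    for i in range(1, len(times)):
        gap = times[i] - times[i - 1]
        delays[i] = max(0.0, delays[i - 1] + service[i - 1] - gap)
    return delays

rng = np.random.default_rng(11)
times = np.cumsum(rng.exponential(12e-6, 100_000))        # ~83k pkt/s (toy)
sizes = rng.choice([64, 576, 1500], 100_000, p=[0.5, 0.2, 0.3])
d = lindley_delays(times, sizes, capacity_bps=1e9)
print(f"mean delay {d.mean() * 1e6:.2f} us, P(delay > 0) = {(d > 0).mean():.2f}")
```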
Article
Cloud infrastructures consisting of heterogeneous resources are increasingly being utilized for hosting large scale distributed applications from diverse users with discrete needs. The multifarious cloud applications impose varied demand for computational resources along with multitude of performance implications. Successful hosting of cloud applications necessitates service providers to take into account the heterogeneity existing in the behavior of users, applications and system resources while respecting the user’s agreed Quality of Service (QoS) criteria. In this work, we propose a QoS-Aware Resource Elasticity (QRE) framework that allows service providers to make an assessment of application behavior and develop mechanisms that enable dynamic scalability of cloud resources hosting the application components. Experimental results conducted on Amazon EC2 cloud clearly demonstrate the effectiveness of our approach while complying with the agreed QoS attributes of users.
Conference Paper
Full-text available
In this paper, we investigate fine-timescale (sub-frame level) features in MPEG2 video traffic, and consider their effect on performance. Based on trace-driven simulations, we demonstrate that short-timescale features can have substantial performance and engineering implications. Motivated partly by recent applications of multi-scaling analysis to modeling wide area TCP traffic, we propose a multi-fractal cascade as a parsimonious representation of sub-frame traffic fluctuations, and show that it can closely match queueing performance on these timescales. We outline an analytical method for estimating performance of traffic that is multifractal on fine timescales and long-range dependent on coarse timescales.
Article
Full-text available
We develop a new multiscale modeling framework for characterizing positive-valued data with long-range-dependent correlations (1/f noise). Using the Haar wavelet transform and a special multiplicative structure on the wavelet and scaling coefficients to ensure positive results, the model provides a rapid O(N) cascade algorithm for synthesizing N-point data sets. We study both the second-order and multifractal properties of the model, the latter after a tutorial overview of multifractal analysis. We derive a scheme for matching the model to real data observations and, to demonstrate its effectiveness, apply the model to network traffic synthesis. The flexibility and accuracy of the model and fitting procedure result in a close fit to the real data statistics (variance-time plots and moment scaling) and queuing behavior. Although for illustrative purposes we focus on applications in network traffic modeling, the multifractal wavelet model could be useful in a number of other areas involving positive data, including image processing, finance, and geophysics
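The coarse-to-fine synthesis can be sketched compactly. Below, each Haar wavelet coefficient is the scaling coefficient times a multiplier A in (-1, 1), which is what guarantees positive output; the symmetric Beta choice for A and the parameter values are illustrative assumptions rather than a fitted model.

```python
import math, random

def mwm_synthesize(J, p=2.0, root=1.0):
    """Multiplicative Haar cascade: O(N) synthesis of N = 2**J positive values."""
    U = [root]                                        # coarsest scaling coefficient
    for _ in range(J):
        nxt = []
        for u in U:
            a = 2.0 * random.betavariate(p, p) - 1.0  # multiplier A in (-1, 1)
            w = a * u                                 # wavelet coefficient W = A * U
            nxt.append((u + w) / math.sqrt(2.0))      # Haar refinement to the
            nxt.append((u - w) / math.sqrt(2.0))      # two child intervals
        U = nxt
    return U

random.seed(0)
x = mwm_synthesize(12)            # 4096-point positive 1/f-like series
print(len(x), min(x) > 0.0)
```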
Article
Full-text available
Network arrivals are often modeled as Poisson processes for analytic simplicity, even though a number of traffic studies have shown that packet interarrivals are not exponentially distributed. We evaluate 24 wide area traces, investigating a number of wide area TCP arrival processes (session and connection arrivals, FTP data connection arrivals within FTP sessions, and TELNET packet arrivals) to determine the error introduced by modeling them using Poisson processes. We find that user-initiated TCP session arrivals, such as remote-login and file-transfer, are well-modeled as Poisson processes with fixed hourly rates, but that other connection arrivals deviate considerably from Poisson; that modeling TELNET packet interarrivals as exponential grievously underestimates the burstiness of TELNET traffic, but using the empirical Tcplib interarrivals preserves burstiness over many time scales; and that FTP data connection arrivals within FTP sessions come bunched into “connection bursts”, the largest of which are so large that they completely dominate FTP data traffic. Finally, we offer some results regarding how our findings relate to the possible self-similarity of wide area traffic
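Two of the cheapest diagnostics for the Poisson hypothesis on interarrival times are the coefficient of variation (exactly 1 for exponential gaps) and the lag-1 autocorrelation (0 for independent gaps). The sketch below applies both to a synthetic exponential sample; on bursty TELNET-like packet interarrivals of the kind described above, one would instead expect a CV well above 1 and positive correlation.

```python
import math, random

def poisson_diagnostics(gaps):
    """Coefficient of variation and lag-1 autocorrelation of interarrival times."""
    n = len(gaps)
    m = sum(gaps) / n
    var = sum((x - m) ** 2 for x in gaps) / (n - 1)
    cv = math.sqrt(var) / m
    acf1 = sum((gaps[i] - m) * (gaps[i + 1] - m)
               for i in range(n - 1)) / ((n - 1) * var)
    return cv, acf1

random.seed(2)
sample = [random.expovariate(1.0) for _ in range(50_000)]
cv, acf1 = poisson_diagnostics(sample)
print(f"CV = {cv:.3f} (exponential -> 1), lag-1 ACF = {acf1:.3f} (Poisson -> 0)")
```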
Article
Full-text available
Although ATM seems to be the wave of the future, one analysis suggests that network utilization may have to be kept quite low. That analysis is based on asymptotic decay rates of steady-state distributions used to develop a concept of effective bandwidths for connection admission control. The present authors have developed an exact numerical algorithm showing that the effective-bandwidth approximation can overestimate the target small blocking probabilities by several orders of magnitude when there are many sources that are more bursty than Poisson. The bad news is that the appealingly simple connection admission control algorithm using effective bandwidths based solely on tail-probability asymptotic decay rates may not be as effective as many have hoped. The good news is that the statistical multiplexing gain on ATM networks may actually be higher than some have feared. For one example, thought to be realistic, the analysis indicates that the network can actually support twice as many sources as predicted by the effective-bandwidth approximation. The authors also show that the effective-bandwidth approximation is not always conservative. Specifically, for sources less bursty than Poisson, the asymptotic constant grows exponentially in the number of sources (when they are scaled as above) and the effective-bandwidth approximation can greatly underestimate the target blocking probabilities. Finally, they develop new approximations that work much better than the pure effective-bandwidth approximation.
Article
Full-text available
Recent measurement and simulation studies have revealed that wide area network traffic displays complex statistical characteristics - possibly multifractal scaling - on fine timescales, in addition to the well-known property of self-similar scaling on coarser timescales. In this paper we investigate the performance and network engineering significance of these fine timescale features using measured TCP and MPEG2 video traces, queueing simulations and analytical arguments. We demonstrate that the fine timescale features can affect performance substantially at low and intermediate utilizations, while the longer timescale self-similarity is important at intermediate and high utilizations. We relate the fine timescale structure in the measured TCP traces to flow controls, and show that UDP traffic - which is not flow controlled - lacks such fine timescale structure. Likewise we relate the fine timescale structure in video MPEG2 traces to sub-frame encoding. We show that it is possible to construct a relatively parsimonious multi-fractal cascade model of fine timescale features that matches the queueing performance of both the TCP and video traces. We outline an analytical method to estimate performance for traffic that is self-similar on coarse timescales and multi-fractal on fine timescales, and show that the engineering problem of setting safe operating points for planning or admission controls can be significantly influenced by fine timescale fluctuations in network traffic. The work reported here can be used to model the relevant characteristics of wide area traffic across a full range of engineering timescales, and can be the basis of more accurate network performance analysis and engineering.
Article
Full-text available
In apparent contrast to the well-documented self-similar (i.e., monofractal) scaling behavior of measured LAN traffic, recent studies have suggested that measured TCP/IP and ATM WAN traffic exhibits more complex scaling behavior, consistent with multifractals. To bring multifractals into the realm of networking, this paper provides a simple construction based on cascades (also known as multiplicative processes) that is motivated by the protocol hierarchy of IP data networks. The cascade framework allows for a plausible physical explanation of the observed multifractal scaling behavior of data traffic and suggests that the underlying multiplicative structure is a traffic invariant for WAN traffic that co-exists with self-similarity. In particular, cascades allow us to refine the previously observed self-similar nature of data traffic to account for local irregularities in WAN traffic that are typically associated with networking mechanisms operating on small time scales, such as TCP flow control.
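The construction is simple to state: starting from unit mass on an interval, repeatedly split each interval's mass at random between its two halves. A minimal sketch with conservative Beta-distributed multipliers follows; the multiplier distribution and depth are illustrative assumptions.

```python
import random

def conservative_cascade(depth, p=5.0, mass=1.0):
    """Binomial conservative cascade: each stage splits every interval's mass
    W : (1 - W) between its two halves, with W ~ Beta(p, p) on (0, 1)."""
    cells = [mass]
    for _ in range(depth):
        nxt = []
        for m in cells:
            w = random.betavariate(p, p)
            nxt.extend((m * w, m * (1.0 - w)))
        cells = nxt
    return cells

random.seed(3)
c = conservative_cascade(12)
print(len(c), f"total mass = {sum(c):.6f}")   # 4096 cells, mass conserved
```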
Article
Full-text available
Simulating how the global Internet data network behaves is an immensely challenging undertaking because of the network's great heterogeneity and rapid change. The heterogeneity ranges from the individual links that carry the network's traffic, to the protocols that interoperate over the links, to the "mix" of different applications used at a site and the levels of congestion (load) seen on different links. We discuss two key strategies for developing meaningful simulations in the face of these difficulties: searching for invariants and judiciously exploring the simulation parameter space. We finish with a look at a collaborative effort to build a common simulation environment for conducting Internet studies.
Article
We analyse the queue $Q^L$ at a multiplexer with $L$ sources which may display long-range dependence. This includes, for example, sources modelled by fractional Brownian motion (FBM). The workload processes $W$ due to each source are assumed to have large deviation properties of the form $P[W_t/a(t) > x] \approx \exp[-v(t)K(x)]$ for appropriate scaling functions $a$ and $v$, and rate function $K$. Under very general conditions $\lim_{L\to\infty} L^{-1} \log P[Q^L > Lb] = -I(b)$, provided the offered load is held constant, where the shape function $I$ is expressed in terms of the cumulant generating functions of the input traffic. For power-law scalings $v(t) = t^v$, $a(t) = t^a$ (such as occur in FBM) we analyse the asymptotics of the shape function: $\lim_{b\to\infty} b^{-u/a}(I(b) - \delta b^{v/a}) = \nu_u$ for some exponent $u$ and constant $\nu$ depending on the sources. This demonstrates the economies of scale available through the multiplexing of a large number of such sources, by comparison with a simple approximation $P[Q^L > Lb] \approx \exp[-\delta L b^{v/a}]$ based on the asymptotic decay rate $\delta$ alone. We apply this formula to Gaussian processes, in particular FBM, both alone and also perturbed by an Ornstein–Uhlenbeck process. This demonstrates a richer potential structure than occurs for sources with linear large deviation scalings.
Chapter
Although point processes are just integer-valued random measures, their importance justifies a separate treatment, and their special features yield to techniques not readily applicable to general random measures. The first and last parts of the chapter summarize results for point processes, which parallel those for random measures—existence theorems, moment structure, and generating functional—as well as furnishing illustrative (and important) examples. Many of the results are special cases of the corresponding results in Chapter 6, while others are extensions from the context of finite point processes in Chapter 5. The remaining part of the chapter, on the avoidance functions and intensity measures, deals with properties that are peculiar to point processes and for which the extensions to general random measures are not easily found.
Conference Paper
A mechanism which gives rise to self-similar network traffic is examined, and its performance implications are presented. This mechanism is the transfer of files or messages whose size is drawn from a heavy-tailed distribution. In a `realistic' client/server network environment, the degree to which file sizes are heavy-tailed can directly determine the degree of self-similarity of traffic at the link level. The properties of the transport layer play an important role in preserving and modulating this causal relationship. Performance implications of self-similarity, as represented by performance measures such as packet loss rate, retransmission rate, and queueing delay, are presented.
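The mechanism is easy to see numerically: when transfer sizes are drawn from a heavy-tailed (here Pareto) distribution, a handful of transfers dominates the byte volume, and busy periods at every timescale inherit this variability. The toy computation below, with an assumed tail index of 1.2, only illustrates that dominance effect, not the paper's client/server environment.

```python
import random

random.seed(4)
alpha = 1.2                  # Pareto tail index in (1, 2): infinite variance
sizes = [random.paretovariate(alpha) for _ in range(100_000)]
sizes.sort()
top10_share = sum(sizes[-10:]) / sum(sizes)
print(f"top 10 of 100,000 transfers carry {100 * top10_share:.1f}% of the bytes")
```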
Article
We present a multiplicative multifractal process to model traffic which exhibits long-range dependence. Using traffic trace data captured by Bellcore from operations across local and wide area networks, we examine the interarrival time series and the packet length sequences. We also model the frame size sequences of VBR video traffic process. We prove a number of properties of multiplicative multifractal processes that are most relevant to their use as traffic models. In particular, we show these processes to characterize effectively the long-range dependence properties of the measured processes. Furthermore, we consider a single server queueing system which is loaded, on one hand, by the measured processes, and, on the other hand, by our multifractal processes (the latter forming a MFe/MFg/1 queueing system model). In comparing the performance of both systems, we demonstrate our models to effectively track the behaviour exhibited by the system driven by the actual traffic processes. We show the multiplicative multifractal process to be easy to construct. Through parametric dependence on one or two parameters, this model can be calibrated to fit the measured data. We also show that in simulating the packet loss probability, our multifractal traffic model provides a better fit than that obtained by using a fractional Brownian motion model. Copyright © 2001 John Wiley & Sons, Ltd.
Article
We analyse the queue $Q^L$ at a multiplexer with $L$ inputs. We obtain a large deviation result, namely that under very general conditions $\lim_{L\to\infty} L^{-1} \log P[Q^L > Lb] = -I(b)$, provided the offered load is held constant, where the shape function $I$ is expressed in terms of the cumulant generating functions of the input traffic. This provides an improvement on the usual effective bandwidth approximation $P[Q^L > b] \approx e^{-\delta b}$, replacing it with $P[Q^L > b] \approx e^{-L I(b/L)}$. The difference between the two exponents has a limit: $\lim_{b\to\infty}(I(b) - \delta b) = \nu$, where $\nu = -\lim_{t\to\infty} t\lambda_t(\delta)$. We apply this idea to a number of examples of arrival processes: heterogeneous superpositions, Gaussian processes, Markovian additive processes and Poisson processes. We obtain expressions for $\nu$ in these cases; $\nu$ is zero for independent arrivals, but positive for arrivals with positive correlations. Thus economies of scale are obtainable for highly bursty traffic expected in ATM multiplexing.
Conference Paper
Traffic variables on an uncongested Internet wire exhibit a pervasive nonstationarity. As the rate of new TCP connections increases, arrival processes (packet and connection) tend locally toward Poisson, and time series variables (packet sizes, transferred file sizes, and connection round-trip times) tend locally toward independent. The cause of the nonstationarity is superposition: the intermingling of sequences of connections between different source-destination pairs, and the intermingling of sequences of packets from different connections. We show this empirically by extensive study of packet traces for nine links coming from four packet header databases. We show it theoretically by invoking the mathematical theory of point processes and time series. If the connection rate on a link gets sufficiently high, the variables can be quite close to Poisson and independent; if major congestion occurs on the wire before the rate gets sufficiently high, then the progression toward Poisson and independent can be arrested for some variables.
Conference Paper
This paper presents the architecture of a passive monitoring system installed within the Sprint IP backbone network. This system differs from other packet monitoring systems in that it collects packet-level traces from multiple links within the network and provides the capability to correlate the data using highly accurate GPS timestamps. After a thorough description of the monitoring systems, we demonstrate the system's capabilities and the diversity of the results that can be obtained from the collected data. These results include workload characterization, packet size analysis, and packet delay incurred through a single backbone router. We conclude with lessons learned from the development of the monitoring infrastructure and present future research goals.
Article
Recent Bellcore studies have shown that high-speed data traffic exhibits “long-range dependence”, characterized by H > 0.5, where H is the Hurst parameter of the traffic. In the wake of those studies, there has been much interest in developing tractable analytical models for traffic with long-range dependence, for use in performance evaluation and traffic engineering. Norros has used a traffic model known as Fractional Brownian Motion (FBM) to derive several analytical results on the behavior of a queue subject to such an arrival process. In this paper, we derive a new class of results, also based on the FBM model, which reveal rather curious and unexpected “crossover” properties of the Hurst parameter of the traffic, as regards its effect on the behavior of queues. These results, together with those of Norros, serve to enhance our understanding of the significance of the Hurst parameter H for traffic engineering. In particular, Krishnan and Meempat have used the crossover property derived here to explain, in part, a gap that existed between the results of two sets of Bellcore studies, one casting doubt on the usefulness of Markovian traffic models and methods when H > 0.5, and the other furnishing an example of successful traffic engineering with Markovian methods for traffic known to have H > 0.5. The results derived here can be used to obtain conservative estimates of the multiplexing gains achieved when independent traffic sources with the same Hurst parameter H are multiplexed for combined transmission. In turn, such estimates yield guidelines for the engineering of ATM links that are subject to traffic with long-range dependence.
Article
The family of autoregressive integrated moving-average processes, widely used in time series analysis, is generalized by permitting the degree of differencing to take fractional values. The fractional differencing operator is defined as an infinite binomial series expansion in powers of the backward-shift operator. Fractionally differenced processes exhibit long-term persistence and antipersistence; the dependence between observations a long time span apart decays much more slowly with time span than is the case with the more commonly studied time series models. Long-term persistent processes have applications in economics and hydrology; compared to existing models of long-term persistence, the family of models introduced here offers much greater flexibility in the simultaneous modelling of the short-term and long-term behaviour of a time series.
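The fractional differencing operator (1 - B)^d expands as a binomial series whose coefficients obey a simple recursion, which is all that is needed to apply it to data. A short sketch:

```python
def frac_diff_weights(d, n):
    """First n coefficients of (1 - B)^d: w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

def frac_diff(x, d):
    """Apply the truncated fractional differencing operator to a series x."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]

print(frac_diff_weights(0.4, 5))
# [1.0, -0.4, -0.12, -0.064, -0.0416]: the slow hyperbolic decay of these
# weights is what produces the long-term persistence described above.
```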
Article
This paper describes the four concepts in the title, first separately, then through their interactions. The "H-spectrum hypothesis" and the "infinite variance hypothesis" are divergence hypotheses introduced to account for the main erratic aspects in the behavior of economic time series. Next to be studied are "long-run linearity" and "locally Gaussian processes," new models to be added to the toolbox of econometrics to clarify the relations between the two divergence hypotheses. As a foil to the properties of economic time series, let us note that sequences of independent and identically distributed Gaussian random variables possess the following properties: (a) different sample functions of such a process look, from a distance, remarkably alike; (b) when analyzed by spectral analysis, each sample function seems to have a "white spectrum," that is, an almost constant spectral density; (c) different samples of sufficient length, taken from the same independent Gaussian process, yield essentially identical estimates of the population mean and variance. Clearly, most economic time series do not fulfill the independent Gaussian ideal of simplicity, even after possible gross nonstationarity has been eliminated by differencing. (For example, when dealing with prices, we shall consider sequences of price increments rather than sequences of the prices themselves.) For analysis, discrepancies from independent Gaussian processes can be divided into two classes. The first includes "high frequency effects," to which the bulk of econometrics has so far been devoted. An example of such an effect is the existence of non-vanishing correlation between successive, or nearly successive, values of a time series. The second includes "low frequency effects," with which my own past and present work is concerned. To use a medical term, we shall say that, when various low frequency "symptoms" occur simultaneously, they add up to a "syndrome." I have tried to postpone to Sections 2 and 3 of this paper most of the technical arguments, and to devote Section 1 to a comparatively informal presentation of the main points. First, two "low frequency syndromes" of economics will be described. Next, they will be shown to be incompatible within the framework of the usual econometric models. Finally, a generalized model will be constructed which allows for both these syndromes.
Article
We discuss findings from a large-scale study of Internet packet dynamics conducted by tracing 20000 TCP bulk transfers between 35 Internet sites. Because we traced each 100-kbyte transfer at both the sender and the receiver, the measurements allow us to distinguish between the end-to-end behavior due to the different directions of the Internet paths, which often exhibit asymmetries. We: (1) characterize the prevalence of unusual network events such as out-of-order delivery and packet replication; (2) discuss a robust receiver-based algorithm for estimating “bottleneck bandwidth” that addresses deficiencies discovered in techniques based on “packet pair;” (3) investigate patterns of packet loss, finding that loss events are not well modeled as independent and, furthermore, that the distribution of the duration of loss events exhibits infinite variance; and (4) analyze variations in packet transit delays as indicators of congestion periods, finding that congestion periods also span a wide range of time scales
Article
The notion of self-similarity has been shown to apply to wide-area and local-area network traffic. We show evidence that the subset of network traffic that is due to World Wide Web (WWW) transfers can show characteristics that are consistent with self-similarity, and we present a hypothesized explanation for that self-similarity. Using a set of traces of actual user executions of NCSA Mosaic, we examine the dependence structure of WWW traffic. First, we show evidence that WWW traffic exhibits behavior that is consistent with self-similar traffic models. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user “think time”, and the superimposition of many such transfers in a local-area network. To do this, we rely on empirically measured distributions both from client traces and from data independently collected at WWW servers
Article
A number of empirical studies of traffic measurements from a variety of working packet networks have demonstrated that actual network traffic is self-similar or long-range dependent in nature, in sharp contrast to commonly made traffic modeling assumptions. We provide a plausible physical explanation for the occurrence of self-similarity in local-area network (LAN) traffic. Our explanation is based on convergence results for processes that exhibit high variability and is supported by detailed statistical analyses of real-time traffic measurements from Ethernet LANs at the level of individual sources. This paper is an extended version of Willinger et al. (1995). We develop here the mathematical results concerning the superposition of strictly alternating ON/OFF sources. Our key mathematical result states that the superposition of many ON/OFF sources (also known as packet-trains) with strictly alternating ON- and OFF-periods and whose ON-periods or OFF-periods exhibit the Noah effect produces aggregate network traffic that exhibits the Joseph effect. There is, moreover, a simple relation between the parameters describing the intensities of the Noah effect (high variability) and the Joseph effect (self-similarity). An extensive statistical analysis of high time-resolution Ethernet LAN traffic traces confirms that the data at the level of individual sources or source-destination pairs are consistent with the Noah effect. We also discuss implications of this simple physical explanation for the presence of self-similar traffic patterns in modern high-speed network traffic.
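The "simple relation" referred to here can be stated in one line (notation mine): if the ON- or OFF-periods are heavy-tailed with tail index alpha, the Noah-effect intensity maps to the Hurst parameter of the aggregate via

```latex
H = \frac{3 - \alpha_{\min}}{2},
\qquad
\alpha_{\min} = \min(\alpha_{\mathrm{ON}}, \alpha_{\mathrm{OFF}}) \in (1, 2),
```

so heavier tails in the ON/OFF periods (alpha_min closer to 1) yield stronger long-range dependence (H closer to 1) in the superposed traffic.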
Article
Traffic measurement studies from a wide range of working packet networks have convincingly established the presence of significant statistical features that are characteristic of fractal traffic processes, in the sense that these features span many time scales. Of particular interest in packet traffic modeling is a property called long-range dependence (LRD), which is marked by the presence of correlations that can extend over many time scales. We demonstrate empirically that, beyond its statistical significance in traffic measurements, long-range dependence has considerable impact on queueing performance, and is a dominant characteristic for a number of packet traffic engineering problems. In addition, we give conditions under which the use of compact and simple traffic models that incorporate long-range dependence in a parsimonious manner (e.g., fractional Brownian motion) is justified and can lead to new insights into the traffic management of high speed networks
Article
Demonstrates that Ethernet LAN traffic is statistically self-similar, that none of the commonly used traffic models is able to capture this fractal-like behavior, that such behavior has serious implications for the design, control, and analysis of high-speed, cell-based networks, and that aggregating streams of such traffic typically intensifies the self-similarity (“burstiness”) instead of smoothing it. These conclusions are supported by a rigorous statistical analysis of hundreds of millions of high quality Ethernet traffic measurements collected between 1989 and 1992, coupled with a discussion of the underlying mathematical and statistical properties of self-similarity and their relationship with actual network behavior. The authors also present traffic models based on self-similar stochastic processes that provide simple, accurate, and realistic descriptions of traffic scenarios expected during B-ISDN deployment
Article
We present a parameterizable methodology for profiling Internet traffic flows at a variety of granularities. Our methodology differs from many previous studies that have concentrated on end-point definitions of flows in terms of state derived from observing the explicit opening and closing of TCP connections. Instead, our model defines flows based on traffic satisfying various temporal and spatial locality conditions, as observed at internal points of the network. This approach to flow characterization helps address some central problems in networking based on the Internet model. Among them are route caching, resource reservation at multiple service levels, usage based accounting, and the integration of IP traffic over an ATM fabric. We first define the parameter space and then concentrate on metrics characterizing both individual flows as well as the aggregate flow profile. We consider various granularities of the definition of a flow, such as by destination network, host-pair, or host and port quadruple. We include some measurements based on case studies we undertook, which yield significant insights into some aspects of Internet traffic, including demonstrating (i) the brevity of a significant fraction of IP flows at a variety of traffic aggregation granularities, (ii) that the number of host-pair IP flows is not significantly larger than the number of destination network flows, and (iii) that schemes for caching traffic information could significantly benefit from using application information
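The timeout-based flow abstraction is straightforward to implement. The sketch below counts flows at a chosen granularity, declaring a new flow whenever the gap since the key's last packet exceeds a timeout; the 64-second timeout and the toy trace are illustrative assumptions, not the paper's full parameter space.

```python
def count_flows(packets, timeout=64.0):
    """Count flows in a time-sorted list of (timestamp, key) pairs; the key
    sets the granularity: destination network, host pair, or host/port
    quadruple."""
    last_seen, flows = {}, 0
    for t, key in packets:
        if key not in last_seen or t - last_seen[key] > timeout:
            flows += 1                    # gap exceeded: a new flow begins
        last_seen[key] = t
    return flows

trace = [(0.0, "A"), (1.0, "A"), (70.0, "A"), (70.5, "B")]   # hypothetical
print(count_flows(trace))   # -> 3: key A splits across the 69-second silence
```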
Article
This article reports a review of the most significant issues related to network architectures and technologies which will enable the realization of future optical Internet networks. The design of such networks has to take into consideration the peculiar characteristics of Internet traffic. Several architectures have been proposed to provide optical networking solutions, based on wavelength-division multiplexing and compatible with the IP world. These architectures are presented briefly, and the main advantages and drawbacks are discussed. Furthermore, advanced network architectures are reported. In particular, two network paradigms are illustrated and discussed: the optical transparent packet network and optical burst switching. Finally, the key technologies are illustrated
Article
A key criterion in the design of high-speed networks is the probability of the buffer content exceeding a certain threshold. We consider n independent, identical traffic sources modelled as point processes, which are fed into a link with speed proportional to n. Under fairly general assumptions on the input processes we show that the probability of the steady-state unfinished work exceeding a threshold b > 0 tends to the corresponding probability for Poisson input processes. We verify the assumptions for a broad range of long-range dependent sources commonly used to model data traffic. Simulations show that for realistic values of the exceedance probability and moderate utilisations, convergence takes place at reasonable values of the number of sources superposed. In particular, our results indicate that even in the presence of long-range dependent traffic sources, with superposition the buffer exceedance probability has an exponential tail for even smaller buffers than suggested by previous results, which consider O(n) buffer size. This is particularly relevant for high-speed networks in which the cost of high-speed memory is significant.
Article
TCP start times for HTTP are nonstationary. The nonstationarity occurs because the start times on a link, a point process, are a superposition of source traffic point processes, and the statistics of superposition changes as the number of superposed processes changes. The start time rate is a measure of the number of traffic sources. The univariate distribution of the inter-arrival times is approximately Weibull, and as the rate increases, the Weibull shape parameter goes to 1, an exponential distribution. The autocorrelation of the log inter-arrival times is described by a simple, two-parameter process: white noise plus a long-range persistent time series. As the rate increases, the variance of the persistent series tends to zero, so the log times tend to white noise. A parsimonious statistical model for log inter-arrivals accounts for the autocorrelation, the Weibull distribution, and the nonstationarity in the two with the rate. The model, whose purpose is to provide stochastic input to a ...
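The distributional half of this description is easy to reproduce: Weibull interarrivals with shape below 1 are burstier than exponential (coefficient of variation above 1), and as the shape parameter tends to 1 they become exactly exponential. A sketch, with assumed shape values standing in for increasing connection rates:

```python
import math, random

def coeff_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(v) / m

random.seed(5)
for shape in (0.5, 0.7, 0.9, 1.0):     # shape -> 1 as the rate increases
    gaps = [random.weibullvariate(1.0, shape) for _ in range(50_000)]
    print(f"Weibull shape {shape}: CV = {coeff_var(gaps):.2f}")
# shape = 1.0 is the exponential case (CV = 1): the Poisson limit above.
```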
Article
As the number of active connections (NAC) on an Internet link increases, the long-range dependence of packet traffic changes due to increased statistical multiplexing of packets from different connections. Four packet traffic variables are studied as time series: inter-arrival times, sizes, packet counts in 100 ms intervals, and byte counts in 100 ms intervals. Results are based on the following: (1) the mathematical theory of marked point processes; (2) empirical study of 2526 packet traces, 5 min or 90 sec in duration, from 6 Internet monitors measuring 15 interfaces ranging from 100 Mbps to 622 Mbps; (3) simple statistical models for the traffic variables; and (4) network simulation with NS. All variables have components of long-range dependence at all levels of the NAC. But the variances of the long-range dependent components of the sizes and of the inter-arrivals decrease to zero as the NAC increases; the sizes tend toward independent, and the inter-arrivals tend toward independent or very short range dependent. These changes in the sizes and inter-arrivals are not arrested by the increased upstream queueing that eventually occurs as the NAC increases. The long-range dependence of the count variables does not change with the NAC, but their standard deviations relative to the means decrease like one over the square root of the NAC, making the counts smooth relative to the mean. Theory suggests that once the NAC is large enough, increased upstream queueing should alter these properties of the counts, but in the empirical study and in the simulation study the NAC was not large enough to observe an alteration for 100 ms counts. The change in the long-range dependence of the sizes and inter-arrivals does not contradict the constancy of the long-range dependence of the counts.
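The count-smoothing effect is the easiest of these findings to reproduce: superposing n independent sources divides the standard-deviation-to-mean ratio of interval counts by roughly the square root of n. The simulation below uses Poisson sources for simplicity (real sources are long-range dependent, but per the abstract the one-over-root-n scaling of the ratio is the same); all rates and window widths are illustrative.

```python
import math, random

def counts_sd_over_mean(n_sources, rate=10.0, width=0.1, horizon=200.0):
    """SD/mean of packet counts in fixed windows when n independent Poisson
    sources of equal rate are superposed (their superposition is Poisson)."""
    lam = n_sources * rate
    nbins = int(horizon / width)
    counts = [0] * nbins
    t = random.expovariate(lam)
    while t < horizon:
        counts[int(t / width)] += 1
        t += random.expovariate(lam)
    m = sum(counts) / nbins
    sd = math.sqrt(sum((c - m) ** 2 for c in counts) / (nbins - 1))
    return sd / m

random.seed(6)
for n in (1, 4, 16, 64):
    print(f"n = {n:2d}: SD/mean = {counts_sd_over_mean(n):.3f}")
# The ratio falls roughly as 1, 1/2, 1/4, 1/8: counts smooth about the mean.
```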
Article
Integrated moving average processes (IMA), especially the first-order moving average processes IMA(1, 1), are useful for modeling time series data occurring in economic situations and industrial control problems. It is noticed that any IMA(1, 1) can be thought of as a smoothest-possible IMA(1, 1) process buried in white noise. This motivated our orthogonal decomposition of IMA(1, 1) processes. More specifically, the first-order differences of the observed series can be expressed as the sum of two independent processes. One is the first-order differences of a white noise process and the other is the first-order sum of another white noise process. The corresponding spectrum decomposition is then simple and useful for model building. Moreover, this decomposition allows a simple implementation of the EM algorithm for maximum likelihood estimation for a Gaussian IMA(1, 1) process. Based on this orthogonal decomposition, from a modeling perspective in the frequency domain we consider a general...
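A standard way to realize the decomposition sketched above (notation mine, consistent with the classical result that an IMA(1, 1) series is a random walk observed in white noise):

```latex
z_t = m_t + u_t, \qquad m_t = m_{t-1} + v_t
\quad\Longrightarrow\quad
\nabla z_t = v_t + \nabla u_t ,
```

with $u_t$, $v_t$ independent white noises. Matching the lag covariances of the MA(1) form $\nabla z_t = a_t - \theta a_{t-1}$ gives $\sigma_u^2 = \theta\,\sigma_a^2$ and $\sigma_v^2 = (1-\theta)^2\sigma_a^2$, valid for $0 \le \theta < 1$.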
Article
We demonstrate that Ethernet local area network (LAN) traffic is statistically self-similar, that none of the commonly used traffic models is able to capture this fractal behavior, and that such behavior has serious implications for the design, control, and analysis of high-speed, cell-based networks. Intuitively, the critical characteristic of this self-similar traffic is that there is no natural length of a "burst": at every time scale ranging from a few milliseconds to minutes and hours, similar-looking traffic bursts are evident; we find that aggregating streams of such traffic typically intensifies the self-similarity ("burstiness") instead of smoothing it. Our conclusions are supported by a rigorous statistical analysis of hundreds of millions of high quality Ethernet traffic measurements collected between 1989 and 1992, coupled with a discussion of the underlying mathematical and statistical properties of self-similarity and their relationship with actual network behavior. We also consider some implications for congestion control in high-bandwidth networks and present traffic models based on self-similar stochastic processes that are simple, accurate, and realistic for aggregate traffic.
Article
We analyse the queue $Q^L$ at a multiplexer with $L$ sources which may display long-range dependence. This includes, for example, sources modelled by fractional Brownian Motion (fBM). The workload processes $W$ due to each source are assumed to have large deviation properties of the form $P[W_t/a(t) > x] \approx e^{-v(t)K(x)}$ for appropriate scaling functions $a$ and $v$, and rate function $K$. Under very general conditions, $\lim_{L\to\infty} L^{-1} \log P[Q^L > Lb] = -I(b)$, provided the offered load is held constant, where the shape function $I$ is expressed in terms of the cumulant generating functions of the input traffic. For power-law scalings $v(t) = t^v$, $a(t) = t^a$ (such as occur in fBM) we analyse the asymptotics of the shape function: $\lim_{b\to\infty} b^{-u/a}(I(b) - \delta b^{v/a}) = \nu_u$ for some exponent $u$ and constant $\nu$ depending on the sources. This demonstrates the economies of scale available through the multiplexing of a large number of such sources, by comparison with ...
Fractional Sum-Difference Models for Open-Loop Generation of Internet Packet Traffic
  • J Cao
  • W S Cleveland
  • D X Sun
J. Cao, W. S. Cleveland, and D. X. Sun, "Fractional SumDifference Models for Open-Loop Generation of Internet Packet Traffic," Tech. Rep., Bell Labs, Murray Hill, NJ, 2002.
S-Net: A Software System for Analyzing Packet Header Databases
  • J Cao
  • W S Cleveland
  • D X Sun
J. Cao, W. S. Cleveland, and D. X. Sun, "S-Net: A Software System for Analyzing Packet Header Databases," in Proceedings Passive and Active Measurement, 2002.
Timestamping Network Packets
  • J Micheel
  • S Donnelly
  • I Graham
J. Micheel, S. Donnelly, and I. Graham, "Timestamping Network Packets," in Proceedings ACM SIGCOMM Internet Measurement Workshop, San Francisco, 2001.
Architectural and Technological Issues for Future Optical Internet Networks
  • M Listani
  • V Eramo
  • R Sabella
M. Listani, V. Eramo, and R. Sabella, "Architectural and Technological Issues for Future Optical Internet Networks," IEEE Communications Magazine, pp. 82-86, September 2000.