Maaz Mohiuddin’s research while affiliated with Swiss Federal Institute of Technology in Lausanne and other places

Publications (14)


Figure 2: Unsafety in SCL and Passive schemes. Unsafety of FCR is 0.
Figure 5: CCDF of convergence time for AS 1221 with parameter set normal, g = 2 and δ_n = 10 ms.
Figure 6: Median and tail (at 99th percentile) of 4-port fattree with 2 and 3 replicas.
FCR: Fast and Consistent Controller-Replication in Software Defined Networking
  • Article
  • Full-text available

November 2019 · 158 Reads · 3 Citations

IEEE Access

Maaz Mohiuddin · Mia Primorac

We consider the problem of coordination among replicated SDN controllers, where the challenge is to ensure a consistent view of the network while reacting to network events in a prompt manner. Existing solutions are either consensus-based, which achieve consistency at the expense of high latency; or eventual-consistency-based, which have low latency at the expense of severe limitations on the types of applications and policies implementable by the controller. We propose the Fast and Consistent Controller-Replication (FCR) scheme. FCR is based on a deterministic agreement mechanism that performs agreement on the input of controllers, instead of agreement on the output as done in consensus mechanisms. We formally prove that FCR provides the same guarantees in terms of implementable applications and network policies, as any deterministic single-image controller. Through simulation and implementation, we show that these guarantees can be implemented with little latency overhead, compared to eventual-consistency approaches, and can be achieved significantly faster than consensus-based approaches.
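The idea of agreeing on controller input rather than on controller output can be illustrated with a toy sketch. This is not the FCR protocol: the `Event` format, the `ToyController` class and the sort-based `agreed_input` stand-in are all assumptions made for illustration, and the sketch assumes every replica eventually receives the same set of events. It only shows why deterministic replicas that process the same agreed input sequence emit identical outputs without running agreement on those outputs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Event:
    """Hypothetical network event; ordering is by (timestamp, switch_id, kind)."""
    timestamp: int
    switch_id: str
    kind: str  # "link_down" or "link_up"


@dataclass
class ToyController:
    """Deterministic toy controller: output depends only on the input sequence."""
    down_links: set = field(default_factory=set)

    def handle(self, event: Event) -> str:
        if event.kind == "link_down":
            self.down_links.add(event.switch_id)
        elif event.kind == "link_up":
            self.down_links.discard(event.switch_id)
        # The "rule" is a canonical rendering of the current state.
        return f"avoid:{','.join(sorted(self.down_links))}"


def agreed_input(received: list) -> list:
    """Stand-in for input agreement (assumption): every replica ends up
    with the same ordered event list, here the sorted set of events."""
    return sorted(set(received))


def run_replica(received: list) -> list:
    controller = ToyController()
    return [controller.handle(e) for e in agreed_input(received)]


# Two replicas receive the same events in different orders.
events = [
    Event(1, "s1", "link_down"),
    Event(2, "s2", "link_down"),
    Event(3, "s1", "link_up"),
]
outputs_a = run_replica(events)
outputs_b = run_replica(list(reversed(events)))
print(outputs_a == outputs_b)
```

Because the ordering is fixed before the controller logic runs, the arrival order at each replica no longer matters in this toy setting.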


Experimental validation of the suitability of virtualization-based replication for fault tolerance in real-time control of electric grids

October 2018 · 35 Reads · 1 Citation

Jalal Mostafa · Maaz Mohiuddin · [...]

Real-time control systems (RTCSs) perform complex control and require low response times. They typically use third-party software libraries and are deployed on generic hardware, which suffer from delay faults that can cause serious damage. To improve availability and latency, the controllers in RTCSs are replicated on physical nodes. As physical replication is expensive, we study the alternative of exploiting virtualization technology to run multiple virtual replicas on the same physical node. As virtual replicas share the same resources, the delay faults they experience might be correlated, which would make such a replication method unsuitable. We conduct several experiments with an RTCS for electric grids, with multiple virtual replicas of its controller. We find that although the delay of a virtual machine is higher than that of a physical machine, the correlation between high delays among the virtual replicas is insignificant, resulting in an overall improvement in availability. We conclude that virtual replication is indeed applicable to certain RTCSs, as it can improve reliability without added cost.



Figure 1: Architecture of a real-time software-based control system for electric grids 
Figure 4: Test setup for validation. All elements used in the experiments are shown in solid points. 
Figure 6: Empirical CDF of the relative error in voltage at all the buses. 
Figure 7: CPU and memory usage of T-RECS, on a laptop with 3.7 GB RAM and a 2.67 GHz Intel Core i7 processor, as a function of number of software agents. CPU usage in percentage is cumulative of all four CPUs of the i7 processor.
T-RECS: A Virtual Commissioning Tool for Software-Based Control of Electric Grids: Design, Validation, and Operation

June 2018 · 141 Reads · 6 Citations

In real-time control of electric grids using multiple software agents, the control performance depends on (1) the proper functioning of the software agents, i.e., absence of software faults, and (2) the behavior of software agents in the presence of non-ideal communication networks such as message losses and delays. To evaluate the control performance of such systems, we propose T-RECS, a virtual commissioning tool. T-RECS enables testing the performance of software-based control in-silico (before the actual deployment of software agents in the grid), saving both time and money. Developers can run the binaries of their software agents in T-RECS where these binaries exchange real messages by using an emulated network and simulated models of the electric grid and resources. Consequently, the control of an entire microgrid can be tested on a standard computer. In this paper, we first describe the design and the open-source implementation of T-RECS. Second, we measure its CPU and memory usage and show that our implementation can accommodate eight software agents on a standard laptop computer. Third, we validate the simulated grid used in T-RECS by replaying data collected from experiments performed in a real low-voltage microgrid. We find that the average error is 0.037% and the 99th percentile of the error is less than 0.1%. Finally, we present some typical use-cases of T-RECS such as performance evaluation (1) under extreme grid conditions and (2) with non-ideal communication networks. The former, i.e., performance evaluation under extreme grid conditions, is difficult to test in the field due to safety concerns.


Fig. 1: Architecture of one TSN node output port. 
Fig. 2: Illustration of the queuing policy by TSN switches for four flows of class A. 
Fig. 3: Timing Model in TSN 
End-to-End Latency and Backlog Bounds in Time-Sensitive Networking with Credit Based Shapers and Asynchronous Traffic Shaping

April 2018 · 322 Reads

We compute bounds on end-to-end worst-case latency and on nodal backlog size for a per-class deterministic network that implements Credit Based Shaper (CBS) and Asynchronous Traffic Shaping (ATS), as proposed by the Time-Sensitive Networking (TSN) standardization group. ATS is an implementation of the Interleaved Regulator, which reshapes traffic in the network before admitting it into a CBS buffer, thus avoiding burstiness cascades. Due to the interleaved regulator, traffic is reshaped at every switch, which allows for the computation of explicit delay and backlog bounds. Furthermore, we obtain a novel, tight per-flow bound for the response time of CBS, when the input is regulated, which is smaller than existing network calculus bounds. We also compute a per-flow bound on the response time of the interleaved regulator. Based on all the above results, we compute bounds on the per-class backlogs. Then, we use the newly computed delay bounds along with recent results on interleaved regulators from literature to derive tight end-to-end latency bounds and show that these are less than the sums of per-switch delay bounds.
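For orientation, the kind of bound discussed above can be illustrated with the classic single-node result from network calculus. This is the textbook bound, not the CBS or interleaved-regulator bounds derived in the paper; the symbols r, b, R and T are assumptions introduced here. For a flow constrained by a token-bucket arrival curve and served by a rate-latency server with r ≤ R:

```latex
\alpha(t) = r t + b, \qquad \beta(t) = R \, (t - T)^{+}, \qquad r \le R

\text{delay} \;\le\; T + \frac{b}{R}, \qquad \text{backlog} \;\le\; b + r T
```

Summing such per-node delay bounds along a path gives a valid but generally loose end-to-end bound, which is why an end-to-end bound smaller than the sum of per-switch bounds is of interest.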



Axo: Detection and Recovery for Delay and Crash Faults in Real-Time Control Systems

November 2017 · 91 Reads · 7 Citations

IEEE Transactions on Industrial Informatics

Real-time control systems use controllers that compute and issue setpoints within stringent delay constraints. Failure to do so, due to a crash or delay as a result of software and/or hardware faults, can cause failure of the controlled resources. Recently, Axo, a protocol for masking crash and delay faults by replicating the controller, was proposed. Axo provides safety by discarding delayed setpoints, and it relies on the presence of valid setpoints for providing availability. To ensure that enough valid setpoints are issued, faulty controller replicas need to be detected and recovered. We present a mechanism for detection and recovery of delay- and crash-faulty replicas under the Axo framework. These mechanisms were designed to be soft state (i.e., their state can be reconstructed from received messages) to enable seamless additions of new replicas. Besides presenting the design, we analytically characterize the time to detect and recover a faulty replica, and we validate them experimentally. We demonstrate the performance of Axo by using two case studies: the first provides a stability analysis of an inverted pendulum system with Axo, and the second shows the fault-tolerance performance of Axo through a deployment on a real-time control system that controls a CIGRÉ low-voltage benchmark microgrid.
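The principle of discarding delayed setpoints and relying on at least one valid setpoint can be sketched as follows. This is a minimal illustration, not the Axo protocol: the `Setpoint` fields, the `validity_ms` threshold and the rule of picking the freshest valid setpoint are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Setpoint:
    replica: str
    issued_at_ms: int  # assumed field: time the setpoint refers to
    value: float


def select_valid(setpoints, now_ms: int, validity_ms: int) -> Optional[Setpoint]:
    """Discard setpoints older than validity_ms; return the freshest valid one, if any."""
    valid = [s for s in setpoints if 0 <= now_ms - s.issued_at_ms <= validity_ms]
    if not valid:
        return None  # no valid setpoint: nothing is applied in this round
    return max(valid, key=lambda s: s.issued_at_ms)


received = [
    Setpoint("replica-1", issued_at_ms=900, value=4.2),  # delayed: 100 ms old
    Setpoint("replica-2", issued_at_ms=980, value=4.0),  # fresh: 20 ms old
]
chosen = select_valid(received, now_ms=1000, validity_ms=50)
print(chosen)
```

In this sketch a delayed replica never causes an unsafe action, but if every replica is delayed the function returns `None`, which is why faulty replicas need to be detected and recovered to keep valid setpoints available.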




Experimental validation of the usability of Wi-Fi over redundant paths for streaming phasor data

November 2016 · 28 Reads · 13 Citations

Applications performing streaming of phasor-measurement data require low latency and low loss from the communication network. Traditionally, such requirements are realized through wired infrastructure. Recently, wireless infrastructure has gained attention due to its low cost and ease of deployment, but its poor quality-of-service is a strong deterrent for use in mission-critical applications. Recent studies have used measurements to explore the use of packet replication over redundant Wi-Fi paths, for obtaining the desired loss performance without hampering the end-to-end latency. However, these studies are done in a controlled, laboratory environment and do not reflect the real, in-field performance. In this paper, we perform extensive measurements using two co-located directional Wi-Fi links in a real-life setting, to experimentally validate the use of packet replication over Wi-Fi for streaming phasor data. In the setting that we evaluated, we find that the two channels are not fail-independent but the performance achieved with replication is very close to what it would be if they were independent. From the loss and latency statistics after replication, we conclude that replicating the phasor data over redundant Wi-Fi paths is a viable option for achieving the desired quality-of-service.
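The comparison between the loss achieved with replication and the loss expected under independence can be sketched as below. The delivery traces are made-up illustrative values, not measurements from the paper, and the function names are hypothetical. With replication a packet is lost only if both copies are lost; under independence the expected loss rate is the product of the per-path loss rates.

```python
def loss_rate(received_flags):
    """Fraction of packets lost on one path (False means lost)."""
    return sum(1 for ok in received_flags if not ok) / len(received_flags)


def replicated_loss_rate(path_a, path_b):
    """A replicated packet is lost only if both copies are lost."""
    both_lost = sum(1 for a, b in zip(path_a, path_b) if not a and not b)
    return both_lost / len(path_a)


# Hypothetical per-packet delivery traces for two paths (True = delivered).
path_a = [True, False, True, True, False, True, True, True, True, True]
path_b = [True, True, True, False, False, True, True, True, True, True]

measured = replicated_loss_rate(path_a, path_b)
if_independent = loss_rate(path_a) * loss_rate(path_b)
print(measured, if_independent)
```

A measured value above the independence product, as in this toy trace, indicates positively correlated losses; how close the two values are is what determines whether replication delivers the expected benefit.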


Citations (12)


... The activation is driven by the frequency deviation of the system. Each FCR provider measures the frequency deviation locally with respect to nominal frequency (50 Hz) and provides primary regulation proportionally to it [7][8][9][10]. ...

Reference:

Compatibility Analysis of Frequency Containment Reserve and Load Frequency Control Functions
FCR: Fast and Consistent Controller-Replication in Software Defined Networking

IEEE Access

... In a TSN network, the network graph G = (N, L) consists of a set of nodes N, including end-systems (ESs) and switches (SWs), and a set of physical links L, as shown in Fig. 1(a). We define (u, v) ∈ L as the physical link from node u to node v, also representing the corresponding egress port, with a transmission rate of C. For each egress port, this paper considers a hybrid architecture of ATS [13], [22] and CBS [12], referred to as TSN/ATS+CBS [23], as shown in Fig. 1(b). This hybrid architecture employs CBS to provide flexible shaping services through bandwidth reservation and utilizes ATS to implement ...

Latency and Backlog Bounds in Time-Sensitive Networking with Credit Based Shapers and Asynchronous Traffic Shaping

... To develop FCR we leveraged on the following design principles and mechanisms: (1) the SCL [4] architecture, which consists of controller proxies and switch proxies, (2) Quarts [12], a low-latency agreement mechanism used to achieve agreement among controller replicas, (3) intentionality clocks [13] to achieve a total ordering of events. Combining these and constructing an efficient (i.e., low latency, low bandwidth and highly available) and viable design is nontrivial. ...

Ordering Events Based on Intentionality in Cyber-Physical Systems
  • Citing Conference Paper
  • April 2018

... In this sense, EPFL's T-RECS simulation platform removes the constraint of the availability of their experimental demonstrator. At the cost of a modeling effort for the physical components, the agreement between simulation results and real tests validates the simulator [12]. Another example of a simulator is SCORE, developed at the University of Georgia [13]. ...

T-RECS: A Virtual Commissioning Tool for Software-Based Control of Electric Grids: Design, Validation, and Operation

... The definition of a multiagent systems' architecture has gone through an evolution from the initial widely used platforms, for example, JADE [91], to newer ones with improved capabilities [92]- [95]. Some common trends can be found in these solutions: the layering of functionalities, the definition of platform services, for example, communication and access control capabilities, and the abstraction or overlaying of functionalities. ...

T-RECS: A software testbed for multi-agent real-time control of electric grids
  • Citing Conference Paper
  • September 2017

... Obviously, an important requirement for a replication scheme is to ensure output consistency [8], meaning that all controller replicas send the same sequence of output messages. In this paper, we use a relaxed output consistency condition like in [9], where we allow replicas to occasionally produce no output. This is motivated by the fact that intermittent message loss in NCS-also from controller to actuator-is well-studied [10]- [12]. ...

Quarts: Quick agreement for real-time control systems
  • Citing Conference Paper
  • September 2017

... availability and reliability of data by deploying them on various nodes to minimize access failures and data loss [53]. Because replicas are stored on various nodes, other data remains available if one fails, and the service is not disrupted [39]. ...

Axo: Detection and Recovery for Delay and Crash Faults in Real-Time Control Systems

IEEE Transactions on Industrial Informatics

... One-way propagation delay between any two end-points is bounded by δ_n; everything beyond is considered to be a delay fault or a loss. We assume that each controller is susceptible to crash and delay faults [19]: it can stop functioning or be intermittently slow in issuing updates. Thus, we have a crash-recover fault-model [20]. ...

Axo: Masking delay faults in real-time control systems
  • Citing Conference Paper
  • October 2016

... Behavior, in terms of the fraction of frames delivered to destination correctly and timely, was experimentally shown to improve substantially. Similar approaches were proposed for streaming phasor measurements in smart grids [16]. ...

Experimental validation of the usability of Wi-Fi over redundant paths for streaming phasor data
  • Citing Conference Paper
  • November 2016

... In order to ensure the randomness of the generated network topology, the random_regular_graph network model [8] is chosen in the experiment. For the range of network topology nodes, according to the status quo of industrial Internet based on SDN, the range of [10, 60] is selected, and the transmission delay [9] of [1 ms, 20 ms] is randomly allocated for each link, so that an alternative path group satisfying the transmission delay /\ (100 ms) can be better selected in the calculation stage, and the main path in the selected alternative path group includes at least one cache node except the near sender and near receiver switches. The performance of the experimental algorithm is evaluated on a Lenovo server (Intel Xeon E7-4820, 32G, and Ubuntu 16.04). ...

iPRP: Parallel Redundancy Protocol for IP Networks