Kishor S Trivedi

Duke University | DU · Department of Electrical and Computer Engineering (ECE)

PhD, UIUC; MS, UIUC; B.Tech. (EE), IIT Bombay

About

868
Publications
190,321
Reads
38,686
Citations
Introduction
Current research:
1) Software failures and their mitigation, focusing on environment-dependent bugs
2) Parametric uncertainty propagation through stochastic models
3) I conduct a weekly online Zoom session on the Upanishads.
4) My h-index is 108.
5) My academic tree: https://www.genealogy.math.ndsu.nodak.edu/id.php?id=76227
6) Latest news item: https://pratt.duke.edu/about/news/trivedi-ieee--lifetime-award
Additional affiliations
April 2018 - August 2021
Hiroshima University
Position
  • Professor
July 2014 - August 2014
Hiroshima University
Position
  • Professor
Description
  • I was a Fellow of the Japan Society for the Promotion of Science (JSPS); I gave several keynotes at symposia in Tokyo, Sapporo and Hiroshima, gave many lectures at Japanese institutions, and interacted with the research group led by Prof. Tadashi Dohi.
May 2014 - June 2014
Norwegian University of Science and Technology
Position
  • Professor
Description
  • I was selected as a NordSecMobile Scholar for 2014.
Education
August 1970 - July 1974
University of Illinois, Urbana-Champaign
Field of study
  • Computer Science
July 1963 - June 1968
Indian Institute of Technology Bombay
Field of study
  • Electrical Engineering

Publications (868)
Conference Paper
Smart grids are fostering a paradigm shift in the realm of power distribution systems. Whereas traditionally different components of the power distribution system have been provided and analyzed by different teams through different lenses, smart grids require a unified and holistic approach that takes into consideration the interplay of communicati...
Conference Paper
Full-text available
With software systems becoming increasingly large and complex, many difficulties in coping with software bugs arise for developers. Despite good development practices, thorough testing, and proper maintenance policies, a non-negligible number of bugs remain in the released software. Understanding the type of residual bugs is fundamental for adoptin...
Article
Full-text available
The growing complexity of mission-critical space mission software makes it prone to failures during operations. The success of space missions depends on the ability of the systems to deal with software failures, or to avoid them in the first place. In order to develop more effective mitigation techniques, it is necessary to understand the na...
Conference Paper
Full-text available
As space mission software becomes more complex, the ability to effectively deal with faults is increasingly important. The strategies that can be employed for fighting a software bug depend on its fault type. Bohrbugs are easily isolated and removed during software testing. Mandelbugs appear to behave chaotically. While it is more difficult to dete...
Article
Even if software developers don't fully understand the faults or know their location in the code, software rejuvenation can help avoid failures in the presence of aging-related bugs. This is good news because reproducing and isolating an aging-related bug can be quite involved, similar to other Mandelbugs. Moreover, monitoring for signs of software...
Article
Full-text available
The integration of network function virtualization (NFV) with multi-access edge computing (MEC) has been explored to facilitate the development of 5G (fifth-generation) networks. Latency-sensitive applications can be deployed as serial-parallel hybrid service function chains (SP-SFCs) in the MEC-NFV environment. SP-SFCs are deployed on resource-limite...
Preprint
Full-text available
A fundamental aspect of software reliability engineering is to understand how software failures manifest, identifying and comprehending their causes and effects. In this paper, we perform ex-post analyses of field software failure data, seeking to characterize their causes. The failures analyzed were collected from hundreds of computer systems l...
Article
Understanding and predicting types of bugs are of practical importance for developers to improve the testing efficiency and take appropriate steps to address bugs in software releases. However, due to the complex conditions under which faults manifest and the complexity of the classification rules, the automatic classification of Mandelbugs is a di...
Preprint
Full-text available
Ish Upanishad, Part 1
Preprint
Full-text available
Summary of the first eight mantras of Ish Upanishad
Preprint
Full-text available
Ish Upanishad, Part 2
Preprint
Full-text available
This is part 3 of chapter 1 of this Upanishad. There will be 3 more such parts.
Article
Vehicle platooning can be applied to cooperative downloading and uploading (CDU) services through the cooperation between lead vehicle and non-lead vehicles. CDU service can be completed cooperatively by containers constructed in vehicles of vehicle platooning system. Containers in vehicles may suffer from potential attacks which can lead to resour...
Article
The safety-critical applications of vehicular ad hoc networks (VANETs) require high reliability and low transmission latency. IEEE 802.11p and IEEE 802.11bd are two standards proposed for such vehicular communication systems. In this paper, we propose an effective SINR-based model to conduct the QoS analysis of IEEE 802.11p/bd driven VANETs for saf...
Preprint
Full-text available
Kathopanishad Chapter 1 Valli 2
Presentation
Full-text available
Tutorial at the 26th Asia and South Pacific Design Automation Conference (ASP-DAC 2021)
Article
As multi-hop wireless networks are attracting more attention, the need to evaluate their performance becomes essential. In order to evaluate the performance metrics of multi-hop wireless networks, including sending and receiving rates of a node as well as the collision probability, a model based on Stochastic Reward Nets (SRNs) is proposed. The pro...
Preprint
Full-text available
Empirical studies have shown robust evidence of OS failure patterns characterized by multiple combinations of failure events composed of the same or different failure types. In this paper, we present a statistical approach to predict OS failures based on multiple failures association. Once we identify systematic failure associations in field data,...
Article
Mandelbug-caused software failures are significant threats to system availability, especially in the context of mission-critical and safety-critical systems. However, there is still no systematic method for keeping the software free from Mandelbugs before release. To guarantee the availability of systems suffering from Mandelbugs, environmental-div...
Preprint
Full-text available
I have put together the Kath Upanishad with commentary. This is Chapter 1, Section 1. The material is collected from many sources and harmonized, and my own ideas are added. It was prepared for a weekly Zoom discussion session on the topic. Comments and criticism are most welcome.
Article
The recent trend of network softwarization suggests a radical shift in the implementation of traditional network intelligence. In Software Defined Networking (SDN), for instance, the control plane functions of forwarding devices are outsourced to the controller. Softwarized network components are expected to provide uninterrupted service during lon...
Article
Intrusion tolerance is the ability to maintain correct service by masking intrusions using fault-tolerant techniques. With the rapid development of virtualization, the virtual machine (VM)-based intrusion tolerance scheme has been developed according to the concept of state machine replication with Byzantine fault-tolerant techniques. In this a...
Article
With the rapid and widespread deployment of system virtualization, service availability analysis has become increasingly important in a virtualized system (VS) that suffers from software aging. Software rejuvenation techniques can be applied to improve service availability, but their effectiveness depends on the rejuvenation policy, which d...
Preprint
Full-text available
Any comments/corrections/suggestions are welcome.
Preprint
Full-text available
A brief intro to Hinduism. Any comments/corrections/suggestions are most welcome.
Chapter
Software is crucial in the provision of communication services. Most functions related to control, management and operation are realized in software. With the ongoing virtualization and shift of network functions to new software platforms, the role and criticality of software for ordinary operations as well as handling of disasters increase signifi...
Article
Migration-based Dynamic Platform (MDP), a type of Moving Target Defense (MTD) technique, defends against sophisticated cyber-attacks by randomly and dynamically selecting a platform for executing a service/job. Security defense mechanisms usually protect a service/job at the cost of degrading its performance. Therefore, it is valuable to mak...
Article
Full-text available
In Software Defined Networking (SDN), network programmability is enabled through a logically centralized control plane. Production networks deploy multiple controllers for scalability and reliability reasons, which in turn rely on distributed consensus protocols to operate in a logically centralized manner. However, bugs in distributed control plan...
Experiment Findings
Full-text available
The book "Reliability and Availability Engineering: Modeling, Analysis, and Applications" by Kishor S. Trivedi and Andrea Bobbio (1st edition), Cambridge University Press, 2017, covers the analytical and modeling techniques currently in use for evaluating the reliability/availability of engineered systems. The book was recommended to me when I was...
Presentation
Full-text available
A fundamental need for software reliability engineering is to comprehend how software systems fail, which means understanding the dynamics that govern different types of failure manifestation. In this paper, we present an exploratory study on multiple-event failures, looking for systematic patterns of sequences of failures in logs of a commodity op...
Preprint
Full-text available
A fundamental need for software reliability engineering is to comprehend how software systems fail, which means understanding the dynamics that govern different types of failure manifestation. In this paper, we present an exploratory study on multiple-event failures, looking for systematic patterns of sequences of failures in logs of a commodity op...
Conference Paper
High reliability and availability are requirements for most technical systems including computer and communication systems. Reliability and availability assurance methods based on probabilistic models is the topic addressed in this talk. Non-state-space solution methods are often used to solve models based on reliability block diagrams, fault trees...
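As an illustration of the non-state-space approach mentioned above, the sketch below evaluates a series-parallel reliability block diagram directly from block availabilities, assuming statistically independent blocks. The helper names `series` and `parallel` and the three-block example system are hypothetical, chosen only to show the product rules.

```python
from functools import reduce


def series(*avail: float) -> float:
    """Series blocks: the system is up only if every block is up (independence assumed)."""
    return reduce(lambda acc, a: acc * a, avail, 1.0)


def parallel(*avail: float) -> float:
    """Parallel blocks: the system is down only if every block is down (independence assumed)."""
    return 1.0 - reduce(lambda acc, a: acc * (1.0 - a), avail, 1.0)


# Hypothetical system: two redundant servers (0.99 each) in series with one database (0.999).
system = series(parallel(0.99, 0.99), 0.999)
print(f"System availability: {system:.6f}")
```

No Markov chain is built here; that is the point of such methods, though they give up the ability to capture dependencies such as shared repair.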
Conference Paper
Full-text available
Software aging, which is caused by Aging-Related Bugs (ARBs), tends to occur in long-running systems and may lead to performance degradation and increasing failure rate during software execution. ARB prediction can help developers discover and remove ARBs, thus alleviating the impact of software aging. However, ARB-prone files occupy a small percen...
Article
The extent of epistemic uncertainty in modeling and analysis of complex systems is ever growing, mainly due to increasing levels of openness, heterogeneity and versatility in cloud-based applications that are being adopted in critical sectors, like banking and finance. State-of-the-art approaches for model-based performance assessment do not em...
Article
This paper presents an empirical study of 5741 bug reports for the Linux kernel from an evolutionary perspective, with the aim of obtaining a deep understanding of bug characteristics in the Linux operating system. Bug classification is performed based on the fault triggering conditions, followed by an analysis of the proportions and evolution of t...
Article
In some applicable scenarios such as community patrolling, mobile nodes are restricted to move only in their own communities. Exploiting the meetings of the nodes within the same community and the nodes within the neighboring communities, a Delay Tolerant Network (DTN) can provide communication between any two nodes. In this paper, two analytical m...
Article
Software failures caused by data race bugs have always been major concerns in parallel and distributed systems, despite significant efforts spent in software testing. Due to their nondeterministic and hard-to-reproduce features, when evaluating systems’ operational reliability, a rather long period of experimental execution time is expected to be s...
Chapter
The recovery and repair durations of large fault-tolerant systems generally span several orders of magnitude. The distributions also violate the common modeling assumption of an exponential distribution for the recovery and repair time. A reward-based semi-Markov model is presented that can be used to predict the steady-state availability of such s...
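A minimal sketch of how a semi-Markov model can yield steady-state availability without assuming exponential recovery times, under the hypothetical assumption of a single up state followed by one of several recovery/repair types. It is not the chapter's model: the function name, branch probabilities and durations are all illustrative. The semi-Markov result used is that the time-average probability of state j is proportional to the embedded-DTMC visit probability of j times its mean sojourn time, so only means enter.

```python
def semi_markov_availability(mean_up: float, recovery: list[tuple[float, float]]) -> float:
    """Steady-state availability of a simple semi-Markov up/recovery cycle.

    recovery: (probability, mean_duration) for each recovery/repair type entered
    after a failure. Sojourn distributions may be arbitrary; only their means matter.
    """
    if abs(sum(p for p, _ in recovery) - 1.0) > 1e-9:
        raise ValueError("recovery branch probabilities must sum to 1")
    # Embedded DTMC: Up -> recovery i with prob p_i; every recovery state -> Up.
    # Its stationary vector is nu_up = 1/2 and nu_i = p_i / 2.
    nu_up = 0.5
    nu = [p / 2 for p, _ in recovery]
    # Time-average probability of state j is proportional to nu_j * mean sojourn in j.
    up_time = nu_up * mean_up
    total_time = up_time + sum(n_i * m for n_i, (_, m) in zip(nu, recovery))
    # Reward rate 1 in Up and 0 in every recovery state, so availability = pi_up.
    return up_time / total_time


# Hypothetical numbers (hours): fast restarts, occasional failover, rare manual repair,
# so recovery durations span several orders of magnitude.
availability = semi_markov_availability(
    1000.0, [(0.90, 0.01), (0.09, 0.5), (0.01, 24.0)]
)
print(f"Steady-state availability: {availability:.6f}")
```

Because the rare 24-hour repair dominates the mean downtime here, such a model makes clear which recovery class is worth improving first.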
Chapter
Modern systems implement multiple and complex operations to manage the user demand, thereby ensuring adequate quality levels. They are usually made of a collection of interconnected (autonomous) subsystems, with a common goal to be pursued, that are perceived as a whole, single, integrated facility.
Article
The Android operating system (OS) is a sophisticated man-made system and is the dominant OS in the current smartphone market. Due to the accumulation of errors in the system internal state and the incremental consumption of resources, such as the Dalvik heap memory of software applications and the physical memory, software aging is observed frequen...
Article
Malicious lateral movement-based attacks have become a potential risk for many systems, posing serious threats to critical infrastructures and national security. When launching this kind of attack, adversaries first compromise a fraction of the targeted system and then move laterally to the rest of the system until the whole system is infe...
Article
In this paper, the performance of a grid resource is modeled and evaluated using stochastic reward nets (SRNs), wherein the failure–repair behavior of its processors is taken into account. The proposed SRN is used to compute the blocking probability and service time of a resource for two different types of tasks: grid and local tasks. After modelin...
Article
In long running systems, software tends to encounter performance degradation and increasing failure rate during execution. This phenomenon has been named software aging, which is caused by aging-related bugs (ARBs). Testing resource allocation can be optimized by identifying ARB-prone modules with ARB prediction. However, due to the low presence an...
Article
Due to the increasing need for computational power, the market has shifted towards big centralized data centers. Understanding the nature of the dynamics of these data centers from machine and job/task perspective is critical to design efficient data center management policies like optimal resource/power utilization, capacity planning and optimal (...
Article
Software aging often affects the performance of software systems and may eventually cause them to fail. A complementary approach to handle transient software failures due to the software aging is called software rejuvenation. It is a preventive and proactive solution that is particularly useful for counteracting the phenomenon of software aging. In...
Conference Paper
Full-text available
Outpatient centers composed of many concurrent clinics are seeing increasingly high patient volumes. In these centers, decisions to improve clinic flow must account for the high degree of interdependence when critical personnel or equipment is shared between clinics. Discrete event simulation models have provided clinical decision support, but rarely a...
Article
Full-text available
In Software Defined Networking (SDN) critical control plane functions are offloaded to a software entity known as the SDN controller. Today’s SDN controllers are complex software systems, owing to heterogeneity of networks and forwarding devices they support, and are inherently prone to bugs. Our previous work showed that Software Reliability Growt...
Article
Data centers have recently experienced fast growth in energy demand, mainly due to cloud computing, a paradigm that lets the users access shared computing resources (e.g., servers, storage, etc.). Several techniques have been proposed in order to alleviate this problem, and numerous power models have been adopted to predict the servers' power con...
Article
The increasing shift of various critical services towards Infrastructure-as-a-Service (IaaS) cloud data centers (CDCs) creates a need for analyzing CDCs’ availability, which is affected by various factors including repair policy and system parameters. This paper aims to apply analytical modeling and sensitivity analysis techniques to investigate th...
Presentation
Full-text available
Details and video of my talk at Alibaba on Dec. 8, 2017: https://102.alibaba.com/detail/?id=23
Conference Paper
Full-text available
As enterprises continue to move their workloads from traditional server-room environments to private cloud-based systems, there is an increasing desire and ability for companies like IBM to centrally monitor the systems on behalf of their customers to proactively help to mitigate any potential failure scenarios. In this paper, we investigate failur...
Article
Infrastructure as a Service (IaaS) is one of the most significant and fastest growing fields in cloud computing. To efficiently use the resources of an IaaS cloud, several important factors such as performance, availability, and power consumption need to be considered and evaluated carefully. Evaluation of these metrics is essential for cost-benefi...
Conference Paper
The Linux operating system is a complex system that is prone to failures during use, which makes fixing bugs more difficult. Different testing strategies and fault mitigation methods can be developed and applied based on different types of bugs, which makes a deep understanding of the nature of bugs in Linux necessary. In thi...
Article
Full-text available
The Internet world is moving toward a scenario where users and applications have very diverse service expectations, making the current best-effort model inadequate and limiting. To be able to design high-availability service systems, it is essential to consider not only the actual failure and recovery behavior of the service infrastructure, but also...
Article
Transient performance analysis of power distribution network (PDN) after a failure occurrence could facilitate the better design of smart grid. Researchers have proposed analytical models and the numerical solutions to analyze the PDN's transient behaviors by applying homogeneous continuous-time Markov chain (CTMC). However, the PDN system may be t...
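For readers unfamiliar with the homogeneous CTMC baseline mentioned above, the sketch below computes transient state probabilities p(t) = p0·exp(Qt) by uniformization. The abstract does not say which numerical method is used; uniformization is swapped in here as one standard choice, and the two-state up/down model with its failure and repair rates is hypothetical.

```python
import math


def transient_uniformization(Q, p0, t, tol=1e-12, max_terms=100_000):
    """State probabilities p(t) = p0 * exp(Q t) of a homogeneous CTMC via uniformization."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))  # uniformization rate >= every exit rate
    if q == 0.0 or t == 0.0:
        return list(p0)
    qt = q * t
    if qt > 700.0:
        # exp(-qt) would underflow; this simple sketch does not handle stiff models.
        raise ValueError("q*t too large for this simple sketch")
    # Uniformized DTMC: P = I + Q/q (rows still sum to 1, entries non-negative).
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    weight = math.exp(-qt)  # Poisson probability of 0 jumps in [0, t]
    vec = list(p0)          # p0 * P^k, starting at k = 0
    result = [weight * x for x in vec]
    accumulated = weight
    for k in range(1, max_terms + 1):
        if 1.0 - accumulated <= tol:  # remaining Poisson mass bounds the truncation error
            break
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= qt / k  # Poisson probability of exactly k jumps
        accumulated += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


# Hypothetical two-state model: state 0 = up, state 1 = down (rates per hour).
lam, mu, t = 0.001, 0.1, 10.0
Q = [[-lam, lam], [mu, -mu]]
p = transient_uniformization(Q, [1.0, 0.0], t)
print(f"P(up at t={t} h) = {p[0]:.8f}")
```

For this two-state case the result can be checked against the closed form μ/(λ+μ) + λ/(λ+μ)·e^(−(λ+μ)t); a non-homogeneous model would need time-varying rates, which this sketch does not cover.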
Conference Paper
Full-text available
While blockchain networks bring tremendous benefits, there are concerns about whether their performance can match that of mainstream IT systems. This paper aims to investigate whether the consensus process using Practical Byzantine Fault Tolerance (PBFT) could be a performance bottleneck for networks with a large number of peers. We model the PBFT...