Andriy V. Miranskyy
Ryerson University · Department of Computer Science
About
116 Publications
23,271 Reads
958 Citations
Introduction
My main research interest lies in the area of quantifying and mitigating risks (in the broadest sense) in Software Engineering, focusing on large-scale software systems, especially those that process and analyze Big Data.
In our work, we leverage a plethora of “tools” ranging from data mining, machine learning (including deep learning), simulation, and information theory, to blockchain, High Performance Computing, and Cloud computing.
Preprints of my publications can be found at http://www.scs.ryerson.ca/~avm/#_selected_publications
Publications (116)
When working with large logs, practitioners often face issues such as scarce storage, unscalable analysis tools, inaccurate capture and replay of logs, and inadequate privacy. Researchers have devised some practical solutions, but important challenges remain.
Online and offline analytics have traditionally been treated separately in software architecture design, and no existing general architecture supports both. Our objective is to go beyond this separation and introduce a scalable and maintainable architecture for performing online as well as offline analysis of streaming data. In this paper, we prop...
A quantum computer (QC) can solve many computational problems more efficiently than a classic one. The field of QCs is growing: companies (such as D-Wave, IBM, Google, and Microsoft) are building QC offerings. We posit that software engineers should look into defining a set of software engineering practices that apply to QC’s software. To start t...
As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Clou...
Flaky tests, which pass or fail inconsistently without code changes, are a major challenge in software engineering in general and in quantum software engineering in particular due to their complexity and probabilistic nature, leading to hidden issues and wasted developer effort. We aim to create an automated framework to detect flaky tests in quant...
Cloud computing is essential for modern enterprises, requiring robust tools to monitor and manage Large-Scale Cloud Systems (LCS). Traditional monitoring tools often miss critical insights due to the complexity and volume of LCS telemetry data. This paper presents CloudHeatMap, a novel heatmap-based visualization tool for near-real-time monitoring...
A flaky test yields inconsistent results upon repetition, posing a significant challenge to software developers. Their presence and characteristics have been studied extensively in classical computer software but not in quantum computer software. In this paper, we outline challenges and potential solutions for the automated detection of flaky te...
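The definition above (inconsistent results upon repetition) suggests the simplest possible check: rerun a test and compare its verdicts. The Python sketch below does only that; it is not the automated detection framework from these papers, whose details are elided here. `make_probabilistic_test` is a hypothetical stand-in for a test whose verdict depends on sampled measurement outcomes, as happens with quantum programs.

```python
import random
from typing import Callable


def is_flaky(test: Callable[[], bool], runs: int = 20) -> bool:
    """Rerun `test` and report whether its verdicts disagree (some pass, some fail)."""
    verdicts = {test() for _ in range(runs)}
    return len(verdicts) > 1


def make_probabilistic_test(p_pass: float, seed: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a test whose verdict depends on sampled
    measurement outcomes: it passes with probability `p_pass`."""
    rng = random.Random(seed)
    return lambda: rng.random() < p_pass


if __name__ == "__main__":
    deterministic = make_probabilistic_test(p_pass=1.0, seed=0)
    coin_flip = make_probabilistic_test(p_pass=0.5, seed=42)
    print(is_flaky(deterministic))  # False: random() < 1.0 always holds
    print(is_flaky(coin_flip))      # True for this seed
```

A rerun-based check can confirm flakiness but never rule it out: a test that fails with small probability may pass all `runs` executions, so the number of reruns trades detection power against testing time.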
The effectiveness of training neural networks directly impacts computational costs, resource allocation, and model development timelines in machine learning applications. An optimizer's ability to train the model adequately (in terms of trained model performance) depends on the model's initial weights. Model weight initialization schemes use pseudo...
Quantum computers are gaining importance in various applications like quantum machine learning and quantum signal processing. These applications face significant challenges in loading classical datasets into quantum memory. With numerous algorithms available and multiple quality attributes to consider, comparing data loading methods is complex. Our...
The Fourth International Workshop on Quantum Software Engineering (Q-SE 2023), co-located with the 45th International Conference on Software Engineering (ICSE 2023), was held on May 14, 2023 in a hybrid manner in Melbourne, Australia, and online. This report presents the workshop structure, the keynote speech, and the themes of the presented papers...
The evolution of Cloud computing led to a novel breed of applications known as Cloud-Native Applications (CNAs). However, observing and monitoring these applications can be challenging, especially if a CNA is bound by compliance requirements. To address this challenge, we explore the characteristics of CNAs and how they affect CNAs' observability a...
In recent years, software engineers have explored ways to assist quantum software programmers. Our goal in this paper is to continue this exploration and see if quantum software programmers deal with some problems plaguing classical programs. Specifically, we examine whether intermittently failing tests, i.e., flaky tests, affect quantum software d...
There has been a massive explosion of data generated by customers and retained by companies in the last decade. However, there is a significant mismatch between the increasing volume of data and the lack of automation methods and tools. The lack of best practices in data science programming may lead to software quality degradation, release schedule...
Software under test can be analyzed dynamically, while it is being executed, to find defects. However, as the number and possible values of input parameters increase, the cost of dynamic testing rises. This paper examines whether quantum computers (QCs) can help speed up the dynamic testing of programs written for classical computers (CCs). To acco...
Quantum computers (QCs) are maturing. When QCs are powerful enough, they will be able to handle problems in chemistry, physics, and finance. However, the applicability of quantum algorithms to speed up Software Engineering (SE) tasks has not been explored. We examine eight groups of algorithms that may accelerate SE tasks across the different phase...
Machine learning (ML) classification tasks can be carried out on a quantum computer (QC) using Probabilistic Quantum Memory (PQM) and its extension, Parametric PQM (P-PQM), by calculating the Hamming distance between an input pattern and a database of $r$ patterns containing $z$ features with $a$ distinct attributes.
For accurate computations, the feat...
Machine learning (ML) classification tasks can be carried out on a quantum computer (QC) using probabilistic quantum memory (PQM) and its extension, parametric PQM (P-PQM), by calculating the Hamming distance between an input pattern and a database of $r$ patterns containing $z$ features with $a$ distinct attributes. For PQM and P-PQM to corr...
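For comparison, the classical computation that PQM and P-PQM aim to speed up can be sketched in a few lines: compute the Hamming distance from the input to each of the $r$ stored patterns ($z$ features each, every feature taking one of $a$ values) and return the label of the nearest one. This is an illustrative classical baseline, not the quantum algorithm; the nearest-neighbour decision rule and the toy data are assumptions.

```python
from typing import List, Sequence, Tuple


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of features at which two equal-length patterns differ."""
    if len(x) != len(y):
        raise ValueError("patterns must have the same number of features z")
    return sum(a != b for a, b in zip(x, y))


def classify(query: Sequence[int],
             patterns: Sequence[Sequence[int]],
             labels: Sequence[str]) -> Tuple[str, List[int]]:
    """Label of the stored pattern nearest to `query` (ties go to the first),
    plus all r distances."""
    distances = [hamming(query, p) for p in patterns]
    nearest = min(range(len(patterns)), key=distances.__getitem__)
    return labels[nearest], distances


if __name__ == "__main__":
    # Toy database: r = 3 patterns, z = 4 features, a = 3 attribute values {0, 1, 2}.
    patterns = [(0, 1, 2, 0), (2, 2, 1, 0), (0, 1, 1, 1)]
    labels = ["A", "B", "C"]
    print(classify((2, 2, 1, 1), patterns, labels))  # ('B', [4, 1, 2])
```

Each query costs $r \cdot z$ symbol comparisons, which is the cost that becomes expensive when one input must be compared against many stored patterns.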
Service Level Agreements (SLA) are employed to ensure the performance of Cloud solutions. When a component fails, the importance of logs increases significantly. All departments may turn to logs to determine the cause of the issue and find the party at fault. The party at fault may be motivated to tamper with the logs to hide their role. We argue t...
In the era of quantum computing, Shor's algorithm running on quantum computers (QCs) can break asymmetric encryption algorithms that classical computers essentially cannot. QCs, with the help of Grover's algorithm, can also speed up the breaking of symmetric encryption algorithms. Though the exact date when QCs will become "dangerous" for practical...
The Software Engineering (SE) community is prolific, making it challenging for experts to keep up with the flood of new papers and for neophytes to enter the field. Therefore, we posit that the community may benefit from a tool extracting terms and their interrelations from the SE community's text corpus and showing terms' trends. In this paper, we...
The Hamming distance is ubiquitous in computing. Its computation gets expensive when one needs to compare a string against many strings. Quantum computers (QCs) may speed up the comparison. In this paper, we extend an existing algorithm for computing the Hamming distance. The extension can compare strings with symbols drawn from an arbitrary-long a...
Cloud computing is ubiquitous: more and more companies are moving the workloads into the Cloud. However, this rise in popularity challenges Cloud service providers, as they need to monitor the quality of their ever-growing offerings effectively. To address the challenge, we designed and implemented an automated monitoring system for the IBM Cloud P...
Quantum computers are becoming more mainstream. As more programmers are starting to look at writing quantum programs, they need to test and debug their code. In this paper, we discuss various use-cases for quantum computers, either standalone or as part of a System of Systems. Based on these use-cases, we discuss some testing and debugging tactics...
Software log analysis helps to maintain the health of software solutions and ensure compliance and security. Existing software systems consist of heterogeneous components emitting logs in various formats. A typical solution is to unify the logs using manually built parsers, which is laborious. Instead, we explore the possibility of automating the p...
The International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE) aims to present the state of the art in the crossover between Software Engineering and Artificial Intelligence. This workshop explored not only the application of AI techniques to SE problems but also the application of SE techniques to AI prob...
Cloud platforms, under the hood, consist of a complex inter-connected stack of hardware and software components. Each of these components can fail, which may lead to an outage. Our goal is to improve the quality of Cloud services through early detection of such failures by analyzing resource utilization metrics. We tested Gated-Recurrent-Unit-based...
During the normal operation of a Cloud solution, no one pays attention to the logs except the system reliability engineers, who may periodically check them to ensure that the Cloud platform's performance conforms to the Service Level Agreements (SLA). However, the moment a component fails, or a customer complains about a breach of SLA, the importan...
Quantum Computers (QCs), once they mature, will be able to solve some problems faster than Classic Computers. This phenomenon is called “quantum advantage” (which is often used interchangeably with a stronger term “quantum supremacy”). Quantum advantage will help us to speed up computations in many areas, from artificial intelligence to medicine. H...
Quantum computers are becoming more mainstream. As more programmers are starting to look at writing quantum programs, they face an inevitable task of debugging their code. How should the programs for quantum computers be debugged? In this paper, we discuss existing debugging tactics, used in developing programs for classic computers, and show which...
Background: Developers spend a significant amount of time and effort to localize bugs. In the literature, many researchers proposed state-of-the-art bug localization models to help developers localize bugs easily. The practitioners, on the other hand, expect a bug localization tool to meet certain criteria, such as trustworthiness, scalability, and...
Logs contain critical information about the quality of the rendered services on the Cloud and can be used as digital evidence. Hence, we argue that the critical nature of logs calls for immutability and a verification mechanism without the presence of a single trusted party. In this paper, we propose a blockchain-based log system, called Logchain, wh...
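The basic idea behind tamper-evident logs can be illustrated with a hash chain, where each entry stores the SHA-256 digest of its record combined with the previous entry's digest. This is a minimal sketch with hypothetical helper names (`append`, `verify`), not Logchain's design: a single chain held by one party can be recomputed wholesale by that party, which is exactly why verification without a single trusted party is needed.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: Dict) -> str:
    """SHA-256 over the canonical JSON form of the record plus the previous hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: List[Dict], record: Dict) -> None:
    """Append a record, linking it to the hash of the current last entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(chain: List[Dict]) -> bool:
    """Recompute every link; editing or reordering any entry, or deleting
    one from the middle, breaks a link."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log: List[Dict] = []
    append(log, {"ts": 1, "msg": "service started"})
    append(log, {"ts": 2, "msg": "latency SLA breached"})
    print(verify(log))  # True
    log[1]["record"]["msg"] = "all good"  # a party at fault rewrites history
    print(verify(log))  # False
```

Note that silently dropping the newest entries is not detected by the chain alone; that limitation is one more reason to replicate the digests outside the control of any single party.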
Quantum Computers (QCs), once they mature, will be able to solve some problems faster than Classic Computers. This phenomenon is called "quantum advantage" (or a stronger term "quantum supremacy"). Quantum advantage will help us to speed up computations in many areas, from artificial intelligence to medicine. However, QC power can also be leveraged...
Background: Developers spend a significant amount of time and effort to localize bugs. In the literature, many researchers proposed state-of-the-art bug localization models to help developers localize bugs easily. The practitioners, on the other hand, expect a bug localization tool to meet certain criteria, such as trustworthiness, scalability, an...
The stability and performance of Cloud platforms are essential as they directly impact customers' satisfaction. Cloud service providers use Cloud monitoring tools to ensure that rendered services match the quality of service requirements indicated in established contracts such as service-level agreements. Given the enormous number of resources that...
In data mining, the data in various business cases (e.g., sales, marketing, and demography) gets refreshed periodically. During the refresh, the old dataset is replaced by a new one. Confirming the quality of the new dataset can be challenging because changes are inevitable. How do analysts distinguish reasonable real-world changes vs. errors relat...
In large crowd events, there is always a possibility that a stampede accident will occur. The accident may cause injuries or even death. Approaches for controlling crowd flows and predicting dangerous congestion spots would be a boon to on-site authorities managing the crowd and preventing fatal accidents. One of the most popular approac...
Cloud services are becoming increasingly popular: 60% of information technology spending in 2016 was Cloud-based, and the size of the public Cloud service market will reach $236B by 2020. To ensure reliable operation of the Cloud services, one must monitor their health. While a number of research challenges in the area of Cloud monitoring have be...
A significant amount of time is spent by software developers in investigating bug reports. It is useful to indicate when a bug report will be closed, since it would help software teams to prioritise their work. Several studies have been conducted to address this problem in the past decade. Most of these studies have used the frequency of occurrence...
During the normal operation of a Cloud solution, no one usually pays attention to the logs except the technical department, which may periodically check them to ensure that the performance of the platform conforms to the Service Level Agreements. However, the moment the status of a component changes from acceptable to unacceptable, or a customer compla...
Although software does not consume energy by itself, its characteristics determine which hardware resources are made available and how much energy is used. Therefore, energy efficiency of software products has become a popular agenda for both industry and academia in recent years. Designing such software is now a core initiative of software develop...
While several attempts have been made to construct a scalable and flexible architecture for analysis of streaming data, no general model to tackle this task exists. Thus, our goal is to build a scalable and maintainable architecture for performing analytics on streaming data. To reach this goal, we introduce a 7-layered architecture consisting of m...
Context: Enterprise Architecture (EA) is a discipline which has evolved to structure the business and its alignment with the IT systems. Most enterprise architecture frameworks, especially the Zachman framework, focus on describing the enterprise from six viewpoint perspectives of the stakeholders. These six perspectives are based on English lan...
The same defect can be rediscovered by multiple clients, causing unplanned outages and leading to reduced customer satisfaction. In the case of popular open source software, a high volume of defects is reported on a regular basis. A large number of these reports are actually duplicates / rediscoveries of each other. Researchers have analyzed the fact...
To improve software quality, one needs to build test scenarios resembling the usage of a software product in the field. This task is rendered challenging when a product's customer base is large and diverse. In this scenario, existing profiling approaches, such as operational profiling, are difficult to apply. In this work, we consider publicly avai...
Context: Information Technology consumes up to 10% of the world's electricity generation, contributing to CO2 emissions and high energy costs. Data centers, particularly databases, use up to 23% of this energy. Therefore, building an energy-efficient (green) database engine could reduce energy consumption and CO2 emissions. Goal: To understand the...
Building a robust classifier to estimate bug fix effort for IBM DB2
Building a robust classifier to estimate bug fix effort for IBM DB2; Reducing Defect Rediscoveries;
Context: Software performance is a critical non-functional requirement, appearing in many fields such as mission-critical applications, financial, and real-time systems. In this work, we focused on early detection of performance bugs; our software under study was a real-time system used in the advertisement/marketing domain. Goal: Find a simple and...
Smart meter data analysis provides key insights about energy demand and usage patterns for the efficient operation of power generation and distribution companies. The increase in modern communication bandwidth enables smart meters to transmit the data to a corresponding utility company at hourly update rates or faster. Analysing such a large amount of da...
Logical dependencies refer to hidden interconnections among source files that are changed together in order to address an issue or change in the system. In this study we propose six metrics for logical dependency using heuristics on change time. We evaluated our proposed metrics using data from three different software companies. We also built defe...
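Counting how often pairs of files change together is the raw material for logical-dependency metrics. The sketch below groups file changes with a hypothetical time-based heuristic (a change made no more than `window` seconds after the previous one joins the same group) and then counts co-changed pairs; it does not reproduce the six metrics proposed in the study, whose definitions are elided here.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def group_by_time(changes: Iterable[Tuple[int, str]], window: int) -> List[Set[str]]:
    """Hypothetical heuristic: a change made no more than `window` seconds after
    the previous one joins the same group; a longer gap starts a new group."""
    groups: List[Set[str]] = []
    last_ts = None
    for ts, path in sorted(changes):
        if last_ts is None or ts - last_ts > window:
            groups.append(set())
        groups[-1].add(path)
        last_ts = ts
    return groups


def co_change_counts(groups: Iterable[Iterable[str]]) -> Counter:
    """Count how often each unordered pair of files appears in the same group."""
    counts: Counter = Counter()
    for files in groups:
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    changes = [(0, "db.c"), (30, "index.c"), (1000, "db.c"),
               (1010, "index.c"), (1020, "log.c"), (5000, "log.c")]
    groups = group_by_time(changes, window=60)
    print(co_change_counts(groups).most_common(1))  # [(('db.c', 'index.c'), 2)]
```

Grouping one set of files per commit instead of per time window requires no change to `co_change_counts`; single-file groups simply contribute no pairs.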
Context: Enterprise Architecture (EA) is a discipline which has evolved to structure the business and its alignment with the IT systems. One of the popular enterprise architecture frameworks is the Zachman framework (ZF). This framework focuses on describing the enterprise from six viewpoint perspectives of the stakeholders. These six perspectives are...
Requirements elicitation and requirements analysis are important practices of Requirements Engineering. Elicitation techniques, such as interviews and questionnaires, rely on formulating interrogative questions and asking these in a proper order to maximize the accuracy of the information being gathered. Information gathered during requirements eli...
Defect prediction models presented in the literature lack generalization unless the original study can be replicated using new datasets and in different organizational settings. Practitioners can also benefit from replicating studies in their own environment by gaining insights and comparing their findings with those reported. In this work, we repl...
In the rapidly growing field of Big Data, we note that a disproportionately larger amount of effort is being invested in infrastructure development and data analytics in comparison to application software development, at approximately an 80:20 ratio. This prompted us to create a context model of Big Data Software Engineering (BDSE) containing variou...
The bug tracking repositories of software projects capture initial defect (bug) reports and the history of interactions among developers, testers, and customers. Extracting and mining information from these repositories is time consuming and daunting. Researchers have focused mostly on analyzing the frequency of the occurrence of defects and their...
Context: The number of defects fixed in a given month is used as an input for several project management decisions, such as release time, maintenance effort estimation, and software quality assessment. Past activity of developers and testers may help us understand the future number of reported defects. Goal: To find a simple and easy to implement solutio...
Forming teams from large groups of developers and testers poses an important problem for software project management. There are only a few empirical studies on the topic of team evolution and the factors affecting it in software projects. In this paper, we analyzed the evolution of globally distributed testing and coding teams developing large ent...
Defect prediction is a well-established research area in software engineering. Prediction models in the literature do not predict defect-prone modules in different test phases. We investigate the relationships between defects and test phases in order to build defect prediction models for different test phases. We mined the version history of a lar...
To learn more about how software developers can integrate green software practices, the guest editors spoke with Steve Raspudic, manager and deployment and provisioning architect at IBM Toronto's Software Lab.
Most studies and regulatory controls focus on hardware-related measurement, analysis, and control for energy consumption. However, all forms of hardware include significant software components. Although software systems don't consume energy directly, they affect hardware utilization, leading to indirect energy consumption. Therefore, it's important...