Heng Li

Polytechnique Montréal · Department of Computer and Software Engineering

PhD

37
Publications
8,243
Reads
421
Citations
Introduction
Heng Li leads the Maintenance, Operations and Observation of Software with intelligencE (MOOSE) lab at Polytechnique Montréal. His research interests lie within Software Engineering and Computer Systems, with special interests in mining software operational data, software log mining, software performance engineering, mining software repositories, and observation and operations of AI software.

Publications (37)
Preprint
The popularity of automated machine learning (AutoML) tools in different domains has increased over the past few years. Machine learning (ML) practitioners use AutoML tools to automate and optimize processes such as feature engineering, model training, and hyperparameter optimization. Recent work performed qualitative studies on practitioner...
Article
Full-text available
Logging is widely used in modern software development to record run-time information for software systems and plays a significant role in software testing. Although the research area of logging has attracted much attention, little attention is paid to the practice of test logging (i.e., the logging involved in test files). To fill this knowledge ga...
Article
Code embeddings have seen increasing applications in software engineering (SE) research and practice recently. Despite the advances in embedding techniques applied in SE research, one of the main challenges is their generalizability. A recent study finds that code embeddings may not be readily leveraged for the downstream tasks that the embeddings...
Preprint
Full-text available
Docker is a containerization service that allows for convenient deployment of websites, databases, applications' APIs, and machine learning (ML) models with a few lines of code. Studies have recently explored the use of Docker for deploying general software projects with no specific focus on how Docker is used to deploy ML-based projects. In this s...
Preprint
Full-text available
With the advance in quantum computing, quantum software has become critical for exploring the full potential of quantum computing systems. Recently, quantum software engineering (QSE) has emerged as an area attracting more and more attention. However, it is not clear what are the challenges and opportunities of quantum computing facing the software...
Article
Full-text available
Word representation plays a key role in natural language processing (NLP). Various representation methods have been developed, among which pre-trained word embeddings (i.e., dense vectors that represent words) have shown to be highly effective in many neural network-based NLP applications, such as named entity recognition (NER) and part-of-speech (...
Preprint
With the advance in quantum computing in recent years, quantum software has become vital for exploring the full potential of quantum computing systems. Quantum programming is different from classical programming; for example, the state of a quantum program is probabilistic in nature, and a quantum computer is error-prone due to the instability of quan...
Preprint
Full-text available
The competitive nature of the app market motivates us to shift our focus to apps that provide similar functionalities and directly compete with each other (i.e., peer apps). In this work, we study the ratings and the review text of 100 Android apps across 10 peer app groups. We highlight the importance of performing peer-app analysis by showing that...
Article
Deep neural network (DNN) models typically have many hyperparameters that can be configured to achieve optimal performance on a particular dataset. Practitioners usually tune the hyperparameters of their DNN models by training a number of trial models with different configurations of the hyperparameters, to find the optimal hyperparameter configura...
Article
Software developers usually rely on in-house performance testing to detect performance regressions and locate their root causes. Such performance testing is typically resource- and time-consuming, making it impractical to conduct when the software is delivered in fast-paced release cycles. On the other hand, the operational data generated in the field...
Article
Logging the stack traces of runtime exceptions assists developers in diagnosing runtime failures. However, unnecessary logging of exception stack traces can have many negative impacts such as polluting log files. Unfortunately, there exist no guidelines for the logging of exception stack traces and developers usually practice it in an ad hoc manner...
Article
Application Programming Interfaces (APIs) allow their users to reuse existing software functionality without implementing it by themselves. However, using external functionality can come at a cost. Because developers are decoupled from the API's inner workings, they face the possibility of misunderstanding, and therefore misusing APIs. Prior resear...
Article
Logging is an integral part of software development. Software practitioners often face issues in software logging, and they post these issues on Q&A websites to take suggestions from the experts. In this study, we perform a three-level empirical analysis of logging questions posted on six popular technical Q&A websites, namely, Stack Overflow (SO),...
Article
Logs contain valuable information about the runtime behaviors of software systems. Thus, practitioners rely on logs for various tasks such as debugging, system comprehension, and anomaly detection. However, due to the unstructured nature and large size of logs, there are several challenges that practitioners face with log analysis. In this paper, w...
Article
Full-text available
Stack Overflow hosts millions of solutions that aim to solve developers' programming issues. In this crowdsourced question answering process, Stack Overflow becomes a code hosting website where developers actively share their code. However, code snippets on Stack Overflow may contain security vulnerabilities, and if shared carelessly, such snippets c...
Article
AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help practitioners handle the massive data produced during the operations of large-scale systems. However, due to the nature of the operation data, AIOps modeling faces several data splitting-related challenges, such as imbalanced data, data leakage, and concept...
Article
Full-text available
Large-scale software systems and cloud services continue to produce a large amount of log data. Such log data is usually preserved for a long time (e.g., for auditing purposes). General compressors, like the LZ77 compressor used in gzip, are usually used in practice to compress log data to reduce the cost of long-term storage. However, such general...
Article
Full-text available
Performance regressions of large-scale software systems often lead to both financial and reputational losses. In order to detect performance regressions, performance tests are typically conducted in an in-house (non-production) environment using test suites with predefined workloads. Then, performance analysis is performed to check whether a softwa...
Article
Software systems usually record important runtime information in their logs. Logs help practitioners understand system runtime behaviors and diagnose field failures. As logs are usually very large in size, automated log analysis is needed to assist practitioners in their software operation and maintenance efforts. Typically, the first step of autom...
Article
Full-text available
Many software services are nowadays hosted on cloud computing platforms, like Amazon EC2, due to many benefits like reduced operational costs. However, node failures in these platforms can impact the availability of their hosted services and potentially lead to large financial losses. Predicting node failures before they actually occur is crucial a...
Article
Full-text available
Software developers insert logging statements in their source code to collect important runtime information of software systems. In practice, logging appropriately is a challenge for developers. Prior studies aimed to improve logging by proactively inserting logging statements in certain code snippets or by learning where to log from existing loggi...
Preprint
Software systems usually record important runtime information in their logs. Logs help practitioners understand system runtime behaviors and diagnose field failures. As logs are usually very large in size, automated log analysis is needed to assist practitioners in their software operation and maintenance efforts. Typically, the first step of autom...
Chapter
Full-text available
Football (or association football) is a highly-collaborative team sport. Passing the ball to the right player is essential for winning a football game. Anticipating the receiver of a pass can help football players build better collaborations and help coaches make informed tactical decisions. In this work, we analyze a public dataset that contains 1...
Conference Paper
Full-text available
Web applications must be load tested to analyze their behavior under various load conditions. Typically, these load tests are automated using protocol-level HTTP requests (e.g., using JMETER). However, there are several disadvantages to using protocol-level requests for load tests. For example, protocol-level requests are only partially representat...
Article
Full-text available
Software developers insert logging statements in their source code to record important runtime information; such logged information is valuable for understanding system usage in production and debugging system failures. However, providing proper logging statements remains a manual and challenging task. Missing an important logging statement may inc...
Conference Paper
Full-text available
Football (or association football) is a highly-collaborative team sport. Passing the ball to the right player is essential for winning a football game. Anticipating the receiver of a pass can help football players build better collaborations and help coaches make informed tactical decisions. In this work, we analyze a public dataset that contains 1...
Conference Paper
Full-text available
In current DevOps practice, developers are responsible for the operation and maintenance of software systems. However, the human costs for the operation and maintenance grow fast along with the increasing functionality and complexity of software systems. Autonomic computing aims to reduce or eliminate such human intervention. However, there are man...
Article
Full-text available
Logging statements are used to record valuable runtime information about applications. Each logging statement is assigned a log level such that users can disable some verbose log messages while allowing the printing of other important ones. However, prior research finds that developers often have difficulties when determining the appropriate level...
Article
Full-text available
Software developers typically insert logging statements in their source code to record runtime information. However, providing proper logging statements remains a challenging task. Prior approaches automatically enhance logging statements, as a post-implementation process. Such automatic approaches do not take into account developers’ domain knowle...

Projects (3)
Project
Modern large-scale software systems (e.g., Amazon Web Services) are growing rapidly in size and complexity. Meanwhile, the operations of large-scale systems are generating more and more monitoring data, such as metrics, events, and alerts. It becomes increasingly challenging for practitioners to collect, manage, analyze, and leverage such big operational data. This project focuses on intelligent approaches that help practitioners overcome the challenges in the operations of large-scale systems, i.e., Intelligent Operations. Intelligent operations is an interdisciplinary research area that requires knowledge from software engineering, information security, cloud computing, and artificial intelligence.
Project
Understanding and improving software logging practices