Liming Zhu

The Commonwealth Scientific and Industrial Research Organisation | CSIRO · Data61

Doctor of Philosophy

359
Publications
225,401
Reads
10,556
Citations

Publications (359)
Preprint
Full-text available
Privacy regulations mandate that developers must provide authentic and comprehensive privacy notices, e.g., privacy policies or labels, to inform users of their apps' privacy practices. However, due to a lack of knowledge of privacy requirements, developers often struggle to create accurate privacy notices, especially for sophisticated mobile apps...
Preprint
Full-text available
Exploratory testing (ET) harnesses tester's knowledge, creativity, and experience to create varying tests that uncover unexpected bugs from the end-user's perspective. Although ET has proven effective in system-level testing of interactive systems, the need for manual execution has hindered large-scale adoption. In this work, we explore the feasibi...
Preprint
AI Copilots are increasingly integrated into professional work environments with the promise to enhance productivity, efficiency, and effectiveness. This study aims to systematically explore and map existing research on AI Copilots, to assess their impact on productivity, identify benefits and drawbacks, and examine ethical implications. Through a...
Preprint
Full-text available
Australia's National Science Agency conducted a six-month trial of M365 Copilot starting in January 2024 as part of an Australian Government initiative. 300 licenses were distributed across CSIRO using a persona-based approach to ensure diversity of roles and attributes among participants. As a scientific research organisation with a unique operati...
Preprint
The advent of Large Language Models (LLMs) has enabled the development of LLM agents capable of autonomously achieving under-specified goals and continuously evolving through post-deployment improvement, sometimes without requiring code or model updates. Conventional approaches, such as pre-defined test cases and code/model redevelopment pipelines,...
Preprint
Full-text available
Utilising quantum computing technology to enhance artificial intelligence systems is expected to improve training and inference times, increase robustness against noise and adversarial attacks, and reduce the number of parameters without compromising accuracy. However, moving beyond proof-of-concept or simulations to develop practical applications...
Preprint
Full-text available
The emergence of foundation models (FMs) has enabled the development of highly capable and autonomous agents, unlocking new application opportunities across a wide range of domains. Evaluating the architecture of FM-based agents is particularly important as the architectural decisions significantly impact the quality attributes of agents given thei...
Preprint
Full-text available
The ever-improving quality of LLMs has fueled the growth of a diverse range of downstream tasks, leading to an increased demand for AI automation and a burgeoning interest in developing foundation model (FM)-based autonomous agents. As AI agent systems tackle more complex tasks and evolve, they involve a wider range of stakeholders, including agent...
Preprint
The rise of Large Language Models (LLMs) has streamlined frontend interface creation through tools like Vercel's V0, yet surfaced challenges in design quality (e.g., accessibility, and usability). Current solutions, often limited by their focus, generalisability, or data dependency, fall short in addressing these complexities. Moreover, none of the...
Article
Full-text available
The release of ChatGPT has drawn huge interest in foundation models. There is a broad consensus that foundation models will be the fundamental building blocks for future AI systems. However, there is a lack of systematic guidance on the architecture design. Particularly, the rapidly growing capabilities of foundation models can eventually a...
Article
Compared to other programming languages (e.g., Java), Python has more idioms to make Python code concise and efficient. Although Pythonic idioms are well accepted in the Python community, Python programmers are often faced with many challenges in using them, for example, being unaware of certain Pythonic idioms or not knowing how to use them proper...
Article
Full-text available
Foundation models including large language models (LLMs) are increasingly attracting interest worldwide for their distinguished capabilities and potential to perform a wide variety of tasks. Nevertheless, people are concerned about whether foundation model based AI systems are properly governed to ensure trustworthiness of foundation model based AI...
Preprint
As Artificial Intelligence (AI) becomes integral to business operations, integrating Responsible AI (RAI) within Environmental, Social, and Governance (ESG) frameworks is essential for ethical and sustainable AI deployment. This study examines how leading companies align RAI with their ESG goals. Through interviews with 28 industry leaders, we iden...
Preprint
Full-text available
The rapid advancement of AI technology has led to widespread applications of agent systems across various domains. However, the need for detailed architecture design poses significant challenges in designing and operating these systems. This paper introduces a taxonomy focused on the architectures of foundation-model-based agents, addressing critic...
Preprint
Full-text available
The rapid advancement and widespread deployment of foundation model (FM) based systems have revolutionized numerous applications across various domains. However, the fast-growing capabilities and autonomy have also raised significant concerns about responsible AI and AI safety. Recently, there has been increasing attention toward implementing guar...
Preprint
The rapid growth of Artificial Intelligence (AI) has underscored the urgent need for responsible AI practices. Despite increasing interest, a comprehensive AI risk assessment toolkit remains lacking. This study introduces our Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives. By integr...
Preprint
Full-text available
Workloads in data processing clusters are often represented in the form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learnin...
Preprint
Full-text available
Recent studies indicate that large multimodal models (LMMs) are highly robust against natural distribution shifts, often surpassing previous baselines. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Due to the impracticality of fine-tuning LMMs given their vast parameter space, this w...
Preprint
Full-text available
Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing...
Article
The increase of software supply chain threats has underscored the necessity for robust security mechanisms, among which the Software Bill of Materials (SBOM) stands out as a promising solution. SBOMs, by providing a machine-readable inventory of software composition details, play a crucial role in enhancing transparency and traceability within soft...
Article
Full-text available
The right to be forgotten (RTBF) allows individuals to request the removal of personal information from online platforms. Researchers have proposed machine unlearning algorithms as a solution for erasing specific data from trained models to support RTBF. However, these methods modify how data are fed into the model and how training is done, which m...
Article
The rapid advancement of Artificial Intelligence (AI), represented by ChatGPT, has raised concerns about responsible AI development and utilization. Existing frameworks lack a comprehensive synthesis of AI risk assessment questions. To address this, we introduce QB4AIRA, a novel question bank developed by refining questions from five globally recog...
Article
Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. In DG, the prevalent practice of constraining models to a fixed structure or uniform parameterization to encapsulate invariant features can inadvertently blend specific aspects. Such an approach struggles with nuance...
Article
Full-text available
The recent release of ChatGPT has gained huge attention and discussion worldwide, with responsible AI being a key topic of discussion. How can we ensure that AI systems, including ChatGPT, are developed and adopted in a responsible way? To tackle the responsible AI challenges, various ethical principles have been released by governments, organisati...
Article
Full-text available
Existing permissionless sharded-Blockchains come on the scene. However, there is a lack of systematic formulations and experiments regarding the behaviors of individual miners. In this article, we interpret block mining in a permissionless sharded-Blockchain as a repeated $M$-player noncooperative game with finite actions, and propose a new mult...
Conference Paper
Ascertaining counterfactual questions, for instance, “Would individuals with diabetes have exhibited better if they had opted for a different medication?”, is a frequent pursuit in research. Observational studies have become increasingly significant in addressing such queries due to their extensive availability and ease of acquisition relative to R...
Article
Vulnerable third-party libraries pose significant threats to software applications that reuse these libraries. At an industry scale of reuse, manual analysis of third-party library vulnerabilities can be easily overwhelmed by the sheer amount of vulnerabilities continually collected from diverse sources for thousands of reused libraries. Our study...
Article
Responsible AI is widely considered one of the greatest scientific challenges of our time and is key to increasing the adoption of AI. Recently, a number of AI ethics principles frameworks have been published. However, without further guidance on best practices, practitioners are left with nothing much beyond truisms. Also, significant efforts hav...
Preprint
The World Wide Web, a ubiquitous source of information, serves as a primary resource for countless individuals, amassing a vast amount of data from global internet users. However, this online data, when scraped, indexed, and utilized for activities like web crawling, search engine indexing, and, notably, AI model training, often diverges from the o...
Preprint
Foundation models are increasingly attracting interest worldwide for their distinguished capabilities and potential to perform a wide variety of tasks. Nevertheless, people are concerned about whether foundation model based AI systems are properly governed to ensure trustworthiness of foundation model based AI systems and to prevent misuse that cou...
Article
Full-text available
Federated learning (FL) is a machine learning approach that decentralizes data and its processing by allowing clients to train intermediate models on their devices with locally stored data. It aims to preserve privacy as only model updates are shared with a central server rather than raw data. In recent years, many reviews have evaluated FL from th...
Preprint
Full-text available
Quantum computing systems depend on the principles of quantum mechanics to perform multiple challenging tasks more efficiently than their classical counterparts. In classical software engineering, the software life cycle is used to document and structure the processes of design, implementation, and maintenance of software applications. It helps sta...
Preprint
Full-text available
Software Bill of Materials (SBOM) serves as a critical pillar in ensuring software supply chain security by providing a detailed inventory of the components and dependencies integral to software development. However, challenges abound in the sharing of SBOMs, including potential data tampering, hesitation among software vendors to disclose comprehe...
Preprint
Full-text available
Foundation models, such as GPT-4, DALL-E have brought unprecedented AI "operating system" effect and new forms of human-AI interaction, sparking a wave of innovation in AI-native services, where natural language prompts serve as executable "code" directly (prompt as executable code), eliminating the need for programming language as an intermediar...
Preprint
Full-text available
Foundation models, such as GPT-4, DALL-E have brought unprecedented AI "operating system" effect and new forms of human-AI interaction, sparking a wave of innovation in AI-native services, where natural language prompts serve as executable "code" directly (prompt as executable code), eliminating the need for programming language as an intermediary...
Preprint
Distributed trust is a nebulous concept that has evolved from different perspectives in recent years. While one can attribute its current prominence to blockchain and cryptocurrency, the distributed trust concept has been cultivating progress in federated learning, trustworthy and responsible AI in an ecosystem setting, data sharing, privacy issues...
Preprint
The rapid advancement of Artificial Intelligence (AI), exemplified by ChatGPT, has raised concerns about the responsible development and utilization of AI systems. To address these concerns, AI ethics principles have been established, emphasizing the need for risk assessment frameworks to ensure adherence to ethical and societal considerations. How...
Preprint
The recent release of large language model (LLM) based chatbots, such as ChatGPT, has attracted significant attention to foundation models. It is widely believed that foundation models will serve as the fundamental building blocks for future AI systems. As foundation models are in their early stages, the design of foundation model based systems ha...
Article
Although AI has significant potential to transform society, there are serious concerns about its ability to behave and make decisions responsibly. Many ethical regulations, principles, and guidelines for responsible AI have been issued recently. However, these principles are high-level and difficult to put into practice. In the meantime much effort...
Preprint
Software Bill of Materials (SBOM) offers improved transparency and supply chain security by providing a machine-readable inventory of software components used. With the rise in software supply chain attacks, the SBOM has attracted attention from both academia and industry. This paper presents a study on the practice of SBOM, based on the analysis...
Preprint
The release of ChatGPT, Bard, and other large language model (LLM)-based chatbots has drawn huge attention to foundation models worldwide. There is a growing trend that foundation models will serve as the fundamental building blocks for most of the future AI systems. However, incorporating foundation models in AI systems raises significant concern...
Article
The unique characteristics of artificial intelligence (AI) systems pose new challenges to traditional software engineering approaches. Thus, new software engineering approaches are required to develop AI systems in a responsible manner.
Preprint
Full-text available
Various data-sharing platforms have emerged with the growing public demand for open data and legislation mandating certain data to remain open. Most of these platforms remain opaque, leading to many questions about data accuracy, provenance and lineage, privacy implications, consent management, and the lack of fair incentives for data providers. Wi...
Preprint
Due to its convenience, open-source software is widely used. For beneficial reasons, open-source maintainers often fix vulnerabilities silently, leaving users who are unaware of the updates exposed to threats. Previous works all focus on black-box binary detection of the silent dependency alerts that suffer from high false-positive rates. Open-source softwa...
Preprint
Full-text available
The right to be forgotten (RTBF) is motivated by the desire of people not to be perpetually disadvantaged by their past deeds. For this, data deletion needs to be deep and permanent, and should be removed from machine learning models. Researchers have proposed machine unlearning algorithms which aim to erase specific data from trained models more e...
Preprint
The usage of Python idioms is popular among Python developers. In a formative study of 101 performance-related questions about Python idioms on Stack Overflow, we find that developers often get confused about the performance impact of Python idioms and use anecdotal toy code or rely on personal project experience, which is often contradictory in perform...
Preprint
Full-text available
Deep generative models have gained popularity in recent years due to their ability to accurately replicate inherent empirical distributions and yield novel samples. In particular, certain advances are proposed wherein the model engenders data examples following specified attributes. Nevertheless, several challenges still exist and are to be overcom...
Preprint
Full-text available
The rapid development of artificial intelligence (AI) has led to increasing concerns about the capability of AI systems to make decisions and behave responsibly. Responsible AI (RAI) refers to the development and use of AI systems that benefit humans, society, and the environment while minimising the risk of negative consequences. To ensure respons...
Preprint
The rapid growth of software supply chain attacks has attracted considerable attention to software bill of materials (SBOM). SBOMs are a crucial building block to ensure the transparency of software supply chains that helps improve software supply chain security. Although there are significant efforts from academia and industry to facilitate SBOM d...
Preprint
Responsible AI is the practice of developing and using AI systems in a way that benefits humans, society, and the environment, while minimising the risk of negative consequences. Various responsible AI principles have been released recently. However, those principles are very abstract and not practical enough. Further, significant efforts have been...
Preprint
Full-text available
As the deployment of artificial intelligence (AI) is changing many fields and industries, there are concerns about AI systems making decisions and recommendations without adequately considering various ethical aspects, such as accountability, reliability, transparency, explainability, contestability, privacy, and fairness. While many sets of AI eth...
Conference Paper
Full-text available
As the deployment of artificial intelligence (AI) is changing many fields and industries, there are concerns about AI systems making decisions and recommendations without adequately considering various ethical aspects, such as accountability, reliability, transparency, explainability, contestability, privacy, and fairness. While many sets of AI eth...
Article
Blockchain has been increasingly used as a component to enable decentralisation in software architecture for a variety of applications. Blockchain governance has received considerable attention to ensure the safe and appropriate use and evolution of blockchain, especially after the Ethereum DAO attack in 2016. However, there are no systematic effor...
Preprint
Blockchain technology has been integrated into diverse software applications by enabling a decentralised architecture design. However, the defects of on-chain algorithmic mechanisms, and tedious disputes and debates in off-chain communities may affect the operation of blockchain systems. Accordingly, blockchain governance has received great interes...