
Hye-young Paik
- PhD, Computer Science
- Senior Lecturer at UNSW Sydney
About
167 Publications
56,963 Reads
2,715 Citations
Publications (167)
Foundation models including large language models (LLMs) are increasingly attracting interest worldwide for their distinguished capabilities and potential to perform a wide variety of tasks. Nevertheless, people are concerned about whether foundation model based AI systems are properly governed to ensure trustworthiness of foundation model based AI...
Decentralized machine learning, such as Federated Learning (FL), is widely adopted in many application domains. Especially in domains like recommendation systems, sharing gradients instead of private data has recently caught the research community’s attention. Personalized travel route recommendation utilizes users’ location data to recommend optim...
Digital identity is evolving from centralized systems to a decentralized approach known as Self-Sovereign Identity (SSI). SSI empowers individuals to control their digital identities, eliminating reliance on third-party data custodians and reducing the risk of data breaches. However, the concept of trust in SSI remains complex and fragmented. This...
The spread of infectious diseases is a highly complex spatiotemporal process, difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of th...
Human-sensing platforms, encompassing digitized biometrics, activity sensing, and virtual human modeling, play a pivotal role in leveraging human-related data within the digital realm. With the swift progress of AI techniques, the promising opportunities brought about by these platforms are counterbalanced by the ethical challenges they pose. In co...
With the improvement of Internet of Things (IoT) technologies, services, and applications, there is a proliferation of access to smart devices in everyday life. However, granting access and controlling access rights for each resource is challenging in highly dynamic and large-scale IoT deployments. In particular, multiple access information may nee...
Foundation models are increasingly attracting interest worldwide for their distinguished capabilities and potential to perform a wide variety of tasks. Nevertheless, people are concerned about whether foundation model based AI systems are properly governed to ensure trustworthiness of foundation model based AI systems and to prevent misuse that cou...
Securely and efficiently managing digital identities is a prerequisite for enabling a prosperous digital service economy underpinned by trust. The concept of digital identity goes beyond just providing a credential for logins. A truly effective digital identity scheme should be able to model the observations that people, throughout their life, tend t...
Federated learning (FL) is a machine learning approach that decentralizes data and its processing by allowing clients to train intermediate models on their devices with locally stored data. It aims to preserve privacy as only model updates are shared with a central server rather than raw data. In recent years, many reviews have evaluated FL from th...
Self Sovereign Identity (SSI) is an emerging identity system that facilitates secure credential issuance and verification without placing trust in any centralised authority. To bypass central trust, most SSI implementations place blockchain as a trusted mediator by placing credential transactions on-chain. Yet, existing SSI platforms face trust iss...
The use of artificial intelligence (AI) to generate automated early warnings in epidemic surveillance by harnessing vast open-source data with minimal human intervention has the potential to be both revolutionary and highly sustainable. AI can overcome the challenges faced by weak health systems by detecting epidemic signals much earlier than tradi...
Blockchain has been increasingly used as a component to enable decentralisation in software architecture for a variety of applications. Blockchain governance has received considerable attention to ensure the safe and appropriate use and evolution of blockchain, especially after the Ethereum DAO attack in 2016. However, there are no systematic effor...
Blockchain technology has been integrated into diverse software applications by enabling a decentralised architecture design. However, the defects of on-chain algorithmic mechanisms, and tedious disputes and debates in off-chain communities may affect the operation of blockchain systems. Accordingly, blockchain governance has received great interes...
The ability to evaluate uncertainties in evolving data streams has become as crucial as, if not more crucial than, building a static predictor. For instance, during the pandemic, a model should consider possible uncertainties such as governmental policies, meteorological features, and vaccination schedules. Neural process families (NPFs) have recently sh...
The frequency of serious epidemics has continued to increase in the last decade. The ability to predict the risk of outbreaks can improve prevention and control. There are few prediction models available, and of these most are manually constructed by human experts. These manual models are affected by the lack of automation and have limitations in d...
Since its advent in 2009, blockchain has drawn considerable attention from academia and industry. Later, the notion of Distributed Ledger Technology (DLT) extended the concept of blockchain to cover both chain-based and Directed Acyclic Graph (DAG)-based immutable databases maintained by multiple participants. A large number of data...
Blockchain eliminates the need for trusted third-party intermediaries in business by enabling decentralised architecture design in software applications. However, vulnerabilities in on-chain autonomous decision-making and cumbersome off-chain coordination lead to serious concerns about blockchain’s ability to behave in a trustworthy and effici...
Federated learning has received fast-growing interest from academia and industry to tackle the challenges of data hungriness and privacy in machine learning. A federated learning system can be viewed as a large-scale distributed system with different components and stakeholders as numerous client devices participate in federated learning. Designin...
Federated learning is growing fast in both academia and industry to resolve data hungriness and privacy issues in machine learning. A federated learning system being widely distributed with different components and stakeholders requires software system design thinking. For instance, multiple patterns and tactics have been summarised by researchers...
Blockchain technology has been exploited to build next-generation applications with its decentralised nature. However, serious concerns are raised about the trustworthiness of the blockchain due to vulnerabilities in on-chain algorithmic mechanisms and tedious debates in the off-chain community. Accordingly, the governance of bloc...
Little attention has been paid to how to investigate crimes related to the Internet of Things (IoT). We propose a forensic investigation framework that considers various aspects of IoT devices and evaluate it with 32 users, including investigators, law enforcement officers, and incident responders.
The push for digitising personal health records needs to occur with serious consideration of privacy in order to instill public confidence. However, the healthcare sector still experiences leakage of Personally Identifiable Information (PII) due to improper data protection practices and security failures by data custodians. Data minimisation refers...
Federated learning is an emerging privacy-preserving AI technique where clients (i.e., organisations or devices) train models locally and formulate a global model based on the local model updates without transferring local data externally. However, federated learning systems struggle to achieve trustworthiness and embody responsible AI principles....
The ability to evaluate uncertainties in evolving data streams has become as crucial as, if not more crucial than, building a static predictor. For instance, during the pandemic, a forecast model should always estimate its uncertainty around dynamic factors such as governmental policies, meteorological features and vaccination schedules. Targeting this,...
Blockchain eliminates the need for trusted third party intermediaries in business by enabling decentralised architecture in software applications. However, vulnerabilities in on-chain autonomous decision-making and cumbersome off-chain coordination have led to serious concerns about blockchain's ability to behave and make decisions in a trustworthy...
Personal data stores (PDSs) are decentralized, user-centric, data storage/processing environments for implementing privacy-aware smart home data storage. Our privacy preference recommender system works with PDSs to assist users in making data-sharing decisions to avoid unintended privacy mishaps.
The ability to deal with uncertainty in machine learning models has become as crucial as, if not more crucial than, their predictive ability itself. For instance, during the pandemic, governmental policies and personal decisions are constantly made around uncertainties. Targeting this, Neural Process Families (NPFs) have recently shone a light on predictio...
Federated learning is an emerging machine learning paradigm that enables multiple devices to train models locally and formulate a global model, without sharing the clients’ local data. A federated learning system can be viewed as a large-scale distributed system, involving different components and stakeholders with diverse requirements and constrai...
Federated learning is an emerging machine learning paradigm that enables multiple devices to train models locally and formulate a global model, without sharing the clients' local data. A federated learning system can be viewed as a large-scale distributed system, involving different components and stakeholders with diverse requirements and constrai...
Federated learning is an emerging machine learning paradigm where clients train models locally and formulate a global model based on the local model updates. To identify the state-of-the-art in federated learning and explore how to develop federated learning systems, we perform a systematic literature review from a software engineering perspective,...
Blockchain has been increasingly used as a software component to enable decentralisation in software architecture for a variety of applications. Blockchain governance has received considerable attention to ensure the safe and appropriate use and evolution of blockchain, especially after the Ethereum DAO attack in 2016. To understand the state-of-th...
Edge computing, as part of a distributed computing architecture, has become an increasingly popular paradigm. It expands the capacity of the cloud by allowing data from end devices to be stored and processed at the edge of the network, closer to the data, instead of being delivered to the cloud. Data integrity is a big concern in edge computing. As...
The data on the Web is increasingly being centralized towards a few service providers. Personal Data Stores (PDS) have emerged, proposing a fundamental shift from the current service-centric data ecosystem to a decentralized data storage and processing environment by placing the data with users. Users are to assume total self-sovereignty over their...
Crimes sabotage various societal aspects, such as social stability, public safety, economic development, and individuals’ quality of life. Accurately predicting crime occurrences can not only bring peace of mind to individuals but also help authorities distribute and manage police resources effectively. We aim to take into account plenty of...
This book constitutes revised and selected papers from the scientific satellite events held in conjunction with the 18th International Conference on Service-Oriented Computing, ICSOC 2020. The conference was held virtually during December 14-17, 2020. A total of 125 submissions were received for the satellite events. The volume includes 9 papers fr...
Self-sovereign identity is a new identity management paradigm that allows entities to really have the ownership of their identity data and control their use without involving any intermediary. Blockchain is an enabling technology for building self-sovereign identity systems by providing a neutral and trustable storage and computing infrastructure,...
Self-sovereign identity is a new identity management paradigm that allows entities to really have the ownership of their identity data and control their use without involving any intermediary. Blockchain is an enabling technology for building self-sovereign identity systems by providing a neutral and trustable storage and computing infrastructure a...
Self-sovereign identity (SSI) is considered to be a “killer application” of blockchain. However, there is a lack of systematic architecture designs for blockchain-based SSI systems to support methodical development. An aspect of such gap is demonstrated in current solutions, which are considered coarse grained and may increase data security risks....
Self-sovereign identity (SSI) is considered to be a "killer application" of blockchain. However, there is a lack of systematic architecture designs for blockchain-based SSI systems to support methodical development. An aspect of such gap is demonstrated in current solutions, which are considered coarse grained and may increase data security risks....
In a blockchain-based system, data and the consensus-based process of recording and updating them over distributed nodes are central to enabling the trustless multi-party transactions. Thus, properly understanding what and how the data are stored and manipulated ultimately determines the degree of utility, performance, and cost of a blockchain-base...
The business world is becoming increasingly dynamic. Information processing using knowledge-, service-, and cloud-based systems makes complex, dynamic and often knowledge-intensive activities inevitable. Knowledge-intensive processes contain a set of coordinated tasks and activities, controlled by knowledge workers to achieve a busine...
Despite its popularity, the decision making process of a Deep Neural Network (DNN) model is opaque to users, making it difficult to understand the behaviour of the model. We present the design of a Web-based DNN interpretability framework which is based on the core notions in case-based reasoning approaches where exemplars (e.g., data points consid...
Deciding on the optimal architecture of a software system is difficult, as the number of design alternatives and component interactions can be overwhelmingly large. Adding security considerations can make architecture evaluation even more challenging. Existing model-based approaches for architecture optimisation usually focus on performance and cos...
Tables in documents are a widely-available and rich source of information, but not yet well-utilised computationally because of the difficulty in automatically extracting their structure and data content. There has been a plethora of systems proposed to solve the problem, but current methods present low usability and accuracy and lack precision in...
A large amount of public data produced by enterprises is in semi-structured PDF form. Tabular data extraction from reports and other published data in PDF format is of interest for various data consolidation purposes such as analysing and aggregating financial reports of a company. Queries into the structured tabular data in PDF format are normally...
A fundamental assumption of improvement in Business Process Management (BPM) is that redesigns deliver refined and improved versions of business processes. These improvements can be validated online through sequential experiment techniques like AB Testing, as we have shown in earlier work. Such approaches have the inherent risk of exposing customer...
A fundamental assumption of Business Process Management (BPM) is that redesign delivers refined and improved versions of business processes. This assumption, however, does not necessarily hold, and any required compensatory action may be delayed until a new round in the BPM life-cycle completes. Current approaches to process redesign face this prob...
Business process improvement ideas can be validated through sequential experiment techniques like AB Testing. Such approaches have the inherent risk of exposing customers to an inferior process version, which is why the inferior version should be discarded as quickly as possible. In this paper, we propose a contextual multi-armed bandit algorithm t...
Tables in documents are a rich and under-exploited source of structured data in otherwise unstructured documents. The extraction and understanding of tabular data is a challenging task which has attracted the attention of researchers from a range of disciplines such as information retrieval, machine learning and natural language processing. In this...
The presence of the Internet of Things (IoT) in healthcare through the use of mobile medical applications and wearable devices allows patients to capture their healthcare data and enables healthcare professionals to be up-to-date with a patient’s status. Ambient Assisted Living (AAL), which is considered as one of the major applications of IoT, is...
Privacy-preserving data analytics is an emerging technology which allows multiple parties to perform joint data analytics without disclosing source data to each other or a trusted third-party. A variety of platforms and protocols have been proposed in this domain. However, these systems are not yet widely used, and little is known about them from a...
A fundamental assumption of Business Process Management (BPM) is that redesign delivers new and improved versions of business processes. This assumption, however, does not necessarily hold, and required compensatory action may be delayed until a new round in the BPM life-cycle completes. Current approaches to process redesign face this problem in o...
In this chapter, we explore the concept of data services. After clarifying the main concepts, we introduce key enabling technologies for building data services, namely XSLT and XQuery. These two XML-based languages are used to transform and query potentially heterogeneous data into well-understood standard XML documents. The lab exercises included...
In this chapter, we provide concluding remarks offering readers our perspective for continued exploration in the field of Service Oriented Computing.
In this chapter, we begin by understanding Service Oriented Architecture (SOA), its key values and goals to modern and evolving business ecosystems. We then describe the SOA architectural stack in reference to software application integration layers. This is followed by an introduction to service composition and data-flow techniques, including end...
In this chapter, we present BPEL and BPMN as two main languages of Web service composition. Both BPEL and BPMN allow the codification of control flow logic of a composite service. We will introduce the core syntax elements of the two languages and their usage examples. The lab activities will show how to build a simple BPEL service by composing oth...
In this chapter, we introduce REST, an alternative Web service implementation technique. Unlike SOAP and WSDL with clearly defined standards, REST contains a set of generic Web service design principles and guidelines that can be interpreted and implemented differently. In this chapter, we present the fundamentals of the said principles, explaining...
In this chapter, we introduce a framework known as Service Component Architecture (SCA) that provides a technology-agnostic capability for composing applications from distributed services. Building a successful SOA solution in practice can be complex, due to the lack of standards and specifications, especially when integrating many different techno...
In this chapter, we examine the data-flow aspects of Web service composition, which specifies how data is exchanged between services. The data-flow description encapsulates the data movement from one service to another and the transformations applied on this data. We introduce two different paradigms based on the message passing style, namely, blac...
In this chapter, we introduce the motivation behind Web service composition technologies – going from an atomic to a composite service. In doing so, we discuss the two main paradigms of multiple service interactions: Web service orchestration and Web service choreography. In the rest of the book, we will focus on Web service orchestration as the ma...
In this chapter, SOAP and WSDL are explained as important standards that lay the foundation for standardized descriptions of messages and operations of a Web service. We first describe the core elements of SOAP and WSDL standards with examples, then present how the two standards fit together to form the common message communication styles, name...
This book embarks on a mission to dissect, unravel and demystify the concepts of Web services, including their implementation and composition techniques. It provides a comprehensive perspective on the fundamentals of implementation standards and strategies for Web services (in the first half of the book), while also presenting composition technique...
We propose a PDF document wrapper system that is specifically targeted at table processing applications. We (i) review the PDF specifications and identify particular challenges from the table processing point of view, (ii) specify a table-oriented document model containing the required atomic elements for table extraction and understanding applicat...
People share various everyday processes online in natural language form (e.g., cooking recipes, “how-to guides” in eHow). We refer to them as personal process descriptions. Previously, we proposed Personal Process Description Graph (PPDG) to concretely represent the personal process descriptions as graphs, along with query processing techniq...
Although there is an abundance of how-to guides online, systematically utilising the collective knowledge represented in such guides has been limited. This is primarily due to how-to guides (effectively, informal process descriptions) being expressed in natural language, which complicates the process of extracting actions and data. This paper descr...
Tables in documents are a rich source of information, but not yet well-utilised computationally because of the difficulty of extracting their structure and data automatically. In this paper, we progress the state-of-the-art in automatic table extraction by identifying common patterns in table headers to develop rules and heuristics for determining...