Masatoshi Yoshikawa

Masatoshi Yoshikawa
Kyoto University | Kyodai · Graduate School of Informatics

About

366
Publications
43,612
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
4,242
Citations
Introduction
Additional affiliations
April 2006 - present
Kyoto University
Position
  • Professor (Full)

Publications

Publications (366)
Article
We focus on measuring relationships between pairs of objects in Wikipedia whose pages can be regarded as individual objects. Two kinds of relationships between two objects exist: in Wikipedia, an explicit relationship is represented by a single link between the two pages for the objects, and an implicit relationship is represented by a link structu...
Preprint
Full-text available
Time is an important aspect of documents and is used in a range of NLP and IR tasks. In this work, we investigate methods for incorporating temporal information during pre-training to further improve the performance on time-related tasks. Compared with BERT which utilizes synchronic document collections (BooksCorpus and Eng-lish Wikipedia) as the t...
Article
How can we explore the unknown properties of high-dimensional sensitive relational data while preserving privacy? We study how to construct an explorable privacy-preserving materialized view under differential privacy. No existing state-of-the-art methods simultaneously satisfy the following essential properties in data exploration: workload indepe...
Preprint
Full-text available
Time is an important aspect of text documents, which has been widely exploited in natural language processing and has strong influence, for example, in temporal information retrieval, where the temporal information of queries or documents need to be identified for relevance estimation. Event-related tasks like event ordering, which aims to order ev...
Preprint
Full-text available
Recently, it is shown that shuffling can amplify the central differential privacy guarantees of data randomized with local differential privacy. Within this setup, a centralized, trusted shuffler is responsible for shuffling by keeping the identities of data anonymous, which subsequently leads to stronger privacy guarantees for systems. However, in...
Preprint
Full-text available
How can we explore the unknown properties of high-dimensional sensitive relational data while preserving privacy? We study how to construct an explorable privacy-preserving materialized view under differential privacy. No existing state-of-the-art methods simultaneously satisfy the following essential properties in data exploration: workload indepe...
Preprint
Full-text available
In the last few years, open-domain question answering (ODQA) has advanced rapidly due to the development of deep learning techniques and the availability of large-scale QA datasets. However, the current datasets are essentially designed for synchronic document collections (e.g., Wikipedia). Temporal news collections such as long-term news archives...
Preprint
Full-text available
By combining Federated Learning with Differential Privacy, it has become possible to train deep models while taking privacy into account. Using Local Differential Privacy (LDP) does not require trust in the server, but its utility is limited due to strong gradient perturbations. On the other hand, client-level Central Differential Privacy (CDP) pro...
Chapter
Language is our main communication tool. Deep understanding of its evolution is imperative for many related research areas including history, humanities, social sciences, etc. To this end, we are interested in the task of segmenting long-term document archives into naturally coherent periods based on the evolving word semantics. There are many bene...
Preprint
Full-text available
In the last few years, open-domain question answering (ODQA) has advanced rapidly due to the development of deep learning techniques and the availability of large-scale QA datasets. However, the current datasets are essentially designed for synchronic document collections (e.g., Wikipedia). Temporal news collections such as long-term news archives...
Chapter
Local differential privacy (LDP) has been received increasing attention as a formal privacy definition without a trusted server. In a typical LDP protocol, the clients perturb their data locally with a randomized mechanism before sending it to the server for analysis. Many studies in the literature of LDP implicitly assume that the clients honestly...
Preprint
Federated Learning (FL) is emerging as a promising paradigm of privacy-preserving machine learning, which trains an algorithm across multiple clients without exchanging their data samples. Recent works highlighted several privacy and robustness weaknesses in FL and addressed these concerns using local differential privacy (LDP) and some well-studie...
Preprint
Full-text available
Federated learning (FL) is an emerging paradigm for machine learning, in which data owners can collaboratively train a model by sharing gradients instead of their raw data. Two fundamental research problems in FL are incentive mechanism and privacy protection. The former focuses on how to incentivize data owners to participate in FL. The latter stu...
Article
Adapting learning experience according to the rapidly-changing job market is essential for students to achieve fruitful learning and successful career development. As building blocks of potential job opportunities, we focus on “technical terminologies” which are frequently required in the job market. Given a technical terminology, we aim at identif...
Preprint
Full-text available
There is a growing trend regarding perceiving personal data as a commodity. Existing studies have built frameworks and theories about how to determine an arbitrage-free price of a given query according to the privacy loss quantified by differential privacy. However, those previous works have assumed that data buyers can purchase query answers with...
Preprint
Local differential privacy (LDP) has been received increasing attention as a formal privacy definition without a trusted server. In a typical LDP protocol, the clients perturb their data locally with a randomized mechanism before sending it to the server for analysis. Many studies in the literature of LDP implicitly assume that the clients honestly...
Chapter
Full-text available
Many archival collections have been recently digitized and made available to a wide public. The contained documents however tend to have limited attractiveness for ordinary users, since content may appear obsolete and uninteresting. Archival document collections can become more attractive for users if suitable content can be recommended to them. Th...
Preprint
Full-text available
Recently, differential privacy (DP) is getting attention as a privacy definition when publishing statistics of a dataset. However, when answering a decision problem with a DP mechanism, it causes a two-sided error. This characteristic of DP is not desirable when publishing risk information such as concerning COVID-19. This paper proposes relaxing D...
Article
Full-text available
Temporal collections of news articles (or news archives) contain numerous accurate and time-aligned articles, which offer immense value to our society, helping users to know details of events that occurred at specific time points in the past. Currently, the access to such collections is rather difficult for average users due to their large sizes an...
Preprint
Full-text available
How to contain the spread of the COVID-19 virus is a major concern for most countries. As the situation continues to change, various countries are making efforts to reopen their economies by lifting some restrictions and enforcing new measures to prevent the spread. In this work, we review some approaches that have been adopted to contain the COVID...
Preprint
Existing Bluetooth-based Private Contact Tracing (PCT) systems can privately detect whether people have come into direct contact with COVID-19 patients. However, we find that the existing systems lack functionality and flexibility, which may hurt the success of the contact tracing. Specifically, they cannot detect indirect contact (e.g., people may...
Chapter
In recent years, providers offering similar services have increasingly formed alliances to increase their opportunities for matching customers. Moreover, it is no longer uncommon for one provider to participate in multiple alliances. It is important that providers participating in such multiple alliances integrate their data automatically, easily,...
Preprint
Full-text available
The COVID-19 pandemic has prompted technological measures to control the spread of the disease. Private contact tracing (PCT) is one of the promising techniques for the purpose. However, the recently proposed Bluetooth-based PCT has several limitations in terms of functionality and flexibility. The existing systems are only able to detect direct co...
Preprint
Full-text available
In recent years, concerns about location privacy are increasing with the spread of location-based services (LBSs). Many methods to protect location privacy have been proposed in the past decades. Especially, perturbation methods based on Geo-Indistinguishability (Geo-I), which randomly perturb a true location to a pseudolocation, are getting attent...
Preprint
Full-text available
Differentially private federated learning has been intensively studied. The current works are mainly based on the \textit{curator model} or \textit{local model} of differential privacy. However, both of them have pros and cons. The curator model allows greater accuracy but requires a trusted analyzer. In the local model where users randomize local...
Article
Full-text available
To classify and search various kinds of scientific data, it is useful to annotate those data with keywords from a controlled vocabulary. Data providers, such as researchers, annotate their own data with keywords from the provided vocabulary. However, for the selection of suitable keywords, extensive knowledge of both the research domain and the con...
Chapter
Full-text available
Location privacy has been extensively studied in the literature. However, existing location privacy models are either not rigorous or not customizable, which limits the trade-off between privacy and utility in many real-world applications. To address this issue, we propose a new location privacy notion called PGLP, i.e., Policy Graph based Location...
Chapter
We propose a scheduling method for a “detour-type demand bus” as a public transportation service suitable for densely populated areas with a “star-type demand distribution” that has a demand accumulation point. To address regular usage, which is the key to public transportation on-demand buses, we consider construction of a schedule for each vehicl...
Chapter
Full-text available
As massive data are produced from small gadgets, federated learning on mobile devices has become an emerging trend. In the federated setting, Stochastic Gradient Descent (SGD) has been widely used in federated learning for various machine learning models. To prevent privacy leakages from gradients that are calculated on users’ sensitive data, local...
Preprint
Full-text available
How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach for this problem is to build a generative model under differential privacy, which offers a rigorous privacy guarantee...
Preprint
Full-text available
Location privacy has been extensively studied in the literature. However, existing location privacy models are either not rigorous or not customizable, which limits the trade-off between privacy and utility in many real-world applications. To address this issue, we propose a new location privacy notion called PGLP, i.e., \textit{Policy Graph based...
Article
The proliferation of Massive Open Online Courses has made it a challenge for the user to select a proper course. We assume a situation in which the user has targeted on the knowledge defined by some knowledge categories. Then, knowing how much of the knowledge in the category is covered by the courses will be helpful in the course selection. In thi...
Article
Peer assessments, in which people review the works of peers and have their own works reviewed by peers, are useful for assessing homework. In conventional peer assessment systems, works are usually allocated to people before the assessment begins; therefore, if people drop out (abandoning reviews) during an assessment period, an imbalance occurs be...
Preprint
Full-text available
In this demonstration, we present a privacy-preserving epidemic surveillance system. Recently, many countries that suffer from coronavirus crises attempt to access citizen's location data to eliminate the outbreak. However, it raises privacy concerns and may open the doors to more invasive forms of surveillance in the name of public health. It also...
Preprint
Full-text available
With the development of smart devices, such as the Amazon Echo and Apple's HomePod, speech data have become a new dimension of big data. However, privacy and security concerns may hinder the collection and sharing of real-world speech data, which contain the speaker's identifiable information, i.e., voiceprint, which is considered a type of biometr...
Chapter
Full-text available
Long-term news article archives are valuable resources about our past, allowing people to know detailed information of events that occurred at specific time points. To make better use of such heritage collections, this work considers the task of large scale question answering on long-term news article archives. Questions on such archives are often...
Preprint
Full-text available
As massive data are produced from small gadgets, federated learning on mobile devices has become an emerging trend. In the federated setting, Stochastic Gradient Descent (SGD) has been widely used in federated learning for various machine learning models. To prevent privacy leakages from gradients that are calculated on users' sensitive data, local...
Article
Full-text available
Location privacy-preserving mechanisms (LPPMs) have been extensively studied for protecting users' location privacy by releasing a perturbed location to third parties such as location-based service providers. However, when a user's perturbed locations are released continuously, existing LPPMs may not protect the sensitive information about the user...
Conference Paper
To obtain accurate information through web searches, people have to search for information carefully. This study investigates how the search behaviors and decision outcomes of searchers were affected by the documents they encountered during their search process. We focus on two document factors: (1) opinion (consistent and inconsistent) with the se...
Preprint
Big data analysis has become an active area of study with the growth of machine learning techniques. To properly analyze data, it is important to maintain high-quality data. Thus, research on data cleaning is also important. It is difficult to automatically detect and correct inconsistent values for data requiring expert knowledge or data created b...
Conference Paper
Longitudinal analyses has been a long-time interest for clinical physicians. These analyses provide many benefits for them, such as insight on the medication strategy over different period of time. With the development of clinical research, new medication is released to provide better treatment outcomes. In this paper, we assess the medication stra...
Article
Full-text available
We propose a novel way of utilizing and accessing information stored in news archives as well as a new style of investigating the history. Our idea is to automatically generate similar entity pairs given two sets of entities, one from the past and one representing the present. This allows performing entity-oriented mapping between different times....
Article
Full-text available
Location privacy-preserving mechanisms (LPPMs) have been extensively studied for protecting a user's location in location-based services. However, when user's perturbed locations are released continuously, existing LPPMs may not protect users' sensitive spatiotemporal event, such as "visited hospital in the last week" or "regularly commuting betwee...
Preprint
Full-text available
Location privacy-preserving mechanisms (LPPMs) have been extensively studied for protecting users' location privacy by releasing a perturbed location to third parties such as location-based service providers. However, when a user's perturbed locations are released continuously, existing LPPMs may not protect the sensitive information about the user...
Chapter
Peer assessments, in which people review the works of peers and have their own works reviewed by peers, are useful for assessing homework, reviewing academic papers and so on. In conventional peer assessment systems, works are usually allocated to people before the assessment begins; therefore, if people drop out (abandoning reviews) during an asse...
Conference Paper
We propose a method for identifying a set of entity properties from text. Identifying entity properties is similar to a relation extraction task that can be cast as a classification of sentences. Normally, this task can be achieved by distant supervised learning by automatically preparing training sentences for each property; however, it is impract...