About
29 Publications
41,068 Reads
1,270 Citations
Introduction
Li Cai currently works at the School of Software, Yunnan University. Li does research in Data Mining, Data Quality, and Intelligent Transportation. Her most cited publication is 'The Challenges of Data Quality and Data Quality Assessment in the Big Data Era'.
Publications (29)
Because information from multi-relationship graphs is difficult to aggregate, graph neural network recommendation models focus on single-relational graphs (e.g., the user-item rating bipartite graph and user-user social relationship graphs). However, existing graph neural network recommendation models have insufficient flexibility. The...
Drug-target affinity (DTA) prediction is an important task in computer-aided drug design and drug repositioning, which can speed up drug development and reduce resource consumption. Researchers have explored some deep learning-based methods to improve DTA prediction in recent years, demonstrating the great potential of deep learning in DTA predicti...
Traffic flows (e.g., the traffic of vehicles, passengers, and bikes) aim to reveal traffic flow phenomena generated by traffic participants in traffic activities. Various studies of traffic flows rely heavily on high-quality traffic data. The taxi GPS trajectory data are location data that include latitude, longitude, and time. These data are criti...
Identifying urban functional regions is a trending topic in urban computing. It helps understand the economic and cultural development of cities and assists decision-makers in land-use planning. However, the studies to date have not fully mined the spatio-temporal characteristics of location data, and most have used the direct clustering method to...
Traditional point-of-interest (POI) data are collected by professional surveying and mapping organizations and are distributed in electronic maps. With the booming Internet and the development of crowdsourcing, POI data in various formats are now issued by Internet companies and non-profit organizations. Due to the multiple sources and...
Clustering algorithms play a very important role in machine learning. With the development of big-data artificial intelligence, distributed parallel algorithms have become an important research field. To reduce the computational complexity and running time of large-scale datasets in the clustering process, this study proposes a distributed clusteri...
In the era of big data, clustering based on multi-source data fusion has become a hot topic in the data mining field. Existing studies mainly focus on fusion models and algorithms for data sets in the same domain, but few studies consider imbalanced data sets from different domains. Furthermore, studies on imbalanced data sets mostly focus on classifica...
Li Cai, Haoyu Wang, Cong Sha, [...], Wei Zhou
Urban hotspots reflect the degree to which residents' travel is concentrated. The study of urban hotspots has important value for urban infrastructure planning, public security and other areas. In existing research, single-source location data and density-based clustering algorithms are used to mine hotspots. Due to the one-sidedness of using the single-s...
Deep neural networks have been shown to be vulnerable to adversarial attacks launched by adversarial examples. These examples’ transferability makes an attack in the real world feasible, which poses a security threat to deep learning. Considering the limited representation capacity of a single deep model, the transferability of an adversarial example gen...
Accurate and timely precipitation prediction is very important to the development and management of regional water resources, flood disaster prevention and control, and people’s daily activities and production plans. However, the prediction accuracy is greatly affected by noise and by the nonlinear and non-stationary features of precipitation data. Many studies...
Taxi trajectory data are an important kind of traffic data. Many traffic applications need to process and analyze trajectory data. Visualising vehicle trajectory data on road maps is an important means of reflecting and demonstrating the trend of traffic variation, where map matching from trajectory data to road network...
Due to the increasing number of cloud applications, the amount of data in the cloud shows signs of growing faster than ever before. The nature of cloud computing requires cloud data processing systems that can handle huge volumes of data and have high performance. However, most cloud storage systems currently adopt a hash-like approach to retrievin...
Urban hotspots refer to regions where flourishing shopping centers are located, the travel volume is very large, and there is high traffic. The formation of hotspots is strongly correlated with many features, i.e., time, space, and the distribution of points of interest (POI); however, most studies have used qualitative analyses to describe the rel...
Taxi trajectory data is a kind of massive traffic data with spatial–temporal dimensions, and plays a key role in traffic management, travel analysis and route recommendation for residents. Analyzing trajectory data with traditional methods is complicated, but visualization techniques can intuitively reflect the change trend of spatial–temporal data...
Taxi trajectory data, which contain temporal and spatial information, are an important kind of traffic data. How to obtain valuable information from these data has become a hot topic in the field of intelligent transportation. Existing trajectory clustering algorithms can only compute similarities using partial characteristic...
VennPainter is a program, built with the Qt C++ framework, for depicting unique and shared sets of gene lists and generating Venn diagrams. The software produces Classic Venn, Edwards' Venn and Nested Venn diagrams and allows for eight sets in graph mode and 31 sets in data processing mode only. In comparison, previous programs produce Classic Venn...
Labeled Classic Venn diagram.
This is an example of a labeled Classic Venn diagram with 5 sets.
(TIF)
Labeled Edwards’ Venn diagram.
This is an example of a labeled Edwards’ Venn diagram with 5 sets.
(TIF)
Nested Venn with eight data sets.
Example from the goldfish x common carp hybrid system with Nested Venn. The right smaller diagram in the green rectangle shows uniquely shared sets only among four datasets (f18, f22-1, f22-2, f22-3), while the larger left diagram includes all eight shared relationships by inlaying the right four into every interse...
Labeled Nested Venn diagram.
This is an example of a labeled Nested Venn diagram with 5 sets.
(TIF)
High-quality data are the precondition for analyzing and using big data and for guaranteeing the value of the data. Currently, comprehensive analysis and research of quality standards and quality assessment methods for big data are lacking. First, this paper summarizes reviews of data quality research. Second, this paper analyzes the data character...
Based on effective learning environments and multiple intelligence theory, and by applying a gaming concept to the learning environment, an educational game design model for Teaching Chinese as a Foreign Language is proposed and the detailed steps of the design are shown. The proposed design model is applied to the design and implementation of an education n...