Yu Yan's research while affiliated with Harbin Institute of Technology and other places

Publications (14)

Preprint
The knobs of modern database management systems have a significant impact on system performance. With the development of cloud databases, an estimation service for knobs is urgently needed to improve database performance. Unfortunately, little attention has been paid to estimating the performance of specific knob configurations. To fil...
Preprint
Cardinality estimation methods based on probability distribution estimation have achieved high-precision estimation results compared to traditional methods. However, the most advanced methods suffer from high estimation costs due to the sampling method they use when dealing with range queries. Also, such a sampling method makes them difficult to di...
Chapter
With the development of the Internet of Things, the volume of time series data generated by industrial monitors, analyzers, and detection instruments has surged. Managing very large-scale time series data poses great challenges. However, current distributed time series databases are still poor in terms of data storage efficiency and data writ...
Preprint
In recent years, the database community has attempted to develop learned cardinality estimators to improve estimation accuracy. Although some research shows that applying deep learning to cardinality estimation is a significant and promising direction, many problems remain in applying these techniques to real applications (l...
Article
The knowledge graph (KG) has attracted much attention due to its positive effect on AI-related applications. However, even widely used large-scale KGs are still far from complete and comprehensive. This makes reasoning over KGs one of the most basic and attention-grabbing tasks. However, most existing reasoning methods only...
Preprint
Graphs are becoming an increasingly popular data model in many real applications. The efficiency of graph storage is crucial for these applications. Generally speaking, tuning graph storage relies on database administrators (DBAs) to find the best storage configuration. However, DBAs make tuning decisions by mainly relying on their experience...
Preprint
In recent years, the database community has attempted to develop automatic database management systems. Although some research shows that applying AI to data management is a significant and promising direction, many problems remain in applying these techniques to real applications (long training time, various environments and unst...
Article
With the development of big data technology, data management for complex applications has become increasingly resource intensive. In this paper, we propose an automatic approach (DRLISA) to NoSQL database index selection. For each workload, we automatically select the corresponding indexes and parameters which can totally improve...
Preprint
Big data management aims to establish data hubs that comprehensively support data in multiple models and types. Thus, the multi-model database system is a promising architecture for building such a multi-model data store. For an integrated data hub, a unified and flexible query language is essential. In this paper, an extensible and...
Chapter
Document storage management plays a significant role in the database field. With the advent of big data, performing storage management manually has become increasingly difficult and inefficient. Many researchers have developed algorithms for automatic storage management (ASM). However, at present, no automatic systems or algorithms related to doc...
Chapter
With the advent of big data, the cost of index recommendation (IR) increases exponentially, and the portability of IR models has become an urgent problem. In this paper, a fine-grained classification model based on a multi-core convolutional neural network (CNNIR) is proposed to implement a transferable IR model. Using the strong knowledge r...
Preprint
We propose a new approach to NoSQL database index selection. For different workloads, we select different indexes and index parameters to optimize database performance. The approach builds a deep reinforcement learning model that selects an optimal index for a given fixed workload and adapts to a changing workload. Experimental results s...

Citations

... In general, embedding models use simple formulas to model and predict the veracity of triples, focusing solely on the structural encoding of fact triples in the KG while neglecting the internal connections within entities and relations [23]. Consequently, they can only capture linear associations between entities and relations. ...
... ExplainER [26] Model-specific Emboot [144] ProtoRE [24] MGNN [17], CRIAGE [72], ITransF [126], DensE [56], HopfE [9], METransE [118] Global self-explaining xER [108] FTL-LM [53], Neural LP [129] [61], Abstraction [107] Human-in-the-loop [43] SystemER [75], TuneR [69] Table 1. Overview of explainable knowledge graph construction methods. ...
... Machine learning (ML) is currently applied in many fields, such as optimizing multi-objective data publishing with a distributed cooperative coevolutionary genetic algorithm [8], handling stragglers in edge federated learning (EFL) [9], and querying uncertain data [10]. In recent years, machine learning has been continuously integrated into traditional relational databases and NoSQL databases to implement components for automation and self-optimization [11][12][13]. The limitations of traditional tuning methods can be effectively addressed by using ML techniques, which significantly promote the development of AI4DB [11,12]. ...