Cache - Science topic
Explore the latest publications in Cache, and find Cache experts.
Publications related to Cache (10,000)
Sorted by most recent
This is a critical stage of mitosis, in which the duplicated chromosomes are partitioned into different parts of the cell so that they can serve as complete genomes for the next cell cycle. It comprises two processes: anaphase A, the movement of the chromosomes toward the spindle pole they face via...
The sympathetic nervous system regulates various visceral functions, including those of the heart, lungs, and digestive system, and maintains homeostasis. The prevertebral ganglia (PVG) in the peripheral nervous system serve as a vital relay station, transmitting efferent signals to visceral organs. The PVG receives innervation from intestinofugal...
As processor performance advances, the cache has become an essential component of computer architecture. Moreover, the rapid digital transformation of daily life has resulted in electronic devices storing greater amounts of sensitive information. Thus, device users are becoming more concerned about the security of their personal information, so imp...
The LLL (Lenstra–Lenstra–Lovász) algorithm is an important method for lattice basis reduction and has broad applications in computer algebra, cryptography, number theory, and combinatorial optimization. However, current LLL algorithms face challenges such as inadequate adaptation to domestic supercomputers and low efficiency. To enhance the efficie...
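For reference, a minimal textbook LLL sketch in Python (exact rational arithmetic, Gram-Schmidt recomputed each pass; delta = 3/4 is the classic Lovász constant, and the function names are ours, not the paper's):
```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(B):
    # Orthogonalize B, returning B* and the mu coefficients.
    n = len(B)
    Bs, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = list(B[i])
        for j in range(i):
            mu[i][j] = dot(B[i], Bs[j]) / dot(Bs[j], Bs[j])
            v = [vi - mu[i][j] * bj for vi, bj in zip(v, Bs[j])]
        Bs.append(v)
    return Bs, mu

def lll(basis, delta=Fraction(3, 4)):
    B = [[Fraction(x) for x in b] for b in basis]
    n, k = len(B), 1
    while k < n:
        Bs, mu = gram_schmidt(B)
        # Size-reduce b_k against the earlier basis vectors.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                B[k] = [bk - q * bj for bk, bj in zip(B[k], B[j])]
                Bs, mu = gram_schmidt(B)
        # Lovász condition: advance if satisfied, otherwise swap and back up.
        if dot(Bs[k], Bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(Bs[k - 1], Bs[k - 1]):
            k += 1
        else:
            B[k], B[k - 1] = B[k - 1], B[k]
            k = max(k - 1, 1)
    return B

print([[int(x) for x in b] for b in lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])])  # toy lattice
```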
In teaching in general, and in teaching mathematics in particular, fostering students' learning motivation is a critical task that significantly affects the quality of their learning. This paper studied the concept of learning motivation, the types of learning motivation, and some measures for fostering students' learning motivation proposed by other a...
The Covid-19 pandemic brought unprecedented changes to business ownership in the UK, affecting a generation of entrepreneurs and their employees. Nonetheless, the impact remains poorly understood, because research on capital accumulation has typically lacked high-quality, individualized, population-level data. We overcome these barriers...
With the widespread adoption of family cars, traffic management has become increasingly complex, and a large number of real-time processing tasks have emerged in intelligent traffic management systems. To solve the task-offloading decision problem in the multi-user, multi-server scenarios of intelligent transportation systems, the co...
Supervised fine-tuning is a standard method for adapting pre-trained large language models (LLMs) to downstream tasks. Quantization has been recently studied as a post-training technique for efficient LLM deployment. To obtain quantized fine-tuned LLMs, conventional pipelines would first fine-tune the pre-trained models, followed by post-training q...
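As a point of reference, here is a minimal sketch of the post-training quantization step such pipelines apply after fine-tuning (symmetric per-tensor int8; this illustrates the generic technique, not the paper's specific method):
```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)  # stand-in for a fine-tuned weight
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())     # worst-case quantization error
```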
Despite the high computational throughput of GPUs, limited memory capacity and bandwidth-limited CPU-GPU communication via PCIe links remain significant bottlenecks for accelerating large-scale data analytics workloads. This paper introduces Vortex, a GPU-accelerated framework designed for data analytics workloads that exceed GPU memory capacity. A...
At the primary level, teaching polysemous (multi-meaning) words and applying exercises on them is a very important part of the Vietnamese-language curriculum, especially for fifth graders. In the course of our investigation and research, we found that teaching this vocabulary requires a fresh approach, suited to the thinking of primary school students, in order to...
In response to the need for deploying the YOLOv4-Tiny model on resource-constrained Field-Programmable Gate Array (FPGA) platforms for rapid inference, this study proposes a general optimization acceleration strategy and method aimed at achieving fast inference for object detection networks. This approach centers on the synergistic effect of severa...
After more than 20 years of writing with a relentless drive for artistic innovation, Nguyễn Ngọc Tư has secured an important place in the current of contemporary Vietnamese literature. Turning to the novel, Nguyễn Ngọc Tư displays many fresh colors, from artistic conception to writing technique. Applying postmodern theory to the study of the novels of...
Revolutionary ideals are the totality of the revolution's noble values and aims, orienting people to act and to overcome difficulties and challenges in order to realize them. Revolutionary ideals play an important role for many groups, including students in general and students of Trường Đại học Đồng Tháp (Dong Thap University) in particular. The authors therefore conducted research to analy...
In modern large language models (LLMs), handling very long context lengths presents significant challenges as it causes slower inference speeds and increased memory costs. Additionally, most existing pre-trained LLMs fail to generalize beyond their original training sequence lengths. To enable efficient and practical long-context utilization, we in...
The "think aloud" modeling technique has been discussed in many studies worldwide on teaching reading, writing, speaking, and listening. Vietnam's general-education Literature curriculum likewise states clearly that "the purpose of teaching writing is to train thinking and ways of writing". This orientation marks a departure from earlier approaches to teaching writing,...
Recent findings by Chettih et al. (Cell 187: 1922–1935, 2024) from electrophysiological recordings in the hippocampus of black-capped chickadees shed light on the debate about how food-hoarding Parids may remember their cache sites. When birds retrieve caches, a “bar code” is reactivated, which is very similar to the code generated when the same ca...
The attention mechanism is essential for the impressive capabilities of transformer-based Large Language Models (LLMs). However, calculating attention is computationally intensive due to its quadratic dependency on the sequence length. We introduce a novel approach called Top-Theta Attention, or simply Top-$\theta$, which selectively prunes less es...
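A minimal sketch of threshold-based attention pruning in the spirit the abstract describes (scores below a threshold theta are dropped before the softmax; the guard that always keeps the row maximum, and all names here, are our illustrative assumptions, not the paper's calibration procedure):
```python
import numpy as np

def top_theta_attention(Q, K, V, theta):
    # Scaled dot-product scores; entries below the threshold are pruned,
    # but each row's maximum is always kept so every query attends to something.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    keep = scores >= np.minimum(theta, scores.max(axis=-1, keepdims=True))
    scores = np.where(keep, scores, -np.inf)     # pruned entries get zero weight
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V
```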
Although in literary matters nothing is ever completely hermetic, nor entirely transparent or legible, the fact remains that a text's hermetic condition troubles perception and incites, in exemplary fashion, the defeat of any hermeneutic grasp and any mechanization of reading, for, in matters of interpretive adversity, it is a matter of...
This paper presents Boundary-Aware Concurrent Queue (BACQ), a high-performance queue designed for modern GPUs, which focuses on high concurrency in massively parallel environments. BACQ operates at the warp level, leveraging intra-warp locality to improve throughput. A key to BACQ’s design is its ability to replace conflicting accesses to shared da...
This paper examines Vietnamese EFL students' social media use, language learning, and self-regulated learning. An immersion study with 253 A2-level non-English-majored students used a mixed-methods approach to examine students' opinions of social media as a language learning tool and its support for self-regu...
Modern large language models (LLMs) often encounter communication bottlenecks on current hardware, rather than purely computational constraints. Multi-head Latent Attention (MLA) tackles this challenge by using low-rank matrices in the key-value (KV) layers, thereby allowing compressed latent KV states to be cached. This approach significantly redu...
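A rough sketch of the latent-KV idea (cache a small per-token latent instead of full keys and values, and reconstruct them on demand; the dimensions and projection names below are our assumptions, not MLA's exact parameterization):
```python
import numpy as np

d_model, d_latent, d_head = 512, 64, 64             # assumed sizes
W_down = np.random.randn(d_model, d_latent) * 0.02  # compress hidden state to latent
W_uk = np.random.randn(d_latent, d_head) * 0.02     # latent -> key
W_uv = np.random.randn(d_latent, d_head) * 0.02     # latent -> value

latent_cache = []  # per token: d_latent floats instead of 2 * d_head

def append_token(h):
    latent_cache.append(h @ W_down)  # cache only the compressed latent

def materialize_kv():
    C = np.stack(latent_cache)       # (seq, d_latent)
    return C @ W_uk, C @ W_uv        # reconstruct K and V when attending

append_token(np.random.randn(d_model))
K, V = materialize_kv()
```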
Efficient image loading plays a crucial role in creating seamless Android application experiences. Modern image-loader libraries (Glide, Coil, and Fresco) offer advanced image-handling capabilities, including caching, memory management, and network efficiency. This paper presents a comparative performance analysis of these three popular libraries, fo...
This article analyzes and compares the communication strategies of Coca-Cola Vietnam and Pepsi Vietnam from a brand-management perspective. The study focuses on three main aspects:
Logo and brand development: contrasting Coca-Cola's retention of its logo since 1887 with Pepsi's 11 logo changes over 110 years.
Brand value based...
The human gut is rich in metabolites and harbors a complex microbial community, yet the sensory repertoire of its commensal bacteria remains largely uncharacterized. Here we systematically mapped ligand specificities of extracytoplasmic sensory domains from twenty members of the human gut microbiota, with a primary focus on the abundant and physiol...
Prompt caching in large language models (LLMs) results in data-dependent timing variations: cached prompts are processed faster than non-cached prompts. These timing differences introduce the risk of side-channel timing attacks. For example, if the cache is shared across users, an attacker could identify cached prompts from fast API response times...
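The attack primitive reduces to a timing probe like the following sketch (the client object and its method are hypothetical stand-ins; on a shared cache, a previously submitted prompt would return measurably faster):
```python
import time

def probe(client, prompt):
    # Time one request; a cache hit on the prompt (or its prefix) returns faster.
    t0 = time.perf_counter()
    client.complete(prompt, max_tokens=1)  # hypothetical API client call
    return time.perf_counter() - t0

# cold = probe(client, guess); warm = probe(client, guess)
# warm << cold suggests the guessed prompt was already cached by another user.
```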
Large language models (LLMs) have achieved impressive success, but their high memory requirements present challenges for long-context token generation. The memory complexity of long-context LLMs is primarily due to the need to store Key-Value (KV) embeddings in their KV cache. We present BalanceKV, a KV cache compression method based on geometric s...
Large Language Model (LLM) inference uses an autoregressive manner to generate one token at a time, which exhibits notably lower operational intensity compared to earlier Machine Learning (ML) models such as encoder-only transformers and Convolutional Neural Networks. At the same time, LLMs possess large parameter sizes and use key-value caches to...
Large Language Model (LLM) inference, where a trained model generates text one word at a time in response to user prompts, is a computationally intensive process requiring efficient scheduling to optimize latency and resource utilization. A key challenge in LLM inference is the management of the Key-Value (KV) cache, which reduces redundant computa...
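For readers unfamiliar with the mechanism, a minimal sketch of why a KV cache removes redundant work in autoregressive decoding (shapes and the class are illustrative, assuming a single head and layer):
```python
import numpy as np

class KVCache:
    # Append-only cache: each decode step adds one key/value row instead of
    # recomputing K and V for the entire prefix from scratch.
    def __init__(self):
        self.K, self.V = [], []

    def append(self, k, v):
        self.K.append(k)
        self.V.append(v)

    def attend(self, q):
        # Attention over all cached positions for the newest query vector.
        K, V = np.stack(self.K), np.stack(self.V)
        w = np.exp(q @ K.T / np.sqrt(q.shape[-1]))
        w /= w.sum()
        return w @ V
```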
Edge computing moves application services from the central cloud to the network edge, significantly reducing service latency. Edge service caching presents a more complex challenge than cloud caching, due to the dynamics and diversity of mobile user requests. Consequently, traditional caching strategies are not directly applicable to edge environme...
Continual Graph Learning (CGL), which aims to accommodate new tasks over evolving graph data without forgetting prior knowledge, is garnering significant research interest. Mainstream solutions adopt the memory-replay idea, i.e., caching representative data from earlier tasks for retraining the graph model. However, this strategy struggles with...
Network File System (NFS) represents a critical technology in distributed computing, evolving significantly since its inception by Sun Microsystems. This comprehensive article explores the architectural framework, performance characteristics, and implementation strategies of NFS across various protocol versions and enterprise environments. The arti...
Abstract: Foreign direct investment is currently an extremely important issue for society and the economy. FDI brings enormous economic value to society, contributing positively to the country's economic growth rate. FDI enterprises help to change...
Hypertext Transfer Protocol (HTTP) injection is a security vulnerability in which attackers manipulate HTTP headers with malicious intent, facilitating various types of attacks such as downgrade attacks, session fixation, session hijacking, cross-site scripting (XSS), script injection, Referer forgery, Host header injection, and cache poisoning. These...
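A common first-line mitigation is strict header validation before any header value is echoed or used for routing; a minimal sketch (the allow-list is hypothetical):
```python
ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical allow-list

def is_safe_host(host_header: str) -> bool:
    # Reject CRLF sequences (response splitting / header injection) and any
    # Host value outside the allow-list (host-header injection, cache poisoning).
    if "\r" in host_header or "\n" in host_header:
        return False
    return host_header.split(":")[0].lower() in ALLOWED_HOSTS
```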
High availability is a critical requirement for distributed scoring systems that deliver real-time predictions in applications such as fraud detection, recommendation engines, and predictive maintenance. This paper explores the architecture, best practices, and strategies to ensure resilience, scalability, and fault tolerance in such systems. It di...
Due to the rapid development of panorama cameras, the task of estimating panorama depth has attracted significant attention from the computer vision community, especially in applications such as robot sensing and autonomous driving. However, existing methods relying on different projection formats often encounter challenges, either struggling with...
Historically, LLMs have been trained using either autoregressive (AR) or masked language modeling (MLM) objectives, with AR models gaining dominance in recent years. However, AR models are inherently incapable of masked infilling, which is the ability to predict masked tokens between past and future context. In contrast, MLM models suffer from intr...
The average willingness to pay for MobiFone's 3G service among customers in Cần Thơ city was estimated using the Turnbull (1989) method. The results show that customers' decisions to choose MobiFone's 3G service are influenced by three factors, ranked from strongest to weakest effect: customer income, the diversity of 3G applications...
In the Markov paging model, one assumes that page requests are drawn from a Markov chain over the pages in memory, and the goal is to maintain a fast cache that suffers few page faults in expectation. While computing the optimal online algorithm $(\mathrm{OPT})$ for this problem naively takes time exponential in the size of the cache, the best-know...
In the face of rapidly evolving communication technologies and increasing user demands, traditional terrestrial networks are challenged by the need for high-quality, high-speed, and reliable communication. This paper explores the integration of heterogeneous satellite networks (HSN) with emerging technologies such as Mobile Edge Computing (MEC), in...
This paper investigates two key performance aspects of the interplay between public DNS resolution services and content delivery networks: the latency of DNS queries for resolving CDN-accelerated hostnames, and the latency between the end user and the CDN's edge server obtained by the user through a given resolution service. While these important...
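The first latency component reduces to timing resolutions; a crude sketch against the system's configured resolver (the paper's methodology targets specific public resolvers and is far more careful than this):
```python
import socket, time

def resolve_ms(hostname: str) -> float:
    # Time one blocking stub resolution; repeat and take the median in practice.
    t0 = time.perf_counter()
    socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - t0) * 1000.0

print(resolve_ms("example.com"))
```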
Recent improvements in machine learning techniques offer new opportunities for addressing challenges across various domains. A significant focus in current research is on leveraging machine learning methodologies to improve existing resource management strategies, aiming to achieve comparable performance capabilities. In particular, reinforcement l...
With the rapid development of Wireless Computing Power Networks (WCPNs), the urgent need for data privacy protection and communication efficiency has led to the emergence of the federated learning (FL) framework. However, time delays cause straggler problems and reduce the convergence performance of FL during training. In this article...
Policies on taxes, fees, charges, and land-use levies related to land, with some recommendations. This study was conducted with the main objective of synthesizing, analyzing, and commenting on tax, fee, and charge policies related to land. In particular, these policies are examined from the perspective of two important resolutions: Resolution 19-NQ/TW of...
This article presents a comprehensive analysis of the Data-Driven Sales Optimization (DDSO) platform, a transformative solution for financial service organizations seeking to enhance customer engagement and operational excellence. The article integrates cloud-native technologies, machine learning algorithms, and real-time data processing capabiliti...
Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models, enabling precise rendering of multiple distinct elements, such as characters and styles, in multi-concept image generation. However, current approaches face significant challenges when composing these LoRAs for multi-concept image generation, resulting in...
Self-modifying code (SMC) allows programs to alter their own instructions, optimizing performance and functionality on x86 processors. Despite its benefits, SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes. In this paper, we explore the security implications of SMC by examining how specific x86 instruc...
Large Language Models (LLMs) have gained immense success in revolutionizing various applications, including content generation, search and recommendation, and AI-assisted operation. To reduce high training costs, Mixture-of-Experts (MoE) architecture has become a popular backbone for modern LLMs. However, despite the benefits, serving MoE-based LLM...
This study aims to identify vulnerabilities in the PDDIKTI web pages using a combination of OWASP ZAP and manual testing. PDDIKTI is a platform managed by the Center for Data and Information (Pusdatin) under the Ministry of Research, Technology, and Higher Education (Kemristekdikti) that stores important data on students, lecturers...
Large language models have revolutionized natural language processing but face significant challenges of high storage and runtime costs, due to the transformer architecture's reliance on self-attention, particularly the large Key-Value (KV) cache for long-sequence inference. Recent efforts to reduce KV cache size by pruning less critical entries ba...
Performance benchmarking is essential for ensuring React applications meet the growing demands of modern web users. With increasing complexity in web applications, developers must adopt systematic benchmarking techniques to identify and eliminate performance bottlenecks, ensuring fast load times, smooth interactions, and efficient resource utilizat...
Vietnam is a nation with several thousand years of history. The Vietnamese people's story is bound up with an extraordinarily rich and vivid process of development; they have courageously withstood countless invasions by the most brutal and powerful hostile forces in the world, and have stood firm and proud, heads held high, as masters of a sovereign nation,...
We introduce LogQuant, a groundbreaking 2-bit quantization technique for KV Cache in large language model (LLM) inference, delivering substantial memory savings while preserving superior performance. Previous methods either assume that later tokens are more important or attempt to predict important tokens based on earlier attention patterns. Both a...
Disaggregated Large Language Model (LLM) inference has gained popularity as it separates the computation-intensive prefill stage from the memory-intensive decode stage, avoiding the prefill-decode interference and improving resource utilization. However, transmitting Key-Value (KV) data between the two stages can be a bottleneck, especially for lon...
Semantic prompt caches reduce the latency and cost of large language model (LLM) inference by reusing cached LLM-generated responses for semantically similar prompts. Vector similarity metrics assign a numerical score to quantify the similarity between an embedded prompt and its nearest neighbor in the cache. Existing systems rely on a static thres...
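A minimal sketch of the lookup such systems perform, including the static cosine-similarity threshold the abstract says they rely on (the embedding function and the threshold value are assumptions):
```python
import numpy as np

class SemanticCache:
    def __init__(self, embed, threshold=0.9):  # static threshold, as critiqued
        self.embed, self.threshold = embed, threshold
        self.keys, self.responses = [], []

    def get(self, prompt):
        # Return the cached response of the nearest neighbor if it is similar enough.
        if not self.keys:
            return None
        q = self.embed(prompt)
        K = np.stack(self.keys)
        sims = K @ q / (np.linalg.norm(K, axis=1) * np.linalg.norm(q))
        i = int(np.argmax(sims))
        return self.responses[i] if sims[i] >= self.threshold else None

    def put(self, prompt, response):
        self.keys.append(self.embed(prompt))
        self.responses.append(response)
```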
This book offers a clear exploration of cutting-edge semiconductor circuit technologies and their practical applications. It covers topics like advanced transistor design, low-power consumption techniques, and high-performance circuit design.
Circuit Design for Modern Applications explores the recent innovations in semiconductor technology. Bandga...
In response to the issue of wireless spectrum scarcity and secure transmission caused by frequent requests for short videos from a large number of devices within the network, this article proposes a social relationship‐aware collaborative Device‐to‐Device (D2D) secure caching strategy. By thoroughly analyzing the physical and social characteristics...
Vision-Language-Action (VLA) models can process instructions and visual perception to directly generate actions as output in an end-to-end fashion, owing to their strong multi-modal reasoning capabilities. While the performance of VLA models is promising, their computational cost can be substantial. This raises challenges for applying them to robotics tas...
The wave of tourism is returning strongly after the pandemic. What is more, new tourism trends, most notably a growing emphasis on local culinary experiences, are gradually displacing previously prevailing modes of travel. The study used convenience sampling, with a questionnaire designed in Google Forms together with...
Computations at the edge exhibit a high degree of similarity. One approach proposed to enhance the efficiency of edge computing is computation reuse, which eliminates redundant computations. Edge computing is integrated with the ICN architecture, capitalizing on its inherent intelligence to facilitate computation reuse and reduce red...
Title: "The Interdependent Triad of Modern Data Services: A Comparative Analysis with Project Management Principles"
Abstract:
This paper presents a novel conceptual framework that analyzes modern data services through a triangular model comprising Communication, Storage, and Processing components. Drawing parallels with the established Project Ma...
Leveraging attention sparsity to accelerate long-context large language models (LLMs) has been a hot research topic. However, current algorithms such as sparse attention or key-value (KV) cache compression tend to use a fixed budget, which presents a significant challenge during deployment because it fails to account for the dynamic nature of real-...
The rapid advancements in vision-language models (VLMs), such as CLIP, have intensified the need to address distribution shifts between training and testing datasets. Although prior Test-Time Training (TTT) techniques for VLMs have demonstrated robust performance, they predominantly rely on tuning text prompts, a process that demands substantial co...
This paper presents a new hybrid cache replacement algorithm that combines random allocation with a modified V-Way cache implementation. Unlike traditional cache policies, our random allocation cache (RAC) adapts to complex cache access patterns and optimizes cache usage by improving the utilization of cache sets. The algorithm utilizes a 16-way set-associative cache with 2...
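For intuition, a minimal sketch of a set-associative cache with random victim selection, the baseline ingredient the hybrid policy builds on (the parameters below are illustrative, not the paper's configuration):
```python
import random

class RandomReplacementCache:
    def __init__(self, num_sets=64, ways=16):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [dict() for _ in range(num_sets)]  # per-set tag store

    def access(self, addr):
        s = self.sets[addr % self.num_sets]  # index by low-order bits
        tag = addr // self.num_sets
        if tag in s:
            return True                      # hit
        if len(s) >= self.ways:              # set full: evict a random way
            s.pop(random.choice(list(s)))
        s[tag] = None                        # fill the line
        return False                         # miss
```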
Recent advancements in fields such as automotive and aerospace have driven a growing demand for robust computational resources. Applications that were once designed for basic MCUs are now deployed on highly heterogeneous SoC platforms. While these platforms deliver the necessary computational performance, they also present challenges related to res...
Large language models (LLMs) require significant memory to store Key-Value (KV) embeddings in their KV cache, especially when handling long-range contexts. Quantization of these KV embeddings is a common technique to reduce memory consumption. This work introduces PolarQuant, a novel quantization method employing random preconditioning and polar tr...
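The abstract's two ingredients can be pictured with a toy sketch: pair up coordinates, convert each pair to polar form, and quantize radius and angle separately (this is our illustrative reading of the idea, not the paper's actual algorithm, which also applies random preconditioning first):
```python
import numpy as np

def polar_quantize(x, r_bits=4, theta_bits=4):
    # x must have even length: pair consecutive dims, switch to (radius, angle),
    # then quantize each component uniformly to its bit budget.
    pairs = x.reshape(-1, 2)
    r = np.linalg.norm(pairs, axis=1)
    theta = np.arctan2(pairs[:, 1], pairs[:, 0])
    r_q = np.round(r / r.max() * (2**r_bits - 1))
    t_q = np.round((theta + np.pi) / (2 * np.pi) * (2**theta_bits - 1))
    return r_q, t_q, r.max()

def polar_dequantize(r_q, t_q, r_max, r_bits=4, theta_bits=4):
    r = r_q / (2**r_bits - 1) * r_max
    theta = t_q / (2**theta_bits - 1) * 2 * np.pi - np.pi
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1).reshape(-1)
```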
Speculative decoding is an effective and lossless method for Large Language Model (LLM) inference acceleration. It employs a smaller model to generate a draft token sequence, which is then verified by the original base model. In multi-GPU systems, inference latency can be further reduced through tensor parallelism (TP), while the optimal TP size of...
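The mechanism, in its lossless greedy variant, is a draft-then-verify loop like this sketch (both model callables are hypothetical stand-ins for the draft and base models):
```python
def speculative_step(draft_next, base_argmax_all, prefix, k=4):
    # Draft k tokens greedily with the small model.
    ctx, draft = list(prefix), []
    for _ in range(k):
        t = draft_next(ctx)                   # hypothetical: argmax next token
        draft.append(t)
        ctx.append(t)
    # One base-model pass scores every drafted position at once.
    base = base_argmax_all(prefix, draft)     # hypothetical: base argmax per position
    accepted = []
    for d, b in zip(draft, base):
        if d == b:
            accepted.append(d)                # draft token verified
        else:
            accepted.append(b)                # take the base model's token and stop
            break
    return accepted                           # output matches pure base-model decoding
```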
NoSQL databases have become essential for microservices architecture due to their scalability, flexibility, and ability to handle unstructured data. This paper explores the best NoSQL databases for microservices, analyzing their features, advantages, and drawbacks. We examine MongoDB, Apache Cassandra, Redis, Couchbase, and DynamoDB, assessing thei...
Web refresh crawling is the problem of keeping a cache of web pages fresh, that is, having the most recent copy available when a page is requested, given the limited bandwidth available to the crawler. Under the assumption that the change and request events for each web page follow independent Poisson processes, the optimal scheduling policy w...
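Under that Poisson assumption the trade-off can be simulated directly; a toy sketch estimating the fraction of requests served fresh for one page with change rate lam, request rate mu, and a periodic refresh every T seconds (all names and values are illustrative, not the paper's policy):
```python
import random

def fresh_fraction(lam=0.1, mu=1.0, T=5.0, horizon=1e5):
    # Superpose the two Poisson processes: next event after Exp(lam + mu),
    # and it is a page change with probability lam / (lam + mu).
    t, last_change, fresh, total = 0.0, 0.0, 0, 0
    while t < horizon:
        t += random.expovariate(lam + mu)
        if random.random() < lam / (lam + mu):
            last_change = t                  # the live page changed
        else:
            total += 1
            last_refresh = (t // T) * T      # most recent periodic crawl
            fresh += last_refresh >= last_change
    return fresh / total

print(fresh_fraction())
```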
Serving large language models (LLMs) often demands specialized hardware, dedicated frameworks, and substantial development efforts, which restrict their accessibility, especially for edge devices and organizations with limited technical resources. We propose a novel compiler that translates LLM inference graphs into SQL queries, enabling relational...
The context caching technique is currently employed by prevailing serving platforms to accelerate Multimodal Large Language Model (MLLM) inference. However, this approach merely reuses the Key-Value (KV) cache of the initial prompt sequence, forcing full KV cache recomputation even when the prefix differs only slightly. This becomes particular...
In Multi-access Edge Computing (MEC), there exist dynamic and unknown environment states, such as time-varying wireless channel conditions, unreliable computing resources, and changing task popularity. In this paper, the autonomic offloading and caching problem for tasks with content data in an unknown environment is investigated, and then an...