Nian Feng Tzeng

  • University of Louisiana at Lafayette

About

195
Publications
11,521
Reads
2,988
Citations
Publications (195)
Preprint
Full-text available
Scheduling deep learning (DL) models to train on powerful clusters with accelerators like GPUs and TPUs presently falls short, either lacking fine-grained heterogeneity awareness or leaving resources substantially under-utilized. To fill this gap, we propose a novel design of a task-level heterogeneity-aware scheduler, Hadar, based on an opt...
Preprint
Full-text available
Federated Learning (FL) is a promising distributed machine learning framework that allows collaborative learning of a global model across decentralized devices without uploading their local data. However, in real-world FL scenarios, the conventional synchronous FL mechanism suffers from inefficient training caused by slow-speed devices, commonly kn...
Preprint
Graph neural networks (GNNs) have exhibited superior performance in various classification tasks on graph-structured data. However, they encounter the potential vulnerability from the link stealing attacks, which can infer the presence of a link between two nodes via measuring the similarity of its incident nodes' prediction vectors produced by a G...
Preprint
Federated Learning (FL) has gained prominence as a decentralized machine learning paradigm, allowing clients to collaboratively train a global model while preserving data privacy. Despite its potential, FL faces significant challenges in heterogeneous environments, where varying client resources and capabilities can undermine overall system perform...
Article
Full-text available
Accurate and timely regional weather prediction is vital for sectors dependent on weather-related decisions. Traditional prediction methods, based on atmospheric equations, often struggle with coarse temporal resolutions and inaccuracies. This paper presents a novel machine learning (ML) model, called MiMa (short for Micro-Macro), that integrates b...
Preprint
Full-text available
Large Language Models (LLMs) have received considerable interest in wide applications lately. During pre-training via massive datasets, such a model implicitly memorizes the factual knowledge of trained datasets in its hidden parameters. However, knowledge held implicitly in parameters often makes its use by downstream applications ineffective due...
Preprint
Full-text available
Adversarial training (AT) can help improve the robustness of Vision Transformers (ViT) against adversarial attacks by intentionally injecting adversarial examples into the training data. However, this way of adversarial injection inevitably incurs standard accuracy degradation to some extent, thereby calling for a trade-off between standard accurac...
Conference Paper
Full-text available
Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative training of machine learning models over decentralized devices without exposing their local data. One of the major challenges in FL is the presence of uneven data distributions across client devices, violating the well-known assumption of indepen...
Preprint
Full-text available
Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative training of machine learning models over decentralized devices without exposing their local data. One of the major challenges in FL is the presence of uneven data distributions across client devices, violating the well-known assumption of indepen...
Poster
Full-text available
FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering
Preprint
Full-text available
Precise crop yield predictions are of national importance for ensuring food security and sustainable agricultural practices. While AI-for-science approaches have exhibited promising achievements in solving many scientific problems such as drug discovery, precipitation nowcasting, etc., the development of deep learning models for predicting crop yie...
Conference Paper
Federated learning (FL) is an emerging distributed machine learning paradigm enabling collaborative model training on decentralized devices without exposing their local data. A key challenge in FL is the uneven data distribution across client devices, violating the well-known assumption of independent-and-identically-distributed (IID) training samp...
Conference Paper
Full-text available
Federated learning (FL) is an emerging distributed machine learning paradigm enabling collaborative model training on decentralized devices without exposing their local data. A key challenge in FL is the uneven data distribution across client devices, violating the well-known assumption of independent-and-identically-distributed (IID) training samp...
Article
Full-text available
In this paper, we propose a novel medium access control (MAC) protocol, called SYN-MAC (for SYNchronized MAC), based on a binary countdown approach tailored for wireless networks. SYN-MAC has several attractive features such as simplicity, robustness, high efficiency, fairness, and quality of service capability. We evaluate SYN-MAC in terms of coll...
Chapter
Full-text available
Storm prediction provides the early alert for preparation, avoiding potential damage to property and human safety. However, a traditional storm prediction model usually incurs excessive computational overhead due to employing atmosphere physical equations and complicated data assimilation. In this work, we strive to develop a lightweight and portab...
Article
Full-text available
Quantum computing is a quickly growing field with great potential for future technology. Quantum computers in the current noisy intermediate-scale quantum (NISQ) era face two major limitations - qubit count and error vulnerability. Although quantum error correction (QEC) methods exist, they are not applicable to the current size of computers, requi...
Article
Full-text available
While quantum computers provide exciting new opportunities for information processing, today's quantum computers suffer from noise during computation that is not fully understood. Incomplete noise models have caused quantum program success rate (SR) estimators to generate performance predictions that substantially differ from actual machine outcome...
Article
The proliferation of IoT devices, with various capabilities in sensing, monitoring, and controlling, has prompted diverse emerging applications, highly relying on effective delivery of sensitive information gathered at edge devices to remote controllers for timely responses. To effectively deliver such information/status updates, this paper underta...
Chapter
Flexible fine-grained weather forecasting is a problem of national importance due to its stark impacts on economic development and human livelihoods. It remains challenging for such forecasting, given the limitation of currently employed statistical models, that usually involve the complex simulation governed by atmosphere physical equations. To ad...
Article
Full-text available
The fall detection system is of critical importance in protecting elders through promptly discovering fall accidents to provide immediate medical assistance, potentially saving elders' lives. This paper aims to develop a novel and lightweight fall detection system by relying solely on a home audio device via inaudible acoustic sensing, to recognize...
Conference Paper
Full-text available
This paper proposes a novel oversampling approach that strives to balance the class priors with a considerably imbalanced data distribution of high dimensionality. The crux of our approach lies in learning interpretable latent representations that can model the synthetic mechanism of the minority samples by using a generative adversarial network (GA...
Article
While considerable work has addressed the optimal AoI under different circumstances in single-hop networks, the exploration of AoI in multi-hop wireless networks is rarely attempted. More importantly, the inherent relationships between AoI and throughput are yet to be explored, especially in multi-hop networks. This paper studies AoI in multi-hop w...
Article
With the growing effort to reduce power consumption in machines, fault tolerance becomes more of a concern. This holds particularly for large-scale computing, where execution failures due to soft faults waste excessive time and resources. These large-scale applications are normally parallel in nature and rely on control structures tailored specific...
Chapter
In this paper, we conduct a systematic study for the very first time on the poisoning attack to neural collaborative filtering-based recommender systems, exploring both availability and target attacks with their respective goals of distorting recommended results and promoting specific targets. The key challenge arises on how to perform effective po...
Article
This article considers best checkpointing control realizable in real-world systems, whose mean time between failures (MTBFs) often fluctuate. The considered control scheme is based on equating aggregate checkpointing overhead over an activity sequence of interest (θ) and the expected rework amount after a failure recovery for best checkpointing, ca...
Conference Paper
Full-text available
Learning interpretable representations in an unsupervised setting is an important yet challenging task. Existing unsupervised interpretable methods focus on extracting independent salient features from data. However, they miss the fact that the entanglement of salient features may also be informative. Acknowledging these entanglements can impr...
Article
Full-text available
The growing popularity of in-memory computing for big data analytics often causes performance bottlenecks in the memory subsystem residing in operating systems (OS). This article proposes Cooperative Memory Expansion (COMEX), an OS kernel extension. COMEX establishes a stable pool of memory collectively across nodes in a cluster and enhances OS's memory...
Article
A bufferless network-on-chip (NoC) can deliver high energy efficiency, but such a NoC is subject to growing deflection when its traffic load rises. This article proposes Deflection Containment (DeC) for the bufferless NoC to address its notorious shortcomings of excessive deflection for performance improvement and energy savings. With multiple subn...
Conference Paper
Full-text available
Spammers have been grossly detrimental since the inception of the Twitter social network and keep polluting social environments by hiding themselves among a large number of normal users. In this paper, we aim to address two challenges existing in the spammer detection problem: 1) monitoring tweets that have a higher probability of including spam m...
Article
In multicore systems, a large portion of checkpoint time overhead can be hidden from the execution critical path by resorting to a dedicated checkpointing thread run concurrently with regular execution threads for compressing checkpoint files to lower checkpointing overhead. On the other hand, the restore time is on the critical path that cannot be...
Article
Full-text available
Chip multiprocessors (CMPs) involve directory storage overhead if cache coherence is realized via sharer tracking. This work proposes a novel framework dubbed non-uniform directory architecture (NUDA), by leveraging our two insights in that the number of active directory entries required to stay on chip is usually small for a short execution time w...
Preprint
Full-text available
Video streams usually have to be transcoded to match the characteristics of viewers' devices. Streaming providers have to store numerous transcoded versions of a given video to serve various display devices. Given the fact that viewers' access pattern to video streams follows a long tail distribution, for the video streams with low access rate, we...
Conference Paper
Full-text available
Modern chip multiprocessors (CMPs) employ on-chip networks to enable communication between the individual cores. Operations such as coherence and synchronization generate a significant amount of the on-chip network traffic, and often create network requests that have one-to-many (i.e., a core multicasting a message to several cores) or many-to-one...
Article
To lower on-chip SRAM area overhead for chip multiprocessors (CMPs), this work treats a novel directory design which compresses present-bit vectors (PVs) by dropping “runs of zeros” commonly existing and lets PVs be transformed to their variations after sharer relinquishment for hashing alternative table sets to lift table utilization. Featured wit...
Article
Full-text available
Cloud computing users are most concerned about the application turnaround time and the monetary cost involved. For lower monetary costs, less expensive services, like spot instances offered by Amazon, are often made available, albeit to their relatively frequent resource unavailability that leads to on-going execution being evicted, thereby undercu...
Article
Sand monitoring gives the benefits of avoiding equipment erosion and production failure in the oil industry. This paper presents the design and implementation of a reliable and cost-effective sand monitoring system for measuring sand production in gas and oil flows in real time. The designed monitoring system involves two acoustic emission (AE) sen...
Article
Full-text available
This paper pursues RFID support for localization, aiming to pinpoint an object in 3D space. Given a set of RFID tags and/or readers deployed as reference points at known locations in a hexahedron (like a shipping container or storage room), a passive and an active localization scheme are considered in this paper. Being the very first range-free 3D l...
Conference Paper
Full-text available
Multithreaded applications are common in high performance cloud computing systems, able to take advantage of elastic resource availability and cost fluctuation inherent to the systems. When applications involve many threads over more cores leased from the RaaS (Resource-as-a-Service) cloud under spot instance pricing for faster execution, resource...
Conference Paper
Checkpointing has been widely adopted in support of fault-tolerance and job migration, with checkpoint files preferably kept also at remote storage to withstand unavailability/failures of local nodes in networked systems. Lately, I/O bandwidth to remote storage becomes the bottleneck for checkpointing on a large-scale system. This paper proposes an...
Conference Paper
In the operation of air pitted gaseous sensor the microhotplate (μHP) consumes almost all the power used by the sensor. The required area to micromachine the air pit for the μHP of a single sensor is several times more than the actual area required by the sensor itself. The feasibility of implementing low power and ultra dense gaseous sensor array...
Article
Full-text available
This article proposes a runtime model that relates server energy consumption to its overall thermal envelope, using hardware performance counters and experimental measurements. While previous studies have attempted system-wide modeling of server power consumption through subsystem models, our approach is different in that it links system energy inp...
Conference Paper
Existing virtual clusters and computer clouds usually depend on small groups of (or even single) data repositories for their virtual machine and software deployments. This paper proposes a Peer-to-Peer (P2P)-based approach for publishing, querying, and deploying both virtual machine (VM) images and application-specific packages, dubbed the P2P Virt...
Article
Full-text available
We present a distinct longest prefix matching (LPM) lookup scheme able to achieve exceedingly concise lookup tables (CoLT), suitable for scalable routers. Based on unified hash tables for handling both IPv4 and IPv6 simultaneously, CoLT excels over previous mechanisms in: 1) lower on-chip storage for lookup tables; 2) simpler table formats to enjoy...
Data
Full-text available
Modern processors crudely manage thermal emergencies through Dynamic Thermal Management (DTM), where the processor monitors the die temperature and dynamically adjusts the processor voltage and frequency (DVFS) to throttle down the processor when necessary. However, DVFS tends to yield marked degradation in both application performance and system...
Conference Paper
Full-text available
This article pursues speedy packet classification with low on-chip memory requirements realized on Xilinx Virtex-6 FPGA. Based on hashing round-down prefixes specified in filter rules (dubbed HaRP), our implemented classifier is demonstrated to exhibit an extremely low on-chip memory requirement (lowering the byte count per rule by a factor of 8.6...
Article
Full-text available
Packet classification is central to a wide array of Internet applications and services, with its approaches mostly involving either hardware support or optimization steps needed by software-oriented techniques (to add precomputed markers and insert rules in the search data structures). Unfortunately, an approach with hardware support is expensive a...
Article
Full-text available
This paper deals with decentralized, QoS-aware middleware for checkpointing arrangement in Mobile Grid (MoG) computing systems. Checkpointing is more crucial in MoG systems than in their conventional wired counterparts due to host mobility, dynamicity, less reliable wireless links, frequent disconnections, and variations in mobile systems. We've de...
Article
A novel storage design for IP routing table construction is introduced on the basis of a single set-associative hash table to support fast longest prefix matching (LPM). The proposed design involves two key techniques to lower table storage required drastically: 1) storing transformed prefix representations; and 2) accommodating multiple prefix...
Article
Full-text available
This paper proposes a chaotic time series model of server system-wide energy consumption to capture the dynamics present in observed sensor readings of underlying physical systems. Based on the chaotic model, we have developed a real-time predictor that estimates actual server energy consumption according to its overall thermal envelope. This cha...
Article
Arrays of microsensors may be employed for accurate detection of multiple gases possibly existing simultaneously in an environment. They can be made reconfigurable for improving efficiency and reliability. Constituent microsensors in such a reconfigurable array are highly desirable to operate in an ultra low power regime, have a short response time...
Conference Paper
A distributed hash table (DHT) with replicated objects enjoys improved performance and fault-tolerance but calls for effective replica management. This paper deals with proximity-aware distributed mutual exclusion (PADME) for P2P replica management on a DHT. Three main components are involved in PADME: (1) a few nodes designated as the sink candida...
Article
Packet classification is complex due to multiple fields present in each filter rule, easily manifesting itself as a router performance bottleneck. Most known classification approaches involve either hardware support or optimization steps (to add precomputed markers and insert rules in the search data structures). Unfortunately, an approach with har...
Article
Full-text available
A Prioritized Medium Access Control (PMAC) protocol is proposed for wireless sensor networks, based on a binary count down approach. PMAC effectively eliminates data collision and optimizes channel allocation. Moreover, as many networks nowadays compose traffic with several priorities, the simple yet effective design of PMAC offers strict service d...
Conference Paper
Full-text available
This paper describes the deployment and evaluation of a 700MHz WiFi-based Wireless Mesh Network (WMN) testbed. To our knowledge, this is the world's first WiFi-based testbed using the recently-released 700MHz frequency band and deployed under both indoor and outdoor environments, including open space and a Louisiana swamp with dense cypress trees....
Conference Paper
Full-text available
This work deals with an architectural framework to enable application-layer packet processing for lowered processing latency and enhanced throughput. Creating an "Ethereal memory" shared by application programs and network interface drivers, the proposed framework realizes application-layer packet processing through Ethereal memory (APPEAL). Unlike...
Conference Paper
Distributed grid resource discovery (ReD) systems lack the ability to adapt efficiently to an increase in the number of attributes. The main contribution of this paper is a fast and scalable ReD mechanism, dubbed FaSReD, which composes a resource key via bit string encoding. We establish close-to-optimal FaSReD and a lower bound on the mean number...
Conference Paper
Network coding has shown great potential to improve the overall throughput of wireless networks with broadcast nature. Existing network coding schemes (e.g., COPE) require information exchange among neighboring nodes, in order to correctly encode and decode the data packets. Such an approach results in significant overhead. Moreover, the informatio...
Conference Paper
Multicluster grids have emerged as major execution environments to solve large-scale compute-intensive applications, with each participating cluster having its own scheduler under different policies. In order to take full advantages of multicluster grid capability, computer scientists need to deal with how to collaborate practically and efficiently...
Article
Full-text available
While extensive studies have been carried out in the past several years for many sensor applications, the main approach for sensor networking cannot be applied to the scenarios with extremely low and intermittent connectivity, dubbed the Delay/Fault-Tolerant Mobile Sensor Network (DFT-MSN). Without end-to-end connections due to sparse network dens...
Conference Paper
Full-text available
Wireless sensor data acquisition (WSDA) is preferred over its wired counterpart due to its easier deployment and simpler maintenance. These characteristics make WSDA attractive to a wide array of engineering applications. This article deals with a framework for WSDA comprised of both legacy sensors and modern sensors commonly found in the oil and g...
Conference Paper
Full-text available
Existing resource discovery and monitoring systems, such as Globus GRAM/MDS, are based on a central point of control. This paper proposes a decentralized multilayered resource discovery system implemented over a peer-to-peer structure to address shortcomings associated with current resource discovery systems. Our approach segments Grid nodes into v...
Conference Paper
Full-text available
The discovery and management of energy resources, especially at locations in the Gulf of Mexico, requires an economic but technically enhanced infrastructure. Research teams from Louisiana State University, University of Louisiana at Lafayette, and Southern University Baton Rouge are engaged in a collaborative effort to create a ubiquitous computin...
Conference Paper
This article deals with distributed address determination for mobile Grids, realized by ADENS (address determination via neighboring states), where a new mobile host (MH) determines a conflict-free address for itself efficiently according to state information only from neighboring MHs. With low traffic overhead, ADENS achieves higher address space...
Conference Paper
This article deals with communication performance of a multiprocessor system implemented using award-winning BCM 1480 multi-core chips. Our system uses high-performance HyperTransport links to interconnect constituent chips, realizing cache-coherent non-uniform memory access. It takes advantage of hardware support from the BCM 1480 chip to attain ve...
Conference Paper
Full-text available
This paper proposes to develop a system-wide energy consumption model for servers by making use of hardware performance counters and experimental measurements. We develop a real-time energy prediction model that relates server energy consumption to its overall thermal envelope. While previous studies have attempted system-wide modeling of serve...
Article
Full-text available
With various wireless technologies developed, a ubiquitous and integrated architecture is envisioned for future wireless communication. An important optimization issue in such an integrated system is how to minimize the overall communication cost by intelligently utilizing the available heterogeneous wireless technologies while, at the same time, m...
Conference Paper
This article deals with a novel architecture for IP routing table construction, on the basis of a single set-associative hash table to support fast longest prefix matching (LPM). The proposed architecture uses two key techniques to lower table storage required drastically: (1) storing transformed prefix representations and (2) accommodating multipl...
Conference Paper
As the demand for bandwidth grows, Internet routers must run faster. Ternary Content Addressable Memory (TCAM) has been known as a promising device in composing simple and efficient solutions for fast forwarding table lookups. However, most existing TCAM-based IP lookup solutions suffer from lengthy update durations imposed by TCAM entry shifts for...
Conference Paper
Excessive power consumption is deemed one of the major drawbacks of TCAM-based IP search engines. This paper proposes a simple and yet efficient forwarding table partitioning algorithm aiming to achieve significant TCAM power savings. Our algorithm partitions the IP address space into a set of adjoining but non-overlapping search ranges comprising...
Article
Full-text available
Ideal speedup in pipelined processors is seldom achieved due to stalls and breaks in the execution stream. These interrupts are caused by data and control hazards, the latter, however, can be the most detrimental to pipeline performance. Branch Target Buffer (BTB) can reduce performance penalty of branches in pipelined processors by predicting the...
Conference Paper
Full-text available
While extensive studies have been carried out in the past several years for many sensor applications, they cannot be applied to the network with extremely low and intermittent connectivity, dubbed the delay/fault-tolerant mobile sensor network (DFT-MSN). Without end-to-end connections due to sparse network density and sensor node mobility, routing...
Conference Paper
This paper deals with a novel, distributed, QoS-aware, peer-to-peer checkpointing arrangement component for mobile Grid (MoG) computing systems middleware. Checkpointing is more crucial in MoG systems than in their wired counterparts due to node mobility and less reliable wireless links resulting in frequent and dynamic connections and disconnecti...
Conference Paper
This research focuses on RFID-based 3-D positioning schemes, aiming to locate an object in a 3-dimensional space, with reference to a predetermined arbitrary coordinates system, by using RFID tags and readers. More specifically, we consider a hexahedron which may be a shipping container, a storage room, or other hexahedral shape spaces. A number of...
Article
Energy efficiency is a critical design issue in wireless sensor networks, where each sensor node relies on its limited battery power for data acquisition, processing, transmission, and reception. In this paper, we study an integrated approach on power control and load balancing, aiming at even distribution of the residual energy of the sensors a...
Conference Paper
Full-text available
Reliable remote measuring of flow meters for the petroleum gas industry is proposed in this work. The monitoring of flow rates and the total amount of the fluid flow is collected using a manual process. The main goal of this work is to implement a mechanism that avoids human error and achieves reliable, continuous, and accurate monitoring. We emplo...
Conference Paper
Energy efficiency is a critical design issue in wireless sensor networks, where each sensor node relies on its limited battery power for data acquisition, processing, transmission, and reception. In this paper, we propose several energy-efficient communication schemes based on power control and load balancing, aiming at even distribution of the res...
Article
Full-text available
In this paper, we propose a novel medium access control protocol with a separate control channel (MAC-SCC) to increase the channel efficiency and address the unfairness and instability problems of IEEE 802.11 MAC protocol. In MAC-SCC, the available bandwidth is partitioned into two channels: a data channel and a control channel, each associated wit...
Conference Paper
Hardware approaches for speedy IP lookups can be realized by making use of TCAMs (Ternary Content Addressable Memories), whose lookups utilize IP addresses as search keys with each search requiring only a single memory access. However, most existing TCAM-based forwarding engines involve shifting TCAM entries when the forwarding table is updated, ty...
Article
Full-text available
Most of the high-performance routers available commercially these days equip each of their line cards (LCs) with a forwarding engine (FE) to perform table lookups locally. This work introduces and evaluates a technique for speedy packet lookups, called SPAL, in such routers. The BGP routing table under SPAL is fragmented into subsets which constitu...
Conference Paper
In support of continuously increasing line rates and various Internet services, multiprocessor-based linecards have appeared in next-generation routers, significantly improving performance. However, this improvement has come at the expense of increased energy consumption, which tends to not only raise the operational and cooling costs, but also low...