Nicholas D. Lane’s research while affiliated with University of Cambridge and other places


Publications (285)


Figure 1: SparsyFed pipeline. (1) The server broadcasts the global model ω^t. (2) Client i reparameterizes its local weights. (3) Executes a forward pass on batch B. (4a) Computes the layer-wise sparsity s^t. (4b) Prunes activations using s^t and stores them. (5) Computes gradients. (6) Applies gradients. (7) Computes the model update and applies Top-K pruning. (8) Sends the sparse update Δω̃_i^t back to the server. (9) The server optimizer is applied to obtain the new global model. Steps (2)-(6) repeat until convergence.
Figure 4: We report the test accuracy of different re-parameterization methods with sparse activations during backpropagation. We deployed a ResNet-18 trained on the CIFAR-10 dataset using LDA(α = 1). This plot illustrates the methods' performance under different sparsity levels. Powerpropagation exhibited superior robustness to the applied sparsity levels, achieving the best overall performance among these methods.
Figure 5: Test Accuracy with different β values, with 95% sparsity on CIFAR-10 LDA α = 1.0. The accuracy of the dense model (gray), the hyperparameter-free Spectral Exponent version, and the Top-K method are also reported for reference.
SparsyFed: Sparse Adaptive Federated Training
  • Preprint
  • File available

April 2025 · 11 Reads

Adriano Guastella · Lorenzo Sani · Alex Iacob · [...] · Nicholas D. Lane

Sparse training is often adopted in cross-device federated learning (FL) environments where constrained devices collaboratively train a machine learning model on private data by exchanging pseudo-gradients across heterogeneous networks. Although sparse training methods can reduce communication overhead and computational burden in FL, they are often not used in practice for the following key reasons: (1) data heterogeneity makes it harder for clients to reach consensus on sparse models compared to dense ones, requiring longer training; (2) methods for obtaining sparse masks lack adaptivity to accommodate very heterogeneous data distributions, crucial in cross-device FL; and (3) additional hyperparameters are required, which are notably challenging to tune in FL. This paper presents SparsyFed, a practical federated sparse training method that critically addresses the problems above. Previous works have only solved one or two of these challenges at the expense of introducing new trade-offs, such as clients' consensus on masks versus sparsity pattern adaptivity. We show that SparsyFed simultaneously (1) can produce 95% sparse models, with negligible degradation in accuracy, while only needing a single hyperparameter, (2) achieves a per-round weight regrowth 200 times smaller than previous methods, and (3) allows the sparse masks to adapt to highly heterogeneous data distributions and outperform all baselines under such conditions.
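The abstract and Figure 1 describe a reparameterize-train-prune loop on each client. The following is a rough, hypothetical sketch of one client round using a Powerpropagation-style reparameterization (with β as the single hyperparameter mentioned) and Top-K pruning of the pseudo-gradient; the function names, toy linear model, and update rule are illustrative assumptions, not the authors' reference code.

```python
# Hypothetical sketch of one SparsyFed-style client round, written from the
# abstract and Figure 1; not the paper's implementation.
import torch

def powerprop(w: torch.Tensor, beta: float = 2.0) -> torch.Tensor:
    # Powerpropagation-style reparameterization: effective weights are
    # w * |w|^(beta - 1), pushing small weights further towards zero.
    return w * w.abs().pow(beta - 1)

def topk_sparsify(delta: torch.Tensor, sparsity: float = 0.95) -> torch.Tensor:
    # Keep only the largest-magnitude (1 - sparsity) fraction of entries.
    k = max(1, int(delta.numel() * (1 - sparsity)))
    threshold = delta.abs().flatten().topk(k).values.min()
    return torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))

def client_round(global_w, local_batches, lr=0.1, beta=2.0, sparsity=0.95):
    # Steps (2)-(6): local training on the reparameterized model.
    w = global_w.clone().requires_grad_(True)
    for x, y in local_batches:
        logits = x @ powerprop(w, beta)          # toy linear model stand-in
        loss = torch.nn.functional.cross_entropy(logits, y)
        loss.backward()
        with torch.no_grad():
            w -= lr * w.grad
            w.grad.zero_()
    # Steps (7)-(8): pseudo-gradient, Top-K pruned before being sent back.
    delta = w.detach() - global_w
    return topk_sparsify(delta, sparsity)
```

For brevity, the activation-pruning steps (4a)-(4b) of Figure 1 are omitted here; in the described pipeline the layer-wise sparsity also drives which activations are stored during the backward pass.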


Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis

February 2025 · 8 Reads

Recent advances in code generation have illuminated the potential of employing large language models (LLMs) for general-purpose programming languages such as Python and C++, opening new opportunities for automating software development and enhancing programmer productivity. The potential of LLMs in software programming has sparked significant interest in exploring automated hardware generation. Although preliminary endeavors have been made to adopt LLMs in generating hardware description languages (HDLs), several challenges persist in this direction. First, the volume of available HDL training data is substantially smaller than that for software programming languages. Second, pre-trained LLMs, mainly tailored for software code, tend to produce HDL designs that are more error-prone. Third, generating HDL requires a significantly higher number of tokens than software programming, leading to inefficiencies in cost and energy consumption. To tackle these challenges, this paper explores leveraging LLMs to generate High-Level Synthesis (HLS)-based hardware designs. Although code generation for domain-specific programming languages is not new in the literature, we aim to provide experimental results, insights, benchmarks, and evaluation infrastructure to investigate the suitability of HLS over low-level HDLs for LLM-assisted hardware design generation. To achieve this, we first fine-tune pre-trained models for HLS-based hardware generation, using a collected dataset of text prompts and corresponding reference HLS designs. An LLM-assisted framework is then proposed to automate end-to-end hardware code generation, which also investigates the impact of chain-of-thought and feedback-loop prompting techniques on HLS design generation. Owing to the limited timeframe of this research, we leave the evaluation of more advanced reasoning models to future work.
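The feedback-loop prompting idea in the abstract can be pictured as a generate-check-repair cycle: ask the model for HLS C++, check it with the HLS toolchain, and re-prompt with the error log. In the hypothetical sketch below, query_llm and run_hls_check are placeholders for the fine-tuned model and the synthesis/C-simulation check; this is not the paper's framework.

```python
# Illustrative generate-check-repair loop for LLM-assisted HLS generation.
# `query_llm` and `run_hls_check` are placeholders, not a specific vendor API.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("call your fine-tuned code LLM here")

def run_hls_check(source: str) -> tuple[bool, str]:
    raise NotImplementedError("compile / C-simulate the design, return (ok, log)")

def generate_hls(spec: str, max_rounds: int = 3) -> str:
    prompt = f"Write synthesizable HLS C++ for the following task:\n{spec}"
    code = query_llm(prompt)
    for _ in range(max_rounds):           # feedback loop on tool diagnostics
        ok, log = run_hls_check(code)
        if ok:
            break
        prompt = (f"The previous design failed with:\n{log}\n"
                  f"Fix the HLS C++ code:\n{code}")
        code = query_llm(prompt)
    return code
```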


Bridge the Gaps between Machine Unlearning and AI Regulation

February 2025 · 31 Reads

The "right to be forgotten" and the data privacy laws that encode it have motivated machine unlearning since its earliest days. Now, an inbound wave of artificial intelligence regulations - like the European Union's Artificial Intelligence Act (AIA) - potentially offer important new use cases for machine unlearning. However, this position paper argues, this opportunity will only be realized if researchers, aided by policymakers, proactively bridge the (sometimes sizable) gaps between machine unlearning's state of the art and its potential applications to AI regulation. To demonstrate this point, we use the AIA as an example. Specifically, we deliver a "state of the union" as regards machine unlearning's current potential for aiding compliance with the AIA. This starts with a precise cataloging of the potential applications of machine unlearning to AIA compliance. For each, we flag any legal ambiguities clouding the potential application and, moreover, flag the technical gaps that exist between the potential application and the state of the art of machine unlearning. Finally, we end with a call to action: for both machine learning researchers and policymakers, to, respectively, solve the open technical and legal questions that will unlock machine unlearning's potential to assist compliance with the AIA - and other AI regulation like it.


LUNAR: LLM Unlearning via Neural Activation Redirection

February 2025 · 3 Reads

Large Language Models (LLMs) benefit from training on ever larger amounts of textual data, but as a result, they increasingly incur the risk of leaking private information. The ability to selectively remove knowledge from LLMs is, therefore, a highly desirable capability. In this paper, we propose LUNAR, a novel unlearning methodology grounded in the Linear Representation Hypothesis. LUNAR operates by redirecting the representations of unlearned data to regions that trigger the model's inherent ability to express its inability to answer. LUNAR achieves state-of-the-art unlearning performance while significantly enhancing the controllability of the unlearned model during inference. Specifically, LUNAR achieves 2.9x to 11.7x improvements on the combined "unlearning efficacy" and "model utility" score ("Deviation Score") on the PISTOL dataset across various base models. We also demonstrate, through quantitative analysis and qualitative examples, LUNAR's superior controllability in generating coherent and contextually aware responses, mitigating undesired side effects of existing methods. Moreover, we demonstrate that LUNAR is robust against white-box adversarial attacks and versatile in handling real-world scenarios, such as processing sequential unlearning requests.
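LUNAR's core idea, as described above, is to redirect the hidden representations of forget-set prompts into a region where the model expresses its inability to answer. The sketch below illustrates the general activation-steering mechanism with a PyTorch forward hook; the steering-vector estimate, layer choice, and scaling factor are illustrative assumptions rather than LUNAR's actual procedure.

```python
# Minimal sketch of activation redirection via a forward hook; not LUNAR's exact recipe.
import torch

def make_redirect_hook(steer: torch.Tensor, alpha: float = 1.0):
    # Shift a layer's hidden states along a fixed direction, e.g. toward the
    # region where the model naturally declines to answer.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steer
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage: estimate `steer` from calibration activations, e.g.
#   steer = refusal_acts.mean(0) - forget_acts.mean(0)
# then attach the hook to one transformer block (module path depends on the model):
#   handle = model.model.layers[k].register_forward_hook(make_redirect_hook(steer))
```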



Fig. 4. Segmentation performance of the MobileSAM model before finetuning. Images show water body samples from the disaster site test location at Ylitornio.
Fig. 5. Cumulative frequency of IoU values for a satellite from each scenario.
Fig. 6. Satellite conditions during model training and evaluation. Model exchanges are shown for just a one-hour period for demonstration purposes.
Fig. 7. Example for one satellite from each scenario showing the state of charge over time as the satellite performs various activities.
Fig. 8. Segmentation performance of the MobileSAM model after fine-tuning under scenario 2. Images show water body samples from the disaster site test location at Ylitornio.
Rapid Distributed Fine-tuning of a Segmentation Model Onboard Satellites

November 2024 · 55 Reads

Segmentation of Earth observation (EO) satellite data is critical for natural hazard analysis and disaster response. However, processing EO data at ground stations introduces delays due to data transmission bottlenecks and communication windows. Using segmentation models capable of near-real-time data analysis onboard satellites can therefore improve response times. This study presents a proof-of-concept using MobileSAM, a lightweight, pre-trained segmentation model, onboard Unibap iX10-100 satellite hardware. We demonstrate the segmentation of water bodies from Sentinel-2 satellite imagery and integrate MobileSAM with PASEOS, an open-source Python module that simulates satellite operations. This integration allows us to evaluate MobileSAM's performance under simulated conditions of a satellite constellation. Our research investigates the potential of fine-tuning MobileSAM in a decentralised way onboard multiple satellites in rapid response to a disaster. Our findings show that MobileSAM can be rapidly fine-tuned and benefits from decentralised learning, considering the constraints imposed by the simulated orbital environment. We observe improvements in segmentation performance with minimal training data and fast fine-tuning when satellites frequently communicate model updates. This study contributes to the field of onboard AI by emphasising the benefits of decentralised learning and fine-tuning pre-trained models for rapid response scenarios. Our work builds on recent related research at a critical time; as extreme weather events increase in frequency and magnitude, rapid response with onboard data analysis is essential.
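The decentralised fine-tuning idea above amounts to satellites merging their locally fine-tuned weights whenever a communication window opens (PASEOS simulates those windows in the paper). The sketch below shows that exchange as a simple pairwise parameter average; the sat_a/sat_b wrappers and the contact flag are hypothetical placeholders, not the study's code.

```python
# Illustrative decentralised model exchange between two satellites in contact.
import torch

def average_models(state_a: dict, state_b: dict) -> dict:
    # Pairwise parameter averaging (a decentralised FedAvg step).
    return {k: (state_a[k] + state_b[k]) / 2 for k in state_a}

def maybe_exchange(sat_a, sat_b, in_contact: bool) -> None:
    # `in_contact` would come from the orbital simulation (e.g. a PASEOS
    # line-of-sight / communication-window check); here it is just a flag.
    if not in_contact:
        return
    merged = average_models(sat_a.model.state_dict(), sat_b.model.state_dict())
    sat_a.model.load_state_dict(merged)
    sat_b.model.load_state_dict(merged)
```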


Photon: Federated LLM Pre-Training

November 2024 · 25 Reads

Scaling large language models (LLMs) demands extensive data and computing resources, which are traditionally constrained to data centers by the high-bandwidth requirements of distributed training. Low-bandwidth methods like federated learning (FL) could enable collaborative training of larger models across weakly-connected GPUs if they can effectively be used for pre-training. To achieve this, we introduce Photon, the first complete system for federated end-to-end LLM training, leveraging cross-silo FL for global-scale training with minimal communication overheads. Using Photon, we train the first federated family of decoder-only LLMs from scratch. We show that: (1) Photon can train model sizes up to 7B in a federated fashion while reaching an even better perplexity than centralized pre-training; (2) Photon model training time decreases with available compute, achieving a similar compute-time trade-off to centralized training; and (3) Photon outperforms the wall-time of baseline distributed training methods by 35% while communicating 64x-512x less. Our proposal is robust to data heterogeneity and converges twice as fast as previous methods like DiLoCo. This surprising data efficiency stems from a unique approach combining small client batch sizes with extremely high learning rates, enabled by federated averaging's robustness to hyperparameters. Photon thus represents the first economical system for global internet-wide LLM pre-training.
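A very rough sketch of the cross-silo recipe the abstract outlines: each silo runs many local steps with a small batch and a high learning rate, then the server averages the resulting models. The HF-style model(**batch).loss interface, the optimizer choice, and all hyperparameter values below are assumptions for illustration; this is not Photon's implementation.

```python
# Hypothetical cross-silo round: local training followed by federated averaging.
import torch

def local_train(model, data_iter, steps=500, lr=3e-3):
    # Many local steps with a small batch and an aggressive learning rate,
    # before any communication happens (values are illustrative).
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        batch = next(data_iter)
        loss = model(**batch).loss       # assumes an HF-style causal-LM interface
        loss.backward()
        opt.step()
        opt.zero_grad()
    return {k: v.detach().clone() for k, v in model.state_dict().items()}

def fedavg(client_states):
    # One communication round: uniform average of the client models.
    keys = client_states[0].keys()
    return {k: sum(s[k] for s in client_states) / len(client_states) for k in keys}
```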


Citations (44)


... We delved into the "End-to-end data-driven weather prediction" work of Allen et al. [7] because their Aardvark Weather end-to-end machine learning model replaces all functions of Numerical Weather Prediction (NWP) pipelines. According to [7], the model achieves performance comparable to standard operational NWP frameworks at reduced cost and increased speed. ...

Reference:

Hyper-Localized Weather Forecasting System
End-to-end data-driven weather prediction
  • Citing Article
  • March 2025

Nature

... LLM development is undergoing rapid improvement in the field of code generation [3], [19]. Models have shown promising ability to capture both the syntax and semantics of a language, with popular models such as CodeLlama [5] being able to produce high quality code. ...

Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis
  • Citing Conference Paper
  • March 2025

... Given the tremendous advances in edge devices, recent work proposes the integration of Flower and Nvidia FLARE [8]: A comprehensive FL framework designed for production. The authors of [9] motivate the integration from different perspectives. For example, FLARE supports several communication protocols, decentralized topologies, and parallel job execution. ...

Supercharging Federated Learning with Flower and NVIDIA FLARE
  • Citing Chapter
  • March 2025

... While large language models have made remarkable progress in terms of performance and are likely to perform well in data center environments, their adoption may be limited by concerns over cost-effectiveness. In such scenarios, smaller language models, with their reduced computational requirements and reliance on less training data, could represent a practical and efficient option for addressing specific tasks [34]. Additionally, our study did not demonstrate whether multimodal data is superior to single-modal data in predicting patient outcomes [35]. ...

Small Language Models: Survey, Measurements, and Insights

... The rapid growth of end-user AI applications, such as real-time image recognition and generative AI, has led to high data and processing demands that often exceed device capabilities. Edge AI addresses these challenges by offloading computation to the network's edge, where hardware-accelerated AI processing can occur [1]. This approach is integral to AI and RAN, a key component of future 6G networks as outlined by the AI-RAN Alliance. ...

The Future of Consumer Edge-AI Computing
  • Citing Article
  • July 2024

IEEE Pervasive Computing

... The recent Lookup Table (LUT)-based model architecture [1], [4], [32], [50], [55] introduces a novel computing paradigm for neural networks. It employs Vector Quantization to leverage the semantic similarity of feature maps (activations). ...

PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
  • Citing Article
  • April 2024

ACM Transactions on Reconfigurable Technology and Systems

... The former typically predicts blur kernels which are further used to guide the SR process. Some advanced techniques have been adopted for kernel estimation, including FKP [28], based on a normalizing flow-based kernel prior, and MetaKernelGAN [26], through meta-learning-enhanced kernel estimation adaptation. The DE-based methods [33,42,50,54] learn a degradation representation from the degraded image, with notable examples such as DASR [50], CDFormer [34] and DiffTSR [66]. ...

Meta-Learned Kernel For Blind Super-Resolution Kernel Estimation
  • Citing Conference Paper
  • January 2024

... This approach allows for a comprehensive investigation of our approach by scaling it across distributed clients with Non Independently Identically Distributed (non-IID) audio and visual data. It is pertinent to mention here that our work focuses on a more general cross-device FL (where each training iteration trains a random fraction of clients from the pool of clients) [23,27], rather than personalized FL (that maintains a global model for each client [2]) and cross-silo FL (where all the clients train simultaneously [40]). ...

L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning
  • Citing Conference Paper
  • October 2023

... Real-time monitoring by the hardware monitor ensures that the scheduler can formulate optimal scheduling plans, optimizing the processing of subgraphs in the task queues for efficient execution. Once all subgraph tasks are completed, the entire model inference task is finished, and the system returns the results to the application. Through the collaborative efforts of these components, the ADMS can efficiently handle dynamic mobile workload changes while maximizing resource utilization [27], [28]. ...

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads
  • Citing Conference Paper
  • December 2023

... Face detection methods started using neural networks for automatic feature extraction and classification with the introduction of deep learning. For instance, CascadeCNN Venieris, et al. [21] presented a cascade structure made up of three meticulously crafted deep convolutional networks that made coarse-to-fine predictions about landmark and facial positions. Similarly, MTCNN Khan, et al. [22] employed a cascading architecture that enabled joint alignment of facial landmarks and detection of facial positions. ...

Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation
  • Citing Article
  • August 2023

ACM Transactions on Design Automation of Electronic Systems