Preprint, not peer-reviewed. Posted: 4 November 2024. doi: 10.20944/preprints202411.0156.v1. Licensed under a Creative Commons CC BY 4.0 license.
Article
Hardware Design and Verification with Large
Language Models: A Literature Survey, Challenges,
and Open Issues
Meisam Abdollahi 1,*, S. Faegheh Yeganli 2, Mohammad (Amir) Baharloo 3, Amirali Baniasadi 1
1Electrical and Computer Engineering Department, University of Victoria; amiralib@uvic.ca
2School of Engineering Science, Simon Fraser University; faegheh_yeganli@sfu.ca
3Electrical and Computer Science Department, University of Victoria; amirbaharloo@uvic.ca
*Correspondence: meisam@uvic.ca; Tel.: +1-(250)-661-2672(BC, Canada)
Abstract: Large Language Models (LLMs) are emerging as promising tools in hardware design and
verification, with recent advancements suggesting they could fundamentally reshape conventional
practices. In this survey, we analyze over 54 research papers to assess the current role of LLMs
in enhancing automation, optimization, and innovation within hardware design and verification
workflows. Our review highlights LLM applications across synthesis, simulation, and formal
verification, emphasizing their potential to streamline development processes while upholding high
standards of accuracy and performance. We identify critical challenges, such as scalability, model
interpretability, and the alignment of LLMs with domain-specific languages and methodologies.
Furthermore, we discuss open issues, including the necessity for tailored model fine-tuning,
integration with existing Electronic Design Automation (EDA) tools, and effective handling of
complex data structures typical of hardware projects. This survey not only consolidates existing
knowledge but also outlines prospective research directions, underscoring the transformative role
LLMs could play in the future of hardware design and verification.
Keywords: Large Language Model; Hardware Design; Hardware Verification; Hardware Accelerator;
Debugging; Hardware Security; Hardware/Software Codesign
1. Introduction
1.1. Introduction to LLMs
Large Language Models (LLMs) such as OpenAI's GPT-4, Google's Gemini, Google's Bidirectional Encoder Representations from Transformers (BERT), and Denoising Autoencoder from Transformer (BART), a transformer-based model introduced by Facebook, are at the forefront of Artificial Intelligence (AI) research, revolutionizing how machines understand and generate human language. These models process extensive datasets covering a wide spectrum of human discourse, enabling them to perform complex tasks including translation, summarization, conversation, and creative content generation [1–5]. Recent advancements in this field, driven by innovative model
architectures, refined training methodologies, and expanded data processing capabilities, have
significantly enhanced the ability of these models to deliver nuanced and contextually relevant outputs.
This evolution reflects a growing sophistication in AI’s approach to Natural Language Processing (NLP),
positioning LLMs as crucial tools in both academic research and practical applications, transforming
interactions between humans and machines [5–7].
1 GPT-4 (ChatGPT): https://chatgpt.com
2 Gemini: https://gemini.google.com/app
3 BERT: https://github.com/google-research/bert
4 BART: https://huggingface.co/docs/transformers/model_doc/bart
This evolution can be traced back to early statistical language models like n-gram models, which
simply predicted word sequences based on the frequencies of previous sequences observed in a dataset.
Although these models provided a foundational approach for text prediction, their limited ability
to perceive broader contextual cues restricted their application to basic tasks [8–12]. The advent of
neural network-based models, especially Recurrent Neural Networks (RNNs), represented a significant advancement, offering the ability to retain information over longer text sequences and thus to manage more complex dialogues and text structures [13–15]. Despite these gains, however, RNNs
continued to struggle with scalability and long-term dependency issues, leading to the creation of
Transformer models. These models introduced an innovative self-attention mechanism, allowing
simultaneous processing of different sentence segments to enhance relevance and contextuality in
text interpretation. This breakthrough underpins modern LLMs, which are pre-trained on extensive
web-text data and subsequently fine-tuned for specific tasks, enabling them to generate nuanced,
stylistically diverse, and seemingly authentic human text [16–21].
Moreover, with the advent of highly sophisticated models, LLMs have become an indispensable
domain for both academic research and practical applications. These models necessitate thorough
evaluations to fully understand their potential risks and impacts, both at task-specific and societal levels.
In recent years, significant efforts have been invested in assessing LLMs from multiple perspectives,
enhancing their applicability and effectiveness. The adaptability and deep comprehension abilities
of LLMs have led to their extensive deployment across numerous AI domains. They are utilized
not only in fundamental NLP tasks but also in complex scenarios involving autonomous agents and
multimodal systems that integrate textual data with other data forms [22–39]. The utility of LLMs spans several domains, including healthcare [40–45], education [46–50], law [51–58], finance [59–63],
and sciences [64–69], where they substantially improve data analysis and decision-making processes.
This wide-ranging application underscores the transformative impact of LLMs on both technological
innovation and societal functions.
In particular, within domains like hardware design and verification, LLMs enhance productivity
and innovation by automating and optimizing various stages of the design process. These models
can assist engineers in generating design specifications, suggesting improvements, and even creating
initial design drafts. By leveraging vast amounts of data and advanced algorithms, LLMs can identify
patterns and propose design optimizations that might not be immediately apparent to human designers.
This capability helps in reducing time-to-market and ensuring that hardware designs are both efficient
and innovative [70].
In hardware design, verification is a critical step in the lifecycle of hardware development.
Verification ensures that the hardware performs as intended and meets all specified requirements
before going into production. Traditionally, this process has been time-consuming and prone to
human error. LLMs can automate much of the verification process by generating test cases, simulating
hardware behavior, and identifying potential faults or discrepancies. They can analyze a large amount
of verification data to predict potential issues and provide solutions, thus enhancing the reliability and
accuracy of the hardware verification process. This not only speeds up the verification process, but
also ensures a higher quality of the final product [71].
In addition to automation, LLMs facilitate better communication and collaboration among
hardware design and verification teams. By providing a common platform where designers, engineers,
and verification experts can interact with the model, LLMs help in bridging the gap between different
teams. This collaborative approach ensures that all aspects of the hardware design and verification are
aligned and that any issues are identified and addressed early in the process. Furthermore, LLMs can
serve as knowledge repositories, offering solutions based on previous designs and verifications, thus
ensuring that best practices are followed and past mistakes are not repeated [72].
Another significant benefit of LLMs in hardware design and verification is their ability to handle
complex and high-dimensional data. Modern hardware designs are increasingly complex with
numerous components and interdependencies. LLMs can manage this complexity by analyzing
and processing large datasets to extract meaningful insights. They can model intricate relationships
between different hardware components and predict how changes in one part of the design could
impact the overall system. This holistic understanding is crucial for creating robust and reliable
hardware systems [73].
In conclusion, the integration of LLMs in hardware design and verification not only fosters innovation but also ensures the development of cutting-edge hardware technologies [74,75]. This survey aims to
explore the transformative role of LLMs in this domain, highlighting key contributions, addressing
challenges, and discussing open issues that continue to shape this dynamic landscape. The goal is to
provide a comprehensive overview that not only informs, but also inspires continued research and
application of LLMs to improve hardware design and verification processes. We hope that this study
will contribute to a better understanding and use of LLMs. In summary, the contributions of this paper
can be summarized as follows:
• Identification of Core Applications: We detail the fundamental ways in which LLMs are currently applied in hardware design, debugging, and verification, providing a solid foundation to understand their impact.
• Analysis of Challenges: This paper presents a critical analysis of the inherent challenges in applying LLMs to hardware design, such as data scarcity, the need for specialized training, and integration with existing tools.
• Future Directions and Open Issues: We outline potential future applications of LLMs in hardware design and verification and discuss methodological improvements to bridge the identified gaps.
The remainder of the paper is organized as follows. Section 2 reviews the literature on the application of LLMs in hardware design and verification. Section 3 discusses the challenges associated with training and adapting LLMs for specific hardware design tasks. Section 4 explores open issues in the field and proposes areas for further investigation. Finally, Section 5 presents the conclusion of the study. The organization of the paper is depicted in Figure 1.
Figure 1. Section organization of the review paper
1.2. A Brief History of LLMs
The evolution of LLMs represents a crucial aspect of the broader development in AI. This
progression begins with the earliest models and extends through to the sophisticated systems that
today significantly influence computational linguistics and AI applications. A very brief history of
LLMs is shown in Figure 2.
Initially, language processing relied on rule-based systems established in the mid-20th century, which were limited by strict linguistic rules and struggled to adapt to the variability of natural language. These systems, while foundational, offered limited utility for complex language tasks due to their inability to capture nuanced linguistic patterns [76]. The transition to statistical models like n-grams and Hidden Markov Models (HMMs) [77] in the late 20th century marked a pivotal enhancement. These models
introduced a statistical approach to language processing, utilizing probabilities derived from large text
corpora to predict language patterns. This shift allowed better handling of larger datasets, significantly
improving real-world language processing capabilities. Despite these advancements, these models
continued to struggle with deep contextual and semantic understanding, which later developments in
algorithmic technology aimed to address [8–12].
By the early 2000s, the integration of advanced neural networks, specifically RNNs [13–15] and Long Short-Term Memory (LSTM) networks [78], brought about substantial improvements in modeling
sequential data. Additionally, the emergence of word embedding technologies like Word2Vec and
GloVe advanced LLM capabilities by mapping words into dense vector spaces, capturing complex
semantic and syntactic relationships more effectively. Despite these innovations, the increasing
complexity of neural networks raised new challenges, particularly in model interpretability and the
computational resources required [14,79,80].
The early 2010s marked another significant advancement with the introduction of deep learning-based neural language models, notably the Recurrent Neural Network Language Model (RNNLM) in 2010, designed to effectively capture textual dependencies [81,82]. This development improved the generation of text that was more natural and contextually informed. However, these models also faced limitations such as restricted memory capacity and extensive training demands [83].
In 2016, Google's Neural Machine Translation (GNMT) model marked a breakthrough, utilizing deep learning to significantly enhance machine translation, moving away from traditional rule-based and
statistical techniques towards a more robust neural approach. This development not only improved
translation accuracy but also addressed complex NLP challenges with greater efficacy [84,85].
Figure 2. Brief history of language models [5].
A major breakthrough occurred in 2017 with the development of the Transformer model,
which abandoned the sequential processing limitations of previous models in favor of self-attention
mechanisms [16,17,86]. This innovation allowed for the parallel processing of words, drastically
increasing efficiency and enhancing the model’s ability to manage long-range dependencies. The
Transformer architecture facilitated the creation of more sophisticated models such as BERT, which
utilized bidirectional processing to achieve a deep understanding of text context, greatly improving
performance across a multitude of NLP tasks [18,19,87]. Following BERT, models such as RoBERTa, T5,
and DistilBERT have been tailored to meet the diverse requirements of various domains, illustrating
the adaptability and expansiveness of LLM applications [88].
Subsequently, the introduction of OpenAI’s GPT series further pushed the boundaries of
what LLMs could achieve. Starting with GPT-1 and evolving through GPT-3, these models
demonstrated exceptional capabilities in generating coherent and contextually relevant text across
various applications. GPT-3, in particular, with its 175 billion parameters, showcased the potential of LLMs to perform complex language tasks such as translation, question answering, and creative writing with minimal task-specific tuning. The advent of GPT-4 further broadened these capabilities by
incorporating multimodal applications that process both text and images, thus significantly expanding
the scope of LLMs. Recent developments, including enhancements in GPT-4 and the introduction of
innovative models such as DALL-E 3, have continued this trend, emphasizing efficiency in fine-tuning,
and enhancing capabilities in creative AI fields, demonstrating the versatility and depth of current
models [19,21,87,89–92].
This progression from statistical models to today’s advanced, multimodal, and domain-specific
systems illustrates the dynamic and ongoing nature of LLMs development. These continual
innovations not only advance the technology but also significantly impact the fields of AI and
computational linguistics. Innovations such as sparse attention mechanisms, more efficient training
algorithms, and the use of specialized hardware like Graphics Processing Units (GPUs) and Tensor
Processing Units (TPUs) have enabled researchers to build increasingly larger and more powerful
models. Moreover, efforts to improve model interpretability, reduce bias, and ensure ethical use are
increasingly becoming central to the field.
In summary, the history of LLMs is a story of rapid progress driven by breakthroughs in Machine
Learning (ML) and neural network architectures. From early statistical models to the transformative
impact of the Transformer architecture and the rise of models like GPT-4, LLMs have evolved
dramatically, reshaping our understanding of language and AI. As research continues, LLMs are
poised to become even more integral to technological innovation and human-computer interaction in
the years to come.
1.3. Survey Papers on the Application of LLMs in Different Areas
Due to the success of LLMs on various tasks and their increasing integration into AI research,
examining the extensive survey literature on these models is essential. A significant number of
surveys [5,93–103] provide detailed insights into the application of LLMs across different fields, demonstrating their advancements and wide-ranging uses. By analyzing these surveys, primarily published between 2022 and 2024, we aim to gain a deeper understanding of LLM applications and evaluate their potential impact across various sectors.
In this context, Huang et al. [93] present a comprehensive survey of reasoning in
LLMs, focusing on the current methodologies and techniques to enhance and evaluate reasoning
capabilities in these models. The paper provides an in-depth review of various reasoning types,
including deductive, inductive, and abductive reasoning, and discusses how these reasoning forms
can be applied in LLMs. It also explores key methods used to elicit reasoning, such as fully supervised
fine-tuning, prompting, and techniques like "chain of thought" prompting, which encourages models to
generate reasoning steps explicitly. The authors review benchmarks and evaluation methods to assess
reasoning abilities and analyze recent findings in this rapidly evolving field. Despite the advancements
in reasoning with LLMs, the paper points out the limitations of current models, emphasizing that it
remains unclear whether LLMs truly possess reasoning abilities or are merely following heuristics.
Huang et al. conclude by offering insights into future directions, suggesting that better benchmarks and
more robust reasoning techniques are needed to push the boundaries of LLMs’ reasoning capabilities.
This survey serves as an essential resource for researchers looking to understand the nuances of
reasoning in LLMs and guide future research in this critical area.
In the study by Xi et al. [94], the authors comprehensively survey the rise and potential of
LLM-based agents. The paper traces the evolution of AI agents, with a particular focus on LLMs
as foundational components. It explores the conceptual framework of LLM-based agents, which
consists of three main parts: brain, perception, and action. The survey discusses how LLMs,
particularly transformer models, have been leveraged to enhance various agent capabilities, such as
knowledge processing, reasoning, and decision-making, allowing them to interact effectively with
their environments. Furthermore, the paper delves into the real-world applications of LLM-based
agents across different sectors, including single-agent and multi-agent systems, as well as human-agent
cooperation scenarios. It also highlights how LLM-based agents can exhibit behaviors akin to social
phenomena when placed in societies of multiple agents. In addition, the study examines the ethical,
security, and trustworthiness challenges posed by these agents, stressing the need for robust evaluation
frameworks to ensure their responsible deployment. Finally, the authors present future research
directions, particularly around scaling LLM-based agents, improving their capabilities in real-world
settings, and addressing open problems related to their generalization and adaptability.
In [95], the authors present a detailed review of LLMs' evolution, applications, and challenges.
The paper highlights the architecture and training methods of LLMs, particularly focusing on
transformer-based models, and emphasizes their significant contributions across a range of sectors,
including medicine, education, finance, and engineering. It also explores both the potential and
limitations of LLMs, addressing ethical concerns such as biases, the need for vast computational
resources, and issues of model interpretability. Furthermore, the survey delves into emerging trends,
including efforts to improve model robustness and fairness, while anticipating future directions for
research and development in the field. This comprehensive analysis serves as a valuable resource
for researchers and practitioners, offering insights into the current state and future prospects of LLM
technologies.
Naveed et al. [96] provide a comprehensive overview of LLMs, focusing on their architectural
design, training methodologies, and diverse applications across various domains. The paper delves
deeply into transformer models and their role in advancing NLP tasks. It also highlights the challenges
associated with LLM deployment, including ethical concerns, computational resource demands, and
the complexity of training these models. Additionally, the survey explores the impact of LLMs on
different sectors such as healthcare, engineering, and social sciences, and identifies potential research
directions for the future. This review serves as a key resource for researchers and practitioners looking
to understand the current landscape of LLM development and deployment.
Fan et al. [97] present a comprehensive bibliometric analysis of over 5,000 publications on LLMs
spanning from 2017 to 2023. This study aims to provide a detailed map of the progression and trends in
LLM research, offering valuable insights for researchers, practitioners, and policymakers. The analysis
delves into key developments in LLM algorithms and explores their applications across a wide range
of fields, including NLP, medicine, engineering, and the social sciences. Additionally, the paper reveals
the dynamic and fast-paced evolution of LLM research, highlighting the core algorithms that have
driven advancements and examining how LLMs have been applied in diverse domains. By tracing
these developments, the study underscores the substantial impact LLMs have had on both scientific
research and technological innovation and provides a roadmap for future research in the field.
The study by Zhao et al. [5] offers an extensive survey of the evolution and impact of LLMs
within AI and NLP. It traces the development from early statistical and neural language models to
modern pre-trained language models (PLMs) with vast parameter sets. The paper highlights the
unique capabilities that emerge as LLMs scale, such as in-context learning and instruction-following,
which distinguish them from smaller models. A significant portion of the survey is dedicated to the
contributions of LLMs, including their role in advancing AI applications like ChatGPT. Organized
around four key areas—pre-training, adaptation tuning, utilization, and capacity evaluation—the
study offers a comprehensive analysis of current evaluation techniques and benchmarks, while also
identifying future research directions for enhancing LLMs and exploring their full potential.
In the study by Raiaan et al. [98], the authors conduct a comprehensive review of LLMs, focusing
on their architecture, particularly transformer-based models, and their role in advancing NLP tasks
such as text generation, translation, and question answering. The paper explores the historical
development of LLMs, beginning with early neural network-based models, and examines the evolution
of architectures like transformers, which have significantly enhanced the capabilities of LLMs. It
discusses key aspects such as training methods, datasets, and the implementation of LLMs across
various domains including healthcare, education, and business.
In another study, the survey on LLMs by Minaee et al. [99] provides an insightful analysis of the rise and development of LLMs, focusing on key models like GPT, LLaMA, and PaLM. The paper
offers a comprehensive analysis of their architectures, training methodologies, and the scaling laws
that underpin their performance in natural language tasks. Additionally, the survey examines key
advancements in LLM development techniques, evaluates commonly used training datasets, and
compares the effectiveness of different models through benchmark testing. Importantly, the study
explores the emergent abilities of LLMs—such as in-context learning and multi-step reasoning—that
differentiate them from smaller models, while also addressing real-world applications, current
limitations, and potential future research directions.
In [100], the authors provide an in-depth analysis of the methodologies and technological
advancements in the training and inference phases of LLMs. The paper explores various aspects
of LLM development, including data preprocessing, model architecture, pre-training tasks, and
fine-tuning strategies. Additionally, it covers the deployment of LLMs, with a particular emphasis
on cost-efficient training, model compression, and the optimization of computational resources. The
review concludes by discussing future trends and potential developments in LLM technology, making
it a valuable resource for understanding the current and future landscape of LLM research and
deployment.
Cui et al. [101] present a comprehensive survey on the role of Multimodal Large Language
Models (MLLMs) in advancing autonomous driving technologies. The paper systematically explores
the evolution and integration of LLMs with vision foundation models, focusing on their potential
to enhance perception, decision-making, and control in autonomous vehicles. It reviews current
methodologies and real-world applications of MLLMs in the context of autonomous driving, including
insights from the 1st WACV Workshop on Large Language and Vision Models for Autonomous
Driving (LLVM-AD). The study highlights emerging research trends, key challenges, and innovative
approaches to improving autonomous driving systems through MLLM technology, emphasizing the
importance of multimodal learning for the future of autonomous vehicles. Additionally, it stresses
the need for further research to address critical issues like safety, data processing, and real-time
decision-making in the deployment of these models.
Chang et al. [102] thoroughly explore the essential practices for assessing the performance and
applicability of LLMs. It systematically reviews evaluation methodologies focusing on what aspects of
LLMs to evaluate, where these evaluations should occur, and the best practices on how to conduct
them. The paper explores evaluations across various domains including NLP, reasoning, medical
applications, ethics, and education, among others. It also highlights successful and unsuccessful case
studies in LLM applications, providing a critical insight into future challenges that might arise in LLM
evaluation and stressing the need for a discipline-specific approach to effectively support the ongoing
development of these models.
Kachris [103] provides a comprehensive analysis of the various hardware solutions designed to
optimize the performance and efficiency of LLMs. The paper explores a wide variety of accelerators,
including GPUs, FPGAs, and Application-Specific Integrated Circuits (ASICs), providing a detailed
discussion of their architectures, performance, and energy efficiency metrics. It focuses on the
significant computational demands of LLMs, particularly in both training and inference, and evaluates
how these accelerators help meet these demands. The survey also highlights the trade-offs in
performance and energy consumption, making it a valuable resource for those seeking to optimize
hardware solutions for LLM deployment in data centers and edge computing.
Table 1 provides a comparison between various review papers, categorizing them based on
critical features such as LLM models, APIs, datasets, domain-specific LLMs, ML-based comparisons,
taxonomies, architectures, performance, hardware specifications for testing and training, and
configurations.
Table 1. Comparison of LLM Review Papers

Paper | LLMs Model | LLMs API | LLMs Dataset | Domain LLMs | Taxonomy | LLMs Architecture | LLMs Configurations | ML Comparisons | Performance | Hardware Specification | Scope | Key Findings | Methodology and Approach
Huang et al. [93] | ✓ | x | ✓ | x | ✓ | ✓ | ✓ | ✓ | ✓ | x | LLM reasoning abilities | Explores LLMs' reasoning abilities and evaluation methodologies | Reasoning-focused review
Xi et al. [94] | ✓ | x | ✓ | ✓ | x | ✓ | x | x | ✓ | x | LLM-based AI agents for multiple domains | Highlights potential for LLMs as general-purpose agents | Agent-centric analysis
Hadi et al. [95] | ✓ | x | ✓ | x | ✓ | ✓ | ✓ | x | ✓ | ✓ | Comprehensive review of LLMs, applications, and challenges | Highlights potential of LLMs in various domains, discusses challenges and limitations | Literature review and analysis
Naveed et al. [96] | ✓ | x | ✓ | x | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Overview of LLM architectures and performance | Challenges and advancements in LLM training, architectural innovations, and emergent abilities | Comparative review of models and training methods
Fan et al. [97] | ✓ | x | ✓ | x | ✓ | ✓ | x | x | x | x | Bibliometric review of LLM research (2017-2023) | Tracks research trends, collaboration networks, and the evolution of LLM research | Bibliometric analysis using topic modeling and citation networks
Zhao et al. [5] | ✓ | ✓ | ✓ | x | ✓ | ✓ | ✓ | ✓ | ✓ | x | Comprehensive survey of LLM models, taxonomy | Detailed analysis of LLMs' evolution, taxonomy, emergent abilities, adaptation, and evaluation | Thorough review, structured methodology, and various benchmarks
Raiaan et al. [98] | ✓ | x | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Comprehensive review of LLM architectures, applications, and challenges | Discusses LLM development, applications in various domains, and societal impact | Extensive literature review with comparisons and analysis of open issues
Minaee et al. [99] | ✓ | x | ✓ | x | ✓ | ✓ | ✓ | ✓ | ✓ | x | Comprehensive survey of LLM architectures, datasets, and performance | Comprehensive review of LLM architectures, datasets, and evaluations | Comprehensive survey and analysis
Liu et al. [100] | ✓ | x | ✓ | x | ✓ | ✓ | ✓ | x | ✓ | ✓ | Training and inference in LLMs | Cost-efficient training and inference techniques are crucial for LLM development | Comprehensive review of training techniques and inference optimizations
Cui et al. [101] | ✓ | x | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | MLLMs for autonomous driving with extensive dataset coverage | Explores the potential of MLLMs in autonomous vehicle systems | Survey focusing on perception, planning, and control
Chang et al. [102] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Comprehensive evaluation of LLMs across multiple domains and tasks | Details LLM evaluation protocols, benchmarks, and task categories | Survey of evaluation methods for LLMs
Kachris et al. [103] | ✓ | x | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Hardware solutions for accelerating LLMs | Energy efficiency improvements through hardware | Survey on hardware accelerators for LLMs
Table 2. Tasks in hardware design and verification which can be done by LLMs

Design
• HDL Code Generation: Automatically generate Verilog, VHDL, or SystemC code from high-level design descriptions or specifications.
• Design Specification Translation: Convert natural language specifications into formal design requirements or constraints.
• Design Optimization Suggestions: Provide recommendations for optimizing design parameters such as power, performance, and area.
• Component Selection: Suggest suitable components based on design requirements and existing libraries.
• Documentation Generation: Create detailed design documentation, including block diagrams, interface definitions, and data sheets.
• Design Space Exploration: Propose and evaluate different design alternatives based on specified criteria.
• IP Core Integration: Automate the integration of IP cores into larger systems, including interface matching and configuration.

Verification
• Test Bench Generation: Automatically generate test benches, including stimulus and expected results, from high-level test plans.
• Test Case Generation: Create individual test cases based on design specifications and verification requirements.
• Bug Detection and Suggestion: Analyze simulation logs and error reports to identify potential bugs and suggest debugging steps.
• Assertion Generation: Generate assertions for formal verification to ensure the correctness of design behavior.
• Coverage Analysis: Analyze coverage reports to identify untested areas and suggest additional tests.
• Regression Test Management: Automate the organization, execution, and analysis of regression test suites.
• Simulation Script Generation: Create scripts for running simulations with different configurations and scenarios.

Collaborative and Supportive Tasks
• Code Review Assistance: Provide automated feedback on HDL code quality, compliance with coding standards, and potential issues.
• Documentation Summarization: Summarize lengthy documentation and highlight key points for quicker understanding.
• Training Material Creation: Generate tutorials, guides, and FAQs for training new team members on tools and processes.
• Knowledge Base Maintenance: Organize and maintain a knowledge base of best practices, common issues, and solutions.
• Natural Language Queries: Answer queries in natural language about design specifications, verification results, and other relevant topics.

Design and Verification Workflow Automation
• Requirement Traceability: Track design requirements through all stages of development and verification, ensuring all requirements are met.
• Change Impact Analysis: Analyze the impact of design changes on the overall system and suggest necessary verification updates.
• Project Management Support: Assist in tracking project milestones, deadlines, and deliverables related to design and verification.

Advanced Automation
• Design Validation: Validate design correctness against high-level specifications using formal methods and simulation.
• Error Diagnosis: Diagnose errors in simulation results and suggest possible fixes based on historical data.
• Performance Analysis: Perform detailed performance analysis and suggest improvements based on simulation data.
• Automated Synthesis: Guide the synthesis process to optimize the design for specific targets (e.g., low power, high performance).
1.4. How Do LLMs Facilitate Hardware Design and Verification?
By automating repetitive tasks, providing intelligent suggestions, and facilitating better
communication and documentation, LLMs significantly improve the efficiency and effectiveness
of hardware design and verification processes. They enable engineers to focus on higher-level
problem-solving and innovation, thereby accelerating the development cycle and improving the
quality of hardware products. A comprehensive list of these tasks is provided in Table 2.
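To make one of these tasks concrete, consider the Assertion Generation entry in Table 2. Given a natural-language requirement such as "the grant output must never be asserted while reset is active," an LLM might be prompted to produce a concurrent SystemVerilog assertion along the lines of the sketch below. The module, signal names, and property here are hypothetical illustrations written for this survey, not output reproduced from any surveyed tool or paper.

// Hypothetical requirement: "grant must never be asserted while reset is active."
// A concurrent SystemVerilog assertion of the kind an LLM could generate:
module arbiter_props (
    input logic clk,
    input logic rst_n,   // active-low reset
    input logic grant
);
    // While reset is active (rst_n == 0), grant must remain deasserted.
    property p_no_grant_in_reset;
        @(posedge clk) !rst_n |-> !grant;
    endproperty

    assert property (p_no_grant_in_reset)
        else $error("grant asserted while reset is active");
endmodule

In practice, such machine-generated assertions still require human review against the specification before they are trusted in a verification flow.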
2. Literature Review
2.1. Overview of LLMs in Hardware Design
LLMs have become a transformative tool in the field of hardware design and verification, bringing
significant advancements in efficiency and accuracy. These models, powered by sophisticated AI
and NLP capabilities, can analyze and interpret vast amounts of documentation, code, and design
specifications, which accelerate the initial phases of hardware design. Using LLMs, engineers can
automate the generation of design documents, ensuring consistency and reducing human error. This
automation not only speeds up the design process but also enables the exploration of more complex
and innovative designs, as the model can provide insights and suggestions based on a wide array of
previous designs and industry standards.
In the realm of hardware verification, LLMs play a crucial role in improving the robustness and
reliability of hardware systems. Verification is a critical step that ensures that the designed hardware
functions correctly under all specified conditions. LLMs can help generate comprehensive test cases,
identifying potential edge cases that could be overlooked by human designers. In addition, they can
analyze the results of these tests more efficiently, highlighting discrepancies and providing detailed
diagnostics that can pinpoint the root causes of failures. This capability significantly reduces the
time and resources required for verification, allowing quicker iterations and more reliable hardware
products. As a result, the integration of LLMs into hardware design and verification workflows is
increasingly essential to maintain a competitive advantage in the fast-paced tech industry.
These studies collectively illustrate the transformative potential of LLMs in hardware design
and verification, offering new methodologies that enhance efficiency, accuracy, and innovation in the
field. As technology continues to evolve, further research and development will likely uncover even
more applications and benefits, solidifying the role of LLMs as a crucial tool in modern hardware
engineering.
Figure 3. Different Categories of LLMs for Hardware Design and Verification
2.2. Different Categories of LLMs for Hardware Design and Verification
To the best of our knowledge, all articles in the literature can be categorized into six categories,
as depicted in Figure 3. It should be noted that, although some papers could belong to two or more
categories due to their approaches, we have decided to assign them to the category that most closely
aligns with the majority of their content. Initially, we evaluated 62 articles. After the first round of
review, 54 articles were selected for our survey. Hardware Design and Hardware Security were the
most prominent categories, representing 48% and 18% of the articles, respectively. In the following
sections, each category will be discussed in detail together with all surveyed papers.
Figure 4. Input prompt for an HDL design
Figure 7. Copilot output for the input prompt of the HDL design
2.2.1. Hardware Design
Hardware design using HDLs is evolving with the integration of LLMs such as OpenAI’s ChatGPT,
Google Gemini, and Microsoft Copilot. These LLMs assist in modifying and generating HDL code by
accepting design specifications as input prompts, streamlining the development process. In
Figure 4,
we demonstrate an HDL design for a shift register, where the input prompt describes the desired
behavior and adjustments. The results of this input prompt processed by different LLMs are presented
in Figures 5–7, showing the outputs of ChatGPT 4o, Gemini, and Copilot, respectively. These figures
illustrate the varying approaches taken by each model to interpret and modify the HDL design.
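For orientation, the sketch below shows a minimal parameterizable shift register of the kind such a prompt might target. It is an illustrative Verilog example written under assumed specifications (serial input, parallel output, synchronous active-high reset, shift enable); it is not the exact design or any of the LLM outputs shown in Figures 4-7, whose prompts describe the desired behavior and adjustments in natural language.

// Illustrative serial-in, parallel-out shift register (assumed specification).
module shift_register #(
    parameter WIDTH = 8                   // register width, assumed 8 bits
) (
    input  wire             clk,
    input  wire             rst,          // synchronous, active-high reset
    input  wire             en,           // shift enable
    input  wire             din,          // serial data input
    output reg  [WIDTH-1:0] dout          // parallel output
);
    always @(posedge clk) begin
        if (rst)
            dout <= {WIDTH{1'b0}};                 // clear all bits
        else if (en)
            dout <= {dout[WIDTH-2:0], din};        // shift toward MSB; new bit enters at LSB
    end
endmodule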
This paper [104] presents a comprehensive framework for evaluating the performance of hardware
setups used in the inference of LLMs. The framework aims to measure critical performance metrics
such as latency, throughput, energy efficiency, and resource utilization to provide a detailed assessment
of hardware capabilities. By standardizing these measurements, the framework facilitates consistent
and comparable evaluations across different hardware platforms, enabling researchers and engineers
to identify the most efficient configurations for LLM inference. The proposed framework addresses
the growing complexity and computational demands of LLMs by offering a robust tool for hardware
benchmarking. It integrates various testing scenarios to capture the diverse workloads LLMs handle
during inference. This allows for a nuanced understanding of how different hardware components
contribute to overall performance. Ultimately, the framework aims to guide the development of more
efficient hardware solutions tailored to the specific needs of LLMs, promoting advancements in both
hardware design and LLM deployment strategies.
This paper [105] investigates the application of LLMs in optimizing and designing VHDL (VHSIC Hardware Description Language) code, a critical aspect of digital circuit design. The study
explores how LLMs can automate the generation of efficient VHDL code, potentially reducing the time
and effort required in the design process. Through a series of experiments, the authors demonstrate
the capability of LLMs to provide high-quality code suggestions and optimizations, which can enhance
the performance and reliability of digital circuits. The paper also discusses the challenges associated
with integrating LLMs into the VHDL design workflow. It highlights issues such as the need for
domain-specific training data and the importance of understanding the context and constraints of
hardware design. By addressing these challenges, the study provides insights into how LLMs can
be effectively utilized to support and streamline VHDL code development, paving the way for more
automated and intelligent design processes in digital electronics.
AutoChip [106] introduces a novel method to automate the generation of HDL code by using
feedback from LLMs. The approach involves an iterative process where LLMs provide suggestions
and improvements on initial HDL code drafts, leading to refined and optimized final versions. This
method significantly reduces the need for manual intervention, making the design process more
efficient and accessible, especially for complex hardware projects. The paper outlines the technical
details of implementing AutoChip, including training LLMs on domain-specific datasets and the
integration of feedback loops into the design workflow. Case studies demonstrate the effectiveness
of AutoChip in producing high-quality HDL code with minimal human oversight. By automating
routine and complex coding tasks, AutoChip has the potential to revolutionize the field of hardware
design, enabling faster prototyping and more innovative solutions in hardware development.
This study [107] focuses on benchmarking various LLMs to evaluate their performance in
generating Verilog Register Transfer Level (RTL) code. The paper compares the accuracy, efficiency,
and complexity handling capabilities of different models, providing a comprehensive assessment of
their suitability for automated hardware design tasks. By establishing standardized benchmarks, the
research offers valuable insight into the strengths and limitations of each model in the context of Verilog RTL code generation. The findings highlight significant differences in model performance,
emphasizing the importance of selecting the right LLM for specific hardware design tasks. The paper
also discusses the potential for further improving LLMs by incorporating more specialized training
data and refining model architectures. Through detailed comparisons and practical examples, this
study contributes to the ongoing effort to enhance the role of LLMs in automating and optimizing the
hardware design process.
Chip-Chat [108] explores the emerging field of using conversational AI, particularly LLMs, in
hardware design. The paper discusses the potential benefits of natural language interfaces, such as
increased accessibility and collaboration, by enabling designers to interact with design tools through
simple conversational commands. This approach could democratize hardware design, making it
more accessible to non-experts and fostering innovation through diverse contributions. However,
the paper also highlights significant challenges in this domain, including the current limitations of
LLMs in understanding complex hardware design concepts and the need for precise and unambiguous
communication in technical contexts. The authors propose potential solutions, such as improving
model training with domain-specific data and developing more sophisticated interaction protocols.
By addressing these challenges, the paper aims to pave the way for more effective integration of
conversational AI in hardware design, potentially transforming how engineers and designers approach
complex projects.
ChipGPT [109] examines the current state of using LLMs for natural language-based hardware
design, assessing their capabilities and identifying existing gaps. The paper evaluates various LLMs
in terms of their ability to understand and generate hardware design code from natural language
descriptions. It highlights the potential of these models to streamline the design process by enabling
more intuitive and accessible interactions between designers and design tools. Despite the promising
potential, the paper identifies several challenges that need to be addressed to achieve seamless natural
language hardware design. These include improving the models’ understanding of technical jargon
and design constraints, enhancing the accuracy and efficiency of code generation, and ensuring robust
handling of complex design scenarios. By outlining these challenges and suggesting areas for future
research, the study provides a roadmap for advancing the integration of NLP in hardware design
workflows.
This paper [110] explores the application of LLMs to detect code segments that can benefit
from hardware acceleration. The study presents techniques for identifying computation-intensive
parts of code that, when offloaded to specialized hardware, can significantly improve overall system
performance. The authors demonstrate the effectiveness of LLMs in pinpointing these critical segments
and providing recommendations for hardware acceleration. The research also covers the practical implementation of this approach, discussing how LLMs can be integrated into existing development
workflows to automatically suggest and optimize code for hardware acceleration. The paper highlights
the potential performance gains and efficiency improvements achievable through this method, making
a strong case for the use of LLMs to optimize software for better hardware utilization. By leveraging
LLMs for this purpose, developers can achieve more efficient and powerful computing solutions.
CreativEval [111] introduces a novel approach to evaluating the creativity of hardware design
code generated by LLMs. The paper proposes specific metrics and benchmarks to assess the originality,
efficiency, and practicality of the generated code, emphasizing the importance of creativity in hardware
design. By focusing on these aspects, the study aims to determine how well LLMs can innovate
within the constraints of hardware development. The findings suggest that while LLMs are capable of
producing creative solutions, there are limitations to their ability to fully replicate human ingenuity in
hardware design. The paper discusses the potential for improving LLMs through more diverse and
comprehensive training data, as well as refining the evaluation metrics to better capture the nuances of
creative design. Through this evaluation framework, CreativEval contributes to the understanding of
LLM capabilities in generating novel and effective hardware design solutions.
Designing Silicon Brains using LLM [112] explores the use of ChatGPT, a type of LLM, for
designing a spiking neuron array, which is a type of neuromorphic hardware that mimics the behavior
of biological neurons. The authors demonstrate how ChatGPT can generate detailed and accurate
descriptions of the neuron array, including its architecture and functionality, thereby aiding in the
design process. This approach leverages the model’s ability to understand and articulate complex
technical concepts in natural language. The study showcases the potential of using LLMs for designing
advanced neuromorphic systems, highlighting the benefits of automated description and specification
generation. By providing a structured and comprehensive design output, ChatGPT can significantly
streamline the development process of spiking neuron arrays. The paper also discusses the challenges
and future directions for integrating LLMs into neuromorphic hardware design, emphasizing the
need for further refinement and domain-specific training to enhance the accuracy and utility of the
generated descriptions.
Digital ASIC Design with Ongoing LLMs [113] explores the application of ongoing LLMs in
the design of ASICs. The paper provides an overview of current methodologies and strategies for
incorporating LLMs into various stages of the ASIC design process, from initial specification to final
implementation. It highlights the potential of LLMs to automate routine tasks, enhance design accuracy,
and reduce development time. The authors also discuss the prospects and challenges of using LLMs
in digital ASIC design, including the need for specialized training data, the integration of LLMs into
existing design workflows, and the importance of maintaining design integrity and performance. By
addressing these issues, the study offers valuable insights into the future of ASIC design, suggesting
that LLMs could play a significant role in advancing the field through increased automation and
intelligent design support.
This paper [114] reviews recent advancements in the development of efficient algorithms and
hardware architectures specifically tailored for NLP tasks. The authors discuss various techniques for
optimizing NLP algorithms to improve performance and reduce computational requirements. These
optimizations are crucial for handling large-scale data and complex models typical of modern NLP
applications, including LLMs. The study also explores the design of specialized hardware that can
support efficient NLP. This includes discussing hardware accelerators, such as GPUs and TPUs, and
their role in enhancing the performance of NLP tasks. By combining algorithmic improvements with
hardware advancements, the paper outlines a comprehensive approach to achieving high efficiency
and scalability in NLP applications. This integrated approach is essential for meeting the growing
demands of NLP and leveraging the full potential of LLMs.
GPT4AIGChip [115] investigates the use of LLMs to automate the design of AI accelerators. The
paper presents methodologies for leveraging LLMs to generate design specifications and optimize
the architecture of AI accelerators. By automating these processes, the study aims to streamline the
development of specialized hardware for AI tasks, reducing both the time and costs associated with
traditional design methods. The authors provide detailed case studies demonstrating the effectiveness
of LLMs in producing high-quality design outputs for AI accelerators. They highlight the potential
for LLMs to not only accelerate the design process but also to innovate and improve upon existing
architectures. The paper discusses future directions for this research, including the integration of more
advanced LLMs and the development of more sophisticated automation tools. By advancing the use
of LLMs in hardware design, GPT4AIGChip contributes to the ongoing evolution of AI hardware
development.
Hardware Phi-1.5B [116] introduces an LLM trained specifically on hardware-related data,
demonstrating its capability to encode domain-specific knowledge. The paper discusses the model’s
architecture and training process, emphasizing the importance of specialized datasets for achieving
high performance in hardware design tasks. The authors showcase various applications of Hardware
Phi-1.5B, including code generation, optimization, and troubleshooting in hardware development.
The study highlights the advantages of using domain-specific LLMs over general-purpose models,
particularly in terms of accuracy and relevance of the generated content. By tailoring the model to the
hardware domain, Hardware Phi-1.5B is able to provide more precise and contextually appropriate
outputs, which can significantly enhance the efficiency and effectiveness of hardware design processes.
The paper concludes with a discussion of future research directions and potential improvements to
further leverage domain-specific LLMs in hardware engineering.
Hardware-Aware Transformers (HAT) [117] introduces a novel approach to designing LLMs
that consider hardware constraints during model training and inference. The paper details the
development of transformers that are optimized for specific hardware configurations, aiming to
improve performance and reduce resource consumption. This hardware-aware design is particularly
important for deploying NLP tasks on various platforms, including edge devices and specialized
accelerators. The study presents experimental results demonstrating the efficiency gains achieved
by HAT models compared to traditional transformers. The authors highlight significant reductions
in latency and energy usage, making these models more suitable for real-world applications where
resource constraints are a critical factor. By focusing on the co-design of hardware and software, HAT
offers a promising solution for enhancing the performance and scalability of NLP tasks in diverse
deployment environments.
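To make the underlying idea concrete, the following sketch illustrates a latency-constrained sub-network search of the kind HAT performs. The functions estimate_latency_ms and estimate_quality are hypothetical stand-ins for HAT's learned latency predictor and accuracy proxy, and the search space is illustrative, so this is a sketch of the search loop rather than the paper's implementation.

import random

# Illustrative search space over transformer sub-network dimensions (assumed values).
SEARCH_SPACE = {
    "num_layers": [3, 4, 5, 6],
    "embed_dim": [256, 384, 512],
    "ffn_dim": [512, 1024, 2048],
    "num_heads": [2, 4, 8],
}

def sample_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def estimate_latency_ms(cfg):
    # Hypothetical analytical proxy; HAT instead trains a predictor on measured latencies.
    return 0.02 * cfg["num_layers"] * (cfg["embed_dim"] + cfg["ffn_dim"]) / cfg["num_heads"]

def estimate_quality(cfg):
    # Hypothetical quality proxy; larger sub-networks score higher in this toy example.
    return cfg["num_layers"] * cfg["embed_dim"] + 0.5 * cfg["ffn_dim"]

def search(latency_budget_ms, trials=200):
    best, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = sample_config()
        if estimate_latency_ms(cfg) > latency_budget_ms:
            continue  # infeasible on the target hardware, discard
        score = estimate_quality(cfg)
        if score > best_score:
            best, best_score = cfg, score
    return best

if __name__ == "__main__":
    print(search(latency_budget_ms=40.0))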
This paper [118] proposes a post-processing technique to improve the quality of hardware design
code generated by LLMs. The approach involves applying search algorithms to refine and optimize
the initial outputs from LLMs, ensuring higher quality and more reliable code generation. The authors
detail the implementation of this technique and provide experimental results demonstrating its
effectiveness in enhancing code quality. The study highlights the limitations of current LLM-generated
code, such as inaccuracies and inefficiencies, and shows how post-LLM search can address these
issues. By iteratively refining the code, the proposed method can significantly improve the final output,
making it more suitable for practical hardware design applications. This approach underscores the
potential for combining LLM capabilities with additional optimization techniques to achieve superior
results in automated code generation.
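The refinement loop can be sketched as a simple search over model outputs guided by an external scorer. In the sketch below, generate_candidates and score_design are hypothetical stand-ins for an LLM API call and a lint- or simulation-based quality metric; the cited work's actual search strategy may differ.

from typing import Callable, List

def post_llm_search(spec: str,
                    generate_candidates: Callable[[str, int], List[str]],
                    score_design: Callable[[str], float],
                    rounds: int = 3,
                    samples_per_round: int = 5) -> str:
    """Greedy search over LLM outputs guided by an external quality score."""
    prompt = spec
    best_code, best_score = "", float("-inf")
    for _ in range(rounds):
        for candidate in generate_candidates(prompt, samples_per_round):
            score = score_design(candidate)
            if score > best_score:
                best_code, best_score = candidate, score
        # Re-prompt with the current best candidate so the next round refines it.
        prompt = f"{spec}\n\nImprove the following draft:\n{best_code}"
    return best_code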
OliVe [119] introduces a quantization technique designed to accelerate LLMs by focusing on
hardware-friendly implementations. The concept of outlier-victim pair quantization is presented as a
method to reduce the computational load and improve inference speed on hardware platforms. This
technique targets specific outliers in the data, which typically require more resources, and optimizes
their representation to enhance overall model efficiency. The paper provides a detailed analysis of
the quantization process and its impact on model performance. Experimental results demonstrate
significant improvements in inference speed and resource utilization without compromising the
accuracy of the LLMs. By making LLMs more hardware-friendly, OliVe offers a practical solution for
deploying these models in environments with limited computational resources, such as mobile devices
and edge computing platforms.
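The pairing intuition can be illustrated with a small, simplified sketch: adjacent values are processed in pairs, and when one element of a pair is an outlier its neighbour (the "victim") is sacrificed so the outlier can be kept at higher precision, while ordinary pairs are quantized to a narrow integer grid. The threshold and bit widths below are illustrative and do not reproduce OliVe's exact hardware encoding.

import numpy as np

def olive_style_quantize(x, outlier_threshold=3.0, bits=4):
    x = np.asarray(x, dtype=np.float32)
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).mean() / qmax + 1e-8
    out = np.empty_like(x)
    for i in range(0, len(x) - 1, 2):
        a, b = x[i], x[i + 1]
        if abs(a) > outlier_threshold * scale * qmax or abs(b) > outlier_threshold * scale * qmax:
            # Keep the outlier (nearly) intact, sacrifice the smaller "victim" value.
            if abs(a) >= abs(b):
                out[i], out[i + 1] = a, 0.0
            else:
                out[i], out[i + 1] = 0.0, b
        else:
            # Normal pair: uniform low-bit quantization.
            out[i] = np.clip(np.round(a / scale), -qmax, qmax) * scale
            out[i + 1] = np.clip(np.round(b / scale), -qmax, qmax) * scale
    if len(x) % 2:  # handle a trailing unpaired element
        out[-1] = np.clip(np.round(x[-1] / scale), -qmax, qmax) * scale
    return out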
RTLCoder [120] presents a specialized model designed to outperform GPT-3.5 in generating RTL
code. The paper highlights the use of an open-source dataset and a lightweight model architecture
to achieve superior results in RTL code generation tasks. The authors provide a comprehensive
comparison of RTLCoder’s performance against GPT-3.5, demonstrating significant improvements
in accuracy, efficiency, and code quality. The study also discusses the advantages of using a focused
dataset and a streamlined model for specific applications, such as hardware design. By tailoring the
model to the unique requirements of RTL code generation, RTLCoder can provide more relevant and
high-quality outputs, reducing the need for extensive manual corrections. This approach underscores
the potential for developing specialized models that can outperform general-purpose LLMs in
targeted tasks.
RTLLM [121] introduces an open-source benchmark specifically designed to evaluate the
performance of LLMs in generating RTL code. The benchmark provides a standardized set of tasks and
metrics to facilitate consistent and comprehensive assessments of LLM capabilities in RTL generation.
By offering a common evaluation framework, RTLLM aims to drive improvements and innovation in
the use of LLMs for hardware design. The paper details the creation of the benchmark, including the
selection of tasks, the development of evaluation criteria, and the compilation of relevant datasets. The
authors present initial results from using RTLLM to evaluate various LLMs, highlighting strengths
and areas for improvement. By providing an open-source tool for benchmarking, RTLLM encourages
collaboration and transparency in the development and assessment of LLMs for RTL code generation,
fostering advancements in this emerging field.
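A benchmark of this kind is typically driven by a small evaluation harness; the sketch below shows one plausible shape for such a harness, where generate_rtl and passes_tests are hypothetical stand-ins for an LLM call and the benchmark's syntax and functional checks, and the reported metric is a simple pass rate.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BenchmarkTask:
    name: str
    spec: str          # natural-language design description
    testbench: str     # reference testbench used for functional checking

def evaluate(tasks: List[BenchmarkTask],
             generate_rtl: Callable[[str], str],
             passes_tests: Callable[[str, str], bool]) -> float:
    """Return the fraction of tasks whose generated RTL passes the reference tests."""
    passed = 0
    for task in tasks:
        rtl = generate_rtl(task.spec)
        if passes_tests(rtl, task.testbench):
            passed += 1
    return passed / len(tasks) if tasks else 0.0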
This paper [122] discusses a co-design approach to selecting features for language models, balancing software and hardware requirements to achieve optimal performance. The authors highlight
the importance of considering both aspects in the development of LLMs, as hardware constraints can
significantly impact model efficiency and scalability. By integrating software and hardware design,
the study aims to create more efficient and effective LLMs. The paper presents case studies and
experimental results demonstrating the benefits of the co-design approach. These examples show how
tailored features can enhance model performance on specific hardware platforms, reducing latency
and resource consumption. The authors argue that this integrated approach is essential for developing
LLMs that can meet the growing demands of real-world applications, offering a path forward for more
sustainable and scalable NLP solutions.
SlowLLM [123] investigates the feasibility and performance of running LLMs on consumer-grade
hardware. The paper addresses the challenges and limitations of deploying LLMs on less powerful
devices, such as personal computers and smartphones, which often lack the computational resources
of specialized hardware. The authors propose solutions to enhance performance, such as model
compression and optimization techniques tailored for consumer hardware. The study provides
experimental results showcasing the performance of various LLMs on consumer devices, highlighting
both successes and areas for improvement. The findings demonstrate that, while there are significant
challenges, it is possible to achieve acceptable performance levels with appropriate optimizations.
By exploring these possibilities, SlowLLM contributes to making advanced NLP capabilities more
accessible to a broader audience, potentially expanding the applications and impact of LLMs in
everyday technology use.
SpecLLM [124] investigates the use of LLMs for generating and reviewing Very-Large Scale
Integration (VLSI) design specifications. The paper evaluates the ability of LLMs to produce accurate
and comprehensive specifications, which are crucial for the development of complex integrated
circuits. The authors present methods for training LLMs on domain-specific data to enhance their
understanding and performance in VLSI design tasks. The study provides experimental results
demonstrating the effectiveness of LLMs in generating VLSI design specifications, highlighting their
potential to streamline the design process and reduce errors. The authors discuss the challenges
and future directions for improving the integration of LLMs in VLSI design, including the need for
more sophisticated training techniques and better handling of technical jargon. By exploring these
possibilities, SpecLLM contributes to the ongoing efforts to enhance the role of LLMs in the field of
hardware design.
ZipLM [125] introduces a structured pruning technique designed to improve the inference
efficiency of LLMs. The paper details the development of pruning methods that selectively remove
less important components of the model, reducing its size and computational requirements without
significantly impacting performance. This inference-aware approach ensures that the pruned models
remain effective for their intended tasks while benefiting from enhanced efficiency. The study presents
experimental results demonstrating the effectiveness of ZipLM in reducing model size and improving
inference speed. The authors highlight the potential applications of this technique in environments
with limited computational resources, such as edge devices and mobile platforms. By focusing on
structured pruning, ZipLM offers a practical solution for deploying LLMs more efficiently, enabling
broader accessibility and application of these powerful models in various real-world scenarios.
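Structured pruning removes whole architectural units rather than individual weights; the following simplified sketch drops entire attention heads ranked by a weight-norm score. ZipLM's actual criterion is inference-aware and loss-guided, so this only illustrates the structural, whole-unit nature of the removal.

import numpy as np

def prune_heads(head_weights, keep_ratio=0.75):
    """head_weights: list of per-head weight matrices; returns indices of kept heads."""
    scores = np.array([np.linalg.norm(w) for w in head_weights])
    num_keep = max(1, int(round(keep_ratio * len(head_weights))))
    kept = np.argsort(scores)[-num_keep:]          # keep the highest-scoring heads
    return sorted(kept.tolist())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heads = [rng.normal(size=(64, 512)) for _ in range(8)]
    print(prune_heads(heads, keep_ratio=0.5))      # indices of the 4 retained heads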
This paper [126] explores the application of advanced language models (ALMs), such as GPT-3.5
and GPT-4, in the realm of electronic hardware design, particularly focusing on Verilog programming.
Verilog is an HDL used for designing and modeling digital systems. The study introduces the VeriPPA
framework, which utilizes ALMs to generate and refine Verilog code. This framework incorporates a
two-stage refinement process to enhance both the syntactic and functional correctness of the generated
code and to align it with key performance metrics—Power, Performance, and Area (PPA). This iterative
approach involves leveraging diagnostic feedback from simulators to identify and correct errors
systematically, akin to human problem-solving techniques. The methodology begins with ALMs
generating initial Verilog code, which is then refined through the VeriRectify process. This process uses
error diagnostics from simulators to guide the correction of syntactic and functional issues, ensuring
the generated code meets specific correctness criteria. Following this, the code undergoes a PPA
optimization stage where its power consumption, performance, and area efficiency are evaluated and
further refined if necessary. This dual-stage approach significantly improves the quality of Verilog
code, achieving an 81.37% linguistic accuracy and a 62.0% operational efficacy in programming
synthesis, surpassing existing techniques. The study highlights the potential of ALMs in automating
and improving the hardware design process, making it more accessible and efficient for those with
limited expertise in chip design.
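The two-stage flow can be summarized in a short sketch, where llm, simulate, and report_ppa are hypothetical stand-ins for the language model, a simulator that returns error diagnostics, and a synthesis-based PPA estimator; the prompts and loop bounds are illustrative rather than the paper's exact procedure.

def refine_verilog(spec, llm, simulate, report_ppa, ppa_budget, max_iters=5):
    code = llm(f"Write Verilog for: {spec}")
    # Stage 1: correctness refinement driven by simulator diagnostics.
    for _ in range(max_iters):
        errors = simulate(code)          # empty list means the testbench passed
        if not errors:
            break
        code = llm(f"Fix these errors in the Verilog below:\n{errors}\n\n{code}")
    # Stage 2: PPA refinement driven by synthesis estimates.
    for _ in range(max_iters):
        ppa = report_ppa(code)           # e.g. {'power': ..., 'delay': ..., 'area': ...}
        if all(ppa[k] <= ppa_budget[k] for k in ppa_budget):
            break
        code = llm(f"Optimize this Verilog to meet {ppa_budget}; current PPA {ppa}:\n\n{code}")
    return code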
Table 3. Comparison of papers focusing on Hardware Design with LLMs

Scope and Focus:
- Optimizing hardware specifically for LLM inference and performance [104,115,117,127]
- Generation and optimization of hardware design code using LLMs [105,106,113,124]
- Exploring broader challenges and opportunities in conversational and natural language-based hardware design [108,109]
- Code detection for acceleration and quantization techniques [110,119]

Methodologies:
- Benchmarking and Evaluation: benchmarking and evaluating LLM performance in hardware-related tasks [104,107,121,128]
- Automated Generation: methodologies for automating HDL generation and design specification using LLM feedback [106,120,124]
- Optimization Techniques: exploring specific optimization techniques such as hardware-aware transformers and structured pruning [110,117,125,127]

Innovative Contributions:
- Creativity and Originality: evaluating the creativity of LLM-generated hardware code [111,128]
- Neuromorphic Hardware: designing neuromorphic hardware (spiking neuron arrays) using LLMs, an innovative application in the field [112]
- Consumer Hardware Feasibility: investigating the feasibility of running LLMs on consumer-grade hardware, addressing practical deployment challenges [123,127]

Application Areas:
- AI Accelerators: automation of AI accelerator design, reflecting the growing importance of specialized hardware for AI tasks [115,120,127]
- VLSI Design: VLSI design specifications, an area critical for complex integrated circuit design [124]
- General Hardware Design: various aspects of hardware design and integration with LLMs [109,113,114,128]

Performance and Efficiency:
- Making LLMs more efficient and hardware-friendly, addressing the computational and resource challenges associated with large models [117,119,127]
- Frameworks and techniques to enhance inference performance, crucial for deploying LLMs in real-world applications [104,125,128]
This paper [127] reviews and proposes various strategies to accelerate and optimize LLMs,
addressing the computational and memory challenges associated with their deployment. It
covers algorithmic improvements, including early exiting and parallel decoding, and introduces
hardware-specific optimizations through LLM-hardware co-design. The paper presents frameworks
like Medusa for parallel decoding, achieving speedups up to 2.8x, and SnapKV for memory efficiency
improvements. It also explores High Level Synthesis (HLS) applications with frameworks like
ScaleHLS and HIDA, which convert LLM architectures into hardware accelerators. These advances
improve LLM performance in real-time applications, such as NLP and EDA, while reducing energy
consumption and improving efficiency.
From English to ASIC [128] explores the application of LLMs in automating hardware design
using HDLs like Verilog for ASICs. The authors focus on improving the precision of LLM-generated
HDL code by fine-tuning Mistral 7B and addressing challenges such as syntax errors and the scarcity
of quality training datasets. By creating a labeled Verilog dataset and applying advanced optimization
techniques such as LoRA and DeepSpeed ZeRO, the paper shows significant improvements in code
generation accuracy, with up to a 20% increase in pass@1 metrics. The paper’s contributions include
optimizing memory usage and inference speed, making LLMs more practical for EDA in hardware
development. Table 3 provides an overview of all the papers discussed in the hardware design
subsection for comparison.
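For readers unfamiliar with the parameter-efficient fine-tuning techniques mentioned above, the following sketch shows a typical LoRA setup using the Hugging Face transformers and peft libraries; the hyperparameters, target modules, and checkpoint name are illustrative assumptions rather than the paper's exact recipe.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"          # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=16,                                   # low-rank update dimension (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()          # only the small LoRA adapters are trainable
# The adapted model would then be trained on prompt/Verilog pairs with a standard
# causal-language-modeling objective (e.g., via transformers' Trainer).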
2.2.2. Hardware/Software Codesign
This paper [129] explores co-design strategies to improve training speed and scalability of
deep learning recommendation models. The authors emphasize the integration of software and
hardware design to achieve significant performance gains, addressing the computational intensity and
resource demands of training large models. The study presents various techniques for optimizing
algorithms and hardware architectures, ensuring efficient utilization of resources. The paper showcases
experimental results demonstrating the effectiveness of co-design strategies in accelerating model
training. These results highlight improvements in training times and scalability, making it feasible
to handle larger datasets and more complex models. By focusing on the co-design approach, the
study provides valuable insights into achieving faster and more scalable training processes, which are
essential for the ongoing advancement of deep learning recommendation systems.
This paper [130] explores the integration of software and hardware design principles to optimize
the performance of LLMs. It focuses on the co-design approach, which synchronizes the development
of software algorithms and hardware architecture to achieve efficient processing and better utilization
of resources. This synergy is essential in managing the computational demands of LLMs, which require
significant processing power and memory bandwidth. The authors discuss various strategies for
optimizing both hardware (such as specialized accelerators and memory hierarchies) and software
(such as algorithmic improvements and efficient coding practices) to enhance the performance and
efficiency of LLMs. Furthermore, the paper delves into the application of this co-design methodology
in design verification processes. Design verification, a critical phase in the development of digital
systems, benefits from the enhanced capabilities of co-designed LLMs. By leveraging optimized
LLMs, verification tools can process complex datasets and simulations more effectively, leading to
more accurate and faster verification results. The integration of co-designed LLMs into verification
workflows helps in identifying design flaws early, reducing the time and cost associated with the design
and development of hardware systems. The paper highlights case studies and experimental results
that demonstrate the practical benefits and improvements achieved through the software/hardware
co-design approach in real-world verification scenarios.
This paper [131] investigates the potential of leveraging LLMs in the co-design process of software
and hardware, specifically for designing Compute-in-Memory (CiM) Deep Neural Network (DNN)
accelerators. It explores how LLMs can be utilized to enhance the co-design process by providing
advanced capabilities in generating design solutions and optimizing both software and hardware
components concurrently. The authors emphasize the importance of LLMs in automating and
improving the design workflow, leading to more efficient and effective development of CiM DNN
accelerators. The paper presents a detailed case study demonstrating the application of LLMs in the
co-design of CiM DNN accelerators. Through this case study, the authors illustrate how LLMs can
aid in identifying optimal design configurations and addressing complex design challenges. The
study shows that LLMs can significantly reduce the time and effort required for design iterations
and verification, thereby accelerating the overall development process. The findings suggest that
integrating LLMs into the co-design framework can result in substantial performance gains and
resource savings, highlighting the viability and benefits of using LLMs in the co-design of advanced
hardware accelerators.
2.2.3. Hardware Accelerators
This paper [132] presents a comprehensive dataset specifically designed to facilitate the generation
of AI accelerators driven by LLMs. The dataset includes a diverse range of hardware design
benchmarks, synthesis results, and performance metrics. The goal is to provide a robust foundation
for training LLMs to understand and optimize hardware accelerators effectively. By offering detailed
annotations and a variety of design scenarios, the dataset aims to enhance the ability of LLMs to
generate efficient and optimized hardware designs. The authors detail the structure of the dataset,
which covers various aspects of hardware accelerator design, including computational kernels, memory
hierarchies, and interconnect architectures. They also discuss the potential applications of the dataset
in training LLMs for tasks such as design space exploration, performance prediction, and design
optimization. The paper demonstrates the utility of the dataset through several case studies, showing
how LLMs can leverage the provided data to generate and optimize hardware accelerators with
significant improvements in performance and efficiency.
This study [110] explores the use of LLMs to detect and optimize code for hardware acceleration.
The authors propose a methodology where LLMs analyze software codebases to identify sections that
can benefit from hardware acceleration. The LLMs are trained to recognize patterns and structures
within the code that are amenable to acceleration, such as loops and parallelizable tasks. The paper
presents an evaluation of this methodology using several open-source projects, demonstrating that
LLMs can effectively identify and suggest optimizations for hardware acceleration. The authors also
highlight the potential for integrating this approach into existing development workflows, allowing
for seamless detection and acceleration of critical code sections. The study concludes by discussing
the challenges and future directions for improving the accuracy and applicability of LLM-driven code
detection for hardware acceleration.
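A lightweight version of this detection step can be sketched as a structured prompt to a model, with the reply parsed into candidate regions. The llm callable and the JSON answer format are hypothetical; the cited study trains and evaluates its models far more rigorously.

import json
from typing import Callable, List

PROMPT = (
    "Identify regions of the following code that could benefit from hardware "
    "acceleration (e.g., data-parallel loops, matrix operations). Answer as a JSON "
    "list of objects with fields 'lines' and 'reason'.\n\n{code}"
)

def find_acceleratable_regions(code: str, llm: Callable[[str], str]) -> List[dict]:
    answer = llm(PROMPT.format(code=code))
    try:
        return json.loads(answer)
    except json.JSONDecodeError:
        return []  # fall back to "no suggestions" if the reply is not valid JSON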
Table 4. Comparison of papers focusing on Hardware Accelerators with LLMs

Objective and Focus:
- Developing a comprehensive dataset to support LLM-driven AI accelerator generation [132]
- Detecting code patterns suitable for hardware acceleration using LLMs [110]
- Automating hardware accelerator design using LLMs [133]
- Optimizing batched LLM inferencing with a heterogeneous acceleration approach combining NPUs and PIMs [134]
- Optimizing memory management and reducing compilation times for multi-core AI accelerators targeting large language models using a hybrid SPM-cache architecture [135]

Approach and Methodology:
- Curating a diverse set of hardware design examples and specifications for LLMs [132]
- Training LLMs on a corpus of annotated code examples to detect hardware-accelerable code [110]
- Using LLMs to interpret high-level hardware design specifications and generate accelerators [133]
- Integrating NPUs and PIMs to handle computation-intensive and memory-bound tasks [134]
- Integrating a shared cache with AI cores and employing a TMU for cache management, along with tile-level hardware prefetching and dead block prediction [135]

Evaluation and Results:
- Evaluating the dataset through the performance of LLMs in generating accurate hardware designs; improvements noted [132]
- Measuring the accuracy of LLMs in detecting acceleratable code sections and the resulting performance gains; significant improvements found [110]
- Comparing LLM-generated accelerators with manually designed ones; LLM designs show comparable or superior performance [133]
- Benchmarking against CPU and GPU setups; significant improvements in speed and energy efficiency [134]
- The system outperforms traditional SPM in mixed-precision quantization scenarios [135]

Innovation and Impact:
- First standardized dataset at the intersection of LLMs and hardware accelerator design; potential to advance the field [132]
- Application of LLMs to code optimization for hardware acceleration; automates the optimization process [110]
- Automates the traditionally manual hardware design process, reducing development time and cost [133]
- Combines NPU and PIM technologies to optimize LLM inferencing; addresses computational and memory challenges [134]
- The hybrid SPM-cache architecture introduces novel hardware-level cache management for AI accelerators, especially beneficial for LLMs [135]

Future Directions:
- Expand and diversify the dataset; enhance LLM capabilities for complex tasks [132]
- Develop more sophisticated models, integrate with different hardware platforms, expand the dataset [110]
- Refine models, expand applicability to different accelerators, integrate with design tools [133]
- Refine NPU and PIM integration, explore other heterogeneous configurations, expand to other AI workloads [134]
- Further optimize cache replacement policies and better integrate this architecture into future AI accelerator designs for large-scale AI models [135]
Gen-acceleration [133] focuses on the pioneering efforts to use LLMs for the automatic generation
of hardware accelerators. The paper outlines a novel framework where LLMs are employed to
translate high-level design specifications directly into hardware accelerator designs. The approach
leverages the NLP capabilities of LLMs to understand and interpret complex design requirements
and convert them into efficient hardware architectures. The authors provide a detailed analysis of the
framework, including its architecture, training process, and performance evaluation. They demonstrate
the effectiveness of the approach through multiple case studies, showing that LLMs can generate
hardware accelerators that meet or exceed the performance of manually designed counterparts. The
paper also discusses the potential for this technology to revolutionize the hardware design process,
making it more accessible and efficient.
NeuPIMs [134] introduces a heterogeneous acceleration framework that combines Neural
Processing Units (NPUs) with Processing-In-Memory (PIM) technologies to enhance the performance
of batched LLM inferencing. The paper discusses the architectural innovations that enable this
combination, focusing on how NPUs and PIM can work together to overcome the memory bandwidth
limitations and computational bottlenecks typically associated with LLM inferencing. The authors
provide a comprehensive evaluation of the NeuPIMs framework, highlighting its performance benefits
across various LLM benchmarks. They demonstrate significant improvements in throughput and
energy efficiency compared to traditional GPU-based solutions. The paper also delves into the
technical details of the NPU-PIM integration, including the data flow, memory management, and
synchronization mechanisms that enable efficient batched inferencing.
The paper [135] presents the LCM (LLM-focused Hybrid Scratch-Pad Memory (SPM)-cache)
architecture designed for multi-core AI accelerators to address the growing computational demands
of LLMs. By integrating a hybrid system combining SPM and a shared cache, this architecture
provides shorter compilation times and better memory management, particularly for LLMs using
mixed-precision quantization. The proposed system utilizes a Tensor Management Unit (TMU) for
efficient cache handling and employs innovative hardware prefetching and dead block prediction
strategies to mitigate memory access issues. The system outperforms conventional SPM-based
architectures, showing up to 50.5% performance improvements in specific scenarios. Table 4 provides
an overview of all the papers discussed in the hardware accelerators subsection for comparison.
2.2.4. Hardware Security
DIVAS (Distributed Intelligence for Verification and Security) [136] introduces a comprehensive
framework utilizing LLMs to improve the security of System on Chip (SoC) designs. DIVAS
integrates multiple security analysis tools and LLMs to provide an end-to-end solution for detecting
vulnerabilities and enforcing security policies in SoC designs. The framework automates the
identification of security threats and applies policy-based protection mechanisms, with the aim of
streamlined and fortifying the security analysis process. DIVAS employs LLMs for various tasks such
as vulnerability assessment, anomaly detection, and generating mitigation strategies. The authors
detail the system architecture, including its integration with existing SoC design tools and the use
of LLMs to interpret and analyze complex security data. The experimental results demonstrate the
effectiveness of DIVAS in identifying security vulnerabilities and implementing policies, leading
to a significant improvement in SoC security. The paper concludes with a discussion of potential
improvements, such as the incorporation of real-time monitoring and the refinement of LLMs to handle a
wider range of security scenarios.
This paper [137] addresses the potential pitfalls of relying on LLMs for hardware specification,
particularly in the context of security. The authors argue that, while LLMs can accelerate the
specification process, they often generate specifications that are syntactically correct but semantically
flawed, leading to security vulnerabilities. The paper presents case studies where LLM-generated
specifications resulted in security issues, emphasizing the need for formal methods to verify these
specifications. The authors propose a hybrid approach that combines LLMs with formal verification
techniques to ensure the correctness and security of hardware specifications. They present a framework
that uses LLMs to generate initial specifications, followed by formal verification tools to validate and
correct these specifications. Experimental results show that this approach significantly reduces the
incidence of security flaws compared to using LLMs alone. The paper concludes with a discussion on
the limitations of current LLMs in understanding complex security requirements and the importance
of integrating formal methods to achieve reliable and secure hardware designs.
This paper [138] also explores the use of LLMs to identify and fix security bugs in hardware designs. The authors describe a system where LLMs analyze hardware code to detect potential security
vulnerabilities and suggest fixes. This approach aims to automate the bug-fixing process, reducing the
time and expertise required to secure hardware designs. The paper details the methodology for training
LLMs on hardware security datasets, the types of bugs the system can identify, and the accuracy of the
suggested fixes. Experimental results indicate that LLMs can effectively identify and propose solutions
for a wide range of security bugs, though their effectiveness varies with the complexity of the bugs.
The authors discuss the limitations of LLMs in understanding intricate hardware interactions and
suggest future work to improve the robustness of the models and expand their applicability to more
complex security scenarios.
This paper [139] examines the potential misuse of general-purpose LLMs in designing hardware
Trojans, malicious circuits embedded in hardware designs. The authors demonstrate how LLMs,
typically used for benign purposes, can be repurposed to create sophisticated Trojans that are difficult
to detect. The paper presents a series of experiments where LLMs are used to generate Trojan
designs and assesses their effectiveness and stealthiness. The authors highlight the risks posed by
the accessibility of powerful LLMs and the need for robust detection mechanisms. They propose
countermeasures, including enhanced verification processes and the development of specialized
LLMs trained to recognize and flag suspicious patterns in hardware designs. The paper concludes
by discussing the ethical implications of LLMs in hardware design and the importance of proactive
measures to prevent their misuse in creating security threats.
LLM for SoC Security [140] explores the transformative impact of LLMs on the security of SoC
designs. The paper argues that LLMs represent a significant advancement in the ability to analyze and
secure SoC architectures. The authors describe various applications of LLMs in SoC security, including
vulnerability detection, threat modeling, and automated patch generation. The paper provides detailed
case studies demonstrating the effectiveness of LLMs in enhancing SoC security. Experimental results
show that LLMs can identify vulnerabilities more efficiently and accurately than traditional methods.
The authors discuss the challenges in integrating LLMs with existing security workflows and propose
solutions to address these challenges. The paper concludes by highlighting the potential for LLMs to
redefine SoC security practices and the need for continued research to fully realize their potential.
This paper [141] discusses the dual role of LLMs in chip design: as tools for improving design
efficiency and as potential sources of security risks. The authors examine the capabilities of LLMs
in automating various aspects of chip design, including specification, verification, and optimization.
They also highlight the security risks associated with LLM-generated designs, such as the inadvertent
introduction of vulnerabilities and the potential for malicious use. The authors propose a framework
for building trust in LLM-generated designs by incorporating rigorous security checks and validation
processes. They present case studies where LLMs successfully aided in chip design and instances where
they introduced security risks. The paper concludes with recommendations for developing secure
LLM workflows, including enhanced training protocols and collaboration between LLM developers
and security experts to mitigate risks and ensure the reliability of LLM-assisted chip designs.
This paper [142] investigates the use of LLMs to assist in fixing security bugs in hardware
code. The authors present a system where developers can interactively prompt LLMs to identify and
correct security vulnerabilities in hardware designs. The approach aims to leverage the language
understanding capabilities of LLMs to enhance the efficiency and accuracy of bug fixing. The paper
details the interactive prompting system, the types of security bugs it can address, and the effectiveness
of the LLM-generated fixes. Experimental results show that the system can significantly reduce the
time required to identify and fix security bugs, though the success rate varies with the complexity of
the bugs. The authors discuss the limitations of current LLMs in handling complex hardware security
scenarios and suggest future research directions to improve the system’s robustness and expand its
capabilities.
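The interaction pattern can be sketched as a loop that alternates between a security checker and the model, accepting a candidate fix only once the checker reports no remaining findings. The llm and security_lint callables below are hypothetical stand-ins for the assistant and a CWE-style hardware linter, so this illustrates the workflow rather than the cited system itself.

def interactive_fix(hdl_code, llm, security_lint, max_rounds=4):
    for _ in range(max_rounds):
        findings = security_lint(hdl_code)       # e.g. list of (rule_id, message) tuples
        if not findings:
            return hdl_code                      # no remaining findings: accept the code
        rule_id, message = findings[0]
        prompt = (
            f"The following hardware code triggers security finding {rule_id}: {message}\n"
            f"Rewrite it to remove the weakness without changing its intended behaviour.\n\n"
            f"{hdl_code}"
        )
        hdl_code = llm(prompt)
    return hdl_code                              # best effort after max_rounds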
This paper [143] explores the use of LLMs to generate security assertions for hardware designs.
Security assertions are critical for verifying that hardware operates as intended without vulnerabilities.
The authors propose a framework where LLMs are trained to understand the hardware specifications
and automatically generate the corresponding security assertions. The paper demonstrates the
effectiveness of this approach through multiple case studies, showing that LLM-generated assertions
can identify security issues early in the design process. The authors also discuss the challenges
of training LLMs to understand complex hardware specifications and the need for continuous
improvement in the training datasets. The study concludes that LLMs have significant potential
to enhance the security verification process by providing accurate and automated security assertions.
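A minimal sketch of the spec-to-assertion step is shown below: each natural-language security requirement is converted into a prompt requesting a SystemVerilog Assertion, and the responses are collected for later review. The llm callable is a hypothetical stand-in, and in practice the generated assertions would still be validated by a verification engineer or a formal tool before use.

from typing import Callable, Dict, List

def generate_security_assertions(requirements: List[str],
                                 llm: Callable[[str], str]) -> Dict[str, str]:
    assertions = {}
    for req in requirements:
        prompt = (
            "Write a single SystemVerilog Assertion (SVA) property that checks the "
            f"following hardware security requirement:\n{req}"
        )
        assertions[req] = llm(prompt)
    return assertions

# Example requirement: "The debug unlock signal must never assert while secure_mode is high."
# An acceptable response might resemble:
#   property no_debug_in_secure; @(posedge clk) secure_mode |-> !debug_unlock; endproperty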
Table 5. Comparison of papers focusing on Hardware Security with LLMs

- Providing comprehensive LLM-based frameworks that enhance security analysis in SoC design by automating tasks such as bug fixing, vulnerability detection, and policy generation [136,140,144,145]
- Specific challenges of detecting and fixing bugs in hardware code, as well as the potential misuse of LLMs fo