How Can Large Language Models Help Humans in Design And
Manufacturing?
LIANE MAKATURA, MICHAEL FOSHEY, and BOHAN WANG, MIT, USA
FELIX HÄHNLEIN, University of Washington, USA
PINGCHUAN MA, BOLEI DENG, and MEGAN TJANDRASUWITA, MIT, USA
ANDREW SPIELBERG, Harvard University, USA
CRYSTAL ELAINE OWENS, PETER YICHEN CHEN, and ALLAN ZHAO, MIT, USA
AMY ZHU, University of Washington, USA
WIL J NORTON, EDWARD GU, JOSHUA JACOB, and YIFEI LI, MIT, USA
ADRIANA SCHULZ, University of Washington, USA
WOJCIECH MATUSIK, MIT, USA
The advancement of Large Language Models (LLMs), including GPT-4, provides exciting new opportunities for generative
design. We investigate the application of this tool across the entire design and manufacturing workflow. Specifically, we
scrutinize the utility of LLMs in tasks such as: converting a text-based prompt into a design specification, transforming a
design into manufacturing instructions, producing a design space and design variations, computing the performance of a
design, and searching for designs predicated on performance. Through a series of examples, we highlight both the benefits
and the limitations of the current LLMs. By exposing these limitations, we aspire to catalyze the continued improvement and
progression of these models.
CCS Concepts: • Computing methodologies → Modeling and simulation; Spatial and physical reasoning; • Human-centered computing → Natural language interfaces; Text input.
Additional Key Words and Phrases: Large Language Models, GPT-4, computational design, computational fabrication, CAD,
CAM, design for manufacturing, simulation, inverse design
1 INTRODUCTION
Advances in computational design and manufacturing (CDaM) have already permeated and transformed numerous
industries, including aerospace, architecture, electronics, dental, and digital media, among others. Nevertheless,
the full potential of the CDaM workflow is still limited by a number of barriers, such as the extensive domain-
specific knowledge that is often required to use CDaM software packages or integrate CDaM solutions into
existing workflows. Generative AI tools such as Large Language Models (LLMs) have the potential to remove
these barriers, by expediting the CDaM process and providing an intuitive, unified, and user-friendly interface
that connects each stage of the pipeline. However, to date, generative AI and LLMs have predominantly been
applied to non-engineering domains. In this study, we show how these tools can also be used to develop new
design and manufacturing workows.
Authors’ addresses: Liane Makatura, makatura@mit.edu; Michael Foshey, mfoshey@mit.edu; Bohan Wang, bohanw@mit.edu, MIT, 77
Massachusetts Ave, Cambridge, MA, 02139, USA; Felix Hähnlein, fhahnlei@cs.washington.edu, University of Washington, 1410 NE Campus
Parkway, Seattle, WA, 98195, USA; Pingchuan Ma, pcma@csail.mit.edu; Bolei Deng, boleiden@mit.edu; Megan Tjandrasuwita, megantj@
mit.edu, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, USA; Andrew Spielberg, aespielberg@seas.harvard.edu, Harvard University,
Massachusetts Hall, Cambridge, MA, 02138, USA; Crystal Elaine Owens, crystalo@mit.edu; Peter Yichen Chen, pyc@csail.mit.edu; Allan
Zhao, azhao@csail.mit.edu, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, USA; Amy Zhu, amyzhu@cs.washington.edu, University of
Washington, 1410 NE Campus Parkway, Seattle, WA, 98195, USA; Wil J Norton, wn1024@mit.edu; Edward Gu, egu@mit.edu; Joshua Jacob,
jmjacob@csail.mit.edu; Yifei Li, liyifei@csail.mit.edu, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, USA; Adriana Schulz, adriana@cs.
washington.edu, University of Washington, 1410 NE Campus Parkway, Seattle, WA, 98195, USA; Wojciech Matusik, wojciech@csail.mit.edu,
MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, USA.
arXiv:2307.14377v1 [cs.CL] 25 Jul 2023
Our analysis examines the standard CDaM workflow to identify opportunities for LLM-driven automation or
acceleration. Specifically, we break the CDaM workflow into five phases, and then assess whether and how the
efficiency and quality of each phase could be improved by integrating LLMs. The components under investigation
include (1) generating a design, (2) constructing a design space and design variations, (3) preparing designs for
manufacturing, (4) evaluating a design’s performance, and (5) discovering high-performing designs based on a
given performance and design space.
Although it is feasible to create specialized LLMs for design and manufacturing, we demonstrate the opportunities
offered by generic, pre-trained models. To this end, we conduct all of our experiments using GPT-4 [26]¹, a
state-of-the-art general-purpose LLM. Our GPT-4-augmented CDaM workflows demonstrate how LLMs could be
used to simplify and expedite the design and production of complex objects. Our analysis also showcases how
LLMs can leverage existing solvers, algorithms, tools, and visualizers to synthesize an integrated workflow.
Finally, our work demonstrates current limitations of GPT-4 in the context of design and manufacturing, which
naturally suggests a series of potential improvements for future LLMs and LLM-augmented workflows.
2 BACKGROUND & RELATED WORK
To contextualize our work, we briefly describe the state of the art for generative LLMs and various aspects of
CDaM.
2.1 LLMs for Generative Modeling
Large Language Models (LLMs) have garnered significant interest in the research community and beyond, as
a result of both their already-demonstrated generative capabilities and their seemingly unbounded promise.
Although these models are recognized primarily for their influence on text generation [31], their reach has
been extended to impact various other domains, including image generation [32], music generation [9], motion
generation [16], code generation [6], 3D model creation [19], and robotic control [23]. Notable foundational
models include OpenAI's GPT series, ranging from GPT-2 to the more recent GPT-4 [26]. These models have
showcased progressive improvements in fluency, coherence, and generalization capabilities. Meta AI's LLaMa
model has further extended the reach of LLMs by demonstrating proficiency in both text and image synthesis [36].
The Falcon LLM [29], trained exclusively on properly filtered and deduplicated web data, has exhibited comparable
performance to models trained on meticulously curated datasets. These models have been utilized in conjunction
with Reinforcement Learning from Human Feedback (RLHF) to improve the quality of the generated content [27].
This is done by incorporating human feedback into the training process, where humans rate the quality of
the generated outputs and provide examples of ideal outputs for a given input [7]. In parallel, domain-specific
LLMs have also been trained for performance within a specific subject area. For example, ProtGPT2 specializes in
predicting protein folding structures [14], while Codex has been specifically tailored to understand and generate
code [6]. In this work, we investigate the generative capabilities of generic, pre-trained LLMs within CDaM.
2.2 Computational Design and Manufacturing
The CDaM workflow is often decomposed into a series of steps including (1) representing a design, (2) representing
and exploring a design space, (3) preparing a design for manufacturing, (4) computing the performance of a
design, and (5) finding a design with optimal performance. For each phase, we provide a brief overview of the
relevant work, with a focus on aspects that offer the best opportunities for LLM integration.
Design Representations. The cornerstone of computational design is the capacity to digitally represent and
manipulate the salient aspects of a given design – such as geometry, articulated joints, material composition,
¹We use the OpenAI ChatGPT interface to interact with the GPT-4 versions released between May 24, 2023 and July 19, 2023.
etc. There are many ways to represent such aspects, but we focus on design representations that are
compact, understandable, and editable. For example, modern CAD systems represent a shape as a sequence of
operations such as 2D sketches, extrusions and Boolean operations [38]. These can be represented as compact
programs written in domain-specific languages (DSLs) such as OnShape's FeatureScript [2]. Designs can also
be represented compactly as a graph [30, 40], in which the nodes typically represent individual components,
while edges represent multi-component interactions. Such graphs have been used to efficiently and hierarchically
represent CAD models [10], robots [41], metamaterials [21], architecture [24], and chemical molecules [15]. To
represent even more complex designs – such as a quadcopter with a physical design and a software controller –
multiple DSLs may be used simultaneously. For example, the copter's physical design may be encoded using
CAD, while its software is coded using a control-specific DSL.
Design Space Representations. A design space represents an entire family of designs – rather than a single
instantiation – which allows for design exploration, customization, and performance-driven design optimization.
One of the most popular design space representations is parametric design, in which a few exposed parameters are
used to control a design. This is commonly used in CAD systems, where e.g. a bookshelf may be parametrized by
its height, width, depth, and number of shelves. Another popular option is formal languages such as L-systems [33]
or shape grammars [28, 34], which generate design variations by manipulating a set of terminal and non-terminal
symbols according to given rewrite rules. Formal languages have been used in domains such as architecture [24],
robotics [41], and chemistry [15].
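As a concrete illustration of a parametric design space, the bookshelf example above could be captured by a short program whose exposed parameters generate an entire family of designs. The sketch below is our own illustrative example (the parameter names and panel-layout logic are assumptions, not taken from any particular CAD system):

from dataclasses import dataclass

@dataclass
class BookshelfParams:
    height: float = 72.0      # inches
    width: float = 36.0       # inches
    depth: float = 12.0       # inches
    num_shelves: int = 4      # interior shelves
    thickness: float = 0.75   # panel thickness, inches

def bookshelf_panels(p: BookshelfParams):
    """Return a list of (name, size) tuples describing every panel.
    Each choice of parameters yields a different member of the design family."""
    panels = [
        ("side_left",  (p.thickness, p.depth, p.height)),
        ("side_right", (p.thickness, p.depth, p.height)),
        ("top",        (p.width, p.depth, p.thickness)),
        ("bottom",     (p.width, p.depth, p.thickness)),
    ]
    inner_width = p.width - 2 * p.thickness
    for i in range(p.num_shelves):
        panels.append((f"shelf_{i}", (inner_width, p.depth, p.thickness)))
    return panels

# A design variation is obtained simply by changing the exposed parameters.
print(bookshelf_panels(BookshelfParams(height=48.0, num_shelves=2)))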
Design for Manufacturing. Design for Manufacturing (DfM) is a planning process used to generate designs that
can be fabricated with maximal efficiency and minimal cost. One prominent aspect of this is Computer-Aided
Manufacturing (CAM), which transforms a digital design into a viable fabrication plan for some manufacturing
process, such as 3D printing, 3- or 5-axis CNC milling, or sheet-metal stretching. CAM also extends to multi-process
representations such as STEP-NC, which abstracts away from machine-specific G-code in favor of tool-type-specific
machining operations that are interpretable on different hardware. Because all of these fabrication
plans can also be described as a program in some DSL, CAM can be interpreted as a translation from a design
DSL to a manufacturing-oriented DSL. DfM also includes many other aspects, such as selecting an appropriate
manufacturing method, optimizing manufacturing process parameters [13], sourcing parts and materials, or
modifying a design in light of manufacturing constraints [18].
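To make the "translation between DSLs" framing concrete, the following sketch converts a rectangular outline (a fragment of a design DSL) into elementary G-code moves (a fragment of a manufacturing DSL). It is a deliberately minimal illustration under our own assumptions (a single 2D profile cut at a fixed feed rate), not a production CAM pipeline:

def rectangle_to_gcode(width_mm: float, height_mm: float, feed_mm_per_min: float = 600.0):
    """Translate a rectangle 'design' into a minimal sequence of G-code moves.
    Assumes the tool starts at the origin and cuts a single closed 2D contour."""
    corners = [(0, 0), (width_mm, 0), (width_mm, height_mm), (0, height_mm), (0, 0)]
    lines = ["G21 ; millimeters", "G90 ; absolute coordinates"]
    lines.append(f"G0 X{corners[0][0]:.3f} Y{corners[0][1]:.3f} ; rapid move to start")
    for x, y in corners[1:]:
        lines.append(f"G1 X{x:.3f} Y{y:.3f} F{feed_mm_per_min:.0f} ; cutting move")
    lines.append("M2 ; end of program")
    return "\n".join(lines)

print(rectangle_to_gcode(100.0, 50.0))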
Performance Prediction. Before manufacturing a design, engineers typically want to understand its predicted
performance. For example, automobile engineers may wish to evaluate and iteratively refine a candidate design's
efficiency, safety, and aesthetics. To do this, engineers frequently make use of numerical simulation methods such
as general-purpose finite element analysis (FEA) [11] or more domain-specific approaches for e.g. acoustics [25],
robotics [12], and electromagnetism [35]. Commercial CAD systems (e.g., Autodesk [5] and Dassault Systèmes [8])
integrate simulation into their ecosystem. Since engineers are primarily interested in the performance of the
design's manufactured counterpart, it is crucial to minimize the gap between an object's performance in simulation
versus reality.
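At the simplest end of this spectrum, a closed-form, first-order estimate can stand in for full simulation. As a hedged example of such a check (assuming a shelf modeled as a simply supported beam with a center point load; the symbols and material values are our own choices), the load capacity follows from the bending-stress relation σ = Mc/I:

def shelf_load_capacity(span_mm: float, width_mm: float, thickness_mm: float,
                        allowable_stress_mpa: float) -> float:
    """First-order load capacity (N) of a shelf treated as a simply supported beam
    with a single point load at mid-span.
    sigma = M*c/I, with M = P*L/4, I = b*h^3/12, c = h/2  =>  P = 2*sigma*b*h^2 / (3*L)."""
    b, h, L = width_mm, thickness_mm, span_mm
    return 2.0 * allowable_stress_mpa * b * h**2 / (3.0 * L)  # MPa * mm^2 = N

# Example: 900 mm plywood shelf, 300 mm deep, 18 mm thick, ~10 MPa allowable stress
print(f"{shelf_load_capacity(900, 300, 18, 10.0):.0f} N")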
Performance Optimization. Given a design space and a way to predict performance, it is natural to seek designs
that perform best with respect to a particular metric. Although this search could be performed via manual trial
and error, it is more efficient and effective to use automated exploration tools. One process known as inverse
design can automatically search (or optimize) over a given design space to find a design that exhibits some
target performance [20]. Inverse design has already been applied to many problem domains. For example, a
parametric design space can be searched for designs that have the best value of a simulated metric [39]. Topology
optimization has been applied to problems such as minimum compliance. In addition, designs can be optimized
for metrics such as weight, cost, and manufacturing time.
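As a minimal sketch of what such an inverse-design loop looks like in practice (using a generic bounded quasi-Newton optimizer from SciPy over a toy parametric design space of our own choosing; this is illustrative only, not the specific pipeline used later in this paper):

import numpy as np
from scipy.optimize import minimize

def simulate_deflection(params: np.ndarray) -> float:
    """Toy performance model: mid-span deflection of a shelf, delta = P*L^3 / (48*E*I)."""
    thickness_mm, width_mm = params
    L, P, E = 900.0, 500.0, 10_000.0            # span (mm), load (N), modulus (MPa)
    I = width_mm * thickness_mm**3 / 12.0       # second moment of area (mm^4)
    return P * L**3 / (48.0 * E * I)            # deflection in mm

def objective(params: np.ndarray) -> float:
    # Trade off performance (deflection) against a simple proxy for material cost.
    material = params[0] * params[1]
    return simulate_deflection(params) + 1e-3 * material

result = minimize(objective, x0=np.array([15.0, 250.0]),
                  bounds=[(6.0, 40.0), (150.0, 400.0)], method="L-BFGS-B")
print(result.x, objective(result.x))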
Fig. 1. Opportunities for LLM Integration within the CDaM Workflow. Each technical section of our paper covers
opportunities for LLM integration in one of the tasks depicted above: text to design, text/design to design space, bi-directional
design for manufacturing, design to performance, and inverse design (from performance and design space to an optimized
design).
3 OVERVIEW
The fundamental aim of this study is to conduct an in-depth exploration of the opportunities and challenges of
applying contemporary LLMs within the landscape of the CDaM workflow described in Section 2.2. Driven by
this objective, we propose a thorough and wide-ranging exploration that is independent of any predefined or
proposed framework.
To apply LLMs coherently across such diverse tasks, we leverage the insight that all building blocks in the
CDaM workflow (design, design spaces, manufacturing instructions, and performance metrics) can be represented
by compact programs. Thus, at a high level, every phase of the CDaM workflow can be seen as a translation layer
between an input DSL and an output DSL. The fact that LLMs excel at such symbolic manipulations suggests
that LLMs have the potential to address these tasks while simultaneously leveraging and improving upon our
traditional solutions.
To achieve comprehensive coverage and uncover the different facets of LLM-assisted CDaM, we have undertaken
an extensive suite of experiments, incorporating a broad variety of design representations, manufacturing
processes, and performance metrics. These are detailed further in Section 3.2.
3.1 Methodology
Our methodology is crafted to provide a comprehensive inspection of the opportunities for and efficacy of various
interfaces between GPT-4 and the CDaM workflow. We investigate each of the five stages of the design and
manufacturing pipeline individually. As illustrated in Figure 1, these stages include: design generation (Section 4),
design space generation (Section 5), design for manufacturing (Section 6), performance prediction (Section 7),
and inverse design (Section 8).
In each of these stages, we pose fundamental questions about ways in which GPT-4 may offer some benefit,
and then conduct a series of experiments to answer these questions. For each query, we investigate aspects such
as (1) strategies for engineering effective prompts, (2) strategies for integrating human feedback, expertise, or
preferences into the LLM-assisted design process, and (3) tasks that GPT-4 can accomplish natively versus tasks
that are better completed by asking GPT-4 to leverage external tools.
After a detailed examination of each stage, we sought to understand the implications of incorporating GPT-4
within an end-to-end CDaM process. To this end, we designed and fabricated two practical examples (a cabinet
and a quadcopter) with GPT-4’s support. The end-to-end design process for each example is detailed in Section 9.
Category Code Title Summary
Capabilities C.1 Extensive Knowledge Base in Des.&Mfg. GPT-4 has a broad knowledge of design and mfg. considerations
C.2 Iteration Support GPT-4 attempts (and often succeeds) to iterate and rectify errors when prompted
C.3 Modularity Support GPT-4 can reuse or adapt previous/provided designs or solutions
Limitations L.1 Reasoning Challenges GPT-4 struggles with spatial reasoning, analytical reasoning, and computations
L.2 Correctness and Verification GPT-4 produces inaccurate results or justifications for its solutions
L.3 Scalability GPT-4 struggles to respect multiple requests concurrently
L.4 Iterative Editing GPT-4 forgets/introduces errors when modifying previously-generated designs
Dualisms D.1 Context Information GPT-4’s performance depends on the amount of context provided
D.2 Unprompted Responses GPT-4 makes inferences/suggestions beyond what is specied in the prompt
Table 1. GPT-4’s key properties for CDaM. To facilitate discussion of GPT-4’s applicability for design and manufacturing
(Des.&Mfg.), we have identified 9 key observations about GPT-4 that persist across several aspects of the CDaM workflow.
This includes 3 powerful capabilities, 4 limitations, and 2 dualisms (so named because they may manifest either as an
opportunity or a drawback, depending on the context). We use these observations to frame our discussions about GPT-4’s
suitability for each stage of the CDaM workflow.
Beyond these individual questions, our comprehensive investigation has also exposed several key insights
about GPT-4’s general capabilities and limitations with respect to CDaM. We have also observed a group of
properties that we term ’dualisms’, because they may manifest either as an opportunity or a drawback, depending
on the situation. Our findings are summarized in Table 1, with a full description in Section 10.1. To emphasize the
pervasive nature of these properties, we also use these labels as a framework for our discussions and takeaways
at the end of each section. Specifically, we draw on each section's findings and examples in order to illustrate the
manifestation and impact of various properties in Table 1 throughout the CDaM workflow.
3.2 Scope of Evaluation
To conduct a holistic survey of GPT-4-assisted CDaM, our experiments span a number of different design domains
(Section 3.2.1), performance metrics (Section 3.2.2) and manufacturing methods (Section 3.2.3). Here, we briefly
describe each domain of interest, along with the specific challenges they pose and the sort of representative,
transferable insight we hope to glean by studying each domain in connection with LLMs.
3.2.1 Target Design Domains. Our experiments are concentrated in three main design domains: 2D vector
graphic design, 3D parametric modeling, and articulated robotics problems.
Vector graphics use a series of text-based commands to represent paths and areas that form a given design.
Vector image formats are an important part of CDaM, as they can be used as both a design specification and a
manufacturing specification for e.g. laser cutters. Despite their simplicity, vector graphics can represent a wide
range of 2D and 3D objects, such as artistic engravings or flat-pack furniture. We examine LLMs' capacity to
generate two popular vector formats: SVG and DXF. These formats present several challenges: they contain
boilerplate formatting that GPT-4 may struggle to reproduce; it may be difficult to lay out individual pieces on the
canvas; and finally, it may be difficult to decompose higher-dimensional designs into 2D. Thus, vector graphics
will test GPT-4's spatial reasoning and ability to respect highly-constrained syntax, either on its own or with the
use of external libraries.
Parametric modeling languages generate 3D geometry through a sequence of constructive instructions. The
term “parametric modeling” reflects how each constructive operator exposes a set of parameters, such as the radius
of a circle. We explore two distinct approaches that are powerful, widely-used, and well-documented online. The
first is rooted in classic Constructive Solid Geometry (CSG), which constructs shapes by successively deploying
boolean operations (union, intersection, subtraction) over basic shapes or primitives (such as cuboids, spheres,
cylinders, and so forth) that can undergo transformations such as translations, rotations, and scaling. The CSG
6•Makatura et al.
approach is intended to test the global spatial reasoning capacity of GPT-4, as every CSG operation/transformation
occurs w.r.t. a shared coordinate space. The second representation relies on the contemporary B-rep format used
by modern CAD systems. Here, geometry is built through a sequence of operations like sketching, extruding,
and filleting. Each operation in this context is parametric and uses references to previously created geometry to
e.g., select a plane for a sketch design or select a sketch for an extrusion. Sketch-based CAD will test GPT-4's
ability to effectively switch between and reason over multiple relative, local coordinate frames.
Robotics offers a particularly rich design domain, as GPT-4 must coordinate a set of articulated and actuated
geometries to form complex objects such as open chain robot arms, wheeled robots, copters/UAVs, and robot
grippers. Robotics representations must describe not only the high-level geometry of each part, but also their
properties and relationships – including the joints between parts, the degrees of freedom that those joints exhibit,
and dynamics information such as the inertia of a given part. Several existing formats support these tasks, but
we primarily use the XML-based language known as the Universal Robot Description Format (URDF). We also
investigate the use of a more general graph-based robot representation. These formats test GPT-4’s ability to
simultaneously reason about multiple aspects of design, such as static geometric bodies and dynamic articulation
constraints.
3.2.2 Target Performance Domains. Diverse performance domains within engineering design require evaluation
of aspects such as structural and material properties, mechanical integrity, geometry-based functionality, materials
use, electromechanical integration, and subjective features. The results of such evaluation allow us to (dis)qualify
a design for use and to further understand and improve the design. Using GPT-4, we focus
on assessing mechanical and structural properties by generating first-order analysis equations for input
designs of standard objects like chairs, cabinets, and a quadcopter, which test the ability of GPT-4 to sufficiently
understand a given input design in text form or through a DSL and to evaluate criteria for functionality and
failure. Mechanical properties assessed include weight, size, load capacity, storage capacity, and stability. Analysis
of electromechanical functionality includes battery life and quadcopter travel distance. Further use of GPT-4
aims to streamline the computationally intensive process of Finite Element Analysis (FEA), a crucial tool for
understanding structural behavior in detail under various conditions, and we apply this to the case of a load on a
set of chairs.
In addition to these technical aspects, our investigation extends into the subjective domains of sustainability
and aesthetics, which cannot be strictly quantified. The inherent complexity and qualitative nature of these
areas present unique challenges in evaluation. While it is well-known that computational systems can compute
quantitative features, machine learning systems are becoming more sophisticated in artistic domains, and so
we seek to leverage the capacity of LLMs for lexical analysis to aid more holistically in the more ambiguous
realms of the design process and to find its limits. For example, could an LLM reasonably address whether a
piece of furniture of a given size is “large”, or if a shoe of a given design is “comfortable,” or can it only handle
classically quantifiable features? Can it even help us to reason more objectively about what aspects delineate
these properties? To this end, we test evaluation of subjective domains and use GPT-4 to generate a scoring
system and functions for quantifying the sustainability of a chair, the classification of chairs based on categories
of aesthetic influence, and the appropriate distribution of a set of chairs into a set of rooms in a house, among
other examples.
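As a concrete (and deliberately simplified) illustration of the kind of scoring function we ask GPT-4 to produce, the sketch below aggregates a few sustainability-related attributes of a chair into a single score. The attribute names, normalization constants, and weights are our own placeholders, not values produced by GPT-4 in our experiments:

def sustainability_score(material_recyclability: float,   # 0..1
                         transport_distance_km: float,
                         expected_lifetime_years: float,
                         mass_kg: float) -> float:
    """Combine normalized sustainability attributes into a 0..100 score.
    All weights and normalization constants are illustrative placeholders."""
    transport_penalty = min(transport_distance_km / 5000.0, 1.0)
    lifetime_bonus = min(expected_lifetime_years / 20.0, 1.0)
    mass_penalty = min(mass_kg / 30.0, 1.0)
    score = (0.4 * material_recyclability
             + 0.3 * lifetime_bonus
             + 0.2 * (1.0 - transport_penalty)
             + 0.1 * (1.0 - mass_penalty))
    return 100.0 * score

print(sustainability_score(0.8, 1200, 15, 7.5))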
We further combine these performance metric evaluations with the principles of inverse design. Inverse design
entails setting desired performance attributes and employing computational methodologies to deduce design
parameters that satisfy these attributes, both by generating areas for improvement within a design domain and by
testing the effects of implementing improvements suggested by GPT-4 or target design goals of our own interest,
as well as selecting appropriate methods of optimization. In this case, given a design/decision space for an object,
we use GPT-4 to generate and implement methods to computationally improve or optimize qualifying designs to
satisfy designated performance goals. This methodical approach evaluates if LLMs can apply constructive logic
for design enhancement and innovation.
3.2.3 Target Manufacturing Domains. Leveraging language models like GPT-4 in a DfM context can yield more
consistent and scalable decision-making, potentially augmenting human expertise and reducing our reliance on
CAD software usage. Potential applications of GPT-4 include the selection of optimal manufacturing techniques,
suggestion of design modifications that would enable easier production, identification of potential suppliers, and
creation of manufacturing instructions. This approach aims to alleviate many of the bottlenecks caused by
designers' lack of knowledge and experience in DfM.
In a set of experiments, we’ve explored GPT-4’s capabilities across various tasks. Firstly, GPT-4 was used to
identify the optimal manufacturing process for a given part, considering factors such as part geometry, material,
production volume, and tolerance requirements. Next, GPT-4 was tasked with optimizing a component design for
CNC machining. Given the geometry of the component, GPT-4 identified potential manufacturing difficulties and
modified the design to address these. We also leveraged GPT-4's extensive dataset knowledge to identify parts
needed for manufacturing.
In addition to these, GPT-4 was used to create manufacturing instructions for both additive and subtractive
design processes. Additive design can be challenging due to the need for spatial reasoning, precision, and
meticulous planning, and often requires many iterations. We’ve explored the generation of fabrication instructions
using subtractive manufacturing techniques for a cabinet design. We also investigated GPT-4’s potential in
generating machine-readable instructions for robot assembly tasks and converting those into human-readable
standard operating procedures. This allowed for effective communication and collaboration between robots and
human operators.
4 TEXT-TO-DESIGN
For our first line of inquiry, we explore the extent to which GPT-4 is able to generate designs across a variety of
domains. Even within the specic context of manufacturable design, the concept of a “design” is quite broad, and
exists at many scales. For example, we may want to specify a single self-contained part, or a sizable hierarchical
assembly containing several levels of sub-assemblies and/or other individual component modules. Such assemblies
may be completely customized/self-contained, with all parts designed simultaneously, or they may be hybrid
designs that integrate existing, pre-manufactured elements such as brackets or motors. In many cases, our target
design tasks also include dynamic considerations such as assembly mating or articulated joints.
Although these complex tasks may initially seem out-of-scope for lexical models such as LLMs, there are many
modeling and design paradigms that can be expressed in terms of potentially-LLM-compatible language. To guide
our exploration of GPT-4’s ability to interface with each of these models, we pose the following questions:
• Q1: Can GPT-4 generate a meaningful design when provided with a high-level description of the goal and a given modeling language?
• Q2: To what extent is the user able to control the designs created by GPT-4? Is GPT-4 able to interpret and respect user-defined constraints, such as spatial relationships between objects or integration of standard pre-fabricated parts?
• Q3: Is GPT-4 able to incorporate high-level abstractions used by human designers, such as modular (de)composition?
4.1 Simple, self-contained designs from high-level input (Q1)
To explore GPT-4’s capacity for design, we first test its ability to do one- (or few-) shot generation of an object
from a minimal high-level text description as input. Ideally, we would like to understand GPT-4’s ability to
complete design tasks independent of any particular modeling paradigm. However, it is not immediately clear
how much dependence there may be on the specific representation that is chosen, because the variation in
possible language-based modeling paradigms is significant. Some languages are very general and versatile, with
a wide variety of features and capabilities, while others may be highly-specialized for a specific set of tasks or
outcomes. Similarly, some languages are well-established with plentiful online documentation or examples, while
others may be custom-defined, poorly documented, or otherwise underrepresented in GPT-4's training repository.
Finally, some languages are fairly streamlined, while others may be syntactically complex and/or require the
use/coordination of many modules. Each possibility offers unique capabilities and challenges. Thus, we set out to
test a wide variety of them, in an effort to determine LLMs' ability to use each representation; whether there are
any conclusions that seem to span across different representations; and whether any particular representations
seem uniquely well- or poorly-suited for LLM integration.
4.1.1 Vector Graphics with SVG/DXF. Our initial focus in the design domain is on 2D vector graphics. Vector
formats such as SVGs or DXFs are prevalently utilized in manufacturing processes, like those for laser or waterjet
cutting. The goal of our investigation was to ascertain whether GPT-4 could empower designers to transform
their text directly into these vector formats. To evaluate this, we conducted experiments to determine if GPT-4 is
capable of generating a valid SVG file and converting the design into DXF format.
The primary aim of our experiment was to design an SVG file for a cabinet, with predetermined dimensions, to
be constructed from 1/2 inch plywood. This implies that the thickness of each wall, a preset parameter, is 0.5
inches. The experimental setup involved the design of a cabinet comprising three shelves, with overall dimensions
measuring 6 feet in height, 1 foot in depth, and 4 feet in width. A crucial aspect of the investigation was to see if
GPT-4 could accurately account for this wall thickness during the design of the cabinet, appropriately adjusting
the dimensions of its various components. GPT-4 was able to design the specified cabinet and subsequently
generated a Python script to create an SVG file reflecting the cabinet's layout. The script considered the necessary
clearances for the thickness and accurately positioned the side panels, top and bottom panels, shelves, and back
panel. Moreover, it factored in the prescribed spacing between parts and leveraged 'svgwrite' to generate the SVG
file. The resulting SVG file provided a visual depiction of the cabinet's design. We also replicated the experiment
to create a DXF file, where GPT-4 utilized 'ezdxf' to generate the file. The results of these experiments are depicted
in Figure 2.
In conclusion, GPT-4 demonstrated its capability to employ the APIs for generating the vector file in the correct
format without any simplifications. Nevertheless, it was necessary to perform several iterations to ensure GPT-4
did not cause any overlap among the cabinet parts.
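For reference, the essential structure of such a script is shown below. This is our own condensed sketch of the approach, not GPT-4's verbatim output (which is reproduced, abridged, in Figure 2); it assumes the same 1/2-inch wall thickness and simply lays each panel out side by side on the canvas:

import svgwrite

THICKNESS = 0.5                          # 1/2 inch plywood
HEIGHT, DEPTH, WIDTH = 72.0, 12.0, 48.0  # cabinet dimensions in inches
SPACE = 1.0                              # spacing between laid-out parts

# Panel sizes, adjusted so that shelves, top, and bottom fit between the side walls.
panels = {
    "side": (DEPTH, HEIGHT),
    "top_bottom": (WIDTH - 2 * THICKNESS, DEPTH),
    "shelf": (WIDTH - 2 * THICKNESS, DEPTH),
    "back": (WIDTH, HEIGHT),
}
counts = {"side": 2, "top_bottom": 2, "shelf": 3, "back": 1}

dwg = svgwrite.Drawing("cabinet.svg", profile="tiny")
x = 0.0
for name, (w, h) in panels.items():
    for _ in range(counts[name]):
        dwg.add(dwg.rect(insert=(x, 0), size=(w, h), fill="none", stroke="black"))
        x += w + SPACE
dwg.save()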
4.1.2 CSG with self-defined primitives. The next design domain we investigate is CSG. As outlined in
Sec. 3.2.1, CSG languages generally operate by building up a collection of primitives that have been altered or
combined via linear transformations and Boolean operations. Because the associated design logic can be quite
complex, it was not immediately clear that GPT-4 should be able to generate designs using these languages. Thus,
to progressively test GPT-4's modeling capabilities, we begin by exploring a very simple, custom CSG language
based on a single primitive: a box.
Boxes are one of the most common primitives seen in manufacturing. Moreover, many shapes can be considered
as a combination of boxes of different sizes. Because of the simplicity of a box, or of any shape formed by
boxes, we would like to see whether GPT-4 is able to generate designs for such simple shapes, such as tables and
chairs.
Our initial approach to this task is performed in 2D. We provide a function, foo(x, y, w, h), which forms a
box of dimensions 𝑤 × ℎ centred at the point (𝑥, 𝑦). We subsequently employ this function to generate letters
composed of axis-aligned bars, such as ‘F’ and ‘E’. During the testing phase, we observed that while the system
understands the requirement of 2D boxes, it struggles with their accurate placement. A particularly prominent
issue is the collision problem. More specifically, the GPT-4 system fails to determine whether two boxes are
I would like to build a cabinet; the cabinet will have 3 shelves 6 feet high, 1 foot deep, and 4 feet wide. Please ensure that you
account for the thickness of the wood. I want to use 1/2 inch plywood. Can you label the parts? Create a script that creates an
SVG file. Use lines to depict the edges of the parts.

import svgwrite
(... omitted by authors: dimension conversion from feet to inches ...)
dwg = svgwrite.Drawing('cabinet.svg', profile='tiny')
(... omitted by authors: variable setup ...)
# Side panels
for i in range(2):
    dwg.add(dwg.rect((cumulative_width, 0), side_panel_dims))
    cumulative_width += side_panel_dims[0] + space
    max_height = max(max_height, side_panel_dims[1])
(... omitted by authors: similar for loop to write the top and bottom panels ...)
# Reset cumulative_width for next row and update max_height
cumulative_width = 0
max_height += space
(... omitted by authors: for loop to write the shelves ...)
(... omitted by authors: reset cumulative width and height variables, then write back panel ...)
dwg.save()

Can you use the same method you used in this to generate the placement? Except generate a DXF rather than an SVG?

import ezdxf
(... omitted by authors: variable setup and dimension conversion from feet to inches ...)
doc = ezdxf.new("R2010")  # Create a new DXF document
msp = doc.modelspace()  # Create a new layout
(... omitted by authors: variable setup ...)
# Side panels
for i in range(2):
    msp.add_lwpolyline([(cumulative_width, 0), (cumulative_width, side_panel_dims[1]),
                        (cumulative_width + side_panel_dims[0], side_panel_dims[1]),
                        (cumulative_width + side_panel_dims[0], 0), (cumulative_width, 0)])
    cumulative_width += side_panel_dims[0] + space
    max_height = max(max_height, side_panel_dims[1])
(... omitted by authors: similar for loop to write the top and bottom panels ...)
(... omitted by authors: reset cumulative width and height variables, then for loop to write the shelves ...)
(... omitted by authors: reset cumulative width and height variables, then write back panel ...)
doc.saveas("cabinet.dxf")  # Save the DXF file

Fig. 2. The prompts used for generating files in SVG and DXF format using GPT-4. The visualizations of the results
are shown on the right side. It is clear that GPT-4 can accomplish this task after several iterations.
overlapping or whether there is a vacant space between them. This issue is observable when creating letters like
‘T’ and ‘E’. Using three to five targeted prompts enabled GPT-4 to ascertain the correct positions. However, these
prompts had to be granular and often involved providing the direct solution. The outcomes of these attempts are
demonstrated in Figure 3. Interestingly, after addressing this issue, GPT-4 appears to retain the corrections. This
is evidenced by its successful generation of the new letters ‘F’ and ‘L’ in a single attempt. These letters share a
similar structure to ‘T’ and ‘E’, and the results can be seen in Figure 3.
Our next step involved venturing into 3D, which holds more practical value. Analogous to the 2D scenarios,
we inform GPT-4 of a pre-established function, box(x, y, z, w, h, d), which generates a 3D box of dimensions
𝑤 × ℎ × 𝑑 centred at the 3D coordinates (𝑥, 𝑦, 𝑧). We then tested if GPT-4 could write a program to produce a
simple box of specified dimensions, for instance, 100 × 100 × 40, utilizing the function ‘box’. GPT-4 successfully
accomplished this task, and the resulting text explanation illustrates its understanding of the box concept and the
usage of our predefined function. Next, we presented a more complex challenge: having GPT-4 design a simple
table, typically consisting of four legs and a tabletop in the real world. We posed the question of whether GPT-4
could craft a program to generate such a table with a provided size using solely our box function. The output
text explanation revealed that GPT-4 accurately comprehends the structure of a basic table. Given that we only
provide the overall table size, GPT-4 lacks information about individual leg lengths or tabletop thickness. Yet, it
was able to identify these missing parameters and make reasonable assumptions. Consequently, GPT-4 succeeded
in writing a program to represent the table by creating five boxes using our predefined function. Upon visualizing
the 3D table, however, the relative positioning of each pair of boxes was not always accurate. We noticed that the
tabletop appeared to be suspended in the air, not in contact with the legs, as shown in Figure 4. This difficulty, also
observed in our 2D tests (Figure 3), pertains to GPT-4's understanding of mathematical concepts. In this instance,
we expedited the process by directly providing GPT-4 with the solution. We indicated the necessary translations
for the misplaced boxes, acknowledging that it would take several prompts to rectify the issue otherwise. After
correcting the floating tabletop, the table appeared as intended, as demonstrated in Figure 4. Therefore, to create
a table, it only required two prompts, significantly streamlining the procedure for generating a basic table.
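For reference, a corrected table program of this kind can be written as follows. This is our own illustrative reconstruction using the predefined box(x, y, z, w, h, d) primitive (a box of the given dimensions centered at (x, y, z)); the specific dimensions and the axis convention (z up, with w, h, d extending along x, y, z) are assumptions. The key detail is that the tabletop's center sits at leg_height + top_thickness/2, so the top rests on the legs rather than floating above them:

def box(x, y, z, w, h, d):
    """Stand-in for the paper's predefined CSG primitive: a box of dimensions
    w x h x d centered at (x, y, z). Here we simply record the call for illustration."""
    return ("box", x, y, z, w, h, d)

# Assumed table dimensions (z is up).
table_w, table_h = 100.0, 100.0      # tabletop footprint
top_thickness = 5.0
leg_size, leg_height = 8.0, 35.0

legs = [
    box(sx * (table_w / 2 - leg_size / 2), sy * (table_h / 2 - leg_size / 2),
        leg_height / 2, leg_size, leg_size, leg_height)
    for sx in (-1, 1) for sy in (-1, 1)
]
# Tabletop center placed at the top of the legs, avoiding the floating-tabletop
# error shown in Figure 4.
tabletop = box(0.0, 0.0, leg_height + top_thickness / 2,
               table_w, table_h, top_thickness)

table = legs + [tabletop]
print(table)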
Once we successfully generated the table, our next, more challenging goal was to design a few accompanying chairs.
We tasked GPT-4 with creating a chair compatible with the table, using only our predefined function. Similar to
its approach with the table, GPT-4 successfully deduced the basic structure of a simple chair, comprising the seat,
four legs, and a backrest. Unlike the table instance, we didn't observe any 'floating' issues in this scenario. It
appears that GPT-4 might have indeed gleaned some insights from previous experiences, as we also observed
when creating 2D letters. After we rectified the letters 'T' and 'E', there were no issues with the remaining letters.
Additionally, GPT-4 demonstrated comprehension of the concept of compatibility by outputting a chair of an
appropriate size. However, it was not successful in all aspects, as depicted in Figure 5. We attempted to correct
the backrest but were unable to do so. As a result, we had to manually adjust the position, directing GPT-4 to the
specific lines that needed modification to correct the structure. The final result can be seen in Figure 5. We believe
the root of these issues lies in GPT-4's struggles to comprehend geometric concepts, a difficulty also observed in
previous examples. Despite these hurdles, the process for creating a basic table and chairs has been considerably
simplified.
Fig. 3. Failed and Successful Cases of Letter Creation Using GPT-4. The solid square is the origin of the 2D coordinate
system.
Fig. 4. Failed and Successful Cases of Table Creation Using GPT-4. The table consists of five parts: 4 legs and a tabletop.
Although GPT-4 successfully produces a correct composition of the table, it outputs a floating tabletop in the absence of
human intervention.
Fig. 5. Failed and Successful Cases of Chair Creation Using GPT-4. GPT-4 successfully produces a correct composition of
the chair. In the incorrect version (left), the dimensions of the backrest are wrong and its orientation appears to be wrong as well.
Our final objective was to position four identical chairs around the table. Although theoretically feasible
without invoking rotation, GPT-4 failed to generate the chairs with the correct orientations. We believe this
failure stems from the same root cause we've encountered previously, namely, GPT-4's difficulty in handling
mathematical and geometric concepts. Creating four chairs with correct orientations without the support of
rotation entails complex geometric transformations. GPT-4 must comprehend that a box rotated 90 degrees
around its center is equivalent to a swap of its width and depth dimensions. To alleviate this issue, we expanded
our 'box' function to include an additional input argument, 'angle', corresponding to a rotation angle around
the vertical axis. With this extension, GPT-4 was able to create a program using solely the 'box' function that
successfully positioned four chairs around the table with correct orientations, as displayed in Figure 5. We surmise
that the introduction of 'angle' considerably simplifies the logic behind chair placement, enabling GPT-4 to create
such a program.
In conclusion, GPT-4 exhibits strong understanding of posed questions and excels at analyzing requested objects
to determine their composition. However, it demonstrates a weakness in handling geometric and mathematical
concepts. While it can provide nearly accurate solutions when mathematics is involved, it struggles to comprehend
the underlying mathematical principles and, as a result, cannot independently correct math-related issues when
they arise.
4.1.3 CSG with PyVista. Building on GPT-4’s success generating CSG-like models with boxes, we set out to
explore GPT-4’s capacity to use a larger suite of primitives. For this, we used an existing 3D visualization library,
PyVista, which allows us to create and place a variety of 3D primitives such as spheres and cones. Thanks to the
(a) Generic fish (b) Goldfish (c) Manta ray (d) Loach
Fig. 6. Aquatic Creatures Generated by GPT-4. GPT-4 successfully generated variations of aquatic creatures automatically
using primitives from the PyVista package in Python.
library’s documentation, GPT-4 is able to automatically assemble a functional Python program using PyVista’s
primitive functions.
We asked GPT-4 to use PyVista's primitives to model several variations of a fish, including specific bio-inspirations
such as a goldfish, a manta ray, and a loach (Figure 6). GPT-4 successfully selected and scaled an
appropriate set of primitives for each example, and provided sound bio-inspired rationale for its decisions. In
particular, although most of the fish are composed using a sphere for the body, GPT-4 intuits that a loach would
be most effectively approximated by using two cones for the body to give it an elongated shape.
One area in which GPT-4 struggled was the determination of the primitives' orientations. It often produced
results that indicated an internal confusion of some of the axes, or an otherwise flawed approximation of the
orientation that would be required to achieve a desired effect. After engaging in a dialogue with GPT-4, it was
able to rectify the orientations of the primitives to more closely resemble the target creatures. While promising,
these tests reiterate GPT-4’s seemingly limited capacity to account for local coordinate frames.
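To indicate the flavor of these generated programs, the sketch below assembles a loach-like fish from PyVista primitives in the spirit of GPT-4's output. It is our own minimal reconstruction (the primitive choices, dimensions, and orientations are assumptions), not the model's verbatim code:

import pyvista as pv

plotter = pv.Plotter()

# Elongated body approximated by two cones joined base-to-base along the x axis.
front = pv.Cone(center=(1.0, 0, 0), direction=(1, 0, 0), height=2.0, radius=0.4)
back = pv.Cone(center=(-1.0, 0, 0), direction=(-1, 0, 0), height=2.0, radius=0.4)

# A sphere for the tail region and small spheres for the eyes.
tail = pv.Sphere(radius=0.35, center=(-2.2, 0, 0))
eye_l = pv.Sphere(radius=0.06, center=(1.6, 0.15, 0.15))
eye_r = pv.Sphere(radius=0.06, center=(1.6, -0.15, 0.15))

for mesh, color in [(front, "tan"), (back, "tan"), (tail, "peru"),
                    (eye_l, "black"), (eye_r, "black")]:
    plotter.add_mesh(mesh, color=color)

plotter.show()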
4.1.4 CSG with OpenJSCAD. To explore a full-fledged approach for LLM-aided CSG, we test GPT-4's ability to
generate meaningful designs using the open source JavaScript-based CSG library, OpenJSCAD [3]. OpenJSCAD
has extensive documentation available online, and we found that GPT-4 natively possesses a good grasp of the
API, its components, and the required code structure. In particular, it understood that it needed to import each
function from the corresponding modules, and that it needed to define and export a function named main. For our
experiments, we provided GPT-4 with access to the full API, and generally allowed it to select the appropriate
primitives and operations without user interference.
To test GPT-4’s design abilities, we ask it to design a simple cabinet with one shelf, as shown in Figure 7.
GPT-4 reliably selects and instantiates the required primitives, along with intuitive naming conventions and
structure within the OpenJSCAD code. GPT-4’s initial orientation of the parts was also generally reasonable,
but the specific positioning of each part was often incorrect. Despite multiple attempts, GPT-4 was unable to
generate any fully-correct cabinet in a single shot, with no subsequent user intervention.
Moreover, GPT-4 frequently produced highly disparate results from one run to the next. Even when using
an identical prompt on fresh chat environments, GPT-4’s responses varied widely in terms of their overall code
structure, design accuracy, and the specific errors or oversights made. Figure 8 shows one example of a drastically
different design process, even when seeded with the same initial prompt as Figure 7.
Throughout our experiments, we found that GPT-4 encountered a few common pitfalls when generating
designs in OpenJSCAD. Occasionally, GPT-4 made small syntactic errors such as generating incorrect boilerplate,
importing functions from incorrect modules, or making “typos” in API calls – e.g., trying to import from the
boolean module rather than the correct booleans module, or calling the cube() function with parameters that were
intended to generate a cuboid(). In an attempt to avoid these pitfalls, we created a small list of “hints”/“reminders”
for best practices when working with OpenJSCAD; this short list was always passed in alongside our initial
prompt. See Appendix A.1 for a full listing of these reminders. Although these reminders seemed to help mitigate
these issues, we were unable to eradicate them entirely. However, GPT-4 could easily correct the majority of these
issues when they were pointed out by the user. Often, the process of correcting the issue through prompts and
responses was faster than actually adjusting the code manually, making LLMs a useful design partner.
One pervasive issue that seemed more difficult to correct was the fact that GPT-4 had issues positioning the
primitives in 3D space. In particular, GPT-4 frequently seemed to forget that OpenJSCAD positions elements
relative to the center of a given primitive, rather than an external point on the primitive (e.g., the lower left corner).
GPT-4's arrangements were frequently incorrect due to this issue. When GPT-4 is reminded of this convention, it
does generally alter the design, but it is not always able to correct the issue. If sufficiently many rounds of local
edits prove unable to address the alignment issues, we found that it was generally more effective to direct GPT-4
to disregard all existing measurements, and re-derive the elements' positions from scratch (see Figure 8).
Overall, we find that GPT-4 is able to generate reasonable OpenJSCAD models from high-level input. However,
the design specifications that emerge on the first attempt are rarely fully correct, so users should expect to engage
in some amount of corrective feedback or iteration in order to attain the desired result.
4.1.5 Sketch-based CAD with OnShape. Another popular method for 3D shape modeling comes from contemporary
computer-aided design (CAD) software. Rather than directly constructing and modifying solid primitives
(as in the CSG approaches discussed above), modern parametric CAD systems generally work by lifting planar
sketches into 3D and subsequently modifying the 3D geometry. These sketches are placed on planes, which can
be offset construction planes or planar faces of the current 3D model. The selected sketching plane serves as
a local coordinate system in which the sketch primitives are defined. In graphical user interfaces, this change of
coordinate systems is accounted for by letting the user easily align their camera view to a top-down view onto
the sketch plane. This change of view effectively reduces the task to drawing sketches in 2D, removing the cognitive
burden of having to think about sketches in 3D. Despite the lack of graphical assistance, we want to investigate
whether GPT-4 is able to design objects using a sketch-based modeling language.
However, since graphical assistance is very prevalent in this modeling paradigm, CAD models are mostly
constructed via a GUI and not via textual programming, even though textual APIs exist, e.g. Onshape's
Featurescript [2]. Therefore, documentation and examples are less available than for the modeling paradigms from the
previous sections. Indeed, GPT-4 performs poorly when trying to generate Featurescript code directly, which
is why we decided to provide a simplified DSL.
For our experiments, we constructed a single prompt containing the following DSL description: our DSL
exposes two operators, createSketch and extrude, and two sketch primitives, circle and rectangle. Additionally,
we provide a construction example using this language of a single-leg round table. Lastly, we also add some hints
about how to write the program, e.g. to explicitly use design variables and to write in syntactically correct
Python. All of the output designs generated by GPT-4 in this section are automatically translated into Onshape
PartStudios. The full prompt can be found in the supplemental material.
Our first task is the design of a chair with 4 legs, a rectangular seat and a rectangular back, see Fig. 9.
We asked GPT-4 to perform this task several times and observed the following.
• The design sometimes includes cylindrical legs, sometimes rectangular legs.
• The design is always constructed in a single direction, the 𝑍 direction. Our input example of the round
table only used the 𝑍 direction to select sketching planes, but the description of our language documented
the use of other plane directions.
• We observe mainly two types of designs: (i) designs which are constructed in both the negative and
positive 𝑍 direction starting from the seat, see the first answer in Fig. 9, and (ii) designs which start from a
leg, see the second response in Fig. 9. We observe that the first type of design has a higher chance of
being correct, whereas the second type fails more often. The failures are due to changes in the coordinate
system. For example, when selecting the top plane of the first leg as a sketch plane for the seat, the sketch
Let's use OpenJSCAD to design a cabinet ... with 3/4" wood sheets, and final exterior dimensions of 30 inches tall, 20 inches wide,
and 18 inches deep. The stationary part of the cabinet should be comprised of 6 boards: bottom, top, back, two sides, and one shelf
centered inside the cabinet. (... omitted by authors: OpenJSCAD modeling hints; see Appendix A.1 ...)

(... omitted by authors: import statements and variable setup ...)
pieces.push(colorize([1, 0, 0], translate([0, 0, -height/2 + thickness/2],
    cuboid({size: [width, depth, thickness]}))))  // bottom
pieces.push(colorize([0, 1, 0], translate([0, 0, height/2 - thickness/2],
    cuboid({size: [width, depth, thickness]}))))  // top
pieces.push(colorize([0, 0, 1], translate([0, -depth/2 + thickness/2, 0],
    cuboid({size: [width, thickness, height]}))))  // back
pieces.push(colorize([1, 1, 0], translate([-width/2 + thickness/2, 0, 0],
    cuboid({size: [thickness, depth, height]}))))  // left
pieces.push(colorize([0, 1, 1], translate([width/2 - thickness/2, 0, 0],
    cuboid({size: [thickness, depth, height]}))))  // right
pieces.push(colorize([1, 0, 1], translate([0, -thickness/2, 0],
    cuboid({size: [innerWidth, thickness, innerDepth]}))))  // shelf
(... omitted by authors: rest of design ...)

This is a great first guess! However, there are two issues. First, the shelf is oriented vertically, rather than horizontally. Second, there
are many cases where the boards are protruding into one another. (... omitted by authors: description of the desired relative extent of
each board ...)

(... omitted by authors: preceding code was unchanged ...)
pieces.push(colorize([0, 0, 1], translate([0, -depth/2 + thickness/2, 0],
    cuboid({size: [width, thickness, innerHeight]}))))  // back
pieces.push(colorize([1, 1, 0], translate([-width/2 + thickness/2, thickness/2, 0],
    cuboid({size: [thickness, innerDepth, innerHeight]}))))  // left
pieces.push(colorize([0, 1, 1], translate([width/2 - thickness/2, thickness/2, 0],
    cuboid({size: [thickness, innerDepth, innerHeight]}))))  // right
pieces.push(colorize([1, 0, 1], translate([0, 0, 0],
    cuboid({size: [innerWidth, innerDepth, thickness]}))))  // shelf
(... omitted by authors: subsequent code was unchanged ...)
Fig. 7. GPT-4’s cabinet design process. Simple cabinet design, from initial prompt to the final fabricable result, in which
all elements have the correct orientation and no components overlap. GPT-4 reached the final result in 8 messages (4 each of
prompt/response).
plane’s origin will be in the center of the leg. GPT-4 will often ignore this or won’t be able to account
for it when pointed out. Conversely, when starting with the seat and choosing the lower seat plane as a
sketch plane for the legs, it can specify the leg sketch coordinates in global coordinates, since the global
origin coincides with the seat’s origin. The same is true for the backrest.
From this test, we can observe that GPT-4 seems to have difficulties translating the coordinate system's origin
on the XY plane.
Next, we want to see if GPT-4 can account for rotating sketch planes. To test this, we ask it to design a car.
GPT-4 always suggests a simple car shape, composed of 4 cylindrical wheels and a rectangular car body, see
Fig. 10. The difficulty with this shape is that the cylinder sketches of the wheels have to be extruded on the side
Fig. 8. Alternate cabinet design process by GPT-4. From left to right, these images show a second example of GPT-4's
design process for the simple cabinet, beginning from the same prompt as Figure 7. This example highlights the widely
varying paths that GPT-4 may take to construct a design. GPT-4 reached the final result after 16 messages (8 each of
prompt/response). Up until the fourth displayed result, the user prompts' phrasing focused on improving upon the previous
result. However, after failing to make progress, we asked GPT-4 to disregard the previous attempts and recompute the
elements' positions from scratch; this quickly resulted in a valid cabinet, as shown in the final image.
Can you design a chair with 4 legs, a rectangular back and a rectangular backrest?

(... omitted by authors: variable setup ...)
seat_sketch = createSketch(rectangle(0, 0, seat_length, seat_width), XY_PLANE)
seat_solid = extrude(seat_sketch, seat_thickness)
leg_sketch_1 = createSketch(circle(-3.5, -3.5, leg_radius), cap(seat_solid, "min_z"))
leg_solid_1 = extrude(leg_sketch_1, leg_length)
(... omitted by authors: rest of design ...)

(... omitted by authors: variable setup ...)
leg_sketch_1 = createSketch(circle(-3.5, -3.5, leg_radius), XY_PLANE)
leg_solid_1 = extrude(leg_sketch_1, leg_length)
// other legs
seat_sketch = createSketch(rectangle(0, 0, seat_length, seat_width), cap(leg_solid_1, "max_z"))
seat_solid = extrude(seat_sketch, seat_thickness)
(... omitted by authors: rest of design ...)
Fig. 9. Different Chair Design Strategies. GPT-4 designs a chair in two different ways. When starting with the seat, it
does not have to change coordinate systems when specifying sketch coordinates of the legs. However, sketching a seat on
the top face of one of the legs requires a coordinate change which GPT-4 does not manage to do.
planes of the car body. There are a couple of different modeling strategies to achieve this, but we observe that GPT-4
has difficulty coming up with these designs without any further indication. Instead, it often extrudes the car
body along its height, starting from the ground plane, and then places the wheel circles on the bottom plane of the
car, which is also the ground plane. As a result, the car wheels are extruded vertically. Although
we were able to correct this design in an iterative prompt-based fashion, we had little success engineering the
initial prompt in such a way that we could effectively prevent this behavior.
Note that intuitively, placing wheels at the bottom of a car body makes sense, and without any graphical
feedback, humans could easily make the same mistake. From this test, we observe that GPT-4 struggles to
rotate coordinate systems.
To address this, we changed our design language description to allow GPT-4 to specify sketch primitive
coordinates directly in a single global coordinate system. Now, a sketch primitive center takes as input three
coordinates, which we project in post-processing directly onto the selected sketch plane. The extrude direction
Can you design a car?
(... omitted by authors: variable setup ...)
body_sketch = createSketch(rectangle(0, 0, bodyLength, bodyWidth), XY_PLANE)
body_solid = extrude(body_sketch, bodyHeight)
wheel_sketch1 = createSketch(circle(-bodyLength/2, -bodyWidth/2, wheelRadius), cap(body_solid, "min_z"))
wheel_solid1 = extrude(wheel_sketch1, wheelWidth)
(... omitted by authors: other wheels ...)
(... omitted by authors: variable setup ...)
body_sketch = createSketch(rectangle(0, 0, 0, bodyLength, bodyWidth), XY_PLANE)
body_solid = extrude(body_sketch, bodyHeight)
wheel_sketch1 = createSketch(circle(-bodyLength/4, -bodyWidth/2 - wheelWidth/2, 0, wheelRadius), cap(body_solid, "min_y"))
wheel_solid1 = extrude(wheel_sketch1, wheelWidth)
(... omitted by authors: other wheels ...)
Fig. 10. Local vs Global Coordinate Systems. GPT-4 does not manage to rotate the wheel sketch planes. If we allow it to
express sketch primitives in global coordinates and select a plane to orient the extrude operation, it places them correctly.
is still defined by the sketch plane's normal vector. This means that GPT-4 does not have to take coordinate
translations into account anymore. We observe that this change in the DSL led to a higher success rate in
generated designs; see the second answer in Fig. 10.
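For concreteness, a minimal Python sketch of this post-processing step is given below; it assumes the sketch plane is given by an origin point and a unit normal, and the helper name and example values are illustrative rather than our exact implementation.

import numpy as np

def project_onto_plane(point, plane_origin, plane_normal):
    # Project a globally specified sketch-primitive center onto a sketch plane.
    # The extrude direction remains the plane's unit normal, so GPT-4 only has
    # to reason in a single global coordinate system.
    p = np.asarray(point, dtype=float)
    o = np.asarray(plane_origin, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Remove the component of (p - o) that lies along the plane normal.
    return p - np.dot(p - o, n) * n

# Example: a wheel center given in global coordinates, projected onto the car
# body's "min_y" face (a plane whose normal points along -y); values are illustrative.
wheel_center_on_plane = project_onto_plane(point=(-0.5, -0.6, 0.1),
                                           plane_origin=(0.0, -0.5, 0.0),
                                           plane_normal=(0.0, -1.0, 0.0))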
In conclusion, GPT-4 is able to design models in a sketch-based parametric CAD framework. However, it is not
successful at changing coordinate systems. In this case, our backup strategy is to use a single global coordinate
system. One possible future direction is to let GPT-4 communicate with a geometric solver and create a feedback
loop.
4.1.6 URDF. The Unified Robot Description Format (URDF) is a common XML-based language for describing
articulated structures in robotics. URDF files specify a robot's structure (including both visual and collision
geometry), joint locations, and dynamics information. The URDF format appears well-suited for potential LLM
design because it is human-readable and heavily documented online.
Open Chain Robot Arms. Initially, we asked GPT-4 to generate simple open chain robots (commonly called
"arms") with a particular number of links. However, when we used the word "arm" to prompt GPT-4 to generate a
robot, GPT-4 was unable to determine that the links should connect at their ends. Most often, GPT-4 placed the
joints such that each link revolved about its center, and the links were not connected to each other (Figure 11,
initial prompt). As shown in the subsequent prompts of Figure 11, to achieve an arm with two connected links,
it was necessary to describe both the joint position relative to the link ("the joint origin must be half the
link's length past the link origin", rather than "the joint origin should be at the end of the link") as well as
the joint axis ("a revolute joint about the x axis"). Given this prompt pattern, GPT-4 was easily
able to generate proper N-link robots.
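To make the spelled-out rule concrete, the following Python sketch emits a joint block that follows it; the helper name, dimensions, and formatting are illustrative and are not GPT-4's output.

def revolute_joint_xml(name, parent, child, parent_link_length, axis="1 0 0"):
    # URDF link origins lie at the center of the link, so a joint origin "at the
    # end of the parent link" must be offset by half the parent link's length
    # along the chain axis (z here).
    return (
        f'<joint name="{name}" type="revolute">\n'
        f'  <parent link="{parent}"/><child link="{child}"/>\n'
        f'  <origin xyz="0 0 {parent_link_length / 2}" rpy="0 0 0"/>\n'
        f'  <axis xyz="{axis}"/>\n'
        f'</joint>'
    )

# Example: the joint connecting two 1 m links of a two-link arm aligned with the z axis.
print(revolute_joint_xml("joint1", "link1", "link2", parent_link_length=1.0))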
Wheeled Robots. Next, we asked GPT-4 to generate wheeled robots composed of N wheels attached to a central
rectangular platform. A proper design of this type must have wheels that (1) are aligned to share an axis of
rotation normal to and through the center of their circular faces; (2) have circular faces displaced along said
Generate a urdf for a two link robot. The end of the second link should connect to the end of the first link with a revolute joint.
The origin of the joint should be at the end of the first link
<joint name="joint1" type="revolute">
  <axis xyz="0 0 1"/>
  <parent link="link1"/><child link="link2"/>
  <origin xyz="1 0 0" rpy="0 0 0"/>
</joint>
<link name="link1"><visual><geometry><box size="0.1 1 0.1"/></geometry></visual></link>
<link name="link2"><visual><geometry><box size="0.1 1 0.1"/></geometry></visual></link>
urdf link origins are in the center of the link. To make a joint origin at the end of a link, it must be half the link's length past
the link origin. Generate a urdf for a 2 link robot. The end of the second link should connect to the end of the first link with a
revolute joint. The origin of the joint should be at the end of the first link. Both links should be lined up along the z axis.
(... omitted by authors: URDF description of an appropriately connected two-link robot ...)
urdf link origins are in the center of the link. To make a joint origin at the end of a link, it must be half the link's length past the
link origin. Generate a urdf for a three link robot. The end of the second link should connect to the end of the first link with a
revolute joint about the x axis, the origin of the first joint should be at the end of the first link. The other end of the second link
should connect to the end of the third link via a revolute joint about the x axis with an origin at the end of the third link. All
three links should be lined up along the z axis
(... omitted by authors: URDF description of an appropriately connected three-link robot ...)
Fig. 11. GPT-4’s Process for Designing N-Link Robot Arms. GPT-4 is unable to correctly interpret the initial high-level
specification, but when provided with more explicit detail, GPT-4 is able to generate correct URDF specifications for N-link
robots.
axis of rotation, and (3) contact, but do not intersect, either side of the center platform. The combination of
non-intersection and geometric-relation constraints proves challenging for GPT-4, which seems to exhibit limited
geometric reasoning. Initially, we tried to specify these using language-based constraints (e.g., "the wheels should
touch, but not intersect, either side of the platform"). These proved ineffective, as shown in Figure 12 (middle). To
overcome these challenges, we crafted prompts with very explicit numeric constraints (e.g., "wheels should be
offset on the global y axis by half the width of the platform plus half the height of the wheel cylinder"). This style
of prompt successfully generated a viable result, as shown in Figure 12 (right).
As in the case of robot arms, we find that GPT-4 is immediately able to generalize a successful two-wheeled
design into a four-wheeled robot. We achieve this by asking for a duplicate, shifted version of the existing wheel
configuration, as shown in Figure 13. However, we were unable to generate a successful four-wheeled
robot directly: in general, we found that as the number of constraints in a prompt increases, it becomes increasingly
likely that GPT-4 will ignore any individual constraint. Thus, rather than requesting a four-wheeled
robot in a single prompt, we found greater success by first generating a two-wheeled robot and then prompting
GPT-4 to modify the URDF to add the additional wheels.
Fig. 12. Two-Wheeled Robot Attempts by GPT-4. GPT-4's attempts to generate a two-wheeled robot with varying types
of prompt constraints on geometry. Left: no constraints. Center: language constraints. Right: numeric constraints.
I want you to generate a urdf for a robot with two wheels. Follow the below steps: 1. generate a center rectangular platform 2. add
two cylinders for wheels. The wheels should be oriented so that the global y axis is normal to the center of their circular faces, and
should rotate about their cylinder's z axis 3. The wheels should be offset on the global y axis so that they are on either side of the
center rectangular platform. They should be offset by half the width of the platform plus half the height of the wheel cylinder.
(Author remark: The above generates a two-wheeled robot. The following prompt extends this to create a four-wheeled robot.)
Now, would you be able to modify this urdf to be a four wheel robot? There should be two more wheels, in an identical configuration
to the first two. Shift back two of the wheels on the x axis to be at the back of the platform. Also move the front two wheels to the
front of the platform by shifting them forwards on the x axis
(Author remark: inertial and collision parameters were also generated correctly, but omitted below for
brevity)
<!-- Center rectangular platform link -->
<link name="base_link"><visual><geometry>
  <box size="0.1 0.1 0.02"/>
</geometry></visual></link>
<!-- Front left wheel link -->
<link name="front_left_wheel"><visual><geometry>
  <cylinder radius="0.05" length="0.01"/>
</geometry></visual></link>
(... omitted by authors: other wheels ...)
<!-- Front left wheel joint -->
<joint name="front_left_wheel_joint" type="continuous">
  <parent link="base_link"/><child link="front_left_wheel"/>
  <origin xyz="0.05 0.055 0" rpy="1.570796 0 0"/><axis xyz="0 0 1"/>
</joint>
(... omitted by authors: other wheel joints ...)
Fig. 13. URDF of a Four-Wheeled Robot. A correct four-wheeled robot generated by GPT-4 by modifying a previously
generated URDF for a two-wheeled robot.
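As a sanity check, the wheel offsets in the URDF of Figure 13 can be reproduced directly from the numeric rule stated above; below is a short Python check using the dimensions from that figure.

# Dimensions taken from the URDF shown in Fig. 13.
platform_width = 0.1   # y-extent of the base_link box ("0.1 0.1 0.02")
wheel_length = 0.01    # length (height) of the wheel cylinder

# Prompt rule: offset each wheel on the global y axis by half the platform
# width plus half the height of the wheel cylinder.
y_offset = platform_width / 2 + wheel_length / 2
assert abs(y_offset - 0.055) < 1e-9   # matches <origin xyz="0.05 0.055 0">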
Robot Grippers. To test the effectiveness of our iterative, multi-prompt approach for building robots of increasing
complexity, we seeded GPT-4 with a successful two-link open chain URDF, then asked it to modify this design
into a collection of multi-finger robot grippers. As shown in Figure 14, we were able to build two-, four-, and
five-finger grippers using a sequence of prompts to add features and change proportions. To create a two-finger
gripper, we asked GPT-4 to use two of the previously generated two-link open chain robots as fingers, separated
by a distance equal to half the height of the finger, and connected by a rectangular platform on the base. The
Fig. 14. URDF Grippers Generated by GPT-4. Left: two-fingered gripper. Center: four-fingered gripper. Right: five-finger hand.
four-finger gripper was similarly derived from the two-link arm by specifying that the hand should consist of four
two-link robots right next to each other on a rectangular platform. To specify a five-finger hand, we requested a
hinged rectangular link to serve as a base for the thumb, then prompted GPT-4 to add another finger on that link and
to adjust the hand proportions.
4.1.7 Graph-based DSL. While designing an entire robot end-to-end using LLMs may not be feasible, we find
that GPT-4 has the ability to reason about the spatial layout of robot components. These spatial layouts are
naturally represented as graphs where the nodes are components and the edges are connections between them.
Unlike URDF, this representation is more general and is applicable in domains outside of simulation.
To generate robot design graphs using GPT-4, we first need a text-based graph representation. Our first
approach involved asking GPT-4 to output the popular GraphViz format. While convenient, this format makes it
difficult for GPT-4 to provide metadata for each part (such as motor torque or size) in a format usable by downstream
applications. Instead, we take advantage of GPT-4's ability to generate Python code that conforms to a provided
domain-specific language (DSL). The full DSL is detailed in Appendix A.4.
When prompted with a small DSL embedded in Python, GPT-4 is able to write code that selects and places robot
components at a high level of abstraction. By supplying a function that translates components in three-dimensional
space, we can extract GPT-4’s concept of each component’s position relative to the others.
In this example, we ask GPT-4 to generate a humanoid robot using the provided functions. GPT-4 makes
appropriate calls to add_link to create nodes in the design graph, add_joint to create edges between them, and
translate to establish their relative positions.
We manually implement the functions described in the prompt in order to visualize the resulting robot topology.
The arms are positioned beside the torso, the legs are positioned below, and the head rests on top as expected for
a humanoid robot.
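For reference, one possible implementation of these DSL stubs is sketched below in Python; it is consistent with the calls shown in Figure 15, but the dictionary-based bookkeeping and unit step size are our own illustrative choices rather than the exact code we used.

# Hypothetical backing implementation for the graph-based DSL.
DIRECTIONS = {
    "up": (0, 0, 1), "down": (0, 0, -1),
    "left": (-1, 0, 0), "right": (1, 0, 0),
    "forward": (0, 1, 0), "back": (0, -1, 0),
}

links = {}    # node name -> {"position": [x, y, z], plus any part metadata}
joints = []   # list of (parent, child) edges

def add_link(name, **metadata):
    # Create a node in the design graph; metadata can carry size, motor torque, etc.
    links[name] = {"position": [0.0, 0.0, 0.0], **metadata}
    return name

def add_joint(parent, child):
    # Create an edge between two components.
    joints.append((parent, child))

def translate(link, direction, distance=1.0):
    # Record the relative placement GPT-4 intends for a component.
    dx, dy, dz = DIRECTIONS[direction]
    pos = links[link]["position"]
    pos[0] += dx * distance
    pos[1] += dy * distance
    pos[2] += dz * distance

Replaying GPT-4's construct_humanoid_robot() against stubs like these recovers the design graph and the relative component positions described above.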
We saw similar success when asking GPT-4 to construct a snake robot, car robot, and scorpion robot. When
requesting a robot dog, however, GPT-4 only adds two legs initially. Specifying a “robot dog with four legs” was
necessary to obtain the expected behavior. We also encountered difficulties when attempting to obtain a more
detailed design for the robot dog. Asking for a “robot dog with four legs, two links per leg” produced a graph
with two nodes per leg, but GPT-4 did not position them relative to each other.
4.1.8 Summary Discussion. In light of these experiments, we conclude that GPT-4 is capable of generating
designs based on high-level text input, even across a wide variety of representations and problem domains. We
note that several of GPT-4's capabilities and limitations remain consistent independent of the representation.
For example, in all cases, GPT-4 is able to generate sensible, well-structured code with semantically meaningful
variables and comments. Moreover, independent of the representation or the problem domain, GPT-4 consistently
(... omitted by authors: problem setup with available Python functions ...)
Write a function to construct a humanoid robot.
To construct a humanoid robot, we can define the different parts or links of
the robot, such as the head, torso, arms, and legs. We'll then connect these
links together using joints. Here's an example Python function to construct
a basic humanoid robot:
def construct_humanoid_robot():
    # Create links for the head, torso, arms, and legs
    head_link = add_link("head")
    (... omitted by authors: adding other parts ...)
    # Connect the head to the torso
    add_joint(torso_link, head_link)
    (... omitted by authors: connecting other parts ...)
    # Translate the links to their respective positions
    translate(head_link, "up")
    translate(left_arm_link, "left")
(... omitted by authors: positioning other parts ...)
(... omitted by authors: summary of code, disclaimer ...)
Fig. 15. Graph of Humanoid Robot. Graph generated by GPT-4 describing the high-level components of a humanoid
robot as nodes and the connections between them as edges.
shows superior performance with respect to the high-level, discrete elements of a problem (e.g., identifying
the correct type and quantity of each primitive/operation) as opposed to the lower-level continuous parameter
assignments (e.g., correctly positioning the primitives relative to one another). A more detailed discussion of
capabilities, limitations and opportunities will follow in Section 4.4. For now, we rely on the similarities between
various representations to justify a reduced scope for our future experiments. In particular, moving forward, we
study each question with respect to only a subset of the design representations and domains introduced above.
4.2 Interpreting and Respecting User Control (Q2)
The above examples demonstrate GPT-4’s ability to generate a design based on very high-level semantic input.
However, we also wanted to test its ability to generate designs that adhere to a specic user-given intent. This
section also tests whether GPT-4 is able to overcome its own potential biases induced by the training data, in
order to generate something that truly adheres to a user’s specied constraints – whether or not those constraints
match the “common” form of a given design target. In particular, we choose to study whether GPT-4 is able to (1)
understand and respect semantically meaningful spatial constraints, and (2) incorporate specic pre-fabricated
elements into a design.
4.2.1 Spatial Constraints. Through the general experiments above, GPT-4 has already shown some capacity to
respect high-level spatial constraints, such as a design element’s absolute size or its position relative to another
element of the design. GPT-4's compliance with such requests was frequently flawed at the outset, but the results
were generally workable after some amount of interactive feedback. This section aims to explore the types of
constraints GPT-4 is able to natively understand, and how we might best interact with GPT-4 in order to improve
the chance of successful compliance with such constraints.
Fig. 16. Building a cabinet with a door. GPT-4's attempt to build a cabinet similar to that from Section 4.1.4, with the
addition of a simple door (orange) that has a handle (dark grey) on the right-hand side. GPT-4 quickly fixes the position
of the cabinet's primary pieces (e.g., the yellow and cyan side panels), but it struggles to correct the door. GPT-4 must be
iteratively prompted to fix the door orientation, the relative door placement, and the handle's placement and protrusion into
the door. GPT-4 is able to arrive at a suitable design after several iterations of user feedback.
As an initial experiment, we explored whether GPT-4 is able to construct a version of the previous cabinet
design that includes a door and a handle (see Figure 16). We started from a fresh chat, and provided GPT-4 with a
prompt similar to the one described in Section 4.1.4, asking for a cabinet to be built from scratch. However, this
time, we also request a door at the front of the cabinet, with a handle on the right hand side of its outward-facing
face. As shown in Figure 16, GPT-4 initially struggled to position several of the cabinet elements – particularly the
side panels and the door. Although GPT-4 corrected the position of the side boards immediately, GPT-4 continued
to have trouble placing the door, as it was oriented incorrectly relative to the rest of the design. When reminded
that the door should be oriented vertically, GPT-4 was able to comply with the request, but the corrected position
was still not fully suitable, as the door coincided with the cabinet’s side panel. After another reminder that the
door should reside at the front of the cabinet, with the handle on the right so it could be attached with hinges
on the left, GPT-4 was able to place the door correctly. However, the handle remained ill-positioned as it was
located on the left-hand side, and was protruding into the door panel. After 2 additional prompts, GPT-4 was
able to correct the position to the right-hand side. To correct the protrusion issues, GPT-4 needed 3 more prompts.
During these iterations, GPT-4 moved the handle fully to the inside of the door; it needed an explicit reminder that
the handle should be placed on the outside of the door.
With a fresh GPT-4 session, we also tried providing the previous OpenJSCAD specification of the cabinet as
part of our input prompt, then asking GPT-4 to modify the existing design such that it contained a door and a
handle, as before. Despite the different starting points, GPT-4 followed a similar trajectory, as shown in Figure 17:
the door was initially aligned incorrectly, as it coincided with one of the side panels; after 1 prompt, GPT-4 was
able to correct the door placement. However, despite GPT-4's explicit assertion that "the handle is also placed
on the right side of the door's exterior face", the handle remained on the left. Finally, after another prompt,
GPT-4 was able to correct the handle position such that it was on the right rather than the left.
The way in which GPT-4 dealt with the under-specified handle request also proved interesting. In Figure 16, GPT-4
opted for an additional cuboid that would be unioned into the final design. By contrast, in Figure 17, GPT-4 opted
to create the handle by subtracting a small cuboid from the door panel. In still other examples, GPT-4 refused to add
the handle, and instead offered the following disclaimer:
Note that the handle for the door is not included
in this script, as its size, shape, and position would depend on additional details not provided. This
would likely require additional modules, such as cylinder from @jscad/primitives, and might be added
as an eighth component in the main function.
These interactions provide a promising basis for interactive user control of the design, but the process is
currently somewhat tedious, as GPT-4 requires very explicit instructions about the design or correction
intent. The addition of highly detailed user constraints also seems to confuse GPT-4 to an extent: it tends to
"forget" the larger context of the design in the process, so it must be frequently reminded.
Fig. 17. Adding a door to an existing cabinet. We provide GPT-4 with the initial cabinet design from Section 4.1.4
(semi-transparent blue), then ask it to add a door (orange) with a handle on the right-hand side. Despite beginning from a
largely-complete model, GPT-4 still has difficulty placing the door and handle correctly.
Fig. 18. GPT-4's Attempts to Create a Proxy for an L-bracket. Left: Image of the desired pre-fabricated part, to which
GPT-4 was provided a link. Right, Top: GPT-4's attempt to design a proxy based on the knowledge it gleaned from the
provided product webpage, with iterative high-level user feedback. Although GPT-4 identified the primary structures (two
cuboids for the L and a cylinder for the peg), it was unable to arrive at a proper design in this manner. Right, Bottom:
GPT-4's process for designing a proxy for the part from scratch with explicit user guidance about the structure and its
dimensions.
4.2.2 Incorporating pre-fabricated elements. It is also common to design an object around specific pre-manufactured
elements, such as hinges, brackets, or motors. We explore the possibility of using GPT-4 to source such parts in
Section 6.3, where we examine whether GPT-4 can identify the required part categories, provide options,
and/or select a set of options that are compatible with one another and with the intended overall design.
For now, we assume that the user has a specific (set of) part(s) in mind that they would like to incorporate into
their design. We then investigate whether, given these components, GPT-4 is able to (1) build a reasonable geometric proxy
of each part, and then (2) effectively use it as a module within a larger assembly.
Cabinet with Standard Hardware. To make the cabinet design more stable, a designer may wish to include extra
support brackets. Many pre-fabricated variations of these brackets exist, and they are inexpensive
and readily available. Given this, it does not make sense to design or manufacture these parts via GPT-4. Rather,
we would like to incorporate instances of a pre-fabricated version. To do this, GPT-4 must first build a proxy of the
part, place the proxies throughout the design appropriately, and adjust the remaining elements of the design to
accommodate these components.
For our first experiment, we chose to incorporate the Prime-Line 1/4 in. Nickel-Plated Shelf Support Pegs from
Home Depot into our design. We provided GPT-4 with a URL to this part’s listing on the Home Depot website,
which contained a text description of the item and the schematic diagram pictured in Figure 18(left). We then
asked GPT-4 to build a simple geometric proxy that we could incorporate into our design as a placeholder. As
shown in Figure 18(right, top), GPT-4 was able to infer and generate the appropriate primitives (one cylinder for
the peg and two cuboids for the L bracket). However, it was not able to correctly scale, orient, or position the
elements. In an effort to test GPT-4's understanding of the structure, we asked it to describe the structure in its
own words. Although it gave a reasonable description of the bracket, there was little improvement in the result
Now, we are going to try and integrate these bracket supports into our cabinet design from before. You can directly use the
createBracketWithPeg() function, without regenerating it each time. Now, please go back to the most recent cabinet design, and add
4 of these brackets underneath the middle shelf. There should be two brackets supporting the left side of the shelf, and two brackets
supporting the right side of the shelf. Each bracket should be positioned such that the top of the horizontal face is in contact with the
bottom of the shelf; the back of the vertical segment is in contact with the innermost side of the cabinet’s side wall; and the peg
protrudes into the side wall of the cabinet.
(... omitted by authors: 33 total messages, used to iteratively position the brackets; process summarized visually below ...)
Excellent! The bracket positions are suitable now. Now, we will move onto the shelf: at the moment, the shelf is protruding into the
vertical part of the brackets. We need to reduce the width of the shelf in the left-right direction such that it leaves space for the
brackets. Can you adjust the shelf size accordingly?
Fig. 19. Process for Integrating L-brackets (red) into an Existing Cabinet Design (semi-transparent blue) Using
GPT-4. It takes 34 messages to position the brackets appropriately (17 each of prompt/response), but once this is done,
GPT-4 is able to efficiently generate a modified shelf (pink) to accommodate the placed brackets (6 messages; 3 each of
prompt/response).
when it was asked to improve the script accordingly. Thus, even with several iterations of user feedback, GPT-4
was unable to construct this shape from high-level third-party (URL) or user input.
Ultimately, we had to provide GPT-4 with an explicit description of the structure that we wanted. Moreover, we
found that even with an explicit description, GPT-4 was unable to generate the correct shape when provided with
all directions at once. Instead, we had to create the shape in an iterative fashion, beginning with the L bracket
and then adding in the peg, as shown in Figure 18(right, bottom). Eventually, it was able to generate the structure
and consolidate the instructions into a high-level module called createBracketWithPeg, as desired.
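For reference, the structure we converged on amounts to two cuboids forming the L plus a cylindrical peg protruding from the vertical leg. A rough sketch of such a module is given below in Python-style pseudocode; box, cylinder_x, and union are hypothetical CSG helpers (the actual module was generated in OpenJSCAD), and all dimensions are placeholders rather than the measurements of the actual part.

def create_bracket_with_peg(leg=0.75, width=0.5, t=0.1, peg_radius=0.125, peg_length=0.25):
    # Geometric proxy for the shelf-support bracket: an L made of two cuboids,
    # plus a cylindrical peg protruding backwards from the vertical leg.
    # box(corner, size), cylinder_x(center, radius, length), and union(...) are
    # hypothetical CSG helpers; all dimensions are placeholder values (inches).
    horizontal_leg = box(corner=(0, 0, 0), size=(leg, width, t))   # extends along +x
    vertical_leg = box(corner=(0, 0, 0), size=(t, width, leg))     # extends along +z
    # Peg points in -x out of the vertical leg, roughly at its mid-height.
    peg = cylinder_x(center=(-peg_length / 2, width / 2, leg / 2),
                     radius=peg_radius, length=peg_length)
    return union(horizontal_leg, vertical_leg, peg)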
We then provided the module createBracketWithPeg as an input to GPT-4, and asked it to incorporate these
structures into the design, as detailed in Figure 19. In particular, we asked for four brackets under each shelf,
with the pegs protruding into the cabinet’s side walls, the back face of the bracket’s vertical leg in contact with
(but not protruding into) the side wall, and the top face of the bracket’s horizontal leg in contact with (but not
protruding into) the bottom face of the shelf. We initially tried to complete this experiment in a single continuous
chat that (1) designed the cabinet, (2) designed the L-bracket, and then (3) incorporated the brackets into the
cabinet. However, we found that after the extended discussion regarding the L-bracket design, GPT-4 seemed
to have completely forgotten its cabinet specification. Despite multiple prompts, it was unable to recover the
previous design. Instead, we directly provided GPT-4 with the L-bracket module and its prior cabinet design, and
then asked for a modification. This approach was far more successful. Overall, we found that GPT-4 was able
to instantiate the correct number of brackets, but it struggled to rotate and position them appropriately. After
several user prompts, GPT-4 was able to successfully place the brackets in their locations. Finally, we asked GPT-4
to adjust the shelf in order to (1) not protrude into the brackets, and (2) incorporate some additional allowance so
the shelf could easily fit between the supporting brackets in a physical assembly. GPT-4 was able to complete
these requests without issue.
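The shelf adjustment itself reduces to a small calculation, sketched below; the dimensions and clearance value are placeholders rather than the values used in our chat.

# Placeholder values in the cabinet model's units (not the values from our chat).
interior_width = 36.0       # distance between the inner faces of the side walls
bracket_thickness = 0.75    # thickness of each bracket's vertical leg
clearance = 0.125           # extra allowance so the shelf slides in easily

# The shelf must clear a bracket's vertical leg on each side, plus the allowance.
shelf_width = interior_width - 2 * (bracket_thickness + clearance)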
Fig. 20. A Quadcopter Designed with the Aid of GPT-4. The motors are colored in red. The propellers are in yellow. The
battery is in dark gray. The frame is in blue. The dark yellow box is the controller and the green box is the receiver.
Overall, although GPT-4 initially struggled to build a proxy of the pre-fabricated part we had in mind, GPT-4
seemed quite capable of incorporating the completed proxy into a given design, as desired.
Quadcopter. Designing a quadcopter involves integrating pre-built elements like the motor, propeller, and
battery. Detailed sourcing of these parts will be addressed in a later section (Section 6.3). Once these components
are sourced, the frame must be designed to accommodate their dimensions. We’ll explore how GPT-4 can assist
with this task.
However, enabling GPT-4 to accurately represent these parts isn't straightforward. To simplify the task, parts
are represented as either a box of dimensions w × h × d or a cylinder with radius r and height h. GPT-4 can handle
these representations well, as demonstrated in Section 4.1.2. Rather than having a single function which creates a
primitive and translates it as in Section 4.1.2, we introduce three functions for ease of design: createBox(w, h,
d), createCylinder(r, h), and place(item, x, y, z, a). The first two functions generate a box or a cylinder at the origin
(0,0,0), while the third rotates and moves the item to the desired coordinates.
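For concreteness, a hedged sketch of how these functions might be combined to place one motor/propeller pair at the tip of a frame arm is shown below; we assume the rotation argument a of place is an angle about the vertical axis and that primitives are created centered at the origin, and all dimensions are illustrative rather than those of the sourced parts.

import math

# Illustrative dimensions (meters); real values come from the sourced components.
arm_length, arm_width, arm_height = 0.20, 0.02, 0.01
motor_radius, motor_height = 0.014, 0.020
prop_radius, prop_height = 0.0635, 0.005

# One frame arm, rotated 45 degrees about the vertical axis.
arm = place(createBox(arm_length, arm_width, arm_height), 0, 0, 0, 45)

# The arm tip after the 45-degree rotation, where the motor should sit.
tip_x = (arm_length / 2) * math.cos(math.radians(45))
tip_y = (arm_length / 2) * math.sin(math.radians(45))

# Motor stands on top of the arm; propeller stacks on top of the motor.
motor = place(createCylinder(motor_radius, motor_height), tip_x, tip_y, arm_height, 0)
prop = place(createCylinder(prop_radius, prop_height),
             tip_x, tip_y, arm_height + motor_height, 0)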
Subsequently, we task GPT-4 with creating a design that integrates these parts using only the above functions.
The primary element GPT-4 must design is the frame, which should hold the selected components. Initially, GPT-4
produced a correct textual design, but struggled with the geometric representation, similar to Section 4.1.2. It
understood the quadcopter structure, but had issues with part positioning and orientation (Figure 20(a)). Problems
included incorrect frame orientation and part intersections. By guiding GPT-4 in correcting these issues, we
achieved a near-correct quadcopter design (Figure 20(b)).
The initial frame design wasn't practical because it was directly attached to the motor cylinder and insufficient to
hold components like the battery, controller, and signal receiver. To address this, we asked GPT-4 to incrementally
implement specific solutions, such as adding a cylinder base under each motor and a box body to