Available online at www.sciencedirect.com
Procedia CIRP 00 (2019) 000–000 www.elsevier.com/locate/procedia
31st CIRP Design Conference 2021 (CIRP Design 2021)
Distinguishing artefacts: evaluating the saturation point of convolutional
Ric Reala,∗, James Gopsilla,b, David Jonesa, Chris Snidera, Ben Hicksa
aDesign Manufacturing Futures Lab, University of Bristol, UK
bCentre for Modelling and Simulation, Bristol, UK
* Corresponding author. E-mail address: firstname.lastname@example.org
Prior work has shown Convolutional Neural Networks (CNNs) trained on surrogate Computer Aided Design (CAD) models are able to detect
and classify real-world artefacts from photographs. Applications of this support the twinning of digital and physical assets in design, including
rapid extraction of part geometry from model repositories, information search & retrieval, and identifying components in the field for maintenance,
repair, and recording. The performance of CNNs in classification tasks has been shown to depend on training data set size and the number of
classes. Where prior works have used relatively small surrogate model data sets (<100 models), the question remains as to the ability of a CNN
to differentiate between models in increasingly large model repositories.
This paper presents a method for generating synthetic image data sets from online CAD model repositories, and further investigates the capacity
of an off-the-shelf CNN architecture trained on synthetic data to classify models as class size increases. 1,000 CAD models were curated and
processed to generate large-scale surrogate data sets, featuring model coverage at steps of 10°, 30°, 60°, 90°, and 120°.
The findings demonstrate the capability of computer vision algorithms to classify artefacts in model repositories of up to 200 models; beyond this
point the CNN's performance is observed to deteriorate significantly, limiting its present ability for automated twinning of physical to digital artefacts.
However, a match is more often found in the top-5 results, showing potential for information search and retrieval on large repositories of surrogate models.
©2021 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientiﬁc committee of the 31st CIRP Design Conference 2021.
Keywords: Design Repositories; Search & Retrieval; Convolutional Neural Networks; CNNs; Machine Learning; ML; Synthetic Data; Surrogate Models
1. Introduction
Recent trends in digital design, such as Twinning, have
underlined the value of rapid, or fully integrated, synchronisation
between physical and digital domains to accelerate processes
and enhance analytic capability.
The prototyping process comprises vital iteration between a
multitude of physical and digital models [5,14,16], in which
both physical and digital states must be captured, aligned, and
replicated across domains (i.e. updating of CAD models).
A key technical challenge thereby lies in creating a relation-
ship between physical and digital states, such that each may be
recognised as a counterpart of the other, whilst maintaining this
alignment across multiple versions of a prototype. By automat-
ing the detection of physical models and searching/aligning
against a digital counterpart, scope exists to reduce process time
and cost via physical/digital transition.
Detection has previously utilised physical tagging (e.g. QR
codes and barcodes) or direct scanning (e.g. photogrammetry).
Where these methods typically require modification of the physical
prototype or generation of new digital models, recent works
have demonstrated the ability of Convolutional Neural Networks
(CNNs) to extract and learn distinguishable features of
an artefact for classification, thus enabling rapid association
with existing data, such as retrieving CAD models by taking a
photograph of their real-world counterpart.
However, the performance of a CNN is dependent on the quantity
and quality of training data, and the number of classes
between which it must distinguish. Where photographic
training data is often sparse in prototyping, implementing a
CNN becomes a significant challenge; thus the real-world
viability of using CNNs for design twinning is not known.
This paper investigates (a) the performance and implemen-
tation challenges associated with CNN use at increasing scale,
and (b) the viability of 'off-the-shelf' CNN architectures for
artefact classiﬁcation. We consider an artefact to be a designed
object, whose form can be distinguished and classiﬁed.
The paper proceeds to present related works in the ﬁeld of
CNN use in design (Section 2), followed by a methodology for
testing CNN scalability (Section 3). Results are then reported
(Section 5), and a discussion ensues with respect to CNNs and
their ability to support twinning activities in design (Section 6).
The paper concludes by detailing the key findings from the study.
2. Related work
CNNs in design: The application of CNNs to design is a
rapidly emerging field of inquiry with many potential impacts
across Engineering Design. For example, Su et al. trained a CNN
on multi-view renders of 3D geometry and used the resulting
CNN to classify other 3D geometry. A potential use case for
this is the matching of similar parts across product families,
with a view to reducing the part variety in an organisation's
supply chain. Maturana and Scherer have sought to augment
depth-mapped images with Neural Networks to develop voxel-based
approximations of objects within a scene, with potential
application to design via providing a means to describe the
locations in which products are used and deployed. Gopsill and
Jennings are seeking to democratise design by using a CNN as
an information search and retrieval tool for large model
repositories, such as Thingiverse and MyMiniFactory. The CNN
enables users to simply take a photo of the item that they wish
to print and returns the closest matching result in the
repository's dataset. Gopsill et al. recently demonstrated CNNs
being able to emulate mathematical and user perceptions of shape
and form. These could be used to check for conformance to
brand identity as well as potential infringements on those of
others. They could also twin user feedback into the design
process, where the CNN acts as a proxy for market feedback,
providing real-time assessment of designs as a designer works
on their product. While such examples demonstrate exploration
of the value of CNNs in design, there remain questions around
their performance, and challenges in their implementation.
Surrogate models for dataset generation: The challenge
of acquiring datasets, with thousands of images required per
artefact, has previously been a limiting factor in exploring the
utility of CNNs in large-scale classiﬁcation tasks. With the con-
text of design prototyping requiring suﬃcient data to distin-
guish between 100s of prototype or component versions, this
presents a systemic obstacle of particular importance.
Recent work has shown the viability of synthetic image
dataset generation, whereby a surrogate CAD model is pro-
cessed and rendered with computer graphics software to gen-
erate two-dimensional artefact representations [13,4]. Image
composition, for example lighting, surfaces, background, and
appearance can additionally be tuned to replicate real-world en-
vironmental features [18,11,12].
This method presents an opportunity to leverage existing vir-
tual models for large-scale classiﬁcation, whereby a CNN is
trained on synthetic images from a surrogate model data set,
thus mitigating the need for 'real-world' photographs and prior
restrictions in data acquisition. To date, however, the effect
of model repository size on the eﬃcacy of this approach has
not been investigated. Where CNN detection accuracy will typ-
ically decrease as repository size increases, the extensive train-
ing sets producible via surrogate models give scope to substan-
tially increase performance.
3. Methodology
The study followed a five-step computational process to determine
the scaling behaviour of a CNN, for the purposes of twinning
physical artefacts with their digital counterparts in large
model repositories:
1. Dataset curation
2. Surrogate model curation
3. CNN selection
4. CNN preparation
5. CNN training and evaluation
3.1. Dataset curation
The data set used consisted of 1,049 STL files collected from
the model sharing website MyMiniFactory.com¹. MyMiniFactory
is a website that allows users to upload, share, and sell 3D
models, and also provides an API to programmatically interact
with the 3D model database. Between June 10th and June 15th
2020, the API was queried using a Python script with the search
terms ‘spare part’, ‘3d printer’, and ‘accessibility’. Addition-
ally, a ﬁlter was applied to restrict results to only those under
a Creative Commons free to re-use license. A total of 1,514
ﬁles were downloaded, consisting predominantly of STL ﬁles
(1,112 ﬁles) and a range of CAD ﬁles. A small number of those
1,112 STL ﬁles were corrupted and removed, leaving a data set
of 1,049 usable ﬁles.
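The curation step above can be sketched as a filter over the downloaded metadata. A minimal sketch, assuming (hypothetically) that each API result carries `filename` and `license` fields; the real MyMiniFactory response schema is not reproduced here:

```python
from typing import Dict, List

# Assumed free-to-reuse license labels; the actual API values may differ.
CC_LICENSES = {"CC0", "CC-BY", "CC-BY-SA"}

def curate(results: List[Dict]) -> List[str]:
    """Keep only Creative Commons licensed STL files from API results."""
    kept = []
    for item in results:
        name = item.get("filename", "")
        if not name.lower().endswith(".stl"):
            continue  # drop non-STL CAD formats
        if item.get("license") not in CC_LICENSES:
            continue  # drop files without a free-to-reuse license
        kept.append(name)
    return kept

sample = [
    {"filename": "bracket.stl", "license": "CC-BY"},
    {"filename": "gear.step", "license": "CC-BY"},
    {"filename": "knob.stl", "license": "All rights reserved"},
]
print(curate(sample))  # -> ['bracket.stl']
```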
3.2. Surrogate model curation
Where prior work has used relatively small datasets (<100
models), this work leverages surrogate models based on exist-
ing CAD geometry to enable CNN use in the design scenario.
The open source 3D computer graphics software Blender
(version 2.8) was used to create photo-realistic renders of
the artefacts. Blender is widely used throughout the computer
games and ﬁlm industry to create and render 3D graphics and
animation. It also provides a Python library which, in the case
of the work presented in this paper, allowed for scripted model
loading, creation of lighting, camera positions, and image rendering.
Fixed scene lighting was used for all models, with two lamps
positioned (x, y, z) at (L1) 4.0, 4.0, 4.0 and (L2) -4.5, -4.5, -4.5,
illuminating models as per Fig. 2.
¹ URL: MyMiniFactory.com. Last visited: 2020-11-25. Author: MyMiniFactory.
Table 1: Rendering time per artefact (RTPA), and total dataset generation time
(TDGT) for a repository size of 1,000 surrogate STL models, rendered at varying
degree steps (DST).

DST    Renders PA (Total)   RTPA (mm:ss)   TDGT (dd:hh:mm)
10°    684 (684,000)        06:00          04:04:00
30°    84 (84,000)          02:30          01:17:40
60°    24 (24,000)          00:12          00:03:20
90°    12 (12,000)          00:06          00:01:40
120°   6 (6,000)            00:03          00:00:50
Fig. 1: Camera sphere (From: )
Model size was also normalised via uniform re-scaling of
the bounding box, such that the artefact could be represented
fully in a 540 px x 540 px rendered output.
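The normalisation step can be sketched as computing a uniform scale factor from the model's axis-aligned bounding box; a minimal illustration, not the authors' Blender code:

```python
def normalise_scale(bbox_min, bbox_max, target=1.0):
    """Uniform scale factor so the largest bounding-box edge equals `target`,
    preserving the artefact's aspect ratio."""
    extents = [hi - lo for lo, hi in zip(bbox_min, bbox_max)]
    return target / max(extents)

# A 2 x 4 x 1 part scaled so its longest edge becomes 1.0.
s = normalise_scale((0, 0, 0), (2, 4, 1))
print(s)  # -> 0.25
```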
Camera positions were set using longitude and latitude an-
gles, allowing the camera to be rotated around the model (see
Fig. 1). To generate a range of images, the longitude and latitude
angles were stepped at 10°, 30°, 60°, 90°, and 120°, resulting in
five data sets of 684, 84, 24, 12, and 6 renders/images per artefact
respectively. A summary is provided in Table 1.
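One way to reproduce the per-artefact render counts in Table 1 is a longitude/latitude grid over the camera sphere. The exact pole handling in the authors' script is not specified; the grid assumed below matches the reported counts:

```python
import math

def camera_positions(step_deg, radius=1.0):
    """Camera positions on a sphere at fixed longitude/latitude steps (degrees)."""
    lons = range(0, 360, step_deg)       # full circle of longitudes
    lats = range(-90, 91, step_deg)      # pole to pole latitudes
    positions = []
    for lat in lats:
        for lon in lons:
            la, lo = math.radians(lat), math.radians(lon)
            # Spherical to Cartesian conversion for the camera location.
            positions.append((radius * math.cos(la) * math.cos(lo),
                              radius * math.cos(la) * math.sin(lo),
                              radius * math.sin(la)))
    return positions

for step in (10, 30, 60, 90, 120):
    print(step, len(camera_positions(step)))
# -> 10 684, 30 84, 60 24, 90 12, 120 6 (one count per line)
```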
3.3. CNN selection
This study utilised an AlexNet ‘oﬀ-the-shelf’ CNN archi-
tecture. The network, designed for a 1000 category classiﬁca-
tion capacity, established prominence in computer vision re-
search with a breakthrough Top-5 performance in the 2012
ILSVRC contest, signiﬁcantly outperforming prior architec-
tures. Through its substantial documentation in literature and
relative ease of use, AlexNet remains a widely used CNN architecture
with diverse applications. Specifically to this context,
prior work showed AlexNet to outperform other top classification
architectures, GoogLeNet and ResNet, achieving 94.9% accuracy
with a six-class surrogate model training data set when tested
against photographs of the physical artefact.
Fig. 2: Sample synthetic renders at 30° view representations
AlexNet's architecture consists of 62 million parameters and
a 1000-way output classifier. It comprises eight layers; the
first five are convolutional and the remaining three are fully
connected (Fig. 3). The final layer, a 1000-way softmax classifier,
outputs a probability distribution across 1000 class labels.
The network input is a 3-channel RGB image of dimensions
227 x 227 pixels. To reduce over-fitting, regularisation measures
including dropout and data augmentation are introduced. The
CNN was recreated using TensorFlow 2.0 and the Keras deep
learning API. An Nvidia 8GB RTX 2060 GPU was used for training.
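The 62-million-parameter figure can be checked by summing per-layer weights and biases. The layer shapes below are those of the standard single-tower AlexNet and are an assumption, as the paper does not list them:

```python
def conv_params(filters, kernel, in_channels):
    """Parameters of a conv layer: one kernel per filter, plus one bias each."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(units, in_units):
    """Parameters of a fully-connected layer, including biases."""
    return (in_units + 1) * units

layers = [
    conv_params(96, 11, 3),            # conv1: 96 filters, 11x11, RGB input
    conv_params(256, 5, 96),           # conv2
    conv_params(384, 3, 256),          # conv3
    conv_params(384, 3, 384),          # conv4
    conv_params(256, 3, 384),          # conv5
    dense_params(4096, 6 * 6 * 256),   # fc6: flattened 6x6x256 feature map
    dense_params(4096, 4096),          # fc7
    dense_params(1000, 4096),          # fc8: 1000-way classifier
]
print(sum(layers))  # -> 62378344, i.e. ~62 million
```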
3.4. CNN preparation
Three steps were taken to prepare the CNN following guid-
ance in literature and early experimentation with the surrogate
data. These are:
•Forming the data pipeline and applying image augmenta-
tion prior to parsing through the CNN.
•Conﬁguring the Hyperparameters.
•Training the CNN.
3.4.1. Data pipeline and Augmentation
From the Keras data preprocessing utilities, a pipeline was
defined to generate batches of tensor image data with real-time
augmentation. The CNN was fed images from the surrogate
data set using the flow_from_directory method, automatically
inferring class (artefact name) from the sub-directory structure.
30% of images per class were partitioned to a validation subset
for training. Additionally, parallel image data generators were
created for both training and validation data subsets, allowing
training data to be generated with augmentation and data shuffling
whilst preserving artefact representations in the validation
generator. This process is shown in Fig. 4.
Fig. 3: AlexNet network architecture, containing five convolutional and three fully-connected layers with a 1000-way SoftMax output.
Fig. 4: Synthetic data generation pipeline, with generator (a) augmenting images.
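The class-inference and 70/30 split behaviour described above can be illustrated with a pure-Python stand-in (not the Keras API itself), mimicking how flow_from_directory derives labels from sub-directory names:

```python
from collections import defaultdict

def split_dataset(paths, validation_split=0.3):
    """Infer class labels from parent-directory names and partition
    each class into training and validation subsets."""
    by_class = defaultdict(list)
    for p in paths:
        label = p.split("/")[-2]  # class name = parent directory
        by_class[label].append(p)
    train, val = [], []
    for label, items in sorted(by_class.items()):
        n_val = int(len(items) * validation_split)
        val += [(p, label) for p in items[:n_val]]    # validation subset
        train += [(p, label) for p in items[n_val:]]  # training subset
    return train, val

paths = [f"renders/bracket/{i}.png" for i in range(10)] + \
        [f"renders/gear/{i}.png" for i in range(10)]
train, val = split_dataset(paths)
print(len(train), len(val))  # -> 14 6
```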
To improve training stability and CNN generalisation, a
mini-batch size of 32 was used to estimate the error gradient.
Correspondingly, a learning rate of 1e-06 was applied to
scale the value by which weights were updated in back-propagation.
These values were found to be effective with the batch size
and the generated synthetic image data. An epoch count of 200
was defined to observe training performance over time and allow
sufficient space for model convergence.
CNN parameters are trained with weights and biases initialised
on the active data set. 'Adam' is used as the optimisation
algorithm, and categorical cross-entropy set as the loss function.
This approach is elected over transfer learning with
pre-trained networks to explore causality between our surrogate
model data structures and the CNN's performance. Additionally,
hyperparameters are kept constant across data sets and experiments.
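The scale of a single weight update under these settings can be illustrated with a minimal scalar Adam step; a sketch for intuition, not the authors' training code:

```python
import math

def adam_step(w, grad, state, lr=1e-6, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad         # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad  # biased second-moment estimate
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, (m, v, t)

w, state = 0.5, (0.0, 0.0, 0)
w, state = adam_step(w, grad=2.0, state=state)
print(w)  # first step moves w by ~lr regardless of gradient magnitude
```

Note that the first Adam step is approximately the learning rate in magnitude, which is why the 1e-06 setting tightly bounds how quickly weights can move.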
CNN performance was evaluated using Keras metrics on
three of the five generated data sets (30°, 60°, 90°), measuring
classification accuracy and loss for training and validation steps,
Top-5 categorical accuracy, and training time. Initial test data
showed performance to be inconclusive for the 120° data set,
and the 10° data set to be computationally inefficient; these
were therefore omitted from the study. Results from the candidate
data sets were logged and graphed to visualise performance
across the dimensions of interest: accuracy, number of classes,
model coverage (number of images per class), and training time.
5. Results
This section reports the results from our study into a CNN
trained on a CAD surrogate model dataset. These are reported
with respect to training performance and classification accuracy.
5.1. Training Performance
Training time (Table 2) shows that higher quantities of artefacts
in a given training set correlate with longer CNN training
times. Time is observed to increase exponentially across data
sets; however, the extent of this growth is limited by
data set size.
A uniform transition exists from exponential to linear growth
in training time for each of the surrogate data sets. The 30° set
(featuring the highest number of training images) transitions to
more linear time scaling earlier in the class range (200 classes),
whilst this behaviour is observed later (400 classes) in the 60°
and 90° data sets; suggesting the experimental set-up reaches a
performance cap at between 4,800 and 16,000 images.
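These transition points can be roughly sanity-checked against the per-artefact render counts in Table 1 (assuming images per class equals renders per artefact):

```python
renders_per_artefact = {30: 84, 60: 24, 90: 12}   # from Table 1
transition_classes = {30: 200, 60: 400, 90: 400}  # observed transition points

for step in (30, 60, 90):
    images = transition_classes[step] * renders_per_artefact[step]
    print(f"{step} deg: {images} training images at transition")
```

The 30° product comes to 16,800 and the 90° product to 4,800, consistent with the approximate range stated above.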
Table 2: Number of classes per data set and model training times

                   Training Time (hh:mm)     Avg Epoch Time (mm:ss)
Number of Classes  30°     60°     90°       30°     60°     90°
5                  00:18   00:05   00:04     00:05   00:01   00:01
10                 00:35   00:10   00:05     00:10   00:03   00:01
25                 01:24   00:25   00:11     00:25   00:07   00:03
50                 02:37   00:45   00:23     00:47   00:13   00:07
100                05:07   01:30   00:43     01:33   00:26   00:13
200                10:13   03:00   01:30     03:02   00:52   00:27
400                22:15   05:40   02:55     06:41   01:42   00:53
600                33:10   08:30   04:30     09:58   02:36   01:18
800                44:30   11:40   05:40     13:23   03:24   01:44
1000               55:07   14:20   07:20     16:33   04:12   02:06
5.2. Classiﬁcation accuracy
In contrast to training time, classiﬁcation accuracy decays
almost exponentially as the number of classes per data set in-
creases, showing a positive correlation between training image
quantity and classiﬁcation accuracy. Thus, accurate classiﬁca-
tion of physical artefacts can only be achieved with a small
number of classes.
Top-5 performance (by which the true artefact is named in
the top 5 predictions) displays an affinity to the slope of Top-1
accuracy, although its gradient indicates a slower decay in
classification performance and is more linear in nature. At 1,000
classes, the CNN was still able to classify the matching model
in the Top-5 predictions 75% of the time. This is a promising
indication that CNNs could support design information search &
retrieval applications. It is worth noting, for illustrative purposes,
that the average car features 30,000 components.
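Top-1 and Top-5 accuracy as used here can be illustrated with a small helper; a sketch with invented rankings, not the study's data:

```python
def top_k_accuracy(rankings, true_labels, k=5):
    """rankings[i] is the model's class predictions for sample i,
    ordered most- to least-likely; a hit is the true label in the top k."""
    hits = sum(1 for preds, true in zip(rankings, true_labels)
               if true in preds[:k])
    return hits / len(true_labels)

rankings = [["a", "b", "c", "d", "e", "f"],   # true class ranked 1st
            ["b", "a", "c", "d", "e", "f"],   # true class ranked 2nd
            ["f", "e", "d", "c", "a", "b"]]   # true class ranked 5th
truth = ["a", "a", "a"]
print(top_k_accuracy(rankings, truth, k=1))  # -> 1/3: only the first sample hits
print(top_k_accuracy(rankings, truth, k=5))  # -> 1.0: all samples hit in top 5
```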
6. Discussion and future work
This study has explored the performance properties of CNNs
trained on surrogate models for future application to Engineer-
ing Design. Particularly, this work has shown the potential of
adapting existing ‘oﬀ-the-shelf’ CNN architectures to support
Design activities. This is of signiﬁcance as it demonstrates
that engineering organisations can use existing, general purpose
AI/ML architectures rather than requiring expert consultancies
to develop bespoke solutions.
Results have shown that CNNs could be applied to automated
twinning between digital and physical assets if the number
of artefacts to be classified remains low (<10). However, CNN
performance can quickly deteriorate; this should therefore be
considered when selecting suitable applications for a CNN.
Top-5 results, however, show stronger performance, and suggest
potential for CNN use as a rapid Search & Retrieval tool
for design information. Implementation could occur in a Search
& Retrieval tool that does not require the bespoke, multi-faceted
search strategies typical in engineering, which arise from the
challenges of describing physical artefacts, their shape, and
form. This could be significant in democratising and speeding
up information search processes, as design engineers would be a
mere photograph away from useful search results that can take
them from the physical to the digital domain.
Fig. 5: Training Results
With these promising results, future work could investigate
the development of bespoke CNN architectures for the activ-
ities described. In particular, it would be interesting to apply
Information Search & Retrieval’s F-score metric to the CNN
and compare it to existing engineering design search strategies.
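For reference, the F-score mentioned here combines precision and recall; a minimal sketch over hypothetical retrieval results:

```python
def f_score(retrieved, relevant, beta=1.0):
    """F-measure for a retrieval result: harmonic-style combination of
    precision (fraction of retrieved items that are relevant) and
    recall (fraction of relevant items that were retrieved)."""
    tp = len(set(retrieved) & set(relevant))
    if tp == 0:
        return 0.0
    precision = tp / len(retrieved)
    recall = tp / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Four models retrieved, two of which are relevant: precision 0.5, recall 1.0.
print(f_score(["m1", "m2", "m3", "m4"], ["m1", "m2"]))  # -> 0.666...
```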
Optimisation in preparing and training the CNN is also an
area to explore, ensuring development of computationally ef-
ﬁcient and sustainable architectures that support design activ-
ities. This would consider the number of renders required per
artefact and the development of a suitable scene with appropri-
ate lighting to accentuate artefact features as well as the appli-
cation of multiple scenes.
Further work into architecture performance drop-off could
yield some interesting insights into how CNNs confuse artefacts,
with confusion matrices providing a means to explore
this. Is it the case that the confusion matrix of a CNN would be
comparable to a similarity matrix produced by humans? This
paper has shown that there remains a wealth of research to be
done to apply Machine Learning in the domain of Design, and
further assess how it supports, or even hinders design processes.
7. Conclusions
Where prior work has shown the potential of CNNs to sup-
port Engineering Design activities, this paper has taken the next
step in this journey by examining how CNNs scale with increasing
class sizes of surrogate CAD models. The results have important
implications in determining whether CNNs can be deployed
for twinning and/or information search & retrieval activities.
The results show that existing ‘oﬀ-the-shelf’ CNN architec-
tures could be re-trained and successfully deployed to twin be-
tween physical and digital domains if the number of models
is low (<10). The results also demonstrate the potential for
CNNs to support Information Search & Retrieval activities, with
the CNN being able to return a correct match in the Top-5
for 1,000 model classes. This creates the opportunity to use a
single photograph to effectively retrieve virtual models of physical
artefacts from large corpora.
Together, these results demonstrate the viability of CNN use
in a design context, the eﬀectiveness of the novel surrogate
model training approach, and scope for future opportunities that
may be realised.
Acknowledgements
The work reported in this paper has been undertaken as
part of the Twinning of digital-physical models during prototyping
project. The work was conducted at the University of Bristol,
Design and Manufacturing Futures Laboratory
(http://www.dmf-lab.co.uk), funded by the Engineering
and Physical Sciences Research Council (EPSRC), grant
reference EP/R032696/1. The authors would also like to thank
MyMiniFactory.com and their users for sharing their models.
References
 Barbedo, J.G.A., 2018. Impact of dataset size and variety on the eﬀective-
ness of deep learning and transfer learning for plant disease classiﬁcation.
Computers and Electronics in Agriculture 153, 46–53. doi:10.1016/j.
 Deng, J., Berg, A.C., Li, K., Fei-Fei, L., 2010. What does classifying more
than 10,000 image categories tell us?, in: Daniilidis, K., Maragos, P., Para-
gios, N. (Eds.), Computer Vision – ECCV 2010, Springer Berlin Heidel-
berg, Berlin, Heidelberg. pp. 71–84.
 Gopsill, J., Goudswaard, M., Jones, D., Hicks, B., 2021. Perceptions on
shape and form, in: International Conference on Engineering Design. In
 Gopsill, J., Jennings, S., 2020. Democratising design through surrogate
model convolutional neural networks of computer aided design repos-
itories. Proceedings of the Design Society: DESIGN Conference 1,
Hansen, C.A., Özkil, A.G., . From Idea to Production: A Retrospective and
Longitudinal Case Study of Prototypes and Prototyping Strategies. Journal
of Mechanical Design, 031115. doi:10.1115/1.4045385.
 Jones, D., Snider, C., Nassehi, A., Yon, J., Hicks, B., 2020. Characterising
the Digital Twin: A systematic literature review. CIRP Journal of Manufac-
turing Science and Technology 29, 36–52. doi:10.1016/j.cirpj.2020.
 Kingma, D.P., Ba, J.L., 2015. Adam: A method for stochastic optimiza-
tion, in: 3rd International Conference on Learning Representations, ICLR
2015 - Conference Track Proceedings, International Conference on Learn-
ing Representations, ICLR. arXiv:1412.6980.
Krizhevsky, A., Sutskever, I., Hinton, G.E., . ImageNet Classification with
Deep Convolutional Neural Networks. Technical Report.
Masters, D., Luschi, C., . Revisiting Small Batch Training for Deep Neural
Networks. Technical Report.
 Maturana, D., Scherer, S., 2015. Voxnet: A 3d convolutional neural net-
work for real-time object recognition, in: 2015 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), IEEE. pp. 922–928.
 Peng, X., Sun, B., Ali, K., Saenko, K., . Learning Deep Object Detectors
from 3D Models. Technical Report.
 Sarkar, K., Varanasi, K., Stricker, D., . Trained 3D models for CNN based
object recognition. Technical Report.
 Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E., . Multi-view Con-
volutional Neural Networks for 3D Shape Recognition. Technical Report.
 Ulrich, K.T., 2003. Product design and development. Tata McGraw-Hill
 Wang, J., Perez, L., . The Eﬀectiveness of Data Augmentation
in Image Classiﬁcation using Deep Learning. Technical Report.
 Wynn, D.C., Eckert, C.M., 2017. Perspectives on iteration in design and
development. Research in Engineering Design 28, 153–184.
 Zaki, H.F., Shafait, F., Mian, A., 2016. Modeling 2d appearance evolution
for 3d object categorization, in: 2016 international conference on digital
image computing: Techniques and applications (DICTA), IEEE. pp. 1–8.
Zhang, X., Jia, N., Ivrissimtzis, I., 2020. A study of the effect of the
illumination model on the generation of synthetic training datasets.