Academic Editor: Pedro Couto
Received: 17 March 2025
Revised: 16 April 2025
Accepted: 18 April 2025
Published: 21 April 2025
Citation: Shen, Y.; Kong, M.; Yu, H.; Liu, L. A Texture-Based Simulation Framework for Pose Estimation. Appl. Sci. 2025, 15, 4574. https://doi.org/10.3390/app15084574
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
A Texture-Based Simulation Framework for Pose Estimation
Yaoyang Shen , Ming Kong * , Hang Yu and Lu Liu *
School of Measurement Technology and Instrumentation, China Jiliang University, Hangzhou 310020, China;
dylan_shenyy@163.com (Y.S.); yuhang@cjlu.edu.cn (H.Y.)
*Correspondence: mkong@cjlu.edu.cn (M.K.); lu_liu@cjlu.edu.cn (L.L.); Tel.: +86-18222286965 (L.L.)
Abstract: An accurate 3D pose estimation of spherical objects remains challenging in
industrial inspections and robotics due to their geometric symmetries and limited fea-
ture discriminability. This study proposes a texture-optimized simulation framework
to enhance pose prediction accuracy through optimizing the surface texture features of
the design samples. A hierarchical texture design strategy was developed, incorporating
complexity gradients (low to high) and color contrast principles, and implemented via VTK-
based 3D modeling with automated Euler angle annotations. The framework generated
2297 synthetic
images across six texture variants, which were used to train a MobileNet
model. The validation tests demonstrated that the high-complexity color textures achieved
superior performance, reducing the mean absolute pose error by 64.8% compared to the
low-complexity designs. While color improved the validation accuracy universally, the
test set analyses revealed its dual role: complex textures leveraged chromatic contrast for
robustness, whereas simple textures suffered color-induced noise (a 35.5% error increase).
These findings establish texture complexity and color complementarity as critical design
criteria for synthetic datasets, offering a scalable solution for vision-based pose estimation.
Physical experiments confirmed the practical feasibility, yielding 2.7–3.3° mean errors.
This work bridges the simulation-to-reality gaps in symmetric object localization, with
implications for robotic manipulation and industrial metrology, while highlighting the
need for material-aware texture adaptations in future research.
Keywords: texture design; dataset construction; pose estimation; deep learning; spherical
particles
1. Introduction
Accurate three-dimensional pose estimation of spherical objects is pivotal for industrial inspection [1], remote sensing applications [2], particle tracking [3], and robotic machining systems [4]. While traditional pose estimation methods rely on geometric feature matching [5] or point cloud registration [6], they struggle under dynamic lighting and occlusion [7]. Recent research has combined pose estimation with deep learning, training convolutional neural networks on dedicated datasets to perform end-to-end pose estimation [8–11]. Although this approach is convenient, two key challenges remain in practical applications: (1) obtaining real-world pose data with intensive annotation [12] is too costly, and human errors easily occur in the annotation process; (2) the lack of feature discriminability limits the generalization ability of the model, and when applied to unknown targets or under drastic lighting changes, model performance decreases significantly [13,14]. Moreover, if the observed object is regularly symmetric, such as a sphere, the rotation ambiguity inherent in symmetry cannot be resolved [14,15].
In recent years, printing characteristic texture patterns on the surface of a sphere has been used to obtain particle attitude information in studies of particle rotation dynamics. Zimmermann et al. [16] matched experimental particle textures with synthetic templates using stereo vision, while Mathai et al. [17] optimized binary surface patterns via cost function minimization. Though effective in controlled settings, these methods exhibit limited generalization due to texture sensitivity. Will et al.'s stereolithography-based rendering [18] further highlights the trade-off between pattern complexity and computational feasibility.
Recent advances in texture-driven pose estimation include the work of Zhang et al., who examined in detail the importance of color and texture features when using neural networks to estimate coal ash content; their experiments and visualizations showed that the contribution of color in a CNN was as high as 64.77%, while texture features contributed 35.23% [19]. Wang and Zhang constructed a texture optimization network that combines contextual aggregation information and used it for texture restoration to enhance low-light images [20]. These studies show that texture features play a crucial role in deep learning and image processing.
To address these limitations, this work introduces a simulation-driven framework combining VTK-based synthetic data generation with Tamura texture theory. The specific research objectives and contents are as follows:
Objective: bridge the simulation–reality gap in attitude estimation for symmetrical objects using synthetic data and texture theory.
Texture design: direction-sensitive surface textures governed by Tamura texture principles (coarseness, contrast, and directionality) with six variants (three complexity levels × grayscale/color).
Data generation: VTK-based synthetic data with automated pose space sampling (3° intervals) and implicit Euler angle encoding.
Validation: experimental evaluation using 3D-printed textured spheres and real-world attitude measurements.
This study establishes and experimentally validates a set of texture design criteria for
spherical objects. Using 3D-printed textured spheres and actual attitude measurements, we
bridge the simulation–reality gap in the positioning of symmetrical objects. This provides a
robust attitude estimation for robotics and industrial metrology while offering a scalable
synthetic data paradigm.
2. Materials and Methods
This study proposes a simulation dataset construction scheme based on representa-
tional texture. Firstly, based on the texture-related theory, a series of textures are designed
with different complexities. Secondly, the texture attachment and pose simulation of
spherical particles are established through the VTK simulation library, and automatic
pose change and pose information annotation are designed. Finally, a batch of image
label datasets with pose information is obtained, which can support the end-to-end deep
learning model training.
2.1. Texture Design
Texture, as a visual attribute of an object's surface, reflects the statistical characteristics of the surface microstructure and contains rich structural information, which plays a crucial role in image recognition and object attitude estimation. In computer vision, texture is usually defined as the spatial distribution pattern of gray pixel values within an image region.
Early research shows that the key to texture perception is to extract the basic features of texture. Tamura et al. proposed six basic texture features: coarseness, contrast, directionality, line-likeness, regularity, and roughness [21]. These features effectively describe the statistical characteristics of texture and provide a theoretical basis for texture analysis and recognition. Recent studies have further quantified the relationship between texture features and recognition accuracy. Dzierżak et al. demonstrated that an optimized feature selection from 290 texture descriptors (including gray-level statistics and wavelet transforms) significantly improves osteoporosis detection in CT scans, with the k-nearest neighbors algorithm achieving 96.75% accuracy using 50 prioritized features [22]. Trevisani et al. proposed a method to quantify surface roughness through multi-scale texture analysis and constructed scalable roughness indexes; these indexes reveal terrain texture characteristics at different spatial scales, provide a new dimension for terrain analysis, and improve the accuracy of geomorphic analysis [23]. He et al. further studied the impact of texture distribution on visual perception and proposed rules for how texture distribution affects it [24]. Their research showed that specific texture distribution patterns can enhance the visual system's perception, thereby improving the accuracy of object recognition.
The stripe spacing annular ratio usually refers to the ratio of the spacing (d) between
the adjacent stripes in a ring or periodic texture pattern to the characteristic size of the ring
structure, such as the circumference L or radius r. Its mathematical expression is as follows:
η = d / (2πr) × 100% (1)
This ratio reflects the distribution density of the fringes in the ring structure and is a key parameter of texture design. In optical measurement or machine vision, if the fringe spacing is too small (the annular ratio is too low), the imaging system may suffer aliasing due to insufficient sampling, so that the fringes cannot be accurately resolved. According to the Nyquist sampling theorem, the sampling frequency must be at least twice the highest frequency of the signal [25]. In display technology, if the spatial frequency corresponding to the pixel spacing is f_p, the moiré fringe frequency needs to meet the following:
f_Moiré ≤ f_p / 2, d / (2πr) ≥ 5% (2)
In the fringe projection system, the annular ratio should therefore be ≥ 5% to avoid moiré fringes on the display screen, which would cause phase unwrapping errors [26].
We considered this in designing the texture patterns, so the fringe spacing ratio is kept above 5%. On this theoretical basis, three textures with a similar proportion distribution and fringe spacing but different complexities are designed. Texture 1 is a low-complexity texture composed only of horizontal stripes. Texture 2 is a medium-complexity texture that adds columnar stripes, giving higher texture differentiation. Texture 3 adds further columnar stripes and short horizontal stripes, forming a more complex texture area. A small check of the spacing ratio is sketched below.
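As a rough illustration of Equation (1) and the ≥ 5% design floor, the following snippet (our own sketch, with illustrative dimensions rather than the actual sphere and stripe sizes used in the paper) computes the annular ratio for a candidate stripe spacing:

```python
# Helper for the stripe-spacing annular ratio of Equation (1) and the >= 5% constraint.
# The numbers in the usage example are purely illustrative.
import math

def annular_ratio(spacing_d, radius_r):
    """Return eta = d / (2*pi*r) as a percentage."""
    return spacing_d / (2.0 * math.pi * radius_r) * 100.0

def satisfies_moire_constraint(spacing_d, radius_r, threshold_pct=5.0):
    return annular_ratio(spacing_d, radius_r) >= threshold_pct

# e.g. stripes 4 mm apart on a 10 mm-radius sphere: eta ~ 6.4%, above the 5% floor
print(annular_ratio(4.0, 10.0))          # ~6.37
print(satisfies_moire_constraint(4.0, 10.0))  # True
```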
The CIE-Lab color difference (ΔE) is an internationally used quantitative index of color difference. The minimum color difference perceptible to the human eye is ΔE ≈ 1, but ΔE ≥ 5 is needed for colors to be reliably distinguished. When ΔE is sufficiently large, the color difference is significant (the human eye can clearly distinguish it), making it suitable for robust recognition in machine vision systems. Studies have shown that ΔE > 30 can resist lighting changes, noise interference, and sensor errors, ensuring the stability of color features in complex environments [27]. On this basis, we improve on the original black and white texture. In the Lab color space, the lightness values of black and white are 0 and 100, two extreme colors, while the Lab values of blue are about (30, 68, −112) and those of green about (46, −52, 49); the color difference between blue and green is ΔE ≈ 200, and the pairwise color differences of the four colors are all far greater than 30. Therefore, a color texture pattern with the same texture distribution is designed, composed of black, white, green, and blue.
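For readers who want to reproduce the color check, the short script below evaluates the pairwise color differences of the four chosen colors from the approximate Lab values quoted above. It is an illustrative sketch using the simple CIE76 Euclidean formula, not the authors' code; reference [27] describes the more refined CIEDE2000 formula.

```python
# Rough check of the colour choice: pairwise CIE76 colour differences between the
# four texture colours, using the approximate Lab values quoted in the text.
import itertools
import numpy as np

lab = {
    "black": (0.0,   0.0,    0.0),
    "white": (100.0, 0.0,    0.0),
    "blue":  (30.0,  68.0, -112.0),
    "green": (46.0, -52.0,   49.0),
}

def delta_e76(c1, c2):
    # Euclidean distance in Lab space (CIE76 definition of delta E)
    return float(np.linalg.norm(np.array(c1) - np.array(c2)))

for (n1, c1), (n2, c2) in itertools.combinations(lab.items(), 2):
    print(f"{n1:>5s} vs {n2:>5s}: dE = {delta_e76(c1, c2):6.1f}")
# every pairwise dE comes out far above the 30 threshold (blue vs green is about 200)
```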
A total of six texture patterns, with three different texture complexities and two versions (black and white, and color), are designed, as shown in Figure 1 below. Figure 1a–f correspond to the six textures, which are named in order: Texture_1_bw, Texture_1_color, Texture_2_bw, Texture_2_color, Texture_3_bw, and Texture_3_color. Textures 1–3 correspond to the three complexity levels described above, while "bw" denotes the black and white version and "color" denotes the color version.
Figure 1. Figures (a,b) are the black and white type and color type with a low texture complexity;
Figures (c,d) are the black and white type and color type with a medium texture complexity; and
Figures (e,f) are the black and white type and color type with a high texture complexity.
2.2. Dataset Construction
An automated simulation dataset is developed based on the Visualization Toolkit (VTK) [28]. As an open-source 3D visualization framework, VTK's object-oriented design and cross-platform nature support complex scene modeling and high-precision rendering. The core process comprises three modules: 3D modeling, attitude control, and automatic annotation. It is implemented in a PyCharm environment using Python 3.8 and the VTK library.
A. 3D modeling and texture mapping
The VTK geometric modeling class is used to build a sphere model, whose radius and spatial position are set parametrically. To make the surface features discernible, the designed texture is mapped onto the surface of the model, and the texture coordinate transformation mechanism is used to ensure its continuity and consistency on the surface. The virtual imaging environment is configured with a simulated camera, a multi-light-source system, and a physically based rendering engine, where the camera parameters (focal length, field of view, and sensor size) strictly mimic real industrial inspection equipment to ensure the physical consistency of the generated images.
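A minimal sketch of this modeling step is given below. It uses standard classes from the Python vtk package; the texture file name, tessellation resolution, and radius are placeholders rather than the exact configuration used in the paper.

```python
# Sketch of module A: build a sphere and wrap one of the designed textures around it.
import vtk

def make_textured_sphere(texture_path, radius=10.0):
    sphere = vtk.vtkSphereSource()
    sphere.SetRadius(radius)
    sphere.SetThetaResolution(128)   # fine tessellation so the mapped texture is smooth
    sphere.SetPhiResolution(128)

    # Generate spherical texture coordinates so the pattern wraps around the sphere
    tmap = vtk.vtkTextureMapToSphere()
    tmap.SetInputConnection(sphere.GetOutputPort())
    tmap.PreventSeamOff()            # wrap the full 0-1 range (one seam at the meridian)

    reader = vtk.vtkPNGReader()
    reader.SetFileName(texture_path)
    texture = vtk.vtkTexture()
    texture.SetInputConnection(reader.GetOutputPort())
    texture.InterpolateOn()

    mapper = vtk.vtkPolyDataMapper()
    mapper.SetInputConnection(tmap.GetOutputPort())

    actor = vtk.vtkActor()
    actor.SetMapper(mapper)
    actor.SetTexture(texture)
    return actor
```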
B. Attitude control and data generation
The Euler angle rotation sequence of the sphere around the x-/y-/z-axes is defined. To avoid angular coupling effects, the independent-axis incremental step method is adopted, and a sampling interval of 3° is set to generate a discrete attitude set. Each pose corresponds to a unique coordinate transformation matrix. After real-time computation by the VTK rendering pipeline, the corresponding two-dimensional projection image is output, and each projected image corresponds to unique pose information. In addition, to improve the efficiency of data generation, a batch script automates the process of iterating, rendering, and storing the attitude parameters.
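The batch generation loop can be sketched as follows, reusing the textured actor from the previous snippet. The 3° step is taken from the text, while the looping scheme, angular range, image size, and background are illustrative assumptions rather than the authors' exact settings.

```python
# Sketch of module B: step the sphere through discrete Euler angles and render each pose
# off-screen to a PNG whose name already encodes the ground-truth angles (module C).
import vtk

def render_pose_set(actor, out_dir, step_deg=3, max_angle=30):
    renderer = vtk.vtkRenderer()
    renderer.SetBackground(1.0, 1.0, 1.0)
    renderer.AddActor(actor)

    window = vtk.vtkRenderWindow()
    window.SetOffScreenRendering(1)          # batch rendering without opening a window
    window.SetSize(224, 224)
    window.AddRenderer(renderer)

    for rx in range(0, max_angle + 1, step_deg):
        for ry in range(0, max_angle + 1, step_deg):
            for rz in range(0, max_angle + 1, step_deg):
                actor.SetOrientation(rx, ry, rz)   # Euler angles about x/y/z in degrees
                window.Render()

                to_image = vtk.vtkWindowToImageFilter()
                to_image.SetInput(window)
                to_image.Update()

                writer = vtk.vtkPNGWriter()
                # implicit pose annotation: Euler angles embedded in the file name
                writer.SetFileName(f"{out_dir}/{rx}_{ry}_{rz}.png")
                writer.SetInputConnection(to_image.GetOutputPort())
                writer.Write()
```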
C. Pose coding and dataset construction
An implicit encoding method for the pose parameters, based on the file name, is proposed. The Euler angles (pitch, yaw, and roll) are embedded into the image file name in the format "X_Y_Z.png", avoiding the data management redundancy caused by independent annotation files. A parsing script converts the angle values in the file name into a three-dimensional vector, which is used as the ground-truth label for network training. The final dataset contains image–pose pairs that support end-to-end deep learning model training. Some of the datasets generated under different textures are shown in Figure 2 below.
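Recovering the ground-truth label from such a file name only takes a few lines; the helper below is an illustrative sketch of the parsing script, with hypothetical file names.

```python
# Sketch of the label-parsing step: recover the Euler-angle ground truth from a
# file name such as "12_3_27.png".
import os
import numpy as np

def pose_from_filename(path):
    stem = os.path.splitext(os.path.basename(path))[0]
    x, y, z = (float(v) for v in stem.split("_"))
    return np.array([x, y, z], dtype=np.float32)   # truth label for network training

# e.g. pose_from_filename("dataset/Texture_3_color/12_3_27.png") -> array([12., 3., 27.])
```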
Figure 2. Partial datasets generated by different textures. Figures (a–f) correspond to Texture_1_bw–Texture_3_color.
3. Simulations and Design Criterion
In this section, a series of simulations are conducted to verify the method’s effective-
ness. Firstly, the simulation image training set generated by different textured particles
in the previous chapter is utilized. By training a CNN model, the validation error on the
validation set is verified. Secondly, the test error on the test set is further verified by visual
and quantitative analyses. By comparing the performance of different texture datasets,
a set of design rules for spherical grain texture are defined, and the final texture pattern
is determined.
MobileNet is chosen as the model for this experiment [29], and a composite loss function, consisting of the mean absolute error (MAE) and the Pseudo-Huber loss function, is designed. For all the training in this section, 1600 training images, 597 validation images, and 100 test images were used. AdamW is chosen as the optimizer, with a learning rate of 0.0005 and a cosine annealing schedule. Each model is trained for 20 epochs. The loss function is expressed as follows:

Loss(y, ŷ) = (1/N) Σ_{i=1}^{N} { a_i, if a_i ≤ δ;  δ²(√(1 + (a_i/δ)²) − 1), if a_i > δ }   (3)

where N is the number of samples, a_i = |y_i − ŷ_i|, and δ = 1° in this example.
3.1. Verification Set Performance Comparison
In this section, the performance of the CNN on the validation sets under different
texture datasets is visually and quantitatively analyzed. The specific quantitative analysis
results are shown in Table 1, and the visual analysis results are shown in Figure 3. In
the table, (Err_X, Err_Y, Err_Z) represents the average angular error of the attitude angle
corresponding to each coordinate axis; (Std_X, Std_Y, Std_Z) stands for standard deviation
(Std); MAE stands for total mean absolute error; and RMSE stands for total root mean
square error, which accounts for error magnitudes and is less prone to cancellation effects
compared to the mean error metrics.
Table 1. Validation set performance index under different textures.
Texture            Err_X    Err_Y   Err_Z   Std_X   Std_Y   Std_Z   MAE     RMSE
Texture_1_bw       1.7485   0.663   1.121   0.706   0.529   0.873   1.178   1.469
Texture_1_color    0.769    0.654   1.499   0.493   0.454   1.534   0.974   1.189
Texture_2_bw       0.764    0.663   1.097   0.539   0.540   0.873   0.842   1.078
Texture_2_color    0.710    0.658   1.004   0.568   0.874   0.657   0.791   1.037
Texture_3_bw       0.305    0.654   0.517   0.246   0.449   0.426   0.492   0.661
Texture_3_color    0.284    0.367   0.596   0.211   0.242   0.466   0.416   0.543
All indicators are measured in degrees (°).
As shown in Table 1, the model's pose estimation accuracy improves progressively with increasing texture complexity. Texture 1 yields the highest errors (MAE = 1.178° for black and white and 0.974° for color; RMSE = 1.469° and 1.189°), while the medium-complexity textures (Texture 2) reduce the MAE to 0.842° and 0.791° (RMSE = 1.078° and 1.037°), respectively. Texture 3, which is composed of complex textures, performs best, with an MAE of 0.492° for the black and white type and 0.416° for the color type and RMSE values reaching 0.661° and 0.543°, a 64.8% improvement over Texture 1. Notably, the RMSE values follow a similar decreasing trend as the MAE but emphasize a greater error magnitude reduction for the high-complexity textures. This trend shows that the complexity of the texture features may be highly correlated with the model's pose estimation ability.
Figure 3. Figures (a–f) plot the distribution of the verification error and Std under different complexities of black and white/color textures.
Color textures consistently outperform their black and white counterparts across all the complexity levels, with a lower MAE, RMSE, and Std, suggesting that color information strengthens feature discriminability. However, in the low-complexity scenario (Texture_1_color), the z-axis errors increase significantly, implying potential noise interference from color in simple patterns. Figure 3 further reveals that a higher texture complexity concentrates error distributions in low-value regions, particularly for color textures, with improved consistency across all axes.
3.2. Test Set Performance Comparison
In this section, the performance on the 100 test set images generated for each texture is analyzed and compared. The trained model is used to estimate the attitude of the test set, and the error distribution is visualized. The following visual analysis diagrams are drawn: a truth-versus-prediction scatter plot, an error line plot, and an error frequency histogram. In the scatter plot, the more concentrated the data are on the line x = y, the more accurate the attitude prediction; otherwise, the deviation is larger. The line chart clearly and intuitively pictures the specific error distribution trend, and the histogram provides a statistical analysis of the per-axis error distribution. The detailed analysis is shown in Figure 4 below.
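The three diagnostic views can be produced with a few lines of matplotlib; the sketch below assumes y_true and y_pred are N×3 arrays of Euler angles in degrees and is illustrative rather than the exact plotting code used for Figure 4.

```python
# Sketch of the three diagnostic plots: truth-vs-prediction scatter, per-sample error
# curve, and per-axis error histogram.
import numpy as np
import matplotlib.pyplot as plt

def plot_test_diagnostics(y_true, y_pred, axis_names=("X", "Y", "Z")):
    err = y_pred - y_true                                   # shape (N, 3), degrees
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    axes[0].scatter(y_true.ravel(), y_pred.ravel(), s=8)
    lo, hi = y_true.min(), y_true.max()
    axes[0].plot([lo, hi], [lo, hi], "r--")                 # ideal x = y line
    axes[0].set(xlabel="true angle (deg)", ylabel="predicted angle (deg)")

    for i, name in enumerate(axis_names):
        axes[1].plot(err[:, i], label=name)                 # error trend per test sample
        axes[2].hist(err[:, i], bins=20, alpha=0.5, label=name)
    axes[1].set(xlabel="test sample", ylabel="error (deg)")
    axes[2].set(xlabel="error (deg)", ylabel="count")
    axes[1].legend()
    axes[2].legend()
    plt.tight_layout()
    plt.show()
```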
Figure 4. Figures (a–f) plot the truth-versus-prediction scatter plot, error line plot, and error frequency histogram under different complexities of black and white/color textures.
In the test set, Texture 1's color version underperforms its black and white counterpart, as seen in Figure 4a,b. Despite a lower MAE on the validation set, the color version exhibits a significant z-axis error deviation (around 2°) in the test set, highlighting the potential negative impact of color information on simple textures. Figure 4c,d demonstrate improved overall performance with more complex textures, with more accurate predictions on the test set; here, the color information enhances the prediction performance. While Figure 4c shows a larger z-axis error, Figure 4d presents a z-axis error distribution similar to the other axes. Figure 4e,f reveal the optimal model performance on the test set with complex textures. The model predictions closely match the actual values, with error distributions within nearly 1° on each axis. Notably, Figure 4f shows a significant reduction in the z-axis error deviation with color information, resulting in a more uniform error distribution across all the axes.
These results indicate that color information has a more positive effect on feature
extraction and overall model performance, particularly with complex textures.
3.3. Results, Discussion, and Design Criterion
In the performance test on the test set, the MAE, RMSE, and Std of each texture on the test set are recorded, and the texture characteristics and the performance on the validation set are summarized, as shown in Table 2 below.
Table 2. Summary of test data for different textures.
Texture           Description                           Val_MAE  Val_RMSE  Test_MAE  Test_RMSE  Test_Std (X, Y, Z)
Texture_1_bw      Black and white; low complexity       1.178    1.469     1.052     1.411      0.997, 0.625, 0.916
Texture_1_color   Color; low complexity                 0.974    1.189     1.32      1.543      0.649, 0.48, 0.632
Texture_2_bw      Black and white; medium complexity    0.842    1.078     1.039     1.335      0.6, 0.639, 1.059
Texture_2_color   Color; medium complexity              0.791    1.037     1.008     1.237      0.659, 0.737, 0.762
Texture_3_bw      Black and white; high complexity      0.492    0.661     0.758     0.911      0.327, 0.665, 0.483
Texture_3_color   Color; high complexity                0.416    0.543     0.731     0.876      0.421, 0.422, 0.549
All indicators are measured in degrees (°).
The experimental results demonstrate a strong positive correlation between texture complexity and pose estimation accuracy. The validation set performance progressively improves with increasing complexity, and the high-complexity color texture (Texture_3_color) achieves the optimal results (MAE: 0.416°, RMSE: 0.543°, and per-axis Std ≤ 0.549°). Notably, the grayscale and color variants exhibit parallel trends, with MAE reductions of 58.2% and 57.3%, respectively, from Texture_1 to Texture_3, underscoring complexity's universal benefit across the color modalities.
However, the test set analyses reveal critical nuances in model generalization. While color enhances validation accuracy universally, its real-world impact proves complexity-dependent: the low-complexity color texture (Texture_1_color) suffers a 35.5% test error increase, accompanied by a 9.4% RMSE degradation (1.411° to 1.543°), relative to its grayscale counterpart, suggesting that chromatic noise dominates when structural features are sparse. Conversely, the high-complexity color texture (Texture_3_color) maintains superior test performance (MAE: 0.731°, RMSE: 0.876°), with the color-to-grayscale RMSE advantage persisting (0.876° vs. 0.911°) despite the domain shift. This duality establishes texture complexity as a prerequisite for effective color utilization in pose estimation systems.
Based on the analysis results of the above experimental data and the previous theories,
a design criterion for the surface texture of spherical particles for attitude estimation is
finally established:
(1) Orientation uniqueness: it should be ensured that each view corresponds to a
unique orientation, so that the model can distinguish between different poses;
(2) Proper proportion distribution: the proportions of the texture areas and blank areas
should be appropriate, and the pixel ratio should be close to 1:1;
(3) Stripe spacing control: the stripe spacing should be moderate, and the annular
ratio of the stripe spacing should be greater than or equal to 5%;
(4) Complex texture design: the texture design should include complex texture parts
with obvious features to strengthen the features;
(5) Color complementarity: a color texture combination of black, white, green, and blue with a CIE-Lab color difference ΔE > 30 should be used to enhance the color information.
Finally, the color texture with a high texture complexity is chosen as the surface texture
of the spherical particles.
4. Experiments
To verify the accuracy of the texture design, texture is attached to the simulated
particle model for modeling, and the physical spherical particles are 3D-printed. At the
same time, a real machine vision system is built using an industrial CMOS camera, a triaxial
angular displacement table, and a personal computer. Figure 5 shows a photo of the entire system setup.
Figure 5. The machine vision system.
In the experiments, the pose of the spherical particle is changed using an angular displacement table. A camera collects the 2D projection image corresponding to the 3D pose, which is then transmitted to a computer. The obtained images are processed so that they can be fed to the neural network. In this experiment, the network used is still the MobileNet model trained on the chosen texture dataset.
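An inference sketch for this step is shown below. Since torchvision does not ship the original MobileNet [29], MobileNetV2 with a 3-output regression head is used here as a stand-in, and the weight file, preprocessing values, and image name are placeholders rather than the authors' actual configuration.

```python
# Sketch of inference on a real camera image with a regression-head MobileNetV2.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

model = models.mobilenet_v2(weights=None)
model.classifier[1] = torch.nn.Linear(model.last_channel, 3)   # pitch, yaw, roll
model.load_state_dict(torch.load("mobilenet_texture3_color.pth", map_location="cpu"))
model.eval()

with torch.no_grad():
    image = preprocess(Image.open("real_capture_012.png").convert("RGB")).unsqueeze(0)
    pred_angles = model(image).squeeze(0).tolist()             # Euler angles in degrees
print(pred_angles)
```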
The above system is used to collect 40 real images with different attitude angles, and, after processing the images, the MobileNet network is used to estimate the actual attitude. The error between the estimated result and the actual attitude is analyzed statistically. The detailed analysis is shown in Table 3 below, and a more specific visualization is shown in Figure 6 below.
Table 3. Error analysis of test data.
Parameter Mean Error Std RMSE Maximum
X-axis 2.717 2.34 3.585 11.843
Y-axis 3.275 3.718 4.955 15.273
Z-axis 3.223 2.031 3.810 8.511
Figure 6. Box plot of test image error.
It can be seen from Table 3 that the model trained with the virtual dataset still achieves a low MAE (2.7–3.3°) in practical application, which indicates good prospects for practical use. In the field of attitude estimation for symmetric objects such as spheres, Zimmermann obtained a matching error of about 2° and a weighted error of 3° by matching synthetic textures with stereo vision. Mathai's method, based on optimized surface patterns, achieved an MAE of about 4° in a controlled environment with SNR = 2. Song extended the matching algorithm to 6-DoF pose detection of complex parts by constructing a multi-view template library offline from CAD models, achieving position errors < 2 mm and pose errors ≤ 3° [30]. In contrast, the texture optimization framework proposed in this paper achieves a lower average error (2.7–3.3°) in real scenes, indicating its effectiveness. However, the deviation along the y-axis is somewhat large, with a Std of 3.718°, an RMSE of 4.955°, and a maximum prediction error of about 15°, indicating that the model's predictions in this direction are less stable. The test error box plot in Figure 6 further reflects the error distribution of each coordinate axis. As can be seen from the figure, the median errors of the axes are relatively close, distributed around 2–3°, and the upper edge of each box is almost within 4°, indicating that 75% of the predictions are fairly accurate and that the method is feasible. However, there are several outliers on the x-axis and y-axis, indicating that the prediction performance of the current model is not entirely stable and deviations can occur. Further research is needed to understand and mitigate these anomalies for more robust real-world performance.
5. Discussion and Conclusions
The development of robust 3D pose estimation systems for symmetric objects, such as spheres, presents a crucial advancement in industrial automation, robotic manipulation, precision metrology [31,32], and the biomedical field [33]. Existing methods have
explored the use of printed texture patterns on spheres for attitude determination, yet these
techniques often struggle with limited generalization due to texture sensitivity and the
trade-off between pattern complexity and computational demands. This work builds upon
these existing efforts by addressing the limitations of traditional approaches through a
texture-optimized design tailored for a synthetic dataset. Our results demonstrate that a
high-complexity texture design, incorporating both multi-scale directional patterns and
high chromatic contrast, leads to significant performance improvements. This enables
substantial reductions in the MAE compared to low-complexity textures, thus directly
tackling the dual bottlenecks of data scarcity and rotational ambiguity inherent in spherical
object localization. This enhancement aligns with Tamura’s texture theory, where multi-
scale directional features and chromatic contrast improve feature discriminability, enabling
robust pose prediction under varying viewpoints.
Theoretically, this work establishes a novel paradigm for texture-driven synthetic data
generation. The observed correlation between texture complexity and model accuracy
highlights the importance of the feature information provided by the texture feature design.
The dual role of color—enhancing the validation accuracy while introducing noise in low-
complexity scenarios—provides new insights into color–texture interactions. Here are the
three key findings from this study:
Texture complexity dominance: high-complexity color textures (Texture_3_color) achieved the optimal accuracy, reducing errors by 64.8% compared to low-complexity designs.
Color–texture synergy: color enhanced performance in complex textures (with a test MAE of 0.731° and RMSE of 0.876°) but degraded the low-complexity results, emphasizing complexity as a prerequisite for effective color utilization.
Real-world generalization: the physical tests confirmed feasibility, with the average attitude error measured by the real system around 3° and 75% of the test data errors below 4°, which supports the feasibility of training the network with 2D data for 3D attitude estimation.
These results provide a foundation for texture-driven synthetic data systems with
applications in industrial detection and the related applications of target attitude estimation
and motion analysis.
This study is subject to two key limitations. First, the simulation framework assumes
ideal material–light interactions, which may not fully capture real-world scenarios with
reflective or translucent surfaces. Second, the rotational symmetries in spherical objects
introduce inherent ambiguity in pose estimation. Due to the rotational symmetry of a
sphere, an insufficient texture design can result in the object’s appearance after rotation
being indistinguishable from its original state, negatively impacting the measurement accu-
racy. This is particularly pronounced under extreme lighting variations, where cameras
struggle to capture subtle differences in surface textures (such as color gradients or tiny
marks). This difficulty further weakens the system’s ability to differentiate between various
rotation angles. Future work should integrate dynamic lighting models, like ray tracing
and material-aware texture mapping, for more realistic simulations. To overcome rotational
ambiguity, research should focus on designing more distinctive textures with invariant fea-
tures. Furthermore, exploring sensor fusion with IMUs and incorporating prior knowledge
of object motion could enhance pose estimation robustness. Moreover, future work should
prioritize the further optimization of the feature extraction capabilities and generalization
performance of datasets and models to achieve better pose estimation accuracy.
Author Contributions: Conceptualization, Y.S. and M.K.; Methodology, Y.S. and M.K.; Software,
Y.S.; Validation, Y.S.; Formal analysis, M.K., H.Y. and L.L.; Investigation, Y.S.; Resources, M.K.; Data
curation, L.L.; Writing—original draft, Y.S.; Writing—review & editing, H.Y. and L.L.; Visualization,
H.Y. and L.L.; Supervision, H.Y. and L.L.; Project administration, M.K. and L.L.; Funding acquisition,
M.K. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The original contributions presented in the study are included in the
article, further inquiries can be directed to the corresponding authors.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
MAE Mean absolute error
Std Standard deviation
RMSE Root mean square error
References
1.
Zhou, S.; Cao, W.; Wang, Q.; Zhou, M.; Zheng, X.; Lou, J.; Chen, Y. KMFDSST Algorithm-Based Rotor Attitude Estimation for a
Spherical Motor. IEEE Trans. Ind. Inform. 2023,20, 4463–4472. [CrossRef]
2.
Hansen, J.G.; de Figueiredo, R.P. Active Object Detection and Tracking Using Gimbal Mechanisms for Autonomous Drone
Applications. Drones 2024,8, 55. [CrossRef]
3.
Zhou, Z.; Zeng, C.; Tian, X.; Zeng, Q.; Yao, R. A Discrete Quaternion Particle Filter Based on Deterministic Sampling for IMU
Attitude Estimation. IEEE Sens. J. 2021,21, 23266–23277. [CrossRef]
4.
Hao, D.; Zhang, G.; Zhao, H.; Ding, H. A Combined Calibration Method for Workpiece Positioning in Robotic Machining Systems
and a Hybrid Optimization Algorithm for Improving Tool Center Point Calibration Accuracy. Appl. Sci. 2025,15, 1033. [CrossRef]
5.
Jiang, J.; Xia, N.; Yu, X. A feature matching and compensation method based on importance weighting for occluded human pose
estimation. J. King Saud Univ. Comput. Inf. Sci. 2024,36, 102061. [CrossRef]
6.
Nadeem, U.; Bennamoun, M.; Togneri, R.; Sohel, F.; Rekavandi, A.M.; Boussaid, F. Cross domain 2D-3D descriptor matching for
unconstrained 6-DOF pose estimation. Pattern Recognit. 2023,142, 109655. [CrossRef]
7.
Yu, X.; Zhuang, Z.; Koniusz, P.; Li, H. 6dof object pose estimation via differentiable proxy voting loss. arXiv 2020, arXiv:2002.03923.
[CrossRef]
8.
Hou, H.; Xu, Q.; Lan, C.; Lu, W.; Zhang, Y.; Cui, Z.; Qin, J. UAV Pose Estimation in GNSS-Denied Environment Assisted by
Satellite Imagery Deep Learning Features. IEEE Access 2020,9, 6358–6367. [CrossRef]
9.
Bogaart, M.V.D.; Jacobs, N.; Hallemans, A.; Meyns, P. Validity of Deep Learning-Based Motion Capture Using DeepLabCut to
Assess Proprioception in Children. Appl. Sci. 2025,15, 3428. [CrossRef]
10.
Park, S.; Jeong, W.-J.; Manawadu, M.; Park, S.-Y. 6-DoF Pose Estimation from Single RGB Image and CAD Model Retrieval Using
Feature Similarity Measurement. Appl. Sci. 2025,15, 1501. [CrossRef]
11.
Kubicki, B.; Janowski, A.; Inglot, A. Multimodal Augmented Reality System for Real-Time Roof Type Recognition and Visualiza-
tion on Mobile Devices. Appl. Sci. 2025,15, 1330. [CrossRef]
12.
Hodaň, T.; Sundermeyer, M.; Drost, B.; Labbé, Y.; Brachmann, E.; Michel, F.; Rother, C.; Matas, J. BOP challenge 2020 on 6D object
localization. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part
II 16. Springer International Publishing: Cham, Switzerland, 2020; pp. 577–594. [CrossRef]
13.
Peng, S.; Liu, Y.; Huang, Q.; Zhou, X.; Bao, H. Pvnet: Pixel-wise voting network for 6dof pose estimation. In Proceedings
of the IEEE/CVF conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 4561–4570, CVPR 2019 Open Access Repository.
14. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003.
15.
Asher, J.M.; Hibbard, P.B.; Webb, A.L. Perceived intrinsic 3D shape of faces is robust to changes in lighting direction, image
rotation and polarity inversion. Vis. Res. 2024,227, 108535. [CrossRef] [PubMed]
16.
Zimmermann, R.; Gasteuil, Y.; Bourgoin, M.; Volk, R.; Pumir, A.; Pinton, J.-F. International Collaboration for Turbulence Tracking
the dynamics of translation and absolute orientation of a sphere in a turbulent flow. Rev. Sci. Instruments 2011,82, 033906.
[CrossRef] [PubMed]
17.
Mathai, V.; Neut, M.W.M.; van der Poel, E.P.; Sun, C. Translational and rotational dynamics of a large buoyant sphere in turbulence.
Exp. Fluids 2016,57, 51. [CrossRef]
18. Will, J.B.; Krug, D. Dynamics of freely rising spheres: The effect of moment of inertia. J. Fluid Mech. 2021,927, A7. [CrossRef]
19.
Zhang, K.; Wang, W.; Cui, Y.; Lv, Z.; Fan, Y.; Zhao, X. Deep learning-based estimation of ash content in coal: Unveiling the
contributions of color and texture features. Measurement 2024,233, 114632. [CrossRef]
20.
Wang, Z.; Zhang, X. Contextual recovery network for low-light image enhancement with texture recovery. J. Vis. Commun. Image
Represent. 2024,99, 104050. [CrossRef]
21.
Tamura, H.; Mori, S.; Yamawaki, T. Textural Features Corresponding to Visual Perception. IEEE Trans. Syst. Man, Cybern. 1978,8,
460–473. [CrossRef]
22.
Dzierżak, R. Impact of Texture Feature Count on the Accuracy of Osteoporotic Change Detection in Computed Tomography Images of Trabecular Bone Tissue. Appl. Sci. 2025, 15, 1528. [CrossRef]
23.
Trevisani, S.; Guth, P.L. Terrain Analysis According to Multiscale Surface Roughness in the Taklimakan Desert. Land 2024,13,
1843. [CrossRef]
24.
He, T.; Zhong, Y.; Isenberg, P.; Isenberg, T. Design Characterization for Black-and-White Textures in Visualization. IEEE Trans. Vis.
Comput. Graph. 2023,30, 1019–1029. [CrossRef] [PubMed]
25. Goodman, J.W. Introduction to Fourier Optics; Roberts and Company Publishers: Colorado, CO, USA, 2005.
26.
Zhang, S. High-speed 3D shape measurement with structured light methods: A review. Opt. Lasers Eng. 2018,106, 119–131.
[CrossRef]
27.
Luo, M.R.; Cui, G.; Rigg, B. The development of the CIE 2000 colour-difference formula: CIEDE2000. Color Res. Appl. 2001,26,
340–350. [CrossRef]
28.
Schroeder, W.; Martin, K.M.; Lorensen, W.E. The Visualization Toolkit an Object-Oriented Approach to 3D Graphics; Prentice-Hall, Inc.:
Englewood Cliffs, NJ, USA, 1998; pp. 10–52.
29.
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient
Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [CrossRef]
30.
Song, W.; Guo, C.; Shen, L.; Zhang, Y. 3D pose measurement for industrial parts with complex shape by monocular vision. In
Proceedings of the SPIE 10827, Sixth International Conference on Optical and Photonic Engineering (icOPEN 2018), Shanghai,
China, 8–11 May 2018; p. 1082712. [CrossRef]
31.
Balntas, V.; Doumanoglou, A.; Sahin, C.; Sock, J.; Kouskouridas, R.; Kim, T.K. Pose Guided RGBD Feature Learning for 3D Object
Pose Estimation. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29
October 2017. [CrossRef]
32.
Cui, Y.; Hildenbrand, D. Pose estimation based on Geometric Algebra. GraVisMa 2009,73, 7. Available online: https://www.
researchgate.net/publication/286141050_Pose_estimation_based_on_Geometric_Algebra (accessed on 2 January 2025).
33.
Ci, J.; Wang, X.; Rapado-Rincón, D.; Burusa, A.K.; Kootstra, G. 3D pose estimation of tomato peduncle nodes using deep keypoint
detection and point cloud. Biosyst. Eng. 2024,243, 57–69. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.