Comparison of supervised machine learning and variable selection methods for body weight prediction of growth pigs using image processing data

Brazilian Journal of Animal Science
e-ISSN 1806-9290
www.rbz.org.br
R. Bras. Zootec., 53:e20240001, 2024
https://doi.org/10.37496/rbz5320240001
Animal production systems and agribusiness
Full-length research article
1 Universidade Federal de Viçosa, Departamento de Zootecnia, Viçosa, MG, Brasil.
2 Universidade Federal de Viçosa, Departamento de Estatística, Viçosa, MG, Brasil.
ABSTRACT - This research aimed to compare statistical methods (random forest, RIDGE, LASSO, and elastic net regression) for predicting the body weight of purebred and crossbred pigs reared in Brazil. The predictions were based on dorsal-view images obtained from video image processing. The study involved 69 animals of the Large White, Piau, Duroc × Large White, and Piau × Large White genetic groups. Data collection spanned 144 days, with measurements taken at approximately 20-day intervals, totaling eight measurements per animal throughout their growth stages. Image acquisition was carried out in individual pens using an Intel RealSense Depth D435 digital camera. The features back area, back perimeter, back width, and body depth were extracted from the images. Pearson's correlation analysis was conducted to assess the relationship between live weight and these features. The dataset was randomly divided into a training dataset (65%) and a test dataset (35%), and model training was performed by five-fold cross-validation balanced according to the growth stage, which was divided into three groups. This procedure was repeated 100 times, and the resulting metrics were averaged over the 100 repetitions. Although the differences were slight, the random forest method outperformed the others, with the highest average R² value (0.87) and the lowest average RMSE (14.32 kg) and MAE (10.13 kg) values. Consequently, the random forest algorithm proved to be the most effective in predicting body weight. The back area, back width, and back perimeter were the most important variables in the model.
Keywords: 2D image, back area, crossbred pig, penalized regression, precision livestock
farming, random forest
Eula Regina Carrara¹, Polliany da Costa Santos Oliveira¹, Layla Cristien de Cássia Miranda Dias¹, Weverton Gomes da Costa², Aline Rabello Conceição¹, Pedro Henrique Silva Braga¹, Mario Luiz Chizzotti¹, Renata Veroneze¹, Erica Beatriz Schultz¹*

*Corresponding author: erica.schultz@ufv.br
Received: January 29, 2024
Accepted: July 23, 2024
How to cite: Carrara, E. R.; Oliveira, P. C. S.; Dias, L. C. C. M.; Costa, W. G.; Conceição, A. R.; Braga, P. H. S.; Chizzotti, M. L.; Veroneze, R. and Schultz, E. B. 2024. Comparison of supervised machine learning and variable selection methods for body weight prediction of growth pigs using image processing data. Revista Brasileira de Zootecnia 53:e20240001. https://doi.org/10.37496/rbz5320240001
Editors: Marcos Inácio Marcondes; Valdir Ribeiro Junior
Copyright: This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction
Traditionally, pig weighing is performed using manual methods that require the animal to be physically restrained. This approach is labor-intensive and time-consuming, especially on large farms with many animals (Li et al., 2014). Manual weight measurements are usually taken at the end of each phase, and many commercial farms record only pen weights. This affects the ability of

variability (Fernandes et al., 2019).

weight can help overcome these limitations. Two-dimensional video images offer several advantages for body weight recording, such as low cost, ease of use, no need to handle the animals, reduced stress, less labor, and the possibility of more frequent data collection. In addition, current
          
phenotypes, including images and videos, which yield thousands of complex phenotypes.
The application of machine learning techniques and penalized regression has emerged as an approach for analyzing complex phenotypes. Supervised statistical learning methods with regularization, such as tree-based methods, boosting, and penalized regression with variable selection, have been widely used in studies involving production animals (Gorczyca et al., 2018; Nguyen et al., 2020; He et al., 2021).
Among them, the random forest (RF) algorithm is frequently used for data mining and prediction analysis (Chen and Ishwaran, 2012). The RF method combines bootstrap aggregation (bagging) with random feature selection to build an ensemble of decision trees, introducing controlled variation into the modeling process (Breiman, 2001).
In regression problems, the aim is to minimize the sum of squared errors (SSE). This objective,

regression methods: ridge regression (RIDGE; Hoerl and Kennard, 1970), least absolute shrinkage
and selection operator regression (LASSO; Tibshirani, 1996), and elastic net regression (ENET; Zou
and Hastie, 2005).
Currently, few studies have been dedicated to building predictive models for estimating pig body weight from images (Brandl and Jørgensen, 1996; Fernandes et al., 2019; Yu et al., 2021), and most of them have used only commercial breeds. No investigations have used images and machine learning methods to predict the body weight of fat-type pig breeds, such as the Brazilian Piau breed, or their crossbreeds.
Thus, this study aimed to compare the statistical methods RF, RIDGE, LASSO, and ENET to predict
the body weight of purebred and crossbred pigs based on dorsal-view images obtained from video
image processing.
2. Material and Methods
Research on animals was conducted in accordance with the institutional committee on animal use (014/2022).
2.1. Data collection

animals of the Large White (LL; n = 16), Piau (PP; n = 14), Duroc × Large White (DL; n = 18), and Piau × Large White (PL; n = 21) genetic groups were evaluated. The animals were housed in individual concrete

              
days at intervals of approximately 20 days, comprising eight measurements per animal from the
       
the dataset partition of the model training, as follows: group 1 - weaning (28.84±1.76 days old) and nursery period (48.85±1.78 days old); group 2 - end of the nursery (63.84±1.76 days old) and three measurements during growth (78.95±1.86, 98.95±1.86, and 119.97±1.87 days old); and group 3 - during
   
Births occurred over one week, which contributed to the variation in the animals' ages. During the experimental phase, some animals died and some records were missing, resulting in a varying number of measurements on each collection day. There were 483 measurements in total (Table 1): 138, 241, and 104 in groups 1, 2, and 3, respectively. The complete data (i.e., all growth stages) were analyzed together to obtain a larger dataset.
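As a quick arithmetic check of these counts, the per-stage totals in Table 1 can be summed within each growth group. The Python sketch below does this; the stage-to-group mapping follows the group definitions given in this section, and the variable and function names are illustrative only.

```python
# Per-stage totals from the "Total for stage" row of Table 1, in collection order.
stage_totals = {
    "Weaning": 69, "Nursery": 69,
    "Leaving the nursery": 69, "Growth 1": 58, "Growth 2": 58, "Growth 3": 56,
    "Finishing": 56, "Leaving": 48,
}

# Stage-to-group mapping taken from the group definitions in Section 2.1.
group_of_stage = {
    "Weaning": 1, "Nursery": 1,
    "Leaving the nursery": 2, "Growth 1": 2, "Growth 2": 2, "Growth 3": 2,
    "Finishing": 3, "Leaving": 3,
}


def measurements_per_group(totals, mapping):
    """Sum the per-stage measurement counts within each growth group."""
    per_group = {}
    for stage, n in totals.items():
        group = mapping[stage]
        per_group[group] = per_group.get(group, 0) + n
    return per_group


per_group = measurements_per_group(stage_totals, group_of_stage)
print(per_group)                # {1: 138, 2: 241, 3: 104}
print(sum(per_group.values()))  # 483
```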
2.2. Video, frame processing, and feature extraction
Immediately after live weight was measured on a weighing scale, individual images were collected using an Intel RealSense Depth D435 digital camera with a resolution of 1920 × 1080 pixels. The camera was mounted on a tripod at a distance of 1.5 m from the animals and a height of 1.5 m. Videos lasting between 30 and 40 s were recorded for each animal, focusing on the dorsal and lateral regions.
The videos were then processed to manually select the best frames showing each animal in the dorsal and lateral positions, using the Intel RealSense Viewer software.
The features back area, back perimeter, back width, and body depth (Figure 1) were extracted from the images, all in pixels. The features were extracted semi-automatically using the free software ImageJ. The back area, back perimeter, and back width were extracted from the back

extracted from the lateral region between the 12th and 13th thoracic vertebrae.

Figure 1 - Examples of the features extracted from the images of pigs. A - back perimeter and back area; B - back width; C - body depth.

Table 1 - Number of animals measured in each group, by breed
Breed | Weaning (group 1) | Nursery (group 1) | Leaving the nursery (group 2) | Growth 1 (group 2) | Growth 2 (group 2) | Growth 3 (group 2) | Finishing (group 3) | Leaving (group 3)
LL | 16 | 15 | 15 | 15 | 15 | 14 | 14 | 15
PP | 14 | 15 | 16 | 13 | 13 | 13 | 13 | 10
DL | 18 | 18 | 18 | 16 | 16 | 15 | 16 | 13
PL | 21 | 21 | 20 | 14 | 14 | 14 | 13 | 10
Total for stage | 69 | 69 | 69 | 58 | 58 | 56 | 56 | 48
LL - Large White; PP - Piau; DL - Duroc × Large White; PL - Piau × Large White.
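The ImageJ workflow is only outlined above, so the following Python sketch merely illustrates, under stated assumptions, how pixel-based area, perimeter, and width can be computed once the back has been segmented into a binary mask (1 = pig, 0 = background). The assumptions are: the mask is already available, the pig's spine runs vertically in the image so that back width is the widest horizontal run of pig pixels, and perimeter is counted as exposed pixel edges (4-connectivity). ImageJ traces outlines differently, so its perimeter values would not match this count exactly; `mask_features` is a hypothetical helper, not part of the authors' pipeline.

```python
def mask_features(mask):
    """Pixel-based shape features of a binary (0/1) mask given as a list of rows.

    area      -> number of foreground pixels
    perimeter -> number of foreground pixel edges that touch background
                 (4-connectivity; the image border counts as background)
    width     -> widest span, in any row, between the first and last
                 foreground pixel (assumes the spine runs vertically)
    """
    rows, cols = len(mask), len(mask[0])
    area = perimeter = width = 0

    def is_fg(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            area += 1
            # Count each side of this pixel that borders background.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                if not is_fg(r + dr, c + dc):
                    perimeter += 1
        fg_cols = [c for c in range(cols) if mask[r][c] == 1]
        if fg_cols:
            width = max(width, fg_cols[-1] - fg_cols[0] + 1)
    return {"area": area, "perimeter": perimeter, "width": width}


# Toy 5 x 6 mask containing a 3 x 4 rectangle of "pig" pixels.
toy = [[0] * 6 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 5):
        toy[r][c] = 1
print(mask_features(toy))  # {'area': 12, 'perimeter': 14, 'width': 4}
```

Because all features are in pixels, they are only comparable across images taken at a fixed camera distance, as in the 1.5 m setup described above.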
2.3. Statistical analysis

features. Next, the data were randomly partitioned into a training dataset (65%) and a test dataset (35%), balanced by growth-stage group (groups 1, 2, and 3), and four statistical methods were used to build predictive models: RF and the penalized regression methods RIDGE, LASSO, and ENET. The analyses were performed using the R packages caret (Kuhn, 2008), randomForest (Liaw and Wiener, 2002), and glmnet (Friedman et al., 2010).
The data were partitioned into training and test sets balanced by group because this approach yielded the best results in a preliminary analysis (not shown), which compared partitioning by group, by breed, and by group + breed. Similarly, training proportions from 50 to 90% in increments of 5 percentage points were evaluated, and the 65% proportion yielded the best metrics (not shown).
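A group-balanced split of this kind can be sketched in Python as a separate random sample within each growth group. This is an illustration, not the authors' code: they used R (caret), whose partitioning may round group sizes differently, and `stratified_split` is a hypothetical helper. The group sizes below are the 138, 241, and 104 measurements reported for groups 1, 2, and 3.

```python
import random
from collections import defaultdict


def stratified_split(groups, train_frac=0.65, seed=None):
    """Return sorted (train_idx, test_idx) with ~train_frac of each group in training."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    train, test = [], []
    for idx in by_group.values():
        idx = idx[:]                      # copy before shuffling in place
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


# Group labels with the per-group counts reported for this dataset.
groups = [1] * 138 + [2] * 241 + [3] * 104
train_idx, test_idx = stratified_split(groups, seed=1)
print(len(train_idx), len(test_idx))      # 315 168
```

Drawing a new split with a different `seed` in each of the 100 repetitions mirrors the resampling scheme described in this section, with every group keeping roughly 65% of its records in the training set.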
The RF ensemble was built in the following steps: a collection of bootstrap samples (ntree) was drawn from the initial dataset; an individual tree was grown for each bootstrap sample, with random selection of variables (mtry) … (nodesize); predictions for new data were built from the information gathered by the ensemble of trees; and the observations not included in each bootstrap sample (out-of-bag data) were used to compute the out-of-bag (OOB) error rate. In our study, the following hyperparameters were tested on the training dataset by grid search with 5-fold cross-validation: ntree equal to 150, 250, 350, and 500; mtry equal to 2, 3, and 4; and nodesize … The grid search selected ntree equal to 500 and mtry and nodesize equal to 2 as the best hyperparameters.
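The RF steps above can be sketched in plain Python. This is a minimal, illustrative implementation under stated assumptions, not the randomForest package the authors used: each split picks the feature and threshold that minimize the summed squared error, `mtry` features are drawn at random at every split, and nodes with at most `nodesize` observations are not split further (an approximation of randomForest's `nodesize`). Default arguments are set to the selected values reported above; the function names are hypothetical.

```python
import random
import statistics


def _sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, idx, mtry, nodesize, rng):
    """Grow one regression tree on the rows in `idx` (may contain repeats)."""
    ys = [y[i] for i in idx]
    # Small or pure nodes become leaves that predict the mean response.
    if len(idx) <= nodesize or len(set(ys)) == 1:
        return statistics.fmean(ys)
    best = None  # (sse, feature, threshold, left_rows, right_rows)
    # Random feature selection: only `mtry` of the p features are tried here.
    for j in rng.sample(range(len(X[0])), mtry):
        values = sorted({X[i][j] for i in idx})
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2
            left = [i for i in idx if X[i][j] <= thr]
            right = [i for i in idx if X[i][j] > thr]
            sse = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, j, thr, left, right)
    if best is None:  # every sampled feature is constant in this node
        return statistics.fmean(ys)
    _, j, thr, left, right = best
    return {"feature": j, "threshold": thr,
            "left": build_tree(X, y, left, mtry, nodesize, rng),
            "right": build_tree(X, y, right, mtry, nodesize, rng)}


def predict_tree(node, x):
    """Follow the splits until a leaf (a float) is reached."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


def random_forest(X, y, ntree=500, mtry=2, nodesize=2, seed=0):
    """Fit `ntree` bootstrap trees; return (trees, out-of-bag mean squared error)."""
    rng = random.Random(seed)
    n = len(y)
    trees, oob_sets = [], []
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (bagging)
        trees.append(build_tree(X, y, boot, mtry, nodesize, rng))
        oob_sets.append(set(range(n)) - set(boot))   # rows this tree never saw
    # OOB error: each row is predicted only by trees for which it was out of bag.
    sq_err = []
    for i in range(n):
        preds = [predict_tree(t, X[i]) for t, oob in zip(trees, oob_sets) if i in oob]
        if preds:
            sq_err.append((statistics.fmean(preds) - y[i]) ** 2)
    return trees, statistics.fmean(sq_err)


def predict_forest(trees, x):
    """Average the predictions of all trees (regression forest)."""
    return statistics.fmean(predict_tree(t, x) for t in trees)
```

A grid search like the one described would call `random_forest` for each combination of candidate `ntree`, `mtry`, and `nodesize` values and score it by 5-fold cross-validation on the training set. With only four image features, `mtry = 4` makes every split consider all features, which reduces the forest to plain bagged trees.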
In penalized regression with variable selection, a penalty on the magnitude of the coefficients is added to the SSE, which shrinks the parameter estimates toward zero:

$$\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2+\lambda\left[(1-\alpha)\sum_{j=1}^{p}\beta_j^2+\alpha\sum_{j=1}^{p}\left|\beta_j\right|\right],$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted values of observation $i$ ($i = 1, \dots, n$), $\beta_j$ is the regression coefficient of feature $j$ ($j = 1, \dots, p$), $\lambda \geq 0$ controls the strength of the penalty, and $\alpha \in [0, 1]$ balances the two penalty terms. In RIDGE ($\alpha = 0$), the squared coefficients ($\beta_j^2$) are penalized, and in LASSO ($\alpha = 1$), the absolute values of the coefficients ($|\beta_j|$) are penalized, which can shrink some coefficients exactly to zero; ENET combines both penalties ($0 < \alpha < 1$).
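To make the roles of λ and α concrete, the penalized objective above can be minimized by cyclic coordinate descent with soft-thresholding. The Python sketch below is illustrative only: glmnet, which the authors used, divides the squared-error term by 2n, halves the ridge term, and standardizes the features by default, so its λ values are on a different scale. The function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Minimize sum_i (y_i - yhat_i)^2 + lam*((1-alpha)*sum_j b_j^2 + alpha*sum_j |b_j|)
    by cyclic coordinate descent. The intercept is left unpenalized by centering
    X and y first. Returns (intercept, [b_1, ..., b_p])."""
    n, p = len(y), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]          # residuals y - X b, with b = 0 at start
    z = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0:                    # constant feature: coefficient stays 0
                continue
            # rho_j = x_j . (partial residual that excludes feature j)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n))
            # L1 part -> soft-threshold; L2 part -> extra term in the denominator.
            new_b = soft_threshold(rho, lam * alpha / 2) / (z[j] + lam * (1 - alpha))
            delta = new_b - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_b
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta
```

With α = 1 and a large λ, the soft-threshold sets coefficients exactly to zero (variable selection); with α = 0, they are only shrunk. Because features such as back area and back width have very different pixel scales, standardizing them before fitting, as glmnet does by default, keeps the penalty from acting unevenly across features.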
After the trained models were applied to the test dataset, predicted pig weights were generated for each algorithm. The predicted and observed weights were then compared using simple linear regression, with the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) as evaluation metrics. All steps, from the train-test split to the computation of the metrics, were repeated 100 times to ensure the reliability of the results and to avoid bias or an inflated prediction performance that could arise by chance from a single train-test split. The most effective algorithm for weight prediction was chosen based on the highest average R² and the lowest average RMSE and MAE values.
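The evaluation step can be sketched in Python under two assumptions: RMSE and MAE are computed directly from the differences between observed and predicted weights, and R² is the squared Pearson correlation between them, which equals the R² of a simple linear regression of observed on predicted weights. The helper names are hypothetical, and `summarize` uses the sample standard deviation, which may differ from the authors' choice. The same `pearson_r` function also gives the feature correlations used in the Pearson analysis.

```python
import math
import statistics


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def regression_metrics(observed, predicted):
    """R2 (squared Pearson r), RMSE, and MAE for one test set."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return {
        "R2": pearson_r(observed, predicted) ** 2,
        "RMSE": math.sqrt(statistics.fmean(e * e for e in errors)),
        "MAE": statistics.fmean(abs(e) for e in errors),
    }


def summarize(values):
    """Mean, sample SD, minimum, and maximum over repetitions (Table 3 layout)."""
    return statistics.fmean(values), statistics.stdev(values), min(values), max(values)


# Toy example with four test animals (weights in kg).
print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 37]))
```

In the 100-repetition scheme, `regression_metrics` would be called once per model and repetition on that repetition's test set, and `summarize` applied to the 100 values of each metric to obtain entries in the mean ± SD [min:max] form used in Table 3.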
3. Results
The growth pattern of the animals varied among the breeds (Figure 2, Table 2). The regression metrics between the predicted and observed data varied little across the 100 repetitions of the train-test partition (Figure 3). The prediction algorithms differed little in their average metrics, with R² values between 0.85 and 0.87, RMSE values between 14.32 and 15.23 kg, and MAE values between 10.13 and 11.19 kg. Furthermore, there was no difference between the ENET models with different alpha values (Table 3). Although the differences were slight, the random forest method outperformed the others, with the highest average R² value (0.87) and the lowest average RMSE (14.32 kg) and MAE (10.13 kg) values (Figure 4).
The correlations between live weight and the features related to the back (back area, back perimeter, and back width) were high and positive, ranging from 0.78 to 0.92 (Figure 5). However, the correlation between body depth and live weight was low (0.42). The highest correlation with live weight was observed for back area (0.92), whereas the highest correlation among the features was between back area and back perimeter (0.97). The most important feature in building the RF predictive model was the back area (Figure 6). The importance of the variables in the RIDGE, LASSO, and all ENET approaches followed the same pattern as in the RF approach (Figure 7). There was no difference between the ENET models (ENET1 to ENET9); thus, only one graph was plotted for ENET1-9 (Figure 7).

Figure 2 - Live weight measurements by breed and by group. Panels: DL, LL, PL, and PP; x-axis: growth stage (groups 1 to 3); y-axis: live weight.
DL - Duroc × Large White; LL - Large White; PL - Piau × Large White; PP - Piau.

Table 2 - Mean, standard deviation (SD), median, and minimum (MIN) and maximum (MAX) values of live weight (kg) by group and by breed
Group | Breed | Mean ± SD | Median | MIN | MAX
Group 1 | DL | 12.00±4.38 | 11.48 | 6.50 | 20.45
Group 1 | LL | 11.99±4.60 | 11.65 | 3.65 | 21.45
Group 1 | PL | 10.79±3.88 | 10.20 | 4.55 | 18.45
Group 1 | PP | 6.95±2.62 | 6.50 | 3.05 | 13.10
Group 2 | DL | 51.26±24.32 | 41.00 | 21.35 | 99.20
Group 2 | LL | 47.24±20.55 | 42.20 | 19.90 | 20.55
Group 2 | PL | 41.71±18.90 | 33.40 | 17.65 | 78.40
Group 2 | PP | 27.83±13.77 | 27.20 | 6.15 | 57.40
Group 3 | DL | 129.59±19.17 | 121.80 | 103.20 | 160.80
Group 3 | LL | 111.86±16.11 | 110.40 | 82.20 | 139.20
Group 3 | PL | 101.43±14.33 | 97.00 | 78.80 | 127.60
Group 3 | PP | 71.06±12.16 | 71.20 | 51.40 | 92.20
LL - Large White; PP - Piau; DL - Duroc × Large White; PL - Piau × Large White.

Table 3 - Descriptive statistics¹ of the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) considering 100 repetitions
Model | R² | RMSE | MAE
RF | 0.87±0.02 [0.81:0.91] | 14.32±0.75 [12.19:16.24] | 10.13±0.60 [8.73:11.72]
RIDGE | 0.85±0.02 [0.79:0.89] | 15.22±0.80 [13.03:17.15] | 11.19±0.59 [9.67:12.71]
LASSO | 0.85±0.02 [0.80:0.89] | 15.23±0.80 [13.10:17.06] | 11.13±0.59 [9.65:12.52]
ENET1 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
ENET2 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
ENET3 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
ENET4 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
ENET5 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
ENET6 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
ENET7 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
ENET8 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
ENET9 | 0.85±0.02 [0.80:0.89] | 15.18±0.80 [13.05:17.04] | 11.14±0.59 [9.65:12.55]
RF - random forest; RIDGE - ridge regression; LASSO - least absolute shrinkage and selection operator regression; ENET1 to ENET9 - elastic net models with different alpha values.
¹ Mean ± standard deviation [minimum value: maximum value].

Figure 3 - Regression metrics (R², RMSE, and MAE) between predicted and observed data throughout the 100 repetitions of the train-test partition. x-axis: repetition; models: random forest, RIDGE, LASSO, and elastic net 1 to 9.
4. Discussion
Four statistical methods (RF and RIDGE, LASSO, and ENET) were evaluated to predict the body weight
of purebred and crossbred pigs based on dorsal-view images obtained from video image processing.
Regarding live weights, the purebred PP exhibited the lowest values across all growth stages, as expected, owing to its smaller size. The average weaning weight in this population of the PP breed is 6.60 kg, with a standard deviation of 1.84 kg (Oliveira et al., 2023). In our study, the average weight at the weaning stage for PP was slightly higher, with a mean of 6.95 kg and a standard deviation of 2.62 kg. This is because animals in both the weaning (leaving the farrowing stage) and nursery stages were included in the same group, which increased the mean and the standard deviation. This grouping was required because the data volume would have been lower if each growth phase had been assessed separately. The average

studies for the Piau breed at the same stages (35.00±4.00 kg and 65.20±4.20 kg, respectively) (Silva et al., 2019).

Figure 4 - Linear regression of the observed weight and the weight predicted by random forest regression, with the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) (R² = 0.87; RMSE = 14.32; MAE = 10.13). x-axis: observed weight (kg); y-axis: predicted weight (kg).

Figure 5 - Pearson's correlation among live weight and the features obtained from image processing for back area, back perimeter, back width, and body depth of evaluated pigs. Coefficients shown: back area with back perimeter 0.97, back width 0.79, body depth 0.44, and live weight 0.92; back perimeter with back width 0.81, body depth 0.50, and live weight 0.91; back width with body depth 0.60 and live weight 0.78; body depth with live weight 0.42.

Figure 6 - Importance of the image features back area, back perimeter, back width, and body depth in the prediction model for body weight in pigs using the random forest regression approach. x-axis: features; y-axis: importance.

Figure 7 - Contribution of the features in the RIDGE, LASSO, and elastic net regression models. Panels: RIDGE, LASSO, and elastic net 1-9; y-axis: coefficient value corrected by the standard deviation of the features. There was no difference among the elastic net models, and thus only one graph was plotted for elastic net 1-9.
The purebred LL exhibited intermediate live weight values compared with the crosses PL and DL, with lower values for PL and higher values for DL. The lower live weight values for PL can be attributed to the use of the smaller purebred PP as a parent. Similarly, the DL crosses exhibited the highest live weight values throughout their growth, as both parental breeds are larger-sized animals. In fact, Duroc animals can show an average daily weight gain of 1,062 g and reach 100 kg in 137 days, while Large White animals can show an average daily weight gain of 1,016 g and reach 100 kg in 147.5 days (Tretyakova et al., 2021).
The RF method was slightly better than RIDGE, LASSO, and all ENET models (ENET1 to ENET9), with a higher average R² value and lower average RMSE and MAE values. Thus, RF was the most effective algorithm for predicting body weight. There were no differences among the penalized regression methods with variable selection, likely due to the limited number of assessed features and, consequently, little opportunity for the different penalties to shrink or select the parameters differently.
Other studies, in both plants and animals, also point to the superiority of RF for weight prediction using features extracted from digital images. Duc et al. (2023) used several features extracted from digital images (e.g., area size, perimeter length, length, and width) to predict soybean seed weight and demonstrated the superiority of the RF method over the RIDGE, LASSO, and ENET methods.
Sant’Ana et al. (2021) used eight machine learning models to predict body weight in sheep using a
variety of features related to the shape, size, and angles of digital images, and the RF model was the
approach that obtained the best performance.
Although the precision of the RF approach was high (R² = 0.87), the MAE indicated an average absolute error of 10.13 kg, which suggests that the model may not be accurate, mainly at younger ages, when body weights (Table 2) are of a similar magnitude to this error. There is greater variability in the observed weight in group 3 (i.e., growing and finishing stages) (Figure 2). This greater dispersion leads to greater variability in the predicted weights and, consequently, increases the prediction error. In addition, only 11 records (2.28% of the 483 measurements) had a live weight above 140 kg, which makes it difficult to predict the weight of heavier animals. It is also important to note that data variability is crucial for training robust models, whereas data with little variability may reduce their predictive ability.
In this sense, a study conducted by Fernandes et al. (2019) used body measurements and shape descriptors extracted from digital images to predict body weight in pigs in the nursery
R² of 0.92 and MAE of 0.35 for the models
R² of 0.80 and MAE of 0.30, when

(with less variability) contributes to increasing the accuracy of the model, without substantially
modifying the MAE.
Given the results from Fernandes et al. (2019), we re-analyzed our database using the RF model and
R² (0.74) and higher average
RMSE (18.89) and average MAE (14.91) compared with the analyses performed with the complete
database, i.e., in our study, precision and accuracy are higher when we include all animals (nursery,
    
correlation between live weight and the features evaluated, which is consistent with the importance of each feature in building the predictive model, in which the back area was the most important feature. Similarly, body depth showed the lowest correlation with live weight and was the least important feature in building the predictive model.
The greater importance of features related to the back region can be explained by the fact that this region is large, representative of the pig's body size, and changes along with animal growth. In the study by Brandl and Jørgensen (1996), the area and perimeter of the back of pigs were used to create a predictive model for body weight using spline functions, and the model showed an R² of 0.98, indicating high precision in predicting body weight from these features. Fernandes et al. (2019) used various measurements of the back of pigs, such as area and several length and height measurements, to build a predictive model for body weight from three-dimensional images and reported a high precision (R²) of 0.92, with an MAE of 3.5%. The dorsal region is widely explored in animal prediction

the dorsal area is best captured in these situations.
The lower importance of body depth in predicting live weight in our study can be explained by
               
the images was limited and the images could only be taken from above and could not capture the
curvature of the belly, for example. In addition, the animals had long bodies and small body depths, which increased the importance of the back area and reduced the importance of body depth in the body weight prediction models.
                
growing pigs with an R² of 0.87 using the RF method. It is hoped that real-time data collection using images, especially images of the animals' backs, will contribute to advances in body weight monitoring in pigs.
5. Conclusions
The random forest machine learning algorithm was slightly better than the RIDGE, LASSO, and elastic net penalized regression algorithms for predicting the body weight of pigs. Using image measurements, the random forest algorithm predicted pig body weight with an R² of 0.87, and the area, width, and perimeter of the back were the most important variables.


Author Contributions
Conceptualization: Schultz, E. B. Data curation: Carrara, E. R.; Oliveira, P. C. S.; Dias, L. C. C.
M.; Conceição, A. R. and Braga, P. H. S. Formal analysis: Carrara, E. R. and Costa, W. G. Funding
acquisition: Chizzotti, M. L.; Veroneze, R. and Schultz, E. B. Investigation: Carrara, E. R. and Veroneze,
R. Methodology: Carrara, E. R.; Oliveira, P. C. S.; Costa, W. G. and Schultz, E. B. Project administration:
Chizzotti, M. L. and Schultz, E. B. Supervision: Veroneze, R. and Schultz, E. B. Writing – original
draft: Carrara, E. R.; Oliveira, P. C. S.; Veroneze, R. and Schultz, E. B. Writing – review & editing:
Carrara, E. R.; Oliveira, P. C. S.; Dias, L. C. C. M.; Costa, W. G.; Conceição, A. R.; Braga, P. H. S.; Chizzotti,
M. L.; Veroneze, R. and Schultz, E. B.
Acknowledgments
We acknowledge the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG, Process: …); Conselho Nacional de Desenvolvimento Científico
e Tecnológico (CNPq, grant number 312454/2022-8); and Coordenação de Aperfeiçoamento de
Pessoal de Nível Superior (CAPES, PROEX grant number 32002017011P9).
References
Brandl, N. and Jørgensen, E. 1996. Determination of live weight of pigs from dimensions measured using image analysis.
Computers and Electronics in Agriculture 15:57-72. https://doi.org/10.1016/0168-1699(96)00003-8
Breiman, L. 2001. Random Forests. Machine Learning 45:5-32. https://doi.org/10.1023/A:1010933404324
Chen, X. and Ishwaran, H. 2012. Random forests for genomic data analysis. Genomics 99:323-329. https://doi.org/10.1016/j.ygeno.2012.04.003
Duc, N. T.; Ramlal, A.; Rajendran, A.; Raju, D.; Lal, S. K.; Kumar, S.; Sahoo, R. N. and Chinnusamy, V. 2023. Image-based
phenotyping of seed architectural traits and prediction of seed weight using machine learning models in soybean.
Frontiers in Plant Science 14:1206357. https://doi.org/10.3389/fpls.2023.1206357
Fernandes, A. F. A.; Dórea, J. R. R.; Fitzgerald, R.; Herring, W. and Rosa, G. J. M. 2019. A novel automated system to acquire
biometric and morphological measurements and predict the body weight of pigs via 3D computer vision. Journal of
Animal Science 97:496-508. https://doi.org/10.1093/jas/sky418
Friedman, J. H.; Hastie, T. and Tibshirani, R. 2010. Regularization paths for generalized linear models via coordinate
descent. Journal of Statistical Software 33:1-22. https://doi.org/10.18637/jss.v033.i01
Gorczyca, M. T.; Milan, H. F. M.; Maia, A. S. C. and Gebremedhin, K. G. 2018. Machine learning algorithms to predict
core, skin, and hair-coat temperatures of piglets. Computers and Electronics in Agriculture 151:286-294. https://doi.org/10.1016/j.compag.2018.06.028
He, Y.; Tiezzi, F.; Howard, J. and Maltecca, C. 2021. Predicting body weight in growing pigs from feeding behavior data
using machine learning algorithms. Computers and Electronics in Agriculture 184:106085. https://doi.org/10.1016/j.compag.2021.106085
Hoerl, A. E. and Kennard, R. W. 1970. Ridge regression: biased estimation for nonorthogonal problems. Technometrics
12:55-67. https://doi.org/10.1080/00401706.1970.10488634
Kuhn, M. 2008. Building predictive models in R using the caret package. Journal of Statistical Software 28:1-26.
https://doi.org/10.18637/jss.v028.i05
Li, Z.; Luo, C.; Teng, G. and Liu, T. 2014. Estimation of pig weight by machine vision: a review. p.42-49. In: Li, D. and
Chen, Y. (eds). Computer and Computing Technologies in Agriculture VII. CCTA 2013. IFIP Advances in Information and
Communication Technology, v. 420. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54341-8_5

Nguyen, Q. T.; Fouchereau, R.; Frénod, E.; Gerard, C. and Sincholle, V. 2020. Comparison of forecast models of production
of dairy cows combining animal and diet parameters. Computers and Electronics in Agriculture 170:105258.
https://doi.org/10.1016/j.compag.2020.105258
Oliveira, L. F.; Lopes, P. S.; Dias, L. C. C. M.; Silva, L. M. D.; Silva, H. T.; Guimarães, S. E. F.; Marques, D. B. D.; da Silva, D. A. and
Veroneze, R. 2023. Estimation of genetic parameters and inbreeding depression in Piau pig breed. Tropical Animal Health
and Production 55:14. https://doi.org/10.1007/s11250-022-03428-9
Sant’Ana, D. A.; Pache, M. C. B.; Martins, J.; Soares, W. P.; de Melo, S. L. N.; Garcia, V.; Weber, V. A. M.; Heimbach, N. S.; Mateus,
R. G. and Pistori, H. 2021. Weighing live sheep using computer vision techniques and regression machine learning.
Machine Learning with Applications 5:100076. https://doi.org/10.1016/j.mlwa.2021.100076
Silva, H. T.; Silva, F. F.; Ferreira, A. S.; Veroneze, R. and Lopes, P. S. 2019. Evaluation of Bayesian models for analysis of
crude protein requirement for pigs of Brazilian Piau breed. Scientia Agricola 76:208-213. https://doi.org/10.1590/1678-992x-2017-0256
Tibshirani, R. 1996. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B
(Methodological) 58:267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
Tretyakova, O.; Degtyar, A.; Avdeyev, A.; Ovchinnikov, D. and Morozyuk, I. 2021. Features of the growth and development
of young pigs of various breeding. E3S Web of Conferences 273:02012. https://doi.org/10.1051/e3sconf/202127302012
Yu, H.; Lee, K. and Morota, G. 2021. Forecasting dynamic body weight of nonrestrained pigs from images using an RGB-D
sensor camera. Translational Animal Science 5:txab006. https://doi.org/10.1093/tas/txab006
Zou, H. and Hastie, T. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society
Series B: Statistical Methodology 67:301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x