Principal component regression analysis in lifetime milk yield prediction of crossbred cattle strain Vrindavani of North India

Abstract

The study aims to devise the most appropriate prediction model for lifetime milk production of Vrindavani, a crossbred cattle strain developed and maintained at the Institute, based on principal components formulated on initially expressed part lactation records as predictors. Principal components (PCs) were derived from a data set covering a 10-year period (1999–2009). Part lactation records of 100, 170 and 240 days of first and second lactations and their respective total milk yields were used. Using principal component regression analysis (PCRA), the principal components were used as predictors of lifetime milk yield, defined as total milk yield up to 4 lactations (LTMY4) and up to 5 lactations (LTMY5). Eight types of model were fitted to identify the best fitted model for both traits (LTMY4 and LTMY5) with the first principal component (PC_1) as a predictor. The equations LTMY4 = 4410.305 + 0.596 PC_1 - 1.171 PC_3 and LTMY5 = 7987.560 - 2.301 PC_3 explained 54.46% and 39.74% variation in the estimated values. Curve estimation analysis showed that the power function was the most appropriate model for both lifetime traits. The model LTMY4 = 32.609 (PC_1)^0.671 was found to be the best fit and explained 53.50% variation in estimation. For the other lifetime trait, the model LTMY5 = 103.769 (PC_1)^0.566 explained 38.40% variation in estimated values. These prediction equations may be helpful in selection at an early stage of Vrindavani cattle based on early part lactation records.
Key words: Vrindavani cattle, Lifetime milk production, Part lactation, Principal component regression analysis
Present address: 1Senior Scientist (takhan@ivri.res.in),
2,4Principal Scientist (2drakstomar@gmail.com, 4bhushan.drbharat@gmail.com),
3Jt. Director (jdee@ivri.up.nic.in).
India, with 210.2 million head of cattle, accounts for
14.7% of the total cattle population of the world. The
country’s per capita milk availability in 2010–11 was 281
g/day (compared to the world average of 279.4 g/day) (BAHS
2012). To increase milk availability, efficient planning and
management tools need to be developed. Principal component
analysis (PCA) is a mathematical procedure that transforms
a number of (possibly) correlated variables into a (smaller)
number of uncorrelated variables called principal components
(Pearson 1901, Hotelling 1933, Rao 1964, Lukibisi et al.
2008a). Chapman et al. (2001) used principal component
analysis for sensory characterization of ultrapasteurized milk
quality and reported that as much as 81.0% of the variation was explained.
Rougoor et al. (2000) investigated the advantages and
disadvantages of principal components regression (PCR) and
partial least squares (PLS) for livestock management
research. Bhatacharya and Gandhi (2005) compared multiple
regression analysis and principal components to predict
lifetime production of Karan Fries cattle and found an advantage
for principal components analysis.
Indian Journal of Animal Sciences 83 (12): 1288–1291, December 2013/Article
T A KHAN1, A K S TOMAR2, TRIVENI DUTT3 and BHARAT BHUSHAN4
Indian Veterinary Research Institute, Izatnagar, Uttar Pradesh 243 122 India
Received: 20 April 2013; Accepted: 28 August 2013
Lifetime production is an important economic parameter
when defining the breeding objective. Khan et al. (2012)
formulated a lifetime prediction model based on birth
weight, 5 initially expressed reproductive traits (age at first
calving (AFC), first service period (FSP), first dry period
(FDP), first calving interval (FCI) and first lactation length
(FLL)) and part lactation records of 100, 170 and 240 days
of first and second lactations and their respective total
milk yields, and explained 40.32% variation in estimated
lifetime yields (total of first 4 lactations) in Vrindavani cattle.
Malhotra and Singh (1980) predicted lifetime production
(total milk yield in the first 3 lactations) for Red Sindhi cows
on the basis of traits available in early life. Puri and Sharma
(1965) studied the effect of first lactation yield and age at first
calving on lifetime production and determined their relative
importance for selection purposes in Red Sindhi and crossbred
cows. Shinde et al. (2010) predicted lifetime milk production
up to third lactation in Phule Triveni cows. Vrindavani
is a synthetic crossbred cattle strain with exotic inheritance
(50 to 75%) of Holstein-Friesian, Brown Swiss and Jersey, and
indigenous inheritance (25 to 50%) of Hariana cattle,
developed at the Indian Veterinary Research Institute, Izatnagar
(Singh et al. 2011). The climate of Izatnagar is semi-tropical
and conditions are hot and humid for many months of the year,
especially during the summer and monsoon seasons.
Since data on traits expressed by animals are correlated
and large in volume, the present study was initiated to formulate
prediction models for lifetime production based on principal
components formulated on initially expressed part lactation
records as predictors. These models may be used in early
selection of animals.
MATERIALS AND METHODS
Ten-year data records (1/4/1999 to 31/3/2009) of
Vrindavani cattle were used to formulate lifetime prediction
models. The data set consisted of parity-wise part milk yields
of 100, 170 and 240 days in first lactation (MY100_1,
MY170_1, MY240_1) and in second lactation (MY100_2,
MY170_2, MY240_2), their respective total milk yields
(TLMY1, TLMY2) and the total milk yield of the first 2
lactations (LTMY2). Principal components (PCs) were
derived using the PRINCOMP procedure of SAS 9.2. Principal
component regression analysis (PCRA) was performed to
predict lifetime milk yield up to 4 (LTMY4) and 5 parities
(LTMY5) using principal components as predictors (Draper
and Smith 1986, PROC REG procedure, SAS 9.2).
Eigen values (> 1) and scree plot methods were used to
identify the principal components to be retained as predictors.
Residual and diagnostic plots were examined to identify the
appropriate fitted model (SAS Inc 2009).
Curve estimation analysis was undertaken to assess the
appropriateness of the models (lifetime yields with the first
principal component as predictor). Eight types of models, viz. linear,
quadratic, cubic, logarithmic, inverse, power, S-curve and
exponential, were fitted for prediction of LTMY4 and
LTMY5. The adequacy of the best fitted model was judged
on the basis of the adjusted R2 value and the significance of coefficients
(Draper and Smith 1986, SPSS Inc. 2003).
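The eight curve forms named above can be written out explicitly. The sketch below uses the functional forms commonly documented for SPSS-style curve estimation; the paper does not print them, so the exact parameterization (notably for the S-curve and exponential models) is an assumption, and the names `CURVE_MODELS` and `predict` are hypothetical.

```python
import math

# Functional forms as commonly defined for SPSS-style curve estimation.
# x is the predictor (here PC_1); b holds the coefficients (b0, b1, ...).
# Assumption: the paper names these models but does not state their equations.
CURVE_MODELS = {
    "linear":      lambda x, b: b[0] + b[1] * x,
    "quadratic":   lambda x, b: b[0] + b[1] * x + b[2] * x ** 2,
    "cubic":       lambda x, b: b[0] + b[1] * x + b[2] * x ** 2 + b[3] * x ** 3,
    "logarithmic": lambda x, b: b[0] + b[1] * math.log(x),
    "inverse":     lambda x, b: b[0] + b[1] / x,
    "power":       lambda x, b: b[0] * x ** b[1],
    "s_curve":     lambda x, b: math.exp(b[0] + b[1] / x),
    "exponential": lambda x, b: b[0] * math.exp(b[1] * x),
}


def predict(model, x, coefficients):
    """Evaluate one of the eight curve forms at predictor value x."""
    if model not in CURVE_MODELS:
        raise ValueError(f"unknown model: {model}")
    return CURVE_MODELS[model](x, coefficients)
```

Under these forms, every model except the three polynomial ones has exactly two coefficients (b0 and b1), which matches the layout of Table 2.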
RESULTS AND DISCUSSION
Principal component analysis led to retention of the first
3 components, which together explained 93.08% variation of the
original variables. The first principal component explained 61.86%
variation, followed by the second principal component (26.14%)
and the third principal component (5.09%) (Table 1; Fig. 1). Only
the first 2 PCs had eigen values greater than one, but the scree plot
indicated the appropriateness of the first 3 components (the
curve bends clearly at PC_3). So, the first 3 PCs were retained
for use as predictors.
Table 1. Eigen values and proportion of the variance of principal
components (PC) of the correlation matrix of original variables
PC   Eigen value   Difference   Proportion   Cumulative
1    5.5674        3.2149       0.6186       0.6186
2    2.3525        1.8949       0.2614       0.8800
3    0.4577        0.1502       0.0509       0.9308
4    0.3074        0.1673       0.0342       0.9650
5    0.1401        0.0439       0.0156       0.9806
6    0.0962        0.0544       0.0107       0.9913
7    0.0419        0.0051       0.0047       0.9959
8    0.0368        0.0368       0.0041       1.0000
Fig. 1. Scree plots of Principal components formulated in
Vrindavani cattle part milk yield data
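The proportion and cumulative columns of Table 1 follow directly from the eigen values, and the eigen value (> 1) rule can be applied to the same list. The sketch below recomputes them; the function names are hypothetical, and dividing by the sum of the listed eigen values (about 9, the number of original variables) is the usual convention for a correlation matrix rather than something the paper states.

```python
# Eigen values as listed in Table 1.
EIGENVALUES = [5.5674, 2.3525, 0.4577, 0.3074, 0.1401, 0.0962, 0.0419, 0.0368]


def variance_summary(eigenvalues):
    """Return (PC index, eigen value, proportion, cumulative proportion) rows."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for index, value in enumerate(eigenvalues, start=1):
        proportion = value / total
        cumulative += proportion
        rows.append((index, value, proportion, cumulative))
    return rows


def kaiser_count(eigenvalues, threshold=1.0):
    """Number of components whose eigen value exceeds the threshold."""
    return sum(1 for value in eigenvalues if value > threshold)


SUMMARY = variance_summary(EIGENVALUES)
N_BY_EIGENVALUE_RULE = kaiser_count(EIGENVALUES)
```

The eigen value rule alone keeps two components; the third was retained in the paper on the basis of the scree plot.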
The following 3 principal components (PCs) were
retained:
PC_1 = 0.316 MY100_1 + 0.331 MY170_1 + 0.331 MY240_1 + 0.313 TLMY1
       + 0.309 MY100_2 + 0.336 MY170_2 + 0.346 MY240_2 + 0.312 TLMY2
       + 0.398 LTMY2
PC_2 = 0.329 MY100_1 + 0.368 MY170_1 + 0.381 MY240_1 + 0.350 TLMY1
       - 0.339 MY100_2 - 0.357 MY170_2 - 0.341 MY240_2 - 0.360 TLMY2
       - 0.016 LTMY2
PC_3 = 0.430 MY100_1 + 0.245 MY170_1 + 0.046 MY240_1 - 0.442 TLMY1
       + 0.398 MY100_2 + 0.199 MY170_2 + 0.051 MY240_2 - 0.336 TLMY2
       - 0.494 LTMY2
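The three retained components are weighted sums of the nine yield variables, so a score for any animal can be computed from its records. The sketch below stores the published weights and checks that the three weight vectors are close to unit length and mutually orthogonal, as eigenvectors should be. Whether the scores in the paper were computed on raw or standardized yields is not stated, so the scale of the inputs to the hypothetical `pc_scores` helper is an assumption.

```python
import math

VARIABLES = ["MY100_1", "MY170_1", "MY240_1", "TLMY1",
             "MY100_2", "MY170_2", "MY240_2", "TLMY2", "LTMY2"]

# Weights copied from the three retained component equations.
LOADINGS = {
    "PC_1": [0.316, 0.331, 0.331, 0.313, 0.309, 0.336, 0.346, 0.312, 0.398],
    "PC_2": [0.329, 0.368, 0.381, 0.350, -0.339, -0.357, -0.341, -0.360, -0.016],
    "PC_3": [0.430, 0.245, 0.046, -0.442, 0.398, 0.199, 0.051, -0.336, -0.494],
}


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a weight vector."""
    return math.sqrt(dot(a, a))


def pc_scores(record):
    """Component scores for one animal.

    record maps each name in VARIABLES to a yield value; the scaling of
    those values (raw or standardized) is an assumption, not from the paper.
    """
    values = [record[name] for name in VARIABLES]
    return {pc: dot(weights, values) for pc, weights in LOADINGS.items()}
```

Because the weights are rounded to three decimals, the lengths and inner products are only approximately 1 and 0.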
The PCRA for lifetime milk yield up to 4 parities (LTMY4) yielded the
equation LTMY4 = 4410.305** + 0.596* PC_1 - 1.171* PC_3, which
explained 54.46% variation in the estimated values
with adjusted R2 = 53.20% (*, significant at P<0.05; **,
significant at P<0.01).
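The adjusted R2 quoted here penalizes R2 for the number of predictors. A minimal sketch of the standard formula follows; the paper does not report the number of animals used, so `HYPOTHETICAL_N` is a placeholder chosen only for illustration and should not be read as the study's sample size.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R2: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Placeholder sample size; the number of records is not given in the paper.
HYPOTHETICAL_N = 75

# R2 of 54.46% with two predictors (PC_1 and PC_3), as in the LTMY4 equation.
EXAMPLE_ADJUSTED = adjusted_r2(0.5446, HYPOTHETICAL_N, 2)
```

The adjustment always lowers R2 when at least one predictor is present, and the gap narrows as the number of observations grows.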
For lifetime milk yield up to 5 parities
(LTMY5), the fitted equation LTMY5 =
7987.560** - 2.301** PC_3 explained 39.74% variation
in the estimated values of lifetime milk yield, with adjusted
R2 = 38.92%.
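The two fitted equations can be applied directly once component scores are available. The sketch below encodes them as functions; the function names are hypothetical, and the scores passed in are assumed to be on the same scale as those used when the equations were fitted.

```python
def predict_ltmy4(pc_1, pc_3):
    """Lifetime milk yield up to 4 lactations from the PCRA equation."""
    return 4410.305 + 0.596 * pc_1 - 1.171 * pc_3


def predict_ltmy5(pc_3):
    """Lifetime milk yield up to 5 lactations from the PCRA equation."""
    return 7987.560 - 2.301 * pc_3
```

PC_2 does not appear in either equation, and PC_1 does not appear in the LTMY5 equation, because only the components listed were retained in the fitted models.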
Fit diagnostics for LTMY4 and LTMY5 showed goodness
of fit of the models. The quantile plot and histogram of
residuals indicated that the normality assumption was met.
Residual plots indicated absence of heteroscedasticity
(among the PCs). The plot of observed versus predicted
values indicated a good fit for the model. The fit-mean
residual plot indicated that the fitted model accounted for a
good deal of variability in both LTMY4 and LTMY5.
Curve estimation: The regression coefficients along with
standard errors of the fitted models (lifetime yield, Y, with the
first principal component, PC_1, as predictor) are presented
in Table 2. For LTMY4, the adjusted R2 was highest for the power
function (53.50%) and lowest for the inverse function (47.00%). Since
the regression coefficient of the power function was highly
significant (P<0.01) and its adjusted R2 was the highest, this
function was judged the most appropriate for estimating
LTMY4, though its intercept was not significant (Fig. 2).
The same trend was observed for LTMY5: the adjusted
R2 was highest for the power function (39.30%) and lowest
for the inverse function (34.93%). So, for both lifetime traits
(LTMY4 and LTMY5), the power function was the most appropriate.
The power function explained 38.40% variation in estimated
values of LTMY5 (Fig. 3).
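A power function Y = b0 * X^b1 becomes linear after taking logarithms, ln Y = ln b0 + b1 ln X, so it can be fitted by ordinary least squares on the log scale. The sketch below does this in plain Python; the paper does not describe the fitting algorithm, so treating it as a log-log regression is an assumption. The PC_1 values are hypothetical, and the yields are generated without noise from the reported LTMY4 coefficients purely to show that the procedure recovers them.

```python
import math


def fit_power(xs, ys):
    """Fit y = b0 * x**b1 by least squares on ln(y) = ln(b0) + b1 * ln(x)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b1 = sxy / sxx
    b0 = math.exp(mean_y - b1 * mean_x)
    return b0, b1


def power_model(pc_1, b0, b1):
    """Evaluate the power function at a PC_1 score."""
    return b0 * pc_1 ** b1


# Hypothetical PC_1 scores; yields are synthetic, not data from the study.
SYNTHETIC_PC1 = [2000.0, 3500.0, 5000.0, 6500.0, 8000.0]
SYNTHETIC_LTMY4 = [power_model(x, 32.609, 0.671) for x in SYNTHETIC_PC1]

B0, B1 = fit_power(SYNTHETIC_PC1, SYNTHETIC_LTMY4)
```

With real records the points would scatter around the curve, and R2 computed on the log scale need not equal R2 on the original scale.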
Table 2. Significance of coefficients of fitted models and respective adjusted R2 values

LTMY4
Model         b0               b1                 b2            b3           Adj. R2
Linear        4250.0**         1.178**            -             -            50.00
              (973.143)        (0.136)
Quadratic     –127.389NS       2.476*             –9.2E–05NS    -            50.50
              (3472.125)       (0.997)            (7E–05)
Logarithmic   –56941.458**     7862.887**         -             -            50.70
              (7912.128)       (895.449)
Cubic         5425.791NS       –0.217NS           3E–04NS       –2E–08NS     50.10
              (9216.147)       (4.257)            (0.001)       (3E–08)
Inverse       19333.760**      –4562588.172**     -             -            47.00
              (859.087)        (558658.324)
Power         32.609NS         0.671**            -             -            53.50
              (20.860)         (0.072)
S-curve       10.007**         –3959.246**        -             -            51.30
              (0.069)          (445.942)
Exponential   6122.418**       9.9E–05**          -             -            51.30
              (489.643)        (1.1E–05)

LTMY5
Model         b0               b1                 b2            b3           Adj. R2
Linear        6609.843**       1.295**            -             -            38.30
              (1352.684)       (0.189)
Quadratic     4106.786NS       2.037NS            –5E–05NS      -            37.30
              (4874.072)       (1.400)            (1E–04)
Logarithmic   –60017.832**     8570.997**         -             -            38.20
              (11084.718)      (1254.505)
Cubic         3143.927NS       2.505NS            –1E–04NS      4E–09NS      36.80
              (12975.327)      (5.993)            (0.001)       (4E–08)
Inverse       23080.531**      –4942712.185**     -             -            34.90
              (1191.144)       (7745930.581)
Power         103.769NS        0.566**            -             -            39.30
              (74.238)         (0.081)
S-curve       10.138**         –3313.178**        -             -            37.10
              (0.076)          (496.106)
Exponential   8519.867**       8E–05**            -             -            38.40
              (750.036)        (1E–05)

Figures in parentheses are standard errors of coefficients. *, significant (P<0.05); **, significant (P<0.01); NS, nonsignificant.
Fig. 2. Appropriateness of power function in estimation of
LTMY4 (kg).
Fig. 3. Appropriateness of power function in estimation of
LTMY5 (kg).
The study was intended to show the use of principal components
regression analysis, as PCs are orthogonal contrasts, free from the
problem of multicollinearity. Since the expression of animal
traits for growth, production and reproduction is highly
complex and correlated, the information generated from all
the traits studied can be included, as a null hypothesis, as their
contribution towards lifetime production. We may
formulate principal components to reduce the data (into
components) while retaining the variability explained in the original
set of observed traits (variables), as discussed by many
workers (Hotelling 1933, Rougoor et al. 2000, Chapman et
al. 2001).
PCRA with the step-wise procedure gave fitted models for
LTMY4 and LTMY5 from the 3 candidate predictors (PC_1, PC_2 and
PC_3): LTMY4 = 4410.305** + 0.596* PC_1 - 1.171* PC_3 and
LTMY5 = 7987.560** - 2.301** PC_3, which explained
54.46% and 39.74% variation in the estimated values of
LTMY4 and LTMY5, respectively. Curve estimation analysis
showed that the power function was the most appropriate model
for both lifetime traits. The model LTMY4 = 32.609
(PC_1)^0.671 was found to be the best fit and explained 53.50%
variation in estimation. The relationship LTMY5 = 103.769
(PC_1)^0.566 explained 38.40% variation in estimated values
of LTMY5. Since every biological phenomenon is curvilinear
in nature and the variation explained in estimation of LTMY4
and LTMY5 by the step-wise procedure and by curve estimation is
at par, these models are recommended as the best fitted models.
For both traits, the regression coefficients were highly significant
(P<0.01) whereas the intercepts were not significant (as the curve
crosses the y-axis near the origin, as observed in Figs 2, 3).
Khan et al. (2012) found 40.32% variation explained in estimated
lifetime yields (total of first 4 lactations, LTMY4) using initial
growth, reproduction and part lactation records with the step-wise
procedure of regression analysis in Vrindavani cattle, whereas
principal components based only on part lactation records
explained 54.46% variation in estimation in the same
crossbred strain. Bhatacharya and Gandhi (2005)
compared multiple regression analysis and principal
components analysis to predict lifetime milk production and
found that total variance was lower for the model with
PCs than for the regression model with the original variables.
This showed the importance of principal component
regression analysis (PCRA) in estimation of lifetime
production traits. So, it is concluded that the fitted prediction
models LTMY4 = 32.609 (PC_1)^0.671 and LTMY5 = 103.769
(PC_1)^0.566 may be helpful in early selection of Vrindavani
cattle based on initial part lactation records.
ACKNOWLEDGEMENT
The authors are thankful to the Director and the Incharge, Livestock
Production and Management Section, IVRI, for constant
encouragement and for providing facilities.
REFERENCES
Basic Animal Husbandry Statistics. 2012. Department of Animal
Husbandry Dairying and Fisheries. Ministry of Agriculture,
Government of India. Krishi Bhavan, New Delhi.
Bhatacharya T K and Gandhi R S. 2005. Principal components
versus multiple regression analysis to predict lifetime production
of Karan Fries cattle. Indian Journal of Animal Sciences 75
(11): 1317–20.
Chapman K W, Lawless H T and Boor K J. 2001. Quantitative
descriptive analysis and principal component analysis for
sensory characterization of ultrapasteurized milk. Journal of
Dairy Science 84: 12–20.
Draper N R and Smith H. 1966. Applied Regression Analysis. Pp.
407. Wiley Press, New York, USA.
Hotelling H. 1933. Analysis of the complex of statistical variables
into principal components. Journal of Educational Psychology
24: 417–41, 498–520.
Khan T A, Tomar A K S and Dutt Triveni. 2012. Prediction of
lifetime milk production in synthetic crossbred cattle strain
Vrindavani of North India. Indian Journal of Animal Sciences
82: 1367–71.
Lukibisi F B, Muhuyi W B, Muia J M K, Ole Sinkeet S N and
Wekesa W F. 2008a. Statistical use and Interpretation of
Principal Component Analysis in Applied Research. Egerton
University’s 3rd Annual Research Week and International
Conference. 16–18 September, 2008.
Malhotra P K and Singh R P. 1980. Estimation of life-time
production in Red Sindhi cattle using ridge-trace criterion.
Indian Journal of Animal Sciences 50(3): 215–18.
Puri T R and Sharma K N S. 1965. Prediction of lifetime production
on basis of first lactation yield and age at first calving for
selection of dairy cattle. Journal of Dairy Science 48(4): 462–67.
Panda K K, Meheta R K and Das B C. 2006. Prediction of total
lactation milk yield based on most frequent daily milk yield
and highest daily milk yield of a month in Sahiwal cows. Indian
Journal of Animal Sciences 76 (10): 851–52.
Pearson K. 1901. On lines and planes of closest fit to a system of
points in space. Philosophical Magazine 2: 557–72.
Rao C R. 1964. The use and interpretation of principal component
analysis in applied research. Sankhya A 26 : 329–58.
Rougoor C W, Sundaram R and van Arendonk J A M. 2000.
The relation between breeding management and 305-day
milk production determined via principal components regression
and partial least squares. Livestock Production Science 66: 71–83.
Singh R R, Dutt Triveni, Kumar Amit, Tomar A K S and Singh
Mukesh. 2011. On-farm characterization of Vrindavani cattle
in India. Indian Journal of Animal Sciences 81(3): 267–71.
Shinde N V, Mote M G, Khutal B B and Jagtap D Z. 2010. Prediction
of lifetime milk production on the basis of lactation traits in
Phule Triveni crossbred cattle. Indian Journal of Animal
Sciences 80(10): 968–88.
SPSS Inc. 2003. User’s Guide. SPSS Inc., Chicago.
SAS Institute Inc. 2009. SAS/STAT® 9.2 User’s Guide. 2nd edn.
Cary, NC: SAS Institute Inc.
... El objetivo fundamental de la selección es mejorar el valor económico total o genotipo agregado (H) de la población, que se define para cada animal como la suma de cada uno de los genotipos para los caracteres que se seleccionaron, ponderando cada uno por su valor económico relativo (Hazel y Lush 1942y Hazel 1943. Actualmente, para conformar los índices se usa el procedimiento de los componentes principales con los valores genéticos, como lo han aplicado Bignardiet al. (2012) y Khan et al. (2013). En Cuba lo han empleado también Hernández y Ponce de León (2018León ( , 2020. ...
... Tanto con el IML como con el multicarácter, que contemplaría un genotipo agregado con más de dos caracteres, es preferible su aplicación donde no existan poblaciones de grandes dimensiones y de amplio historial genealógico. Existen diversas formas o métodos de llevar a cabo la selección y realizar las evaluaciones con programas específicos y complejos (ASREML de Gilmour et al. 2003) o llevar a cabo una selección más precisa con el ajuste por covarianza de la curva de crecimiento a partir de los pesajes realizados a otras edades durante la prueba o proceder a conformar los índices mediante el procedimiento de los componentes principales de los valores genéticos, como señalan Bignardi et al. (2012) y Khan et al. (2013). ...
Article
Full-text available
Performance tests were carried out on 241 male buffaloes from the Empresa Pecuaria Genética Los Naranjos, in Cuba, during the years 2011/2012 and 2017. The evaluated indicators were weight at weaning at eight months, final weight at 20 months of age, weight gain from eight to 20 months, weight per age at 20 months and milk production of their mothers at 244 days, as well as the percentage values, calculated from their respective annual means. A mixed model was used (Proc Glimmix of SAS 2013), which considered the year of entry into testing as a fixed effect, and the individual nested in the year of entry as random effect. The genetic values (GV) with their precision were estimated using a two-character model, compiled in IML (Interactive Matrix Language) of SAS. This considered genealogy (250 males and females), which, together with the 241 observations of tested animals, formed a kinship matrix of 491 individuals. Weaning (136.43 and 151.92 kg) and final (285.39 and 333.35 kg) weights were low, while the production of their mothers was acceptable (944.04 and 1,135.42 kg). It is concluded that the selection index, constructed from the variances and covariances of phenotypic and genetic values, economically weighted as a regression of the observed deviations from their means of weaning and final weight, is a reliable method for the selection of buffaloes in performance tests.
... Since expression of animals traits on growth, production and reproduction is very much complex and correlated, the information generated out of all the traits studied, can be included as null hypothesis as their contribution towards the lifetime production. We may formulate the principal components to reduce the data (into components) with the variability explained in the original set of observed traits (variables), as discussed by many workers (Khan et al., 2013;Hotelling 1933;Chapman et al., 2001;Rugoor et al., 2000). ...
Article
The study aims to devise most appropriate prediction model for lifetime milk production of Jaffarabadi Buffalo, based on principal components formulated on initially expressed lactation records as predictors. Lactation milk yield, lactation period and peak milk yield records of first, second and third lactations of animals under study were used of 24 years (1987 to 2010). Principal components (PCs) were derived from data set using principal component regression analysis (PCRA), the principal components were used as predictors for predicting lifetime milk yield (LTMY). Multiple linear regression models were fitted to identify the best fitted model for prediction of lifetime milk yield with the first principal component to all principal component as a predictor. The equation LTMY = 7825.8768+2.8118 (PC1) - 13.7098 (PC2) - 599.0908 (PC3) + 3.0266 (PC4) - 8.8196 (PC5) - 257.9315 (PC6) + 2.6042 (PC7) explained 98.9% variation in the estimated values with adjusted R2= 59.09% variation in the estimated values. The curve estimation analysis showing the appropriateness of first seven principal components as predictor was the most appropriate model for lifetime milk yield. These prediction equations may be helpful in selection at an early stage of Jaffarabadi Buffalo based on early part lactation records.
... During the period 2012-13 and 2013-14 following observations were added to inrease the milk yield per lactating animal. Khan et al. (2013) reported lifetime milk prediction of crossbred cattle srain Vrindavani of North India. ...
Article
Full-text available
The opening balance of Tharparkar cattle as on 01.04.2011 was 154 heads (30 males and 124 females). The M:F ratio of new calvings was 1.00:0.95. The closing balance of the Tharparkar cattle herd as on 31.03.2012 was 98 cattle heads (15 males and 83 females). The overall mortality percent in Tharparkar herd was 4.62%. The overall conception rate in Tharparkar herd was 64.78%. The figures in heifer and adult groups were 63.63 and 65.30%, respectively. The overall calving abnormalities were 34.15%, which included 7.32% abortions, 2.44% unseen abortinos, 7.32 retained placenta, 2.44% premature births, 9.76% prolapses and 4.88% still first calving, service period, dry period and calving, service period, dry period and calving interval were 1056.67±90.22, 230.34±20.63, 292.05±118.36 and 472.46±28.95 days, respectively.Tharparkar cattle produced 21679 kg milk during the current year. Means for overall wet and herd averages were 3.39 and 1.48 kg, respectively. On the basis of analysis of 384 milk samples, the overall Fat, SNF and Total Solids were 4.34, 8.77 and 13.12% respectively.The least squares’ means (LSM) for overall live body weights at birth, 3, 6, 12, 18 and 24 months of age were 21.27±0.48, 56.18±2.33, 108.52±3.27, 171.34±7.77, 230.79±2.56 and 260.06±3.98 kg, respectively. The opening balance as on 01/04/2010 was 143 heads (23 males and 120 females). The Tharparkar produced 38,001 kg milk during current yearwhich was 1,546 kg more than that of previous year (36,455.0 kg). Means for overall wet and herd averages were 3.86 and 1.50 kg, respectively, under suckling system. On an average, 28.25 of total adult females were in the milk during the year, 2010-11. On the basis of 574 milk samples the overall fat, SNF, and total solids % were 4.52, 8.84 and 13.36, respectively.
... The results obtained by Valsalan et al. 7 also indicate that the first two components accounted for a high variance with an R 2 value equal to 0.74. Khan et al. 13 also reported the first two principal components to show maximum variance (61.86% and 26.14%). The components explaining a majority of the variance can be used for selection and breeding, especially for the construction of selection indices 14 . ...
Article
Full-text available
As the amount of data on farms grows, it is important to evaluate the potential of artificial intelligence for making farming predictions. Considering all this, this study was undertaken to evaluate various machine learning (ML) algorithms using 52-year data for sheep. Data preparation was done before analysis. Breeding values were estimated using Best Linear Unbiased Prediction. 12 ML algorithms were evaluated for their ability to predict the breeding values. The variance inflation factor for all features selected through principal component analysis (PCA) was 1. The correlation coefficients between true and predicted values for artificial neural networks, Bayesian ridge regression, classification and regression trees, gradient boosting algorithm, K nearest neighbours, multivariate adaptive regression splines (MARS) algorithm, polynomial regression, principal component regression (PCR), random forests, support vector machines, XGBoost algorithm were 0.852, 0.742, 0.869, 0.915, 0.781, 0.746, 0.742, 0.746, 0.917, 0.777, 0.915 respectively for breeding value prediction. Random forests had the highest correlation coefficients. Among the prediction equations generated using OLS, the highest coefficient of determination was 0.569. A total of 12 machine learning models were developed from the prediction of breeding values in sheep in the present study. It may be said that machine learning techniques can perform predictions with reasonable accuracies and can thus be viable alternatives to conventional strategies for breeding value prediction.
... Since the expression of animal traits on growth, production and reproduction is very complex and correlated, the information generated out of all the traits studied can be included as a null hypothesis as their contribution to lifetime production. The principal components can be used to reduce the data (into components) with the variability explained in the original set of observed traits (variables), as discussed by many workers (Hotelling, 1933;Rugoor et al., 2000Chapman et al., 2001Khan et al., 2013). ...
Article
Full-text available
The objective of the research was to investigate the relationship among production traits i.e., lactation milk yield, lactation length and lactation peak milk yield of the first three lactations using principal component analysis and formulation of prediction equation to predict lifetime milk production in Gir cattle. Data were from multiparous dairy cows of the University farm. Principal component analysis with correlation matrix was used to find the relationship among lactation milk yield, lactation length and lactation peak milk yield of first three lactation and other fixed effects, including the year of calving, season and parity with random effect of sire. The principal components were fitted to identify the best-fitted model for predicting lifetime milk yield using all principal components as a predictor in different combinations. The first six principal components (first lactation milk yield, lactation length and peak milk yield, second lactation milk yield, lactation length and peak milk yield), explained 98% variation in the estimated values with adjusted R2= 59.85% variation in the estimated values. The curve estimation analysis revealed that the first six principal components as the predictor was the most fitting model for predicting lifetime milk yield. The prediction equation found most fitted will be useful for the selection of Gir cattle at an early stage of lactation.
... Valsalan et al., Artificial Intelligence Techniques for the Prediction of Body Weights in Sheep the first one alone explained about 66%. Khan et al., 2013 also reported the first two principal components to show maximum variance (61.86% and 26.14%). The components explaining a majority of the variance can be used for selection and breeding especially for the construction of selection indices (Ibe, 1989). ...
Article
Full-text available
Background: Artificial intelligence (AI) is transforming all spheres of life and it has the potential to revolutionize animal husbandry as well. In this regard, an attempt was made to compare two AI techniques for predicting 12-month body weights of animals; viz. Principal Component regression (PCR) and Ordinary Least Squares (OLS) for datasets of Corriedale sheep spanning 11 years. Methods: PCR models were trained by varying proportions of training and testing datasets. The dataset was subject to PCR before analysis and tested (PCA dataset). A separate dataset was also created by feature selection of the PCA (PCA+FS dataset) variables. Result: The highest correlation coefficients between test and predicted variables for two datasets (PCA dataset and PCA+FS dataset) created among the multiple models trained using PCR were 0.982 and 0.9741. In terms of error, R2 or correlation coefficient, the PCA dataset performed better than the PCA+FS dataset. The second principal component had the highest explained variance in OLS (86.16%) and the highest coefficient of determination (R2) using OLS was obtained for the PCA dataset viz. 0.980. It is concluded that both the algorithms tested in this study were satisfactorily trained in their prediction of the body weights with OLS performing better than PCA in terms of R2 value.
... Du et al. (2018) also endorsed the use of PCR in breeding value prediction in their study.In the regression analysis for the breeding value dataset, seven features explained nearly 95% of the variance. Our result agrees withValsalan et al. (2020) who reported that the inclusion of the rst 2 components accounted for a signi cant improvement in the amount of variance (R 2 = 74%).Pinto et al. (2006) also reported that the rst ve principal components explained nearly 93.3% of the variation, and the rst one alone explained about 66%.Khan et al. (2013) also reported the rst two principal components to show maximum variance (61.86% and 26.14%). The components explaining a majority of the variance can be used for selection and breeding, especially for the construction of selection indices(Ibe, 1989).The prediction equations derived for ordinary least squares had a moderate coe cient of determination. ...
Preprint
As the amount of data on farms grows, it is important to evaluate the potential of artificial intelligence for making farming predictions. Considering all this, this study was undertaken to evaluate various machine learning (ML) algorithms using 52-year data for sheep. Data preparation was done before analysis. Breeding values were estimated using Best Linear Unbiased Prediction. 13 ML algorithms were evaluated for their ability to predict the breeding values. The variance inflation factor for all features selected through PCA was 1. The correlation coefficients between true and predicted values for Artificial neural networks, Bayesian ridge regression, Classification and regression trees, Genetic Algorithms, Gradient boosting algorithm, K nearest neighbours, MARS algorithm, Polynomial regression, Principal component regression, Random forests, Support Vector Machines, XGBoost algorithm were 0.852, 0.742, 0.869, 0.762, 0.915, 0.781, 0.746, 0.742, 0.746, 0.917, 0.777, 0.915 respectively for breeding value prediction. Random forests had the highest correlation coefficients. A total of 13 machine learning models were developed from the prediction of breeding values in sheep in the present study. It may be said that machine learning techniques can perform predictions with reasonable accuracies and can thus be viable alternatives to conventional strategies for breeding value prediction.
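The statement that every PCA-derived feature had a variance inflation factor of 1 follows from principal component scores being mutually uncorrelated. A minimal sketch of that property is given below, assuming synthetic collinear data rather than the sheep records; the helper `vif` and all variable names are illustrative and do not come from the study.

```python
import numpy as np

def vif(X):
    """Variance inflation factors, computed as the diagonal of the
    inverse of the predictor correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)

# Synthetic, deliberately collinear predictors (illustrative only).
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)])

# PCA scores: project the centred data onto the right singular vectors.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T

vif_raw = vif(X)       # large values, because the columns are nearly identical
vif_pcs = vif(scores)  # all equal to 1 up to rounding, because scores are uncorrelated
print(vif_raw, vif_pcs)
```

A VIF of 1 therefore says only that the components carry no shared linear information with each other; it says nothing about how well they predict the response.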
... Recently, several countries such as Brazil and India have incorporated the main component analysis (MC) in the construction of selection indexes in dairy cattle (Bignardi et al. 2012 and Khan et al. 2013). Authors such as Buzanskas et al. (2013) stated that the use of MC is a methodology to build linear combinations between the breeding values of the available traits in a database, taking into account the eigenvalues of the main component and the eigenvectors of the traits in each main component, which are variability measures. ...
Article
The phenotypic and genealogical data of 1 571 Holstein cows located in three livestock enterprises during the years 1984 to 2016 were used. The purpose of this study was to carry out the multi-trait selection of dairy production, reproduction and longevity traits by means of the preparation of selection indexes (SI) through the analysis of main components (MC). Correlations and breeding values (BV) were estimated for the traits: cumulative milk production up to 305 days (BVL305), duration of lactation (BVDL), age at first parturition (BVAP1), gestation parturition interval (BVGPI), accumulated milk per life (BVTML) and productive life (BVPL) by means of a multi-trait animal model. The SPSS statistical package was used to perform the MC analysis, for which the BV were standardized, and the Kaiser criterion was used to select the MC that explains the greatest genetic variation. The genetic correlations between L305, TML and DL showed mean values (0.55, 0.47 and 0.27), and between GPI and PL of 0.29. The first two main components (MC1, MC2) were those that obtained the Kaiser criterion and explained 53.7 % of the total variance of the BV. Linear correlations between BVs with each main component showed that L305, DL and TML were related with the MC1, and GPI and PL to MC2. It is concluded that in Holstein cows it is possible to perform multi-trait selection by constructing selection indexes based on the first two MCs, since they showed considerable genetic variation.
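The Kaiser criterion mentioned above retains the components whose eigenvalue of the correlation matrix exceeds 1, which is why the values are standardized first. The sketch below illustrates the rule on synthetic data with six traits driven by two latent factors; it is not the SPSS analysis from the study, and the function name `kaiser_components` is hypothetical.

```python
import numpy as np

def kaiser_components(X):
    """Eigenvalues of the correlation matrix (descending), their share of
    total variance, and the number retained by the Kaiser rule (eigenvalue > 1)."""
    corr = np.corrcoef(X, rowvar=False)      # standardizes the traits implicitly
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    ratio = eigvals / eigvals.sum()
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, ratio, n_keep

rng = np.random.default_rng(1)
n = 500

# Two independent latent factors, each driving three observed traits (synthetic).
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + 0.6 * rng.normal(size=(n, 6))

eigvals, ratio, n_keep = kaiser_components(X)
print("eigenvalues:", np.round(eigvals, 3))
print("retained components:", n_keep)
print("variance explained by retained components:", round(ratio[:n_keep].sum(), 3))
```

Because the eigenvalues of a correlation matrix sum to the number of traits, an eigenvalue above 1 means the component explains more variance than any single standardized trait does on its own.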
... cm was reported by Vohra et al. (2017) in Chhattisgarhi buffaloes while Melo et al. (2018) observed 143.07 cm body length in crossbred Murrah buffaloes. In Azikheli buffaloes, Khan et al. (2013) observed body length of 140 cm. On contrary, longer average body length (156 cm) was reported by Mirza et al. (2015) in the Nilli Ravi buffaloes of Pakistan. ...
Article
Linear type traits are important in terms of reflecting breed standards and in giving information about the developmental ability of the animals. For data analysis, principal component analysis (PCA) is the most important technique when variables are correlated. The aim of the present study was to make linear type traits unrelated and reduce their number to the extent which could be used in explaining body conformation in Murrah buffaloes. Measurements were recorded on a total of 81 adult Murrah buffaloes maintained at Buffalo Farm, Lala Lajpat Rai University of Veterinary and Animal Sciences, Hisar for 11 linear type traits (top wedge angle, rump slope, rump width, hip bone distance, navel flap length, brisket distance, height at wither, body length, skin thickness at neck region, skin thickness at ribs region and skin thickness at rump region). Phenotypic correlations were calculated for the considered traits, and significant positive correlations varied from 0.26 to 0.67 in the present study. All 11 linear type traits were subjected to varimax rotated PCA with Kaiser Normalization to explain body conformation of Murrah buffaloes. Principal component analysis resulted in four components which described 69.522% of total variation and, out of this, the first component explained 28.678% variation. The communality ranged from 0.882 (rump slope) to 0.390 (navel flap length) and unique factors ranged from 0.118 to 0.610 for 11 different linear type traits. It was concluded that PCA was effective in reducing the number of variables required to explain the body conformation in Murrah buffaloes.
Article
The data on early lactation traits and production efficiency traits of Phule Triveni cows maintained at the MPKV, Rahuri, Maharashtra from 1978 to 2007 were used to predict lifetime milk production up to third lactation. Under Set-I, the best equation of the multiple regression model, containing 2 out of 7 first lactation traits, gave 50.91% accuracy of prediction in LTP-3. The best equation under Set-II, containing 3 out of 11 first and second lactation traits, showed 74.28% accuracy. When only production efficiency traits were incorporated in the prediction equation, the best equation, having 3 out of 6 production efficiency traits, gave 48.70% accuracy of prediction. The optimum equations under Set-IV, including 4 out of 1.7 traits for LTP-3 lactation, gave 88.03% accuracy of prediction. It was observed that the inclusion of production efficiency traits along with first and second lactation traits in an equation showed a significant increase in R² value. Key words: Cattle, Early lactation traits, Lactation traits, Lifetime milk production, Prediction, Productivity.
Article
Ten-year data records on growth (birth weight-BWt), five initially expressed reproductive traits (age at first calving-AFC, first service period-FSP, first dry period-FDP, first calving interval-FCI and first lactation length-FLL), along with the part lactation records of 100, 170 and 240 days of first lactation (my100-1, my170-1 and my240-1) and second lactation (my100-2, my170-2 and my240-2) and their respective total milk yields (total lactation milk yield of first lactation-TLMY1 and total lactation milk yield of second lactation-TLMY2), were used to predict LTMY5 (lifetime milk yield as total milk yield up to 5 lactations) and LTMY4 (lifetime milk yield as total milk yield up to 4 lactations). It was observed that first calving interval (FCI) is an important predictor (out of the initially expressed growth trait, birth weight (BWt); reproductive traits AFC, FSP, FDP, FCI and FLL; and first lactation milk traits my100-1, my170-1, my240-1 and TLMY1) for lifetime prediction (both LTMY4 and LTMY5). Prediction of LTMY4 and LTMY5 with respect to initial growth (birth weight), reproductive traits (AFC, FSP, FDP, FCI, FLL) and the first 2 lactations (my100-1, my170-1, my240-1, TLMY1, my100-2, my170-2, my240-2 and TLMY2) indicated the contribution of my240-2 followed by TLMY1 and my170-2. They jointly explained 40.32% variation in the estimated value of LTMY4. However, prediction of LTMY5 with respect to these predictors showed that my240-2 together with FLL jointly explained 26.71% variation in the estimated value.
Article
Vrindavani is a recently developed synthetic crossbred cattle strain of India. It has the exotic inheritance of Holstein-Friesian, Brown Swiss and Jersey and the indigenous inheritance of Hariana cattle. The present study was undertaken to characterize the Vrindavani cattle maintained at the cattle and buffalo farm, Indian Veterinary Research Institute, Izatnagar, Bareilly, India. The physical and morphological characteristics and production performances of Vrindavani cattle were studied. The coat colour of Vrindavani was predominantly brown, though some animals had black, white and beige coat colour. The head was clean cut and well proportioned, with prominent poll and concave forehead. The ears were medium sized and laterally orientated with round edges. The hip bone was broad and prominent, with wide, smooth and level pin bones. The udder was generally trough type. These animals were docile to moderate in temperament. Morphometric measurements of males were higher than those of females. The mean birth weight of Vrindavani calves was 22.13±0.12 kg. The mean lactation milk yield, 305-day milk yield and peak yield were 3219.75±41.09, 3047.42±33.8 and 16.58±0.16 kg, respectively. The average age at first successful service, age at first calving, service period and dry period were 746.28±8.94, 1012.14±9.32, 149.54±4.55 and 99.65±5.75 days, respectively.
Article
The study was conducted to predict total milk yield based on the most frequent daily milk yield and the highest daily milk yield of a month in Sahiwal cows. Most frequent daily milk yield (MFDMY) and highest daily milk yield (HIDMY) both helped in prediction of total milk yield with a fair degree of accuracy.
Article
Different optimum equations developed from both multiple regression analysis and principal components analysis revealed that error variance, expressed as per cent total variance was lower from the model having PCs as compared to original variables in the regression model. However, both multiple regression analysis and principal components analysis were almost equal in accuracy to predict lifetime milk production in this set of data.
Article
A field study was set up to investigate the relation between breeding management and 305-day milk production. A second goal of the study was to investigate advantages and disadvantages of principal components regression (PCR) and partial least squares (PLS) for livestock management research. Multicollinearity was present in the data set and the number of variables was high compared to the number of observations. Out of 70 variables related to breeding management and technical results at dairy farms, 19 were selected for PLS and PCR, based on a correlation of ≥0.25 or ≤−0.25 with 305-day milk production. Five principal components (PCs) were selected for PC-regression with 305-day milk production being the goal variable. Related variables were combined into one so-called synthetic factor. All synthetic variables were used in a path-analysis. The same path-analysis was worked out with PLS. PLS forms synthetic factors capturing most of the information for the independent X-variables that is useful for predicting the dependent Y-variable(s) while reducing the dimensionality. Both methodologies showed that milk production per cow is related to critical success factors of the producer, farm size, breeding value for production and conformation. Milk production per cow was the result of the attitude of the farmer as well as the genetic capacity of the cow. It was found that at high producing farms the producer put relatively much emphasis on the quality of the udder and less on the kg of milk. Advantages of PLS are the optimization towards the Y-variable, resulting in a higher R², and the possibility to include more than one Y-variable. Advantages of PCR are that hypothesis testing can be performed, and that complete optimisation is used in determining the PCs. It is concluded that PLS is a good alternative for PCR when relations are complex and the number of observations is small.
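The two-stage procedure described in this abstract (screen predictors by |r| ≥ 0.25 with the response, then regress the response on the first five principal components) can be sketched roughly as follows. This is an illustration on synthetic data under assumed settings, not the authors' analysis: the function `pcr_fit_predict`, its parameter names, and the simulated farm variables are all hypothetical, and the synthetic-factor and path-analysis steps are not covered.

```python
import numpy as np

def pcr_fit_predict(X, y, min_abs_corr=0.25, n_components=5):
    """Correlation screening followed by principal component regression."""
    # 1. Keep predictors whose correlation with y is >= 0.25 or <= -0.25.
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    keep = np.abs(r) >= min_abs_corr
    Xs = X[:, keep]

    # 2. Standardize the retained predictors.
    Z = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0, ddof=1)

    # 3. Principal component scores from the SVD of the standardized matrix.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    scores = Z @ Vt[:k].T

    # 4. Ordinary least squares of y on the first k scores, with an intercept.
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return keep, coef, r2

rng = np.random.default_rng(2)
n = 500

# Synthetic data: 8 collinear predictors tied to a latent factor, 12 pure-noise ones.
latent = rng.normal(size=n)
X = rng.normal(size=(n, 20))
X[:, :8] = latent[:, None] + 0.5 * X[:, :8]
y = 2.0 * latent + rng.normal(size=n)

keep, coef, r2 = pcr_fit_predict(X, y)
print("predictors retained:", int(keep.sum()))
print("R^2 of the PC regression:", round(r2, 3))
```

Note that the components are chosen here without reference to the response, which is the contrast with PLS drawn in the abstract: PLS builds its factors to be predictive of the Y-variable, so it can reach a higher R² with the same number of factors.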