
Development of feed composition tables using a statistical screening procedure

Authors:
  • Purina Animal Nutrition

ABSTRACT

Millions of feed composition records generated annually by testing laboratories are valuable assets that can be used to benefit the animal nutrition community. However, it is challenging to manage, handle, and process feed composition data that originate from multiple sources, lack standardized feed names, and contain outliers. Efficient methods that consolidate and screen such data are needed to develop feed composition databases with accurate means and standard deviations (SD). Considering the interest of the animal science community in data management and the importance of feed composition tables for the animal industry, the objective was to develop a set of procedures to construct accurate feed composition tables from large data sets. A published statistical procedure, designed to screen feed composition data, was employed, modified, and programmed to operate using Python and SAS. The 2.76 million data received from 4 commercial feed testing laboratories were used to develop procedures and to construct tables summarizing feed composition. Briefly, feed names and nutrients across laboratories were standardized, and erroneous and duplicated records were removed. Histogram, univariate, and principal component analyses were used to identify and remove outliers having key nutrients outside of the mean ± 3.5 SD. Clustering procedures identified subgroups of feeds within a large data set. Aside from the clustering step that was programmed in Python to automatically execute in SAS, all steps were programmed and automatically conducted using Python followed by a manual evaluation of the resulting mean Pearson correlation matrices of clusters. The input data set contained 42, 94, 162, and 270 feeds from 4 laboratories and comprised 25 to 30 nutrients. The final database included 174 feeds and 1.48 million records. 
The developed procedures effectively classified by-products (e.g., distillers grains and solubles as low or high fat), forages (e.g., legume or grass-legume mixture by maturity), and oilseeds versus meal (e.g., soybeans as whole raw seeds vs. soybean meal expellers or solvent extracted) into distinct sub-populations. Results from these analyses suggest that the procedure can provide a robust tool to construct and update large feed data sets. This approach can also be used by commercial laboratories, feed manufacturers, animal producers, and other professionals to process feed composition data sets and update feed libraries.
Key words: clustering, principal component analysis, database, nutrient
H. Tran,1,2,3 A. Schlageter-Tello,1,2 A. Caprez,4 P. S. Miller,1,2 M. B. Hall,5 W. P. Weiss,2,6 and P. J. Kononoff1*
1Department of Animal Science, University of Nebraska-Lincoln, Lincoln 68583
2National Animal Nutrition Program, University of Kentucky, Lexington 40546
3Land O’Lakes Inc., Arden Hills, MN 55126
4Holland Computer Center, University of Nebraska-Lincoln, Lincoln 68588
5US Dairy Forage Research Center, Madison, WI 53706
6Department of Animal Sciences, The Ohio State University, Wooster 43210
J. Dairy Sci. 103:3786–3803
https://doi.org/10.3168/jds.2019-16702
© 2020, The Authors. Published by FASS Inc. and Elsevier Inc. on behalf of the American Dairy Science Association®.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Received March 28, 2019.
Accepted November 14, 2019.
*Corresponding author: pkononoff2@unl.edu
Journal of Dairy Science Vol. 103 No. 4, 2020
INTRODUCTION
The process of diet formulation for livestock species requires accurate knowledge of the nutrient composition of feedstuffs. From estimates of nutrient composition, nutritionists are able to combine available feeds in appropriate proportions to match the nutritional requirements of animals. Although nutritionists routinely submit samples to commercial feed testing laboratories, they also often rely on feed composition tables to provide estimates of nutrient composition of feeds not commonly analyzed (Tedeschi et al., 2008). For example, due to the low variation in nutrient composition between batches, nutritionists routinely use tabular nutrient composition values for concentrate ingredients such as corn and soybean meal (St-Pierre and Weiss, 2015). Nutritionists also rely on feed composition tables to estimate concentrations of nutrients that are expensive to determine, such as amino acids and fatty acids. Finally, feed composition tables could be a source of estimates of variability in nutrient composition (i.e., SD). Accurate estimates of nutrient variability would allow nutritionists to reduce safety factors (defined as offering an excess of nutrients to minimize the risk of deficiency associated with nutrient variability) to a minimum level, allowing formulation of more efficient rations (St-Pierre and Weiss, 2015). Accurate estimates of true nutrient variability are also needed for stochastic diet formulation methods (St-Pierre and Harvey, 1986a,b,c).
The development of feed composition tables grew in extent and complexity in the 1950s, when the National Research Council (NRC; currently the US National Academies of Sciences, Engineering, and Medicine, NASEM) published tables that contained data on the nutrient composition of different by-products (NRC, 1956) as well as cereals and forages (NRC, 1958). Over time, feed composition tables were expanded to include more feedstuffs, along with attempts to standardize nomenclature and updates with more accurate data (NRC, 1971, 1982). Currently, NRC/NASEM feed composition tables are commonly used globally for diet formulation (Thornton, 2010). Additionally, different countries have developed their own feed composition tables, assuming differences in nutrient composition due to geographic location. Some examples of such feed composition tables are those from France (Sauvant et al., 2004), the United Kingdom (Ministry of Agriculture, Fisheries and Food, 1992), and the Netherlands (Federatie Nederlandse Diervoederketen, 2016). Expansion and curation of feed composition tables and databases should be an ongoing process.
Historically, feed composition tables such as those published for the poultry (NRC, 1994) and swine (NRC, 2012) industries have been constructed from pre-existing data sets and data collected from the scientific literature. These literature-based feed composition tables rely on small data sets and simple data management processes. Furthermore, collecting data from the literature is time-consuming, requiring review and extraction of data from a large number of publications, and the small sample size may result in inaccurate estimates of population parameters. An alternative approach was adopted by the dairy cattle NRC (2001) and beef cattle NASEM (2016) committees, which constructed feed composition tables using data provided by commercial laboratories. Such tables are constructed from large data sets containing thousands or even millions of data points for every nutrient and feed. Use of large data sets should improve estimates of statistical parameters and ensure they are more representative of the population of feeds. However, large data sets are difficult to process and manage. Additionally, file formats, data structure, and feed classifications differ among feed testing laboratories. Managing such large databases requires computers with high processing power and software able to run automated procedures to consolidate files, screen outlying observations, and detect misclassified records.
Creation of large databases has provided a useful resource not only in animal nutrition but also in a variety of other fields, such as genetics, animal health, and environment (Morota et al., 2018; Lokhorst et al., 2019). Thus, considering the increasing interest of the animal science community in big data management and the importance of feed composition tables for the dairy feed industry, the objective was to develop a set of procedures to efficiently construct accurate feed composition tables from large, messy data sets.
MATERIALS AND METHODS
General Procedures, Data Collection, and Pre-Screening
The statistical screening procedure to develop the feed composition tables consisted of 4 steps: (1) data set collection; (2) data pre-screening, aimed at assigning a standard nomenclature for data sets received from different sources; (3) automated statistical screening procedure, aimed at removing outliers and detecting clusters within larger data sets; and (4) data summary, to create final feed composition tables. A summary of the steps included in this process is illustrated in Figure 1. In the fall of 2015, 4 commercial feed testing laboratories in the United States (Dairyland Laboratories, Arcadia, WI; Cumberland Valley Analytical Services, Waynesboro, PA; Rock River Laboratory Inc., Watertown, WI; and Dairy One, Ithaca, NY) were contacted, and feed analysis data were requested. Together these laboratories kindly provided a data set containing 2.82 million records (defined as an analysis of a single feed sample conducted by 1 laboratory) collected between 2011 and 2015, and delivered as .xlsx or .csv files.
Tran et al.: DAIRY INDUSTRY TODAY
Figure 1. Summary of the procedure used to create feed composition tables.
The objective of the pre-screening process was to create standardized files to be used as the input data set for the automated statistical screening process (Figure 1). The first step within the pre-screening process was to remove feed records that could not be used in the final database. Unusable records included unidentified records; records with missing values for all nutrients; records referring to TMR, feed mixtures, commercial products, or mineral mixtures; and non-feed records such as water and manure. The second step in the pre-screening process was to standardize feed and nutrient names. In general, nomenclature for feeds was diverse among laboratories. Feeds with similar names across laboratories were assigned a common feed name. Feed names did not follow the International Feed Name and Number system (Harris et al., 1980); however, attempts were made to use feed names that were consistent with the standards of the Association of American Feed Control Officials (AAFCO, 2017). Forages (defined as the aerial parts of certain plant species used as feed for ruminants) were rarely identified by species in the raw data sets provided by the laboratories, and, when identified, the number of records was low. Hence, forage records were grouped into 3 main categories: cool-season grasses (including records identified as brome grass, bluegrass, fescue, matua grass, orchardgrass, reed canary grass, and timothy), legumes (including records identified as alfalfa, mixed alfalfa, clover, legume, and alfalfa-clover mix), and grass-legume mixture (including records designated as pasture, mixed forages, mixed pastures, grass forage, grass, mixed grass-legume, mixture alfalfa-grass, mixture clover-grass, and meadow grass).
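The forage grouping described above amounts to a lookup from raw laboratory names to a standard category. Pre-screening in this study was conducted manually in Excel, so the following Python sketch is only an illustration of how such a mapping could be expressed in code. The record names are those listed above; the identifiers FORAGE_CATEGORIES, RAW_TO_CATEGORY, and standardize_forage_name, as well as the case and whitespace normalization, are assumptions introduced here.

```python
# Hypothetical lookup built from the three forage categories listed in the text.
FORAGE_CATEGORIES = {
    "cool-season grasses": [
        "brome grass", "bluegrass", "fescue", "matua grass",
        "orchardgrass", "reed canary grass", "timothy",
    ],
    "legumes": [
        "alfalfa", "mixed alfalfa", "clover", "legume", "alfalfa-clover mix",
    ],
    "grass-legume mixture": [
        "pasture", "mixed forages", "mixed pastures", "grass forage", "grass",
        "mixed grass-legume", "mixture alfalfa-grass", "mixture clover-grass",
        "meadow grass",
    ],
}

# Invert to raw name -> category so that each record needs a single lookup.
RAW_TO_CATEGORY = {
    raw: category
    for category, raw_names in FORAGE_CATEGORIES.items()
    for raw in raw_names
}


def standardize_forage_name(raw_name):
    """Return the standard category for a raw forage name, or None if unmapped."""
    key = " ".join(raw_name.lower().split())  # normalize case and whitespace
    return RAW_TO_CATEGORY.get(key)
```

Names that return None would still need manual review, which is consistent with the manual pre-screening reported here.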
The procedure to standardize names of nutrients was similar to the procedure described for feed names. Nutrients with similar names across laboratories were assigned a common name. Methodologies used to determine nutrients were obtained from each laboratory's webpage and evaluated to determine whether nutrients were comparable. A summary of the 30 nutrients to be included in feed composition tables can be found in Table 1. In animal nutrition, many assays characterize feeds by measuring the concentration of an “analyte,” which is, by definition, the chemical component of interest (Harvey, 2000). This analyte may not be the same as a “nutrient,” which is defined as the chemical substance that nourishes (Pond et al., 1995). As an example, NDF is considered an analyte but is not a direct measure of carbohydrate, which is a nutrient. Despite inexact use of the terms, to be consistent with the literature, in this publication the term “nutrient” is used to refer to a chemical component of interest (i.e., analyte).
Finally, data for each feed were placed in a separate Excel file (version 16.16.18, Microsoft Corp., Redmond, WA), classified according to the laboratory providing the data. Each file was the initial input data set to be screened by the automated statistical procedure (Figure 1). The initial library contained 42, 94, 162, and 270 initial input data sets (i.e., feeds) from laboratories 1 through 4, containing 2.76 million feed records in total. The pre-screening step was conducted manually using Excel.
Automated Statistical Screening Procedure
The objective of the automated statistical screening procedure was to remove outlier data points and detect clusters (defined as identified subgroups of feeds within a large data set; Figure 1). An outlier is defined as an observation that differs from most other observations in a data set, raising suspicion about how it was obtained (Hawkins, 1980). Outliers are not necessarily erroneous data points and could represent extreme values within a population (e.g., a mature grass harvested late in a dry season). The statistical screening procedure was conducted automatically, using a primary Python code to automate the process (Python Language, v. 2.7, Python Software Foundation, Wilmington, DE), which executed a secondary SAS code running the statistical screening procedure (SAS v. 9.3, SAS Institute Inc., Cary, NC). Python and SAS codes used to perform the automated statistical screening procedure can be found on the GitHub webpage (https://github.com/unlhcc/FeedComposition; see files Screening procedure_b_Python, Screening procedure_Common_Python, and Screening procedure_b_SAS).
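As a rough illustration of how a primary Python script might hand a program to SAS, the sketch below builds a batch-mode command line and runs it. It is not taken from the repository cited above. The function names, the default executable name sas, and the log file naming are assumptions, and the sketch is written for current Python 3 rather than the v. 2.7 used in the study. The -sysin and -log options are standard SAS invocation options, but installations differ, so the exact command line should be checked locally.

```python
import subprocess
from pathlib import Path


def build_sas_command(sas_program, sas_executable="sas"):
    """Build a batch-mode SAS command line for one program file (hypothetical helper)."""
    program = Path(sas_program)
    log_file = program.with_suffix(".log")  # assumed convention: log next to the program
    return [sas_executable, "-sysin", str(program), "-log", str(log_file)]


def run_sas(sas_program, sas_executable="sas"):
    """Run the SAS program and return its exit code; the caller decides how to treat it."""
    completed = subprocess.run(build_sas_command(sas_program, sas_executable))
    return completed.returncode
```

Returning the exit code, rather than raising on any nonzero value, leaves room for the calling script to distinguish warnings from errors.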
The automated statistical screening procedure began with the selection of all initial input data sets. For each initial input data set, the automated screening procedure removed erroneous data points (i.e., nutrient analyses with values <0 or >100) and duplicated records (identified as records with identical values for all nutrients contained in the observations).
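The two cleaning rules above can be sketched in a few lines of Python. This is a minimal illustration, not the published code: it assumes each record is a dictionary of nutrient values, that an out-of-range value is blanked (set to None) rather than the whole record being dropped, and that the first copy of a duplicated record is kept. The function name remove_errors_and_duplicates is hypothetical.

```python
def remove_errors_and_duplicates(records, nutrients):
    """Blank out-of-range values, then drop records that repeat an earlier one."""
    seen = set()
    cleaned = []
    for record in records:
        values = []
        for name in nutrients:
            value = record.get(name)
            # A value below 0 or above 100 is treated as an erroneous data point.
            if value is not None and not (0 <= value <= 100):
                value = None
            values.append(value)
        key = tuple(values)
        if key in seen:  # identical values for all nutrients: duplicated record
            continue
        seen.add(key)
        cleaned.append(dict(zip(nutrients, values)))
    return cleaned
```

For example, two records with DM = 90.0 and CP = 12.0 collapse to one, and a record with DM = 105.0 keeps its other values while DM becomes None.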
In the next step, initial input data sets were screened using the 3-step statistical screening procedure previously described by Yoder et al. (2014). The first step in the 3-step statistical screening procedure was a univariate analysis (Figure 1), which removed values more than 3.5 SD units from the mean for all nutrients. In addition, histograms and correlation matrices for DM, CP, ash, starch, crude fat, NDF, and ADF were generated. The output generated after the univariate analysis was labeled as the filtered data set (Figure 1). Step 2 was a multivariate principal component analysis (PCA; Figure 1; Lever et al., 2017). Before PCA, nutrient values were standardized to a mean of 0 and SD of 1. Standardized values were used in a PCA using the PRINCOMP procedure of SAS (Yoder et al., 2014). Before the PCA was performed, key nutrients were selected. Key nutrients were an important part of the statistical screening procedure, as they were the variables used to identify feed clusters within an identified feedstuff in an initial data set. In addition, selecting a reduced set of appropriate key nutrients was important to avoid over-elimination of data. A PCA requires that all records in a data set contain values for all nutrients included in the analysis. However, because most records did not contain values for the 30 nutrients included in the data sets (Table 1), use of fewer key nutrients reduced the number of records eliminated by the PCA. For all initial input data sets (i.e., feeds), assigned key nutrients included those commonly reported for all feeds, such as DM, CP, NDF, and ash. Several other nutrients were designated as key nutrients for specific classes of feeds. For instance, lignin, hemicellulose (determined as NDF − ADF), and Ca were used as key nutrients for some forages; starch was used for grains; fat was used for animal-derived or plant protein feeds; and ethanol-soluble carbohydrates were used for by-products containing elevated sugar concentrations. Summaries of selected key nutrients for different feeds, classified per laboratory, are shown in Table 2. Data points with PCA scores outside the range of ±3.5 SD were removed from the data set as outliers.
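The univariate and PCA screens described above can be illustrated with NumPy. The sketch below is not the SAS PRINCOMP implementation used in the study; it is a minimal stand-in under stated assumptions: rows are records and columns are key nutrients, the univariate step blanks individual values (NaN) rather than whole records, the PCA is computed from the correlation matrix of complete records, every component is checked, and each score is divided by its own SD before the ±3.5 cutoff is applied. The function names are hypothetical.

```python
import numpy as np

SD_LIMIT = 3.5  # cutoff stated in the text


def univariate_screen(values, limit=SD_LIMIT):
    """Set values more than `limit` SD from their column mean to NaN."""
    values = np.asarray(values, dtype=float)
    mean = np.nanmean(values, axis=0)
    sd = np.nanstd(values, axis=0, ddof=1)
    z = (values - mean) / sd
    screened = values.copy()
    screened[np.abs(z) > limit] = np.nan
    return screened


def pca_screen(values, limit=SD_LIMIT):
    """Keep complete records whose component scores all lie within `limit` SD."""
    values = np.asarray(values, dtype=float)
    # PCA needs a value for every key nutrient, so incomplete records are dropped.
    complete = values[~np.isnan(values).any(axis=1)]
    # Standardize each nutrient to mean 0 and SD 1.
    standardized = (complete - complete.mean(axis=0)) / complete.std(axis=0, ddof=1)
    # Eigenvectors of the correlation matrix give the principal axes.
    _, eigenvectors = np.linalg.eigh(np.corrcoef(standardized, rowvar=False))
    scores = standardized @ eigenvectors
    score_z = scores / scores.std(axis=0, ddof=1)
    keep = (np.abs(score_z) <= limit).all(axis=1)
    return complete[keep]
```

Because incomplete records are dropped before the PCA, each additional key nutrient can only shrink the set of records that survive, which is the trade-off behind the choice of key nutrients discussed above.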
Table 1. Nutrients available in the feed composition tables created with the described screening procedure
Nutrient1,2 | Abbreviation | Definition3
Dry matter | DM | The proportion of feed remaining after removing all water in the sample.
Crude protein | CP | Content of total nitrogen in a feed, including true protein and non-protein nitrogen (e.g., urea, nucleotides), evaluated by the Kjeldahl or Dumas methods, and multiplied by 6.25, a factor estimating content of protein, assuming that amino acids contain 16% nitrogen.
Ash | Ash | Fraction containing the total inorganic mineral elements of a feed after complete incineration of a sample at 500 to 600°C.
Starch | Starch | Intracellular polysaccharides composed of amylose and amylopectin, found primarily in seed grains and root portions of plants. The ratio of amylose and amylopectin varies according to the botanical origin of the starch.
Crude fat | Fat | Feed fraction containing lipidic substances extracted with ethylic or petroleum ether. This fraction includes acids with glycerol, phospholipids, lecithin, sterols, waxes, free fatty acids, carotenoids, chlorophyll, and other lipidic pigments.
Total fatty acids | TFA | Feed fraction containing total fatty acids in a sample. The fraction is obtained via esterification of fatty acids with HCl, followed by fatty acid determination in gas-liquid chromatography.
Neutral detergent fiber | NDF | Feed fraction obtained when feed samples are boiled in the neutral detergent solution (sodium lauryl sulfate in disodium EDTA and sodium borate). This fraction contains most of the cell wall components (cellulose, hemicellulose, lignin, silica, tannins, and cutins) except for some pectins and β-glucans.
Acid detergent fiber | ADF | Feed fraction obtained after boiling the NDF fraction in an acid detergent solution (cetyl-trimethyl-ammonium-bromide in sulfuric acid). This fraction contains most of the indigestible cell wall components (lignin, cellulose, and varying amounts of silica).
Hemicellulose4 | HC | Feed fraction containing several polysaccharides present in the cell wall in chains of 500 to 3,000 units, composed of different types of monosaccharides such as xylose, mannose, galactose, and arabinose. Hemicellulose is estimated as NDF minus ADF values.
Water-soluble carbohydrates | WSC | Feed fraction containing sugars and non-starch, nonstructural carbohydrates (fructans, sucrose, glucose, and fructose), extracted with a phenol-sulfuric acid solution.
Ethanol-soluble carbohydrates | ESC | Feed fraction containing sugars, non-starch, nonstructural carbohydrates (sucrose, glucose, and fructose), extracted with 80% ethanol solution. ESC poorly extract fructans.
Lignin | Lignin | Feed fraction obtained after boiling the ADF in an acid detergent solution (72% sulfuric acid 12 mol/L) for 3 h. This fraction contains mostly lignin.
Soluble protein | SolP | Protein fraction extracted from CP using a borate-phosphate and sodium azide solution. It presents an estimate of soluble nitrogen components rapidly degraded in the rumen.
Neutral detergent-insoluble CP | NDICP | Protein fraction remaining after extracting CP with a neutral detergent solution (sodium lauryl sulfate in disodium EDTA and sodium borate). It provides an estimate of the portion of the rumen-undegradable nitrogen that is potentially available to the animal.
Acid detergent-insoluble CP | ADICP | Protein fraction remaining after extracting NDICP with an acid detergent solution (cetyl-trimethyl-ammonium-bromide in sulfuric acid). It provides an estimate of the portion of the rumen-unavailable nitrogen (i.e., heat-damaged or strongly attached to cell wall).
In vitro NDF digestibility at 48 h | NDFD48 | Proportion of NDF digested in vitro after 48 h.
1By definition, a nutrient is “the chemical substance that nourishes.” In the current table, many chemical components cannot be defined as “nutrients”; however, to be consistent with terms used in literature, here a “nutrient” is defined as a “chemical component of interest.” Although many feed testing laboratories use NIRS (near infrared spectroscopy) as the main method for nutrient analysis, in this table we define the reference method used to estimate each nutrient.
2Feed composition tables created with the described screening procedure will also include values for several minerals that are not defined in the current table, namely, Ca, P, Mg, K, Na, Cl, S, Co, Cu, Fe, Mn, Se, Zn, Mo, and I.
3References used for definitions include Van Soest (1963), Sukhija and Palmquist (1988), Licitra et al. (1996), AOAC International (2006), and Hall (2014).
4Hemicellulose is a nutrient defined in this table and used as a key nutrient during the statistical screening procedure. However, feed composition tables created with the screening procedure will not report values for hemicellulose.
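Two entries in Table 1 are simple derived quantities: CP is total nitrogen multiplied by 6.25, and hemicellulose is NDF minus ADF. The short Python sketch below restates those two definitions as functions. It assumes all inputs are percentages on the same basis; the function and constant names are hypothetical.

```python
NITROGEN_TO_PROTEIN = 6.25  # 100 / 16, from the 16% nitrogen assumption in Table 1


def crude_protein(total_nitrogen_pct):
    """Estimate CP (%) from total nitrogen (%) using the 6.25 factor."""
    return total_nitrogen_pct * NITROGEN_TO_PROTEIN


def hemicellulose(ndf_pct, adf_pct):
    """Estimate hemicellulose (%) as NDF minus ADF."""
    return ndf_pct - adf_pct
```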
Table 2. Key nutrients [DM, CP, NDF, ash, starch, fat (crude fat), hemicellulose (HC; estimated as NDF minus ADF), and ethanol-soluble carbohydrates (ESC)] used for within-laboratory clustering analysis to identify groups of feeds within a named feedstuff
Feed name Laboratory 1 (n = 735,560) Laboratory 2 (n = 538,069) Laboratory 3 (n = 398,397) Laboratory 4 (n = 315,040)
Alfalfa meal DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash
Almond hulls DM, CP, NDF, Ash DM, CP, NDF, Ash DM, CP, NDF DM, CP, NDF, Ash
Apple pomace or by-product, wet DM, CP, NDF
Bakery by-product meal DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Fat DM, CP, NDF, Ash, Fat
Bakery by-product, bread waste DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Fat
Bakery by-product, cereal DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Fat
Bakery by-product, cookies DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Fat
Barley grain, dry, ground DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP, Starch
Barley hay DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Barley malt sprouts DM, CP, NDF, Ash DM, CP, NDF, Ash DM, CP, NDF, Ash
Barley silage, headed DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Barley silage, mid-maturity DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Barley silage, vegetative DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Barley grain, steam rolled DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP, Starch
Beet pulp, dry DM, CP, NDF, Ash DM, CP, NDF, Ash DM, NDF, Ash, ESC
Beet pulp, dry, molasses added DM, CP, NDF, Ash DM, CP, NDF, Ash
Beet pulp, wet DM, CP, NDF, Ash DM, CP, NDF, Ash DM, CP, NDF, Ash
Bermudagrass hay DM, CP, NDF, Ash DM, CP, NDF DM, CP, NDF, Ash, Lignin
Bermudagrass silage, mature DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Bermudagrass silage, mid-maturity DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Blood meal DM, CP DM, CP DM, CP DM, CP
Brewers grains, dry DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat
Brewers grains, wet DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Fat
Brewers yeast, dry DM, CP, Ash, Fat
Brewers yeast, wet DM, CP, Ash, Fat
Candy (not chocolate) by-product DM, CP, NDF, Ash, Fat, ESC DM, CP, NDF, Ash, ESC DM, CP, NDF, Fat, ESC
Canola meal DM, CP, Ash, Fat CP, Fat DM, CP, NDF, Fat DM, CP, Fat
Canola seed, whole CP, Fat DM, CP, NDF, Fat
Chocolate by-product DM, CP, NDF, Fat DM, CP, NDF, Fat
Citrus pulp, dry DM, CP, NDF, Ash DM, CP, NDF
Citrus pulp, wet DM, CP, NDF, Ash DM, CP, NDF, Ash DM, CP, NDF
Cool season grass hay, mature CP, NDF, HC, Lignin DM, CP, NDF, Ash, Lignin, Starch
Cool season grass hay, mid-maturity CP, NDF, HC, Lignin DM, CP, NDF, Ash, Lignin, Starch
Cool season grass silage CP, NDF, HC, Lignin DM, CP, NDF, Ash, Lignin, Starch
Corn snaplage DM, CP, NDF, Ash DM, CP, NDF DM, CP, NDF DM, CP, NDF, Ash
Corn cobs DM, CP, NDF DM, CP, NDF
Corn germ meal DM, CP, NDF, Ash DM, CP, NDF, Fat DM, CP, NDF, Ash
Corn gluten feed, dry DM, CP, NDF DM, CP, NDF DM, CP, NDF, Fat DM, CP, NDF
Corn gluten feed, wet DM, CP, NDF DM, CP, NDF DM, CP, NDF, Fat DM, CP, NDF
Corn gluten meal DM, CP, NDF DM, CP, NDF, Starch DM, CP, NDF, Fat
Corn grain screenings DM, CP, NDF, Ash DM, CP, NDF, Starch DM, CP, NDF, Ash
Corn grain dry, ground, fine grind DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP, NDF, Ash, Starch
Corn grain dry, ground, medium grind DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP, NDF, Ash, Starch
Corn grain dry, ground, coarse grind DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP, NDF, Ash, Starch
Corn grain, high moisture, fine grind DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch
Corn grain, high moisture, coarse grind DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch
Corn grain, steam-flaked DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, Ash
Corn hominy CP, NDF, Ash, Starch, Fat DM, CP, NDF, Starch DM, CP, NDF DM, CP, NDF, Ash
Corn silage, all samples DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch
Corn silage, <35% DM DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch
Corn silage, ≥35% DM DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch
Corn stalk, ensiled, wet DM, CP, NDF DM, CP, NDF, Starch DM, CP, NDF, Ash
Corn stalks, ensiled, dry DM, CP, NDF DM, CP, NDF, Starch DM, CP, NDF, Ash
Corn, ear with husk and some stalk, ensiled, high fiber DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP, NDF, Ash, Starch
Corn, ear with husk and some stalk, ensiled, low fiber DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP, NDF, Ash, Starch
Cotton gin trash DM, CP, NDF, Ash DM, CP, NDF
Cottonseed whole, linted DM, CP, NDF, Ash, Fat CP, NDF, Fat DM, CP, NDF, Ash, Fat
Cottonseed hulls DM, CP, NDF, Fat DM, CP, NDF DM, CP, NDF, Ash
Cottonseed meal DM, CP, NDF, Ash, Fat CP, NDF, Fat DM, CP, NDF, Fat DM, CP, NDF, Ash
Distillers grains and solubles, dried, high fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat
Distillers grains and solubles, dried, high protein DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat
Distillers grains and solubles, dried, low fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat
Distillers grains and solubles, modified wet DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat
Distillers grains with solubles, wet DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat
Distillers solubles DM, CP, NDF, Ash, Fat DM, CP, NDF, Ash, Fat
Feather meal DM, CP DM, CP
Fish meal DM, CP, Ash, Fat DM, CP, Ash, Fat
Fruit and vegetable by-product, wet DM, CP, NDF, Ash DM, CP, NDF, Ash DM, CP, NDF
Grain BMR sorghum silage DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin
Grain screenings, source unknown DM, CP, NDF, Ash DM, CP, NDF, Ash DM, CP, NDF, Fat
Grain sorghum silage, mature CP, NDF, Ash, Lignin, Starch
Grain sorghum silage, mid-maturity CP, NDF, Ash, Lignin, Starch
Grass-legume mixtures, predominantly grass, silage CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Grass-legume mixtures, predominantly grass, hay, mid-maturity CP, NDF, HC, Lignin
Grass-legume mixtures, predominantly grass, hay, mature CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Grass-legume mixtures, predominantly legume, hay, mature CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Grass-legume mixtures, predominantly legume, hay, immature CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Grass-legume mixtures, predominantly legume, silage CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Grass legume mixtures, mix hay CP, NDF, HC, Lignin CP, NDF, Ash, Lignin
Grass legume mixtures, mix silage CP, NDF, HC, Lignin CP, NDF, Ash, Lignin
Legume hay, immature CP, NDF, HC, Lignin CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Legume hay, mature CP, NDF, HC, Lignin CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Legume hay, mid-maturity CP, NDF, HC, Lignin CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Legume silage, immature CP, NDF, HC, Lignin CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Legume silage, mid-maturity CP, NDF, HC, Lignin CP, NDF, HC, Lignin CP, NDF, HC, Lignin
Linseed CP, Fat CP, Fat
Linseed meal CP, Fat CP, Fat DM, CP, NDF, Fat CP, Fat
Meat and bone meal, porcine CP, Ash CP, Fat DM, CP, Ash, Fat
Millet hay DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Millet silage DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Molasses DM, CP, NDF, Ash
Oat grain, rolled DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP, NDF
Oat hay DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Oat hulls DM, CP, NDF DM, CP, NDF
Oat silage, immature DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Oat silage, mid-maturity DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Pea silage CP, NDF
Peanut hay DM, CP, NDF, Ash DM, CP, NDF, Ash
Peanut hulls DM, CP, NDF
Peanut meal, expellers DM, CP, Fat DM, CP, Fat
Peanut skins DM, CP, NDF, Ash DM, CP, NDF
Peanuts DM, CP, NDF
Peas CP, NDF DM, CP, NDF
Pineapple cannery waste DM, CP, NDF
Potato by-product meal DM, CP, NDF, Ash, Fat DM, CP, Fat DM, CP, NDF, Ash
Poultry by-product meal DM, CP, Ash, Fat DM, CP, Ash, Fat
Rice bran DM, CP, NDF, Ash DM, CP, NDF, Fat DM, CP, Fat
Rice bran, defatted DM, CP, NDF, Fat DM, CP, Fat
Rice hulls DM, CP, NDF, Ash DM, CP, NDF, Ash
Rice silage, headed DM, CP, NDF, Ash, Lignin, Starch
Rice silage, vegetative DM, CP, NDF, Ash, Lignin, Starch
Rice, grain DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch
Rye annual fresh, immature DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin
Rye annual fresh, mid-maturity DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin
Rye annual hay, immature DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Rye annual hay, mature DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Rye annual silage, immature DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin
Rye annual silage, mature DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin
Rye annual silage, mid-maturity DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin
Rye annual, hay, mid-maturity DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Rye grain DM, CP, NDF, Ash, Starch
Safflower meal DM, CP, NDF, Fat
Soybean meal, solvent extracted, 48% CP DM, CP, Fat CP, Fat DM, CP, NDF, Ash, Fat DM, CP, Fat
Sorghum, forage BMR,1 silage DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin
Sorghum forage silage, immature DM, CP, NDF, Ash, Lignin, Starch CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch
Sorghum forage silage, mature DM, CP, NDF, Ash, Lignin, Starch CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch
Sorghum grain, dry, ground DM, CP, Starch DM, CP, Starch DM, CP, NDF, Starch
Sorghum grain, steam flaked DM, CP, Starch DM, CP, Starch DM, CP, NDF, Starch
Sorghum hay DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch
Sorghum soybean silage DM, CP, NDF, Ash, Lignin, Starch
Sorghum sudangrass silage DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch
Sorghum sudangrass, BMR, silage DM, CP, NDF, Ash, Lignin, Starch
Sorghum sudangrass hay DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin, Starch
Soybean hay DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin
Soybean hulls DM, CP, Ash CP, NDF DM, CP, NDF DM, CP, NDF
Soybean meal, expellers CP, Fat DM, CP, NDF, Fat, Ash DM, CP, Fat
Soybean meal, extruded DM, CP, Fat CP, Fat DM, CP, NDF, Fat, Ash DM, CP, Fat
Soybean silage DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin DM, CP, NDF, Ash, Lignin
Soybeans, whole raw DM, CP, Fat CP, Fat
Soybeans, whole roasted DM, CP, Fat CP, Fat DM, CP, NDF, Fat DM, CP, Fat
Spelt grain DM, CP, NDF, Ash, Starch
Sudangrass hay, mature DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Sudangrass hay, mid-maturity DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Sudangrass silage, mature DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Sudangrass silage, mid-maturity DM, CP, NDF, Ash, Lignin DM, CP, NDF DM, CP, NDF, Ash, Lignin
Sugarcane bagasse hay DM, CP, NDF
Sugarcane bagasse silage DM, CP, NDF
Sunflower meal CP, Fat DM, CP, Fat DM, CP, Fat
Sunflower seed CP, Fat DM, CP, Fat
Sunflower silage DM, CP, NDF, Ash
Sweet corn cannery waste DM, CP, NDF, Ash, Lignin, Starch
Tapioca (Cassava) DM, CP, NDF, Ash, Starch
Triticale grain DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch
Triticale hay DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF DM, CP, NDF, Ash, Fat
Triticale plus pea silage DM, CP, NDF, Ash, Lignin
Triticale silage, mature DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin CP, NDF, Ash, Starch
Step 3 involved multivariate clustering (Figure 1). A 2-stage density linkage cluster analysis was conducted to identify unique groups within a designated initial input data set, using the CLUSTER procedure of SAS. The analysis was set up to remove clusters containing fewer than 10% of the records in an initial data set; this threshold was arbitrary. When a cluster was removed, the process was repeated automatically and was considered finished when no further cluster was removed. Finally, a second univariate analysis was performed to remove data points more than 3.5 SD from the mean within each generated cluster (Yoder et al., 2014). Each generated cluster was then evaluated and, where possible, identified as a specific feed (Figure 1). The initial feed name assigned during the pre-screening step was the primary criterion for naming clusters. Thus, a feed initially identified as soybean in pre-screening remained soybean throughout the screening procedure. Further identification of feed processing or, for forages, maturity stage was made by evaluating the composition of nutrients such as DM, CP, NDF, crude fat, and starch, among others. Final clusters that were successfully identified moved on to the final step of the procedure, data summary (Figure 1). Otherwise, the final clusters were rejected, and the screening procedure was repeated using different screening parameters (e.g., key nutrients), as described later (Figure 1).
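The repeat-until-stable loop and the second univariate pass described in Step 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the study ran the 2-stage density linkage in the SAS CLUSTER procedure, whereas here the clustering step is a caller-supplied function (`cluster_fn`, a hypothetical name) so that any method can be plugged in. The 10% threshold is taken relative to the initial data set, as stated in the text; stopping when every cluster is small, and using the sample SD, are added assumptions.

```python
import numpy as np

def drop_small_clusters(X, cluster_fn, min_frac=0.10):
    """Recluster until no cluster holds fewer than min_frac of the initial records.

    cluster_fn is a hypothetical stand-in for the SAS CLUSTER step: it takes an
    (n_records, n_nutrients) array and returns one integer label per record.
    """
    n_initial = len(X)
    keep = np.arange(n_initial)  # row indices still in play
    while True:
        labels = np.asarray(cluster_fn(X[keep]))
        ids, counts = np.unique(labels, return_counts=True)
        small = ids[counts < min_frac * n_initial]
        # Assumption: stop if nothing is small, or if everything is (avoid an empty set).
        if small.size == 0 or small.size == ids.size:
            return keep, labels
        keep = keep[~np.isin(labels, small)]

def trim_outliers(values, n_sd=3.5):
    """Second univariate pass: keep values within mean +/- n_sd * SD (sample SD assumed)."""
    values = np.asarray(values, dtype=float)
    mean, sd = values.mean(), values.std(ddof=1)
    return values[np.abs(values - mean) <= n_sd * sd]

# Toy data: one column (DM, %) with 50 dry, 45 wet, and 5 stray records.
dm = np.concatenate([np.full(50, 88.0), np.full(45, 40.0), np.full(5, 10.0)])
X = dm.reshape(-1, 1)
toy_cluster = lambda a: np.digitize(a[:, 0], [30.0, 60.0])  # illustrative only
keep, labels = drop_small_clusters(X, toy_cluster)
```

In the toy run, the 5 stray records form a cluster below the 10% threshold and are dropped on the first pass; the second pass finds no small cluster and the loop stops.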
Data Summary
The automated statistical screening procedure was programmed to deliver summary tables with information for 30 nutrients (Table 1) for each final cluster. Information contained in summary tables for each nutrient included number of records, mean, SD, minimum and maximum values, and 10th and 90th percentiles (Figure 1). Merging data from different labs to obtain population statistics can be problematic because between-laboratory analytical variation is included in the SD estimate. Users probably tend to use a single laboratory, and therefore between-laboratory variation could inflate the SD relative to the true variation within a feed. Therefore, we calculated means and SD within laboratory and then calculated a weighted average using the following formulas:
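$$\text{Weighted mean} = \frac{(\mu_1 \times N_1) + (\mu_2 \times N_2) + (\mu_3 \times N_3) + (\mu_4 \times N_4)}{N}, \qquad [1]$$
$$\text{Weighted SD} = \frac{(SD_1 \times N_1) + (SD_2 \times N_2) + (SD_3 \times N_3) + (SD_4 \times N_4)}{N}, \qquad [2]$$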
Triticale silage, mid-maturity DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash, Lignin CP, NDF, Ash, Starch
Wheat bran DM, CP, NDF
Wheat grain, rolled DM, CP, NDF, Ash, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Starch DM, CP
Wheat hay, headed DM, CP, NDF, Ash DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Wheat hay, vegetative DM, CP, NDF, Ash DM, CP, NDF DM, CP, NDF, Ash, Lignin, Starch
Wheat middlings DM, CP, Ash DM, CP DM, CP, NDF DM, CP, Fat
Wheat silage, headed DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Lignin, Starch
Wheat silage, vegetative DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Starch DM, CP, NDF, Ash, Lignin, Starch
Wheat straw DM, CP, NDF, Ash, Lignin, Starch DM, CP, NDF, Ash DM, CP, NDF, Ash, Lignin, Starch
Whey, dry DM, CP, NDF, Ash DM, CP DM, CP, Fat
Whey, wet CP, Ash, Fat DM, CP DM, CP, Fat
1Brown midrib.
where µ1 through µ4, SD1 through SD4, and N1 through N4 are the mean, SD, and number of records for a nutrient in a feed with data provided by laboratories 1, 2, 3, and 4, respectively, and N is the total number of records across the 4 laboratories (N1 + N2 + N3 + N4). In addition, the minimum and maximum values for a nutrient corresponded to the lowest and highest values for that nutrient across the summary tables for laboratories 1, 2, 3, and 4. Finally, the 10th and 90th percentile values included in the feed composition tables were calculated as the means of the 10th and 90th percentile values available in the summary tables for laboratories 1, 2, 3, and 4.
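The data summary step can be sketched in Python as follows. This is an illustrative sketch only: the function names `summarize` and `combine` are hypothetical, the input values are made up, and the use of the sample SD and of NumPy's default percentile interpolation are assumptions, because the text does not specify either. The combination step follows equations [1] and [2] and the rules given for minimum, maximum, and percentile values.

```python
import numpy as np

def summarize(values):
    """Per-laboratory summary for one nutrient in one feed."""
    v = np.asarray(values, dtype=float)
    return {
        "n": v.size,
        "mean": v.mean(),
        "sd": v.std(ddof=1),            # sample SD (assumption)
        "min": v.min(),
        "max": v.max(),
        "p10": np.percentile(v, 10),    # default linear interpolation (assumption)
        "p90": np.percentile(v, 90),
    }

def combine(labs):
    """Combine per-laboratory summaries into one feed composition table row."""
    n_total = sum(s["n"] for s in labs)
    return {
        "n": n_total,
        "mean": sum(s["mean"] * s["n"] for s in labs) / n_total,  # equation [1]
        "sd": sum(s["sd"] * s["n"] for s in labs) / n_total,      # equation [2]
        "min": min(s["min"] for s in labs),     # lowest value across laboratories
        "max": max(s["max"] for s in labs),     # highest value across laboratories
        "p10": float(np.mean([s["p10"] for s in labs])),  # mean of laboratory percentiles
        "p90": float(np.mean([s["p90"] for s in labs])),
    }

lab_a = summarize([18.0, 19.0, 20.0, 21.0])  # made-up CP values, % of DM
lab_b = summarize([22.0, 24.0])
table_row = combine([lab_a, lab_b])
```

Because the SD is averaged across laboratories rather than recomputed from pooled records, the difference between the two laboratory means does not enter the tabulated SD, which is the stated purpose of equations [1] and [2].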
RESULTS AND DISCUSSION
In total, the automated procedure generated 57, 109, 127, and 140 final clusters from laboratories 1 through 4, respectively (Table 2). These clusters were summarized as nutritional information for 174 different feeds and 30 nutrients, based on 1.48 million records (Tables 1 and 2).
One important feature of the statistical screening procedure was an increase in the accuracy of the estimated variation (i.e., SD) in nutrient composition for tabulated feedstuffs. This increase was achieved by eliminating outlier data points and by identifying hidden feed clusters within a large data set. Variation in nutrient composition leads to uncertainty in nutrient supply from feeds, which compromises the accuracy of formulated rations; accurate estimates of variation are therefore needed to develop accurate safety factors and to improve diets generated by stochastic formulation.
Clusters generated by the procedure were identified
as different feeds using specific nutrients as indicators
and expert opinions. For instance, clusters generated
from feeds initially identified as forages were classified
as follows: storage method, using DM values (i.e., fresh
≈ 15% DM, silage ≈ 35% DM, and hay ≈ 80% DM;
Table 3. Summary of the automated statistical screening procedure for alfalfa meal with data provided by
laboratory 3: initial input data set includes raw data after standardization of feed names and nutrients across
laboratories; the filtered data set is the result of eliminating data ±3.5 SD from the average from the initial
input data set; and the final cluster is the data set generated after filtered data set is screened with principal
component analysis (PCA), clustering, and a second elimination of data ±3.5 SD from the average
Feed: Alfalfa meal
Nutrient¹ | Initial input data set (N; µ ± SD³) | Filtered data set (N; µ ± SD) | Final cluster² (N; µ ± SD)
DM | 1,704; 91.16 ± 1.42 | 1,670; 91.18 ± 1.35 | 604; 91.00 ± 1.34
CP | 1,625; 18.26 ± 2.71 | 1,596; 18.17 ± 2.64 | 602; 19.76 ± 1.97
NDF | 1,536; 45.97 ± 6.13 | 1,509; 45.95 ± 6.02 | 604; 42.57 ± 4.32
Ash | 959; 11.61 ± 2.46 | 934; 11.48 ± 2.09 | 604; 11.94 ± 2.15
Lignin | 837; 7.62 ± 1.36 | 814; 7.66 ± 1.31 | 603; 7.47 ± 1.11
¹Nutrients used as key nutrients for PCA. Except for DM, all values expressed in DM basis.
²Five clusters were deleted for having <10% of total records. Final cluster was used to construct final feed composition tables.
³Average value expressed as percentage (µ) ± SD.
Table 4. Summary of the statistical screening procedure for nutrient composition in distillers grains with solubles (DDGS), with data provided
by laboratory 3: initial input data set includes raw data after standardization of feed names and nutrients across laboratories; the filtered data
set is the result of eliminating data ±3.5 SD from the average from the initial input data set; and the final cluster is the data set generated after
filtered data set is screened with principal component analysis (PCA), clustering, and a second elimination of data ±3.5 SD from the average
Nutrient¹ | DDGS, initial input data set (N; µ ± SD⁴) | DDGS, filtered data set (N; µ ± SD) | DDGS, high fat², final cluster 1³ (N; µ ± SD) | DDGS, low fat², final cluster 2³ (N; µ ± SD)
DM | 8,351; 88.44 ± 5.70 | 3,657; 88.51 ± 2.09 | 2,517; 88.28 ± 1.82 | 476; 88.04 ± 1.14
CP | 7,374; 31.61 ± 4.69 | 3,657; 31.22 ± 3.93 | 2,514; 30.55 ± 1.92 | 479; 32.33 ± 1.59
NDF | 6,511; 34.19 ± 4.74 | 3,657; 33.78 ± 3.37 | 2,517; 34.40 ± 2.83 | 475; 32.60 ± 2.48
Fat | 6,001; 11.92 ± 3.26 | 3,657; 12.03 ± 3.30 | 2,522; 13.59 ± 1.80 | 479; 9.94 ± 2.33
Ash | 4,115; 6.05 ± 1.30 | 3,657; 6.01 ± 1.03 | 2,520; 5.87 ± 0.73 | 479; 7.19 ± 0.97
¹Nutrients used as key nutrients for PCA. Except for DM, all values expressed in DM basis.
²Name assigned after cluster evaluation. Records with CP >35% were removed manually and assigned to feed “DDGS, high protein.”
³Final clusters were used to construct final feed composition tables.
⁴Average value expressed as percentage (µ) ± SD.
NRC, 2001); maturity stage (immature, mid-mature, and mature), according to NDF, ADF, and lignin content; and grass-legume mixtures, identified as predominantly grass or predominantly legume according to hemicellulose content (Goering and Van Soest, 1970). The procedure separated clusters that were identified as oilseeds (~20% fat), solvent oilseed meals (~2% fat), or expeller oilseed meals (~6% fat) from initial data sets that may have been identified as whole cottonseed or whole soybeans. Some by-products were separated into wet or dry forms based on cluster analysis. Finally, the procedure was helpful in identifying misclassified records in data sets with similar names, such as corn gluten feed versus corn gluten meal, or soybean meal expellers versus soybean meal extruded.
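The indicator values quoted above can be turned into a simple lookup, sketched here in Python. This is only an illustration: the authors combined nutrient indicators with expert opinion, and the nearest-reference rule and the name `nearest_label` are assumptions introduced for this example.

```python
# Reference values quoted in the text (DM, %, for storage method; fat, % of DM, for oilseed products).
STORAGE_DM = {"fresh": 15.0, "silage": 35.0, "hay": 80.0}
OILSEED_FAT = {"solvent oilseed meal": 2.0, "expeller oilseed meal": 6.0, "oilseed": 20.0}

def nearest_label(value, references):
    """Return the label whose reference value is closest to `value` (assumed rule)."""
    return min(references, key=lambda name: abs(references[name] - value))

# Mean DM of the two legume forage clusters reported in Table 10.
storage_cluster_1 = nearest_label(89.17, STORAGE_DM)
storage_cluster_2 = nearest_label(42.90, STORAGE_DM)
```

A rule of this kind can only suggest a name; clusters whose means fall between reference values would still need the manual evaluation described in the text.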
Despite the usefulness of the described procedure, identifying the generated clusters was not always easy. In some cases, a single initial input file generated clusters that were easily identified and moved into the final step of the procedure, data summary (Figure 1). In most cases, however, generated clusters could not be identified. When an initial data set generated clusters that could not be properly identified, the initial input data set was screened again following a slightly different procedure. No established protocol existed for repeating the screening procedure, and the new procedure for each initial input data set was chosen at the authors’ discretion. The new procedures used to generate identifiable clusters included repeating the automated statistical screening procedure with different key nutrients (the automated statistical screening procedure was repeated between 1 and 5 times), merging initial input files initially identified as different feeds to create a new input file, merging clusters generated from different initial input files, retrieving and using clusters removed by the procedure, and manually manipulating the clusters’ data sets. The following examples show different levels of complexity in how final clusters were generated.
Alfalfa Meal
Table 3 summarizes the statistical screening procedure for a data set initially identified as alfalfa meal. The final cluster for alfalfa meal had a similar average composition for most nutrients when compared with the initial input data set; however, SD were reduced in the final cluster. For example, the SD for NDF was 6.13% in the initial input data set but 4.32% in the final cluster (Table 3). The statistical screening procedure removed 63% of the data available in the initial input file and generated a single final cluster; 5 other clusters were removed because of their small size (<10% of total data).
Table 5. Summary of the statistical screening procedure for nutrient composition for corn grain, with data provided by laboratory 4: initial input data set includes raw data after
standardization of feed names and nutrients across laboratories; the filtered data set is the result of eliminating data ±3.5 SD from the average from the initial input data set; and
the final cluster is the data set generated after filtered data set is screened with principal component analysis (PCA), clustering and a second elimination of data ±3.5 SD from
the average
Nutrient¹ | Corn grain, initial input data set (N; µ ± SD⁴) | Corn grain, filtered data set (N; µ ± SD) | Corn grain, high moisture², cluster 1 (N; µ ± SD) | Corn grain, high moisture², cluster 2 (N; µ ± SD) | Corn grain, dry, ground², deleted cluster 1³ (N; µ ± SD) | Corn grain, dry, ground², deleted cluster 2³ (N; µ ± SD)
DM⁵ | 16,345; 73.87 ± 7.86 | 16,116; 73.87 ± 7.56 | 10,696; 72.67 ± 6.02 | 1,834; 68.76 ± 3.77 | 1,512; 82.81 ± 4.78 | 765; 85.88 ± 2.97
CP⁵ | 16,222; 8.99 ± 1.54 | 15,978; 8.89 ± 0.90 | 10,689; 8.69 ± 0.80 | 1,846; 9.61 ± 0.64 | 1,512; 9.51 ± 0.71 | 765; 8.54 ± 0.49
NDF⁵ | 16,037; 12.27 ± 4.25 | 15,814; 11.96 ± 2.32 | 10,646; 11.46 ± 1.80 | 1,842; 12.34 ± 2.03 | 1,512; 13.65 ± 1.92 | 765; 12.44 ± 1.53
Fat⁵ | 16,144; 3.14 ± 0.55 | 15,921; 3.14 ± 0.48 | 10,662; 3.04 ± 0.41 | 1,841; 3.31 ± 0.39 | 1,512; 3.33 ± 0.44 | 765; 3.41 ± 0.53
Ash | 16,148; 2.05 ± 0.63 | 15,900; 2.01 ± 0.42 | 10,699; 1.93 ± 0.33 | 1,840; 2.60 ± 0.26 | 1,512; 1.97 ± 0.23 | 765; 1.73 ± 0.23
Starch | 16,125; 71.20 ± 5.15 | 15,906; 71.57 ± 3.33 | 10,689; 72.72 ± 2.78 | 1,845; 68.70 ± 2.19 | 1,512; 69.04 ± 2.24 | 765; 71.25 ± 1.89
¹Except for DM, all values expressed in DM basis.
²Name assigned after cluster evaluation.
³Clusters having <10% of total records were retrieved and classified as “Corn grain, dry.”
⁴Average value expressed as percentage (µ) ± SD.
⁵Used as key nutrient for PCA.
Distillers Grains and Solubles, Dried
The statistical screening procedure was used to identify feeds with similar manufacturing processes but differences in feed composition, as is the case for distillers grains and solubles (Schingoethe et al., 2009). Distillers grains are a by-product of the ethanol industry, but the production process may be modified to result in feeds that can be further differentiated. Table 4 summarizes the statistical screening procedure for an input data set initially identified as distillers grains and solubles, dried. The statistical screening procedure removed 61% of the data initially designated as distillers grains and solubles and generated 2 clusters, which were identified as distillers grains with solubles, dried, high fat (final cluster 1: fat = 13.59% ± 1.80, Table 4) and distillers grains with solubles, dried, low fat (final cluster 2: fat = 9.94% ± 2.33, Table 4). After cluster evaluation, we also removed records with CP >35% and assigned them to a different cluster, generated from a different initial input data set and identified as distillers grains with solubles, high protein (data not shown), to match market offerings of distillers grains products (Schingoethe et al., 2009). Aside from fat, mean nutrient composition was similar for data in the initial input data set, the filtered data set, and both generated clusters. Standard deviations tended to be lower in the clusters than in the initial input file, especially for DM (SD for initial input data set = 5.70%, final cluster 1 = 1.82%, final cluster 2 = 1.14%) and CP (SD for initial input data set = 4.69%, final cluster 1 = 1.92%, final cluster 2 = 1.59%; Table 4).
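The two manual decisions in this example, naming clusters by their mean fat and reassigning records with CP >35%, can be sketched in Python. This is a hedged illustration: `split_high_protein` and the record layout (one dictionary per record) are hypothetical, the sample records are made up, and only the 35% cutoff and the cluster fat means come from the text and Table 4.

```python
def split_high_protein(records, cp_cutoff=35.0):
    """Separate records with CP (% of DM) above the cutoff from the rest."""
    kept = [r for r in records if r["CP"] <= cp_cutoff]
    moved = [r for r in records if r["CP"] > cp_cutoff]
    return kept, moved

# Mean fat (% of DM) of the two final clusters reported in Table 4.
cluster_fat_means = {"final cluster 1": 13.59, "final cluster 2": 9.94}
ranked = sorted(cluster_fat_means, key=cluster_fat_means.get, reverse=True)
names = {ranked[0]: "DDGS, high fat", ranked[1]: "DDGS, low fat"}

# Made-up records: the second one would be reassigned to "DDGS, high protein".
records = [{"CP": 30.5}, {"CP": 36.2}, {"CP": 32.3}]
kept, moved = split_high_protein(records)
```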
Corn Grain
As explained previously, the automated statistical screening procedure was programmed to remove clusters containing fewer than 10% of records in an initial input data set. In some cases, however, removed clus-
Table 6. Final clusters for 2 types of corn grain: final cluster 1 for corn grain, high moisture, was created by
merging data from clusters 1 and 2 in Table 5; final cluster 2 for corn grain, dry, was created by merging data
from deleted clusters 1 and 2 from Table 5
Nutrient¹ | Corn grain, high moisture, fine grind, final cluster 1² (N; µ ± SD⁴) | Corn grain dry, ground, medium grind, final cluster 2²,³ (N; µ ± SD)
DM | 12,530; 72.08 ± 5.48 | 1,268; 86.94 ± 2.16
CP | 12,535; 8.87 ± 0.83 | 1,268; 9.20 ± 0.87
NDF | 12,488; 11.76 ± 1.86 | 1,268; 13.32 ± 2.12
Fat | 12,503; 3.09 ± 0.41 | 1,268; 3.42 ± 0.55
Ash | 12,539; 2.03 ± 0.39 | 1,268; 1.87 ± 0.29
Starch | 12,534; 71.88 ± 3.02 | 1,268; 69.57 ± 2.67
¹Except for DM, all values expressed in DM basis.
²Final clusters were used to construct final feed composition tables.
³Records with DM < 84% were deleted manually.
⁴Average value expressed as percentage (µ) ± SD.
Table 7. Summary of the statistical screening procedure for nutrient composition for corn gluten feed, with data provided by laboratory 1:
initial input data set includes raw data after standardization of feed names and nutrients across laboratories; the filtered data set is the result
of eliminating data ±3.5 SD from the average from the initial input data set; the final cluster is the data set generated after filtered data set is
screened with principal component analysis (PCA), clustering, and a second elimination of data ±3.5 SD from the average
Nutrient¹ | Corn gluten feed, initial input data set (N; µ ± SD³) | Corn gluten feed, filtered data set (N; µ ± SD) | Corn gluten feed, dry², cluster 1 (N; µ ± SD) | Corn gluten feed, wet², cluster 2 (N; µ ± SD)
DM⁴ | 1,573; 81.18 ± 17.08 | 1,487; 81.13 ± 17.10 | 616; 88.29 ± 1.97 | 195; 44.47 ± 6.23
CP⁴ | 1,407; 23.80 ± 5.88 | 1,370; 23.23 ± 2.46 | 616; 23.39 ± 2.22 | 194; 23.08 ± 2.13
NDF⁴ | 851; 35.64 ± 5.96 | 837; 35.87 ± 4.91 | 592; 35.59 ± 4.19 | 195; 37.32 ± 4.99
Fat | 658; 4.17 ± 2.70 | 652; 4.17 ± 2.70 | 389; 3.82 ± 0.94 | 111; 4.05 ± 1.02
Ash⁴ | 817; 7.51 ± 1.75 | 804; 7.57 ± 1.69 | 590; 7.75 ± 1.60 | 185; 7.29 ± 1.38
Starch | 292; 16.32 ± 6.20 | 287; 16.28 ± 5.80 | 158; 15.54 ± 4.57 | 105; 17.18 ± 6.25
¹Except for DM, all values expressed in DM basis.
²Name assigned after cluster evaluation.
³Average value expressed as percentage (µ) ± SD.
⁴Used as key nutrient for PCA.
ters provided useful information. The statistical screening procedure for an initial input data set designated as corn grain generated 5 clusters, of which 3 were deleted for having <10% of total data. Given their DM values, generated clusters 1 (DM = 72.67% ± 6.02, Table 5) and 2 (DM = 68.76% ± 3.77, Table 5) were identified as corn grain, high moisture. The 3 deleted clusters were evaluated, and 2 of them were retrieved and identified as corn grain, dry, according to their DM values (deleted cluster 1, DM = 82.81% ± 4.78; deleted cluster 2, DM = 85.88% ± 2.97; Table 5). The final cluster for corn grain, high moisture (Table 6), was created by merging clusters 1 and 2 from Table 5, whereas the final cluster for corn grain, dry (Table 6), was created by merging deleted clusters 1 and 2 from Table 5.
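The merge-and-trim steps for corn grain can be sketched in Python as follows. The helper names and the record layout are hypothetical and the individual records are made up; the 84% DM cutoff is the one given in footnote 3 of Table 6, and the record counts are the DM counts from Table 5.

```python
def merge_clusters(*clusters):
    """Concatenate the record lists of several clusters into one list."""
    merged = []
    for cluster in clusters:
        merged.extend(cluster)
    return merged

def drop_below_dm(records, min_dm):
    """Manual cut: keep only records whose DM (%) is at least min_dm."""
    return [r for r in records if r["DM"] >= min_dm]

# Made-up records standing in for the two retrieved (deleted) clusters.
deleted_1 = [{"DM": 82.0}, {"DM": 86.5}]
deleted_2 = [{"DM": 85.9}, {"DM": 88.1}]
corn_dry = drop_below_dm(merge_clusters(deleted_1, deleted_2), min_dm=84.0)

# DM record counts for clusters 1 and 2 in Table 5; their sum is the
# DM count reported for corn grain, high moisture, in Table 6 (12,530).
n_high_moisture = 10_696 + 1_834
```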
Corn Gluten Feed and Corn Gluten Meal
The automated statistical screening procedure was a useful tool for identifying commonly misclassified feeds, as was the case for corn gluten feed and corn gluten meal. Tables 7, 8, and 9 summarize the statistical screening procedure used to obtain final clusters for corn gluten feed, dry or wet, and corn gluten meal. When applied to the initial input data set identified as corn gluten feed, the automated statistical screening procedure removed 47% of the data initially available and generated 2 clusters. Because the main difference between the clusters was DM, they were identified as corn gluten feed, dry (cluster 1, DM = 88.29% ± 1.97, Table 7) and corn gluten feed, wet (cluster 2, DM = 44.47% ± 6.23, Table 7). The automated statistical screening procedure was then applied to a second initial input data set, identified as corn gluten meal (Table 8). It removed 65% of the data initially available and generated 3 clusters. These clusters differed mainly in DM and CP values and were identified as corn gluten meal (cluster 1, DM = 90.00% ± 1.49, CP = 68.81% ± 3.49, Table 8), corn gluten feed, wet (cluster 2, DM = 41.83% ± 4.39, CP = 22.94% ± 2.66, Table 8), and corn gluten feed, dry (cluster 3, DM = 88.91% ± 2.26, CP = 22.08% ± 5.49, Table 8). The creation of identifiable clusters reduced SD values for several nutrients. For instance, in the initial data set identified as corn gluten feed (Table 7), the SD for DM decreased from 17.08% in the initial input data set to 1.97% in cluster 1 and 6.23% in cluster 2. Similarly, in the initial data set identified as corn gluten meal (Table 8), the SD decreased from 16.14% for DM and 20.28% for CP in the initial input data set to values ranging from 1.49% to 4.39% for DM and from 2.66% to 5.49% for CP in clusters 1, 2, and 3. Final clusters were created by
Table 8. Summary of the statistical screening procedure for nutrient composition for corn gluten meal, with data provided by laboratory 1: initial input data set includes raw data
after standardization of feed names and nutrients across laboratories; the filtered data set is the result of eliminating data ±3.5 SD from the average from the initial input data
set; and the final cluster is the data set generated after filtered data set is screened with principal component analysis (PCA), clustering, and a second elimination of data ±3.5
SD from the average
Nutrient¹ | Corn gluten meal, initial input data set (N; µ ± SD³) | Corn gluten meal, filtered data set (N; µ ± SD) | Corn gluten meal², cluster 1 (N; µ ± SD) | Corn gluten feed, wet², cluster 2 (N; µ ± SD) | Corn gluten feed, dry², cluster 3 (N; µ ± SD)
DM⁴ | 586; 82.86 ± 16.14 | 579; 82.88 ± 16.02 | 115; 90.00 ± 1.49 | 57; 41.83 ± 4.39 | 29; 88.91 ± 2.26
CP⁴ | 557; 59.84 ± 20.28 | 551; 59.79 ± 20.32 | 115; 68.81 ± 3.49 | 57; 22.94 ± 2.66 | 29; 22.08 ± 5.49
NDF⁴ | 206; 19.38 ± 16.33 | 205; 19.31 ± 16.34 | 115; 5.49 ± 2.34 | 58; 38.24 ± 4.99 | 29; 35.87 ± 7.18
Fat | 156; 3.12 ± 1.94 | 156; 3.12 ± 1.94 | 76; 2.33 ± 0.99 | 33; 4.29 ± 1.51 | 18; 3.60 ± 1.25
Ash | 172; 4.73 ± 2.59 | 171; 4.73 ± 2.59 | 70; 2.33 ± 0.77 | 56; 6.97 ± 1.27 | 30; 6.39 ± 1.75
Starch | 61; 17.43 ± 9.09 | 61; 17.43 ± 9.09 | 15; 19.10 ± 5.43 | 28; 15.10 ± 6.10 | 10; 17.61 ± 5.20
¹Except for DM, all values expressed in DM basis.
²Name assigned after cluster evaluation.
³Average value expressed as percentage (µ) ± SD.
⁴Used as key nutrient for PCA.
merging clusters identified as the same feed in Tables
7 and 8, as described in Table 9.
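The effect of merging two clusters on the reported statistics can be checked from the summary values alone. The Python sketch below assumes that a merged cluster is summarized as if its records were simply concatenated (the text does not state this), and uses the standard formula for combining group means and sample SD; the function name `pool` is hypothetical. The inputs are the DM values for corn gluten feed, dry: cluster 1 in Table 7 and cluster 3 in Table 8.

```python
import math

def pool(groups):
    """Combine (n, mean, sd) summaries as if the underlying records were concatenated."""
    n = sum(g[0] for g in groups)
    mean = sum(g[0] * g[1] for g in groups) / n
    # Within-group sums of squares plus the spread of group means around the pooled mean.
    ss = sum((g[0] - 1) * g[2] ** 2 + g[0] * (g[1] - mean) ** 2 for g in groups)
    return n, mean, math.sqrt(ss / (n - 1))

# (N, mean DM %, SD) for cluster 1 in Table 7 and cluster 3 in Table 8.
n, mean, sd = pool([(616, 88.29, 1.97), (29, 88.91, 2.26)])
```

Under this assumption, and allowing for rounding of the inputs, the result agrees with the DM entry for corn gluten feed, dry, in Table 9 (N = 645, 88.32 ± 1.99).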
Legume Hay
As stated previously, the automated screening procedure was useful for classifying forages according to storage method and maturity stage. No attempt was made to separate the 3 main forage classes (cool season grasses, legumes, and grass-legume mixtures) into single species, because harvesting, conservation method, and maturity stage are more important than species in determining forage quality (Cherney et al., 1993). When applied to an initial input data set designated as legume forage (Table 10), the automated statistical screening procedure removed 22% of the initial data and generated 2 clusters, identified as legume hay (cluster 1, DM = 89.17% ± 4.24) and legume silage (cluster 2, DM = 42.90% ± 9.59). A second initial input data set, designated as legume hay (Table 11), was also screened with the automated statistical procedure. After screening, this data set generated 2 clusters, identified as legume hay, immature (cluster 1, DM = 87.34% ± 2.32, NDF = 34.25% ± 3.44) and legume hay, mature (cluster 2, DM = 88.00% ± 1.92, NDF = 43.03% ± 4.16); 25% of the data initially available were eliminated in the process. After cluster evaluation, the initial input data sets for legume forage (Table 10) and legume hay (Table 11) were merged, and records having DM <70% were manually removed, creating a new input data set designated as legume hay (Table 12). Unlike the screening summarized in Tables 10 and 11, the statistical procedure for the new legume hay input data set included hemicellulose (calculated as NDF − ADF) as a key nutrient for PCA, because hemicellulose content can be used to differentiate between legumes and grasses (Goering and Van Soest, 1970). After employing the statistical screening proce-
Table 9. Final clusters for corn gluten feed, wet or dry, and corn gluten meal: final cluster 1 for corn gluten
feed, dry, was created by merging data from cluster 1 in Table 7 and cluster 3 in Table 8; final cluster 2 for
corn gluten feed, wet, was created merging data from cluster 2 in Table 7 and cluster 2 in Table 8; final cluster
3 is equivalent to cluster 1 in Table 8
Nutrient¹ | Corn gluten feed, dry, final cluster 1² (N; µ ± SD³) | Corn gluten feed, wet, final cluster 2² (N; µ ± SD) | Corn gluten meal, final cluster 3² (N; µ ± SD)
DM | 645; 88.32 ± 1.99 | 252; 43.87 ± 5.87 | 115; 90.00 ± 1.49
CP | 645; 23.33 ± 2.47 | 251; 23.05 ± 2.26 | 115; 68.81 ± 3.49
NDF | 646; 35.60 ± 4.36 | 253; 37.53 ± 4.99 | 115; 5.49 ± 2.34
Fat | 407; 3.81 ± 0.96 | 144; 4.10 ± 1.15 | 76; 2.33 ± 0.99
Ash | 619; 7.68 ± 1.63 | 241; 7.21 ± 1.35 | 70; 2.33 ± 0.77
Starch | 168; 15.67 ± 4.62 | 133; 16.74 ± 6.22 | 15; 19.10 ± 5.43
¹Except for DM, all values expressed in DM basis.
²Final clusters were used to construct feed composition tables.
³Average value expressed as percentage (µ) ± SD.
Table 10. Summary of the statistical screening procedure for nutrient composition for legume forage, with data provided by laboratory 1:
initial input data set includes raw data after standardization of feed names and nutrients across laboratories; the filtered data set is the result of
eliminating data ±3.5 SD from the average from the initial input data set; and the final cluster is the data set generated after filtered data set
is screened with principal component analysis (PCA), clustering, and a second elimination of data ±3.5 SD from the average
Nutrient¹ | Legume forage, initial input data set (N; µ ± SD³) | Legume forage, filtered data set (N¹; µ ± SD) | Legume hay², cluster 1 (N¹; µ ± SD) | Legume silage², cluster 2 (N¹; µ ± SD)
DM | 191,764; 61.58 ± 24.16 | 152,412; 58.92 ± 23.44 | 51,649; 89.17 ± 4.24 | 97,400; 42.90 ± 9.59
CP | 189,388; 20.81 ± 2.59 | 152,412; 20.93 ± 2.33 | 52,388; 20.89 ± 2.42 | 97,316; 21.03 ± 2.17
NDF | 181,214; 40.32 ± 5.83 | 152,412; 40.61 ± 5.24 | 52,379; 39.75 ± 5.44 | 97,325; 40.94 ± 4.84
Ash | 178,816; 11.05 ± 2.23 | 152,412; 10.97 ± 1.88 | 52,357; 10.64 ± 1.63 | 97,076; 11.10 ± 1.84
Starch | 164,355; 2.39 ± 3.63 | 152,412; 2.21 ± 0.93 | 52,127; 2.64 ± 0.67 | 97,849; 1.92 ± 0.78
Lignin | 165,208; 7.17 ± 1.23 | 152,412; 7.13 ± 1.13 | 52,312; 6.94 ± 1.06 | 97,301; 7.23 ± 1.11
¹Nutrients used as key nutrient for PCA. Except for DM, all values expressed in DM basis.
²Name assigned after cluster evaluation.
³Average value expressed as percentage (µ) ± SD.
dure, the new input data set for legume hay (Table 12) generated 2 clusters, identified as
legume hay, immature (cluster 1, DM = 89.69% ± 3.85, NDF = 38.41% ± 4.24), and legume hay,
mature (cluster 2, DM = 86.42% ± 4.76, NDF = 48.68% ± 3.39), eliminating 32% of the data
initially available in the process. Neither cluster was classified as a grass-legume mixture
because the hemicellulose content was relatively low in both (hemicellulose, cluster 1 =
6.89% ± 1.80, cluster 2 = 10.08% ± 1.65), whereas grass-legume mixture hays in this database
had average hemicellulose concentrations between 16 and 21% (data not shown). Two other
examples of the cluster evaluation procedure can be found in Supplemental File S1
(https://doi.org/10.3168/jds.2019-16702).
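The hemicellulose comparison used in this cluster evaluation can be written out as a short Python check. This is only a sketch: cluster evaluation in the study was done manually, and the function names and the idea of a hard range test are hypothetical. The NDF − ADF calculation, the 16 to 21% range, and the two cluster means are taken from the text and Table 12.

```python
# Range of average hemicellulose (% of DM) reported in the text for
# grass-legume mixture hays in this database.
MIXTURE_RANGE = (16.0, 21.0)


def hemicellulose(ndf, adf):
    """Hemicellulose (% of DM) calculated as NDF minus ADF."""
    return ndf - adf


def resembles_grass_legume_mixture(mean_hemicellulose, bounds=MIXTURE_RANGE):
    """Hypothetical helper: is a cluster mean inside the mixture range?"""
    low, high = bounds
    return low <= mean_hemicellulose <= high


# Cluster means for hemicellulose from Table 12.
cluster_means = {"cluster 1": 6.89, "cluster 2": 10.08}
flags = {
    name: resembles_grass_legume_mixture(value)
    for name, value in cluster_means.items()
}
```

Both cluster means fall below the lower bound, which matches the decision to label neither cluster as a grass-legume mixture.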
The automated statistical screening procedure deleted 46% of initially available data. Using
the same statistical screening procedure, Yoder et al. (2014) reported an average removal
rate of 13.5% from the initial available data. The higher removal rate in the current
procedure than in the findings of Yoder et al. (2014) may be due to the complex cluster
evaluation procedure employed in the current study. Yoder et al. (2014) analyzed only 15
feeds with data provided by 2 labs. In addition, Yoder et al. (2014) ended with cluster
identification after running the screening procedure a single time, whereas in the current
work the automated statistical screening procedure was repeated between 1 and 5 times, with
frequent manual removal of data. A greater removal rate was prevented by repeating the
Tran et al.: DAIRY INDUSTRY TODAY
Table 12. Summary of the statistical screening procedure for nutrient composition for legume hay, with data provided by laboratory 1:
the initial input data set includes raw data after standardization of feed names and nutrients across laboratories; the filtered data set
is the result of eliminating data outside ±3.5 SD of the average from the initial input data set; and the final cluster is the data set
generated after the filtered data set is screened with principal component analysis (PCA), clustering, and a second elimination of data
outside ±3.5 SD of the average

                  Legume hay                                          Legume hay, immature²     Legume hay, mature²
                  New input data set        Filtered data set         Final cluster 1³          Final cluster 2³
Nutrient¹         N         µ ± SD⁴         N         µ ± SD          N         µ ± SD          N         µ ± SD
DM                77,616    89.19 ± 4.42    75,557    89.21 ± 4.35    45,675    89.69 ± 3.85    6,632     86.42 ± 4.76
CP⁵               76,479    20.66 ± 2.81    74,710    20.82 ± 2.47    46,324    21.29 ± 2.11    6,631     18.00 ± 2.00
NDF⁵              74,620    39.33 ± 6.36    72,960    39.06 ± 5.68    46,397    38.41 ± 4.24    6,632     48.68 ± 3.39
Hemicellulose⁵,⁶  74,341    7.20 ± 2.56     72,630    7.06 ± 2.09     46,344    6.89 ± 1.80     6,623     10.08 ± 1.65
Lignin⁵           57,174    7.00 ± 1.20     55,885    6.97 ± 1.10     46,404    6.71 ± 0.86     6,632     8.65 ± 0.87
Ash               66,785    10.70 ± 2.00    65,336    10.74 ± 1.86    46,221    10.89 ± 1.51    6,596     9.09 ± 1.59

¹Except for DM, all values are expressed on a DM basis.
²Name assigned after cluster evaluation.
³Final clusters were used to construct feed composition tables.
⁴Average value expressed as percentage (µ) ± SD.
⁵Nutrients used as key nutrients for PCA.
⁶Calculated as NDF − ADF.
Table 11. Summary of the statistical screening procedure for nutrient composition for legume hay, with data provided by laboratory 1:
the initial input data set includes raw data after standardization of feed names and nutrients across laboratories; the filtered data set
is the result of eliminating data outside ±3.5 SD of the average from the initial input data set; and the final cluster is the data set
generated after the filtered data set is screened with principal component analysis (PCA), clustering, and a second elimination of data
outside ±3.5 SD of the average

                  Legume hay                                          Legume hay, immature²     Legume hay, mature²
                  Initial input data set    Filtered data set         Cluster 1                 Cluster 2
Nutrient¹         N         µ ± SD³         N         µ ± SD          N         µ ± SD          N         µ ± SD
DM                564       87.68 ± 4.31    432       87.53 ± 2.53    252       87.34 ± 2.32    168       88.00 ± 1.92
CP                564       20.73 ± 2.70    432       20.98 ± 2.60    252       22.35 ± 1.82    168       19.15 ± 1.79
NDF               531       38.37 ± 6.27    432       37.99 ± 6.11    252       34.25 ± 3.44    169       43.03 ± 4.16
Ash               530       10.38 ± 1.69    432       10.42 ± 1.60    253       11.05 ± 1.41    169       9.58 ± 1.27
Starch            479       2.66 ± 0.59     432       2.66 ± 0.58     253       2.83 ± 0.59     169       2.41 ± 0.46
Lignin            479       6.85 ± 1.10     432       6.82 ± 1.08     253       6.21 ± 0.64     168       7.83 ± 0.73

¹Nutrients used as key nutrients for PCA. Except for DM, all values are expressed on a DM basis.
²Name assigned after cluster evaluation.
³Average value expressed as percentage (µ) ± SD.
screening procedure with different key nutrients each time. As stated previously, PCA
requires complete records, that is, each record must have a value for every variable used as
a key nutrient (Lever et al., 2017). Thus, more records were retained when the screening
procedure was repeated excluding the nutrients with the fewest records.
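The effect of the key-nutrient choice on record retention can be illustrated with a small Python sketch. The function name `complete_cases` and the sample records are hypothetical; the only point carried over from the text is that a record can enter the PCA only if it has a value for every key nutrient.

```python
def complete_cases(records, key_nutrients):
    """Return the records that have a value for every key nutrient."""
    return [
        r for r in records
        if all(r.get(n) is not None for n in key_nutrients)
    ]


# Hypothetical legume hay records; lignin is analyzed less often than
# CP or NDF, which mirrors the smaller lignin N in Table 12.
records = [
    {"CP": 20.7, "NDF": 39.3, "Lignin": 7.0},
    {"CP": 21.3, "NDF": 38.4, "Lignin": None},
    {"CP": 18.0, "NDF": 48.7, "Lignin": 8.7},
    {"CP": 20.8, "NDF": 39.1, "Lignin": None},
    {"CP": 21.0, "NDF": None, "Lignin": None},
]

with_lignin = complete_cases(records, ["CP", "NDF", "Lignin"])
without_lignin = complete_cases(records, ["CP", "NDF"])
```

In this toy example, dropping lignin from the key nutrients doubles the number of records available for PCA, at the cost of not screening on lignin in that pass.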
Univariate procedures are commonly used to eliminate outliers in feed composition data sets
(Maroto-Molina et al., 2013). However, such methods may identify outliers imprecisely and
inaccurately because they assume that variables (i.e., nutrients) in a feed are independent
(Yu, 2005; Yoder et al., 2014). Because many nutrients within a feed are correlated,
multivariate methods are better at identifying outliers and can be used to generate accurate
and robust estimates of both the means and associated covariances. For example, using a
univariate procedure, a sample identified as legume silage containing 50% NDF and 25% CP
would not be considered an outlier if the values for NDF and CP are each within ±3.5 SD of
the population mean. However, because CP and NDF are negatively correlated, the likelihood
that this sample truly contains 50% NDF and 25% CP is extremely small. Such a sample would be
identified as an outlier using multivariate but not univariate procedures. An additional
challenge when employing a univariate procedure is that some nutrients within feedstuffs do
not fit a normal distribution, and those data should be evaluated using methods for
non-Gaussian distributions (Yoder et al., 2014).
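The legume silage example can be made concrete with a short Python sketch. Several things here are assumptions for illustration: the multivariate check uses a bivariate Mahalanobis distance rather than the PCA-based screening used in the study, the CP-NDF correlation of −0.8 is an invented value (the text states only that the correlation is negative), and applying a cutoff of 3.5 to the distance is a choice made here. The means and SD are the legume silage (cluster 2) values from Table 10.

```python
import math

SD_LIMIT = 3.5

# Legume silage (Table 10, cluster 2): mean and SD, % of DM.
NDF_MEAN, NDF_SD = 40.94, 4.84
CP_MEAN, CP_SD = 21.03, 2.17
ASSUMED_R = -0.8  # assumed CP-NDF correlation, not reported in the text


def z_score(value, mean, sd):
    """Number of SD between a value and the mean."""
    return (value - mean) / sd


def mahalanobis_2d(zx, zy, r):
    """Mahalanobis distance for two standardized variables with correlation r."""
    d_squared = (zx**2 - 2.0 * r * zx * zy + zy**2) / (1.0 - r**2)
    return math.sqrt(d_squared)


# Sample from the text: 50% NDF and 25% CP.
z_ndf = z_score(50.0, NDF_MEAN, NDF_SD)
z_cp = z_score(25.0, CP_MEAN, CP_SD)

univariate_outlier = abs(z_ndf) > SD_LIMIT or abs(z_cp) > SD_LIMIT
distance = mahalanobis_2d(z_ndf, z_cp, ASSUMED_R)
multivariate_outlier = distance > SD_LIMIT

# For comparison: the same sample if the nutrients were independent.
distance_if_independent = mahalanobis_2d(z_ndf, z_cp, 0.0)
```

Each nutrient on its own is less than 2 SD from its mean, so the univariate screen keeps the sample. Under the assumed negative correlation, being high in both nutrients at once is far less likely, and the distance exceeds the cutoff. With a correlation of zero the distance stays below it, which is the independence assumption the paragraph above criticizes.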
The described automated procedure greatly reduced the time needed to screen data sets and
allowed simultaneous analysis of all feeds within or across laboratories. In addition, the
described procedure helped in the development of one of the largest feed composition data
sets and tables currently available. However, the pre-processing, cluster evaluation, and
data summary steps were performed manually; hence, the overall process was still
time-consuming. Automation of the whole data-management process, including pre-processing,
statistical screening, cluster evaluation, and data summary, is required for the full-scale
implementation needed to develop real-time databases and online feed composition tables. In
this regard, the feed composition data set described in the current manuscript could be the
first step toward developing machine-learning algorithms that automate the classification of
feed records in large data sets.
ACKNOWLEDGMENTS
This research was a component of the National Animal Nutrition Program (NRSP-9), which
supports the use and sharing of feed composition and animal performance data, resources for
nutritional modeling, model code, and knowledge on feed analysis methods. The authors thank
the NANP-NRSP-9 program, specifically A. N. Hristov (The Pennsylvania State University,
University Park, PA) and M. Nelson (Washington State University, Pullman, WA), who reviewed
final data. The project was also supported by state and federal funds appropriated to the
University of Nebraska-Lincoln and The Ohio State University, and by funding from the
USDA-Agricultural Research Service (Washington, DC). We also thank the feed analysis
laboratories that contributed data and made this study possible: Dairyland Laboratories,
Arcadia, WI; Cumberland Valley Analytical Services, Waynesboro, PA; Rock River Laboratory
Inc., Watertown, WI; and Dairy One, Ithaca, NY.
REFERENCES
AAFCO. 2017. Official Publication: Association of American Feed Control Officials. Association of American Feed Control Officials Inc., Champaign, IL.
AOAC. 2006. Official Methods of Analysis. 18th ed. Association of Official Analytical Chemists, Arlington, VA.
Cherney, D. J. R., J. H. Cherney, and R. F. Lucey. 1993. In vitro digestion kinetics and quality of perennial grasses as influenced by forage maturity. J. Dairy Sci. 76:790–797. https://doi.org/10.3168/jds.S0022-0302(93)77402-0.
Federatie Nederlandse Diervoederketen. 2016. CVB Feed Table 2016: Chemical Composition and Nutritional Values of Feedstuffs. Federatie Nederlandse Diervoederketen, Wageningen, the Netherlands.
Goering, H. K., and P. J. Van Soest. 1970. Forage Fiber Analysis: Apparatus, Reagents, Procedures, and Some Applications. Agriculture Handbook No. 379. United States Department of Agriculture, Washington, DC.
Hall, M. B. 2014. Selection of an empirical detection method for determination of water-soluble carbohydrates in feedstuffs for application in ruminant nutrition. Anim. Feed Sci. Technol. 198:28–37. https://doi.org/10.1016/j.anifeedsci.2014.08.009.
Harris, L. E., H. Haendler, R. Riviere, and L. U. Rechoussat. 1980. International Feed Databank System: An Introduction Into the System with Instructions for Describing Feed and Recording Data. International Network of Feed Information Centers (INFIC), Logan, UT.
Harvey, D. 2000. Modern Analytical Chemistry. McGraw-Hill, Boston, MA.
Hawkins, D. 1980. Identification of Outliers. Chapman and Hall, London, UK.
Lever, J., M. Krzywinski, and N. Altman. 2017. Principal component analysis. Nat. Methods 14:641–642. https://doi.org/10.1038/nmeth.4346.
Licitra, G., T. M. Hernandez, and P. J. Van Soest. 1996. Standardization of procedures for nitrogen fractionation of ruminant feeds. Anim. Feed Sci. Technol. 57:347–358.
Lokhorst, C., R. M. de Mol, and C. Kamphuis. 2019. Invited review: Big data in precision dairy farming. Animal 13:1519–1528. https://doi.org/10.1017/S1751731118003439.
Ministry of Agriculture, Fisheries and Food. 1992. Feed Composition: UK Tables of Feed Composition and Nutritive Value for Ruminants. Chalcombe Publications, Canterbury, UK.
Maroto-Molina, F., A. Gómez-Cabrera, J. E. Guerrero-Ginel, A. Garrido-Varo, D. Sauvant, G. Tran, V. Heuzé, and D. C. Pérez-Marín. 2013. Data pre-processing to improve the mining of large feed databases. Animal 7:1128–1136.
Morota, G., R. V. Ventura, F. F. Silva, M. Koyama, and S. C. Fernando. 2018. Big data analytics and precision animal agriculture symposium: Machine learning and data mining advance predictive big data analysis in precision animal agriculture. J. Anim. Sci. 96:1540–1550. https://doi.org/10.1093/jas/sky014.
NASEM. 2016. Nutrient Requirements of Beef Cattle. 8th rev. ed. Natl. Acad. Press, Washington, DC.
NRC. 1956. Composition of Concentrate By-Products. Natl. Acad. Press, Washington, DC.
NRC. 1958. Composition of Cereal Grains and Forages. Natl. Acad. Press, Washington, DC.
NRC. 1971. Atlas of Nutritional Data on United States and Canadian Feeds. Natl. Acad. Press, Washington, DC.
NRC. 1982. United States-Canadian Tables of Feeds Composition. 3rd ed. Natl. Acad. Press, Washington, DC.
NRC. 1994. Nutrient Requirements of Poultry. 9th rev. ed. Natl. Acad. Press, Washington, DC.
NRC. 2001. Nutrient Requirements of Dairy Cattle. 7th rev. ed. Natl. Acad. Press, Washington, DC.
NRC. 2012. Nutrient Requirements of Swine. 11th rev. ed. Natl. Acad. Press, Washington, DC.
Pond, W. G., D. C. Church, and K. R. Pond. 1995. Basic Animal Nutrition and Feeding. 4th ed. Wiley, New York, NY.
Sauvant, D., J. M. Perez, and G. Tran. 2004. Tables of Composition and Nutritional Value of Feed Materials. Wageningen Academic Publishers, Wageningen, the Netherlands.
Schingoethe, D. J., K. F. Kalscheur, A. R. Hippen, and A. D. Garcia. 2009. Invited review: The use of distillers products in dairy cattle diets. J. Dairy Sci. 92:5802–5813. https://doi.org/10.3168/jds.2009-2549.
St. Pierre, N. R., and W. R. Harvey. 1986a. Incorporation of uncertainty in composition of feeds into least-cost ration models. 1. Single-chance constrained programming. J. Dairy Sci. 69:3051–3062. https://doi.org/10.3168/jds.S0022-0302(86)80768-8.
St. Pierre, N. R., and W. R. Harvey. 1986b. Incorporation of uncertainty in composition of feeds into least-cost ration models. 2. Joint-chance constrained programming. J. Dairy Sci. 69:3063–3073. https://doi.org/10.3168/jds.S0022-0302(86)80769-X.
St. Pierre, N. R., and W. R. Harvey. 1986c. Uncertainty in composition of ingredients and optimal rate of success for a maximum profit total mixed ration. J. Dairy Sci. 69:3074–3086. https://doi.org/10.3168/jds.S0022-0302(86)80770-6.
St-Pierre, N. R., and W. P. Weiss. 2015. Partitioning variation in nutrient composition data of common feeds and mixed diets on commercial dairy farms. J. Dairy Sci. 98:5004–5015. https://doi.org/10.3168/jds.2015-9431.
Sukhija, P. S., and D. L. Palmquist. 1988. Rapid method for determination of total fatty acid content and composition of feedstuffs and feces. J. Agric. Food Chem. 36:1202–1206. https://doi.org/10.1021/jf00084a019.
Tedeschi, L. O., W. Chalupa, E. Janczewski, D. G. Fox, C. Sniffen, R. Munson, P. J. Kononoff, and R. Boston. 2008. Evaluation and application of the CPM Dairy Nutrition model. J. Agric. Sci. 146:171–182. https://doi.org/10.1017/S0021859607007587.
Thornton, P. K. 2010. Livestock production: Recent trends, future prospects. Philos. Trans. R. Soc. Lond. B Biol. Sci. 365:2853–2867. https://doi.org/10.1098/rstb.2010.0134.
Van Soest, P. J. 1963. Use of detergents in the analysis of fibrous feeds. II. A rapid method for the determination of fiber and lignin. J. AOAC Int. 46:829–835.
Yoder, P. S., N. R. St-Pierre, and W. P. Weiss. 2014. A statistical filtering procedure to improve the accuracy of estimating population parameters in feed composition databases. J. Dairy Sci. 97:5645–5656. https://doi.org/10.3168/jds.2013-7724.
Yu, P. 2005. Applications of hierarchical cluster analysis (CLA) and principal component analysis (PCA) in feed structure and feed molecular chemistry research, using synchrotron-based Fourier transform infrared (FTIR) microspectroscopy. J. Agric. Food Chem. 53:7115–7127. https://doi.org/10.1021/jf050959b.
ORCIDS
H. Tran https://orcid.org/0000-0002-4342-6752
M. B. Hall https://orcid.org/0000-0002-5460-3208
W. P. Weiss https://orcid.org/0000-0003-3506-4672
P. J. Kononoff https://orcid.org/0000-0001-6069-2174
... Diets were constructed for each treatment using the reported ingredient inclusion rates and ingredient nutrient composition in the publication. Missing nutrient information was filled from either the feed library of Tran et al. (2020) or the NRC (2001) feed library to create 2 separate diet input files: one based on the 2001 library and the other on the newer library. A summary of the protein fractions and in situ rates of degradation is provided in Supplemental Table S1 (http: / / hdl .handle ...
... However, evaluations of that system generally indicate low precision and, more importantly, systematic bias (Huhtanen and Hristov, 2009;Broderick et al., 2010;White et al., 2017), suggesting that the model may be an inappropriate representation of the system. Despite recent efforts to more precisely define protein fractions and the rate of protein degradation for all feeds in the feed library (Tran et al., 2020), little progress in predicting RUP based on the kinetic model was achieved in terms of accuracy or precision (Table 2). On average, the model overpredicted RUP outflow (+49 g of N/d) with significant slope bias (−0.29 g/g; Table 2 and Figure 1). ...
Article
The objectives of the present work were (1) to identify the cause of the linear bias in predictions of rumen-undegradable protein (RUP) content of feeds, and devise methods to remove the bias from prediction equations, and (2) to further explore the impact of rumen-degradable protein (RDP) on microbial N (MiN) outflow from the rumen. The kinetic model used by NRC (2001), which is based on protein fractionation and rates of degradation (Kd) and passage (Kp), displays considerable slope bias (−0.30 kg/kg), indicating parameter or structural problems. Regressing Kp by feed class and a static adjustment factor for the in situ–derived Kd on observed RUP flows completely resolved the slope bias problem, and the model performed significantly better than models using unadjusted Kd and marker-based Kp. The Kd adjustment was 3.82%/h, which represents approximately a 50% increase in rates of degradation over the in situ values, indicating that in situ analyses severely underestimate true rates of protein degradation. The Kp for concentrate-derived protein was 5.83%/h, which was slightly less than the marker-predicted rate of 6.69%/h. However, the derived forage protein rate was 0.49%/h, which was considerably less than the marker-based rate of 5.07%/h. Compartmental analysis of data from a single study corroborated the regression analysis, indicating that a 25% reduction in the overall passage rate and an 87% increase in the rate of degradation were required to align ruminal N pool sizes and the extent of protein degradation with the observed data. Therefore, one must conclude that both the in situ–derived degradation rates and the marker-based particle passage rates are biased relative to protein passage and cannot be used directly to predict RUP outflow from the rumen. 
The effects of RDP supply on microbial nitrogen (MiN) flow were apparent when intakes of individual nutrients were offered but not when DM intake and individual nutrient concentrations were offered, due to collinearity problems. Microbial N flow from the rumen was found to be linearly related to ruminally degraded starch, ruminally degraded neutral detergent fiber (NDF), RDP, and forage NDF intakes; and quadratically related to residual OM intake. More complicated models containing 2- and 3-way interactions among nutrients were also supported by the data. Independent MiN responses to RDP, ruminally degraded starch, and ruminally degraded NDF aligned with the expected responses to each of those nutrients. Nonlinear representations of MiN were found to be inferior to the linear models. Despite using unbiased predictions of RUP and MiN as drivers of AA flows, predictions of Arg, His, Ile, and Lys flow exhibited linear slope bias relative to the observed data, indicating that representations of the AA composition of the proteins may be biased or the observed data are biased. This is an improvement over the NRC (2001) predictions, where bias adjustments were required for all of the essential AA. Despite the bias for 4 AA flows, the revised prediction system was a substantial improvement over the prior work.
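As a rough illustration of the kinetic model discussed in this abstract, the sketch below computes RUP as C + B × Kp/(Kd + Kp), the form used by NRC (2001). The feed fractions (B = 60 and C = 10% of CP) and the in situ Kd of 8%/h are hypothetical values chosen for illustration. Applying the 3.82%/h adjustment additively to Kd is an assumption about how the static adjustment factor enters the equation. Only the rate constants (3.82, 5.83, and 6.69%/h) come from the abstract.

```python
def rup_percent_of_cp(b, c, kd, kp, kd_adjust=0.0):
    """RUP (% of CP) from the kinetic model: C + B * Kp / (Kd + Kp).

    b, c      -- potentially degradable and undegradable fractions (% of CP)
    kd, kp    -- degradation and passage rates (%/h)
    kd_adjust -- correction added to Kd (%/h); the additive form is assumed
    """
    kd_eff = kd + kd_adjust
    return c + b * kp / (kd_eff + kp)


# Hypothetical concentrate feed (illustrative values, not from the source)
B, C, KD_IN_SITU = 60.0, 10.0, 8.0

# Marker-based Kp with the unadjusted in situ Kd
unadjusted = rup_percent_of_cp(B, C, KD_IN_SITU, kp=6.69)
# Regression-derived concentrate Kp with the Kd adjustment applied
adjusted = rup_percent_of_cp(B, C, KD_IN_SITU, kp=5.83, kd_adjust=3.82)

print(f"marker-based Kp, in situ Kd:        {unadjusted:.1f}% of CP")
print(f"regression-derived Kp, adjusted Kd: {adjusted:.1f}% of CP")
```

For this hypothetical feed, both the faster effective Kd and the slower Kp lower the predicted RUP.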
... Among the major feed ingredients utilized in animal husbandry and poultry farming are grain cultures and grain-based feeds providing nutritious and balanced livestock diets (Navarro et al., 2019; El-Deek et al., 2020; Tran et al., 2020). Despite the growing utilization of alternative feed mixtures for animals, corn, wheat, barley, sorghum, and oats remain the major grain cultures used in the industry of combined feeds worldwide (Awika et al., 2011; Diarra, 2018). ...
... (P > 0.05) the production of oleic fatty acid (C18:1). Recently, Tran et al. (2020) and Daley, Armentano, Kononoff, and Hanigan (2020) presented the results of scientific studies carried out in the United States of America, aiming to detail the fatty acid composition of the foods offered to dairy cows, since the partitioning of fibrous and non-fibrous carbohydrates and nitrogen constituents is already well determined through the Cornell Net Carbohydrate and Protein System (CNCPS), and their effects on animals and the products generated (meat and milk) are understood. However, it is still essential to better understand the effects of lipids. ...
Article
Pastures are a primary source of feed for ruminants, which convert fibrous plants into nutritionally valuable foods for humans, such as meat and milk. However, it is important to understand the nutrient content of different fodder crops for ruminants and its effect on meat, milk, and milk products. We aimed to evaluate the effect of nitrogen fertilizer doses in topdressing on nutrient production in pastures of Triticale BRS Saturno. The experimental design was a randomized block with five replications. Descriptive statistics of yields per hectare were determined, and a simple linear regression was carried out at the 5% significance level. The different nitrogen topdressing rates (0, 50, and 100 kg N ha⁻¹) influenced the production (P < 0.05) of dry matter, total carbohydrates, neutral detergent fiber, acid detergent fiber, crude protein, soluble protein, insoluble protein in neutral and acid detergent, protein degradability, ether extract, linoleic and linolenic fatty acids, neutral detergent fiber digestibility after incubation for 24, 30, and 48 h, and the neutral detergent fiber degradation rate. The different doses of nitrogen fertilizer in topdressing in the form of urea increased nutrient production in pastures of Triticale BRS Saturno, mainly in relation to total carbohydrates and neutral and acid detergent fiber. To a lesser extent, they also significantly affected the production of nitrogenous constituents and fatty acids.
Article
A challenging task in analytical chemistry is the application of renewable and natural materials to isolate hazardous substances, such as antimicrobial drugs, from environmental samples. An energy-efficient, scalable hydrothermal procedure was developed to fabricate an eco-friendly “switchable” sorbent based on hydroxyapatite nanoparticles whose surface is modified in situ using a small amount of capping agents. The sorbents were characterized, including an investigation of the surface composition via quantum-chemical calculation based on an original approach. The sorbents demonstrated well-expressed, controllable surface switching and high sorption and elution efficiency for tetracycline, oxytetracycline, and chlortetracycline, achieved by a simple change of the medium pH. These processes were thoroughly discussed based on the results of chemical and computational experiments. A simple and universal strategy for choosing a suitable sorbent for solid phase extraction of target analytes was proposed for the first time. It was shown that the developed eco-friendly sample preparation procedure using biocompatible sorbents could be applied both for removal of target analytes from the sample matrix (water samples) and for quantitative determination of analytes after the elution step. It is believed that the presented research is significant for the determination of different amphoteric analytes in a wide variety of samples.
Article
The objective of this work was to update and evaluate predictions of essential AA (EAA) outflows from the rumen. The model was constructed based on previously derived equations for rumen-undegradable (RUP), microbial (MiCP), and endogenous (EndCP) protein outflows from the rumen, and revised estimates of ingredient composition and EAA composition of the protein fractions. Corrections were adopted to account for incomplete recovery of EAA during 24-h acid hydrolysis. The predicted ruminal protein and EAA outflows were evaluated against a data set of observed values from the literature. Initial evaluations indicated a minor mean bias for non-ammonia, non-microbial nitrogen flow ([RUP + EndCP]/6.25) of 16 g of N per day. Root mean squared errors (RMSE) of EAA predictions ranged from 26.8 to 40.6% of observed mean values. Concordance correlation coefficients (CCC) of EAA predictions ranged from 0.34 to 0.55. Except for Leu, all ruminal EAA outflows were overpredicted by 3.0 to 32 g/d. In addition, small but significant slope biases were present for Arg [2.2% mean squared error (MSE)] and Lys (3.2% MSE). The overpredictions may suggest that the mean recovery of AA from acid hydrolysis across laboratories was less than estimates encompassed in the recovery factors. To test this hypothesis, several regression approaches were undertaken to identify potential causes of the bias. These included regressions of (1) residual errors for predicted EAA flows on each of the 3 protein-driven EAA flows, (2) observed EAA flows on each protein-driven EAA flow, including an intercept, (3) observed EAA flows on the protein-driven EAA flows, excluding an intercept term, and (4) observed EAA flows on RUP and MiCP. However, these equations were deemed unsatisfactory for bias adjustment, as they generated biologically unfeasible predictions for some entities. Future work should focus on identifying the cause of the observed prediction bias.
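The evaluation statistics named in this abstract can be sketched as follows. This is a minimal illustration, assuming the usual definitions: RMSE expressed as a percentage of the observed mean, Lin's concordance correlation coefficient, and a partition of MSE into mean bias, slope bias, and dispersion. The function name `evaluate_predictions` and the example flows are hypothetical, and the abstract does not confirm that the authors computed the statistics exactly this way.

```python
import math


def evaluate_predictions(observed, predicted):
    """Return RMSE (% of observed mean), CCC, and mean/slope bias (% of MSE)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Population (1/n) variances and covariance, so the MSE partition is exact
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n

    slope = cov / var_p                      # slope of observed on predicted
    mean_bias = (mean_p - mean_o) ** 2       # squared difference of the means
    slope_bias = var_p * (1.0 - slope) ** 2  # deviation of the slope from 1

    return {
        "rmse_pct_of_mean": 100.0 * math.sqrt(mse) / mean_o,
        "ccc": 2.0 * cov / (var_o + var_p + (mean_p - mean_o) ** 2),
        "mean_bias_pct_mse": 100.0 * mean_bias / mse,
        "slope_bias_pct_mse": 100.0 * slope_bias / mse,
    }


# Hypothetical EAA flows (g/d): every prediction is 10 g/d too high
stats = evaluate_predictions([100.0, 120.0, 140.0, 160.0],
                             [110.0, 130.0, 150.0, 170.0])
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

In this toy case all of the error is mean bias, so the slope bias share of MSE is zero. The function divides by MSE and by the variance of the predictions, so it is not defined for perfect or constant predictions.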
Article
Insight into current scientific applications of Big Data in the precision dairy farming area may help us to understand the inflated expectations around Big Data. The objective of this invited review paper is to give that scientific background and determine whether Big Data has overcome the peak of inflated expectations. A conceptual model was created, and a literature search in Scopus resulted in 1442 scientific peer reviewed papers. After thorough screening on relevance and classification by the authors, 142 papers remained for further analysis. The area of precision dairy farming (with classes in the primary chain (dairy farm, feed, breed, health, food, retail, consumer) and levels for object of interest (animal, farm, network)), the Big Data-V area (with categories on Volume, Velocity, Variety and other V’s) and the data analytics area (with categories in analysis methods (supervised learning, unsupervised learning, semi-supervised classification, reinforcement learning) and data characteristics (time-series, streaming, sequence, graph, spatial, multimedia)) were analysed. The animal sublevel, with 83% of the papers, exceeds the farm sublevel and network sublevel. Within the animal sublevel, topics within the dairy farm level prevailed with 58% over the health level (33%). Within the Big Data category, the Volume category was most favoured with 59% of the papers, followed by 37% of papers that included the Variety category. None of the papers included the Velocity category. Supervised learning, representing 87% of the papers, exceeds unsupervised learning (12%). Within supervised learning, 64% of the papers dealt with classification issues and exceeds the regression methods (36%). Time-series were used in 61% of the papers and were mostly dealing with animal-based farm data. Multimedia data appeared in a greater number of recent papers. 
Based on these results, it can be concluded that Big Data is a relevant topic of research within the precision dairy farming area, but that the full potential of Big Data in this area is not yet utilised. However, the present authors expect that the full potential of Big Data within the precision dairy farming area will be reached when multiple Big Data characteristics (Volume, Variety and other V’s) and sources (animal, groups, farms and chain parts) are used simultaneously, adding value to operational and strategic decisions.
Article
Precision animal agriculture is poised to rise to prominence in the livestock enterprise in the domains of management, production, welfare, sustainability, health surveillance, and environmental footprint. Considerable progress has been made in the use of tools to routinely monitor and collect information from animals and farms in a less laborious manner than before. These efforts have enabled the animal sciences to embark on information technology-driven discoveries to improve animal agriculture. However, the growing amount and complexity of data generated by fully automated, high-throughput data recording or phenotyping platforms, including digital images, sensor and sound data, unmanned systems, and information obtained from real-time non-invasive computer vision, pose challenges to the successful implementation of precision animal agriculture. The emerging fields of machine learning and data mining are expected to be instrumental in helping meet the daunting challenges facing global agriculture. Yet, their impact and potential in “big data” analysis have not been adequately appreciated in the animal science community, where this recognition has remained only fragmentary. To address such knowledge gaps, this article outlines a framework for machine learning and data mining, and offers a glimpse into how they can be applied to solve pressing problems in animal sciences.
Article
PCA helps you interpret your data, but it will not always find the important patterns.
Article
The capacity of cetyl trimethylammonium bromide to dissolve proteins in acid solution has been utilized in development of a method, called acid-detergent fiber method (ADF), which is not only a fiber determination in itself but also the major preparatory step in the determination of lignin. The entire procedure for determining fiber and lignin is considerably more rapid than presently published methods. Compositional studies show ADF to consist chiefly of lignin and polysaccharides. Correlations between the new fiber method and digestibility of 18 forages (r = −0.79) showed it to be somewhat superior to crude fiber (r = −0.73) in estimating nutritive value. The correlation between the new lignin method and digestibility was −0.90 when grass and legume species were separated.
Article
A large project involving commercial dairy farms was undertaken to identify important sources of variation in composition data of common feeds and mixed diets. This information is needed to develop appropriate sampling schedules for feeds and should reduce the uncertainty associated with the nutrient composition of delivered diets. The first subproject quantified sources of variation in the composition of corn and haycrop silages over a 2-wk period. Silages from 11 commercial dairy farms in Ohio and Vermont were sampled daily over a 14-d period. Most silages were sampled in duplicate each day, and all samples were assayed in duplicate. Total variance was partitioned into analytical, sampling, farm, and true day-to-day components. Farm was the largest source of variation, but within-farm variance was our primary interest. Sampling variance comprised 30 to 81% of within-farm variance depending on nutrient and type of silage. For dry matter, true day-to-day variation was the greatest source of variance, but for most other nutrients, sampling was the largest source of within-farm variation. The second subproject consisted of sampling feeds and total mixed rations (TMR) from 47 commercial dairy farms across the United States. Feeds and TMR were sampled monthly. Because samples were not assayed in duplicate, source of variation included farm, month, and residual (sampling plus analytical). For corn and alfalfa silages, month-to-month variation over a 12-mo period comprised about twice as much of the total within-farm variation as did day-to-day variation over a 14-d period in the first subproject. Although month-to-month variation was greater than sampling variation, sampling still accounted for 9 to 37% of the total within-farm variance for those 2 feeds. For TMR, sampling plus analytical variance accounted for approximately 40 to 70% of the total within-farm variance (depending on the nutrient). 
Variance components were estimated for several nutrients and for several common feeds. The contributions to total variance differed depending on feed and nutrient, but the information provided will help in determining whether on-farm samples should be taken and if so, how often. A major implication of this project is that sampling is a substantial source of variation in silages, concentrates, and TMR, and data from a single sample are likely not highly reliable. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Article
Water-soluble carbohydrates (WSC) are commercially measured in feedstuffs for use in diet formulation for ruminants. However, we lack information as to which empirical detection assay most correctly measures WSC. The objective of this study was to determine which of two commonly used empirical assays was most appropriate for detection of WSC based on equivalency to results from high performance ion chromatography with pulsed amperometric detection plus starch analysis (HPIC) of the water extract. Empirical analyses used were a reducing sugar assay (RSA) that uses p-hydroxybenzoic acid hydrazide-based reagent with 50:50 glucose:fructose standards, and the phenol-sulfuric acid assay (PSA) with sucrose standards. Twenty samples including cool season grasses (CSG), legume forages, non-forage feedstuffs, silages, or warm season grasses were used. Air dry samples (0.2 g) were extracted in 35 mL of deionized water for 1 h at 40 °C with continuous shaking. Water extracts for HPIC and RSA analyses were hydrolyzed with 0.037 M H₂SO₄ at 80 °C for 70 min. Theoretically, RSA should give essentially the same results as HPIC, except that RSA also detects reducing ends of unhydrolyzed molecules. PSA detects all solubilized or suspended carbohydrates. On average, RSA and PSA values were greater than those found for HPIC by 28.2 g WSC/kg dry matter (DM). The two classes of feeds that showed differences between PSA and RSA were CSG and silages. For CSG, RSA and PSA values were 54.1 and 20.6 g WSC/kg DM greater than HPIC, respectively; for silages, the differences were smaller at 8.8 and 15.9 g WSC/kg DM. CSG contain fructans, for which RSA gives higher values than does PSA. However, the elevated RSA values for CSG were in excess of differences predicted based on inflated RSA recovery values for fructose measurement (106.5% of actual). Elevated RSA values obtained for CSG suggest that interference is affecting these grasses to a greater degree than other samples.
Distillers grains showed an elevated value with PSA (69.1 g WSC/kg DM greater than HPIC); this is partially explained by the inflated recovery values for glucose (128.2% of actual) noted for PSA. Neither PSA nor RSA perfectly reflected HPIC values; however, PSA gave values closer to those of HPIC. Gross differences between RSA and HPIC for CSG are an issue, particularly without a clear, resolvable basis for the discrepancy. Accordingly, PSA is preferred over RSA for detection of WSC. Selection of standards to more closely reflect WSC composition could further improve accuracy.
Article
Accurate estimates of mean nutrient composition of feeds, nutrient variance (i.e., standard deviation), and covariance (i.e., correlation) are needed to develop a more quantitative approach of formulating diets to reduce risk and optimize safety factors. Commercial feed-testing laboratories have large databases of composition values for many feeds, but because of potentially misidentified feeds or poorly defined feed names, these databases are possibly contaminated by incorrect results and could generate inaccurate statistics. The objectives of this research were to (1) design a procedure (also known as a mathematical filter) that generates accurate estimates of the first 2 moments [i.e., the mean and (co)variance] of the nutrient distributions for the largest subpopulation within a feed in the presence of outliers and multiple subpopulations, and (2) use the procedure to generate feed composition tables with accurate means, variances, and correlations. Feed composition data (>1,300,000 samples) were collected from 2 major US commercial laboratories. A combination of a univariate step and 2 multivariate steps (principal components analysis and cluster analysis) were used to filter the data. On average, 13.5% of the total samples of a particular feed population were removed, of which the multivariate steps removed the majority (66% of removed samples). For some feeds, inaccurate identification (e.g., corn gluten feed samples included in the corn gluten meal population) was a primary reason for outliers, whereas for other feeds, subpopulations of a broader population were identified (e.g., immature alfalfa silage within a broad population of alfalfa silage). Application of the procedure did not usually affect the mean concentration of nutrients but greatly reduced the standard deviation and often changed the correlation estimates among nutrients. 
More accurate estimates of the variation of feeds and how they tend to vary will improve the economic evaluation of feeds and risk assessment of diets, and provide the ability to implement stochastic programming.
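The univariate step of such a filter can be sketched in a few lines. This is a simplified, single-pass illustration only. The 3.5 SD threshold is borrowed from the abstract of the main article on this page and is an assumption here, since the abstract above does not state the cutoff it used. The function name `screen_univariate` and the crude protein values are hypothetical. The multivariate steps (principal components and cluster analysis) are not shown.

```python
import statistics


def screen_univariate(values, n_sd=3.5):
    """Split values into (kept, removed) using a mean +/- n_sd * SD window.

    The window is computed once from all values (single pass); the
    threshold n_sd = 3.5 is an assumed default, not a confirmed setting.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    low, high = mean - n_sd * sd, mean + n_sd * sd
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if not (low <= v <= high)]
    return kept, removed


# Hypothetical crude protein values (% of DM) for one feed,
# with one record that likely belongs to a different feed
cp_values = [8.8, 9.0, 9.2] * 10 + [30.0]
kept, removed = screen_univariate(cp_values)

print(f"kept {len(kept)} records, removed {removed}")
print(f"SD before: {statistics.stdev(cp_values):.2f}, "
      f"after: {statistics.stdev(kept):.2f}")
```

A single outlier inflates the SD used to build the window, so a fixed SD cutoff can only flag extreme values when the sample is reasonably large. That is one reason a procedure like the one described pairs the univariate step with multivariate steps.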