Millions of feed composition records generated annually by testing laboratories are valuable assets that can benefit the animal nutrition community. However, feed composition data that originate from multiple sources, lack standardized feed names, and contain outliers are challenging to manage and process. Efficient methods to consolidate and screen such data are needed to develop feed composition databases with accurate means and standard deviations (SD). Considering the interest of the animal science community in data management and the importance of feed composition tables for the animal industry, the objective was to develop a set of procedures to construct accurate feed composition tables from large data sets. A published statistical procedure designed to screen feed composition data was adopted, modified, and programmed to operate using Python and SAS. A total of 2.76 million records received from 4 commercial feed testing laboratories were used to develop the procedures and to construct tables summarizing feed composition. Briefly, feed names and nutrients were standardized across laboratories, and erroneous and duplicated records were removed. Histogram, univariate, and principal component analyses were used to identify and remove outliers having key nutrients outside of the mean ± 3.5 SD. Clustering procedures identified subgroups of feeds within a large data set. Apart from the clustering step, which was programmed in Python to execute automatically in SAS, all steps were programmed and run automatically in Python, followed by a manual evaluation of the resulting mean Pearson correlation matrices of the clusters. The input data sets from the 4 laboratories contained 42, 94, 162, and 270 feeds and comprised 25 to 30 nutrients. The final database included 174 feeds and 1.48 million records.
The developed procedures effectively classified by-products (e.g., distillers grains and solubles as low or high fat), forages (e.g., legume or grass-legume mixture by maturity), and oilseeds versus meals (e.g., soybeans as whole raw seeds vs. soybean meal, expeller or solvent extracted) into distinct sub-populations. These results suggest that the procedure is a robust tool for constructing and updating large feed data sets. The approach can also be used by commercial laboratories, feed manufacturers, animal producers, and other professionals to process feed composition data sets and update feed libraries.
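As an illustration only, the first screening steps described above (standardizing feed names, removing duplicated records, and discarding values outside the mean ± 3.5 SD within each feed) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the published procedure: the synonym map, the record fields (`sample_id`, `feed`, `cp`), and the function names are hypothetical, and the histogram, principal component, and clustering steps are not shown.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical synonym map: the laboratories' actual naming schemes
# are not given in the text.
FEED_SYNONYMS = {
    "corn sil": "corn silage",
    "sbm": "soybean meal",
}


def standardize_name(raw_name):
    """Lowercase, collapse whitespace, and map known synonyms to one name."""
    key = " ".join(raw_name.lower().split())
    return FEED_SYNONYMS.get(key, key)


def screen_records(records, nutrient, k=3.5):
    """Return records with standardized names, exact duplicates removed,
    and values of `nutrient` outside mean +/- k*SD (per feed) discarded.

    `records` is a list of dicts with a 'feed' key and a numeric
    `nutrient` key (an assumed layout for this sketch).
    """
    seen = set()
    by_feed = defaultdict(list)
    for rec in records:
        clean = dict(rec, feed=standardize_name(rec["feed"]))
        # A record is treated as a duplicate when every field matches
        # after name standardization.
        fingerprint = tuple(sorted(clean.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        by_feed[clean["feed"]].append(clean)

    kept = []
    for group in by_feed.values():
        values = [r[nutrient] for r in group]
        if len(values) < 3:
            # Too few records to estimate an SD; keep them unscreened.
            kept.extend(group)
            continue
        m, s = mean(values), stdev(values)
        low, high = m - k * s, m + k * s
        kept.extend(r for r in group if low <= r[nutrient] <= high)
    return kept
```

A single pass such as this computes the mean and SD with the outliers still included, so extreme values inflate the SD they are judged against. Whether the screening should be repeated until no further records are removed is a design choice the sketch leaves open.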
Traditional feed composition tables have been a useful tool in animal nutrition over the last 70 years. The objective of this paper is to discuss challenges and opportunities associated with creating large feed ingredient composition tables. This manuscript focuses on three topics discussed during the National Animal Nutrition Program (NANP) symposium on ruminant and non-ruminant nutrition held at the ASAS annual meeting in Austin, TX, on 11 July 2019: a) the use of large datasets in feed composition tables and the importance of standard deviation in nutrient composition, as well as different methods to obtain accurate standard deviation values; b) the importance of fiber in animal nutrition and the evaluation of different methods to estimate the fiber content of feeds; and c) novel feed sources such as insects, algae, and single-cell protein, and the challenges associated with including such feeds in feed composition tables. Developing feed composition tables presents important challenges. For instance, large datasets provided by different sources tend to contain errors and misclassifications. In addition, the data differ in file format, data structure, and feed classification. Managing such large databases requires computers with high processing power and software able to run automated procedures that consolidate files, screen out outlying observations, and detect misclassified records. Complex algorithms are necessary to identify misclassified samples and outliers in order to obtain accurate nutrient composition values. Fiber is an important nutrient for both monogastrics and ruminants. Currently, several methods are available to estimate the fiber content of feeds; however, many of them do not estimate fiber accurately. Total dietary fiber (TDF) should be used as the standard method to estimate fiber concentrations in feeds.
Finally, novel feed sources are a viable option to replace traditional feed sources from a nutritional perspective, but the large variation in nutrient composition among batches makes it difficult to provide reliable nutrient information for tabulation. Further communication and cooperation among stakeholders in the animal industry are required to produce reliable nutrient composition data for publication in feed composition tables.