Random forest model. Example of training and classification processes using random forest. A) Each decision tree in the ensemble is built upon a random bootstrap sample of the original data, which contains positive (green labels) and negative (red labels) examples. B) Class prediction for new instances using a random forest model is based on a majority voting procedure among all individual trees. The procedure carried out for each tree is as follows: for each new data point (i.e., X), the algorithm starts at the root node of a decision tree and traverses down the tree (highlighted branches), testing the variable values at each visited split node (pale pink nodes), according to which it selects the next branch to follow. This process is repeated until a leaf node is reached, which assigns a class to this instance: green nodes predict the positive class, red nodes predict the negative class. At the end of the process, each tree casts a vote for the preferred class label, and the mode of the outputs is chosen as the final prediction.
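The two panels described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the trees are hand-written dictionaries rather than trees learned from data, the feature names x1 and x2 and all thresholds are hypothetical, and ties in the vote are broken by first-seen order, which the caption does not specify.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Panel A: draw len(rows) examples with replacement from the original data."""
    return [rng.choice(rows) for _ in rows]

def predict_tree(node, x):
    """Panel B, one tree: walk from the root to a leaf, testing one variable per split node."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def predict_forest(trees, x):
    """Panel B, whole forest: each tree casts one vote; the mode of the votes wins."""
    votes = [predict_tree(tree, x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical hand-built trees (a real forest would grow each tree on its own bootstrap sample).
tree_1 = {"feature": "x1", "threshold": 0.5,
          "left": {"label": "negative"}, "right": {"label": "positive"}}
tree_2 = {"feature": "x2", "threshold": 0.5,
          "left": {"label": "negative"}, "right": {"label": "positive"}}
tree_3 = {"feature": "x1", "threshold": 0.6,
          "left": {"label": "negative"},
          "right": {"feature": "x2", "threshold": 0.1,
                    "left": {"label": "negative"}, "right": {"label": "positive"}}}
trees = [tree_1, tree_2, tree_3]

new_point = {"x1": 0.7, "x2": 0.2}
data = [("a", "positive"), ("b", "negative"), ("c", "positive"), ("d", "negative")]
sample = bootstrap_sample(data, random.Random(0))
prediction = predict_forest(trees, new_point)
```

For new_point, tree_1 and tree_3 vote positive and tree_2 votes negative, so the majority vote is positive.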

Source publication
Article
Bovine viral diarrhea virus (BVDV) causes one of the most economically important diseases in cattle, and the virus is found worldwide. A better understanding of disease-associated factors is a crucial step towards defining strategies for control and eradication. In this study we trained a random forest (RF) prediction model and perform...

Citations

... Each tree analyzes and votes on the results based on the input data and summarizes the results. The result with the most votes is the final output result (Machado et al., 2015). In this paper, we used the same projection coordinates for SOS and influencing factors and resampled them to the same resolution. ...
Article
Global climate change has led to significant changes in land surface phenology. At present, research on the factors influencing the start of the growing season (SOS) mainly focuses on single factor effects, such as temperature and precipitation, ignoring the combined action of multiple factors. The impact of multiple factors on the spatial and temporal patterns of the SOS in the Northern Hemisphere is not clear, and it is necessary to combine multiple factors to quantify the degrees of influence of different factors on the SOS. Based on the GIMMS3g NDVI dataset, CRU climate data and other factor data, we used geographic detector model, random forest regression model, multiple linear regression, partial correlation analysis and Sen + Mann-Kendall trend analysis to explore the variation of the SOS in the Northern Hemisphere to reveal the main driving factors and impact threshold of 17 influencing factors on the SOS. The results showed that (1) during the past 34 years (1982-2015), the SOS in Europe and Asia mainly showed an advancing trend, whereas the SOS in North America mainly showed a delaying trend. (2) The SOS was mainly controlled by frost frequency, temperature and humidity. Increasing frost frequency inhibited the advancement of the SOS, and increasing temperature and humidity promoted the advancement of the SOS. (3) There were thresholds for the influences of the driving factors on the SOS. Outside the threshold ranges, the response mechanism of the SOS to driving factors changed. The results are important for understanding the response of the SOS to global climate change.
... Moreover, ML has a better capacity than classical statistics in searching large databases with different possibilities and determining one hypothesis that best fits the observed data [165]. Another advantage is that ML has techniques less sensitive to spatial autocorrelation and multicollinearity and is also nondependent on classical statistics assumptions (such as homoscedasticity), turning to be more adequate for processing high-dimensional, imbalanced, and nonlinear data [171,172]. A worth mentioning subset technique of ML is deep learning, which is based on artificial neural networks and can be considered an evolution of traditional ML, improving automation, feature selection, and accuracy. ...
Article
Precision livestock farming (PLF) research is rapidly increasing and has improved farmers’ quality of life, animal welfare, and production efficiency. PLF research in dairy calves is still relatively recent but has grown in the last few years. Automatic milk feeding systems (AMFS) and 3D accelerometers have been the most extensively used technologies in dairy calves. However, other technologies have been emerging in dairy calves’ research, such as infrared thermography (IRT), 3D cameras, ruminal bolus, and sound analysis systems, which have not been properly validated and reviewed in the scientific literature. Thus, with this review, we aimed to analyse the state-of-the-art of technological applications in calves, focusing on dairy calves. Most of the research is focused on technology to detect and predict calves’ health problems and monitor pain indicators. Feeding and lying behaviours have sometimes been associated with health and welfare levels. However, a consensus opinion is still unclear since other factors, such as milk allowance, can affect these behaviours differently. Research that employed a multi-technology approach showed better results than research focusing on only a single technique. Integrating and automating different technologies with machine learning algorithms can offer more scientific knowledge and potentially help the farmers improve calves’ health, performance, and welfare, if commercial applications are available, which, from the authors’ knowledge, are not at the moment.
... The ventral tail base ST was analyzed using supervised machine learning algorithms that use a random forest to detect calves with fever. A random forest approach is one of the most precise prediction methods among machine learning approaches, which has advantages such as the ability to determine variable importance and ability to model complex confounding or interactions among independent variables [16,17]. In this study, accuracy of 98.8% and sensitivity of 88.1% indicate that the model can detect fever periods accurately. ...
Article
The objective of the present study was to assess the ventral tail base surface temperature (ST) for the early detection of Japanese Black calves with fever. This study collected data from a backgrounding operation in Miyazaki, Japan, that included 153 calves aged 3–4 months. A wearable wireless ST sensor was attached to the surface of the ventral tail base of each calf at its introduction to the farm. The ventral tail base ST was measured every 10 min for one month. The present study conducted an experiment to detect calves with fever using the estimated residual ST (rST), calculated as the estimated rST minus the mean estimated rST for the same time on the previous 3 days, which was obtained using machine learning algorithms. Fever was defined as an increase of ≥1.0 °C for the estimated rST of a calf for 4 consecutive hours. The machine learning algorithm applied was a random forest, and 15 features were included. The variable importance scores that represented the most important predictors for the detection of calves with fever were the minimum and maximum values during the last 3 h and the difference between the current value and 24- and 48-h minimum. For this prediction model, accuracy, precision, and sensitivity were 98.8%, 72.1%, and 88.1%, respectively. The present study indicated that the early detection of calves with fever can be predicted by monitoring the ventral tail base ST using a wearable wireless sensor.
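The fever rule in this abstract can be illustrated with a short Python sketch. It is one possible reading of the abstract, not the study's code: the random forest step that estimates the temperature is not reproduced, the input is assumed to be a gap-free list of readings taken every 10 min, and the function names are hypothetical.

```python
READINGS_PER_HOUR = 6                      # one reading every 10 min, as stated in the abstract
READINGS_PER_DAY = 24 * READINGS_PER_HOUR  # 144 readings per day

def residuals(series, days_back=3):
    """Reading minus the mean of the readings at the same time on the previous days.

    Readings without a full `days_back` history get None.
    """
    out = []
    for i, value in enumerate(series):
        if i < days_back * READINGS_PER_DAY:
            out.append(None)
            continue
        past = [series[i - d * READINGS_PER_DAY] for d in range(1, days_back + 1)]
        out.append(value - sum(past) / days_back)
    return out

def fever_flags(resid, threshold=1.0, hours=4):
    """True at each reading that completes `hours` consecutive hours at or above threshold."""
    needed = hours * READINGS_PER_HOUR
    run = 0
    flags = []
    for r in resid:
        run = run + 1 if (r is not None and r >= threshold) else 0
        flags.append(run >= needed)
    return flags

# Synthetic example: three stable days at 35.0 °C, then 5 hours at 36.5 °C.
series = [35.0] * (3 * READINGS_PER_DAY) + [36.5] * 30
resid = residuals(series)
flags = fever_flags(resid)
first_alarm = flags.index(True)
```

In this synthetic series the residual is 1.5 °C from the first elevated reading onward, so the first alarm is raised at the 24th elevated reading, four hours after the rise begins.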
... Our results show that using machine learning algorithms, especially RF, is a promising methodology for analyzing cross-sectional studies, showing robust predictive power and the ability to identify predictors of major importance. So far, this methodology has previously been used to evaluate factors with the greatest impact on high HIV viral load, COVID mortality, or presence of Bovine Viral Diarrhoea Virus [30–32], among others, but this is the first study of its application in CF and TA. ...
Article
Cystic fibrosis (CF) is a genetic and multisystemic disease that requires a high therapeutic demand for its control. The aim of this study was to assess therapeutic adherence (TA) to different treatments to study possible clinical consequences and clinical factors influencing adherence. This is an ambispective observational study of 57 patients aged over 18 years with a diagnosis of CF. The assessment of TA was calculated using the Medication Possession Ratio (MPR) index. These data were related to exacerbations and the rate of decline in FEV1 percentage. Compliance was good for all CFTR modulators, azithromycin, aztreonam, and tobramycin in solution for inhalation. The patients with the best compliance were older; they had exacerbations and the greatest deterioration in lung function during this period. The three variables with the highest importance for the compliance of the generated Random Forest (RF) models were age, FEV1%, and use of Ivacaftor/Tezacaftor. This is one of the few studies to assess adherence to CFTR modulators and symptomatic treatment longitudinally. CF patient therapy is expensive, and the assessment of variables with the highest importance for a high MPR, helped by new Machine learning tools, can contribute to defining new efficient TA strategies with higher benefits.
... Classification and regression models were assessed using different indicators. A receiver operating characteristic (ROC) curve and a confusion matrix (CM) were applied to evaluate classification models, whereas regression models were assessed using the root mean squared error (RMSE), mean absolute error (MAE), and the R² value of the predicted and actual values [18,108]. ...
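The indicators named in this excerpt can be computed as in the following Python sketch. It is a plain illustration with made-up numbers, not code from the cited review: the function names are hypothetical, R² is taken to mean the coefficient of determination (the excerpt does not say which definition was used), and the ROC curve is left out.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Counts of true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up labels and values, for illustration only.
cm = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
actual, predicted = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
scores = (mae(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

Here cm holds two true positives and one each of the other three cells, and the regression example has a single error of 1.0 on three points.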
Article
Artificial Intelligence (AI) is generating new horizons in one of the biggest challenges in the world’s society—poverty. Our goal is to investigate utilities of AI in poverty prediction via finding answers to the following research questions: (1) How many papers on utilities of AI in poverty prediction were published up until March, 2022? (2) Which approach to poverty was applied when AI was used for poverty prediction? (3) Which AI methods were applied for predicting poverty? (4) What data were used for poverty prediction via AI? (5) What are the advantages and disadvantages of the created AI models for poverty prediction? In order to answer these questions, we selected twenty-two papers using appropriate keywords and the exclusion criteria and analyzed their content. The selection process identified that, since 2016, publications on AI applications in poverty prediction began. Results of our research illustrate that, during this relatively short period, the application of AI in predicting poverty experienced a significant progress. Overall, fifty-seven AI methods were applied during the analyzed span, among which the most popular one was random forest. It was revealed that with the adoption of AI tools, the process of poverty prediction has become, from one side, quicker and more accurate and, from another side, more advanced due to the creation and possibility of using different datasets. The originality of this work is that this is the first sophisticated survey of AI applications in poverty prediction.
... In the case of bovines, the infectious diseases that were considered for the analysis (assuming that these are diseases present in many countries [42] and/or that generate social problems [105], economic losses [98], health problems in humans [48], low quality in milk and meat [13,43,120], and restrictions in livestock movement [99]) are tuberculosis [46,65,89], salmonella [69,126], brucellosis [51,55,101], bovine viral diarrhea (BVD) [13,14,32], foot and mouth disease [41,50,91], and bovine mastitis [47,90]. Of the works, 50% are aimed at personnel who are experts in bovine infectious diseases. ...
... Only four works (8%) develop software to help achieve the objective, and eighteen works (33.9%) use computer tools to analyze the information. Table 4 shows the relation of the works according to the technique used, such as Bayesian networks [46–48,63,73,78], Markov chains [22,41,45,49,67,68,74], logistic regression [14,19,50–52,75], differential equations [19,25,31,33,43,54–56,59,60,62,65,69,76,77,79,115], contact networks [19,62,64,70], and machine learning [13,92,93]. The most-used technique is differential equations, while machine learning is presented in five jobs. ...
Article
There are different bovine infectious diseases that show economic losses and social problems in various sectors of the economy. Most of the studies are focused on some diseases (for example, tuberculosis, salmonellosis, and brucellosis), but there are few studies on other diseases which are not officially controlled but also have an impact on the economy. This work is a systematic literature review on models (as a theoretical scheme, generally in mathematical form) used in the epidemiological analysis of bovine infectious diseases in the dairy farming sector. In this systematic literature review, criteria were defined for cattle, models, and infectious diseases to select articles on Scopus, IEEE, Xplorer, and ACM databases. The relations between the found models (model type, function and the proposed objective in each work) and the bovine infectious diseases, and the different techniques used and the works over infectious disease in humans, are presented. The outcomes obtained in this systematic literature review provide the state-of-the-art inputs for research on models for the epidemiological analysis of infectious bovine diseases. As a consequence of these outcomes, this work also presents an approach of EiBeLec, which is an adaptive and predictive system for the bovine ecosystem, combining a prediction model that uses machine-learning techniques and an adaptive model that adapts the information presented to end users.
... Statistical learning in cattle medicine has been utilised at a herd level to predict bovine viral diarrhoea virus exposure (Machado, Mendoza and Corbellini, 2015), the distribution of exposure of herds to liver fluke (Ducheyne et al., 2015), and in the replication of complex specialist clinical veterinary decisions in an automated method of determining mastitis origin (Hyde et al., 2020). It has also been applied at an individual level in the prediction of fertility outcomes (Fenlon et al., 2016), high somatic cell counts (Ebrahimie et al., 2018), and the prediction of calving (Fenlon et al., 2017). ...
Article
The effective management of preweaned calves is one of the most important areas of dairy farm management, and can have substantial impacts in terms of health, welfare, and productivity. It is critical that veterinary advisors are able to implement proactive changes to housing and management likely to not only result in the largest improvements, but also be applicable to the majority of farms. The majority of statistical techniques are intended for a relatively low dimensional setting, and conventional statistical approaches are often poorly suited to problems where the number of variables exceed the number of observations. The number of potential variables available in today’s data rich environment is increasing, and the robust identification of causal variables is becoming increasingly important. Statistical learning techniques represent important tools in identifying factors associated with calf health and performance on dairy farms. In Chapter 2, over 21 million cattle deaths were analysed utilising national birth and death registrations from the national British Cattle Movement Service, to quantify the temporal incidence rate, distributional features, and factors affecting variation in mortality rates in calves in GB since 2011. Alongside providing a first benchmark for calf mortality rates in GB, factors associated with mortality rates were further explored utilising multivariate adaptive regression spline models and suggest that environmental conditions such as mean monthly environmental temperature and month of birth play a significant role in calf mortality rates at a national level. To identify the most important factors for optimal calf health and performance, the farm management practices were collected from 60 farms, resulting in a large number of potential housing and management variables affecting calf performance. 
In Chapter 3, an elastic net was used in combination with stability selection techniques, and these were utilised to identify which factors were most likely to improve calf performance on the majority of farms, and included management areas such as stocking demographics, milk/colostrum feeding, environmental hygiene and environmental temperature. Colostrum samples were also collected from enrolled farms, and in Chapter 4, a first benchmark of bacterial levels within colostrum on GB dairy farms was provided. Bacterial levels were significantly higher when colostrum was collected using equipment than taken directly from the cow’s teat, suggesting interventions to reduce bacterial contamination should focus on the hygiene of collection and feeding equipment. An automated backwards stepwise mixed effects regression model was utilised in conjunction with stability selection to identify a small number of variables likely to have the largest effect on colostrum hygiene on the largest number of farms and suggest that the cleaning of colostrum collection and feeding equipment after every use should be performed with hot water as opposed to cold water, and hypochlorite or peracetic acid as opposed to water or parlour wash. It is important that associations identified during observational studies are not interpreted as causal, and the most important variables identified in Chapters 2, 3 and 4 were tested as a calf health plan intervention in a randomised controlled trial to elucidate causality, as described in Chapter 5. Health and performance outcomes were analysed for 60 dairy farms randomly allocated to receive the health plan as an intervention. Growth rates were higher for calves on farms receiving the plan for both male or beef and dairy heifer calves, and results from regression models suggest that male or beef calves had significantly higher growth rates on farms receiving the plan than on those that were not. 
Model predictions suggest that a farm with the highest number of interventions in place (15) compared to farms with the lowest number of interventions in place (4) would expect an improvement in mean growth rates from 0.65kg/d to 0.81kg/d for male or beef calves, from 0.73kg/d to 0.88kg/d for dairy heifers, a decrease in mortality rates from 10.9% to 2.8% in male or beef calves, and a decrease in diarrhoea rates from 42.1% to 15.1% in dairy heifers. Neonatal calves are relatively susceptible to heat loss, and Chapters 2 and 3 suggested that reduced environmental temperatures are associated with increased calf mortality, and reduced growth rates. The aim of Chapter 6 was to evaluate the impact of calf jackets and supplementary heat sources on the growth rates of preweaned calves in a randomised controlled trial. Seventy-nine calves from a single British dairy farm were randomly allocated to receive heat lamps or calf jackets in a factorial study design. Regression model results suggest 1kW heat lamp usage significantly improved growth rates by around 90g/d, and no effect of jackets was identified. A significant, positive impact of increased pen temperature on calf ADG was also identified in this study and was reinforced when including prior information from Chapter 3 within a Bayesian framework. The research presented in this thesis utilised a range of statistical learning techniques to identify factors associated with calf performance, which have been tested in a randomised controlled trial. To provide farmers and veterinarians with access to the calf health findings of the thesis in interactive form, the University of Nottingham Herd Health Toolkit (www.nottingham.ac.uk/herdhealthtoolkit) was created, including tools relating to the management of colostrum, prediction of mortality rates and ultimately a bespoke calf health plan based on user inputs. 
A number of statistical learning techniques within the field of stability selection were developed in parallel to this thesis, and the stabiliser R package was created to allow these techniques to be utilised by the wider research community.
... Despite the advantages over parametric models in relation to predictive performance, there are several relevant limitations to the machine learning methodology utilized here (Elith et al., 2008; Lucas, 2020; Machado et al., 2015; Rabinovich et al., 2021) including the presence of interactions among variables that may not be identified and the lack of an independent study sample, potentially impacting model performance and leading to spurious interpretations (Boulesteix et al., 2015; Oh, 2019; Strobl et al., 2009; Wright et al., 2016). The implementation of methods such as interaction forests and pairwise importance techniques improves the identification of interactions in machine learning algorithms (Hornung & Boulesteix, 2021; Wright et al., 2016), while individual conditional expectation (ICE) plots and the iBreakDown model are capable of accounting for known interactions in their post hoc interpretations (Biecek & Burzykowski, 2021; Goldstein et al., 2015). ...
Article
Effective biosecurity practices in swine production are key in preventing the introduction and dissemination of infectious pathogens. Ideally, on-farm biosecurity practices should be chosen by their impact on bio-containment and bio-exclusion; however, quantitative supporting evidence is often unavailable. Therefore, the development of methodologies capable of quantifying and ranking biosecurity practices according to their efficacy in reducing disease risk has the potential to facilitate better informed choices of biosecurity practices. Using survey data on biosecurity practices, farm demographics, and previous outbreaks from 139 herds, a set of machine learning algorithms were trained to classify farms by porcine reproductive and respiratory syndrome virus status, depending on their biosecurity practices and farm demographics, to produce a predicted outbreak risk. A novel interpretable machine learning toolkit, MrIML-biosecurity, was developed to benchmark farms and production systems by predicted risk, and quantify the impact of biosecurity practices on disease risk at individual farms. Quantifying the variable impact on predicted risk, 50% of 42 variables were associated with fomite spread while 31% were associated with local transmission. Results from machine learning interpretations identified similar results, finding substantial contribution to predicted outbreak risk from biosecurity practices relating to: the turnover and number of employees; the surrounding density of swine premises and pigs; the sharing of haul trailers; distance from the public road; and farm production type. In addition, the development of individualized biosecurity assessments provides the opportunity to better guide biosecurity implementation on a case-by-case basis. 
Finally, the flexibility of the MrIML-biosecurity toolkit gives it the potential to be applied to wider areas of biosecurity benchmarking, to address biosecurity weaknesses in other livestock systems and industry relevant diseases.