Article

Information Theory and an Extension of the Maximum Likelihood Principle

Authors:

... Upstream and downstream dam passages were modeled separately. Akaike Information Criterion (AIC) values were used to compare the relative fit of all candidate models (Akaike, 1973). Confidence models were selected as those with a ΔAIC < 2 (Royall, 1997). ...
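The ΔAIC rule in this snippet can be illustrated with a minimal Python sketch. The model names and AIC values are hypothetical. The 2-unit cutoff follows the snippet.

```python
def delta_aic(aic_by_model):
    """Return each model's AIC difference from the best (lowest-AIC) model."""
    best = min(aic_by_model.values())
    return {name: aic - best for name, aic in aic_by_model.items()}


def confidence_set(aic_by_model, cutoff=2.0):
    """Models whose delta-AIC is strictly below the cutoff, sorted best first."""
    deltas = delta_aic(aic_by_model)
    kept = [name for name, d in deltas.items() if d < cutoff]
    return sorted(kept, key=lambda name: deltas[name])


# Hypothetical AIC values for three candidate passage models.
aics = {"flow": 210.4, "flow+temp": 209.1, "null": 230.0}
print(confidence_set(aics))  # ['flow+temp', 'flow']
```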
... The presence events were modeled using a generalized linear model (GLM) with a binomial distribution and a logit link function. Candidate models were compared using AIC (MuMIn R package; Akaike, 1973; Barton, 2019). Presence of an individual (0 = not present, 1 = present) on a given day for each paddlefish or bigheaded carp was chosen as the response variable for the candidate models. ...
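A binomial GLM with a logit link can be fitted by Newton-Raphson, which for the canonical logit link is equivalent to iteratively reweighted least squares. The pure-Python sketch below makes several simplifying assumptions:

- It handles only an intercept plus a single predictor.
- It reports AIC = −2 log L + 2k.
- The covariate and presence data are hypothetical.

In practice one would use R's `glm()` with `family = binomial`, together with MuMIn, as the snippet describes.

```python
import math


def fit_logistic(x, y, iters=50):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return (b0, b1, logL)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        loglik += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return b0, b1, loglik


def aic(loglik, k):
    """AIC = -2 log L + 2k."""
    return -2.0 * loglik + 2 * k


x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical covariate (e.g. daily discharge index)
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]  # hypothetical daily presence (0/1)
p_bar = sum(y) / len(y)  # intercept-only MLE of the presence probability
ll_null = sum(yi * math.log(p_bar) + (1 - yi) * math.log(1 - p_bar) for yi in y)
b0, b1, ll_slope = fit_logistic(x, y)
print(aic(ll_null, 1), aic(ll_slope, 2))
```

Here the model with the covariate should have the lower AIC, because the presences cluster at higher covariate values.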
... In addition to the global model, 76 candidate models were created using combinations of the eight explanatory variables. Akaike Information Criterion (AIC) values were used to compare the relative fit of all candidate models (Akaike, 1973), and the best-performing models were those with the lowest AIC values. The best-fitting candidate models also had the highest model weights. ...
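The snippet does not state how model weights were computed. Assuming the standard Akaike weights, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), a weight is a decreasing function of AIC. That is why the lowest-AIC models also carry the highest weights. The AIC values below are hypothetical.

```python
import math


def akaike_weights(aic_by_model):
    """Akaike weights: exp(-delta/2) for each model, normalised to sum to 1."""
    best = min(aic_by_model.values())
    rel = {m: math.exp(-(a - best) / 2.0) for m, a in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Hypothetical AIC values for three of the candidate models.
weights = akaike_weights({"m1": 100.0, "m2": 102.0, "m3": 110.0})
print(weights)
```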
Article
Movement and dispersal of migratory fish species are important life-history characteristics that can be impeded by navigation dams. Although habitat fragmentation may be detrimental to native fish species, it might act as an effective and economical barrier for controlling the spread of invasive species in riverine systems. Various technologies have been proposed as potential fish deterrents at locks and dams to reduce the range expansion of bigheaded carp (i.e., silver carp and bighead carp (Hypophthalmichthys spp.)) in the Upper Mississippi River (UMR). Lock and Dam (LD) 15 is infrequently at open-river condition (spillway gates completely open; hydraulic head across the dam
... The discrimination capacity of the model was assessed by the area under the receiver operating characteristic (ROC) curve (AUC) 76,77. Finally, we calculated the Akaike Information Criterion (AIC) to assess their parsimony 78. The AIC compares the relative quality of the candidate models and identifies the best one 78. ...
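The AUC can be computed as the probability that a randomly chosen presence receives a higher predicted score than a randomly chosen absence, with ties counting one half. The minimal sketch below uses hypothetical scores and labels. The pairwise loop is quadratic, which is acceptable for datasets of a few hundred sightings.

```python
def auc(scores, labels):
    """ROC AUC as P(score of a random presence > score of a random absence).

    Ties count as one half (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed presence (1) / absence (0).
print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 means no discrimination, and 1.0 means every presence outscores every absence.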
Article
Deep-habitat cetaceans are generally difficult to study, which limits knowledge of their populations. This paper assesses the differential distribution patterns of three deep-habitat cetaceans (sperm whale, Physeter macrocephalus; Risso’s dolphin, Grampus griseus; and Cuvier’s beaked whale, Ziphius cavirostris). We used data from 842 opportunistic sightings of cetaceans in the western Mediterranean Sea. We inferred environmental and spatio-temporal factors that affect their distribution. Binary logistic regression models were generated to compare the presence of deep-habitat cetaceans with the presence of other cetacean species in the dataset. Then, the favourability function was applied, allowing comparison between all the models. Sperm whale and Risso’s dolphin presence was differentially favoured by distance to towns in the eastern part of the western Mediterranean Sea. The differential distribution of the sperm whale was also influenced by the stability of SST, and that of Risso’s dolphin by lower mean salinity and higher mean chlorophyll a concentration. When the three deep-habitat cetaceans (including Cuvier’s beaked whale) were modelled together, the variable distance to towns had a negative influence on the presence of each of them, more than it did on other cetaceans, with conditions being more favourable far from towns; this issue should be further investigated.
... The solutions of the mixed-model equations were obtained using maximum likelihood, as implemented in the MIXED procedure of the SAS package version 9.1 (SAS Institute Inc., Cary, NC, USA). The goodness of fit of the models was evaluated through −2LogL (log-likelihood), AIC (Akaike information criterion; Akaike, 1973), corrected AIC and BIC (Bayesian information criterion; Schwarz, 1978). The AIC and BIC goodness-of-fit measures were estimated as follows: AIC = −2LogL + 2k and BIC = −2LogL + log(n) × k, where LogL is the log-likelihood estimated from each assessed genetic model, k is the number of parameters (Akaike, 1973) and n is the number of records. The lower these statistics, the better the model. ...
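The AIC and BIC formulas in this snippet translate directly into code. The AICc function has to rely on assumptions:

- It uses the common small-sample correction AIC + 2k(k + 1)/(n − k − 1). The snippet names corrected AIC but does not give its formula.
- The natural logarithm is assumed for BIC.
- The log-likelihood and counts in the usage line are hypothetical.

```python
import math


def aic(loglik, k):
    """AIC = -2 LogL + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """BIC = -2 LogL + log(n) * k, with the natural logarithm."""
    return -2.0 * loglik + math.log(n) * k


def aicc(loglik, k, n):
    """Small-sample corrected AIC (assumed form): AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical model: LogL = -1000 with 5 parameters fitted to 100 records.
print(aic(-1000.0, 5), bic(-1000.0, 5, 100), aicc(-1000.0, 5, 100))
```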
Article
One of the most important aspects of genetic evaluation (GE) is the definition of contemporary groups (CG), commonly defined as animals of the same sex born in the same herd, year and season. The objective of this study was to use an aridity index (AI) to classify season and to evaluate its implications for the GE of Braunvieh cattle. Data sets of 32,777 birth weight (BW) records and 22,448 records of weaning weight adjusted to 240 days (WW) were used to compare two methods of classifying climatic seasons for defining CG in GE models. The first method used a rain season criterion (RC), and the second was a proposed classification using an AI. Both methods were compared using two approaches. The first approach examined differences in mixed models using the RC and AI seasons to select the best model for BW and WW, evaluated by different goodness-of-fit measures. The second approach fitted a GE model including the season classifications in the CG structure. Lower probability values for the season effect and better goodness-of-fit measures were obtained when the season was classified according to the AI. Results showed that although differences are small, the AI allows a better model fit for live-weight traits than RC and revealed a re-ranking effect on EPD data. Further analyses with other traits could demonstrate the broader utility of AI indicators for fitting models under a changing climate.
... 3) A full model was constructed by adding all variables into one multivariate regression. The corrected Akaike information criterion (AICc) was computed for each of the models, where a lower AICc indicates a better model fit and an AICc at least 2 units lower suggests a significantly better fit (44,45). 4) If multiple explanatory variables from (2) had AICc scores at least 2 units lower than the simple model, the variables were added to a model together with group and time, and the AICc was calculated once more. ...
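Step 4 can be expressed as a filter on AICc differences. In this sketch the AICc values and variable names are hypothetical. The inclusion rule is taken to be "at least 2 units lower than the simple model".

```python
def improves_by(aicc_simple, aicc_candidate, margin=2.0):
    """True if the candidate's AICc is at least `margin` units below the simple model's."""
    return aicc_simple - aicc_candidate >= margin


def variables_to_combine(aicc_simple, aicc_single):
    """Variables whose single-variable model beats the simple model by >= 2 AICc units."""
    return [v for v, a in aicc_single.items() if improves_by(aicc_simple, a)]


# Hypothetical AICc values: simple model (group + time) vs. adding one variable each.
single = {"fasting_glucose": 147.5, "fasting_glycerol": 149.2, "age": 151.0}
print(variables_to_combine(150.0, single))
```

Only variables passing this filter would then be entered together in the combined model, whose AICc is computed again.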
Article
Gestational diabetes mellitus (GDM) is associated with considerable imbalances in intestinal microbiota that may underlie pathological conditions in both mothers and infants. To more definitively identify these alterations, we evaluated the maternal and infant gut microbiota through the shotgun metagenomic analysis of a subset of stool specimens collected from a randomized, controlled trial in diet-controlled women with GDM. The women were fed either a CHOICE diet (60% complex carbohydrate/25% fat/15% protein, n=18) or a conventional diet (CONV, 40% complex carbohydrate/45% fat/15% protein, n=16) from 30 weeks’ gestation through delivery. In contrast to other published studies, we designed the study to minimize the influence of other dietary sources by providing all meals, which were eucaloric and similar in fiber content. At 30 and 37 weeks’ gestation, we collected maternal stool samples; performed the fasting measurements of glucose, glycerol, insulin, free fatty acids, and triglycerides; and administered an oral glucose tolerance test (OGTT) to measure glucose clearance and insulin response. Infant stool samples were collected at 2 weeks, 2 months, and 4–5 months of age. Maternal glucose was controlled to conventional targets in both diets, with no differences in Homeostatic Model Assessment of Insulin Resistance (HOMA-IR). No differences in maternal alpha or beta diversity between the two diets from baseline to 37 weeks’ gestation were observed. However, women on CHOICE diet had higher levels of Bifidobacteriaceae, specifically Bifidobacterium adolescentis, compared with women on CONV. Species-level taxa varied significantly with fasting glycerol, fasting glucose, and glucose AUC after the OGTT challenge. Maternal diet significantly impacted the patterns of infant colonization over the first 4 months of life, with CHOICE infants showing increased microbiome alpha diversity (richness), greater Clostridiaceae, and decreased Enterococcaceae over time. 
Overall, these results suggest that an isocaloric GDM diet containing greater complex carbohydrates with reduced fat leads to an ostensibly beneficial effect on the maternal microbiome, improved infant gut microbiome diversity, and reduced opportunistic pathogens capable of playing a role in obesity and immune system development. These results highlight the critical role a maternal diet has in shaping the maternal and infant microbiome in women with GDM.
... If decision makers used more heuristics than this, they would be reintroducing the complexity they are trying to get rid of by using heuristics. Occam's razor and formal work on statistical model selection (Akaike 1970, Akaike 1998, Box et al. 2015, Krugman 2000, Lee 1973, Smith 1997, Stoica and Soderstrom 1982, Tukey 1961) and human decision making (Czerlinski et al. 1999, Gigerenzer and Gaissmaier 2011, Marewski et al. 2010) demonstrate that when two models fit the data equally well, the simpler one is generally more accurate. Moreover, simpler models are more testable because they have fewer assumptions. ...
Article
Background Sustainable transport is fundamental to progress in realising the agenda of sustainable development, as a quarter of energy-related global greenhouse gas emissions come from the transport sector. In developing countries, metropolitan areas have adopted the agenda to better serve the urban population with safe, affordable, and environmentally-friendly transport systems. However, this drive must include relevant indicators and how their operationalisation can deal with institutional barriers, such as challenges to cross-sectoral coordination. Objective This study aims to explore context-specific indicators for developing countries, focusing on the case of the Jakarta metropolitan area. Methods Expert judgement was used to assess the selection criteria. The participants were experts from government institutions, non-government organisations, and universities. Results The findings show that safety, public transport quality, transport cost, air pollution, and accessibility are contextual indicators for application in developing countries. Similarities are shown with the research results from other indexes/sets of indicators for developing countries, for example, the Sustainable Urban Transport Index (SUTI) of UN ESCAP. However, some of these indicators leave room for improvement, such as the balance between strategic and operational levels of application. Conclusion Therefore, this research suggests that global sets of indicators should be adjusted before being implemented in particular developing country contexts.
... The reason for this is twofold: first, we aim to provide a gentle introduction to the MDL principle for the unacquainted reader by starting from more familiar branches of statistics; second, our proposed MDL formulation of robust subgroup discovery is related to concepts from other branches, such as model comparison with Bayes factors or multiple hypothesis testing. Note that we do not delve into the related Akaike Information Criterion (AIC) (Akaike 1998). Although it is usually advantageous in predictive settings (Grünwald and Roos 2019), the AIC has a higher rate of false positives and a stronger bias towards more complex models than the Bayesian Information Criterion (BIC) (Rouder et al. 2009) and, consequently, than our MDL formulation (as it asymptotically converges to BIC up to a constant; see "Appendix B.1"). ...
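The penalty difference behind this remark can be made explicit. Let ℓ be the maximized log-likelihood, k the number of parameters, n the sample size, and Δℓ the gain in ℓ from adding one parameter. The derivation below compares the likelihood gain each criterion requires before accepting one extra parameter.

```latex
\begin{aligned}
\mathrm{AIC} &= -2\ell + 2k, \qquad \mathrm{BIC} = -2\ell + k\ln n,\\
\Delta\mathrm{AIC} &= -2\,\Delta\ell + 2 < 0 \iff \Delta\ell > 1,\\
\Delta\mathrm{BIC} &= -2\,\Delta\ell + \ln n < 0 \iff \Delta\ell > \tfrac{1}{2}\ln n,\\
\tfrac{1}{2}\ln n &> 1 \iff n > e^{2} \approx 7.39 .
\end{aligned}
```

For any sample of 8 or more observations, BIC therefore requires a larger likelihood gain than AIC before it accepts the extra parameter. This is one way to see AIC's tendency towards more complex models.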
Article
We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, including traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, finding optimal subgroup lists is NP-hard. Therefore, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration. In fact, the greedy gain is shown to be equivalent to a Bayesian one-sample proportion, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. Furthermore, we empirically show on 54 datasets that SSD++ outperforms previous subgroup discovery methods in terms of quality, generalisation on unseen data, and subgroup list size.
... To select these variables, one can use, among other things, the decision trees mentioned earlier, or rely on odds ratio plots; there is also the so-called Akaike criterion (AIC for short), used to select the most appropriate among models with different numbers of predictors (Akaike 1973). Ultimately, the model for two variables takes the following form: Fig. ...
Article
The great investments in motorway research and construction have brought, and are still bringing, a huge set of new data. In 2019 alone, one million new archaeological artefacts were sourced. Therefore, there is the problem of systematic and comprehensive development of the obtained sources, in which statistics may be helpful. The article introduces selected statistical methods and shows examples of their use. It focuses on their usefulness in archaeological research, and thus it may become a boost for their wider use in the development of archaeological sources.
... The covariate resulting in the smallest BIC was retained as the base membership predictor; covariates were then added iteratively. Additional covariates were kept if the resulting Akaike Information Criterion (AIC) improved over the prior membership predictor(s) [22]. Every individual has a probability of belonging to each of the latent classes; the probabilities for each individual sum to 1.0. ...
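One plausible reading of this iterative procedure is a greedy forward search that starts from the BIC-selected base predictor. At each step it adds whichever remaining covariate lowers AIC the most, and it stops when no addition improves AIC. The following are all hypothetical: the scoring function, the covariate names, and the AIC table. In the study, each score would come from refitting the latent class trajectory model.

```python
def forward_select(candidates, score, base):
    """Greedy forward selection: add covariates while AIC keeps improving.

    `score(covariates)` is a caller-supplied (hypothetical) function that
    returns the AIC of the membership model with those covariates.
    """
    selected = [base]
    best = score(selected)
    remaining = [c for c in candidates if c != base]
    while remaining:
        trial = {c: score(selected + [c]) for c in remaining}
        c_best = min(trial, key=trial.get)
        if trial[c_best] >= best:
            break  # no remaining covariate improves AIC
        selected.append(c_best)
        remaining.remove(c_best)
        best = trial[c_best]
    return selected, best


# Hypothetical AIC table keyed by covariate set.
aic_table = {
    frozenset({"ajc"}): 500.0,
    frozenset({"ajc", "mtx"}): 495.0,
    frozenset({"ajc", "rf"}): 499.0,
    frozenset({"ajc", "mtx", "rf"}): 496.0,
}
selected, best = forward_select(["ajc", "mtx", "rf"], lambda covs: aic_table[frozenset(covs)], "ajc")
print(selected, best)
```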
Article
Aims 1) To delineate latent classes of treatment response to biologics in juvenile idiopathic arthritis (JIA) patients in the first 16 weeks after initiation. 2) To identify predictors of early disease response. Methods The study population was drawn from four biologics trials in polyarticular course JIA: Etanercept 2000, Abatacept 2008, TRial of Early Aggressive Therapy (TREAT) 2012 and Tocilizumab 2014. The outcome was active joint counts (AJC). Semiparametric latent class trajectory analysis was applied to identify latent classes of response to treatment; AJC was transformed for this modelling. We tested baseline disease and treatment characteristics for their abilities to predict class membership of response. Results There were 480 participants, 74% females. At baseline, 26% were rheumatoid factor positive. 67% were on methotrexate at enrollment. A three-latent-class solution provided the best fit. Baseline AJC was the sole best predictor of class membership. Classified by their highest membership probabilities, participants fell into high baseline AJC (> 30) with slow response (26.5%), low baseline AJC (< 10) with early and sustained response (29.7%), and moderate baseline AJC with progressive response (43.8%). Participants were classified into the latent classes with a mean class membership posterior probability of 0.97. Those on methotrexate at baseline were less likely to belong to the high baseline AJC class. Conclusions Three latent classes of responses were detectable in the first 16 weeks of biologics therapy. Those with the highest baseline AJC demonstrated very slow response in this window and were less likely to be on concomitant methotrexate. Trials registration TREAT 2012 (NCT00443430) (Wallace et al., Arthritis Rheum 64:2012–21, 2012), tocilizumab trial 2014 (NCT00988221), abatacept trial 2008 (NCT00095173). Etanercept 2000 from Amgen does not have a trial registration number.
... Information criteria can evaluate the goodness-of-fit of different models that are statistically estimated from the same data (Akaike 1998). In contrast to the likelihood ratio test, the use of information criteria accentuates the parsimony of the parametric model under study. ...
Article
This study explored how the presence of work zones could influence the saturation flow rate (SFR) prevailing at an intersection. Specifically, it researched construction-ridden intersections with interweaving movements (CIWIMs) of vehicle flows that proceed across the stop line and down the connective lanes on the downstream approach to the adjacent intersection. First, image recognition and tracking algorithms were used to extract 2,545 vehicle trajectories from the video captured on-site. Then, the trajectories were processed by lane to obtain the saturated headway time at the entry-lane stop line during effective green time and the variables related to lane-change behaviors after passing the stop line (e.g., lane-change percentage, lane-change position, lorry percentage, and average passing speed). In addition, linear and nonlinear regression methods were employed to estimate lane-focused SFR models in a parsimonious fashion. Subsequently, the Highway Capacity Manual (HCM) model, along with Schroeder's model, was pairwise compared with the newly proposed Box-Cox model for validation. The results indicate that the mean errors are 28.86% and 17.70% for the HCM and Schroeder's models, respectively, while the estimation error for the Box-Cox model is merely 7.20%. Sensitivity analysis reveals that the proportion of bidirectional lane changes, the spatial use rate of lane changes, and the proportion of heavier vehicles significantly compromise the CIWIM-based SFR. One important finding is that the models accounting for microscopic lane-change behaviors, which have higher estimation accuracy than existing models, can also be used for traffic simulation parameter calibration and road delay estimation to obtain higher validity and precision.
... The calibration data were fitted to mixture Poisson models with K = 2, 3, 4 and 5 components. Based on the Akaike information criterion (AIC [18]), we found that the best model was the mixture Poisson model with K = 4 components. Moreover, we chose a model where the parameter u_k is equal in all four components and from now on it will be denoted as u. ...
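Choosing K by AIC requires the maximized log-likelihood of each candidate mixture. The sketch below fits a univariate Poisson mixture by expectation-maximization (EM). It counts 2K − 1 free parameters: K rates plus K − 1 mixing weights. It is an illustration under stated assumptions:

- The counts are hypothetical.
- It starts from deterministic quantile-based rates.
- It does not impose the shared parameter u from the snippet.

```python
import math


def poisson_logpmf(x, lam):
    """log P(X = x) for X ~ Poisson(lam)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def component_logs(x, weights, rates):
    """log(w_j * pmf_j(x)) for each component; -inf for an empty component."""
    return [(math.log(w) if w > 0 else -math.inf) + poisson_logpmf(x, lam)
            for w, lam in zip(weights, rates)]


def fit_poisson_mixture(data, k, iters=200):
    """EM for a k-component Poisson mixture; returns (weights, rates, loglik)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: one rate taken from each quantile band of the data.
    rates = [ordered[int((j + 0.5) * n / k)] + 0.5 for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalised with the log-sum-exp trick.
        resp = []
        for x in data:
            logs = component_logs(x, weights, rates)
            m = max(logs)
            z = [math.exp(v - m) for v in logs]
            s = sum(z)
            resp.append([zi / s for zi in z])
        # M-step: mixing weights and responsibility-weighted means.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj > 0:
                rates[j] = max(sum(r[j] * x for r, x in zip(resp, data)) / nj, 1e-10)
    loglik = 0.0
    for x in data:
        logs = component_logs(x, weights, rates)
        m = max(logs)
        loglik += m + math.log(sum(math.exp(v - m) for v in logs))
    return weights, rates, loglik


def mixture_aic(loglik, k):
    """AIC with k rates plus k - 1 free mixing weights."""
    return -2.0 * loglik + 2 * (2 * k - 1)


data = [0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 18, 22, 20, 19, 21, 25, 17, 20, 23, 19]  # hypothetical counts
aics = {k: mixture_aic(fit_poisson_mixture(data, k)[2], k) for k in (1, 2, 3)}
print(aics)
```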
Preprint
To predict the health effects of accidental or therapeutic radiation exposure, one must estimate the radiation dose that person received. A well-known ionising radiation biomarker, phosphorylated gamma-H2AX protein, is used to evaluate cell damage and is thus suitable for the dose estimation process. In this paper, we present new Bayesian methods that, in contrast to approaches where estimation is carried out at predetermined post-irradiation times, allow for uncertainty regarding the time since radiation exposure and, as a result, produce more precise results. We also use the Laplace approximation method, which drastically cuts down on the time needed to get results. Real data are used to illustrate the methods, and analyses indicate that the models might be a practical choice for the gamma-H2AX biomarker dose estimation process.
... Therefore, in Table 8, the AIC and BIC values of the NB model were considerably lower than those of the Poisson regression. Thus, the NB model showed better performance than the Poisson regression [35]. Table 9 illustrates the NB results. ...
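The comparison in Table 8 rests on the negative binomial (NB) model absorbing overdispersion that the Poisson model cannot. The sketch below makes several simplifying assumptions:

- Both models are intercept-only, so the MLE of the mean is the sample mean in both.
- The data are hypothetical maneuver times in whole seconds.
- The NB size parameter r is found by a coarse grid search, chosen for brevity. Statistical software would use a numerical optimizer instead.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood with common mean mu."""
    return sum(yi * math.log(mu) - mu - math.lgamma(yi + 1) for yi in y)


def negbin_loglik(y, mu, r):
    """NB2 log-likelihood with mean mu and size r, so that Var = mu + mu**2 / r."""
    return sum(
        math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
        + r * math.log(r / (r + mu)) + yi * math.log(mu / (r + mu))
        for yi in y
    )


def compare_intercept_only(y):
    """Return (AIC Poisson, AIC NB) for intercept-only models."""
    mu = sum(y) / len(y)  # MLE of the mean for both models
    aic_pois = -2.0 * poisson_loglik(y, mu) + 2 * 1
    # Coarse grid over r in (0, 100]; an assumption made to keep the sketch short.
    best_ll = max(negbin_loglik(y, mu, 0.05 * i) for i in range(1, 2001))
    aic_nb = -2.0 * best_ll + 2 * 2  # mean plus size parameter
    return aic_pois, aic_nb


times = [8, 9, 10, 12, 30, 7, 11, 25, 9, 10, 40, 6]  # hypothetical maneuver times (s)
print(compare_intercept_only(times))
```

With these overdispersed data (sample variance well above the mean), the NB model should have the lower AIC despite its extra parameter.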
Article
On-street parking can be considered one of the most well-known kinds of parking in urban areas. This study investigated the factors affecting parking maneuver time. These factors included driver characteristics (gender, age, weight, and clothing type), parking space-related factors (parking space length and parking permit), maneuver type, lighting condition, and vehicle length. The effect of these factors was measured using several statistical methods, including Poisson and negative binomial regressions. Analysis results showed that illegal parking significantly reduced the parking maneuver time, which was attributed to the larger parking spaces available at illegal locations. Moreover, front-in parking took less time than reverse parking: front-in parking takes just one movement, whereas reverse parking requires at least two, making it more time-consuming. Because parking space length was inversely related to maneuver time, increasing the length of the parking space reduced the maneuver time. Maneuver time was longer at night than during the daytime; with inadequate lighting, drivers perform the parking maneuver more carefully, which adds time to the process. Furthermore, the average maneuver time was 12.44 s, significantly different from the 18 s indicated in the Highway Capacity Manual.
... K-fold cross-validation (Kohavi, 1995) was applied to compare and select the best teak stand growth modeling system. The dataset was randomly split into K subsamples (folds) of equal size (K = 10). In each round, K − 1 subsamples were used to develop the models and calculate AIC (Akaike, 1973) and adjusted R², and the remaining subsample was used to validate the models and estimate errors such as percent bias, root mean squared error (RMSE), and mean absolute percent error (MAPE, %). Finally, all those statistics were averaged over the 10 rounds. ...
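The fold bookkeeping can be sketched as below, with several simplifications:

- A straight-line model stands in for the teak growth models, which are nonlinear.
- Only held-out RMSE is shown. The study also computed AIC and adjusted R² on the training folds, and percent bias and MAPE on the held-out fold.
- The seed, function names and data are hypothetical.

```python
import math
import random


def kfold_indices(n, k=10, seed=1):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def cv_rmse(x, y, k=10):
    """Average held-out RMSE of the straight-line model over k folds."""
    scores = []
    for test in kfold_indices(len(x), k):
        test_set = set(test)
        train = [i for i in range(len(x)) if i not in test_set]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        se = [(y[i] - (a + b * x[i])) ** 2 for i in test]
        scores.append(math.sqrt(sum(se) / len(se)))
    return sum(scores) / len(scores)
```

Each observation is held out exactly once, so the averaged error reflects performance on data not used for fitting.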
Article
We developed a system for modeling the growth and yield of planted teak (Tectona grandis L.f.) for small-diameter products under varying management regimes in the tropical Central Highlands of Viet Nam. We compared independent and simultaneous systems of models to predict dominant height (Ho), quadratic mean diameter (Dg), averaged tree height (Hg) with Dg, and mean tree volume (V) versus stand age (A). In addition, the performance of the model system with and without site index (SI) and stand density (N) as covariates was compared using K-fold cross-validation. The best modeling system was obtained with the simultaneously fit models that included SI and N and were of the form: Dg = Dm/(1 + a × exp(−b × A)) × exp[e1 × (SI − 15) + e2/1000 × (N − 722)]; Hg = Hm × exp(−a × exp(−b × A)) × exp[e1 × (SI − 15) + e2/1000 × (N − 722)]; and V = π/4 × 10⁻⁴ × Dg² × Hg × 0.45; where Dm, Hm, a, b, e1 and e2 were the parameters to be estimated. These models will help predict the growth and yield of teak planted under different planting schemes, including monoculture, agroforestry, and forest enrichment planting in this region.
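The fitted model forms can be written as plain functions. The abstract does not report estimates, so all parameter values here are hypothetical placeholders. The abstract reuses the symbols a, b, e1 and e2 across equations; the sketch assumes each equation has its own set. Units are assumed to be cm for Dg, m for Hg and m³ for V, which is consistent with the 10⁻⁴ factor converting cm² to m². The 0.45 appears to act as a form factor.

```python
import math


def site_density_modifier(si, n, e1, e2):
    """exp[e1*(SI - 15) + e2/1000*(N - 722)]; equals 1 at SI = 15 and N = 722."""
    return math.exp(e1 * (si - 15) + e2 / 1000 * (n - 722))


def dg(age, si, n, dm, a, b, e1, e2):
    """Quadratic mean diameter: logistic-type curve in stand age, scaled by SI and N."""
    return dm / (1 + a * math.exp(-b * age)) * site_density_modifier(si, n, e1, e2)


def hg(age, si, n, hm, a, b, e1, e2):
    """Mean tree height: Gompertz-type curve in stand age, scaled by SI and N."""
    return hm * math.exp(-a * math.exp(-b * age)) * site_density_modifier(si, n, e1, e2)


def volume(dg_cm, hg_m):
    """Mean tree volume V = pi/4 * 1e-4 * Dg^2 * Hg * 0.45."""
    return math.pi / 4 * 1e-4 * dg_cm ** 2 * hg_m * 0.45


# Hypothetical parameter values for a 10-year-old stand with SI = 17 and N = 800.
d = dg(10, 17, 800, dm=30.0, a=5.0, b=0.4, e1=0.03, e2=-0.2)
h = hg(10, 17, 800, hm=25.0, a=2.0, b=0.3, e1=0.02, e2=-0.1)
print(round(d, 2), round(h, 2), round(volume(d, h), 4))
```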
... If negative chi-square values remained after using the strictly positive Satorra-Bentler Chi-Square Tests, we concluded that the addition of the random intercept or slope did not improve the model. This conclusion was based on the variances of the random intercepts or random slopes, which were low and on the AIC (Akaike, 1998), BIC (Schwarz, 1978), and aBIC (Sclove, 1987) values, where lower values of AIC, BIC, aBIC indicate a better model fit. "a" = models without multi-level structure. ...
Article
This study examined (a) whether growing up with lower-educated parents and attending lower parental education schools were associated with children's problem development within the behavioral, emotional, and peer relationship domains; and (b) whether the association of lower individual-level parental education with children's development within these three domains depended upon school-level parental education. To this end, 698 children (Mage = 7.08 in first grade) from 31 mainstream elementary schools were followed annually from first grade to sixth grade. Problems within the behavioral domain included conduct problems, oppositional defiant problems, attention-deficit and hyperactivity problems, and aggression. Problems within the emotional domain included depression and anxiety symptoms. Problems within the peer relationship domain included physical victimization, relational victimization, and peer dislike. Results from multi-level latent growth models showed that, as compared to children of higher-educated parents, children of lower-educated parents generally had higher levels of problems within all three domains in first grade and exhibited a faster growth rate of problems within the behavioral domain from first to sixth grade. Furthermore, as compared to children attending higher parental education schools, children attending lower parental education schools generally had higher levels of problems within the behavioral and emotional domains in first grade and showed a faster growth rate of peer dislike over time. In addition, cross-level interaction analyses showed that in higher parental education schools, children of lower-educated parents showed a faster growth rate of depression symptom levels than children of higher-educated parents. In lower parental education schools, the growth rate of depression symptom levels did not differ between children of higher- and lower-educated parents.
Results highlight that addressing the needs of lower parental education schools and children growing up with lower-educated parents may be of primary importance.
... Thus, we do not pursue this procedure in this paper. Alternatively, the maximized log-likelihood and information-based criteria, such as the Akaike information criterion (AIC) [1] and the Bayesian information criterion (BIC) [22], may be used to select the number of components. Although some success has been achieved using these model choice criteria, it is still difficult to determine the correct number of components for a mixture model, especially for high-dimensional data. ...
Article
Many data that exhibit asymmetrical behavior can be well modeled with skew-normal random errors. Moreover, data that arise from a heterogeneous population can be efficiently analyzed by a finite mixture of regression models. These observations motivate us to propose a novel finite mixture of mode regression model based on a mixture of the skew-normal distributions to explore asymmetrical data from several subpopulations. Thanks to the stochastic representation of the skew-normal distribution, we construct a Bayesian hierarchical modeling framework and then develop an efficient Markov chain Monte Carlo sampling algorithm to generate posterior samples for obtaining the Bayesian estimates of the unknown parameters and their corresponding standard errors. Simulation studies and a real-data example are presented to illustrate the performance of the proposed Bayesian methodology.
... This method was applied to¯ 2 for the proposed model (19). In addition, the Akaike information criterion (AIC) was used as a criterion for comparing the goodness of fit of the model [35]. ...
Article
Survival analysis is a set of methods for statistical inference concerning the time until the occurrence of an event. One of the main objectives of survival analysis is to evaluate the effects of different covariates on event time. Although the proportional hazards model is widely used in survival analysis, it assumes that the ratio of the hazard functions is constant over time. This assumption is likely to be violated in practice, leading to erroneous inferences and inappropriate conclusions. The accelerated failure time model is an alternative to the proportional hazards model that does not require such a strong assumption. Moreover, it is sometimes plausible to consider the existence of cured patients or long-term survivors. The survival regression models in such contexts are referred to as cure models. In this study, we consider the accelerated failure time cure model with frailty for uncured patients. Frailty is a latent random variable representing patients’ characteristics that cannot be described by observed covariates. This enables us to flexibly account for individual heterogeneities. Our proposed model assumes a shifted gamma distribution for frailty to represent uncured patients’ heterogeneities. We construct an estimation algorithm for the proposed model, and evaluate its performance via numerical simulations. Furthermore, as an application of the proposed model, we use a real dataset, Specific Health Checkups, concerning the onset of hypertension. Results from a model comparison suggest that the proposed model is superior to existing alternatives.
... Multiple linear regression will be used as the statistical model, with and without the random sample consensus (RANSAC) algorithm. The accuracy of the statistical model will be evaluated using Akaike's information criterion (AIC) [52]. All statistical analyses will be performed in Python with the scikit-learn library. ...
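Under Gaussian errors, AIC for a linear regression reduces to n ln(RSS/n) + 2k, up to an additive constant. The dependency-free sketch below computes that comparison under several assumptions:

- The gait features, scores and noise are hypothetical.
- The RANSAC variant is not shown. In the protocol's setting, scikit-learn's `LinearRegression` and `RANSACRegressor` would supply the fits.
- Conventions differ on whether k also counts the error variance. Here k is the number of regression coefficients.

```python
import math


def ols(X, y):
    """Least-squares coefficients for y ~ X (each row of X includes a leading 1).

    Solves the normal equations (X'X) beta = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for small, well-conditioned designs.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def gaussian_aic(X, y, beta):
    """AIC = n*ln(RSS/n) + 2k for a Gaussian linear model (constants dropped)."""
    n, k = len(y), len(beta)
    rss = sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return n * math.log(rss / n) + 2 * k


x1 = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical gait feature (e.g. walking speed)
x2 = [2, 1, 4, 3, 6, 5, 8, 7]  # hypothetical gait feature (e.g. stride variability)
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, -0.15, 0.05]
y = [1 + 2 * a - b + e for a, b, e in zip(x1, x2, noise)]  # hypothetical cognitive score
X_full = [[1.0, a, b] for a, b in zip(x1, x2)]
X_null = [[1.0] for _ in y]
aic_full = gaussian_aic(X_full, y, ols(X_full, y))
aic_null = gaussian_aic(X_null, y, ols(X_null, y))
print(aic_full, aic_null)
```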
Article
Background Motor dysfunctions, such as slower walking speed, precede the occurrence of dementia and mild cognitive impairment, suggesting that walking parameters are effective biomarkers for detecting early sub-clinical cognitive risk. Such motor dysfunction often co-occurs with subjective cognitive complaints, a combination called motoric cognitive risk (MCR) syndrome. Our preliminary study found several walking parameters, obtained by a three-dimensional motion capture system, to be correlated with computer-based assessments of various cognitive function modalities, although the sample size was small. The Cognitive-Gait (CoGait) Database Project, described in the current protocol, aims to establish a database of multi-dimensional walking and cognitive performance data, collected from a large sample of healthy participants, crucial for detecting early sub-clinical cognitive risk. Methods We will recruit healthy volunteers, 20 years or older, without any neurological, musculoskeletal, or psychiatric disorders. The estimated sample size is 450 participants, including a 10% attrition rate. Using computer-based cognitive assessments, participants will perform six tasks: (i) the simple reaction time task, (ii) Go/No-Go task, (iii) Stroop Color–Word Test, (iv) N-back test, (v) Trail Making Test, and (vi) digit span test. We will also conduct paper-based cognitive assessments such as the Mini-Mental State Examination, Montreal Cognitive Assessment, and the Geriatric Depression Scale-15 for assessing MCR. Gait will be measured through joint kinematics and global positioning of participants’ lower legs while walking at a comfortable pace and at a faster pace, using pants with an inertial measurement unit-based three-dimensional motion capture system. Finally, we will establish a prediction model for various cognitive performance modalities based on walking performance.
Discussion: This will be the first study to reveal the relationship between walking and cognitive performance using multi-dimensional data collected from a large sample of healthy adults from the general population. Despite certain methodological limitations, such as measurement accuracy, the CoGait database is expected to provide reference values for both walking and cognitive functions, supporting the evaluation of psychomotor function in early sub-clinical cognitive risk identification, including motoric cognitive risk syndrome.
... The analyses considered polynomial fits of order k = 2 to 4 (a constant term plus powers of age up to k − 1) for each of the 4 random effects, to identify the model that best described the data. The logarithm of the REML likelihood function (logL), Akaike's information criterion (AIC) (Akaike, 1998), and the Bayesian information criterion (BIC) (Schwarz, 1978) were computed to rank the models. The criteria were $\mathrm{AIC} = -2\log L + 2p$ and $\mathrm{BIC} = -2\log L + p\log(N - r)$, where $p$ is the number of parameters, $N$ is the number of observations, and $r$ is the rank of the incidence matrix of fixed effects. ...
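The two criteria translate directly into code. This minimal sketch takes the REML log-likelihood, the parameter count $p$, the record count $N$, and the fixed-effect rank $r$ as given; the helper name `reml_aic_bic` and every number in the usage example are hypothetical, not taken from the Malpura data.

```python
import math

def reml_aic_bic(log_l: float, p: int, n_obs: int, rank_x: int) -> tuple[float, float]:
    """AIC = -2 logL + 2p and BIC = -2 logL + p log(N - r), as defined in the text."""
    aic = -2.0 * log_l + 2.0 * p
    bic = -2.0 * log_l + p * math.log(n_obs - rank_x)
    return aic, bic

# Hypothetical REML log-likelihoods and parameter counts for two orders of fit
candidates = {"order 2": (-1520.0, 8), "order 3": (-1500.0, 14)}
N, r = 5000, 40  # hypothetical record count and rank of the fixed-effect incidence matrix

scores = {name: reml_aic_bic(ll, p, N, r) for name, (ll, p) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])   # lower is better
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)  # the two criteria can disagree
```

Because the BIC penalty per parameter, $\log(N - r)$, exceeds AIC's penalty of 2 whenever $N - r > e^2$, BIC tends to favor lower orders of fit; in this hypothetical example AIC picks order 3 while BIC picks order 2.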
Article
The presence of genotype by environment (GxE) interaction at different ages along the growth trajectory in different environments alters the ranking of sires, which can reduce the efficiency of selection in any breeding program. The objective of the present study was to compare random regression models (RRM) with conventional animal models, and to plot the growth-curve trajectory using the Legendre polynomial (LP) function of the RRM, for better genetic evaluation of growth traits in Malpura sheep. Data were collected on 8,299 animals descended from 2,529 dams and 525 sires of Malpura sheep selected for growth traits over 45 years (1975–2019) and maintained at the Animal Genetics and Breeding Division, ICAR-Central Sheep and Wool Research Institute (CSWRI), Avikanagar, Rajasthan, India. The Legendre polynomial-based random regression model (LP-RRM), which included direct genetic, direct maternal, maternal permanent environmental, and animal permanent environmental effects as random effects with different orders of fit, was more robust than the conventional animal model. The additive direct heritability (h²) estimates from the best LP-RRM (4444) were 0.176 ± 0.022, 0.367 ± 0.024, 0.314 ± 0.021, 0.323 ± 0.023, and 0.314 ± 0.023 for live body weight at birth (BWT), three months (WWT), six months (6WT), nine months (9WT), and twelve months (12WT), respectively. Trends in the variance components from the LP-RRM were comparable to those from the most comprehensive univariate animal model, which included the direct genetic, direct maternal, and maternal permanent environmental components. The h² estimates indicate further scope for genetic improvement through selection. The maternal proportion of variance accounted for 1 to 15% of the variance across the growth trajectory, indicating a low influence of maternal effects, which declined significantly after weaning.
However, we observed that the animal permanent environment accounted for 42 to 58% of the variance across the growth trajectory. Genetic correlation estimates from the LP-RRM were positive and higher (mostly > 0.80) between most time points than those from the animal model, indicating less GxE when the RRM is used. Selection of Malpura sheep for growth traits is currently carried out at six months of age. GxE interaction resulted in significant re-ranking of sires at 12WT on the basis of breeding values from the animal model. However, sire rankings were more consistent in the reaction norms plotted for breeding values at 6WT and 12WT using the LP-RRM. The RRM accounts for GxE interaction across growth trajectories more precisely. Therefore, the RRM approach is recommended for the genetic evaluation of Malpura sheep at six months of age, for unbiased, accurate, and consistent prediction of breeding values for growth traits.
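Legendre polynomial random regression models regress each random effect on polynomials of standardized age. The sketch below, assuming ages are rescaled linearly to $[-1, 1]$ and using the unnormalized Legendre polynomials from NumPy's `numpy.polynomial.legendre.legvander`, builds the covariate matrix for an order-4 fit (presumably the "4" in each position of the 4444 label). The helper name and example ages are illustrative only; some animal-breeding software instead uses normalized polynomials scaled by $\sqrt{(2n+1)/2}$.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(age: np.ndarray, order: int) -> np.ndarray:
    """Return an (n, order) matrix of Legendre polynomials P_0..P_{order-1}
    evaluated at age rescaled linearly to [-1, 1]."""
    a_min, a_max = age.min(), age.max()
    x = 2.0 * (age - a_min) / (a_max - a_min) - 1.0   # standardize age to [-1, 1]
    return legendre.legvander(x, order - 1)           # columns: P_0(x), ..., P_{order-1}(x)

ages = np.array([0.0, 90.0, 180.0, 270.0, 365.0])    # hypothetical ages in days, birth to ~12 months
Phi = legendre_covariates(ages, order=4)
print(Phi.round(3))
```

Each row of `Phi` multiplies that animal's random-regression coefficients, so the fitted trajectory of a random effect at any age is a weighted sum of the four polynomials.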
... When a smoothing spline is used, for example, the smoothing parameter is usually chosen by cross-validation or generalized cross-validation (Craven and Wahba, 1978). Alternatively, this parameter may be selected by applying the Akaike information criterion (Akaike, 1973) or the Bayesian information criterion (Schwarz, 1978). Another approach to estimating the smooth function is to use a set of basis functions whose effects are more local than those of global bases such as a Fourier expansion. ...
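To make the selection concrete, here is a sketch that compares GCV and AIC over a grid of smoothing parameters. It swaps the smoothing spline for a simpler discrete analogue, a second-difference penalized smoother on equally spaced points (often called a Whittaker smoother), and uses the trace of the smoother matrix as the effective number of parameters. The function names and simulated data are hypothetical.

```python
import numpy as np

def whittaker_fit(y: np.ndarray, lam: float):
    """Penalized least squares with a second-difference penalty:
    minimize ||y - f||^2 + lam * ||D f||^2, so f = (I + lam D'D)^{-1} y."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)            # (n - 2, n) second-difference matrix
    H = np.linalg.inv(np.eye(n) + lam * D.T @ D)   # smoother ("hat") matrix
    return H @ y, float(np.trace(H))               # fitted values, effective degrees of freedom

def gcv_and_aic(y: np.ndarray, lam: float):
    """GCV score and Gaussian AIC (additive constants dropped) for one lambda."""
    n = len(y)
    fitted, edf = whittaker_fit(y, lam)
    rss = float(np.sum((y - fitted) ** 2))
    gcv = n * rss / (n - edf) ** 2
    aic = n * np.log(rss / n) + 2.0 * edf
    return gcv, aic

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

lams = 10.0 ** np.arange(-2, 7)                    # candidate smoothing parameters
scores = [gcv_and_aic(y, lam) for lam in lams]
best_gcv = lams[int(np.argmin([s[0] for s in scores]))]
best_aic = lams[int(np.argmin([s[1] for s in scores]))]
print(best_gcv, best_aic)
```

Larger values of the smoothing parameter shrink the effective degrees of freedom toward 2, because straight lines pass through the second-difference penalty unchanged.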
Article
In this paper we present several diagnostic measures for the class of nonparametric regression models with symmetric random errors, a class that includes all continuous symmetric distributions. In particular, we derive diagnostic measures of global influence such as residuals, leverage values, Cook's distance, and the influence measure proposed by Peña (Technometrics 47(1):1–12, 2005), which measures how an observation is influenced by the rest of the observations. A simulation study evaluating the effectiveness of the diagnostic measures is presented. In addition, we develop a local influence measure to assess the sensitivity of the maximum penalized likelihood estimator of the smooth function. Finally, an example with real data is given for illustration.
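For the ordinary linear model, two of the global-influence quantities listed above have closed forms. The sketch below computes leverages $h_{ii}$ and Cook's distance $D_i = e_i^2 h_{ii} / \{p\, s^2 (1 - h_{ii})^2\}$. It assumes Gaussian least squares rather than the symmetric-error nonparametric models of the paper; the helper name and data are illustrative.

```python
import numpy as np

def leverage_and_cooks(X: np.ndarray, y: np.ndarray):
    """Leverages h_ii and Cook's distances for an OLS fit; X must include the intercept column."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix X (X'X)^{-1} X'
    h = np.diag(H)
    resid = y - H @ y                             # ordinary residuals
    s2 = resid @ resid / (n - p)                  # residual variance estimate
    cooks = resid**2 / (p * s2) * h / (1.0 - h) ** 2
    return h, cooks

# Illustrative data: an exact line with one perturbed, high-leverage end point
x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[9] += 10.0
X = np.column_stack([np.ones_like(x), x])
h, cooks = leverage_and_cooks(X, y)
print(int(np.argmax(cooks)))  # index of the perturbed point, 9
```

The leverages always sum to the number of columns of `X`, which gives a quick sanity check on the hat matrix.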
... Hence, the functionality of the white-matter microstructure is of paramount importance for all complex neural processes.[172,146] Fig. 2.3: Structure and composition of the myelin sheath wrapped around an axon. The internodes and the nodes of Ranvier are shown and labeled. ...
Thesis
This thesis demonstrates the feasibility of hydrogen (1H) magnetic resonance imaging (MRI) with ultrashort echo times for direct imaging of ultrashort-T2* components of cerebral white matter at 7 tesla, using inversion-prepared dual-echo difference imaging with ultrashort echo times (IR-Diff-UTE). The properties, opportunities, and limitations of this method are investigated at this new field strength: artifacts are reduced, MR parameters in white matter are quantified, and, on this basis, a high-resolution clinical protocol with UTE difference and fraction contrast is presented. The IR-Diff-UTE technique suppresses long-T2* signals in white matter by combining an adiabatic inversion preparation with dual-echo difference imaging. Bloch simulations were used to model and analyze the corresponding signal behavior of all relevant tissue compartments. Artifacts that arise at 7 T from aliasing and blurring of long-T2* fat signals from the scalp were reduced by shifting the IR center frequency so that the affected frequency ranges were also inverted. The T2* relaxation times of white-matter compartments were then quantified in 8 healthy volunteers. Building on this, the signal- and contrast-to-noise ratios (SNR/CNR), the artifact suppression, and the stability of the IR-Diff-UTE contrasts were evaluated in 20 healthy volunteers. Finally, the ability of the technique to reveal disease-related signal changes was examined in 6 patients with multiple sclerosis (MS). The 7 tesla MRI system used in this work enabled UTE measurements in the brain with a dead time of 30 μs.
Zero-order eddy-current effects were characterized as negligible (Δ < 15°); linear contributions occurred particularly on the x and y gradient axes (Δ < 6 m⁻¹) and had to be corrected retrospectively. The suitability of the proposed studies for in vivo application was first verified on phantoms. Optimal IR suppression of the long-T2* components in white matter was found at TR = 1500 ms for TI = 430 ms in the proposed quantification protocol and for TI = 465 ms in the clinical protocol. Simulations and volunteer measurements showed that shifting the frequency of the IR pulse by −1.2 ppm (i.e., toward the fat frequencies) yields good suppression of fat artifacts. This allowed the quantification study to identify an ultrashort compartment of (68 ± 6)% with a T2* of (147 ± 58) μs and a chemical shift of (−3.6 ± 0.5) ppm relative to water. In the clinical contrast study, a stable ultrashort-T2* fraction contrast of 0.57 ± 0.01, with an average standard deviation of 0.20 ± 0.01, was calculated for the white matter of healthy volunteers. For the difference contrast, SNR_Diff = 4.7 ± 1.1 and CNR_Diff = 8.7 ± 2.4 were determined. In the MS patients, a significant reduction of the measured ultrashort fraction values was observed, both in the identified lesions (−0.09 ± 0.09) and in normal-appearing white matter (0.54 ± 0.05). The results of this work indicate that the ultrashort-T2* components measured in white matter originate primarily and directly from myelin tissue. Direct IR-Diff-UTE imaging of ultrashort-T2* components of white matter is therefore feasible at 7 tesla free of artifacts, with high quantitative stability and good detection of signal loss in MS.
... ting "-all-distributions") were included. The candidate models were compared using the Akaike information criterion (setting "-sort A").[49] The amino acid substitution model selected for the SFTP phylogeny was LG.[50] ...
Article
The epidermal differentiation complex (EDC) is a cluster of genes encoding components of the skin barrier in terrestrial vertebrates. EDC genes can be categorized as S100 fused-type protein (SFTP) genes such as filaggrin, which contain two coding exons, and single-coding-exon EDC (SEDC) genes such as loricrin. SFTPs are known to be present in amniotes (mammals, reptiles and birds) and amphibians, whereas SEDCs have not yet been reported in amphibians. Here, we show that caecilians (Amphibia: Gymnophiona) have both SFTP and SEDC genes. Two to four SEDC genes were identified in the genomes of Rhinatrema bivittatum, Microcaecilia unicolor and Geotrypetes seraphini. Comparative analysis of tissue transcriptomes indicated predominant expression of SEDC genes in the skin of caecilians. The proteins encoded by caecilian SEDC genes resemble human SEDC proteins, such as involucrin and small proline-rich proteins, with regard to low sequence complexity and high contents of proline, glutamine and lysine. Our data reveal diversification of EDC genes in amphibians and suggest that SEDC-type skin barrier genes have originated either in a common ancestor of tetrapods followed by loss in Batrachia (frogs and salamanders) or, by convergent evolution, in caecilians and amniotes.
... These techniques provide an objective way to compare the performance of B-spline models with various numbers of knots and to select a model of optimal parametric complexity (in the sense of minimizing the MSE with a minimal number of appropriately distributed knots). Two of the most commonly used methods for this purpose are the Akaike information criterion (AIC) [29,30] and the Bayesian information criterion (BIC) [31]; both include a penalty term that offsets the gain in fit quality obtained by increasing model complexity. Hence, as can be seen in Fig. 2, the graph of the corresponding IC measure (sometimes called a "score") has a minimum at some number of knots, and the penalized criteria select as best the model with the lowest IC score. ...
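A minimal sketch of knot-count selection by AIC and BIC follows. It uses SciPy's `LSQUnivariateSpline` with equally spaced interior knots in place of the paper's "Integral Span" allocation, assumes Gaussian residuals, and runs on a synthetic curve rather than ellipsometric data; the helper `spline_ic` is hypothetical. The number of B-spline coefficients, $m = (\text{interior knots}) + k + 1$, serves as the parameter count.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def spline_ic(x: np.ndarray, y: np.ndarray, n_interior: int, k: int = 3):
    """Gaussian AIC and BIC (additive constants dropped) of a least-squares
    B-spline of degree k with n_interior equally spaced interior knots."""
    knots = np.linspace(x[0], x[-1], n_interior + 2)[1:-1]   # interior knots only
    spl = LSQUnivariateSpline(x, y, knots, k=k)
    rss = spl.get_residual()                # sum of squared residuals (unit weights)
    n, m = len(x), n_interior + k + 1       # m = number of B-spline coefficients
    aic = n * np.log(rss / n) + 2 * m
    bic = n * np.log(rss / n) + m * np.log(n)
    return aic, bic

# Synthetic, noisy test curve (hypothetical)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(4 * np.pi * x) * np.exp(-x) + rng.normal(scale=0.05, size=x.size)

knot_counts = list(range(1, 31))
scores = [spline_ic(x, y, nk) for nk in knot_counts]
best_aic = knot_counts[int(np.argmin([s[0] for s in scores]))]
best_bic = knot_counts[int(np.argmin([s[1] for s in scores]))]
print(best_aic, best_bic)
```

Because $\log n > 2$ for $n \geq 8$, BIC can never select more knots than AIC on the same candidate set.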
Preprint
Currently, a Kramers-Kronig-consistent B(asis)-spline representation of the dielectric function is an efficient and widely used method for accurately modeling material optical functions in ellipsometric data analysis. However, B-spline modeling of the dielectric function requires an appropriate, user-independent way of generating the knot vector, i.e., of selecting the number and locations of the knots. In this paper, we advocate a systematic approach that combines a specific knot allocation method, based on a slope-weighting factor called the "Integral Span", with the selection of the optimal number of knots using the Akaike and Bayesian information criteria, two statistical estimators, thereby replacing an intuitive and time-consuming "trial-and-error" strategy. The proposed hybrid approach is used to optimize the B-spline models for an epitaxial cobalt disilicide (CoSi₂) thin film and a crystalline silicon substrate (c-Si).
... Because the mushroom dry mass data were not normally distributed, we log-transformed them to normalize their distribution. Akaike's information criterion (AIC) was used to choose the best model (Akaike, 1973), using the MuMIn package (Bartoń, 2020). To estimate the contribution of each variable in a multivariate model, we used hierarchical decomposition of goodness-of-fit measures of regressions (hier.part) ...
Article
Mushrooms play an important role in maintaining ecosystem processes and delivering ecosystem services, including food supply. They are also an important source of income for many people worldwide. Thus, understanding which environmental factors influence mushroom productivity is a high practical and scientific priority. We monitored mushroom production in a temperate mixed deciduous forest in Białowieża Primeval Forest in eastern Poland over two fruiting seasons. The research plots were established under similar environmental conditions (topography, geology, soil type) but differed in tree species composition and tree species richness. The main factor explaining mushroom production (close to 35% of the variation explained by the model) was the species richness of mushrooms. In turn, the species richness of mushrooms was mainly explained by soil properties (pH and C/N ratio) and stand characteristics (including tree species richness and wood increment) for ectomycorrhizal mushrooms, and by soil pH for saprotrophic mushrooms. Higher precipitation in 2021 resulted in higher mushroom production than in 2020, while low precipitation in 2020 resulted in a stronger effect of ambient temperature. The differences in mushroom yield between years varied widely among plots. They were explained by stand characteristics and, for saprotrophic mushrooms, by tree richness and their own species richness. Our results suggest that promoting mushroom species richness is fundamental for increasing mushroom yield and should be taken into account in forest management.
... We excluded models in which $t_0$ was unconstrained, because of the relative lack of individuals aged 5 years or younger (such models would be unrealistic, with highly negative $t_0$ values). We used information-theoretic methods (Burnham and Anderson 2002) to determine the highest-ranked models based on the relative Akaike's information criterion corrected for small sample sizes (ΔAICc) when comparing multiple models in a suite (Akaike 1973). ...
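The small-sample correction and the ranking by ΔAICc can be sketched as follows, using the usual definitions $\mathrm{AICc} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$ and Akaike weights $w_i \propto \exp(-\Delta_i/2)$. The sample size of 81 echoes the Quillback study, but the three models, their parameter counts, and their log-likelihoods are hypothetical.

```python
import numpy as np

def aicc(log_lik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

def delta_and_weights(scores):
    """Differences from the best (lowest) score and the corresponding Akaike weights."""
    scores = np.asarray(scores, dtype=float)
    delta = scores - scores.min()
    w = np.exp(-0.5 * delta)
    return delta, w / w.sum()

# Hypothetical log-likelihoods and parameter counts for three growth models, n = 81 fish
n = 81
models = {"model A": (-210.4, 3), "model B": (-208.9, 4), "model C": (-208.7, 5)}
scores = [aicc(ll, k, n) for ll, k in models.values()]
delta, weights = delta_and_weights(scores)
for name, d, w in zip(models, delta, weights):
    print(f"{name}: dAICc = {d:.2f}, weight = {w:.2f}")
```

The correction term grows quickly as $k$ approaches $n$, which is why AICc is preferred when the sample is small relative to the number of parameters.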
Article
The carpsuckers, which include Quillback Carpiodes cyprinus, river carpsucker Carpiodes carpio, and highfin carpsucker Carpiodes velifer, are ictiobine catostomids (Catostomidae) native to North America. At present, the Carpiodes are classified for management purposes in a multi-species group (derogatorily labeled “rough fish”) by state agencies throughout most of their USA range. This non-specific fisheries management persists despite widespread declines among Catostomidae in North America, and even as additional fishing pressures have recently emerged: Carpiodes are increasingly targeted by bowfishing with virtually no regulation, monitoring, or management. From 2018 to 2021, we analyzed the otoliths of 81 Quillback from a Minnesota population to quantify size at age, onset of sexual maturity, accrual of age spots, and recruitment dynamics. Allometric analysis revealed that otolith size increases with body mass and age at rates greater than those for total length, scale length, and operculum length in Quillback. Our findings also indicate that Quillback may live at least 44 years, reach sexual maturity around ages 8–9 years, accrue black spots on epidermal tissue after 30 years (similar to age-spot pigmentation in bigmouth buffalo Ictiobus cyprinellus), and exhibit more variable recruitment than previously documented. The life history traits of Carpiodes warrant further study for three primary reasons: the “rough fish” label has perpetuated systemic neglect, rates of exploitation are rising, and the majority of catostomids are already classified as imperiled. Management requires updated life history data in the face of these challenges.
... Two phylogenetic methods were used. Maximum likelihood (ML) analysis was conducted in IQ-TREE with the perturbation strength set to 0.5 and the number of unsuccessful iterations set to 100; nodal support was evaluated with 1,000 ultrafast bootstrap replicates (BS; Akaike 1973). A Bayesian inference (BI) analysis was conducted in MrBayes 3.2 (Ronquist and Huelsenbeck 2013) with two independent runs, each with five heated Markov chains and one cold chain, and all model parameters estimated in MrBayes. ...
Article
Information is provided on a complete juvenile female Gray's beaked whale (Mesoplodon grayi, Ziphiidae) stranded on the Guanaqueros coast, Coquimbo Region, Chile (30°S), the most complete record of this species known from Chile. Difficulties in differentiating specimens of Gray's beaked whale from Hector's beaked whale (M. hectori) are discussed on the basis of diagnostic phenotypic characters, such as differences in color patterns and in the position of the teeth on the lower jaw. The identification of the studied specimen as Gray's beaked whale was supported by a detailed review of cranial characters and by molecular analyses. Finally, we provide an updated list of all known Chilean records of this species.
... The best clustering solution was selected according to the Bayesian Information Criterion (BIC; [52]), one of the best-known and most successfully applied criteria for comparing clustering solutions and determining which most closely fits the data [53]. Akaike's Information Criterion (AIC; [54]), R-squared (R²), and silhouette values were also taken into consideration. ...
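BIC is defined from a model likelihood, which plain k-means does not report, so this sketch swaps in a Gaussian mixture model from scikit-learn (`GaussianMixture.bic` and `GaussianMixture.aic`) to choose the number of clusters. It is not the cited study's procedure, and the three well-separated simulated clusters are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical data: three well-separated 2-D clusters of 100 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

bic, aic = {}, {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bic[k] = gm.bic(X)   # lower is better
    aic[k] = gm.aic(X)
best_k = min(bic, key=bic.get)
print(best_k)
```

Reporting AIC alongside BIC, as the snippet does, is useful because AIC's lighter penalty can favor more clusters when the two disagree.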
Article
Few studies have focused on the persistence of nonsuicidal self-injury (NSSI) over time in children and adolescents. This study aimed to define the psycho-behavioral profiles of young inpatients according to past or recent NSSI onset (i.e., NSSI for more or less than one year, respectively) and to identify possible risk factors for maintaining NSSI over time. A total of 118 Italian NSSI inpatients aged 9–17 were included. The Youth Self-Report (YSR) was administered. K-means cluster analyses were conducted using the YSR affective disorders, social competencies, and social problems scales as clustering variables. A binomial logistic regression was run to clarify which of these variables discriminate between the past and recent NSSI onset groups. Chi-square tests were performed to pinpoint the variables associated with long-standing NSSI. The final cluster solution displayed four psycho-behavioral profiles; more inpatients with recent NSSI onset were found in the clusters characterized by poor social competencies. Affective disorders and social competencies were significant predictors, and higher scores on both scales were more likely in the past NSSI onset group. School problems and alcohol/substance use were related to long-standing NSSI. Therefore, a lack of social skills may be involved in recent NSSI onset, while affective disorders and other problem behaviors may drive the continuation of NSSI over time.
... The major disadvantages of this approach are the large number of parameters to be estimated and the fact that different states may require substantially different maximum sojourn times. For these reasons, we herein simultaneously select the optimal values of K and of the vector U using penalized likelihood criteria, such as AIC (Akaike 1998), BIC (Schwarz 1978), and ICL (Biernacki et al. 2000), which penalizes the BIC by the estimated mean entropy and is given by: ...
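The ICL equation itself is cut off in this snippet. A common form of the criterion, on the lower-is-better scale $\mathrm{BIC} = -2\log L + p\log n$, adds twice the entropy of the posterior state probabilities $\tau_{ik}$: $\mathrm{ICL} \approx \mathrm{BIC} + 2\,\mathrm{EN}(\tau)$ with $\mathrm{EN}(\tau) = -\sum_i\sum_k \tau_{ik}\log\tau_{ik}$. The sketch below assumes that form, which may differ in scaling from the paper's version; the function name `icl_bic` and the inputs are hypothetical.

```python
import numpy as np

def icl_bic(bic: float, tau: np.ndarray) -> float:
    """Entropy-penalized BIC (lower is better).

    bic: -2 logL + p log(n) of the fitted model.
    tau: (n, K) posterior state probabilities, each row summing to 1.
    """
    tau = np.clip(tau, 1e-300, 1.0)                  # guard log(0) for hard assignments
    entropy = -float(np.sum(tau * np.log(tau)))      # EN = -sum_i sum_k tau_ik log tau_ik
    return bic + 2.0 * entropy

crisp = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # confident state assignments
fuzzy = np.full((3, 2), 0.5)                              # maximally uncertain assignments
print(icl_bic(100.0, crisp), icl_bic(100.0, fuzzy))
```

With confident assignments the entropy term vanishes and ICL equals BIC; overlapping states are penalized, which is why ICL tends to select fewer, better-separated states.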
Article
This paper develops a quantile hidden semi-Markov regression to jointly estimate multiple quantiles for the analysis of multivariate time series. The approach is based on the Multivariate Asymmetric Laplace (MAL) distribution, which allows the quantiles of all univariate conditional distributions of a multivariate response to be modeled simultaneously, incorporating the correlation structure among the outcomes. Unobserved serial heterogeneity across observations is modeled by introducing regime-dependent parameters that evolve according to a latent finite-state semi-Markov chain. Exploiting the hierarchical representation of the MAL, inference is carried out using an efficient Expectation-Maximization algorithm based on closed-form updates for all model parameters, without parametric assumptions about the states' sojourn distributions. The validity of the proposed methodology is assessed both through a simulation study and through an empirical analysis of air pollutant concentrations in a small Italian city.
Article
The decline of pollinators is a widespread problem in today's agriculture, affecting the yield of many crops. Improved pollination management is therefore essential, and honey bee colonies are often used to improve pollination levels. In this work, we applied a spatially explicit agent-based model to simulate crop pollination by honey bees under different management scenarios and landscape configurations. The model includes 1) a representation of honey bee social dynamics; 2) an explicit representation of resource dynamics; 3) a probabilistic approach to the foraging site search process; and 4) a mechanism of competition for limited resources. We selected 60 sample units from the rural landscape of the Chilean region with the largest apple-growing area and evaluated the effectiveness of different pollination strategies in terms of the number of visits and the number of pollinated flowers per hectare of apple crops. Finally, we analyzed how the effects of these practices depended on the structure of adjacent landscapes. Higher colony density per hectare in the focal crop increased the number of honey bee visits to apple inflorescences; however, the effects on rates of pollinated flowers were nonlinear, suggesting that there is an optimum beyond which more honey bees do not translate into higher levels of crop pollination. Furthermore, high relative proportions of mass-flowering crops and natural habitats in the landscape led to a decrease in honey bee densities in apple fields in landscapes with high relative cover of apple orchards (dilution effect). Our results indicate that, for optimal crop pollination, strategies for managing pollinator species should consider the modulating effects of the surrounding landscape on pollination effectiveness. This model could thus be a useful tool to help farmers, beekeepers, and policy-makers plan the provision of pollination services, while also promoting the biodiversity and sustainability of agroecosystems.
Article
Better methods of predicting prognosis in patients with metastatic breast cancer will improve the identification of patients for whom curative treatments may be most effective. In this study, the prognostic value of 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) was assessed in patients with metastatic breast cancer. A retrospective analysis was conducted of women who underwent 18F-FDG PET/CT for staging of newly diagnosed metastatic breast cancer. In each patient, the maximum standardized uptake value (SUV) and total lesion glycolysis (TLG) of primary tumors and regional lymph nodes were measured and analyzed for association with survival using the Cox proportional hazards regression model. Of 346 consecutive patients, 32 with metastatic invasive ductal carcinoma of the breast were included in the study. The median duration of follow-up was 22.5 months. Disease progression occurred in 26 patients, and 11 patients died. In multivariate analyses with forward stepwise regression, only the maximum SUV and the TLG of regional lymph nodes were significantly associated with progression-free survival and overall survival, respectively. This study demonstrates that increased 18F-FDG uptake in regional lymph nodes is a strong independent predictor of survival in women with metastatic invasive ductal carcinoma of the breast.
Article
SporTran is a Python utility designed to estimate generic transport coefficients in extended systems, based on the Green-Kubo theory of linear response and the recently introduced cepstral analysis of the current time series generated by molecular dynamics simulations. SporTran can be applied to univariate as well as multivariate time series. Cepstral analysis requires minimal discretion from the user, in that it depends only weakly on two parameters, one of which is automatically estimated by a statistical model-selection criterion that uniquely determines the resulting accuracy. To facilitate the optimal refinement of these parameters, SporTran features an easy-to-use graphical user interface. A command-line interface and a Python API, easy to embed in complex data-analysis workflows, are also provided.
Program summary
Program Title: SporTran
CPC Library link to program files: https://doi.org/10.17632/hm48f8kgj9.1
Licensing provisions: GPLv3
Programming language: Python
Nature of problem: Given an M-variate time series $J^j(t)$, $j = 0, \dots, M-1$, typically describing a number of currents resulting from a molecular-dynamics simulation, SporTran estimates the transport coefficient $\kappa = 1/(\Lambda^{-1})_{00}$, where $\Lambda_{ij} = \int_0^\infty \langle J^i(t) J^j(0) \rangle \, dt$ is the matrix of the Onsager linear-response coefficients and $\langle \cdot \rangle$ indicates an equilibrium average over initial conditions.
Solution method:
(i) The Onsager transport coefficients are the zero-frequency values of the cross power spectra of the currents under scrutiny: $\Lambda_{ij} = \frac{1}{2} S^{ij}(\omega = 0)$, where $S^{ij}(\omega) = \int_{-\infty}^{\infty} \langle J^i(t) J^j(0) \rangle e^{i\omega t} \, dt$.
(ii) The (cross) periodogram is defined as the product of pairs of Fourier transforms of the current time series: $S_k^{ij} = \frac{\epsilon}{N} \tilde{J}_k^i \tilde{J}_k^{j*}$, where $\epsilon$ is the time step of the time series, $N$ the number of their terms, $\tilde{J}_k^j = \sum_{n=0}^{N-1} J_n^j e^{2\pi i k n / N}$ their discrete Fourier transforms, and $J_n^j = J^j(\epsilon n)$.
(iii) As the current time series are realisations of a Gaussian process, in the long-time limit and for $k \neq k'$ the $S_k^{ij}$ are uncorrelated complex Wishart random matrices (a matrix generalization of the $\chi^2$ distribution) whose expectation, according to the Wiener-Khintchine theorem, is the cross power spectrum sought. It follows that $(S_k^{-1})_{00}$ is proportional to a set of uncorrelated $\chi^2$ deviates.
(iv) A consistent estimator for $\log\kappa = -\log\left((\Lambda^{-1})_{00}\right)$ is finally obtained by applying a low-pass filter to the process $\log\left((S_k^{-1})_{00}\right)$.
The theoretical background of the methodology implemented in SporTran is thoroughly presented in Refs. [1-3].
References
[1] L. Ercole, A. Marcolongo, and S. Baroni, Sci. Rep. 7, 15835 (2017).
[2] R. Bertossa, F. Grasselli, L. Ercole, and S. Baroni, Phys. Rev. Lett. 122, 255901 (2019).
[3] S. Baroni, R. Bertossa, L. Ercole, F. Grasselli, and A. Marcolongo, in Handbook of Materials Modeling. Applications: Current and Emerging Materials, edited by W. Andreoni and S. Yip (Springer, 2018), 2nd ed., Chap. 12-1 (https://arxiv.org/abs/1802.08006).
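Steps (ii) to (iv) can be sketched for the univariate case ($M = 1$) in a few lines of NumPy. This is a simplified illustration, not SporTran's API: it fixes the number of retained cepstral coefficients $P$ by hand (SporTran chooses it with a model-selection criterion), corrects the log-periodogram bias with the Euler-Mascheroni constant $\gamma$ (since $\mathbb{E}[\log(\chi^2_2/2)] = -\gamma$), and ignores the slightly different statistics at the zero and Nyquist frequencies. The function name `cepstral_log_s0` is hypothetical.

```python
import numpy as np

def cepstral_log_s0(x: np.ndarray, dt: float, P: int) -> float:
    """Estimate log S(omega = 0) of a univariate stationary series by
    low-pass filtering its log-periodogram in the cepstral domain."""
    N = len(x)
    J = np.fft.fft(x)                       # discrete Fourier transform at all N frequencies
    S = dt / N * np.abs(J) ** 2             # periodogram S_k = (eps / N) |J_k|^2
    log_S = np.log(S) + np.euler_gamma      # undo the -gamma bias of log(chi2_2 / 2)
    C = np.fft.ifft(log_S).real             # cepstral coefficients C_n (symmetric in n)
    # Keep only |n| < P; the filtered log-spectrum at k = 0 is C_0 + 2 * sum_{n=1}^{P-1} C_n
    return float(C[0] + 2.0 * C[1:P].sum())

# Hypothetical test signal: white noise with variance 4 and unit time step
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 2**14)
est = cepstral_log_s0(x, dt=1.0, P=5)
print(est, np.exp(est) / 2.0)   # log S(0) and the corresponding Lambda = S(0) / 2
```

For white noise of variance $\sigma^2$ and time step $\epsilon$, the spectrum is flat at $S(0) = \epsilon\sigma^2$, so the estimate should land close to $\log 4$ here; smaller $P$ lowers the variance but would bias the result for spectra with structure near zero frequency.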
Article
Objective: A growing body of research focuses on the automated diagnosis of acute myocardial infarction (AMI) from electrocardiogram (ECG) recordings. Several methods rely on differences between the ECG at baseline (no AMI) and during AMI. However, this approach may not sufficiently account for the progression of AMI, and it can underestimate the effect of false positives in a continuous monitoring setting. This in turn may hinder the adoption of automated methods for AMI diagnosis in clinical practice. In this study, we propose a new automated method for the dynamic assessment of AMI condition that accounts for the dynamic nature of AMI events and the need for a low incidence of false positives. Using a reduced 3-lead ECG system, we developed a novel set of parameters able to capture changes over time in the distribution properties of ECG-derived features. These parameters are used to train and validate a deep learning model that performs dynamic assessment of AMI condition. Conclusion: Results suggest that the proposed method is able to capture the dynamic evolution of AMI with a false positive rate below 1%. Significance: Thanks to the reduced number of leads, the proposed method could be used to assess AMI condition in long-term, remote, and home monitoring, and in intensive care unit (ICU) environments.
Chapter
Often it is not clear which model you should use for the data at hand—maybe because it is not known ahead of time which combination of variables should be used to predict the response, or maybe it is not obvious how the response should be modelled. In this chapter we will take a look at a few strategies for comparing different models and choosing between them.
Article
Background Due to climate change, days with high temperatures are becoming more frequent. Although the effect of high temperature on the kidneys has been reported in research from Central and South America, Oceania, North America and Europe, evidence from Asia is still lacking. This study aimed to examine the association between short-term exposure to high temperatures and acute kidney injury (AKI) in a nationwide study in South Korea. Methods We used representative sampling data from the 2002–2015 National Health Insurance Service–National Sample Cohort in South Korea to link the daily mean temperatures and AKI cases that occurred in the summer. We used a bidirectional case-crossover study design with 0–7 lag days before the emergency room visit for AKI. In addition, we stratified the data into six income levels to identify the susceptible population. Results A total of 1706 participants were included in this study. The odds ratio (OR) per 1°C increase at 0 lag days was 1.051, and the ORs per 1°C increase at a lag of 2 days were both 1.076. The association between exposure to high temperatures and AKI was slightly greater in the low-income group (OR = 1.088; 95% CI: 1.049–1.128) than in the high-income group (OR = 1.065; 95% CI: 1.026–1.105). Conclusions In our study, a relationship between exposure to high temperatures and AKI was observed. Precautions should be taken at elevated temperatures to minimize the risk of negative health effects.
Preprint
A particular challenge for disease progression modeling is the heterogeneity of a disease and its manifestations across patients. Existing approaches often assume a single disease progression pattern, an assumption unlikely to hold for neurodegenerative disorders such as Parkinson's disease. In this paper, we propose a hierarchical time-series model that can discover multiple disease progression dynamics. The proposed model is an extension of an input-output hidden Markov model that takes into account clinical assessments of patients' health status and prescribed medications. We illustrate the benefits of our model using a synthetically generated dataset and a real-world longitudinal dataset for Parkinson's disease.
Article
Commercial fisheries, especially pelagic longline fisheries targeting tuna and/or swordfish, often land silky sharks (Carcharhinus falciformis), which are currently listed as vulnerable by the International Union for Conservation of Nature (IUCN). Due to increasing fishing effort and the fact that they overlap in habitat with target species, the population trend of silky sharks is declining worldwide. Understanding their relationships with environmental variables that lead to their capture by fisheries is critical for their management and conservation. Nevertheless, little is known about their size distribution in relation to environmental variables in the Pacific Ocean. Using data from the Chinese Observer Tuna Longline fishery from 2010 to 2020, this study developed a species distribution model (SDM) to analyze the relationships between silky shark size distribution patterns and environmental variables and spatio-temporal variability at fishing locations. Observed sizes ranged from 36 to 269 cm fork length (FL). The final model suggests that sea surface temperature (SST), primary production (photosynthetically available radiation, PAR), and ocean surface winds were the key environmental variables shaping size distribution patterns of silky sharks in the Pacific. A high proportion of larger silky sharks has been predicted in areas associated with productive upwelling systems. In addition, the model predicted that larger specimens (>140 cm FL) occur near the equator, and smaller specimens farther from the equator but still in tropical regions. Two regions in the eastern Pacific (the coastal upwelling area off northern Peru and the waters around the Galapagos Islands) seem to be important locations for larger specimens. The size distribution patterns of silky sharks in relation to environmental variables presented in this study illustrate how this species segregates spatially and temporally and presents potential habitat preference areas. 
The information obtained in the present study is critical in the quest for management and conservation of menaced species such as the silky shark.
Article
Good importance sampling strategies are decisive for the quality and robustness of photorealistic image synthesis with Monte Carlo integration. Path guiding approaches use transport paths sampled by an existing base sampler to build and refine a guiding distribution. This distribution then guides subsequent paths in regions that are otherwise hard to sample. We observe that all terms in the measurement contribution function sampled during path construction depend on at most three consecutive path vertices. We thus propose to build a 9D guiding distribution over vertex triplets that adapts to the full measurement contribution with a 9D Gaussian mixture model (GMM). For incremental path sampling, we query the model for the last two vertices of a path prefix, resulting in a 3D conditional distribution with which we sample the next vertex along the path. To make this approach scalable, we partition the scene with an octree and learn a local GMM for each leaf separately. In a learning phase, we sample paths using the current guiding distribution and collect triplets of path vertices. We resample these triplets online and keep only a fixed‐size subset in reservoirs. After each progression, we obtain new GMMs from triplet samples by an initial hard clustering followed by expectation maximization. Since we model 3D vertex positions, our guiding distribution naturally extends to participating media. In addition, the symmetry in the GMM allows us to query it for paths constructed by a light tracer. Therefore our method can guide both a path tracer and light tracer from a jointly learned guiding distribution.
Article
In the past, fires around railways were often associated with steam locomotives. Although steam locomotives have disappeared from everyday rail traffic, fires still occur. For example, a vegetation fire near Bzenec (Czech Republic) on 21 June 2018 affected 124,110 m² of forest and grassland. The investigation revealed that the fire was caused by a spark from a passing train. In this study, we analyzed vegetation fires that occurred near Czech railway lines between 2011 and 2019 to investigate their temporal pattern and relation to weather conditions and to identify the most hazardous locations. Fires were concentrated mainly between March and August, in the afternoon. They are also more likely to occur during periods of high air temperature, low rainfall, low relative air humidity, and low wind speed. Using the KDE+ method, we identified 186 hotspots, which contained 510 vegetation fires and represented only 0.3% of the length of the entire Czech rail network. Spatial analysis revealed that the odds of a vegetation fire are more than 4 times higher near an electrified railway line than near a non-electrified line, and that each additional 10 freight trains per 24 h increases the odds by 5%. As the results show, vegetation fires near railway lines are still a relatively common phenomenon, mainly due to favorable weather conditions. Grassy areas with dry or dead vegetation are particularly at risk. These areas can be ignited, for example, by sparks from the brakes of railway vehicles. Due to global warming, vegetation fires can be expected to occur more frequently in the future. The identified hotspots can thus be used to reduce the risk of fires, for example by managing the surrounding vegetation.
Article
The level of structural integration (LSI), a psychodynamic/psychoanalytic concept originally developed by the Operationalized Psychodynamic Diagnosis (OPD), provides a promising empirical approach that is recognized beyond the boundaries of psychoanalysis and is highly relevant for therapy and research. The aim of our study was to investigate the intersession experiences of patients in psychotherapy with different levels of structural integration. The sample consisted of 69 inpatients who were undergoing psychotherapeutic treatment. The patients were asked to complete the German version of the Intersession Experience Questionnaire (IEQ), the short version of the OPD Structure Questionnaire (OPD-SQS) and the Brief-Symptom Inventory (BSI). LSI is associated with the situations, contents and negative emotions in the intersession experiences of patients, as well as their symptom distress over the course of therapy. Furthermore, the level of structural integration is a significant predictor of outcomes. Patients with different LSI had different intersession experiences.
Article
COVID-19 is a global health burden. We propose to model the dynamics of COVID-19 in Senegal and in China by count time series following generalized linear models. One of the main properties of these models is that they can detect potential trends in the contagion dynamics within a given country. In particular, we fit the daily new infections in both countries by a Poisson autoregressive model and a negative binomial autoregressive model. In the case of Senegal, we include covariates in the models, whereas the models fitted for China have no covariates. The short-term predictions of the daily new cases in both countries from both models are graphically illustrated. The results show that the predictions given by the negative binomial autoregressive model are more accurate than those given by the Poisson autoregressive model.
Article
Environmental conditions during early-life development can have lasting effects shaping individual heterogeneity in fitness and fitness-related traits. The length of telomeres, the DNA sequences protecting chromosome ends, may be affected by early-life conditions, and telomere length (TL) has been associated with individual performance within some wild animal populations. Thus, knowledge of the mechanisms that generate variation in TL, and the relationship between TL and fitness, is important in understanding the role of telomeres in ecology and life-history evolution. Here, we investigate how environmental conditions and morphological traits are associated with early-life blood TL and if TL predicts natal dispersal probability or components of fitness in 2746 wild house sparrow (Passer domesticus) nestlings from two populations sampled across 20 years (1994-2013). We retrieved weather data and we monitored population fluctuations, individual survival, and reproductive output using field observations and genetic pedigrees. We found a negative effect of population density on TL, but only in one of the populations. There was a curvilinear association between TL and the maximum daily North Atlantic Oscillation index during incubation, suggesting that there are optimal weather conditions that result in the longest TL. Dispersers tended to have shorter telomeres than non-dispersers. TL did not predict survival, but we found a tendency for individuals with short telomeres to have higher annual reproductive success. Our study showed how early-life TL is shaped by effects of growth, weather conditions, and population density, supporting that environmental stressors negatively affect TL in wild populations. In addition, shorter telomeres may be associated with a faster pace-of-life, as individuals with higher dispersal rates and annual reproduction tended to have shorter early-life TL.
Article
Variable selection plays an important role in data mining. It is crucial to filter useful variables and extract useful information in a high-dimensional setup, when the number of predictor variables d tends to be much larger than the sample size n. Statistical inferences can be more precise after irrelevant variables are removed by the screening method. This article proposes an orthogonal matching pursuit (OMP) algorithm for variable screening in the high-dimensional setup. The proposed method demonstrates good performance in variable screening. In particular, if the dimension of the true model is finite, OMP might discover all relevant predictors within a finite number of steps. Through theoretical analysis and simulations, it is confirmed that the orthogonal matching pursuit algorithm can identify relevant predictors to ensure screening consistency in variable selection. Given the sure screening property, the BIC criterion can be used in practice to select the best candidate from the models generated by the OMP algorithm. Compared with the traditional orthogonal matching pursuit method, the resulting model can improve prediction accuracy and reduce computational cost by screening out irrelevant variables.
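The screen-then-select idea can be illustrated with a minimal pure-Python sketch: OMP greedily adds the predictor most correlated with the current residual, refits least squares on all selected predictors, and the ordinary BIC, $n\log(\mathrm{RSS}/n) + |S|\log n$, picks one model along the resulting path. This assumes centred data with no intercept and the plain BIC; the paper's exact stopping rule and BIC variant may differ, and all function names here are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _residuals(X, y, cols):
    """Residuals of the least-squares fit of y on the selected columns (no intercept)."""
    A = [[row[j] for j in cols] for row in X]
    k = len(cols)
    ata = [[sum(r[p] * r[q] for r in A) for q in range(k)] for p in range(k)]
    aty = [sum(r[p] * yi for r, yi in zip(A, y)) for p in range(k)]
    beta = _solve(ata, aty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(A, y)]


def omp_bic(X, y, max_steps):
    """Run OMP for max_steps and return the active set on the path with the lowest BIC."""
    n, d = len(y), len(X[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(d)]
    active, resid = [], list(y)
    best_bic = n * math.log(sum(v * v for v in y) / n)  # BIC of the empty model
    best = []
    for _ in range(max_steps):
        # Pick the inactive column most correlated with the current residual.
        scores = [
            -1.0 if j in active
            else abs(sum(row[j] * e for row, e in zip(X, resid))) / norms[j]
            for j in range(d)
        ]
        active.append(max(range(d), key=scores.__getitem__))
        # Orthogonal step: refit on all active columns, so the new residual is
        # orthogonal to every selected predictor.
        resid = _residuals(X, y, active)
        rss = sum(e * e for e in resid)
        bic = n * math.log(rss / n) + len(active) * math.log(n)
        if bic < best_bic:
            best_bic, best = bic, sorted(active)
    return best
```

With many noise predictors the ordinary BIC can still admit a few spurious variables near the end of the path, which is one reason extended BIC variants are often preferred in high-dimensional screening.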
Article
Multicellular synchronization is a ubiquitous phenomenon in living systems. However, how noisy and heterogeneous behaviors of individual cells are integrated across a population toward multicellular synchronization is unclear. Here, we study the process of multicellular calcium synchronization of the endothelial cell monolayer in response to mechanical stimuli. We applied information theory to quantify the asymmetric information transfer between pairs of cells and defined quantitative measures of how single cells receive or transmit information within a multicellular network. Our analysis revealed that multicellular synchronization was established by gradual enhancement of information spread from the single cell to the multicellular scale. Synchronization was associated with heterogeneity in the cells’ communication properties, reinforcement of the cells’ state, and information flow. Altogether, we suggest a phenomenological model where cells gradually learn their local environment, adjust, and reinforce their internal state to stabilize the multicellular network architecture to support information flow from local to global scales toward multicellular synchronization.
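The abstract does not name its estimator. One standard information-theoretic measure of directed, asymmetric information transfer between two time series is transfer entropy, $T_{Y\to X} = \sum p(x_{t+1}, x_t, y_t)\log_2 \frac{p(x_{t+1}\mid x_t, y_t)}{p(x_{t+1}\mid x_t)}$. A minimal plug-in sketch follows, assuming each cell's calcium trace has already been binarized into active/inactive states and using a history length of one. The function name and the binarization step are hypothetical and are not the authors' pipeline.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in transfer entropy from source to target in bits (history length 1)."""
    if len(source) != len(target):
        raise ValueError("series must have equal length")
    # One triple (target next, target now, source now) per time step.
    triples = [(target[t + 1], target[t], source[t]) for t in range(len(target) - 1)]
    n = len(triples)
    joint = Counter(triples)
    past_pair = Counter((x_now, y_now) for _, x_now, y_now in triples)
    self_pair = Counter((x_next, x_now) for x_next, x_now, _ in triples)
    self_past = Counter(x_now for _, x_now, _ in triples)
    te = 0.0
    for (x_next, x_now, y_now), count in joint.items():
        p_with_source = count / past_pair[(x_now, y_now)]          # p(x_next | x_now, y_now)
        p_without = self_pair[(x_next, x_now)] / self_past[x_now]  # p(x_next | x_now)
        te += (count / n) * log2(p_with_source / p_without)
    return te
```

If `target` is `source` delayed by one step, the estimate from source to target approaches the entropy rate of the source (about 1 bit for fair random bits), while the reverse direction stays near zero. Plug-in estimates are biased upward for short series.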
Article
Each year, billions of seabirds undertake extensive migrations, connecting remote regions of the world, potentially synchronizing population fluctuations among distant areas. This connectedness has implications for the uncertainty calculations of the total seabird bycatch estimate at a regional/global scale synthesized from individual assessments conducted at a local scale. Globally, fisheries bycatch poses an environmental problem in fishery management, and estimating the uncertainty associated with a regional/global seabird bycatch estimate is important, because it characterizes the accuracy and reliability of the fisheries’ impact on the seabird populations. In this study, we focus on the estimation of the variability of total seabird bycatch, synthesized from multiple sources. In addition to a theoretical exploration, we also provide a hypothetical scenario analysis based on data from the Western and Central Pacific Fisheries Commission convention area. The results show that the assumptions about the correlation between different areas have a large impact on the uncertainty estimates, especially when the number of areas to synthesize is large, and that simplifying assumptions failed to capture the complex dynamics of seabird bycatch rates among different areas. It is recommended to empirically estimate the correlation of bycatch rates between each pair of sources when time series of bycatch rates are available.
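The role of the correlation assumption follows from the variance of a sum. If $X_i$ is the bycatch estimate for area $i$ with standard error $\sigma_i$, then $\operatorname{Var}(\sum_i X_i) = \sum_i\sum_j \rho_{ij}\sigma_i\sigma_j$. For $k$ areas with a common $\sigma$ and $\rho$, this reduces to $k\sigma^2[1+(k-1)\rho]$, so the effect of $\rho$ grows with $k$. The sketch below uses illustrative numbers (20 areas, unit standard errors), not values from the WCPFC analysis, and its helper names are hypothetical.

```python
from math import sqrt


def total_variance(sds, corr):
    """Variance of a sum of estimates with standard errors sds and correlation matrix corr."""
    k = len(sds)
    return sum(corr[i][j] * sds[i] * sds[j] for i in range(k) for j in range(k))


def exchangeable_corr(k, rho):
    """Hypothetical helper: k x k correlation matrix with a common off-diagonal rho."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    sds = [1.0] * 20  # illustrative: 20 areas, each with standard error 1.0
    for rho in (0.0, 0.2, 0.5):
        var = total_variance(sds, exchangeable_corr(20, rho))
        print(f"rho={rho}: variance={var:.1f}, sd={sqrt(var):.2f}")
    # rho=0.0: variance=20.0, sd=4.47
    # rho=0.2: variance=96.0, sd=9.80
    # rho=0.5: variance=210.0, sd=14.49
```

In this toy setting the standard error of the total more than triples when the common correlation rises from 0 to 0.5.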
Article
Episodic learning and memory retrieval are dependent on hippocampal theta oscillation, thought to rely on the GABAergic network of the medial septum (MS). To test how this network achieves theta synchrony, we recorded MS neurons and hippocampal local field potential simultaneously in anesthetized and awake mice and rats. We show that MS pacemakers synchronize their individual rhythmicity frequencies, akin to coupled pendulum clocks as observed by Huygens. We optogenetically identified them as parvalbumin-expressing GABAergic neurons, while MS glutamatergic neurons provide tonic excitation sufficient to induce theta. Consistent with this, waxing and waning tonic excitation is sufficient to toggle between theta and non-theta states in a network model of single-compartment inhibitory pacemaker neurons. These results provide experimental and theoretical support to a frequency-synchronization mechanism for pacing hippocampal theta, which may serve as an inspirational prototype for synchronization processes in the central nervous system from Nematoda to Arthropoda to Chordate and Vertebrate phyla.
Article
Through an industry-science collaboration, a modified SELTRA codend consisting of a grid with scaring floats (GSF) was successfully developed by the fishing industry, and tested in a scientific trial on board a twin-rig demersal trawler. The selectivity of the gears used in the fishery needs to be tailored to each vessel and season, as quota availability differs across vessels and seasons. The modified gear aims at avoiding catching unwanted fish when targeting Norway lobster. The use of the modified codend showed good potential for effectively reducing catches of unwanted fish without affecting the catch of the target species Norway lobster. With respect to a standard SELTRA codend, the GSF design led to a significantly lower retention of unwanted fish, with 71 (61–79) % less cod, 94 (73–100) % less saithe and 22 (14–37) % less plaice being retained. The results show that fishers have the knowledge to develop more selective gears. The authors argue that fisheries management should support such industry-driven innovation through more flexible management approaches.
Article
This work combines in situ measurements with a time series of satellite turbidity derived from Landsat 8-OLI images to provide a first synoptic overview of the main hydrometeorological drivers of turbidity in the Bahía Blanca Estuary (Argentina). An empirical relationship between turbidity and SPM concentrations was established for the study area (R² = 0.92; RMSE = 0.098 mg m−3; NMAE = 6.2%). Several atmospheric correction schemes and turbidity retrieval algorithms were tested, and the combination of the SWIR-v correction scheme and the retrieval algorithm by Dogliotti et al. (2015) was applied to 121 Landsat 8-OLI scenes (2013–2021). The effects of tides, winds, and rainfall on satellite turbidity were evaluated through Generalized Linear Models (GLM) built for three different sectors along Canal Principal, from the inner zone to its mouth. Regardless of the zone, cumulative rainfall had negligible effects on turbidity. Tides had a significant effect in the inner and middle zones. In the inner zone, higher turbidity values were significantly associated with ebb tide conditions, which produce erosion. In the middle section, tidal current speeds were positively correlated with turbidity, suggesting sediment resuspension over shallow banks. Close to the mouth of the estuary, turbidity responded entirely to winds. Winds blowing from the NW, aligned with the azimuth of Canal Principal, would aid the export of estuarine sediments to the shelf.
Article
Selecting a suitable equation to represent a set of multifactor data that was collected for other purposes in a plant, pilot-plant, or laboratory can be troublesome. If there are $k$ independent variables, there are $2^k$ possible linear equations to be examined: one equation using none of the variables, $k$ using one variable, $k(k-1)/2$ using two variables, etc. Often there are several equally good candidates. Selection depends on whether one needs a simple interpolation formula or estimates of the effects of individual independent variables. Fractional factorial designs for sampling the $2^k$ possibilities and a new statistic proposed by C. Mallows simplify the search for the best candidate. With the new statistic, regression equations can be compared graphically with respect to both bias and random error.
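Mallows' statistic is $C_p = \mathrm{RSS}_p/\hat\sigma^2 - n + 2p$, where $p$ counts the fitted coefficients including the intercept and $\hat\sigma^2$ is the residual mean square of the full model. The minimal pure-Python sketch below scores all $2^k$ equations by exhaustive enumeration, which is only practical for small $k$; the paper instead proposes fractional factorial designs to sample the candidates. The function names are hypothetical.

```python
from itertools import combinations


def rss_with_intercept(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the chosen columns."""
    A = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y], solved by Gaussian elimination.
    m = [[sum(r[a] * r[b] for r in A) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(A, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(c + 1, p):
            f = m[i][c] / m[c][c]
            for j in range(c, p + 1):
                m[i][j] -= f * m[c][j]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (m[i][p] - sum(m[i][j] * beta[j] for j in range(i + 1, p))) / m[i][i]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(A, y))


def mallows_cp_table(X, y):
    """Score all 2^k subsets of the k predictors by Cp; the intercept is always kept."""
    n, k = len(y), len(X[0])
    s2 = rss_with_intercept(X, y, range(k)) / (n - k - 1)  # full-model residual mean square
    table = []
    for size in range(k + 1):
        for cols in combinations(range(k), size):
            p = size + 1  # number of coefficients, including the intercept
            table.append((cols, rss_with_intercept(X, y, cols) / s2 - n + 2 * p))
    return sorted(table, key=lambda item: item[1])
```

Equations with little bias have $C_p$ close to $p$, and the full model always has $C_p = k + 1$ by construction, so plotting $C_p$ against $p$ gives the graphical comparison the abstract describes.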
Article
Summary The use of a multidimensional extension of the minimum final prediction error (FPE) criterion, which was originally developed for deciding the order of a one-dimensional autoregressive process [1], is discussed from the standpoint of controller design. It is shown by numerical examples that the criterion is also useful for deciding whether to include or exclude a variable in the model. The practical utility of the procedure was verified in the real controller design process of cement rotary kilns.
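For the one-dimensional case, Akaike's criterion chooses the autoregressive order $p$ that minimises $\mathrm{FPE}(p) = \hat\sigma_p^2\,(N+p+1)/(N-p-1)$. Here $\hat\sigma_p^2$ is the residual variance of the fitted AR($p$) model and $N$ is the series length; this is the form commonly quoted for mean-corrected data. The minimal sketch below assumes Yule-Walker fits computed by the Levinson-Durbin recursion. It does not show the multidimensional extension discussed in the paper, and its function names are hypothetical.

```python
def autocovariances(x, max_lag):
    """Biased sample autocovariances r_0..r_max_lag of the mean-corrected series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]


def fpe_order(x, max_order):
    """Return (order minimising FPE, list of FPE values for orders 0..max_order)."""
    n = len(x)
    r = autocovariances(x, max_order)
    phi = []   # Yule-Walker AR coefficients of the current order
    v = r[0]   # residual (innovation) variance of the current order
    fpe = [v * (n + 1) / (n - 1)]
    for p in range(1, max_order + 1):
        # Levinson-Durbin step: reflection coefficient of order p.
        k = (r[p] - sum(phi[j] * r[p - 1 - j] for j in range(p - 1))) / v
        phi = [phi[j] - k * phi[p - 2 - j] for j in range(p - 1)] + [k]
        v *= 1.0 - k * k
        fpe.append(v * (n + p + 1) / (n - p - 1))
    best = min(range(max_order + 1), key=fpe.__getitem__)
    return best, fpe
```

The factor $(N+p+1)/(N-p-1)$ grows with $p$, so it offsets the steady decrease of the residual variance and penalises needlessly high orders.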
Article
In a recent paper by the present author [1] a simple practical procedure of predictor identification has been proposed. It is the purpose of this paper to provide a theoretical and empirical basis of the procedure.
Article
A fully computerized cement rotary kiln process control was tested in a real production line and the results are presented in this paper. The controller design was based on the understanding of the process behavior obtained by careful statistical analyses, and it was realized by using a very efficient statistical identification procedure and the orthodox optimal controller design by the state-space method. All phases of analysis, design and adjustment during the practical application are discussed in detail. Technical impact of the success of the control on the overall kiln installation is also discussed. The computational procedure for the identification is described in an Appendix.
Article
Includes bibliography and index.
Article
The foundations of a general theory of statistical decision functions, including the classical non-sequential case as well as the sequential case, were discussed by the author in a previous publication [3]. Several assumptions made in [3] appear, however, to be unnecessarily restrictive (see conditions 1-7, p. 297 in [3]). These assumptions, moreover, are not always fulfilled for statistical problems in their conventional form. In this paper the main results of [3], as well as several new results, are obtained from a considerably weaker set of conditions which are fulfilled for most of the statistical problems treated in the literature. It seemed necessary to abandon most of the methods of proofs used in [3] (particularly those in section 4 of [3]) and to develop the theory from the beginning. To make the present paper self-contained, the basic definitions already given in [3] are briefly restated in section 2.1.
Article
Sherman [8] and Stein [9] have shown that a method given by the author [1] for comparing two experiments is equivalent, for experiments with a finite number of outcomes, to the original method introduced by Bohnenblust, Shapley, and Sherman [4]. A new proof of this result is given, and the restriction to experiments with a finite number of outcomes is removed. A class of weaker comparisons--comparison in $k$-decision problems--is introduced, in three equivalent forms. For dichotomies, all methods are equivalent, and can be described in terms of errors of the first and second kinds.
Article
The principle of maximum entropy, together with some generalizations, is interpreted as a heuristic principle for the generation of null hypotheses. The main application is to $m$-dimensional population contingency tables, with the marginal totals given down to dimension $m - r$ ("restraints of the $r$th order"). The principle then leads to the null hypothesis of no "$r$th-order interaction." Significance tests are given for testing the hypothesis of no $r$th-order or higher-order interaction within the wider hypothesis of no $s$th-order or higher-order interaction, some cases of which have been treated by Bartlett and by Roy and Kastenbaum. It is shown that, if a complete set of $r$th-order restraints are given, then the hypothesis of the vanishing of all $r$th-order and higher-order interactions leads to a unique set of cell probabilities, if the restraints are consistent, but not only just consistent. This confirms and generalizes a recent conjecture due to Darroch. A kind of duality between maximum entropy and maximum likelihood is proved. Some relationships between maximum entropy, interactions, and Markov chains are proved.
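For a three-way table with all two-way marginal totals fixed, the maximum-entropy table is the one with no three-factor interaction. It can be computed by iterative proportional fitting (IPF): start from a uniform table and repeatedly rescale it to match each set of margins in turn. The minimal sketch below uses IPF as a standard algorithm for this problem, not as the procedure used in the paper. The function and argument names are hypothetical.

```python
from itertools import product


def ipf_max_entropy(shape, margins, iters=200):
    """Maximum-entropy I x J x K table reproducing the given two-way margins.

    margins maps "ij", "ik" and "jk" to dicts {(index, index): total}.
    """
    I, J, K = shape
    cells = list(product(range(I), range(J), range(K)))
    table = {c: 1.0 for c in cells}  # uniform start: the unconstrained max-entropy table
    axes = {"ij": (0, 1), "ik": (0, 2), "jk": (1, 2)}
    for _ in range(iters):
        for name, (a, b) in axes.items():
            # Current two-way margin over axes a and b.
            current = {}
            for c in cells:
                key = (c[a], c[b])
                current[key] = current.get(key, 0.0) + table[c]
            # Rescale so this margin matches its target exactly.
            for c in cells:
                key = (c[a], c[b])
                table[c] *= margins[name][key] / current[key]
    return table
```

Each rescaling multiplies cells by a factor that depends on only two of the three indices. The fitted table therefore keeps a zero three-factor log-contrast, which for a $2\times2\times2$ table is $\log\frac{p_{000}p_{110}p_{101}p_{011}}{p_{100}p_{010}p_{001}p_{111}} = 0$.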
Article
Thesis (Ph. D. in Statistics)--University of California, Berkeley, June 1952. Bibliography: p. 125-128.
Article
Standard real business cycle models must rely on total factor productivity (TFP) shocks to explain the observed comovement of consumption, investment, and hours worked. This paper shows that a neoclassical model consistent with observed heterogeneity in labor supply and consumption can generate comovement in the absence of TFP shocks. Intertemporal substitution of goods and leisure induces comovement over the business cycle through heterogeneity in the consumption behavior of employed and unemployed workers. This result owes to two model features introduced to capture important characteristics of U.S. labor market data. First, individual consumption is affected by the number of hours worked: Employed agents consume more on average than the unemployed do. Second, changes in the employment rate, a central factor explaining variation in total hours, affect aggregate consumption. Demand shocks--such as shifts in the marginal efficiency of investment, as well as government spending shocks and news shocks--are shown to generate economic fluctuations consistent with observed business cycles.
Article
The problems of statistics are broadly classified into problems of specification and problems of inference, and a brief recapitulation is given of some standard methods in statistics, based on the use of the probability p (S/H) of the data S on the specification H (or on the use of the equivalent likelihood function). The general problems of specification and inference for time-series are then also briefly surveyed. To conclude Part I, the relation is examined between the information (entropy) concept used in communication theory, associated with specification, and Fisher's information concept used in statistics, associated with inference. In Part II some detailed methods of analysis are described with special reference to stationary time-series. The first method is concerned with the analysis of probability chains (in which the variable X can assume only a finite number of values or 'states', and the time t is discrete). The next section deals with autoregressive and autocorrelation analysis, for series defined either for discrete or continuous time, including proper allowance for sampling fluctuations; in particular, least-squares estimation of unknown coefficients in linear autoregressive representations, and Quenouille's goodness of fit test for the correlogram, are illustrated. Harmonic or periodogram analysis is theoretically equivalent to autocorrelation analysis, but in the case of time-series with continuous spectra is valueless in practice without some smoothing device, owing to the peculiar distributional properties of the observed periodogram; one such arithmetical device is described in Section 7. Finally the precise use of the likelihood function (when available) is illustrated by reference to two different theoretical series giving rise to the same autocorrelation function.
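The abstract notes that the raw periodogram is of little practical use for continuous spectra without smoothing. The minimal sketch below computes the raw periodogram with a direct $O(n^2)$ DFT and smooths it with the simple Daniell (moving-average) window. The Daniell window is one standard smoothing device and is assumed here; it is not necessarily the device described in Section 7 of the paper, and the function names are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram of the mean-corrected series at Fourier frequencies k/n, k = 0..n//2."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [
        abs(sum(d[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def daniell_smooth(ordinates, m):
    """Average each ordinate with its m neighbours on either side (truncated at the ends)."""
    out = []
    for k in range(len(ordinates)):
        window = ordinates[max(0, k - m): k + m + 1]
        out.append(sum(window) / len(window))
    return out
```

For white noise the raw ordinates fluctuate about the true flat spectrum with a spread that does not shrink as $n$ grows. Averaging $2m+1$ neighbouring ordinates reduces that variance roughly by a factor of $2m+1$, at the cost of some frequency resolution.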
On a semi-automatic power spectrum estimation procedure
  • H Akaike
Determination of the number of factors by an extended maximum likelihood principle
  • H Akaike
Tests of statistical hypotheses concerning several parameters when the number of observations is large
  • A Wald