Article

The Kolmogorov-Smirnov, Cramér-von Mises Tests

Authors: D. A. Darling

Abstract

1. Preface. This is an expository paper giving an account of the "goodness of fit" test and the "two sample" test based on the empirical distribution function, tests which were initiated by the four authors cited in the title. An attempt is made here to give a fairly complete coverage of the history, development, present status, and outstanding current problems related to these topics. The reader is advised that the relative amount of space and emphasis allotted to the various phases of the subject does not necessarily reflect their intrinsic merit and importance, but rather the author's personal interest and familiarity. Also, for the sake of uniformity the notation of many of the writers quoted has been altered, so that when referring to the original papers it will be necessary to check their nomenclature.

2. The empirical distribution function and the tests. Let X_1, X_2, ..., X_n be independent random variables (observations) each having the same distribution ...
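A minimal sketch (mine, not code from the paper) of the two statistics the survey covers: the empirical distribution function evaluated at the order statistics, and from it the one-sample Kolmogorov-Smirnov statistic D_n and the Cramér-von Mises statistic n·ω². The standard normal null F_0 is an arbitrary assumption for the example.

# Sketch (mine): one-sample KS and Cramer-von Mises statistics against a fully
# specified null cdf F0 (standard normal assumed here for illustration).
import numpy as np
from scipy import stats

def ks_and_cvm(x, cdf):
    """Return (D_n, n*omega^2) for sample x against the null cdf F0."""
    x = np.sort(np.asarray(x))
    n = len(x)
    u = cdf(x)                                   # F0 at the order statistics
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)                   # sup (F_n - F0)
    d_minus = np.max(u - (i - 1) / n)            # sup (F0 - F_n)
    d_n = max(d_plus, d_minus)                   # Kolmogorov-Smirnov statistic
    cvm = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)  # Cramer-von Mises
    return d_n, cvm

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
print(ks_and_cvm(sample, stats.norm.cdf))
# Library equivalents: stats.kstest(sample, "norm"), stats.cramervonmises(sample, "norm")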


... We abbreviate both variants of the χ² goodness-of-fit test by GoF1 and GoF2, respectively. An alternative goodness-of-fit test is the Kolmogorov-Smirnov (KS) test, cf. Smirnov (1948) and Darling (1957). The idea of this test is to compare the empirical cumulative distribution function (cdf) F_n(x) with a fully specified theoretical one, F_0(x). ...
... The critical values of the KS test were completely tabulated by Miller (1956) [16] for underlying continuous distributions. Morrow (2014) [17] computed tighter bounds by Monte Carlo simulation for the discrete Benford distribution of the first digit, cf. ...
Preprint
Full-text available
The Benford Law is used world-wide for detecting non-conformance or data fraud in numerical data. It says that the significand of a data set from a universe is not uniformly, but logarithmically distributed. Especially, the first non-zero digit D1 is 1 with probability P(D1 = 1) = log10 2 ≈ 0.3. There are several tests available for testing Benford, the best known being Pearson's χ²-test, the Kolmogorov-Smirnov test and the MAD-test suggested by Nigrini (2012). The latter test was enhanced to significance tests in Kössler, Lenz and Wang (2021) and in Cerqueti and Lupi (2021). In the present paper we propose some tests; three of the four invariant sum tests are new and they are motivated by the sum invariance property of the Benford Law. Two distance measures are investigated, the Euclidean and Mahalanobis distance of the standardized sums to the origin. We use the significands corresponding to the first significant digit as well as the second significant digit, respectively. Moreover, we suggest improved versions of the MAD-test and obtain critical values that are independent of the sample size. For illustration the tests are applied to specifically selected data sets where prior knowledge is available about being or not being Benford. Furthermore we discuss the role of truncation of distributions.
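A rough illustration (my own sketch; the log-uniform synthetic data are an assumption, not the paper's data) of the classical tests named in this abstract: Pearson's χ² against the Benford first-digit probabilities log10(1 + 1/d), plus a simple KS-type distance between the empirical and Benford first-digit cdfs. Critical values for that discrete KS distance need the simulated bounds mentioned in the excerpt above rather than the continuous-case tables.

# Sketch (mine, synthetic data): Benford first-digit chi-square and KS-type distance.
import numpy as np
from scipy import stats

def first_digits(values):
    """First significant digit of each nonzero value."""
    v = np.abs(np.asarray(values, dtype=float))
    v = v[v > 0]
    return (v / 10 ** np.floor(np.log10(v))).astype(int)

def benford_tests(values):
    d = first_digits(values)
    n = len(d)
    p_benford = np.log10(1 + 1 / np.arange(1, 10))      # P(D1 = d), d = 1..9
    observed = np.bincount(d, minlength=10)[1:10]
    chi2, p_chi2 = stats.chisquare(observed, f_exp=n * p_benford)
    # Discrete KS-type distance over the nine digit classes
    d_ks = np.max(np.abs(np.cumsum(observed) / n - np.cumsum(p_benford)))
    return chi2, p_chi2, d_ks

rng = np.random.default_rng(1)
data = np.exp(rng.uniform(0, 10, size=1000))   # roughly Benford-like synthetic data
print(benford_tests(data))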
... where I is the indicator function, given by ... The ECDF is used as a feature vector representation of the statistical properties of the window of interest. Computing the Euclidean distance (l2 norm) between two ECDFs is equivalent to the Cramer-von Mises test, a useful metric for estimating distribution equality [27]. An alternate choice of vector norm could be used if desired, e.g. the l∞ norm, which would correspond to the Kolmogorov-Smirnov test [27]. This work uses the k-nearest neighbors (k-NN) classifier, since many distance metrics are applicable. ...
Article
Full-text available
Nonintrusive identification of the energy consumption of individual loads from an aggregate power stream typically relies on relatively well-defined transient signatures. However, some loads have non-constant power demand that varies with loading conditions. These loads, such as computer-controlled machine tools, remain stubbornly resistant to conventional nonintrusive electrical monitoring methods. The power behavior of these loads can be modelled with stochastic processes. This paper presents statistical feature extraction techniques for identification of this fluctuating power behavior. An energy estimation procedure is presented and evaluated for two case studies: load operation on a shipboard microgrid and laboratory machine shop equipment.
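The excerpt above treats ECDFs as feature vectors and compares them with the l2 and l∞ norms. A minimal sketch of that idea (mine, with synthetic samples standing in for the measurement windows) is given below; the l∞ value on the pooled grid coincides with the two-sample Kolmogorov-Smirnov statistic, while the l2 value is a Cramér-von Mises-type distance up to normalization.

# Sketch (mine): compare two ECDFs with the l2 and l-infinity norms.
import numpy as np

def ecdf_on_grid(sample, grid):
    """Right-continuous empirical CDF of `sample` evaluated at `grid`."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def ecdf_distances(x, y):
    grid = np.sort(np.concatenate([x, y]))       # pooled sample as evaluation points
    diff = ecdf_on_grid(x, grid) - ecdf_on_grid(y, grid)
    return np.linalg.norm(diff, 2), np.max(np.abs(diff))   # (l2, l_inf)

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 300)
b = rng.normal(0.3, 1.0, 300)
print(ecdf_distances(a, b))   # the second value is the two-sample KS statistic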
... The two-sample Anderson-Darling test (hereafter, TSAD) was introduced by Darling (1957) and studied in detail by Pettitt (1976). The TSAD test, based on the empirical distribution function (EDF), avoids the arbitrary binning of histograms and the small number of entries per bin in the χ² test (Bohm and Zech, 2017). ...
... The Kolmogorov-Smirnov test is used to determine whether the sample data follow a specific distribution (Darling, 1957; Razali & Wah, 2011), while the Shapiro-Wilk test is considered best for checking the normality of data. The Shapiro-Wilk test is valid for the normality test because it assesses whether the sample data are drawn from a normally distributed population; in this case we have forty-four sample papers. ...
Article
Full-text available
This paper explores the intercultural competence of Pakistani students who appear in the Cambridge O-Level English language examination conducted in Pakistan by the British Council twice a year. English language is a core subject and is coded 1123 for the Cambridge International Examination (CIE). For this subject, two papers are taken: paper-I is about reading comprehension and paper-II is about writing a composition. To analyze the extent to which intercultural competence is assessed alongside linguistic competence through exam papers, forty-four exam papers (2013-2023) are analyzed using Byram's (1993) checklist, which constitutes eight categories of intercultural competence. Fairclough's (2003) model of discourse analysis has been used as the theoretical framework. For the analytical framework, the topics, themes, and content of both papers are thoroughly read and then placed under the eight categories of assessing intercultural competence proposed by Byram (1993). The findings of a one-sample t-test indicate that the categories of Byram's checklist have a negative effect on the content of the O-Level papers. The results clearly display the absence of assessment of students' intercultural competence, because little content on local culture was given in the exam papers.
... Finkelstein and Schafer (1971) describe a goodness-of-fit test for small sample sizes akin to the Kolmogorov-Smirnov test (e.g., Darling, 1957;Kolmogoroff, 1941;Lilliefors, 1969;Walsh, 1963). The Kolmogorov-Smirnov test compares a sample with a reference probability distribution or two samples with each other (in case of an unknown probability distribution function). ...
... A firm's cash payout of dividends lessens the resources available to the organization. This is consistent with the previous findings of Baker et al. (2001), Bruce (2011), and Darling (1957), who found that liquidity (current ratio) and dividend payout are negatively associated with each other. Consequently, a significant and negative connection between liquidity and dividend payout is predicted. ...
Article
Full-text available
The aim of this research paper is to measure the financial factors influencing dividend policy in the General Industrial Sector and to grasp the associations that affect dividend payout. A Simple Pooled Regression Model, a Fixed Effects Model, and a Random Effects Model, together with the Hausman test, are applied to a sample of 9 organizations (General Industrial Sector) listed on the Karachi Stock Exchange over a period of 11 years (2014-2022). In the Simple Regression Model (OLS), the results showed that financial factors such as ROA, ROE, and CR have a significant influence on the dividend payout ratio; the association between dividend payout and the explanatory variables is positive for return on assets, firm size, and leverage, and negative for return on equity, current ratio, and earnings per share. In the Fixed Effects Model (FEM), return on assets and current ratio have a significant influence on the dividend payout ratio; the association between dividend payout and the explanatory variables is positive for return on assets and leverage, and negative for return on equity, firm size, current ratio, and earnings per share. In the Random Effects Model (REM), ROA, ROE, and CR have a significant impact on the dividend payout ratio while the other variables are insignificant; the association between dividend payout and the explanatory variables is positive for return on assets, firm size, and leverage, and negative for return on equity, current ratio, and earnings per share. The Hausman test was applied to select the fitted model; the Random Effects Model was selected because its p-value of 0.5578 is higher than 0.05 (5%). This research paper will assist in evaluating the dividend policy of the General Industrial Sector listed on the Karachi Stock Exchange.
... where I_{X_i} is defined as follows [44]: ...
Article
Full-text available
Development of efficient methods of cellular image processing is an important avenue for practical application of modern artificial intelligence techniques. In particular, practical hematology requires automatic classification of images with or without leukemic (blast) cells in peripheral blood smears. This paper presents a new approach to the classification of such cellular images based on graph theory, the XGBoost algorithm and convolutional neural networks (CNN). Firstly, each image is transformed into a weighted graph using the gradient of intensity. Secondly, a number of graph invariants are computed, thus producing a set of synthetic features that is used to train a machine learning model based on XGBoost. Combining XGBoost with a CNN further increases the accuracy of leukemic cell classification. Sensitivity (TPR) and specificity (TNR) of the XGBoost-based model were 95% and 97%, respectively; the ResNet-50 model showed a TPR of 95% and a TNR of 98%. Combined use of the XGBoost-based and ResNet-50 models demonstrated a TPR of 99% and a TNR of 99%.
... An alternative goodness-of-fit test is the Kolmogorov-Smirnov (KS) test, cf. Kolmogorov (1933), Smirnov (1948) and Darling (1957). The idea of this test is to compare the empirical cumulative distribution function (cdf) F_n(x) with a fully specified theoretical one, F_0(x). ...
Article
Full-text available
The Benford law is used world-wide for detecting non-conformance or data fraud in numerical data. It says that the significand of a data set from the universe is not uniformly, but logarithmically distributed. Especially, the first non-zero digit is 1 with an approximate probability of 0.3. There are several tests available for testing Benford, the best known being Pearson's χ²-test, the Kolmogorov–Smirnov test and a modified version of the MAD-test. In the present paper we propose some tests; three of the four invariant sum tests are new and they are motivated by the sum invariance property of the Benford law. Two distance measures are investigated, the Euclidean and Mahalanobis distance of the standardized sums to the origin. We use the significands corresponding to the first significant digit as well as the second significant digit, respectively. Moreover, we suggest improved versions of the MAD-test and obtain critical values that are independent of the sample sizes. For illustration the tests are applied to specifically selected data sets where prior knowledge is available about being or not being Benford. Furthermore we discuss the role of truncation of distributions.
... We analyze our data set using a combination of the KS test, a nonparametric test of the equality of continuous, one-dimensional probability distributions (Massey 1951; Darling 1957), and the Rayleigh test, a statistical test for circularly distributed data (Fisher 1953; Berens 2009). These tests offer a robust and versatile tool to examine the geometry of the γ-ray emissions and detect deviations from isotropy, without the constraints of predefined models. ...
Article
Full-text available
The Sun is one of the most luminous γ-ray sources in the sky and continues to challenge our understanding of its high-energy emission mechanisms. This study provides an in-depth investigation of the solar disk γ-ray emission, using data from the Fermi Large Area Telescope spanning 2008 August to 2022 January. We focus on γ-ray events with energies exceeding 5 GeV, originating from a 0.5° angular aperture centered on the Sun, and implement stringent time cuts to minimize potential sample contaminants. We use a helioprojection method to resolve the γ-ray events relative to the solar rotation axes and combine statistical tests to investigate the distribution of events over the solar disk. We found that integrating observations over large time windows may overlook relevant asymmetrical features, which we reveal in this work through a refined time-dependent morphological analysis. We describe significant anisotropic trends and confirm compelling evidence of energy-dependent asymmetry in the solar disk γ-ray emission. Intriguingly, the asymmetric signature coincides with the Sun’s polar field flip during the cycle 24 solar maximum, around 2014 June. Our findings suggest that the Sun’s magnetic configuration plays a significant role in shaping the resulting γ-ray signature, highlighting a potential link between the observed anisotropies, solar cycle, and the solar magnetic fields. These insights pose substantial challenges to established emission models, prompting fresh perspectives on high-energy solar astrophysics.
... In addition to chi-square tests, we used Kolmogorov-Smirnov tests. The sample size was also deemed appropriate for these tests as they are better for testing data distributions than chi-square tests even when the sample size is small (e.g., N < 50) and are more sensitive to the shape of distributions (Darling, 1957;Lilliefors, 1967;Engmann and Cousineau, 2011). ...
... Nonparametric approaches are more appealing due to their distribution-free feature. Classical examples include distance-based tests such as the Kolmogorov-Smirnov (K-S) test (Darling, 1957) and the Anderson-Darling test (Scholz and Stephens, 1987). ... To overcome these limitations, we propose a likelihood-based test that can automatically adapt to densities with different shapes and develop a data-adaptive tuning method to automatically choose the penalization parameter. ...
... where x is the domain of the combined sample [20]. The Kolmogorov-Smirnov distance is illustrated in Figure 3. ...
Preprint
Full-text available
While mean-field models of cellular operations have identified dominant processes at the macroscopic scale, stochastic models may provide further insight into mechanisms at the molecular scale. In order to identify plausible stochastic models, quantitative comparisons between the models and the experimental data are required. The data for these systems have small sample sizes and time-evolving distributions. The aim of this study is to identify appropriate distance metrics for the quantitative comparison of stochastic model outputs and time-evolving stochastic measurements of a system. We identify distance metrics with features suitable for driving parameter inference, model comparison, and model validation, constrained by data from multiple experimental protocols. In this study, stochastic model outputs are compared to synthetic data across three scales: that of the data at the points the system is sampled during the time course of each type of experiment; a combined distance across the time course of each experiment; and a combined distance across all the experiments. Two broad categories of comparators at each point were considered, based on the empirical cumulative distribution function (ECDF) of the data and of the model outputs: discrete-based measures such as the Kolmogorov-Smirnov distance, and integrated measures such as the Wasserstein-1 distance between the ECDFs. It was found that the discrete-based measures were highly sensitive to parameter changes near the synthetic data parameters, but were largely insensitive otherwise, whereas the integrated distances had smoother transitions as the parameters approached the true values. The integrated measures were also found to be robust to noise added to the synthetic data, replicating experimental error. The characteristics of the identified distances provide the basis for the design of an algorithm suitable for fitting stochastic models to real world stochastic data.
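A minimal sketch (my own, with synthetic gamma samples standing in for the model outputs and data) of the two families of comparators contrasted above: a discrete ECDF-based distance (the Kolmogorov-Smirnov statistic) and an integrated one (the Wasserstein-1 distance, i.e. the area between the two ECDFs).

# Sketch (mine, synthetic data): discrete vs integrated ECDF comparators.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
model_out = rng.gamma(shape=2.0, scale=1.0, size=50)     # small samples, as in the study
data = rng.gamma(shape=2.2, scale=1.0, size=50)

ks = stats.ks_2samp(model_out, data).statistic           # sup-norm of the ECDF difference
w1 = stats.wasserstein_distance(model_out, data)         # area between the two ECDFs
print(ks, w1)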
... Next, we present a more comprehensive analysis of the reflectances using a hypothesis testing approach. Treating the reflectances individually, we use the Kolmogorov-Smirnov test [31] on the empirical marginal distribution of the MCMC samples with the null hypothesis being that the reflectances are normally distributed. The p-values for each reflectance parameter are shown in Figure 13, with the red line representing p = 0.05. ...
Preprint
Full-text available
The joint retrieval of surface reflectances and atmospheric parameters in VSWIR imaging spectroscopy is a computationally challenging high-dimensional problem. Using NASA's Surface Biology and Geology mission as the motivational context, the uncertainty associated with the retrievals is crucial for further application of the retrieved results for environmental applications. Although Markov chain Monte Carlo (MCMC) is a Bayesian method ideal for uncertainty quantification, the full-dimensional implementation of MCMC for the retrieval is computationally intractable. In this work, we developed a block Metropolis MCMC algorithm for the high-dimensional VSWIR surface reflectance retrieval that leverages the structure of the forward radiative transfer model to enable tractable fully Bayesian computation. We use the posterior distribution from this MCMC algorithm to assess the limitations of optimal estimation, the state-of-the-art Bayesian algorithm in operational retrievals which is more computationally efficient but uses a Gaussian approximation to characterize the posterior. Analyzing the differences in the posterior computed by each method, the MCMC algorithm was shown to give more physically sensible results and reveals the non-Gaussian structure of the posterior, specifically in the atmospheric aerosol optical depth parameter and the low-wavelength surface reflectances.
... Although Kuiper's test was proposed about 50 years ago, it is not widely propagated among college students, computer programmers, engineers, experimental psychologists and so on, partly due to the difficulty of solving for the upper tail quantile and partly due to the lack of open software for automatic calculation. Let C_n = sqrt(n) · V_n = sqrt(n) · (D_n^+ + D_n^-) (8) be Kuiper's C_n statistic for the critical value of the V_n test; Kuiper [1,10] pointed out that ...
Preprint
Full-text available
Kuiper's statistic is a good measure for the difference between an ideal distribution and an empirical distribution in the goodness-of-fit test. However, solving for the critical value and upper tail quantile, or simply the Kuiper pair, of Kuiper's statistic is a challenging problem due to the difficulties of solving the nonlinear equation and reasonably approximating the infinite series. The pioneering work by Kuiper and Stephens provided the key ideas and a few numerical tables created from the upper tail probability α and sample capacity n, which limited its propagation and possible applications in various fields since there are infinite configurations for the parameters α and n. In this work, the contributions lie in two perspectives: firstly, the second-order approximation for the infinite series of the cumulative distribution of the critical value is used to get higher precision; secondly, the principles and fixed-point algorithms for solving the Kuiper pair are presented in detail. The algorithms are verified and validated by comparison with the table provided by Kuiper. The methods and algorithms proposed are enlightening and worth introducing to college students, computer programmers, engineers, experimental psychologists and so on.
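A short sketch (mine; the standard normal null is an assumption) of the quantity in equation (8) of the excerpt above, Kuiper's V_n = D_n^+ + D_n^- and C_n = sqrt(n)·V_n.

# Sketch (mine): Kuiper's V_n and C_n against a fully specified null cdf.
import numpy as np
from scipy import stats

def kuiper_statistic(x, cdf):
    x = np.sort(np.asarray(x))
    n = len(x)
    u = cdf(x)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)
    d_minus = np.max(u - (i - 1) / n)
    v_n = d_plus + d_minus             # Kuiper's V_n
    return v_n, np.sqrt(n) * v_n       # (V_n, C_n)

rng = np.random.default_rng(4)
print(kuiper_statistic(rng.normal(size=100), stats.norm.cdf))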
... One of the oldest statistical problems is to estimate distributions, for which Kolmogorov [2], Smirnov [3] and von Mises introduced nonparametric hypothesis tests [4]. Each of these tests defines an ancillary statistic whose distribution does not depend on the true underlying distribution, but which can be computed from any finite sample. ...
Preprint
One of the key objects of binary classification is the regression function, i.e., the conditional expectation of the class labels given the inputs. With the regression function not only a Bayes optimal classifier can be defined, but it also encodes the corresponding misclassification probabilities. The paper presents a resampling framework to construct exact, distribution-free and non-asymptotically guaranteed confidence regions for the true regression function for any user-chosen confidence level. Then, specific algorithms are suggested to demonstrate the framework. It is proved that the constructed confidence regions are strongly consistent, that is, any false model is excluded in the long run with probability one. The exclusion is quantified with probably approximately correct type bounds, as well. Finally, the algorithms are validated via numerical experiments, and the methods are compared to approximate asymptotic confidence ellipsoids.
... The ultimate goal of this work is to determine how similar the NT colors are to other populations in the solar system. A simple statistical test to measure the likelihood that two distributions are drawn from the same underlying distribution is the Kolmogorov-Smirnov (K-S) test (Darling 1957). Although the K-S test can be generalized to more than a single dimension, the interpretation becomes complicated. ...
Article
Full-text available
In 2018, Jewitt identified the “Trojan Color Conundrum,” namely that Neptune's Trojan asteroids (NTs) had no ultrared members, unlike the nearby Kuiper Belt. Since then, numerous ultrared NTs have been discovered, seemingly resolving this conundrum. However, it is still unclear whether or not the Kuiper Belt has a color distribution consistent with the NT population, as would be expected if it were the source population. In this work, we present a new photometric survey of 15 out of 31 NTs. We utilized the Sloan g ′ r ′ i ′ z ′ filters on the IMACS f/4 instrument, which is mounted on the 6.5 m Baade telescope. In this survey, we identify four NTs as being ultrared using a principal component analysis. This result brings the ratio of red to ultrared NTs to 7.75:1, more consistent with the corresponding trans-Neptunian object ratio of 4–11:1. We also identify three targets as being blue (nearly solar) in color. Such objects may be C-type surfaces, but we see more of these blue NTs than has been observed in the Kuiper Belt. Finally, we show that there are hints of a color-absolute magnitude (H) correlation, with larger H (smaller-sized, lower albedo) objects tending to be more red, but more data are needed to confirm this result. The origin of such a correlation remains an open question that will be addressed by future observations of the surface composition of these targets and their rotational properties.
... where L is the log-likelihood value, k is the number of parameters of the fitted model, and n represents the sample size. The Cramér-von Mises test (Darling 1957), a goodness-of-fit test, was used to compare a fitted theoretical (C_θ) and an empirical (C_n) copula. The Cramér-von Mises test statistic can be defined as ...
Article
Copula functions are widely used to derive multivariate probability distributions in hydrometeorology. One of the key steps in the copula method is the derivation of marginal distributions of individual variables, which can be accomplished using the principle of maximum entropy, where the distribution parameters are estimated from the specified constraints. This study investigated two drought variables (severity and duration) by coupling the principle of maximum entropy with parametric and empirical copulas. Homogeneous climatic zones were first identified by applying the fuzzy clustering method to data from 39 synoptic stations in Iran, and then drought severity and duration were determined with the standardized precipitation index. These two variables were scaled and their marginal probability distribution functions were derived using the principle of maximum entropy as well as empirically. Then, the joint probability distribution of drought severity and duration was determined using the maximum entropy-copula, and parametric and empirical copulas. Thereafter, bivariate conditional return periods were determined for each homogeneous region. Results showed that 1) univariate and bivariate distributions can be obtained by maximizing entropy; 2) the dependence structure via Spearman's rho, which directly affects the Lagrange parameters of the entropy copula, was a controlling factor in optimizing the objective function; 3) for a given set of constraints, the maximum entropy copula is independent of the types of marginals; 4) the entropy-entropy copula (with entropy marginals) is considered a better method than the alternatives, because it gives results similar to the parametric methods while it only needs to fit a single model.
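The study applies the Cramér-von Mises criterion to copulas; the sketch below (my own, a univariate illustration with hypothetical drought-severity data, not the copula version used in the paper) shows the same underlying idea of comparing an empirical cdf with a fitted theoretical one.

# Univariate sketch (mine, hypothetical data): Cramer-von Mises goodness of fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
severity = rng.gamma(shape=2.0, scale=1.5, size=200)     # hypothetical drought severities

a, loc, scale = stats.gamma.fit(severity, floc=0)        # fit a candidate marginal
res = stats.cramervonmises(severity, "gamma", args=(a, loc, scale))
print(res.statistic, res.pvalue)   # p-value only approximate: parameters were estimated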
... There is also the two-sample AD test, introduced by Darling [26] and Pettitt [27], which generalizes according to equation (8). ...
Article
Full-text available
This paper presents a new method for analysing creeping discharges based on information theory as it applies to medical imaging. The analysis of information surface data is used to determine the impact of relaxation time on the characteristic parameters of creeping discharges. The same information is used to make a comparative study of the morphology of discharges propagating in palm kernel oil methyl ester (PKOME) and in mineral oil (MO). Other comparative methods based on fractal analysis and normality hypothesis tests associated with Anderson-Darling (AD), Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) statistics are used. The results show that very short relaxation times increase the error on the measurement of the fractal dimension and the maximum extension of the discharges. A growth of the mutual information between 0 and 60% is observed for relaxation times varying between 60s and 420s respectively. For the same time interval, the P-value increases from 0.027 to 0.821 according to the AD statistic, from 0.01 to more than 0.150 according to KS and from 0.083 to more than 0.1 according to SW. This result indicates that the data are from a normal distribution. After 420s of relaxation, the error on the maximum extension measurement is reduced by 94% in PKOME and 92% in MO. Similarly, the error on the mean fractal dimension in MO is reduced by 86.7% for a relaxation time between 301s and 420s, and by 84.6% in PKOME for a time between 180s and 420s. These different results imply that the impact of the discharge can be predicted when it is in its initial phase during which the number of discharge occurrences is reduced. On the other hand, the physicochemical characteristics of the insulating liquid used dictate the relaxation time to be allowed for the laboratory measurements.
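A minimal sketch (mine; synthetic Weibull samples stand in for the measured discharge extensions in PKOME and MO) of the kind of two-sample EDF comparison mentioned in the excerpt above, using scipy's k-sample Anderson-Darling implementation.

# Sketch (mine, synthetic data): two-sample Anderson-Darling comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
ext_pkome = rng.weibull(1.5, 80) * 10.0    # hypothetical maximum extensions in PKOME
ext_mo = rng.weibull(1.5, 80) * 11.0       # hypothetical maximum extensions in MO

res = stats.anderson_ksamp([ext_pkome, ext_mo])
print(res.statistic, res.significance_level)   # scipy caps the reported level at [0.001, 0.25]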
... In the case of abundant historical data on a cost component or an input variable, the parameters of a probability distribution can be estimated. A goodness-of-fit criterion test (Darling 1957; Arnold and Emerson 2011) can be used to measure how well the distribution represents the data. In the case of limited available data, a triangular distribution is often recommended to represent the underlying uncertain variables, with the minimum, maximum, and most likely values set using decision makers' judgment (Walls and Smith 1998; Campbell and Brown 2003; Gransberg ...). ...
Chapter
Although many factors, including noneconomic barriers (e.g., regulatory and environmental factors), influence decision-making about construction projects, when it’s time to invest, many construction project owners should allocate their limited financial resources to projects with the highest returns on investment. However, the investment valuation of construction projects is subject to significant uncertainties, such as substantial construction cost variations that make decision-making difficult. This chapter presents several investment valuation methods, such as a stochastic life-cycle cost analysis technique and a real options analysis method, to evaluate investments in construction projects under uncertainties. The stochastic life-cycle cost analysis captures the volatility of the input variables in investment valuation based on their historical values, propagates them through the life-cycle cost analysis method, and determines the probability distribution of the life-cycle cost. Real options analysis evaluates real (nonfinancial) investments under uncertainty with elements for strategic management flexibility and delayed investment. Various examples of construction investment valuations, along with the R codes, are presented in this chapter to enhance the learning experience. These resources can be extended for the assessment of other construction investment projects. Keywords: Construction investment valuation under uncertainty; Real options analysis; Stochastic life-cycle cost analysis; Binomial decision tree; Monte Carlo simulation; Investment decision-making under uncertainty
... The fitting of the distributions was done using the following twenty distributions: Cauchy, Error, Hypersecant, Gamma, Laplace, Logistic, Log Pearson 3, Rayleigh (2p), Weibull (3p), Log Logistic (3p), Triangular, Gen. Gamma, Gen. Gamma (4p), Gen. Extreme Value, Log Normal (3p), Pearson 5 (3p), Fatigue Life (3p), Inv. Gaussian (3p), Nakagami. The goodness-of-fit tests that would be important decision-making aids in selecting the best-fitting distribution are also listed in the section that follows [11], [12]. Some of the fitted distributions are: ...
Article
Full-text available
Telangana state's population is mostly dependent on agriculture. The Telangana state's economy depends heavily on agriculture, as does the nation's and the state's ability to achieve food security. Combining art and science to fit a statistical distribution to data involves making trade-offs along the way. The secret to effective data analysis is striking a balance between improving distributional fit and preserving ease of estimation, while keeping in mind that the analysis's ultimate goal is to help you make better decisions. A recurring issue in agricultural research was which distribution should be utilized to simulate the production data from an experiment. An analysis is then carried out utilizing the obtained distributions, using the statistical method of fitting probability distributions to the variable data. These distributions would be a representation of the properties of the variable data. The twenty distributions are: Cauchy, Error, Hypersecant, Gamma (3p), Laplace, Logistic, Log Pearson 3, Rayleigh (2p), Weibull (3p), Log Logistic (3p), Triangular, Gen. Gamma, Gen. Gamma (4p), Gen. Extreme Value, Log Normal (3p), Pearson 5 (3p), Fatigue Life (3p), Inv. Gaussian (3p), and Nakagami. For this distribution-fitting study, rice production data are used. Twenty probability distributions were computed, and the Kolmogorov-Smirnov, Anderson-Darling, and Chi-Square test statistics were used for each data set to choose the distribution that fit the data best. The probability distributions include Cauchy, Error, Hypersecant, Gamma, Laplace, Logistic, Log Pearson 3, Rayleigh (2p), Weibull (3p), Log Logistic (3p), Triangular, Gen. Gamma, Gen. Gamma (4p), Gen. Extreme Value, Log Normal (3p), Log Pearson 5, Fatigue Life (3p), Inv. Gaussian (3p), and Nakagami.
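A compact sketch (my own; the candidate list and the synthetic yield data are assumptions, not the paper's twenty distributions or its data) of the selection procedure described above: fit several candidate distributions and rank them by the Kolmogorov-Smirnov statistic. The p-values are only approximate here because the parameters are estimated from the same data.

# Sketch (mine, synthetic data): fit candidates and rank by the KS statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
yields = rng.gamma(shape=4.0, scale=0.8, size=150)   # stand-in for crop production data

candidates = ["gamma", "lognorm", "weibull_min", "logistic", "cauchy"]
results = []
for name in candidates:
    params = getattr(stats, name).fit(yields)
    d, p = stats.kstest(yields, name, args=params)   # p approximate: fitted parameters
    results.append((name, d, p))

for name, d, p in sorted(results, key=lambda r: r[1]):   # smaller D = better fit
    print(f"{name:12s}  D = {d:.4f}  p ~ {p:.3f}")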
... Similar probability distributions for streambed P retention between model results and the observation-based estimates would indicate correctly simulated mean behavior, range of behavior, and likelihood of behaviors. We apply four statistical tests or measures to quantify the similarity between estimated (from observations) and simulated probability distributions of streambed P retention in the monitored reach: (a) Wilcoxon-Mann-Whitney (WMW) test for difference in means (Neuhäuser, 2011;Woolson, 2008), (b) Levene test for difference in variances (Glass, 1966;Lim & Loh, 1996), (c) Kolmogorov-Smirnov (K-S) statistic for difference in probability distributions (Berger & Zhou, 2014;Massey, 1951), and the (d) Cramer-von Mises (C-vM) test for difference in probability distributions (Anderson, 1962;Darling, 1957). Additional details for these statistical tests or measures are provided in Supporting Information S1 (Text S4). ...
Article
Full-text available
Efforts to reduce riverine phosphorus (P) loads have not been as fruitful as expected or hoped. One reason for the failure of these efforts appears to be that models used for watershed P management have understated and misrepresented the role of in‐stream processes in shaping watershed P export. Here, we update the latest release of the Soil and Water Assessment Tool (SWAT+), a widely used watershed management model, to better represent in‐stream P retention and remobilization (SWAT+P.R&R). We add new streambed pools where P is stored and tracked, and we incorporate three new processes driving in‐stream P dynamics: (a) deposition and resuspension of sediment‐associated P, (b) diffusion of dissolved P between the water column and streambed, and (c) adsorption and desorption of mineral P. The objective of this modeling work is to provide a diagnostic tool that enables researchers to challenge existing assumptions regarding how watersheds store, transform, and transport P. Here, in a first diagnostic analysis, SWAT+P.R&R helps reconcile in‐stream P retention theory (that P is retained at low flows and remobilized at high flows) and a discordant data set in our validation watershed. SWAT+P.R&R results (a) clarify that the theorized relationship between P retention and flow is only valid (for this point‐source affected testbed, at least) at the temporal scale of a single rising‐or‐falling hydrograph limb and (b) illustrate that hysteresis obscures the relationship at longer temporal scales. Future work using SWAT+P.R&R could further challenge assumptions regarding timescales of in‐stream P legacies and sources of P load variability.
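A minimal sketch (mine, with hypothetical retention samples) of the four comparisons listed in the excerpt above, via their scipy implementations.

# Sketch (mine, synthetic data): WMW, Levene, KS, and Cramer-von Mises comparisons.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
estimated = rng.normal(0.30, 0.10, 60)    # observation-based P retention (hypothetical)
simulated = rng.normal(0.33, 0.12, 60)    # model-simulated P retention (hypothetical)

print(stats.mannwhitneyu(estimated, simulated))          # (a) Wilcoxon-Mann-Whitney
print(stats.levene(estimated, simulated))                # (b) Levene, difference in variances
print(stats.ks_2samp(estimated, simulated))              # (c) Kolmogorov-Smirnov
print(stats.cramervonmises_2samp(estimated, simulated))  # (d) Cramer-von Mises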
... To quantify the differences between the g − i distribution of the NTs and the g − i distribution of other dynamical classes, we apply the Kolmogorov-Smirnov (KS) test, which measures the maximum difference between two cumulative distributions (Darling 1957). The KS method tests the null hypothesis that two cumulative distributions are drawn from the same parent distribution. ...
Article
Neptunian Trojans (NTs), trans-Neptunian objects in 1:1 mean-motion resonance with Neptune, are generally thought to have been captured from the original trans-Neptunian protoplanetary disk into co-orbital resonance with the ice giant during its outward migration. It is possible, therefore, that the colour distribution of NTs is a constraint on the location of any colour transition zones that may have been present in the disk. In support of this possible test, we obtained g, r, and i-band observations of 18 NTs, more than doubling the sample of NTs with known visible colours to 31 objects. Out of the combined sample, we found ≈4 objects with g-i colours of >1.2 mags placing them in the very red (VR) category as typically defined. We find, without taking observational selection effects into account, that the NT g-i colour distribution is statistically distinct from other trans-Neptunian dynamical classes. The optical colours of Jovian Trojans and NTs are shown to be less similar than previously claimed with additional VR NTs. The presence of VR objects among the NTs may suggest that the location of the red to VR colour transition zone in the protoplanetary disk was interior to 30-35 au.
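A short sketch (my own; the colour values are randomly generated, not the survey photometry) of the two-sample KS comparison of g − i colour distributions described above.

# Sketch (mine, synthetic colours): two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
g_i_nt = rng.normal(0.95, 0.25, 31)     # hypothetical NT g-i colours
g_i_tno = rng.normal(1.05, 0.30, 120)   # hypothetical colours of a comparison class

res = stats.ks_2samp(g_i_nt, g_i_tno)
print(res.statistic, res.pvalue)   # small p-value -> reject a common parent distribution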
... Generally, we see that COSMOS2020-PCz6.05-01 is skewed toward higher masses than the field CDF. To check if the overdensity and field are drawn from different distributions, we can perform a two-sample Kolmogorov-Smirnov (K-S) test (Darling 1957) using the stellar masses with the null hypothesis that the two independent samples are drawn from the same continuous distribution. We obtain a K-S statistic of 0.28 and a p-value of 0.10 and therefore cannot reject the null hypothesis at 10% level. ...
Article
Full-text available
We conduct a systematic search for protocluster candidates at z ≥ 6 in the Cosmic Evolution Survey (COSMOS) field using the recently released COSMOS2020 source catalog. We select galaxies using a number of selection criteria to obtain a sample of galaxies that have a high probability of being inside a given redshift bin. We then apply overdensity analysis to the bins using two density estimators, a Weighted Adaptive Kernel estimator and a Weighted Voronoi Tessellation estimator. We have found 15 significant (>4σ) candidate galaxy overdensities across the redshift range 6 ≤ z ≤ 7.7. The majority of the galaxies appear to be on the galaxy main sequence at their respective epochs. We use multiple stellar-mass-to-halo-mass conversion methods to obtain a range of dark matter halo mass estimates for the overdensities in the range of ∼10¹¹–10¹³ M⊙, at the respective redshifts of the overdensities. The number and the masses of the halos associated with our protocluster candidates are consistent with what is expected from the area of a COSMOS-like survey in a standard Λ cold dark matter cosmology. Through comparison with simulation, we expect that all of the overdensities at z ≃ 6 will evolve into Virgo-/Coma-like clusters at present (i.e., with masses ∼10¹⁴–10¹⁵ M⊙). Compared to other overdensities identified at z ≥ 6 via narrowband selection techniques, the overdensities presented appear to have ∼10× higher stellar masses and star formation rates (SFRs). We compare the evolution in the total SFR and stellar mass content of the protocluster candidates across the redshift range 6 ≤ z ≤ 7.7 and find agreement with the total average SFR from simulations.
... Grain size curves are, in fact, empirical distributions of particle mass distribution. To answer the question of whether there are statistically significant differences between the two grain-size curves, the non-parametric Cramér-von Mises test was used [31,32]. Using this test, at each sampling point (measuring points 1-8), the grain size curves at the control points (S, F and E) were compared in pairs. ...
Article
Full-text available
Road dust is an important, inexhaustible source of particulate matter, arising from traffic and the resuspension of finer particles carried by wind and traffic. The components of this material are of both natural and anthropogenic origin. Sources of particulate pollution are vehicles and road infrastructure. The work aimed to analyze the mass fraction of the finest fractions of road dust (<0.1 mm) collected from highways and expressways with asphalt and concrete surfaces. Sampling points were located in the central and southern parts of Poland. The research material was sieved on a sieve shaker. It has been proven that concrete pavement is less susceptible to abrasion than asphalt pavement. Particles formed by the erosion of asphalt and concrete belong to a fraction of coarser particles than the fraction critical for this research (<0.1 mm). It was found that limiting the area with sound-absorbing screens leads to the accumulation of fine road dust in that place, in contrast to spaces where strong air drafts remove smaller particles from the vicinity of the road. In general, the mass fraction of particles smaller than 100 µm in road dust was from 12.8% to 3.4% for asphalt surfaces and from 12.0% to 6.5% for concrete surfaces.
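A minimal sketch (mine; hypothetical lognormal particle-size samples rather than the measured grain-size curves) of the pairwise two-sample Cramér-von Mises comparison described in the excerpt above.

# Sketch (mine, synthetic particle sizes): two-sample Cramer-von Mises test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
sizes_point_a = rng.lognormal(mean=4.0, sigma=0.6, size=200)   # particle sizes in um
sizes_point_b = rng.lognormal(mean=4.1, sigma=0.6, size=200)

res = stats.cramervonmises_2samp(sizes_point_a, sizes_point_b)
print(res.statistic, res.pvalue)   # large p-value -> no significant difference detected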
... We performed a Kolmogorov-Smirnov test (Darling 1957; Press et al. 1992) on the GRB (N_1 = 839) and the complete SFXT (N_2 = 53) samples of T_90 and obtained a K-S statistic D_{N_1,N_2} = 0.651 and a K-S probability of 2 × 10⁻¹⁹, showing that the two underlying one-dimensional probability distributions differ significantly. Even when excluding all GRBs with T_90 < 10 s (effectively excluding short GRBs), D_{656,53} = 0.622 and the K-S probability is 1 × 10⁻¹⁷, thus confirming that the T_90 of SFXTs and long GRBs are not drawn from the same parent distribution. ...
Article
Full-text available
Supergiant fast X-ray transients (SFXTs) are high mass X-ray binaries (HMXBs) displaying X-ray outbursts that can reach peak luminosities up to 10³⁸ erg s⁻¹ and spend most of their lives in more quiescent states with luminosities as low as 10³²–10³³ erg s⁻¹. During the quiescent states, less luminous flares are also frequently observed with luminosities of 10³⁴–10³⁵ erg s⁻¹. The main goal of the comprehensive and uniform analysis of the SFXT Swift triggers presented in this paper is to provide tools to predict whether a transient that has no known X-ray counterpart may be an SFXT candidate. These tools can be exploited for the development of future missions exploring the variable X-ray sky through large field-of-view instruments. We examined all available data on outbursts of SFXTs that triggered the Swift/Burst Alert Telescope (BAT) collected between 2005 August 30 and 2014 December 31, in particular those for which broad-band data, including the Swift/X-ray Telescope (XRT) data, are also available. This work complements and extends our previous catalogue of SFXT flares detected by BAT from 2005 February 12 to 2013 May 31, since we now include the additional BAT triggers recorded until the end of 2014 (i.e. beyond the formal first 100 months of the Swift mission). Due to a change in the mission’s observational strategy, virtually no SFXT triggers obtained a broad-band response after 2014. We processed all BAT and XRT data uniformly by using the Swift Burst Analyser to produce spectral evolution dependent flux light curves for each outburst in the sample. The BAT data allowed us to infer useful diagnostics to set SFXT triggers apart from the general γ-ray burst population, showing that SFXTs uniquely give rise to image triggers and are simultaneously very long, faint, and ‘soft’ hard-X-ray transients. We find that the BAT data alone can discriminate very well the SFXTs from other classes of fast transients, such as anomalous X-ray pulsars and soft gamma repeaters. On the contrary, the XRT data collected around the time of the BAT triggers are shown to be decisive for distinguishing SFXTs from, for instance, accreting millisecond X-ray pulsars and jetted tidal disruption events. The XRT observations of 35 (out of 52 in total) SFXT BAT triggers show that in the soft X-ray energy band, SFXTs display a decay in flux from the peak of the outburst of at least three orders of magnitude within a day and rarely undergo large re-brightening episodes, favouring in most cases a rapid decay down to the quiescent level within three to five days (at most).
... We performed a Kolmogorov-Smirnov test (Darling 1957; Press et al. 1992) on the GRB (N_1 = 839) and complete SFXT (N_2 = 53) samples of T_90 and obtained a K-S statistic D_{N_1,N_2} = 0.651 and a K-S probability of 2 × 10⁻¹⁹, so that the two underlying one-dimensional probability distributions differ significantly. Even when excluding all GRBs with T_90 < 10 s (thus effectively excluding short GRBs), D_{656,53} = 0.622 and the K-S probability is 1 × 10⁻¹⁷, thus confirming that the T_90 of SFXTs and long GRBs are not drawn from the same parent distribution. ...
Preprint
Full-text available
Supergiant Fast X-ray Transients (SFXT) are High Mass X-ray Binaries displaying X-ray outbursts reaching peak luminosities of 10$^{38}$ erg/s and spend most of their life in more quiescent states with luminosities as low as 10$^{32}$-10$^{33}$ erg/s. The main goal of our comprehensive and uniform analysis of the SFXT Swift triggers is to provide tools to predict whether a transient which has no known X-ray counterpart may be an SFXT candidate. These tools can be exploited for the development of future missions exploring the variable X-ray sky through large FoV instruments. We examined all available data on outbursts of SFXTs that triggered the Swift/BAT collected between 2005-08-30 and 2014-12-31, in particular those for which broad-band data, including the Swift/XRT ones, are also available. We processed all BAT and XRT data uniformly with the Swift Burst Analyser to produce spectral evolution dependent flux light curves for each outburst. The BAT data allowed us to infer useful diagnostics to set SFXT triggers apart from the general GRB population, showing that SFXTs give rise uniquely to image triggers and are simultaneously very long, faint, and `soft' hard-X-ray transients. The BAT data alone can discriminate very well the SFXTs from other fast transients such as anomalous X-ray pulsars and soft gamma repeaters. However, to distinguish SFXTs from, for instance, accreting millisecond X-ray pulsars and jetted tidal disruption events, the XRT data collected around the time of the BAT triggers are decisive. The XRT observations of 35/52 SFXT BAT triggers show that in the soft X-ray energy band, SFXTs display a decay in flux from the peak of the outburst of at least 3 orders of magnitude within a day and rarely undergo large re-brightening episodes, favouring in most cases a rapid decay down to the quiescent level within 3-5 days (at most). [Abridged]
... Such a test attempts to reject the null hypothesis that the two samples are actually generated by the same distribution. Commonly applied tests include Kolmogorov-Smirnov, Anderson-Darling, and Cramér-von Mises (Stephens, 1970;Darling, 1957). For our experiments we picked the Cramér-von Mises test as it turned out to be the most efficient in terms of computation time, while there was no particular reason to prefer one over the other. ...
Article
Full-text available
We perform statistical analyses on spatiotemporal patterns in the magnitude distribution of induced earthquakes in the Groningen natural gas field. The seismic catalogue contains 336 earthquakes with (local) magnitudes above 1.45, observed in the period between 1 January 1995 and 1 January 2022. An exploratory moving-window analysis of maximum-likelihood b-values in both time and space does not reveal any significant variation in time, but does reveal a spatial variation that exceeds the 0.05 significance level. In search for improved understanding of the observed spatial variations in physical terms we test five physical reservoir properties as possible b-value predictors. The predictors include two static (spatial, time-independent) properties: the reservoir layer thickness, and the topographic gradient (a measure of the degree of faulting intensity in the reservoir); and three dynamic (spatiotemporal, time-dependent) properties: the pressure drop due to gas extraction, the resulting reservoir compaction, and a measure for the resulting induced stress. The latter property is the one that is currently used in the seismic source models that feed into the state-of-the-art hazard and risk assessment. We assess the predictive capabilities of the five properties by statistical evaluation of both moving window analysis, and maximum-likelihood parameter estimation for a number of simple functional forms that express the b-value as a function of the predictor. We find significant linear trends of the b-value for both topographic gradient and induced stress, but even more pronouncedly for reservoir thickness. Also for the moving window analysis and the step function fit, the reservoir thickness provides the most significant results. We conclude that reservoir thickness is a strong predictor for spatial b-value variations in the Groningen field. We propose to develop a forecasting model for Groningen magnitude distributions conditioned on reservoir thickness, to be used alongside, or as a replacement, for the current models conditioned on induced stress.
... Note that it is always a good idea to visualize the data and check the descriptive statistics. We use a Pandas DataFrame to analyze our data and perform common data manipulations to see the statistical properties of the data, such as the mean and standard deviation of columns [22]. If there exist some data points that do not belong to the rest of the population, we can easily detect and remove those outliers. ...
Article
Full-text available
The purpose of this article is to illustrate an investigation of methods that can be effectively used to predict the data incompleteness of a dataset. Here, the investigators have conceptualized data incompleteness as a random variable, with the overall goal behind the experimentation being to provide a 360-degree view of this concept, treating the incompleteness of a dataset as either a continuous or a discrete random variable depending on the aspect of the required analysis. During the course of the experiments, the investigators have identified the Kolmogorov–Smirnov goodness-of-fit test, the Mielke distribution, and beta distributions as key methods to analyze the incompleteness of a dataset for the datasets used for experimentation. A comparison of these methods with a mixture density network was also performed. Overall, the investigators have provided key insights into the use of methods and algorithms that can be used to predict data incompleteness and have provided a pathway for further explorations and prediction of data incompleteness.
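A small sketch (my own; the column name and the injected outliers are hypothetical) of the routine described in the excerpt above: inspect descriptive statistics with pandas and drop simple z-score outliers.

# Sketch (mine, synthetic data): descriptive statistics plus z-score outlier removal.
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
df = pd.DataFrame({"value": np.append(rng.normal(10, 2, 500), [45.0, -30.0])})

print(df.describe())                       # mean, std, quartiles per column

z = (df["value"] - df["value"].mean()) / df["value"].std()
cleaned = df[z.abs() <= 3]                 # keep points within 3 standard deviations
print(len(df), "->", len(cleaned))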
Article
Agriculture in developed countries is produced under heavily subsidized insurance. The pricing of these insurance contracts, termed premium rates, directly influences farmers' profits and their financial solvency, and indirectly influences global food security. Changing climate and technology have likely caused significant shifting of mass in crop yield distributions and, if so, have rendered some of the historical yield data irrelevant for estimating premium rates. Insurance is primarily interested in lower tail probabilities, and as such the detection of structural change in tail probabilities or higher moments is of great concern for the efficacy of crop insurance programs. We propose a test for structural change with an unknown break(s) which has power against structural change in any moment and can be tailored to a specific range of the underlying distribution. Simulations demonstrate better finite sample performance relative to existing methods and reasonable performance at identifying the break. The asymptotic distribution is shown to follow the Kolmogorov distribution. Our proposed test finds structural change in most major U.S. field crop yields, leading to significant premium rate differences. Results of an out-of-sample premium rating game indicate that incorporating structural change in crop yields leads to more accurate premium rates.
Article
Full-text available
The use of pseudo-convex mixtures generated from stable distributions for extremes offers a valuable approach for handling reliability-related data challenges. This framework encompasses pseudo-convex mixtures stemming from the exponential distribution. However, precise parameter estimation, particularly in cases where the weight parameter ω is negative, remains a challenge. This work assesses, through simulation, the performance of the Expectation-Maximization algorithm in estimating parameters for pseudo-convex mixtures generated by the exponential distribution.
Article
Full-text available
Kuiper's statistic is a good measure for the difference between an ideal distribution and an empirical distribution in the goodness-of-fit test. However, it is a challenging problem to solve for the critical value and upper tail quantile, or simply the Kuiper pair, of Kuiper's statistic due to the difficulties of solving the nonlinear equation and reasonably approximating the infinite series. In this work, the contributions lie in three perspectives: firstly, the second-order approximation for the infinite series of the cumulative distribution of the critical value is used to achieve higher precision; secondly, the principles and fixed-point algorithms for solving the Kuiper pair are presented in detail; finally, a mistake about the critical value c_{n,α} for (α, n) = (0.01, 30) in Kuiper's distribution table has been identified and corrected, where n is the sample capacity and α is the upper tail quantile. The algorithms are verified and validated by comparison with the table provided by Kuiper. The methods and algorithms proposed are enlightening and worth introducing to college students, computer programmers, engineers, experimental psychologists and so on.
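A first-order sketch only (mine; the article above uses a sharper second-order series and fixed-point iteration rather than this bisection) of how an approximate asymptotic Kuiper critical value can be obtained by inverting the leading-order tail probability P(sqrt(n)·V_n > c) ≈ Σ_{k≥1} 2(4k²c² − 1)e^(−2k²c²).

# Sketch (mine, leading-order asymptotics only): approximate Kuiper critical value.
import numpy as np

def kuiper_tail(c, terms=100):
    k = np.arange(1, terms + 1)
    return np.sum(2.0 * (4.0 * k**2 * c**2 - 1.0) * np.exp(-2.0 * k**2 * c**2))

def kuiper_critical(alpha, lo=0.5, hi=3.0, tol=1e-10):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kuiper_tail(mid) > alpha:
            lo = mid           # tail still too heavy: the critical value is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(kuiper_critical(0.05))   # asymptotic critical value of C_n at alpha = 0.05 (about 1.75)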
Article
This study aimed to evaluate probable maximum precipitation (PMP) estimated using surface dew points (SDP) or actual precipitable water obtained from upper‐air data (UAD) in the moisture‐maximization method with the help of sufficient extreme precipitation events using large‐scale climate ensemble simulation data (d4PDF). The deviations between the PMP variables estimated by the SDP and UAD approaches were analyzed for southern and northern areas of Japan to consider the regional characteristics of the deviations. We found that the deviations were high in northern areas where the SDPs are relatively low during precipitation events. The PMPs estimated using each approach were also compared to the extreme‐scale reference precipitation proposed in this study. The SDP approach overestimated the PMPs by over 20% compared to the reference precipitation in the northern region. However, the UAD approach showed very low average errors in all southern and northern areas. This tendency of the SDP approach was significantly related to the regional climatic characteristics of the SDP, which indicated that the SDP approach may estimate an uncertain PMP value depending on each regional climatic characteristic compared to the UAD approach. Regional climatic characteristics should be considered when using the SDP approach to estimate the PMP.
Article
Full-text available
The role of medical diagnosis is essential in patient care and healthcare. Established diagnostic practices typically rely on predetermined clinical criteria and numerical thresholds. In contrast, Bayesian inference provides an advanced framework that supports diagnosis via in-depth probabilistic analysis. This study’s aim is to introduce a software tool dedicated to the quantification of uncertainty in Bayesian diagnosis, a field that has seen minimal exploration to date. The presented tool, a freely available specialized software program, utilizes uncertainty propagation techniques to estimate the sampling, measurement, and combined uncertainty of the posterior probability for disease. It features two primary modules and fifteen submodules, all designed to facilitate the estimation and graphical representation of the standard uncertainty of the posterior probability estimates for diseased and non-diseased population samples, incorporating parameters such as the mean and standard deviation of the test measurand, the size of the samples, and the standard measurement uncertainty inherent in screening and diagnostic tests. Our study showcases the practical application of the program by examining the fasting plasma glucose data sourced from the National Health and Nutrition Examination Survey. Parametric distribution models are explored to assess the uncertainty of Bayesian posterior probability for diabetes mellitus, using the oral glucose tolerance test as the reference diagnostic method.
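A minimal sketch of the underlying calculation, independent of the published software: Bayes' theorem gives the post-test probability of disease from prevalence, sensitivity, and specificity, and Monte Carlo sampling over assumed uncertainties in sensitivity and specificity propagates that uncertainty to the posterior. All numeric values below are hypothetical.

# Post-test probability of disease with Monte Carlo uncertainty propagation.
import numpy as np

def posterior_prob(prevalence, sens, spec):
    """P(disease | positive test) from prevalence, sensitivity, specificity."""
    tp = prevalence * sens
    fp = (1 - prevalence) * (1 - spec)
    return tp / (tp + fp)

rng = np.random.default_rng(3)
# assumed means and standard uncertainties for sensitivity and specificity
sens = rng.normal(0.90, 0.02, 10_000).clip(0, 1)
spec = rng.normal(0.95, 0.01, 10_000).clip(0, 1)
post = posterior_prob(0.10, sens, spec)
print(post.mean(), post.std())  # point estimate and standard uncertainty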
Article
Full-text available
Wireless signals are vulnerable to various security threats, such as eavesdropping and jamming, due to the inherent broadcast nature of the wireless channel. Encryption may ensure the confidentiality of the data but does not guarantee successful communication among legitimate users in the presence of strong adversaries, such as wideband jammers. In this scenario, hiding a secret signal within another mundane, ongoing communication is one way to minimize its chances of being intercepted. Wireless steganography is the process of embedding a secret signal inside another signal that acts as a cover to hide the signal of interest. In this paper, we propose to encode secret bits into covert signals that are statistically indistinguishable from the hardware noise generated by a low-cost transmitter. Because the covert signal resembles hardware noise, it can be transmitted over any waveform, making it adaptable and portable to any communication link. Each generated complex signal sample is merged with a cover signal sample, yielding a 50% embedding rate. We build the encoding and decoding models as a pair of complex-valued neural networks (NNs), trained in the presence of another NN model, a critic. The critic model differentiates between true hardware noise and the encoder-generated covert signal, thus providing essential feedback to the NN pair to improve the encoding technique. The decoder undergoes a transfer learning process to adapt to the residual channel effects in over-the-air experiments. In an indoor testbed, we successfully decoded covert communication that mimics a range of hardware noises and is transmitted using different modulation orders of the cover OFDM waveform. Our steganalysis indicates that the covert signal can be generated to mimic specific hardware and remains indistinguishable under different statistical tests. Our method performs an order of magnitude better in statistical steganalysis than the state-of-the-art method in this field.
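As an illustration of the kind of statistical indistinguishability check mentioned above, the sketch below compares hardware-noise samples with encoder output using a two-sample Kolmogorov-Smirnov test; both data sets are synthetic stand-ins, not the paper's signals or models.

# Two-sample KS test between (stand-in) hardware noise and covert output.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
hardware_noise = rng.normal(0.0, 0.0100, 5_000)   # stand-in for measured noise
covert_signal = rng.normal(0.0, 0.0102, 5_000)    # stand-in for encoder output
stat, p = ks_2samp(hardware_noise, covert_signal)
print(f"KS statistic = {stat:.4f}, p-value = {p:.3f}")
# a large p-value means the test cannot distinguish the two distributions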
Article
Full-text available
In the present paper, some well-known tests based on empirical distribution functions (EDF) with estimated parameters for testing the composite hypothesis of normality are revisited, and some new results on their asymptotic properties are provided. In particular, the approximate Bahadur slopes are obtained in the case of close alternatives for the EDF-based tests as well as the likelihood ratio test. The local approximate efficiencies are calculated for several close alternatives. The obtained results could serve as a benchmark for evaluating the quality of recent and future normality tests.
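The paper concerns asymptotic efficiencies rather than computation, but for reference, a Lilliefors-style EDF test of composite normality with estimated parameters can be sketched as below, calibrating the Kolmogorov-Smirnov statistic by a parametric bootstrap; the recipe and all names are ours, not the paper's.

# KS test of composite normality with estimated parameters,
# calibrated by a parametric bootstrap.
import numpy as np
from scipy.stats import kstest, norm

def ks_composite_normal(x, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    mu, sd = x.mean(), x.std(ddof=1)
    stat = kstest(x, norm(mu, sd).cdf).statistic
    boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.normal(mu, sd, len(x))
        boot[b] = kstest(xb, norm(xb.mean(), xb.std(ddof=1)).cdf).statistic
    return stat, (boot >= stat).mean()  # statistic and bootstrap p-value

rng = np.random.default_rng(5)
print(ks_composite_normal(rng.standard_t(df=3, size=100)))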
Thesis
Full-text available
Today, thanks to scientific advances in data collection, more data of high dimension can be recorded, and such data are often highly complex. Data of this kind arise in astronomy, astrophysics, engineering, medicine, imaging science, economics, meteorology, agriculture, and other scientific fields, and goodness-of-fit testing for such data is of particular importance. For one-sample and two-sample goodness-of-fit testing of high-dimensional data, the random projection technique is used. The random projection approach is a computationally efficient, powerful, and sufficient technique: with it, a high-dimensional data set can be randomly projected onto a lower-dimensional subspace such that, with high probability, pairwise distances are preserved. In other words, because random projection (approximately) preserves the key properties of the original point set from the high-dimensional space, it has led to efficient and simple algorithms in a variety of studies and problems of high computational complexity. Furthermore, since brain imaging data can be analyzed by fitting Gaussian random field models, we use the random projection method to test the Gaussianity of such images, and a proposed Bayesian test for the presence of a signal in a stationary isotropic Gaussian-scale random field is examined.
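A minimal sketch of the random-projection idea (not the thesis' full procedure or its Bayesian signal test): project high-dimensional data onto a random unit direction and apply a univariate Kolmogorov-Smirnov test to the projections, which are standard normal under the null of a standard multivariate normal sample.

# Random projection followed by a univariate KS test of normality.
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(6)
d, n = 50, 200
X = rng.standard_normal((n, d))   # high-dimensional data under the null
u = rng.standard_normal(d)
u /= np.linalg.norm(u)            # random unit projection direction
proj = X @ u                      # projections are N(0, 1) under H0
print(kstest(proj, norm.cdf))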
Article
Full-text available
Background: Many metagenomic studies have linked the imbalance in microbial abundance profiles to a wide range of diseases. These studies suggest utilizing microbial abundance profiles as potential markers for metagenomic-associated conditions. Given the undeniable importance of biomarkers in understanding disease progression and developing possible therapies, various computational tools have been proposed for metagenomic biomarker detection. However, most existing tools require prior scripting knowledge and lack user-friendly interfaces, so installing, configuring, and running them takes considerable time and effort. Besides, there is no available all-in-one solution for running and comparing various metagenomic biomarker detection tools simultaneously. In addition, most of these tools simply present the suggested biomarkers without any statistical evaluation of their quality. Results: To overcome these limitations, this work presents MetaAnalyst, a software package with a simple graphical user interface (GUI) that (i) automates the installation and configuration of 28 state-of-the-art tools, (ii) supports flexible study design to enable studying the dataset under different scenarios smoothly, (iii) runs and evaluates several algorithms simultaneously, (iv) supports different input formats and provides the user with several preprocessing capabilities, (v) provides a variety of metrics to evaluate the quality of the suggested markers, and (vi) presents the outcomes in the form of publication-quality plots with various formatting capabilities as well as Excel sheets. Conclusions: The utility of this tool has been verified by studying a metagenomic dataset under four scenarios. The executable file for MetaAnalyst, along with its user manual, is available at https://github.com/mshawaqfeh/MetaAnalyst .
Article
Objective: Robot-assisted rehabilitation training is an effective way to assist rehabilitation therapy. So far, various robotic devices have been developed for the automatic training of the central nervous system following injury. Multimodal stimulation, such as visual and auditory stimuli and even virtual reality (VR) technology, is usually introduced in these robotic devices to improve the effect of rehabilitation training. This may need to be explained from a neurological perspective, but there are few relevant studies. Approach: In this study, ten participants performed right-arm rehabilitation training tasks using an upper limb rehabilitation robotic device. The tasks were completed under four different feedback conditions comprising multiple combinations of visual and auditory components: auditory feedback (AF), visual feedback (VF), visual and auditory feedback (VAF), and no feedback (NonF). Functional near-infrared spectroscopy (fNIRS) devices recorded blood oxygen signals in bilateral motor, visual, and auditory areas. Using hemoglobin concentration as an indicator of cortical activation, the effective connectivity of these regions was then calculated through Granger causality. Main results: We found that overall stronger activation and effective connectivity between related brain regions were associated with VAF. When participants completed the training task without visual and auditory feedback, the trends in activation and connectivity were diminished. Significance: This study revealed cerebral cortex activation and interacting networks of brain regions during robot-assisted rehabilitation training with multimodal stimulation, which is expected to provide indicators for further evaluation of the effect of rehabilitation training, to promote further exploration of the brain's interaction network under a variety of external stimuli, and to help identify the best sensory combination.
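As a rough sketch of the connectivity step only (synthetic data, not fNIRS recordings), a Granger-causality check between two channels can be run with statsmodels as below; the lag order and the signal model are arbitrary choices made for illustration.

# Granger-causality test between two synthetic time series.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(7)
n = 300
x = rng.standard_normal(n)                       # e.g. one channel's signal
y = np.zeros(n)
for t in range(1, n):
    # y depends on its own past and on lagged x, so x should Granger-cause y
    y[t] = 0.6 * x[t - 1] + 0.3 * y[t - 1] + 0.1 * rng.standard_normal()
# tests whether the second column (x) Granger-causes the first column (y)
res = grangercausalitytests(np.column_stack([y, x]), maxlag=2)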