Article

The Kolmogorov-Smirnov, Cramér-von Mises Tests

Authors: D. A. Darling

Abstract

1. Preface. This is an expository paper giving an account of the "goodness of fit" test and the "two sample" test based on the empirical distribution function, tests which were initiated by the four authors cited in the title. An attempt is made here to give a fairly complete coverage of the history, development, present status, and outstanding current problems related to these topics. The reader is advised that the relative amount of space and emphasis allotted to the various phases of the subject does not necessarily reflect their intrinsic merit and importance, but rather the author's personal interest and familiarity. Also, for the sake of uniformity the notation of many of the writers quoted has been altered, so that when referring to the original papers it will be necessary to check their nomenclature.

2. The empirical distribution function and the tests. Let X_1, X_2, ..., X_n be independent random variables (observations) each having the same distribution ...
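A minimal sketch (mine, not code from the paper) of the two statistics the survey covers: the empirical distribution function evaluated at the order statistics, and from it the one-sample Kolmogorov-Smirnov statistic D_n and the Cramér-von Mises statistic n·ω². The standard normal null F_0 is an arbitrary assumption for the example.

# Sketch (mine): one-sample KS and Cramer-von Mises statistics against a fully
# specified null cdf F0 (standard normal assumed here for illustration).
import numpy as np
from scipy import stats

def ks_and_cvm(x, cdf):
    """Return (D_n, n*omega^2) for sample x against the null cdf F0."""
    x = np.sort(np.asarray(x))
    n = len(x)
    u = cdf(x)                                   # F0 at the order statistics
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)                   # sup (F_n - F0)
    d_minus = np.max(u - (i - 1) / n)            # sup (F0 - F_n)
    d_n = max(d_plus, d_minus)                   # Kolmogorov-Smirnov statistic
    cvm = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)  # Cramer-von Mises
    return d_n, cvm

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
print(ks_and_cvm(sample, stats.norm.cdf))
# Library equivalents: stats.kstest(sample, "norm"), stats.cramervonmises(sample, "norm")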


... We abbreviate both variants of the χ² goodness-of-fit test by GoF1 and GoF2, respectively. An alternative goodness-of-fit test is the Kolmogorov-Smirnov (KS) test, cf. Smirnov (1948) and Darling (1957). The idea of this test is to compare the empirical cumulative distribution function (cdf) F_n(x) with a fully specified theoretical one, F_0(x). ...
... The critical values of the KS test were completely tabulated by Miller (1956) [16] for underlying continuous distributions. Morrow (2014) [17] computed tighter bounds by Monte Carlo simulation for the discrete Benford distribution of the first digit, cf. ...
Preprint
Full-text available
The Benford Law is used world-wide for detecting non-conformance or data fraud in numerical data. It says that the significand of a data set from a universe is not uniformly, but logarithmically distributed. Especially, the first non-zero digit D1 is 1 with probability P(D1 = 1) = log10 2 ≈ 0.3. There are several tests available for testing Benford, the best known being Pearson's χ²-test, the Kolmogorov-Smirnov test and the MAD-test suggested by Nigrini (2012). The latter test was enhanced to significance tests in Kössler, Lenz and Wang (2021) and in Cerqueti and Lupi (2021). In the present paper we propose some tests; three of the four invariant sum tests are new and they are motivated by the sum invariance property of the Benford Law. Two distance measures are investigated, the Euclidean and Mahalanobis distance of the standardized sums to the origin. We use the significands corresponding to the first significant digit as well as the second significant digit, respectively. Moreover, we suggest improved versions of the MAD-test and obtain critical values that are independent of the sample size. For illustration the tests are applied to specifically selected data sets where prior knowledge is available about being or not being Benford. Furthermore we discuss the role of truncation of distributions.
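A rough illustration (my own sketch; the log-uniform synthetic data are an assumption, not the paper's data) of the classical tests named in this abstract: Pearson's χ² against the Benford first-digit probabilities log10(1 + 1/d), plus a simple KS-type distance between the empirical and Benford first-digit cdfs. Critical values for that discrete KS distance need the simulated bounds mentioned in the excerpt above rather than the continuous-case tables.

# Sketch (mine, synthetic data): Benford first-digit chi-square and KS-type distance.
import numpy as np
from scipy import stats

def first_digits(values):
    """First significant digit of each nonzero value."""
    v = np.abs(np.asarray(values, dtype=float))
    v = v[v > 0]
    return (v / 10 ** np.floor(np.log10(v))).astype(int)

def benford_tests(values):
    d = first_digits(values)
    n = len(d)
    p_benford = np.log10(1 + 1 / np.arange(1, 10))      # P(D1 = d), d = 1..9
    observed = np.bincount(d, minlength=10)[1:10]
    chi2, p_chi2 = stats.chisquare(observed, f_exp=n * p_benford)
    # Discrete KS-type distance over the nine digit classes
    d_ks = np.max(np.abs(np.cumsum(observed) / n - np.cumsum(p_benford)))
    return chi2, p_chi2, d_ks

rng = np.random.default_rng(1)
data = np.exp(rng.uniform(0, 10, size=1000))   # roughly Benford-like synthetic data
print(benford_tests(data))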
... where I is the indicator function, given by ... The ECDF is used as a feature vector representation of the statistical properties of the window of interest. Computing the Euclidean distance (l2 norm) between two ECDFs is equivalent to the Cramer-von Mises test, a useful metric for estimating distribution equality [27]. An alternate choice of vector norm could be used if desired, e.g. the l∞ norm, which would correspond to the Kolmogorov-Smirnov test [27]. This work uses the k-nearest neighbors (k-NN) classifier, since many distance metrics are applicable. ...
Article
Full-text available
Nonintrusive identification of the energy consumption of individual loads from an aggregate power stream typically relies on relatively well-defined transient signatures. However, some loads have non-constant power demand that varies with loading conditions. These loads, such as computer-controlled machine tools, remain stubbornly resistant to conventional nonintrusive electrical monitoring methods. The power behavior of these loads can be modelled with stochastic processes. This paper presents statistical feature extraction techniques for identification of this fluctuating power behavior. An energy estimation procedure is presented and evaluated for two case studies: load operation on a shipboard microgrid and laboratory machine shop equipment.
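The excerpt above treats ECDFs as feature vectors and compares them with the l2 and l∞ norms. A minimal sketch of that idea (mine, with synthetic samples standing in for the measurement windows) is given below; the l∞ value on the pooled grid coincides with the two-sample Kolmogorov-Smirnov statistic, while the l2 value is a Cramér-von Mises-type distance up to normalization.

# Sketch (mine): compare two ECDFs with the l2 and l-infinity norms.
import numpy as np

def ecdf_on_grid(sample, grid):
    """Right-continuous empirical CDF of `sample` evaluated at `grid`."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def ecdf_distances(x, y):
    grid = np.sort(np.concatenate([x, y]))       # pooled sample as evaluation points
    diff = ecdf_on_grid(x, grid) - ecdf_on_grid(y, grid)
    return np.linalg.norm(diff, 2), np.max(np.abs(diff))   # (l2, l_inf)

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 300)
b = rng.normal(0.3, 1.0, 300)
print(ecdf_distances(a, b))   # the second value is the two-sample KS statistic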
... The two-sample Anderson-Darling test (hereafter, TSAD) was introduced by Darling (1957) and studied in detail by Pettitt (1976). The TSAD test, based on the empirical distribution function (EDF), avoids the arbitrary binning of histograms and the small number of entries per bin in the χ² test (Bohm and Zech, 2017). ...
... The Kolmogorov-Smirnov test is used to determine whether the sample data follow a specific distribution (Darling, 1957; Razali & Wah, 2011), while the Shapiro-Wilk test is considered best for checking the normality of data. The Shapiro-Wilk test is valid for the normality test because it assesses whether the sample data are drawn from a normally distributed population; in this case we have forty-four sample papers. ...
Article
Full-text available
This paper explores the intercultural competence of Pakistani students who appear in the Cambridge O-Level English language examination conducted in Pakistan by the British Council twice a year. English language is a core subject and is coded 1123 for the Cambridge International Examination (CIE). For this subject, two papers are taken: paper-I is about reading comprehension and paper-II is about writing a composition. To analyze the extent to which intercultural competence is assessed alongside linguistic competence through exam papers, forty-four exam papers (2013-2023) are analyzed using Byram's (1993) checklist, which constitutes eight categories of intercultural competence. Fairclough's (2003) model of discourse analysis has been used as the theoretical framework. For the analytical framework, the topics, themes, and content of both papers are thoroughly read and then placed under the eight categories of assessing intercultural competence proposed by Byram (1993). The findings of a one-sample t-test indicate that the categories of Byram's checklist have a negative effect on the content of the O-Level papers. The results clearly display the absence of assessment of students' intercultural competence, because little content on local culture was given in the exam papers.
... Finkelstein and Schafer (1971) describe a goodness-of-fit test for small sample sizes akin to the Kolmogorov-Smirnov test (e.g., Darling, 1957;Kolmogoroff, 1941;Lilliefors, 1969;Walsh, 1963). The Kolmogorov-Smirnov test compares a sample with a reference probability distribution or two samples with each other (in case of an unknown probability distribution function). ...
... A firm's cash payout of dividends lessens the resources available to the organization. This is consistent with the previous findings of Baker et al. (2001), Bruce (2011), and Darling (1957), who found that liquidity (current ratio) and dividend payout are negatively associated with each other. Consequently, a significant and negative connection between liquidity and dividend payout is predicted. ...
Article
Full-text available
The aim of this research paper is to measure the financial factors influencing dividend policy in the General Industrial Sector and to grasp the associations that affect dividend payout. A Simple Pooled Regression Model, a Fixed Effects Model, and a Random Effects Model, together with the Hausman test, are applied to a sample of 9 organizations (General Industrial Sector) listed on the Karachi Stock Exchange over a period of 11 years (2014-2022). In the Simple Regression Model (OLS), the results showed that financial factors such as ROA, ROE, and CR have a significant influence on the dividend payout ratio; the association between dividend payout and the explanatory variables is positive for return on assets, firm size, and leverage, and negative for return on equity, current ratio, and earnings per share. In the Fixed Effects Model (FEM), return on assets and current ratio have a significant influence on the dividend payout ratio; the association between dividend payout and the explanatory variables is positive for return on assets and leverage, and negative for return on equity, firm size, current ratio, and earnings per share. In the Random Effects Model (REM), ROA, ROE, and CR have a significant impact on the dividend payout ratio while the other variables are insignificant; the association between dividend payout and the explanatory variables is positive for return on assets, firm size, and leverage, and negative for return on equity, current ratio, and earnings per share. The Hausman test was applied to select the fitted model; the Random Effects Model was selected because its p-value of 0.5578 is higher than 0.05 (5%). This research paper will assist in evaluating the dividend policy of the General Industrial Sector listed on the Karachi Stock Exchange.
... where I_{X_i} is defined as follows [44]: ...
Article
Full-text available
Development of efficient methods of cellular image processing is an important avenue for practical application of modern artificial intelligence techniques. In particular, practical hematology requires automatic classification of images with or without leukemic (blast) cells in peripheral blood smears. This paper presents a new approach to the classification of such cellular images based on graph theory, the XGBoost algorithm and convolutional neural networks (CNN). Firstly, each image is transformed into a weighted graph using the gradient of intensity. Secondly, a number of graph invariants are computed, thus producing a set of synthetic features that is used to train a machine learning model based on XGBoost. Combining XGBoost with a CNN further increases the accuracy of leukemic cell classification. Sensitivity (TPR) and specificity (TNR) of the XGBoost-based model were 95% and 97%, respectively; the ResNet-50 model showed a TPR of 95% and a TNR of 98%. Combined use of the XGBoost-based and ResNet-50 models demonstrated a TPR of 99% and a TNR of 99%.
... An alternative goodness-of-fit test is the Kolmogorov-Smirnov (KS) test, cf. Kolmogorov (1933), Smirnov (1948) and Darling (1957). The idea of this test is to compare the empirical cumulative distribution function (cdf) F_n(x) with a fully specified theoretical one, F_0(x). ...
Article
Full-text available
The Benford law is used world-wide for detecting non-conformance or data fraud in numerical data. It says that the significand of a data set from the universe is not uniformly, but logarithmically distributed. Especially, the first non-zero digit is 1 with an approximate probability of 0.3. There are several tests available for testing Benford, the best known being Pearson's χ²-test, the Kolmogorov–Smirnov test and a modified version of the MAD-test. In the present paper we propose some tests; three of the four invariant sum tests are new and they are motivated by the sum invariance property of the Benford law. Two distance measures are investigated, the Euclidean and Mahalanobis distance of the standardized sums to the origin. We use the significands corresponding to the first significant digit as well as the second significant digit, respectively. Moreover, we suggest improved versions of the MAD-test and obtain critical values that are independent of the sample sizes. For illustration the tests are applied to specifically selected data sets where prior knowledge is available about being or not being Benford. Furthermore we discuss the role of truncation of distributions.
... We analyze our data set using a combination of the KS test, a nonparametric test of the equality of continuous, one-dimensional probability distributions (Massey 1951; Darling 1957), and the Rayleigh test, a statistical test for circularly distributed data (Fisher 1953; Berens 2009). These tests offer a robust and versatile tool to examine the geometry of the γ-ray emissions and detect deviations from isotropy, without the constraints of predefined models. ...
Article
Full-text available
The Sun is one of the most luminous γ-ray sources in the sky and continues to challenge our understanding of its high-energy emission mechanisms. This study provides an in-depth investigation of the solar disk γ-ray emission, using data from the Fermi Large Area Telescope spanning 2008 August to 2022 January. We focus on γ-ray events with energies exceeding 5 GeV, originating from a 0.5° angular aperture centered on the Sun, and implement stringent time cuts to minimize potential sample contaminants. We use a helioprojection method to resolve the γ-ray events relative to the solar rotation axes and combine statistical tests to investigate the distribution of events over the solar disk. We found that integrating observations over large time windows may overlook relevant asymmetrical features, which we reveal in this work through a refined time-dependent morphological analysis. We describe significant anisotropic trends and confirm compelling evidence of energy-dependent asymmetry in the solar disk γ-ray emission. Intriguingly, the asymmetric signature coincides with the Sun’s polar field flip during the cycle 24 solar maximum, around 2014 June. Our findings suggest that the Sun’s magnetic configuration plays a significant role in shaping the resulting γ-ray signature, highlighting a potential link between the observed anisotropies, solar cycle, and the solar magnetic fields. These insights pose substantial challenges to established emission models, prompting fresh perspectives on high-energy solar astrophysics.
... In addition to chi-square tests, we used Kolmogorov-Smirnov tests. The sample size was also deemed appropriate for these tests as they are better for testing data distributions than chi-square tests even when the sample size is small (e.g., N < 50) and are more sensitive to the shape of distributions (Darling, 1957;Lilliefors, 1967;Engmann and Cousineau, 2011). ...
... Nonparametric approaches are more appealing due to their distribution-free feature. Classical examples include distance-based tests such as the Kolmogorov-Smirnov (K-S) test (Darling, 1957) and the Anderson-Darling test (Scholz and Stephens, 1987). ... To overcome these limitations, we propose a likelihood-based test that can automatically adapt to densities with different shapes and develop a data-adaptive tuning method to automatically choose the penalization parameter. ...
... where x is the domain of the combined sample [20]. The Kolmogorov-Smirnov distance is illustrated in Figure 3. ...
Preprint
Full-text available
While mean-field models of cellular operations have identified dominant processes at the macroscopic scale, stochastic models may provide further insight into mechanisms at the molecular scale. In order to identify plausible stochastic models, quantitative comparisons between the models and the experimental data are required. The data for these systems have small sample sizes and time-evolving distributions. The aim of this study is to identify appropriate distance metrics for the quantitative comparison of stochastic model outputs and time-evolving stochastic measurements of a system. We identify distance metrics with features suitable for driving parameter inference, model comparison, and model validation, constrained by data from multiple experimental protocols. In this study, stochastic model outputs are compared to synthetic data across three scales: that of the data at the points the system is sampled during the time course of each type of experiment; a combined distance across the time course of each experiment; and a combined distance across all the experiments. Two broad categories of comparators at each point were considered, based on the empirical cumulative distribution function (ECDF) of the data and of the model outputs: discrete-based measures such as the Kolmogorov-Smirnov distance, and integrated measures such as the Wasserstein-1 distance between the ECDFs. It was found that the discrete-based measures were highly sensitive to parameter changes near the synthetic data parameters, but were largely insensitive otherwise, whereas the integrated distances had smoother transitions as the parameters approached the true values. The integrated measures were also found to be robust to noise added to the synthetic data, replicating experimental error. The characteristics of the identified distances provide the basis for the design of an algorithm suitable for fitting stochastic models to real world stochastic data.
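A minimal sketch (my own, with synthetic gamma samples standing in for the model outputs and data) of the two families of comparators contrasted above: a discrete ECDF-based distance (the Kolmogorov-Smirnov statistic) and an integrated one (the Wasserstein-1 distance, i.e. the area between the two ECDFs).

# Sketch (mine, synthetic data): discrete vs integrated ECDF comparators.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
model_out = rng.gamma(shape=2.0, scale=1.0, size=50)     # small samples, as in the study
data = rng.gamma(shape=2.2, scale=1.0, size=50)

ks = stats.ks_2samp(model_out, data).statistic           # sup-norm of the ECDF difference
w1 = stats.wasserstein_distance(model_out, data)         # area between the two ECDFs
print(ks, w1)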
... Next, we present a more comprehensive analysis of the reflectances using a hypothesis testing approach. Treating the reflectances individually, we use the Kolmogorov-Smirnov test [31] on the empirical marginal distribution of the MCMC samples with the null hypothesis being that the reflectances are normally distributed. The p-values for each reflectance parameter are shown in Figure 13, with the red line representing p = 0.05. ...
Preprint
Full-text available
The joint retrieval of surface reflectances and atmospheric parameters in VSWIR imaging spectroscopy is a computationally challenging high-dimensional problem. Using NASA's Surface Biology and Geology mission as the motivational context, the uncertainty associated with the retrievals is crucial for further application of the retrieved results for environmental applications. Although Markov chain Monte Carlo (MCMC) is a Bayesian method ideal for uncertainty quantification, the full-dimensional implementation of MCMC for the retrieval is computationally intractable. In this work, we developed a block Metropolis MCMC algorithm for the high-dimensional VSWIR surface reflectance retrieval that leverages the structure of the forward radiative transfer model to enable tractable fully Bayesian computation. We use the posterior distribution from this MCMC algorithm to assess the limitations of optimal estimation, the state-of-the-art Bayesian algorithm in operational retrievals which is more computationally efficient but uses a Gaussian approximation to characterize the posterior. Analyzing the differences in the posterior computed by each method, the MCMC algorithm was shown to give more physically sensible results and reveals the non-Gaussian structure of the posterior, specifically in the atmospheric aerosol optical depth parameter and the low-wavelength surface reflectances.
... Although Kuiper's test was proposed about 50 years ago, it is not widely propagated among college students, computer programmers, engineers, experimental psychologists and so on, partly due to the difficulty of solving for the upper tail quantile and partly due to the lack of open software for automatic calculation. Let C_n = sqrt(n) · V_n = sqrt(n) · (D_n^+ + D_n^-) (8) be Kuiper's C_n statistic for the critical value of the V_n test; Kuiper [1,10] pointed out that ...
Preprint
Full-text available
Kuiper's statistic is a good measure for the difference between an ideal distribution and an empirical distribution in the goodness-of-fit test. However, solving for the critical value and upper tail quantile, or simply the Kuiper pair, of Kuiper's statistic is a challenging problem due to the difficulties of solving the nonlinear equation and reasonably approximating the infinite series. The pioneering work by Kuiper and Stephens provided the key ideas and a few numerical tables created from the upper tail probability α and sample capacity n, which limited its propagation and possible applications in various fields since there are infinite configurations for the parameters α and n. In this work, the contributions lie in two perspectives: firstly, the second-order approximation for the infinite series of the cumulative distribution of the critical value is used to get higher precision; secondly, the principles and fixed-point algorithms for solving the Kuiper pair are presented in detail. The algorithms are verified and validated by comparison with the table provided by Kuiper. The methods and algorithms proposed are enlightening and worth introducing to college students, computer programmers, engineers, experimental psychologists and so on.
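A short sketch (mine; the standard normal null is an assumption) of the quantity in equation (8) of the excerpt above, Kuiper's V_n = D_n^+ + D_n^- and C_n = sqrt(n)·V_n.

# Sketch (mine): Kuiper's V_n and C_n against a fully specified null cdf.
import numpy as np
from scipy import stats

def kuiper_statistic(x, cdf):
    x = np.sort(np.asarray(x))
    n = len(x)
    u = cdf(x)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)
    d_minus = np.max(u - (i - 1) / n)
    v_n = d_plus + d_minus             # Kuiper's V_n
    return v_n, np.sqrt(n) * v_n       # (V_n, C_n)

rng = np.random.default_rng(4)
print(kuiper_statistic(rng.normal(size=100), stats.norm.cdf))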
... One of the oldest statistical problems is to estimate distributions, for which Kolmogorov [2], Smirnov [3] and von Mises introduced nonparametric hypothesis tests [4]. Each of these tests defines an ancillary statistic whose distribution does not depend on the true underlying distribution, but which can be computed from any finite sample. ...
Preprint
One of the key objects of binary classification is the regression function, i.e., the conditional expectation of the class labels given the inputs. With the regression function not only a Bayes optimal classifier can be defined, but it also encodes the corresponding misclassification probabilities. The paper presents a resampling framework to construct exact, distribution-free and non-asymptotically guaranteed confidence regions for the true regression function for any user-chosen confidence level. Then, specific algorithms are suggested to demonstrate the framework. It is proved that the constructed confidence regions are strongly consistent, that is, any false model is excluded in the long run with probability one. The exclusion is quantified with probably approximately correct type bounds, as well. Finally, the algorithms are validated via numerical experiments, and the methods are compared to approximate asymptotic confidence ellipsoids.
... The ultimate goal of this work is to determine how similar the NT colors are to other populations in the solar system. A simple statistical test to measure the likelihood that two distributions are drawn from the same underlying distribution is the Kolmogorov-Smirnov (K-S) test (Darling 1957). Although the K-S test can be generalized to more than a single dimension, the interpretation becomes complicated. ...
Article
Full-text available
In 2018, Jewitt identified the “Trojan Color Conundrum,” namely that Neptune's Trojan asteroids (NTs) had no ultrared members, unlike the nearby Kuiper Belt. Since then, numerous ultrared NTs have been discovered, seemingly resolving this conundrum. However, it is still unclear whether or not the Kuiper Belt has a color distribution consistent with the NT population, as would be expected if it were the source population. In this work, we present a new photometric survey of 15 out of 31 NTs. We utilized the Sloan g ′ r ′ i ′ z ′ filters on the IMACS f/4 instrument, which is mounted on the 6.5 m Baade telescope. In this survey, we identify four NTs as being ultrared using a principal component analysis. This result brings the ratio of red to ultrared NTs to 7.75:1, more consistent with the corresponding trans-Neptunian object ratio of 4–11:1. We also identify three targets as being blue (nearly solar) in color. Such objects may be C-type surfaces, but we see more of these blue NTs than has been observed in the Kuiper Belt. Finally, we show that there are hints of a color-absolute magnitude (H) correlation, with larger H (smaller-sized, lower albedo) objects tending to be more red, but more data are needed to confirm this result. The origin of such a correlation remains an open question that will be addressed by future observations of the surface composition of these targets and their rotational properties.
... where L is the log-likelihood value, k is the number of parameters of the fitted model, and n represents the sample size. The Cramér-von Mises test (Darling 1957), a goodness-of-fit test, was used to compare a fitted theoretical (C_θ) and an empirical (C_n) copula. The Cramér-von Mises test statistic can be defined as ...
Article
Copula functions are widely used to derive multivariate probability distributions in hydrometeorology. One of the key steps in the copula method is the derivation of marginal distributions of individual variables, which can be accomplished using the principle of maximum entropy, where the distribution parameters are estimated from the specified constraints. This study investigated two drought variables (severity and duration) by coupling the principle of maximum entropy with parametric and empirical copulas. Homogeneous climatic zones were first identified by applying the fuzzy clustering method to data from 39 synoptic stations in Iran, and then drought severity and duration were determined with the standardized precipitation index. These two variables were scaled and their marginal probability distribution functions were derived using the principle of maximum entropy as well as empirically. Then, the joint probability distribution of drought severity and duration was determined using the maximum entropy-copula, and parametric and empirical copulas. Thereafter, bivariate conditional return periods were determined for each homogeneous region. Results showed that 1) univariate and bivariate distributions can be obtained by maximizing entropy; 2) the dependence structure via Spearman's rho, which directly affects the Lagrange parameters of the entropy copula, was a controlling factor in optimizing the objective function; 3) for a given set of constraints, the maximum entropy copula is independent of the types of marginals; 4) the entropy-entropy copula (with entropy marginals) is considered a better method than the alternatives, because it gives results similar to the parametric methods while it only needs to fit a single model.
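The study applies the Cramér-von Mises criterion to copulas; the sketch below (my own, a univariate illustration with hypothetical drought-severity data, not the copula version used in the paper) shows the same underlying idea of comparing an empirical cdf with a fitted theoretical one.

# Univariate sketch (mine, hypothetical data): Cramer-von Mises goodness of fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
severity = rng.gamma(shape=2.0, scale=1.5, size=200)     # hypothetical drought severities

a, loc, scale = stats.gamma.fit(severity, floc=0)        # fit a candidate marginal
res = stats.cramervonmises(severity, "gamma", args=(a, loc, scale))
print(res.statistic, res.pvalue)   # p-value only approximate: parameters were estimated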
... There is also the two-sample AD test, introduced by Darling [26] and Pettitt [27], which generalizes according to equation (8). ...
Article
Full-text available
This paper presents a new method for analysing creeping discharges based on information theory as it applies to medical imaging. The analysis of information surface data is used to determine the impact of relaxation time on the characteristic parameters of creeping discharges. The same information is used to make a comparative study of the morphology of discharges propagating in palm kernel oil methyl ester (PKOME) and in mineral oil (MO). Other comparative methods based on fractal analysis and normality hypothesis tests associated with Anderson-Darling (AD), Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) statistics are used. The results show that very short relaxation times increase the error on the measurement of the fractal dimension and the maximum extension of the discharges. A growth of the mutual information between 0 and 60% is observed for relaxation times varying between 60s and 420s respectively. For the same time interval, the P-value increases from 0.027 to 0.821 according to the AD statistic, from 0.01 to more than 0.150 according to KS and from 0.083 to more than 0.1 according to SW. This result indicates that the data are from a normal distribution. After 420s of relaxation, the error on the maximum extension measurement is reduced by 94% in PKOME and 92% in MO. Similarly, the error on the mean fractal dimension in MO is reduced by 86.7% for a relaxation time between 301s and 420s, and by 84.6% in PKOME for a time between 180s and 420s. These different results imply that the impact of the discharge can be predicted when it is in its initial phase during which the number of discharge occurrences is reduced. On the other hand, the physicochemical characteristics of the insulating liquid used dictate the relaxation time to be allowed for the laboratory measurements.
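A minimal sketch (mine; synthetic Weibull samples stand in for the measured discharge extensions in PKOME and MO) of the kind of two-sample EDF comparison mentioned in the excerpt above, using scipy's k-sample Anderson-Darling implementation.

# Sketch (mine, synthetic data): two-sample Anderson-Darling comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
ext_pkome = rng.weibull(1.5, 80) * 10.0    # hypothetical maximum extensions in PKOME
ext_mo = rng.weibull(1.5, 80) * 11.0       # hypothetical maximum extensions in MO

res = stats.anderson_ksamp([ext_pkome, ext_mo])
print(res.statistic, res.significance_level)   # scipy caps the reported level at [0.001, 0.25]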
... In the case of abundant historical data on a cost component or an input variable, the parameters of a probability distribution can be estimated. A goodness-of-fit criterion test (Darling 1957; Arnold and Emerson 2011) can be used to measure how well the distribution represents the data. In the case of limited available data, a triangular distribution is often recommended to represent the underlying uncertain variables, with the minimum, maximum, and most likely values set using decision makers' judgment (Walls and Smith 1998; Campbell and Brown 2003; Gransberg ...). ...
Chapter
Although many factors, including noneconomic barriers (e.g., regulatory and environmental factors), influence decision-making about construction projects, when it’s time to invest, many construction project owners should allocate their limited financial resources to projects with the highest returns on investment. However, the investment valuation of construction projects is subject to significant uncertainties, such as substantial construction cost variations that make decision-making difficult. This chapter presents several investment valuation methods, such as a stochastic life-cycle cost analysis technique and a real options analysis method, to evaluate investments in construction projects under uncertainties. The stochastic life-cycle cost analysis captures the volatility of the input variables in investment valuation based on their historical values, propagates them through the life-cycle cost analysis method, and determines the probability distribution of the life-cycle cost. Real options analysis evaluates real (nonfinancial) investments under uncertainty with elements for strategic management flexibility and delayed investment. Various examples of construction investment valuations, along with the R codes, are presented in this chapter to enhance the learning experience. These resources can be extended for the assessment of other construction investment projects. Keywords: Construction investment valuation under uncertainty; Real options analysis; Stochastic life-cycle cost analysis; Binomial decision tree; Monte Carlo simulation; Investment decision-making under uncertainty
... The fitting of the distributions was done using the following twenty distributions: Cauchy, Error, Hypersecant, Gamma, Laplace, Logistic, Log Pearson 3, Rayleigh (2p), Weibull (3p), Log Logistic (3p), Triangular, Gen. Gamma, Gen. Gamma (4p), Gen. Extreme Value, Log Normal (3p), Pearson 5 (3p), Fatigue Life (3p), Inv. Gaussian (3p), Nakagami. The goodness-of-fit tests that would be important decision-making aids in selecting the best-fitting distribution are also listed in the section that follows [11], [12]. Some of the fitted distributions are: ...
Article
Full-text available
Telangana state's population is mostly dependent on agriculture. The Telangana state's economy depends heavily on agriculture, as does the nation's and the state's ability to achieve food security. Combining art and science to fit a statistical distribution to data involves making trade-offs along the way. The secret to effective data analysis is striking a balance between improving distributional fit and preserving ease of estimation, while keeping in mind that the analysis's ultimate goal is to help you make better decisions. A recurring issue in agricultural research was which distribution should be utilized to simulate the production data from an experiment. An analysis is then carried out utilizing the obtained distributions, using the statistical method of fitting probability distributions to the variable data. These distributions would be a representation of the properties of the variable data. The twenty distributions are: Cauchy, Error, Hypersecant, Gamma (3p), Laplace, Logistic, Log Pearson 3, Rayleigh (2p), Weibull (3p), Log Logistic (3p), Triangular, Gen. Gamma, Gen. Gamma (4p), Gen. Extreme Value, Log Normal (3p), Pearson 5 (3p), Fatigue Life (3p), Inv. Gaussian (3p), and Nakagami. For this distribution-fitting study, rice production data are used. Twenty probability distributions were computed, and the Kolmogorov-Smirnov, Anderson-Darling, and Chi-Square test statistics were used for each data set to choose the distribution that fit the data best. The probability distributions include Cauchy, Error, Hypersecant, Gamma, Laplace, Logistic, Log Pearson 3, Rayleigh (2p), Weibull (3p), Log Logistic (3p), Triangular, Gen. Gamma, Gen. Gamma (4p), Gen. Extreme Value, Log Normal (3p), Log Pearson 5, Fatigue Life (3p), Inv. Gaussian (3p), and Nakagami.
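A compact sketch (my own; the candidate list and the synthetic yield data are assumptions, not the paper's twenty distributions or its data) of the selection procedure described above: fit several candidate distributions and rank them by the Kolmogorov-Smirnov statistic. The p-values are only approximate here because the parameters are estimated from the same data.

# Sketch (mine, synthetic data): fit candidates and rank by the KS statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
yields = rng.gamma(shape=4.0, scale=0.8, size=150)   # stand-in for crop production data

candidates = ["gamma", "lognorm", "weibull_min", "logistic", "cauchy"]
results = []
for name in candidates:
    params = getattr(stats, name).fit(yields)
    d, p = stats.kstest(yields, name, args=params)   # p approximate: fitted parameters
    results.append((name, d, p))

for name, d, p in sorted(results, key=lambda r: r[1]):   # smaller D = better fit
    print(f"{name:12s}  D = {d:.4f}  p ~ {p:.3f}")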
... Similar probability distributions for streambed P retention between model results and the observation-based estimates would indicate correctly simulated mean behavior, range of behavior, and likelihood of behaviors. We apply four statistical tests or measures to quantify the similarity between estimated (from observations) and simulated probability distributions of streambed P retention in the monitored reach: (a) Wilcoxon-Mann-Whitney (WMW) test for difference in means (Neuhäuser, 2011;Woolson, 2008), (b) Levene test for difference in variances (Glass, 1966;Lim & Loh, 1996), (c) Kolmogorov-Smirnov (K-S) statistic for difference in probability distributions (Berger & Zhou, 2014;Massey, 1951), and the (d) Cramer-von Mises (C-vM) test for difference in probability distributions (Anderson, 1962;Darling, 1957). Additional details for these statistical tests or measures are provided in Supporting Information S1 (Text S4). ...
Article
Full-text available
Efforts to reduce riverine phosphorus (P) loads have not been as fruitful as expected or hoped. One reason for the failure of these efforts appears to be that models used for watershed P management have understated and misrepresented the role of in‐stream processes in shaping watershed P export. Here, we update the latest release of the Soil and Water Assessment Tool (SWAT+), a widely used watershed management model, to better represent in‐stream P retention and remobilization (SWAT+P.R&R). We add new streambed pools where P is stored and tracked, and we incorporate three new processes driving in‐stream P dynamics: (a) deposition and resuspension of sediment‐associated P, (b) diffusion of dissolved P between the water column and streambed, and (c) adsorption and desorption of mineral P. The objective of this modeling work is to provide a diagnostic tool that enables researchers to challenge existing assumptions regarding how watersheds store, transform, and transport P. Here, in a first diagnostic analysis, SWAT+P.R&R helps reconcile in‐stream P retention theory (that P is retained at low flows and remobilized at high flows) and a discordant data set in our validation watershed. SWAT+P.R&R results (a) clarify that the theorized relationship between P retention and flow is only valid (for this point‐source affected testbed, at least) at the temporal scale of a single rising‐or‐falling hydrograph limb and (b) illustrate that hysteresis obscures the relationship at longer temporal scales. Future work using SWAT+P.R&R could further challenge assumptions regarding timescales of in‐stream P legacies and sources of P load variability.
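A minimal sketch (mine, with hypothetical retention samples) of the four comparisons listed in the excerpt above, via their scipy implementations.

# Sketch (mine, synthetic data): WMW, Levene, KS, and Cramer-von Mises comparisons.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
estimated = rng.normal(0.30, 0.10, 60)    # observation-based P retention (hypothetical)
simulated = rng.normal(0.33, 0.12, 60)    # model-simulated P retention (hypothetical)

print(stats.mannwhitneyu(estimated, simulated))          # (a) Wilcoxon-Mann-Whitney
print(stats.levene(estimated, simulated))                # (b) Levene, difference in variances
print(stats.ks_2samp(estimated, simulated))              # (c) Kolmogorov-Smirnov
print(stats.cramervonmises_2samp(estimated, simulated))  # (d) Cramer-von Mises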
... To quantify the differences between the g − i distribution of the NTs and the g − i distribution of other dynamical classes, we apply the Kolmogorov-Smirnov (KS) test, which measures the maximum difference between two cumulative distributions (Darling 1957). The KS method tests the null hypothesis that two cumulative distributions are drawn from the same parent distribution. ...
Article
Neptunian Trojans (NTs), trans-Neptunian objects in 1:1 mean-motion resonance with Neptune, are generally thought to have been captured from the original trans-Neptunian protoplanetary disk into co-orbital resonance with the ice giant during its outward migration. It is possible, therefore, that the colour distribution of NTs is a constraint on the location of any colour transition zones that may have been present in the disk. In support of this possible test, we obtained g, r, and i-band observations of 18 NTs, more than doubling the sample of NTs with known visible colours to 31 objects. Out of the combined sample, we found ≈4 objects with g-i colours of >1.2 mags placing them in the very red (VR) category as typically defined. We find, without taking observational selection effects into account, that the NT g-i colour distribution is statistically distinct from other trans-Neptunian dynamical classes. The optical colours of Jovian Trojans and NTs are shown to be less similar than previously claimed with additional VR NTs. The presence of VR objects among the NTs may suggest that the location of the red to VR colour transition zone in the protoplanetary disk was interior to 30-35 au.
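A short sketch (my own; the colour values are randomly generated, not the survey photometry) of the two-sample KS comparison of g − i colour distributions described above.

# Sketch (mine, synthetic colours): two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
g_i_nt = rng.normal(0.95, 0.25, 31)     # hypothetical NT g-i colours
g_i_tno = rng.normal(1.05, 0.30, 120)   # hypothetical colours of a comparison class

res = stats.ks_2samp(g_i_nt, g_i_tno)
print(res.statistic, res.pvalue)   # small p-value -> reject a common parent distribution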
... Generally, we see that COSMOS2020-PCz6.05-01 is skewed toward higher masses than the field CDF. To check if the overdensity and field are drawn from different distributions, we can perform a two-sample Kolmogorov-Smirnov (K-S) test (Darling 1957) using the stellar masses with the null hypothesis that the two independent samples are drawn from the same continuous distribution. We obtain a K-S statistic of 0.28 and a p-value of 0.10 and therefore cannot reject the null hypothesis at 10% level. ...
Article
Full-text available
We conduct a systematic search for protocluster candidates at z ≥ 6 in the Cosmic Evolution Survey (COSMOS) field using the recently released COSMOS2020 source catalog. We select galaxies using a number of selection criteria to obtain a sample of galaxies that have a high probability of being inside a given redshift bin. We then apply overdensity analysis to the bins using two density estimators, a Weighted Adaptive Kernel estimator and a Weighted Voronoi Tessellation estimator. We have found 15 significant (>4σ) candidate galaxy overdensities across the redshift range 6 ≤ z ≤ 7.7. The majority of the galaxies appear to be on the galaxy main sequence at their respective epochs. We use multiple stellar-mass-to-halo-mass conversion methods to obtain a range of dark matter halo mass estimates for the overdensities in the range of ∼10¹¹–10¹³ M⊙, at the respective redshifts of the overdensities. The number and the masses of the halos associated with our protocluster candidates are consistent with what is expected from the area of a COSMOS-like survey in a standard Λ cold dark matter cosmology. Through comparison with simulation, we expect that all of the overdensities at z ≃ 6 will evolve into Virgo-/Coma-like clusters at present (i.e., with masses ∼10¹⁴–10¹⁵ M⊙). Compared to other overdensities identified at z ≥ 6 via narrowband selection techniques, the overdensities presented appear to have ∼10× higher stellar masses and star formation rates (SFRs). We compare the evolution in the total SFR and stellar mass content of the protocluster candidates across the redshift range 6 ≤ z ≤ 7.7 and find agreement with the total average SFR from simulations.
... Grain size curves are, in fact, empirical distributions of particle mass distribution. To answer the question of whether there are statistically significant differences between the two grain-size curves, the non-parametric Cramér-von Mises test was used [31,32]. Using this test, at each sampling point (measuring points 1-8), the grain size curves at the control points (S, F and E) were compared in pairs. ...
Article
Full-text available
Road dust is an important, inexhaustible source of particulate matter, arising from traffic and the resuspension of finer particles carried by wind and traffic. The components of this material are of both natural and anthropogenic origin. Sources of particulate pollution are vehicles and road infrastructure. The work aimed to analyze the mass fraction of the finest fractions of road dust (<0.1 mm) collected from highways and expressways with asphalt and concrete surfaces. Sampling points were located in the central and southern parts of Poland. The research material was sieved on a sieve shaker. It has been proven that concrete pavement is less susceptible to abrasion than asphalt pavement. Particles formed by the erosion of asphalt and concrete belong to a fraction of coarser particles than the fraction critical for this research (<0.1 mm). It was found that limiting the area with sound-absorbing screens leads to the accumulation of fine road dust in that place, in contrast to spaces where strong air drafts remove smaller particles from the vicinity of the road. In general, the mass fraction of particles smaller than 100 µm in road dust was from 12.8% to 3.4% for asphalt surfaces and from 12.0% to 6.5% for concrete surfaces.
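A minimal sketch (mine; hypothetical lognormal particle-size samples rather than the measured grain-size curves) of the pairwise two-sample Cramér-von Mises comparison described in the excerpt above.

# Sketch (mine, synthetic particle sizes): two-sample Cramer-von Mises test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
sizes_point_a = rng.lognormal(mean=4.0, sigma=0.6, size=200)   # particle sizes in um
sizes_point_b = rng.lognormal(mean=4.1, sigma=0.6, size=200)

res = stats.cramervonmises_2samp(sizes_point_a, sizes_point_b)
print(res.statistic, res.pvalue)   # large p-value -> no significant difference detected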
... We performed a Kolmogorov-Smirnov test (Darling 1957; Press et al. 1992) on the GRB (N_1 = 839) and the complete SFXT (N_2 = 53) samples of T_90 and obtained a K-S statistic D_{N_1,N_2} = 0.651 and a K-S probability of 2 × 10⁻¹⁹, showing that the two underlying one-dimensional probability distributions differ significantly. Even when excluding all GRBs with T_90 < 10 s (effectively excluding short GRBs), D_{656,53} = 0.622 and the K-S probability is 1 × 10⁻¹⁷, thus confirming that the T_90 of SFXTs and long GRBs are not drawn from the same parent distribution. ...
Article
Full-text available
Supergiant fast X-ray transients (SFXTs) are high mass X-ray binaries (HMXBs) displaying X-ray outbursts that can reach peak luminosities up to 10³⁸ erg s⁻¹ and spend most of their lives in more quiescent states with luminosities as low as 10³²–10³³ erg s⁻¹. During the quiescent states, less luminous flares are also frequently observed with luminosities of 10³⁴–10³⁵ erg s⁻¹. The main goal of the comprehensive and uniform analysis of the SFXT Swift triggers presented in this paper is to provide tools to predict whether a transient that has no known X-ray counterpart may be an SFXT candidate. These tools can be exploited for the development of future missions exploring the variable X-ray sky through large field-of-view instruments. We examined all available data on outbursts of SFXTs that triggered the Swift/Burst Alert Telescope (BAT) collected between 2005 August 30 and 2014 December 31, in particular those for which broad-band data, including the Swift/X-ray Telescope (XRT) data, are also available. This work complements and extends our previous catalogue of SFXT flares detected by BAT from 2005 February 12 to 2013 May 31, since we now include the additional BAT triggers recorded until the end of 2014 (i.e. beyond the formal first 100 months of the Swift mission). Due to a change in the mission’s observational strategy, virtually no SFXT triggers obtained a broad-band response after 2014. We processed all BAT and XRT data uniformly by using the Swift Burst Analyser to produce spectral evolution dependent flux light curves for each outburst in the sample. The BAT data allowed us to infer useful diagnostics to set SFXT triggers apart from the general γ-ray burst population, showing that SFXTs uniquely give rise to image triggers and are simultaneously very long, faint, and ‘soft’ hard-X-ray transients. We find that the BAT data alone can discriminate very well the SFXTs from other classes of fast transients, such as anomalous X-ray pulsars and soft gamma repeaters. On the contrary, the XRT data collected around the time of the BAT triggers are shown to be decisive for distinguishing SFXTs from, for instance, accreting millisecond X-ray pulsars and jetted tidal disruption events. The XRT observations of 35 (out of 52 in total) SFXT BAT triggers show that in the soft X-ray energy band, SFXTs display a decay in flux from the peak of the outburst of at least three orders of magnitude within a day and rarely undergo large re-brightening episodes, favouring in most cases a rapid decay down to the quiescent level within three to five days (at most).
... We performed a Kolmogorov-Smirnov test (Darling 1957; Press et al. 1992) on the GRB (N_1 = 839) and complete SFXT (N_2 = 53) samples of T_90 and obtained a K-S statistic D_{N_1,N_2} = 0.651 and a K-S probability of 2 × 10⁻¹⁹, so that the two underlying one-dimensional probability distributions differ significantly. Even when excluding all GRBs with T_90 < 10 s (thus effectively excluding short GRBs), D_{656,53} = 0.622 and the K-S probability is 1 × 10⁻¹⁷, thus confirming that the T_90 of SFXTs and long GRBs are not drawn from the same parent distribution. ...
Preprint
Full-text available
Supergiant Fast X-ray Transients (SFXT) are High Mass X-ray Binaries displaying X-ray outbursts reaching peak luminosities of 10$^{38}$ erg/s and spend most of their life in more quiescent states with luminosities as low as 10$^{32}$-10$^{33}$ erg/s. The main goal of our comprehensive and uniform analysis of the SFXT Swift triggers is to provide tools to predict whether a transient which has no known X-ray counterpart may be an SFXT candidate. These tools can be exploited for the development of future missions exploring the variable X-ray sky through large FoV instruments. We examined all available data on outbursts of SFXTs that triggered the Swift/BAT collected between 2005-08-30 and 2014-12-31, in particular those for which broad-band data, including the Swift/XRT ones, are also available. We processed all BAT and XRT data uniformly with the Swift Burst Analyser to produce spectral evolution dependent flux light curves for each outburst. The BAT data allowed us to infer useful diagnostics to set SFXT triggers apart from the general GRB population, showing that SFXTs give rise uniquely to image triggers and are simultaneously very long, faint, and `soft' hard-X-ray transients. The BAT data alone can discriminate very well the SFXTs from other fast transients such as anomalous X-ray pulsars and soft gamma repeaters. However, to distinguish SFXTs from, for instance, accreting millisecond X-ray pulsars and jetted tidal disruption events, the XRT data collected around the time of the BAT triggers are decisive. The XRT observations of 35/52 SFXT BAT triggers show that in the soft X-ray energy band, SFXTs display a decay in flux from the peak of the outburst of at least 3 orders of magnitude within a day and rarely undergo large re-brightening episodes, favouring in most cases a rapid decay down to the quiescent level within 3-5 days (at most). [Abridged]
... Such a test attempts to reject the null hypothesis that the two samples are actually generated by the same distribution. Commonly applied tests include Kolmogorov-Smirnov, Anderson-Darling, and Cramér-von Mises (Stephens, 1970;Darling, 1957). For our experiments we picked the Cramér-von Mises test as it turned out to be the most efficient in terms of computation time, while there was no particular reason to prefer one over the other. ...
Article
Full-text available
We perform statistical analyses on spatiotemporal patterns in the magnitude distribution of induced earthquakes in the Groningen natural gas field. The seismic catalogue contains 336 earthquakes with (local) magnitudes above 1.45, observed in the period between 1 January 1995 and 1 January 2022. An exploratory moving-window analysis of maximum-likelihood b-values in both time and space does not reveal any significant variation in time, but does reveal a spatial variation that exceeds the 0.05 significance level. In search for improved understanding of the observed spatial variations in physical terms we test five physical reservoir properties as possible b-value predictors. The predictors include two static (spatial, time-independent) properties: the reservoir layer thickness, and the topographic gradient (a measure of the degree of faulting intensity in the reservoir); and three dynamic (spatiotemporal, time-dependent) properties: the pressure drop due to gas extraction, the resulting reservoir compaction, and a measure for the resulting induced stress. The latter property is the one that is currently used in the seismic source models that feed into the state-of-the-art hazard and risk assessment. We assess the predictive capabilities of the five properties by statistical evaluation of both moving window analysis, and maximum-likelihood parameter estimation for a number of simple functional forms that express the b-value as a function of the predictor. We find significant linear trends of the b-value for both topographic gradient and induced stress, but even more pronouncedly for reservoir thickness. Also for the moving window analysis and the step function fit, the reservoir thickness provides the most significant results. We conclude that reservoir thickness is a strong predictor for spatial b-value variations in the Groningen field. We propose to develop a forecasting model for Groningen magnitude distributions conditioned on reservoir thickness, to be used alongside, or as a replacement, for the current models conditioned on induced stress.
... Note that it is always a good idea to visualize the data and check the descriptive statistics. We use a Pandas DataFrame to analyze our data and perform common data manipulations to see the statistical properties of the data, such as the mean and standard deviation of columns [22]. If there exist some data points that do not belong to the rest of the population, we can easily detect and remove those outliers. ...
Article
Full-text available
The purpose of this article is to illustrate an investigation of methods that can be effectively used to predict the data incompleteness of a dataset. Here, the investigators have conceptualized data incompleteness as a random variable, with the overall goal behind the experimentation being to provide a 360-degree view of this concept, treating the incompleteness of a dataset as either a continuous or a discrete random variable depending on the aspect of the required analysis. During the course of the experiments, the investigators have identified the Kolmogorov–Smirnov goodness-of-fit test, the Mielke distribution, and beta distributions as key methods to analyze the incompleteness of a dataset for the datasets used for experimentation. A comparison of these methods with a mixture density network was also performed. Overall, the investigators have provided key insights into the use of methods and algorithms that can be used to predict data incompleteness and have provided a pathway for further explorations and prediction of data incompleteness.
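A small sketch (my own; the column name and the injected outliers are hypothetical) of the routine described in the excerpt above: inspect descriptive statistics with pandas and drop simple z-score outliers.

# Sketch (mine, synthetic data): descriptive statistics plus z-score outlier removal.
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
df = pd.DataFrame({"value": np.append(rng.normal(10, 2, 500), [45.0, -30.0])})

print(df.describe())                       # mean, std, quartiles per column

z = (df["value"] - df["value"].mean()) / df["value"].std()
cleaned = df[z.abs() <= 3]                 # keep points within 3 standard deviations
print(len(df), "->", len(cleaned))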
Article
Agriculture in developed countries is produced under heavily subsidized insurance. The pricing of these insurance contracts, termed premium rates, directly influences farmers' profits and their financial solvency, and indirectly influences global food security. Changing climate and technology have likely caused significant shifting of mass in crop yield distributions and, if so, have rendered some of the historical yield data irrelevant for estimating premium rates. Insurance is primarily interested in lower tail probabilities, and as such the detection of structural change in tail probabilities or higher moments is of great concern for the efficacy of crop insurance programs. We propose a test for structural change with an unknown break(s) which has power against structural change in any moment and can be tailored to a specific range of the underlying distribution. Simulations demonstrate better finite sample performance relative to existing methods and reasonable performance at identifying the break. The asymptotic distribution is shown to follow the Kolmogorov distribution. Our proposed test finds structural change in most major U.S. field crop yields, leading to significant premium rate differences. Results of an out-of-sample premium rating game indicate that incorporating structural change in crop yields leads to more accurate premium rates.
Article
Full-text available
The use of pseudo-convex mixtures generated from stable distributions for extremes offers a valuable approach for handling reliability-related data challenges. This framework encompasses pseudo-convex mixtures stemming from the exponential distribution. However, precise parameter estimation, particularly in cases where the weight parameter ω is negative, remains a challenge. This work assesses, through simulation, the performance of the Expectation-Maximization algorithm in estimating parameters for pseudo-convex mixtures generated by the exponential distribution.
Article
Full-text available
Kuiper's statistic is a good measure for the difference between an ideal distribution and an empirical distribution in the goodness-of-fit test. However, it is a challenging problem to solve for the critical value and upper tail quantile, or simply the Kuiper pair, of Kuiper's statistic due to the difficulties of solving the nonlinear equation and reasonably approximating the infinite series. In this work, the contributions lie in three perspectives: firstly, the second-order approximation for the infinite series of the cumulative distribution of the critical value is used to achieve higher precision; secondly, the principles and fixed-point algorithms for solving the Kuiper pair are presented in detail; finally, a mistake about the critical value c_{n,α} for (α, n) = (0.01, 30) in Kuiper's distribution table has been identified and corrected, where n is the sample capacity and α is the upper tail quantile. The algorithms are verified and validated by comparison with the table provided by Kuiper. The methods and algorithms proposed are enlightening and worth introducing to college students, computer programmers, engineers, experimental psychologists and so on.
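A first-order sketch only (mine; the article above uses a sharper second-order series and fixed-point iteration rather than this bisection) of how an approximate asymptotic Kuiper critical value can be obtained by inverting the leading-order tail probability P(sqrt(n)·V_n > c) ≈ Σ_{k≥1} 2(4k²c² − 1)e^(−2k²c²).

# Sketch (mine, leading-order asymptotics only): approximate Kuiper critical value.
import numpy as np

def kuiper_tail(c, terms=100):
    k = np.arange(1, terms + 1)
    return np.sum(2.0 * (4.0 * k**2 * c**2 - 1.0) * np.exp(-2.0 * k**2 * c**2))

def kuiper_critical(alpha, lo=0.5, hi=3.0, tol=1e-10):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kuiper_tail(mid) > alpha:
            lo = mid           # tail still too heavy: the critical value is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(kuiper_critical(0.05))   # asymptotic critical value of C_n at alpha = 0.05 (about 1.75)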
Article
This study aimed to evaluate probable maximum precipitation (PMP) estimated using surface dew points (SDP) or actual precipitable water obtained from upper‐air data (UAD) in the moisture‐maximization method with the help of sufficient extreme precipitation events using large‐scale climate ensemble simulation data (d4PDF). The deviations between the PMP variables estimated by the SDP and UAD approaches were analyzed for southern and northern areas of Japan to consider the regional characteristics of the deviations. We found that the deviations were high in northern areas where the SDPs are relatively low during precipitation events. The PMPs estimated using each approach were also compared to the extreme‐scale reference precipitation proposed in this study. The SDP approach overestimated the PMPs by over 20% compared to the reference precipitation in the northern region. However, the UAD approach showed very low average errors in all southern and northern areas. This tendency of the SDP approach was significantly related to the regional climatic characteristics of the SDP, which indicated that the SDP approach may estimate an uncertain PMP value depending on each regional climatic characteristic compared to the UAD approach. Regional climatic characteristics should be considered when using the SDP approach to estimate the PMP.
Article
Full-text available
The role of medical diagnosis is essential in patient care and healthcare. Established diagnostic practices typically rely on predetermined clinical criteria and numerical thresholds. In contrast, Bayesian inference provides an advanced framework that supports diagnosis via in-depth probabilistic analysis. This study’s aim is to introduce a software tool dedicated to the quantification of uncertainty in Bayesian diagnosis, a field that has seen minimal exploration to date. The presented tool, a freely available specialized software program, utilizes uncertainty propagation techniques to estimate the sampling, measurement, and combined uncertainty of the posterior probability for disease. It features two primary modules and fifteen submodules, all designed to facilitate the estimation and graphical representation of the standard uncertainty of the posterior probability estimates for diseased and non-diseased population samples, incorporating parameters such as the mean and standard deviation of the test measurand, the size of the samples, and the standard measurement uncertainty inherent in screening and diagnostic tests. Our study showcases the practical application of the program by examining the fasting plasma glucose data sourced from the National Health and Nutrition Examination Survey. Parametric distribution models are explored to assess the uncertainty of Bayesian posterior probability for diabetes mellitus, using the oral glucose tolerance test as the reference diagnostic method.
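A minimal sketch of the underlying calculation, independent of the published software: Bayes' theorem gives the post-test probability of disease from prevalence, sensitivity, and specificity, and Monte Carlo sampling over assumed uncertainties in sensitivity and specificity propagates that uncertainty to the posterior. All numeric values below are hypothetical.

# Post-test probability of disease with Monte Carlo uncertainty propagation.
import numpy as np

def posterior_prob(prevalence, sens, spec):
    """P(disease | positive test) from prevalence, sensitivity, specificity."""
    tp = prevalence * sens
    fp = (1 - prevalence) * (1 - spec)
    return tp / (tp + fp)

rng = np.random.default_rng(3)
# assumed means and standard uncertainties for sensitivity and specificity
sens = rng.normal(0.90, 0.02, 10_000).clip(0, 1)
spec = rng.normal(0.95, 0.01, 10_000).clip(0, 1)
post = posterior_prob(0.10, sens, spec)
print(post.mean(), post.std())  # point estimate and standard uncertainty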
Article
Full-text available
Wireless signals are vulnerable to various security threats, such as eavesdropping and jamming, due to the inherent broadcast nature of the wireless channel. Encryption may ensure the confidentiality of the data but does not guarantee successful communication among legitimate users in the presence of strong adversaries, such as wideband jammers. In this scenario, hiding a secret signal within another mundane, ongoing communication is one way to minimize its chances of being intercepted. Wireless steganography is the process of embedding a secret signal inside another signal that acts as a cover to hide the signal of interest. In this paper, we propose to encode secret bits into covert signals that are statistically indistinguishable from the hardware noise generated by a low-cost transmitter. Because the covert signal resembles hardware noise, it can be transmitted over any waveform, making it adaptable and portable to any communication link. Each generated complex signal sample is merged with a cover signal sample, yielding a 50% embedding rate. We build the encoding and decoding models as a pair of complex-valued neural networks (NNs), trained in the presence of another NN model, a critic. The critic model differentiates between true hardware noise and the encoder-generated covert signal, thus providing essential feedback to the NN pair to improve the encoding technique. The decoder undergoes a transfer learning process to adapt to the residual channel effects in over-the-air experiments. In an indoor testbed, we successfully decoded covert communication that mimics a range of hardware noises and is transmitted using different modulation orders of the cover OFDM waveform. Our steganalysis indicates that the covert signal can be generated to mimic specific hardware and remains indistinguishable under different statistical tests. Our method performs an order of magnitude better in statistical steganalysis than the state-of-the-art method in this field.
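As an illustration of the kind of statistical indistinguishability check mentioned above, the sketch below compares hardware-noise samples with encoder output using a two-sample Kolmogorov-Smirnov test; both data sets are synthetic stand-ins, not the paper's signals or models.

# Two-sample KS test between (stand-in) hardware noise and covert output.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
hardware_noise = rng.normal(0.0, 0.0100, 5_000)   # stand-in for measured noise
covert_signal = rng.normal(0.0, 0.0102, 5_000)    # stand-in for encoder output
stat, p = ks_2samp(hardware_noise, covert_signal)
print(f"KS statistic = {stat:.4f}, p-value = {p:.3f}")
# a large p-value means the test cannot distinguish the two distributions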
Article
Full-text available
In the present paper, some well-known tests based on empirical distribution functions (EDF) with estimated parameters for testing the composite hypothesis of normality are revisited, and some new results on their asymptotic properties are provided. In particular, the approximate Bahadur slopes are obtained in the case of close alternatives for the EDF-based tests as well as the likelihood ratio test. The local approximate efficiencies are calculated for several close alternatives. The obtained results could serve as a benchmark for evaluating the quality of recent and future normality tests.
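The paper concerns asymptotic efficiencies rather than computation, but for reference, a Lilliefors-style EDF test of composite normality with estimated parameters can be sketched as below, calibrating the Kolmogorov-Smirnov statistic by a parametric bootstrap; the recipe and all names are ours, not the paper's.

# KS test of composite normality with estimated parameters,
# calibrated by a parametric bootstrap.
import numpy as np
from scipy.stats import kstest, norm

def ks_composite_normal(x, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    mu, sd = x.mean(), x.std(ddof=1)
    stat = kstest(x, norm(mu, sd).cdf).statistic
    boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.normal(mu, sd, len(x))
        boot[b] = kstest(xb, norm(xb.mean(), xb.std(ddof=1)).cdf).statistic
    return stat, (boot >= stat).mean()  # statistic and bootstrap p-value

rng = np.random.default_rng(5)
print(ks_composite_normal(rng.standard_t(df=3, size=100)))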
Thesis
Full-text available
Today, thanks to scientific advances in data collection, more data of high dimension can be recorded, and such data are often highly complex. Data of this kind arise in astronomy, astrophysics, engineering, medicine, imaging science, economics, meteorology, agriculture, and other scientific fields, and goodness-of-fit testing for such data is of particular importance. For one-sample and two-sample goodness-of-fit testing of high-dimensional data, the random projection technique is used. The random projection approach is a computationally efficient, powerful, and sufficient technique: with it, a high-dimensional data set can be randomly projected onto a lower-dimensional subspace such that, with high probability, pairwise distances are preserved. In other words, because random projection (approximately) preserves the key properties of the original point set from the high-dimensional space, it has led to efficient and simple algorithms in a variety of studies and problems of high computational complexity. Furthermore, since brain imaging data can be analyzed by fitting Gaussian random field models, we use the random projection method to test the Gaussianity of such images, and a proposed Bayesian test for the presence of a signal in a stationary isotropic Gaussian-scale random field is examined.
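A minimal sketch of the random-projection idea (not the thesis' full procedure or its Bayesian signal test): project high-dimensional data onto a random unit direction and apply a univariate Kolmogorov-Smirnov test to the projections, which are standard normal under the null of a standard multivariate normal sample.

# Random projection followed by a univariate KS test of normality.
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(6)
d, n = 50, 200
X = rng.standard_normal((n, d))   # high-dimensional data under the null
u = rng.standard_normal(d)
u /= np.linalg.norm(u)            # random unit projection direction
proj = X @ u                      # projections are N(0, 1) under H0
print(kstest(proj, norm.cdf))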
Article
Full-text available
Background: Many metagenomic studies have linked the imbalance in microbial abundance profiles to a wide range of diseases. These studies suggest utilizing microbial abundance profiles as potential markers for metagenomic-associated conditions. Given the undeniable importance of biomarkers in understanding disease progression and developing possible therapies, various computational tools have been proposed for metagenomic biomarker detection. However, most existing tools require prior scripting knowledge and lack user-friendly interfaces, so installing, configuring, and running them takes considerable time and effort. Besides, there is no available all-in-one solution for running and comparing various metagenomic biomarker detection tools simultaneously. In addition, most of these tools simply present the suggested biomarkers without any statistical evaluation of their quality. Results: To overcome these limitations, this work presents MetaAnalyst, a software package with a simple graphical user interface (GUI) that (i) automates the installation and configuration of 28 state-of-the-art tools, (ii) supports flexible study design to enable studying the dataset under different scenarios smoothly, (iii) runs and evaluates several algorithms simultaneously, (iv) supports different input formats and provides the user with several preprocessing capabilities, (v) provides a variety of metrics to evaluate the quality of the suggested markers, and (vi) presents the outcomes in the form of publication-quality plots with various formatting capabilities as well as Excel sheets. Conclusions: The utility of this tool has been verified by studying a metagenomic dataset under four scenarios. The executable file for MetaAnalyst, along with its user manual, is available at https://github.com/mshawaqfeh/MetaAnalyst .
Article
Objective: Robot-assisted rehabilitation training is an effective way to assist rehabilitation therapy. So far, various robotic devices have been developed for the automatic training of the central nervous system following injury. Multimodal stimulation, such as visual and auditory stimuli and even virtual reality (VR) technology, is usually introduced in these robotic devices to improve the effect of rehabilitation training. This may need to be explained from a neurological perspective, but there are few relevant studies. Approach: In this study, ten participants performed right-arm rehabilitation training tasks using an upper limb rehabilitation robotic device. The tasks were completed under four different feedback conditions comprising multiple combinations of visual and auditory components: auditory feedback (AF), visual feedback (VF), visual and auditory feedback (VAF), and no feedback (NonF). Functional near-infrared spectroscopy (fNIRS) devices recorded blood oxygen signals in bilateral motor, visual, and auditory areas. Using hemoglobin concentration as an indicator of cortical activation, the effective connectivity of these regions was then calculated through Granger causality. Main results: We found that overall stronger activation and effective connectivity between related brain regions were associated with VAF. When participants completed the training task without visual and auditory feedback, the trends in activation and connectivity were diminished. Significance: This study revealed cerebral cortex activation and interacting networks of brain regions during robot-assisted rehabilitation training with multimodal stimulation, which is expected to provide indicators for further evaluation of the effect of rehabilitation training, to promote further exploration of the brain's interaction network under a variety of external stimuli, and to help identify the best sensory combination.
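As a rough sketch of the connectivity step only (synthetic data, not fNIRS recordings), a Granger-causality check between two channels can be run with statsmodels as below; the lag order and the signal model are arbitrary choices made for illustration.

# Granger-causality test between two synthetic time series.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(7)
n = 300
x = rng.standard_normal(n)                       # e.g. one channel's signal
y = np.zeros(n)
for t in range(1, n):
    # y depends on its own past and on lagged x, so x should Granger-cause y
    y[t] = 0.6 * x[t - 1] + 0.3 * y[t - 1] + 0.1 * rng.standard_normal()
# tests whether the second column (x) Granger-causes the first column (y)
res = grangercausalitytests(np.column_stack([y, x]), maxlag=2)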