Article

Density Estimation for Statistics and Data Analysis.

Authors: B. W. Silverman
... Estimation of the bandwidth parameter minimizing the AMISE requires estimating the roughness of the second-order derivative of the unknown density, R(f″(·)), where the roughness of a function is defined as the integral of its square. The rule(s)-of-thumb (ROT) bandwidth selectors estimate R(f″(·)) by assuming a reference distribution for f(·) (Silverman 1986). The simplest and most popular, the normal reference ROT (NRROT), assumes f(·) to be a normal density. ...
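For reference, the standard expressions behind this discussion can be sketched as follows (common KDE notation assumed, not quoted from the cited article): writing R(g) = ∫ g(x)² dx for roughness, the AMISE-optimal bandwidth and its normal reference simplification for a Gaussian kernel are

h_{\mathrm{AMISE}} = \left[ \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right]^{1/5},
\qquad
h_{\mathrm{NRROT}} = \left( \frac{4}{3n} \right)^{1/5} \hat{\sigma} \;\approx\; 1.06\, \hat{\sigma}\, n^{-1/5},

where μ₂(K) = ∫ u² K(u) du and σ̂ is the sample standard deviation.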
... There have also been efforts toward exploring the Bayesian approach, as well as localized bandwidth estimation, for KDE (Kulasekera and Padgett 2006; Cheng, Gao, and Zhang 2019). Interested readers may explore many other monographs and articles discussing univariate and multivariate KDE in the more general framework of non-parametric estimation and kernel smoothing (Silverman 1986; Wand and Jones 1994; Simonoff 1996; Scott 2012; Turlach et al. 1993; Izenman 1991; Hwang, Lay, and Lippman 1994; Ćwik and Koronacki 1997). Overall, the ROT or scale measures for bandwidth estimation are simple and fast but lack accuracy. ...
... The article attributed the poor performance to misleading estimates of the standard deviation and kurtosis for multimodal densities. To overcome such overestimation of the standard deviation, Silverman (Silverman 1986) suggested a modified NRROT, identified as Silverman's ROT (SROT). Along the same lines, the article derives a modified GCAExROT rule, identified as GCAk4ExROT. ...
Article
Full-text available
Bandwidth parameter estimation in univariate Kernel Density Estimation has traditionally followed two approaches. Rule(s)-of-Thumb (ROT) achieve 'quick and dirty' estimates under a specific assumption about the unknown density. More accurate solve-the-equation plug-in (STEPI) rules make almost no direct assumption about the unknown density but demand heavy computation. This article derives a balancing third approach. Extending the assumption of Gaussianity for the unknown density in the normal reference ROT (NRROT) to near Gaussianity, and then expressing the density using a Gram-Charlier A (GCA) series to minimize the asymptotic mean integrated square error, it derives a GCA-series-based Extended ROT (GCAExROT). Performance analysis on simulated and real datasets suggests replacing NRROT with a modified GCAExROT rule, which achieves accuracy closer to STEPI rules at computation closer to NRROT, especially for small samples. The article thus also motivates deriving further ExROTs based on density approximations through other infinite series expansions.
... Given n species occurrence localities and the covariate data layer representing environmental variable x (e.g., elevation), covariate values at cells in which species occurrence locations fall (sample data points xi's) can be extracted. Then, the occurrence PDF can be estimated using KDE (Silverman, 1986): ...
... where K is a kernel density function for which the Gaussian kernel is often adopted (Silverman, 1986). Essentially, the probability at estimation data point x is the average of kernel density contributions from all sample data points xi's. ...
... It is a crucial parameter for KDE which affects smoothness of the estimated PDF function ('flat' vs. 'spiky'). When sample size n is large, the 'rule-of-thumb' method can be used to determine bandwidth based on sample size and standard deviation of the sample data points (Silverman, 1986) (this is the default bandwidth option in PyCLKDE). Otherwise, an optimal bandwidth can be determined based on the maximum likelihood criterion through cross-validation on the sample data points using the 'golden section search' optimization procedure (Brunsdon, 1995). ...
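As a concrete illustration of the rule-of-thumb option described above, here is a minimal sketch of Silverman's bandwidth rule for one-dimensional kernel density estimation; the function name and the use of NumPy are illustrative assumptions, and this is not the PyCLKDE implementation:

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth for 1-D kernel density estimation (Silverman, 1986).

    Uses the robust variant h = 0.9 * min(std, IQR/1.34) * n**(-1/5),
    which depends only on the sample size and spread of the data.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    std = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # interquartile range
    scale = min(std, iqr / 1.34) if iqr > 0 else std
    return 0.9 * scale * n ** (-1.0 / 5.0)

# Example: bandwidth for 1,000 simulated covariate values at occurrence points.
rng = np.random.default_rng(0)
print(silverman_bandwidth(rng.normal(loc=500.0, scale=120.0, size=1000)))
```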
Article
Full-text available
Species habitat suitability modeling and mapping (HSM) at large spatial scales (e.g., continental, global) and at fine spatiotemporal resolutions helps understand spatiotemporal dynamics of species distributions (e.g., migratory birds). Such HSM endeavors often involve "big" environmental and species datasets, which traditional software tools are often incapable of handling. To overcome the computational challenges facing big data-involved HSM tasks, this study develops a big data-enabled high-performance computational framework to conduct HSM efficiently on large numbers of species records and massive volumes of environmental covariates. As a demonstration of its usability, PyCLKDE was implemented based on the computational framework for flexibly integrating multi-source species data for HSM. The computing performance of PyCLKDE was thoroughly evaluated through experiments modeling and mapping Empidonax virescens habitat suitability in the continental Americas using high-resolution environmental covariates and species observations obtained from citizen science projects. Results show that PyCLKDE can effectively exploit computing devices with varied computing capabilities (CPUs and GPUs on high-end workstations or commodity laptops) for parallel computing to accelerate HSM computations. PyCLKDE thus enables conducting big data-involved HSM using commonly available computing resources. Using PyCLKDE as an example, efforts are called for to develop similar geocomputation tools based on the proposed framework to realize the potential of effectively performing geospatial big data analytics on heterogeneous, 'personal-grade' computing resources.
... Therefore, no assumption has been made for the emissions. The estimator used in this study is defined as: where n is the number of firms; x is a variable representing the relative value of a firm at time t; y is a variable representing the relative value of that firm at time t + 1; Xi,t is an observed relative value at time t; Xi,t+1 is the observed relative value at time t + 1; h1 and h2 are the bandwidths computed using the approach suggested by Silverman [63]; and K is the normal density function. The values of the current and previous periods are employed in the computation for each transition; therefore, the computation of the transitions is based on the value at time t and its lagged value one period before (t − 1). ...
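The estimator itself did not survive extraction. Given the variables defined in the snippet, the standard bivariate product-kernel form it most plausibly refers to is sketched below (a hedged reconstruction, not a quotation from the article):

\hat{f}(x, y) = \frac{1}{n\, h_1 h_2} \sum_{i=1}^{n} K\!\left(\frac{x - X_{i,t}}{h_1}\right) K\!\left(\frac{y - X_{i,t+1}}{h_2}\right),

from which the transition (stochastic kernel) density is obtained by conditioning on x.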
... Many economic variables tend to have a long right tail; therefore, an adaptive kernel with a flexible bandwidth is employed in this study to account for the sparseness of the data. This approach, suggested by Silverman [63], involves a two-stage procedure: a pilot estimate is derived in the first stage, and the bandwidth is then rescaled in the second stage by a factor that reflects the local kernel density. ...
... Cheong and Wu [62] developed the mobility probability plot (MPP) for presenting mobility probabilities in a two-dimensional figure, thereby considerably facilitating the interpretation of the probability mass. Since its introduction, the MPP has been employed to study inequality and convergence in many different areas, for example, industrial output [63], carbon emissions [64,65], and energy consumption [66][67][68]. The MPP can be calculated by deriving p(x), which is the net upward mobility probability: ...
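The defining expression for p(x) is elided in the snippet. In the distribution dynamics literature the net upward mobility probability is commonly written as follows (a hedged reconstruction using f(y | x) for the estimated transition density, usually reported in percent):

p(x) = \int_{x}^{\infty} f(y \mid x)\, dy \;-\; \int_{-\infty}^{x} f(y \mid x)\, dy.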
Article
The carbon emissions trading system (ETS) is expected to achieve energy transition and carbon-neutrality goals by theoretically reducing energy consumption and adjusting the energy mix. This study aims to provide firm-level evidence on these effects by adopting the distribution dynamics approach, which could reveal historical transition probabilities and predict long-term evolutions. By using the unique dataset from the Hubei ETS pilot in China, this study can serve as an example for promoting ETS. The results indicate that: (1) Most firms converge to a lower relative energy consumption under ETS, resulting in reduced total energy consumption. However, it would take a long time to achieve energy transition. (2) The primary peak of the relative coal consumption distribution reduces from 0.07 to 0.02 under ETS, implying a decline in coal use. Meanwhile, the share of electricity consumption increases, indicating a switch from coal to electricity. (3) ETS would place a limit on fossil energy, and an even stricter limit on coal for firms with high energy consumption. (4) Both the total and share of energy-induced carbon emissions also decrease, implying that ETS might realize the decoupling of economic growth from fossil energy and carbon emissions.
... In this regard, we consider a novel measure for assessment of clustering based on the classical nonparametric univariate kernel density estimation method (Silverman, 1986), which can evaluate the quality of cluster analysis of a data set with inherent grouping (i.e. K > 1 is assumed) obtained through any hard clustering algorithm wherein every data member is classified in only one group. ...
... where h is the smoothing parameter and Ker(·) is a real-valued kernel function with ∫_{−∞}^{∞} Ker(x) dx = 1 (Silverman, 1986; Bandyopadhyay and Modak, 2018). In this paper, we implement the Gaussian kernel with ...
... Using the nonparametric density estimator from Eq. (1) with the Gaussian kernel in Eq. (2), we consider h = 1.06 σ n^(−1/5), for which the asymptotic mean integrated squared error is minimized when estimating a normal density function with standard deviation σ (Silverman, 1986; Matioli et al., 2018). Here σ is estimated as σ̂, the standard deviation of the sample of interpoint distances, and the corresponding value of h is denoted by h* = 1.06 σ̂ n^(−1/5). ...
Article
Full-text available
A new clustering accuracy measure is proposed to determine the unknown number of clusters and to assess the quality of clustering of a data set given in any dimensional space. Our validity index applies the classical nonparametric univariate kernel density estimation method to the interpoint distances computed between the members of the data. Being based on interpoint distances, it is free of the curse of dimensionality and therefore efficiently computable in high-dimensional situations where the number of study variables can be larger than the sample size. The proposed measure is compatible with any clustering algorithm and with every kind of data set for which an interpoint distance measure can be defined so as to have a density function. Our simulation study demonstrates its superiority over widely used cluster validity indices such as the average silhouette width and the Dunn index, while its applicability is shown through a high-dimensional biostatistical study of the Alon data set and a large astrostatistical application to time series of light curves of new variable stars.
... First, we extracted the ISA and NTL data in 2018 using the administrative boundaries of each city. Afterward, the kernel density of ISA and NTL was calculated with a bandwidth automatically decided based on Silverman's rule of thumb [44]. The focal points of these spatial kernels from the ISA and NTL datasets serve as candidates for urban centers in a city. ...
Article
Full-text available
This study investigates whether the intensity of human activities conducted by urban populations and carried by urban land follows a wave-shaped diffusion rule using a harmonized DMSP-like NTL dataset during 1992–2018 in 234 cities of China. The results show that variations in the intensity of human activities are diffused in a wave-shaped manner from the urban center to the periphery in cities of different sizes and structures. The results demonstrate that variations in the intensity of human activity also exhibit a wave-shaped diffusion pattern, which is best modeled by a Gaussian function with an average R2 of 0.79 and standard deviation of 0.36 across all fitted functions. The outward movement of these waves in monocentric cities with an urban population <8 million occurred at a pace of ~0.5–1.0 km per year, reaching an average distance of ~18 km from the urban centers. While the pace decreased to ~0.2–0.6 km per year in larger or polycentric cities, the average distance of the waves from the urban centers increased to ~22–25 km in these larger cities. In addition, a process-pattern link between the distance-decayed rule and the wave-shaped rule of human activity dynamics was established. Moreover, a spatiotemporal Gaussian function was further discussed to enable modelers to forecast future variations in the intensity of human activities. The disclosed wave-shape rule and model can benefit the simulation of urban dynamics if integrated with other simulation technologies, such as agent-based models and cellular automata.
... Commonly used kernel functions mainly include the uniform, Gaussian, triangular, quadratic (Epanechnikov), and others [22]. According to the expressions of the different kernel functions in Table 3, the uniform, triangular, and quadratic kernel functions intersect the x axis, whereas the Gaussian kernel function does not, as shown in Figure 2. ...
... Commonly used kernel functions mainly include the uniform, Gaussian, triangular, quadratic (Epanechnikov), and others [22]. According to the expressions of the different kernel functions in Table 3, the uniform, triangular, and quadratic kernel functions intersect the x axis, whereas the Gaussian kernel function does not, as shown in the figure. In addition, this paper analyzes the influence of the kernel function on the estimated kernel density values by fitting kernel density estimation curves for the above-mentioned commonly used kernel functions. ...
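Since Table 3 itself did not survive extraction, the standard textbook forms of these kernels are sketched below for reference (they are not copied from the article); the first three have compact support and therefore meet the x axis, while the Gaussian does not:

K_{\mathrm{uniform}}(u) = \tfrac{1}{2}\,\mathbf{1}\{|u|\le 1\}, \quad
K_{\mathrm{triangular}}(u) = (1-|u|)\,\mathbf{1}\{|u|\le 1\},

K_{\mathrm{Epanechnikov}}(u) = \tfrac{3}{4}\,(1-u^2)\,\mathbf{1}\{|u|\le 1\}, \quad
K_{\mathrm{Gaussian}}(u) = \tfrac{1}{\sqrt{2\pi}}\, e^{-u^2/2},

each of which integrates to one.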
Article
Full-text available
At present, the total length of accident blackspot accounts for 0.25% of the total length of the road network, while the total number of accidents that occurred at accident black spots accounts for 25% of the total number of accidents on the road network. This paper describes a traffic accident black spot recognition model based on the adaptive kernel density estimation method combined with the road risk index. Using the traffic accident data of national and provincial trunk lines in Shanghai and ArcGIS software, the recognition results of black spots were compared with the recognition results of the accident frequency method and the kernel density estimation method, and the clustering degree of recognition results of adaptive kernel density estimation method were analyzed. The results show that: the accident prediction accuracy index values of the accident frequency method, kernel density estimation method, and traffic accident black spot recognition model were 14.39, 16.36, and 18.25, respectively, and the lengths of the traffic accident black spot sections were 184.68, 162.45, and 145.57, respectively, which means that the accident black spot section determined by the accident black spot recognition model was the shortest and the number of traffic accidents identified was the largest. Considering the safety improvement budget of 20% of the road length, the adaptive kernel density estimation method could identify about 69% of the traffic accidents, which was 1.13 times and 1.27 times that of the kernel density estimation method and the accident frequency method, respectively.
... Data found on the southern corridors of the Western Cape metro rail are used to formulate the optimisation of the railway level crossing closing time. Thus, the criterion used by Powell's method [21] is based on the minimisation of the area bounded by the threshold closing time as well as the actual and expected density functions. The proposed method is compared to the current status quo of the railway level crossings in the Western Cape, South Africa. ...
... The reason is that kernel density estimation can converge to the true density faster whilst guaranteeing a smooth output in comparison to its counterparts [23,24]. Thus, the kernel is a smooth function K which determines the shape of the estimator [21,25,26]. In addition, the kernel function partitions the dataset of railway level crossing closing times into several bins and estimates the density from the bin counts [25,26]. ...
Article
Full-text available
Long waiting times at railway level crossings pose a safety risk and affect the capacity of rail and road traffic. However, in most cases, the long closing time can be prevented by reducing the time lost at a railway level crossing. The emphasis of this study is to present a numerical optimisation algorithm to reduce the time lost per train trip at a railway level crossing. Attributes with the highest impact on the railway level crossing closing time were extracted from the data analysis of rail-road level crossings on the southern corridor of the Western Cape metro rail. Powell's optimisation algorithm was formulated on the minimisation of the time lost at the railway level crossing per trip, with the time lost constrained by technical and train traction constraints. The upper and lower bounds of Powell's algorithm were defined by the threshold closing time in addition to the actual and expected probability density functions. The algorithm was implemented in Matlab. Furthermore, the algorithm was trained on 8000 data sets and tested on 2000 data sets. The developed algorithm proved to be effective and robust in comparison to the current state of the railway level crossings under study, and was validated to reduce the time lost at the railway level crossing by at least 50%.
... For the purposes of the current study, Global Moran's I was used for data analysis. Griffith defines the range as, "a Moran's Index value near +1.0 indicates clustering, while an index value near -1.0 indicates dispersion" (Goodchild, 1986; Griffith, 1987, p. 23; Silverman, 1986). Goodchild notes, "without looking at statistical significance, there is no basis for knowing if the observed pattern is just one of many possible versions of random. ...
... The Spatial Autocorrelation Test (Global Moran's I) (Nie et al., 2015) was run to evaluate the confidence level that the crime reduction and resulting crime patterns were probably not a result of random chance. The Moran's I test measures spatial autocorrelation based on both the locations and the values of features (Griffith, 2011; Silverman, 1986). Specifically, "given a set of features and an associated attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random" (Griffith, 2011, p. 12). ...
... Owing to the importance of the application of the kernel estimator in data exploration and visualization, novel kernel estimators are being introduced by researchers (Mugdadi and Sani, 2020; Bouezmarni et al., 2020; Harfouche et al., 2020; Mohammed and Jassim, 2021; Bolancé and Acuña, 2021). The kernel estimation method has indirect applications in discriminant analysis, goodness-of-fit testing, hazard rate estimation, bump-hunting, image processing, remote sensing, seismology, cosmology, intensity function estimation, and classification with regression estimation (Sheather, 2004; Simonoff, 2012; Raykar et al., 2015; Silverman, 2018; Siloko et al., 2019a). This paper investigates the performance of the bivariate kernel estimator using the Gaussian kernel function, known as the normal kernel. ...
... Applications of the kernel estimator are mostly multivariate, with emphasis mainly on the bivariate estimator, whose density estimates can be viewed in two-dimensional or three-dimensional form. The popularity of the bivariate kernel estimator in higher-dimensional density estimation is due to the simple presentation of its estimates as surface plots (the familiar perspective known as a wire frame) or contour plots (Silverman, 2018; Siloko et al., 2021). Another factor that accounts for the application of the bivariate kernel estimator is the presentation of the observations with respect to their direction. ...
Article
Full-text available
The two-dimensional kernel estimators are very important because graphical presentation of data beyond three dimensions is seldom employed in data visualization. The prevalence of the bivariate estimator in the multivariate setting is attributed to the sparseness of data associated with an increase in dimension. The performance of the bivariate kernel relies on the smoothing parameter and other statistical parameters. While the smoothness of the estimates generated by the kernel estimator is primarily regulated by the smoothing parameter, its numerical performance may depend on other statistical parameters. One of the popular performance metrics in kernel estimation is the asymptotic mean integrated squared error (AMISE), whose popularity is occasioned by its mathematical tractability and its inclusion of dimension in performance evaluation. Besides the smoothing parameter, the computation of the bivariate kernel AMISE depends on basic statistical properties such as the correlation coefficient and the standard deviations of the observations. This paper compares the performance of the bivariate kernel using the correlation coefficient, the standard deviations, and the smoothing parameter. The results of the comparison show that for correlated bivariate observations with independent standard deviations, the AMISE values are smaller than the AMISE values computed with the smoothing parameter alone. The estimation of bivariate data involves the application of statistical tools to derive statistical properties from the observations for the purpose of predicting the behaviour of the data.
... In particular, the estimation of the density of a probability law in R k , k ≥ 1, has been studied by very different methods, some of which include Refs. [12][13][14][15][16][17][18][19]. ...
... For more on the kernel density estimation, the reader can refer to the monographs of [13,16,18,19,27,28]. A more recent book is that of [29], which also gathers numerous recent references on the subject. ...
Article
Full-text available
This work is concerned with multivariate conditional heteroscedastic autoregressive nonlinear (CHARN) models with an unknown conditional mean function, conditional variance matrix function and density function of the distribution of noise. We study the kernel estimator of the latter function when the former are either parametric or nonparametric. The consistency, bias and asymptotic normality of the estimator are investigated. Confidence bound curves are given. A simulation experiment is performed to evaluate the performance of the results.
... The default search radius algorithm [40,41] was used to determine the search radius in the density analysis for lines and kernel density analysis, and the formula is as follows: ...
... Kernel density estimation is derived from the first law of geography, that is, the density value becomes greater as the point moves closer to the core element, reflecting spatial heterogeneity and the diminishing of intensity with increasing distance. This method can reveal the radiation of the influence of objects over those in their immediate vicinity [40,43], and the formula is as follows: ...
Article
Full-text available
The leisure service function is an important component of the derivative function and non-market function of cultivated land. Therefore, exploring the strength of the cultivated land leisure service function with the help of spatial information technology is significant in guiding the proper utilization and protection of cultivated land resources. This paper constructed an evaluation system based on the three dimensions of ecological landscape, social activities, and economic performance, explored the spatial difference of the cultivated land leisure service function in Yuanyang County, the major grain-producing area along the Yellow River through spatial weighted overlay, classified the hot spots of leisure services and presented suggestions for improvement. Results show the following: (1) the landscape resources in the northern part are relatively monotonous, while those in the southern part are rich and evenly distributed. Spatial accessibility presents a distribution of “one core with multiple subcores”. The distribution of leisure service supply capacity is characterized by “multiple cores and multiple circles.” (2) The hot spots of the cultivated land leisure service function are the Urban Agricultural Central Area and the Ecological Agriculture Core Area in the middle of the county, and the Suburban Agritourism Development Area, the Yellow River Agritourism Transitional Area, and the Leisure Agriculture Connection Area on the periphery of the county. (3) The agricultural landscape should be fully protected and utilized in the Urban Agricultural Central Area. The spatial accessibility and regional reputation of the Ecological Agriculture Core Area need to be improved. The landscape diversity and landscape quality should be improved in the Suburban Agritourism Development Area. The Yellow River Agritourism Transitional Area needs to overcome the loss of tourists. The Leisure Agriculture Connection Area should increase the number of leisure and tourism facilities.
... The approximation of the posterior density by a kernel mixture in (2.59), based on the example of the Dirac mixture illustrated in Figure 2.15, is shown in Figure 2.16. Then, the optimal kernel that minimizes (2.58) in the case where all weights are equal (which is the case after resampling) is the Epanechnikov kernel [51,54], defined by: ...
... where Γ(·) : R → R is Euler's gamma function. The optimal bandwidth factor [51,54] associated with this optimal kernel is then given by: ...
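Both expressions are elided in the snippet. In the regularized particle filter literature they are usually stated as follows (a hedged reconstruction with assumed notation: d is the state dimension, c_d the volume of the unit ball in R^d, and N the number of particles):

K_{\mathrm{epa}}(x) = \frac{d+2}{2 c_d}\,\bigl(1 - \lVert x \rVert^2\bigr)\,\mathbf{1}\{\lVert x \rVert < 1\},
\qquad
c_d = \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2}+1\right)},

h_{\mathrm{opt}} = \Bigl[\, 8\, c_d^{-1}\, (d+4)\, \bigl(2\sqrt{\pi}\bigr)^{d} \Bigr]^{1/(d+4)} N^{-1/(d+4)}.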
Thesis
Actuator or sensor faults occurring in an unmanned aerial vehicle can compromise the system integrity. Fault diagnosis methods are therefore becoming a required feature for those systems. In this thesis, the focus is on fault estimation for fixed-wing unmanned aerial vehicles in the presence of simultaneous actuator and sensor faults. To deal with the challenging nature of some fault scenarios, such as simultaneous and ambiguous faults that induce multimodality, a jump-Markov regularized particle filter and enhanced versions of it are presented in this thesis. This method is based on a regularized particle filter, which improves robustness thanks to the approximation of the posterior density by a kernel mixture, and on a jump-Markov system. The jump strategy uses a small number of particles, called sentinel particles, to continue testing the alternate hypothesis under both fault-free and faulty modes. The numerical results are obtained using linear and then non-linear longitudinal dynamics of a fixed-wing unmanned aerial vehicle. The method is compared to interacting multiple model Kalman filters and regularized particle filters and shown to outperform them in terms of accuracy, robustness and convergence time in the scenarios considered. The state estimation is also more accurate and robust to faults using the proposed approach. Performance enhancement compared to other filters is more pronounced when fault amplitudes increase. An enhanced version of this method, named the robustified jump-Markov regularized particle filter, is also presented and allows one to accurately and rapidly estimate faults with no prior knowledge of the fault dynamics. Finally, a new approach to compute an adaptive transition probability matrix is introduced by computing the false alarm and missed detection probabilities using a saddlepoint approximation. The proposed navigation algorithms allow unmanned aerial vehicles to meet their path following objectives autonomously, with increased safety and accuracy.
... Kernel estimation depends essentially on two parameters: the radius of influence (τ) and the kernel estimation function (k) (Rudke et al., 2021). The radius of influence was calculated using a variation of Silverman's rule of thumb (Silverman, 1986). This method considers: (1) the mean center of the entry points; (2) the distances from the mean center; (3) the median of these distances (Dm); and (4) the standard distance (SD); and is given by Eq. (9), where n is the number of points analyzed. ...
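Eq. (9) itself is not reproduced in the snippet. The GIS-style variant of Silverman's rule that matches this description is usually written as follows (a hedged reconstruction, not quoted from the article):

\tau = 0.9 \cdot \min\!\left( \mathrm{SD},\; \sqrt{\tfrac{1}{\ln 2}}\; D_m \right) \cdot n^{-0.2}.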
... The Kernel density calculation is based on the quartic kernel function (Eq. 10), described in (Silverman, 1986): ...
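Eq. (10) is likewise elided; the quartic (biweight) kernel given in Silverman (1986) has the univariate form sketched below (GIS implementations typically use a bivariate analogue with a different normalizing constant):

K(u) = \frac{15}{16}\,\bigl(1 - u^2\bigr)^2 \quad \text{for } |u| \le 1, \qquad K(u) = 0 \ \text{otherwise}.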
Article
Full-text available
This article aims is to apply a methodology to identify Trip Generating Territories (TGT) in order to discuss the relationship between transport and land use. To achieve this goal, indirect methods were applied through digital processing of orbital remote sensing images along with spatial analysis, using the Kernel Density Estimation (KDE) method. The image processing results have revealed an overall accuracy of 71% for differentiation and characterization of intra-urban classes use through the adopted typologies. In this way, through data on land use and occupation, it was possible to map out the density of the trip generating territories associated to the main built surfaces and that can indicate a higher potential of trip attraction in the urban area. Additionally, the relationship between land use and public transport system was observed in the city of Petrolina-Brazil, the empirical area of study, and the highest concentration of TGT was observed to be located in the central portion of the city, on its surroundings and on the margins of some arterial roads. Thus, it is possible to extract, in the preliminary analysis, information to support the city's transport and mobility planning.
... Notice that we do not add a softmax layer here, because there are no constraints on the sum of the magnitudes of the predicted components. The order α is 0.75 and the kernel width σ is 0.1584 according to Silverman's rule [29]. Experiments are repeated 10 times and the average scores are recorded. ...
... The z axis is the absolute correlation coefficient |ρ|. Normally, we can first set the order to 1.01 to approximate Shannon entropy and apply Silverman's rule [29] for the kernel size, that is N^(−1/(4+p)), where p is the dimension of the latent vector and N can be considered as the batch size. We set the batch size to 2000, and σ based on Silverman's rule should be 0.2187. ...
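A quick numerical check of this rule (a standalone sketch; the per-dimension choice p = 1 below is an assumption made only to reproduce the quoted value, not a detail stated in the paper):

```python
def kernel_size(batch_size: int, p: int) -> float:
    """Kernel width from the Silverman-style rule sigma = N**(-1/(4 + p))."""
    return batch_size ** (-1.0 / (4 + p))

# With N = 2000 and p = 1 (assumed), this returns ~0.2187, matching the text.
print(round(kernel_size(2000, 1), 4))
```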
Preprint
We develop a new neural network based independent component analysis (ICA) method by directly minimizing the dependence amongst all extracted components. Using the matrix-based Rényi's α-order entropy functional, our network can be directly optimized by stochastic gradient descent (SGD), without any variational approximation or adversarial training. As a solid application, we evaluate our ICA in the problem of hyperspectral unmixing (HU) and refute the statement that "ICA does not play a role in unmixing hyperspectral data", which was initially suggested by Nascimento and Bioucas-Dias (2005). Code and additional remarks on our DDICA are available at https://github.com/hongmingli1995/DDICA.
... For the wind speed V w and the wind-wave misalignment Θ, the kernel density estimation (KDE) is adopted to obtain the marginal distributions, since both variables have a lower limit and an upper limit, indicating that the extrapolation in their tail region to consider extreme events is unnecessary (Li and Zhang, 2020). The PDF given by the KDE method can be expressed as (Silverman, 1998) ...
... where K is the kernel function, which is taken as Gaussian kernel in this study; x i is the observation sample; n is the number of samples; and h is the bandwidth, which can be determined according to the optimal bandwidth criteria (Silverman, 1998). It is worth pointing out that the von Mises distribution is also usually adopted in the probabilistic modeling of wind-wave misalignment (Stewart et al., 2016). ...
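The expression referred to above (elided after "(Silverman, 1998)") is the standard univariate kernel density estimator; with the notation defined in the snippet it reads (a sketch, not a quotation from the article):

\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right).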
Article
In the fatigue assessment of offshore wind turbines, joint probabilistic models of long-term wind and wave parameters are usually required. In practice, annual met-ocean data typically exhibit non-stationarity due to the seasonal variations and extreme weather effects, and therefore cannot be considered as being from the same probability space. Thus, data separation and data segmentation should be performed. In this paper, the full probabilistic modeling of wind and wave parameters for a site in the South China Sea usually hit by typhoons is studied. For this purpose, the typhoon data is firstly separated from the normal wind data using multi-source data and physically based approach. Then, a modified Fisher's optimum partition method is proposed for the seasonal effects segmentation of the normal wind data. On this basis, the full probabilistic model of the environmental variables is developed using the C-vine copula method. The application of the full probabilistic model to the fatigue analysis of a floating offshore wind turbine (FOWT) is illustrated through an example. Numerical results indicate that the separation and segmentation of the long-term met-ocean data is quite significant to the full probabilistic modeling of the environmental variables, and the proposed methods can deal with this problem effectively.
... Preventing cyber-attacks is mainly based on MTS anomaly detection algorithms [5,6], while optimizing customer experience by understanding latent common behaviours relies heavily on MTS clustering [7,8]. These and other applications have driven the development of numerous analysis and modeling techniques, ranging from classical parametric methods (such as the vector autoregression family [9] and state space models [10]) to non-parametric approaches (such as kernel smoothing [11] and empirical mode decomposition [12]) and even their hybrid forms (such as generalized additive models [13] and Bayesian structural time series [14]). Although these techniques have proven successful in various fields, the ever-increasing volume and complexity of MTS data combined with rapid advances in computational resources have made the Deep Neural Network (DNN) the predominant method for processing MTS. ...
Article
Full-text available
This paper introduces Tensor Visibility Graph-enhanced Attention Networks (TVGeAN), a novel graph autoencoder model specifically designed for MTS learning tasks. The underlying approach of TVGeAN is to combine the power of complex networks in representing time series as graphs with the strengths of Graph Neural Networks (GNNs) in learning from graph data. TVGeAN consists of two new main components: TVG which extend the capabilities of visibility graph algorithms in representing MTSs by converting them into weighted temporal graphs where both the nodes and the edges are tensors. Each node in the TVG represents the MTS observations at a particular time, while the weights of the edges are defined based on the visibility angle algorithm. The second main component of the proposed model is GeAN, a novel graph attention mechanism developed to seamlessly integrate the temporal interactions represented in the nodes and edges of the graphs into the core learning process. GeAN achieves this by using the outer product to quantify the pairwise interactions of nodes and edges at a fine-grained level and a bilinear model to effectively distil the knowledge interwoven in these representations. From an architectural point of view, TVGeAN builds on the autoencoder approach complemented by sparse and variational learning units. The sparse learning unit is used to promote inductive learning in TVGeAN, and the variational learning unit is used to endow TVGeAN with generative capabilities. The performance of the TVGeAN model is extensively evaluated against four widely cited MTS benchmarks for both supervised and unsupervised learning tasks. The results of these evaluations show the high performance of TVGeAN for various MTS learning tasks. In particular, TVGeAN can achieve an average root mean square error of 6.8 for the C-MPASS dataset (i.e., regression learning tasks) and a precision close to one for the SMD, MSL, and SMAP datasets (i.e., anomaly detection learning tasks), which are better results than most published works.
... where f̂_h(x) is the kernel density estimate of the probability density function; h is the bandwidth, which is a smoothing parameter; x denotes the sample points of terrorist attacks; and K_h is the kernel function described in Silverman's work (Kemp and Silverman, 1987). ...
Article
Full-text available
Risk assessment and categorization of terrorist attacks can help enhance awareness of terrorism and provide crucial information support for anti-terrorism efforts. This study utilizes quantitative approaches for the risk assessment and categorization of terrorist attacks. A total of 210,454 terrorist attacks that occurred worldwide from 1970 through 2020 were collected from the Global Terrorism Database, and 22 indicators related to the risk of terrorist attacks were selected. Then, moment estimation theory and four comprehensive evaluation models were utilized to identify the top 10 riskiest terrorist attacks in the world. Furthermore, five clustering analysis methods and three evaluation criteria were applied for the risk categorization of terrorist attacks, and visual analysis was carried out using the kernel density estimation method. The results identified the top 10 riskiest global terrorist attacks, led by the September 11 attacks, along with their downward counterfactual events. The spatial distribution of global terrorist attack risk is primarily composed of four "turbulent cores" in Central Asia, the Middle East & North Africa, South Asia, and Central America & the Caribbean. This study also provides insights and recommendations for anti-terrorism efforts. It realizes the risk assessment and categorization of terrorist attacks, aiding in the swift identification of their risk levels, and holds immense significance for safeguarding national security and societal stability worldwide under new circumstances.
... The results were verified via field investigations in typical areas. To identify gully spatial characteristics, the KD of gullies was calculated using the KD analysis tool in ArcGIS (Kemp and Silverman, 1987;Zhao et al., 2020), with the search radius calculated using the mean change point method (Ning et al., 2022). The spatial resolution of all the grid data was 90 m × 90 m. ...
Article
The primary factors controlling regional gully distribution in mountainous areas are poorly understood. To investigate the spatial characteristics and controlling factors of mountainous gullies at the regional scale, kernel density (KD) estimation, semivariogram, and Geodetector methods were used based on 11 environmental factors of gullies in the Yuanmou dry-hot valley. The results show that (a) gullies are widely but unevenly distributed in the valley, with an average KD of 1.155 km/km², and gully distribution displays spatial autocorrelation with environmental factors over diverse scales; (b) relief amplitude (Ra), landform type, and slope are the primary factors controlling gully distribution, and land use type, precipitation, and elevation are also important factors; and (c) a high risk of gully erosion occurs in areas with an elevation of more than 1437 m, slope greater than 15°, and Ra greater than 173 m, with grassland vegetation or luvisols and cambisols soil types. These results will not only help to understand the spatial pattern and formation mechanism of gullies at the macroscopic scale but also provide a scientific reference for regional gully management.
... Kernel density estimation is a non-parametric estimation method widely used in research to describe the distribution patterns of random variables (Silverman, 1986). Unlike ordinary parametric estimates, kernel density estimates impose no prior assumptions but fit distributions directly based on the characteristics and properties of the datasets themselves (Quah, 1997). ...
Article
Full-text available
Reducing carbon emissions and increasing carbon sinks are key strategies to effectively remove greenhouse gases from the atmosphere. Assessing the current carbon emission status and predicting future carbon emission scenarios could help formulate effective regional carbon emission reduction targets. However, it is necessary to enhance the carbon sink capacity of terrestrial ecosystems and improve forest management methods to promote greenhouse gas absorption. In this study, the spatiotemporal characteristics and dynamic evolution of fossil fuel CO2 (FFCO2) emissions in China from 2000 to 2019 were analyzed using the standard deviation ellipse, kernel density of emissions, and Theil index. A backpropagation (BP) neural network optimized with a genetic algorithm (GA) was used to predict FFCO2 emission during 2020–2030. The biomass increment methodology was used to predict the potential carbon sinks generated by Chinese Certified Emission Reduction (CCER) carbon-sink forestry projects during 2020–2030. The results showed that China’s FFCO2 emissions exhibited a gradual increasing trend during 2000–2019, with an average annual growth rate of 6.29%. China’s FFCO2 emissions show a greater distribution on the southeastern coast than in the northwestern interior. The GA-BP prediction shows that China’s FFCO2 emissions will continue to fluctuate and increase between 2020 and 2030. The potential of carbon sinks will be 0.69 × 10⁸ Mg C generated by the CCER carbon-sink forestry projects during 2020–2030, which could offset 0.56% of FFCO2 emissions. In the future, imbalances in regional development should be considered when formulating carbon-reduction strategies. Moreover, using carbon-sink of forestry projects of CCER to balance economic development including poverty eradication and environmental conservation should be considered. Specifically, establishing a methodology of projects for the management of natural forests’ carbon-sink will be an important future strategy.
... Finally, we identified 3 city centers (one for each county), 13 city subcenters (one for each district), and 101 town centers (one for each town) in the study area (as shown in Fig. 1). The kernel density of POIs was estimated using a Gaussian-based kernel with the bandwidth automatically calculated based on Silverman's rule of thumb (Kemp & Silverman, 1987). ...
Article
Cellular automata (CA) has become one of the most prevalent approaches for spatially explicit urban growth modeling. Previous studies have investigated how the key components of CA models are defined, structured, and coupled to represent the top-down and bottom-up processes of urban growth. However, the spatiotemporal heterogeneity of urban demand at the macro level and its coupling with micro-level urban land configurations have not been fully explored in existing CA models. This study proposes a new urban CA modeling framework to simulate urban expansion by using a spatiotemporally explicit urban demand modeling scheme that guides the patch-based allocation of urban land at the micro level. In this framework, spatiotemporal Gaussian-based models were applied to represent the spatiotemporal heterogeneity of urban demand within a set of concentric rings in terms of the fraction of new urban land and frequency of new urban development. An application of the modeling framework to the metropolitan city of Wuhan, China demonstrates that the demand of new urban land in the study area exhibits an outgoing wave-shaped propagation pattern, which can be well fitted by the spatiotemporal Gaussian-based models, with R² values exceeding 0.8. The proposed spatiotemporally explicit representation of urban demand can improve model performance in capturing urban dynamics at both macro and micro levels, as revealed by pattern-level similarity and cell-level agreement of simulation results.
... This way, when data points are condensed, they are condensed in terms of their diffusion probabilities. Using default settings, diffusion condensation is calculated on potential distance using a fixed-bandwidth Gaussian kernel, where the initial bandwidth is set to 1/10 of Silverman's rule of thumb for kernel bandwidth [47]. The bandwidth is then increased by a ratio of 1.025 every iteration. ...
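A minimal sketch of the bandwidth schedule just described (function and parameter names are illustrative, not taken from the Multiscale PHATE implementation):

```python
def bandwidth_schedule(initial_bw: float, n_iter: int, ratio: float = 1.025):
    """Yield a geometrically growing kernel bandwidth.

    `initial_bw` would be set to one tenth of the Silverman rule-of-thumb
    value, and the bandwidth is multiplied by `ratio` at every iteration.
    """
    bw = initial_bw
    for _ in range(n_iter):
        yield bw
        bw *= ratio

# Example: first five bandwidths starting from 0.05.
print(list(bandwidth_schedule(0.05, 5)))
```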
Article
Full-text available
As the biomedical community produces datasets that are increasingly complex and high dimensional, there is a need for more sophisticated computational tools to extract biological insights. We present Multiscale PHATE, a method that sweeps through all levels of data granularity to learn abstracted biological features directly predictive of disease outcome. Built on a coarse-graining process called diffusion condensation, Multiscale PHATE learns a data topology that can be analyzed at coarse resolutions for high-level summarizations of data and at fine resolutions for detailed representations of subsets. We apply Multiscale PHATE to a coronavirus disease 2019 (COVID-19) dataset with 54 million cells from 168 hospitalized patients and find that patients who die show CD16hiCD66blo neutrophil and IFN-γ+ granzyme B+ Th17 cell responses. We also show that population groupings from Multiscale PHATE directly fed into a classifier predict disease outcome more accurately than naive featurizations of the data. Multiscale PHATE is broadly generalizable to different data types, including flow cytometry, single-cell RNA sequencing (scRNA-seq), single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq), and clinical variables. Disease signatures in high-dimensional biomedical data are detected with a visualization algorithm.
... (3) Kernel density analysis: calculating the kernel density involves drawing a circular neighbourhood around each sample point and then fitting a smooth, curved surface over each sample point using a mathematical function that decreases from 1 at the center to 0 at the boundary. The calculation formula is expressed as follows (Kemp & Silverman, 1987): ...
Article
Keywords: polycentric spatial structure; morphological dimension; functional dimension; network motif. Abbreviations: DAC, directed alternative centrality; DAP, directed alternative power; POI, point of interest.
Polycentric urban structures are important spatial development strategies in megacities; however, they have decentralized morphologies and functional characteristics that profoundly impact urban space development. Therefore, when promoting sustainable development in urban areas in Shenyang, the only megacity in northeast China, it is important to explore the polycentric spatial development of megacities based on multi-source data, geographic information system spatial analysis, and network model methods. This study was conducted to examine the polycentric structures of the morphological and functional dimensions. The results showed that the morphological polycentric space can be characterized as "centralized dispersion," while the functional polycentric space comprised only "decentralized concentration." Additionally, the agglomeration sub-centers of the morphological dimension were all on the periphery of the city, although the three sub-centers within the city (M6, M7, and M8) were diffusion types. In the functional dimension, sub-centers F7 and F6 exhibited high centrality and power for both agglomeration and diffusion. Finally, four-node motifs were more important than three-node motifs in the morphological polycentric micro-networks, but their frequency was low, whereas three-node motifs were more important than four-node motifs in the functional polycentric micro-networks, and their occurrence frequency was high.
... Here, σ̂ is the estimated standard deviation of log(x) related to the dataset of Eq. (5) [41]. For the sake of simplicity, a Gaussian kernel is considered here. ...
Article
Full-text available
We address the features channel characterization and performance evaluation for wireless body-area networks (WBANs) in medical applications during a walk scenario using optical wireless transmission. More specifically, we focus on optical extra-WBAN uplink communication between a central coordinator node (CN) placed on the patient’s body and an access point (AP) in a typical hospital room. To characterize the optical wireless channel, we use a Monte Carlo ray tracing-based method and take into account the effects of body shadowing and mobility based on realistic models, in contrast to the previous simplistic models considered in the literature. Using this approach, we derive the dynamic behavior of the channel DC gain for different configurations of the CNs and APs. Furthermore, based on the obtained results, we develop a statistical channel model based on kernel density estimation, which we use to investigate the impact of CN and AP placement on the communication link parameters. Also, based on the outage probability criterion, we discuss the link performance and further analyze the improvement in performance achieved through spatial diversity, i.e., by using multiple APs in the room, for different photodetector types, under different background noise conditions. The presented results show that CN placement and user’s local and global mobility significantly impact the performance of extra-WBAN links, which can nevertheless be reduced using spatial diversity. Finally, the presented performance analysis shows that a single AP equipped with an avalanche photodiode photodetector allows an acceptable link performance for low-to-moderate background noise conditions, whereas multiple APs equipped with PIN photodetectors should be used in the case of moderate-to-strong background noise.
... n_c is the total number of observations X in class c; k is the number of possible unique feature values for input X. According to Silverman (1986), the rule-of-thumb method for estimating the bandwidth h is h = (4σ̂^5/(3n))^(1/5). ...
Article
Full-text available
This paper proposes a Naïve Bayes classifier for Bayesian and nonparametric methods of analyzing multinomial regression. The Naïve Bayes classifier adopts Bayes' rule for solving the posterior of the multinomial regression via its link function, known as the logit link. The nonparametric method adopts Gaussian and bi-weight kernels, Silverman's rule-of-thumb bandwidth selector, and an adjusted bandwidth for kernel density estimation. Data on 78 people using one of three diets (Diet A, B, and C), comprising the scaled variables age (in years), height (in cm), weight (in kg) before the diet (that is, pre-weight), and weight (in kg) gained after 6 weeks of dieting, were subjected to the Naïve Bayes and nonparametric multinomial regression classifiers. The Gaussian and bi-weight kernel density estimation produced the minimum bandwidths across the three categorical responses for the four influencers. The Naïve Bayes classifier and the nonparametric kernel density estimation for the multinomial regression produced the same prior probabilities of 0.3077, 0.3462, and 0.3462 for Diet A, Diet B, and Diet C at different smoothing bandwidths.
... Normalization is necessary because the volumes differed in magnitude among lakes. We then calculated the trends in the normalized volume for all lakes and smoothed the trends spatially using the kernel density method [46,47]: ...
Article
Full-text available
Lakes play a key role in the global water cycle, providing essential water resources and ecosystem services for humans and wildlife. Quantifying long-term changes in lake volume at a global scale is therefore important to the sustainability of humanity and natural ecosystems. Yet, such an estimate is still unavailable because, unlike lake area, lake volume is three-dimensional, challenging to be estimated consistently across space and time. Here, taking advantage of recent advances in remote sensing technology, especially NASA’s ICESat-2 satellite laser altimeter launched in 2018, we generated monthly volume series from 2003 to 2020 for 9065 lakes worldwide with an area ≥ 10 km². We found that the total volume of the 9065 lakes increased by 597 km³ (90% confidence interval 239–2618 km³). Validation against in situ measurements showed a correlation coefficient of 0.98, an RMSE (i.e., root mean square error) of 0.57 km³ and a normalized RMSE of 2.6%. In addition, 6753 (74.5%) of the lakes showed an increasing trend in lake volume and were spatially clustered into nine hot spots, most of which are located in sparsely populated high latitudes and the Tibetan Plateau; 2323 (25.5%) of the lakes showed a decreasing trend in lake volume and were clustered into six hot spots, most located in the world’s arid/semi-arid regions where lakes are scarce, but population density is high. Our results uncovered, from a three-dimensional volumetric perspective, spatially uneven lake changes that aggravate the conflict between human demands and lake resources. The situation is likely to intensify given projected higher temperatures in glacier-covered regions and drier climates in arid/semi-arid areas. The 15 hot spots could serve as a blueprint for prioritizing future lake research and conservation efforts.
... To map the recreational use, the spatial information associated with each dataset was analysed within a GIS environment. Line density [58] was applied to calculate the magnitude per unit area from each dataset within a 100 m radius around each 25 m grid cell. The result was a raster image in which one can identify the favourite and most popular places for each activity, i.e., hot spots, and their general spatial distribution. ...
Article
Full-text available
Data obtained through Volunteered Geographical Information (VGI) have gradually been used to monitor and support planning mainly in urban contexts. Regarding recreational activities in peri-urban green and natural areas, VGI has been used to map, measure use intensity, profile users, and evaluate their preferences and motivations. Given their extensive use, it is now worthwhile to assess the value of VGI data to (1) compare recreational uses, profile users and map recreational activities in different contexts (metropolitan vs. rural areas), and (2) evaluate outdoor and adventure tourist products such as Grand Routes (GR). Data from former GPSies (AllTrails nowadays), one of the most popular web-share services, were used to assess recreational uses in Lisbon Metropolitan Area (LMA) and southwest Portugal (SWPT). A set of 22,031 tracks of “on foot” and “on wheels” activities, submitted by 3297 national and foreign users, covering 12 years, was analysed within a GIS modelling environment. Results indicate that, although there are many more submissions in the LMA, the influence of foreigners in the SWPT is higher (11% vs. 19%). The existing GR in SWPT concentrates the foreign use for hiking (71% of foreign vs. 28% of national users), demonstrating its attractiveness. For the favourite activity in both areas—Mountain biking—results show a higher spatial dispersion, yet part of the activity in SWPT still conforms to the GR (16% of foreign and 20% of national use). This study proves other applications for VGI, showing its usefulness for assessing recreational uses in both metropolitan and rural areas. Spatial knowledge about recreational uses is a valuable tool to evaluate and monitor such activities, and to know what users like to do, and where, and is also useful information when designing recreational products considering their tourist potential, thus adding value to these offers.
... KDE is one of the most popular methods for analysing the properties of a point event distribution [27], [28]. It has been widely used in the analysis and detection of traffic accident 'hotspots'. ...
Article
Full-text available
In this study, the southern expressway, which is the first and lengthiest E class highway (126 km) in Sri Lanka, was analysed for roadside accident incidences. The primary objective of this paper is to identify the best-fit interpolation techniques for the hotspots' most distinctive causes of vehicular crashes. The accident details were collected from the Police Headquarters consisting of 966 accidents that took place during the period from 2015 to 2017. To identify accident hotspots, GIS-based interpolation techniques such as Ordinary Kriging, Kernel Density Estimation (KDE), Inverse Distance Weighting (IDW), and Nearest Neighbour Interpolation methods were used. The spatial interpolation outcome of the four methods was compared based on the standard Prediction Accuracy Index (PAI). The analysis was executed using QGIS and GeoDa. Results of PAI revealed that an IDW and KDE outperformed the other two interpolation methods. The left and right lanes of the expressway, spotted with 11 and 20 hotspots, respectively, indicate the right lane was 50% more prone to accidents than the left lane. Notably, nearly 5% of the entire road stretch is estimated as accident-prone spots in both lanes. Peak accidents were recorded during afternoon and evening hours, and buses were the most active vehicle type. Uncontrolled speeding was the primary reason for more than 50% of the accidents, while unsuccessful overtake accounted for more than 20% of the accidents on the highway. The road design modifications and warning sign placements at appropriate places may be recommended as countermeasures. Keywords: Highway, hotspot analysis, kernel density estimation, prediction accuracy index, vehicle collision
... One possible approach, which we term the CDF-ZML method, is to partition the input space into a set with probabilistic characteristics {p_U(π_r)}, using kernel density estimation [73] or a direct cdf estimation algorithm [74]. From the cdf it is possible to directly partition the input space according to {p_U(π_r)}. ...
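As a rough illustration of the partitioning idea described in this excerpt, the sketch below splits a one-dimensional input space into regions carrying prescribed probabilities using empirical quantiles (a direct CDF-based route); the Zipf-like probabilities and the simulated data are assumptions for demonstration only.

```python
import numpy as np

def partition_by_probability(samples, probs):
    """Split the real line into intervals carrying the prescribed probabilities,
    using the empirical CDF of the observed samples (quantile cut points)."""
    assert np.isclose(sum(probs), 1.0)
    cum = np.cumsum(probs)[:-1]                 # interior cumulative probabilities
    edges = np.quantile(samples, cum)           # interval boundaries
    return np.concatenate(([-np.inf], edges, [np.inf]))

# Example: four regions with Zipf-like probabilities (illustrative values).
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
edges = partition_by_probability(x, [0.5, 0.25, 0.15, 0.10])
labels = np.digitize(x, edges[1:-1])            # region index for each sample
print(edges, np.bincount(labels) / len(x))
```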
Article
Full-text available
An important aspect of using entropy-based models and proposed “synthetic languages” is the seemingly simple task of knowing how to identify the probabilistic symbols. If the system has discrete features, then this task may be trivial; however, for observed analog behaviors described by continuous values, this raises the question of how we should determine such symbols. This task of symbolization extends the concept of scalar and vector quantization to consider explicit linguistic properties. Unlike previous quantization algorithms, where the aim is primarily data compression and fidelity, the goal in this case is to produce a symbolic output sequence which incorporates some linguistic properties and hence is useful in forming language-based models. Hence, in this paper, we present methods for symbolization which take into account such properties in the form of probabilistic constraints. In particular, we propose new symbolization algorithms which constrain the symbols to have a Zipf–Mandelbrot–Li distribution which approximates the behavior of language elements. We introduce a novel constrained EM algorithm which is shown to effectively learn to produce symbols which approximate a Zipfian distribution. We demonstrate the efficacy of the proposed approaches on some examples using real world data in different tasks, including the translation of animal behavior into a possible human-understandable language equivalent.
... It determines a magnitude per unit area from point features using a kernel function to fit a smoothly tapered surface to each point [9]. The surface value is highest at the location of the point and diminishes away from the point, reaching zero at the end of the radius (Figure 3) [10]. The kernel density method imposes a regular grid onto the study area and uses a three-dimensional kernel function to visit each grid cell (Figure 3) and to calculate a density value assigned to each grid cell [11]. The final kernel density estimate for one cell is then calculated by totalling all the values obtained from all the kernel density functions for that particular cell. ...
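A minimal sketch of the grid-based computation described here: each point contributes a kernel surface that peaks at the point and vanishes at the search radius, and each cell totals the contributions reaching it. The quartic (biweight) kernel and all parameter values below are illustrative assumptions.

```python
import numpy as np

def kernel_density_grid(points, xmin, xmax, ymin, ymax, cell_size, radius):
    """Grid-based KDE: each point spreads a quartic (biweight) kernel that is
    highest at the point and falls to zero at the search radius; the value of a
    cell is the sum of all kernel contributions reaching it."""
    xs = np.arange(xmin, xmax, cell_size) + cell_size / 2
    ys = np.arange(ymin, ymax, cell_size) + cell_size / 2
    gx, gy = np.meshgrid(xs, ys)
    density = np.zeros_like(gx)
    for px, py in points:
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        w = np.clip(1.0 - d2 / radius**2, 0.0, None) ** 2   # quartic kernel, zero beyond radius
        density += 3.0 / (np.pi * radius**2) * w            # each point's kernel integrates to ~1
    return xs, ys, density

rng = np.random.default_rng(2)
pts = rng.uniform(0, 100, size=(50, 2))                      # toy point events
_, _, dens = kernel_density_grid(pts, 0, 100, 0, 100, cell_size=1.0, radius=10.0)
print(dens.max())
```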
... The mode θ_{d,MAP} can be defined as the univariate mode of the kernel density estimate (Silverman, 1998) of the univariate density for the sample of θ_d^(t) (t = 1, ..., T) (see Johnson and Sinharay, 2016). ...
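A small sketch of the idea in this excerpt, under the assumption that posterior draws are available as a plain array: the MAP-like value is taken as the mode of a Gaussian KDE (Silverman bandwidth) fitted to the draws. The simulated draws merely stand in for the θ_d^(t) samples.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_map_mode(draws, grid_points=512):
    """Approximate the MAP value of a parameter as the mode of a Gaussian KDE
    fitted to its posterior draws (bandwidth via Silverman's rule)."""
    kde = gaussian_kde(draws, bw_method="silverman")
    grid = np.linspace(draws.min(), draws.max(), grid_points)
    return grid[np.argmax(kde(grid))]

rng = np.random.default_rng(3)
theta_d = rng.normal(loc=1.7, scale=0.3, size=4000)   # stand-in for MCMC draws theta_d^(t)
print(round(kde_map_mode(theta_d), 2))
```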
... e.g. Silverman [26]. In our geometric setting, it seems natural to resort to the heat kernel (associated with the Laplace-Beltrami operator). ...
Preprint
Using jointly geometric and stochastic reformulations of nonconvex problems and exploiting a Monge-Kantorovich gradient system formulation with vanishing forces, we formally extend the simulated annealing method to a wide class of global optimization methods. Due to an inbuilt combination of a gradient-like strategy and particle interactions, we call them swarm gradient dynamics. As in the original paper of Holley-Kusuoka-Stroock, the key to the existence of a schedule ensuring convergence to a global minimizer is a functional inequality. One of our central theoretical contributions is the proof of such an inequality for one-dimensional compact manifolds. We conjecture the inequality to be true in a much wider setting. We also describe a general method allowing for global optimization and evidencing the crucial role of functional inequalities à la Łojasiewicz.
... , L. To fit model (3), we consider each weight function w_jl(·) as a smooth function approximated by a function belonging to the finite-dimensional space spanned by B-spline basis functions. This is not the only possibility, as other bases could be chosen, such as Fourier expansions, wavelets, natural splines, etc. (Silverman, 2018). Also, we choose the number of knots and the knot placement in an ad-hoc manner. ...
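For illustration, the sketch below builds a cubic B-spline basis with ad-hoc knots and fits a smooth function by least squares; it is a generic basis-expansion example, not the authors' model (3), and the knot placement and test function are assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, interior_knots, degree=3):
    """Evaluate a cubic B-spline basis on x; columns are basis functions,
    so a smooth weight function w(x) is approximated as B(x) @ coefficients."""
    t = np.concatenate((np.repeat(x.min(), degree + 1),
                        interior_knots,
                        np.repeat(x.max(), degree + 1)))
    n_basis = len(t) - degree - 1
    B = np.empty((len(x), n_basis))
    for j in range(n_basis):
        coef = np.zeros(n_basis)
        coef[j] = 1.0
        B[:, j] = BSpline(t, coef, degree, extrapolate=False)(x)
    return np.nan_to_num(B)

# Least-squares fit of a smooth function from noisy evaluations (illustrative).
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + np.random.default_rng(4).normal(scale=0.1, size=x.size)
B = bspline_basis(x, interior_knots=np.linspace(0.1, 0.9, 9))   # ad-hoc knot placement
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
w_hat = B @ coef
print(coef.shape, round(float(np.mean((w_hat - y) ** 2)), 4))
```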
Preprint
We consider unsupervised classification by means of a latent multinomial variable which categorizes a scalar response into one of L components of a mixture model. This process can be thought of as a hierarchical model in which the first level models a scalar response according to a mixture of parametric distributions and the second level models the mixture probabilities by means of a generalised linear model with functional and scalar covariates. The traditional approach of treating functional covariates as vectors not only suffers from the curse of dimensionality, since functional covariates can be measured at very small intervals, leading to a highly parametrised model, but also does not take into account the nature of the data. We use basis expansion to reduce the dimensionality and a Bayesian approach to estimate the parameters while providing predictions of the latent classification vector. By means of a simulation study, we investigate the behaviour of our approach considering a normal mixture model and a zero-inflated mixture of Poisson distributions. We also compare the performance of the classical Gibbs sampling approach with Variational Bayes inference.
... Data were checked for the normality assumption and, if found not to be normally distributed, subsequently transformed using various functions such as log10 and cubic root to find the optimal transformation for the underlying chromatin long-range interaction distance distribution data. Chromatin long-range interaction distance density functions were then estimated and plotted by group under the optimal transformation using the non-parametric kernel density approach with a normal weight function [34][35][36]. Pairwise comparisons of median chromatin long-range interaction distance among the groups were performed with multiple comparison adjustments using the Dwass, Steel, and Critchlow-Fligner method based on the Wilcoxon test, for downstream and upstream data separately [36][37][38][39]. Pairwise comparisons between groups for chromatin long-range interaction distance distribution, median distance location shift, and distance distribution scale were implemented using the Kolmogorov-Smirnov two-sample test, the Hodges-Lehmann estimation method and Fligner-Policello test, and the Ansari-Bradley test, respectively, again for downstream and upstream data separately [36][37][38][39][40]. SAS version 9.4 was used to perform these analyses and generate density function plots for the chromatin long-range interaction distance data. ...
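The original analysis was run in SAS; as a rough Python analogue of part of this workflow, the sketch below estimates kernel densities with a Gaussian (normal) weight function and applies the Kolmogorov-Smirnov and Ansari-Bradley tests plus a Hodges-Lehmann shift estimate (the Dwass-Steel-Critchlow-Fligner and Fligner-Policello procedures are not available in scipy and are omitted). All data are simulated placeholders.

```python
import numpy as np
from scipy.stats import gaussian_kde, ks_2samp, ansari

rng = np.random.default_rng(5)
grp_a = np.log10(rng.lognormal(mean=10.0, sigma=1.0, size=400))   # log10-transformed distances (toy)
grp_b = np.log10(rng.lognormal(mean=10.3, sigma=1.2, size=400))

# Kernel density estimates with a normal (Gaussian) weight function.
kde_a, kde_b = gaussian_kde(grp_a), gaussian_kde(grp_b)
grid = np.linspace(min(grp_a.min(), grp_b.min()), max(grp_a.max(), grp_b.max()), 200)
dens_a, dens_b = kde_a(grid), kde_b(grid)

# Distribution comparison (Kolmogorov-Smirnov), scale comparison (Ansari-Bradley),
# and a Hodges-Lehmann location-shift estimate (median of pairwise differences).
ks_stat, ks_p = ks_2samp(grp_a, grp_b)
ab_stat, ab_p = ansari(grp_a, grp_b)
hl_shift = np.median(np.subtract.outer(grp_b, grp_a))
print(round(ks_p, 3), round(ab_p, 3), round(hl_shift, 3),
      round(float(np.abs(dens_a - dens_b).max()), 3))
```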
... Kernel density analysis was applied to identify zones of accumulation of materials and to classify lower- and higher-density areas. This method calculates the density of point features around each raster cell, generating a smoothed map based on the location of points relative to one another (Silverman, 1986; Baxter et al., 1997). ...
Article
Puesto Roberts 1 (PR1) is an open-air archaeological site emplaced in a deflation sector that offers the possibility of investigating diverse lithic activities carried out by hunter-gatherers in central Patagonia, Argentina. The site is located on the coast of Colhué Huapi lake, in Chubut province. In this work, we present a techno-morphological and intra-site spatial analysis of the lithic artifactual assemblage to discuss stone tool production and lithic raw materials and to explore the activities that were carried out at the site. The action of post-depositional natural agents that may have caused the mobilization of materials, as well as lithic refits, is also analysed to bring insights into the spatial distribution of artifacts. The site is the result of anthropogenic action, and the mobilization of certain materials, produced mainly by the action of wind and water, was recognized. Different stages of knapping, lithic raw materials and lithic tools were identified. Finally, PR1 is interpreted as an open-air campsite occupied for short durations during the late Holocene. Evidence of almost exclusive processing and consumption of Lama guanicoe carcasses was identified. This differs from sites recorded around the lake with chronologies similar to PR1, which show evidence of processing and consumption of fish. Therefore, PR1 contributes to the knowledge of the variability of human activities developed in this lacustrine landscape. https://authors.elsevier.com/a/1eYCw_,5MKXACHU
... Kernel regression (the Nadaraya-Watson estimator), established by Nadaraya in 1965 and Watson in 1964, is one of the most frequently used techniques in nonparametric statistics [5,6,12]. Both Nadaraya and Watson expressed the general estimator of f(x) in nonparametric regression in terms of a smoothing bandwidth (h) and a kernel (k), as in the formula below. ...
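A minimal sketch of the Nadaraya-Watson estimator referred to here, with a Gaussian kernel and a fixed bandwidth h; the data and bandwidth value are illustrative assumptions.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimate: f_hat(x) = sum_i K((x - x_i)/h) * y_i / sum_i K((x - x_i)/h),
    here with a Gaussian kernel K and fixed bandwidth h."""
    u = (x_eval[:, None] - x_train[None, :]) / h
    weights = np.exp(-0.5 * u**2)                  # Gaussian kernel (normalising constant cancels)
    return (weights @ y_train) / weights.sum(axis=1)

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
grid = np.linspace(0, 10, 50)
print(nadaraya_watson(x, y, grid, h=0.5)[:5].round(2))
```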
Article
Full-text available
Nonparametric kernel estimators are widely used in a variety of statistical research fields. The Nadaraya-Watson kernel (NWK) estimator is one of the most important nonparametric kernel estimators and is often used in regression models with a fixed bandwidth. In this article, we consider four newly proposed adaptive NWK regression estimators (Interquartile Range [IQR], Standard Deviation [SD], Mean Absolute Deviation, and Median Absolute Deviation) rather than the existing ones (Fixed Bandwidth, Adaptive Geometric, Adaptive Mean, Adaptive Range, and Adaptive Median). The outcomes, in both simulation and actual leukemia data, show that the four new adaptive NWK estimators (IQR, SD, Mean Absolute Deviation, and Median Absolute Deviation) are more effective than the kernel estimators with fixed bandwidth used in previous studies, based on the Mean Square Error criterion.
... Kernel density estimation (KDE) is a statistical technique for estimating a probability density function from a random sample set [38,39]. As a common tool, KDE has been explored in various areas for different purposes [11], especially in spatio-temporal databases [3], [40][41][42][43]. ...
Article
Full-text available
Sources of complementary information are connected when we link user accounts belonging to the same user across different platforms or devices. The expanded information promotes the development of a wide range of applications, such as cross-platform prediction, cross-platform recommendation, and advertisement. Due to the significance of user account linkage and the widespread popularization of GPS-enabled mobile devices, there is a growing body of research on linking user accounts with spatio-temporal data across location-aware social networks. Unlike most existing studies in this domain, which focus only on effectiveness, we propose a novel framework entitled HFUL (A Hybrid Framework for User Account Linkage across Location-Aware Social Networks), in which efficiency, effectiveness, scalability, robustness, and application of user account linkage are all considered. Specifically, to improve efficiency, we develop a comprehensive index structure from the spatio-temporal perspective and design novel pruning strategies to reduce the search space. To improve effectiveness, a kernel density estimation-based method is proposed to alleviate the data sparsity problem in measuring users’ similarities. Additionally, we investigate the application of HFUL in terms of user prediction, time prediction, and location prediction. Extensive experiments conducted on three real-world datasets demonstrate the superiority of HFUL in terms of effectiveness, efficiency, scalability, robustness, and application compared with state-of-the-art methods.
... Basically, higher weights were assigned to points closer to the centre; consequently, they contributed more to the total density value of the cell. The final grid values were computed by summing, for each location, the values of all overlapping circle surfaces (Mohammadi & Kaboli, 2016; Silverman, 2018). ...
Article
One of the key purposes of conservation selection strategies is to design a network of sites to support relevant biodiversity components and, therefore, decrease the risk of populations becoming isolated. To this end, it is important to be aware of the habitat locations of the target species and the threats from human activities, in order to identify areas of high conservation priority. This paper takes the Chaharmahal and Bakhtiari province (Iran) as a case study to highlight a network optimization for six target species of conservation concern, including the Persian leopard, Panthera pardus Pocock, wild sheep, Ovis orientalis Gmelin, and wild goat, Capra aegagrus Erxleben. To run the optimization, we first generated the following input data: we modelled suitable habitats using the InVEST model (Integrated Valuation of Environmental Services and Tradeoffs) and simulated the ecological impact of road networks (Spatial Road Disturbance Index (SPROADI), Kernel Density Estimation (KDE) and the Landscape Ecological Risk Index (ERI)). A visual inspection of the input data revealed that a large percentage of the study area constitutes suitable habitat for the target species; however, the disturbances caused by the road network show that the central and north-eastern regions of the study area are significantly affected. Indeed, approximately 10% and 25% of the study area are in the high and medium risk categories, respectively. Optimization using Marxan shows that the north-western and southern regions of the study area should be given the high conservation priority necessary for an efficient conservation network. Habitats located in the north-central region should act as stepping-stone areas or corridors between the isolated regions in the north-east and the well-connected areas in the north-west and south. Overall, the findings of the present study show that the current network of protected areas is not contradictory to that suggested by Marxan, but has deficiencies in terms of size and stepping-stones.
... where P_t is the endogenous regressor, K(x) = 0.75 · (1 − x²) · I(|x| ≤ 1), and the bandwidth b is the one proposed by Silverman (1986), equal to b = 0.9 · T^(−1/5) · min(s, IQR/1.34). ...
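The reconstructed formulas above translate directly into code. The sketch below computes Silverman's rule-of-thumb bandwidth b = 0.9 · T^(−1/5) · min(s, IQR/1.34) and uses the kernel K(x) = 0.75 · (1 − x²) · I(|x| ≤ 1) to form a density estimate; the simulated series stands in for P_t.

```python
import numpy as np

def silverman_bandwidth(x):
    """b = 0.9 * T^(-1/5) * min(s, IQR/1.34), Silverman's (1986) rule of thumb."""
    T = x.size
    s = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * T ** (-0.2) * min(s, iqr / 1.34)

def kernel(u):
    """K(u) = 0.75 * (1 - u^2) on |u| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

rng = np.random.default_rng(7)
p = rng.normal(size=500)                          # stand-in for the regressor P_t
b = silverman_bandwidth(p)
grid = np.linspace(-3, 3, 101)
f_hat = kernel((grid[:, None] - p[None, :]) / b).mean(axis=1) / b
print(round(b, 3), round(float((f_hat * (grid[1] - grid[0])).sum()), 3))   # density integrates to ~1
```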
... This KDE employs a Gaussian kernel function and is computed on an evenly spaced grid of 1024 points, covering the full range of the given posterior samples. The kernel bandwidth h is derived using Silverman's rule (Silverman 1986), ...
Preprint
The characterization of an exoplanet's interior is an inverse problem, which requires statistical methods such as Bayesian inference in order to be solved. Current methods employ Markov Chain Monte Carlo (MCMC) sampling to infer the posterior probability of planetary structure parameters for a given exoplanet. These methods are time consuming since they require the calculation of a large number of planetary structure models. To speed up the inference process when characterizing an exoplanet, we propose to use conditional invertible neural networks (cINNs) to calculate the posterior probability of the internal structure parameters. cINNs are a special type of neural network which excel at solving inverse problems. We constructed a cINN using FrEIA, which was then trained on a database of 5.6 × 10^6 internal structure models to recover the inverse mapping between internal structure parameters and observable features (i.e., planetary mass, planetary radius and composition of the host star). The cINN method was compared to a Metropolis-Hastings MCMC. For that, we repeated the characterization of the exoplanet K2-111 b using both the MCMC method and the trained cINN. We show that the inferred posterior probabilities of the internal structure parameters from both methods are very similar, with the biggest differences seen in the exoplanet's water content. Thus, cINNs are a possible alternative to the standard time-consuming sampling methods. Indeed, using cINNs allows for orders of magnitude faster inference of an exoplanet's composition than what is possible using an MCMC method; however, it still requires the computation of a large database of internal structures to train the cINN. Since this database is only computed once, we found that using a cINN is more efficient than an MCMC when more than 10 exoplanets are characterized using the same cINN.
... The conditional logit model was estimated using the clogit command, and the mixlogit command was used for the mixed logit model (Hole, 2007a). Epanechnikov kernel density graphs (Silverman, 1998) were plotted with the kdensity command to illustrate the distribution of the individual parameter estimates (2019). WTA was estimated using the wtp command and the confidence intervals were calculated using the delta method (Hole, 2007b). ...
Article
Full-text available
Government-funded payments for ecosystem services (PES) have increasingly been used to facilitate transactions between users of environmental services and their providers. In order to improve the link between payments and the service provided, some countries in the EU have promoted result-based schemes (RBS), which remunerate farmers for ecological results, as part of their agricultural policy. Since PES programs are voluntary, it is important to understand farmers' responses before more large-scale implementations of RBS are initiated. Using a choice experiment and a mixed logit model, we elicited the preferences of farmers in two Natura 2000 sites in Slovenia for different design elements of a hypothetical scheme for dry grassland conservation. We found that the majority of farmers preferred the result-based approach over the management-based scheme in terms of both payment conditions and monitoring; one group of farmers preferred the RBS very strongly (average WTA of more than 500 EUR/ha/yr) and another group less strongly (average WTA of about 200 EUR/ha/yr). Farmers also showed a higher preference for on-farm advice and training in small groups than for lectures, which would be offered to a larger audience. A collective bonus, which would incentivise coordination and could potentially increase participation rates in the scheme, significantly influenced the farmers' willingness to adopt the scheme. However, the estimated average WTA was comparable to or lower than the 40 EUR/ha annual bonus payment. Older farmers and those who managed small and semi-subsistence farms were significantly more likely to be highly resistant to scheme adoption, regardless of its design.
... This formula sets a bound to ensure that θ_j* is equal to or greater than unity. Silverman (1986) describes the procedure for determining the bandwidth h. ...
Article
Full-text available
The results-based budgeting (RBB) framework is a public management strategy in which economic resources are allocated to certain budget programmes, oriented towards delivering specific products and results to the population. The present paper analyzes the regional governments' efficiency in using their economic resources, under the RBB framework, with an application to the Peruvian context. To this end, we employ a data envelopment analysis (DEA) model with bootstrapping. In the first stage, different sectors of the regional governments are considered individually: education, security, health, sanitation, transportation, and recreation. In the second stage, the overall efficiency index is calculated using the sectoral indices obtained in the first stage. Finally, the factors or determinants influencing the level of efficiency are analyzed. The results show improvements in efficiency levels in the areas of health and sanitation, to the detriment of the rest of the sectors. The average overall efficiency level over the period 2013-2016 remains in the range of 0.25-0.30, which indicates an inefficiency level of 70%. Finally, the variables fiscal autonomy, capital stock, and population density show a positive relationship with respect to the overall efficiency index.
... For each vowel and analysis scale, a two-dimensional joint distribution of samples of standardized vowel distance and standardized social distance was calculated over players and analysis windows. This was done by calculating a two-dimensional discrete Gaussian kernel density function, using a 30-point grid from −3.0 to 3.0, with optimal bandwidth of n^(−1/6) (see [49] for motivation), where n is the number of observations. The density was then normalized to sum to unity. ...
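A small sketch of the two-dimensional discrete Gaussian KDE described here: a 30-point grid from −3 to 3, bandwidth n^(−1/6), and normalisation so the grid sums to one. The bivariate sample below is simulated and merely stands in for the standardized distances.

```python
import numpy as np

def gaussian_kde_2d(samples, grid_1d, bandwidth):
    """Discrete 2-D Gaussian KDE on a square grid, normalised to sum to one.
    `samples` is (n, 2); `grid_1d` is the common axis for both dimensions."""
    gx, gy = np.meshgrid(grid_1d, grid_1d)
    dx = gx[..., None] - samples[:, 0]          # broadcast over samples
    dy = gy[..., None] - samples[:, 1]
    dens = np.exp(-(dx**2 + dy**2) / (2.0 * bandwidth**2)).sum(axis=-1)
    return dens / dens.sum()

rng = np.random.default_rng(9)
z = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=500)  # toy standardized distances
grid = np.linspace(-3.0, 3.0, 30)                                        # 30-point grid as in the excerpt
h = z.shape[0] ** (-1.0 / 6.0)                                           # bandwidth n^(-1/6)
density = gaussian_kde_2d(z, grid, h)
print(density.shape, round(float(density.sum()), 6))
```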
Article
Full-text available
Linguistic behaviors arise from strongly interacting, non-equilibrium systems. There is a wide range of spatial and temporal scales that are relevant for the analysis of speech. This makes it challenging to study language from a physical perspective. This paper reports on a longitudinal experiment designed to address some of the challenges. Linguistic and social preference behavior were observed in an ad-hoc social network over time. Eight people participated in weekly sessions for 10 weeks, playing a total of 535 map-navigation games. Analyses of the degree of order in social and linguistic behaviors revealed a global relaxation toward more ordered states. Fluctuations in linguistic behavior were associated with social preferences and with individual interactions.
... The probability density of importation events over time into Saudi Arabia by cluster is presented in Fig. 2b, where a Gaussian kernel density estimator is used with a bandwidth of 5.51 chosen using the Silverman rule of thumb, as implemented in the geom_density function of the ggplot2 R package (v3.3.5) [76,77]. ...
Article
Full-text available
Monitoring SARS-CoV-2 spread and evolution through genome sequencing is essential in handling the COVID-19 pandemic. Here, we sequenced 892 SARS-CoV-2 genomes collected from patients in Saudi Arabia from March to August 2020. We show that two consecutive mutations (R203K/G204R) in the nucleocapsid (N) protein are associated with higher viral loads in COVID-19 patients. Our comparative biochemical analysis reveals that the mutant N protein displays enhanced viral RNA binding and differential interaction with key host proteins. We found increased interaction of the GSK3A kinase simultaneously with hyper-phosphorylation of the adjacent serine site (S206) in the mutant N protein. Furthermore, the host cell transcriptome analysis suggests that the mutant N protein produces dysregulated interferon response genes. Here, we provide crucial information linking the R203K/G204R mutations in the N protein to modulations of host-virus interactions and underline the potential of the nucleocapsid protein as a drug target during infection.
Article
In the context of the global maritime industry, which plays a vital role in international trade, navigating vessels safely and efficiently remains a complex challenge, especially due to the absence of structured road-like networks on the open seas. This paper proposes MATNEC, a framework for constructing a data-driven Maritime Traffic Network (MTN), represented as a graph that facilitates realistic route generation. Our approach, leveraging Automatic Identification System (AIS) data along with portcall and global coastline datasets, aims to address key challenges in MTN construction from AIS data observed in the literature, particularly the imprecise placement of network nodes and the sub-optimal definition of network edges. At the core of MATNEC is a novel incremental clustering algorithm that is capable of intelligently determining the placement and distribution of the graph nodes in a diverse set of environments, based on several environmental factors. To ensure that the resulting MTN generates maritime routes that are as realistic as possible, we design a novel edge mapping algorithm that defines the edges of the network by treating the mapping of AIS trajectories to network nodes as an optimisation problem. Finally, due to the absence of a unified approach in the literature for measuring the efficacy of an MTN's ability to generate realistic routes, we propose a novel methodology to address this gap. Utilising our proposed evaluation methodology, we compare MATNEC with existing methods from the literature. The outcome of these experiments affirms the enhanced performance of MATNEC compared to previous approaches.
Article
As the core and critical component of photovoltaic (PV) power stations, accurately evaluating the operational status of PV arrays is key to enabling intelligent operation of the power station. In an actual power station, only current and voltage data of the PV array are available, and the outputs of PV arrays exhibit noticeable random fluctuations, making it challenging to comprehensively evaluate the PV arrays in terms of power generation, data quality, and stability. To solve these problems, this article proposes evaluation indicators for the operational status of arrays across multiple dimensions, including data quality, normalized generated power, performance ratio, dynamic performance, and stability of the arrays. The indicators are calculated individually, and a comprehensive evaluation and recognition of the arrays' operational status is then achieved through weight calculation and the quantile method. The article presents a systematic approach for evaluating the operational status of PV arrays using multidimensional evaluation indicators together with a comprehensive evaluation method. The proposed multidimensional indicators provide a comprehensive evaluation of the PV arrays' operational status without requiring any additional hardware investment. This study aims to provide a theoretical and practical foundation for effectively monitoring the operational status of PV arrays. Compared with the traditional fuzzy C-means method, the proposed method gives quantitative evaluation results of the operational status of PV arrays.
Article
Actual environmental pollution worsens people's perception of environmental problems, which may affect their perception of government corruption and thus affect the government's credibility and social stability. From the micro perspective of residents' overall feelings about environmental issues, this paper uses panel data for the period 2014–2018, obtained from a large-scale dynamics survey in China, to investigate the impact of environmental issues on corruption perception. The empirical results suggest that residents' perception of environmental problems has a significantly positive linear correlation with their corruption perception. Moreover, the causal interpretation of this correlation is strengthened after the introduction of exogenous environmental condition variables that profoundly affect residents' environmental perception and after mitigating endogeneity bias via the instrumental variable method. The findings of this study reveal the political value of environmental governance, suggesting that the deterioration of environmental problems can increase public perception of government corruption, and that this relationship differs across demographic groups.
Chapter
Fault detection in non-stationary processes is a timely research topic in industrial process monitoring. The core objective of this research is to tackle anomaly detection in non-stationary industrial processes with manipulated set-point changes and uncertainties in the prior knowledge about the statistical nature of the measurements. In this research, the fault detection problem is investigated from an unsupervised perspective and a modified PCA approach is proposed. This method utilizes the baseline loading matrices and an upper bound, to be determined for the variation range of the time-series, to relax the assumption of stationary characteristics. Hence, the mean used for normalizing the time-series is adaptively updated (using a soft calculation) without any need for the high-complexity recalibration procedure required by other existing adaptive/recursive PCA methods. Moreover, first- and second-order error indices are introduced to monitor the statistical behaviour of process measurements. To develop a more reliable system condition indicator, an overall health index is derived from the proposed features using a non-parametric kernel density estimator (KDE). The proposed approach does not require heavy online calculation in comparison with existing adaptive solutions, and it can successfully distinguish faults from mean changes in healthy measurements. Finally, an alarm generator algorithm is presented which produces two alarm types, caution and actual fault, for process operators, utilizing the proposed overall health index. The effectiveness of the modified PCA approach is validated by both numerical examples and industrial case studies.
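As a loose illustration of the KDE-based health-index idea in this chapter (not the authors' modified-PCA method), the sketch below fits a non-parametric KDE to health-index values from healthy operation and derives a control limit at 99% of the estimated probability mass; the two-level alarm thresholds and the data are assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(10)
healthy_index = rng.gamma(shape=2.0, scale=1.0, size=2000)   # health-index values under normal operation (toy)

# Fit a non-parametric KDE to the healthy distribution and derive a control limit
# as the value below which 99% of the estimated density mass lies.
kde = gaussian_kde(healthy_index)
grid = np.linspace(0.0, healthy_index.max() * 1.5, 2000)
cdf = np.cumsum(kde(grid))
cdf /= cdf[-1]
limit = grid[np.searchsorted(cdf, 0.99)]

# Two-level alarm logic (caution vs. actual fault); the 0.8*limit caution level is an assumption.
new_values = np.array([1.2, 4.8, 9.5])
alarms = ["actual fault" if v > limit else "caution" if v > 0.8 * limit else "normal" for v in new_values]
print(round(float(limit), 2), alarms)
```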
Article
Censoring often occurs in data collection. This article considers nonparametric regression when the covariate is censored under general settings. In contrast to censoring in the response variable in survival analysis, regression with censored covariates is more challenging but less studied in the literature, especially for dependent censoring. We propose to estimate the regression function using conditional hazard rates. The asymptotic normality of our proposed estimator is established. Both theoretical results and simulation studies demonstrate that the proposed method is more efficient than estimation based on complete observations and other methods, especially when the censoring rate is high. We illustrate the usefulness of the proposed method using a data set from the Framingham Heart Study and a data set from a randomized placebo-controlled clinical trial of the drug D-penicillamine.
Article
Full-text available
In this contribution it is shown how an extended uncertainty budget of the observations according to the Guide to the Expression of Uncertainty in Measurement (GUM) can be considered in adjustment computations. The extended uncertainty budget results from the combination of Type A standard uncertainties determined with statistical methods and Type B standard uncertainties derived with nonstatistical methods. Two solutions are investigated, namely the adjustment in the classical Gauss-Markov model and the adjustment in the Gauss-Markov model using Monte Carlo simulations for the consideration of the uncertainties of the observations. Numerical examples are given to show that an appropriate interpretation of the dispersion measures for the unknowns is particularly important in order to avoid misinterpretation of the results. Furthermore, the effects of changing the weights of the observations on the adjustment results are shown. Finally, practical advice for the consideration of an extended uncertainty budget of the observations in adjustment computations is given.