Fig 1 - uploaded by Damien François


# Examples of high-dimensional data. Left: spectrum; right: regressor for a time series prediction problem

## Source publication

Modern data analysis tools have to work on high-dimensional data, whose components are not independently distributed. High-dimensional spaces show surprising, counter-intuitive geometrical properties that have a large influence on the performances of data analysis tools. Among these properties, the concentration of the norm phenomenon results in th...

## Contexts in source publication

**Context 1**

... data analysis has to cope with tremendous amounts of data. Data are indeed more and more easily acquired and stored, thanks to huge progress in sensors and ways to collect data on one side, and in storage devices on the other. Nowadays, there is no hesitation in many domains in acquiring very large amounts of data without knowing in advance whether they will be analyzed and how. The spectacular increase in the amount of data is not only found in the number of samples collected, for example over time, but also in the number of attributes, or characteristics, that are simultaneously measured on a process. The same arguments indeed lead to a kind of precaution principle: as there is no problem in measuring and storing many data, why not collect many measures, even if some (many) of them prove afterward to be useless or irrelevant? For example, one could increase the number of sensors in a plant that has to be monitored, or increase the resolution of measuring instruments like spectrometers, or record many financial time series simultaneously in order to study their mutual influences, etc. In all these situations, data are gathered into vectors whose dimension corresponds to the number of simultaneous measurements on the process or phenomenon. When the dimension grows, one speaks about high-dimensional data, as each sample can be represented as a point or vector in a high-dimensional space. The difficulty in analyzing high-dimensional data results from the conjunction of two effects. First, high-dimensional spaces have geometrical properties that are counter-intuitive, and far from the properties that can be observed in two- or three-dimensional spaces. Secondly, data analysis tools are most often designed having in mind intuitive properties and examples in low-dimensional spaces; usually, data analysis tools are best illustrated in 2- or 3-dimensional spaces, for obvious reasons.
The problem is that those tools are also used when data are high-dimensional and more complex. In such situations, we lose our intuition of the tools' behavior, and might draw wrong conclusions about their results. Such loss of control is already encountered with basic linear tools, such as PCA (Principal Component Analysis): it is very different to apply PCA on a 2-dimensional example with hundreds of samples (as illustrated in many textbooks), or to apply it on a few tens of samples represented in a 100-dimensional space! Known problems such as collinearity and numerical instability easily occur. The problem is even worse when using nonlinear models: most nonlinear tools involve (many) more parameters than inputs (i.e., than the dimension of the data space), which results in lack of model identifiability, instability, overfitting and numerical instabilities. For all these reasons, the specificities of high-dimensional spaces and data must be taken into account in the design of data analysis tools. While this statement is valid in general, its importance is even higher when using nonlinear tools such as artificial neural networks. This paper will show some of the surprising behaviors of high-dimensional data spaces, the consequences for data analysis tools, and paths to remedies. In Section 2, examples of high-dimensional data are given, along with some details about the problems encountered when analyzing them. Section 3 details surprising facts in high-dimensional spaces and some ideas that could be incorporated in the tools to lower the impact of these phenomena. In Section 4, current research on nonlinear dimension reduction tools is briefly presented, as another way to face the problems encountered in high-dimensional spaces. Finally, Section 5 gives an example of a time series prediction task where the dimensionality of the regressors has to be taken into account.
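As a quick numerical illustration of this point (our sketch, not from the paper), the sample covariance matrix of a few tens of points in a 100-dimensional space is necessarily rank-deficient, so most principal directions estimated by PCA carry no information at all:

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 samples in a 100-dimensional space: far fewer samples than dimensions.
n_samples, n_dims = 30, 100
X = rng.normal(size=(n_samples, n_dims))

# The sample covariance matrix is 100x100, but its rank is at most
# n_samples - 1 = 29: the remaining 71 principal directions are pure
# artifacts of the small sample, which is the collinearity problem above.
cov = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(cov)
print(cov.shape)  # (100, 100)
print(rank)       # 29
```

Any PCA axis beyond the 29th is determined by noise rather than by the data, which is why textbook low-dimensional intuition breaks down here.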
Working with high-dimensional data means working with data that are embedded in high-dimensional spaces. When speaking about non-temporal data, this means that each sample contains many attributes or characteristics. Spectra are typical examples of such data: depending on the resolution of the spectrometer, spectra contain several hundreds of measurements (see Figure 1, left). Fortunately for the sake of analysis, the hundreds of coordinates in spectra are not independent: it is precisely their dependencies that are analyzed in order to extract relevant information from a set of spectra [1]. More generally, redundancy in the coordinates is a necessary condition for analysing a low number of samples in a high-dimensional space. Indeed, let us imagine on the contrary that all coordinates are independent; a simple linear regression model will then contain as many parameters as the number of coordinates in the space. If the number of samples available for learning is less than the dimension of the space, the problem is underdetermined (in other words, the model is unidentifiable). This problem is known as collinearity, and has no other solution than exploiting the dependencies between coordinates in order to reduce the number of model parameters; using smoothing splines is an example of dependency exploitation [2]. While collinearity is the expression of this phenomenon when linear models are used, a similar problem appears with nonlinear models; it results in overfitting, i.e., in too efficient a modelling of the learning samples without model generalization ability. An example of high-dimensional data with temporal dependencies is shown in Figure 1, right. Knowing a time series up to time t, the problem consists in forecasting the next value(s) of the series. Without additional information from exogenous variables, the forecasting problem is solved by building a regression model whose inputs are a number of (often consecutive) values from the time series, and whose output is the next value.
The model is built on the known part of the series, and used to predict unknown values. When no indication is available on the optimal regressor size, large regressors are usually preferred, in order to avoid losing relevant information necessary for the prediction. However, large regressors mean high-dimensional input data to the model, a large number of parameters, and the same difficulties as the ones encountered with the first example. In both situations, the goal will be threefold: – to take into account in the model the dependencies between characteristics, in order to avoid a large number of effective model parameters; – to adapt the design of the model to the specificities of high-dimensional spaces; – to reduce, whenever possible, the dimensionality of the data through selection and projection techniques. The first goal is highly problem-dependent and beyond the scope of this paper. The second and third goals will be discussed in Sections 3 and 4, respectively. This section describes some properties of high-dimensional spaces that are counter-intuitive compared to similar properties in low-dimensional spaces. Consequences for data analysis are discussed, with possible ideas to be incorporated in data analysis tools in order to meet the specific requirements of high-dimensional spaces. Data analysis tools based on learning principles infer knowledge, or information, from available learning samples. Obviously, the models built through learning are only valid in the range or volume of the space where learning data are available. Whatever the model or class of models, generalization on data that are very different from all learning points is impossible. In other words, relevant generalization is possible through interpolation but not through extrapolation. One of the key ingredients in the successful development of learning algorithms is therefore to have enough data for learning so that they fill the space, or the part of the space, where the model must be valid.
It is easy to see that, every other constraint being kept unchanged, the number of learning data should grow exponentially with the dimension: if 10 samples seem reasonable to learn a smooth 1-dimensional model, 100 are necessary to learn a 2-dimensional model with the same smoothness, 1000 for a 3-dimensional model, etc. This exponential increase is the first consequence of what is called the curse of dimensionality [3]. It is illustrated, among others, by Silverman on the problem of the number of kernels necessary to approximate a dimension-dependent distribution up to a defined precision [4]. More generally, the curse of dimensionality is the expression of all phenomena that appear with high-dimensional data, and that most often have unfortunate consequences on the behavior and performances of learning algorithms. Even without speaking about data analysis, high-dimensional spaces have surprising geometrical properties that are counter-intuitive. Figure 2 illustrates four such phenomena. Figure 2 a) shows the volume of a unit-radius sphere with respect to the dimension of the space. It is seen that while this volume increases from dimension 1 (a segment) to 5 (a 5-dimensional hypersphere), it then decreases and reaches almost 0 as soon as the space dimension exceeds 20. The volume of a 20-dimensional hypersphere with radius equal to 1 is thus ...
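The volume curve of Figure 2 a) can be reproduced from the closed-form expression V_d = π^(d/2) / Γ(d/2 + 1) for the d-dimensional unit ball (a short sketch we add for illustration):

```python
import math

def unit_ball_volume(d: int) -> float:
    """Volume of the d-dimensional ball of radius 1."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

volumes = {d: unit_ball_volume(d) for d in range(1, 31)}
peak_dim = max(volumes, key=volumes.get)
print(peak_dim)               # 5: the volume peaks in dimension 5
print(round(volumes[20], 4))  # 0.0258: nearly zero in dimension 20
```

The checks confirm the text: V_1 = 2 (a segment), the maximum is reached at d = 5 (about 5.26), and by d = 20 the unit hypersphere encloses almost no volume at all.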

**Context 2**

... but also in the number of attributes, or characteristics, that are simultaneously measured on a process. The same arguments indeed lead to a kind of precaution principle: as there is no problem in measuring and storing many data, why not collect many measures, even if some (many) of them prove afterward to be useless or irrelevant? For example, one could increase the number of sensors in a plant that has to be monitored, or increase the resolution of measuring instruments like spectrometers, or record many financial time series simultaneously in order to study their mutual influences, etc. In all these situations, data are gathered into vectors whose dimension corresponds to the number of simultaneous measurements on the process or phenomenon. When the dimension grows, one speaks about high-dimensional data, as each sample can be represented as a point or vector in a high-dimensional space. The difficulty in analyzing high-dimensional data results from the conjunction of two effects. First, high-dimensional spaces have geometrical properties that are counter-intuitive, and far from the properties that can be observed in two- or three-dimensional spaces. Secondly, data analysis tools are most often designed having in mind intuitive properties and examples in low-dimensional spaces; usually, data analysis tools are best illustrated in 2- or 3-dimensional spaces, for obvious reasons. The problem is that those tools are also used when data are high-dimensional and more complex. In such situations, we lose our intuition of the tools' behavior, and might draw wrong conclusions about their results. Such loss of control is already encountered with basic linear tools, such as PCA (Principal Component Analysis): it is very different to apply PCA on a 2-dimensional example with hundreds of samples (as illustrated in many textbooks), or to apply it on a few tens of samples represented in a 100-dimensional space!
Known problems such as collinearity and numerical instability easily occur. The problem is even worse when using nonlinear models: most nonlinear tools involve (many) more parameters than inputs (i.e., than the dimension of the data space), which results in lack of model identifiability, instability, overfitting and numerical instabilities. For all these reasons, the specificities of high-dimensional spaces and data must be taken into account in the design of data analysis tools. While this statement is valid in general, its importance is even higher when using nonlinear tools such as artificial neural networks. This paper will show some of the surprising behaviors of high-dimensional data spaces, the consequences for data analysis tools, and paths to remedies. In Section 2, examples of high-dimensional data are given, along with some details about the problems encountered when analyzing them. Section 3 details surprising facts in high-dimensional spaces and some ideas that could be incorporated in the tools to lower the impact of these phenomena. In Section 4, current research on nonlinear dimension reduction tools is briefly presented, as another way to face the problems encountered in high-dimensional spaces. Finally, Section 5 gives an example of a time series prediction task where the dimensionality of the regressors has to be taken into account. Working with high-dimensional data means working with data that are embedded in high-dimensional spaces. When speaking about non-temporal data, this means that each sample contains many attributes or characteristics. Spectra are typical examples of such data: depending on the resolution of the spectrometer, spectra contain several hundreds of measurements (see Figure 1, left). Fortunately for the sake of analysis, the hundreds of coordinates in spectra are not independent: it is precisely their dependencies that are analyzed in order to extract relevant information from a set of spectra [1].
More generally, redundancy in the coordinates is a necessary condition for analysing a low number of samples in a high-dimensional space. Indeed, let us imagine on the contrary that all coordinates are independent; a simple linear regression model will then contain as many parameters as the number of coordinates in the space. If the number of samples available for learning is less than the dimension of the space, the problem is underdetermined (in other words, the model is unidentifiable). This problem is known as collinearity, and has no other solution than exploiting the dependencies between coordinates in order to reduce the number of model parameters; using smoothing splines is an example of dependency exploitation [2]. While collinearity is the expression of this phenomenon when linear models are used, a similar problem appears with nonlinear models; it results in overfitting, i.e., in too efficient a modelling of the learning samples without model generalization ability. An example of high-dimensional data with temporal dependencies is shown in Figure 1, right. Knowing a time series up to time t, the problem consists in forecasting the next value(s) of the series. Without additional information from exogenous variables, the forecasting problem is solved by building a regression model whose inputs are a number of (often consecutive) values from the time series, and whose output is the next value. The model is built on the known part of the series, and used to predict unknown values. When no indication is available on the optimal regressor size, large regressors are usually preferred, in order to avoid losing relevant information necessary for the prediction. However, large regressors mean high-dimensional input data to the model, a large number of parameters, and the same difficulties as the ones encountered with the first example.
In both situations, the goal will be threefold: – to take into account in the model the dependencies between characteristics, in order to avoid a large number of effective model parameters; – to adapt the design of the model to the specificities of high-dimensional spaces; – to reduce, whenever possible, the dimensionality of the data through selection and projection techniques. The first goal is highly problem-dependent and beyond the scope of this paper. The second and third goals will be discussed in Sections 3 and 4, respectively. This section describes some properties of high-dimensional spaces that are counter-intuitive compared to similar properties in low-dimensional spaces. Consequences for data analysis are discussed, with possible ideas to be incorporated in data analysis tools in order to meet the specific requirements of high-dimensional spaces. Data analysis tools based on learning principles infer knowledge, or information, from available learning samples. Obviously, the models built through learning are only valid in the range or volume of the space where learning data are available. Whatever the model or class of models, generalization on data that are very different from all learning points is impossible. In other words, relevant generalization is possible through interpolation but not through extrapolation. One of the key ingredients in the successful development of learning algorithms is therefore to have enough data for learning so that they fill the space, or the part of the space, where the model must be valid. It is easy to see that, every other constraint being kept unchanged, the number of learning data should grow exponentially with the dimension: if 10 samples seem reasonable to learn a smooth 1-dimensional model, 100 are necessary to learn a 2-dimensional model with the same smoothness, 1000 for a 3-dimensional model, etc. This exponential increase is the first consequence of what is called the curse of dimensionality [3].
It is illustrated, among others, by Silverman on the problem of the number of kernels necessary to approximate a dimension-dependent distribution up to a defined precision [4]. More generally, the curse of dimensionality is the expression of all phenomena that appear with high-dimensional data, and that most often have unfortunate consequences on the behavior and performances of learning algorithms. Even without speaking about data analysis, high-dimensional spaces have surprising geometrical properties that are counter-intuitive. Figure 2 illustrates four such phenomena. Figure 2 a) shows the volume of a unit-radius sphere with respect to the dimension of the space. It is seen that while this volume increases from dimension 1 (a segment) to 5 (a 5-dimensional hypersphere), it then decreases and reaches almost 0 as soon as the space dimension exceeds 20. The volume of a 20-dimensional hypersphere with radius equal to 1 is thus almost 0! Figure 2 b) shows the ratio between the volume of a unit-radius sphere and the volume of a cube with edge lengths equal to 2 (the sphere is thus tangent to the cube). In dimension 2, the ratio is obviously π/4, which means that most of the volume (here, surface) of the cube is also contained in the sphere. When the dimension increases, this ratio rapidly decreases toward 0, reaching a negligible value as soon as the dimension reaches 10. In terms of density of data in a space, this means that if samples are drawn randomly and uniformly in a cube, the probability that they fall near the corners of the cube is almost one! As will be detailed below, this also means that their norm is far from being random (it is concentrated near the maximum value, i.e., the square root of the dimension). Figure 2 c) shows the ratio between the volumes of two embedded spheres, with radii equal to 1 and 0.9 respectively. Unsurprisingly, the ratio decreases exponentially with the dimension.
What is more surprising is that, even though the two radii differ by only 10%, the ratio between both volumes is almost 0 in dimension 10. If data are randomly and uniformly distributed in the volume of the larger sphere, this means that almost all of them will fall in its shell, and will therefore ...
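Both the sphere-in-cube ratio of Figure 2 b) and the thin-shell effect of Figure 2 c) follow from the same unit-ball volume formula; a short sketch (ours, for illustration) makes the numbers concrete:

```python
import math

def unit_ball_volume(d: int) -> float:
    """Volume of the d-dimensional ball of radius 1."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

# Fig 2 b): fraction of the cube [-1, 1]^d occupied by the inscribed
# unit ball.  It is pi/4 ~ 0.785 in dimension 2, but collapses quickly.
ratios = {d: unit_ball_volume(d) / 2 ** d for d in (2, 5, 10)}
print(ratios[2], ratios[10])  # ~0.785 vs ~0.0025

# Fig 2 c): fraction of the unit ball's volume lying in the outer shell
# between radii 0.9 and 1.  Volume scales as r^d, so this is 1 - 0.9^d,
# which tends to 1 as the dimension grows.
shells = {d: 1 - 0.9 ** d for d in (2, 10, 50)}
print(shells[2], shells[10], shells[50])
```

In dimension 10 the inscribed sphere already occupies about 0.25% of the cube, and by dimension 50 more than 99% of the ball's volume sits in the outer 10% shell: uniformly drawn points concentrate near the corners of the cube and near the surface of the sphere.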

**Context 3**

... to overcome the limitations of Gaussian ones. When faced with difficulties resulting from the high dimension of the space, a possibility is to try to decrease this dimension, of course without losing relevant information in the data. Dimension reduction is used as preprocessing, before applying data analysis models on data with a lower dimension. PCA (Principal Component Analysis) is the most traditional tool used for dimension reduction. PCA projects data onto a lower-dimensional space, choosing the axes that keep the maximum of the data's initial variance. Unfortunately, PCA is a linear tool. Nonlinear relations between the components of the initial data may be lost in the preprocessing. If the goal is to further use nonlinear data analysis tools on the reduced data, one easily sees that the use of a linear preprocessing is not appropriate. There is nowadays a huge research effort in developing nonlinear projection tools that do not suffer from the above limitation. Nonlinear projection means finding a lower-dimensional space in which the data are described as well as in the original space. This supposes that the data lie on a sub-manifold of the original space. Ideally, there should be a bijection between this sub-manifold and the lower-dimensional space; the existence of a bijection is a proof that no information is lost in the transformation. Figure 5 shows an artificial example of nonlinear dimension reduction (from dimension 3 to 2, for illustration purposes). If curved axes such as the ones shown on the left part of the figure could be found and defined in the initial data space, one could unfold the axes to find the lower-dimensional representation shown on the right. There are several ways to design nonlinear projection methods. A first one consists in using PCA, but locally, in restricted parts of the space [11].
Joining local linear models leads to a global nonlinear one; however, the resulting model is not continuous, which limits its interest. Kernel PCA [12] consists in first transforming the data into a higher-dimensional space, and then applying PCA on the transformed data. Kernel PCA benefits from the strong theoretical background of kernel methods, and proves to be interesting in specific situations. However, the method suffers from the difficult choice of the initial transformation, and from the apparent contradiction of increasing the dimension of the data before reducing it. Distance preservation methods form a class of nonlinear projection tools that have interesting geometrical properties. The principle is to find a lower-dimensional representation of the data in which the pairwise distances are respected as much as possible with respect to the original data space. Sammon's nonlinear mapping [13] belongs to this class of methods. Short distances in the original space are favored, to allow the unfolding of large, nonlinear surfaces and volumes. Demartines and Herault's CCA (Curvilinear Component Analysis) [14] greatly improves the previous method by giving more weight to short distances in the projection space instead of the original one. This seemingly minor modification makes it possible to cut manifolds with loops, which are more than common in high-dimensional spaces. Another important improvement in distance preservation methods consists in measuring the distances in the original space along the manifold, instead of taking the Euclidean distance between pairs of points; unfolding is then much facilitated. The Curvilinear Distance Analysis (CDA) [15] and Isomap [16] methods, independently developed, belong to this category; contrary to Isomap, CDA combines the advantages of the curvilinear distance measure and the larger weights on short distances, leading to efficient unfolding in a larger class of situations. Other nonlinear projection tools must be mentioned too.
Self-Organizing Maps (SOM), or Kohonen maps [17], may be viewed as neighbor-preserving nonlinear projection tools. SOMs are classically used in representation tasks, where the dimension of the projection space is limited to 3. However, there is no technical difficulty in extending the use of SOMs to higher-dimensional projection spaces. SOMs are used when a combination of vector quantization, clustering and projection is sought. However, the quality of the bijection with the original space (no loss of information in the transformation) is limited compared to distance-preservation methods. Finally, the classical bottleneck MLP [18] also performs a nonlinear dimension reduction that is bijective by design. Despite its interesting concept, the bottleneck MLP suffers from its limitation to simple problems, because of the numerical difficulties of adjusting the parameters of an MLP with many layers. Time series forecasting consists in predicting unknown values of a series, based on past, known values. Grouping past values into vectors (called regressors) makes time series forecasting an almost standard function approximation problem (see Figure 1, right). Naturally, because of the non-random character of the time series, dependencies exist between the coordinates of the regressors. The situation is thus typical of those where dimension reduction should be possible, leading to improved prediction performances. Takens' theorem [19] provides a strong theoretical background for such dimension reduction. Let us first define a regressor state space, as illustrated in Figure 6. The left part of the figure shows an artificial time series (that is obviously easy to predict). The right part shows state spaces of regressors formed by two (top) and three (bottom) consecutive values of the series.
In these regressor spaces, it is possible to see that the data occupy a low-dimensional part of the space; this dimensionality is called the intrinsic dimension of the regressors (the intrinsic dimension is 1 in the illustrated example, as the regressors follow a line), and may be estimated, for example, by using Grassberger-Procaccia's method [20]. Takens' theorem expresses two properties regarding the regressor space and its intrinsic dimension: – First, if q is the intrinsic dimension of the regressors (estimated in a sufficiently large-dimensional state space), then the size of the regressors to be used to predict the series is between q and 2q+1. In other words, more than 2q+1 values in the regressors do not carry supplementary information useful to predict the series. – Secondly, the regressors in the (2q+1)-dimensional space may be projected without loss of information into a q-dimensional space. As in most time series prediction problems the optimal size of the regressors is difficult to know a priori, Takens' theorem provides a way to estimate it. The prediction model will then be developed on a minimal but sufficient number of variables. Figure 7 shows an example of the application of the above methodology to the problem of predicting the Belgian BEL20 financial stock market index [21]. The left part of the figure shows the daily returns (relative variations) of the BEL20 index. According to standard procedures in stock market index forecasting [22], 42 indicators are built from the series (returns, averages of returns, moving averages, etc.). By design, many of these indicators are dependent or even correlated. A linear PCA is applied first to reduce the dimensionality of the regressors. Keeping 99% of the variance leads to a reduced set of 25 compound indicators. Grassberger-Procaccia's procedure is used to estimate the intrinsic dimensionality of the regressors, which is found to be approximately 9.
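The idea behind Grassberger-Procaccia's estimator can be sketched in a few lines: the correlation sum C(r), the fraction of point pairs closer than r, scales as r^q for small r, where q is the intrinsic dimension. This is our illustration on synthetic data, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 points on a noise-free curve embedded in 3-D,
# so the intrinsic dimension is 1 even though the ambient dimension is 3.
t = rng.uniform(0, 1, size=1000)
X = np.stack([t, np.sin(2 * t), t ** 2], axis=1)

# Correlation sum C(r): fraction of distinct point pairs closer than r.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
pairs = D[np.triu_indices_from(D, k=1)]

radii = np.logspace(-2, -1, 10)  # small radii, away from saturation
C = np.array([(pairs < r).mean() for r in radii])

# The correlation dimension is the slope of log C(r) versus log r.
slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
print(slope)  # close to 1: the intrinsic dimension of the curve
```

On real regressors the same slope estimate, read off over a suitable range of radii, gives the intrinsic dimension q used in Takens' bounds.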
Then, the 25-dimensional regressors resulting from the PCA are further projected into a 9-dimensional space, using the Curvilinear Component Analysis algorithm. Finally, a Radial-Basis Function Network is built on the 9-dimensional vectors to predict the next value of the BEL20 daily return. Unsurprisingly, it is extremely difficult to obtain very good predictions in such a problem! Nevertheless, if the goal is restricted to predicting the sign of the next returns (i.e., predicting whether the index will increase or decrease), the results are not so bad. Figure 7 (right) shows the percentage of correct predictions of the sign, averaged over 90 days. Numerically, the percentage of success in the correct approximation of the sign is 57%, i.e. 7% more than a pure random guess. High-dimensional spaces show surprising geometrical properties that are counter-intuitive with respect to the behavior of low-dimensional data. Among these properties, the concentration of norm phenomenon probably has the most impact on the design of data analysis tools. Its consequences are, among others, that standard Euclidean norms may become unselective in high-dimensional spaces, and that Gaussian kernels, commonly used in many tools, become inappropriate too. Suggestions to overcome these consequences are presented. Another direction to follow is to reduce the dimensionality of the data space, through appropriate nonlinear data projection methods. The methodology is illustrated in the context of time series forecasting, on the BEL20 stock market index prediction ...
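The regressor construction underlying this whole pipeline can be sketched as follows (a minimal illustration; the helper name is ours, not the paper's):

```python
import numpy as np

def build_regressors(series: np.ndarray, size: int):
    """Turn a 1-D series into (regressor, next value) pairs.

    Each regressor stacks `size` consecutive past values; the target is
    the value that immediately follows them.
    """
    X = np.stack([series[i : i + size] for i in range(len(series) - size)])
    y = series[size:]
    return X, y

# Toy example on a short series (illustrative values only).
series = np.arange(10, dtype=float)  # 0, 1, ..., 9
X, y = build_regressors(series, size=3)
print(X.shape, y.shape)  # (7, 3) and (7,)
print(X[0], y[0])        # [0. 1. 2.] -> 3.0
```

The `size` parameter is exactly the regressor dimension discussed above: Takens' theorem bounds the useful value between q and 2q+1, after which projection (e.g., via CCA) can bring the inputs back down to q dimensions.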

## Similar publications

Roundness is an essential geometrical feature in precision engineering metrology, especially for quality control of rotating parts. The Talyrond machine has become one of the main instruments in roundness measurement. The Talyrond HPR TR-73 fitting software's data analysis can contribute significantly to the measurement accuracy. The error characterization o...

Integral representations of the exact distributions of order statistics are derived in a geometric way when three or four random variables depend on each other as the components of continuous $l_{n,p}$-symmetrically distributed random vectors do, n ∈ {3, 4}, p > 0. Once the representations are implemented in a computer program, it is easy to change the de...

## Citations

... After the computation, it is activated by a nonlinear function. The DNN terminology represents the presence of multiple inner layers considered in the network. The modeling method for a DNN is based on the back-propagating algorithm used in the backward step with two hidden layers [30]. In our proposed network, one layer consisted of 8192 activation units and the other layer consisted of 4096 activation units. ...

... The network structure was similar to the one already proposed by Ferlita et al. [15], except for the input features. The same number of units for the second layer was considered, as in our previous study [15], because the amount of data available was similar and because the so-called curse of dimensionality [30] was taken into account. A network with many layers and a limited amount of data is more likely to deliver poor predictions. ...

... The DNN terminology represents the presence of multiple inner layers considered inside the network. The modeling method for a DNN is based on the back-propagating learning algorithm used in the backward step with two hidden layers [30]. In our proposed neural network, one layer consisted of 8192 activation units and the other layer consisted of 4096 activation units. ...

The authors proposed a direct comparison between white- and black-box models to predict the engine brake power of a 15,000 TEU (twenty-foot equivalent unit) containership. A Simplified Naval Architecture Method (SNAM), based on limited operational data, was highly enhanced by including specific operational parameters. An OAT (one-at-a-time) sensitivity analysis was performed to recognize the influences of the most relevant parameters in the white-box model. The black-box method relied on a DNN (deep neural network) composed of two fully connected layers with 4092 and 8192 units. The network consisted of a feed-forward network, and it was fed by more than 12,000 samples of data, encompassing twenty-three input features. The test data were validated against realistic operational data obtained during specific operational windows. Our results agreed favorably with the results obtained for the DNN, which relied on sufficiently observed data for the physical model.

... The aforementioned methods have to wait for a number of novel classes defined by a threshold before a novel class is declared. Distance-based methods are difficult to get to work on high dimensional data such as image data and suffer the curse of dimensionality [44]. A method to counteract this is to project the data into a different representation. ...

Typically, Deep Neural Networks (DNNs) are not responsive to changing data. Novel classes will be incorrectly labelled as one of the classes the network was previously trained to recognise. Ideally, a DNN would be able to detect changing data and adapt rapidly with minimal true-labelled samples and without catastrophically forgetting previous classes. In the Online Class Incremental (OCI) field, research focuses on remembering all previously known classes. However, real-world systems are dynamic, and it is not essential to recall all classes forever. The Concept Evolution field studies the emergence of novel classes within a data stream. This paper aims to bring together these fields by analysing OCI Convolutional Neural Network (CNN) adaptation systems in a concept evolution setting by introducing novel classes in patterns. Our system, termed AdaDeepStream, offers a dynamic concept evolution detection and CNN adaptation system using minimal true-labelled samples. We apply activations from within the CNN to fast streaming machine learning techniques. We compare two activation reduction techniques. We conduct a comprehensive experimental study and compare our novel adaptation method with four other state-of-the-art CNN adaptation methods. Our entire system is also compared to two other novel class detection and CNN adaptation methods. The results of the experiments are analysed based on accuracy, speed of inference and speed of adaptation. On accuracy, AdaDeepStream outperforms the next best adaptation method by 27% and the next best combined novel class detection/CNN adaptation method by 24%. On speed, AdaDeepStream is among the fastest to process instances and adapt.

... It is worth noting that all numbers of attempts must be taken into account. However, the phenomenon known as the curse of dimensionality may happen due to the excessive number of dimensions [27], resulting in a significant increase in computational complexity. Principal Component Analysis (PCA) [28] involves the projection of the original data onto new principal components using a linear transformation. ...
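The linear projection PCA performs can be sketched in a few lines with an SVD on centered data. This is an illustrative example on synthetic data, not the cited authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples in 10 dimensions whose variance lies mostly
# in a 2-dimensional subspace, plus a little isotropic noise
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10)) \
    + 0.01 * rng.normal(size=(200, 10))

Xc = X - X.mean(axis=0)                # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                      # project onto the first 2 principal components

explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(Z.shape, round(float(explained), 4))
```

Because the synthetic data are nearly 2-dimensional, the first two components capture almost all of the variance, which is exactly the situation in which PCA defuses the dimensionality problem.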

Have you ever been helpless with huge data? Don’t know how to extract useful information from massive data? This article is designed to help you analyze your data step by step through the Wordle game. Wordle is a recently developed word game that offers instantaneous feedback for each attempt and attracts a substantial player base. The act of sharing daily play statistics on Twitter by a significant number of players generates a considerable volume of data. Therefore, a study is being conducted based on the aforementioned data. This paper centers on examining Wordle’s statistical metrics and formulating prognostications regarding forthcoming player figures through a variety of models, including ARIMA, KNN and neural networks. The Gradient Boosting Decision Tree algorithm was employed to forecast the number of attempts by leveraging the analysis of word frequency and letter repetition within words. The K-means and Gaussian mixture model were used for clustering the difficulty levels of all words. Subsequently, the difficulty level of the word ”EERIE” was predicted. These steps will provide guidance for conducting data analysis in future work. Our code is available at https://github.com/3249514357enenenyue/wordle

... However, without filtering the set of predictors, these models may overfit the data and provide poor out-of-sample forecasts. Furthermore, a large number of predictors can lead to the so-called "curse of dimensionality" (see Verleysen and François (2005)). For instance, a simple neural network with a single hidden layer consisting of k hidden nodes, where k equals the number of input variables, involves the training of k(k + 1) weights. ...
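The k(k + 1) figure follows from k·k input-to-hidden weights plus k hidden-to-output weights, assuming a single output unit and ignoring biases. A quick check:

```python
def hidden_layer_weights(k):
    # k inputs fully connected to k hidden nodes (k*k weights),
    # plus k hidden-to-output weights for a single output unit
    return k * k + k

for k in (10, 50, 100):
    assert hidden_layer_weights(k) == k * (k + 1)
    print(k, hidden_layer_weights(k))
# The weight count grows quadratically with the number of predictors
```

Doubling the number of predictors thus roughly quadruples the number of weights to train, which is one concrete face of the curse of dimensionality for neural forecasting models.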

The application of artificial neural networks to finance has recently received a great deal of attention from both investors and researchers, particularly as a forecasting tool. However, when dealing with a large number of predictors, these methods may overfit the data and provide poor out-of-sample forecasts. Our paper addresses this issue by employing two different approaches to predict realized volatility. On the one hand, we use a two-step procedure where several dimensionality reduction methods, such as Bayesian Model Averaging (BMA), Principal Component Analysis (PCA), and Least Absolute Shrinkage and Selection Operator (Lasso), are employed in the initial step to reduce dimensionality. The reduced samples are then combined with artificial neutral networks. On the other hand, we implement two single-step regularized neural networks that can shrink the input weights to zero and effectively handle high-dimensional data. Our findings on the volatility of different stock asset prices indicate that the reduced models outperform the compared models without regularization in terms of predictive accuracy.

... A higher number of dimensions theoretically allows more information to be stored, but this rarely helps in practice, as real data bring greater potential for noise and redundancy. Collecting a large number of features can lead to a dimensionality problem, in which very noisy dimensions carry little information and bring little benefit despite the large volume of data [23]. The explosive growth of the volume of the space is the primary cause of the curse of dimensionality. ...

Monitoring economic conditions in real-time or Nowcasting is among the most important tasks routinely performed by economists as it is important in describing the investment environment in any country. Nowcasting brings some key challenges that characterize modern frugal data analyses in developing countries, often referred to as the three (V)s. These include: the small number of continuously published time series (volume), the complexity of the data covering different sectors of the economy and being asynchronous with different frequency and accuracy to be published (variety), and the need to incorporate new information within months of its publication (velocity). In this article, we explored alternative ways to use Bayesian Mixed Frequency Vector Autoregressive (BMFVAR) models to address these challenges. The research found that BMFVAR can effectively handle the three (V)s and create real-time accurate probabilistic forecasts of the Syrian economic activity and, beyond that, a powerful narrative via scenario analysis.

... High-dimensional data (HDD) make it hard for any ML algorithm to identify a meaningful pattern and yield accurate results. They significantly decrease algorithm accuracy and increase the sparsity of the data [104,105]. This also leads to an increase in the volume of blind spots in the data (regions of feature space lacking any observation) [106]. ...

Owing to recent advancements in sensor technology, data mining, machine learning and cloud computation, structural health monitoring (SHM) based on a data-driven approach has gained more popularity and interest. The data-driven methodology has proved to be more efficient and robust compared with traditional physics-based methods. The past decade has witnessed remarkable progress in machine learning, especially in the field of deep learning, which is effective in many tasks and has achieved state-of-the-art results in various engineering domains. In the same manner, deep learning has also revolutionized SHM technology by improving the effectiveness and efficiency of models, as well as enhancing safety and reliability. To some extent, it has also paved the way for implementing SHM in real-world complex civil and mechanical infrastructures. However, despite all the success, deep learning has intrinsic limitations such as its massive labelled-data requirement, inability to generate consistent results and lack of generalizability to out-of-sample scenarios. Conversely, in SHM, the lack of data corresponding to different states of the structure is still a challenging task. Recent development in physics-informed machine learning methods has provided an opportunity to resolve these challenges, in which limited, noisy data and mathematical models are integrated through machine learning algorithms. This method automatically satisfies physical invariants, providing better accuracy and improved generalization. This manuscript presents a state-of-the-art review of prevailing machine learning methods for efficient damage inspection, discusses their limitations, and explains the diverse applications and benefits of physics-informed machine learning in the SHM setting. Moreover, the latest data extraction strategy and the internet of things (IoT) that support the present data-driven methods and SHM are also briefly discussed in the last section.

... Data with many complex dimensions suffer from the curse of dimensionality [27]. In this phenomenon, the amount of data required to support the results grows exponentially with dimensionality, thereby undermining the performance of data mining [28][29]. In other words, each additional dimension in the analysis increases the volume of the data space and, with it, the sparsity of reliable data. ...
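The exponential growth in required data is easy to quantify: partitioning each of d dimensions into b bins yields b^d cells, so a fixed-size sample occupies a vanishing fraction of the space as d grows. A toy illustration (sample size and bin count are arbitrary choices):

```python
# Cells in a grid over [0, 1]^d with 10 bins per axis, vs. a fixed sample size
samples = 10_000
bins_per_dim = 10
for d in (1, 2, 3, 6):
    cells = bins_per_dim ** d
    occupied_fraction = min(1.0, samples / cells)
    print(d, cells, occupied_fraction)
# At d = 6 there are 1,000,000 cells, so 10,000 points fill at most 1% of them
```

To keep the occupancy constant, the sample size would have to grow by a factor of b for every added dimension — the exponential requirement the excerpt describes.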

... Feature ranking and selection of 8 as well as 16 features in each training subset of each cohort and their three SRT variants was performed for further analysis [28]. These numbers were chosen to satisfy two requirements: on the one hand, to minimize the chance of overfitting while building predictive models, in line with the curse of dimensionality [10,29]; on the other hand, to ensure that the number of features encoded into quantum circuits is 2^N (N > 1), which results in an optimal number of qubits for the chosen classic-to-quantum encoding step and subsequent quantum machine learning (QML) [30]. ...

Background
Cancer is a leading cause of death worldwide. While routine diagnosis of cancer is performed mainly with biopsy sampling, it is suboptimal for accurately characterizing tumor heterogeneity. Positron emission tomography (PET)-driven radiomic research has demonstrated promising results when predicting clinical endpoints. This study aimed to investigate the added value of quantum machine learning, both in simulators and in real quantum computers utilizing error mitigation techniques, to predict clinical endpoints in various PET cancer patients.
Methods
Previously published PET radiomics datasets including 11C-MET PET glioma, 68GA-PSMA-11 PET prostate and lung 18F-FDG PET with 3-year survival, low-vs-high Gleason risk and 2-year survival as clinical endpoints respectively were utilized in this study. Redundancy reduction with 0.7, 0.8, and 0.9 Spearman rank thresholds (SRT), followed by selecting 8 and 16 features from all cohorts, was performed, resulting in 18 dataset variants. Quantum advantage was estimated by the Geometric Difference (GDQ) score in each dataset variant. Five classic machine learning (CML) methods and their quantum versions (QML) were trained and tested in simulator environments across the dataset variants. Quantum circuit optimization and error mitigation were performed, followed by training and testing selected QML methods on the 21-qubit IonQ Aria quantum computer. Predictive performances were estimated by test balanced accuracy (BACC) values.
Results
On average, QML outperformed CML in simulator environments with 16 features (BACC 70% and 69%, respectively), while with 8 features, CML outperformed QML by +1%. The highest average QML advantage was +4%. The GDQ scores were ≤ 1.0 in all the 8-feature cases, while they were > 1.0 in 9 out of the 11 cases in which QML outperformed CML. The test BACC of the selected QML methods and datasets on the IonQ device without error mitigation (EM) was 69.94%, while EM increased the test BACC to 75.66% (76.77% in noiseless simulators).
Conclusions
We demonstrated that with error mitigation, quantum advantage can be achieved in real existing quantum computers when predicting clinical endpoints in clinically relevant PET cancer cohorts. Quantum advantage can already be achieved in simulator environments in these cohorts when relying on QML.
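Balanced accuracy (BACC), the performance measure used above, is the mean of per-class recalls, which keeps imbalanced cohorts from inflating the score. A minimal sketch of the metric itself, not of the study's pipeline:

```python
def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls (sensitivity and specificity for two classes)
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced example: 8 negatives, 2 positives; the model catches 1 of 2 positives
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
print(balanced_accuracy(y_true, y_pred))  # (8/8 + 1/2) / 2 = 0.75
```

Plain accuracy on the same predictions would be 90%, so BACC is the more honest measure when, as in survival cohorts, one class dominates.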

... 1. Perpetual Data Sparsity. As each new "feature" (gene, biomarker, etc.) is identified, this adds dimensionality to the characterization of biological systems and this process carries with it a cost: the Curse of Dimensionality (Verleysen and François, 2005). The Curse of Dimensionality means that with each additional feature used to describe a system, there is an exponential increase in the potential configurations those features can take relative to each other (combinatorial explosion) and, similarly, the amount of data/sample points needed to characterize those potential combinations; this leads to a state of perpetual data sparsity. ...
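The sparsity described above goes hand in hand with the concentration-of-norm phenomenon central to the cited paper: in high dimension, the nearest and farthest points from a reference become nearly equidistant. A small simulation sketch (synthetic uniform data; the dimensions and sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

def relative_contrast(dim, n=1000):
    # Relative spread of distances from the origin to uniform points in [0, 1]^dim
    d = np.linalg.norm(rng.uniform(size=(n, dim)), axis=1)
    return (d.max() - d.min()) / d.min()

low_d, high_d = relative_contrast(2), relative_contrast(1000)
print(round(low_d, 2), round(high_d, 2))
# In high dimension the contrast collapses: all distances look alike
```

When the contrast between distances vanishes, neighborhood-based characterizations of a system become uninformative, which is the practical face of perpetual data sparsity.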

The use of synthetic data is recognized as a crucial step in the development of neural network-based Artificial Intelligence (AI) systems. While the methods for generating synthetic data for AI applications in other domains have a role in certain biomedical AI systems, primarily related to image processing, there is a critical gap in the generation of time series data for AI tasks where it is necessary to know how the system works. This is most pronounced in the ability to generate synthetic multi-dimensional molecular time series data (subsequently referred to as synthetic mediator trajectories or SMTs); this is the type of data that underpins research into biomarkers and mediator signatures for forecasting various diseases and is an essential component of the drug development pipeline. We argue the insufficiency of statistical and data-centric machine learning (ML) means of generating this type of synthetic data is due to a combination of factors: perpetual data sparsity due to the Curse of Dimensionality, the inapplicability of the Central Limit Theorem in terms of making assumptions about the statistical distributions of this type of data, and the inability to use ab initio simulations due to the state of perpetual epistemic incompleteness in cellular/molecular biology. Alternatively, we present a rationale for using complex multi-scale mechanism-based simulation models, constructed and operated on to account for perpetual epistemic incompleteness and the need to provide maximal expansiveness in concordance with the Maximal Entropy Principle. These procedures provide for the generation of SMT that minimizes the known shortcomings associated with neural network AI systems, namely overfitting and lack of generalizability. 
The generation of synthetic data that accounts for the identified factors of multi-dimensional time series data is an essential capability for the development of mediator-biomarker based AI forecasting systems, and therapeutic control development and optimization.

... Figure 2 illustrates how the data frame was transformed for time series clustering purposes. We considered just the first 60 days of the second wave because the performance of K-Means depends on the number of features, and it cannot perform well when the number of features is increased [13]. Moreover, the minimum duration for the second COVID-19 wave was 65 days in Poland. ...

... Since we wanted to cluster time series data, dynamic time warping (DTW) distance [13] was used for calculating the distance among points and cluster centroids instead of simple Euclidean distance in the body of the K-Means algorithm. Euclidean distance ignores the time dimension of the data and cannot account for time shifts, whereas dynamic time warping can handle these features [13]. The time series clustering was coded in Python 3.8, and the K-Means algorithm from Python's Scikit-learn library was used. ...
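A DTW distance of the kind cited here can be written as a short dynamic program. This is a generic textbook sketch, not the authors' implementation (Scikit-learn's K-Means does not natively accept custom distances, so their integration is assumed to be custom):

```python
def dtw_distance(a, b):
    # Classic O(len(a) * len(b)) dynamic-programming DTW
    # with absolute difference as the local cost
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A time-shifted copy of a series: pointwise Euclidean distance penalizes the
# shift, while DTW aligns the two shapes
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]   # same shape, shifted by one step
print(dtw_distance(x, y))   # 0.0: DTW warps the time axis to match the peaks
```

This elasticity along the time axis is exactly why DTW is preferred over Euclidean distance for clustering epidemic curves whose waves start on different dates.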

During the first year of the COVID-19 pandemic, governments only had access to non-pharmaceutical interventions (NPIs) to mitigate the spread of the disease. Various methods have been discussed in the literature for calculating the effectiveness of NPIs. Among these methods, the interrupted time series analysis method is the area of our interest. To study the second wave, we clustered countries based on levels of implemented NPIs, except for the target NPI (X) whose effectiveness was to be evaluated. To do so, the COVID-19 Policy Response Tracker dataset gathered by the “Our World in Data” team of Oxford University, and COVID-19 statistical data gathered by Johns Hopkins University, were used. After clustering, we selected a counterfactual country from the countries that were in the same cluster as the target country and had implemented NPI (X) at its lowest level. Thus, the target country and the counterfactual country were similar in the implementation level of other NPIs and differed only in the implementation level of the target NPI (X). Therefore, we can calculate the effectiveness of NPI (X) without being concerned about the impurity of the effectiveness values that might be caused by other NPIs. This allowed us to calculate the effectiveness of NPI (X) using the interrupted time series analysis with a control group. Interrupted time series analysis assesses the effect of different policy-implementation levels by evaluating interruptions caused by policies in trend and level after the policy-implementation date. Before the NPI-implementation date, the implementation levels of NPIs were similar in both selected countries. After this date, the counterfactual country could be treated as a baseline for calculating changes in the trends and levels of COVID-19 cases in the target country.
To demonstrate this approach, we used the generalized least square (GLS) method to estimate interrupted time series parameters related to the effectiveness of school closure (the target NPI) in Spain (the target country). The results show that increasing the implementation level of school closure caused a 34% decrease in COVID-19 prevalence in Spain after only 10 days compared to the counterfactual country.
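The interrupted time series model described here reduces to a segmented regression with level-change and slope-change terms at the intervention date. A simplified OLS sketch on synthetic data (the study used GLS; all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T, t0 = 60, 30                          # 60 days, intervention at day 30
t = np.arange(T)
post = (t >= t0).astype(float)          # indicator: 1 after the intervention
# Synthetic outcome: baseline level 20, trend 0.5/day, level drop of 8 at t0
y = 20 + 0.5 * t - 8.0 * post + rng.normal(scale=0.5, size=T)

# Design matrix: intercept, pre-existing trend, level change, slope change
X = np.column_stack([np.ones(T), t, post, post * (t - t0)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))                # beta[2] estimates the level change
```

The coefficient on the post-intervention indicator recovers the level change, which is the quantity the authors compare between the target and counterfactual countries; GLS additionally models the autocorrelated error structure that daily case counts exhibit.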