Pattern Classification
... Twelve different classifiers were used in this study: linear models, including logistic regression with an ℓ2 (ridge) penalty (LRR) [17], support vector machines with a linear kernel (SVM) [18], linear discriminant analysis (LDA) [19], and the perceptron (PER) [20]; Gaussian naïve Bayes (GNB) [20]; ensemble methods, including RF [21], XGB [22], LightGBM (LGB) [23], gradient boosting with regression trees (GBRT) [24], and AdaBoost with decision trees (ADB) [25]; KNN [20]; and quadratic discriminant analysis (QDA) [19]. The selection of these classifiers is discussed in detail in the Materials and Methods section. ...
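To make the model roster concrete, the sketch below instantiates the twelve classifier families listed above with scikit-learn, XGBoost, and LightGBM; the hyperparameter values shown are illustrative placeholders, not the settings used in the study.

```python
# Hedged sketch: one instance per classifier family named above (hypothetical settings).
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.svm import LinearSVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

classifiers = {
    "LRR": LogisticRegression(penalty="l2", max_iter=1000),  # ridge-penalized logistic regression
    "SVM": LinearSVC(),                                      # linear-kernel SVM
    "LDA": LinearDiscriminantAnalysis(),
    "PER": Perceptron(),
    "GNB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=200),
    "XGB": XGBClassifier(),
    "LGB": LGBMClassifier(),
    "GBRT": GradientBoostingClassifier(),
    "ADB": AdaBoostClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "QDA": QuadraticDiscriminantAnalysis(),
}
```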
Background and objective: Hepatitis B virus (HBV) and hepatitis C virus (HCV) are major contributors to chronic viral hepatitis (CVH), which is a significant cause of mortality worldwide. This study aims to predict one-year mortality in patients with CVH using their demographics and health records.
Methods: Clinical data from 82,700 CVH patients diagnosed with HBV or HCV between January 2014 and December 2019 were analyzed. We developed a machine learning (ML) platform based on six broad model categories, including linear, nearest-neighbor, discriminant analysis, support vector machine, naïve Bayes, and ensemble (gradient boosting, AdaBoost, and random forest) models, to predict one-year mortality. Feature importance analysis was performed by computing SHapley Additive exPlanations (SHAP) values.
Results: The models achieved an area under the curve between 0.74 and 0.8 on independent test sets. Key predictors of mortality were age, sex, hepatitis type, and ethnicity.
Conclusion: ML with administrative health data can be utilized to accurately predict one-year mortality in CVH patients. Future integration with detailed laboratory and medical history data could further enhance model performance.
... For the multispectral analysis using neural networks, the inputs are associated with the vector x = (x_1, x_2, x_3)^T, where x_i = f_i(u) for 1 ≤ i ≤ 3. The network outputs represent the classes of interest and are associated with the vector y = (y_1, y_2, y_3)^T, where each output corresponds to the class with the same index. The decision criterion employed in this analysis is the Bayes criterion: the output with the greatest value indicates the most probable class [39,40,41]. The training set and the test set were built using specialist knowledge in the selection of the regions of interest [42]. ...
... The confusion matrix for a universe of classes of interest Ω = {C_1, C_2, ..., C_m} is an m×m matrix T = [t_{i,j}], where each element t_{i,j} represents the number of objects belonging to class C_j but classified as C_i [39,44]. ...
... The overall accuracy φ is the ratio between the number of objects correctly classified and the total number of objects, defined as follows [39,44]: ...
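A minimal sketch of both definitions, assuming integer class labels 0..m-1 and the convention t_{i,j} = number of objects of class C_j classified as C_i:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, m):
    """T[i, j] = number of objects belonging to class j but classified as i."""
    T = np.zeros((m, m), dtype=int)
    for true_c, pred_c in zip(y_true, y_pred):
        T[pred_c, true_c] += 1
    return T

def overall_accuracy(T):
    """phi = correctly classified objects (diagonal) divided by the total number of objects."""
    return np.trace(T) / T.sum()

# Toy usage with three classes (indices 0..2)
y_true = [0, 0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 2, 0, 1]
T = confusion_matrix(y_true, y_pred, m=3)
print(T, overall_accuracy(T))  # phi = 4/6
```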
Multispectral image analysis is a relatively promising field of research with applications in several areas, such as medical imaging and satellite monitoring. A considerable number of current methods of analysis are based on parametric statistics. Alternatively, some methods in Computational Intelligence are inspired by biology and other sciences. Here we claim that Philosophy can also be considered a source of inspiration. This work proposes the Objective Dialectical Method (ODM): a method for classification based on the Philosophy of Praxis. ODM is instrumental in assembling evolvable mathematical tools to analyze multispectral images. In the case study described in this paper, multispectral images are composed of diffusion-weighted (DW) magnetic resonance (MR) images. The results are compared to ground-truth images produced by polynomial networks using a morphological similarity index. The classification results are used to improve the usual analysis of the apparent diffusion coefficient map. Such results demonstrated that gray and white matter can be distinguished in DW-MR multispectral analysis and, consequently, that DW-MR images can also be used to furnish anatomical information.
... Next, nodes are assigned probabilities in order to distinguish nodes with common and rare features. The required probability density function (PDF) is obtained by smoothing over points in the two-dimensional PCA plane (Parzen window approach [46; 47, Chapter 4.3]). Now, the least probable nodes, i.e., those with uncommon features, can be identified from the PDF. ...
... In step 3 of the workflow (Fig. 1), the Parzen window approach is used to estimate a probability density function (PDF) over all nodes [46; 47, Chapter 4.3]. This is achieved by smoothing the overall arrangement of reduced feature vectors, which were obtained using principal component analysis (PCA) [45, Chapter 8] in the previous step 2. The dimensions of the smoothing kernel, i.e., the width and breadth of the Gaussian function N_2(µ, Σ), can be controlled through its covariance matrix Σ = (σ_{ij}). ...
... Step 3: Estimate each node's probability using the Parzen window approach (PDF) [46,47,Chapter 4.3]. ...
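A compact sketch of steps 2-3 under simplifying assumptions: synthetic node feature vectors stand in for the real network features, PCA reduces them to a plane, and scikit-learn's KernelDensity plays the role of the Parzen window with a single isotropic bandwidth rather than a full covariance matrix Σ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

# Hypothetical node feature vectors (one row per node)
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 10))

# Step 2: reduce feature vectors to the two-dimensional PCA plane
coords = PCA(n_components=2).fit_transform(features)

# Step 3: Parzen-window estimate of the PDF over all nodes; the kernel bandwidth
# controls the smoothing width (a scalar stand-in for the covariance Sigma)
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(coords)
log_density = kde.score_samples(coords)

# Least probable nodes, i.e. those with uncommon features
outlier_nodes = np.argsort(log_density)[:5]
```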
Complex networks have been characterised by their specific connectivity patterns (network motifs), but their building blocks can also be identified and described by node-motifs---a combination of local network features. One technique to identify single node-motifs has been presented by Costa et al. (L. D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett., 87, 1, 2009). Here, we first suggest improvements to the method including how its parameters can be determined automatically. Such automatic routines make high-throughput studies of many networks feasible. Second, the new routines are validated in different network-series. Third, we provide an example of how the method can be used to analyse network time-series. In conclusion, we provide a robust method for systematically discovering and classifying characteristic nodes of a network. In contrast to classical motif analysis, our approach can identify individual components (here: nodes) that are specific to a network. Such special nodes, as hubs before, might be found to play critical roles in real-world networks.
... It is used, for instance, in learning Bayesian networks [CL68,Pea88,Bun96,Hec98], to connect stochastically dependent nodes; it is used to infer classification trees [Qui93]. It is also used to select features for classification problems [DHS01], i.e. to select a subset of variables by which to predict the class variable. This is done in the context of a filter approach that discards irrelevant features on the basis of low values of mutual information with the class [Lew92,BL97,CHH+02]. ...
... In the following we use classification as a thread to illustrate the above scenarios. Classification is one of the most important techniques for knowledge discovery in databases [DHS01]. A classifier is an algorithm that allocates new objects to one out of a finite set of previously defined groups (or classes) on the basis of observations on several characteristics of the objects, called attributes or features. ...
... It is well known that model complexity must be in proper balance with the available data in order to achieve good classification accuracy. In fact, unjustified complexity of inferred models leads classifiers almost inevitably to overfitting, i.e., to memorizing the available sample rather than extracting from it the regularities needed to make useful predictions on new data [DHS01]. Overfitting could be avoided by using the distribution of mutual information. ...
Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(1/n^3), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.
... Both such activities can be understood as directed to the topological characterization of the studied structures. Another related application is to use the obtained measurements in order to identify different categories of structures, which is directly related to the area of pattern recognition [39,40]. Even when modeling networks, it is often necessary to compare the realizations of the model with real networks, which can be done in terms of the respective measurements. ...
... Therefore, any objective attempt at characterizing, comparing or classifying complex networks needs to take into account distributions in phase spaces such as that in Figure 20. Such an important task can be effectively accomplished by using traditional and well-established concepts and methods from Multivariate Statistics (e.g., [39,40]; McLachlan, 2004) and Pattern Recognition (e.g., [39,167,40]). As far as the choice and interpretation of network measurements are concerned, two multivariate methods stand out as being particularly useful, namely Principal Component Analysis (PCA) (e.g. ...
Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of measurements capable of expressing the most relevant topological features. This article presents a survey of such measurements. It includes general considerations about complex network characterization, a brief review of the principal models, and the presentation of the main existing measurements. Important related issues covered in this work comprise the representation of the evolution of complex networks in terms of trajectories in several measurement spaces, the analysis of the correlations between some of the most traditional measurements, perturbation analysis, as well as the use of multivariate statistics for feature selection and network classification. Depending on the network and the analysis task one has in mind, a specific set of features may be chosen. It is hoped that the present survey will help the proper application and interpretation of measurements.
... Using training data, the NB classifier estimates the probabilities of variable values within a class and applies these probabilities to classify new entities (Duda et al., 2012; Han et al., 2011; Witten & Frank, 2009). It relies on Bayes' theorem and assumes independence between features within a class (Han et al., 2011; Larkey et al., 2002; Silva & Ribeiro, 2003). ...
... Duda et al., 2012; Han et al., 2011; Witten & Frank, 2009). SVMs excel in learning tasks due to their fast algorithm and proven effectiveness. ...
... SVMs excel in learning tasks due to their fast algorithm and proven effectiveness. In SVM, examples are represented as points in space, separated by a substantial gap, ensuring instances from different categories are distinctly classified based on their position relative to the gap (Duda et al., 2012; Han et al., 2011; Witten & Frank, 2009). ...
One significant challenge in sentiment analysis is the presence of negation, which reverses the meanings of sentences, transforming positive statements into negative ones and impacting the sentiment conveyed in the text. This issue is particularly pronounced in Arabic, a language known for its complex morphology. Detecting negation is crucial for enhancing sentiment analysis performance and various natural language processing applications. This paper presents an approach for automatically detecting negation in user-generated Arabic hotel reviews through lexical and structural features. It comprises several stages: data collection, text pre-processing, feature extraction, supervised learning classification, and evaluation. The study employed multiple supervised classification techniques, including naïve Bayes, random forest, logistic regression, support vector machines, and deep learning, to analyse lexical and structural features extracted from the dataset. The results of the experiments yielded promising outcomes, demonstrating the feasibility of the approach for practical applications. The classifiers exhibited highly comparable performance in identifying negation, with only marginal deviations in their performance metrics. Notably, the deep learning classifier consistently emerged as the top performer, achieving an exceptionally high overall accuracy rate of 99.24 percent, surpassing established benchmarks in Arabic text processing and underscoring its potential for practical applications. These findings hold significant implications for advancing Arabic text processing, particularly in sentiment analysis and related NLP tasks. The high accuracy of 99.24 percent achieved by the deep learning classifier highlights its robustness in accurately detecting negation, a critical challenge in sentiment analysis. This classifier performance demonstrates the potential to be integrated into real-world applications, such as automated review systems and opinion mining tools, where accurate sentiment interpretation is essential.
... It can learn a low-dimensional representation of data from high-dimensional data. Many classical matrix factorization methods have been proposed, including non-negative matrix factorization [1,2], singular value decomposition (SVD) [3], principal component analysis (PCA) [4], and concept factorization [5]. ...
... The computational complexity of the proposed method is O(nmd), which is smaller than the O(nmd + m^2 d + m^3) of other bipartite graph-based methods. ...
... Matrix factorization; SMF: Symmetric matrix factorization; LMF: Low-rank matrix factorization; FGLMF: Fast global and local matrix factorization; MCC [17]: Maximum correntropy criterion; NMF [1,2]: Non-negative matrix factorization; SVD [3]: Singular value decomposition; PCA [4]: Principal component analysis; CF [5]: Concept factorization; GNMF [7]: Graph-regularized non-negative matrix factorization; LCCF [8]: Locally consistent concept factorization; NMF-LCAG [9]: Non-negative matrix factorization with locality-constrained adaptive graph; GCCF [10]: Correntropy-based graph-regularized concept factorization; CHNMF [11]: Correntropy-based hypergraph-regularized non-negative matrix factorization; CNMF [12]: Constrained non-negative matrix factorization; CCF [13]: Constrained concept factorization; NMFCC [14]: Non-negative matrix factorization-based constrained clustering; CSNMF [15]: Correntropy-based semi-supervised non-negative matrix factorization; CSCF [16]: Correntropy-based semi-supervised concept factorization; CLMF [17]: Correntropy-based low-rank matrix factorization; SIS [17]: Sparsity-induced similarity; SNMFCC [14]: Symmetric non-negative matrix factorization-based constrained clustering; PCPSNMF [18]: Pairwise constraint propagation-induced symmetric non-negative matrix factorization; HSSNMF [19]: Hypergraph-based semi-supervised symmetric non-negative matrix factorization; LAE [20]: Local anchor embedding; EAGR [21]: Efficient anchor graph regularization; FLAE [21]: Fast local anchor embedding; BGSSL [22]: Bipartite graph-based semi-supervised learning; MURs ...
Matrix factorization has demonstrated outstanding performance in machine learning. Recently, graph-based matrix factorization has gained widespread attention. However, graph-based methods are only suitable for handling small amounts of data. This paper proposes a fast semi-supervised learning method using only matrix factorization, which considers both global and local information. By introducing bipartite graphs into symmetric matrix factorization, the technique can handle large datasets effectively. It is worth noting that by utilizing tag information, the proposed symmetric matrix factorization becomes convex and unconstrained, i.e., the non-convex problem min_x (1 − x^2)^2 is transformed into a convex problem. This allows it to be optimized quickly using state-of-the-art unconstrained optimization algorithms. The computational complexity of the proposed method is O(nmd), which is much lower than that of the original symmetric matrix factorization, O(n^2 d), and even lower than that of other anchor-based methods, O(nmd + m^2 n + m^3), where n represents the number of samples, d represents the number of features, and m ≪ n represents the number of anchors. The experimental results on multiple public datasets indicate that the proposed method achieves higher performance in less time.
... Eight classifiers were used: Gaussian Naïve Bayes (GNB) [11]; K-Nearest Neighbors (KNN) [11]; Logistic Regression with L2 regularization (LR) [12]; Random Forest (RF) [13]; AdaBoost (ADB) [14]; Gradient Boost (GB) [15]; eXtreme Gradient Boost (XGB) [16]; and Linear Discriminant Analysis (LDA) [17]. The rationale behind selecting these predictive models is as follows: (i) the selected models represent five well-known groups: ensemble, Gaussian process, nearest neighbor, linear models, and discriminant analysis, and (ii) these models have been widely utilized in previous studies to predict ICU admission [18-20] and COVID-19 severity risk [5,8,9]. ...
Background: The rapid onset of COVID-19 placed immense strain on many already overstretched healthcare systems. The unique physiological changes in pregnancy, amplified by the complex effects of COVID-19 in pregnant women, rendered prioritization of infected expectant mothers more challenging. This work aims to use state-of-the-art machine learning techniques to predict whether a COVID-19-infected pregnant woman will be admitted to the ICU (Intensive Care Unit). Methods: A retrospective study was conducted using data from COVID-19-infected women admitted to one hospital in Astana and one in Shymkent, Kazakhstan, from May to July 2021. The developed machine learning platform implements and compares the performance of eight binary classifiers, including Gaussian naïve Bayes, K-nearest neighbors, logistic regression with L2 regularization, random forest, AdaBoost, gradient boosting, eXtreme gradient boosting, and linear discriminant analysis. Results: Data from 1292 pregnant women with COVID-19 were analyzed. Of them, 10.4% were admitted to the ICU. Logistic regression with L2 regularization achieved the highest F1-score during the model selection phase, while achieving an AUC of 0.84 on the test set during the evaluation stage. Furthermore, the feature importance analysis conducted by calculating Shapley Additive Explanation values points to leucocyte counts, C-reactive protein, pregnancy week, eGFR, and hemoglobin as the most important features for predicting ICU admission. Conclusions: The predictive model obtained here may be an efficient support tool for prioritizing the care of COVID-19-infected pregnant women in clinical practice.
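A hedged sketch of the SHAP-based feature-importance step on synthetic data (not the authors' code or dataset); the ridge-penalized logistic regression mirrors the best-performing model reported above, and the shap package's generic Explainer is assumed to be available.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical feature matrix and ICU-admission labels
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(penalty="l2", max_iter=1000).fit(X_train, y_train)
explainer = shap.Explainer(model, X_train)   # background data for the explainer
shap_values = explainer(X_test)              # per-sample, per-feature attributions
shap.plots.beeswarm(shap_values)             # global feature-importance view
```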
... A central task in machine learning is feature extraction [2]-[4] as, e.g., in the context of handwritten digit classification [5]. The features to be extracted in this case correspond, for example, to the edges of the digits. ...
... In Appendix A, we give a brief review of the general theory of semi-discrete frames, and in Appendices B and C we collect structured example frames in 1-D and 2-D, respectively. We emphasize that the feature vector Φ_W(f) is a union of the sets of feature vectors Φ_W^n(f). The architecture corresponding to the feature extractor Φ_W in (1), illustrated in Fig. 2, is known as a scattering network [22]; it employs the frame Ψ_{Λ_W} and the modulus non-linearity |·| in every network layer, but does not include pooling. ...
Deep convolutional neural networks have led to breakthrough results in numerous practical machine learning tasks such as classification of images in the ImageNet data set, control-policy-learning to play Atari games or the board game Go, and image captioning. Many of these applications first perform feature extraction and then feed the results thereof into a trainable classifier. The mathematical analysis of deep convolutional neural networks for feature extraction was initiated by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on a wavelet transform followed by the modulus non-linearity in each network layer, and proved translation invariance (asymptotically in the wavelet scale parameter) and deformation stability of the corresponding feature extractor. This paper complements Mallat's results by developing a theory that encompasses general convolutional transforms, or in more technical parlance, general semi-discrete frames (including Weyl-Heisenberg filters, curvelets, shearlets, ridgelets, wavelets, and learned filters), general Lipschitz-continuous non-linearities (e.g., rectified linear units, shifted logistic sigmoids, hyperbolic tangents, and modulus functions), and general Lipschitz-continuous pooling operators emulating, e.g., sub-sampling and averaging. In addition, all of these elements can be different in different network layers. For the resulting feature extractor we prove a translation invariance result of vertical nature in the sense of the features becoming progressively more translation-invariant with increasing network depth, and we establish deformation sensitivity bounds that apply to signal classes such as, e.g., band-limited functions, cartoon functions, and Lipschitz functions.
... In order to investigate the effect of using successive shortest distances from each point, conformity ratios and coefficients of variation were obtained considering the second to the fifth shortest distances. Discriminant analysis [7,10,19] was then applied in order to identify the contribution of each of these measurements to the overall separation between the two types of mosaics. The accompanying comparison of the measurements (linearity; sensitivity; discriminative power) reads:
Number of sides: very small; very low; very low
Coefficient of variation: high; almost constant; higher for small perturbations
Conformity ratio: small; higher for small perturbations; higher for large perturbations
Shortest distance: high; almost constant; higher for large perturbations
Sum of angle differences: high; almost constant; higher for small perturbations
Hexagonality index: medium; slightly higher for small perturbations; high and almost constant
...
... In order to obtain the results in Figure 9, the above measurements were obtained for each point and Bayesian decision theory (e.g. [6,7]) was then applied in order to obtain the best threshold for separating the two groups. The points identified as belonging to the higher-order group are represented as the squares with holes in Figure 9. Two particularly interesting conclusions can be drawn from the obtained results: (i) the use of successive distances tends to reduce the correct identification of the regions, and (ii) the hexagonality index provided the best overall classification of points. ...
Several systems involve spatial arrangements of elements such as molecules or cells, the characterization of which bears important implications for biological and physical investigations. Traditional approaches to quantify spatial order and regularity have relied on nearest-neighbor distances or the number of sides of cells. The current work shows that superior features can be achieved by considering angular regularity. Voronoi tessellations are obtained for each basic element and the angular regularity is then estimated from the differences between the angles defined by adjacent cells and a reference angle. In case this angle is 60 degrees, the measurement quantifies the hexagonality of the system. Other reference angles can be considered in order to quantify other types of spatial symmetries. The performance of the angular regularity is compared with other measurements, including the conformity ratio (based on nearest-neighbor distances) and the number of sides of the cells, confirming its improved sensitivity and discriminative power. The superior performance of the hexagonality measurement is also illustrated with respect to a real application concerning the characterization of retinal mosaics.
... It is also possible to assemble supervised classification methods based on the training set and on the optimization of functions derived from the confusion matrix, such as the overall accuracy rate and the κ index of statistical correlation [49,50]. ...
... where each class c_j is associated with a weight vector w_j and a discriminant function g_j(x), with 1 ≤ j ≤ m, the following rule, inspired by the Bayes decision criterion, is applied [49,50]: ...
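The elided rule is presumably the standard maximum-discriminant decision; below is a minimal sketch under that assumption, with a dot product as a placeholder for the discriminant g_j(x).

```python
import numpy as np

def classify(x, W, g):
    """Assign x to the class c_j whose discriminant g_j(x) = g(w_j, x) is largest
    (Bayes-inspired rule: pick the highest-scoring, i.e. most probable, class)."""
    scores = [g(w_j, x) for w_j in W]
    return int(np.argmax(scores))

# Toy usage with linear discriminants g_j(x) = w_j . x
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # one weight vector per class
x = np.array([0.2, 0.9])
print(classify(x, W, g=np.dot))   # -> class index 1
```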
Unsupervised classification has a very important role in the analysis of multispectral images, given its ability to assist the extraction of a priori knowledge from images. Algorithms like k-means and fuzzy c-means have long been used for this task. Computational Intelligence has proven to be an important field for building classifiers optimized according to the quality of the grouping of classes and the evaluation of the quality of vector quantization. Several studies have shown that Philosophy, especially the Dialectical Method, has served as an important inspiration for the construction of new computational methods. This paper presents an evaluation of four methods based on Dialectics: the Objective Dialectical Classifier and the Dialectical Optimization Method adapted to build a version of k-means with optimal quality indices; each of them is presented in two versions: a canonical version and another version obtained by applying the Principle of Maximum Entropy. These methods were compared to k-means, fuzzy c-means and Kohonen's self-organizing maps. The results showed that the methods based on Dialectics are robust to noise, and their quantization can achieve results as good as those obtained with the Kohonen map, considered an optimal quantizer.
... BACKGROUND: We revisit the language model developed in Refs. [12,13,24] with a slightly modified notation. Then we go over the k-means clustering algorithm [25-27]. Finally, we adapt k-means to the language domain and use it to identify language subcommunities. ...
... For a given cluster count K and a distance metric defined on the set of observations, the k-means clustering algorithm tries to find a partition with K clusters such that the average within-cluster distance is optimized [25-27]. In this heuristic algorithm, one can find the best value of the parameter K by trial and error. ...
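A plain NumPy sketch of the algorithm described here, assuming Euclidean distance on numeric observation vectors (the paper's own distance metric on language observations would be substituted):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain k-means: alternately assign points to the nearest centroid and
    recompute centroids, reducing the average within-cluster distance."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                  else centroids[k] for k in range(K)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy usage: the best K would be chosen by trial and error, as noted above
labels, centroids = kmeans(np.random.default_rng(1).normal(size=(100, 2)), K=3)
```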
An evolutionary model for the emergence of diversity in language is developed. We investigated the effects of two real-life observations: people prefer people they communicate well with, and people interact with people who are physically close to them. Clearly, these groups are relatively small compared to the entire population. We restrict the selection of teachers to such small groups, called imitation sets, around the parents. The child then learns language from a teacher selected within the imitation set of her parent. As a result, subcommunities with their own languages develop. Within-subcommunity comprehension is found to be high. The number of languages is related to the relative size of the imitation set by a power law.
... Classification is a well-known task within the area of machine learning [29]. The main objective of a classifier is to find a way to predict the label to be assigned to new data patterns. ...
... During classification (lines 11-12) the function classification is used (lines 24-35). The class of the test instances given as the first parameter (TestData) is predicted. ...
High dimensionality, i.e. data having a large number of variables, tends to be a challenge for most machine learning tasks, including classification. A classifier usually builds a model representing how a set of inputs explains the outputs. The larger the set of inputs and/or outputs, the more complex that model becomes. There is a family of classification algorithms, known as lazy learning methods, which do not build a model. One of the best-known members of this family is the kNN algorithm. Its strategy relies on searching for a set of nearest neighbors, using the input variables as position vectors and computing distances among them. These distances lose significance in high-dimensional spaces. Therefore kNN, like many other classifiers, tends to perform worse as the number of input variables grows. In this work AEkNN, a new kNN-based algorithm with built-in dimensionality reduction, is presented. Aiming to obtain a new representation of the data, with lower dimensionality but more informative features, AEkNN internally uses autoencoders. From these new feature vectors the computed distances should be more significant, thus providing a way to choose better neighbors. An experimental evaluation of the new proposal is conducted, analyzing several configurations and comparing them against the classical kNN algorithm. The obtained conclusions demonstrate that AEkNN offers better results in predictive and runtime performance.
... Training and inference (test) phases together form the classification procedure. In this study, we adopted a fivefold cross-validation approach during the training phase (Duda and Hart 2001). This strategy randomly divided the EEG signals into five equal portions. ...
... Furthermore, the EEG method allows researchers to explore the effect genetic polymorphisms have on human brain activity (Duda and Hart 2001). ...
Epilepsy is a life-threatening neurological brain disorder that affects the behavior and lifestyle of many people worldwide. Neurologists commonly use an electroencephalogram (EEG) to manually interpret the brain's electrical activity. Patients respond differently to drugs: effective doses may be ineffective in some patients or cause adverse drug reactions in others, and genetic factors seem to be involved in these variable responses in some cases. On the other hand, performing genetic tests to determine the genotype of patients is usually invasive, expensive, and time-consuming. Estimating a patient's genotype from data such as the information obtained from EEG would therefore be a significant achievement. Genes involved in the functioning of the endocannabinoid system are known to be critical to the physiological function of the brain and nerves, and defects in the activity of this system have been confirmed in the pathobiology of diseases such as epilepsy. Levels of endocannabinoids can be influenced by gene polymorphisms such as the fatty acid amide hydrolase (FAAH) gene single nucleotide polymorphism. Because FAAH controls endocannabinoid levels by hydrolyzing anandamide and terminating its activity within the central nervous system (CNS), and because seizures originate as electrical storms of brain neurons, it is thought that there may be a relation between the FAAH enzyme and brain oscillations. Identifying genotypes that are related to EEG variation in epilepsy is crucial for clinical epilepsy monitoring and controlling brain oscillations. In this paper, we show that the FAAH rs2295633 polymorphism can be detected (classified) from EEG signals (a common diagnostic tool) using a convolutional neural network (CNN) classification method. The multichannel EEG time-series data, collected through a sliding-window technique, were given as input to a deep CNN model to find a probable relationship between EEG and the FAAH CC, CT, and TT genotype classes. The proposed method reached a precision of 94.15% (± 0.38%), an accuracy of 94.09% (± 0.41%), a sensitivity of 94.14% (± 0.38%), a specificity of 97.04% (± 0.20%), and an F1 score of 94.11% (± 0.40%) in detecting the rs2295633 polymorphism based on EEG patterns in epilepsy. Therefore, we can conclude that the FAAH rs2295633 polymorphism may modulate brain activity and EEG patterns. However, more extensive multifactorial studies are necessary to show the precise association between the rs2295633 polymorphism and epileptic EEG.
... A branch of artificial intelligence (AI) called machine learning (ML) enables algorithms to learn from previous datasets (training data) by utilizing statistical, probabilistic, and optimization tools to automatically improve their performance, classify new data points, and detect new patterns or trends [8]. Although machine learning relies heavily on statistics and probability, it is fundamentally more powerful, as it allows inferences or decisions that may not be feasible using traditional statistical approaches [44,27]. According to [42], the choice of algorithm depends on multiple factors, such as the type of problem to be solved, the number of variables involved, and which model would be the most suitable, among others. ...
... Hence, there is no universal algorithm that fits all circumstances. Machine learning algorithms are classified primarily based on the intended outcome they aim to achieve, according to [27] and [43]. There are two general types of machine learning algorithms: unsupervised and supervised learning. ...
... If the similarity also exists between the RD costs of the encoding modes, then, having already encoded some views, it might be possible to predict the optimal encoding modes for the remaining views. A well-known method that can be used for such prediction is Bayesian decision theory [21]. It has been used in pattern classification and other fields for probabilistic decision-making. ...
... The approach is based on quantifying the trade-offs between various decisions using probabilities and the costs that accompany such decisions. It assumes that the decision problem is posed in probabilistic terms and that all the relevant probability values are known [21]. Consider a simple multiview video scenario with three views, V0, V1, and V2, as shown in Figure 1. ...
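A minimal sketch of the cost-based Bayes decision described here, with hypothetical posteriors and a hypothetical loss table for two candidate encoding modes (not the paper's actual probabilities or costs):

```python
import numpy as np

def bayes_decision(posteriors, loss):
    """Pick the action with minimum conditional risk
    R(a_i | x) = sum_j loss[i, j] * P(c_j | x)  (Bayes decision rule with costs)."""
    risks = loss @ posteriors
    return int(np.argmin(risks)), risks

# Toy usage: posterior over which encoding mode is optimal, plus a cost table
posteriors = np.array([0.7, 0.3])
loss = np.array([[0.0, 1.0],    # cost of choosing mode 0 when mode j is optimal
                 [2.0, 0.0]])   # cost of choosing mode 1 when mode j is optimal
choice, risks = bayes_decision(posteriors, loss)   # risks = [0.3, 1.4] -> choose mode 0
```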
... Principal Component Analysis (PCA) is a widely used dimensionality reduction technique that transforms a dataset into a lower-dimensional space while preserving most of the information in the original data [46]. In the context of the given problem, PCA is applied to the set of BOLD signals to reduce redundancy and keep only the relevant information. ...
... To further reduce the dimensionality of the data and obtain an interpretable 3-dimensional representation, Linear Discriminant Analysis (LDA) is employed. LDA is a widely used technique in machine learning [46]. It is a supervised learning method that aims to reduce the dimensionality of the data while preserving the discriminatory information between different classes. ...
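A small sketch of the PCA-then-LDA pipeline with scikit-learn on random stand-in data (the actual BOLD signals and labels are not reproduced here); with four classes, LDA yields at most three discriminant components, giving the 3-dimensional representation mentioned above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical feature matrix X (samples x signals) with class labels y
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = rng.integers(0, 4, size=300)   # 4 classes -> at most 3 LDA components

X_pca = PCA(n_components=20).fit_transform(X)                               # remove redundancy first
X_lda = LinearDiscriminantAnalysis(n_components=3).fit_transform(X_pca, y)  # 3-D, class-discriminative
```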
Exploring the dynamics of a complex system, such as the human brain, poses significant challenges due to inherent uncertainties and limited data. In this study, we enhance the capabilities of noisy linear recurrent neural networks (lRNNs) within the reservoir computing framework, demonstrating their effectiveness in creating autonomous in silico replicas - digital twins - of brain activity. Our findings reveal that the poles of the Laplace transform of high-dimensional inferred lRNNs are directly linked to the spectral properties of observed systems and to the kernels of auto-regressive models. Applying this theoretical framework to resting-state fMRI, we successfully predict and decompose BOLD signals into spatiotemporal modes of a low-dimensional latent state space confined around a single equilibrium point. lRNNs provide an interpretable proxy for clustering among subjects and different brain areas. This adaptable digital-twin framework not only enables virtual experiments but also offers computational efficiency for real-time learning, highlighting its potential for personalized medicine and intervention strategies.
... e_1 = {e^c_1, e^o_1} denotes the evidence from the contextual factors and observable feature nodes at time t = 1. Then, the conditional probability of Y given the occurrence of e^c_1 can be expressed as [29,46,47]: ...
Fatigue can cause human error, which is the main cause of accidents. In this study, the dynamic fatigue recognition of unmanned electric locomotive operators under high-altitude, cold and low oxygen conditions was studied by combining physiological signals and multi-index information. The characteristic data from the physiological signals (ECG, EMG and EM) of 15 driverless electric locomotive operators were tracked and tested continuously in the field for 2 h, and a dynamic fatigue state evaluation model based on a first-order hidden Markov (HMM) dynamic Bayesian network was established. The model combines contextual information (sleep quality, working environment and circadian rhythm) and physiological signals (ECG, EMG and EM) to estimate the fatigue state of plateau mine operators. The simulation results of the dynamic fatigue recognition model and subjective synchronous fatigue reports were compared with the field-measured signal data. The verification results show that the synchronous subjective fatigue and simulated fatigue estimation results are highly consistent (correlation coefficient r = 0.971**), which confirms that the model is reliable for long-term dynamic fatigue evaluation. The results show that the established fatigue evaluation model is effective and provides a new model and concept for dynamic fatigue state estimation for remote mine operators in plateau deep mining. Moreover, this study provides a reference for clinical medical research and human fatigue identification under high-altitude, cold and low-oxygen conditions.
... Then, the features for a certain class (e.g., features from smile sequences) are clustered into two classes, namely a perceived neutral and a perceived expression class (e.g., smile, disgust, etc.). K-means clustering (Duda, Hart, and Stork 2012) is utilized for the clustering. Figure 2 (left) illustrates an example of sequences after k-means clustering. ...
Spatio-temporal feature encoding is essential for encoding facial expression dynamics in video sequences. At test time, most spatio-temporal encoding methods assume that a temporally segmented sequence is fed to a learned model, which could require the prediction to wait until the full sequence is made available by an auxiliary task that performs the temporal segmentation. This causes a delay in predicting the expression. In an interactive setting, such as affective interactive agents, such a delay in the prediction cannot be tolerated. Therefore, training a model that can accurately predict the facial expression "on-the-fly" (as the frames are fed to the system) is essential. In this paper, we propose a new spatio-temporal feature learning method, which allows prediction with partial sequences. As such, the prediction can be performed on-the-fly. The proposed method utilizes an estimated expression intensity to generate dense labels, which are used to regulate the prediction model training with a novel objective function. As a result, the learned spatio-temporal features can robustly predict the expression with partial (incomplete) expression sequences, on-the-fly. Experimental results showed that the proposed method achieved higher recognition rates than state-of-the-art methods on both datasets. More importantly, the results verified that the proposed method improved the prediction of frames with partial expression sequence inputs.
... Machine learning is intrinsically more powerful than traditional statistical approaches since it allows for conclusions or judgments that would not otherwise be achievable using classic statistical methods, although it still heavily relies on statistics and probability. Recently, machine learning has been used to support cancer prognosis and prediction [76]. New techniques for early cancer prediction are needed because conventional procedures are inaccurate and unsuitable for individualized care. ...
Normal cell development and prevention of tumor formation rely on the tumor-suppressor protein p53. This crucial protein is produced from the Tp53 gene, which encodes the p53 protein. The p53 protein plays a vital role in regulating cell growth, DNA repair, and apoptosis (programmed cell death), thereby maintaining the integrity of the genome and preventing the formation of tumors. Since p53 was discovered 43 years ago, many researchers have clarified its functions in the development of tumors. With the support of the protein p53 and targeted artificial intelligence modeling, it will be possible to detect cancer and tumor activity at an early stage. This will open up new research opportunities. In this review article, a comprehensive analysis was conducted on different machine learning techniques utilized in conjunction with the protein p53 to predict and speculate cancer. The study examined the types of data incorporated and evaluated the performance of these techniques. The aim was to provide a thorough understanding of the effectiveness of machine learning in predicting and speculating cancer using the protein p53.
... The network outputs represent the classes of interest and are associated with the vector y = (y_1, y_2, y_3)^T, where each output corresponds to the class with the same index. The chosen decision criterion is Bayes' criterion: the output with the greatest value indicates the most probable class [8]. The training set is built using specialist knowledge in the selection of the regions of interest [9]. ...
Alzheimer's disease is the most common cause of dementia, yet it is hard to diagnose precisely without invasive techniques, particularly at the onset of the disease. This work approaches the image analysis and classification of synthetic multispectral images composed of diffusion-weighted magnetic resonance (MR) cerebral images for evaluating the cerebrospinal fluid area and measuring the advance of Alzheimer's disease. A clinical 1.5 T MR imaging system was used to acquire all images presented. The classification methods are based on multilayer perceptron and Kohonen self-organizing map classifiers. We assume the classes of interest can be separated by hyperquadrics. Therefore, a 2-degree polynomial network is used to classify the original image, generating the ground-truth image. The classification results are used to improve the usual analysis of the apparent diffusion coefficient map.
... Feature selection methods can be compared based on several criteria. For example, filter methods rank features based on specific criteria such as Fisher score (Duda et al., 2012) and mutual information (MI) (Kira & Rendell, 1992;Robnik-Šikonja & Kononenko, 2003). Filters are computationally efficient and select features independently. ...
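A short sketch of such a filter with scikit-learn, ranking features by mutual information with the class and keeping the top k, independently of any downstream classifier (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in data: 30 features, only a few of them informative
X, y = make_classification(n_samples=400, n_features=30, n_informative=5, random_state=0)

# Filter method: score each feature by mutual information with the class, keep the top 5
selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
X_selected = selector.transform(X)
print(selector.get_support(indices=True))   # indices of the retained features
```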
The dramatic increase of observational data across industries provides unparalleled opportunities for data-driven decision making and management, including in the manufacturing industry. In the context of production, data-driven approaches can exploit observational data to model, control and improve the process performance. When supplied with observational data of adequate coverage to inform the true process performance dynamics, they can overcome the cost associated with intrusive, controlled designed experiments and can be applied for both monitoring and improving process quality. We propose a novel integrated approach that uses observational data for process parameter design while simultaneously identifying the significant control variables. We evaluate our method using simulated experiments and also apply it to a real-world case setting from a tire manufacturing company.
... Gaussian Mixture Models are extensively used across many tasks in machine learning, signal processing, and other areas [6,13,14,21,24,26,30]. For a vector x ∈ R^d, the density of a Gaussian Mixture Model (GMM) is given by p(x) := Σ_{j=1}^{K} α_j p_N(x; µ_j, Σ_j), (1.1) where p_N is a Gaussian with mean µ ∈ R^d and covariance Σ ≻ 0, i.e., p_N(x; µ, Σ) := det(Σ)^{-1/2} (2π)^{-d/2} exp(-(1/2)(x − µ)^T Σ^{-1} (x − µ)). ...
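A direct NumPy transcription of the density in (1.1), with a toy two-component mixture for illustration:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """p_N(x; mu, Sigma) = det(Sigma)^(-1/2) (2*pi)^(-d/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))"""
    d = len(mu)
    diff = x - mu
    return (np.linalg.det(Sigma) ** -0.5) * ((2 * np.pi) ** (-d / 2)) * \
           np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

def gmm_density(x, alphas, mus, Sigmas):
    """p(x) = sum_j alpha_j * p_N(x; mu_j, Sigma_j)."""
    return sum(a * gaussian_pdf(x, mu, S) for a, mu, S in zip(alphas, mus, Sigmas))

# Toy mixture of two 2-D Gaussians
alphas = [0.6, 0.4]
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_density(np.array([0.1, -0.2]), alphas, mus, Sigmas))
```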
We consider maximum likelihood estimation for Gaussian Mixture Models (Gmms). This task is almost invariably solved (in theory and practice) via the Expectation Maximization (EM) algorithm. EM owes its success to various factors, of which its ability to fulfill positive definiteness constraints in closed form is of key importance. We propose an alternative to EM by appealing to the rich Riemannian geometry of positive definite matrices, using which we cast Gmm parameter estimation as a Riemannian optimization problem. Surprisingly, such an out-of-the-box Riemannian formulation completely fails and proves much inferior to EM. This motivates us to take a closer look at the problem geometry, and derive a better formulation that is much more amenable to Riemannian optimization. We then develop (Riemannian) batch and stochastic gradient algorithms that outperform EM, often substantially. We provide a non-asymptotic convergence analysis for our stochastic method, which is also the first (to our knowledge) such global analysis for Riemannian stochastic gradient. Numerous empirical results are included to demonstrate the effectiveness of our methods.
... The value of rejection as an option in classifier decision-making has long been recognized, e.g., [15]. Moreover, in security-sensitive settings, where the stakes for test-time attacks are the highest, the problem is often not classification per se, but rather authentication. ...
A significant threat to the recent, wide deployment of machine learning-based systems, including deep neural networks (DNNs), is adversarial learning attacks. We analyze possible test-time evasion-attack mechanisms and show that, in some important cases, when the image has been attacked, correctly classifying it has no utility: i) when the image to be attacked is (even arbitrarily) selected from the attacker's cache; ii) when the sole recipient of the classifier's decision is the attacker. Moreover, in some application domains and scenarios it is highly actionable to detect the attack irrespective of correctly classifying in the face of it (with classification still performed if no attack is detected). We hypothesize that, even if human-imperceptible, adversarial perturbations are machine-detectable. We propose a purely unsupervised anomaly detector (AD) that, unlike previous works: i) models the joint density of a deep layer using highly suitable null hypothesis density models (matched in particular to the non-negative support for RELU layers); ii) exploits multiple DNN layers; iii) leverages a "source" and "destination" class concept, source class uncertainty, the class confusion matrix, and DNN weight information in constructing a novel decision statistic grounded in the Kullback-Leibler divergence. Tested on MNIST and CIFAR-10 image databases under three prominent attack strategies, our approach outperforms previous detection methods, achieving strong ROC AUC detection accuracy on two attacks and better accuracy than recently reported for a variety of methods on the strongest (CW) attack. We also evaluate a fully white box attack on our system. Finally, we evaluate other important performance measures, such as classification accuracy, versus detection rate and attack strength.
... The consideration of multivariate statistical methods (e.g., MANOVA [38]) and data mining techniques can also help complement the perturbation and discrimination analysis. Tables II to IX and Table I. ...
Complex networks obtained from real-world systems are often characterized by incompleteness and noise, consequences of limited sampling as well as artifacts in the acquisition process. Because the characterization, analysis and modeling of complex systems underlain by complex networks are critically affected by the quality of the respective initial structures, it becomes imperative to devise methodologies for identifying and quantifying the effect of such sampling problems on the characterization of complex networks. Given that several measurements need to be applied in order to achieve a comprehensive characterization of complex networks, it is important to investigate the effect of incompleteness and noise on such quantifications. In this article we report such a study, involving 8 different measurements applied to 6 different complex network models. We evaluate the sensitiveness of the measurements to perturbations in the topology of the network considering the relative entropy. Three particularly important types of progressive perturbations to the network are considered: edge suppression, addition and rewiring. The conclusions have important practical consequences, including the fact that scale-free structures are more robust to perturbations. The measurements allowing the best balance of stability (smaller sensitivity to perturbations) and discriminability (separation between different network topologies) were also identified.
... We compare this filter to the two filters (introduced in [ZH02] for complete data and tested empirically in this case) that use credible intervals based on p(I|D) to robustly estimate mutual information. The filters are empirically tested in Section 5 by coupling them with the naive Bayes classifier [DHS01] to incrementally learn from and classify incomplete data. On five real data sets that we used, one of the two proposed filters consistently outperforms the traditional filter. ...
Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.
... In the design of NNs, a three-layer BP neural network is selected. On the one hand, a three-layer NN can approximate arbitrary bounded continuous functions (Duda et al., 2001); on the other hand, more layers make the network more complex. Besides, BP NNs are well known for their good self-learning capability. ...
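A hedged sketch of such a three-layer BP network using scikit-learn's MLPRegressor on stand-in data (the authors' own network, features, and training details are not reproduced here):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in for upstream traffic-flow features and targets
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=500)

# Three-layer BP network: input layer, one hidden layer, output layer,
# trained by backpropagation (here via the adam optimizer)
model = MLPRegressor(hidden_layer_sizes=(32,), activation="logistic",
                     solver="adam", max_iter=2000).fit(X, y)
```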
Traffic flow forecasting, especially the short-term case, is an important topic in intelligent transportation systems (ITS). This paper presents an extensive study of network-scale modeling and forecasting of short-term traffic flows. Firstly, we propose the concepts of single-link and multi-link models of traffic flow forecasting. Secondly, we construct four prediction models by combining the two models with single-task learning and multi-task learning. The combination of the multi-link model and multi-task learning improves not only the experimental efficiency but also the prediction accuracy. Moreover, a new multi-link single-task approach that combines graphical lasso (GL) with neural networks (NN) is proposed. GL provides a general methodology for solving problems involving a large number of variables. Using L1 regularization, GL builds a sparse graphical model making use of the sparse inverse covariance matrix. In addition, Gaussian process regression (GPR) is a classic regression algorithm in Bayesian machine learning. Although there is wide research on GPR, there are few applications of GPR in traffic flow forecasting. In this paper, we apply GPR to traffic flow forecasting and show its potential. Through sufficient experiments, we compare all of the proposed approaches and finally make an overall assessment.
... For example, there are myriad distance measures for networks [25,27]. In addition, hierarchical clustering comes with various options for how to connect different clusters, and there are many other data clustering methods [71]. Furthermore, there are other methods for determining the number of groups apart from Dunn's index [35]. ...
Many time-evolving systems in nature, society and technology leave traces of the interactions within them. These interactions form temporal networks that reflect the states of the systems. In this work, we pursue a coarse-grained description of these systems by proposing a method to assign discrete states to the systems and inferring the sequence of such states from the data. Such states could, for example, correspond to a mental state (as inferred from neuroimaging data) or the operational state of an organization (as inferred by interpersonal communication). Our method combines a graph distance measure and hierarchical clustering. Using several empirical data sets of social temporal networks, we show that our method is capable of inferring the system's states such as distinct activities in a school and a weekday state as opposed to a weekend state. We expect the methods to be equally useful in other settings such as temporally varying protein interactions, ecological interspecific interactions, functional connectivity in the brain and adaptive social networks.
... are the class mean and grand mean vectors, respectively. The scatter matrices also satisfy an additivity rule: $S_T = S_B + S_W$ [17]. The goal of LDA is to maximize the separation between classes, which is defined as the ratio of between- to within-class variance: ...
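For reference, this ratio is conventionally written as the generalized Rayleigh quotient maximized by LDA (standard notation, not necessarily the cited source's):

$$ J(\mathbf{w}) \;=\; \frac{\mathbf{w}^{\top} S_B \,\mathbf{w}}{\mathbf{w}^{\top} S_W \,\mathbf{w}}, \qquad S_T = S_B + S_W , $$

with the maximizing projections given by the leading solutions of the generalized eigenvalue problem $S_B \mathbf{w} = \lambda S_W \mathbf{w}$.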
How does one find dimensions in multivariate data that are reliably expressed across repetitions? For example, in a brain imaging study one may want to identify combinations of neural signals that are reliably expressed across multiple trials or subjects. For a behavioral assessment with multiple ratings, one may want to identify an aggregate score that is reliably reproduced across raters. Correlated Components Analysis (CorrCA) addresses this problem by identifying components that are maximally correlated between repetitions (e.g. trials, subjects, raters). Here we formalize this as the maximization of the ratio of between-repetition to within-repetition covariance. We show that this criterion maximizes repeat-reliability, defined as mean over variance across repeats, and that it leads to CorrCA or to multi-set Canonical Correlation Analysis, depending on the constraints. Surprisingly, we also find that CorrCA is equivalent to Linear Discriminant Analysis for zero-mean signals, which provides an unexpected link between classic concepts of multivariate analysis. We present an exact parametric test of statistical significance based on the F-statistic for normally distributed independent samples, and present and validate shuffle statistics for the case of dependent samples. Regularization and extension to non-linear mappings using kernels are also presented. The algorithms are demonstrated on a series of data analysis applications, and we provide all code and data required to reproduce the results.
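A minimal numpy sketch of the between- versus within-repetition criterion described in this abstract, solved as a generalized eigenvalue problem (illustrative only; the array shapes, noise level, and the omission of regularization, significance testing, and kernels are all simplifying assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def corrca(X):
    """Correlated Components Analysis for data X of shape (repeats, samples, channels).

    Returns projection vectors (columns) ordered by between/within-repetition covariance ratio.
    """
    N, T, D = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)           # remove per-repetition means
    R_within = sum(Xc[k].T @ Xc[k] for k in range(N))
    R_total = Xc.sum(axis=0).T @ Xc.sum(axis=0)      # covariance of the summed repetitions
    R_between = R_total - R_within                   # keeps only the cross-repetition terms
    # Generalized eigenvalue problem: maximize w' R_between w / w' R_within w
    evals, evecs = eigh(R_between, R_within)
    order = np.argsort(evals)[::-1]
    return evecs[:, order], evals[order]

# Toy example: 5 repetitions, 200 samples, 8 channels sharing one reliable component
rng = np.random.default_rng(1)
shared = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 8))
X = shared[None] + 0.5 * rng.normal(size=(5, 200, 8))
W, scores = corrca(X)
```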
... The core principle of clustering is to group samples with similar attributes into the same cluster while ensuring that samples in different clusters exhibit distinct characteristics [1][2][3][4]. Over the years, clustering has found applications across a diverse range of fields, including bioinformatics, pattern recognition, machine learning, data mining, and image processing [3,[5][6][7]. ...
Clustering algorithms are fundamental in data analysis, enabling the organization of data into meaningful groups. However, individual clustering methods often face limitations and biases, making it challenging to develop a universal solution for diverse datasets. To address this, we propose a novel clustering framework that combines the Minimum Description Length (MDL) principle with a genetic optimization algorithm. This approach begins with an ensemble clustering solution as a baseline, which is refined using MDL-based evaluation functions and optimized with a genetic algorithm. By leveraging the MDL principle, the method adapts to the intrinsic properties of datasets, minimizing dependence on input clusters and ensuring a data-driven process. The proposed method was evaluated on thirteen benchmark datasets using four validation metrics: accuracy, normalized mutual information (NMI), Fisher score, and adjusted Rand index (ARI). Results show that the method consistently outperforms traditional clustering algorithms, achieving higher accuracy, greater stability, and reduced biases. Its adaptability makes it a reliable tool for clustering complex and varied datasets. This study demonstrates the potential of combining MDL and genetic optimization to create a robust and versatile clustering framework, advancing the field of data analysis and offering a scalable solution for diverse applications.
... Neural networks are groups of neurons stacked together in multiple layers and modeled to mimic parts of the human brain. The perceptron was used to classify linearly separable classes through adaptive weight updates [3,4,9,10]. However, the perceptron was unable to classify nonlinearly separable data samples. ...
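A minimal sketch of the classic perceptron rule referred to above, with adaptive weight updates applied to misclassified samples (labels in {-1, +1}; the learning rate, epoch count, and toy data are illustrative choices):

```python
import numpy as np

def perceptron_fit(X, y, lr=1.0, epochs=50):
    """Train a perceptron on linearly separable data; y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified sample: adapt the weights
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                  # converges only if the classes are linearly separable
            break
    return w, b

# Linearly separable toy data; XOR-like data would never converge under this rule
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_fit(X, y)
```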
Neural networks are groups of neurons stacked together in multiple layers to mimic the biological neurons in the human brain. Neural networks have been trained with the backpropagation algorithm, based on a gradient descent strategy, for several decades, and several variants have been developed to improve it. The loss function of the neural network is optimized through backpropagation, but many local minima exist on the loss surface of the constructed network, each corresponding to a different solution. The gradient descent strategy cannot avoid the problem of local minima and can get stuck in a poor minimum depending on the initialization. Particle swarm optimization (PSO) was proposed to select the best local minimum within the search space of the loss function, but the search space is limited to the instantiated particles, so the best solution is sometimes missed. In the proposed approach, we overcome the problem of gradient descent and the limitation of the PSO algorithm by training individual neurons separately, so that they are collectively capable of solving the problem as a group of neurons forming a network.
... At every evolution from the mother subject [114], new approaches, theories, and technologies were developed, and new terms were coined with great expectations of dealing with the varying nature of data as well as with decision-making tasks. Still, a beginner should avoid diving headfirst into new technologies before adequately understanding the basic theories. ...
Transfer learning (TL) is a popular phrase in the deep learning (DL) domain. It is one of the latest artificial intelligence (AI) technologies and has a significant impact on big data analysis. Traditional machine learning (ML) methods require an adequate quantity of training data, as well as similar characteristics between the feature spaces of the training and test data, when performing supervised learning tasks. However, in real-life analytical problems, data scarcity often arises. In such scenarios, the TL approach has shown effectiveness in transferring knowledge from source tasks that have large training data to a target task that has little training data. In TL, a model that has been trained on one task is applied to a second, related (but not identical) task; in this way, the issue of distribution mismatch can also be addressed. Unlike conventional machine learning algorithms, TL does not try to learn each task from scratch. Meteorological research is one example of big data analysis that often faces the data scarcity issue. The current study addresses contemporary challenges in weather forecasting that can be solved (or better dealt with) using TL methods. It presents a brief review of earlier research, with the evolution of the various technologies used since the 1990s, followed by potential applications of TL algorithms to several key challenges in weather prediction, including the prediction of air quality, thunderstorms, precipitation, visibility, and cyclones, among others. Special emphasis is given to high-impact weather (HIW) prediction. These high-impact events are extremely difficult to predict, and they can cause enormous property damage and fatalities around the world; TL techniques have shown advantages in predicting HIW. Various challenging issues in implementing TL technology are then discussed. Finally, we address various prospects associated with TL, propose new research directions, and, more importantly, mention some concerns for beginners in DL-TL research. An extensive list of references is also provided.
... For each node, the algorithm chooses the variable that gives the best split, minimizing the Gini impurity. 31 Once the model is built, each sample is predicted by all trees, and majority voting is applied to identify the class label. RF works especially well for classification problems, even when the number of classes is very large. ...
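A brief illustration of the two mechanics mentioned here, Gini impurity and aggregation over trees, using scikit-learn on synthetic data (the dataset and hyperparameters are placeholders; scikit-learn's random forest averages the trees' class probabilities, which acts as a soft majority vote):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Toy multi-class problem; criterion='gini' chooses splits that minimize impurity,
# and predict() aggregates the individual trees' votes.
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
rf = RandomForestClassifier(n_estimators=200, criterion="gini", random_state=0).fit(X, y)
print(gini_impurity(y), rf.predict(X[:5]))
```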
Vibrational spectroscopy methods such as mid-infrared (MIR), near-infrared (NIR), and Raman spectroscopies have been shown to have great potential for in vivo biomedical applications, such as arthroscopic evaluation of joint injuries and degeneration. Considering that these techniques provide complementary chemical information, in this study we hypothesized that combining the MIR, NIR, and Raman data from human osteochondral samples can improve the detection of cartilage degradation. This study evaluated 272 osteochondral samples from 18 human knee joints, comprising both healthy and damaged tissue according to the reference Osteoarthritis Research Society International grading system. We established one-block and multi-block classification models using partial least squares discriminant analysis (PLSDA), random forest, and support vector machine (SVM) algorithms. Feature modeling by principal component analysis was tested for the SVM (PCA-SVM) models. The best one-block models were built using MIR and Raman data, discriminating healthy cartilage from damaged cartilage with an accuracy of 77.5% for MIR and 77.8% for Raman using the PCA-SVM algorithm, whereas the NIR data did not perform as well, achieving only 68.5% accuracy for the best model using PCA-SVM. The multi-block approach allowed an improvement, with an accuracy of 81.4% for the best model by PCA-SVM. Fusing the three blocks (MIR, NIR, and Raman) by multi-block PLSDA significantly improved the performance of the single-block models, to 79.1% correct classification; the significance was established by statistical testing using analysis of variance. Thus, the study suggests the potential and the complementary value of fusing different spectroscopic techniques and provides valuable data analysis tools for the diagnostics of cartilage health.
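A minimal sketch of a PCA-SVM pipeline of the kind evaluated in this study, using scikit-learn on synthetic stand-in spectra (the number of components, kernel, and data are assumptions, not the study's settings):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for spectra: 100 samples x 300 spectral variables, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))
y = (X[:, :5].sum(axis=1) > 0).astype(int)

# PCA compresses the correlated spectral variables before the SVM decision boundary is fit
pca_svm = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
accuracy = cross_val_score(pca_svm, X, y, cv=5).mean()
```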
... The hidden layer utilized the hyperbolic tangent activation function, while the output layer employed the softmax function. They used the cross-entropy cost function and the gradient descent optimization method [28,30,73]. The input layer received normalized feature values obtained by applying the formula $(X_i - \mu_i)/\sigma_i$, where $X_i$, $\mu_i$, and $\sigma_i$ represent, respectively, the $i$-th input vector, its mean, and its standard deviation. ...
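A compact sketch combining the ingredients listed above, z-score normalization, a tanh hidden layer, a softmax output with cross-entropy loss, and gradient descent, using scikit-learn (the dataset, layer size, and learning rate are illustrative assumptions, not the cited work's configuration):

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler   # applies (X_i - mu_i) / sigma_i per feature
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# One hidden layer with tanh activation; the multiclass output uses softmax with a
# cross-entropy (log) loss, optimized here by stochastic gradient descent.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), activation="tanh",
                  solver="sgd", learning_rate_init=0.05, max_iter=2000, random_state=0),
)
model.fit(X, y)
```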
Machine learning techniques have shown success in classifying hand gestures. As the prevalence of prosthetic devices continues to rise, the adoption of non-invasive technologies, such as surface electromyography (sEMG), becomes paramount. This study systematically assesses the isolated influence of classification algorithms within hand gesture recognition (HGR) systems using sEMG data and dynamic time warping (DTW) based features. This approach effectively handles temporal variations in sEMG signals by leveraging DTW, ensuring input features are invariant to gesture speed. Six supervised learning classifiers were evaluated: the multilayer perceptron, support vector machine, logistic regression, linear discriminant analysis, k-nearest neighbors, and decision tree. Cross-validation was employed to fine-tune the segmentation hyperparameters, significantly improving results. To ensure reproducibility, the source code has been made available, the proposed system design has been detailed, and the evaluation protocols have been described. Our findings indicate that logistic regression outperformed other classifiers in this setup, achieving 95.2% accuracy in classifying six hand movements from ten healthy individuals, representing a 1.6% improvement over the best previously reported performance using the same publicly available dataset. Future research will assess the proposed HGR system’s generalization capability on larger datasets suitable for training more complex classifiers, including deep learning models.
The growing demand for seamless and personalized customer experiences has transformed how businesses approach self-service and promotional strategies. This research explores implementing customized recommendation systems to enhance customer engagement, satisfaction, and loyalty across various industries. By leveraging advanced algorithms and customer data, these systems enable businesses to offer tailored solutions that meet individual preferences, streamline self-service interactions, and improve promotional effectiveness. Through surveys, experiments, and case studies, the study highlights the positive impact of personalized recommendations on customer behavior, including increased engagement rates, improved retention, and higher conversion rates. The findings underscore the potential of such systems to enhance effortless customer experiences and drive business growth by fostering deeper connections with consumers. This paper provides actionable insights for businesses aiming to adopt or optimize recommendation systems to stay competitive in an increasingly customer-centric marketplace.
Neuroblastoma is a common malignant tumor in childhood that seriously endangers the health and lives of children, making it essential to find effective prognostic markers to accurately predict their clinical outcomes. The development of high-throughput technology in the biomedical field has made it possible to obtain multi-omics data, whose integration can compensate for missing or unreliable information in a single data source. In this study, we integrated clinical data and two omics data types, i.e., gene expression and DNA methylation data, to study the prognosis of neuroblastoma. Since the features in omics data are redundant, it is crucial to conduct feature selection on them. We proposed a two-step feature selection (TSFS) method to quickly and accurately select the optimal features, where the first step aims at selecting candidate features and the second step removes redundant features among them using our proposed maximal association coefficient (MAC). Our goal is to predict composite clinical outcomes for neuroblastoma patients, i.e., their survival time and vital status at the last follow-up, which were validated to be two inter-correlated tasks. We conducted a series of experiments and evaluated the results using accuracy and AUC (area under the ROC curve) metrics, which indicated that combining the integration of the three types of data, our proposed TSFS method, and a multi-task learning method can synergistically improve the reliability and accuracy of the prediction models.
In fulfillment of the requirements for obtaining a Master of Science degree in Computer Science, Faculty of Science, Port Said University.
Variability is inherent to speech and arises from speaker-related factors (e.g., sociolinguistic and personal) and linguistic factors (e.g., phonetic-phonological and coarticulatory). For the same message produced in the same context, between-speaker variability, which may be anatomical or physiological, results from differences in vocal tract structures and motor routines, whereas within-speaker variability is biomechanical, arising from variations in an individual's speech execution. Despite this understanding, the specific roles of the different vocal tract components in speaker classification remain unclear, and few studies have used continuous speech. This study aims to model speaker variability by considering articulatory and vocal structures in continuous speech. We developed a classification procedure based on a regression model that removes part of the context variability, using the residuals for speaker comparison. The development of the procedure and the subsequent tests were carried out using 18 recordings from the CEFALA-1 database. Key findings include: (1) most of the between-speaker acoustic variability is attributable to differences related to speaker sex, and (2) both articulatory and vocal variables are significant for speaker classification, with the vocal variables slightly outperforming the articulatory variables in isolated models. Limitations of the study include its exclusive focus on static variability, excluding dynamic aspects, and the omission of consonantal variability.
The purpose of this work is to identify patterns in the electroencephalogram recordings of patients with migraine using the Karhunen–Loeve orthogonal decomposition method. The work examines the main features of electroencephalographic dynamics and the impact of chronic migraine severity on these features. Methods. To collect experimental data, electroencephalograms were recorded during the modified multiple sleep latency test. During the experiment, the subjects' reactions to a presented visual stimulus were studied. The obtained data were processed using the Karhunen–Loeve transformation, which allows the complex dynamics of the system to be interpreted in terms of the coexistence and interaction of coherent orthogonal space-time structures. Results. The studies showed that the energy distribution of modes in the active and sleep states can differ significantly. The character of this distribution depends on the brain zone in which the signals were recorded, on the duration of the experiment, and on the time point in the experiment at which certain stages of the subject's reaction were recorded. It was shown that the greatest response in the form of evoked potentials in people with migraine is most often localized in the occipital lobe of the brain, and this effect correlates with the frequency of migraine attacks. For some groups of patients, there is a connection between the strength of the evoked potentials and the energy of the first, most energetic, Karhunen–Loeve mode. Conclusion. It was shown that there is a relationship between the number of significant modes and the power of the alpha rhythm in the electroencephalography signals, and the spatial localization of this effect in the occipital region of the brain can be traced. For the frontal lobe of the brain, significant differences in the distribution of the first mode were demonstrated, assessed for groups of patients with rare and frequent migraine attacks.
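A minimal numpy sketch of a Karhunen–Loeve (proper orthogonal) decomposition of a multichannel recording via the SVD, with the mode energies given by the squared singular values (the synthetic signal and the 95% energy threshold are illustrative assumptions):

```python
import numpy as np

# Synthetic stand-in for a multichannel EEG segment: channels x time samples
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)
data = (np.outer(rng.normal(size=16), np.sin(2 * np.pi * 10 * t))   # alpha-band-like mode
        + 0.3 * rng.normal(size=(16, t.size)))                      # background activity

# Karhunen-Loeve decomposition via SVD of the mean-centered data:
# columns of U are spatial modes, rows of Vt are their temporal coefficients.
centered = data - data.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
energy = s**2 / np.sum(s**2)                        # fraction of energy carried by each mode
n_significant = int(np.searchsorted(np.cumsum(energy), 0.95) + 1)
```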
Objective: This study investigates whether the potential bias introduced by oversampling via windowing of gait data from individuals with Parkinson's Disease (PD) also occurs in voice signals. A previous study hypothesized that distinct samples from the same individual should not be treated independently, given the risk of biasing the models. Method: We used voice signals from 24 individuals with PD and 8 healthy individuals, together with the K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest (RF) algorithms. Cross-validation was performed with Leave-one-out (LOOCV), adapted to scenarios with and without bias in the training data. Results: Models evaluated without accounting for the bias showed inflated performance, whereas the rigorous approach yielded more modest results. Conclusion: Samples from the same individual appearing in both training and testing can inflate model performance. Correct application of oversampling is crucial for developing reliable models for PD diagnosis.
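A minimal scikit-learn sketch of the subject-wise evaluation discussed above, where all samples of an individual fall entirely in either the training or the test fold (the synthetic data, feature count, and classifier are placeholders):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: several voice samples per subject, labels constant within a subject
rng = np.random.default_rng(0)
n_subjects, samples_per_subject, n_features = 32, 6, 20
groups = np.repeat(np.arange(n_subjects), samples_per_subject)
y = np.repeat((np.arange(n_subjects) < 24).astype(int), samples_per_subject)  # 24 PD, 8 healthy
X = rng.normal(size=(groups.size, n_features)) + y[:, None] * 0.3

# Leave-one-subject-out: all samples of a subject go either to training or to testing,
# never split across the two, which avoids the performance inflation discussed above.
clf = RandomForestClassifier(random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
```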
The high cost pressure caused by the level of competition poses major challenges for breweries. While microbreweries can develop local strengths and brewery groups can develop synergies, this does not represent a decisive improvement. The application of machine learning, on the other hand, could give breweries a significant advantage in their brewing process. Several approaches to the application of machine learning in the brewing process have already been proposed in the literature. To map possible areas of application and the respective available solution approaches for improving the brewing process with machine learning, a systematic review of the application of machine learning in the brewing process is presented in this paper. In this systematic review, all potentially relevant publications were included at first; subsequently, irrelevant publications were filtered out using a clustering approach. Afterward, the remaining 21 publications were analyzed and synthesized. These publications were classified using a framework developed around the brewing process steps, areas of improvement, machine learning tasks, and machine learning algorithms. Based on this classification, a descriptive analysis was performed to identify common approaches in the existing literature. One result was that research on artificial intelligence in brewing lags significantly behind the general trend of artificial intelligence research. Additionally, there is very limited research into the association between the recipe and the desired chemical properties of the beer. Furthermore, it was noticeable that machine learning tasks utilizing artificial neural networks or support vector machines were preferred over others.
In this paper, we study stochastic models for discrete letter encoding and for object classification via ensembles of different-modality datasets. For these models, the minimal values of the average mutual information between a given ensemble of datasets and the corresponding set of possible decisions are constructed as appropriate monotonically decreasing functions of a given admissible error probability. We present examples of such functions constructed for a scheme of coding independent letters represented by pairs of observation values with possible errors, as well as for a scheme of classifying composite objects given by pairs of face and signature images. The inversions of the obtained functions yield lower bounds on the error probability for any amount of processed information. Thus, these functions can be considered appropriate bifactor fidelity criteria for source coding and object classification decisions. Moreover, the obtained functions are similar to the rate-distortion function known in information theory.
Person recognition systems that consolidate evidence from multiple sources of biometric information in order to determine the identity of an individual are known as multibiometric systems. For example, face and iris traits, or fingerprints from all ten fingers of an individual, may be used together to accurately and robustly resolve the identity of the person. Multibiometric systems can overcome many of the limitations of unibiometric systems because the different biometric sources usually compensate for the inherent limitations of one another. Hence, multibiometric systems are generally expected to be more reliable and accurate than unibiometric systems, as well as provide a wider population coverage (reduce the failure to enroll rate). The process of consolidating the information or evidence presented by multiple biometric sources is known as information fusion, which is the main focus of this chapter. More specifically, this chapter introduces the different sources and types of biometric information that can be fused and the various fusion methodologies.
Public welfare interests must not only be defended against the dominance of particular interests. The latter must, at the same time, be integrated into an overall strategy. This requires an approach that can hold its own in theory and in the context of very heterogeneous initial and boundary conditions. Evidence-based quality production proves to be the key to accessing the realization of common good interests. Process qualities such as animal and environmental protection services correspond directly to improving common goods. At the same time, they offer options for increased value creation through synergy effects. In connection with increased product quality, the farms’ gradually graduated animal and environmental protection services can create the preconditions to compensate for the indispensable additional labor and financial costs. A well-balanced mix of political and market instruments will be needed to reward farms’ quality and public welfare services with appropriate compensation. This raises various questions concerning a valid assessment of quality services, the definition of system boundaries, and how to deal with the actors necessary to achieve a breakthrough in quality production.
Feature selection is a problem that researchers continue to explore, as no exact and fast method has yet been found. A variety of criteria based on statistics, consistency, information, distance, and similarity have been used to propose new feature selection methods. In this paper, a new criterion inspired by the crowding distance used in multi-objective optimization problems is introduced. A crowding distance is computed for each feature based on its neighbors, and all computed distances are sorted to assess the variability of each feature. Features with the highest values exhibit more diversity and capture some pattern in the data; these features are the most relevant ones. Based on this criterion, a simple novel supervised and unsupervised filter feature selection method, called the Crowding Clustering Algorithm (CCA), is proposed. Mainly, the crowding distance is used to determine relevant features, and clustering is used to eliminate redundant features. Several variants of each method are also proposed, allowing the user to select the most suitable method based on the specific needs and constraints of the application. We conducted an extensive comparative study on diverse datasets to evaluate the effectiveness and efficiency of the proposed methods. Our proposed unsupervised filter method achieved the best performance on 18 out of 26 datasets, with a mean accuracy above 81%, significantly outperforming five state-of-the-art methods, which ranged from 67% to 73%; this reflects an 8–14% accuracy improvement. Similarly, our supervised filter methods achieved the highest accuracy on 18 out of 26 datasets, exceeding 87% and representing a 2–5% improvement over six state-of-the-art methods (82–85% accuracy). In terms of processing time, both our unsupervised and supervised methods show a slight increase compared to state-of-the-art methods, with differences ranging from 0.02 to 0.2 s, which are typically negligible. Overall, the results demonstrate that the proposed methods outperform, or achieve results comparable to, commonly used filter feature selection techniques in terms of accuracy and computational time. The findings provide valuable insights and guidance for further research in this domain.
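For orientation, below is the classic one-dimensional crowding-distance computation from multi-objective optimization that this criterion is inspired by, together with a purely illustrative per-feature score; this is an interpretation for illustration only, not the authors' exact CCA scoring:

```python
import numpy as np

def crowding_distances(values):
    """Classic one-dimensional crowding distance (as in NSGA-II) for a vector of values:
    each point's distance is the gap between its two sorted neighbors, normalized by the
    range; boundary points receive an infinite distance."""
    order = np.argsort(values)
    v = values[order]
    d = np.full(v.size, np.inf)
    spread = v[-1] - v[0]
    if spread > 0:
        d[1:-1] = (v[2:] - v[:-2]) / spread
    out = np.empty_like(d)
    out[order] = d
    return out

# Illustrative per-feature score: spread of the finite crowding distances along each feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
scores = []
for j in range(X.shape[1]):
    cd = crowding_distances(X[:, j])
    scores.append(np.std(cd[np.isfinite(cd)]))
```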
The goal of "A Machine Learning Approach to Identifying Stray Dogs" is to create a comprehensive system that uses deep learning techniques to detect stray dogs, evaluate their health, and handle user complaints through the intervention ofnon-governmental organizations. The system has a user interface where users can upload pictures of stray dogs for breedidentification by registering or logging in. Once the breed has been identified, the user can examine the dog's health. The usercan submit a request for help to NGOs if the dog is ill. Additionally, the user can ask NGOs to intervene if the dog displaysharmful behavior. NGOs can log in and view user complaints and the reported dogs' conditions using the NGOs section.NGOs can then respond to the complaints with suitable remedies.