Article

Hierarchical Clustering Schemes

Authors: S. C. Johnson

Abstract

Techniques for partitioning objects into optimally homogeneous groups on the basis of empirical measures of similarity among those objects have received increasing attention in several different fields. This paper develops a useful correspondence between any hierarchical system of such clusters, and a particular type of distance measure. The correspondence gives rise to two methods of clustering that are computationally rapid and invariant under monotonic transformations of the data. In an explicitly defined sense, one method forms clusters that are optimally “connected,” while the other forms clusters that are optimally “compact.”
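In modern terminology, the paper's two methods correspond to single-linkage (the "connectedness" method) and complete-linkage (the "diameter" method) agglomerative clustering. The sketch below is a minimal illustration of the shared merge loop, not the paper's original presentation; the two variants differ only in how the distance from a newly merged cluster to the remaining clusters is updated (minimum versus maximum), and all function and variable names are ours.

```python
import numpy as np

def hierarchical_clustering(d, method="min"):
    """Agglomerative clustering on a symmetric distance matrix d.

    method="min" merges by the minimum pairwise distance (the
    "connectedness"/single-link rule); method="max" uses the maximum
    (the "diameter"/complete-link rule). Returns the list of merges
    together with the level at which each merge occurred.
    """
    d = d.astype(float).copy()
    n = d.shape[0]
    clusters = {i: [i] for i in range(n)}   # currently active clusters
    np.fill_diagonal(d, np.inf)
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        # find the closest pair of active clusters
        level, a, b = min(((d[x, y], x, y)
                           for i, x in enumerate(keys)
                           for y in keys[i + 1:]), key=lambda t: t[0])
        merges.append((clusters[a] + clusters[b], level))
        # update distances from the merged cluster to all remaining clusters
        combine = min if method == "min" else max
        for c in keys:
            if c not in (a, b):
                d[a, c] = d[c, a] = combine(d[a, c], d[b, c])
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

Because each step depends only on the ordering of the pairwise distances, applying any monotonic transformation to the input distances leaves the resulting hierarchy unchanged, which is the invariance property stated in the abstract.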


... Two cluster analysis methods, connectedness and diameter, were applied. The algorithm used was based on that proposed by Johnson (1967). ...
... Steps 2 and 3 are repeated until the strong clustering is finally obtained. The version of the program by U.P.P.C., whose input was the absolute values of the correlation matrix, used these variables for computation (Johnson, 1967). ...
Preprint
Full-text available
Reducing the number of key components defining the Furniture Manufacturing Parameters in Quebec Canada
... where {CPU, RAM, ..., IOPS} denotes the negotiability of different resource dimensions as defined in the strategies listed above. A range of standard ML clustering algorithms such as k-means [16] and hierarchical clustering [18] can then be executed on the resulting data in order to profile customers into different groups. ...
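As a rough illustration of the profiling step described in this excerpt, the sketch below clusters hypothetical per-customer negotiability vectors with k-means and agglomerative hierarchical clustering from scikit-learn; the feature layout and data are assumptions, not taken from the cited work.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# Hypothetical per-customer negotiability scores for {CPU, RAM, ..., IOPS}.
rng = np.random.default_rng(0)
X = rng.random((200, 4))  # 200 customers, 4 resource dimensions

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="average").fit_predict(X)
```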
Preprint
Full-text available
Selecting the optimal cloud target to migrate SQL estates from on-premises to the cloud remains a challenge. Current solutions are not only time-consuming and error-prone, requiring significant user input, but also fail to provide appropriate recommendations. We present Doppler, a scalable recommendation engine that provides right-sized Azure SQL Platform-as-a-Service (PaaS) recommendations without requiring access to sensitive customer data and queries. Doppler introduces a novel price-performance methodology that allows customers to get a personalized rank of relevant cloud targets solely based on low-level resource statistics, such as latency and memory usage. Doppler supplements this rank with internal knowledge of Azure customer behavior to help guide new migration customers towards one optimal target. Experimental results over a 9-month period from prospective and existing customers indicate that Doppler can identify optimal targets and adapt to changes in customer workloads. It has also found cost-saving opportunities among over-provisioned cloud customers, without compromising on capacity or other requirements. Doppler has been integrated and released in the Azure Data Migration Assistant v5.5, which receives hundreds of assessment requests daily.
... We further explored potential non-linearities and variance within the dataset using a hierarchical clustering method. Hierarchical clustering is a method of grouping respondents together so that the within-group variance in means for each clustering variable is minimized and the between-group variance is maximized (Johnson 1967). Because we hypothesized that people's experiences of climate stressors were not homogeneous throughout the RMI population, the clusters were constructed using variables related to whether a household had experienced climate stressors, respondents' perceptions of the state and trend in natural resources and ecosystem services, and ranking of problems associated with climate stressors and the environment. ...
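Grouping respondents so that within-group variance is minimized and between-group variance is maximized corresponds to Ward-style agglomerative clustering; the sketch below is a brief illustration under that assumption, using SciPy on a placeholder respondent-by-variable matrix rather than the actual RMI survey data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder respondent-by-variable matrix (e.g. stressor exposure,
# resource perceptions, problem rankings), standardized beforehand.
X = np.random.default_rng(1).normal(size=(199, 8))

Z = linkage(X, method="ward")                     # merges that minimize within-cluster variance
groups = fcluster(Z, t=2, criterion="maxclust")   # cut into two respondent groups
```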
Article
Full-text available
Climate change is impacting public health in the Republic of the Marshall Islands (RMI). Meanwhile, migration within the RMI and abroad is driven, in part, by access to better healthcare, and migration is also expected to be accelerated by climate change. Based on a survey of 199 RMI households, this study used logistic regression and hierarchical clustering to analyze the relationships between climate stressors, climate-related health impacts, and migration outcomes and identify vulnerable segments of the population. Climate stressors were experienced by all respondents but no significant correlations were found between stressors, health impacts, and expectation to migrate. When grouped according to the climate stressors they faced, however, one group was characterized by low stressors, high wealth, and a low expectation to migrate, whereas another experienced very high climate stressors, low wealth, and a high expectation to migrate. Only the first exhibited a statistically significant relationship between climate-related health impacts and migration; however, for the second, climate stressors were significantly related to proximate determinants of health, and there was no association with migration. To create equitable adaptation outcomes across a diverse society, policies should expand economic and education prospects and reduce vulnerability to the direct and indirect health impacts of climate change. Graphical Abstract: Households that were surveyed in the Marshall Islands have experienced many climate stressors and direct impacts to health, as well as the determinants of health, in recent years.
... The basic idea of hierarchical clustering is bottom-up merging, layer by layer [7]. The clustering process starts with each point being a separate class; then the two classes with the highest similarity are merged, and this step is repeated iteratively. ...
Article
Full-text available
The direction-based label propagation clustering (DBC) algorithm needs to set the number of neighbors (k) and the angle value (degree), both of which are highly sensitive. Moreover, the DBC algorithm is not suitable for datasets with an uneven neighbor density distribution. To overcome the above problems, we propose an improved DBC algorithm based on adaptive angle and label redistribution (ALR-DBC). The ALR-DBC algorithm no longer takes the parameter degree as input, but dynamically adjusts the deviation angle through the concept of high-low density regions to determine the receiving range. This flexible receiving range is no longer affected by the uneven distribution of neighbor density. Finally, those points that do not meet the expectations of the main direction are redistributed. Experiments show that the ALR-DBC algorithm performs better than the DBC algorithm on most artificial datasets and real datasets. It is also superior to the classical algorithms listed. It also yields good experimental results when applied to wireless sensor data annotation.
... Notably, passthrough filtering is essential only for considering the object cluster points that belong to the space of interest. Next, point clusters are extracted using the hierarchical Euclidean clustering method [32]. ...
Article
Full-text available
The detection of human hand intrusion is crucial for improving the productivity of human–robot collaboration. Recent trends in safety-related sensors have adopted the concept of radar-based human presence detection. Some of them have already been commercialized. However, it has not yet been elucidated whether radars can sufficiently detect human hand intrusion and satisfy the required safety integrity level. In this study, we present outlier and error profiles obtained by detecting human hand intrusion based on the motion and speeds of the human hand and radar sensor. Our experiments indicate that slow hand movements require further studies from the viewpoint of safety. In addition, we suggest a new noise model demonstrating random noise and temporary outliers based on the recorded noise profiles of actual human participants. The obtained outlier and error profiles can be utilized as an outlier judgement criterion for sensing in the specific case of the radar.
... We consider three regions (northwest, southwest, and east Mongolia) in Mongolia, as shown in Fig. 2, based on clusters proposed by previous studies (Lall et al., 2016). These spatial clusters are based on the mortality data at the soum (county) level from 1972 to 2010, using hierarchical clustering (Johnson, 1967), which were adjusted with the spatial patterns of the Mongolian topography, climate zones, and mean precipitation in growing seasons. It is reasonable to use these clusters because the objective of the study is to improve risk analysis of dzud and mortality of livestock in Mongolia. ...
Article
Full-text available
Mass livestock mortality events during severe winters, a phenomenon that Mongolians call dzud, cause the country significant socioeconomic problems. Dzud is an example of a compound event, meaning that multiple climatic and social drivers contribute to the risk of occurrence. Existing studies argue that the frequency and intensity of dzud events are rising due to the combined effects of climate change and variability, most notably summer drought and severe winter conditions, on top of socioeconomic dynamics such as overgrazing. Summer droughts are a precondition for dzud because scarce grasses cause malnutrition, making livestock more vulnerable to harsh winter conditions. However, studies investigating the association between climate and dzud typically look at a short time frame (i.e., after 1940), and few have investigated the risk or the recurrence of dzud over a century-scale climate record. This study aims to fill the gaps in technical knowledge about the recurrence probability of dzud by estimating the return periods of relevant climatic variables: summer drought conditions and winter minimum temperature. We divide the country into three regions (northwest, southwest, and east Mongolia) based on the mortality index at the soum (county) level. For droughts, our study uses as a proxy the tree-ring-reconstructed Palmer drought severity index (PDSI) for three regions between 1700–2013. For winter severity, our study uses observational data of winter minimum temperature after 1901 while inferring winter minimum temperature in Mongolia from instrumental data in Siberia that extend to the early 19th century. Using a generalized extreme value distribution with time-varying parameters, we find that the return periods of drought conditions vary over time, with variability increasing for all the regions. Winter temperature severity, however, does not change with time. The median temperature of the 100-year return period for winter minimum temperature in Mongolia over the past 300 years is estimated as −26.08 °C for the southwest, −27.99 °C for the northwest, and −25.31 °C for the east. The co-occurrence of summer drought and winter severity increases in all the regions in the early 21st century. The analysis suggests that a continued trend in summer drought would lead to increased vulnerability and malnutrition. Prospects for climate index insurance for livestock are also discussed.
... The basic idea of the hierarchical clustering algorithm [17] is to construct the hierarchical relationship between data for clustering. The obtained clustering result has the characteristics of a tree structure, which is called a clustering tree. ...
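The clustering tree mentioned here is conventionally visualized as a dendrogram. Below is a small illustrative sketch, on synthetic data, of building and drawing one with SciPy; the data and linkage choice are assumptions for demonstration only.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(2).random((10, 2))   # illustrative 2-D points
Z = linkage(X, method="complete")              # build the clustering tree
dendrogram(Z, labels=[f"p{i}" for i in range(10)])
plt.show()
```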
Article
Full-text available
Aiming to resolve the problems of the traditional hierarchical clustering algorithm that cannot find clusters with uneven density, requires a large amount of calculation, and has low efficiency, this paper proposes an improved hierarchical clustering algorithm (referred to as PRI-MFC) based on the idea of population reproduction and fusion. It is divided into two stages: fuzzy pre-clustering and Jaccard fusion clustering. In the fuzzy pre-clustering stage, it determines the center point, uses the product of the neighborhood radius eps and the dispersion degree fog as the benchmark to divide the data, uses the Euclidean distance to determine the similarity of the two data points, and uses the membership grade to record the information of the common points in each cluster. In the Jaccard fusion clustering stage, the clusters with common points are the clusters to be fused, and the clusters whose Jaccard similarity coefficient between the clusters to be fused is greater than the fusion parameter jac are fused. The common points of the clusters whose Jaccard similarity coefficient between clusters is less than the fusion parameter jac are divided into the cluster with the largest membership grade. A variety of experiments are designed from multiple perspectives on artificial datasets and real datasets to demonstrate the superiority of the PRI-MFC algorithm in terms of clustering effect, clustering quality, and time consumption. Experiments are carried out on Chinese household financial survey data, and the clustering results that conform to the actual situation of Chinese households are obtained, which shows the practicability of this algorithm.
... Advantage of LSA in comparison with hierarchical [29] methods: ...
Article
Full-text available
The main goal in cluster analysis is finding groups that naturally exist in any given data. Conceptually, observations are generated from a Probability Density Function (PDF). The number of clusters, their shapes, sizes, and densities are dictated by the PDF from which the data are sampled, i.e., level sets of the PDF play a fundamental role in cluster analysis. In this article, a novel cluster analysis algorithm called Level Set Analysis (LSA) is proposed which is based on the theory of the geometry of PDF level sets. LSA presents a unique style of data visualization which labels data points either as distinct clusters or outliers. In LSA, a non-parametric model is trained on data that tries to replicate the PDF. It provides a tool in both exploratory and predictive data mining.
... We applied two clustering techniques: Agglomerative Hierarchical Clustering (AHC) and K-means (Hartigan & Wong, 1979; Johnson, 1967). ...
Article
Spatial prepositions have been studied in some detail from multiple disciplinary perspectives. However, neither the semantic similarity of these prepositions, nor the relationships between the multiple senses of different spatial prepositions, are well understood. In an empirical study of 24 spatial prepositions, we identify the degree and nature of semantic similarity and extract senses for three semantically similar groups of prepositions using t-SNE, DBSCAN clustering, and Venn diagrams. We validate the work by manual annotation with another data set. We find nuances in meaning among proximity and adjacency prepositions, such as the use of close to instead of near for pairs of lines, and the importance of proximity over contact for the next to preposition, in contrast to other adjacency prepositions.
... The single-linkage method is the oldest model, developed by Polish researchers in the 1950s (Murtagh and Contreras 2012). It was first defined by Florek et al. (1951) and later by Sneath (1957) and Johnson (1967). The distance between two clusters (C1) and (C2∪C3) is defined as the minimum distance between any sample in one cluster and any sample in the other (Everitt et al. 2011) and can be obtained by equation 4 (Carvalho et al. 2019). ...
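Restated in the excerpt's notation, the single-linkage distance between cluster C1 and the merged cluster C2∪C3 is the smallest pairwise distance across the two sets; the identity below is a standard restatement of that rule, not a reproduction of the cited equation 4.

```latex
d(C_1,\, C_2 \cup C_3)
  = \min_{x \in C_1,\; y \in C_2 \cup C_3} d(x, y)
  = \min\bigl( d(C_1, C_2),\, d(C_1, C_3) \bigr)
```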
Article
The aim of this study is to compare hierarchical clustering methods by the Cophenetic Correlation Coefficient (CCC) when there is big data. For this purpose, after giving information about big data, clustering methods and the CCC, analyses are carried out on the related data set. The 2015 air travel consumer report, which was used in the application part of the study and published by the US Department of Transportation, was used as big data. Libraries of the Python programming language installed on an Amazon cloud server, which includes open-source big data technologies, were used for data analysis. Since the study involves big data, in order to save time and resources, the variables used in the study were first reduced by a feature selection method, standardized, and analyzed over the final 4 different data sets. As a result of the clustering analysis, it was observed that the highest CCC was obtained with the average linkage clustering method for all four of these data sets.
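The cophenetic correlation coefficient compared in this study can be computed directly from a linkage matrix; the sketch below is illustrative only, using random data rather than the air-travel consumer report, and the listed linkage methods are examples.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

X = np.random.default_rng(3).random((50, 5))   # placeholder data
d = pdist(X)                                   # original pairwise distances
for method in ("single", "complete", "average", "ward"):
    Z = linkage(d, method=method)
    ccc, _ = cophenet(Z, d)                    # correlation between d and the cophenetic distances
    print(method, round(ccc, 3))
```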
... This is useful when working with large amounts of data, where labeling every data point is infeasible. Some examples of unsupervised learning methods include k-means clustering [122,123], hierarchical clustering [124,125], DBSCAN [126], isolation forests [127], principal component analysis [128], autoencoders [107,108], locally linear embedding [129], and expectation-maximization algorithms [130]. ...
Chapter
Full-text available
Machine learning is a subfield of artificial intelligence which combines sophisticated algorithms and data to develop predictive models with minimal human interference. This chapter focuses on research that trains machine learning models to study antimicrobial resistance and to discover antimicrobial drugs. An emphasis is placed on applying machine learning models to detect drug resistance among bacterial and fungal pathogens. The role of machine learning in antibacterial and antifungal drug discovery and design is explored. Finally, the challenges and prospects of applying machine learning to advance basic research on and treatment of antimicrobial resistance are discussed. Overall, machine learning promises to advance antimicrobial resistance research and to facilitate the development of antibacterial and antifungal drugs.
... We use a transposed version of the dataset in order to represent the features as the to-be clustered items. • Hierarchical clustering [46] is a hierarchical algorithm that merges the most similar elements into the same group until obtaining a full tree. Cutting the obtained hierarchy produces a disjoint partition. ...
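Cutting the full tree at a chosen number of clusters yields the disjoint partition described in this excerpt. The brief sketch below clusters the columns of a transposed (records × features) matrix, as the excerpt does; the data, linkage method, and cut level are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

X = np.random.default_rng(4).random((100, 12))   # 100 records, 12 features
Z = linkage(X.T, method="average")               # cluster the features (transposed data) into a full tree
partition = cut_tree(Z, n_clusters=4).ravel()    # cut the hierarchy into disjoint feature groups
```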
Article
Full-text available
Dimension reduction methods are effective for tackling the complexity of models learning from high-dimensional data. Usually, they are presented as a black box, where the reduction process is unknown to the practitioners. Yet, this process potentially provides a reliable framework for understanding the regularities behind the data. Furthermore, in some application contexts, the available datasets suffer from a severe lack of records. Therefore, both classical and deep dimension reduction methods often fall into the over-fitting trap. We propose to tackle these challenges under the Bayesian network paradigm associated with latent variable learning. We propose an interpretable framework for learning a reduced dimension while ensuring effectiveness against the curse of dimensionality. Our exhaustive experimental results, over benchmark datasets, prove that our dimension reduction algorithm yields a user-friendly model that not only minimizes the information loss due to the reduction process, but also escapes data overfitting due to the lack of records.
... These types of models are used to find natural groupings of data, known as clusters. Common clustering algorithms are k-means [59], hierarchical clustering [60] and the expectation-maximisation algorithm [61]. Figure 3.7 presents a typical structure of a cluster-based model. ...
Thesis
Full-text available
Nowadays there is a need for ever more sophisticated circuits. To reduce operating costs and facilitate mass production of integrated circuits, design companies outsource their fabrication to third-party foundries. This process increases the risk of intrusion attacks in the form of hardware viruses, also known as hardware trojans (HTs). HTs are a critical problem that has the potential to become an outbreak in the coming years, presenting a significant threat both technologically and socially. The majority of studies are concerned with the development of countermeasures against HTs for Field-Programmable Gate Array (FPGA) circuits at the post-silicon stage. There is limited information and few published studies for Application-Specific Integrated Circuits (ASICs), specifically for the pre-silicon stage. ASICs are challenging due to the variety of design phases, especially at the pre-silicon stage, and the need for professional tools for the design of each phase. In this thesis, we studied several phases of the ASIC design process and found that there is a general lack of free benchmark circuits as well as a high imbalance between uninfected and infected benchmark circuits. We used and designed all of the limited benchmark circuits for the Gate-Level Netlist (GLN) phase of ASICs with a professional tool and extracted area, power and timing analysis features. We developed our Machine Learning (ML) classification models based on this limited data and observed that the lack of samples leads to imbalanced and non-robust ML-based classification approaches against HTs. We solved the problem of limited data by developing Deep Learning (DL) Generative Adversarial Network (GAN) models, which were able to synthesize newly generated data based on our real, limited data. GANs are novel DL algorithms used in the computer vision field for generating artificial images, and this was the first time GANs were used in this research field. Based on the newly generated data, we developed a robust ML-based classifier as a countermeasure against HTs at the GLN phase and compared it with existing methods. Finally, we turned our generative model into a free tool to be used as a solution for dealing with the limited amount of data.
... To build a non-redundant gene set, we first used hierarchical clustering [42] to combine the homology-based gene sets of G. gallus and T. guttata. The gene model with the highest identity to the query was preserved if a locus had been annotated with more than one gene model. ...
Article
Full-text available
Manakins are a family of small suboscine passerine birds characterized by their elaborate courtship displays, non-monogamous mating system, and sexual dimorphism. This family has served as a good model for the study of sexual selection. Here we present genome assemblies of four manakin species, including Cryptopipo holochlora, Dixiphia pipra (also known as Pseudopipra pipra ), Machaeropterus deliciosus and Masius chrysopterus , generated by Single-tube Long Fragment Read (stLFR) technology. The assembled genome sizes ranged from 1.10 Gb to 1.19 Gb, with average scaffold N50 of 29 Mb and contig N50 of 169 Kb. On average, 12,055 protein-coding genes were annotated in the genomes, and 9.79% of the genomes were annotated as repetitive elements. We further identified 75 Mb of Z-linked sequences in manakins, containing 585 to 751 genes and an ~600 Kb pseudoautosomal region (PAR). One notable finding from these Z-linked sequences is that a possible Z-to-autosome/PAR reversal could have occurred in M. chrysopterus . These de novo genomes will contribute to a deeper understanding of evolutionary history and sexual selection in manakins.
... Hierarchical clustering, also known as hierarchical cluster analysis, was used to classify 91 cities in the study area for comparative analysis. It is an algorithm that groups similar objects into what are called clusters [59,60]. In theory, hierarchical clustering begins by treating each observation as a separate cluster. ...
Article
Full-text available
The essence of sustainable urbanization is to take a holistic approach to the harmonious development of economic, social, cultural and environmental protection. This paper applies the urban sustainability assessment system to analyze the characteristics of indicators related to the quality of the built environment and environmental pressure of 91 cities in four major megalopolises in China from 2010 to 2018. It also combines statistical methods to summarize the general features of urban development through a comprehensive urban performance evaluation by comparative and classification analysis for the purpose of scientific guidance on sustainable urbanization. The comparative results showed that in terms of urban sustainability, the Yangtze River Delta performed best, followed by JingJinJi, Pearl River Delta and Shandong Peninsula. Among these, the quality of the built environment in JingJinJi needs particular attention to improve, while the environmental pressure in the Shandong Peninsula needs to decrease. Moreover, cities can be grouped into six development types through performance clustering, including three positive and three negative types. The characteristics of all types are summarized, and the performance of the specific indicators is compared in detail to serve as a guiding basis for generic recommendations on sustainable urbanization.
Chapter
Full-text available
Spray-wall interactions take place in many technical applications such as spray cooling, combustion processes, cleaning, wetting of surfaces, coating and painting, etc. The outcome of drop impact onto hot surfaces depends on a variety of parameters, for example material and thermal properties of the liquid and wall, substrate wetting properties, surrounding conditions which determine the saturation temperature, spray impact parameters and surface temperature. The aim of the current project is to improve knowledge of the underlying physics of spray-wall interactions. As an important step towards spray impact modeling, first a single drop impact onto hot substrates is considered in detail. Various regimes of single drop impact, such as thermal atomization, magic carpet breakup, nucleate boiling and thermosuperrepellency, observed at different wall temperatures, ambient pressures and impact velocities, have been investigated experimentally and modelled theoretically during the project period. The heat flux, an important parameter for spray cooling, has been modeled not only for single drop impacts but also for sprays within many regimes. The models show a good agreement with experimental data as well as data from literature.
Article
Full-text available
Sixteen genotypes of mung bean (Vigna radiata (L.) Wilczek var. radiata) were subjected to 18 treatment combinations (environments) resulting from 3 levels of N, 3 planting densities, and 2 planting times. Measurements were made on yield and its components: pods per plant, seeds per pod, and seed weight. Cluster analysis was used to provide an index of similarity of the genotypes for each character. Genetic similarity of the genotypes, as indicated by a “one-trait-at-a-time” analysis, is reflected in their phenetic similarity in an 18 dimensional space corresponding to the 18 environments. No relationship between geographic distribution and genetic diversity was obtained for all characters. Information on the diversity of the components of yield would be useful in choosing parents that yield superior progenies. Pods per plant was the most important component followed by seeds per pod, and seed weight. Selection of parents for the component characters, with regard to high performance and genetic diversity, is expected to follow the same order.
Article
Clustering ensemble deals with the inconsistent parts of base clusterings while maintaining consistency in the clustering results. The cluster core samples generally have more consistent neighbor relationship in base clusterings than the cluster edge samples, and these two types of samples have different contributions to the determination of the underlying data structure. In addition, in clustering ensemble, some samples that are equivalent in base clusterings can be merged into equivalence granularity to reduce the size of input data. Based on these ideas and combined with rough set theory, we propose a novel clustering ensemble algorithm based on the approximate accuracy of equivalence granularity. We initially define the clustering equivalence relation, extract the consistent results of base clusterings based on this relation, and upgrade the clustering ensemble from the sample level to the equivalence granularity level. Afterward, we use the approximate accuracy of the rough set to quantify the contribution of equivalence granularity in discovering the underlying data structure, divide the equivalence granularities into core and edge equivalence granularities, and implement different clustering processing strategies. The visual experiments on four synthetic data sets and comparison experiments with 14 state-of-the-art clustering ensemble algorithms on 21 data sets respectively prove the rationality and excellent performance of our proposed algorithm.
Article
Due to significant implications for resource and food sectors that directly influence social well-being, commodity price comovements represent an important issue in agricultural economics. In this study, we approach this issue by concentrating on daily prices of the corn futures market and 496 cash markets from 16 states in the United States for the period of July 2006 – February 2011 through correlation based hierarchical analysis and synchronization analysis, which allow for determining interactions and interdependence among these prices, heterogeneities in price synchronization, and their changing patterns over time. As the first study of the issue focusing on prices of the futures and hundreds of spatially dispersed cash markets for a commodity of indubitable economic significance, empirical findings show that the degree of comovements is generally higher after March 2008 but no persistent increase is observed. Different groups of cash markets are identified, each of which has its members exhibit relatively stable price synchronization over time that is generally at a higher level than the synchronization among the futures and all of the 496 cash markets. The futures is not found to show stable price synchronization with any cash market. Certain cash markets have potential of serving as cash price leaders. Results here benefit resource and food policy analysis and design for economic welfare. The empirical framework has potential of being adapted to network analysis of prices of different commodities.
Article
Clustering is the most fundamental technique for data processing. This paper presents a collaborative annealing power k-means++ clustering algorithm by integrating the k-means++ and power k-means algorithms in a collaborative neurodynamic optimization framework. The proposed algorithm starts with k-means++ to select initial cluster centers, then leverages the power k-means to find multiple sets of centers as alternatives and a particle swarm optimization rule to reinitialize the centers in the subsequential iterations for improving clustering performance. Experimental results on twelve benchmark datasets are elaborated to demonstrate the superior performance of the proposed algorithm to seven mainstream clustering algorithms in terms of 21 internal and external indices.
Article
Full-text available
Purpose: Measuring the exact technology complementarity between different institutions is necessary to obtain complementary technology resources for R&D cooperation. Design/methodology/approach: This study constructs a morphology-driven method for measuring technology complementarity, taking the medical field as an example. First, we calculate semantic similarities between subjects (S and S) and action-objects (AO and AO) based on the Metathesaurus, forming clusters of S and AO based on a semantic similarity matrix. Second, we identify key technology issues and methods based on clusters of S and AO. Third, a technology morphology matrix of several dimensions is constructed using morphology analysis, and the matrix is filled with subject-action-object (SAO) structures according to corresponding key technology issues and methods for different institutions. Finally, the technology morphology matrix is used to measure the technology complementarity between different institutions based on SAO. Findings: The improved technology complementarity method based on SAO is more of a supplementary and refined framework for the traditional IPC method. Research limitations: In future studies we will reprocess and identify the SAO structures which were not in the technology morphology matrix, and find other methods to characterize key technical issues and methods. Furthermore, we will add a comparison between the proposed method and the traditional, most widely used complementarity measurement method based on industry chain and industry code. Practical implications: This study takes the medical field as an example; the morphology-driven method for measuring technology complementarity can be migrated and applied to any given field. Originality/value: From the perspective of complementary technology resources, this study develops and tests a more accurate morphology-driven method for technology complementarity measurement.
Article
We propose two new strategies based on Machine Learning techniques to handle polyhedral grid refinement, to be possibly employed within an adaptive framework. The first one employs the k-means clustering algorithm to partition the points of the polyhedron to be refined. This strategy is a variation of the well-known Centroidal Voronoi Tessellation. The second one employs Convolutional Neural Networks to classify the “shape” of an element so that “ad-hoc” refinement criteria can be defined. This strategy can be used to enhance existing refinement strategies, including the k-means strategy, at a low online computational cost. We test the proposed algorithms considering two families of finite element methods that support arbitrarily shaped polyhedral elements, namely the Virtual Element Method (VEM) and the Polygonal Discontinuous Galerkin (PolyDG) method. We demonstrate that these strategies do preserve the structure and the quality of the underlying grids, reducing the overall computational cost and mesh complexity.
Article
Full-text available
Xanthomonas citri pv. citri, a Gram-negative bacterium, is the causal agent of citrus canker, a significant threat to citrus production. Understanding of global expansion of the pathogen and monitoring introduction into new regions are of interest for integrated disease management at the local and global level. Genetic diversity can be assessed using genomic approaches or information from partial gene sequences, satellite markers or clustered regularly interspaced short palindromic repeats (CRISPR). Here, we compared CRISPR loci from 355 strains of X. citri pv. citri, including a sample from ancient DNA, and generated the genealogy of the spoligotypes, i.e., the absence/presence patterns of CRISPR spacers. We identified 26 novel spoligotypes and constructed their likely evolutionary trajectory based on the whole-genome information. Moreover, we analyzed ~30 additional pathovars of X. citri and found that the oldest part of the CRISPR array was present in the ancestor of several pathovars of X. citri. This work presents a framework for further analyses of CRISPR loci and allows drawing conclusions about the global spread of the citrus canker pathogen, as exemplified by two introductions in West Africa.
Chapter
Full-text available
We describe a modelling approach for the simulation of droplet dynamics in strong electric fields. The model accounts for electroquasistatic fields, convective and conductive currents, contact angle dynamics and charging effects associated with droplet breakup processes. Two classes of applications are considered. The first refers to the problem of water droplet oscillations on the surface of outdoor high-voltage insulators. The contact angle characteristics resulting from this analysis provide a measure for the estimation of the electric field inception thresholds for electrical discharges on the surface. The second class of applications consists of the numerical characterization of electrosprays. Detailed simulations confirm the scaling law for the first electrospray ejection and, furthermore, provide insight into the charge-radius characteristics for transient as well as steady state electrosprays.
Chapter
Full-text available
The modelling of liquid–vapour flow with phase transition poses many challenges, both on the theoretical level, as well as on the level of discretisation methods. Therefore, accurate mathematical models and efficient numerical methods are required. In that, we focus on two modelling approaches: the sharp-interface (SI) approach and the diffuse-interface (DI) approach. For the SI-approach, representing the phase boundary as a co-dimension-1 manifold, we develop and validate analytical Riemann solvers for basic isothermal two-phase flow scenarios. This ansatz becomes cumbersome for increasingly complex thermodynamical settings. A more versatile multiscale interface solver, that is based on molecular dynamics simulations, is able to accurately describe the evolution of phase boundaries in the temperature-dependent case. It is shown to be even applicable to two-phase flow of multiple components. Despite the successful developments for the SI approach, these models fail if the interface undergoes topological changes. To understand merging and splitting phenomena for droplet ensembles, we consider DI models of second gradient type. For these Navier–Stokes–Korteweg systems, that can be seen as a third order extension of the Navier–Stokes equations, we propose variants that are more accessible to standard numerical schemes. More precisely, we reformulate the capillarity operator to restore the hyperbolicity of the Euler operator in the full system.
Chapter
Full-text available
The origin of rebound suppression of an impacting droplet by a small amount of polymer additive has been tentatively explained by various physical concepts including the dynamic surface tension, the additional energy dissipation by non-Newtonian elongational viscosity, the elastic force of stretched polymer, and the additional friction on a receding contact line. To better understand the role of polymer on a molecular level, we performed multi-body dissipative particle dynamics simulations of droplets impacting on solvophobic surfaces. The rebound suppression is achieved by the elastic force of stretched polymer during the hopping stage, and the additional friction on the contact line during the retraction stage. Both slow-hopping and slow-retraction mechanisms coexist in a wide range of simulation parameters, but the latter prevails for large droplets and for strong attraction between polymer and surface. The increased polymer adsorption, which may be achieved by a higher polymer concentration or a larger molecular weight, stimulates both mechanisms. Also, the molecular evidence of the additional friction on the receding contact line is shown from the relation between the contact angle and the contact line velocity, where the slope of the fitted line is an indication of the additional friction.
Chapter
Full-text available
Drop impact on a hot surface heated above the saturation temperature of the fluid plays an important role in spray cooling. The heat transferred from the wall to the fluid is closely interrelated with drop hydrodynamics. If the surface temperature is below the Leidenfrost temperature, the heat transport strongly depends on the transport phenomena in the vicinity of the three-phase contact line. Due to extremely high local heat flux, a significant fraction of the total heat flow is transported through this region. The local transport processes near the three-phase contact line, and, therefore, the total heat transport, are determined by the wall superheat, contact line velocity, system pressure, fluid composition, surface structure and physical properties on the wall. The effect of the aforementioned influencing parameters on fluid dynamics and heat transport during evaporation of a single meniscus in a capillary slot are studied in a generic experimental setup. The hydrodynamics and evolution of wall heat flux distribution during the impact of a single drop onto a hot wall are also studied experimentally by varying the impact parameters, wall superheat, system pressure, and wall topography. In addition, the fluid dynamics and heat transport behavior during vertical and horizontal coalescence of multiple drops on a heated surface are studied experimentally.
Article
Path choice set identification is essential for route choice modelling and travel behaviour studies. Recent advancements in data collection techniques have drawn attention to data-driven choice set identification processes. However, empirical vehicle trajectory datasets result in many more path observations than traditional algorithms, complicating the route choice modelling process. This study proposes a bi-level vehicle trajectory clustering framework where the output of the upper-level clustering provides a representative path choice set for simple/mixed logit modelling (MNL), whereas the lower-level clustering provides a nested or cross-nested representation of the paths based on hard and soft clustering, respectively. As proof of concept, the proposed methodology is applied to real Bluetooth-based trajectories from Brisbane, where 62 unique paths were observed from one year of trajectory data for an origin–destination pair. The results of the MNL model for the representative paths show the desired behaviour, with negative coefficients for the distance and travel time path attributes. Further, the results of the (cross) nested modelling appropriately identified the (cross) nested structure for the path choice set.
Chapter
Full-text available
The computation of two-phase flow scenarios in a high pressure and temperature environment is a delicate task, for both the physical modeling and the numerical method. In this article, we present a sharp interface method based on a level-set ghost fluid approach. Phase transition effects are included by the solution of the two-phase Riemann problem at the interface, supplemented by a phase transition model based on classical irreversible thermodynamics. We construct an exact Riemann solver, as well as an approximate Riemann solver. We compare numerical results against molecular dynamics data for an evaporation shock tube and a stationary evaporation case. In both cases, our numerical method shows a good agreement with the reference data.
Chapter
Full-text available
High-voltage composite insulators are specially designed to withstand different environmental conditions to ensure a reliable and efficient electric power distribution and transmission. Especially, outdoor insulators are exposed to rain, snow or ice, which might significantly affect the performance of the insulators. The interaction of sessile water droplets and electric fields is investigated under various boundary conditions. Besides the general behavior of sessile droplets, namely the deformation and oscillation, the inception field strength for partial discharges is examined depending on the droplet volume, strength and frequency of the electric field and the electric charge. Particularly, the electric charge is identified to significantly affect the droplet behavior as well as the partial discharge inception field strength. In addition to ambient conditions, the impact of electric fields on ice nucleation is investigated under well-defined conditions with respect to the temperature and electric field strength. High electric field strengths are identified to significantly promote ice nucleation, especially in case of alternating and transient electric fields. Different influencing factors like the strengths, frequencies and time constants of the electric fields are investigated. Consequently, the performed experiments enhance the knowledge of the behavior of water droplets under the impact of electric fields under various conditions.
Article
Neural Architecture Search (NAS) was meant to bring Machine Learning to the masses. But ironically, because of its high resource requirements, it remained exclusive to the elite. After several efficiency enhancements, its most efficient version (ENAS) found a place across some commonly used Deep Learning libraries, but it still could not gain mass popularity. Especially in the field of malware forensics, there exists no popular implementation of NAS. AutoML, as it stands today, comprises NAS and hyperparameter tuning as sub-domains. Yet, from both effort and impact perspectives, the data dimension carries 80% of the weight in an ML problem, and this data dimension is currently missing from AutoML. In forensics, optimal sample discovery may have more impact than optimal model discovery. Therefore, in this paper, we propose Neural Sample Search (NSS) using DRo, to bring the data discovery dimension into AutoML. Further, we prove that, for malware forensics, NSS outperforms all expert-curated and NAS-suggested models by an exceptionally large margin. This gains further significance, as the baseline expert model had over 6700% higher neural inference complexity than the NSS model and was curated with the efforts of several forensic experts across several years to reach that performance level; the Efficient-NAS model had (ironically) over 100,000% higher neural inference complexity than the proposed NSS mechanism. With such high performance at such a minimal model footprint and complexity, we can claim that by including NSS, AutoML can truly be ready for mass adoption in the field of malware forensics.
Article
Network traffic identification is of great value for network management. Typically, intelligent classifiers are deployed for advanced network protection, including intrusion traffic detection, network behavior monitoring, etc. Robust network traffic classifiers with generalization ability should also ensure stable performance in various network environments. Unfortunately, although existing network traffic classifiers can achieve the claimed performance when initialized and tested in invariant network environments with stable attribute distributions, they are inclined to fail when adapting to varying practical networks and suffer from significant performance degradation. Based on an analysis of representative state-of-the-art classifiers with respect to their transferability, we find that the feature distribution of the same class is vulnerable to changes in the network environment, which accounts for the degradation of most existing methods. To tackle these issues, we propose a weakly-supervised network traffic classification method based on graph matching. Specifically, network sessions are aggregated into several clusters with the extracted principal features through weakly-supervised clustering. Moreover, we measure the correlations between clusters from the same networks to construct the similarity graphs. Clusters from the testing network are associated with those from the initial network by the carefully designed graph matching algorithm, so that the testing clusters can be labeled according to the associated ones in the initialized network. Our method shows eye-catching robustness, achieving an accuracy of 88.19% when practically deployed in different networks, which significantly outperforms the existing approaches.
Article
Accurately predicting the next shopping basket of a customer is important for retailers, as it offers an opportunity to serve customers with personalized product recommendations or shopping lists. The goal of next-basket prediction is to predict a coherent set of products that the customer will buy next, rather than just a single product. However, if the assortment of the retailer contains thousands of products, the number of possible baskets becomes extremely large and most standard choice models can no longer be applied. Therefore, we propose the use of a gated recurrent unit (GRU) network for next-basket prediction in this study, which is easily scalable to large assortments. Our proposed model is able to capture dynamic customer taste, recurrency in purchase behavior and frequent product co-occurrences in shopping baskets. Moreover, it allows for the inclusion of additional covariates. Using two real-life datasets, we demonstrate that our model is able to outperform both naive benchmarks and a state-of-the-art next-basket prediction model on several performance measures. We also illustrate that the model learns meaningful patterns about the retailer’s assortment structure.
Article
Small public spaces are important for citizens to live and socialize with a high utilization rate. The vitality of small public space plays an important role in evaluating space quality and attraction and provides reference for urban governance issues such as vitality evaluation of public space, quality optimization, and site micro-renewal. Previous studies of vitality based on low-throughput surveys or big data with low positioning accuracy are not suitable for the high-efficiency study of small public space. In this study, a systematic framework of vitality quantification in small public spaces is built on fine-grained human trajectories extracted from videos for more efficient and refined human-oriented vitality evaluation. A multi-indicator vitality quantification method is first proposed to comprehensively represent human vitality, including number of people, duration of stay, motion speed, trajectory diversity and trajectory complexity. Furthermore, a video dataset of small public space along with our sub-index-assisted expert assessing scheme is proposed to evaluate our vitality quantification framework. Finally, we analyze the correlation between quantitative vitality indicators and the expert-assessing vitality through multiple linear regression and obtain the optimal vitality quantification model. The experimental results indicate that our dataset is reliable and the vitality quantification model constructed with our quantitative indicators can better characterize urban vitality than the previous model based on number of people and staying time.
Article
Full-text available
The steadily growing computational power employed to perform molecular dynamics simulations of biological macromolecules represents at the same time an immense opportunity and a formidable challenge. In fact, large amounts of data are produced, from which useful, synthetic, and intelligible information has to be extracted to make the crucial step from knowing to understanding. Here we tackled the problem of coarsening the conformational space sampled by proteins in the course of molecular dynamics simulations. We applied different schemes to cluster the frames of a dataset of protein simulations; we then employed an information-theoretical framework, based on the notion of resolution and relevance, to gauge how well the various clustering methods accomplish this simplification of the configurational space. Our approach allowed us to identify the level of resolution that optimally balances simplicity and informativeness; furthermore, we found that the most physically accurate clustering procedures are those that induce an ultrametric structure of the low-resolution space, consistently with the hypothesis that the protein conformational landscape has a self-similar organisation. The proposed strategy is general and its applicability extends beyond that of computational biophysics, making it a valuable tool to extract useful information from large datasets.
Article
In 2015, Driemel, Krivošija and Sohler introduced the (k, ℓ)-median clustering problem for polygonal curves under the Fréchet distance. Given a set of input curves, the problem asks to find k median curves of at most ℓ vertices each that minimize the sum of Fréchet distances over all input curves to their closest median curve. A major shortcoming of their algorithm is that the input curves are restricted to lie on the real line. In this paper, we present a randomized bicriteria-approximation algorithm that works for polygonal curves in \(\mathbb{R}^d\) and achieves approximation factor (1 + ϵ) with respect to the clustering costs. The algorithm has worst-case running time linear in the number of curves, polynomial in the maximum number of vertices per curve, i.e. their complexity, and exponential in d, ℓ, 1/ϵ and 1/δ, where δ is the failure probability. We achieve this result through a shortcutting lemma, which guarantees the existence of a polygonal curve with similar cost as an optimal median curve of complexity ℓ, but of complexity at most 2ℓ − 2, and whose vertices can be computed efficiently. We combine this lemma with the superset sampling technique by Kumar et al. to derive our clustering result. In doing so, we describe and analyze a generalization of the algorithm by Ackermann et al., which may be of independent interest.
Chapter
Full-text available
Physics of supercritical fluids is extremely complex and not yet fully understood. The importance of the presented investigations into the physics of supercritical fluids is twofold. First, the presented approach links the microscopic dynamics and macroscopic thermodynamics of supercritical fluids. Second, free falling droplets in a near to supercritical environment are investigated using spontaneous Raman scattering and a laser induced fluorescence/phosphorescence thermometry approach. The resulting spectroscopic data are employed to validate theoretical predictions of an improved evaporation model. Finally, laser induced thermal acoustics is used to investigate acoustic damping rates in the supercritical region of pure fluids.
Chapter
Full-text available
The present paper aims at developing a generally valid, consistent numerical description of a turbulent multi-component two-phase flow that experiences processes that may occur under both subcritical and trans-critical or supercritical operating conditions. Within an appropriate LES methodology, focus is put on an Euler-Eulerian method that includes multi-component mixture properties along with the phase change process. Thereby, the two-phase fluid is considered as a multi-component mixture in which the real fluid properties are accounted for by a composite Peng-Robinson (PR) equation of state (EoS), so that each phase is governed by its own PR EoS. The suggested numerical modelling approach is validated by simulating the disintegration of an elliptic jet of supercritical fluoroketone injected into a helium environment. Qualitative and quantitative analyses are carried out. The results show a significant coupled effect of turbulence and thermodynamics on the jet disintegration and the mixing processes. In particular, comparisons between the numerical predictions and available experimental data, provided in terms of penetration length, fluoroketone density, and jet spreading angle, show good agreement and attest to the performance of the proposed model at elevated pressures and temperatures. Further aspects of the transcritical jet flow case, as well as a comparison with an Eulerian-Lagrangian approach extended to integrate the arising effects of vanishing surface tension in evolving sprays, are left for future work.
Chapter
Full-text available
Rocket engine manufacturers attempt to replace toxic, hypergolic fuels by less toxic substances such as cryogenic hydrogen and oxygen. Such components will be superheated when injected into the combustion chamber prior to ignition. The liquids will flash evaporate and subsequent mixing will be crucial for a successful ignition of the engine. We now conduct a series of DNS and RANS-type simulations to better understand this mixing process including microscopic processes such as bubble growth, bubble-bubble interactions, spray breakup dynamics and the resulting droplet size distribution. Full scale RANS simulations provide further insight into effects associated with flow dynamic such as shock formation behind the injector outlet. Capturing these gas dynamic effects is important, as they affect the spray morphology and droplet movements.
Chapter
Full-text available
A fundamental understanding of droplet dynamics is important for the prediction and optimization of technical systems involving drops and sprays. The Collaborative Research Center (CRC) SFB-TRR 75 was established in January 2010 to focus on the dynamics of basic drop processes, and in particular on processes involving extreme ambient conditions, for example near thermodynamic critical conditions, at very low temperatures, under the influence of strong electric fields, or in situations involving extreme gradients of the boundary conditions. The goal of the CRC was to gain a profound physical understanding of the essential processes, which is the basis for new analytical and numerical descriptions as well as for improved predictive capabilities. This joint initiative involved scientists at the University of Stuttgart, the TU Darmstadt, the TU Berlin, and the German Aerospace Center (DLR) in Lampoldshausen. This first chapter provides a brief overview of the overall structure of this CRC as well as a summary of some selected scientific achievements of the subprojects involved. For further details the reader is referred to the subsequent chapters of this book related to the individual subprojects.
Chapter
Full-text available
Substituting the toxic hydrazine used in current high-altitude rocket engines, such as upper stages or reaction control thrusters, with green propellants is a key driver in the current technology development of rocket propulsion systems. Operating these kinds of rocket engines at high altitude leads to a sudden pressure drop in the liquid propellants during their injection into the combustion chamber, which contains a near-vacuum atmosphere prior to ignition. The resulting superheated thermodynamic state of the liquid causes fast, eruptive evaporation, which is called flash boiling. The degree of atomisation is important for successful ignition and safe operation of the rocket engine. The development and operation of a cryogenic high-altitude test bench at DLR Lampoldshausen enables the systematic experimental characterization of cryogenic flash boiling, owing to its ability to adjust and control injection parameters such as temperature, pressure and geometry. Several test campaigns with liquid nitrogen (LN2) were performed using two optical diagnostic methods: first, flash-boiling LN2 spray patterns were visualised by means of high-speed shadowgraphy; second, droplet size and velocity distributions in strongly superheated LN2 sprays were determined with a laser-based phase Doppler system (PDA). The experimental data generated within these measurement campaigns provide well-defined boundary conditions as well as a broad database for the numerical modelling of cryogenic flash boiling, as in, e.g., the publications [8, 9].
Chapter
Full-text available
Due to the availability of powerful computers and efficient algorithms, physical processes occurring at the micrometer scale can nowadays be studied with atomistic simulations. In the framework of the Collaborative Research Center SFB-TRR 75 “Droplet dynamics under extreme ambient conditions”, investigations of the mass transport across vapour-liquid interfaces are conducted. Non-equilibrium molecular dynamics simulation is employed to study single- and two-phase shock tube scenarios for a simple noble-gas-like fluid. The generated data show excellent agreement with computational fluid dynamics simulations. Further, the particle and energy fluxes during evaporation are sampled and analysed with respect to their dependence on the interface temperature, employing a newly developed method that ensures a stationary process. In this context, the interface properties between liquid nitrogen and hydrogen under strong gradients of temperature and composition are investigated. Moreover, the Fick diffusion coefficient of strongly diluted species in supercritical CO2 is predicted by equilibrium molecular dynamics simulation and the Green-Kubo formalism. These results are employed to assess the performance of several predictive equations from the literature.
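As background on the Green-Kubo route mentioned here (the authors' working expressions are not reproduced in the abstract), the diffusion coefficient of a tagged molecule follows from the time integral of its velocity autocorrelation function; for a strongly diluted species, the Fick diffusion coefficient approaches this self-diffusion value:

\[
D = \frac{1}{3}\int_0^{\infty} \bigl\langle \mathbf{v}_i(0)\cdot\mathbf{v}_i(t) \bigr\rangle \,\mathrm{d}t,
\]

where v_i is the velocity of solute molecule i and the angle brackets denote an equilibrium ensemble average.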
Chapter
Full-text available
The complexity of binary droplet collisions increases strongly in the case of immiscible liquids, where triple lines occur, and for highly energetic collisions, where strong rim instabilities lead to the spattering of satellite droplets. To cope with such cases, the Volume of Fluid method is extended by an efficient interface reconstruction, also applicable to multi-material cells of arbitrary configuration, as well as by an enhanced continuous surface stress model for accurate surface force computations, also applicable to thin films. For collisions of fully wetting liquids, excellent agreement with experimental data is achieved in different collision regimes. High-resolution simulations predict droplet collisions in the spattering regime and provide detailed insights into the evolution of the rim instability. Another challenge is the numerical prediction of the collision outcome in the bouncing or coalescence region, where the rarefied gas dynamics in the thin gas film determines the result. Here, an important step forward became possible by modelling the pressure in the gas film. With the introduction of an interior collision plane within the flow domain, it is now possible to simulate droplet collisions with gas-film thicknesses reaching the physically relevant length scale.
Chapter
Full-text available
Ice accretion resulting from the impact of supercooled water drops is a hazard for structures exposed to low temperatures, for instance aircraft wings and wind turbine blades. Despite a multitude of studies devoted to the involved phenomena, the underlying physical processes are not yet entirely understood. Hence, modelling of the conditions for ice accretion and prediction of the ice accretion rate are presently not reliable. The research conducted in this study addresses these deficiencies in order to lend insight into the physical processes involved. In addition to an overview of results obtained during the first funding periods of this project, new results are presented relating to the impact of supercooled drops onto a cold surface in a cold air flow. The experiments are conducted in a dedicated icing wind tunnel and involve measuring the residual mass after the impact of a liquid supercooled drop exhibiting corona splash, as well as the impact of dendritic frozen drops onto a solid surface.
Article
Motivated by the advancements made by the recently proposed DRo algorithm in uplifting the performance of data-scarce deep learning malware detectors for the edge, we propose an adaptive and efficient system for hybrid edge-cloud detection and forensics, named GreenForensics. The proposed adaptive enhancement makes the system more suitable for devices with custom battery-performance optimization mandates, such as tablets and laptops. Further, the enhancements offer various discrete and continuous controls for influencing the detection coverage and model footprints in real time. To further enhance detection efficiency and make detection resilient to adversarial attacks, the proposed system can work with adversarial-DL-immune algorithms. In the experiments conducted, GreenForensics significantly outperformed even the best baseline deep architectures, improving detection and forensics robustness by up to 100% and performance by up to 40%. This gains further significance as the incumbent baseline DL architecture had up to 6700% higher neural inference complexity, and its performance and robustness benchmarks had remained unchallenged for a long time.
Chapter
Full-text available
In this study, the hydrodynamics and heat transport during the impact of single and multiple drops onto a hot wall are studied numerically. The heat transfer in the vicinity of the three-phase contact line, where solid, liquid and vapour meet, contributes significantly to the global heat transfer. The microscale processes in the region of the three-phase contact line are analysed using a lubrication approximation. The results, in the form of correlations, are integrated into an overall model. The impingement of drops on the macro scale is simulated using a numerical model developed within the OpenFOAM library. The influence of dimensionless parameters, i.e., the Reynolds, Weber, Bond, Prandtl and Jakob numbers, as well as the influence of pressure, on the transport phenomena is discussed. The analysis of the influence of drop frequency and substrate thickness during the vertical coalescence, and of the drop spacing during the horizontal coalescence of drops on the hot surface, completes the study. The results contribute to a better understanding of the complex mechanisms of spray cooling.
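As a quick reference for the dimensionless groups listed above, the sketch below computes them from drop-impact and fluid properties. The variable names and the water-like property values are illustrative placeholders only, not parameters taken from the study.

def drop_impact_numbers(rho, mu, sigma, cp, k, h_lv, d, u, dT_wall, g=9.81):
    """Reynolds, Weber, Bond, Prandtl and Jakob numbers for a drop of
    diameter d hitting a wall superheated by dT_wall at impact speed u."""
    re = rho * u * d / mu        # inertia vs. viscous forces
    we = rho * u**2 * d / sigma  # inertia vs. surface tension
    bo = rho * g * d**2 / sigma  # gravity vs. surface tension
    pr = cp * mu / k             # momentum vs. thermal diffusivity
    ja = cp * dT_wall / h_lv     # sensible vs. latent heat
    return re, we, bo, pr, ja

# Rough, water-like example values (SI units), for illustration only:
print(drop_impact_numbers(rho=998.0, mu=1.0e-3, sigma=0.072,
                          cp=4186.0, k=0.6, h_lv=2.26e6,
                          d=2.0e-3, u=1.5, dT_wall=15.0))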
Chapter
Full-text available
This article presents an overview of visual analysis techniques specifically developed for high-resolution direct numerical multiphase simulations in the droplet dynamics context. Visual analysis of such data covers a large range of tasks, from observing physical phenomena such as energy transport or collisions for single droplets up to the analysis of large-scale simulations such as sprays and jets. With an increasing number of features, coalescence and breakup events may occur, which need to be presented in an interactively explorable way to gain deeper insight into the physics. The task of finding relevant structures, features of interest, or a general dataset overview also becomes non-trivial. We present an overview of new approaches developed in our SFB-TRR 75 project A1, covering work from the last decade up to current work in progress. They are the basis for relevant contributions to visualization research as well as useful tools for close collaborations within the SFB.
Chapter
Full-text available
Phase change processes of supercooled droplets at different boundary conditions are presented. This study is a summary of the current developments within subproject B1 of the SFB-TRR 75, with a focus on evaporation, sublimation, and freezing of supercooled droplets. To this end, new numerical methods to describe the phase transitions were developed, and novel strategies dealing with the challenges of droplets under extreme conditions are presented. The numerical solution procedures for all phase changes are summarized in a compact way within this work. In order to validate the numerical models, experiments were conducted. For this, new experimental setups and approaches were developed. These comprise a test chamber for optical levitation of supercooled droplets, which is able to trap a droplet by means of a laser beam at subzero temperatures and variable ambient humidity. Comparisons of the numerical simulations and the conducted experiments are presented for several phase change processes. The results are in very good agreement and prove the capability of the methods.
Article
A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical. Given n sets, this procedure permits their reduction to n − 1 mutually exclusive sets by considering the union of all possible n(n − 1)/2 pairs and selecting a union having a maximal value for the functional relation, or objective function, that reflects the criterion chosen by the investigator. By repeating this process until only one group remains, the complete hierarchical structure and a quantitative estimate of the loss associated with each stage in the grouping can be obtained. A general flowchart helpful in computer programming and a numerical example are included.
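To make the procedure concrete, the following minimal sketch implements the greedy agglomeration loop described above, using the increase in the within-group error sum of squares as the objective function (Ward's own choice); the two-dimensional data points are made up for illustration.

import numpy as np

def ess(points):
    """Within-group error sum of squares about the group centroid."""
    return float(((points - points.mean(axis=0)) ** 2).sum())

def hierarchical_grouping(x):
    """Greedy agglomeration: at each stage merge the pair of groups whose
    union causes the smallest increase in total ESS."""
    groups = [[i] for i in range(len(x))]   # start with each object alone
    history = []
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                merged = groups[a] + groups[b]
                cost = ess(x[merged]) - ess(x[groups[a]]) - ess(x[groups[b]])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        history.append((groups[a], groups[b], cost))
        groups = [g for i, g in enumerate(groups) if i not in (a, b)] + [groups[a] + groups[b]]
    return history

x = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [9.0, 0.1]])
for left, right, cost in hierarchical_grouping(x):
    print(left, "+", right, " ESS increase:", round(cost, 3))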
Article
Sixteen English consonants were spoken over voice communication systems with frequency distortion and with random masking noise. The listeners were forced to guess at every sound and a count was made of all the different errors that resulted when one sound was confused with another. With noise or low‐pass filtering the confusions fall into consistent patterns, but with high‐pass filtering the errors are scattered quite randomly. An articulatory analysis of these 16 consonants provides a system of five articulatory features or “dimensions” that serve to characterize and distinguish the different phonemes: voicing, nasality, affrication, duration, and place of articulation. The data indicate that voicing and nasality are little affected and that place is severely affected by low‐pass and noisy systems. The indications are that the perception of any one of these five features is relatively independent of the perception of the others, so that it is as if five separate, simple channels were involved rather than a single complex channel.
Article
Multidimensional scaling is the problem of representing n objects geometrically by n points, so that the interpoint distances correspond in some sense to experimental dissimilarities between objects. In just what sense distances and dissimilarities should correspond has been left rather vague in most approaches, thus leaving these approaches logically incomplete. Our fundamental hypothesis is that dissimilarities and distances are monotonically related. We define a quantitative, intuitively satisfying measure of goodness of fit to this hypothesis. Our technique of multidimensional scaling is to compute that configuration of points which optimizes the goodness of fit. A practical computer program for doing the calculations is described in a companion paper.
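The goodness-of-fit measure referred to here is Kruskal's stress, usually quoted in the normalized form

\[
S = \sqrt{\frac{\sum_{i<j}\bigl(d_{ij} - \hat d_{ij}\bigr)^2}{\sum_{i<j} d_{ij}^2}},
\]

where d_{ij} are the interpoint distances of the fitted configuration and \hat d_{ij} are the disparities obtained by monotone regression of those distances on the observed dissimilarities; smaller stress means a better fit to the monotonicity hypothesis.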
Article
The first in the present series of two papers described a computer program for multidimensional scaling on the basis of essentially nonmetric data. This second paper reports the results of two kinds of test applications of that program. The first application is to artificial data generated by monotonically transforming the interpoint distances in a known spatial configuration. The purpose is to show that the recovery of the original metric configuration does not depend upon the particular transformation used. The second application is to measures of interstimulus similarity and confusability obtained from some actual psychological experiments.
Article
A computer program is described that is designed to reconstruct the metric configuration of a set of points in Euclidean space on the basis of essentially nonmetric information about that configuration. A minimum set of Cartesian coordinates for the points is determined when the only available information specifies, for each pair of those points, not the distance between them but some unknown, fixed monotonic function of that distance. The program is proposed as a tool for reductively analyzing several types of psychological data, particularly measures of interstimulus similarity or confusability, by making explicit the multidimensional structure underlying such data.
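As a modern illustration of the kind of reconstruction described here (a stand-in, not the original program), the sketch below uses scikit-learn's nonmetric MDS to recover a configuration from a monotonically transformed distance matrix; since the recovery is only up to rotation, reflection and scale, it is checked via the rank correlation of the interpoint distances.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
true_config = rng.uniform(size=(10, 2))            # hidden "true" points
dissim = squareform(pdist(true_config)) ** 3       # unknown monotone transform

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
recovered = mds.fit_transform(dissim)

print("rank correlation with true distances:",
      round(spearmanr(pdist(recovered), pdist(true_config))[0], 3))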