Stephen S.-T. Yau's research while affiliated with Tsinghua University and other places

Publications (210)

Article
Full-text available
For virus classification and tracing, one idea is to generate minimal models from the gene sequences of each virus group for comparative analysis within and between classes, as well as classification and tracing of new sequences. The starting point of defining a minimal model for a group of gene sequences is to find their longest common sequence (L...
Article
Full-text available
Mutations may produce highly transmissible and damaging HIV variants, which increase genetic diversity and pose a challenge to vaccine development. Therefore, it is of great significance to understand how mutations drive the virulence of HIV. Based on 11897 reliable genomes of HIV-1 retrieved from the HIV Sequence Database, we analyze the 12 type...
Article
Full-text available
Let V be a hypersurface with an isolated singularity at the origin defined by the holomorphic function $f : (\mathbb{C}^n, 0) \to (\mathbb{C}, 0)$. The Yau algebra $L(V)$ is defined to be the Lie algebra of derivations of the moduli algebra $A(V) := \mathcal{O}_n/(f, \partial f/\partial x_1, \ldots, \partial f/\partial x_n)$, i.e., $L(V) = \mathrm{Der}(A(V), A(V))$. It is known that $L(V)$ is a finite dimensional Lie algebra a...
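As a small worked illustration of these definitions (an example chosen here, not taken from the paper), consider the cusp $f = x_1^2 + x_2^3$ in $\mathbb{C}^2$. The computation below is a sketch that assumes only the definitions of the moduli algebra and of its derivations given in the abstract.

```latex
\begin{align*}
f &= x_1^2 + x_2^3, \qquad
\frac{\partial f}{\partial x_1} = 2x_1, \qquad
\frac{\partial f}{\partial x_2} = 3x_2^2, \\
A(V) &= \mathcal{O}_2 / (x_1^2 + x_2^3,\; x_1,\; x_2^2)
      = \mathcal{O}_2 / (x_1,\; x_2^2)
      \cong \mathbb{C}\{x_2\}/(x_2^2),
\qquad \dim_{\mathbb{C}} A(V) = 2.
\end{align*}
% A derivation D of A(V) is determined by D(x_2) = a + b x_2 with a, b in C.
\begin{gather*}
0 = D(x_2^2) = 2 x_2 D(x_2) = 2a x_2 + 2b x_2^2 = 2a x_2
\;\Longrightarrow\; a = 0, \\
L(V) = \operatorname{Der}(A(V), A(V)) = \mathbb{C} \cdot x_2 \frac{\partial}{\partial x_2},
\qquad \dim_{\mathbb{C}} L(V) = 1.
\end{gather*}
```

In this example the moduli algebra is two-dimensional and the Lie algebra of its derivations is one-dimensional, spanned by $x_2\,\partial/\partial x_2$.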
Article
Full-text available
The classification of protein sequences provides valuable insights into bioinformatics. Most existing methods are based on sequence alignment algorithms, which become time-consuming as the size of the database increases. Therefore, there is a need to develop an improved method for effectively classifying protein sequences. In this paper, we propose...
Article
The comparison of DNA sequences is of great significance in genomics analysis. Although the traditional multiple sequence alignment (MSA) method is popularly used for evolutionary analysis, optimally aligning k sequences becomes computationally intractable when k increases due to the intrinsic computational complexity of MSA. Despite numerous k-mer...
Article
Full-text available
A comprehensive description of human genomes is essential for understanding human evolution and relationships between modern populations. However, most published literature focuses on local alignment comparison of several genes rather than the complete evolutionary record of individual genomes. Combining with data from the 1,000 Genomes Project, we...
Article
Full-text available
Mutation is the driving force of species evolution, which may change the genetic information of organisms and obtain selective competitive advantages to adapt to environmental changes. It may change the structure or function of translated proteins, and cause abnormal cell operation, a variety of diseases and even cancer. Therefore, it is particular...
Article
Full-text available
Let $(V, 0)$ be an isolated hypersurface singularity. We introduce a series of new derivation Lie algebras $L_k(V)$ associated to $(V, 0)$. Its dimension is denoted as $\lambda_k(V)$. The $L_k(V)$ is a generalization of the Yau algebra $L(V)$ and $L_0(V) = L(V)$. These numbers $\lambda_k(V)$ are new numerical analytic invariants of an isolated hypersurface singularity....
Article
Full-text available
Let $(V, 0) = \{(z_1, \ldots, z_n) \in \mathbb{C}^n : f(z_1, \ldots, z_n) = 0\}$ be an isolated hypersurface singularity with $\mathrm{mult}(f) = m$. Let $J_k(f)$ be the ideal generated by all $k$-th order partial derivatives of $f$. For $1 \le k \le m - 1$, the new object $L_k(V)$ is defined to be the Lie algebra of derivations of the new $k$-th local algebra $M_k(V)$, where $M_k(V)$ :=...
Article
It remains challenging to find existing but undiscovered genome sequence mutations or to predict potential genome sequence mutations based on real sequence data. Motivated by this, we develop approaches to detect new, undiscovered genome sequences. Because discovering new genome sequences through biological experiments is resource-intensive, we wa...
Article
Full-text available
Finite dimensional Lie algebras are semi-direct products of semi-simple Lie algebras and solvable Lie algebras. Brieskorn gave the connection between simple Lie algebras and simple singularities. Simple Lie algebras have been well understood, but not the solvable (nilpotent) Lie algebras. It is extremely important to establish connections betwee...
Conference Paper
Full-text available
Finite dimensional Lie algebras are semi-direct products of semi-simple Lie algebras and solvable Lie algebras. Brieskorn gave the connection between simple Lie algebras and simple singularities. Simple Lie algebras have been well understood, but not the solvable (nilpotent) Lie algebras. It is extremely important to establish connections betwee...
Article
Full-text available
Bacterial evolution is an important field of study, and biological sequences are often used to construct phylogenetic relationships. Multiple sequence alignment is very time-consuming and cannot deal with large-scale bacterial genome sequences in a reasonable time. Hence, a new mathematical method, the joining density vector method, is proposed to cluster...
Preprint
Full-text available
Background Protein structure can provide insights that help biologists to predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques used to determine the structure of proteins, such as...
Article
Let $R = \mathbb{C}[x_1, x_2, \ldots, x_n]/(f_1, \ldots, f_m)$ be a positively graded Artinian algebra. A long-standing conjecture in algebraic geometry, differential geometry, and rational homotopy theory is the non-existence of negative weight derivations on $R$. Alexsandrov conjectured that there are...
Preprint
Next-generation sequencing technology enables routine detection of bacterial pathogens for clinical diagnostics and genetic research. Whole genome sequencing has been of importance in the epidemiologic analysis of bacterial pathogens. However, few whole genome sequencing-based genotyping pipelines are available for practical applications. Here, we...
Preprint
Full-text available
In 1997, Shor and Laflamme defined the weight enumerators for quantum error-correcting codes and derived a MacWilliams identity. We extend their work by introducing our double weight enumerators and complete weight enumerators. The MacWilliams identities for these enumerators can be obtained similarly. With the help of MacWilliams identities, we ob...
Article
Based on the k-mer model for protein sequence, a novel k-mer natural vector method is proposed to characterize the features of k-mers in a protein sequence, in which the numbers and distributions of k-mers are considered. It is proved that the relationship between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis...
Article
Full-text available
It is well known that the geometric genus and multiplicity are two important invariants for isolated singularities. In this paper we give a sharp lower estimate of the geometric genus in terms of the multiplicity for isolated hypersurface singularities. In 1971, Zariski asked whether the multiplicity of an isolated hypersurface singularity depends...
Article
Comparing DNA and protein sequence groups plays an important role in biological evolutionary relationship research. Despite the many methods available for sequence comparison, only a few can be used for group comparison. In this study, we propose a novel approach using convex hulls. We use statistical information contained within the sequences to r...
Article
We classify threefold isolated quotient Gorenstein singularities $\mathbb{C}^3/G$. These singularities are rigid, i.e. there is no non-trivial deformation, and we conjecture that they define 4d $\mathcal{N}=2$ SCFTs which do not have a Coulomb branch.
Article
Full-text available
Prochlorococcus marinus, one of the most abundant marine cyanobacteria in the global ocean, is classified into low-light (LL) and high-light (HL) adapted ecotypes. These two adapted ecotypes differ in their ecophysiological characteristics, especially whether adapted for growth at high-light or low-light intensities. However, some evolutionary rela...
Article
Full-text available
With the sharp increase in biological sequences, traditional sequence alignment methods have become unsuitable and infeasible. This has motivated a surge of fast alignment-free techniques for sequence analysis. Among these methods, many sorts of feature vector methods have been established and applied to the reconstruction of species phylogeny. The vectors basically...
Article
Ever since the technique of Kalman filter was popularized, there has been an intense interest in finding new classes of finite dimensional recursive filters. In the late seventies, the idea of using estimation algebra to construct finite-dimensional nonlinear filters was first proposed by Brockett and Mitter independently. It has been proven to be...
Article
Classification of proteins is a crucial topic in biology. The number of protein sequences stored in databases has increased sharply in the past decade. Traditionally, comparison of protein sequences is carried out through multiple sequence alignment methods. However, these methods may be unsuitable for clustering of protein sequences when gene r...
Data
GenBank access numbers of genes or genomes of Ebola virus, influenza virus and E.coli. (XLSX)
Article
Full-text available
Protein-protein interactions (PPIs) play key roles in life processes, such as signal transduction, transcription regulations, and immune response, etc. Identification of PPIs enables better understanding of the functional networks within a cell. Common experimental methods for identifying PPIs are time consuming and expensive. However, recent devel...
Data
Fourier transform of VP24 protein of Ebola virus. (PDF)
Data
Phylogenetic analysis of proteins in Ebola virus. (PDF)
Article
Full-text available
Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and phylogenetic trees to classify proteins. These methods use binary trees to represent protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify...
Article
For all known finite dimensional filters, one always assumes that the observation terms are degree one polynomials. However, in practice, the observation terms may be nonlinear, e.g. in tracking problems. In this paper, we consider the Yau filtering system ($\partial f_j/\partial x_i - \partial f_i/\partial x_j = c_{ij}$ is constant for all $i, j$) with nonlinear observation terms and arbitrary ini...
Article
In this paper we study the structure of finite dimensional estimation algebras with state dimension 3 and rank 2 arising from a nonlinear filtering system by using the theories of the Euler operator and underdetermined partial differential equations. The structure of the Wong $\Omega$-matrix is shown to be linear. The fundamental strategy we use in this pa...
Article
Zika virus (ZIKV) is a mosquito-borne flavivirus. It was first isolated from Uganda in 1947 and has become an emergent event since 2007. However, because of the inconsistency of alignment methods, the evolution of ZIKV remains poorly understood. In this study, we first use the complete protein and an alignment-free method to build a phylogenetic tr...
Article
In this paper, we construct a new suboptimal filter by deriving the Ito stochastic differential equations that the estimates of higher order central moments satisfy, and imposing some conditions to form a closed system. The essentially infinite-dimensional cubic sensor problem has been investigated in detail numerically to illustrate the reasonable...
Article
Numerical encoding plays an important role in DNA sequence analysis via computational methods, in which numerical values are associated with corresponding symbolic characters. After numerical representation, digital signal processing methods can be exploited to analyze DNA sequences. To reflect the biological properties of the original sequence, it...
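The general pipeline described here (map symbols to numbers, then apply a signal processing tool) can be sketched as follows. This is only an illustration: it uses the simple binary indicator encoding and a direct discrete Fourier transform, which are standard textbook choices and not necessarily the encoding proposed in the paper. The function names `indicator_sequences` and `power_spectrum` are hypothetical.

```python
import cmath


def indicator_sequences(seq):
    """Map a DNA string to four 0/1 signals, one per nucleotide.

    Assumption: binary indicator encoding, chosen for illustration only.
    """
    return {base: [1 if c == base else 0 for c in seq] for base in "ACGT"}


def power_spectrum(signal):
    """Return |X(f)|^2 for f = 0..n-1 using a direct O(n^2) DFT."""
    n = len(signal)
    spectrum = []
    for f in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(total) ** 2)
    return spectrum


if __name__ == "__main__":
    signals = indicator_sequences("ACGTACGTACGT")
    # Power spectrum of the indicator signal for nucleotide A.
    print(power_spectrum(signals["A"]))
```

The direct DFT keeps the sketch dependency-free; for long sequences an FFT routine would be the usual replacement.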
Article
Let $R = \mathbb{C}[x_1, x_2, \ldots, x_n]/(f)$ where $f$ is a weighted homogeneous polynomial defining an isolated singularity at the origin. Then $R$, and $\mathrm{Der}(R)$, the Lie algebra of derivations on $R$, are graded. It is well-known that $\mathrm{Der}(R)$ has no negatively graded component [10]. J. Wahl conjectured that the above fact is still true in higher codimensional case provided...
Article
Full-text available
Detailed studies of four dimensional N=2 superconformal field theories (SCFT) defined by isolated complete intersection singularities are performed: we compute the Coulomb branch spectrum, Seiberg-Witten solutions and central charges. Most of our theories have exactly marginal deformations and we identify the weakly coupled gauge theory description...
Article
We classify three dimensional isolated weighted homogeneous rational complete intersection singularities, which define many new four dimensional N=2 superconformal field theories. We also determine the mini-versal deformation of these singularities, and therefore solve the Coulomb branch spectrum and Seiberg-Witten solution.
Article
The free-living SAR11 clade is a globally abundant group of oceanic Alphaproteobacteria, with small genome sizes and rich genomic A+T content. However, the taxonomy of SAR11 has become controversial recently. Some researchers argue that the position of SAR11 is a sister group to Rickettsiales. Other researchers advocate that SAR11 is located within...
Article
Because of its importance in number theory and singularity theory, the problem of finding a polynomial sharp upper estimate of the number of positive integral points in an n-dimensional (n≥3) polyhedron has received attention from many mathematicians. The first named author proposed the Number Theoretic Conjecture for the upper estimate. The prev...
Article
Due to vast sequence divergence among different viral groups, sequence alignment is not directly applicable to genome-wide comparative analysis of viruses. More and more attention has been paid to alignment-free methods for whole genome comparison and phylogenetic tree reconstruction. Among alignment-free methods, the recently proposed “Natural Vec...
Article
Full-text available
Let $X$ be a compact connected strongly pseudoconvex CR manifold of real dimension $2n-1$ in $\mathbb{C}^{N}$. It has been an interesting question to find an intrinsic smoothness criterion for the complex Plateau problem. For $n\ge 3$ and $N=n+1$, Yau found a necessary and sufficient condition for the interio...
Article
Let $X_1$ and $X_2$ be two compact connected strongly pseudoconvex embeddable Cauchy-Riemann (CR) manifolds of dimensions $2m-1$ and $2n-1$ in $\mathbb{C}^{m+1}$ and $\mathbb{C}^{n+1}$, respectively. We introduce the Thom-Sebastiani sum $X = X_1 \oplus X_2$ which is a new compact connected strongly pseudoconvex embeddable CR manifold of dimension $2m+2n+1$ in $\mathbb{C}^{m+n+2}$. Thus the set of all cod...
Article
Let V be a hypersurface with an isolated singularity at the origin defined by the function f: (Cⁿ, 0) → (C, 0). Let L(V) be the Lie algebra of derivations of the moduli algebra A(V):= C{x1, …, xn}/(f, ∂f/∂x1, …, ∂f/∂xn). It is known that L(V) is a finite dimensional solvable Lie algebra ([Ya1], [Ya2]). L(V) is called the Yau algebra of V in [Yu]...
Article
In this paper we derive the stochastic differentials of the conditional central moments of the nonlinear filtering problems, especially those of the polynomial filtering problem, and develop a novel suboptimal method by solving this evolution equation. The basic idea is to augment the state of the original nonlinear system by including the original...
Article
Full-text available
Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Despite the many methods available for sequence comparison, few methods retain the information content of sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match of graphic...
Article
Full-text available
Let $p$ be a normal singularity of the 2-dimensional Stein space $V$. Let $\pi: M \to V$ be a minimal good resolution of $V$, such that the irreducible components $A_i$ of $A = \pi^{-1}(p)$ are nonsingular and have only normal crossings. Associated to $A$ is a weighted dual graph $\Gamma$ which, along with the genera of the $A_i$, fully describes the topology and differentiable structu...
Article
Inspired by Durfee Conjecture in singularity theory, Yau formulated the Yau number theoretic conjecture (see Conjecture 1.3) which gives a sharp polynomial upper bound of the number of positive integral points in an n-dimensional (n ≥ 3) polyhedron. It is well known that getting the estimate of integral points in the polyhedron is equivalent to get...
Article
According to the WHO, ebolaviruses have resulted in 8818 human deaths in West Africa as of January 2015. To better understand the evolutionary relationship of the ebolaviruses and infer virulence from the relationship, we applied the alignment-free natural vector method to classify the newest ebolaviruses. The dataset includes three new Guinea viru...
Article
Full-text available
What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of e...
Article
Let $X$ be a nonsingular projective variety in $\mathbb{CP}^{n-1}$. Then the cone over $X$ in $\mathbb{C}^n$ is an affine variety $V$ with an isolated singularity at the origin. It is a very natural and important question to ask when an affine variety with an isolated singularity at the origin is a cone over a nonsingular projective variety. This problem is very hard in general. In...
Chapter
By a beautiful theorem of Harvey and Lawson, strongly pseudoconvex connected compact embeddable CR manifolds are the boundaries of subvarieties in \(\mathbb{C}^{N}\) with only normal isolated singularities. This leads to a natural question of how to determine the properties of the interior singularities from the CR manifolds and vice versa. In this...
Article
A time-dependent Hermite-Galerkin spectral method (THGSM) is investigated in this paper for the nonlinear convection-diffusion equations in the unbounded domains. The time-dependent scaling factor and translating factor are introduced in the definition of the generalized Hermite functions (GHF). As a consequence, the THGSM based on these GHF has ma...
Article
The estimate of integral points in right-angled simplices has many applications in number theory, complex geometry, toric variety and tropical geometry. In [24], [25], [27], the second author and other coworkers gave a sharp upper estimate that counts the number of positive integral points in n dimensional () real right-angled simplices with vertic...
Article
Based on the k-mer model for genetic sequence, a k-mer sparse matrix representation is proposed to denote the types and sites of k-mers appearing in a genetic sequence, and there exists a one-to-one relationship between a genetic sequence and its associated k-mer sparse matrix. With the singular value decomposition of the k-mer sparse matrix, the k...
Article
Based on the well-known k-mer model, we propose a k-mer natural vector model for representing a genetic sequence based on the numbers and distributions of k-mers in the sequence. We show that there exists a one-to-one correspondence between a genetic sequence and its associated k-mer natural vector. The k-mer natural vector method can be easily and...
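A minimal sketch of the idea of summarizing each k-mer by its number and distribution along the sequence is given below. It assumes that each k-mer is described by its count, its mean 1-based start position, and a second central moment of its positions scaled by the count and the number of k-mer windows; this normalization is an assumption modeled on the usual natural vector construction and is not quoted from the paper. The function name `kmer_natural_vector` is hypothetical, and only k-mers that actually occur are stored.

```python
from collections import defaultdict


def kmer_natural_vector(seq, k):
    """Return {kmer: (count, mean_position, scaled_second_moment)}.

    Assumed normalization: second central moment divided by
    count * number_of_windows. Absent k-mers are simply omitted.
    """
    windows = len(seq) - k + 1  # number of k-mer start positions
    if k <= 0 or windows <= 0:
        raise ValueError("k must be between 1 and len(seq)")

    positions = defaultdict(list)
    for i in range(windows):
        positions[seq[i:i + k]].append(i + 1)  # 1-based start position

    vector = {}
    for kmer, pos in positions.items():
        n = len(pos)
        mean = sum(pos) / n
        second = sum((p - mean) ** 2 for p in pos) / (n * windows)
        vector[kmer] = (n, mean, second)
    return vector


if __name__ == "__main__":
    for kmer, stats in sorted(kmer_natural_vector("ACGAC", 2).items()):
        print(kmer, stats)
```

Two sequences can then be compared by a distance between their vectors, for example the Euclidean distance after placing the entries in a fixed k-mer order.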
Article
Full-text available
Intron-containing and intronless genes have different biological properties and statistical characteristics. Here we propose a new computational method to distinguish between intron-containing and intronless gene sequences. Seven feature parameters [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see tex...
Article
Full-text available
Multiple sequence alignment (MSA) is a prominent method for classification of DNA sequences, yet it is hampered by inherent limitations in computational complexity. Alignment-free methods have been developed over the past decade for more efficient comparison and classification of DNA sequences than MSA. However, most alignment-free methods may lose...
Article
Strongly pseudoconvex CR manifolds are boundaries of Stein varieties with isolated normal singularities. We introduce a series of new invariant plurigenera $\delta_m$, $m \in \mathbb{Z}_+$, for a strongly pseudoconvex CR manifold. The main purpose of this paper is to present the following result: Let $X_1$ and $X_2$ be two compact strongly pseudoconvex...
Article
Full-text available
The singular parabolic problem $u_t-\triangle u=\lambda{\frac{1+\delta|\nabla u|^2}{(1-u)^2}}$ on a bounded domain $\Omega$ of $\mathbb{R}^n$ with Dirichlet boundary condition, models the Microelectromechanical systems (MEMS) device with fringing field. In this paper, we focus on the quenching behavior of the solution to this equation. We first sho...
Article
It is well known that getting an estimate of the number of integral points in right-angled simplices is equivalent to getting an estimate of the Dickman-de Bruijn function $\psi(x, y)$, which is the number of positive integers $\le x$ and free of prime factors $> y$. Motivated by the Yau Geometric Conjecture, the third author formulated a number-theoretic c...
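The counting function defined in this abstract can be evaluated directly for small arguments. The sketch below is a brute-force illustration of the definition only (the name `psi` and the trial-division approach are choices made here, not the paper's method).

```python
def psi(x, y):
    """Count positive integers n <= x that have no prime factor greater than y."""
    count = 0
    for n in range(1, x + 1):
        m = n
        p = 2
        # Strip every factor p <= y; composite p never divide m here
        # because their prime factors were already removed.
        while p <= y and m > 1:
            while m % p == 0:
                m //= p
            p += 1
        if m == 1:  # nothing larger than y was left over
            count += 1
    return count


if __name__ == "__main__":
    print(psi(10, 3))   # counts 1, 2, 3, 4, 6, 8, 9
    print(psi(100, 5))
```

For example, `psi(10, 3)` counts the integers 1, 2, 3, 4, 6, 8 and 9, so it returns 7.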
Article
Using the theory of the mixed Hodge structure one can define a notion of exponents of a singularity. In 2000, Hertling proposed a conjecture about the variance of the exponents of a singularity. Here, we prove that the Hertling conjecture is true for isolated surface singularities with modality $\le 2$.
Article
In this paper, we show two splitting criteria for vector bundles on complex projective spaces by analytic method. We also prove a splitting criterion for reflexive sheaves on Horrocks schemes by algebraic method.
Article
It is well known that geometric genus $p_g$ and irregularity $q$ are two important invariants for isolated singularities. In this paper, we give a formula relating $p_g$ and $q$ for isolated singularities with $\mathbb{C}^*$-action in any dimension. We also give a simple characterization of the quasi-homogeneous isolated complete intersection singularities using $p_g$...
Article
Full-text available
It is well known that the sparse grid algorithm has been widely accepted as an efficient tool to overcome the "curse of dimensionality" to some degree. In this note, we first give the error estimate of hyperbolic cross (HC) approximations with generalized Hermite functions. The exponential convergence in both regular and optimized hyperbolic cross appr...
Data
Full-text available
This file SI contains: (1) Dataset, including Table S1–S3; (2) Discussion on cut-off setting, including Table S4–S5; (3) Predictions by our method, including Table S6–S10; (4) Simulated evaluation of (12-dimensional) genome space, including Table S11; (5) Graph descriptions of Baltimore I, II, IV, VI; (6) List of virus information used in this pape...
Article
Characterization of homogeneous polynomials with isolated critical point at the origin follows from a study of complex geometry. Yau previously proposed a Numerical Characterization Conjecture. A step forward in solving this conjecture, the Granville–Lin–Yau Conjecture was formulated, with a sharp estimate that counts the number of positive integra...
Article
Full-text available
In this paper, we investigate the Hermite spectral method (HSM) to numerically solve the forward Kolmogorov equation (FKE). A useful guideline of choosing the scaling factor of the generalized Hermite functions is given in this paper. It greatly improves the resolution of HSM. The convergence rate of HSM to FKE is analyzed in the suitable function...
Article
The problem of classification of finite-dimensional estimation algebras was formally proposed by Brockett in his lecture at the International Congress of Mathematicians in 1983. Due to the difficulty of the problem, in the early 1990s Brockett suggested that one should understand the low-dimensional estimation algebras first. In this article, we extend...
Article
Let $f : (\mathbb{C}^n, 0) \to (\mathbb{C}, 0)$ be a germ of a complex analytic function with an isolated critical point at the origin. Let $V = \{z \in \mathbb{C}^n : f(z) = 0\}$. A beautiful theorem of Saito [1971] gives a necessary and sufficient condition for V to be defined by a weighted homogeneous polynomial. It is a natural and important question to characterize (up to a biholomorphi...
Article
Full-text available
It is well known that the nonlinear filtering problem has important applications in both military and civil industries. The central problem of nonlinear filtering is to solve the Duncan-Mortensen-Zakai (DMZ) equation in real time and in a memoryless manner. In this paper, we shall extend the algorithm developed previously by S.-T. Yau and the secon...
Article
Let $X$ be a compact connected strongly pseudoconvex $CR$ manifold of real dimension $2n-1$ in $\mathbb{C}^{N}$. It has been an interesting question to find an intrinsic smoothness criterion for the complex Plateau problem. For $n\ge 3$ and $N=n+1$, Yau found a necessary and sufficient condition for the interior regularity of the Harvey-Lawson solutio...
Article
We introduce some new invariants for complex manifolds. These invariants measure in some sense how far the complex manifolds are away from having global complex coordinates. For applications, we introduce two new invariants $f^{(1,1)}$ and $g^{(1,1)}$ for isolated surface singularities. We show that $f^{(1,1)} = g^{(1,1)} = 1$ for rational double points and cyclic...
Article
The Durfee conjecture, proposed in 1978, relates two important invariants of isolated hypersurface singularities by a famous inequality; however, the inequality in this conjecture is not sharp. In 1995, Yau announced his conjecture which proposed a sharp inequality. The Yau conjecture characterizes the conditions under which an affine hypersurface...
Article
We prove that $B\mathrm{Diff}^{+}(S^2, \{x_1, \ldots, x_{n+1}\})$ is a $K(\pi, 1)$ space, where $\pi$ is the mapping class group of an $(n+1)$-punctured sphere. As a consequence we derive that the center-projecting braid monodromy of a fiber-type projective line arrangement determines the diffeomorphic type of its complement.
Data
Full-text available
Supporting information S1 contains the complete proof of the correspondence theorem, the bootstrapping analysis on A H1N1 genomes, distribution of the distance between each pair of random shuffled genomes under simulation, clustering of the segmented gene PB2, computational time chart of natural vector method, ClustalW2, MUSCLE and MAFFT, and the G...
Article
Let $X_1$ and $X_2$ be two compact strongly pseudoconvex CR manifolds of dimension $2n-1 \ge 5$ which bound complex varieties $V_1$ and $V_2$ with only isolated normal singularities in $\mathbb{C}^{N_1}$ and $\mathbb{C}^{N_2}$ respectively. Let $S_1$ and $S_2$ be the singular sets of $V_1$ and $V_2$ respectively and $S_2$ is nonempty. If $2n - N_2 - 1 \ge 1$ and the cardinality of $S_1$ is less than 2 times th...
Conference Paper
The problem of classification of finite-dimensional estimation algebras was formally proposed by Brockett in his lecture at the International Congress of Mathematicians in 1983. Due to the difficulty of the problem, in the early 1990s Brockett suggested that one should understand the low-dimensional estimation algebras first. In this paper, we extend Y...
Article
One of the most fundamental problems in complex geometry is to determine when two bounded domains in $\mathbb{C}^n$ are biholomorphically equivalent. Even for complete Reinhardt domains, this fundamental problem has remained unsolved for many years. Using the Bergman function theory, we construct an infinite family of numerical invariants from the Bergm...
Data
Feature vector computation in Mathematica. Computation of the feature vector for a single protein in Mathematica. (0.00 MB TXT)
Data
PCA code for Matlab. Principal component analysis code for Matlab. (0.00 MB TXT)
Data
Myoglobin Feature Vectors. This file provides the set of all feature vectors for the Myoglobins in our dataset. (0.17 MB XLS)
Data
Sample Parameter Calculations. This file works through the calculations of the parameters for Alanine in a short, hypothetical protein, and demonstrates the construction of the feature vector for this protein. (0.05 MB PDF)
Data
Protein datasets. Accession numbers and taxonomic information for Protein Kinase C (PKC), Hemoglobin and Myoglobin dataset. Each protein dataset is provided as a separate worksheet. (0.16 MB XLS)
Data
Protein Kinase C Feature Vectors. This file contains the set of all feature vectors for the PKC proteins in our dataset. (0.16 MB XLS)
Data
Feature vectors computation in C++. Computation of the feature vectors for a protein data set in C++. (0.01 MB TXT)
Data
Hemoglobin Feature Vectors. This file provides the set of all feature vectors for the Hemoglobins in our dataset. (1.03 MB XLS)
Article
Local holomorphic De Rham cohomology introduced in this paper and punctured local holomorphic De Rham cohomology introduced by Huang-Luk-Yau are two important local invariants for varieties with isolated singularities. We find some relations between these two invariants and the invariants defined by Steenbrink on surface singularities, and fr...