Ya-Ping Zhang’s research while affiliated with Nantong University and other places

Publications (735)


Statistics of the SEA3K genomic variants
a, Geographic locations of the sampled MSEA populations. The size of the circle indicates the size of the sampled population. The pie charts indicate the languages spoken in each population. The Malay population is from ref. ¹⁶. b, The novel SNVs and indels identified in the SEA3K dataset. c, The counts of genomic variants of all autosomal regions categorized by allele frequency. Singleton, allele count = 1; doubleton, allele count = 2; rare, AF ≤ 0.01; common, 0.01 < AF ≤ 0.2; very common, AF > 0.2. d, Counts of small variants (SNVs and indels) per genome among the 30 MSEA populations (Supplementary Table 2). In box plots, the box edges delineate first to third quartiles, the horizontal line indicates the median and whiskers extend from the quartiles to the minimum or maximum value. e, Evaluation of imputation performance using the SEA3K reference panel. Comparison of the fold change (FC) of the imputation error rates for the worldwide populations using either the SEA3K panel or the 1KGP panel. The results of additional comparisons with other panels are shown in Supplementary Fig. 2. f, The imputation accuracy evaluation using squared Pearson correlation coefficient ($R_{\mathrm{Pcc}}^{2}$) indicates better performance of the SEA3K panel compared with the other panels. The sample size of each panel is indicated. The MEGA panel combines the SEA3K, 1KGP and SG10K datasets. The target population is generated from MSEA populations by randomly sub-sampling 300 individuals from the 2,183 unrelated SEA3K individuals, and the remaining 1,883 samples were used to conduct haplotype phasing (Methods).
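The allele-frequency bins listed in panel c can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name and the inputs `allele_count` and `allele_number` are hypothetical, and it assumes that the singleton and doubleton classes take priority over the frequency-based bins, which the caption does not state explicitly.

```python
def classify_variant(allele_count: int, allele_number: int) -> str:
    """Assign a variant to one of the bins named in the caption.

    allele_count:  copies of the alternate allele observed (hypothetical input)
    allele_number: total number of called alleles (hypothetical input)

    Assumption: singleton/doubleton are checked before the AF thresholds.
    """
    if allele_number <= 0 or not 0 < allele_count <= allele_number:
        raise ValueError("need 0 < allele_count <= allele_number")
    if allele_count == 1:
        return "singleton"
    if allele_count == 2:
        return "doubleton"
    af = allele_count / allele_number
    if af <= 0.01:
        return "rare"          # AF <= 0.01
    if af <= 0.2:
        return "common"        # 0.01 < AF <= 0.2
    return "very common"       # AF > 0.2


if __name__ == "__main__":
    # 6,046 alleles corresponds to 3,023 diploid genomes with no missing calls.
    for ac in (1, 2, 30, 600, 3000):
        print(ac, classify_variant(ac, 6046))
```

Checking the count-based classes first keeps a variant seen once or twice from being reported as merely "rare"; a different precedence would need only the order of the checks to change.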
SV discovery based on the long-read genome data of 37 MSEA individuals
a, Geographic locations of the 40 selected MSEA individuals for long-read sequencing. The pie chart indicates the composition of the languages spoken in each population (see Fig. 1a). b, Broken-line graph showing the assembly contiguity of the 74 partially phased and assembled haplotypes. The contigs of the two reference genomes (T2T-CHM13 and GRCh38 (black lines)) are included for comparison. N50, the shortest contig length that needs to be included to cover 50% of the genome. c, Sketch diagram of SV discovery. Read-mapping-based (top) and assembly-based (bottom) calling were conducted to call simple SVs and complex SVs using Sniffles2, PAV and SVision-pro. d, Counts of each SV class in MSEA populations, including deletion (DEL), insertion (INS), duplication (DUP), inversion (INV) and complex SV (CSV). e, Allele frequency distribution of the identified SVs in MSEA populations. SVs were classified into 4 categories by frequency: all shared (identified in all samples), major (identified in at least 50% of samples), polymorphic (identified in more than 1 sample), and singleton (identified in only 1 sample) SVs. f, Counts of SVs per genome of the 37 MSEA individuals. The mean counts of each SV type are shown. g, Size distribution of the four SV classes in MSEA. There are 2 peaks, indicating the mobile elements of Alu (200–300 bp) and LINE-1 (5–6 kb), respectively. h, Functional annotation of the identified simple SVs. i, Intersection of the SV sets (insertions and deletions) between the known SVs (union of HGSVC3 and HPRC) and those identified in MSEA. The overlap cutoff is set to require more than 50% reciprocal overlapping of SV length. There are 24,622 SVs that are not reported in the known sets, and they are likely to represent novel SVs that occurred in MSEA populations.
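Two definitions in this caption lend themselves to a short worked example: N50 (panel b) and the reciprocal-overlap criterion used to match SVs against known sets (panel i). The Python sketch below is illustrative only. The function names are hypothetical, SVs are assumed to be half-open `(start, end)` intervals on the same chromosome, and the code is not the pipeline used in the study.

```python
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # half-open (start, end), assumed same chromosome


def n50(contig_lengths: Sequence[int]) -> int:
    """Shortest contig length needed to cover at least 50% of the total."""
    if not contig_lengths or any(length <= 0 for length in contig_lengths):
        raise ValueError("contig lengths must be a non-empty list of positive ints")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    raise AssertionError("unreachable for valid input")


def reciprocal_overlap_fraction(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions (overlap/len(a), overlap/len(b))."""
    len_a, len_b = a[1] - a[0], b[1] - b[0]
    if len_a <= 0 or len_b <= 0:
        raise ValueError("intervals must have positive length")
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return min(overlap / len_a, overlap / len_b)


def is_same_sv(a: Interval, b: Interval, cutoff: float = 0.5) -> bool:
    """True when the reciprocal overlap exceeds the cutoff (more than 50%)."""
    return reciprocal_overlap_fraction(a, b) > cutoff


if __name__ == "__main__":
    print(n50([10, 8, 6, 4, 2]))
    print(is_same_sv((100, 200), (140, 220)))
    print(is_same_sv((100, 200), (160, 400)))
```

Taking the minimum of the two fractions makes the test symmetric, so a short SV nested inside a much longer one is not counted as a match.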
Genetic structure and population history of MSEA populations
a, Procrustes-transformed PCA plot indicating a high similarity between the genetic divergence and the geographic distance of the 2,183 unrelated MSEA individuals. Procrustes similarity statistic t0 = 0.548, P = 1.2 × 10⁻⁶ based on 100,000 permutations (one-sided permutation test), and the rotation angle of the PCA map is θ = 16.11°. Principal component 1 (PC1) and PC2 are indicated by the dotted lines, which account for 1.02% and 0.53% of the total variance, respectively. Colours indicate the newly sequenced populations in this study. Population abbreviations are described in Supplementary Table 2. The shaded backgrounds indicate the country origins of the samples. ISEA, Island Southeast Asia; KHV, Kinh in Ho Chi Minh City, Vietnam. b, Maximum-likelihood tree indicates the genetic relationship among the MSEA (highlighted in red) and other representative global populations, including representative Asian samples from 1KGP, HGDP, SGDP, Malay and Tibetan populations. The admixture plot shows the genetic components of the SEA3K and reference populations (K = 8). c, Comparison of LD decay across populations. Five superpopulations (including two representative populations per superpopulation) from 1KGP are taken as reference. AFR, African; AMR, American; EAS, East Asian; EUR, European; SAS, South Asian. d, Distribution of the ROH segments ordered and coloured by size classes (short, medium and long), indicating a shift in ROH class boundaries towards longer ROHs for the medium and long classes in the SEA3K populations in comparison to the 1KGP populations (two representative populations from each of the five super-groups were chosen for comparison, see Supplementary Table 9). K, thousand; M, million. e, The inferred historical changes of effective population sizes (Ne) over time, indicating varied population dynamics among the MSEA populations. The pairwise sequentially MSMC2 method was used to estimate effective population size. The stratified population version and the results of SMC++ are shown in Extended Data Fig. 6.
Genomic signals of positive selection in MSEA populations
a, Manhattan plot of the CMS scores of the genome-wide SNVs in MSEA populations. The lead genes in the top-signal regions are labelled and the newly reported genes of this study are highlighted in red. The signals in each population are shown as dot plots at the bottom. Only the signals falling in the top 0.01% of genome-wide CMS scores in each population are shown, and the dot size denotes the P value of the CMS score. Statistical significance was assessed using χ² test. CH, China; CM, Cambodia; LA, Laos; MY, Myanmar; TH, Thailand; VI, Vietnam. b, The most divergent region under selection on chr. 1, covering the FLG gene. The Manhattan plots show the DAF differences of the PS-SNVs between MSEA and other populations from 1KGP. A newly found missense variant (rs143280202) in FLG is labelled. Tracks for candidate cis-regulatory elements (cCREs) and H3K27Ac combined from all cell types of ENCODE are shown. c, The DAF distribution of rs143280202 in world populations with a regional enrichment in MSEA. d, Dot plot showing SVs enriched in the MSEA dataset. The 785 candidate SVs are highlighted. The top MSEA-specifically enriched SV, a 7,439-bp deletion in the PEX14 gene, is labelled with a purple circle. e, Comparison of allele frequencies of the 7,439-bp deletion shows regional enrichment in MSEA (blue) compared with the HGDP populations. CSA, central south Asia; ME, Middle East; OC, Oceania. f, Schematic map indicating the genomic location (top) and read coverage (bottom) of the 7,439-bp deletion and its flanking sequences in MSEA populations. The overlap between this deletion and regulatory elements is indicated. TFBS, transcription factor binding site.
The landscape of archaic introgression in MSEA populations
a, The average amount of the detected introgressed sequences per individual in a population, categorized by their affinity to the Altai Neanderthal and the Altai Denisovan genomes. Only two representative populations in each super-group of 1KGP are shown as controls in the plot, and the results of all populations in 1KGP are presented in Supplementary Table 15. PJL, Punjabi in Lahore, Pakistan; PUR, Puerto Rican in Puerto Rico; TSI, Toscani in Italia. b, Contour density plots of the match proportions of the introgressed segments to the Altai Neanderthal and the Altai Denisovan genomes. Gradient colours indicate the height of the density corresponding to each contour line. One (bottom) or two (top) pulses of Denisovan introgression in each population are labelled. c, Comparison of frequencies of the adaptive introgression segments detected in MSEA and 1KGP populations. There is a Neanderthal-introgressed region on chr. 11 (approximately 150 kb), containing the ARHGEF12 and TLCD5 genes. Each row displays the adaptive introgression segments (horizontal lines) in a population over the genomic region, and the frequencies of introgression are listed in the right column. The 11 Neanderthal-derived PS-SNVs are denoted with red rhombuses. The genomic position is denoted according to GRCh37. d, Hierarchical clustering of haplotypes spanning the adaptive introgression region on chr. 11. Rows illustrate individual haplotypes. Columns indicate the genotypes of 87 variants with selection signals, and the 11 Neanderthal-derived PS-SNVs are denoted with red rhombuses. Grey and black represent ancestral and derived alleles, respectively. NEA, Neanderthal.
Genome diversity and signatures of natural selection in mainland Southeast Asia
Article · May 2025 · 368 Reads · 1 Citation · Nature

Mainland Southeast Asia (MSEA) has rich ethnic and cultural diversity with a population of nearly 300 million¹,². However, people from MSEA are underrepresented in the current human genomic databases. Here we present the SEA3K genome dataset (phase I), generated by deep short-read whole-genome sequencing of 3,023 individuals from 30 MSEA populations, and long-read whole-genome sequencing of 37 representative individuals. We identified 79.59 million small variants and 96,384 structural variants, among which 22.83 million small variants and 24,622 structural variants are unique to this dataset. We observed a high genetic heterogeneity across MSEA populations, reflected by the varied combinations of genetic components. We identified 44 genomic regions with strong signatures of Darwinian positive selection, covering 89 genes involved in varied physiological systems such as physical traits and immune response. Furthermore, we observed varied patterns of archaic Denisovan introgression in MSEA populations, supporting the proposal of at least two distinct instances of Denisovan admixture into modern humans in Asia³. We also detected genomic regions that suggest adaptive archaic introgressions in MSEA populations. The large number of novel genomic variants in MSEA populations highlights the necessity of studying regional populations that can help answer key questions related to prehistory, genetic adaptation and complex diseases.

Contrasting evolutionary trajectories of terrestrial vertebrates in the Hengduan Mountains hotspot

May 2025 · 156 Reads · National Science Review

The Hengduan Mountains (HDM) harbor the richest temperate diversity in the Northern Hemisphere, yet our understanding of how this exceptionally diverse biota evolved remains obscure. Large-scale historical biogeographic analyses of 851 terrestrial vertebrate species and their relatives (totaling 4862 species) reveal that multiple evolutionary pathways formed this biodiversity hotspot. Whereas in situ speciation dominates in amphibians and non-avian reptiles, near-equal in situ speciation and colonization occur in mammals, and colonization happens primarily in birds. HDM are a ‘cradle’ for neo-endemics and a ‘sink’ receiving surrounding biotas, mostly (>30%) coming from Indo-Malaysia. Orogenesis and monsoon intensification triggered in situ speciation, which began in the early Oligocene and peaked around 7–8 Ma. Analyses of different taxonomic groups reveal contrasting evolutionary processes and how major geo-climatic events override taxon-specific attributes. Results highlight the need to incorporate taxon-specific traits into future conservation planning to effectively address the unique needs and challenges of different groups.



Selection Increases Mitonuclear DNA Discordance but Reconciles Incompatibility in African Cattle

February 2025 · 171 Reads · Molecular Biology and Evolution

Mitochondrial function relies on the coordinated interactions between genes in the mitochondrial (mtDNA) and nuclear genomes. Imperfect interactions following mitonuclear incompatibility may lead to reduced fitness. MtDNA introgressions across species and populations are common and well documented. Various strategies may be expected to reconcile mitonuclear incompatibility in hybrids or admixed individuals. African admixed cattle (Bos taurus × B. indicus) show sex-biased admixture, with taurine (B. taurus) mtDNA and a nuclear genome predominantly of humped zebu (B. indicus). Here, we leveraged local ancestry inference approaches to identify the ancestry and distribution patterns of nuclear functional genes associated with the mitochondrial oxidative phosphorylation process in the genomes of African admixed cattle. We show that most of the nuclear genes involved in mitonuclear interactions are under selection and of humped zebu ancestry. Variation in mtDNA copy number (mtDNA-CN) may have contributed to the recovery of optimal mitochondrial function following admixture with the regulation of gene expression, alleviating or nullifying mitochondrial dysfunction. Interestingly, some nuclear mitochondrial genes with enrichment in taurine ancestry may have originated from ancient African aurochs (B. primigenius africanus) introgression. They may have contributed to the local adaptation of African cattle to pathogen burdens. Our study provides further support and new evidence showing that the successful settlement of cattle across the continent was a complex mechanism involving adaptive introgression, mtDNA-CN variation, regulation of gene expression, and selection of ancestral mitochondria-related genes.




The genetic affiliation of Sulawesi feral chicken. (a) The maximum‐likelihood tree constructed with nuclear genomic SNPs. After pruning linkage disequilibrium, a total of 10.62 million SNPs from 742 genomes were analyzed. The node marked with a star indicates the divergence of feral chickens of Sulawesi and domestic chickens of Indonesia, with local support value of likelihood = 1. (b) The mitochondrial DNA (mtDNA) haplogroup tree constructed with mtDNA genomic sequences. The mtDNA sequences for the two Sulawesi samples are the same, so that just one sample is shown as 6. The mtDNA of sample 7 is retrieved from the published data (Wang et al., 2020).
Structural variation (SV) discovery in the genome assembly of Sulawesi feral chicken. (a) Size distributions of SV types. The peak around 6 kb indicates the mobile elements of LINE1. (b) Functional annotation of SVs. (c) Comparison of deletions and insertions detected in feral chickens of Sulawesi with those in 52 global chicken breeds (Ren et al., in press; Rice et al., 2023; Zhang et al., 2022).
Genome sequencing and assembly of feral chickens in the wild of Sulawesi, Indonesia

December 2024 · 87 Reads

The feralization of domestic chicken makes the conservation and management of red jungle fowl (Gallus gallus) more complicated and challenging. We collected two Sulawesi feral chickens, located east of the Wallace Line, for whole‐genome sequencing and de novo genome assembly. Phylogenetic and f4‐statistics analyses indicated that the Sulawesi feralized domestic chickens (G. g. domesticus) received gene flow from G. g. gallus. We integrated ~45× ultra‐long Oxford Nanopore Technologies reads and ~28× PacBio HiFi reads to generate a de novo genome assembly of a female Sulawesi feral chicken (GGsula) with a contig N50 of 19.88 Mbp. We characterized structural variations in GGsula, and found that some were related to the nervous system. Our study provides the first genome assembly of feral chickens, which is a unique genomic resource to explore the process of chicken domestication and feralization.


Two distinct structural variants involving EDN3 cause hyperpigmentation in chicken

December 2024 · 28 Reads

Phenotypic diversity and its genetic basis are central questions in biology, with domesticated animals offering valuable insights due to their rapid evolution over the last 10,000 years. In chickens, fibromelanosis (FM) is a striking pigmentation phenotype characterized by hyperpigmentation. A previous study identified a complex structural variant causing upregulated expression of the Endothelin 3 (EDN3) gene. However, the detailed structural arrangement and functional consequences of the variant remained unclear. In this study, we conducted a comprehensive genomic survey of 692 FM chickens representing 55 breeds and uncovered two distinct structural variants causing the FM phenotype: FM*A, the previously reported complex rearrangement involving the EDN3 locus, and FM*B, a tandem duplication of a 16 kb region upstream of EDN3. We demonstrate that both structural variants (SVs) significantly upregulate EDN3 expression, with FM*B associated with even higher expression than FM*A. A luciferase reporter assay showed that the 16 kb region in FM*B contains powerful enhancers and the copy number expansion of this element is a likely explanation for EDN3 upregulation and hyperpigmentation. Furthermore, our analysis of linkage disequilibrium patterns allowed us to resolve the complex arrangement of duplications and inversion on the FM*A haplotype.


Repressor elements provide insights into tissue development and phenotypes in the pig

November 2024 · 36 Reads · Zoological Research

Repressor elements significantly influence economically relevant phenotypes in pigs; however, their precise roles and characteristics are inadequately understood. In the present study, we employed H3K27me3 profiling, assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq), and RNA sequencing (RNA-seq) data across six tissues derived from three embryonic layers to identify and map 2 034 super repressor elements (SREs) and 22 223 typical repressor elements (TREs) in the pig genome. Notably, many repressor elements were conserved across mesodermal and ectodermal tissues. SREs exhibited tight regulation of their target genes, affecting a limited number of genes within a specific genomic region with pronounced effects, while TREs exerted broader but weaker regulation over a wider range of target genes. Furthermore, in neuronal tissues, genes regulated by repressor elements started to be repressed during the differentiation of stem cells into progenitor cells. Notably, analysis showed that many repressor elements exhibited cooperative and additive effects on the modulation of KLF4 expression. This research provides the first comprehensive map of pig repressor elements, serving as an essential reference for future studies on repressor elements.


Figure 1. Overview of data content and organization in iDog 2.0.
iDog: a multi-omics resource for canids study

November 2024 · 42 Reads · Nucleic Acids Research

iDog (https://ngdc.cncb.ac.cn/idog/) is a comprehensive public resource for domestic dogs (Canis lupus familiaris) and wild canids, designed to integrate multi-omics data and provide data services for the worldwide canine research community. Notably, iDog 2.0 features a 15-fold increase in genomic samples, including 29.55 million single nucleotide polymorphisms (SNPs) and 16.54 million insertions/deletions (InDels) from 1929 modern samples and 29.09 million SNPs from 111 ancient Canis samples. Additionally, 43487 breed-specific SNPs and 530 disease/trait-associated variants have been identified and integrated. The platform also includes data from 141 BioProjects involving gene expression analyses and a single-cell transcriptome module containing data from 105 057 Beagle hippocampus cells. iDog 2.0 also includes an epigenome module that evaluates DNA methylation patterns across 547 samples and chromatin accessibility across 87 samples for the analysis of gene expression regulation. Additionally, it provides phenotypic data for 897 dog diseases, 3207 genotype-to-phenotype (G2P) pairs, and 349 dog disease-associated genes, along with two newly constructed ontologies for breed and disease standardization. Finally, 13 new analytical tools have been added. Given these enhancements, the updated iDog 2.0 is an invaluable resource for the global canine research community.


Citations (61)


... The fact that EDN3 shows a highly significant upregulation at the mRNA level in skin from FM chickens strongly suggested that this is the causal mechanism for the massive expansion of pigment cells in FM chicken². It was not possible to resolve the organization of the complex rearrangement with the short-read whole genome sequence data available a decade ago, even with the structural variation detection tools widely used today⁷, and three possible configurations of the complex structural rearrangement (Fig. 1) were established based on PCR analysis of breakpoint regions². A single recombinant found in a pedigree segregating for the FM mutation was only consistent with one of the possible configurations denoted FM-2². ...

Reference:

Population genomic analysis identifies the complex structural variation at the fibromelanosis (FM) locus in chicken
Comprehensive evaluation and guidance of structural variation detection tools in chicken whole genome sequence data

BMC Genomics

... Despite extensive studies, significant gaps remain in our understanding of Drosophila diversity, especially in the tropical regions known for high species richness and endemism (Katoh et al. 2024). These gaps include the limited molecular data available for many species, the incomplete resolution of phylogenetic relationships, and the underrepresentation of some lineages in tropical ecosystems. ...

Molecular phylogeny and species diversity of the genus Dichaetophora Duda and related taxa (Diptera: Drosophilidae)
  • Citing Article
  • September 2024

Molecular Phylogenetics and Evolution

... Amphibians are the most vulnerable taxonomic group facing imminent extinction risk compared to other terrestrial vertebrates, largely due to their limited dispersal abilities and specialised habitat requirements (Daru et al. 2020; Paúl et al. 2023; Xu et al. 2024). Over 41% of extant amphibian species worldwide are severely threatened with extinction (Hoffmann et al. 2010; Catenazzi 2015; Mi et al. 2022). ...

Hidden hotspots of amphibian biodiversity in China

Proceedings of the National Academy of Sciences

... They concluded that demic diffusion significantly shaped the genetic makeup of the Han people [23]. Additional research using low-density mtDNA/Y-chromosome markers has shed light on the peopling process of the Tibetan Plateau and the expansion of Tibeto-Burman-speaking populations [24–28]. The accumulation of Y-chromosome sequence data offers significant potential to enhance our understanding of human genetic history, benefiting a range of fields from evolutionary studies to forensic genetics [6,7,29–31]. ...

Sex-biased adaptation shapes uniparental gene pools in Tibetans
  • Citing Article
  • February 2024

Science China. Life sciences

... Our assessment of potential causal loci found 2 genes previously linked to plumage or pigmentation in past studies. One of these genes, TRPM7, belongs to the same family as TRPM1, which has been associated with feather pigmentation in domesticated chickens (Gu et al. 2024). More notably, PCSK5 has been directly correlated with variation in melanin levels in wild birds and is involved in melanin production pathways (San-Jose et al. 2017). ...

Genomic insights into local adaptation and phenotypic diversity of Wenchang chickens

Poultry Science

... Considering the importance of social interaction, animal models are essential for investigating the neural mechanisms and brain regions involved in social behaviors and disorders such as autism spectrum disorder (ASD)². The three-chamber test is a widely used assay to evaluate social behaviors across species, including mice, dogs, and zebrafish³⁻⁵. In the first phase, the animal explores a central chamber and two adjacent chambers (one containing an unfamiliar conspecific and the other empty) to evaluate sociability. ...

Modeling SHANK3-associated autism spectrum disorder in Beagle dogs via CRISPR/Cas9 gene editing

Molecular Psychiatry

... The amplified products were visualized on 1% agarose gels; purified using an Isolate II PCR and Gel Kit (Bioline), and then sequenced in both directions using Sanger sequencing (DNA SEQUENCING, Vietnam). Macey et al. (1997) The new nucleotide sequences of ND2 and COI were inspected by Chromatogram 2.6 (Fairuz-Fozi et al., 2020) and then combined with all available sequences of the Cyrtodactylus intermedius group from Grismer et al. (2020), , Nguyen et al. (2013), Nguyen et al. (2014), Brennan et al. (2017), and Ngo et al. (2022). The dataset of sequences was aligned using MUSCLE (Edgar, 2004) integrated with default parameters in MEGA11 (Tamura et al., 2020). ...

Phylogeny of the Cyrtodactylus irregularis species complex (Squamata: Gekkonidae) from Vietnam with the description of two new species
  • Citing Article
  • October 2023

Zootaxa

... Sequential detection of cTnI and ATP was achieved by de-hybridizing the ds-DNA polymer to release ECL probes into the solution. Zhao et al.⁹⁵ designed a "peptide-cTnI-Ab₁" sandwich biosensor, using polyaniline-coated cerium oxide loaded with GNPs (Au@PANI@CeO₂) as a substrate to afford Ab₁, and PtPd@ZrN@COF loaded with the affinity peptide and probe TB for signal amplification. As shown in Figure 7B, after the cTnI/Ab₁/Au@PANI@CeO₂-modified electrode was incubated with the probe, the current response increased sharply due to the excellent conductivity of PtPd NFs and the electrocatalytic ability of ZrN@COF for the degradation to TB. ...

Electrochemical/colorimetric dual-mode sensing strategy for cardiac troponin I detection based on zirconium nitride functionalized covalent organic frameworks
  • Citing Article
  • September 2023

Sensors and Actuators B Chemical

... In parallel, these 79 protein-coding and splicing variants were compared with 1987 genomes from the Dog10K project (Meadows et al., 2023) to identify unique variants that are not present in the overall dog population. From this analysis, 26 variants were private to the family (Table S2). ...

Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture

Genome Biology

... The potential of miRNAs as therapeutic targets for COPD treatment has a broad spectrum of application prospects, underscoring their significance in the medical field.⁸³,⁸⁴ The establishment of a core gene TF-miRNA-mRNA network revealed that the transcription factor CREB1 could reduce susceptibility to COPD,⁸⁵ hsa-miR-543 can regulate the progression of COPD by targeting IL33,⁸⁶ and hsa-miR-181c⁸⁷ plays a significant role in influencing the inflammatory response, neutrophil infiltration, reactive oxygen species production, and inflammatory factors in COPD. MiR-146a-5p, through the negative modulation of IL1A, can provoke the induction of IL8, thus leading to persistent inflammatory conditions in the pulmonary regions affected by COPD⁸⁸. ...

Remote regulation of rs80245547 and rs72673891 mediated by transcription factors C-Jun and CREB1 affect GSTCD expression

iScience