Available via license: CC BY 4.0
Unexpected clustering pattern in dwarf galaxies challenges formation models
Ziwen Zhang 1,2, Yangyao Chen 1,2, Yu Rong 1,2, Huiyuan Wang 1,2, Houjun Mo 3, Xiong
Luo 1,2 & Hao Li 1,2
1 Department of Astronomy, University of Science and Technology of China, Hefei, Anhui 230026,
China
2 School of Astronomy and Space Science, University of Science and Technology of China, Hefei
230026, China
3 Department of Astronomy, University of Massachusetts, Amherst MA 01003-9305, USA
The galaxy correlation function serves as a fundamental tool for studying cosmology, galaxy
formation, and the nature of dark matter. It is well established that more massive, redder
and more compact galaxies tend to have stronger clustering in space1,2. These results can
be understood in terms of galaxy formation in Cold Dark Matter (CDM) halos of different
mass and assembly history. Here, we report an unexpectedly strong large-scale clustering for
isolated, diffuse and blue dwarf galaxies, comparable to that seen for massive galaxy groups
but much stronger than that expected from their halo mass. Our analysis indicates that the
strong clustering aligns with the halo assembly bias seen in simulations3 with the standard
ΛCDM cosmology only if more diffuse dwarfs formed in low-mass halos of older ages. This
pattern is not reproduced by existing models of galaxy evolution in a ΛCDM framework 4–6,
and our finding provides new clues for the search of more viable models. Our results can
be explained well by assuming self-interacting dark matter7, suggesting that such a scenario
should be considered seriously.
arXiv:2504.03305v1 [astro-ph.CO] 4 Apr 2025
Our dwarf galaxies are selected from the New York University Value Added Galaxy Catalog
sample8 of the Sloan Digital Sky Survey (SDSS) DR7 (ref. 9). We only consider isolated dwarfs, defined
as the centrals of galaxy groups10, to avoid complications by satellite galaxies in interpreting our
results. We also excluded dwarfs with red color and large Sérsic index, so that we can focus on
“late-type” galaxies which have so far been believed to form late and to have weak clustering in
space. The dwarfs are divided into four samples according to their surface mass density (Σ∗).
We then calculated the projected two-point cross-correlation functions (2PCCFs; see Methods),
with results shown in Fig. 1a, and derived the relative bias defined as the ratio of the 2PCCF of a
sample to that of compact (highest-Σ∗) dwarfs. The relative bias as a function of Σ∗ plotted in
Fig. 1b shows clearly that the bias increases with decreasing Σ∗, contrary to common belief. For
the lowest-Σ∗ dwarfs (diffuse dwarfs), which are similar to ultra-diffuse galaxies (UDGs) defined
in the literature11, the relative bias is $2.31^{+0.20}_{-0.19}$, indicating a dependence on Σ∗ at about the 7σ level.
For the second-lowest Σ∗ sample, the relative bias is $1.49^{+0.10}_{-0.11}$, demonstrating that the decline with
Σ∗ holds over the entire range of Σ∗ covered by our sample.
We used various tests to assess the reliability of our findings against effects of sample incom-
pleteness, cosmic variance, satellite contamination, and uncertainties in measurements of galaxy
properties (see Methods). We found that the incompleteness is mainly in M∗and marginally
in color and Σ∗. Dividing our sample into two sub-samples with different M∗ ranges, we observed
no notable difference in the bias-Σ∗ relation between the two. Massive dwarfs with
$8.5 < \log (M_*/M_\odot) < 9$ at $z \le 0.04$ are much more complete than the total population (the main sample),
and their results, shown in Fig. 1b for comparison with the main sample, indicate clearly that
the incompleteness does not change the outcome significantly, as is expected when selection effects
are independent of the large-scale structure. Dividing the total sample into two sub-volumes
either according to sky coverage or redshift gives similar results, demonstrating that the cosmic
variance does not change our conclusion. Stronger clustering would be anticipated if the diffuse
dwarf sample were significantly affected by satellites in massive groups/clusters of galaxies. This
possibility is ruled out by examining the satellites’ contribution, which was found to
only increase the small-scale correlation but have little effect on large scales where the relative bias is
measured. Finally, uncertainties in galaxy-property measurements are not known to be correlated
with large-scale structures and thus can only reduce the difference between samples, implying that
the true correlation between the bias and Σ∗is even stronger than what is estimated from the data.
All these demonstrate that our results are robust against observational effects.
Massive halos are known to be clustered more strongly than low-mass halos on average12. It
is thus interesting to check whether the difference in clustering between the diffuse and compact
dwarfs is caused by a difference in halo mass. Here, we present halo mass measurements using
two different methods (see Methods). The first is based on the rotational velocity traced by HI
emission lines13. The median halo masses for the diffuse and compact dwarfs with HI detections
are $10^{10.38}\,M_\odot$ and $10^{10.85}\,M_\odot$, respectively. The second is based on the assumption that different
subsets of the dwarf population obey the same stellar mass-halo mass relation (SHMR)14, and
the median halo masses for diffuse and compact dwarfs obtained in this way are $10^{10.83}\,M_\odot$ and
$10^{11.01}\,M_\odot$, respectively. The two methods consistently indicate that diffuse and compact
dwarfs have comparable halo masses. The halo bias model15 predicts a bias ratio of 0.94 and 0.99
between the diffuse and compact dwarfs using halo masses given by the HI kinematics and the
SHMR, respectively. Even though the uncertainty in the halo mass is large, the uncertainty in the
predicted bias ratio is very small (less than or equal to 0.02), because the average bias depends
very weakly on halo mass in the low-mass end12,15. Indeed, even if we use the upper bound in the
scatter of the halo masses ($M_h = 10^{11.5}\,M_\odot$) for diffuse dwarfs and the lower bound ($10^{10.0}\,M_\odot$)
for compact dwarfs, the predicted relative bias is only about 1.14, much lower than the observed
value of ∼2.31, indicating that the difference in clustering between the diffuse and compact samples
cannot be explained by the difference in their halo masses (see Fig. 1c).
The clustering of galaxy groups aligns with the halo bias model and simulation predictions16,
making it a reliable reference for the absolute clustering strength of dwarf galaxies. Fig. 1a shows
that, on scales $r_p \sim 0.1\,h^{-1}$ Mpc, the correlation functions for diffuse and compact dwarfs are
similar and considerably lower than that for groups with $M_h \sim 10^{11.5}\,M_\odot$. Since the small-scale
clustering is sensitive to halo mass, the result suggests that both diffuse and compact dwarfs inhabit
halos with masses below $10^{11.5}\,M_\odot$, consistent with the halo mass estimates shown above.
However, diffuse dwarfs exhibit much stronger clustering on large scales than these groups, with a
correlation amplitude comparable to that of massive groups with $M_h \sim 10^{13.5}\,M_\odot$ (Fig. 1c). These
results clearly contradict the conventional expectation that low-mass, blue, and diffuse galaxies
have weaker clustering than their massive, red, and compact counterparts1,2.
Fig. 2a–d depict spatial distributions of diffuse and compact dwarf galaxies on top of the
distribution of galaxy groups10 and on filamentary structures17. It appears that diffuse dwarfs tend
to be associated with prominent filamentary structures, whereas compact dwarfs have a more diffuse
distribution. To quantify this, we used the reconstructed mass density field from the ELUCID
project17 to classify the cosmic web into filament, sheet, void and knot components. Approxi-
mately 50% of the dwarfs are found in filaments and 30% in sheets, with diffuse ones showing a
stronger tendency to reside in filaments than compact ones. We calculated the 2PCCFs between
diffuse/compact dwarfs and different components of the cosmic web (Fig. 2e and f). Compared to
their compact counterparts, diffuse dwarfs show a much weaker correlation with voids, but exhibit
a stronger association with filaments and knots on large scales. This suggests that diffuse dwarfs
are more likely to be found within and around large cosmic structures than compact dwarfs. How-
ever, on small scales, diffuse dwarfs have a weaker correlation with knots than compact ones,
likely because star-forming gas in diffuse dwarfs is more susceptible to stripping by high-density
environments than that in compact dwarfs.
For a given mass, the large-scale clustering of halos can also depend on their intrinsic prop-
erties, a phenomenon referred to as the assembly bias3,18–20. The strong dependence of the relative
bias on Σ∗ aligns with such bias provided that Σ∗ is correlated with some intrinsic properties of
halos. Dwarf galaxies are ideal for studying the assembly bias because the dependence of clustering
on the halo mass is very weak at the low-mass end. We considered two halo properties for
which the assembly bias has been investigated extensively: the spin and the formation redshift $z_f$,
with the latter found to be closely correlated with the halo concentration21. We found that, for
$M_h \sim 10^{11}\,M_\odot$, the dependence of the bias on halo spin is too weak22 to explain the range of the
relative bias shown in Fig. 1, while the dependence on $z_f$ may be sufficient to cover the range23 (see
Methods). To quantify this, we first applied the abundance-matching technique to establish a connection
between Σ∗ and $z_f$ using the massive dwarf sample (Fig. 3b), and then assigned a Σ∗ value
to each simulated halo according to its $z_f$ and the Σ∗-$z_f$ relation. Fig. 3a shows the relative bias as
a function of Σ∗ obtained from halos, taken from the constrained simulation of ELUCID17,24, in
the same volume as the observational sample to minimize cosmic variance. The observed bias-Σ∗
relation is well reproduced provided that Σ∗ is tightly related to $z_f$, with a correlation coefficient
ρ > 0.8. The question is whether such a relation between Σ∗ and $z_f$ is expected in the current
paradigm of galaxy formation.
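The assignment of Σ∗ to halos by abundance matching with scatter can be illustrated with a rank-based sketch. This is not the authors' implementation: the function name `match_sigma_to_zf`, the way noise is added to standardized ranks, and the toy input distributions are all assumptions made for illustration. The parameter `rho` only approximately sets the strength of the resulting rank anti-correlation between Σ∗ and $z_f$.

```python
import numpy as np

def match_sigma_to_zf(zf, sigma_obs, rho=0.85, seed=0):
    """Assign Sigma_* values to halos so that older halos (higher z_f)
    get lower Sigma_*, with scatter controlled (approximately) by rho.
    Hypothetical helper, not taken from the paper."""
    rng = np.random.default_rng(seed)
    zf = np.asarray(zf, dtype=float)
    n = zf.size
    # Standardized rank of z_f (zero mean, unit variance)
    rank = np.argsort(np.argsort(zf))
    x = (rank - rank.mean()) / rank.std()
    # Noisy proxy whose Pearson correlation with x is about rho
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Quantile of each halo in the noisy proxy
    q = (np.argsort(np.argsort(y)) + 0.5) / n
    # High proxy quantile -> low Sigma_* quantile (anti-correlation),
    # while reproducing the observed Sigma_* distribution
    return np.quantile(sigma_obs, 1.0 - q)

# Toy inputs (made up, not simulation or survey values)
rng = np.random.default_rng(42)
zf_toy = rng.uniform(0.5, 3.0, 2000)
sigma_toy = rng.lognormal(mean=3.0, sigma=0.7, size=500)
sigma_tight = match_sigma_to_zf(zf_toy, sigma_toy, rho=1.0)
sigma_scatter = match_sigma_to_zf(zf_toy, sigma_toy, rho=0.85)
```

With `rho=1.0` the mapping is strictly monotonic, which corresponds to the zero-scatter case; lowering `rho` loosens the relation while leaving the distribution of assigned Σ∗ unchanged.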
In the current cold dark matter (CDM) paradigm, several mechanisms have been proposed
for the formation of diffuse dwarfs. Environmental processes such as tidal heating, galaxy inter-
action and ram pressure stripping are found to be able to make dwarf galaxies more diffuse6,25–28.
However, these mechanisms are effective mainly in group and cluster environments, although some
simulations suggest that filamentary environments might also strip gas from dwarf galaxies29. Such
mechanisms are expected to remove gas from dwarf galaxies and quench star formation in them,
producing red and gas-poor dwarfs observed in clusters and groups of galaxies. They are not ex-
pected to be efficient for the formation of the diffuse dwarfs concerned here, because those dwarfs
reside in low-mass halos, have blue colors, and possess extended HI disks (see Methods). It has
also been proposed that diffuse dwarfs may be produced in halos of high spin4,28,30,31 according to
the disk formation model32. However, this scenario cannot explain the strong large-scale clustering
of diffuse dwarfs. Alternatively, multiple episodes of supernova feedback may trigger oscillations
in the gravitational potential, which lead to expansion in the inner parts of halos and the formation
of blue diffuse dwarfs5,33. Such a process might explain the observational result if its effect is more
significant in older halos. Unfortunately, existing simulations suggest that the effect is independent
of halo age and concentration5 (see Methods). The same conclusion can be reached by comparing
the observational results with the predictions of L-Galaxies34,35, a semi-analytic model of galaxy
formation, and IllustrisTNG36 (hereafter TNG), a hydro cosmological simulation of galaxy for-
mation. These two models do not predict any significant dependence of the bias on Σ∗(Fig. 3a).
Furthermore, the $z_f$-Σ∗ relation predicted by the two models is either very weak or opposite to that
needed to explain the bias-Σ∗ relation (Fig. 3b).
It is interesting to note that the supernova-driven expansion was proposed as a potential
solution to the “small-scale crises” of the CDM model, such as the cusp-core problem and the too-
big-to-fail problem7,37. However, such a scenario has yet to be extended so as to produce a relation
between the expansion and the halo assembly in order to explain the observed bias-Σ∗ relation,
and further research is needed to assess the feasibility.
Beyond CDM, the self-interacting dark matter (SIDM) model has also been proposed as a promising
solution to the small-scale problems7,38–41. SIDM halos are expected to have the same formation
histories and large-scale clustering as their CDM counterparts, so that the assembly bias is
also expected to be the same, and have significantly reduced central densities due to subsequent
collisions of dark matter particles42. Since the probability of collision between dark matter parti-
cles increases with density and halo age, older halos are expected to possess larger cores and lower
central densities43. Thus, if dwarf galaxies with lower Σ∗ are associated with SIDM halos with
larger cores (lower central densities), as is consistent with the observation that halos of diffuse
dwarfs usually have low central densities or large cores44,45, an anti-correlation between Σ∗ and
$z_f$, as well as between Σ∗ and the relative bias, is expected, as shown in Fig. 3a and b. Thus, the
SIDM model combined with the assembly bias provides a plausible explanation for the observed
bias-Σ∗ relation.
Should SIDM drive the formation of diffuse dwarfs, self-interaction has to be sufficiently
strong to produce noticeable cores, thus providing testable predictions. We used the sample of
ELUCID halos presented in Fig. 3 and assigned to each halo a galaxy with a Σ∗ obtained
from its $z_f$ using abundance matching. We then assumed a self-interaction cross-section, $\sigma_m$, and
adopted the isothermal Jeans model43 to predict the profile (core radius, $r_c$, and central density, $\rho_0$,
defined by the expectation of “one scattering”; see Methods) of SIDM halos. The result shown in Fig. 4
highlights the similarity between SIDM cores and dwarf galaxies, in terms of the distribution of
sizes ($r_c$ versus $R_{50}$), and the dependencies of $z_f$ and the large-scale bias on the size, indicating
that the SIDM cores are viable proxies of structural properties of dwarf galaxies. The predicted
relation is nearly a power law, $\Sigma_* \propto r_c^{-2}$, for a given halo mass, implying that $R_{50} \propto r_c$ if the stellar
mass M∗ in a halo depends only on the halo mass. Parameterizing the relation as $R_{50} = A_r r_c$,
iterating the Jeans model until convergence, and adjusting the normalization factor $A_r$, we found
that the predicted Σ∗ can reproduce the observed relative bias-Σ∗ relation. The model prediction
and required $A_r$ for given $\sigma_m$ are shown in Fig. 3c and d. For comparison, we also show in
Fig. 4 the distribution of $r_{\rho_0/4}$, defined as the radius where the halo density drops to $\rho_0/4$ (ref. 46). For
a given cross-section $\sigma_m$, $r_{\rho_0/4}$ is smaller than $r_c$. This indicates that the constraint on the cross-section
depends on how $R_{50}$ is related to the defined core radius and can be obtained by future
observations of resolved rotation curves for a representative population of dwarf galaxies. Our
finding clearly disfavors a large cross-section that leads to core collapse and inverts the trend of
the bias with Σ∗. The predicted scaling relations, $\Sigma_* \propto r_c^{-2}$ and $R_{50} \propto r_c$, indicate that the
stellar components of diffuse dwarfs follow closely the dynamics driven by the dark matter. Such a
condition may be created by a process that can effectively mix stars and star-forming gas with dark
matter, similar to the process that produces the homology of dynamically hot galaxies with dark
matter halos47,48. Clearly, these hypotheses need to be tested using hydro simulations of SIDM that
can model properly not only the dynamics of the SIDM component but also processes of galaxy
formation. Our results provide strong motivation for such investigations.
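The step from the first scaling relation to the second follows from the definition of the surface mass density used in this work, $\Sigma_* = M_*/(2\pi R_{50}^2)$. A short worked version, under the stated assumption that M∗ depends only on the halo mass:

```latex
\Sigma_* \equiv \frac{M_*}{2\pi R_{50}^{2}}
\quad\Longrightarrow\quad
R_{50} = \left(\frac{M_*}{2\pi\,\Sigma_*}\right)^{1/2}.
% At fixed halo mass M_h, the model gives \Sigma_* \propto r_c^{-2}, so
R_{50} \propto M_*^{1/2}\, r_c
\;\propto\; r_c
\qquad \text{if } M_* = M_*(M_h).
```

The proportionality constant is the factor $A_r$ in $R_{50} = A_r r_c$.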
[Figure 1. Panel a: $w_p(r_p)$ ($h^{-1}$ Mpc) versus $r_p$ ($h^{-1}$ Mpc) for diffuse dwarfs, compact dwarfs, and groups with 11.0 ≤ log $M_h$ (M⊙) < 12.0, 12.0 ≤ log $M_h$ (M⊙) < 13.0 and 13.0 ≤ log $M_h$ (M⊙); the fitting range for the relative bias is marked. Panel b: relative bias versus Σ∗ (M⊙ pc$^{-2}$) for the main and massive samples. Panel c: relative bias versus log $M_h$ (M⊙) for diffuse dwarfs, dwarfs with 7 ≤ Σ∗ < 15 and 15 ≤ Σ∗ < 25, and compact dwarfs.]
Fig. 1: Projected two-point cross-correlation functions (2PCCFs) and relative biases. a, 2PCCFs ($w_p$)
as functions of projected separation ($r_p$). Blue and red solid curves are for diffuse and compact SDSS9
dwarfs, respectively. Dashed curves are for groups with varying halo masses ($M_h$). Diffuse dwarfs display
the most pronounced large-scale clustering, yet their small-scale clustering is the weakest and comparable
to that of compact dwarfs. Shaded region indicates the radial interval used to define the large-scale relative
bias. b, Relative bias versus surface mass density (Σ∗) for dwarfs (solid, main sample; dashed, massive
sample). A noticeable dependence of bias on Σ∗ is seen. Note that the relative biases for each sample are
measured against the compact dwarfs in that sample. c, Relative biases as functions of halo mass for dwarfs
and galaxy groups10. Dashed curves are the same theoretical prediction for the absolute bias15, scaled to
the observed values of relative biases for groups with different ranges of halo masses. For comparison,
the relative biases for the dwarfs in the main sample are also shown, with their halo masses obtained by
HI kinematics. The dependence of bias on halo mass for groups aligns with theoretical prediction, but
the bias for diffuse dwarfs is much higher than that expected from their halo masses. The 2PCCFs and the
relative biases are computed using the z-weighting method (see Methods). Markers with error bars represent
medians with 16th–84th percentiles of bootstrap samples for $w_p$, and of posterior distributions obtained by
Markov chain Monte Carlo (MCMC) fitting for relative biases. Markers with error bars for $M_h$ of dwarfs
show the medians with dispersions (not uncertainties) of the $M_h$ distributions. Markers for $M_h$ of groups
show the medians of the $M_h$ distributions.
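[Figure 2. Panels a–d: spatial maps with dwarfs marked by cosmic-web type (in filament, in sheet, in knot, in void). Panels e and f: dwarf-void, dwarf-sheet, dwarf-filament and dwarf-knot cross-correlation functions.]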
Fig. 2: Correlation between dwarf galaxies and cosmic web. a–d, Spatial distribution of dwarf galaxies,
galaxy groups, and filaments of the cosmic web (see Methods). Each blue (red) marker represents a
diffuse (compact) dwarf, with different marker shapes indicating the type of cosmic web in which it
resides. Each grey dot in a and b represents a galaxy group with $M_h \ge 10^{12}\,M_\odot$, with marker size proportional
to halo virial radius. Grey shades in c and d show the fraction of field points classified as filament
along the line of sight, darker for higher fraction. Only dwarfs, groups and field points with corrected redshift
$0.02 < z_{\rm cor} < 0.03$ are included. Compact dwarfs are down-sampled without replacement to match
the number of diffuse dwarfs. e, f, Real-space 2PCCFs (ξ) between dwarfs (blue and red for the diffuse
and compact, respectively) and cosmic web of different types (void and sheet in e; filament and knot in f).
Dwarfs are taken from the main sample and are z-weighted, while cosmic web points are weighted by their
matter density. Markers with error bars show medians with 16th–84th percentiles estimated from bootstrap
samples. Leftmost markers (indicated by left arrows) are obtained by combining all pairs below $1\,h^{-1}$ Mpc,
the smoothing scale of the reconstructed field. The strong large-scale correlation with filaments/knots and
small-scale anti-correlation with voids of diffuse dwarfs suggest that they preferentially reside within/around
large cosmic structures.
[Figure 3. Panels a–d. Legend entries: ρ = 1, ρ = 0.9, ρ = 0.85, ρ = 0.8 and ρ = 0.5 (matching with scatter). The four Σ∗ bins contain 2.5%, 23.2%, 26.0% and 48.3% of the sample, from diffuse to compact.]
Fig. 3: Relative bias as a function of Σ∗ from galaxy formation models. a, bias-Σ∗ relations from observation
(orange) and models: TNG36 (green), L-Galaxies35 (purple), and our abundance matching (see
Methods) that links $z_f$ of halos in the constrained simulation of ELUCID17 to Σ∗ of observed dwarfs. Random
scatter in abundance matching is controlled by the correlation coefficient ρ between Σ∗ and $z_f$: ρ = 1
for zero scatter (black) and ρ < 1 for non-zero scatter (grey). Black curve with shades shows a fine-binning
result, while black and grey markers are binned the same way as the observation by Σ∗ (indicated by orange
regions, with the percentages of samples labeled). b, $z_f$-Σ∗ relations implied by the models: our abundance
matching with ρ = 1 (black) and ρ = 0.85 (grey curve with two bounds, also shown in d), TNG and L-Galaxies.
Panels c, d are for the SIDM model assuming different $\sigma_m$. In d, right axis shows the cumulative
percentages of $z_f$ of ELUCID halos; shades are for the $\sigma_m = 0.3\,{\rm cm^2\,g^{-1}}$ case; inset panel shows the ratio
between galaxy size ($R_{50}$) and SIDM core size ($r_c$) required to match the observation, as a function of $\sigma_m$.
The massive sample (see Extended Data Table 1) is used for observation and abundance matching. Massive
($M_* = 10^{8.5}$–$10^{9}\,M_\odot$) star-forming central dwarfs are used for TNG and L-Galaxies. ELUCID halos with
$M_h = 10^{10.5}$–$10^{11}\,M_\odot$, within $0.01 \le z \le 0.04$, and without backsplash, are used in abundance matching
and SIDM. Markers with error bars/shades in a and c show medians with 16th–84th percentiles. Curves
with shades/bounds in b and d show medians with 16th–84th percentiles of Σ∗ at given $z_f$. Our findings
suggest that halo assembly ($z_f$) bias is sufficient to explain the observed bias-Σ∗ relation of isolated dwarfs,
provided that Σ∗ has a tight anti-correlation with $z_f$.
Fig. 4: Relative bias in self-interacting dark matter (SIDM) models. Here we use an isothermal Jeans
model43 for SIDM halos, adapted from the sample of CDM halos in ELUCID17 with abundance-matched
Σ∗ presented in Fig. 3a and b (the ρ = 0.85 case), to predict the core size $r_c$ and central density $\rho_0$ for
each halo (see Methods). Cases for velocity-independent cross-sections $\sigma_m$ = 0.1, 0.3 and 1.0 ${\rm cm^2\,g^{-1}}$ are
shown by dashed, solid and dotted black curves, respectively. a, Probability density functions (PDFs) of
$r_c$. b, Relative biases as functions of $r_c$, binned according to the fractions of observed dwarf subsamples
in the massive sample. c, Relations between halo formation time ($z_f$) and $r_c$. d, Relations between galaxy
stellar mass surface density (Σ∗) and $r_c$. Curves with shades or error bars show medians with 16th–84th
percentiles. Green dots in (c) and (d) represent individual galaxies for $\sigma_m = 0.3\,{\rm cm^2\,g^{-1}}$, color-coded by
$\rho_0$. The $\sigma_m$ values in use are typical values suggested by recent observational constraints49,50. For comparison,
results using an alternative definition of core size, $r_{\rho_0/4}$, assuming $\sigma_m = 3.0\,{\rm cm^2\,g^{-1}}$ are shown by grey
curves. The distribution of $R_{50}$ and its relation with Σ∗ for the observed dwarfs in the massive sample are
shown by orange curves. Given $\sigma_m$, the model predicts scaling relations $\Sigma_* \propto r_c^{-2}$ and $R_{50} \propto r_c$ for dwarfs
in isolated SIDM halos.
Methods
The sample of dwarf galaxies. Our galaxy sample is taken from the New York University Value
Added Galaxy Catalog (NYU-VAGC)8 of the Sloan Digital Sky Survey (SDSS) DR7 (ref. 9). We selected
galaxies with the r-band Petrosian magnitudes $r \le 17.72$, the redshift completeness $f_{\rm gotmain} \ge 0.7$,
and redshift $0.01 \le z \le 0.2$. Isolated galaxies are defined as the central (dominating) galaxies
of galaxy groups identified by the group-finding algorithm10,51. The NYU-VAGC provides measurements
of the size of a galaxy, $R_{50}$, the radius enclosing 50 percent of the Petrosian r-band flux,
and the r-band Sérsic index, n. The $^{0.1}(g-r)$ color used here is K+E corrected to z = 0.1. We
cross-matched the sample with the MPA-JHU DR7 catalog to obtain the stellar mass (M∗)52. As
our sample of dwarf galaxies, we selected galaxies with $10^{7.5} \le M_*/M_\odot < 10^{9.0}$. The surface
mass density of a galaxy, Σ∗, is defined as $\Sigma_* = M_*/(2\pi R_{50}^2)$. Extended Data Fig. 1a–d shows the
distributions in $^{0.1}(g-r)$, n, Σ∗ and z.
We excluded dwarfs with $^{0.1}(g-r) > 0.6$ to reduce potential contamination by satellites
and with $n > 1.6$ to ensure a relatively pure sample of late-type galaxies. These selections result
in a sample of 6,919 galaxies (the main sample). As shown below, we also constructed a massive
sample ($8.5 < \log (M_*/M_\odot) < 9$ and $z \le 0.04$), which is much more complete than the main
sample, and used it to compare with models. We divide each of the main and massive samples
into four subsamples according to Σ∗. Galaxies with $\Sigma_* < 7\,M_\odot\,{\rm pc}^{-2}$ and $\Sigma_* > 25\,M_\odot\,{\rm pc}^{-2}$
are referred to as diffuse and compact dwarfs, respectively. Detailed information on the samples is
listed in Extended Data Table 1. Our conclusion is robust against the details of the sample splitting.
The specific splitting and the n cut are chosen so that the diffuse dwarfs are akin to UDGs5,11,30,53.
See Supplementary Information for details.
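The selection cuts and the Σ∗ definition above can be written compactly. The following is a minimal sketch, assuming stellar masses in M⊙ and $R_{50}$ already converted to kpc; the function names, the toy galaxy values, and the intermediate bin edge at 15 M⊙ pc$^{-2}$ (read from the Fig. 1c legend) are illustrative assumptions rather than part of the published pipeline.

```python
import numpy as np

def surface_mass_density(m_star, r50_kpc):
    """Sigma_* = M_* / (2 pi R50^2) in M_sun pc^-2, for R50 given in kpc."""
    r50_pc = np.asarray(r50_kpc, dtype=float) * 1.0e3
    return np.asarray(m_star, dtype=float) / (2.0 * np.pi * r50_pc**2)

def select_main_sample(m_star, color, sersic_n):
    """Boolean mask for the mass, color and Sersic-index cuts quoted in the text."""
    m_star = np.asarray(m_star, dtype=float)
    return ((m_star >= 10**7.5) & (m_star < 10**9.0)
            & (np.asarray(color) <= 0.6) & (np.asarray(sersic_n) <= 1.6))

def sigma_bin(sigma):
    """Bin index 0..3 from diffuse (Sigma_* < 7) to compact (Sigma_* >= 25)."""
    return np.digitize(sigma, [7.0, 15.0, 25.0])

# Toy, made-up galaxies (not catalog values)
m_star = np.array([1.0e8, 5.0e8, 2.0e9])
r50_kpc = np.array([3.0, 1.0, 2.0])
sigma = surface_mass_density(m_star, r50_kpc)
mask = select_main_sample(m_star,
                          color=np.array([0.3, 0.4, 0.3]),
                          sersic_n=np.array([1.0, 1.2, 1.0]))
bins = sigma_bin(sigma)
```

In this toy input the first galaxy falls in the diffuse bin, the second in the compact bin, and the third is rejected by the stellar mass cut.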
The projected cross-correlation function and relative bias. We first computed the two-dimensional
2PCCF using the Davis & Peebles estimator54. To obtain the projected 2PCCF, we integrated the
two-dimensional one along the line of sight within $40\,h^{-1}$ Mpc, sufficiently large to include almost
all correlated pairs. The difference in redshift distribution (Extended Data Fig. 1d) of dwarf
samples necessitates a control of the redshift distributions for a fair comparison. We used two
schemes, z-weighting and z-matching, to achieve this. In the former, the diffuse sample is used
as a reference, and weights are assigned to every galaxy in the other samples to make the weighted
redshift distributions the same as that for diffuse dwarfs. In the latter, we constructed a control
sample for each sample (Supplementary Information) so that all control samples share the same
redshift distribution as indicated by the shaded region in Extended Data Fig. 1d. Extended Data
Fig. 2 shows the 2PCCFs obtained using the two schemes.
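The z-weighting scheme can be sketched with a histogram ratio. This is one plausible implementation, not necessarily the one used by the authors: the function name `z_weights`, the binned (rather than smoothed) redshift distributions, and the toy redshifts are assumptions.

```python
import numpy as np

def z_weights(z_sample, z_ref, bins):
    """Per-galaxy weights that make the weighted redshift histogram of
    z_sample equal to the histogram of the reference sample z_ref."""
    z_sample = np.asarray(z_sample, dtype=float)
    n_ref = np.histogram(z_ref, bins=bins)[0].astype(float)
    n_smp = np.histogram(z_sample, bins=bins)[0].astype(float)
    # Weight per bin: reference counts over sample counts (0 where empty)
    ratio = np.divide(n_ref, n_smp, out=np.zeros_like(n_ref), where=n_smp > 0)
    # Bin index of every galaxy; the last edge is included in the last bin
    idx = np.clip(np.digitize(z_sample, bins) - 1, 0, len(bins) - 2)
    inside = (z_sample >= bins[0]) & (z_sample <= bins[-1])
    return np.where(inside, ratio[idx], 0.0)

# Toy redshifts (made up): the reference sample is shallower
rng = np.random.default_rng(0)
z_diffuse = rng.uniform(0.01, 0.03, 500)
z_compact = rng.uniform(0.01, 0.05, 2000)
z_bins = np.linspace(0.01, 0.05, 9)
w_compact = z_weights(z_compact, z_diffuse, z_bins)
weighted_hist = np.histogram(z_compact, bins=z_bins, weights=w_compact)[0]
reference_hist = np.histogram(z_diffuse, bins=z_bins)[0]
```

The weights would then multiply the pair counts of the weighted sample when the correlation function is estimated.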
The large-scale bias is measured through the 2PCCFs. We first determined the ratio of the
2PCCF of a sample to that of the compact sample (Extended Data Fig. 2). We then used a constant
function $f(r_p) = b$ to model the ratio within $2\,h^{-1}\,{\rm Mpc} < r_p < 10\,h^{-1}\,{\rm Mpc}$ (shaded regions in
Extended Data Fig. 2) and applied EMCEE55 to constrain b. The likelihood function adopted is the
same as equation (7) in ref.56, with the covariance matrix calculated as in ref.57. Extended Data
Fig. 2 shows results for the main sample. The relative bias quoted is the median of the posterior
distribution, with the error bars indicating the 16th and 84th percentiles. Extended Data Table 1
shows that results from z-weighting and z-matching are similar. In the main text, we only show
results based on the z-weighting scheme.
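For intuition, fitting a constant to the 2PCCF ratio has a closed-form solution when the likelihood is Gaussian with a fixed covariance matrix and the prior on b is flat. The sketch below uses this generalized least-squares estimate in place of the MCMC sampling described above; the function name, the toy ratio values and the diagonal covariance are assumptions for illustration only.

```python
import numpy as np

def fit_constant_ratio(rp, ratio, cov, rp_min=2.0, rp_max=10.0):
    """Best-fit constant b and its 1-sigma error for ratio(rp) = b,
    using only bins with rp_min < rp < rp_max (in h^-1 Mpc)."""
    rp = np.asarray(rp, dtype=float)
    sel = (rp > rp_min) & (rp < rp_max)
    y = np.asarray(ratio, dtype=float)[sel]
    cinv = np.linalg.inv(np.asarray(cov, dtype=float)[np.ix_(sel, sel)])
    ones = np.ones(y.size)
    norm = ones @ cinv @ ones
    b = (ones @ cinv @ y) / norm
    return b, np.sqrt(1.0 / norm)

# Toy measurements (made up, not the published 2PCCF ratios)
rp = np.array([0.5, 1.5, 2.5, 4.0, 6.3, 9.0, 15.0])
ratio = np.array([1.1, 1.3, 2.2, 2.4, 2.3, 2.3, 2.0])
cov = np.diag(np.full(rp.size, 0.04))
b_fit, b_err = fit_constant_ratio(rp, ratio, cov)
```

With a diagonal covariance this reduces to an inverse-variance weighted mean of the four bins inside the fitting range. A sampler becomes useful when asymmetric posteriors or a non-Gaussian likelihood are needed.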
Impacts of incompleteness, sample selection and cosmic variances. The completeness of our
samples can be influenced by several selection effects (SEs) dictated largely by the apparent magnitude
and surface brightness58. Given our focus on M∗ and Σ∗, we address the SEs in terms of
M∗ and Σ∗. The apparent magnitude of a galaxy is influenced by its redshift (z), M∗, and color,
while its surface brightness is controlled by M∗ and color. So the SEs are related to z, M∗, color,
and Σ∗. The volume number densities of galaxies are directly affected by SEs and thus can be used
to gauge their impacts. Extended Data Fig. 3 shows n(z), the number density as a function of z,
for different Σ∗. Since the intrinsic densities differ between different samples, we normalize n(z)
by that of the lowest-z bin, $n_0$. To examine the dependence on M∗ and the color, we show results
for two mass bins and two color bins. In the absence of SEs, $n(z)/n_0$ is expected to be roughly
constant. A faster decline of $n(z)/n_0$ with z suggests a stronger SE and thus greater incompleteness.
As shown in Extended Data Fig. 3, the SEs primarily depend on M∗ (or magnitude), and
only weakly on Σ∗ (and size) and color for given M∗.
The SEs for massive dwarfs ($8.5 < \log (M_*/M_\odot) < 9$) at $z \le 0.04$ are much weaker than
for the total population. We select these dwarfs to form a “massive sample”. This sample is used for
abundance matching, which focuses on the Σ∗ distribution (see below). Within the massive sample,
the dwarf fractions in the four Σ∗ bins are 2.5%, 23.2%, 26.0% and 48.3%, respectively (Extended
Data Table 1). These fractions at lower z are similar; for example at $z \le 0.03$ they are 2.3%,
22.2%, 25.2% and 50.3%, respectively. Thus, the Σ∗ distribution of the massive sample is not
affected significantly by the SEs.
Note that SEs should not affect the clustering strength but only reduce the signal-to-noise ra-
tio if they are independent of the large-scale structure (LSS). Thus, the SEs can affect the relative-
bias measurements if they depend on LSS and if the dependence is different between diffuse and
compact dwarfs. As shown in Strauss et al.58, the galaxy selection is independent of LSS, suggesting
that the weak SEs in Σ∗ should not have significant impacts on our results. Since we have
already controlled the redshift distribution and since the M∗and color ranges are quite restrictive,
the impact of the SEs through M∗and the color is also expected to be weak. As a check, we made
analyses in narrower ranges of redshift, mass and color (Extended Data Fig. 4a,b,c). If our finding
were dominated by the SEs in z, M∗ or color, the trend would be weakened within each of the
narrower bins. In contrast, the trends obtained are consistent with the results of the main sample.
The only exception is that the relative bias of the redder sample is lower than that of the bluer one at the
level of ∼3σ. We also examined uncertainties in M∗, $R_{50}$ and z (Supplementary Information),
and found no significant impact on our results.
Cosmic variances can have significant effects on galaxy statistics obtained from small samples59,60.
As shown in Extended Data Fig. 4a and d, the results obtained in distinctive volumes defined by the
two redshift intervals and the two sky areas are consistent with each other, indicating that cosmic
variances do not have big impacts on our results.
Our tests also show that 2PCCFs are highly sensitive to contamination by satellite galaxies
only on small scales (Extended Data Fig. 5), indicating that the contamination cannot explain our
finding that is based on large-scale clustering.
To test the impact of the cut in Sérsic index used in our sample selection, we repeated the
analysis without the cut (Extended Data Fig. 4e). Without restricting n, diffuse dwarfs are still more
strongly clustered than compact dwarfs. However, since the dependence on Σ∗ is weak for dwarfs
with n > 1.6 (Extended Data Fig. 4e), including large-n dwarfs weakens the Σ∗ dependence.
Extended Data Fig. 4f shows that the relative bias is quite independent of n, indicating that the
assembly bias is not well reflected by n. Apparently, the Σ∗ of large-n dwarfs is not determined
by halo assembly history, in contrast to that of small-n dwarfs, but the physics behind it is not yet
understood. Including large-n dwarfs thus dilutes the signal of assembly bias and complicates the
interpretation of results. Because of this, we excluded dwarfs with n > 1.6 (about 28% of the total)
from the main sample.
Halo mass estimates. The halo mass is defined as the mass enclosed by the radius within which
the mean density is 200 times the mean matter density of the Universe at the epoch in question. We
first adopted the SHMR14 to estimate the halo mass. Abundance matching found that scatter in M∗,
$\sigma_{\log M_*}$, is about 0.2 dex at given $M_h$ (ref. 61). Thus, the scatter in $M_h$ is
$\sigma_{\log M_h} = \sigma_{\log M_*}\,\frac{d(\log M_h)}{d(\log M_*)} \sim 0.1$
at $\log (M_*/M_\odot) \sim 9$. We estimated the median $M_h$ and its uncertainty for a sample as follows.
For a given galaxy, the uncertainty in M∗ is considered to be Gaussian, with a spread set by the
measurement error of M∗. A random stellar mass, $M_{*,r}$, is assigned to the galaxy. Halo mass at
given stellar mass is also assumed to follow a Gaussian distribution with a dispersion of 0.1 dex.
We then generated a random halo mass from $M_{*,r}$ and used the halo bias model15 to predict a
halo bias. Finally, we obtained one measurement of the median $M_h$ and the mean bias of the
sample. This process was repeated 100 times, yielding 100 measurements of the median $M_h$ and
the mean bias. The 50th percentiles of these measurements represent the median halo mass and
halo bias for the sample, while the 16th and 84th percentiles represent their uncertainties (as listed
in Extended Data Table 1). The predicted bias ratio between diffuse and compact dwarfs is 0.99
with an uncertainty less than 0.01.
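The resampling procedure described above can be sketched as follows. This is only an illustration under stated assumptions: `shmr_log_mh` and `halo_bias` are hypothetical placeholder functions standing in for the SHMR of ref.14 and the halo bias model of ref.15, neither of which is reproduced here, and the percentile helper uses a simple nearest-rank rule.

```python
import random
import statistics


def shmr_log_mh(log_mstar):
    """Hypothetical placeholder SHMR (NOT the relation of ref.14).

    Assumes a local slope d(log Mh)/d(log M*) = 0.5, purely for illustration.
    """
    return 11.1 + 0.5 * (log_mstar - 9.0)


def halo_bias(log_mh):
    """Hypothetical placeholder bias model (NOT the model of ref.15)."""
    return 0.68 + 0.05 * (log_mh - 11.0)


def percentile(values, q):
    """Nearest-rank percentile of a list (q in percent)."""
    ordered = sorted(values)
    return ordered[round(q / 100 * (len(ordered) - 1))]


def monte_carlo_halo_mass(log_mstar, log_mstar_err, n_real=100,
                          scatter_mh=0.1, seed=0):
    """Return (16th, 50th, 84th) percentiles of the median log Mh and mean bias."""
    rng = random.Random(seed)
    medians, mean_biases = [], []
    for _ in range(n_real):
        log_mh = []
        for m, err in zip(log_mstar, log_mstar_err):
            m_r = rng.gauss(m, err)                      # random stellar mass M*,r
            log_mh.append(rng.gauss(shmr_log_mh(m_r), scatter_mh))  # random halo mass
        medians.append(statistics.median(log_mh))
        mean_biases.append(statistics.fmean([halo_bias(x) for x in log_mh]))

    def summarize(v):
        return percentile(v, 16), percentile(v, 50), percentile(v, 84)

    return summarize(medians), summarize(mean_biases)


# Toy input: 400 galaxies with log M* = 8.8 and 0.08 dex measurement errors.
mass_summary, bias_summary = monte_carlo_halo_mass([8.8] * 400, [0.08] * 400)
print("median log Mh (16/50/84):", mass_summary)
print("mean bias (16/50/84):", bias_summary)
```

Each realization yields one median halo mass and one mean bias; the spread of these values across realizations gives the quoted uncertainties.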
We then used HI kinematics to measure the halo mass. We cross-matched our dwarf sample
with the complete Arecibo Legacy Fast Arecibo L-band Feed Array (ALFALFA α.100) HI
survey62,63. To estimate the rotation velocities and halo masses, we excluded galaxies with dubious
HI spectra, low HI signal-to-noise ratios (SNR < 8) and large axis ratios (b/a > 0.7). We
used the method outlined in ref.64 to obtain the rotation velocity from the line width (W20), and
derived the halo mass by assuming the Burkert profile46 with a central core65,66 (see also Supplementary
Information). The halo mass uncertainty is determined by taking into account the uncertainties in
stellar mass, HI mass, HI line width, inclination and the assumed profile (∼0.15 dex; ref.67). Since
resolved HI maps are unavailable, we used the inclination of the stellar disk64 to estimate the HI
inclination, assuming a misalignment given by a Gaussian distribution with dispersion δϕ ≃ 20° (refs.64,68,69).
Extended Data Fig. 6 shows the halo mass obtained from HI kinematics, Mh,HI, versus M∗.
The overall trends in the Mh,HI-M∗ relations resemble the SHMR14, but with much larger
dispersion due to the uncertainties in Mh,HI. Our estimates for diffuse dwarfs are consistent with those of
UDGs obtained by ref.44 from HI rotation curves. The quoted uncertainties of individual galaxies exceed
the dispersion of the Mh,HI measurements and are thus overestimated, primarily because of the inclination
errors. Assuming the uncertainty of Mh,HI to follow a Gaussian with dispersion equal to its error, we
generated a new Mh,HI and predicted a bias b(Mh,HI) (ref.15) for each galaxy. We then adopted the same
method as for the SHMR mass to obtain the median halo mass, the halo bias and their uncertainties
for individual samples (Extended Data Table 1). The predicted bias ratio between the diffuse and
compact samples is 0.66/0.7 = 0.94, with an uncertainty of ∼0.02.
HI mass of dwarf galaxies We cross-matched the optical counterparts of the ALFALFA sample62,63
with our dwarf galaxies. The HI detection rates for the four samples, in ascending order of Σ∗,
are 84.0%, 68.1%, 49.6% and 35.6%, respectively. Extended Data Fig. 8 shows the HI mass for
galaxies with HI detections. Clearly, diffuse dwarfs are more gas-rich than compact ones, suggesting
that they cannot be produced by environmental processes capable of stripping their extended HI
disks.
The distribution of dwarf galaxies in the cosmic web To investigate the connection between
dwarf galaxies and the cosmic web, we used the reconstructed mass density field of the local
Universe provided by the ELUCID project17. The cosmic web was classified using the “T-Web”
method70, which utilizes eigenvalues of the local tidal tensor to define the morphology of the local
structure as knot, filament, sheet and void. The grey shades in Fig. 2c and d show the fraction
of filament grids along each line of sight. Since the redshift-space distortion (RSD) is corrected
in the reconstruction, we assigned a corrected redshift, zcor (ref.17), to each of the galaxies and groups
shown in Fig. 2a–d. To quantify the spatial correlation between dwarfs and the cosmic web, we
computed the 2PCCF in real space between dwarf galaxies in our main sample and different grid
points. Galaxies are z-weighted to match the redshift distribution of diffuse dwarfs, and grid
points are weighted by their matter density. Fig. 2e and f show the eight 2PCCFs, highlighting
the difference in large-scale environment between diffuse and compact dwarfs. We calculated the
projected distance from a diffuse dwarf to the nearest group (see Supplementary Information) and
found that the median distance is significantly larger than that for backsplash halos23, indicating
that diffuse dwarfs are not backsplash objects. This is also consistent with the observation that diffuse
dwarfs contain more HI gas than compact dwarfs.
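For readers unfamiliar with the T-Web scheme mentioned above, a minimal sketch of the usual classification rule is given here: the morphology follows from the number of tidal-tensor eigenvalues exceeding a threshold. The threshold `lambda_th` is a free parameter of the method; the value adopted in the ELUCID classification is not stated in this section, so the default below is only an assumption.

```python
def classify_tweb(eigenvalues, lambda_th=0.0):
    """Classify a grid cell from the three eigenvalues of its tidal tensor.

    The number of eigenvalues above `lambda_th` (an assumed, tunable threshold)
    gives the number of collapsing directions:
    0 -> void, 1 -> sheet, 2 -> filament, 3 -> knot.
    """
    n_collapsing = sum(1 for lam in eigenvalues if lam > lambda_th)
    return ("void", "sheet", "filament", "knot")[n_collapsing]


# Toy eigenvalue triplets (arbitrary units), one per morphology.
for eig in [(-0.3, -0.2, -0.1), (0.4, -0.2, -0.1), (0.4, 0.2, -0.1), (0.4, 0.2, 0.1)]:
    print(eig, "->", classify_tweb(eig))
```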
Halo assembly bias in cosmological simulations We analyzed the dark-matter-only (DMO)
simulation, TNG300-1-Dark71, to explore whether halo assembly bias3 can explain the observed
bias-Σ∗ relation. The resolution of this simulation allows us to compute halo spin accurately. We
excluded backsplash halos, as they are unlikely to be relevant to diffuse dwarfs.
Halos with 10^10.5 ≤ Mh/M⊙ < 10^11 were divided into subsamples by half-mass formation
time72 (zf) or spin73 (λ). The reference sample to estimate the 2PCCF included all centrals and
satellites with Mh,peak ≥ 10^10.5 M⊙, where Mh,peak denotes peak main-branch halo mass. We
incorporated redshift-space distortions (RSD) along one simulation axis60. Extended Data Fig. 7a
and b show that dwarf-host halos with the highest zf have clustering comparable to halos with
Mh ≳ 10^13 M⊙.
Our findings imply that the bias-zf relation can explain the observed bias-Σ∗ relation,
provided that zf governs Σ∗. To see this, we applied an abundance matching74 between Σ∗ in the
massive sample and zf of dwarf-host halos in the same volume simulated by ELUCID17,
assuming some scatter in the matching. ELUCID is an N-body simulation constrained to reproduce the
density field underlying SDSS galaxies, thus ensuring the same large-scale environments for the
simulated halos and observed dwarfs. The Σ∗–zf mapping follows (ref.75)
$$\Sigma_* = P_{\Sigma_*}^{-1} \circ \mathcal{N}\left[ -\rho\, \mathcal{N}^{-1} \circ P_{z_{\rm f}}(z_{\rm f}) + \sqrt{1-\rho^2}\,\epsilon \right], \tag{1}$$
where N is the cumulative distribution function (CDF) of a Gaussian variable; Pzf and PΣ∗ are
CDFs of zf and Σ∗, respectively, obtained numerically from the samples in question; “◦” denotes
function composition and “−1” denotes functional inversion; ρ quantifies the zf–Σ∗ correlation
and ϵ is a unit Gaussian random noise. This matching assigns a Σ∗ to each halo, preserving the
observed Σ∗ distribution. The relative bias of halos is shown in Fig. 3a as a function of the assigned
Σ∗, assuming different ρ. See Supplementary Information for more details of abundance matching.
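A minimal sketch of the mapping in equation (1) is given below, using rank-based empirical CDFs. The function name `assign_sigma`, the rank offset of 0.5 and the nearest-rank quantile lookup are illustrative assumptions rather than the authors' implementation; the input values are toy numbers.

```python
import random
from statistics import NormalDist


def assign_sigma(zf_halos, sigma_obs, rho, seed=0):
    """Assign an observed Sigma* value to each halo following equation (1)."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # N: CDF of a unit Gaussian
    n = len(zf_halos)

    # Empirical CDF P_zf: rank-based, offset by 0.5 so values stay inside (0, 1).
    order = sorted(range(n), key=lambda i: zf_halos[i])
    p_zf = [0.0] * n
    for rank, i in enumerate(order):
        p_zf[i] = (rank + 0.5) / n

    sigma_sorted = sorted(sigma_obs)  # used as the inverse empirical CDF of Sigma*
    m = len(sigma_sorted)

    assigned = []
    for u in p_zf:
        # -rho * N^{-1}(P_zf(zf)) + sqrt(1 - rho^2) * epsilon
        x = -rho * std_normal.inv_cdf(u) + (1 - rho**2) ** 0.5 * rng.gauss(0, 1)
        v = std_normal.cdf(x)                                 # back to a uniform variate
        assigned.append(sigma_sorted[min(int(v * m), m - 1)])  # P_Sigma^{-1}(v)
    return assigned


# With rho = 1 the mapping is deterministic: the earliest-forming halo
# (highest zf) receives the lowest Sigma*.
print(assign_sigma([0.5, 2.0, 1.0, 3.0], [5, 10, 20, 40], rho=1.0))
```

With ρ = 0 the assignment is random with respect to zf, while intermediate values of ρ interpolate between the two limits. The negative sign in front of ρ makes halos with higher zf receive lower Σ∗.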
The formation of diffuse dwarfs in the cold dark matter scenario Current models for the
formation of (ultra-)diffuse dwarfs in CDM halos fail to reproduce the observed bias-Σ∗ relation.
Tidal heating26,27, galaxy interactions76, and ram pressure stripping29 require dense environments
(groups/filaments) which remove gas or quench star formation, incompatible with the blue, HI-rich
nature of diffuse dwarfs (Extended Data Fig. 8). Models attributing diffuse dwarfs to suppressed
star formation in massive halos interacting with the large-scale structure25,77 conflict with the
halo-mass estimates and the small-scale clustering (Fig. 1a). Models relying on exceedingly high-spin
halos to host diffuse dwarfs4,30,31 predict a bias-Σ∗ relation that is inconsistent with observations
(see Fig. 3 for L-Galaxies), as the assembly bias in halo spin is too weak (Extended Data Fig. 7).
Episodic stellar/supernova feedback-driven outflows and associated variations of gravitational
potential, seen in simulations like NIHAO (ref.5) and FIRE (ref.33), could cause galaxies and halos to expand.
However, these models disfavor UDGs in halos of high concentration (thus high zf (ref.78) and high bias;
see Extended Data Fig. 7c), and cause deficits of compact dwarfs27 and steep dark-matter profiles79.
To demonstrate the discrepancy between the models and our observation, we directly
compared our results with two models, the TNG100-1 hydro simulation36,71,80–85 and the L-Galaxies
semi-analytic model35,86. Central star-forming dwarfs in both models show weak zf-Σ∗ relations
(Fig. 3b) and have 2PCCFs (Extended Data Fig. 7d and e) that are inconsistent with the observed
Σ∗ dependence. Tests using TNG with higher resolutions71,87,88 (Extended Data Fig. 7f) and
L-Galaxies in a larger volume confirmed that our conclusions are robust. We also found that backsplash
halos have negligible effects on the large-scale bias. We suspect that the discrepancy arises from
model assumptions: L-Galaxies ties cold-gas sizes to halo spins34 (anti-correlated with zf; ref.78), while
in TNG the sizes are regulated by stellar winds85 that may erase halo assembly effects.
Assembly bias in self-interacting dark matter models Dark matter self-interaction flattens halo
central profiles while preserving the outer shape and large-scale clustering. The thermalized SIDM
core can be described by its central density (ρ0) and core radius (rc), the radius at which each particle is
expected to experience one scattering over the halo lifetime (ref.89), both governed by the cross-section
per particle mass, σm. Alternative definitions of the core size also exist, e.g., rρ0/4, defined as the
radius at which the density drops to ρ0/4 (ref.46).
Large SIDM simulations capable of resolving halos of dwarfs remain impractical. We instead
applied a semi-analytical method43,89 to CDM halos used in Fig. 3a and b to predict SIDM cores
via the isothermal Jeans modeling. Concentrations of ELUCID halos were assigned using the
conditional distribution p(c|zf) calibrated from a simulation with higher resolution. Each halo is
populated with a galaxy of M∗ = 10^8.8 M⊙ (Extended Data Table 1) and an exponential profile
according to its Σ∗ assigned by the abundance matching. Adiabatic contraction due to baryons and
Jeans modeling43 were then applied to predict rc, rρ0/4 and ρ0.
Current constraints on σm for low-mass halos range from ≤ 1.63 cm^2 g^-1 (based on inner
halo profiles)49 to ≤ 10 cm^2 g^-1 (based on the Tully-Fisher relation)50,90. See refs.50,91 for a
summary. For demonstration we adopted velocity-independent σm = 0.1–1.0 cm^2 g^-1 for rc and ρ0 and
3.0 cm^2 g^-1 for rρ0/4. Fig. 4 shows that (i) the rc distribution assuming σm = 0.3 cm^2 g^-1 aligns
with the observed R50 distribution, with a higher σm predicting a distribution shifted proportionally
to the right (panel a); (ii) the tight monotonic bias-rc (panel b) and zf-rc (panel c) relations
mirror the observed bias-Σ∗ (R50) relation (Fig. 3); (iii) the Σ∗-rc and Σ∗-R50 relations match each
other closely (panel d); (iv) matching R50 with rρ0/4 requires larger σm. We also found that the
inclusion of baryons in the adiabatic contraction and in the Jeans-Poisson equation makes the core
size larger, more so for halos with lower zf, but does not disrupt the rc (or rρ0/4)-R50 relation,
provided that σm is not so large that core collapse inverts the bias-Σ∗ relation required by the
observation. Note that our predictions for SIDM cores rely on a sequence of assumptions that were
incorporated incrementally. See Supplementary Information for more details.
Data availability The stellar mass and star formation rate for SDSS galaxies used in this paper are publicly
available at https://wwwmpa.mpa-garching.mpg.de/SDSS/DR7/. The galaxy size and Sérsic
index data can be downloaded at http://sdss.physics.nyu.edu/vagc/. The galaxy group
catalog is publicly available at https://gax.sjtu.edu.cn/data/Group.html. The ALFALFA HI
sample can be downloaded at https://egg.astro.cornell.edu/alfalfa/data/. The simulation
data are available through the IllustrisTNG public data release71 at https://www.tng-project.org/
for the runs used in this paper, and for L-Galaxies implemented on the runs. The ELUCID simulation
data are available upon request.
Code availability The code used in this paper is available at
https://github.com/ChenYangyao/dwarf_assembly_bias. The code for the semi-analytic method based on the
isothermal Jeans model is publicly available at https://github.com/JiangFangzhou/SIDM.
Acknowledgements We thank Fangzhou Jiang, Liang Gao, Jie Wang, Qi Guo, Xiaohu Yang, Ying Zu,
Ran Li, Daneng Yang, Ce Gao and Kai Wang for comments. This work is supported by the National Natural
Science Foundation of China (NSFC, Nos. 12192224, 12273037 and 11890693). HYW acknowledges
support from the CAS Project for Young Scientists in Basic Research, Grant No. YSBR-062. Y.R.
acknowledges support from the CAS Pioneer Hundred Talents Program (Category B), as well as the USTC
Research Funds of the Double First-Class Initiative. Y.C. acknowledges support from the China
Postdoctoral Science Foundation (Grant No. 2022TQ0329). We acknowledge the science research grants from
the China Manned Space Project with CMS-CSST-2021-A03 and Cyrus Chun Ying Tang Foundations. The
work is also supported by the Supercomputer Center of University of Science and Technology of China,
and the Tsinghua Astrophysics High-Performance Computing platform of Tsinghua University. HYW
acknowledges the hospitality of the International Centre of Supernovae (ICESUN), Yunnan Key Laboratory at
Yunnan Observatories Chinese Academy of Sciences.
Author Contributions The listed authors made substantial contributions to this manuscript; all co-authors
read and commented on the document. ZWZ, YYC and YR contributed equally to this work, and YYC and
YR are co-first authors of this paper. HYW conceived the original idea, initiated the project and led the
analysis. HJM contributed to the writing and interpretation of results. XL contributed to the analysis of
observational data. HL contributed to the analysis of simulation data.
Competing Interests The authors declare that they have no competing financial interests.
Supplementary Information is available for this paper.
Correspondence Correspondence and requests for materials should be addressed to HYW (email: why-
wang@ustc.edu.cn).
Reprints and permissions information is available at www.nature.com/reprints.
Extended Data Fig. 1: Dwarf galaxy sample selection. a, b, c, Distributions of dwarf properties. Green
dots represent the finally selected dwarfs. Red dots show the dwarfs with ^0.1(g−r) > 0.6 and blue dots
show the dwarfs with ^0.1(g−r) < 0.6 and n > 1.6. d, Redshift distributions of dwarf galaxy samples
with different Σ∗. Shaded region shows the redshift distribution for control samples, fctl(z), used in the
z-matching method.
Extended Data Fig. 2: 2PCCFs for the main samples. The first and third rows show the 2PCCFs for
dwarfs with different surface density. The second and fourth rows show the 2PCCF ratios relative to
compact dwarfs. The top two rows show the results using the z-weighting method, while the bottom two
rows present those for the z-matching method. The error bars for both the 2PCCFs and the 2PCCF ratios
represent the 16th and 84th percentiles of 100 bootstrap samples. The shaded region indicates the radius
interval used for fitting and the best-fit relative bias. The error bars for relative bias represent the 16th and 84th
percentiles of the posterior distribution.
Extended Data Fig. 3: Number density n(z) as a function of redshift for different Σ∗. n(z) is
normalized by that of the lowest-z bin (n0). a, n(z) for low-mass (7.5 < log M∗/M⊙ ≤ 8.5) and massive
(8.5 < log M∗/M⊙ ≤ 9) dwarfs separately. For massive dwarfs, the SEs become large only when z > 0.04.
For less-massive dwarfs, the SEs are significant even at z ∼ 0.02. For given M∗, the impact of the SEs
depends only weakly on Σ∗, as is expected from the small redshift concerned here. At z > 0.04, there is no
low-mass dwarf with Σ∗ > 7 M⊙ pc^-2. b, n(z) for red (0.3 < ^0.1(g−r) < 0.6) and blue (^0.1(g−r) < 0.3)
dwarfs separately. Dwarfs with different colors exhibit similar behavior, indicating that the SEs are
insensitive to galaxy color. This is because our galaxies have already been restricted to a relatively narrow color
range.
Extended Data Fig. 4: Relative biases obtained based on different dwarf subsamples. Here the samples
are divided by z (a), M∗ (b), color (c), Right Ascension (R.A., d), and Sérsic n (e), respectively, and the
relative biases versus Σ∗ are shown for subsamples. In e, the main sample is exactly the sample used in the
main text. The n > 1.6 sample consists of isolated dwarf galaxies with n > 1.6 and ^0.1(g−r) < 0.6. The
no n-cut sample includes the main sample and the n > 1.6 sample. Note that the three curves are normalized
to different compact samples that may have different clustering strength. f, Relative bias as a function of n
for the no n-cut dwarf sample. The relative bias is normalized to the subsample with the largest n. Only results
using the z-weighting method are shown here. The results from the z-matching method are very similar and
thus not presented. Markers with error bars are median values with 16th–84th percentiles of relative biases
obtained from the posterior distribution of MCMC fitting.
Extended Data Fig. 5: 2PCCFs with satellite contamination. Blue and red solid curves represent the
2PCCFs for diffuse and compact dwarf galaxies, respectively, while dotted curves show the impact of different
levels of satellite contamination on these dwarfs. Satellite contamination can notably amplify small-scale
clustering, while it moderately enhances large-scale clustering for compact dwarfs and leaves the large-scale
clustering unchanged for diffuse dwarfs. Note that the wine and cyan dotted lines show the results
including all compact and diffuse satellite dwarfs, respectively. Thus, satellite contamination cannot explain the
strong large-scale clustering observed in isolated diffuse dwarfs. Error bars represent 16th–84th percentiles
of bootstrap samples.
Extended Data Fig. 6: Comparison of halo masses of dwarf galaxies derived from different methods.
a–d, Halo mass versus M∗ for dwarf samples with different Σ∗. Symbols with error bars show the halo
mass obtained by using the HI kinematics versus M∗ and their uncertainties. The teal shaded region shows the
SHMR14 and its 1σ uncertainty. Cyan symbols show the results for UDGs taken from ref.44. These UDGs
have spatially resolved HI kinematics maps, so their halo mass measurements are more reliable than
ours. As can be seen, these UDGs follow the same trend as our diffuse dwarfs.
Extended Data Fig. 7: Numerical simulations for dwarf galaxies and dwarf-host halos at z = 0. a, b,
2PCCF of dwarf-host halos (10^10.5 ≤ Mh/M⊙ < 10^11; backsplash excluded) in the DMO simulation
TNG300-1-Dark71, shown for subsamples with different ranges of halo formation time, zf (a), and halo
spin, λ (b), and for the total sample (black in a and b). Fractions of halos in subsamples are equal to
those of dwarfs in the subsamples of the massive sample (see Fig. 3a and Extended Data Table 1). c, PDF
and median of halo concentration (c) for halo (sub)samples in a. Halo concentrations of UDG analogues
simulated by NIHAO (ref.5) are shown by grey shaded area (minimum to maximum) and error bar (mean and
standard deviation). Their concentration, cDM, is evaluated from the halos in the DMO counterpart of the
hydro simulation, compatible with ours. d, e, f, 2PCCF of central star-forming (sSFR ≥ 10^-11 yr^-1) dwarfs in
galaxy-formation models: L-Galaxies86 (run on TNG100-1-Dark35,71; d), TNG100-1 (ref.36; e) and TNG50-1 (ref.87; f),
shown for subsamples with different ranges of Σ∗, and for the total sample (black). Dwarfs here include
those with 10^8.5 ≤ M∗/M⊙ < 10^9 for L-Galaxies and TNG100-1, and 10^8 ≤ M∗/M⊙ < 10^9 for TNG50-1.
Reference sample includes all galaxies (central or satellite, star-forming or quiescent) above the lower
mass limit of the dwarf sample. In a, b, d–f, grey markers linked by curves from thin to thick are the
2PCCFs of massive halos with given ranges of mass in that simulation. Each upper panel shows wp, while
each lower panel shows the ratio of wp to that of the total sample. Markers with error bars for 2PCCFs show median
values with 16th–84th percentiles estimated from bootstrap samples.
Extended Data Fig. 8: HI mass (MHI) versus M∗ for dwarf galaxy samples with varying Σ∗. The
colored lines represent the median relationships of different samples.
Extended Data Table 1: Sample selection and the corresponding results
(Values are given as median with +upper/−lower uncertainties; Σ∗ is in units of M⊙ pc^-2.)

Main sample         Sample size   log(M∗/M⊙)^a        log(Mh/M⊙)^b         Bias (z-matching)^d   Bias (z-weighting)^e   Halo bias^f
total               6,919         8.72 +0.0/−0.0      10.99 +0.0/−0.0      -                     -                      0.68 +0.0/−0.0
0 ≤ Σ∗ < 7          349           8.38 +0.01/−0.01    10.83 +0.01/−0.01    2.44 +0.25/−0.25      2.31 +0.20/−0.19       0.67 +0.0/−0.0
7 ≤ Σ∗ < 15         1,782         8.65 +0.0/−0.0      10.96 +0.0/−0.0      1.41 +0.12/−0.12      1.49 +0.10/−0.11       0.68 +0.0/−0.0
15 ≤ Σ∗ < 25        1,738         8.73 +0.0/−0.0      10.99 +0.0/−0.0      1.20 +0.13/−0.13      1.24 +0.09/−0.09       0.68 +0.0/−0.0
25 ≤ Σ∗             3,050         8.77 +0.0/−0.0      11.01 +0.0/−0.0      1.00                  1.00                   0.68 +0.0/−0.0

Massive sample      Sample size   log(M∗/M⊙)^a        log(Mh/M⊙)^b         Bias (z-matching)^d   Bias (z-weighting)^e   Halo bias^f
total               4,944         8.8 +0.0/−0.0       11.03 +0.0/−0.0      -                     -                      0.68 +0.0/−0.0
0 ≤ Σ∗ < 7          122           8.66 +0.01/−0.01    10.96 +0.01/−0.01    2.22 +0.33/−0.33      2.4 +0.28/−0.28        0.68 +0.0/−0.0
7 ≤ Σ∗ < 15         1,148         8.76 +0.0/−0.0      11.03 +0.0/−0.01     1.39 +0.11/−0.11      1.44 +0.10/−0.10       0.68 +0.0/−0.0
15 ≤ Σ∗ < 25        1,284         8.79 +0.0/−0.01     11.01 +0.0/−0.0      1.22 +0.11/−0.11      1.29 +0.09/−0.09       0.68 +0.0/−0.0
25 ≤ Σ∗             2,390         8.82 +0.0/−0.0      11.03 +0.0/−0.0      1.00                  1.00                   0.68 +0.0/−0.0

HI-detected sample  Sample size   log(M∗/M⊙)^a        log(Mh,HI/M⊙)^c      Bias (z-matching)^d   Bias (z-weighting)^e   Halo bias^f
total               565           8.64 +0.01/−0.01    10.77 +0.04/−0.04    -                     -                      0.69 +0.01/−0.0
0 ≤ Σ∗ < 7          59            8.33 +0.02/−0.02    10.38 +0.15/−0.12    -                     -                      0.66 +0.01/−0.01
7 ≤ Σ∗ < 15         195           8.55 +0.01/−0.01    10.72 +0.06/−0.06    -                     -                      0.68 +0.01/−0.0
15 ≤ Σ∗ < 25        156           8.68 +0.01/−0.01    10.85 +0.06/−0.07    -                     -                      0.69 +0.01/−0.01
25 ≤ Σ∗             155           8.81 +0.01/−0.01    10.85 +0.07/−0.08    -                     -                      0.7 +0.01/−0.01

No n-cut sample     Sample size   log(M∗/M⊙)^a        log(Mh/M⊙)^b         Bias (z-matching)^d   Bias (z-weighting)^e   Halo bias^f
total               9,649         8.72 +0.0/−0.0      10.98 +0.0/−0.0      -                     -                      0.68 +0.0/−0.0
0 ≤ Σ∗ < 7          505           8.40 +0.01/−0.01    10.83 +0.01/−0.01    1.77 +0.15/−0.15      1.83 +0.13/−0.13       0.67 +0.0/−0.0
7 ≤ Σ∗ < 15         2,317         8.67 +0.0/−0.0      10.97 +0.0/−0.0      1.32 +0.09/−0.09      1.27 +0.07/−0.07       0.68 +0.0/−0.0
15 ≤ Σ∗ < 25        2,122         8.74 +0.0/−0.0      11.00 +0.0/−0.0      1.10 +0.09/−0.09      1.12 +0.07/−0.07       0.68 +0.0/−0.0
25 ≤ Σ∗             4,705         8.76 +0.0/−0.0      11.00 +0.0/−0.0      1.00                  1.00                   0.68 +0.0/−0.0
Extended Data Table 2: Sample selection and the corresponding results. The columns show the values of the corresponding quantities,
with uncertainties corresponding to the 16% and 84% percentiles. The uncertainties are rounded to two decimal places, and a value of 0.0
represents the uncertainty of less than 0.004. Column a: Median stellar mass of the sample; Column b: Halo mass estimated from SHMR;
Column c: Halo mass measured from HI-kinematics; Column d: Relative bias obtained using the z-matching method; Column e: Relative
bias obtained using the z-weighting method; Column f: Theoretical halo bias of the sample.
Supplementary Information
The details of the sample selection and splitting methods We opted for this specific method of
sample splitting because the diffuse dwarfs defined in this paper are akin to ultra-diffuse galaxies
(UDGs) in the literature5,11,30,53. UDGs are identified with thresholds of a surface brightness of
µe > 24 mag/arcsec^2 and an effective radius R50 > 1.5 kpc (ref.11). Using the relation between stellar
mass-to-light ratio and color (MLCR)92, we obtain
$$\log(M_*/{\rm M}_\odot) = -0.306 + 1.097(g-r) - 0.10 - 0.4(M_r - 4.64) - 0.12 = 1.33 + 1.097(g-r) - 0.4M_r, \tag{2}$$
where Mr is the r-band absolute magnitude, 4.64 is the r-band magnitude of the Sun in the AB
system93, the −0.10 term effectively implies the use of a Kroupa (2001) IMF94, which is adopted
for the estimation of M∗ (ref.52), and the −0.12 term is used to account for the difference between the
MPA-JHU mass and the mass estimated using the MLCR at log(M∗/M⊙) ∼ 9.0 (see Figure 8 in
Zhang et al.95). Please see Yang et al.10 for details. We then obtain
$$\log\left(\frac{\Sigma_*}{{\rm M}_\odot\,{\rm pc}^{-2}}\right) = 9.96 + 1.097(g-r) + 4\log(1+z) - 0.4\,\frac{\mu_e}{{\rm mag/arcsec}^2}. \tag{3}$$
Assuming g−r = 0.6 (UDGs in clusters are usually red), the surface brightness criterion of µe =
24 mag/arcsec^2 roughly corresponds to Σ∗ = 10 M⊙ pc^-2. Assuming g−r = 0.3 (UDGs in fields
are usually blue), the surface brightness criterion of µe = 24 mag/arcsec^2 roughly corresponds to
Σ∗ = 5 M⊙ pc^-2. We therefore adopted Σ∗ < 7 M⊙ pc^-2 to select diffuse dwarfs. We also checked
the R50 distribution of diffuse dwarfs and found that 321/349 exhibit R50 > 1.5 kpc. Furthermore,
the Sérsic index distribution of UDGs usually peaks at n ≈ 1 (refs.53,96), indicating an exponential light
profile. Our criterion of n < 1.6 results in a median Sérsic index of ∼1.2 for these dwarfs, akin to
UDGs, and ensures a relatively pure sample of late-type morphology. High-resolution images97,98
were cross-matched with our sample and visually inspected to verify the purity of the selection.
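As a quick numerical check of equations (2) and (3), the snippet below evaluates the constant in equation (2) and the surface mass density corresponding to µe = 24 mag/arcsec^2 for the two colors quoted above. It assumes z = 0 for simplicity, so that the 4 log(1 + z) term vanishes; the function name is illustrative.

```python
import math


def log_sigma_star(g_r, mu_e, z=0.0):
    """Equation (3): log10 of Sigma* in Msun/pc^2 from color, mu_e and redshift."""
    return 9.96 + 1.097 * g_r + 4 * math.log10(1 + z) - 0.4 * mu_e


# Constant term of equation (2): -0.306 - 0.10 + 0.4 * 4.64 - 0.12
constant = -0.306 - 0.10 + 0.4 * 4.64 - 0.12
print(f"constant in equation (2): {constant:.2f}")

# Surface-brightness threshold of UDGs translated into Sigma* (z = 0 assumed).
for color in (0.6, 0.3):
    sigma = 10 ** log_sigma_star(color, mu_e=24.0)
    print(f"g-r = {color}: Sigma* ~ {sigma:.1f} Msun/pc^2")
```

The two colors give about 10.4 and 4.9 M⊙ pc^-2, which bracket the adopted threshold of 7 M⊙ pc^-2.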
Control-sample construction in the z-matching method In the z-matching scheme, we
constructed a control sample for each sample in such a way that all control samples share the same
redshift distribution, fctl(z). To do this, we first determined fctl(z) through
$$f_{\rm ctl}(z) = \min\left(f_1(z), f_2(z), \ldots, f_n(z)\right), \tag{4}$$
Supplementary Table 1: Sample selection and the corresponding results based on GSWLC stellar mass
(Values are given as median with +upper/−lower uncertainties; Σ∗ is in units of M⊙ pc^-2.)

GSWLC samples    Sample size   log(M∗/M⊙)^a        log(Mh/M⊙)^b         Bias z-matching^c    Bias z-weighting^d   Halo bias^e
total            4,699         8.75 +0.0/−0.0      11.0 +0.05/−0.05     -                    -                    0.68 +0.0/−0.0
0 ≤ Σ∗ < 10      262           8.49 +0.01/−0.01    10.89 +0.06/−0.08    2.32 +0.36/−0.36     1.95 +0.22/−0.23     0.67 +0.0/−0.0
10 ≤ Σ∗ < 20     1,109         8.71 +0.0/−0.0      10.99 +0.05/−0.07    1.93 +0.21/−0.21     1.75 +0.13/−0.13     0.68 +0.0/−0.0
20 ≤ Σ∗ < 40     1,542         8.77 +0.0/−0.0      11.0 +0.06/−0.06     1.2 +0.15/−0.15      1.15 +0.09/−0.1      0.68 +0.0/−0.0
40 ≤ Σ∗          1,786         8.78 +0.0/−0.0      11.01 +0.06/−0.05    1.0 +0.0/−0.0        1.0 +0.0/−0.0        0.68 +0.0/−0.0

The columns show the values of the corresponding quantities, with uncertainties corresponding to the 16% and 84% percentiles.
The uncertainties are rounded to two decimal places, and a value of 0.0 represents the uncertainty of less than 0.004. Column a:
Median stellar mass of the sample; Column b: Halo mass estimated from SHMR; Column c: Relative bias obtained using the
z-matching method; Column d: Relative bias obtained using the z-weighting method; Column e: Theoretical halo bias of the sample.
where fx(z), with x = 1, 2, ..., n, is the redshift distribution of the xth sample. The shaded region
in Extended Data Fig. 1d shows fctl(z). We then computed the number of galaxies for the control sample x
within a redshift bin z using
$$n_{x,{\rm ctl}}(z) = \frac{f_{\rm ctl}(z)}{f_x(z)}\,n_x(z), \tag{5}$$
where nx(z) is the number of galaxies in the xth original sample within the same redshift bin.
Since fctl(z) ≤ fx(z), one has nx,ctl(z) ≤ nx(z). Finally, we randomly chose nx,ctl(z) galaxies
from the original sample in the corresponding redshift bin to create the control sample.
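The construction in equations (4) and (5) can be sketched as follows. This is an illustrative reading, assuming that fx(z) is the fraction of sample x falling in each redshift bin and that nx,ctl(z) is rounded to the nearest integer; the function name, binning and toy redshifts are hypothetical.

```python
import random


def build_control_samples(samples, bin_edges, seed=0):
    """Downsample each redshift list so that all share the distribution f_ctl(z)."""
    rng = random.Random(seed)
    n_bins = len(bin_edges) - 1

    def bin_index(z):
        for k in range(n_bins):
            if bin_edges[k] <= z < bin_edges[k + 1]:
                return k
        return None  # outside the binned range

    # Sort the members of each sample into redshift bins.
    binned = []
    for redshifts in samples:
        bins = [[] for _ in range(n_bins)]
        for z in redshifts:
            k = bin_index(z)
            if k is not None:
                bins[k].append(z)
        binned.append(bins)

    # f_x(z): fraction of sample x in each bin; equation (4): f_ctl = min over samples.
    fractions = [[len(b) / len(zs) for b in bins] for bins, zs in zip(binned, samples)]
    f_ctl = [min(fx[k] for fx in fractions) for k in range(n_bins)]

    controls = []
    for bins, fx in zip(binned, fractions):
        control = []
        for k, members in enumerate(bins):
            if fx[k] > 0:
                # Equation (5): n_ctl = f_ctl / f_x * n_x, capped at n_x.
                n_ctl = min(len(members), round(f_ctl[k] / fx[k] * len(members)))
                control.extend(rng.sample(members, n_ctl))
        controls.append(control)
    return controls, f_ctl


# Toy example with two redshift bins: [0.01, 0.02) and [0.02, 0.03).
sample_a = [0.015] * 4 + [0.025] * 4
sample_b = [0.015] * 2 + [0.025] * 6
controls, f_ctl = build_control_samples([sample_a, sample_b], [0.01, 0.02, 0.03])
print(f_ctl, [len(c) for c in controls])
```

In this toy case both control samples end up with two galaxies in the first bin and four in the second, so their redshift distributions are identical by construction.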
The impact of uncertainties in M∗, R50 and z In the main text, stellar masses from the
MPA-JHU catalog are adopted, and the typical statistical uncertainty in the mass is about 0.08 dex. The
GSWLC catalog99 also provides stellar-mass estimates (hereafter the GSWLC mass) based on the
UV+optical+mid-IR SED fitting. The two masses are tightly correlated, but there is a systematic
offset of 0.17 dex which increases with increasing stellar mass. The scatter of the relation and the
offset between the two mass estimates are larger than the statistical uncertainties in the MPA-JHU
masses, signifying a potential issue in our analysis.
To address this issue we constructed a new dwarf sample based on the GSWLC mass
estimates using 7.5 ≤ log M∗/M⊙ < 9, ^0.1(g−r) < 0.6 and n < 2.0. The total dwarf sample is
divided into four subsamples with 0 ≤ Σ∗ < 10 (diffuse), 10 ≤ Σ∗ < 20, 20 ≤ Σ∗ < 40, and
40 ≤ Σ∗ (compact), respectively. Since the GSWLC mass is greater than the MPA-JHU mass by
0.17 dex at log M∗/M⊙ ∼ 9, we adopted a higher threshold, 7 M⊙ pc^-2 × 10^0.17 ≃ 10 M⊙ pc^-2, to
select diffuse dwarfs. This gives 262 diffuse dwarfs and 1,786 compact dwarfs. The relative bias
obtained from these subsamples is listed in Supplementary Table 1. The diffuse dwarfs still have
significantly higher bias than compact dwarfs. The relative bias for the diffuse sample is around
2, similar to the result based on the MPA-JHU mass but with larger errors. The reason for this is
that a significant fraction of dwarfs do not have GSWLC mass estimates; the fraction is as high as
26% for diffuse dwarfs defined by the MPA-JHU mass. We thus conclude that our results are not
sensitive to the stellar mass measurements.
The NYU-VAGC catalog does not provide uncertainties for the R50 measurements. The
values of R50 given by the catalog are derived from the Sérsic-model fitting100 and their uncertainties,
shown in Figure 10 of the cited reference, are very small, typically about 10%. We thus believe
that size uncertainties do not affect our results significantly. The uncertainties in redshift are very
small, typically with ∆z/(1 + z) = 2 × 10^-5, corresponding to a negligible distance uncertainty
of ∆r = c∆z/H0 = 0.06 h^-1 Mpc. The typical redshift of our sample galaxies is z ∼ 0.02,
corresponding to a recession velocity of ∼6,000 km s^-1. Thus, peculiar velocities of galaxies may
have a sizable effect on the estimates of their stellar masses. The impact of the uncertainties in the
stellar mass estimates has been tested above.
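The numbers quoted in this paragraph can be reproduced with a few lines of arithmetic. The peculiar velocity of 300 km s^-1 used at the end is an assumed, purely illustrative value that does not come from the text; it only indicates how a peculiar velocity propagates into a distance-based mass error.

```python
import math

C_KM_S = 299_792.458   # speed of light in km/s
H0 = 100.0             # km/s/Mpc, so distances come out in h^-1 Mpc

z = 0.02
delta_z = 2e-5 * (1 + z)              # from Delta z / (1 + z) = 2e-5
delta_r = C_KM_S * delta_z / H0       # distance uncertainty in h^-1 Mpc
recession_velocity = C_KM_S * z       # km/s

# Illustrative only: an ASSUMED peculiar velocity of 300 km/s.
v_pec = 300.0
frac_distance_error = v_pec / recession_velocity
# Luminosity (and hence stellar mass) scales as distance squared.
mass_error_dex = 2 * math.log10(1 + frac_distance_error)

print(f"Delta r ~ {delta_r:.2f} h^-1 Mpc")
print(f"recession velocity ~ {recession_velocity:.0f} km/s")
print(f"mass error for v_pec = {v_pec:.0f} km/s: ~{mass_error_dex:.2f} dex")
```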
The distances to nearest groups Backsplash halos, which were once contained in massive halos
but now are independent, make significant contributions to the halo assembly bias23. Most of
the backsplash halos reside within three to four times the virial radius of their host halos23. The
distance distribution of backsplash halos from their host halos reaches its maximum at less than
twice the virial radius of their hosts23. Moreover, since the projected distance is smaller than the
3-D distance, the median projected distance of diffuse dwarfs to nearby groups should be smaller
than two times the virial radius if the dwarf sample is dominated by backsplash halos. To test this,
we identified, for each diffuse dwarf galaxy, the neighboring groups with |∆v| ≤ 3 vvir, where ∆v is
the line-of-sight velocity difference between the dwarf and the group, and vvir is the virial velocity
of the group. We computed the projected separation (Rsep) between the dwarf and neighboring
groups and selected the nearest group as the one with the smallest Rsep/Rvir, where Rvir is the virial
radius of the group. We found that the median Rsep to the nearest groups with Mh > 10^12 M⊙ is
1.84 h^-1 Mpc, about 4.8 times the virial radius, while the median Rsep to the nearest groups with
Mh > 10^13 M⊙ is about 5.9 h^-1 Mpc, about 8.0 times the virial radius. These large separations are
in conflict with associating diffuse dwarfs with backsplash halos.
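The nearest-group selection described above can be sketched as follows, assuming that the projected separation to each group has already been computed. The dictionary keys and the toy numbers are hypothetical and serve only to illustrate the velocity cut and the Rsep/Rvir ranking.

```python
def nearest_group(dwarf_v, groups):
    """Return the group with the smallest R_sep / R_vir among velocity-matched groups.

    `groups` is a list of dicts with (assumed) keys:
      'v'     : line-of-sight velocity of the group [km/s]
      'v_vir' : virial velocity of the group [km/s]
      'r_sep' : projected separation from the dwarf [h^-1 Mpc]
      'r_vir' : virial radius of the group [h^-1 Mpc]
    """
    candidates = [g for g in groups if abs(dwarf_v - g["v"]) <= 3 * g["v_vir"]]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g["r_sep"] / g["r_vir"])


# Toy example: the third group is closest in projection but fails the velocity cut.
toy_groups = [
    {"v": 1000.0, "v_vir": 100.0, "r_sep": 2.0, "r_vir": 0.5},
    {"v": 1100.0, "v_vir": 200.0, "r_sep": 3.0, "r_vir": 1.0},
    {"v": 5000.0, "v_vir": 100.0, "r_sep": 0.1, "r_vir": 0.5},
]
best = nearest_group(1050.0, toy_groups)
print(best["r_sep"] / best["r_vir"])
```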
Details of the halo-mass estimate from HI kinematics We used the same method as described
in Guo et al. (2020)64 to evaluate the HI line width at 20% of the peak flux (W20) from the HI
spectrum for each galaxy. Since resolved HI maps are not available, we assumed the inclination of
the HI disk, ϕ, to be the same as that of the stellar disk, $\sin\phi = \sqrt{[1-(b/a)^2]/(1-q_0^2)}$, where
q0 ∼ 0.2 (ref.101). The circular velocity Vc is then estimated as Vc = W20/(2 sin ϕ). For a typical
dwarf galaxy, the circular velocity at a large radius, such as the HI radius rHI (defined as the radius
at which the HI surface density attains 1 M⊙ pc^-2), is expected to be Vc. The dynamical mass
enclosed within rHI is
$$M_{\rm dyn}(<r_{\rm HI}) = V_{\rm c}^2\, r_{\rm HI}/G, \tag{6}$$
where G is the gravitational constant. The estimation of rHI is facilitated by the tight correlation
between rHI and HI mass MHI inferred from observations: log10 rHI = 0.51 log10 MHI − 3.59 (refs.102,103).
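The chain from observables to the dynamical mass in equation (6) can be illustrated with a short calculation. The sketch assumes that rHI is in kpc and MHI in M⊙ in the rHI-MHI relation (the units are not stated explicitly in the text), and uses G = 4.30091 × 10^-6 kpc km^2 s^-2 M⊙^-1; the input galaxy is a made-up example.

```python
import math

G_KPC = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun


def dynamical_mass(w20_km_s, axis_ratio, log_m_hi, q0=0.2):
    """Return (sin_phi, V_c [km/s], r_HI [kpc], M_dyn [Msun]).

    Assumes r_HI in kpc and M_HI in Msun for the r_HI-M_HI relation.
    """
    sin_phi = math.sqrt((1 - axis_ratio**2) / (1 - q0**2))
    sin_phi = min(sin_phi, 1.0)                 # guard against b/a < q0
    v_c = w20_km_s / (2 * sin_phi)              # inclination-corrected velocity
    r_hi = 10 ** (0.51 * log_m_hi - 3.59)       # HI radius from the HI mass
    m_dyn = v_c**2 * r_hi / G_KPC               # equation (6)
    return sin_phi, v_c, r_hi, m_dyn


# Made-up example galaxy: W20 = 100 km/s, b/a = 0.5, log(M_HI/Msun) = 8.5.
sin_phi, v_c, r_hi, m_dyn = dynamical_mass(100.0, 0.5, 8.5)
print(f"sin(phi) = {sin_phi:.3f}, Vc = {v_c:.1f} km/s, "
      f"r_HI = {r_hi:.2f} kpc, log M_dyn = {math.log10(m_dyn):.2f}")
```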
Assuming a Burkert profile46 with a central core65,66, we can estimate the halo mass using
Mdyn(< rHI)−Mbar =ZrHI
0
4πr2ρB(r)dr
= 2πρ′
0r3
0ln 1 + rHI
r0+ 0.5 ln 1 + r2
HI
r2
0−arctan rHI
r0 ,
(7)
and
M200c =ZR200c
0
4πr2ρB(r)dr
= 2πρ′
0r3
0ln 1 + R200c
r0+ 0.5 ln 1 + R2
200c
r2
0−arctan R200c
r0 ,
(8)
where $M_{\rm bar} \simeq M_* + 1.33 M_{\rm HI}$ denotes the galactic baryonic mass; $r_0$ and $\rho'_0$ are free parameters
describing the core of the dark matter halo, and $r_0$ is found to be related to $M_{200c}$ by$^{104}$
$$\log[r_0/{\rm kpc}] = 0.66 - 0.58\,\log[M_{200c}/10^{11} M_\odot]. \tag{9}$$
The halo mass, M200c, can then be estimated with equations (7), (8), and (9). Note that R200c
represents the virial radius enclosing a mean density that is 200 times the critical value and M200c
is the mass within R200c.
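One way to solve equations (7)–(9) for $M_{200c}$ is sketched below. The text does not specify the numerical method, so the bisection used here is an assumption, as are $h = 0.7$, the critical density $\rho_{\rm crit} = 277.5\,h^2\,M_\odot\,{\rm kpc}^{-3}$, the search bracket, and all function names. Equation (9) is taken with the coefficients as printed. Dividing equation (7) by equation (8) removes $\rho'_0$, leaving a single equation in $M_{200c}$.

```python
import math

RHO_CRIT_H2 = 277.5  # critical density in h^2 Msun / kpc^3


def burkert_shape(x):
    """Bracketed term of equations (7) and (8), with x = r / r0."""
    return math.log(1.0 + x) + 0.5 * math.log(1.0 + x * x) - math.atan(x)


def core_radius_kpc(m200c):
    """Equation (9) as printed: r0 in kpc for M200c in Msun."""
    return 10.0 ** (0.66 - 0.58 * math.log10(m200c / 1e11))


def r200c_kpc(m200c, h=0.7):
    """Radius enclosing a mean density of 200 times the critical density."""
    rho_200 = 200.0 * RHO_CRIT_H2 * h * h
    return (3.0 * m200c / (4.0 * math.pi * rho_200)) ** (1.0 / 3.0)


def dm_mass_within(r_kpc, m200c, h=0.7):
    """Dark matter mass inside r_kpc for a Burkert halo of mass M200c."""
    r0 = core_radius_kpc(m200c)
    # Ratio of equation (7) to equation (8): the prefactor 2*pi*rho0'*r0^3 cancels.
    return m200c * burkert_shape(r_kpc / r0) / burkert_shape(r200c_kpc(m200c, h) / r0)


def solve_m200c(m_dm_rhi, r_hi_kpc, h=0.7, lo=1e8, hi=1e15):
    """Bisect in log10(M200c) until the mass inside r_HI equals M_dyn - M_bar."""
    def f(logm):
        return dm_mass_within(r_hi_kpc, 10.0 ** logm, h) - m_dm_rhi

    a, b = math.log10(lo), math.log10(hi)
    if f(a) * f(b) > 0:
        raise ValueError("target mass is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (a + b)
        if f(a) * f(mid) <= 0:
            b = mid
        else:
            a = mid
    return 10.0 ** (0.5 * (a + b))


# Round trip with made-up numbers: build a halo, then recover its mass.
target = dm_mass_within(10.0, 1e10)
print(solve_m200c(target, 10.0))
```

The solver raises an error when the requested enclosed mass falls outside the range spanned by the bracket, which can happen for galaxies whose baryonic mass is close to the dynamical mass.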
The uncertainty of the halo mass is determined using a Monte Carlo method, taking into
account uncertainties in the baryonic mass ($\sigma_{M_{\rm bar}}$), in $r_{\rm HI}$ ($\sigma_{r_{\rm HI}}$), and in $W_{20}$ ($\sigma_{W_{20}}$), as well
as potential misalignment between the HI and stellar inclinations, and the uncertainty in the
$r_0$-$M_{200c}$ relation. The error term $\sigma_{M_{\rm bar}}$ combines the uncertainty in the HI mass, $\sigma_{M_{\rm HI}}$, provided
by ALFALFA$^{63}$, and that in the stellar mass, $\sigma_{M_*}$, which arises from uncertainties in the distance and
magnitude, so that $\sigma_{M_{\rm bar}}^2 = \sigma_{M_*}^2 + (1.33\,\sigma_{M_{\rm HI}})^2$. The uncertainty $\sigma_{r_{\rm HI}}$ is estimated from the HI mass
error, while $\sigma_{W_{20}}$ follows the method outlined in Guo et al.$^{64}$. Previous studies have shown that
the stellar and gas disks in galaxies may not be perfectly co-planar, often exhibiting a small
inclination difference of $\delta\phi < 20^\circ$ (refs. 64, 68, 69). To account for this misalignment, we assumed that
$\delta\phi$ follows a Gaussian distribution centered at $0^\circ$ with a standard deviation of $\sigma_\phi = 20^\circ$, which
represents the uncertainty in $\phi$. For each galaxy, we generated 1,000 sets of ($M_{\rm bar}$, $W_{20}$, $\phi$, $r_{\rm HI}$)
based on their average values and associated uncertainties. We assumed Gaussian distributions for
these parameters, centered at their average values, with the 1-$\sigma$ ranges matching the uncertainties.
We thus obtained 1,000 halo masses using equations (6)–(9). The standard deviation of
these halo masses ($\sigma_{M_{200c}}$) is combined with the uncertainty of the $r_0$-$M_{200c}$ relation to determine
the final halo mass uncertainty, as
$$\sigma'_{M_{200c}} = \sqrt{\sigma_{M_{200c}}^2 + (0.15\,{\rm dex})^2}, \tag{10}$$
where the scatter of the dark matter profile for a given halo mass$^{67}$ is approximately 0.15 dex, which
is used for the uncertainty in the $r_0$-$M_{200c}$ relation.
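The Monte Carlo error propagation and equation (10) can be sketched as follows. This is an illustration under several assumptions: `halo_mass_fn` is a hypothetical callable standing in for the solution of equations (6)–(9), the input uncertainties are taken to be linear (not logarithmic), the scatter of the sampled halo masses is measured in dex so that it can be added in quadrature to 0.15 dex, and unphysical draws are simply discarded (the text does not say how they are handled).

```python
import math
import random
import statistics


def halo_mass_uncertainty_dex(halo_mass_fn, m_bar, w20, phi_deg, r_hi,
                              sig_m_bar, sig_w20, sig_r_hi,
                              sig_phi_deg=20.0, n_draws=1000,
                              relation_scatter_dex=0.15, seed=0):
    """Total halo-mass uncertainty in dex, following equation (10)."""
    rng = random.Random(seed)
    log_masses = []
    for _ in range(n_draws):
        # Independent Gaussian draws centered on the input values.
        m = rng.gauss(m_bar, sig_m_bar)
        w = rng.gauss(w20, sig_w20)
        p = rng.gauss(phi_deg, sig_phi_deg)
        r = rng.gauss(r_hi, sig_r_hi)
        # Assumption: drop draws with non-positive quantities or
        # inclinations outside (0, 90] degrees.
        if m <= 0 or w <= 0 or r <= 0 or not 0.0 < p <= 90.0:
            continue
        m200 = halo_mass_fn(m, w, p, r)
        if m200 > 0:
            log_masses.append(math.log10(m200))
    sigma_mc = statistics.pstdev(log_masses)  # scatter of log10(M200c), in dex
    return math.sqrt(sigma_mc ** 2 + relation_scatter_dex ** 2)


def toy_halo_mass(m_bar, w20, phi_deg, r_hi):
    """Hypothetical stand-in for equations (6)-(9): a mass scaling as V_c^2 r_HI."""
    v_c = w20 / (2.0 * math.sin(math.radians(phi_deg)))
    return 2.0e5 * v_c ** 2 * r_hi


print(halo_mass_uncertainty_dex(toy_halo_mass, 1e9, 100.0, 60.0, 10.0,
                                1e8, 5.0, 1.0))
```

With this construction the total uncertainty can never fall below 0.15 dex, even for a galaxy with negligible measurement errors.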
To compare with the mass derived from the SHMR, we converted $M_{200c}$ into $M_{200m}$ using the
derived Burkert profile.
Abundance matching

In Methods, we showed the $\Sigma_*$-$z_{\rm f}$ mapping based on abundance matching.
Here we provide further details.
For abundance matching to work correctly, the procedure must (i) preserve the rank order of
$z_{\rm f}$ and $\Sigma_*$ to the degree set by the adopted scatter and (ii) yield a $\Sigma_*$ distribution for halos consistent
with that of dwarf galaxies. Our mapping formula meets these criteria.
The mapping first applies $\mathcal{P}_{z_{\rm f}}$, which transforms $z_{\rm f}$ into a variable uniformly distributed over
$[0,1]$. Next, $\mathcal{N}^{-1}$ converts this uniform variable into a unit Gaussian variable. The composition
$\mathcal{N}^{-1} \circ \mathcal{P}_{z_{\rm f}}$ thus maps the $z_{\rm f}$ distribution into a unit Gaussian distribution. A unit Gaussian random
scatter, $\epsilon$, is then added, with a correlation coefficient $\rho$ controlling its weight. By the additive
property of Gaussian variables, the result remains a unit Gaussian variable and has correlation
coefficients $\rho$ and $\sqrt{1-\rho^2}$ with the original Gaussian variable and $\epsilon$, respectively. Finally, $\mathcal{N}$
transforms the unit Gaussian back to a uniform variable, which is then converted to a $\Sigma_*$ that follows
the distribution of dwarfs. Here, $\rho$ controls the scatter: $\rho = 1$ indicates perfect correlation between
$z_{\rm f}$ and $\Sigma_*$, whereas $\rho = 0$ implies no correlation.
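The sequence of transformations described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the empirical rank-based CDF, the function names, and the `invert` flag are assumptions. In particular, `invert=True` makes halos with higher $z_{\rm f}$ receive lower $\Sigma_*$, in line with the statement that more diffuse dwarfs reside in older halos; the sign convention of the actual mapping formula is given in Methods, not here.

```python
import math
import random
from statistics import NormalDist


def to_uniform(values):
    """Empirical CDF: map each value to (rank + 0.5) / n, strictly inside (0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / n
    return u


def assign_sigma_star(z_f, sigma_star_obs, rho, invert=True, seed=0):
    """Assign a Sigma_* to each halo so that the observed distribution is reproduced."""
    rng = random.Random(seed)
    unit = NormalDist()  # unit Gaussian: cdf plays the role of N, inv_cdf of N^-1
    target = sorted(sigma_star_obs)
    assigned = []
    for u in to_uniform(z_f):
        g = unit.inv_cdf(u)                        # uniform -> unit Gaussian
        eps = rng.gauss(0.0, 1.0)                  # unit Gaussian scatter
        g_mixed = rho * g + math.sqrt(1.0 - rho * rho) * eps  # still unit variance
        u_new = unit.cdf(g_mixed)                  # unit Gaussian -> uniform
        if invert:
            u_new = 1.0 - u_new                    # assumed sign convention
        idx = min(int(u_new * len(target)), len(target) - 1)
        assigned.append(target[idx])               # uniform -> observed Sigma_* quantile
    return assigned


# Made-up inputs: four halos and four observed surface densities.
z_f = [0.5, 2.0, 1.0, 3.0]
sigma_obs = [30.0, 5.0, 10.0, 20.0]
print(assign_sigma_star(z_f, sigma_obs, rho=1.0))
print(assign_sigma_star(z_f, sigma_obs, rho=0.85))
```

For $\rho = 1$ the assignment is a pure rank match. For $\rho < 1$ the ranks are partly shuffled, while every assigned value is still drawn from the observed $\Sigma_*$ sample.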
This mapping is mathematically concise. However, it (i) does not allow the scatter to vary
with $z_{\rm f}$ and (ii) does not quantify the scatter as intuitively as directly adding scatter to a physical
variable such as $\Sigma_*$. To address (i), Fig. 3a shows results for a number of $\rho$ values. For
$\Sigma_* \gtrsim 10\,M_\odot\,{\rm pc}^{-2}$, $\rho \gtrsim 0.5$ yields a match to observations, while for $\Sigma_* \lesssim 10\,M_\odot\,{\rm pc}^{-2}$ (“diffuse dwarfs”),
a stronger correlation ($\rho \gtrsim 0.8$) is required. To address (ii), Fig. 3b displays the median and 16th–84th
percentile range of the $z_{\rm f}$-$\Sigma_*$ relation for $\rho = 0.85$ (the case that appears to match the
observed bias-$\Sigma_*$ relation closely). The percentile range intuitively quantifies the scatter introduced by
the abundance matching.
Model assumptions, results and implications

The figures in the main text (Figs. 1–4) were arranged
in a logical order, each building upon the previous one with progressively more assumptions
and leading to increasingly refined results and implications. The conclusions can thus be judged
incrementally, based on the robustness of the assumptions introduced at each step. Below is a brief
summary.
(i) In Fig. 1, we presented observational results. Here, assumptions include sample selection and
property measurements. The results show that diffuse dwarfs have stronger clustering than other
isolated dwarfs.
(ii) In Fig. 2, we reconstructed the density field at $z \approx 0$ using the SDSS sample. Assumptions
include the group finder, the RSD correction, and the halo-matter cross-correlation. The results show a
strong correlation of diffuse dwarfs with filaments/knots, and an anti-correlation with voids. This
implies that diffuse dwarfs preferentially reside within/around large cosmic structures, constraining
the conditions for their formation.
(iii) In Fig. 3, we traced the evolution of the $z \approx 0$ density field back to high $z$ using a constrained
simulation, ELUCID. As the simulation cannot resolve the assembly of individual halos, we introduced
an abundance modeling between $z_{\rm f}$ and $\Sigma_*$. The assumptions here are the initial-condition
reconstruction and the abundance modeling. The results show that the $z_{\rm f}$-bias of halos can explain the
observed $\Sigma_*$-bias of dwarfs, raising the question of how $\Sigma_*$ is physically linked to $z_{\rm f}$. This result
also provides clues for revisions to existing models in ΛCDM cosmology. The data product of this
step is the $\Sigma_*$ assigned to each dwarf-host halo within the ELUCID volume.
(iv) In Fig. 4, we introduced SIDM as a possible explanation for our findings. The assumption in
this step is the isothermal Jeans model. Each halo, with its $\Sigma_*$ assigned as above, is thus predicted
to contain an SIDM core characterized by $r_{\rm c}$, $r_{\rho_0/4}$, and $\rho_0$. The results show a similarity between
SIDM cores and dwarfs, providing insights for future observations and theoretical studies.
Parameterizing $R_{50} = A_r r_{\rm c}$, the bias-$\Sigma_*$ relations predicted by our SIDM model are shown in Fig. 3 for
comparison with the observation and other models.
1. Li, C. et al. The dependence of clustering on galaxy properties. Monthly Notices of the Royal
Astronomical Society 368, 21–36 (2006).
2. Zehavi, I. et al. Galaxy Clustering in the Completed SDSS Redshift Survey: The Dependence
on Color and Luminosity. The Astrophysical Journal 736, 59 (2011).
3. Gao, L., Springel, V. & White, S. D. M. The age dependence of halo clustering. Monthly
Notices of the Royal Astronomical Society 363, L66–L70 (2005).
4. Amorisco, N. C. & Loeb, A. Ultradiffuse galaxies: the high-spin tail of the abundant dwarf
galaxy population. Monthly Notices of the Royal Astronomical Society 459, L51–L55 (2016).
5. Di Cintio, A. et al. NIHAO - XI. Formation of ultra-diffuse galaxies by outflows. Monthly
Notices of the Royal Astronomical Society 466, L1–L6 (2017).
6. van Dokkum, P. et al. A trail of dark-matter-free galaxies from a bullet-dwarf collision.
Nature 605, 435–439 (2022).
7. Spergel, D. N. & Steinhardt, P. J. Observational Evidence for Self-Interacting Cold Dark
Matter. Physical Review Letters 84, 3760–3763 (2000).
8. Blanton, M. R. et al. New York University Value-Added Galaxy Catalog: A Galaxy Catalog
Based on New Public Surveys. The Astronomical Journal 129, 2562–2578 (2005).
9. Abazajian, K. N. et al. The Seventh Data Release of the Sloan Digital Sky Survey. The
Astrophysical Journal Supplement Series 182, 543–558 (2009).
10. Yang, X. et al. Galaxy Groups in the SDSS DR4. I. The Catalog and Basic Properties. The
Astrophysical Journal 671, 153–170 (2007).
11. van Dokkum, P. G. et al. Forty-seven Milky Way-sized, Extremely Diffuse Galaxies in the
Coma Cluster. The Astrophysical Journal 798, L45 (2015).
12. Mo, H. J. & White, S. D. M. An analytic model for the spatial clustering of dark matter
haloes. Monthly Notices of the Royal Astronomical Society 282, 347–361 (1996).
13. Hu, H.-J. et al. Global Dynamic Scaling Relations of H I-rich Ultra-diffuse Galaxies. The
Astrophysical Journal Letters 947, L9 (2023).
14. Kravtsov, A. V., Vikhlinin, A. A. & Meshcheryakov, A. V. Stellar Mass—Halo Mass Relation
and Star Formation Efficiency in High-Mass Halos. Astronomy Letters 44, 8–34 (2018).
15. Tinker, J. L. et al. The Large-scale Bias of Dark Matter Halos: Numerical Calibration and
Model Tests. The Astrophysical Journal 724, 878–886 (2010).
16. Wang, E. et al. The Dearth of Differences between Central and Satellite Galaxies. II.
Comparison of Observations with L-GALAXIES and EAGLE in Star Formation Quenching. The
Astrophysical Journal 864, 51 (2018).
17. Wang, H. et al. ELUCID—EXPLORING THE LOCAL UNIVERSE WITH RECON-
STRUCTED INITIAL DENSITY FIELD. III. CONSTRAINED SIMULATION IN THE
SDSS VOLUME. The Astrophysical Journal 831, 164 (2016).
18. Wechsler, R. H., Zentner, A. R., Bullock, J. S., Kravtsov, A. V. & Allgood, B. The
Dependence of Halo Clustering on Halo Formation History, Concentration, and Occupation. The
Astrophysical Journal 652, 71–84 (2006).
19. Jing, Y. P., Suto, Y. & Mo, H. J. The Dependence of Dark Halo Clustering on Formation
Epoch and Concentration Parameter. The Astrophysical Journal 657, 664–668 (2007).
20. Bett, P. et al. The spin and shape of dark matter haloes in the Millennium simulation of a Λ
cold dark matter universe. Monthly Notices of the Royal Astronomical Society 376, 215–232
(2007).
21. Gao, L., White, S. D. M., Jenkins, A., Stoehr, F. & Springel, V. The subhalo populations
of ΛCDM dark haloes. Monthly Notices of the Royal Astronomical Society 355, 819–834
(2004).
22. Sato-Polito, G., Montero-Dorta, A. D., Abramo, L. R., Prada, F. & Klypin, A. The
dependence of halo bias on age, concentration, and spin. Monthly Notices of the Royal
Astronomical Society 487, 1570–1579 (2019).
23. Wang, H., Mo, H. J. & Jing, Y. P. The distribution of ejected subhaloes and its implication
for halo assembly bias. Monthly Notices of the Royal Astronomical Society 396, 2249–2256
(2009).
24. Wang, H., Mo, H. J., Yang, X., Jing, Y. P. & Lin, W. P. ELUCID—Exploring the Local
Universe with the Reconstructed Initial Density Field. I. Hamiltonian Markov Chain Monte
Carlo Method with Particle Mesh Dynamics. The Astrophysical Journal 794, 94 (2014).
25. van Dokkum, P. et al. A High Stellar Velocity Dispersion and ∼100 Globular Clusters for
the Ultra-diffuse Galaxy Dragonfly 44. The Astrophysical Journal Letters 828, L6 (2016).
26. Safarzadeh, M. & Scannapieco, E. The Fate of Gas-rich Satellites in Clusters. The
Astrophysical Journal 850, 99 (2017).
27. Jiang, F. et al. Formation of ultra-diffuse galaxies in the field and in galaxy groups. Monthly
Notices of the Royal Astronomical Society 487, 5272–5290 (2019).
28. Liao, S. et al. Ultra-diffuse galaxies in the Auriga simulations. Monthly Notices of the Royal
Astronomical Society 490, 5182–5195 (2019).
29. Benítez-Llambay, A. et al. Dwarf Galaxies and the Cosmic Web. The Astrophysical Journal
763, L41 (2013).
30. Rong, Y. et al. A Universe of ultradiffuse galaxies: theoretical predictions from ΛCDM
simulations. Monthly Notices of the Royal Astronomical Society 470, 4231–4240 (2017).
31. Benavides, J. A. et al. Origin and evolution of ultradiffuse galaxies in different environments.
Monthly Notices of the Royal Astronomical Society 522, 1033–1048 (2023).
32. Mo, H. J., Mao, S. & White, S. D. M. The formation of galactic discs. Monthly Notices of
the Royal Astronomical Society 295, 319–336 (1998).
33. Chan, T. K. et al. The origin of ultra diffuse galaxies: stellar feedback and quenching.
Monthly Notices of the Royal Astronomical Society 478, 906–925 (2018).
34. Guo, Q. et al. From dwarf spheroidals to cD galaxies: Simulating the galaxy population
in a ΛCDM cosmology. Monthly Notices of the Royal Astronomical Society 413, 101–131
(2011).
35. Ayromlou, M. et al. Comparing galaxy formation in the L-GALAXIES semi-analytical
model and the IllustrisTNG simulations. Monthly Notices of the Royal Astronomical Society
502, 1051–1069 (2021).
36. Pillepich, A. et al. First results from the IllustrisTNG simulations: The stellar mass content
of groups and clusters of galaxies. Monthly Notices of the Royal Astronomical Society 475,
648–675 (2018).
37. Bullock, J. S. & Boylan-Kolchin, M. Small-Scale Challenges to the ΛCDM Paradigm.
Annual Review of Astronomy and Astrophysics 55, 343–387 (2017).
38. Tulin, S. & Yu, H.-B. Dark matter self-interactions and small scale structure. Physics Reports
730, 1–57 (2018).
39. Kaplinghat, M., Ren, T. & Yu, H.-B. Dark matter cores and cusps in spiral galaxies and their
explanations. Journal of Cosmology and Astroparticle Physics 2020, 027 (2020).
40. Yang, D., Yu, H.-B. & An, H. Self-Interacting Dark Matter and the Origin of Ultradiffuse
Galaxies NGC1052-DF2 and -DF4. Physical Review Letters 125, 111105 (2020).
41. Zhang, X., Yu, H.-B., Yang, D. & An, H. Self-interacting Dark Matter Interpretation of
Crater II. The Astrophysical Journal 968, L13 (2024).
42. Rocha, M. et al. Cosmological simulations with self-interacting dark matter - I.
Constant-density cores and substructure. Monthly Notices of the Royal Astronomical Society 430,
81–104 (2013).
43. Jiang, F. et al. A semi-analytic study of self-interacting dark-matter haloes with baryons.
Monthly Notices of the Royal Astronomical Society 521, 4630–4644 (2023).
44. Kong, D., Kaplinghat, M., Yu, H.-B., Fraternali, F. & Mancera Piña, P. E. The Odd Dark
Matter Halos of Isolated Gas-rich Ultradiffuse Galaxies. The Astrophysical Journal 936, 166
(2022).
45. Mancera Piña, P. E., Golini, G., Trujillo, I. & Montes, M. Exploring the nature of dark matter
with the extreme galaxy AGC 114905. Astronomy & Astrophysics 689, A344 (2024).
46. Burkert, A. The Structure of Dark Matter Halos in Dwarf Galaxies. The Astrophysical
Journal 447, L25–L28 (1995).
47. Huang, K.-H. et al. Relations between the Sizes of Galaxies and Their Dark Matter Halos at
Redshifts 0 <z<3. The Astrophysical Journal 838, 6 (2017).
48. Chen, Y., Mo, H. & Wang, H. A two-phase model of galaxy formation - II. The size-mass
relation of dynamically hot galaxies. Monthly Notices of the Royal Astronomical Society 532,
4340–4349 (2024).
49. Shi, Y. et al. A Cuspy Dark Matter Halo. The Astrophysical Journal 909, 20 (2021).
50. Correa, C. A. et al. TangoSIDM Project: is the stellar mass Tully-Fisher relation consistent
with SIDM? Monthly Notices of the Royal Astronomical Society 536, 3338–3356 (2025).
51. Yang, X., Mo, H. J., van den Bosch, F. C. & Jing, Y. P. A halo-based galaxy group finder:
calibration and application to the 2dFGRS. Monthly Notices of the Royal Astronomical Society
356, 1293–1307 (2005).
52. Kauffmann, G. et al. The host galaxies of active galactic nuclei. Monthly Notices of the
Royal Astronomical Society 346, 1055–1077 (2003).
53. Koda, J., Yagi, M., Yamanoi, H. & Komiyama, Y. Approximately a Thousand Ultra-diffuse
Galaxies in the Coma Cluster. The Astrophysical Journal Letters 807, L2 (2015).
54. Davis, M. & Peebles, P. J. E. A survey of galaxy redshifts. V. The two-point position and
velocity correlations. The Astrophysical Journal 267, 465–482 (1983).
55. Foreman-Mackey, D., Hogg, D. W., Lang, D. & Goodman, J. emcee: The MCMC Hammer.
Publications of the Astronomical Society of the Pacific 125, 306 (2013).
56. Zhang, Z. et al. Hosts and triggers of AGNs in the Local Universe. Astronomy & Astrophysics
650, A155 (2021).
57. Trusov, S. et al. The two-point correlation function covariance with fewer mocks. Monthly
Notices of the Royal Astronomical Society 527, 9048–9060 (2023).
58. Strauss, M. A. et al. Spectroscopic Target Selection in the Sloan Digital Sky Survey: The
Main Galaxy Sample. The Astronomical Journal 124, 1810–1824 (2002).
59. Moster, B. P., Somerville, R. S., Newman, J. A. & Rix, H.-W. A COSMIC VARIANCE
COOKBOOK. The Astrophysical Journal 731, 113 (2011).
60. Chen, Y. et al. ELUCID. VI. Cosmic Variance of the Galaxy Distribution in the Local
Universe. The Astrophysical Journal 872, 180 (2019).
61. Wechsler, R. H. & Tinker, J. L. The Connection Between Galaxies and Their Dark Matter
Halos. Annual Review of Astronomy and Astrophysics 56, 435–487 (2018).
62. Giovanelli, R. et al. The Arecibo Legacy Fast ALFA Survey. I. Science Goals, Survey Design,
and Strategy. The Astronomical Journal 130, 2598–2612 (2005).
63. Haynes, M. P. et al. The Arecibo Legacy Fast ALFA Survey: The ALFALFA Extragalactic
H I Source Catalog. The Astrophysical Journal 861, 49 (2018).
64. Guo, Q. et al. Further evidence for a population of dark-matter-deficient dwarf galaxies.
Nature Astronomy 4, 246–251 (2020).
65. Marchesini, D. et al. Hα rotation curves: The soft core question. The Astrophysical Journal
575, 801–813 (2002).
66. Rong, Y. et al. Gas-rich Ultra-diffuse Galaxies Are Originated from High Specific Angular
Momentum. Preprint at https://doi.org/10.48550/arXiv.2404.00555 (2024).
67. Wang, J. et al. Universal structure of dark matter haloes over a mass range of 20 orders of
magnitude. Nature 585, 39–42 (2020).
68. Starkenburg, T. K. et al. On the Origin of Star-Gas Counterrotation in Low-mass Galaxies.
The Astrophysical Journal 878, 143 (2019).
69. Gault, L. et al. VLA Imaging of H I-bearing Ultra-diffuse Galaxies from the ALFALFA
Survey. The Astrophysical Journal 909, 19 (2021).
70. Hahn, O., Porciani, C., Carollo, C. M. & Dekel, A. Properties of dark matter haloes in
clusters, filaments, sheets and voids. Monthly Notices of the Royal Astronomical Society 375,
489–499 (2007).
71. Nelson, D. et al. The IllustrisTNG simulations: Public data release. Computational
Astrophysics and Cosmology 6, 2 (2019).
72. Li, Y., Mo, H. J. & Gao, L. On halo formation times and assembly bias. Monthly Notices of
the Royal Astronomical Society 389, 1419–1426 (2008).
73. Bullock, J. S. et al. A Universal Angular Momentum Profile for Galactic Halos. The
Astrophysical Journal 555, 240 (2001).
74. Hearin, A. P. & Watson, D. F. The dark side of galaxy colour. Monthly Notices of the Royal
Astronomical Society 435, 1313–1324 (2013).
75. Behroozi, P., Wechsler, R. H., Hearin, A. P. & Conroy, C. UNIVERSEMACHINE: The
correlation between galaxy growth and dark matter halo assembly from z = 0-10. Monthly
Notices of the Royal Astronomical Society 488, 3143–3194 (2019).
76. Silk, J. Ultra-diffuse galaxies without dark matter. Monthly Notices of the Royal
Astronomical Society 488, L24–L28 (2019).
77. Yozin, C. & Bekki, K. The quenching and survival of ultra diffuse galaxies in the Coma
cluster. Monthly Notices of the Royal Astronomical Society 452, 937–943 (2015).
78. Chen, Y. et al. Relating the Structure of Dark Matter Halos to Their Assembly and
Environment. The Astrophysical Journal 899, 81 (2020).
79. Relatores, N. C. et al. The Dark Matter Distributions in Low-mass Disk Galaxies. II. The
Inner Density Profiles. The Astrophysical Journal 887, 94 (2019).
80. Springel, V. et al. First results from the IllustrisTNG simulations: matter and galaxy
clustering. Monthly Notices of the Royal Astronomical Society 475, 676–698 (2018).
81. Nelson, D. et al. First results from the IllustrisTNG simulations: The galaxy colour
bimodality. Monthly Notices of the Royal Astronomical Society 475, 624–647 (2018).
82. Naiman, J. P. et al. First results from the IllustrisTNG simulations: A tale of two elements –
chemical evolution of magnesium and europium. Monthly Notices of the Royal Astronomical
Society 477, 1206–1224 (2018).
83. Marinacci, F. et al. First results from the IllustrisTNG simulations: Radio haloes and
magnetic fields. Monthly Notices of the Royal Astronomical Society 480, 5113–5139 (2018).
84. Weinberger, R. et al. Simulating galaxy formation with black hole driven thermal and kinetic
feedback. Monthly Notices of the Royal Astronomical Society 465, 3291–3308 (2017).
85. Pillepich, A. et al. Simulating galaxy formation with the IllustrisTNG model. Monthly
Notices of the Royal Astronomical Society 473, 4077–4106 (2018).
86. Henriques, B. M. B. et al. Galaxy formation in the Planck cosmology - I. Matching the
observed evolution of star formation rates, colours and stellar masses. Monthly Notices of the
Royal Astronomical Society 451, 2663–2680 (2015).
87. Pillepich, A. et al. First results from the TNG50 simulation: The evolution of stellar and
gaseous discs across cosmic time. Monthly Notices of the Royal Astronomical Society 490,
3196–3233 (2019).
88. Nelson, D. et al. First results from the TNG50 simulation: Galactic outflows driven by
supernovae and black hole feedback. Monthly Notices of the Royal Astronomical Society
490, 3234–3261 (2019).
89. Kaplinghat, M., Tulin, S. & Yu, H.-B. Dark Matter Halos as Particle Colliders: Unified
Solution to Small-Scale Structure Puzzles from Dwarfs to Clusters. Physical Review Letters
116, 041302 (2016).
90. Mo, H. J. & Mao, S. The Tully-Fisher relation and its implications for the halo density
profile and self-interacting dark matter. Monthly Notices of the Royal Astronomical Society
318, 163–172 (2000).
91. Fischer, M. S. et al. Cosmological and idealized simulations of dark matter haloes with
velocity-dependent, rare and frequent self-interactions. Monthly Notices of the Royal
Astronomical Society 529, 2327–2348 (2024).
92. Bell, E. F., McIntosh, D. H., Katz, N. & Weinberg, M. D. The Optical and Near-Infrared
Properties of Galaxies. I. Luminosity and Stellar Mass Functions. The Astrophysical Journal
Supplement Series 149, 289–312 (2003).
93. Blanton, M. R. & Roweis, S. K-Corrections and Filter Transformations in the Ultraviolet,
Optical, and Near-Infrared. The Astronomical Journal 133, 734–754 (2007).
94. Kroupa, P. On the variation of the initial mass function. Monthly Notices of the Royal
Astronomical Society 322, 231–246 (2001).
95. Zhang, Z. et al. Massive star-forming galaxies have converted most of their halo gas into
stars. Astronomy & Astrophysics 663, A85 (2022).
96. Greco, J. P. et al. A Study of Two Diffuse Dwarf Galaxies in the Field. The Astrophysical
Journal 866, 112 (2018).
97. Aihara, H. et al. The Hyper Suprime-Cam SSP Survey: Overview and survey design.
Publications of the Astronomical Society of Japan 70, S4 (2018).
98. Miyazaki, S. et al. Hyper Suprime-Cam: System design and verification of image quality.
Publications of the Astronomical Society of Japan 70, S1 (2018).
99. Salim, S. et al. GALEX-SDSS-WISE Legacy Catalog (GSWLC): Star Formation Rates,
Stellar Masses, and Dust Attenuations of 700,000 Low-redshift Galaxies. The Astrophysical
Journal Supplement Series 227, 2 (2016).
100. Blanton, M. R., Eisenstein, D., Hogg, D. W., Schlegel, D. J. & Brinkmann, J. Relationship
between Environment and the Broadband Optical Properties of Galaxies in the Sloan Digital
Sky Survey. The Astrophysical Journal 629, 143–157 (2005).
101. Rong, Y., He, M., Hu, H., Zhang, H.-X. & Wang, H.-Y. Intrinsic Morphology of The
Stellar Components in HI-bearing Dwarf Galaxies and The Dependence on Mass. Preprint at
https://doi.org/10.48550/arXiv.2409.00944 (2024).
102. Wang, J. et al. New lessons from the H I size-mass relation of galaxies. Monthly Notices of
the Royal Astronomical Society 460, 2143–2151 (2016).
103. Gault, L. et al. VLA Imaging of H I-bearing Ultra-diffuse Galaxies from the ALFALFA
Survey. The Astrophysical Journal 909, 19 (2021).
104. Salucci, P. et al. The universal rotation curve of spiral galaxies - II. The dark matter
distribution out to the virial radius. Monthly Notices of the Royal Astronomical Society 378, 41–47
(2007).