Toward Effective Source Apportionment Using Positive Matrix Factorization: Experiments with Simulated PM2.5 Data
L.-W. Antony Chen, Douglas H. Lowenthal, John G. Watson, and Darko Koracin
Desert Research Institute, Reno, NV
Naresh Kumar and Eladio M. Knipping
Electric Power Research Institute, Palo Alto, CA
Neil Wheeler, Kenneth Craig, and Stephen Reid
Sonoma Technology, Inc., Petaluma, CA
ABSTRACT
To elucidate the relationship between factors resolved by the positive matrix factorization (PMF) receptor model and actual emission sources and to refine the PMF modeling strategy, speciated PM2.5 (particulate matter with aerodynamic diameter < 2.5 μm) data generated from a state-of-the-art chemical transport model for two rural sites in the eastern United States are subjected to PMF analysis. In addition to χ² and R² used to infer the quality of fitting, the interpretability of PMF factors with respect to known primary and secondary sources is evaluated using a root mean square difference analysis. For the most part, factors are found to represent imperfect combinations of sources, and the optimal number of factors should be just adequate to explain the input data (e.g., R² > 0.95). Retaining more factors in the model does not help resolve minor sources, unless temporal resolution of the data is increased, thus allowing more information to be used by the model. If guided with a priori knowledge of source markers and/or special events, rotation of factors leads to more interpretable PMF factors. The choice of uncertainty weighting coefficients greatly influences the PMF modeling results, but it cannot usually be determined for simulated or real-world data. A simple test is recommended to check whether the weighting coefficients are suitable. However, uncertainties in the data divert PMF solutions even when the optimal weighting coefficients and number of factors are in place.
INTRODUCTION
As speciated PM2.5 (particulate matter [PM] with aerodynamic diameter < 2.5 μm) data become more available from long-term air quality networks and intensive studies, multivariate receptor models gain popularity as an important tool for assessing PM2.5 source contributions. Unlike the effective-variance chemical mass balance (EV-CMB)¹,² approach, multivariate receptor models typically do not assume the types and chemical profiles of contributing sources. Instead they resolve “factors” embedded in the data that explain the data variation in a reduced dimensional space.³,⁴ Realistic PM2.5 source apportionment requires that estimates for “factor” loadings (chemical profiles) and scores (contributions) be nonnegative. Positive matrix factorization (PMF)⁵ and Unmix,⁶ which have been widely applied to the U.S. Environmental Protection Agency (EPA) PM Supersite data, represent multivariate receptor models with the nonnegativity constraint.⁷,⁸ EPA has made a substantial effort to make the most recent versions of PMF and Unmix software available for use by the air quality community (http://www.epa.gov/heasd/products/products.htm).

The Unmix model begins with a principal component analysis to retrieve common factors (i.e., principal components [PCs]) that can reproduce the data.⁶ The PCs are orthogonal. Any invertible linear transformation (i.e., rotation) of the PCs can explain the data as well as the original PCs. To pin down the contributing sources, Unmix attempts to detect “edges” in the space spanned by the PCs. The edges signify missing or small contributions from one or more sources and therefore define source profiles. Unmix sometimes reports “no solution” when edges are not detected or contain negative values. PMF, on the other hand, attempts to find factors that best explain the data without any constraints other than nonnegativity.⁵ Factor solutions always exist, but they are not necessarily unique and do not necessarily correspond to actual sources.
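The rotational ambiguity described above can be illustrated with a small numerical sketch. The matrices below are invented for illustration and are not taken from the study; the helper names `rotate`, `shear`, and `is_nonnegative` are hypothetical. The sketch shows that a mild transformation yields a second nonnegative solution with an identical fit, whereas a stronger one is ruled out by nonnegativity.

```python
import numpy as np

# Invented two-factor example (not data from the paper).
# F: species x factors (profiles); G: factors x sample periods (contributions).
F = np.array([[0.6, 0.1],
              [0.3, 0.2],
              [0.1, 0.7]])
G = np.array([[4.0, 2.0, 0.5],
              [1.0, 3.0, 5.0]])
C = F @ G  # data reproduced exactly by the factors


def shear(s):
    """A simple invertible 2 x 2 transformation ("rotation" in receptor-model usage)."""
    return np.array([[1.0, s],
                     [0.0, 1.0]])


def rotate(F, G, T):
    """Return the transformed pair (F T, T^-1 G), whose product is unchanged."""
    return F @ T, np.linalg.inv(T) @ G


def is_nonnegative(F, G):
    return bool((F >= 0).all() and (G >= 0).all())


# Mild transformation: same fit, still nonnegative, so the solution is not unique.
F_small, G_small = rotate(F, G, shear(0.05))
fit_small = np.allclose(F_small @ G_small, C)
nonneg_small = is_nonnegative(F_small, G_small)

# Stronger transformation: same fit, but a contribution turns negative.
F_large, G_large = rotate(F, G, shear(0.2))
fit_large = np.allclose(F_large @ G_large, C)
nonneg_large = is_nonnegative(F_large, G_large)

print(fit_small, nonneg_small, fit_large, nonneg_large)
```

Both transformed pairs reproduce the data equally well; only the nonnegativity check distinguishes them, which is why the constraint narrows but does not eliminate the set of admissible solutions.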
Reff et al.⁹ provide a framework for PMF modeling that includes three steps: (1) prepare the data to be modeled, (2) process the data to develop a feasible and robust solution, and (3) interpret the results. PMF results are found to be sensitive to the choice of the number of factors, species, measurement uncertainty estimates, and rotational controls (for PMF, this is the FPEAK parameter) that must be made by the user. Whether the resulting factors are interpretable in terms of primary PM emissions and/or secondary aerosol formation depends heavily on the modeler’s judgment. Blind round-robin tests (e.g., Hopke et al.¹⁰) indicated that different modeling groups often identified similar factors (qualitative agreement) but different factor contributions (quantitative disagreement) from the same dataset.

IMPLICATIONS
Despite the efforts of the U.S. Environmental Protection Agency and many researchers, PMF receptor modeling remains a subjective practice that relies largely on the modeler’s experience to achieve the “best” solution. This limits the reproducibility and application of the PMF results in air quality management. By using simulated instead of measured data, this study provides a more objective evaluation of PMF modeling and shows its strengths and weaknesses. Overall the findings are encouraging, but more efforts are warranted to further improve the PMF software and modeling strategies.

TECHNICAL PAPER
ISSN: 1047-3289 J. Air & Waste Manage. Assoc. 60:43–54
DOI: 10.3155/1047-3289.60.1.43
Copyright 2010 Air & Waste Management Association
Journal of the Air & Waste Management Association, Volume 60, January 2010

How well receptor models, particularly PMF, achieve source apportionment objectives is seldom evaluated because actual source contributions are unknown in most cases (see a review in Lowenthal et al.,¹¹ this issue). This experiment represents an attempt to perform such an evaluation with synthetic data from a modern chemical transport model (CTM). Modern CTMs that integrate emissions, dispersion, chemical reactions, and deposition models are capable of forecasting or hindcasting ambient PM2.5 concentration and chemical composition at specified receptor sites. Contributions from individual sources can be tracked. CTM-simulated datasets provide a good opportunity to test receptor models, interpret multivariate receptor model factors, and recommend improved strategies for applying receptor models and interpreting their results.
EXPERIMENTAL APPROACHES
Simulation of Speciated PM2.5 Concentrations
This study tested PMF using simulated PM2.5 concentrations analogous to those obtained by the Interagency Monitoring of Protected Visual Environments (IMPROVE, see Malm et al.¹²) aerosol sampling program at two northeastern U.S. national parks: Brigantine National Wildlife Refuge (BRIG), NJ (39.465° north, 74.4492° west, 5 m mean sea level [MSL]) and Great Smoky Mountains National Park (GRSM), TN (35.6334° north, 83.9416° west, 810.6667 m MSL). IMPROVE aerosol data contain 24-hr average concentrations of PM2.5 mass, ions (sulfate [SO₄²⁻], nitrate [NO₃⁻], chloride [Cl⁻]), elemental carbon (EC) and organic carbon (OC), and approximately 23 crustal and trace elements.

Simulated data were generated with the Community Multiscale Air Quality (CMAQ) model coupled with the Sparse Matrix Operator Kernel Emissions (SMOKE) model for a modeling domain that covers the eastern half of the United States.¹¹ MM5 (National Center for Atmospheric Research Mesoscale Meteorological Model) meteorological fields were simulated for this domain for the year 2002 at a resolution of 12 km. The CMAQ model accounted for emissions, chemical transformation, cloud processing, and removal, resulting in the production of secondary inorganic (SO₄²⁻, NO₃⁻, ammonium [NH₄⁺]) and organic aerosol mass (OM). Simulations were done for winter (January–March) and summer (July–September) periods in 2002. CMAQ simulated receptor concentrations on an hourly basis, allowing averaging over longer time scales (e.g., 6 or 24 hr). Relative source contributions are expected to vary diurnally; for example, as the mixed layer rises during daytime hours. Thus, PMF model performance can be tested with respect to sample time resolution.
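Averaging hourly model output into longer sample durations can be sketched as follows. This is only an illustration under stated assumptions: the array layout (hours by species), the function name `block_average`, and the random placeholder data are all hypothetical, and the paper does not describe how its averaging was implemented.

```python
import numpy as np


def block_average(hourly, hours_per_sample):
    """Average an (n_hours, n_species) array over consecutive, non-overlapping blocks.

    An incomplete trailing block, if any, is dropped (an assumption of this sketch).
    """
    n_hours, n_species = hourly.shape
    n_blocks = n_hours // hours_per_sample
    trimmed = hourly[: n_blocks * hours_per_sample]
    return trimmed.reshape(n_blocks, hours_per_sample, n_species).mean(axis=1)


# Placeholder data: two days of hourly values for three species.
rng = np.random.default_rng(0)
hourly = rng.random((48, 3))

six_hr = block_average(hourly, 6)    # 8 samples
daily = block_average(hourly, 24)    # 2 samples
print(six_hr.shape, daily.shape)
```

Shorter averaging windows retain the diurnal variation in the rows of the resulting array, which is the extra information a factor model can exploit.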
The CMAQ model input consisted of emissions of pollutants: PM2.5 and PM with aerodynamic diameter less than 10 μm (PM10) (primary mass), volatile organic compounds, nitrogen oxides (NOx), sulfur dioxide (SO2), carbon monoxide (CO), and ammonia (NH3) compiled from the EPA’s 2002 National Emissions Inventory (NEI).¹³ SMOKE assigned primary source profiles containing IMPROVE species for point and area sources. It was determined that 101 source categories accounted for at least 95% of criteria pollutant emissions in the modeling domain. The 101 source categories were related to 43 source profiles selected from the literature.¹¹ Care was taken to ensure that the selected profiles represented the variability of source emissions composition.
The SMOKE code was modified by “tagging” primary PM2.5 mass associated with each source profile. The tagged PM2.5 contributions are referred to as T1, T2, …, T43 (see source descriptions in Table 1). Reported concentrations of each species include a primary component, calculated from T1–T43 and the corresponding PM2.5 source profiles, and secondary contributions in the cases of SO₄²⁻, NO₃⁻, and OM. Thus

$$C_{i,t} = C_{i,t}^{\mathrm{pri}} + C_{i,t}^{\mathrm{sec}} = \sum_{j=1}^{J} f_{i,j} T_{j,t} + C_{i,t}^{\mathrm{sec}} \qquad (1)$$

where $C_{i,t}$, $C_{i,t}^{\mathrm{pri}}$, and $C_{i,t}^{\mathrm{sec}}$ are the concentration of species i, its primary component, and its secondary component, respectively; and $T_{j,t}$ is the PM2.5 contribution of $T_j$ at sample period t. $f_{i,j}$ is the mass fraction of species i in source profile j. $C_{i,t}^{\mathrm{sec}}$ dominates SO₄²⁻ and NO₃⁻ and is significant for OM but is zero for EC, crustal, and trace element concentrations.

PM2.5 mass can be included in PMF analysis.⁹ In this experiment, PM2.5 mass concentration was calculated from the sum of primary (T1 + T2 + … + T43) and secondary (1.38 × [SO₄²⁻]sec + 1.29 × [NO₃⁻]sec + [OM]sec) components, assuming that all secondary SO₄²⁻ and NO₃⁻ were in the form of ammonium sulfate [(NH4)2SO4] and ammonium nitrate (NH4NO3), respectively. This assumption does not influence this analysis because NH₄⁺ was not included as a modeled species as it is not routinely measured in the IMPROVE network. Reconstructed aerosol mass according to an enhanced (with trace elements) IMPROVE formula (i.e., (NH4)2SO4 + NH4NO3 + OM + EC + crustal material + trace elements¹⁴) overestimates the PM2.5 mass by 3–5%. This bias reflects different forms of SO₄²⁻, NO₃⁻, and OM between primary and secondary origins as well as errors in the estimate of the crustal mass component.
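The species reconstruction of eq 1 and the PM2.5 mass calculation described above can be sketched numerically. Everything in the example is an assumption made for illustration: the two-source, four-species layout, the numbers, and the function names `species_concentration` and `pm25_mass` are hypothetical and not from the study. The molar masses are standard rounded values, used only to show where the 1.38 and 1.29 multipliers come from.

```python
import numpy as np

# Rounded molar masses (g/mol) showing the origin of the two multipliers.
M_SO4, M_NO3, M_NH4 = 96.06, 62.00, 18.04
amm_sulfate_ratio = (2 * M_NH4 + M_SO4) / M_SO4   # (NH4)2SO4 / SO4, about 1.38
amm_nitrate_ratio = (M_NH4 + M_NO3) / M_NO3       # NH4NO3 / NO3, about 1.29


def species_concentration(f, T, C_sec):
    """Eq 1: C[i, t] = sum_j f[i, j] * T[j, t] + C_sec[i, t]."""
    return f @ T + C_sec


def pm25_mass(T, so4_sec, no3_sec, om_sec):
    """Primary mass (sum of tagged sources) plus secondary mass as in the text."""
    primary = T.sum(axis=0)
    secondary = 1.38 * so4_sec + 1.29 * no3_sec + om_sec
    return primary + secondary


# Hypothetical inputs: species order is [SO4, NO3, OM, EC]; 2 sources; 3 periods.
f = np.array([[0.05, 0.01],
              [0.00, 0.02],
              [0.30, 0.60],
              [0.50, 0.10]])            # mass fraction of species i in profile j
T = np.array([[100.0, 200.0, 150.0],
              [50.0, 40.0, 60.0]])      # tagged primary PM2.5 (ng/m3)
C_sec = np.array([[4000.0, 5000.0, 3000.0],
                  [200.0, 100.0, 300.0],
                  [600.0, 700.0, 500.0],
                  [0.0, 0.0, 0.0]])     # EC has no secondary component

C = species_concentration(f, T, C_sec)
mass = pm25_mass(T, C_sec[0], C_sec[1], C_sec[2])
print(round(amm_sulfate_ratio, 2), round(amm_nitrate_ratio, 2))
print(C[3], mass)
```

In this layout the EC row of `C` is purely primary, mirroring the statement that the secondary term is zero for EC.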
Source Contributions
This study focuses on the summer simulation period from July 1 to September 30, 2002. According to the NEI, the most important sources of primary PM2.5 include CFPP (T08), OnR Diesel (T13), OffR Diesel (T17), Gasoline (T22), Paper Burn (T28), PV Dust (T36), UPV Dust (T40), and RWC (T41) (see Table 1 for definitions of source abbreviations). The secondary SO₄²⁻ (SS), secondary NO₃⁻ (SN), and secondary organic matter (SOM) that account for most PM2.5 mass have been presented as additional sources; TSS, TSN, and TSOM, respectively, denote their PM2.5 mass contributions. Secondary NH₄⁺ (not listed in Table 1) is mostly associated with SO₄²⁻ and NO₃⁻. Contributions of TSS and TSN do not correlate strongly with any primary sources ($R^2$ < 0.68), whereas the TSOM contributions at GRSM are correlated with T22 at $R^2$ = 0.79.

Lowenthal et al.¹¹ found reasonably good agreement between CMAQ-modeled SO₄²⁻ concentrations and those measured during corresponding sample periods by IMPROVE. Lower correlations between modeled and measured OM were likely due to uncertainty in the simulation of secondary organic aerosol formation. Despite this, modeled OM and EC concentrations were higher at BRIG than at GRSM (Figure 1), consistent with the fact that BRIG is closer to major urban areas with substantial traffic emissions. The synthetic dataset generated by the deterministic model is “perfect” in this application; its role is to represent how chemical compounds may undergo transport and chemical transformation in the atmosphere while retaining source identification.
Figure 1 identifies the top three major sources for each species and shows their fractional contributions to the average simulated concentrations. Industrial (T25) is the only important source of As, which is therefore a unique marker for industrial manufacturing emissions. Similar but less unique markers include Ca, Ti, Mo, Zr, and Se for CFPP (T08), Br for Industrial (T25), Ni and V for Refinery (T38) and OFPP (T31), Mn for Iron Furnace (T19) and CFPP (T08), and EC for Paper Burn (T28) and OnR Diesel (T13). vH2SO4 (T34) and vOC (T35) represent pure SO₄²⁻ and OM, respectively (i.e., $f_{i,j}$ = 0 for other species), but their contributions to SO₄²⁻ and OM are negligible compared with secondary sources. Hereafter, these two sources are ignored because PMF is not expected to distinguish them from the SS and SOM sources.

Table 1. List of sources included in the CMAQ simulation.

| Source | Profile Description | Short Name | Contribution at BRIG (ng/m³) | Contribution at GRSM (ng/m³) |
|---|---|---|---|---|
| T01 | Asphalt roofing production | Asphalt | 1.5 ± 2.1 | 0.2 ± 0.2 |
| T02 | Blast furnace fugitive - Geneva Steel Plant | Steel | 3.6 ± 6.9 | 7.6 ± 10.4 |
| T03 | Boiler no. 2 fuel oil fired | OF Boiler | 19.5 ± 20.0 | 3.6 ± 2.8 |
| T04 | CNG powered bus | CNG Bus | 1.2 ± 1.0 | 0.7 ± 0.3 |
| T05 | CARB agricultural dust | Ag Dust | 37.1 ± 56.6 | 55.5 ± 54.1 |
| T06 | CARB, EPA agricultural burn average | Ag Burn | 3.7 ± 5.1 | 5.5 ± 5.9 |
| T07 | Charcoal manufacturing | Charcoal | 0.2 ± 0.4 | 0.8 ± 1.1 |
| T08 | Coal-fired power plant - BRAVO, low OC, high Se, high SO4 | CFPP/Coal | 258.2 ± 252.4 | 336.1 ± 219.2 |
| T09 | Coal composition, 2% S | Coal Feedstock | 1.7 ± 2.4 | 5.6 ± 8.2 |
| T10 | Coke oven stacks - Geneva steel plant | Coke Oven | 0.5 ± 1.6 | 2.7 ± 3.8 |
| T11 | Coke plant | Coke Plant | 0.2 ± 0.4 | 1.7 ± 3.1 |
| T12 | Composite of dairy cattle soil profiles | Dairy Soil | 0.6 ± 1.2 | 4.5 ± 3.0 |
| T13 | On-road heavy-duty diesel vehicle - EPA | OnR Diesel | 249.7 ± 157.0 | 66.4 ± 32.4 |
| T14 | EPA Al production | Al Production | 0.6 ± 1.4 | 63.5 ± 102.4 |
| T15 | EPA and BRAVO cement (wet or dry) | Cement | 7.9 ± 11.6 | 9.8 ± 13.6 |
| T16 | EPA Kraft pulping | Kraft Pulping | 3.2 ± 5.4 | 18.1 ± 18.3 |
| T17 | Off-road heavy-duty diesel vehicle - EPA | OffR Diesel | 61.3 ± 40.6 | 36.6 ± 18.5 |
| T18 | EPA vegetative detritus | Veg Detritus | 0.4 ± 0.6 | 1.1 ± 1.7 |
| T19 | Ferromanganese furnace | Iron Furnace | 5.8 ± 9.3 | 5.6 ± 4.9 |
| T20 | Fiberglass composition from TMO | Fiberglass | 1.0 ± 1.3 | 3.0 ± 2.7 |
| T21 | Fresno area construction dust (freeway) | Con Dust | 38.5 ± 36.7 | 23.0 ± 12.1 |
| T22 | Gasoline vehicle | Gasoline | 196.0 ± 88.6 | 51.5 ± 23.6 |
| T23 | Glass furnace | Glass Furnace | 6.3 ± 11.7 | 4.8 ± 5.6 |
| T24 | Igneous rock composition | Ig Rock | 24.4 ± 21.7 | 78.9 ± 39.2 |
| T25 | Industrial manufacturing - average | Industrial | 26.0 ± 21.2 | 19.8 ± 13.0 |
| T26 | Lead smelter average | Lead | 0.0 ± 0.0 | 0.0 ± 0.0 |
| T27 | Limestone Imperial Valley | Limestone | 0.4 ± 0.7 | 1.3 ± 2.1 |
| T28 | Brick-shaped paper waste burning | Paper Burn | 85.8 ± 87.2 | 253.6 ± 127.9 |
| T29 | Municipal incinerator (Philadelphia) | Incineration | 3.7 ± 5.6 | 1.5 ± 0.9 |
| T30 | Natural gas combustion, EPA, GREER, Denver | NG | 15.6 ± 14.7 | 4.3 ± 2.2 |
| T31 | Oil-fired power plant EPA no. 11501–11509 | OFPP | 25.7 ± 19.5 | 6.0 ± 4.9 |
| T32 | Orchard heating smudge pots | Smudge Pots | 0.0 ± 0.1 | 0.2 ± 0.2 |
| T33 | Particleboard dryer/direct-fired | Particleboard | 0.5 ± 1.0 | 2.5 ± 3.6 |
| T34 | Pure H2SO4 ᵃ | vH2SO4 | 0.1 ± 0.2 | 0.1 ± 0.1 |
| T35 | Pure OC to simulate VOC evaporation ᵇ | vOC | 0.1 ± 0.1 | 0.2 ± 0.2 |
| T36 | Paved road dust - BRAVO, CARB, CRPAQS, MZ, NFRAQS | PV Dust | 91.1 ± 65.8 | 87.7 ± 38.2 |
| T37 | Sawmill - EPA | Sawmill | 0.8 ± 1.4 | 4.9 ± 5.6 |
| T38 | Stack emission, Texas petroleum refinery | Refinery | 3.7 ± 7.0 | 0.7 ± 0.8 |
| T39 | TSS12 industrial construction | Industrial Con | 22.9 ± 18.5 | 36.6 ± 22.3 |
| T40 | Unpaved road dust - CRPAQS, BRAVO, CARB | UPV Dust | 70.1 ± 57.8 | 182.2 ± 100.3 |
| T41 | Residential wood combustion | RWC | 332.4 ± 437.5 | 69.1 ± 99.5 |
| T42 | Industrial wood combustion - EPA | IWC | 1.8 ± 3.1 | 26.4 ± 27.9 |
| T43 | Secondary Al production - EPA | Al2 Production | 0.6 ± 0.8 | 38.9 ± 62.4 |
| TSS | Secondary SO₄²⁻ ᶜ | SS | 4444.4 ± 4245.4 | 8126.3 ± 4167.5 |
| TSOM | Secondary organic matter ᶜ | SOM | 664.6 ± 541.7 | 676.4 ± 473.9 |
| TSN | Secondary NO₃⁻ ᶜ | SN | 284.3 ± 594.0 | 8.3 ± 46.5 |

Notes: Average PM2.5 contributions (ng/m³) at BRIG and GRSM are presented. ᵃMerged with SS factor; ᵇMerged with SOM factor; ᶜSecondary species concentrations were calculated by CMAQ.

PMF Receptor Modeling and Evaluation
PMF attempts to identify factors that explain variability in PM2.5 chemical composition. For each sample period,

$$C_{i,t} = \sum_{k=1}^{K} F_{i,k} G_{k,t} \qquad (2)$$

where $F_{i,k}$ is the mass fraction of species i in factor k, and $G_{k,t}$ is the PM2.5 mass contribution associated with the factor. Typically, K (the number of factors) is much less than J (the assumed number of sources). PMF uses an iterative algorithm to determine nonnegative $F_{i,k}$ and $G_{k,t}$, for a specified K, which minimize

$$\chi^2 = \frac{1}{N \times I} \sum_{t=1}^{N} \sum_{i=1}^{I} \left( \frac{C_{i,t} - \sum_{k=1}^{K} F_{i,k} G_{k,t}}{\sigma_{i,t}} \right)^2 \qquad (3)$$

where $\sigma_{i,t}$ is a weighting coefficient that is ideally the actual measurement uncertainty for species i, and I and N signify the total number of species and sample periods, respectively. The nonnegative algorithm has been described by Paatero⁵ and Paatero et al.¹⁵ This study uses the PMF2 software provided by P. Paatero at University of Helsinki, Finland. PMF2 allows for factor rotation (linear transformation) by adjusting its FPEAK, Fkey, and Gkey parameters. In robust mode, PMF2 adapts the Huber influence function¹⁶ that iteratively reweights the input data to reduce the influence of outliers. EPA PMF v1.1 was also obtained online. In most cases, the two software programs generated similar results for the same input data, although FPEAK is not available in EPA PMF version 1.1 and PMF2 does not calculate bootstrapping uncertainties automatically.

The quality of the PMF fit may also be assessed from the lowest value of $R_i^2$ (i.e., $R_{\min}^2$), which for each species i is defined as the squared Pearson’s correlation between $C_{i,t}$ and $\sum_{k=1}^{K} F_{i,k} G_{k,t}$ over all sample periods. If $R_{\min}^2$ is 1, all $R_i^2$ are 1, suggesting a perfect correlation.
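The two fit diagnostics defined above (the scaled χ² of eq 3 and the minimum squared correlation) are straightforward to compute once factor matrices are available. The sketch below is hedged accordingly: `chi_square` and `r2_min` follow the definitions in the text, but `weighted_nmf` is a stand-in solver based on weighted multiplicative updates, which is not the algorithm implemented in PMF2 or EPA PMF and has no robust mode or rotational controls. All data and names are hypothetical.

```python
import numpy as np


def chi_square(C, F, G, sigma):
    """Eq 3: mean of squared, sigma-scaled residuals over all species and periods."""
    resid = (C - F @ G) / sigma
    return float(np.mean(resid ** 2))


def r2_min(C, F, G):
    """Lowest squared Pearson correlation between input and fitted species."""
    fitted = F @ G
    r2 = [np.corrcoef(C[i], fitted[i])[0, 1] ** 2 for i in range(C.shape[0])]
    return float(min(r2))


def weighted_nmf(C, sigma, K, n_iter=500, seed=0, eps=1e-12):
    """Stand-in nonnegative solver (weighted multiplicative updates).

    Assumption of this sketch: it only illustrates minimizing the weighted
    residual under nonnegativity; it is NOT the PMF2 algorithm.
    """
    rng = np.random.default_rng(seed)
    n_species, n_periods = C.shape
    W = 1.0 / sigma ** 2
    F = rng.random((n_species, K)) + 0.1
    G = rng.random((K, n_periods)) + 0.1
    for _ in range(n_iter):
        F *= ((W * C) @ G.T) / ((W * (F @ G)) @ G.T + eps)
        G *= (F.T @ (W * C)) / (F.T @ (W * (F @ G)) + eps)
    return F, G


# Hypothetical synthetic data: 5 species, 40 sample periods, 2 true factors.
rng = np.random.default_rng(1)
F_true = rng.random((5, 2))
G_true = rng.random((2, 40)) * 10.0
C = F_true @ G_true
sigma = np.broadcast_to(0.1 * C.mean(axis=1, keepdims=True), C.shape).copy()

F_start, G_start = weighted_nmf(C, sigma, K=2, n_iter=1)
F_fit, G_fit = weighted_nmf(C, sigma, K=2, n_iter=500)
chi_start = chi_square(C, F_start, G_start, sigma)
chi_fit = chi_square(C, F_fit, G_fit, sigma)
print(chi_start, chi_fit, r2_min(C, F_fit, G_fit))
```

Because χ² averages over all species and periods while the minimum R² reports the worst-fitted species, the two diagnostics can disagree; a low χ² does not by itself guarantee that every species is reproduced well.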
[Figure 1 graphic not reproduced. Panels: (a) BRIG, (b) GRSM. Axes: Species (PM2.5, SO₄²⁻, NO₃⁻, Al, Si, Ca, Fe, K, Ti, As, Br, Cl, Cr, Cu, Mn, Mo, Ni, P, Pb, Rb, Se, Sr, V, Zn, Zr, EC, OM); Fractional Contribution by Source (0 to 1); Mean Concentration (ng/m³, 0.1 to 10000). The 1st, 2nd, and 3rd largest sources are labeled for each species.]
Figure 1. Average concentrations of simulated species (dashed line) and fractional contributions from the three most significant sources to each species (black, white, and red bars): (a) BRIG, and (b) GRSM.
If true source contributions were known, as they are in this study (Table 1), an “ideal” PMF solution will successfully separate the contributions into different factors. Thus, for each species i from factor k,

$$F_{i,k} G_{k,t} = \sum_{j=1}^{J} \delta_{k,j} f_{i,j} T_{j,t} \qquad (4)$$

where $\delta_{k,j}$ is 1 or 0, indicating that the source j (primary or secondary) is entirely included in or excluded from the factor k. A factor can contain one or multiple sources, but for mass conservation, each source can only be in one factor; that is, for each j only one $\delta_{k,j}$ can be 1. Equation 4 cannot be met exactly because of the approximate nature of factor analysis; $d_{i,k,t}$ denotes the difference between the left- and right-hand sides of eq 4, and

$$D^2 = \frac{1}{N \times I} \sum_{t=1}^{N} \sum_{i=1}^{I} \sum_{k=1}^{K} \frac{d_{i,k,t}^2}{\sigma_{i,t}^2} = \sum_{k=1}^{K} \left( \frac{1}{N \times I} \sum_{t=1}^{N} \sum_{i=1}^{I} \frac{d_{i,k,t}^2}{\sigma_{i,t}^2} \right) = \sum_{k=1}^{K} D_k^2 \qquad (5)$$

Finding $\delta_{k,j}$ values that minimize $D^2$ is a special case of constrained regression and was achieved using commercial software (i.e., MATLAB; see supplemental data published at http://secure.awma.org/onlinelibrary/samples/10.3155-1047-3289.60.1.43_supplmaterial.pdf for details). In other words, $D^2$ is a measure of the root mean square (RMS) difference between factors and source combinations that best explain the factors. $D_k^2$ is the RMS difference specific to factor k. More “correct” modeling parameters (e.g., factor number, rotation, and data uncertainty) are those leading to smaller $D^2$ (ideally $D^2$ = 0), because the purpose of receptor modeling is to resolve individual sources or unique combinations of sources. It should be noted that $D^2$ and $D_k^2$ cannot be calculated in real-world receptor modeling studies because the true sources are unknown. In this study, they provide a means of evaluating, overall (i.e., considering all factors and species), how well a PMF solution represents primary and/or secondary sources for these synthetic data. $D^2$, $\chi^2$, and $R_{\min}^2$ of various PMF solutions are calculated and their dependence on number of factors, number of samples (i.e., sample duration), factor rotation, and selection of weighting coefficient, $\sigma_{i,t}$, are investigated.
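The difference measure of eq 5 can be sketched for a toy problem. The code below is an illustration under explicit assumptions: it assigns every source to exactly one factor, it finds the best assignment by exhaustive search (feasible only for a handful of sources, and not the MATLAB constrained regression used in the study), and the names `d2_by_factor` and `best_assignment` as well as all numbers are hypothetical.

```python
import itertools
import numpy as np


def d2_by_factor(F, G, f, T, sigma, assign):
    """Return the per-factor terms D_k^2 of eq 5 for one source-to-factor assignment.

    F: (I, K) factor profiles      G: (K, N) factor contributions
    f: (I, J) source profiles      T: (J, N) source contributions
    sigma: (I, N) weights          assign[j]: index of the factor holding source j
    """
    n_species, n_factors = F.shape
    n_periods = G.shape[1]
    out = np.zeros(n_factors)
    for k in range(n_factors):
        members = [j for j, kk in enumerate(assign) if kk == k]
        factor_part = np.outer(F[:, k], G[k])        # F[i, k] * G[k, t]
        source_part = f[:, members] @ T[members]     # sum over sources in factor k
        d = factor_part - source_part
        out[k] = np.sum((d / sigma) ** 2) / (n_periods * n_species)
    return out


def best_assignment(F, G, f, T, sigma):
    """Exhaustive search over all K**J assignments (toy-sized problems only)."""
    n_sources, n_factors = f.shape[1], F.shape[1]
    candidates = itertools.product(range(n_factors), repeat=n_sources)
    best = min(candidates, key=lambda a: d2_by_factor(F, G, f, T, sigma, a).sum())
    return best, d2_by_factor(F, G, f, T, sigma, best)


# Hypothetical case: 3 species, 3 sources, 4 periods, 2 factors.
f = np.array([[0.5, 0.1, 0.2],
              [0.3, 0.2, 0.7],
              [0.2, 0.7, 0.1]])
T0 = np.array([1.0, 2.0, 3.0, 4.0])
T1 = np.array([4.0, 1.0, 3.0, 2.0])
T = np.vstack([T0, T1, 0.5 * T0])      # source 2 co-varies with source 0

# Factors built so that factor 0 = sources 0 and 2 combined, factor 1 = source 1.
F = np.column_stack([f[:, 0] + 0.5 * f[:, 2], f[:, 1]])
G = np.vstack([T0, T1])
sigma = np.ones((3, 4))

assignment, d2_k = best_assignment(F, G, f, T, sigma)
print(assignment, d2_k, d2_k.sum())
```

In this constructed case two sources that vary together in time end up in the same factor, which is the kind of source combination the measure is designed to recognize.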
RESULTS AND DISCUSSION
The PMF analysis used all 27 species shown in Figure 1, including PM2.5 mass, for 24-hr average samples (N = 92). PMF requires a weighting coefficient for each species (if for no other reason than to scale species concentrations of widely different magnitudes), although analytical and sampling uncertainties are not relevant to the synthetic data. Considering that all species were of equal importance, $\sigma_{i,t}$ was set to 10% of the average concentration of species i. Thus, the weighting coefficient, denoted as $\sigma_i^0$, is constant for all $C_{i,t}$. The PMF “robust mode” was first used without any rotation. Figure 2 shows $R_{\min}^2$, $\chi^2$, and $D^2$ as functions of the number of factors used in the PMF model. In general, $\chi^2$ decreased exponentially with K. $R_{\min}^2$ was less than 0.95 until K was equal to 7 at BRIG and GRSM. $D^2$ decreased initially with K, reached a minimum at K = 4 and 6 for BRIG and GRSM, respectively, and gradually increased thereafter. It appears that after a certain point, the additional factor(s) neither match a true source (or source combination) nor improve explanation of the existing factors with known sources.

Because at least seven factors are needed to explain more than 95% of the variability, eight- and seven-factor solutions for BRIG (denoted BRIG_8F) and GRSM (GRSM_7F), respectively, that have relatively low $D^2$ serve as the base-case PMF solutions for the two sites. Figure 3, a and b, shows these two solutions in terms of contribution to PM2.5 mass by factor. Sources listed under each factor are those with $\delta_{k,j}$ equal to 1 (eq 4), and they are ranked according to their contributions to PM2.5 (high to low, bottom to top). The factors are named after the largest contributing source within each factor.

Solutions at both sites contain two dominating factors characterized by SS and SOM. The SS factors (F1 in Figure 3, a and b) underestimate the SS contribution (i.e., SO₄²⁻ in TSS) by 12–22%, indicating that some SS is distributed to other factors representing primary sources and/or SOM. The SOM factors generally overestimate TSOM, and some of the excess mass is explained by OffR Diesel (T17) at BRIG (Figure 3a) or by Paper Burn (T28) at GRSM (Figure 3b). OffR Diesel and Paper Burn are important contributors of primary OM. Summing primary OM in T17 or T28 with TSOM exceeds OM in the SOM factor but only by 8–12%. It is possible that PMF also distributed some SOM to factors representing primary sources, SS, and/or SN. Factors at BRIG include RWC, SN, PV Dust, CFPP, Gasoline, and OnR Diesel. The PV Dust factor contains several geological sources. At GRSM, the base-case factors are identified as (1) CFPP, (2) OffR Diesel, (3) UPV Dust, (4) PV Dust, and (5) Gasoline in addition to SS and SOM. Although named after PV Dust, this factor contains small contributions from nongeological sources. For both sites, a few primary sources were assigned to the SS factor to explain crustal and trace elements in the factor (Figure 3). OM apportioned to the SS factors is relatively low (8%/1% of total OM at BRIG/GRSM), although it cannot be explained by only the primary sources.
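The weighting scheme and the base-case selection rule described in this section can be expressed compactly. This is a sketch only: the function names `constant_weights` and `pick_base_case` are hypothetical, the concentration array is a placeholder, and the diagnostic values in the example dictionary are invented to mimic the qualitative pattern in the text (they are not read from Figure 2).

```python
import numpy as np


def constant_weights(C, fraction=0.10):
    """sigma_i^0: a fixed fraction of each species' mean, repeated for every period.

    C has shape (n_species, n_periods).
    """
    sigma0 = fraction * C.mean(axis=1, keepdims=True)
    return np.broadcast_to(sigma0, C.shape).copy()


def pick_base_case(diagnostics, r2_threshold=0.95):
    """Return the K with the lowest D^2 among solutions whose R2_min exceeds the threshold.

    diagnostics maps K -> (r2_min, d2). Returns None if no solution is adequate.
    """
    adequate = {K: d2 for K, (r2, d2) in diagnostics.items() if r2 > r2_threshold}
    if not adequate:
        return None
    return min(adequate, key=adequate.get)


# Placeholder concentrations: 2 species with very different magnitudes, 4 periods.
C = np.array([[4000.0, 5000.0, 3000.0, 4000.0],
              [2.0, 4.0, 6.0, 8.0]])
sigma = constant_weights(C)

# Invented diagnostics (NOT values from the paper): K -> (R2_min, D2).
example = {4: (0.80, 1.0), 6: (0.92, 0.9), 7: (0.96, 1.4), 8: (0.97, 1.2), 9: (0.98, 1.5)}
base_K = pick_base_case(example)
print(sigma[:, 0], base_K)
```

Scaling each species by its own mean keeps trace elements from being swamped by sulfate in the objective, while the selection rule only has meaning in a synthetic study, since the difference measure requires known sources.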
Effect of Sample Duration and Number of Data
Points
Although it is currently not possible for the IMPROVE
network, using 6- and 1-hr average data substantially in-
creases the number of samples for PMF analysis. Higher
temporal resolution can be expected to result in better
receptor modeling results by including diurnal informa-
tion in the model. For BRIG and GRSM, 6- and 1-hr
simulated data led to 367 and 2208 samples for PMF
modeling, respectively. The same uncertainty scheme
used previously was assumed.
One- to 26-factor solutions were calculated for 6- and
1-hr datasets at BRIG and GRSM. The dependence of Rmin
?2, and D2on the number of factors was similar to that
shown in Figure 2, and the best solutions were picked
following the same criteria (i.e., with the lowest D2for
Rmin
? 0.95). Table 2 compares the solutions selected for
sample durations of 24, 6, and 1 hr in terms of resolved
2
,
2
Chen et al.
Volume 60 January 2010
Journal of the Air & Waste Management Association 47
factors and the respective dominant sources (see supple-
mental data for a complete source attribution in these
solutions). Increasing temporal resolution results in more
meaningful factors. At BRIG, base K is 8, 9, and 11 for the
24-, 6-, and 1-hr datasets, respectively. The overall factor-
source relationships also improve, as evidenced by lower
$D^2$ with decreasing sample duration. This improvement is
less apparent for GRSM. Because BRIG is closer to primary
emitters than GRSM, diurnal variation is more likely to
help separate minor sources of PM2.5 at BRIG than at
GRSM. According to the leading source, additional factors
derived from BRIG 6-hr data include Paper Burn and OffR
Diesel. For 1-hr data, the additional factors are Paper
Burn, OFPP, Con Dust, and Ig Rock. The Gasoline factor
(originally in the BRIG 24-hr solution) is merged into the
OnR Diesel factor in 6- and 1-hr solutions. For GRSM, the
additional factors from the 1-hr data include Paper Burn,
OFPP, OnR Diesel, and Iron Furnace. Most of these factors
represent complex mixtures of sources and have relatively
small contributions to PM2.5.
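The selection rule used in this section (keep only solutions whose fit is adequate, then take the one with the lowest $D^2$) can be sketched in a few lines of Python. This is an illustrative sketch only: the `Solution` record, the `pick_solution` helper, and the candidate values are hypothetical and are not taken from the paper's tables, and the strict comparison against 0.95 is an assumption about how the threshold is applied.

```python
from typing import NamedTuple, Optional, Sequence


class Solution(NamedTuple):
    """One PMF run (hypothetical record, not a PMF2 output format)."""
    k: int          # number of factors
    r2_min: float   # lowest species R^2 of the fit
    d2: float       # factor-source difference metric D^2


def pick_solution(solutions: Sequence[Solution],
                  r2_threshold: float = 0.95) -> Optional[Solution]:
    """Return the adequately fitted solution with the lowest D^2, or None."""
    adequate = [s for s in solutions if s.r2_min > r2_threshold]
    if not adequate:
        return None
    return min(adequate, key=lambda s: s.d2)


# Hypothetical candidates for illustration only.
candidates = [
    Solution(k=6, r2_min=0.90, d2=9.0),   # lowest D^2 but fit is inadequate
    Solution(k=7, r2_min=0.96, d2=14.0),
    Solution(k=8, r2_min=0.98, d2=12.0),
    Solution(k=9, r2_min=0.99, d2=13.5),
]
best = pick_solution(candidates)
print(best)
```

Note that $D^2$ requires known source contributions, so a rule of this kind applies to synthetic data; with ambient measurements only the fit criterion is available.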
$D_k^2$ identifies factors that are less well fitted/explained by sources (Table 2).
The CFPP factor contribution derived from BRIG and GRSM 24-hr data does not agree well with
the actual CFPP/coal source (T08), thus accounting for a sizable fraction of $D^2$. The use of
1-hr data appears to improve this situation. Figure 4 shows that $R^2$ between Se
(a CFPP marker) from the BRIG CFPP factor and from T08 increases from 0.91 for 24-hr data
to 0.99 for 1-hr data. This does not translate to a better estimate of the PM2.5
contribution by the CFPP factor toward T08 because either PM2.5 mass or Se is only 1 of the
27 species considered in eq 5. In fact, BRIG 1-hr data led to more accurate estimates
of TSS, TSN, and T41 (RWC) by corresponding factors, but not so for other sources (Table 2).
As to GRSM, the 1-hr data factors, except PV Dust, were explained better by the
sources than were the 24-hr data factors (i.e., lower $D_k^2$), and they estimated TSS,
T13 (OnR Diesel), and T17 (OffR Diesel) much more accurately.
Factor Rotation
PMF2 allows factor rotation by adjusting the FPEAK parameter. According to the PMF2 user’s
manual, a positive FPEAK forces species loadings to concentrate into single factors, thus
creating more zero and large values in factor profiles ($F_{i,k}$). At the same time, it
spreads individual sample contributions more evenly among factors, thus creating more
intermediate values of $G_{k,t}$. Paatero et al.17 provided a graphical interpretation for how
this works. Such adjustments could lead to more accurate solutions than for the neutral
case (FPEAK = 0) if one or more species are known to originate predominately from a single
source. For example, in this study, TSS accounted for 95 and 97% of SO$_4^{2-}$ and
Industrial (T25) contributed 97 and 95% of As at BRIG and GRSM, respectively. However, the percent
[Figure 2 panels (a) and (b): $D^2$, $\chi^2$ (logarithmic axis), and $R^2_{\min}$ plotted against the number of factors (K), 1F to 26F, for BRIG and GRSM.]
Figure 2. $R^2_{\min}$, $\chi^2$, and $D^2$ as functions of number of factors (K) selected for PMF analysis of BRIG and GRSM 24-hr data. The dark arrows indicate the selected number of factors.
contributions from the corresponding factors were lower in the 24-hr base solutions:
83 and 76% for SO$_4^{2-}$ and 70 and 66% for As at BRIG and GRSM, respectively (note that
the Industrial source was associated with the OnR Diesel factor at BRIG and the OffR Diesel
factor at GRSM [Figure 3]). A negative FPEAK produces the opposite effect, creating more
uniform species loadings in the profiles but more focused contributions among the factors.
Negative FPEAK could be useful if some sample contribution(s) are known to come from only
one source. For example, other
[Figure 3 panels (a) and (b): y-axis, Factor & Source Contribution to PM2.5 Mass (ng/m3); x-axis, factors F1 to F8 for BRIG and F1 to F7 for GRSM, with the associated source names listed under each factor.]
Figure 3. Contribution to PM2.5 mass from factors in the PMF base solution (crosses) and from the
sources associated with the factors (stacked bars). Factors are arranged from left to right according to their
contribution to PM2.5 mass. Sources are shown alternatively in black and gray corresponding to source
names listed under each factor. Sources are ranked from bottom to top in order of decreasing PM2.5 mass
contribution: (a) BRIG, and (b) GRSM. Contributions of the sources are also presented in Table 1.
sources could have been inactive or not upwind during a
particular sample period.
Because individual factors cannot be isolated during a FPEAK rotation, the rotation may
improve source associations with one factor but deteriorate them for others. Factor
rotations usually increase $\chi^2$ from the neutral case, although the PMF2 user’s manual
suggests avoiding drastically increasing $\chi^2$ when using FPEAK. Figure 5 shows $\chi^2$
and $D^2$ as functions of FPEAK adjustment for the BRIG_8F and GRSM_7F base solutions. In
both cases, there are no appreciable changes in $\chi^2$ for FPEAK between −0.1 and 0.1.
$D^2$ decreases as FPEAK increases from zero, reaching a minimum at FPEAK of approximately
0.03 and 0.04 for BRIG and GRSM, respectively, where factor solutions most closely
correspond to the actual sources. An even larger FPEAK provides little additional benefit
in lowering $\chi^2$ and/or $D^2$. Negative FPEAK tends to rapidly increase $D^2$.
As compared with the neutral (FPEAK = 0) solution, BRIG_8F with FPEAK = 0.03 and GRSM_7F
with FPEAK = 0.04 yielded a more representative SS factor, with $D^2_{SS}$ decreasing from
0.48 to 0.27 (BRIG) and from 1.97 to 0.43 (GRSM). The PM2.5 mass contribution from the
rotated SS factor was within 2% of the actual value (i.e., TSS), and the
Table 2. Summary of selected PMF solutions.

                         BRIG                                   GRSM
Time resolution          24-hr   6-hr   1-hr   24-hr            24-hr   6-hr   1-hr   24-hr
                                               FPEAK = 0.03                           FPEAK = 0.04
Number of samples        92      367    2208   92               92      367    2208   92
Number of factors (K)    8       9      11     8                7       10     11     7
$R^2_{\min}$             0.98    0.98   0.99   0.98             0.95    0.99   0.98   0.95
$\chi^2$                 0.34    0.6    0.31   0.34             0.47    0.30   0.26   0.47
$D^2$                    15.2    9.1    6.5    7.8              11.2    12.3   10.9   6.5

Per-factor $D_k^2$ entries (values in parentheses are explained in the Notes) are listed for the rows
SS, SN, SOM, CFPP/Coal, RWC, OnR Diesel, PV Dust, UPV Dust, Paper Burn, OffR Diesel, Gasoline,
OFPP, Con Dust, Ig Rock, and Iron Furnace:
0.48 (4.03/0.91)
0.50 (0.58/1.90)
0.55 (0.76/1.05)
5.66 (0.30/1.14)
0.52 (0.68/1.83)
2.06 (0.14/0.47)
3.89 (0.39/1.13)
0.43 (4.36/0.97)
0.82 (0.37/1.18)
0.24 (1.05/1.58)
0.98 (0.16/0.63)
0.71 (0.51/1.33)
1.01 (0.30/0.67)
1.91 (0.16/1.08)
0.17 (4.30/0.97)
0.35 (0.38/1.34)
0.13 (1.00/1.50)
0.71 (0.17/0.66)
0.34 (0.36/1.01)
0.80 (0.26/0.59)
0.78 (0.25/1.25)
0.27 (4.38/0.98)
0.22 (0.36/1.26)
0.72 (0.96/1.11)
1.71 (0.21/0.73)
0.93 (0.63/1.40)
1.65 (0.12/0.35)
1.87 (0.26/1.41)
1.97 (6.48/0.79) 0.77 (6.84/0.84) 0.76 (7.48/0.92)0.43 (8.14/1.00)
0.92 (1.39/1.28)
3.68 (0.68/2.01)
0.51 (1.14/1.64)
4.86 (0.36/1.08)
1.09 (0.25/1.06)
0.10 (1.12/1.66)
3.33 (0.17/0.51)
0.93 (1.14/1.04)
1.36 (0.42/1.24)
1.14 (0.17/1.18)
1.18 (0.18/0.68)
0.46 (0.37/1.56)
0.47 (0.48/1.36)
1.12 (0.05/0.95)
1.09 (0.03/0.30)
0.84 (0.26/7.25)
0.9 (0.54/1.6)
1.11 (0.58/2.85)
0.81 (0.23/1.28)
0.63 (0.72/3.74)
0.54 (0.52/1.33)
1.19 (0.12/2.17)
1.17 (0.04/0.38)
0.82 (0.22/0.65)
1.58 (0.24/1.01)
1.26 (0.06/0.34)
1.77 (0.05/0.35)
1.29 (0.10/0.59)
0.47 (0.11/0.95) 1.45 (0.66/7.92)
1.21 (0.05/0.58)
0.52 (0.18/2.12)
0.81 (0.05/0.53)1.51 (0.17/0.67)
0.56 (0.10/2.76)
0.60 (0.08/1.06)
0.76 (0.03/0.52)
0.77 (0.15/8.19) 0.41 (0.06/10.12)
Notes: Each factor is identified by the source with the highest contribution to PM2.5. $D^2$ is broken down for each factor. The values in parentheses are the PM2.5
mass contribution from the factor in μg/m3 and the ratio of PM2.5 mass contributed by the factor and the sum of the true PM2.5 mass contributions from all of
the sources assigned to the factor.
[Figure 4 panels: x-axis, Coal (T08) Contribution to Se (ng/m3); y-axis, Coal Factor Contribution to Se (ng/m3). Linear fits: (a) y = 0.669x + 0.008, R2 = 0.907; (b) y = 0.967x + 0.160, R2 = 0.988; (c) y = 0.984x + 0.119, R2 = 0.991.]
Figure 4. Comparison between the true coal (T08) contribution to Se and the contribution of the coal factor to Se at BRIG for varying sample
duration: (a) 24 hr, (b) 6 hr, and (c) 1 hr.
factor contributed 91 and 96% of SO$_4^{2-}$ at BRIG and GRSM, respectively. OM content in
the SS factors was reduced noticeably (to ~6%/0% of total OM at BRIG/GRSM) by the rotation.
In addition, contributions of 74 and 81% of As at BRIG and GRSM, respectively, were
apportioned to a single factor containing the Industrial source (T25) (see supplemental
data for the source attribution). At BRIG, the SN, CFPP, and PV Dust factors after
rotation were explained better by the sources, leading to a much lower $D^2$ value (Table 2).
The rotation merged a gasoline factor/source at BRIG into the SOM factor, but a new factor
indicating OffR Diesel was created. The rotation did not change any GRSM factor assignments
but improved the interpretability for five of the seven factors in terms of the sources and
decreased $D^2$ from 11.2 to 6.5.
Effect of Data Uncertainty
The PMF fitting process cannot be perfect, even with simulated data; that is, there is
always a difference between $C_{i,t}$ and $\sum_{k=1}^{K} F_{i,k} G_{k,t}$. If $\sigma_{i,t}$
approximates this difference, $\chi^2$ would be approximately 1 (see eq 3). The use of
$\sigma_i^0$ for all $\sigma_{i,t}$ likely overestimates the difference, so that $\chi^2$
is less than 1 in Table 2. Real-world measured concentrations deviate further from the
simulated data for several reasons, such as temporal variability of the source profiles,
inaccuracies in the CTM, and sampling and analytical uncertainties. This deviation in most
cases overwhelms the difference between the model-simulated and PMF-fitted concentrations
(e.g., $\sigma \gg \sigma_i^0$).
Because the simulated concentrations contain neither sampling nor analytical uncertainty,
there is no ideal approach for assigning a statistically meaningful $\sigma_{i,t}$ to
them. To evaluate PMF under more realistic conditions, the simulated 24-hr data were
perturbed to $C'_{i,t}$.

$$C'_{i,t} = N_r\left(C_{i,t},\ \sqrt{(m \times \sigma_i^0)^2 + (p \times C_{i,t})^2}\right) \qquad (6)$$

where the function $N_r$ generated random-normally distributed concentrations around
$C_{i,t}$ with a standard deviation of $\sqrt{(m \times \sigma_i^0)^2 + (p \times C_{i,t})^2}$,
where m and p were multipliers to be selected. The standard deviation that represents the
“true uncertainty” of each randomly perturbed concentration is a combination of a constant
and a fraction of the concentration. This perturbation did not change the average
concentration of any species i or $\sigma_i^0$ (i.e., 10% of the average) and is not
expected to add source information to the data. By using the same weighting coefficient
($\sigma_i^0$) for PMF modeling, the effect of uncertainty on BRIG_8F and GRSM_7F solutions
was examined. Table 3 compares four PMF solutions with different values of m and p. Random
perturbation generally degraded the PMF fit, as indicated by lower $R^2_{\min}$ and larger
$\chi^2$, although $R^2_{\min}$ remained more than 0.9 and $\chi^2$ less than 2 for all of
the solutions. In the case of m = 1 and p = 0, a $\chi^2$ of approximately 1 is consistent
with the proper choice of $\sigma_{i,t}$ and number of factors (K) for the dataset.
Using data with uncertainties did not necessarily reduce the interpretability of receptor
model factors derived from them. In PMF results from 15 perturbed BRIG/GRSM datasets with
m = 0.5 and p = 0, PM2.5 contributions of the top three factors did not differ appreciably
from those of the base cases (m = 0, p = 0) (Table 3). The average $D^2$ of the perturbed
cases was the same or even lower than those of the base cases, although the standard
deviations of $D^2$ and factor-specific $D_k^2$ appear to increase with m and p. It
[Figure 5 panels (a) and (b): $\chi^2$ (left axis) and $D^2$ (right axis) plotted against FPEAK from −0.3 to 0.5.]
Figure 5. $\chi^2$ and $D^2$ as functions of FPEAK in PMF modeling of (a) BRIG and (b) GRSM (24-hr sample
duration) with eight and seven factors, respectively. Arrows indicate the optimal FPEAK.
is possible that some (but not every) perturbation triggers
a more appropriate rotation under PMF analysis. However,
larger uncertainties (i.e., more extensively perturbed
data) could produce more variability in the PMF results
even if accurate $\sigma_{i,t}$ were chosen to account for the
uncertainties in the input data.
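The perturbation in eq 6 can be sketched with the Python standard library. This is a minimal illustration, not the authors' code: the helper names `base_sigma` and `perturb`, the nested-list layout (`conc[i][t]` for species i and sample t), and the example concentrations are all assumptions. The sketch also does nothing about negative draws, because the text does not say how those were handled.

```python
import math
import random


def base_sigma(conc):
    """sigma_i^0 for each species: 10% of that species' average concentration."""
    return [0.1 * sum(row) / len(row) for row in conc]


def perturb(conc, sigma0, m, p, rng):
    """Draw C'_{i,t} ~ Normal(C_{i,t}, sqrt((m*sigma_i^0)^2 + (p*C_{i,t})^2))."""
    perturbed = []
    for i, row in enumerate(conc):
        new_row = []
        for c in row:
            sd = math.sqrt((m * sigma0[i]) ** 2 + (p * c) ** 2)
            new_row.append(rng.gauss(c, sd))  # negative draws are left as-is
        perturbed.append(new_row)
    return perturbed


# Hypothetical concentrations: 2 species x 3 samples.
conc = [[2.0, 4.0, 6.0], [0.5, 0.1, 0.3]]
sigma0 = base_sigma(conc)
noisy = perturb(conc, sigma0, m=0.5, p=0.05, rng=random.Random(0))
print(sigma0)
print(noisy)
```

With m = 0 and p = 0 the standard deviation is zero and the data are returned unchanged, which corresponds to the unperturbed base case.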
In practice, analytical uncertainty often serves as the first guess of $\sigma_{i,t}$.
Chow et al.18 and Hyslop and White19 defined the analytical uncertainty as

$$\sigma^a_{i,t} = \sqrt{\mathrm{MDL}_i^2 + (\mathrm{CV}_i \times C_{i,t})^2} \qquad (7)$$

where MDL is the minimum detection limit and CV is the coefficient of variation derived
from replicate analysis of species i. Equation 7 is a special case of eq 6, with
$m \times \sigma_i^0$ being MDL$_i$ and p being CV$_i$. Discounting uncertainties from other
origins, $\sigma^a_{i,t}$ often underestimates the actual uncertainty of a species
concentration. In these situations, PMF may be forced to fit the species by assigning it
to a single factor although it is emitted by multiple sources (theoretically, by doing
this PMF can reduce $\chi^2$ associated with that species to zero). This is a common
problem in PMF receptor modeling. Conversely, if $\sigma^a_{i,t}$ overestimates the actual
uncertainty, PMF may ignore the species as well as sources for which that species is a
“marker.”
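As a small worked illustration of eq 7 and its relation to eq 6, the sketch below evaluates both expressions for one species. The function names and the numerical values of MDL, CV, and concentration are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def analytical_uncertainty(mdl, cv, conc):
    """Eq 7: sqrt(MDL^2 + (CV * C)^2)."""
    return math.sqrt(mdl ** 2 + (cv * conc) ** 2)


def perturbation_sd(m, sigma0, p, conc):
    """Standard deviation used in eq 6: sqrt((m * sigma0)^2 + (p * C)^2)."""
    return math.sqrt((m * sigma0) ** 2 + (p * conc) ** 2)


# Hypothetical species: MDL = 0.2, CV = 5%, concentration = 10 (same units as MDL).
sigma_a = analytical_uncertainty(mdl=0.2, cv=0.05, conc=10.0)

# Choosing m * sigma0 = MDL and p = CV reproduces eq 7 from eq 6.
same = perturbation_sd(m=2.0, sigma0=0.1, p=0.05, conc=10.0)

print(sigma_a, same)
# Near zero concentration the MDL term dominates; at high concentration the CV term does.
print(analytical_uncertainty(0.2, 0.05, 0.0), analytical_uncertainty(0.2, 0.05, 1000.0))
```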
To evaluate how the choice of weighting coefficient can influence PMF results, a BRIG 24-hr
dataset perturbed with m = 2 and p = 0 was analyzed by varying $\sigma_{i,t}$ for the
element V ($\sigma_V$) from $0.05\sigma_V^0$ to $1000\sigma_V^0$ while leaving
$\sigma_{i,t}$ for all other species at $\sigma_i^0$. In this test, the actual uncertainty
for every species (including V) should be approximately $2\sigma_i^0$ because m = 2.
However, the optimal choice for a constant $\sigma_V$ would be $\sigma_V^0$ instead of
$2\sigma_V^0$ because of the choices of $\sigma_i^0$ for all other species. Using
$\sigma_i^0$, including $\sigma_V^0$, as the weighting coefficients was not expected to
change the PMF result (from that using $2\sigma_i^0$) but to bring $\chi^2$ to
approximately 2 (from 1). Figure 6 shows $\chi^2$ of 2.6–2.7 for any $\sigma_V$ less than
$100\sigma_V^0$, providing little information for the value of the optimal $\sigma_V$
except that actual uncertainty in the species may be greater than $2\sigma_i^0$. When
checking $\chi_V^2$ (i.e., $\chi^2$ associated with V only) a sharp decrease was observed
for $\sigma_V$ less than $\sigma_V^0$. This reflects the model’s attempt to fit V at the
expense of other species. On the other hand, a rapid increase in $\chi_V^2$ is found for
$\sigma_V$ greater than $100\sigma_V^0$.
The two leading sources of V are OFPP (T31) and coal (T08), accounting for 0.55 (57%) and
0.29 ng/m3 (30%) of average simulated V, respectively, at BRIG. Although the fraction of V
in the T31 source profile (0.0215) is much higher than that in the T08 profile (0.00114),
the average contribution of T08 to primary PM2.5 (258 ng/m3) was nearly 10 times higher
than that of T31 (26 ng/m3). For $1.5\sigma_V^0 < \sigma_V < 10\sigma_V^0$, the first two
major contributing
Table 3. Summary of PMF solutions for randomly perturbed datasets.

BRIG
Perturbation                    m = 0, p = 0   m = 0.5, p = 0   m = 1, p = 0   m = 0.5, p = 0.05
Number of samples               92             92               92             92
Number of factors (K)           8              8                8              8
$R^2_{\min}$                    0.98           0.96–0.97        0.94–0.96      0.95–0.97
$\chi^2$                        0.34           0.50 ± 0.01      0.94 ± 0.03    0.73 ± 0.03
$D^2$                           15.2           12.1 ± 1.6       12.6 ± 1.7     13.7 ± 4.8
1st factor  $D_k^2$             0.48           0.40 ± 0.13      0.50 ± 0.15    0.61 ± 0.19
            PM2.5 (μg/m3)       4.03           4.16 ± 0.12      4.27 ± 0.25    4.07 ± 0.27
            PM2.5 (%)           57             59 ± 2           61 ± 4         58 ± 4
2nd factor  $D_k^2$             0.55           0.76 ± 0.29      1.09 ± 1.15    1.31 ± 1.59
            PM2.5 (μg/m3)       0.76           0.82 ± 0.12      0.81 ± 0.18    0.87 ± 0.15
            PM2.5 (%)           11             12 ± 2           12 ± 2         12 ± 2
3rd factor  $D_k^2$             0.52           0.66 ± 0.17      0.85 ± 0.39    0.72 ± 0.16
            PM2.5 (μg/m3)       0.68           0.67 ± 0.06      0.66 ± 0.11    0.69 ± 0.05
            PM2.5 (%)           10             10 ± 1           9 ± 2          10 ± 1

GRSM
Perturbation                    m = 0, p = 0   m = 0.5, p = 0   m = 1, p = 0   m = 0.5, p = 0.05
Number of samples               92             92               92             92
Number of factors (K)           7              7                7              7
$R^2_{\min}$                    0.95           0.94–0.95        0.92–0.94      0.92–0.95
$\chi^2$                        0.47           0.64 ± 0.02      1.14 ± 0.04    0.85 ± 0.03
$D^2$                           11.2           11.4 ± 0.5       11.6 ± 0.7     11.8 ± 0.8
1st factor  $D_k^2$             1.97           1.87 ± 0.23      1.77 ± 0.33    1.96 ± 0.34
            PM2.5 (μg/m3)       6.48           6.52 ± 0.17      6.77 ± 0.34    6.50 ± 0.23
            PM2.5 (%)           62             63 ± 1           65 ± 3         63 ± 2
2nd factor  $D_k^2$             0.92           0.97 ± 0.05      1.22 ± 0.65    1.19 ± 0.53
            PM2.5 (μg/m3)       1.39           1.38 ± 0.22      1.31 ± 0.28    1.36 ± 0.32
            PM2.5 (%)           13             13 ± 2           13 ± 3         13 ± 3
3rd factor  $D_k^2$             3.68           2.95 ± 1.23      2.41 ± 1.40    2.28 ± 1.28
            PM2.5 (μg/m3)       0.68           0.76 ± 0.08      0.85 ± 0.15    0.81 ± 0.08
            PM2.5 (%)           7              7 ± 1            8 ± 2          8 ± 1

Notes: Values presented for the three factors with the highest contributions to PM2.5 are: (1) $D_k^2$; (2) the PM2.5 mass contribution (μg/m3); and (3) PM2.5 mass
contribution in percentage. Except for the unperturbed run (m = 0, p = 0), results represent the average and standard deviation of 15 model runs.
Figure 6. PMF solutions ($\chi^2$, $\chi_V^2$, and $D^2$) for the perturbed 24-hr
BRIG data as a function of the uncertainty weighting coefficient
assigned to element V ($\sigma_V$). The figure also compares the contributions
from the two largest contributing factors to V and the summed
contribution to V from the remaining factors.
factors to V (factor 7 and 6 in Figure 3a) and the combination of the remaining factors
each accounted for one-third of simulated V (Figure 6). Significant changes occurred in
these contributions for $\sigma_V$ less than $\sigma_V^0$ as the contribution of V from
the “other factors” shifted to the first factor. As the $\sigma_V/\sigma_V^0$ ratio dropped
below 0.12, V from the second factor also moved to the first factor, which then accounted
for 100% of V. The best source apportionment for V was achieved at
$\sigma_V = \sigma_V^0$, where 0.51 and 0.26 ng/m3 were assigned to the first two factors
containing T31 (factor 7, Figure 3a) and T08 (factor 6, Figure 3a), respectively. The
minimum $D^2$ also occurred when $\sigma_V = \sigma_V^0$. On the other hand, although the
fitting for V concentrations gradually degraded as $\sigma_V$ increased, it took a
$\sigma_V \gg 10\sigma_V^0$ for PMF to ignore V completely (on the basis of the trend of
$\chi_V^2$). Notably, the optimal $\sigma_V$ depends not only on the uncertainty in V
concentrations but also the choices of $\sigma_{i,t}$ for other species. This study
demonstrates that species-specific $\chi^2$ could be useful for searching the optimal
$\sigma_{i,t}$ for PMF analysis.
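A species-specific goodness-of-fit statistic of the kind discussed above can be sketched as follows. Eq 3 is not reproduced in this section, so the statistic below (a mean squared scaled residual for one species) is an assumed stand-in, and the paper's $\chi^2$ may be normalized differently. The function name and all numerical values are hypothetical. The loop also holds the fitted values fixed for simplicity, whereas in a real test PMF would be rerun for each trial weight because the fit itself responds to the weighting.

```python
import math


def species_chi2(observed, fitted, sigma):
    """Mean squared scaled residual for one species over all samples (assumed form)."""
    terms = [((c - f) / s) ** 2 for c, f, s in zip(observed, fitted, sigma)]
    return sum(terms) / len(terms)


# Hypothetical V concentrations (ng/m3), fitted values, and base weights sigma_V^0.
v_obs = [1.0, 2.0, 3.0]
v_fit = [1.1, 1.8, 3.3]
v_sigma0 = [0.1, 0.2, 0.3]

# Scan the weight as a multiple of the base value, as in the sigma_V / sigma_V^0 test.
for scale in (0.5, 1.0, 2.0):
    sigma = [scale * s for s in v_sigma0]
    print(scale, round(species_chi2(v_obs, v_fit, sigma), 3))
```

With the fit held fixed, the statistic simply scales with the inverse square of the weight multiplier; the diagnostic value described in the text comes from how the refitted solution departs from that trend.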
CONCLUSIONS
Although multivariate receptor models such as PMF have
been widely used to infer source contributions through a
set of factors extracted from ambient monitoring data,
two fundamental questions remain: (1) to what extent do
the factors correspond to actual sources, and (2) can the
factors be improved to fulfill a particular goal of receptor
modeling through manipulating modeling parameters?
Very little effort has been made to address these questions
because actual source contributions are unknown in traditional
analyses. This study developed PM2.5 datasets for
IMPROVE network sites (BRIG and GRSM) in the eastern
United States using a state-of-the-art CTM and emissions
inventory; therefore, contributing sources to these data-
sets were known. Although the simulated data do not
perfectly reproduce observations at the two sites, the per-
formance of receptor models applied to the synthetic
datasets is indicative of how they will perform with real-
world IMPROVE or other monitoring data.
Considering a common goal of receptor modeling
source apportionment to have each of the resolved factors
represent a unique source or a group of individual sources
in the data (i.e., with any one source appearing in only
one factor, even if multiple sources appear in that factor),
this study developed a diagnostic metric, $D^2$, for determining the adequacy of PMF
results with respect to this goal. Traditional model performance measures such as
$\chi^2$ and $R^2$ determine only how “measured” data are fitted by factors. $D^2$, using
the known source information, measures the difference between factors and source
combinations (primary and/or secondary) that best explain the factors. A value of
$D^2 = 0$ signifies the ideal scenario in which PMF creates perfect groupings of sources
into factors fulfilling the above goal. This level of ideality was never achieved with the
synthetic data; $D^2$ was always greater than 0. However, $D^2$ was minimized by adjusting
the number of factors, the number of observations, factor
rotation with the FPEAK parameter in PMF, and coeffi-
cients that were used to weight individual species in the
model. This study supports the potential of source appor-
tionment with the PMF receptor model and illustrates
several modeling characteristics and strategies.
The appropriate number of factors (K) in a PMF application cannot be determined either by
the magnitude of $\chi^2$ or the $\chi^2$-K relationship because $\chi^2$ depends on the
weighting coefficients ($\sigma_{i,t}$) and, for the most part, decreases exponentially
with increasing K. On the basis of the minimization of $D^2$, the best choice for K occurs
when it is just adequate to explain the input data (e.g., $R^2_{\min} > 0.95$). Increasing
K further produces no meaningful factors that correspond to actual sources. However, when
working with real-world data, $R^2$ of “weak species” (i.e., those with low
signal-to-noise ratios because of poor measurement precision) should not be considered in
determining $R^2_{\min}$. Increasing the number of observations, which may be achieved with
shorter sample duration, in the PMF model can lead to more meaningful factors and better
factor-source correspondence (i.e., lower $D^2$). This added discrimination with respect to
source contributions can be explained by meteorological and emission variation over a
diurnal cycle.
For synthetic data at BRIG and GRSM, there are major factors representing SS and SOM in
combination with other sources. PMF tends to identify factors dominated by secondary
species in this and many studies using real-world data, likely because of their high
measurement precision (low weighting coefficient) and weak correlation with primary
particulate emissions. However, the SS factor resolved here underestimates the true SS
contributions at BRIG and GRSM, with the lowest deviations associated with higher temporal
resolution data. SS is spread among other factors and inflates their PM2.5 mass
contributions. This also possibly happens to SOM, because unexplained SO$_4^{2-}$ is found
in the SOM factors and vice versa. By using appropriate positive values for FPEAK, the
factors are rotated so that the SO$_4^{2-}$ loading becomes more concentrated on the SS
factor, which then more closely accounts for the “true” SS. At the same time, factor
rotation with the FPEAK substantially reduces $D^2$, which signifies improvement in the
overall PMF results beyond those related to SO$_4^{2-}$. FPEAK can be useful, particularly
if additional knowledge about source markers (i.e., species dominant in specific source
emissions) is available. Although the Fkey and Gkey parameters were not discussed in this
study, they may also be used for manipulating specific $F_{i,k}$ and $G_{k,t}$ to reflect
the knowledge of source markers.9
The effect of measurement uncertainties in real-world data on PMF results was assessed by
random-normally perturbing the simulated concentrations to different degrees. This
evaluation confirms that $\chi^2$ could attain values near 1 (the expected value) with the
proper choice of $\sigma_{i,t}$ (uncertainty weighting) and K. These perturbations change
the resulting factor profiles and contributions but do not always degrade their
relationships with sources. The PMF results certainly became more inconsistent from run to
run with increasing uncertainty in the data, even with an accurate formulation of
$\sigma_{i,t}$ and an optimal number of factors, K. Increasing the number of observations
(N) may reduce the effects of random measurement uncertainties on the PMF solution for the
same reason that
the standard error of the mean decreases in proportion to the
inverse square root of N.
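The square-root scaling invoked here can be made concrete with the sample counts used in this study (92, 367, and 2208 samples for the 24-, 6-, and 1-hr data). The sketch below is a rough illustration only: the helper name is hypothetical, and the calculation assumes independent random errors, which is optimistic for serially correlated hourly data.

```python
import math


def relative_standard_error(n_samples, n_reference):
    """Standard error of a mean with n_samples, relative to one with n_reference."""
    return math.sqrt(n_reference / n_samples)


# Sample counts for the 24-, 6-, and 1-hr datasets, relative to the 24-hr case.
for n in (92, 367, 2208):
    print(n, round(relative_standard_error(n, 92), 2))
```

Because 2208/92 = 24, going from 24-hr to 1-hr samples would shrink the random error of an average by a factor of about $\sqrt{24} \approx 4.9$ under this idealized assumption.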
Conversely, a $\chi^2$ of approximately 1 does not justify a particular choice of
$\sigma_{i,t}$ or guarantee the best PMF solution. A relatively small number of realistic
$\sigma_{i,t}$ can force $\chi^2$ to approximately 1, whereas unrealistic $\sigma_{i,t}$
for the remaining species can adversely affect the PMF solution. It is therefore
worthwhile to examine species-specific $\chi^2$ as a function of $\sigma_{i,t}$ for that
species. Using V at BRIG as an example, it is demonstrated that neither $\chi_V^2$ nor the
V apportionment to factors changed significantly over a relatively wide range of assumed
values of weighting coefficient, $\sigma_V$. However, outside that range, larger
$\sigma_V$ caused PMF to ignore fitting V, thus increasing $\chi_V^2$ as well as
$\chi^2$. Smaller values of $\sigma_V$ tended to force V loading into fewer factors (a
single factor at the lower limit of $\sigma_V$) while reducing $\chi_V^2$ drastically; the
optimal $\sigma_V$ occurred with a sharp change in $\chi_V^2$. Such tests could be applied,
in turn, to all species by simple modifications to the PMF software. For chemical species
typically used in PMF, the reported analytical uncertainty serves as a good starting point
for this process to evaluate and refine the weighting coefficients.
ACKNOWLEDGMENTS
This work was supported by EPA Science to Achieve Re-
sults (STAR) grant EP-P21741/C10649 and Electric Power
Research Institute grant C7750. Its contents do not nec-
essarily reflect the views and policies of EPA, nor does
mention of trade names or commercial products consti-
tute endorsement or recommendation for use.
REFERENCES
1. Watson, J.G.; Cooper, J.A.; Hunzicker, J.J. The Effective Variance
Weighting for Least Squares Calculations Applied to the Mass Balance
Receptor Model; Atmos. Environ. 1984, 18, 1347-1355.
2. Watson, J.G. Overview of Receptor Model Principles; J. Air Poll. Control
Assoc. 1984, 34, 619-623.
3. Henry, R.C.; Kim, B.M. A Factor Analysis Receptor Model with Explicit
Physical Constraints. In Transactions, Receptor Models in Air Resources
Management; Watson, J.G., Ed.; A&WMA: Pittsburgh, PA, 1989; pp
214-225.
4. Hopke, P.K. Recent Developments in Receptor Modeling; J. Chemomet-
rics 2003, 17, 255-265; doi: 10.1002/cem.796.
5. Paatero, P. Least Squares Formulation of Robust Nonnegative Factor
Analysis; Chemom. Intell. Lab. Sys. 1997, 37, 23-35.
6. Henry, R.C. Multivariate Receptor Modeling by N-Dimensional Edge
Detection; Chemom. Intell. Lab. Sys. 2003, 65, 179-189; doi:10.1016/
S0169-7439(02)00108-9.
7. Watson, J.G.; Chen, L.-W.A.; Chow, J.C.; Lowenthal, D.H.; Do-
raiswamy, P. Source Apportionment: Findings from the U.S. Supersite
Program; J. Air & Waste Manage. Assoc. 2008, 58, 265-288; doi:
10.3155/1047-3289.58.2.265.
8. Chen, L.-W.A.; Watson, J.G.; Chow, J.C.; Magliano, K.L. Quantifying
PM2.5 Source Contributions for the San Joaquin Valley with Multivariate
Receptor Models; Environ. Sci. Technol. 2007, 41, 2818-2826.
9. Reff, A.; Eberly, S.I.; Bhave, P.V. Receptor Modeling of Ambient Par-
ticulate Matter Data Using Positive Matrix Factorization: Review of
Existing Methods; J. Air & Waste Manage. Assoc. 2007, 57, 146-154.
10. Hopke, P.K.; Ito, K.; Mar, T.; Christensen, W.F.; Eatough, D.J.; Henry,
R.C.; Kim, E.; Laden, F.; Lall, R.; Larson, T.V.; Liu, H.; Neas, L.; Pinto,
J.; Stolzel, M.; Suh, H.; Paatero, P.; Thurston, G.D. PM Source Appor-
tionment and Health Effects: 1. Intercomparison of Source Apportion-
ment Results; J. Expo. Anal. Environ. Epidemiol. 2006, 16, 275-286; doi:
10.1038/sj.jea.7500458.
11. Lowenthal, D.H.; Watson, J.G.; Koracin, D.; Chen, L.-W.A.; DuBois,
D.; Vellore, R.; Kumar, N.; Knipping, E.M.; Wheeler, N.; Craig, K.; Reid,
S. Evaluation of Regional Scale Receptor Modeling; J. Air & Waste
Manage. Assoc. 2010, 60, 26-42; doi: 10.3155/1047-3289.60.1.26.
12. Malm, W.C.; Pitchford, M.L.; Scruggs, M.; Sisler, J.F.; Ames, R.G.;
Copeland, S.; Gebhart, K.A.; Day, D.E. Spatial and Seasonal Patterns and
Temporal Variability of Haze and Its Constituents in the United States:
IMPROVE Report III; Cooperative Institute for Research in the Atmo-
sphere; Colorado State University: Fort Collins, CO, 2000; available at
http://vista.cira.colostate.edu/IMPROVE/Publications/improve_reports.htm
(accessed 2009).
13. National Air Quality and Emissions Trends Report, 1999; EPA 454/R-01-
004; U.S. Environmental Protection Agency, Research Triangle Park,
NC, 2002; available at http://www.epa.gov/air/aqtmd99 (accessed 2009).
14. Malm, W.C.; Sisler, J.F.; Huffman, D.; Eldred, R.A.; Cahill, T.A. Spatial
and Seasonal Trends in Particle Concentration and Optical Extinction
in the United States; J. Geophys. Res. 1994, 99, 1347-1370.
15. Paatero, P.; Hopke, P.K.; Song, X.H.; Ramadan, Z. Understanding and
Controlling Rotations in Factor Analytical Models; Chemom. Intell.
Lab. Sys. 2002, 60, 253-264.
16. Huber, P. J. Robust Statistics; John Wiley and Sons: New York, 1981.
17. Paatero, P.; Hopke, P.K.; Begum, B.A.; Biswas, S.K. A Graphical Diag-
nostic Method for Assessing the Rotation in Factor Analytical Models
of Atmospheric Pollution; Atmos. Environ. 2005, 39, 193-201.
18. Chow, J.C.; Watson, J.G.; Chen, L.-W.A.; Chang, M.C.O.; Robinson,
N.F.; Trimble, D.; Kohl, S.D. The IMPROVE_A Temperature Protocol
for Thermal/Optical Carbon Analysis: Maintaining Consistency with a
Long-Term Database; J. Air & Waste Manage. Assoc. 2007, 57, 1014-
1023; doi: 10.3155/1047-3289.57.9.1014.
19. Hyslop, N.P.; White, W.H. An Evaluation of Interagency Monitoring
of Protected Visual Environments (IMPROVE) Collocated Precision
and Uncertainty Estimates; Atmos. Environ. 2008, 42, 2691-2705.
About the Authors
L.-W. Antony Chen is an associate research professor with
the Desert Research Institute (DRI). Douglas Lowenthal,
John Watson, and Darko Koracin are research professors
with DRI. Naresh Kumar is Senior Program Manager in Air
Quality with EPRI. Eladio Knipping is a senior technical
manager with EPRI. Neil Wheeler is Senior Vice President of
Atmospheric Modeling and Information Systems with
Sonoma Technology, Inc. Kenneth Craig is an atmospheric
modeler and Stephen Reid is a manager with Sonoma
Technology, Inc. Please address correspondence to:
L.-W. Antony Chen, Desert Research Institute, 2215 Raggio
Parkway, Reno, NV 89512; phone: +1-775-674-7028; fax:
+1-775-674-7009; e-mail: antony@dri.edu.