Department of Economics Working Paper Series
Neighborhood Information Externalities and the Provision of
University of Connecticut
AKM Rezaul Hossain
St. Mary’s College
Stephen L. Ross
University of Connecticut
Working Paper 2010-10
341 Mansfield Road, Unit 1063
Storrs, CT 06269–1063
Phone: (860) 486–3022
Fax: (860) 486–4463
This working paper is indexed on RePEc, http://repec.org/
Recent theoretical models and empirical analyses argue that mortgage market
activity creates information and lowers the costs of underwriting mortgages. We
re-examine this question using models that control for neighborhood-lender fixed
effects and address the potential endogeneity of market volume using information
on lagged volumes. We find that the omission of neighborhood fixed effects
substantially biases analyses of the effect of neighborhood volume on underwriting,
and our tests imply that the one or two period lags of volume typically used in
cross-sectional studies cannot be treated as exogenous due to the persistence of
neighborhood economic shocks. In our preferred specification, we cannot rule out
the possibility that overall market activity has a small positive influence on
mortgage underwriting, but the statistical evidence for the existence of such effects is
weak. On the other hand, we find that lender-specific activity in a neighborhood
has strong positive effects on the mortgage approval decision, and this effect is
underestimated in traditional cross-sectional models.
Journal of Economic Literature Classification: D8, G2, L8, R3
A recent theoretical model by Lang and Nakamura (1990) examines the dynamic process
by which information externalities reinforce growth and contraction in the provision of credit,
while Lang and Nakamura (1993) and Gruben, Neuberger, and Schmidt (1990) show that such a
process can lead to redlining against neighborhoods in thin mortgage markets1. If low levels of
market activity raise the cost of providing mortgage credit, residents of low-income and minority
neighborhoods may have an especially difficult time obtaining credit (Carr and Schuetz, 2001;
Engel and McCoy, 2002).2 Moreover, the limited access to prime mortgage credit in low-income
and minority neighborhoods has been an important factor in the debate over modernization and
consolidation in the financial services industry,3 and potentially an important factor in the growth
of the subprime mortgage market (Carr and Schuetz, 2001; Engel and McCoy, 2002).
Several empirical analyses have tested for the existence of a neighborhood information
externality in mortgage underwriting. Most papers follow Lang and Nakamura (1993) strictly
and use the number of housing transactions in a neighborhood to proxy for information. Calem
(1996) regresses the average mortgage approval rate in a subdivision of a county on the number
of housing transactions in the previous year, finding a positive empirical relationship. Similarly, Ling
and Wachter (1998) and Harrison (2001) model mortgage application approval using the number
of housing transactions in each census tract in Dade County (Miami) and Pinellas County (St.
Petersburg) and find a negative relationship between housing market activity and the stringency
1 See Ross and Yinger (2002) and Guttentag and Wachter (1980) for detailed discussions of redlining in the
2 See also special issues on subprime lending in the Journal of Real Estate Finance and Economics (2004) and on
predatory lending in Housing Policy Debate (2004).
3 An extensive literature has developed questioning whether bank consolidation and financial modernization
legislation has had an adverse effect on small business lending. For recent examples, see Rauch and Henderson
(2004), Avery and Samolyk (2004) on consolidation, Ely and Robinson (2004) on the effect of the Gramm-Leach-
Bliley Act, and Bostic and Robinson (2004, 2003) and Hossain (2005) on the impact of the Community
Reinvestment Act (CRA) on mortgage lending as well as branch location decisions. The reader is also referred to a
recent survey on relationship lending, Elyasiani and Goldberg (2004).
of underwriting standards. The estimated effect sizes for these papers range between 1.6 and
2.2 percentage point increases in application acceptance rates for a one standard deviation
increase in transaction volume.
Several alternative explanations for these results have been suggested. Lin (2001) raises
concerns that previous mortgage approvals may influence both housing market activity and
mortgage application decisions. Lin (2001) uses the distance between the neighborhood and the
lender’s headquarters as a proxy for lender information and finds that lender information
explains underwriting decisions, but she cannot identify the influence of information externalities
with her lender specific proxy for information. Further, housing transaction volume might
influence underwriting directly through the influence of housing market conditions on lender
expectations of the risk associated with mortgage defaults (Ross and Tootell, 2004),4 rather than
through information externalities. Avery, Beeson, and Sniderman (1999) avoid this problem by
using mortgage application volume, rather than housing sales, and they also propose that lender
specific market activity may affect underwriting through economies of scale. They find that
lender specific application volume in a census tract has a large positive influence on mortgage
underwriting, and after controlling for a lender’s own volume in the tract they find evidence of
negative externalities associated with the overall market activity.5 Blackburn and Vermilyea
(2006) also find a large positive impact of a lender’s own application volume on mortgage
approval in a sample of applications to 10 large national lenders.6
In Lang and Nakamura (1993), mortgage underwriting is a dynamic information
gathering process at the neighborhood level so that past information is expected to influence
4 They find that the census tract ratio of median rent to median house value, a proxy for expectations concerning
housing price appreciation, is a strong predictor of mortgage underwriting.
5 They conjecture that this negative effect on underwriting is due to a deterioration of the applicant pool when those
rejected by one lender try to seek credit from another lender.
6 Blackburn and Vermilyea (2006) still find a positive effect of tract home sales after controlling for lender
application volume, but these effects are not robust as they add additional underwriting controls.
mortgage approval in the current period, which in turn determines the size of the current information
set in a neighborhood. The persistence arising from lagged choice/decisions entering current
decisions is termed “true state dependence” by Heckman (1981a, b) in that places that had large
numbers of mortgage application denials in the past are more likely to have such denials in the
future simply due to the past denials. However, neighborhoods, as well as the match between
lenders and neighborhoods, likely differ in important ways and over important events or shocks
that influence mortgage underwriting and persist over time, and so also contribute to persistence
in mortgage denial rates. This form of dependence is characterized by Heckman (1981a, b) as
“spurious state dependence.” While most studies have found evidence that neighborhood
information influences mortgage underwriting, these studies tend to be cross-sectional and as
such have only controlled for standard census tract variables and regress mortgage approval on
nearly contemporaneous information on housing market activity, typically market volume from
the preceding year.
This paper reexamines whether mortgage market activity creates information externalities
within neighborhoods using repeated cross-sections of home purchase mortgage applications
from the Home Mortgage Disclosure Act Data (HMDA) in Florida between 1992 and 2007.7
Following Avery, Beeson and Sniderman (1999), we examine the relationship between mortgage
underwriting and both overall mortgage application volume and lender specific application
volume in a census tract. Our model controls for census tract-lender fixed effects in order to
capture time-invariant differences across neighborhoods and across lenders’ neighborhood
7 With the exception of Blackburn and Vermilyea (2007), all studies of this question rely on data from the Home
Mortgage Disclosure Act (HMDA). As background, the U.S. Congress enacted the Home Mortgage Disclosure Act
(HMDA) in 1975. The goal of HMDA was to provide sufficient information to determine whether depository
institutions are filling their obligations to serve the housing needs of the communities and neighborhoods in which
they are located [12 USC 2801(b)]. HMDA required depository institutions and their subsidiaries to provide the total
number and dollar value of mortgages originated and purchased in the local market, typically segmented by census
tract. The 1989 amendment required lenders to report information regarding race, gender, and income along with
details on the disposition of the application at the loan application level.
specific underwriting policies. We also exploit the repeated cross-sections in our data in order to
develop longer lags of application volume in order to limit the impact of temporary, but
persistent, shocks to local housing and mortgage markets, which might also create a spurious
relationship between neighborhood specific underwriting policies and recent mortgage
application volumes. In order to accomplish these goals, we must address well known concerns
about bias in fixed effect models with endogenous regressors (Nickell, 1981), and we adapt
traditional dynamic panel data techniques (Holtz-Eakin et al., 1988) for use with repeated,
clustered cross-sectional data.
We find strong evidence of omitted neighborhood variables that correlate with mortgage
application volume and are captured by our tract-lender fixed effects. Somewhat surprisingly, the
bias is negative implying that the true effect of neighborhood information externalities on
mortgage approval is positive even after controlling for lender specific application volume. While
these findings contrast with Avery et al.’s findings and our simple OLS or lender fixed effects
estimates where neighborhood application volume also lowers the likelihood of approval, the
lender fixed effect model estimates are consistent with Avery et al.’s interpretation of their
results that unobservables that contribute to high denial rates might also increase application
volume from repeat mortgage applicants creating a negative correlation between volume and
mortgage approval that is captured by our neighborhood fixed effects. Further, we find strong
evidence rejecting the exogeneity of recent lags of neighborhood application volume and lender
specific volume. Our lagged instruments fail both over-identification and closest lag exogeneity
tests for all models that include either the two or three period lags of the volume variables.
In our preferred specification, which has a closest lag of four periods, our estimates of the
effect of neighborhood application volume are similar in magnitude to earlier OLS estimates
based on application volume, but substantially weaker than estimates using shorter time lags and
only marginally significant. While we cannot rule out small effects of information externalities,
the evidence for their existence is fairly weak given concerns about the confounding effects of
housing market activity, unobserved neighborhood attributes and persistence in the effect of
neighborhood shocks for potentially several years. On the other hand, our preferred specification
yields highly significant estimates on lender specific neighborhood application volume. The
magnitude of these effects is substantially larger than the effects identified by Avery et al. or in
our OLS specifications and increases as we lengthen lags, suggesting that any remaining bias from
persistent neighborhood-specific shocks to lenders biases our results away from finding a positive
effect of lender specific activity on mortgage approval.
2. Model Specification
Our research design involves modeling loan approval decisions as a function of
neighborhood market activities using repeated cross-sections of mortgage applications that are
clustered by lender and neighborhood. Further, we will use lagged mortgage market activity as
instruments for potentially endogenous current market activities. Specifically, in the
underwriting estimation, the loan approval decision for a specific mortgage application in a given
year will depend upon application attributes, the volume of all applications submitted to all
depository lenders in the neighborhood, the volume of applications in this neighborhood
submitted to this lender in that year, as well as year and neighborhood-lender fixed effects.8
Following Avery et al., this paper uses mortgage applications rather than the number of loans to
isolate the effect of the information externality generated by mortgage application underwriting
activities from the more direct effect of housing market conditions on lender perceptions of
housing market risk.
8 In principle, one might make mortgage approval a function of last year’s tract application volume because the
information generated by mortgage activity takes time to transmit between firms, but in practice where differences
arise, the estimates using lagged application volume are weaker than the already weak preferred IV estimates
associated with tract application volume. Further, use of lagged tract application volume as a control has no influence
on our strongly positive estimated effect of current lender specific application volume, i.e. the evidence of
economies of scale associated with a bank’s current lending activity.
Our proposed underwriting model contains neighborhood-lender fixed effects, as well as
application volumes that almost certainly are influenced by past underwriting decisions of
lenders. It is well known that in models of this sort traditional mean difference or dummy
variable estimators of the fixed effect model yield biased estimates in a short panel. The
mean over all years of the unobservable within a neighborhood or cluster is imbedded in the
error term and, because this mean contains lagged values of the disturbance, it is
correlated with the right hand side endogenous variable (Nickell, 1981). In a panel, the solution
is to eliminate the fixed effect by first differencing so that the differenced unobservable only
contains one lag of the disturbance and then instrument for the right hand side endogenous
variable with a two period lag or longer lags (Holtz-Eakin, Newey, and Rosen, 1988; Arellano
and Honore, 2001).
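As a reminder of why differencing and lagged instruments are needed, consider a textbook dynamic panel. This is a simplification for illustration, not the underwriting model of this paper, and the symbols $y_{it}$, $\rho$, $\alpha_i$, and $\varepsilon_{it}$ are assumptions introduced here:

$$y_{it} = \rho\, y_{i,t-1} + \alpha_i + \varepsilon_{it}$$

$$\Delta y_{it} = \rho\, \Delta y_{i,t-1} + \Delta\varepsilon_{it}, \qquad \Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$$

First differencing removes $\alpha_i$, but $\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$ contains $\varepsilon_{i,t-1}$ and is therefore correlated with $\Delta\varepsilon_{it}$. If $\varepsilon_{it}$ is serially uncorrelated, then $E(y_{i,t-2}\,\Delta\varepsilon_{it}) = 0$, so the two period lag (and longer lags) can serve as instruments.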
The solution in this paper is quite similar, but adapted for repeated clustered cross-
sections where first differencing is not possible. Specifically, the data will be organized into two
year overlapping cohorts with one cohort containing all data from $t$ and $t - 1$ and the previous
cohort containing all data from $t - 1$ and $t - 2$. The “pseudo first differenced” model will be
mean differenced based on neighborhood-lender-cohort fixed effects and the resulting
unobservable in a neighborhood-lender cluster for year $t$ will not contain unobserved effects
associated with applications or tract circumstance in period $t - 2$ or earlier.9 As with first
differenced data, assuming that any year-specific neighborhood economic shocks do not persist
over time, instrumental variables that are lagged at least two periods will not have a correlation
9 After differencing, year $t - 1$ applications are dropped from the year $t$ cohort since they are present in the year
$t - 1$ cohort. At first glance, first differencing the mean of the previous year sample for the neighborhood-lender
combination is no different from the cohort mean differencing mentioned here. However, given the fact that a
sample lender is chosen if it had been consecutively providing services in the state of Florida, not a specific tract, for
at least seven years, first differencing the mean of the previous neighborhood-lender-year combination will drop
observations due to missing values if there were no corresponding applications in the neighborhood-lender cell for
the previous year. This will dramatically reduce our sample from over two million loans to only about four hundred
thousand. While applications in a neighborhood-lender cluster with no applications in the previous year will not
contribute to our estimates on the neighborhood application volume variables, these applications are important
because they add precision to our estimates of other incidental parameters in the model.
with the application unobservables in the current cohort. If the year-specific shocks are
persistent, however, the two period lag will be correlated with the one-period lag that is
imbedded in the error term of the pseudo first differenced model, but as will be discussed in
more detail later this concern can be addressed by examining the robustness of estimates as the
nearest lags are eliminated from the set of instruments.
Following a linear probability model, loan approval for application $i$ to lender $j$ in year $t$
for a property in neighborhood $n$ ($A_{ijnt}$) might be specified as a linear function of a vector of
application attributes ($X_{ijnt}$), a vector of market activities ($V_{jnt}$) with two components,
neighborhood application volume ($V_{jnt}^{N}$) and neighborhood-lender specific application volume
($V_{jnt}^{L}$), neighborhood-lender fixed effects ($\delta_{jn}$), year specific fixed effects ($\gamma_t$), and an
idiosyncratic error associated with time-varying neighborhood unobservables and neighborhood
specific lender underwriting standards ($\varepsilon_{ijnt}$).
$$A_{ijnt} = X_{ijnt}'\beta_X + V_{jnt}'\beta_V + \delta_{jn} + \gamma_t + \varepsilon_{ijnt} \qquad (1)$$
Under this specification, the significance of $\beta_V$ provides evidence that mortgage
application volumes influence the probability of mortgage approval, which implies true state
dependence in mortgage underwriting. However, $\delta_{jn}$, the neighborhood-lender heterogeneity, as
well as the correlation between the $\varepsilon_{ijnt}$’s over time, cause serial correlation in the model’s
unobservables, and failure to control for the possibility of such heterogeneity will risk
confounding spurious and true state dependence.
A cohort $c_t$ is created by combining all observations for a neighborhood-lender
combination from period $t$ and $t - 1$, and cohort mean differencing equation (1) within a
neighborhood-lender-year combination yields
$$A_{ijnt} - \bar{A}_{jnc_t} = (X_{ijnt} - \bar{X}_{jnc_t})'\beta_X + (V_{jnt} - \bar{V}_{jnc_t})'\beta_V + (\gamma_t - \bar{\gamma}_{jnc_t}) + (\varepsilon_{ijnt} - \bar{\varepsilon}_{jnc_t}) \qquad (2)$$
where $\bar{A}_{jnc_t}$, $\bar{X}_{jnc_t}$, $\bar{V}_{jnc_t}$, $\bar{\gamma}_{jnc_t}$ and $\bar{\varepsilon}_{jnc_t}$ are the means of the appropriate variable/vector across all
loans in cohort $c_t$. The neighborhood-lender fixed effects drop out, and the resulting unobserved
effect only depends upon error terms in years $t$ and $t - 1$.
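The cohort construction and mean differencing described above can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not the authors' code: the column names `tract`, `lender`, and `year` and the function `cohort_demean` are hypothetical, and the input is assumed to hold one row per application.

```python
import pandas as pd

def cohort_demean(df, value_cols, year_col="year", keys=("tract", "lender")):
    """Pseudo first differencing: demean each year-t application against the
    mean of its two-year cohort (years t and t-1) within a tract-lender cell."""
    keys = list(keys)
    # Each application appears once as a current-year row of cohort t ...
    current = df.assign(cohort=df[year_col], is_current=True)
    # ... and once as a previous-year row of cohort t + 1.
    previous = df.assign(cohort=df[year_col] + 1, is_current=False)
    stacked = pd.concat([current, previous], ignore_index=True)
    # Cohort means are taken over all loans from both years in the cell.
    means = stacked.groupby(keys + ["cohort"])[value_cols].transform("mean")
    demeaned = stacked[value_cols] - means
    out = pd.concat([stacked[keys + [year_col, "cohort"]], demeaned], axis=1)
    # Keep only the year-t rows of each cohort; year t-1 rows are dropped.
    return out[stacked["is_current"]].reset_index(drop=True)

# Toy usage with made-up data: one tract-lender cell observed in two years.
toy = pd.DataFrame({
    "tract": [1, 1, 1],
    "lender": ["a", "a", "a"],
    "year": [2000, 2000, 2001],
    "approved": [1.0, 0.0, 1.0],
})
result = cohort_demean(toy, ["approved"])
```

Because any constant added to every row of a tract-lender cell shifts the cohort mean by the same amount, a time-invariant cell effect cancels in the demeaned values, which is the property the pseudo first differencing relies on.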
Clearly, any year-specific economic shocks might influence both the probability of an
application from a neighborhood getting approved and the volume of mortgage applications from
that neighborhood. Further, changes in lender circumstances and market position may affect both
a lender’s neighborhood specific underwriting policies and their application volumes from that
neighborhood. Therefore, one must be concerned about a contemporaneous correlation between
application volumes and time-varying neighborhood unobservables. Hence, the resulting volume
difference, $V_{jnt} - \bar{V}_{jnc_t}$, in equation (2) is endogenous and instrumental variables are used to
address this endogeneity. Applying a sequential exogeneity assumption that is standard in
dynamic panel analysis, $E(\varepsilon_{ijnt} \mid V_{jn}^{t-l}) = 0$ or $E(\bar{\varepsilon}_{jnc_t} \mid V_{jn}^{t-l}) = 0$ for $l = 2, \ldots, s$,10 the natural
candidates of the instruments are two period or longer lags of the differenced volumes.11
If the application volume follows a lag dependent process, then it can be modeled as a
reduced-form linear function of the lagged application volumes up to lag $s$.12
$$V_{jnt} = \sum_{l=p}^{s} \Pi_l V_{jn,t-l} + \xi\delta_{jn} + \tau_t + u_{jnt} \qquad (3)$$
where as discussed earlier $V_{jnt}$ is the 2 × 1 vector representing tract and tract-lender market
activities, $\Pi_l$ is the 2 × 2 coefficient matrix on lag $l$, with $p$ the closest lag included,
$\xi$ is a 2 × 1 column vector capturing the influence of the neighborhood-lender fixed effects $\delta_{jn}$ on
lending volumes, $\tau_t$ is a 2 × 1 year dummy column vector, and $u_{jnt}$ is a vector of idiosyncratic
10 Here the superscript denotes ???
11 We use volume differences as opposed to levels as instruments since using levels will introduce neighborhood-
lender fixed effects into the underwriting model again through the reduced form model. See equation (3) below.
12 When lags are used as instruments, on one hand, adding longer lags can achieve more efficiency and yield more
overidentifying restrictions; on the other hand, using many overidentifying restrictions is known to have poor finite sample
properties (see Altonji and Segal (1996)). In practice, it may be better to use a couple of lags rather than lags back to
the beginning of the sample. In this paper, up to five lags will be used for the sample of 7-consecutive-year lenders.
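errors. The endogeneity of $V_{jnt}$ arises because common, year-specific economic shocks are
likely to affect $\varepsilon_{ijnt}$ and $u_{jnt}$ simultaneously.
The model for estimating and predicting $V_{jnt} - \bar{V}_{jnc_t}$ is created by applying the same
cohort mean differencing to equation (3) for the application sample.13
$$V_{jnt} - \bar{V}_{jnc_t} = \sum_{l=p}^{s} \Pi_l (V_{jn,t-l} - \bar{V}_{jnc_{t-l}}) + (\tau_t - \bar{\tau}_{jnc_t}) + (u_{jnt} - \bar{u}_{jnc_t}) \qquad (4)$$
where the sum runs over lags $l$ from the closest included lag $p$ to the longest lag $s$, and bars denote
means across all loans in the cohort.
The predicted application volume deviations will yield consistent instrumental variable estimates
for equation (2) as long as the lagged volume disturbance $u_{jn,t-l}$ is orthogonal to the underwriting disturbance $\varepsilon_{ijnt}$, that is, $E(u_{jn,t-l}\,\varepsilon_{ijnt}) = 0$.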
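The two-step logic of predicting the volume deviations from lagged deviations and then using the predictions in the underwriting equation is ordinary two-stage least squares. A minimal NumPy sketch follows, assuming all arrays have already been cohort mean differenced; the function name `two_stage_least_squares` and its argument names are hypothetical, and the sketch returns point estimates only (it does not compute clustered standard errors).

```python
import numpy as np

def two_stage_least_squares(y, X_exog, V_endog, Z_instr):
    """2SLS on already cohort-demeaned arrays.

    y: (n,) outcome; X_exog: (n, kx) exogenous controls;
    V_endog: (n, kv) endogenous volume deviations;
    Z_instr: (n, kz) excluded instruments (lagged deviations), kz >= kv.
    Returns the coefficients on [X_exog, V_endog] in that order.
    """
    # First stage: project the endogenous regressors on controls and instruments.
    W = np.column_stack([X_exog, Z_instr])
    pi, *_ = np.linalg.lstsq(W, V_endog, rcond=None)
    V_hat = W @ pi
    # Second stage: replace the endogenous regressors with their fitted values.
    R_hat = np.column_stack([X_exog, V_hat])
    beta, *_ = np.linalg.lstsq(R_hat, y, rcond=None)
    return beta

# Toy usage with simulated data (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 2))
V = Z @ np.array([[1.0], [0.5]]) + rng.normal(size=(n, 1))
y = (2.0 * X + 3.0 * V).ravel()  # no outcome noise, so 2SLS recovers (2, 3)
beta = two_stage_least_squares(y, X, V, Z)
```

Estimating the first stage on the application-level sample, as here, gives each lender-tract cell a weight proportional to its number of applications in both stages.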
If these errors exhibit persistence of any form, however, the two period or longer
differenced volume lags may be correlated with time-varying neighborhood unobservables that
influence mortgage underwriting in the current cohort. Specifically,
$E(\varepsilon_{ijnt} - \bar{\varepsilon}_{jnc_t} \mid V_{jn,t-l} - \bar{V}_{jnc_{t-l}}) \neq 0$ because $E(\varepsilon_{ijn,t-1}\,u_{jn,t-l}) \neq 0$ due to the contemporaneous
correlation between the underwriting and volume disturbances and the persistence in the volume
disturbance. Under those circumstances consistency can still be obtained if the error term
exhibits $q$-order autocorrelation, $E(u_{jnt}\,u_{jn,t-l}) = 0$ for $l > q$. For example, if the stochastic
shocks follow a $q$-period moving average process, differenced volumes lagged at least $q + 2$
will be exogenous, and pseudo-first differenced lags from $q + 2$ to $s$ can be used as instrumental
variables; the model is just identified when the lag of $q + 2$ is the only one available.14
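The reason lengthening lags helps under a moving average process is that the autocovariance of such a process is exactly zero beyond its order. The sketch below computes those autocovariances analytically. The function names are hypothetical, and the rule in `closest_valid_lag` is an assumption based on the reading that the cohort-differenced error contains disturbances from both the current and the previous year, so an instrument must sit more than the moving average order away from the previous year.

```python
import numpy as np

def ma_autocovariance(theta, max_lag, sigma2=1.0):
    """Autocovariances at lags 0..max_lag of the moving average process
    u_t = e_t + theta[0] * e_{t-1} + ... + theta[q-1] * e_{t-q}."""
    psi = np.concatenate([[1.0], np.asarray(theta, dtype=float)])
    q = len(psi) - 1
    return np.array([
        sigma2 * np.dot(psi[: q + 1 - k], psi[k:]) if k <= q else 0.0
        for k in range(max_lag + 1)
    ])

def closest_valid_lag(q):
    """Closest instrument lag that is uncorrelated with a differenced error
    containing shocks from years t and t-1 (assumed rule: q + 2)."""
    return q + 2

# Toy usage: a one-period moving average with coefficient 0.5.
acov = ma_autocovariance([0.5], max_lag=3)
```

With no persistence the rule returns the two period lag used in standard dynamic panel estimators; each additional period of persistence pushes the closest admissible lag back by one year.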
The first stage model based on equation (3) results in $s - p + 1$ unique lagged loan
volume vectors in equation (4), where $p$ denotes the closest lag included and $s$ the longest. As a result, the final IV specification incorporates $s - p + 1$
13 Asymptotically, we could simply estimate this model using first differencing in a lender-tract panel, but
estimating the model in the sample of mortgage applications is preferred because it gives weight in the first stage to
the lender-tract clusters that have the highest weight in the second stage. The same thing could be accomplished by
using weights with a first differenced lender-tract panel.
14 Similar biases may arise with persistence in the mortgage approval disturbance and these biases are also addressed
by lengthening lags under the assumption of a moving average error process. Under many alternative error
processes, the current disturbance is a function of all past disturbances and lengthening lags will not assure that
instruments are exogenous. However, under most reasonable processes, the correlation will be reduced by
lengthening the lags reducing the bias associated with persistence in the error. See the discussion below.
exclusion restrictions for the potentially endogenous right-hand side vectors, allowing us to test the
validity of the over-identification or similarly the exogeneity of the closest lag. These tests will
be conducted for models using $s - 1$ period lags, as well as models that exclude closer lags to
obtain $s - p + 1$ lags, with $p$ the closest lag retained, and should be more robust to persistence in the error structure. In addition,
these tests provide some indications concerning the presence of persistence within the time-
varying neighborhood unobservables. Specifically, tests of both overidentifying restrictions and
exogeneity of closest lag require that one of the exclusion restrictions be valid. If the persistence
creates a correlation violating the first exclusion restriction, but not for the second restriction that
excludes the closer lag, the test should reject the null that all exclusion restrictions are valid.
More generally, the lengthening of the lag should reduce correlation between the instrument and
the cohort mean differenced error and so reduce the bias in the loan volume estimate implied by
this instrument. The differing estimates arising from the subsets of instruments form the basis
for the rejection of exogeneity in tests of this type.
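One common form of an overidentification test is the Sargan statistic, sketched below with NumPy. This is an illustrative stand-in and an assumption about the type of test, not the paper's exact procedure: it assumes homoskedastic errors, whereas a cluster-robust variant would be needed to match clustered standard errors. The function name `sargan_statistic` and its arguments are hypothetical.

```python
import numpy as np

def sargan_statistic(y, X_exog, V_endog, Z_instr):
    """Sargan overidentification statistic for a 2SLS model.

    Returns (statistic, degrees_of_freedom). Under the null that every
    exclusion restriction is valid, the statistic is asymptotically
    chi-squared with kz - kv degrees of freedom.
    """
    W = np.column_stack([X_exog, Z_instr])  # all instruments

    def project(A):
        # Fitted values from regressing A on the full instrument set.
        return W @ np.linalg.lstsq(W, A, rcond=None)[0]

    R = np.column_stack([X_exog, V_endog])
    beta = np.linalg.lstsq(project(R), y, rcond=None)[0]
    resid = y - R @ beta  # residuals use the actual regressors
    fitted = project(resid)
    stat = len(y) * (fitted @ fitted) / (resid @ resid)
    dof = Z_instr.shape[1] - V_endog.shape[1]
    return stat, dof

# Toy usage with simulated data (made-up numbers).
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 2))
V = Z @ np.array([[1.0], [0.5]]) + rng.normal(size=(n, 1))
y = (2.0 * X + 3.0 * V).ravel() + rng.normal(size=n)
stat_over, dof_over = sargan_statistic(y, X, V, Z)          # one extra instrument
stat_just, dof_just = sargan_statistic(y, X, V, Z[:, :1])   # just identified
```

A test of the exogeneity of the closest lag can be built in the same spirit by comparing the statistic computed with and without that lag in the instrument set. In the just identified case the residuals are orthogonal to every instrument by construction, so the statistic is zero and nothing can be tested.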
As pointed out in Bertrand, Duflo, and Mullainathan (2004), standard errors in panels
may be biased even when fixed effects are included in the specification. Kezdi (2003) shows that
the simple correction for clustered standard errors works well in addressing the biases raised by
Bertrand et al., and all estimates are presented with standard errors that have been clustered on
3. Data Description
Our sample includes all home purchase mortgage applications for properties located in
the state of Florida, between 1992 and 2007. Two sources of data are used in this paper. The
primary source is the loan application registers (LARs) of Home Mortgage Disclosure Act
(HMDA) data collected by the Federal Financial Institutions Examination Council (FFIEC).
These data collect information on application outcomes, as well as basic loan, borrower, property
characteristics, and location based on census tract. For our primary analysis, we restrict our
sample to new 1 to 4 family home purchases, submitted by 2,259 depository lenders which had
been operating in Florida at least seven consecutive years during the sample period (hereafter,
consistent lenders). The loan approval variable is coded to one if the application is explicitly
approved and originated by the financial institution and to zero if the application was declined.15
These restrictions result in a sample of 2,908,705 mortgage applications with loan approval
decisions across 2,402 census tracts during the period, while relaxing the 7-consecutive-year
restriction increases the number of applications to 4,340,368 by all depository lenders (all
In 2004, HMDA shifted to reporting data using the census tract definitions under the
2000 Decennial Census data. In order to estimate a model with common census tract fixed
effects, we reconcile the differing geographies by assigning a tract under the 2000 tract
definitions to corresponding tracts under 1990 definitions with probabilities based on the fraction
of the 2000 census tract’s area located in each 1990 census tract using ArcGIS. The information
regarding the demographic composition and housing stock attributes of the census tracts, which
will be used in the OLS specifications to replicate previous studies, is obtained from the 1990
Census of Population and Housing.
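The area-based reconciliation of tract definitions can be sketched as follows. This is a hedged illustration only: the column names `tract2000`, `tract1990`, `area`, and `applications` are hypothetical, the overlap areas are assumed to have been exported from a GIS intersection, and the sketch applies the area fractions as fractional weights, whereas the text does not say whether assignment was fractional or a random draw per application.

```python
import pandas as pd

def area_weights(overlaps):
    """overlaps: one row per (tract2000, tract1990) intersection with its area.
    Returns the same rows with the share of each 2000 tract's area that
    falls in each 1990 tract."""
    total = overlaps.groupby("tract2000")["area"].transform("sum")
    return overlaps.assign(weight=overlaps["area"] / total)

def reassign_counts(counts_2000, weights):
    """Spread counts reported under 2000 definitions across 1990 tracts."""
    merged = weights.merge(counts_2000, on="tract2000")
    merged["alloc"] = merged["weight"] * merged["applications"]
    return merged.groupby("tract1990")["alloc"].sum()

# Toy usage with made-up areas and counts.
overlaps = pd.DataFrame({
    "tract2000": ["A", "A", "B"],
    "tract1990": ["x", "y", "y"],
    "area": [3.0, 1.0, 2.0],
})
counts = pd.DataFrame({"tract2000": ["A", "B"], "applications": [100, 40]})
weights = area_weights(overlaps)
by_1990 = reassign_counts(counts, weights)
```

In the toy data, tract A lies three quarters in 1990 tract x and one quarter in y, so 75 of its 100 applications are allocated to x and the remainder, together with all of B, to y.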
Descriptive statistics, segmented by whether lenders belong to our sample of consistent
lenders or not, are presented in Table 1. The two samples are very similar with means usually
differing by no more than 2 or 3 percent and often differing by less than 1 percent over approval
rates, loan type, loan attributes, or census tract attributes. The three exceptions are family income
where the restricted sample incomes are about 6 percent higher, tract loan volume is about 5
15 Applications that were approved, but not accepted, withdrawn by the applicant prior to denial or approval, or
closed for incompleteness are dropped from the analysis. This specification was used in Munnell et al.’s (1992)
study of mortgage lending discrimination, but studies in the information literature have sometimes used alternative
definitions. For example, Harrison (2002) treats approved but not accepted applications as approved and withdrawn
or incomplete applications as denied. These three categories only involve a small number of applications, and all of
our results are unchanged if applications falling into these categories are included.
16 The number of total HMDA records in the state of Florida during this period is 25,237,001. The restriction of
home purchase applications to depository lenders reduces it substantially to 6,221,419. The number further reduces to
4,340,368 due to applications withdrawn or with wrong or missing census tract identifiers.
percent higher, and lender specific loan volume is 7 percent lower. On our variables of interest,
the geometric averages of tract application volume and lender specific application volume are 438
and 28 home-purchase mortgage applications submitted to depository lenders, respectively, for
the restricted sample in the tracts where the restricted sample of lenders is operating.17 Note that
the tract volume is based on all depository lenders’ applications. The other sample means have
very reasonable values: more than 80% of home purchase mortgage applications were
approved, approximately 79% of the properties were owner-occupied dwellings, and about 91%
of the applications were conventional loans, while 6.3% were FHA-insured loans and 2.5%
were VA-guaranteed loans.
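For readers unfamiliar with the summary measure, a geometric average is the exponential of the mean of logs, which damps the influence of the very large tract and lender cells that dominate an application-level sample. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values, computed through logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Toy usage: the arithmetic mean of 1 and 100 is 50.5, the geometric mean is 10.
example = geometric_mean([1, 100])
```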
4. Empirical Results
Table 2 presents the empirical results from estimating an accept/reject linear probability
model of mortgage applications submitted to consistent lenders under four alternative OLS
specifications, as well as two neighborhood fixed effects (FE) models that control for census
tract-lender fixed effects.18
Model 1 is directly comparable to standard tests except that we follow Avery et al. (1999)
and use application rather than transaction volume. We find that a one standard deviation
increase in application volume increases approval rates by approximately 0.7 percentage points
relative to 1.2 percentage points when we estimate this model replacing application volume with
housing sales based on home purchase mortgages19 or relative to earlier estimates based on
transaction volume of between 1.6 and 2.2 percentage points. Model 2 follows Avery et al.
(1999) by adding the lender specific application volume to test for economies of scale in
17 The geometric average is used to represent application volume due to the structure of the data. The largest tracts
and lenders in terms of application volume are naturally overrepresented in the sample due to the large number of
applications to those tracts and to those specific lenders in those tracts.
18 The OLS estimated results for loans submitted to all lenders are qualitatively no different from those estimations
using loans submitted to consistent lenders. The results are available upon request.
19 This pattern persists throughout our analysis. All findings presented below for application volume are robust to
replacing application volume with the number of approved home purchase mortgages as a proxy for sales volume
where the effect size for sales volume is always moderately larger than the effect for application volume.
mortgage underwriting, and as in their paper we find a strong positive relationship between
lender specific application volumes and mortgage approvals with a one standard deviation
increase in lender’s tract application volume increasing the likelihood of approval by 2.2
percentage points. We also find the same negative relationship between overall tract volume and
mortgage approval after controlling for lender specific volume.
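The effect sizes quoted in this section are obtained by scaling a linear probability model coefficient by the standard deviation of the volume variable. The sketch below shows the arithmetic; the function name is hypothetical and the coefficient and standard deviation in the usage line are made-up values chosen for illustration, not figures from Table 2.

```python
def effect_in_percentage_points(coefficient, std_dev):
    """Change in approval probability, in percentage points, implied by a
    one standard deviation increase in volume in a linear probability model."""
    return 100.0 * coefficient * std_dev

# Hypothetical inputs: a coefficient of 0.0005 per application and a
# standard deviation of 44 applications.
effect = effect_in_percentage_points(0.0005, 44.0)
```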
To address lender heterogeneity in underwriting, Models 3 and 4 include lender fixed
effects. Consistent with Harrison (2001), both models’ goodness of fit jumps significantly as the
adjusted R-squared more than doubles, from .04 to .11, while the coefficients of the information variables
fall by almost half. The sign reversal on overall market activity is robust to controlling for lender
specific heterogeneity, but the coefficient is no longer statistically significant. Though this is not
the focus of our paper, a close examination reveals that the estimates on applicant gender and
race as well as neighborhood racial composition decline in magnitude again consistent with
Model 5 presents the results for the fixed effects specification using the sample for the 7-
consecutive-year lenders that controls for neighborhood-lender heterogeneity by pseudo first
differencing. After controlling for tract-lender fixed effects, the coefficient on tract application
volume increases dramatically and is consistent with a one standard deviation increase in tract
application volume raising the likelihood of approval by 1.4 percentage points, which differs
statistically from the estimate in the lender fixed effects model.20 The coefficient on lender
specific application volume is relatively stable and does not differ statistically from the lender
fixed effects estimate. Model 6 presents the estimation using a more restrictive sample of
consistent lenders where lenders must be present in the state for at least eight consecutive years
during the sample period. The results are quite similar to those in Model 5.
20 Seemingly unrelated regression was used to estimate the correlation between the lender fixed effect and tract-
lender fixed effect estimates and test for statistical differences between them.
The tract application volume estimates differ statistically from those in the lender fixed
effects model, and so we can reject cross-sectional style models that do not control for tract fixed
effects. Somewhat surprisingly, the bias from omitted neighborhood fixed effects is negative,
suggesting that high application volume tracts have worse unobservables. However, this finding
is consistent with Avery et al.’s interpretation of their negative estimate of tract application
volume. They suggested that tract unobservables that contribute to high denial rates might also
increase application volume from the repeated submission of weak mortgage applications, thus
creating a negative correlation between volume and mortgage approval rates. Our model
removes those unobservables and again finds a positive effect of tract volume.
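The direction of the bias described above can be illustrated with a small simulation. This is a minimal sketch and not the authors' estimation: it uses ordinary within-tract demeaning as a stand-in for the paper's pseudo first differencing, and the sample sizes, the assumed true effect of 0.5, and the strength of the link between tract unobservables and volume are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tracts, n_years = 2000, 8
true_effect = 0.5  # hypothetical causal effect of volume on the approval index

# Tract unobservable: tracts with worse unobservables attract MORE applications
# (for example, repeated submission of weak applications), so volume and the
# unobservable are negatively related. This link is an assumption.
tract_quality = rng.normal(size=n_tracts)
volume = -1.0 * tract_quality[:, None] + rng.normal(size=(n_tracts, n_years))
approval = (true_effect * volume
            + 2.0 * tract_quality[:, None]
            + rng.normal(size=(n_tracts, n_years)))


def slope(x, y):
    """OLS slope of y on x (with an intercept), computed from deviations."""
    x = x.ravel() - x.mean()
    y = y.ravel() - y.mean()
    return float(x @ y / (x @ x))


# Cross-sectional style estimate: ignores the tract fixed effect.
pooled = slope(volume, approval)

# Within (fixed effects) estimate: demean each tract over time first.
within = slope(volume - volume.mean(axis=1, keepdims=True),
               approval - approval.mean(axis=1, keepdims=True))

print(f"pooled OLS: {pooled:.3f}   tract fixed effects: {within:.3f}")
```

Under these assumptions the pooled slope is negative while the within estimate lies close to the assumed positive effect, which is the same kind of sign reversal the text attributes to omitted tract unobservables.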
Table 3 examines the results of our pseudo first differenced models where we instrument
with lagged application volumes to address concerns about the endogeneity of application
volume. Panel A presents endogeneity tests for our preferred model that uses four period and
longer lags as instruments.21 The overall Hansen’s J-statistic is 14.166, which is highly
significant and confirms the hypothesis that the pseudo first differenced application volumes are
correlated with the cohort mean errors. Further, we also reject the exogeneity of tract or lender
specific tract application volume individually after controlling for the other application volume
Panel B presents the estimation results for a variety of instrumental variable
specifications. We begin with a model where the closest lag is two periods plus we include all
available lags up to a five period lag. Following Nickell (1981), the two period lag is chosen
because the unobservable in the pseudo-first differenced model contains information from the
current and the previous periods. Then, in each column that follows, we drop the closest lag in
order to estimate a model that is robust to persistence in the errors for up to one additional
period. Models 2 and 3, which use differenced loan volumes lagged 2 to 5 and 3 to 5 periods as
instruments, respectively, fail both the over-identification and closest lag exogeneity tests,
implying that the 2 and 3 period lags do not provide legitimate, exogenous instruments for
current loan volume. Model 4, which uses differenced volumes lagged 4 and 5 periods as
instruments, passes both the over-identification and closest lag exogeneity tests, suggesting that lags
of four years or longer provide legitimate instruments. Model 5, instrumented with the
differenced volumes lagged only 5 periods, is just identified and fails the weak identification test,
implying that the 5 period lag alone does not provide enough information for identification. We
repeat this exercise using a sample of lenders that appear for 8 or more consecutive years, first
using the same set of lags and then using the extra year to add an additional lag. The results of
the specification tests are fairly similar, except that the over-identification and closest lag
exogeneity tests reject or come close to rejecting when the 4th, 5th and 6th period lags are used as
instruments.
21 Later tests will show that lags 4 and 5 are relevant and valid instruments for current pseudo first differenced
application volumes.
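The logic of an over-identification test can be sketched in a few lines. The example below is an illustration under stated assumptions, not a reproduction of the paper's tests: it computes the Sargan statistic, the homoskedastic analogue of the Hansen J statistic, on simulated data with one endogenous regressor and two instruments. The function name `two_sls_sargan`, the instruments `z_near` and `z_far`, and all parameter values are hypothetical.

```python
import numpy as np


def two_sls_sargan(y, x, Z):
    """2SLS of y on one endogenous regressor x with instrument matrix Z.

    Returns (beta_hat, sargan_stat). With valid instruments the statistic is
    approximately chi-square with (number of instruments - 1) degrees of freedom.
    """
    # Demean everything so that no intercept is needed.
    y = y - y.mean()
    x = x - x.mean()
    Z = Z - Z.mean(axis=0)
    # First stage: project x on the instruments.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    beta = float(x_hat @ y / (x_hat @ x))
    resid = y - beta * x
    # Sargan statistic: n * R^2 from regressing 2SLS residuals on instruments.
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    stat = len(y) * float(fitted @ fitted / (resid @ resid))
    return beta, stat


rng = np.random.default_rng(1)
n = 50_000
z_far, z_near = rng.normal(size=n), rng.normal(size=n)
shock = rng.normal(size=n)                        # unobserved shock
x = z_far + z_near + shock + rng.normal(size=n)   # endogenous regressor
Z = np.column_stack([z_near, z_far])

# Case 1: both instruments are unrelated to the error term.
beta_valid, stat_valid = two_sls_sargan(0.5 * x + shock, x, Z)
# Case 2: the "near" instrument also affects the outcome directly (invalid).
beta_invalid, stat_invalid = two_sls_sargan(0.5 * x + shock + 0.3 * z_near, x, Z)

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of chi-square with 1 d.o.f.
print(f"valid:   beta={beta_valid:.3f}  Sargan={stat_valid:.2f}")
print(f"invalid: beta={beta_invalid:.3f}  Sargan={stat_invalid:.2f}")
```

When one instrument is correlated with the error, the statistic far exceeds the chi-square critical value and the coefficient moves away from the assumed value of 0.5, which is the sense in which a failed over-identification test casts doubt on the instrument set.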
In our preferred model where the 4 and 5 period lags are used as instruments, the
coefficient on tract application volume is positive, but somewhat smaller and weaker in statistical
significance than the fixed effect estimates in Table 2.22 Further, as the closest lag is moved
further back in time, the estimated coefficient on tract application volume always falls as long as
the instruments are sufficiently powerful to identify the model. This pattern is consistent with a
positive correlation between short-run shocks to neighborhood circumstances and/or lender
confidence in a neighborhood and future underwriting standards, which Heckman (1981a, b)
described as “spurious” state dependence. Even when the closest lag is four periods, we cannot
rule out the possibility that the effect of these shocks continues to persist, especially given the
failure of the overidentification test for the model that uses the 4th, 5th, and 6th lag as instruments.
22 The estimates increase substantially when the two period lag is included as an instrument. This increase relative to
the simple fixed effect estimates may arise due to measurement error associated with the differenced loan volume.
Instrumental variables estimation with a very short lag likely eliminates the measurement error and the associated
attenuation bias without eliminating the endogeneity bias.
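The way persistent shocks contaminate short lags can be illustrated with a stylized simulation. This is a sketch under assumptions, not the paper's panel model: it uses a single long time series, an error in which each shock lingers for two further periods, and a persistent regressor that responds to the current shock. The true effect of 1.0 and every other parameter are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
true_effect = 1.0  # hypothetical

eta = rng.normal(size=n)  # neighborhood shock
w = rng.normal(size=n)    # variation in volume unrelated to the shock

# Error is MA(2): a shock lingers for two further periods.
e = eta.copy()
e[1:] += 0.6 * eta[:-1]
e[2:] += 0.3 * eta[:-2]

# Volume responds to the current shock and is itself persistent (AR(1)).
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + w[t] + 0.5 * eta[t]

y = true_effect * x + e


def iv_with_lag(lag):
    """Just-identified IV estimate using x lagged `lag` periods as instrument."""
    z, xs, ys = x[:-lag], x[lag:], y[lag:]
    z = z - z.mean()
    return float(z @ ys / (z @ xs))


estimates = {lag: iv_with_lag(lag) for lag in (1, 2, 3, 4)}
for lag, est in estimates.items():
    print(f"closest lag {lag}: estimate {est:.3f}")
```

In this setup the estimate falls toward the assumed true effect as the closest lag is pushed back and stabilizes once the lag exceeds the persistence of the shock, mirroring the pattern of declining tract volume coefficients described in the text.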
While we cannot rule out information externalities of the magnitude identified in previous
papers, our findings suggest that short lags of the type used in previous analyses do not provide
convincing evidence of the existence of information externalities, and our evidence on the existence
of these externalities is weak at best.
On the other hand, our models where the 4th period lag is the closest lag provide strong
evidence of lender economies of scale at the census tract level. Our instrumental variable
estimates imply that a one standard deviation increase in lender specific application volume
increases the likelihood of loan approval by between 4.3 and 5.2 percentage points, about double
the traditional OLS estimates presented in Table 2. In addition, the estimates increase in
magnitude as we increase the length of the closest lag. Therefore, even if the use of a four period
lag has not completely eliminated bias from persistent neighborhood shocks, the resulting
estimates are likely biased downwards by this persistence.
In summary, our estimates clearly indicate that cross-sectional models are biased by the
omission of neighborhood fixed effects, but that simple fixed effects are not feasible in these
models because neither the individual loan volume nor the tract loan volume is strictly
exogenous. Further, models that use short lags of loan volumes as is standard in this literature are
also biased due to persistence in time-varying neighborhood unobservables. We are able to
minimize this bias by using longer lags than previous analyses. We find that the estimates of
bank scale economies increase in magnitude as this bias is reduced, providing compelling
evidence of lender scale economies at the neighborhood level. However, our estimates of
neighborhood information externalities weaken as lags are lengthened. While we cannot rule out
the existence of such externalities, our analysis suggests that the previous evidence should be
viewed with some skepticism and even our estimates, which are based on differences and
substantially longer lags than previous analyses, may be subject to bias.
Incorporating Non-Depository Lenders and Refinance Applications
In this section, we examine whether our estimates are robust to alternative measures of
application volume. For example, depository lenders may obtain information from the activities
of non-depository mortgage banks, as well as from their own and other lenders’ experiences with
mortgage refinances. In order to allow for these possibilities, four alternative specifications are
estimated to explain the approval decision concerning home purchase mortgage applications by
our restricted sample of lenders. Regarding the non-depository lenders, we first estimate a model
that includes home purchase mortgage applications submitted to those lenders in the calculation of
the overall tract loan volume, and in the second specification we add non-depository application
volume as a new variable. Similarly, in the third specification, application volumes are calculated
as the sum of both home purchase and refinance applications to depository lenders, and in the
fourth specification, we add separate controls for tract and lender specific refinance application
volume.
Table 4 presents the estimates for the pseudo-first differenced instrumental variables
models using these new controls. Panel A contains the results for the variables that include
information on non-depository lenders. From left to right, the first half of the panel contains the
first specification where non-depository lenders are included in tract volume. The results are
qualitatively similar with no change in the estimated coefficient of lender application volume
after the addition of non-depository application volume, but the tract application volume estimate
is somewhat smaller with an estimated effect size reduced to a 0.7 percentage point increase in
application acceptance rates for a one standard deviation increase in all application volume23.
The second half of the panel contains the model with a separate control for non-depository
application volume, but those models fail the weak instrument test with the 4 period lag and we
cannot statistically separate out depository from non-depository application volume.
23 The standard deviation is 1309.97 for all mortgage applications including those from non-depository lenders
for the sample of consistent lenders.
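The effect sizes quoted in this section follow from simple arithmetic in a linear probability model: the coefficient times one standard deviation of the volume measure, expressed in percentage points. The sketch below shows that conversion, assuming coefficients are printed in units of 10^-4 as in the table headers. The helper name is hypothetical, and the coefficient used is a made-up value for illustration rather than a reported estimate; only the standard deviation is taken from footnote 23.

```python
def effect_in_percentage_points(coef_in_1e4_units, std_dev):
    """Change in approval probability (percentage points) for a one-SD rise.

    coef_in_1e4_units: coefficient as printed when the table unit is 10^-4.
    std_dev: standard deviation of the application volume measure.
    """
    return coef_in_1e4_units * 1e-4 * std_dev * 100.0


sd_all_applications = 1309.97  # standard deviation reported in footnote 23
hypothetical_coef = 0.05       # NOT a reported estimate; illustration only

effect = effect_in_percentage_points(hypothetical_coef, sd_all_applications)
print(f"{effect:.2f} percentage points")
```

With these inputs the implied effect is roughly two thirds of a percentage point, which shows how a seemingly tiny printed coefficient can still correspond to an effect of the order discussed in the text.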
The second panel presents the results using refinance application volume. The first half
re-estimates the standard model simply including refinance applications in the volume, and again
the results are qualitatively similar, but the effect size of tract application volume increases to 3
percentage points compared to 1.2 in the preferred specification, while the effect of bank specific
application volume falls from 5.2 to 3.9 percentage points. The second half presents the results where the
refinance application volume is split out separately. The refinance application volume variables
are insignificant and small, and the magnitudes of the estimates for the tract level and lender
specific application volumes for home purchase loans return to the levels observed in Table 3,
suggesting that our findings are driven by the influence of home purchase mortgage activity.
4. Summary and Conclusion
This paper tests the hypothesis proposed by Lang and Nakamura (1990) that mortgage
underwriting is a dynamic information gathering process as more applications generate more
neighborhood information and then induce lenders to approve those applications with higher
probabilities. To identify this true state dependence, our analysis controls for the spurious state
dependence stemming from neighborhood-lender fixed effects which are likely correlated with
market activity and underwriting standards, as well as the persistence in the time-varying
neighborhood unobservables using information from lagged market activity. Specifically, we
estimate models of the loan approval decision for home purchase mortgages in Florida as a
function of both census tract application volume and the application volume to the lender
itself in that census tract, controlling for census tract fixed effects and instrumenting for
current application volume with lagged values of application volume.
While we cannot rule out information externalities of the magnitude identified in
previous studies, our specification tests imply the existence of tract fixed effects that likely bias
existing cross-sectional estimates of information externalities, as well as the existence of
persistence in short-run shocks to neighborhoods that suggests that simple one or two period lags
of application volume are not sufficient to assure exogeneity even after removing tract fixed
effects. Further, our estimated effects for tract volume are only weakly significant, and even with
a four period lag the estimates may still be vulnerable to bias from persistent shocks. On the
other hand, our estimates strongly support the existence of economies of scale in lending at the
neighborhood level as proposed by Avery et al., and our preferred estimates are about double
traditional OLS estimates of these effects. All results are robust to considering the applications
associated with non-depository lenders or refinance mortgage applications.
References
1. J.G. Altonji and L.M. Segal, Small-Sample Bias in GMM Estimation of Covariance
Structures, Journal of Business and Economic Statistics, 14, 353-366 (1996).
2. M. Arellano and B. Honore, Panel data models: Some recent developments in the Handbook
of Econometrics Vol 5 (Eds. J Heckman and E. Leamer). Amsterdam: North-Holland (2001).
3. R.B. Avery, P.E. Beeson, and M.S. Sniderman, Neighborhood information and home
mortgage lending, Journal of Urban Economics, 45, 287-310 (1999).
4. R.B. Avery and K.A. Samolyk, Bank Consolidation and Small Business Lending: The Role of
Community Banks, Journal of Financial Services Research, 25, 291-325, (2004).
5. M. Bertrand, E. Duflo, and S. Mullainathan, How Much Should We Trust Differences-in-
Differences Estimators? Quarterly Journal of Economics, 119, 249-275, (2004).
6. M. Blackburn and T. Vermilyea, The Role of Information Externalities and Scale Economies
in Home Mortgage Lending Decisions, Journal of Urban Economics, 61, 71-85, (2007).
7. P.S. Calem, Mortgage credit availability in low- and moderate-income minority
neighborhoods: are information externalities critical? Journal of Real Estate Finance and
Economics, 13, 71-89, (1996).
8. J.H. Carr and J. Schuetz, Financial Services in Distressed Communities: Framing the Issue,
Finding Solutions. Washington, D.C.: Fannie Mae Foundation, (2001).
9. K.C. Engel and P.A. McCoy, The CRA Implications of Predatory Lending. Fordham Urban
Law Journal, 29, 1571-1606, (2002).
10. W.C. Gruben, J.A. Neuberger, and R.H. Schmidt, Imperfect information and the Community
Reinvestment Act, Federal Reserve Bank of San Francisco Economic Review, Summer, 3, 27-46, (1990).
11. D.M. Harrison, The Importance of Lender Heterogeneity in Mortgage Lending, Journal of Urban
Economics, 49, 285-309, (2001).
12. J.J. Heckman, Statistical Models for Discrete Panel Data in Structural Analysis of Discrete Data
with Econometric Applications (Eds. C.F. Manski and D. McFadden). Cambridge: MIT Press (1981a).
13. J.J. Heckman, Heterogeneity and State Dependence in Studies in Labor Markets (Ed. S. Rosen).
Chicago: University of Chicago Press (1981b).
14. D. Holtz-Eakin, W. Newey, and H. Rosen, Estimating Vector Autoregressions with Panel
Data, Econometrica, 56, 1371-1396, (1988).
15. G. Kezdi, Robust Standard Error Estimation in Fixed Effect Panel Models, Budapest University of
Economics working paper, (2003).
16. W.W. Lang and L.I. Nakamura, A Model of Redlining, Journal of Urban Economics, 33, 223-234, (1993).
17. W.W. Lang and L.I. Nakamura, The Dynamics of Credit Markets in a Model with Learning, Journal
of Monetary Economics, 26, 305-18, (1990).
18. E. Lin, Information, Neighborhood Characteristics, and Home Mortgage Lending, Journal of Urban
Economics, 49, 337-355 (2001).
19. D.C. Ling and S.M. Wachter, Information Externalities and Home Mortgage Underwriting, Journal
of Urban Economics, 44, 317-332, (1998).
20. S. Nickell, Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417-1426, (1981).
21. S.L. Ross and G.M.B. Tootell, Redlining, the community reinvestment act, and private
mortgage insurance, Journal of Urban Economics, 55, 278-297, (2004).
22. S.L. Ross and J. Yinger. The Color of Credit: Mortgage Discrimination, Research Methodology, and
Fair Lending Enforcement. Cambridge, MA: MIT Press, (2002).
23. G.M.B. Tootell, Redlining in Boston: Do mortgage lenders discriminate against
neighborhoods? Quarterly Journal of Economics, 111, 1049-1079 (1996).
Table 1: Descriptive Statistics
Mean Variable definition S.D. S.D.
Whether the loan application is approved
Whether the property is owner-occupied as a principal dwelling
Applicant’s income (thousands of dollars)
Whether it is a conventional loan
Whether it is an FHA-insured loan
Whether it is a VA-guaranteed loan
Whether it is an application by a single female applicant
Whether it is an application by a couple
Whether it is an application by an applicant with unspecified sex
Whether it is an application by a black applicant
Whether it is an application by a Hispanic applicant
Whether it is an application by other minority applicants
Whether it is an application by an applicant without specified race .09
Census tract characteristic variables
Minority Percentage of minority population
Age55 Percentage of population age 55 and over
Highsch Percentage of population over age 25 with a high school education .56
Unemploy Percentage of workforce over age 16 that is unemployed
Medinc Median household income (thousands of dollars)
Pubass Percentage of households receiving public assistance income
Units Housing units
Ownerocc Housing units that are owner occupied
Vacant Housing units that are vacant
Yrbuilt Median year structure built
Medrent Median gross rent
Medvalue Median house value (thousands of dollars)
# of Obs. Number of observations
Application volume by all depository lenders in a census tract
Lender specific application volume in a census tract
Note: The full sample includes all 1-to-4 home purchase mortgage applications submitted to depository lenders for
properties located in the state of Florida, between 1992 and 2007, reported in Home Mortgage Disclosure Act
(HMDA) data collected by the Federal Financial Institutions Examination Council (FFIEC). The number of original
HMDA loan applications is 25,237,001, the restriction of 1-to-4 home purchase reduces it to 9,404,453, being
submitted to a depository lender and requiring a well defined underwriting outcome (approved or denied) further
reduces it to 4,657,103, and excluding those loans with an incorrect location identifier leads to a final sample of
4,340,368. The restricted sample consists of those loans submitted to 2,259 depository lenders which had been
operating consecutively over seven years in Florida. The census tract characteristic variables are obtained from the
1990 Census of Population and Housing.
Table 2: OLS and Fixed Effect (FE) estimations explaining the probabilities of loan approval for loans submitted to consistent lenders.
OLS W/O Lender Dummies
OLS With Lender Dummies
Fixed Effects (FE) Model
model 4 and 5
coef se coef se coef se coef se coef se coef se
Information Variables (Unit: 10^-4)
T_vol .0930** .0433 -.0911**
.0593** .0278 -.0361
.0238 t-stat = 5.309 .162***
B_vol N/A .933 N/A .625 .505 t-stat = .5083 .501
Income .0000 .0000 .0000 .0000 .0000 .0000
.0007 .0007 .0010 .0010 .0013 .0015
Lntype1 .0060 .0059 .0055 .0055 .0059 .0092 .0069
Lntype2 .0059 .0058 .0056 .0056 .0094
.0061 -.0003 .0071
.0060 .0059 .0056 .0055 .0059 .0086
Female1 .0009 .0009 .0008 .0008 .0008 .0009
Couple .0009 .0009 .0008 .0008 .0008 .0008
Othsex .0018 .0018 .0015 .0015 .0015 .0016
.0021 .0021 .0017 .0017 .0016 .0017
.0023 .0021 .0018 .0018 .0016 .0016
Othrace .0020 .0020 .0018 .0018 .0019 .0019
Norace .0017 .0017 .0015 .0015 .0014 .0015
Tract Attributes Yes Yes Yes Yes No No
Year Dummies Yes Yes Yes Yes Yes Yes
Lender Dummies No No Yes Yes Yes Yes
Cluster Correction Yes Yes Yes Yes Yes Yes
# of observations 2,908,705 2,908,705 2,908,705 2,908,705 2,782,916 2,527,793
Adj. R-square .0399 .0420 .1131 .1136 .0107 .0106
Note: This table reports results for four OLS specifications and two neighborhood FE specifications explaining the probabilities of loan approval for loans submitted to consistent
lenders, using mortgage applications at the tract level to proxy for information. The dependent variable equals one if a mortgage application was explicitly approved and zero if
explicitly denied. Under the specifications of Models 1 and 2, underwriting policies are treated as homogeneous across lenders, while Models 3 and 4 relax this restriction, allowing for
lender heterogeneity by including lender fixed effects. Model 5 is based on a pseudo first differenced sample of lenders that provided services in Florida for
seven consecutive years between 1992 and 2007, and Model 6 reports the results based on a more restricted pseudo first differenced sample of lenders with at least eight consecutive years of service
during the sample period. Seemingly unrelated regression was run using a sample of pseudo first differenced mortgage applications to estimate the correlation between the volume
estimates for the lender fixed effects and the tract-lender fixed effects models and to test for statistical differences between those estimates. *** Significant at the 99% level. **
Significant at the 95% level. * Significant at the 90% level.
Table 3: Fixed Effect (FE) with and without Instrumental Variables (IV) estimations explaining the probabilities of loan approval for loans submitted
to 7-consecutive-year and 8-consecutive-year lenders, respectively. Unit: 10^-4.
Panel A: Exogeneity Test
Lags 4 and 5 as instruments Lags 4 and 5 as instruments
J-stat P-val J-stat P-val J-stat
Lags 4, 5 and 6 as instruments
P-val P-val J-stat P-val J-stat J-stat P-val
14.745 0.0001 16.597 0.0000
Panel B: FE estimations w/ and w/o instrumenting
Mod 2 Mod 3
Mod 9 Mod 10 Mod 1 Mod 4 Mod 5 Mod 6 Mod 7 Mod 8 Mod 11 Mod 13 Mod 14 Mod 15
Lag 5 N/A
Yes Yes Yes
Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
2,782,916 2,075,923 2,527,793 2,025,829 1,860,347
Note: This table reports results for fifteen FE specifications with and without instrumenting for loans submitted to consistent lenders, using mortgage
applications at the tract level to proxy for information. Ivreg2.ado was used to run the regressions using instruments and to perform the statistical tests. Columns identified by
7-consecutive-year lender report the results using a sample of applications submitted to lenders that provided services in Florida for seven consecutive years between 1992
and 2007, while columns identified by 8-consecutive-year lender report the results using a sample of more consistent lenders, those that provided services in Florida for at least 8
consecutive years during the sample period. Panel A reports the results of exogeneity tests using our preferred specification with the fourth and fifth lags as instruments.
Column 1 presents the Hansen J statistic and column 2 gives the p-value for rejecting the null hypothesis that the tested variables are exogenous. Column 3 reports the results of the
exogeneity test for one variable when the other is treated as endogenous, and column 4 gives the p-values for rejecting the null hypotheses. In Panel B, weak identification tests are based
on the Kleibergen-Paap rk Wald F statistics, and the critical values are from Tables 5.1 and 5.2 in Stock and Yogo (2005). Over-identification tests are based on the Hansen J statistics,
while the closest lag exogeneity test uses the Eichenbaum, Hansen, and Singleton statistics. *** Significant at the 99% level. ** Significant at the 95% level. * Significant at the
90% level.
Table 4: Estimations for robustness with loans submitted by non-depository lenders or refinance loans submitted by depository lenders included.
Panel A: Estimations with loans submitted by non-depository lenders included
Tract Volume Changed
Variable Model 1 Model 2 Model 3 Model 4 Model 5
Instruments N/A Lags 2 to 5 Lags 3 to 5 Lags 4 to 5 Lag 5
(.0302) (.0701) (.0351) (.0287) (.0890)
(.462) (1.349) (1.731) (2.653) (7.569)
Non-depository Volume Added
Model 6 Model 7
N/A Lags 2 to 5
Lags 3 to 5
Lags 4 to 5
Year Fixed Effects Yes Yes
Year Fixed Effects
Weak Identification Test N/A
Weak Identification Test N/A
Over identification Test N/A N/A Over identification Test N/A N/A
# of Observations
Panel B: Estimations with refinance loans submitted by depository lenders included
Variable Model 1 Model 2
Instruments N/A Lags 2 to 5
2,075,923 # of Observations 2,075,923
Refinance Volumes Added
Model 6 Model 7
N/A Lags 2 to 5
Lags 3 to 5
Lags 4 to 5
Lags 3 to 5
Lags 4 to 5
Year Fixed Effects Yes Yes
Year Fixed Effects
Weak Identification Test N/A
Over identification Test N/A
N/A Over identification Test N/A N/A
Closest Lag Exogeneity
# of Observations
Closest Lag Exogeneity
# of Observations
2,075,923 1,037,962/Half Randomly Chosen Data
Note: This table reports results for ten FE specifications with and without instrumenting using applications submitted to non-depository lenders as well as refinance applications to the
consistent lenders. In the first models in Panel A, tract application volume includes applications submitted to non-depository lenders, while the second models add this
volume as a separate variable. Similarly, in Panel B, the first models use tract and lender specific application volume including refinance applications, while the second models
include refinance volumes as new variables. *** Significant at the 99% level. ** Significant at the 95% level. * Significant at the 90% level.
24 Stock and Yogo (2005) do not compute the critical values for weak identification tests with four endogenous variables. However, a close examination of Tables 5.1 and 5.2 in
Stock and Yogo (2005) shows that critical values are very stable as the number of endogenous variables increases, given a constant ratio of the number of endogenous regressors
to the number of instrumental variables. Therefore, with the large values for models 7 to 9 and the relatively small value for model 10, the associated p-value thresholds are clear.