Science Set Journal of Economics Research
www.mkscienceset.com Sci Set J of Economics Res 2024
Review Article
Unraveling the Complexity: The Nexus Between Homelessness and Housing
Prices in the San Francisco Bay Area and Throughout State of California (A
Comprehensive Research Study)
Nathan Torento1, Antony Scott1, Samruddhi Mistry1, and Bahman Zohuri1,2*
1Graduate Students, Golden Gate University, Ageno School of Business, San Francisco, California, USA 94105
2Adjunct Professor and Project Advisor Professor, Golden Gate University, Ageno School of Business, San Francisco, California, USA 94105.
*Corresponding author: Bahman Zohuri, Adjunct Professor and Project Advisor Professor, Golden Gate University, Ageno School of
Business, San Francisco, California, USA 94105
Submitted: 05 January 2024 Accepted: 09 January 2024 Published: 12 January 2024
Citation: Nathan Torento, Antony Scott, Samruddhi Mistry, Bahman Zohuri (2024) Unraveling the Complexity: The Nexus Between
Homelessness and Housing Prices in the San Francisco Bay Area and Throughout State of California. Sci Set J of Economics Res 3(1), 01-11.
Abstract
This research investigates the intricate relationship between homelessness and housing prices, unraveling the complexities prevalent in both the localized context of the San Francisco Bay Area and the broader scope of the entire state of California. Employing a multi-faceted approach, the study combines traditional research methodologies with advanced Artificial Intelligence (AI) and Machine Learning (ML) tools. Through comprehensive data analysis, the research explores the dynamics of homelessness and housing prices, identifying patterns and trends at both the regional and statewide levels. The study incorporates predictive modeling to discern future trends, spatial analytics to understand geographic variations, and natural language processing to gauge public sentiment.
By evaluating the impact of policies on homelessness rates, the research seeks to provide actionable insights for policymakers. Ethical considerations guide the implementation of AI, ensuring the responsible handling of sensitive data. Stakeholder collaboration and community engagement are integral components of this research, fostering a nuanced understanding of the challenges faced by diverse communities across the state. The research concludes with evidence-based policy recommendations, aiming to inform interventions that address the interconnected issues of homelessness and housing affordability in the San Francisco Bay Area and throughout the state of California.
Keywords: Homelessness, Housing Prices, San Francisco Bay Area, California, Affordable Housing, Socioeconomic Diversity, Artificial Intelligence, Machine Learning, Gentrification, Policy Interventions.
Introduction
This research delves into the intricate relationship between homelessness and housing prices in the state of California, employing cutting-edge Artificial Intelligence (AI) and Machine Learning (ML) tools for an exhaustive analysis. The study aggregates diverse datasets from regions across California, utilizing predictive modeling to discern statewide trends and variations. Through advanced spatial analytics and natural language processing techniques, the research aims to uncover nuanced patterns in homelessness clusters, migration dynamics, and public sentiments across communities. Evaluating the impact of statewide policies on homelessness rates through causal inference models, the study identifies vulnerabilities and populations at risk on a large scale.
Ethical considerations guide the implementation of AI, ensuring privacy and transparency. Stakeholder collaboration and community engagement play a pivotal role, contributing to a holistic understanding of challenges faced by different communities. The research concludes by translating findings into evidence-based policy recommendations at the state level, offering a roadmap for inclusive and equitable interventions to address the complex nexus of homelessness and housing affordability across California's diverse landscapes.
In the heart of the technological revolution and cultural diversity, the San Francisco Bay Area stands as a testament to the complexities that arise when rapid urban development collides with socioeconomic diversity. One of the most pressing issues gripping the region is the intricate relationship between homelessness and soaring housing prices. Public perception in San Francisco paints a vivid picture of a worsening homelessness crisis, exacerbating the challenges posed by an ever-expanding housing market.
The relationship between homelessness and housing prices in urban areas is a complex and multifaceted issue, particularly in rapidly changing and socioeconomically diverse regions like the San Francisco Bay Area. Public perception in San Francisco suggests a worsening homelessness crisis alongside soaring housing prices. Conversely, some hold that rising homelessness negatively impacts housing prices. These conflicting opinions emphasize the need for a data-driven approach to understanding the true nature of this relationship.
The Overall Homelessness Crisis
The homelessness crisis in San Francisco is undeniable, with visible tent encampments dotting the cityscape and individuals seeking shelter in public spaces. Despite concerted efforts by local governments and nonprofits, the numbers continue to rise, sparking concerns about the adequacy of existing strategies.
One critical factor contributing to the homelessness crisis is the scarcity of affordable housing. As housing prices surge, the most vulnerable members of the community find themselves marginalized, unable to secure stable living arrangements. The mismatch between income levels and housing costs has left many individuals and families on the brink of homelessness, highlighting the urgency of addressing the affordability gap.
Soaring Housing Prices
The San Francisco Bay Area, often hailed as an economic powerhouse, has experienced a relentless surge in housing prices. The demand for housing, fueled by the tech industry's boom, has led to a shortage of affordable homes, pushing prices to unprecedented levels. The region's desirability has attracted a flood of high-income earners, driving up the cost of living and making it increasingly difficult for lower-income residents to find suitable accommodation.

Gentrication, a byproduct of rising housing prices, plays a piv-
otal role in reshaping urban landscapes. As auent individuals
move into once-aordable neighborhoods, property values rise,
and long-time residents nd themselves economically displaced.
Gentrication not only contributes to the homelessness crisis but
also underscores the need for comprehensive urban development
policies that prioritize inclusivity and aordability.
Policy Challenges
Addressing the homelessness-housing price conundrum requires a multifaceted approach. Policymakers must grapple with the challenge of balancing economic growth with social equity. Strategies to increase affordable housing options, protect vulnerable populations, and mitigate the impacts of gentrification are essential.
Innovative Solutions
To combat homelessness and mitigate the effects of soaring housing prices, the San Francisco Bay Area must embrace innovative solutions. This includes the development of affordable housing projects, the expansion of supportive services, and the implementation of policies that protect vulnerable populations from economic displacement.
In conclusion, the intricate relationship between homelessness and housing prices in the San Francisco Bay Area highlights the urgent need for comprehensive and innovative solutions. As the region grapples with the challenges of rapid urban development and socioeconomic diversity, it is imperative that stakeholders collaborate to create a more equitable and sustainable future. By addressing the root causes of homelessness and implementing thoughtful policies, the San Francisco Bay Area can strive towards a more inclusive and resilient community.
Research on homelessness and housing prices in California, and particularly in the San Francisco Bay Area, demands innovative approaches to capture the complexity of the issue. Leveraging Artificial Intelligence (AI) and Machine Learning (ML) tools can enhance the depth and accuracy of research findings. We therefore used AI and ML tools, as presented in Section 7.0 of this paper.
The following steps serve as a guide on how to incorporate these technologies into your research:
1. Clearly outline your research objectives. Identify specific questions or challenges within the realm of homelessness and housing prices that AI and ML can help address. For example, you might want to predict future homelessness trends based on housing market fluctuations or analyze the impact of specific policies on homelessness rates.
2. Data Collection: Acquire relevant datasets that encompass a wide range of variables, including housing prices, demographic information, social services availability, and historical homelessness trends. Publicly available datasets, government records, and nonprofit databases can be valuable sources. Ensure the data is cleaned and standardized for effective analysis.
3. Implement Predictive Analytics: Utilize machine learning algorithms for predictive analytics. Regression models can help identify patterns and relationships between housing prices and homelessness rates over time. Predictive models can forecast future trends, allowing for proactive policy measures.
4. Spatial Analysis: Leverage AI to perform spatial analysis. GIS (Geographic Information System) tools, combined with machine learning algorithms, can help identify geographic patterns in homelessness distribution and correlate them with housing price fluctuations. This can provide insights into localized factors contributing to homelessness.
5. Natural Language Processing (NLP): Implement NLP techniques to analyze qualitative data. Extract valuable information from news articles, public forums, or social media related to homelessness and housing prices. This can provide a real-time understanding of public sentiment and identify emerging issues.
6. Identify Vulnerable Populations: Use clustering algorithms to identify vulnerable populations at a higher risk of homelessness due to housing market dynamics. By analyzing demographic data, socioeconomic indicators, and housing trends, AI can help target interventions more effectively.
7. Evaluate Policy Impact: Assess the impact of existing policies or proposed interventions using causal inference models. Machine learning algorithms can help establish causal relationships between policy changes and changes in homelessness rates, aiding policymakers in making informed decisions.
8. Ethical Considerations: Pay careful attention to ethical considerations, especially when dealing with sensitive data related to homelessness. Ensure data privacy and security and be transparent about the limitations and biases inherent in the AI and ML models.
9. Collaborate with Stakeholders: Collaborate with local authorities, nonprofits, and community organizations to validate findings and ensure the research aligns with real-world challenges. Engaging stakeholders can provide valuable context and help translate research findings into actionable policies.
10. Clearly communicate your findings to both academic and non-academic audiences. Visualization tools, such as charts and maps generated through AI-driven analytics, can enhance the accessibility of your research.
By integrating AI and ML tools into your research on homelessness and housing prices in the San Francisco Bay Area, you can uncover nuanced insights and contribute to the development of more effective and targeted solutions to address this pressing societal issue.
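
Step 3 of the guide mentions regression models relating housing prices to homelessness over time. As a minimal sketch of that idea, the snippet below fits a one-variable ordinary least squares line using only the Python standard library. The function name `fit_line` and all numbers are hypothetical illustrations; they are not the study's code or data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical values for illustration only (not taken from the study's data).
median_price_musd = [1.0, 1.2, 1.4, 1.6]   # median price, millions of USD
people_served = [100, 120, 140, 160]       # people receiving services

slope, intercept = fit_line(median_price_musd, people_served)
# Extrapolate the fitted line to a price of 1.8 million USD.
predicted_at_1_8 = intercept + slope * 1.8
```

A fitted slope only describes association in the sample; extrapolating it into the future assumes the relationship stays stable, which should be checked against held-out years.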
As we extend our focus beyond the iconic San Francisco Bay Area, the intricate relationship between homelessness and housing prices becomes an even more pressing concern for the entire state of California. With diverse urban landscapes, economic disparities, and a housing market that resonates with the challenges of the tech-driven economy, leveraging Artificial Intelligence (AI) and Machine Learning (ML) tools becomes paramount for a comprehensive understanding of this complex issue.
1. Statewide Data Aggregation: Begin by aggregating comprehensive datasets from various regions across California. This should include housing price indices, demographic information, employment data, and existing homelessness statistics. AI-powered data aggregation tools can streamline this process, ensuring that the research captures the nuances of each distinct locality.
2. Predictive Modeling for Statewide Trends: Implement predictive modeling to discern statewide trends in homelessness and housing prices. Machine learning algorithms can analyze historical data to predict future patterns, offering invaluable insights for policymakers in planning region-specific interventions.
3. Comparative Analysis Between Regions: Leverage AI and ML tools to conduct a comparative analysis of different regions within California. Explore variations in housing market dynamics, demographic profiles, and socioeconomic factors to identify unique challenges and opportunities in addressing homelessness.
4. Advanced Spatial Analytics: Utilize advanced GIS tools and machine learning algorithms to conduct spatial analysis on a larger scale. This could involve mapping out clusters of homelessness, understanding migration patterns, and identifying correlations between housing price trends and vulnerable population distribution across the state.
5. Sentiment Analysis Across Communities: Extend natural language processing techniques to analyze sentiments across diverse communities in California. Extract insights from social media, community forums, and local news sources to understand public perceptions and concerns related to homelessness and housing prices.
6. Policy Impact Evaluation at the State Level: Employ AI-driven causal inference models to assess the impact of statewide policies on homelessness rates. Evaluate the effectiveness of initiatives such as housing affordability programs and rent control policies, providing a comprehensive overview of their success and potential areas for improvement.
7. Apply clustering algorithms to identify statewide vulnerabilities and populations at higher risk of homelessness due to systemic factors. This can aid in the development of targeted interventions and support systems on a larger scale.
8. Ethical AI Implementation: Given the diversity and scale of California, prioritize ethical considerations in data handling, ensuring privacy and security. Transparently communicate the ethical practices employed in the AI and ML models to build trust among stakeholders.
9. Stakeholder Collaboration and Community Engagement: Collaborate with state-level agencies, nonprofits, and community organizations to ensure the research aligns with the broader Californian context. Engaging with stakeholders from diverse regions ensures a holistic understanding of the challenges faced by different communities.
10. Policy Recommendations for Statewide Impact: Translate research findings into actionable policy recommendations at the state level. AI-generated visualizations and analytical insights can contribute to evidence-based policymaking, helping shape interventions that address the interconnected issues of homelessness and housing prices statewide.
By extending our research to encompass the entire state of California, we can leverage AI and ML tools to unveil patterns, correlations, and opportunities for impactful interventions. The integration of technology into this research not only enhances its depth but also lays the foundation for a more inclusive and equitable approach to addressing homelessness and housing affordability challenges across the diverse landscapes of California.
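
Step 6 of the statewide list refers to causal inference models for policy evaluation without naming a specific one. One of the simplest such techniques is difference-in-differences, sketched below as an illustration only. It is not presented as the method used in this study, the function name and all counts are hypothetical, and the estimate is only meaningful if the two regions would have followed parallel trends without the policy.

```python
def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical counts of people receiving services (illustration only).
effect = difference_in_differences(
    treated_pre=1000, treated_post=1050,   # region that adopted a policy
    control_pre=800, control_post=900,     # comparable region that did not
)
# A negative value means the treated region grew by less than the comparison region.
```

In practice the comparison region has to be chosen carefully, since differences in service availability between CoCs can violate the parallel-trends assumption.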

The main objective of this research is to understand the connection between homelessness levels and housing prices across the Bay Area and to extend that analysis throughout the State of California. To achieve this, two primary data sources were leveraged: (1) government datasets tracking annual homelessness counts by various demographics in California from 2017 to 2022 (California Interagency Council on Homelessness, 2023), and (2) a dataset from the California Association of Realtors detailing median prices of existing single-family homes from 1990 to 2023 (California Association of Realtors, 2023). [1-3]
Step one was preparing the data for analysis. We began by aligning the datasets to the overlapping time frame of 2017 to 2022 and then integrating them based on the year and county, with a particular focus on "Continuum of Care" groupings, which we will elaborate on later. Our analysis was conducted at three levels: San Francisco, the broader Bay Area, and the entire state of California.
The Python programming language was used in each step of this study, starting with the initial exploratory data analysis to identify trends and patterns, followed by more targeted regression analysis to pinpoint demographic factors most strongly associated with homelessness and median housing prices. In the following section, we delve deeper into the methodology and findings of our study.
Data Collection and Cleaning
The following steps were taken for this class project, which was assigned to us by our instructor and advising professor.
Homelessness Dataset
In our study, we sourced our homelessness data from Data.gov, a platform managed by the U.S. government for public data dissemination. The dataset, titled "People Receiving Homeless Response Services by Age, Race, Ethnicity, and Gender," comprises four subsets corresponding to these demographic categories across California. Each subset records annual data from 2017 onwards, detailing unique individuals receiving homeless response services within different demographic subcategories.
Key to our analysis is the concept of the Continuum of Care (CoC), a federal program established by the U.S. Department of Housing and Urban Development. The CoC program aims to facilitate rapid rehousing and long-term support for the homeless (U.S. Department of Housing and Urban Development, 2023). In this paper, "CoCs" refers both to the regional planning bodies and the geographic areas they cover. Notably, in California, some CoCs encompass multiple counties. [1-3]
Each dataset row presents the year, CoC, a demographic subcategory, and the number of individuals receiving services for that particular group. The age categories are '18-24', '25-34', '35-44', '45-54', '55-64', '65+', 'Under 18', and 'Unknown'. Race categories include 'American Indian, Alaska Native, or Indigenous', 'Asian or Asian American', 'Black, African American, or African', 'Multiple Races', 'Native Hawaiian or Pacific Islander', 'Unknown', and 'White'. Ethnicity is broken down into 'Hispanic/Latinx', 'Not Hispanic/Latinx', and 'Unknown'. Gender categories comprise 'Female', 'Male', 'Non-Singular Gender', 'Questioning Gender', 'Transgender', and 'Unknown'.
Table 1: Merged Homeless Demographic Data Sample
Table 2: Housing Prices Data Sample
Figure 1: Distribution of Demographic Groups by Year for San Francisco from 2017 to 2022
Figure 2: Total Homeless Count and Mean Housing Prices in San Francisco from 2017 to 2022
Figure 3: Distribution of Demographic Groups by Year for San Francisco from 2017 to 2022
Figure 4: Relationship between Prices and Homeless by County
Figure 5: Total Homeless Count and Mean Housing Prices in California from 2017 to 2022
Figure 6: Scatterplot of Prices and Homelessness Counts by County with Los Angeles
Figure 7: Scatterplot of Prices and Homelessness Counts by County without Los Angeles
To streamline the data analysis process, we developed custom functions within our code for efficient data cleaning and merging to create merged tables like Table 1. These functions were crucial in consolidating and preparing the data from these multiple datasets into one unified format sufficient for our analysis. The quality of this preparation also bears on the reliability and validity of our subsequent findings.
Limitations
The homelessness dataset, which captures individuals who have accessed homeless response services within a Continuum of Care (CoC) in a specific year, has several inherent limitations that are crucial for understanding the broader context of our findings. Firstly, the dataset likely underrepresents the total homeless population, as it includes only those who have sought and received services. Many individuals who are homeless may not access these services due to barriers such as lack of awareness, mistrust of authorities, or logistical challenges. Additionally, there is a potential issue of data duplication, as individuals might move between different CoCs and be counted more than once, leading to inflated figures and a potentially inaccurate representation of the homelessness situation in a particular area. Furthermore, while the dataset provides demographic breakdowns, it may not fully capture the diversity and complexity of the homeless population. Groups with undocumented status or those not conforming to traditional gender categories might be underrepresented.
The dataset also has temporal constraints. Its annual nature might not effectively capture the dynamic nature of homelessness, which can fluctuate significantly within a year due to factors like seasonal employment, weather conditions, and policy changes. The variability in the availability and accessibility of homeless response services between different Continuums of Care (CoCs) can also affect the accuracy of the data, influencing who gets counted. Additionally, different CoCs are likely to employ different methodologies for data collection on the ground, leading to inconsistencies in data gathering and reporting, which can affect the comparability of data across regions. Policy and funding changes at local, state, and federal levels can influence both the availability of services and individuals' willingness to seek them out, leading to year-to-year fluctuations in the data that may not solely reflect changes in the homeless population. Moreover, the dataset may not adequately reflect long-term trends in homelessness, focusing instead on the conditions of a specific year.
Recognizing these limitations is essential for a nuanced interpretation of the data. While the dataset provides valuable insights into the demographics and scale of homelessness as captured through service provision, it should be viewed as a starting point for analysis and our initial proxy for the count of people experiencing homelessness, rather than a definitive portrayal of the homelessness situation across California.
Housing Prices Dataset
For housing prices, we rely on the California Association of Realtors, which collects and provides data on historical housing trends. On a single page, it provides the Median Prices of Existing Single-Family Homes across California from 1990 to 2023. While this dataset offers simplified and potentially significant insights into the housing market, it is crucial to acknowledge the many limitations of selecting this dataset. See Table 2 above.
Limitations
Firstly, the dataset focuses solely on single-family homes, excluding other housing types like apartments, condos, and multi-family units. This exclusion can skew our understanding of the overall housing market, especially in urban areas where diverse housing types are more common, such as houses split into multiple rooms for rent in San Francisco. Secondly, the dataset presents median prices, which, while useful for indicating central tendencies, do not reflect the range and variability of housing prices within each region. This causes us to lose out on insights about high-end and low-end market dynamics.
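
To make the point about medians concrete, the short sketch below compares two hypothetical regions whose sale prices share the same median but differ greatly in spread. All prices are invented for illustration and do not come from the California Association of Realtors data.

```python
from statistics import median

# Hypothetical sale prices (USD) for two regions, for illustration only.
region_a = [900_000, 950_000, 1_000_000, 1_050_000, 1_100_000]
region_b = [300_000, 600_000, 1_000_000, 2_500_000, 6_000_000]


def summarize(prices):
    """Return the median together with the lowest and highest price."""
    return {"median": median(prices), "low": min(prices), "high": max(prices)}


summary_a = summarize(region_a)
summary_b = summarize(region_b)
# Both medians are equal, yet the low-to-high range differs widely.
```

A dataset that reports only the median would treat these two regions as identical, which is the information loss described above.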
Another limitation is the lack of granularity in geographical terms. The dataset does not differentiate between housing prices at the neighborhood level, which can vary significantly even within the same city or county. This lack of specificity can lead to oversimplified interpretations of the housing market. Furthermore, the dataset does not account for the impact of external economic factors such as inflation, interest rates, and economic downturns, which can significantly influence housing prices. These factors are crucial for a comprehensive understanding of market dynamics but are also not directly captured in the dataset.
Finally, it is important to note that housing prices are influenced by a multitude of factors, including local policies, demographic changes, and shifts in housing demand and supply. Our analysis, therefore, requires careful consideration of these variables and an acknowledgment that the dataset serves as a limited representation of the complex housing market.
Despite these limitations, the dataset remains a valuable tool for our study, and it is the only one we found that was publicly available and had comprehensive coverage of values for all counties in California. By being mindful of these constraints, we can use the data effectively as a proxy for understanding broader housing market trends in California.
Data Wrangling
Extensive data wrangling had to be performed before even beginning to explore the data (the accompanying .html or .ipynb file for this report is beyond the scope of this paper for publication; if you are interested in obtaining a copy of it, please reach out to the authors). Each of the four homelessness datasets had to have its rows for different demographic groups split into columns. Therefore, instead of having, say, “Year: 2017, CoC: Santa Clara, Age: 18-24, Experiencing Homeless Count: n” and “Year: 2017, CoC: Santa Clara, Age: 25-34, Experiencing Homeless Count: n2”, we consolidated the rows into “Year: 2017, CoC: Santa Clara, Age: 18-24_Homeless: n, Age: 25-34_Homeless: n2”.
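
The long-to-wide reshaping described above can be sketched as follows. This is a minimal standard-library illustration, not the authors' notebook code: the function name `pivot_long_to_wide`, the field names `year`, `coc`, and `count`, and the counts themselves are all assumptions made for the example.

```python
from collections import defaultdict


def pivot_long_to_wide(rows, category_field, count_field="count"):
    """Turn one row per (year, CoC, subgroup) into one row per (year, CoC)."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["year"], row["coc"])
        column = f"{category_field}: {row[category_field]}_Homeless"
        # Sum in case a subgroup appears more than once for the same key.
        wide[key][column] = wide[key].get(column, 0) + row[count_field]
    return [
        {"year": year, "coc": coc, **columns}
        for (year, coc), columns in sorted(wide.items())
    ]


# Hypothetical counts for illustration only.
long_rows = [
    {"year": 2017, "coc": "Santa Clara", "Age": "18-24", "count": 10},
    {"year": 2017, "coc": "Santa Clara", "Age": "25-34", "count": 20},
    {"year": 2018, "coc": "Santa Clara", "Age": "18-24", "count": 12},
]
wide_rows = pivot_long_to_wide(long_rows, "Age")
```

Running the same function once per demographic subset (age, race, ethnicity, gender) yields four wide tables that share the `year` and `coc` keys and can then be combined.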
This way, we could combine all the datasets and match the rows by Year and CoC. The median housing prices dataset, in turn, was split into months going back to 1990. We combined the data by taking the mean value per county per year. We then filtered for data from 2017 to 2022 only and added the average values across the counties in the Bay Area. Finally, we combined both datasets into one final dataset containing the Year, CoC, the multiple columns for Homelessness Count by Demographic Group, and Prices.
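
The price aggregation and merge steps above can be sketched as below, again using only the standard library. The function names, field names, and values are hypothetical, and the sketch assumes each CoC name matches a single county name; CoCs that span multiple counties would need an explicit mapping step that is not shown here.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean_prices(monthly_rows, first_year=2017, last_year=2022):
    """Average monthly median prices into one value per (year, county)."""
    buckets = defaultdict(list)
    for row in monthly_rows:
        if first_year <= row["year"] <= last_year:
            buckets[(row["year"], row["county"])].append(row["price"])
    return {key: mean(prices) for key, prices in buckets.items()}


def add_region_average(yearly, counties, region_name):
    """Add a per-year average over the listed counties under region_name."""
    by_year = defaultdict(list)
    for (year, county), price in yearly.items():
        if county in counties:
            by_year[year].append(price)
    combined = dict(yearly)
    for year, prices in by_year.items():
        combined[(year, region_name)] = mean(prices)
    return combined


def merge_on_year_and_area(homeless_rows, prices):
    """Inner join: keep homelessness rows that have a matching price."""
    return [
        {**row, "price": prices[(row["year"], row["coc"])]}
        for row in homeless_rows
        if (row["year"], row["coc"]) in prices
    ]


# Hypothetical values for illustration only.
monthly = [
    {"year": 2016, "month": 12, "county": "Alameda", "price": 700_000},
    {"year": 2017, "month": 1, "county": "Alameda", "price": 800_000},
    {"year": 2017, "month": 2, "county": "Alameda", "price": 900_000},
    {"year": 2017, "month": 1, "county": "Marin", "price": 1_150_000},
]
prices = add_region_average(yearly_mean_prices(monthly), {"Alameda", "Marin"}, "Bay Area")
homeless = [
    {"year": 2017, "coc": "Alameda", "total": 500},
    {"year": 2017, "coc": "Fresno", "total": 300},   # no price row, so it is dropped
]
merged = merge_on_year_and_area(homeless, prices)
```

An inner join silently drops areas without a price, so it is worth counting the rows before and after the merge to see how much data is lost.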
Exploratory Data Analysis (EDA)
In this section, Figures 1 through 7 above show the plots that our Python code produced from the combined data.
EDA: San Francisco
If we look at Figure 1, which shows the stacked bar chart of demographic distribution for the homeless in San Francisco by year, several patterns can be observed. Firstly, it is evident that the age groups always account for more recorded values than the other demographic groupings. This suggests that the completeness of data collection varies, implying that there are challenges in consistently gathering full demographic information. In terms of age, the age groups '25-34' and '55-64' display notable representation, indicating that these age brackets are either more affected by the conditions being measured or are more likely to be included in the data collection process. In terms of gender, more male individuals access homeless response services, indicating a higher prevalence of homelessness among male-identifying individuals, or a higher propensity for men to seek out homeless response services. In terms of race, the largest share belongs to the 'Black, African American, or African' group, followed by the 'White' group, while the rest of the categories are present in lower numbers.
There is a noticeable increase in the number of individuals receiving services from 2020 to 2022, which could be attributed to the economic impact of the COVID-19 pandemic, to other recent socio-economic events, or to changes in the data collection methodology. The groups identified as Non-Singular Gender and Transgender have lower visible counts, which could be indicative of their actual population numbers or point to larger systemic barriers in service access or data inclusion. The 'Unknown' category for age and race shows that there is a portion of the homeless population for which demographic data is not captured, reflecting inherent data collection challenges.
Figure-2 above, covering San Francisco from 2017 to 2022, shows a
concurrent rise in both the total homeless count and mean housing
prices, suggesting a potential correlation between the two. Notably,
the data shows a parallel rise up to 2020, which coincides with the
start of the COVID-19 pandemic, a time of significant economic
upheaval that probably had an impact on both the housing market and
homelessness rates. After 2020, while housing prices show signs of
leveling off, the homeless count continues its upward trajectory,
hinting at additional factors influencing homelessness beyond housing
costs alone.
The graph implies a strong relationship, with both the homeless count
and housing prices following a similar growth pattern. However, as the
scale in millions on the graph indicates, it is crucial to take into
account San Francisco's high housing costs. Furthermore, despite the
apparent correlation, causation cannot be established from this graph
alone. Further analysis is required to account for confounding
variables such as economic conditions, policy changes, and demographic
shifts that could also be affecting these trends. This graph provides
a clear starting point for discussing the dynamics between housing
affordability and homelessness, but comprehensive statistical analysis
is essential to fully understand the complexity of these social
issues.
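As a hedged illustration of why a shared upward trend does not by itself establish a relationship, the following minimal Python sketch compares the Pearson correlation of two yearly series in levels with the correlation of their year-over-year changes. All numbers are hypothetical placeholders, not the study's data, and differencing is only one simple check rather than the method used in this paper.

```python
import numpy as np

# Hypothetical yearly values for 2017-2022 (illustrative only, not the study's data).
homeless_count = np.array([10000, 10500, 11200, 12000, 13500, 15000], dtype=float)
mean_price = np.array([1.40e6, 1.55e6, 1.60e6, 1.70e6, 1.72e6, 1.71e6])


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Correlation of the raw levels: dominated by the common upward trend.
r_levels = pearson_r(homeless_count, mean_price)

# Correlation of year-over-year changes: removes the shared trend.
r_changes = pearson_r(np.diff(homeless_count), np.diff(mean_price))

print(f"levels:  r = {r_levels:.2f}")
print(f"changes: r = {r_changes:.2f}")
```

With these placeholder values the levels are strongly positively correlated while the year-over-year changes are not, which is the kind of pattern that motivates controlling for confounding trends before drawing conclusions.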
EDA: San Francisco Bay Area
Figure-3 presents stacked bar charts of the average demographic
distribution of individuals accessing homeless response services
across the nine counties of the Bay Area from 2017 to 2022. When
interpreting these averages, it is crucial to recognize the
heterogeneity of the Bay Area; each county may have unique
socio-economic conditions, housing markets, and resources for the
homeless that the averaged data inevitably masks. Nonetheless, we can
still glean some insight.
From the data, it seems that the age groups '25-34' and '55-64'
consistently show higher counts across the years. This trend suggests
that these age groups are notably affected by homelessness in the Bay
Area, a pattern that mirrors observations made within San Francisco.
While this could indicate a regional issue, it may also be influenced
by specific counties with higher counts in these demographics. The
racial groups most prominent in the chart in terms of accessing
homeless services are 'White', 'Black, African American, or African',
and 'Hispanic/Latinx'. This prominence, however, does not necessarily
equate to the overall demographic makeup of the homeless population
but rather reflects those who are utilizing services.
A notable increase in the counts is seen post-2020, a trend that might
correlate with the socio-economic impacts of the COVID-19 pandemic.
However, the chart does not reveal the differential impact on
individual counties, which could vary based on local responses and the
severity of the pandemic's effects.
Gender distribution shows a higher count for males accessing services.
This observation aligns with broader trends but also raises questions
about how service utilization might differ by gender across counties.
Furthermore, the relatively lower counts for 'Non-Singular Gender',
'Questioning Gender', and 'Transgender' individuals suggest potential
barriers to service access or data collection challenges for these
groups.
The scatter plot in Figure-4 illustrates the relationship between
housing prices and homelessness counts in the nine counties of the Bay
Area. Each point represents a county's median housing price plotted
against its total homelessness count. The graph shows a range of
housing prices from approximately $400,000 to over $2,000,000 (denoted
as 0.4 to 2.0 on the x-axis, which is scaled by 1e6 for millions of
dollars) and homelessness counts up to 17,500.
From this visualization, we can observe that counties with higher
housing prices, such as San Francisco and Santa Clara, also have
higher counts of homelessness. Conversely, counties like Solano and
Sonoma, with relatively lower housing prices, show lower homelessness
counts. This pattern suggests a potential correlation where counties
with more expensive housing markets might experience higher rates of
homelessness. Notably, Marin, San Mateo, and San Francisco appear as
outliers, deviating from the more linear trend observed among the
other counties. These outliers have higher homelessness counts
relative to their housing prices than counties such as Santa Clara and
Alameda, which align more closely with the linear trend formed by the
remaining six counties.
The deviation of Marin, San Mateo, and San Francisco from the linear
pattern could imply that factors other than housing prices are
significantly influencing homelessness rates in these counties. For
instance, these areas might have higher rental market pressures, a
lack of affordable housing inventory, or more pronounced income
inequality, which can exacerbate homelessness irrespective of the
average housing price. Additionally, the presence of robust homeless
services in these counties might lead to higher reported counts due to
more comprehensive data collection. Here, it is important to remember
that the figure being plotted is a count of unique individuals who
have received homeless response services.
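One way to make the notion of counties "deviating from the linear trend" concrete is to fit a straight line to the price and count pairs and inspect the residuals. The sketch below is a minimal illustration under stated assumptions: the county labels and values are hypothetical, and the two-standard-deviation cutoff is an arbitrary choice, not a rule taken from this study.

```python
import numpy as np


def flag_outliers(prices, counts, names, z=2.0):
    """Fit counts ~ slope * prices + intercept and flag large residuals."""
    slope, intercept = np.polyfit(prices, counts, deg=1)
    residuals = counts - (slope * prices + intercept)
    threshold = z * residuals.std()
    flagged = [n for n, r in zip(names, residuals) if abs(r) > threshold]
    return flagged, residuals


# Hypothetical counties A..I (illustrative values, not the study's data).
names = list("ABCDEFGHI")
prices = np.linspace(0.5, 1.3, 9)      # median price in millions of dollars
counts = 1000.0 + 5000.0 * prices      # counts lying exactly on a line
counts[4] += 6000.0                    # county "E" sits well above the line

flagged, residuals = flag_outliers(prices, counts, names)
print("flagged:", flagged)
```

A residual-based check like this only says which points sit far from the fitted line; it does not explain why, which is where the contextual factors discussed above come in.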
EDA: California
The line graph in Figure-5 displays the total homeless count and mean
housing prices in California from 2017 to 2022, revealing a
simultaneous increase in both metrics over the years. The trend lines
suggest a correlation between rising housing prices and an increase in
homelessness across the state. However, while the trend lines run
parallel for the initial years, there is a noticeable convergence
starting in 2020, potentially indicating that the rate of increase in
homelessness is outpacing that of housing prices or that other factors
intensified during this period, such as the economic impact of the
COVID-19 pandemic. This observation suggests that while housing price
escalation may be a contributing factor to rising homelessness, it is
likely not the sole driver, and other economic or social factors may
also be influencing this trend.
Figure-6 above presents a scatter plot of housing prices and
homelessness counts by county in California, including Los Angeles.
Los Angeles is immediately visible as a significant outlier, so we
also remade the plot without it (Figure-7). Focusing on this second
graph allows for a clearer examination of patterns across the other
counties, including those in the Bay Area.
With Los Angeles removed, we observe that Bay Area counties like San
Francisco, Alameda, and San Mateo do not exhibit a simple linear
relationship between housing prices and homelessness. While these
counties have relatively high housing prices, their homelessness
counts vary, with some not following the expected trend of higher
prices correlating with higher homelessness counts. This suggests that
in the Bay Area, other factors are influencing homelessness beyond
housing costs alone. For instance, San Francisco, despite its high
housing prices, does not have the highest homelessness count as one
might expect if housing costs were the sole factor.
Other counties with lower housing prices, such as those in the Central
Valley and more rural areas of the state, also show a wide range of
homelessness counts, indicating that lower housing costs do not
automatically equate to lower rates of homelessness. This further
suggests that additional variables like employment rates, availability
of social services, and local economic conditions play significant
roles in influencing homelessness.
Furthermore, the spread of homelessness counts among counties with
similar housing prices on the second graph indicates a complex
interaction of factors at the county level. For instance, counties
with similar housing prices have vastly different homelessness counts,
suggesting that each county's approach to housing, support services,
and economic opportunities can greatly affect the number of homeless
individuals.
Data Analysis
As illustrated in the following tables, our data was analyzed using
regression analysis for San Francisco, the Bay Area counties, and the
State of California as a whole.
Regression Analysis: San Francisco
The analysis summarized in Table-3 is driven by three variations of
Ordinary Least Squares (OLS) regression models, which were applied to
a dataset representing San Francisco's housing market and demographic
variables. The first model, the original OLS regression, and the
second, a pruned version removing less significant variables, both
achieved perfect R-squared and adjusted R-squared values of 1.0.
Typically, such scores would indicate that the model explains 100% of
the variability in the target variable (housing prices). However, in
this context, these perfect scores are highly indicative of
overfitting. This suspicion of overfitting is primarily due to the
extremely small sample size of the dataset, which consisted of only 6
rows. In such cases, the model tends to learn the noise and specific
details of the training data to an extent that it perfectly predicts
the outcome but fails to generalize to new, unseen data.
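The overfitting mechanism described above can be reproduced with a minimal sketch: when an OLS model with an intercept has as many free parameters as there are rows, it fits any target exactly, even pure noise. The example below uses randomly generated data and an assumed count of five predictors; neither is taken from the study's dataset.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an OLS fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)


rng = np.random.default_rng(0)
n_rows, n_predictors = 6, 5                  # 6 rows, 5 predictors + intercept
X = rng.normal(size=(n_rows, n_predictors))  # predictors are pure noise
y = rng.normal(size=n_rows)                  # target is pure noise too

r2_saturated = r_squared(X, y)       # as many parameters as rows
r2_one = r_squared(X[:, :1], y)      # a single noise predictor

print(f"5 predictors: R^2 = {r2_saturated:.6f}")
print(f"1 predictor:  R^2 = {r2_one:.6f}")
```

A perfect score in this setting says nothing about predictive value, because the predictors here are unrelated to the target by construction.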
Table 3: San Francisco Regression Analysis
Model              R-squared    Adjusted R-squared
Original           1.0          1.0
Pruned (p<0.05)    1.0          1.0
PCA                0.0          0.0
On the other hand, the third model, which incorporated Principal
Component Analysis (PCA) before regression, yielded an R-squared and
adjusted R-squared of 0.0. This drastic shift suggests that the PCA
transformation, in this case, removed or altered the features'
predictive power. This outcome could be due to the dimensionality
reduction process in PCA, which can sometimes lead to the loss of
significant information, particularly in a small dataset.
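To illustrate how dimensionality reduction can discard predictive information, the following sketch regresses a target on the first k principal components of a synthetic feature matrix. The data is constructed (as an assumption for illustration) so that the target depends on the lowest-variance feature, which the leading components largely ignore; it does not reproduce the study's PCA model.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an OLS fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)


def pca_scores(X, k):
    """Project centered X onto its first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


rng = np.random.default_rng(1)
n = 40
# Four synthetic features with very different variances.
X = rng.normal(size=(n, 4)) * np.array([10.0, 5.0, 2.0, 0.5])
# The target depends only on the lowest-variance feature.
y = 3.0 * X[:, 3] + rng.normal(scale=0.1, size=n)

r2_full = r_squared(X, y)
r2_by_k = [r_squared(pca_scores(X, k), y) for k in range(1, 5)]

for k, r2 in enumerate(r2_by_k, start=1):
    print(f"{k} component(s): R^2 = {r2:.3f}")
print(f"all original features: R^2 = {r2_full:.3f}")
```

Keeping every component recovers the fit of the original features, while keeping only the leading ones can drop it sharply, because PCA ranks directions by variance rather than by relevance to the target.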
The stark contrast between the perfect scores in the first two models
and the complete lack of explanatory power in the PCA-based model
underscores the challenges of working with very small datasets. It
highlights the importance of having a sufficiently large and
representative sample to build robust predictive models. The results
from these models mainly reflect the limitations of the dataset; a
larger dataset is needed, perhaps one that collects data at intervals
shorter than a year. Consequently, further exploration of the results
for the Bay Area counties and the greater California area is
necessary.
Regression Analysis: Bay Area Counties
Table-4 below summarizes the results of an OLS regression conducted
across all the Bay Area counties, pruned three times to filter out
variables with high p-values until only the variables shown remained.
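The pruning procedure described above resembles backward elimination. The sketch below is one possible version under stated assumptions: it uses only NumPy, so instead of exact p-values it drops the predictor with the smallest absolute t-statistic until all remaining ones reach |t| >= 2, a rough stand-in for p < 0.05 at moderate sample sizes. The variable names and synthetic data are hypothetical, and this is not necessarily the exact procedure used for Table-4.

```python
import numpy as np


def ols_t_stats(X, y):
    """t-statistics (intercept first) for an OLS fit of y on X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return beta / np.sqrt(np.diag(cov))


def backward_eliminate(X, y, names, t_min=2.0):
    """Repeatedly drop the weakest predictor until all have |t| >= t_min."""
    keep = list(range(X.shape[1]))
    while keep:
        t = ols_t_stats(X[:, keep], y)[1:]   # skip the intercept
        weakest = int(np.argmin(np.abs(t)))
        if abs(t[weakest]) >= t_min:
            break
        keep.pop(weakest)
    return [names[i] for i in keep]


# Synthetic data: two real signals and three noise predictors (hypothetical).
rng = np.random.default_rng(2)
names = ["signal_a", "signal_b", "noise_c", "noise_d", "noise_e"]
X = rng.normal(size=(54, 5))
y = 10.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=54)

kept = backward_eliminate(X, y, names)
print("kept:", kept)
```

Because each round re-fits the model, a predictor's significance can change as others are removed, which is why stepwise pruning is usually reported together with the number of rounds, as in the text above.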
The model's constant term has an estimated value of approximately
830,900, with a highly significant p-value, establishing a substantial
baseline for housing prices in the absence of other variables. This
points to other influential factors, beyond those captured in the
model, contributing to this baseline value in the Bay Area's housing
market. The age demographic '35-44' shows a negative coefficient of
-621, implying that an increase in the homeless count within this age
group is associated with a slight decrease in housing prices. However,
the p-value of 0.055 is just above the conventional threshold for
statistical significance, which suggests that this result should be
interpreted with caution.
Table 4: Summary Table for Pruned OLS Model of Demographic Groups vs Prices for Bay Area
R-squared               0.717
Adjusted R-squared      0.694
Demographic Group       Coef.          P>|t|
const                   8.309e+05      0.000
Age: 35-44              -621.0004      0.055
Age: Unknown            4428.4561      0.000
Race: Multiple Races    -3898.9566     0.000
Gender: Male            584.8694       0.000
PCA                     0.0            0.0
On the other hand, the 'Unknown' age category shows a strong positive
association with housing prices, denoted by a coefficient of 4428.4561
and a p-value indicating high statistical significance. The 'Multiple
Races' demographic exhibits a significant negative association, with a
coefficient of -3898.9566. The 'Male' gender group is positively
associated with housing prices, with a coefficient of 584.8694, again
confirmed by a significant p-value. The model's R-squared value of
0.717 suggests substantial explanatory power, accounting for over 71%
of the variation in housing prices with the variables considered. The
adjusted R-squared, slightly lower at 0.694, accounts for the number
of predictors, indicating that the model is not unduly complicated by
unnecessary variables.
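The relationship between the two fit statistics can be checked with the standard adjusted R-squared formula, 1 - (1 - R^2)(n - 1)/(n - p - 1). The short sketch below applies it to the reported R-squared of 0.717 with four predictors. The sample size of 54 is an assumption (nine counties over the six years 2017 to 2022), since the number of observations is not stated in this section.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Assumed: 9 counties x 6 years = 54 observations; 4 predictors as in Table 4.
adj = adjusted_r_squared(0.717, n_obs=54, n_predictors=4)
print(f"adjusted R^2 = {adj:.3f}")
```

Under that assumed sample size the result is consistent with the reported adjusted value, although nearby sample sizes would round to the same figure.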
Regression Analysis: California
Table-5 presents a summary of the pruned OLS model of demographic
groups vs. prices for California.
Table 5: Summary Table for Pruned OLS Model of Demographic Groups vs Prices for California
R-squared                                     0.497
Adjusted R-squared                            0.489
Demographic Group                             Coef.          P>|t|
const                                         501600.0000    0.000
Age: 45-54                                    -386.9682      0.000
Age: 65+                                      744.3669       0.000
Race: Native Hawaiian or Pacific Islander     3227.2835      0.000
PCA                                           0.0            0.0
The regression model employed to explore the relationship between
various demographic groups and housing prices in California offers
insightful findings. The model's R-squared value stands at 0.497,
indicating that about 49.7% of the variability in housing prices is
explained by the demographic variables included. The adjusted
R-squared, at 0.489, is only slightly lower, indicating that the fit
is not inflated by the number of predictors once the degrees of
freedom are taken into account.
In terms of the model's coefficients and their significance, each
demographic group's coefficient is accompanied by a p-value, providing
a measure of statistical significance. The constant term, or
intercept, is 501,600, representing the estimated housing price when
all other variables are zero. This intercept is statistically
significant, as evidenced by a p-value of less than 0.001, suggesting
a meaningful baseline for the model.
For the age group '45-54', the model reveals a coefficient of
-386.9682, pointing to an association with lower housing prices. This
negative coefficient is supported by a highly significant p-value,
indicating a robust relationship. In contrast, for the '65+' age
group, the coefficient is 744.3669, suggesting higher housing prices
associated with this demographic. This positive association is also
statistically significant, as reflected in its p-value. Another
notable finding is in the racial category 'Native Hawaiian or Pacific
Islander', where the coefficient is a substantial 3,227.2835. This
figure indicates a sizable increase in housing prices associated with
counts in this demographic group, and the highly significant p-value
supports this association.
Conclusion
This study set out to explore the relationship between homelessness
levels and housing prices across the Bay Area. We leveraged government
datasets on homelessness and realtor-provided median single-family
home prices to delve into the socio-economic fabric of the Bay Area,
San Francisco, and California at large. Our findings suggest a nuanced
relationship between demographic variables and housing prices that
was, on average, positively correlated.
In San Francisco, the perfect R-squared and adjusted R-squared values
obtained from our OLS regression models pointed towards overfitting,
likely due to the limited size of the dataset. This limitation was
further highlighted in the PCA-based model, which showed no
explanatory power, underscoring the challenges of working with small
datasets and the potential loss of crucial information in
dimensionality reduction processes.
We therefore expanded our analysis to the Bay Area to gain more
insight. Across the nine counties, we found that certain demographic
groups, such as '18-24' and 'Native Hawaiian or Pacific Islander',
showed a positive correlation with housing prices, potentially
indicative of gentrification or demographic shifts in specific
neighborhoods. Conversely, groups like 'Asian or Asian American' and
'American Indian, Alaska Native, or Indigenous' were negatively
correlated with housing prices, hinting at socio-economic challenges
or regions with more affordable housing.
Expanding further to the California state level, the model's R-squared
value of 0.497 explained roughly half of the variance in housing
prices. The adjusted R-squared of 0.489 is close to this value, and
the unexplained remainder indicates that there are still significant
factors at play that are not captured by our model. The statewide
analysis painted a diverse demographic landscape, with different age
and race groups showing both negative and positive associations with
housing prices, underscoring the multifaceted nature of homelessness
and housing economics.
The study, while extensive in its exploration, is ultimately greatly
limited due to the nature of the datasets and the inherent challenges
in capturing the full spectrum of factors influencing both
homelessness and housing markets. It is clear that a variety of
interrelated factors, including but not limited to economic
conditions, demographic changes, and local policies, have an impact on
housing prices and homelessness levels.
In closing, our findings accentuate the intricacy of the relationship
between the housing crisis and market dynamics. They underscore the
critical need for nuanced and data-informed approaches to effectively
address and understand the multifaceted issues faced by urban areas
such as the San Francisco Bay Area and the broader California region.
Our paper demonstrates that while there is a discernible correlation
between certain demographic groups and housing prices, the true nature
of this correlation is complex and demands thoughtful analysis to
guide the development of effective and equitable housing policies.
References
1. California Association of Realtors (2023) Historical Housing Data.
Median Prices of Existing Single-Family Homes.
2. California Interagency Council on Homelessness (2023) People
Receiving Homeless Response Services by Age, Race, Ethnicity, and
Gender (Datasets).
3. U.S. Department of Housing and Urban Development (2023) Continuum
of Care Program.
Copyright: ©2024 Bahman Zohuri, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.