
Data for Good Proposal

Evaluating and Advising Public Policy Directives for the California Housing Crisis



Project Description

On a typical evening, students at UC Berkeley amble down Telegraph Ave and pass regulars sitting at the sides of storefronts, holding out cups for change. Homelessness is an open phenomenon in Berkeley and a daily encounter for Cal students. While most students are not on the streets themselves, securing housing for the next year amid rising rents and limited supply is an inevitable stressor for many at the turn of the spring semester. And not every student finds a place they can afford, which turns homelessness into a much more personal problem.

When the dorms closed during her first winter break, undergraduate student Taylor Harvey found herself couch surfing inside UC Berkeley libraries, hiding in bathrooms before closing time until only she and the books remained. Once she was rejected from on-campus housing in her junior year, Harvey’s financial aid ceased to cover her housing, leaving her officially homeless.[1]

The average monthly rent of a one-bedroom apartment in Berkeley is $3,350[2], which is affordable only at an annual income of around $134,000. The median household income in Berkeley is $70,393[3], far short of what the ever-increasing rent demands. Berkeley’s residents are not the only ones with firsthand experience of housing insecurity, however. California is in the midst of an increasingly dire housing crisis, spelling economic uncertainty for its lower- and middle-income citizens.
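The $134,000 figure is consistent with the common rule of thumb that housing should cost no more than 30% of gross income. The short Python sketch below reproduces that arithmetic from the rent and median-income figures quoted above; the 30% threshold is our assumption about how the figure was derived, not something stated by the cited sources.

```python
# Rent-affordability arithmetic, assuming the common "30% of gross income" rule.
MONTHLY_RENT = 3350        # average one-bedroom rent in Berkeley (USD/month)
MEDIAN_INCOME = 70393      # median household income in Berkeley (USD/year)
RENT_SHARE_PERCENT = 30    # assumed affordability threshold, in percent


def income_needed(monthly_rent: float, share_percent: float = RENT_SHARE_PERCENT) -> float:
    """Annual income at which the given rent equals `share_percent` of income."""
    annual_rent = monthly_rent * 12
    return annual_rent * 100 / share_percent


def rent_burden(monthly_rent: float, annual_income: float) -> float:
    """Fraction of annual income that would go to rent."""
    return monthly_rent * 12 / annual_income


if __name__ == "__main__":
    print(f"Income needed: ${income_needed(MONTHLY_RENT):,.0f}")
    print(f"Rent burden at median income: {rent_burden(MONTHLY_RENT, MEDIAN_INCOME):.0%}")
```

Under the same assumption, a household at Berkeley's median income would spend roughly 57% of its income on that rent.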

Since founding its Data Consulting committee three years ago, SAAS has used statistics to address social issues. Our Data Consulting teams work with organizations such as UC Berkeley's Office of Equity & Inclusion, which enlisted us to evaluate and advise on UC Berkeley's admissions process to promote greater diversity and fairness, and Environmental Progress, a local energy policy advocacy non-profit, for which we quantified the environmental and health effects of the closure of California's San Onofre Nuclear Plant and the potential closure of the Diablo Canyon Nuclear Plant. Our work also strives to create tools that let people less familiar with statistics explore the data for themselves, such as the web applet we built last semester for our population modeling project, which lets users interact with the data and draw their own population predictions for different countries[4]. By performing advanced statistical analysis in fields relevant to the UC Berkeley student body and making our work accessible to the public, we leverage our knowledge of machine learning and data science to make a positive impact both on campus and off.

An initial survey of the current housing situation in California immediately reveals troubling trends. According to Zillow data (plotted right), median home sale prices in California have diverged sharply from the national average in recent years, from $95,000 above the national average in 2012 to $234,000 above it last November. Another useful metric is the Housing Affordability Index (HAI), which measures whether a household earning the median income can afford the typical home. California's HAI has not exceeded 100 (the threshold of bare minimum affordability) since 2013 and continues to drop yearly (plotted below). Although the national HAI has also declined over the same period, at the end of 2015 it was still over 90 points higher than even the best cases among California's coastal cities.
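To make the HAI concrete, the sketch below follows the National Association of Realtors' published approach as we understand it: a 20% down payment on the median-priced home, a 30-year fixed-rate mortgage, and principal-and-interest payments limited to 25% of income. These parameters are assumptions exposed as arguments; NAR uses median family income, and other publishers (such as the California Association of Realtors) compute their indices differently. An index of 100 means the median income exactly qualifies for the median-priced home, and values below 100 mean it falls short.

```python
def monthly_payment(principal: float, annual_rate: float, years: int = 30) -> float:
    """Fixed-rate mortgage payment (principal and interest) per month."""
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return principal / n  # no interest: repay the principal evenly
    return principal * r / (1 - (1 + r) ** -n)


def housing_affordability_index(
    median_income: float,
    median_price: float,
    annual_rate: float,
    down_payment: float = 0.20,   # assumed: 20% down payment
    payment_share: float = 0.25,  # assumed: P&I may use at most 25% of income
    years: int = 30,              # assumed: 30-year fixed-rate mortgage
) -> float:
    """HAI = 100 * median income / income needed to qualify for the median home."""
    loan = median_price * (1 - down_payment)
    qualifying_income = monthly_payment(loan, annual_rate, years) * 12 / payment_share
    return 100 * median_income / qualifying_income
```

Because the index is a ratio, it falls when prices or mortgage rates rise faster than incomes, which is the pattern the California series shows.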

State response to the housing crisis has been slow, and California legislators are now playing catch-up. In September 2017, Governor Jerry Brown signed 15 new bills aimed at addressing the crisis. Senate Bills (SB) 2 and 3 raise funds to subsidize low-income housing development. The package also includes SB 35, SB 540, and Assembly Bill (AB) 73, which provide subsidies to cities that relax zoning regulations for low-income housing projects. AB 1505, AB 1521, and AB 571 push developers to build and preserve low-income housing by strengthening rent-control laws and requiring that a share of new projects be set aside for low-income housing. AB 879, AB 1397, and SB 166 require cities to plan additional housing projects and identify possible housing sites. The remaining bills penalize cities that refuse to approve housing that meets zoning regulations.

Whether this flurry of legislation will actually reduce housing costs is questionable. The LA Times predicts that revenue raised from the housing bond and real-estate tax will fall billions of dollars short of what is needed to properly address the crisis[5]. Moreover, while the new legislation gives cities incentives to relax zoning regulations, it does not require them to do so. High-rises are viewed as unattractive in suburban neighborhoods, and cities may be unwilling to risk depressing their local housing markets in exchange for the subsidy. According to state and third-party estimates, even with the new legislation, housing development will struggle to keep pace with population growth.

In spite of government efforts, the housing disparity continues to worsen. The graph on the right compares the estimated number of housing units needed to support California's population with the number actually built between 1980 and 2010[6]. Throughout the state, the amount of housing built pales in comparison with actual needs. In fact, according to the Legislative Analyst's Office, California would need to double the number of homes built each year to keep prices from rising faster than the national average.

The housing disparity has spiraled out of control to the point where “home prices in California are twice the national average, and 70 percent can't afford to buy a home.” More specifically, “a household would need to earn $115,000 a year to reasonably afford a home at [the median price of housing in California], assuming a 20 percent down payment. Yet, two thirds of Californians earns less than $80,000, according to the U.S. Census Bureau.”[7]

Efforts to find a solution have been so unfruitful that Gov. Brown frames the issue as one of supply and demand, and therefore partly outside the government’s control: “Everybody moves into San Francisco, New York, London, Beijing, Tokyo. Drives up the prices. Can we control that? Only within limits.” Brown also stresses the difficulty of reconciling conflicting interest groups and stakeholders, continuing: “You can’t change CEQA [the California Environmental Quality Act]. The unions won’t let you because they use it as a hammer to get project labor agreements. The environmentalists like it because it’s the people’s document that you have to disclose all the impacts.”[8] Amid such conflicting voices, the lack of clear, data-driven insight into which policies will ameliorate the situation leaves state officials unable to tackle the issue head-on. This is where SAAS comes in.

In the Data for Good competition, SAAS will tackle the housing crisis on three parallelizable fronts:
1. Quantify and predict the severity of California’s housing crisis using the Housing Affordability Index
2. Identify public policy areas and shifts that would solve or alleviate negative externalities using statistical analysis, feature engineering, and data visualizations of historical data
3. Engineer a web application tool for California citizens to identify locations of affordable housing, taking into account e.g. work location, salary, and public transportation routes

First, we plan to use the Housing Affordability Index (HAI) to quantify the housing situation in California. We will identify key economic and legislative features that affect housing affordability and use them to build models that predict future values of the HAI for the entire United States. Then, armed with these models, we will directly measure the impact of specific features on the housing situation in California. Using ARIMA and TBATS time series models, as well as recurrent neural networks (RNNs) with long short-term memory (LSTM) units, we will predict California's HAI in future years under current trends and compare these baseline projections against scenarios with possible policy or economic changes. This will allow us both to anticipate the trajectory of the current housing situation and to evaluate the future payoffs of focusing public efforts in various areas. We hope to use our analysis to create an evaluation tool for potential public policy directives, quantifying their effects in terms of affordable housing for California citizens.
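Mature libraries (for example, `statsmodels` in Python, or the `forecast` package in R, which implements both ARIMA and TBATS) handle order selection and estimation automatically. To illustrate the core idea without those dependencies, the sketch below fits the simplest nontrivial ARIMA model, ARIMA(1,1,0) with drift, by conditional least squares and iterates it forward. It is a hypothetical stand-in for illustration only, not the model specification we will ultimately use, and any input series is assumed.

```python
from typing import List, Sequence, Tuple


def fit_arima_110(series: Sequence[float]) -> Tuple[float, float]:
    """Fit ARIMA(1,1,0) with drift: d_t = c + phi * d_{t-1} + e_t, d_t = y_t - y_{t-1}.

    Parameters are estimated by ordinary least squares on the differenced
    series (conditional least squares). Returns (c, phi).
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    d = [b - a for a, b in zip(series, series[1:])]
    x, y = d[:-1], d[1:]  # lagged and current differences
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    var_x = sum((xi - x_mean) ** 2 for xi in x)
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # If the differences never vary there is no autoregressive signal: pure drift.
    phi = cov_xy / var_x if var_x > 0 else 0.0
    c = y_mean - phi * x_mean
    return c, phi


def forecast_arima_110(series: Sequence[float], steps: int) -> List[float]:
    """Forecast `steps` values ahead by iterating the fitted difference equation."""
    c, phi = fit_arima_110(series)
    level, diff = series[-1], series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff   # expected next change
        level = level + diff    # undo the differencing
        out.append(level)
    return out
```

TBATS extends this kind of model to series with multiple or complex seasonal patterns, and an LSTM can additionally take the economic and policy features as inputs, which is what allows the scenario comparisons described above.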

Second, because the HAI model depends heavily on the features chosen, it is crucial to ground the time series analysis in accurate descriptive statistics. We will therefore also have a team focused on public policy, identifying which legislation affects the housing situation. Using statistical analyses and feature selection techniques such as Canonical Correlation Analysis (CCA), we will identify the key components associated with the housing crisis in California. These features will then be fed into our time series models to quantify and evaluate the effects of various legislation. The results will shed light on historically proven policies that California legislative offices can draw on in developing their own solutions.
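As a rough illustration of the CCA step, the sketch below computes canonical correlations between a matrix X of candidate policy and economic indicators and a matrix Y of housing outcomes (for instance, the HAI together with the Zillow affordability series named in the data discussion). It uses the standard whitening-plus-SVD formulation in NumPy; the small ridge term `reg` is our assumption for numerical stability, and the function and variable names are hypothetical.

```python
import numpy as np


def _inv_sqrt(s: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(s)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def cca(x, y, reg: float = 1e-8):
    """Canonical correlation analysis between X (n x p) and Y (n x q).

    Returns (corrs, a, b): canonical correlations in decreasing order and weight
    matrices whose columns define the canonical variates X @ a[:, k], Y @ b[:, k].
    `reg` is a small ridge term (an assumption) added for numerical stability.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]  # treat a single outcome series as one column
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)
    kx, ky = _inv_sqrt(sxx), _inv_sqrt(syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, corrs, vt = np.linalg.svd(kx @ sxy @ ky, full_matrices=False)
    return corrs, kx @ u, ky @ vt.T
```

A high first canonical correlation with large weights on particular indicators would flag those indicators for closer study. The weights are only comparable across features if the features are standardized first, and with few observations relative to candidate features, some regularization and out-of-sample checking would be needed.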

The third and final objective of our research is to interface directly with California citizens to help alleviate the personal effects of the housing crisis. We will create a web application for citizens to identify affordable housing, taking into account both geospatial and user-specific data. Rather than simply listing housing under a fixed affordability cap, our Tableau web app would let Californians enter fields such as work location and salary to find housing that balances cost and proximity. Users would also be able to specify preferred modes of transportation and commute length. Taking all of these variables into account, we will use the Google Maps API to recommend housing locations.
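The ranking logic behind such a tool can be sketched independently of Tableau or the Google Maps API. The sketch below assumes commute times have already been obtained for each transportation mode (for example, from a routing service) and stored on each listing; the `Listing` type, the 30% rent-share budget, and the equal weighting of cost and commute are all hypothetical choices for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Listing:
    """A hypothetical housing listing; field names are illustrative only."""
    name: str
    monthly_rent: float
    # Commute time in minutes to the user's work location, keyed by mode
    # (e.g. "transit", "drive"); assumed to be precomputed by a routing service.
    commute_minutes: dict[str, float] = field(default_factory=dict)


def recommend(
    listings: list[Listing],
    annual_salary: float,
    mode: str,
    max_commute: float,
    max_rent_share: float = 0.30,  # assumed affordability threshold
    commute_weight: float = 0.5,   # assumed trade-off between cost and commute
) -> list[str]:
    """Return names of affordable listings within the commute limit, best first.

    Each candidate is scored by a weighted sum of its rent (as a fraction of the
    monthly budget) and its commute (as a fraction of the limit); lower is better.
    """
    budget = annual_salary * max_rent_share / 12
    scored = []
    for listing in listings:
        minutes = listing.commute_minutes.get(mode)
        if minutes is None or minutes > max_commute or listing.monthly_rent > budget:
            continue  # unreachable by this mode, too far, or over budget
        score = ((1 - commute_weight) * listing.monthly_rent / budget
                 + commute_weight * minutes / max_commute)
        scored.append((score, listing.name))
    return [name for _, name in sorted(scored)]
```

Normalizing rent by the user's budget and commute by their stated limit puts both terms on a 0-to-1 scale, so a single weight can express how much a user is willing to pay to shorten their commute.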

Searching for and evaluating historically beneficial housing policy directives, especially given the complexity of LSTMs and other machine learning models, requires sufficiently large, high-resolution datasets. In this proposal, we draw on data from the Census Bureau, which provides median household income and median home rent and sale price statistics; the Bureau of Labor Statistics, for trends in the Consumer Price Index for housing; Zillow, for its Monthly Mortgage Affordability, Price to Income, and Rent Affordability data; and the National Association of Realtors (NAR), for Housing Affordability Index data. However, these sources lack the sample size required to train complex machine learning architectures, and more features are needed to begin the multivariate analysis that would inform policy advice. Organizations like the NAR, the California Association of Realtors, and local REALTOR® branches require either membership or steep fees before disclosing their high-resolution data, with some datasets priced above $700[9]. Though we have the statistical expertise to complete the objectives outlined above, we lack the data, and the funds to acquire it ourselves.

Given the Center for Technology, Society, and Policy’s focus on solving the most vital problems in our society, SAAS hopes to use seed funds to address an issue felt harshly by our peers. Investing part of the seed funding in Amazon Web Services, we will work with realtor organizations, using cloud computing and our experience with model evaluation and predictive algorithms to bring a fresh perspective to the housing crisis. By maintaining our standard of careful statistical analysis, we will conduct scientifically and ethically sound research on the features of the housing crisis, advising policymakers and ordinary Californians alike.


Project Timeline

| | HAI Modeling | Public Policy | Web Application |
|---|---|---|---|
| Project Goals | Quantify and predict the severity of the housing crisis in California using HAI | Identify public policy areas that would alleviate negative externalities using statistical analysis, feature engineering, and data visualizations | Engineer a web application tool for California citizens to identify locations of affordable housing, taking into account work location, salary, and public transportation |
| Feb 4th - Feb 17th | Identify key features relevant to influencing HAI | Identify public policy areas that are influential in the housing situation | Research which tools would be useful to those in need of housing |
| Feb 18th - Mar 3rd | Create relevant visualizations and elementary models | Determine public policy correlations with HAI using Canonical Correlation Analysis | Create an interactive interface with California counties and visualizations for cost and affordability |
| Mar 4th - Mar 18th | Predict trends in HAI using LSTMs and ARIMA/TBATS models, and calculate future values using Team B’s feature correlations | (see HAI Modeling) | Use Google Maps API and Tableau with locations, salary, and transportation to suggest housing |
| Mar 19th - April | All teams reconvene to synthesize results and create a well-polished paper and presentation, incorporating the web application | | |

Semester

Spring 2018

Project Manager

Patrick Chao