Tim Beecher

Exploring Zillow's Real Estate Data

Many people wonder where the next best place to invest in real estate is. Years ago in the U.S., qualifying for a mortgage loan was relatively easy, so many people took advantage of that and bought investment properties. The crash in 2008 left several investors in the red because they could not get the return they expected.


A few experts had predicted that certain places with low relative housing prices and high demand would become hotspots for investing. In the 2010s, metro areas in Texas were among these favorable destinations for investors because of their low prices and strong demand for housing. Housing was in demand because job openings were bringing people to the state. I was one of those people: my family moved to Texas when I was 8 because my dad received a job offer.


Using data from Zillow, one of the most well-known sources of real estate information, I wanted to see how rental prices in Texas compared to those in the rest of the U.S.


The first thing I did was conduct EDA (Exploratory Data Analysis) on the dataset.


I try to follow 4 steps as my recipe for EDA:

  1. Import, look at, and understand the data: df = pd.read_csv() and df.head()

  2. Isolate and see the data: univariate analysis

  3. Describe the data: multivariate visualization

  4. Find patterns and/or outliers
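
As a rough illustration of step 1, here is a minimal sketch of loading a file and taking a first look. The tiny CSV text, the column names, and every value in it are made up for the example; the real Kaggle file is much larger and its exact layout may differ.

```python
import io

import pandas as pd

# Made-up stand-in for the Kaggle CSV: column names and values are assumptions.
csv_text = """City,State,Population Rank,November 2010,December 2010
City A,TX,1,0.90,0.92
City B,TX,2,0.70,
City C,FL,3,1.20,1.25
"""

# With the real download this would be pd.read_csv("<path to the Kaggle file>").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first rows of the table
print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # which columns are numeric
print(df.isna().sum())  # missing values per column
```

Checking `df.isna().sum()` early is useful with data like this, because a month with no value for a city shows up as a missing entry rather than a zero.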

I found the dataset on Kaggle. After I opened a Python notebook and imported it, it looked like this:



As you can see, the data is organized by city, ordered by population, and contains each city's rental price per square foot from November 2010 all the way to January 2017. One of the first things I did was use df.describe() to find standard statistical measurements such as the mean and standard deviation.
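
To show what `df.describe()` returns on monthly price columns, here is a small sketch. The DataFrame is toy data with placeholder city names and invented prices, and the month column names are assumed to match the file's layout.

```python
import pandas as pd

# Toy data: placeholder cities and invented prices per sq ft.
df = pd.DataFrame({
    "City": ["City A", "City B", "City C", "City D"],
    "November 2010": [0.8, 1.0, 1.2, None],  # None becomes NaN (no data that month)
    "January 2017": [0.9, 1.1, 1.6, 3.0],
})

month_cols = ["November 2010", "January 2017"]

# describe() skips missing values, so "count" shows how many cities had data.
summary = df[month_cols].describe()
print(summary.loc[["count", "mean", "std", "min", "max"]])
```

Comparing the `count` row across months is a quick way to spot months where some cities have no data yet.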


The second step in my EDA recipe was to isolate and see the data. I used the groupby() function to show the 694 Texan cities and their rent prices per sq ft. Unsurprisingly, the highest prices per sq ft are in big cities like Dallas and Houston, while the lowest price per sq ft was consistently in Wheeler, TX, in the Texas panhandle.
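
One way this kind of isolation could be done with `groupby()` is sketched below. This is not necessarily the exact code behind the numbers above; the city names are placeholders and the prices are invented.

```python
import pandas as pd

# Toy data: placeholder cities and invented prices per sq ft.
df = pd.DataFrame({
    "City": ["City A", "City B", "City C", "City D"],
    "State": ["TX", "TX", "TX", "FL"],
    "January 2017": [1.4, 0.8, 0.5, 1.9],
})

# Number of cities per state, then the Texas rows on their own.
cities_per_state = df.groupby("State")["City"].count()
texas = df.groupby("State").get_group("TX")

# Rank Texas cities from most to least expensive for one month.
ranked = texas.sort_values("January 2017", ascending=False)
highest_city = ranked.iloc[0]["City"]
lowest_city = ranked.iloc[-1]["City"]

print(cities_per_state)
print(ranked[["City", "January 2017"]])
print("Highest:", highest_city, "| Lowest:", lowest_city)
```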




The next step in my EDA was to describe the data using a visualization that shows multiple variables. I thought a box plot would be a good way to show the overall picture of rental prices in different states.


Here's the box plot of rental prices per sq ft organized by state in November 2010, the first available month in the data:
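
A box plot grouped by state can be drawn with the pandas `boxplot` method. The sketch below uses three states and invented prices purely for illustration; the column names are assumptions about the file's layout.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: invented prices per sq ft for three states.
df = pd.DataFrame({
    "State": ["CA", "CA", "CA", "FL", "FL", "FL", "TX", "TX", "TX"],
    "November 2010": [1.1, 1.5, 3.2, 0.9, 1.2, 2.8, 0.6, 0.8, 1.0],
})

fig, ax = plt.subplots(figsize=(8, 4))

# One box per state for the chosen month.
df.boxplot(column="November 2010", by="State", ax=ax)

ax.set_title("Rent per sq ft by state, November 2010 (toy data)")
ax.set_ylabel("Rent per sq ft ($)")
fig.suptitle("")  # remove the automatic "Boxplot grouped by State" title
```

In a notebook the figure renders on its own; in a script, `fig.savefig(...)` or `plt.show()` would display it.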



Visualizations give a good indication of whether anything is off with the data through the patterns and outliers they show. Analyzing these is the fourth and final step of my exploratory data analysis. What jumps out to me is that California and Florida both have some data points far higher than any other state's. That makes sense considering they have big cities in desirable areas, like LA and Miami. The same should apply to New York City in New York State; however, if you look at New York's box plot, there is no such outlier, which is itself out of the ordinary. So I went back to the data to investigate. It turns out there was no available data on New York City rentals in November 2010, which explains why there was no high outlier. I made box plots of individual months until I saw the outliers from NYC, which first appeared in December 2011. Here is what the box plot of that month looks like:
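
Instead of plotting month after month, the first month with data for each city can also be looked up directly. This is an alternative sketch, not the approach described above; it uses `first_valid_index()` on toy data with placeholder city names and invented values.

```python
import pandas as pd

month_cols = ["November 2010", "December 2010", "January 2011"]

# Toy data: "Big City" has no prices for the first two months.
df = pd.DataFrame({
    "City": ["Big City", "Small Town"],
    "State": ["NY", "NY"],
    "November 2010": [None, 0.9],
    "December 2010": [None, 0.9],
    "January 2011": [3.1, 1.0],
})

# For each city, find the first month column that is not missing.
first_months = (
    df.set_index("City")[month_cols]
    .apply(lambda row: row.first_valid_index(), axis=1)
)
print(first_months)

# Number of cities with no data in each month.
print(df[month_cols].isna().sum())
```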




As you can see, rental prices per sq ft in New York now reach values closer to $3, as opposed to around $1 before the NYC data was available.


If you look at the information for TX, you will see the average rental price per sq ft is about $0.75, with a maximum of a little over $2, as of December 2011. That makes it one of the more affordable states, accounting for population. Now looking at the January 2017 data:

The average rental price in TX is still around $0.75, with the maximum closer to $3 per square foot. I created an additional box plot of the two dates to show how the data in TX changed during that time.
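
A side-by-side comparison of two months for one state could be sketched like this. The Texas rows here are toy data with invented prices, so the printed numbers will not match the figures quoted above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy Texas rows: placeholder cities and invented prices per sq ft.
texas = pd.DataFrame({
    "City": ["City A", "City B", "City C", "City D"],
    "December 2011": [0.5, 0.7, 0.8, 2.1],
    "January 2017": [0.5, 0.7, 0.9, 2.9],
})

cols = ["December 2011", "January 2017"]

# Numeric summary of both months.
summary = texas[cols].agg(["mean", "median", "max"])
print(summary)

# One box per month, drawn on the same axes for easy comparison.
fig, ax = plt.subplots(figsize=(6, 4))
texas.boxplot(column=cols, ax=ax)
ax.set_ylabel("Rent per sq ft ($)")
ax.set_title("TX rent per sq ft, two months compared (toy data)")
```

Printing the mean next to the median helps here, since a few expensive cities can pull the mean up while the typical city barely moves.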



For the most part, rents in the state of TX stayed relatively consistent over time, rising only slightly.


Compare this to a state like Florida over that same period:




Rental prices here increased notably, especially in big cities; in one of them, the rental price per square foot is over $6. A 500-square-foot studio apartment there would cost you over $3,000 a month!


Real estate contains a lot of data like this that is waiting to be explored. In the future, I would like to dig deeper into this data and see whether it can bring to light trends that we should all pay attention to. Performing EDA is an essential first step to finding this valuable information.



