Let’s take a look at Seattle AirBnB Open Data using Python

Which features are most related to homestay cost? How do seasonal costs change? What is different between superhosts and regular hosts? Let’s get the answers by analyzing the data in Python.

Photo by MILKOVÍ on Unsplash


Seattle AirBnB Open Data describes the listing activity of AirBnB homestays in Seattle, WA until 2016. It consists of three files:

  • listings.csv — including full descriptions and average review score
  • calendar.csv — including listing id and the price and availability for that day
  • reviews.csv — including unique id for each reviewer and detailed comments
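As a minimal sketch, the three files could be loaded into pandas like this. The helper name `load_seattle_data` and the assumption that all three CSVs sit in one directory are mine, not part of the dataset.

```python
from pathlib import Path

import pandas as pd


def load_seattle_data(data_dir):
    """Read listings.csv, calendar.csv and reviews.csv from data_dir.

    Hypothetical helper: returns a dict mapping each file's base name
    to a DataFrame.
    """
    data_dir = Path(data_dir)
    names = ["listings", "calendar", "reviews"]
    return {name: pd.read_csv(data_dir / f"{name}.csv") for name in names}
```

Usage would look like `data = load_seattle_data("seattle")`, followed by `data["calendar"].head()` to inspect the first rows.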

How do seasonal costs change?

The graphs below show the mean and quantile prices by month.

  1. Average prices are highest in summer and increase slightly in December. Perhaps summer holidays, year-end events, and Christmas were influential.
  2. Prices fluctuate mainly at the high end. The prices of cheap rooms barely change with the season.
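The monthly figures behind such graphs could be computed roughly as follows. This is a sketch under assumptions: it expects `calendar.csv` to have a `date` column and a `price` column stored as text such as `$1,250.00`, and the function name `monthly_price_stats` is only illustrative.

```python
import pandas as pd


def monthly_price_stats(calendar):
    """Return the mean and quartile prices for each calendar month."""
    df = calendar.copy()
    # Assumption: price is a string like "$1,250.00"; strip "$" and ",".
    df["price"] = pd.to_numeric(
        df["price"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",  # anything that is not a number becomes NaN
    )
    df["month"] = pd.to_datetime(df["date"]).dt.month
    # Days without a usable price are dropped before aggregating.
    df = df.dropna(subset=["price"])

    grouped = df.groupby("month")["price"]
    stats = grouped.quantile([0.25, 0.5, 0.75]).unstack()
    stats.columns = ["q25", "median", "q75"]
    stats["mean"] = grouped.mean()
    return stats[["mean", "q25", "median", "q75"]]
```

Comparing how much the `q75` column moves across months with how much `q25` moves is one way to check the second observation above.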

Which features are most related to homestay cost?

With `pandas.DataFrame.corr`, correlation analysis is easy. Let’s show the correlation matrix as a heatmap.

  • The individual review scores are strongly correlated with each other.
  • Number of reviews and price have a slight negative correlation. Expensive homestays probably attract fewer guests and therefore collect fewer reviews.
  • The numbers of bedrooms, bathrooms, and beds correlate highly with each other and with price. It’s natural to think that the more rooms, beds, and bathrooms a listing has, the more expensive it becomes. However, the number of people a listing accommodates had the strongest correlation with price among them (0.69).
  • The minimum and maximum lengths of stay correlate with each other, but neither is significantly correlated with any other feature.
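A correlation matrix like the one discussed above can be sketched as follows. The column names in the usage note are assumptions about `listings.csv`, and the heatmap is drawn with plain matplotlib, which may differ from the plotting library used for the original figure.

```python
import pandas as pd


def price_correlations(listings, columns):
    """Pearson correlation matrix of the selected numeric columns."""
    return listings[columns].corr()


def plot_heatmap(corr):
    """Draw a correlation matrix as a heatmap (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 6))
    # Fix the color scale to [-1, 1] so that 0 sits at the middle color.
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax)
    fig.tight_layout()
    return fig
```

For example, `price_correlations(listings, ["accommodates", "bedrooms", "beds", "price"])` would give the pairwise correlations, provided `price` has already been converted to a number.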

What is different between superhosts and regular hosts?

  1. Superhosts score slightly higher on almost every review score. The larger difference is that far more reviews are written for them. This could be because more customers stay at superhosts’ accommodation, or because superhosts encourage customers to write reviews.
  2. Another difference is that, on average, superhosts have fewer rooms than other hosts.
  3. Listings operated by superhosts have, on average, slightly fewer beds and bathrooms and accommodate slightly fewer people. They also have shorter minimum and maximum stay periods, and more guests can be included. But the differences are not big.
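A comparison like the one above can be approximated by grouping the listings on the superhost flag and averaging each feature. This sketch assumes the flag lives in a `host_is_superhost` column holding `"t"` or `"f"`. The function name `compare_superhosts` is hypothetical.

```python
import pandas as pd


def compare_superhosts(listings, columns):
    """Mean of each column for superhosts vs. regular hosts."""
    # Rows with a missing flag are left out of the comparison.
    df = listings.dropna(subset=["host_is_superhost"]).copy()
    # Assumption: the flag is stored as the strings "t" and "f".
    df["is_superhost"] = df["host_is_superhost"] == "t"

    means = df.groupby("is_superhost")[columns].mean().T
    means = means.rename(columns={True: "superhost", False: "regular"})
    means["difference"] = means["superhost"] - means["regular"]
    return means
```

Each row of the result is one feature, so a large positive `difference` for the review count next to small differences elsewhere would match the list above.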

If we assume that a higher number of reviews means a listing attracts more customers, then superhosts probably have other factors that draw customers. For example, they may have been running a homestay for a long time, or their location may be better.

AirBnB superhost status does not require a long hosting period. A host can become a superhost even with less than 12 months of hosting, and hosting for a long time doesn’t mean a host will become a superhost.

It was not possible to identify nearby facilities with the given data, so I marked the hosts’ locations on a map instead. Red is a superhost, blue is a regular host.

It can be seen that there is not a big difference in location.
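A rough stand-in for such a map is a scatter plot of coordinates colored by host type. This is only a sketch: it assumes `latitude`, `longitude`, and `host_is_superhost` columns in `listings.csv`, draws no map tiles, and treats a missing flag as a regular host.

```python
import pandas as pd


def host_colors(listings):
    """Return one color per listing: red for superhosts, blue otherwise."""
    colors = listings["host_is_superhost"].map({"t": "red", "f": "blue"})
    # Assumption: a missing flag is treated as a regular host.
    return colors.fillna("blue").tolist()


def plot_host_locations(listings):
    """Scatter the coordinates, colored by host type (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 8))
    ax.scatter(
        listings["longitude"],
        listings["latitude"],
        c=host_colors(listings),
        s=5,
        alpha=0.5,
    )
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")
    return fig
```

If red and blue points overlap across the whole city rather than forming separate clusters, that supports the observation that location does not differ much.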

(FIXED: My analysis on this point was originally performed incorrectly, so I had written that there were large differences. I have corrected the article. Really sorry… 😢 🙏)


  1. Accommodation prices increase in summer and December and are lowest at the beginning of the year. However, the lower the price, the less the volatility.
  2. The size of the accommodation has a big impact on the price. Price and number of reviews have a weak negative correlation.
  3. The biggest distinction between superhosts and the rest is the number of reviews. Superhosts’ ratings are slightly higher, but at about the same level.
