Exploratory analysis of the city of Santo Domingo: we study which establishments are most common across its boroughs.

What percentage of all the venues in the city are food joints?
How similar are the boroughs, based on the types of business located in them?

Data:

- Foursquare Places API

Based on each borough's location, we fetch all the establishments in the city.

- Administrative delimitations of the city

Provided by Ayuntamiento Santo Domingo.
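
As a rough sketch of how the venue data could be requested per borough: the post does not say which Foursquare endpoint was used, so the legacy `v2/venues/explore` endpoint, the `radius` and `limit` values, the version date, and the helper names `build_explore_url` and `extract_venues` are all assumptions made for illustration. The response layout follows that legacy endpoint and may differ in other API versions.

```python
from urllib.parse import urlencode

# Assumption: the legacy v2 "explore" endpoint; the post does not name one.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the request URL for venues around one borough centre.

    radius (metres), limit and version are illustrative values only.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lon}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def extract_venues(borough, payload):
    """Flatten an explore-style JSON payload into borough/venue/category rows."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append({
            "borough": borough,
            "venue": venue["name"],
            "category": categories[0].get("name", "Unknown"),
        })
    return rows


# Placeholder credentials and a hand-written payload, for illustration only.
url = build_explore_url(18.4861, -69.9312, "MY_ID", "MY_SECRET")
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue 1", "categories": [{"name": "Restaurant"}]}},
    {"venue": {"name": "Venue 2", "categories": []}},
]}]}}
rows = extract_venues("Borough A", sample_payload)
```

In practice the URL would be passed to an HTTP client (for example `requests.get(url).json()`) once per borough, and the resulting rows concatenated into a single table.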

Methodology:

We generate a table with the locations of all the boroughs in the city, 71 in total.

dataframe.head() reference

We use folium to visualize all the locations.

We then superimpose the venues retrieved through the Foursquare API onto the same map:

Using the K-means machine learning algorithm, we classify the boroughs into clusters of similarity, based on the venues they have.

We use K = 7 for the optimal result.
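
The clustering step can be sketched as follows, assuming scikit-learn's `KMeans` and a venue table with hypothetical `borough` and `category` columns. Each borough is described by the share of its venues in each category, which is one common way to build such features and not necessarily the exact preprocessing used here. The toy data and `k=2` are only for demonstration; the analysis itself uses K = 7.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_boroughs(venues: pd.DataFrame, k: int = 7, seed: int = 0) -> pd.Series:
    """Return one K-means cluster label per borough."""
    # One-hot encode the venue category: one row per venue.
    onehot = pd.get_dummies(venues["category"], dtype=float)
    # Average per borough: the share of each category among its venues.
    profile = onehot.groupby(venues["borough"]).mean()
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(profile)
    return pd.Series(labels, index=profile.index, name="cluster")


# Hypothetical toy data: A and B share one venue mix, C and D another.
toy = pd.DataFrame({
    "borough": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "category": ["Restaurant", "Café", "Restaurant", "Café",
                 "Bank", "Pharmacy", "Bank", "Pharmacy"],
})
labels = cluster_boroughs(toy, k=2)
```

Boroughs with the same venue mix end up with the same label; the label numbers themselves carry no meaning beyond group membership.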


Results:

  • We have a total of 1236 venues.
  • Food joints total 524
  • Food joints are 42.39% of all venues in Santo Domingo available through the Foursquare API
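
The percentage follows directly from the two counts above; a short check, with variable names chosen only for illustration:

```python
total_venues = 1236  # all venues returned for the city
food_venues = 524    # venues counted as food joints

# Share of food joints among all venues, as a percentage.
food_share = food_venues / total_venues * 100
print(f"{food_share:.2f}%")  # 42.39%
```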

Labeled representation of the K-means clusters across the city

Aside from some outliers, the variety of businesses across the city is fairly consistent, owing to the overwhelming number of food joints compared to other businesses.


Discussion:

It is open to debate, and calls for more research, whether the number of food joints we find through the Foursquare API reflects the real business diversity of the city, or whether food joints are simply more likely than other businesses to be listed by their owners in the Foursquare database.


Conclusion:

Due to a possible bias in the data, we can't guarantee that the clustering of the boroughs truly represents reality. For more accurate results, we need additional data sources to contrast with the Foursquare API.

If this bias turns out not to exist, then according to the Foursquare API we could state that the venue variation across boroughs is minimal.

Reference:

JupyterNotebook for this blog post