There are two major competitors in the convenience store industry in Monterrey, Mexico: “OXXO” and “7 Eleven”. People usually don’t realize how big a footprint they have in the metropolitan area. Taking advantage of the inegiR and leaflet R packages, I decided to do a quick visual exercise to learn who is where.
inegiR package
A couple of months ago I read about inegiR, an R package that helps you interact with the APIs of INEGI, the official statistics agency of Mexico, from within R. Its author, @eflores89, talks about the package in more detail on R-bloggers.
It’s important to note that the information is accessed through two APIs, so you are going to need two different tokens: one gives you access to the indicators (the indexes), while the other lets you pull data from the DENUE database (Directorio Estadístico Nacional de Unidades Económicas, roughly the National Statistical Directory of Economic Units).
You can get your tokens through these links:
Okay, assuming you already have your tokens, the next step is to jump into R and load the library…
… and load the tokens. It’s not that I don’t like to share, but you will have to input your own tokens to make this work.
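A minimal sketch of one way to keep the tokens out of the script itself, assuming you store them in environment variables. The variable names INEGI_TOKEN_INDICADORES and INEGI_TOKEN_DENUE and the helper get_inegi_token are hypothetical; inegiR does not require them.

```r
# Read an INEGI token from an environment variable instead of hard-coding it.
# The variable names used below are hypothetical; set them in ~/.Renviron
# or with Sys.setenv() before running the script.
get_inegi_token <- function(var_name) {
  token <- Sys.getenv(var_name, unset = "")
  if (!nzchar(token)) {
    stop(sprintf("Environment variable %s is not set", var_name))
  }
  token
}

# Usage, once the variables exist:
# token_indicadores <- get_inegi_token("INEGI_TOKEN_INDICADORES")
# token_denue       <- get_inegi_token("INEGI_TOKEN_DENUE")
```

Failing loudly when a variable is missing makes it obvious why a later API call is rejected.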
Once you have the library up and running and your token values stored in variables, we can start pulling data from INEGI. There are many data sets you can obtain, but for testing purposes I am going to download the Mexican peso vs. USD exchange rate for the past 20 or 30 years.
Now that the data is loaded we can get “fancy” and pass it to an htmlwidgets package. Using the dygraphs package we can make an interactive time-series chart.
To obtain information from the National Statistic Business Directory database (or whatever the exact translation is) we need to use the DENUE API. We already loaded the token a couple of lines above, and now we are going to start using it to download the Convenience Store data.
The function we are going to use (denue_inegi) requires a specific point and the radius (in meters) that you want to search. The radius has a limit of 5 km, which could be a problem, but later we will talk about ways to get around it. For this exercise we are going to pick the lat/lon of the ITESM university and pull every business classified as “comercio al por menor” (retail trade).
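To get a feel for what a 5 km radius means, here is a small haversine (great-circle) distance helper. This is a standard formula swapped in for illustration, not part of inegiR, and the ITESM coordinates below are approximate assumptions.

```r
# Great-circle (haversine) distance in metres between two lat/lon points,
# assuming a spherical Earth of radius 6,371 km.
haversine_m <- function(lat1, lon1, lat2, lon2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Approximate ITESM coordinates (an assumption; use the exact point you query).
itesm_lat <- 25.6515
itesm_lon <- -100.2895

# A point 0.045 degrees north of ITESM: about 5004 metres away.
haversine_m(itesm_lat, itesm_lon, itesm_lat + 0.045, itesm_lon)
```

The function is vectorised, so it can also be applied to whole columns of store coordinates to check which ones fall inside the search radius.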
Once we have the raw business data, we’re going to filter to get only stores that have “OXXO”, “ELEVEN” or “EXTRA” in their formal or informal names (Nombre, Razon).
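The name filter can be sketched in base R roughly as follows. The columns Nombre and Razon come from the post; the toy rows are made up, and tagging each store with its chain (the hypothetical cadena column) is an extra step added for illustration.

```r
# Keep rows whose trade name (Nombre) or legal name (Razon) mentions a chain.
filter_convenience <- function(df, pattern = "OXXO|ELEVEN|EXTRA") {
  hit <- grepl(pattern, toupper(df$Nombre)) | grepl(pattern, toupper(df$Razon))
  df[hit, , drop = FALSE]
}

# Tag each remaining row with its chain (illustrative extra step).
label_chain <- function(df) {
  txt <- toupper(paste(df$Nombre, df$Razon))
  df$cadena <- ifelse(grepl("OXXO", txt), "OXXO",
               ifelse(grepl("ELEVEN", txt), "7 Eleven",
               ifelse(grepl("EXTRA", txt), "Extra", NA_character_)))
  df
}

# Made-up rows, only to show the filter in action.
stores <- data.frame(
  Nombre = c("OXXO CENTRO", "TIENDA LUPITA", "7 ELEVEN GARZA SADA", "MINISUPER EXTRA"),
  Razon  = c("EJEMPLO UNO SA DE CV", "", "EJEMPLO DOS SA DE CV", ""),
  stringsAsFactors = FALSE
)
convenience <- label_chain(filter_convenience(stores))
convenience[, c("Nombre", "cadena")]
```

A plain substring match on EXTRA will also catch words such as EXTRAORDINARIO, so it is worth eyeballing the matched names.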
Now we have reduced our data frame from 9,049 rows down to 360. Using the leaflet package we can now see where all the stores within 5,000 m of ITESM are located.
As we mentioned before, the API only lets you pull data within 5 km of your initial coordinates. To cover the whole metropolitan area we need to pick lat/lon points across it and loop the denue_inegi function over them.
This is essentially a covering problem (covering the area with as few 5 km circles as possible), but for now we won’t take it too seriously.
This is why I created a function, grid, that splits any rectangular area into points separated by 0.07 degrees in each direction. According to some guys on the internet $1^\circ \approx 100$ km (closer to 111 km for a degree of latitude), so 0.07° works out to roughly 7 to 8 km, and by trial and error 0.07° does the job.
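A rough back-of-the-envelope check of that spacing, assuming a spherical Earth and a latitude of about 25.67° for Monterrey. On a rectangular grid, the spot farthest from every grid point is the centre of a cell, half a diagonal away, so the search radius must be at least that distance for the circles to leave no gaps. The helper max_gap_km is hypothetical.

```r
# Kilometres per degree of latitude on a sphere of radius 6,371 km (~111.19).
km_per_deg_lat <- 2 * pi * 6371 / 360

# Largest distance from any point in a grid cell to its nearest grid point:
# half the cell diagonal, sqrt(s_lat^2 + s_lon^2) / 2. Degrees of longitude
# shrink with the cosine of the latitude.
max_gap_km <- function(spacing_deg, lat_deg) {
  s_lat <- spacing_deg * km_per_deg_lat
  s_lon <- spacing_deg * km_per_deg_lat * cos(lat_deg * pi / 180)
  sqrt(s_lat^2 + s_lon^2) / 2
}

max_gap_km(0.07, 25.67)   # compare with the API's 5 km radius
max_gap_km(0.065, 25.67)
```

Under these assumptions 0.07° comes out at about 5.24 km, slightly over the 5 km limit, so thin slivers near the cell centres may be missed; a spacing around 0.065° (about 4.87 km) would close them at the cost of more API calls.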
Because the search circles around neighbouring points overlap, the same store can come back more than once. This is why we use the unique function (within grid) to remove the duplicates.
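A minimal sketch of what a grid helper could look like in base R. The name grid_points, its arguments, and the bounding box in the usage line are hypothetical (the box is only a rough outline of the metro area), and this is not necessarily how the original grid function is written.

```r
# Split a rectangular lat/lon bounding box into points spaced `step`
# degrees apart in each direction (hypothetical stand-in for grid()).
grid_points <- function(lat_min, lat_max, lon_min, lon_max, step = 0.07) {
  pts <- expand.grid(
    lat = seq(lat_min, lat_max, by = step),
    lon = seq(lon_min, lon_max, by = step)
  )
  # expand.grid already yields each point once; unique() is just a guard.
  unique(pts)
}

# Rough, illustrative box around the Monterrey metropolitan area.
monterrey_pts <- grid_points(25.55, 25.85, -100.48, -100.12)
nrow(monterrey_pts)
```

Since the grid itself has no repeated points, the de-duplication that really matters is applied to the stores returned by the overlapping searches.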
Here is a look at all the points we’ll loop the denue_inegi function over to cover Monterrey and the nearby cities.
By looping denue_inegi over the points produced by our self-defined grid function, we can now see the whole footprint of the major players in the convenience store industry in the Monterrey metropolitan area.
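The loop itself can be sketched with lapply and do.call(rbind, ...). To keep the example runnable without a token, fetch_fn is a hypothetical stand-in for a wrapper around denue_inegi; its (lat, lon) arguments are an assumption, not the package's signature.

```r
# Call a fetch function for every grid point, stack the results, and drop
# stores returned more than once by overlapping search circles.
pull_all_points <- function(points, fetch_fn) {
  results <- lapply(seq_len(nrow(points)), function(i) {
    fetch_fn(points$lat[i], points$lon[i])
  })
  combined <- do.call(rbind, results)
  unique(combined)
}

# Toy fetch function that returns the same two stores for every point,
# just to show the de-duplication step.
fake_fetch <- function(lat, lon) {
  data.frame(Id = c(1, 2), Nombre = c("OXXO A", "OXXO B"),
             stringsAsFactors = FALSE)
}
pts <- data.frame(lat = c(25.60, 25.67), lon = c(-100.30, -100.30))
res <- pull_all_points(pts, fake_fetch)
nrow(res)   # 2 rows, not 4
```

With the real API it may be worth pausing briefly between calls (for example with Sys.sleep) to avoid hammering the service.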
I think this is a good time to wrap this up, but there are many potential uses for this script. You could group the data by postal code to understand any variance, or pair that grouping with income or average age in each area to understand the market.
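As a starting point for the postal code idea, a cross-tabulation in base R might look like this. The columns cadena and codigo_postal, and the rows themselves, are made up for illustration; map them to the fields in your own data.

```r
# Made-up store list with a chain label and a postal code per store.
stores <- data.frame(
  cadena = c("OXXO", "OXXO", "7 Eleven", "OXXO", "7 Eleven"),
  codigo_postal = c("64700", "64700", "64700", "66220", "66220"),
  stringsAsFactors = FALSE
)

# Number of stores per chain in each postal code, as a long data frame
# with columns codigo_postal, cadena and Freq.
counts <- as.data.frame(table(codigo_postal = stores$codigo_postal,
                              cadena = stores$cadena))
counts
```

A long table like this is easy to merge with income or age figures keyed by postal code.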
Now, if you have access to POS data, it is a totally different ballgame.
Until next time…
You can find the script to replicate this post here