Back in November I saw a post using ggplot2 to visualized MLS Player Salaries. I thought I would give it a try and replicate the analysis for the Mexican Soccer League using the newly open sourced Plotly but first I had to figure out where to get the data…
###The tweet
This tweet links to @tejaykodali post:
Visualizing MLS Player Salaries with ggplot2 https://t.co/Vh5I3wWB4M #rtodolist for LigaMX
— Arturo Cárdenas (@arturocm) December 3, 2015
###ETL process
First we need to get the salary data. I didn’t have a PDF file with all the salaries (just like the original post) so I had to look for ways to scrap the information from Transfermarkt.
# Set the initial url to pull the data
url.t <- "http://www.transfermarkt.es"
# Mexican League link
url.mx <- "/liga-mx-clausura/marktwerteverein/wettbewerb/MEX1/" #mexico
Using the rvest
package I was able to loop through each team from the league and obtain compile a single table with all the required data.
transfermarkt <- function (x = dara.url){
# Create data.table for team loops
team.value <- data.table()
for (j in 1:length(x)){
# Initial pull with all the liga MX names-url information
all.ligamx <- read_html(paste0(url.t,x[j])) %>%
html_nodes("table.items") %>%
.[1]
# Get teams url
team.url <- html_nodes(all.ligamx, "a") %>%
html_attr("href")
team.url <- unique(team.url[team.url %like% "startseite"])
# Get teams names
names <- html_table(all.ligamx, fill = TRUE) %>%
as.data.frame() %>%
.[-1,"Club"]
# Create data.frame with url-names
db <- cbind.data.frame(team.url, names)
# Get each player´s cost by team
for (i in 1:nrow(db)){
temp <- read_html(paste0(url.t,db[i,1])) %>%
html_nodes("table.items") %>%
.[1] %>%
html_table(fill = TRUE) %>%
as.data.frame()
temp$team <- db[i,2]
team.value <- rbind.data.frame(team.value, temp)
}
}
# Update values
headers <- c("number","x1","x2","x3","position","name","age","x7","value","team")
names(team.value) <- headers
team.value <- subset(team.value,
select = c("number","position","name","age","value","team"))
return(team.value)
}
team.value <- transfermarkt(url.mx)
The raw data includes NAs
and currency values formatted as string. Using the self defined cleansing
formula the values are:
draft <- cleansing(team.value)
To control the colors assigned in the plot we are defining the color.column
formula to take advantage of the RColorBrewer
package and pre-define the color scale to be used in the final plot.
final.draft <- color.column(draft, team_rank = 0)
Finally we are defining the parameters to be used in the Plotly bar chart!
league.plot <- function(x = data, Plot.Title = "League"){
p <- plot_ly(data = x, y = team, x = value.adj,
marker = list(color = color.set,
line= list(color='white', width=.5 )),
opacity = .9, group = total, type = "bar",
showlegend = FALSE, bargap = 0.5, text = name,
orientation = "h")
f1 <- list(family = "Arial, sans-serif", size = 18, color = "lightgrey")
f2 <- list(family = "Old Standard TT, serif", size = 12, color = "black")
x <- list(title = "USD Million", titlefont = f1, showticklabels = TRUE,
tickfont = f2, exponentformat = "e", showexponent = "All")
y <- list(title = "TEAM", titlefont = f1, showticklabels = TRUE,
tickfont = f2, exponentformat = "e", showexponent = "All")
m = list(l = 150, r = 20, b = 50, t = 50, pad = 4)
layout(p, title=Plot.Title, hovermode = "closest", barmode = "stack",
xaxis = x, yaxis = y, margin = m)
}
final.draft <- arrange(final.draft, (total), (value.adj))
league.plot(final.draft, "Liga MX Teams")
You can find a larger version of this plot here
###Going the extra mile
Now, if we want to replicate this for all the Teams in the continent, we just need to create a vector containing all the URLs linking to all the leagues in America and pass it through the same formulas.
# Links for all the soccer leagues from america available
url.us <- "/major-league-soccer/marktwerteverein/wettbewerb/MLS1" #US
url.br <- "/campeonato-brasileiro-serie-a/marktwerteverein/wettbewerb/BRA1" #Brazil
url.co <- "/liga-postobon-i/marktwerteverein/wettbewerb/COLP" #Colombia
url.ec <- "/serie-a-primera-etapa/marktwerteverein/wettbewerb/ECPE" #Ecuador
url.ur <- "/primera-division-apertura/marktwerteverein/wettbewerb/URU1" #Uruguay
url.ch <- "/primera-division-apertura/marktwerteverein/wettbewerb/CHLA" #Chile
url.cr <- "/primera-division-invierno/marktwerteverein/wettbewerb/CRPD" #C Rica
url.pe <- "/torneo-descentralizado-clausura-entscheidungsspiel/marktwerteverein/wettbewerb/PERD" #Peru
url.ar <- "/primera-division/marktwerteverein/wettbewerb/AR1N" #Argentina
# Add all the american leagues
url.all<- c(url.us, url.br, url.ar, url.co,
url.ec, url.ur, url.ch, url.mx)
# Here is the link in case you want to add "La Liga" from Spain
url.es <- "/primera-division/marktwerteverein/wettbewerb/ES1"
For visualization purposes we are limiting the list to the 20 most expensive soccer teams in the continent.
team.value <- transfermarkt(url.all)
draft <- cleansing(team.value)
final.draft <- color.column(draft, team_rank = 20)
final.draft <- arrange(final.draft, (total), (value.adj))
league.plot(final.draft, "Most Expensive Soccer teams from America")
You can find a larger version of this plot here
And finally - since we already have the data loaded - here is the list of the soccer players with the highest salaries in the continent:
draft <- cleansing(team.value)
final.draft <- color.column(draft)
final.table <- select(final.draft, No = number, name, position, team, USD_MM = value.adj)
final.table <- arrange(final.table, desc(USD_MM))
knitr::kable(final.table[1:10,], digits = 2, row.names = TRUE, align = "c", format = "markdown", padding = 40)
No | name | position | team | USD_MM | |
---|---|---|---|---|---|
1 | 10 | Carlos Tévez | Media punta | Club Atlético Boca Juniors | 16.35 |
2 | 10 | Gabriel | Extremo izquierdo | Santos FC | 13.08 |
3 | - | Maicon | Defensa central | São Paulo Futebol Clube | 10.90 |
4 | - | Gerson | Medio centro ofensivo | Fluminense Football Club | 10.90 |
5 | 10 | Sebastian Giovinco | Media punta | Toronto FC | 10.36 |
6 | 10 | Ganso | Medio centro ofensivo | São Paulo Futebol Clube | 9.81 |
7 | 20 | Lucas Lima | Medio centro ofensivo | Santos FC | 9.81 |
8 | 10 | Gonzalo Martínez | Medio centro ofensivo | Club Atlético River Plate | 9.81 |
9 | 10 | Giovani dos Santos | Media punta | Los Angeles Galaxy | 8.72 |
10 | 7 | Elias | Medio centro | Sport Club Corinthians Paulista | 8.72 |
You can find the entire rmarkdown document here
Subscribe to this blog via RSS.
R 2
Plotly 1
Knime 1
Leaflet 1
Inegir 1
Retail 2
Español 2
Rtodolist (3) R (2) Web scraping (1) Plotly (1) Knime (1) Machine learning (1) Random forest (1) Leaflet (1) Inegir (1) Retail (2) Español (2)