Visualizing Mexican Soccer Salaries with Plotly

Visualizing Mexican Soccer Salaries with Plotly

Jan 3, 2016. | By: Arturo Cardenas

Back in November I saw a post using ggplot2 to visualized MLS Player Salaries. I thought I would give it a try and replicate the analysis for the Mexican Soccer League using the newly open sourced Plotly but first I had to figure out where to get the data…

###The tweet

This tweet links to @tejaykodali post:

###ETL process

First we need to get the salary data. I didn’t have a PDF file with all the salaries (just like the original post) so I had to look for ways to scrap the information from Transfermarkt.

# Set the initial url to pull the data
url.t <- "http://www.transfermarkt.es"
# Mexican League link
url.mx <- "/liga-mx-clausura/marktwerteverein/wettbewerb/MEX1/" #mexico

Using the rvest package I was able to loop through each team from the league and obtain compile a single table with all the required data.

transfermarkt <- function (x = dara.url){
  # Create data.table for team loops
  team.value <- data.table()
  for (j in 1:length(x)){
    # Initial pull with all the liga MX names-url information
    all.ligamx <- read_html(paste0(url.t,x[j])) %>%
      html_nodes("table.items") %>%
      .[1]
    # Get teams url
    team.url <- html_nodes(all.ligamx, "a") %>%
      html_attr("href")
    team.url <- unique(team.url[team.url %like% "startseite"])
    # Get teams names
    names <- html_table(all.ligamx, fill = TRUE) %>%
      as.data.frame() %>%
      .[-1,"Club"]
    # Create data.frame with url-names
    db <- cbind.data.frame(team.url, names)

    # Get each player´s cost by team
    for (i in 1:nrow(db)){
      temp <- read_html(paste0(url.t,db[i,1])) %>%
        html_nodes("table.items") %>%
        .[1] %>%
        html_table(fill = TRUE) %>%
        as.data.frame()
      temp$team <- db[i,2]
      team.value <- rbind.data.frame(team.value, temp)
    }
  }
  # Update values
  headers <- c("number","x1","x2","x3","position","name","age","x7","value","team")
  names(team.value) <- headers
  team.value <- subset(team.value,
                       select = c("number","position","name","age","value","team"))
  return(team.value)
}
team.value <- transfermarkt(url.mx)

The raw data includes NAs and currency values formatted as string. Using the self defined cleansing formula the values are:

  1. Transformed to number values, and
  2. converted from euros to USD
draft <- cleansing(team.value)

To control the colors assigned in the plot we are defining the color.column formula to take advantage of the RColorBrewerpackage and pre-define the color scale to be used in the final plot.

final.draft <- color.column(draft, team_rank = 0)

Finally we are defining the parameters to be used in the Plotly bar chart!

league.plot <- function(x = data, Plot.Title = "League"){
  p <- plot_ly(data = x, y = team, x = value.adj,
               marker = list(color = color.set,
                             line= list(color='white', width=.5 )),
               opacity = .9, group = total, type = "bar",
               showlegend = FALSE, bargap = 0.5, text = name,
               orientation = "h")

  f1 <- list(family = "Arial, sans-serif", size = 18, color = "lightgrey")
  f2 <- list(family = "Old Standard TT, serif", size = 12, color = "black")
  x <- list(title = "USD Million", titlefont = f1, showticklabels = TRUE,
    tickfont = f2, exponentformat = "e", showexponent = "All")
  y <- list(title = "TEAM", titlefont = f1, showticklabels = TRUE,
    tickfont = f2, exponentformat = "e", showexponent = "All")
  m = list(l = 150, r = 20, b = 50, t = 50, pad = 4)
  layout(p, title=Plot.Title, hovermode = "closest", barmode = "stack", 
         xaxis = x, yaxis = y, margin = m)
}
final.draft <- arrange(final.draft, (total), (value.adj))
league.plot(final.draft, "Liga MX Teams")

You can find a larger version of this plot here

###Going the extra mile

Now, if we want to replicate this for all the Teams in the continent, we just need to create a vector containing all the URLs linking to all the leagues in America and pass it through the same formulas.

#  Links for all the soccer leagues from america available
url.us <- "/major-league-soccer/marktwerteverein/wettbewerb/MLS1" #US
url.br <- "/campeonato-brasileiro-serie-a/marktwerteverein/wettbewerb/BRA1" #Brazil
url.co <- "/liga-postobon-i/marktwerteverein/wettbewerb/COLP" #Colombia
url.ec <- "/serie-a-primera-etapa/marktwerteverein/wettbewerb/ECPE" #Ecuador
url.ur <- "/primera-division-apertura/marktwerteverein/wettbewerb/URU1" #Uruguay
url.ch <- "/primera-division-apertura/marktwerteverein/wettbewerb/CHLA" #Chile
url.cr <- "/primera-division-invierno/marktwerteverein/wettbewerb/CRPD" #C Rica
url.pe <- "/torneo-descentralizado-clausura-entscheidungsspiel/marktwerteverein/wettbewerb/PERD" #Peru
url.ar <- "/primera-division/marktwerteverein/wettbewerb/AR1N" #Argentina
# Add all the american leagues
url.all<- c(url.us, url.br, url.ar, url.co, 
            url.ec, url.ur, url.ch, url.mx)
# Here is the link in case you want to add "La Liga" from Spain
url.es <- "/primera-division/marktwerteverein/wettbewerb/ES1"

For visualization purposes we are limiting the list to the 20 most expensive soccer teams in the continent.

team.value <- transfermarkt(url.all)
draft <- cleansing(team.value)
final.draft <- color.column(draft, team_rank = 20)
final.draft <- arrange(final.draft, (total), (value.adj))
league.plot(final.draft, "Most Expensive Soccer teams from America")

You can find a larger version of this plot here

And finally - since we already have the data loaded - here is the list of the soccer players with the highest salaries in the continent:

draft <- cleansing(team.value)
final.draft <- color.column(draft)
final.table <- select(final.draft, No = number, name, position, team, USD_MM = value.adj)
final.table <- arrange(final.table, desc(USD_MM))

knitr::kable(final.table[1:10,], digits = 2, row.names = TRUE, align = "c", format = "markdown", padding = 40)
  No name position team USD_MM
1 10 Carlos Tévez Media punta Club Atlético Boca Juniors 16.35
2 10 Gabriel Extremo izquierdo Santos FC 13.08
3 - Maicon Defensa central São Paulo Futebol Clube 10.90
4 - Gerson Medio centro ofensivo Fluminense Football Club 10.90
5 10 Sebastian Giovinco Media punta Toronto FC 10.36
6 10 Ganso Medio centro ofensivo São Paulo Futebol Clube 9.81
7 20 Lucas Lima Medio centro ofensivo Santos FC 9.81
8 10 Gonzalo Martínez Medio centro ofensivo Club Atlético River Plate 9.81
9 10 Giovani dos Santos Media punta Los Angeles Galaxy 8.72
10 7 Elias Medio centro Sport Club Corinthians Paulista 8.72

You can find the entire rmarkdown document here

Google Icons

About

Data Scientist in-the-making

Social Links

Location

Toronto, Canada
Hermosillo & Monterrey, Mexico