Visualizing Mexican Soccer Salaries with Plotly
Jan 3, 2016. | By: Arturo Cardenas
Back in November I saw a post using ggplot2 to visualized MLS Player Salaries. I thought I would give it a try and replicate the analysis for the Mexican Soccer League using the newly open sourced Plotly but first I had to figure out where to get the data…
###The tweet
This tweet links to @tejaykodali post:
###ETL process
First we need to get the salary data. I didn’t have a PDF file with all the salaries (just like the original post) so I had to look for ways to scrap the information from Transfermarkt .
# Set the initial url to pull the data
url.t <- "http://www.transfermarkt.es"
# Mexican League link
url.mx <- "/liga-mx-clausura/marktwerteverein/wettbewerb/MEX1/" # mexico
Using the rvest
package I was able to loop through each team from the league and obtain compile a single table with all the required data.
transfermarkt <- function ( x = dara.url ){
# Create data.table for team loops
team.value <- data.table ()
for ( j in 1 : length ( x )){
# Initial pull with all the liga MX names-url information
all.ligamx <- read_html ( paste0 ( url.t , x [ j ])) %>%
html_nodes ( "table.items" ) %>%
. [ 1 ]
# Get teams url
team.url <- html_nodes ( all.ligamx , "a" ) %>%
html_attr ( "href" )
team.url <- unique ( team.url [ team.url % like % "startseite" ])
# Get teams names
names <- html_table ( all.ligamx , fill = TRUE ) %>%
as.data.frame () %>%
. [ -1 , "Club" ]
# Create data.frame with url-names
db <- cbind.data.frame ( team.url , names )
# Get each player´s cost by team
for ( i in 1 : nrow ( db )){
temp <- read_html ( paste0 ( url.t , db [ i , 1 ])) %>%
html_nodes ( "table.items" ) %>%
. [ 1 ] %>%
html_table ( fill = TRUE ) %>%
as.data.frame ()
temp $ team <- db [ i , 2 ]
team.value <- rbind.data.frame ( team.value , temp )
}
}
# Update values
headers <- c ( "number" , "x1" , "x2" , "x3" , "position" , "name" , "age" , "x7" , "value" , "team" )
names ( team.value ) <- headers
team.value <- subset ( team.value ,
select = c ( "number" , "position" , "name" , "age" , "value" , "team" ))
return ( team.value )
}
team.value <- transfermarkt ( url.mx )
The raw data includes NAs
and currency values formatted as string. Using the self defined cleansing
formula the values are:
Transformed to number values, and
converted from euros to USD
draft <- cleansing ( team.value )
To control the colors assigned in the plot we are defining the color.column
formula to take advantage of the RColorBrewer
package and pre-define the color scale to be used in the final plot.
final.draft <- color.column ( draft , team_rank = 0 )
Finally we are defining the parameters to be used in the Plotly bar chart!
league.plot <- function ( x = data , Plot.Title = "League" ){
p <- plot_ly ( data = x , y = team , x = value.adj ,
marker = list ( color = color.set ,
line = list ( color = 'white' , width = .5 )),
opacity = .9 , group = total , type = "bar" ,
showlegend = FALSE , bargap = 0.5 , text = name ,
orientation = "h" )
f1 <- list ( family = "Arial, sans-serif" , size = 18 , color = "lightgrey" )
f2 <- list ( family = "Old Standard TT, serif" , size = 12 , color = "black" )
x <- list ( title = "USD Million" , titlefont = f1 , showticklabels = TRUE ,
tickfont = f2 , exponentformat = "e" , showexponent = "All" )
y <- list ( title = "TEAM" , titlefont = f1 , showticklabels = TRUE ,
tickfont = f2 , exponentformat = "e" , showexponent = "All" )
m = list ( l = 150 , r = 20 , b = 50 , t = 50 , pad = 4 )
layout ( p , title = Plot.Title , hovermode = "closest" , barmode = "stack" ,
xaxis = x , yaxis = y , margin = m )
}
final.draft <- arrange ( final.draft , ( total ), ( value.adj ))
league.plot ( final.draft , "Liga MX Teams" )
You can find a larger version of this plot here
###Going the extra mile
Now, if we want to replicate this for all the Teams in the continent, we just need to create a vector containing all the URLs linking to all the leagues in America and pass it through the same formulas.
# Links for all the soccer leagues from america available
url.us <- "/major-league-soccer/marktwerteverein/wettbewerb/MLS1" #US
url.br <- "/campeonato-brasileiro-serie-a/marktwerteverein/wettbewerb/BRA1" #Brazil
url.co <- "/liga-postobon-i/marktwerteverein/wettbewerb/COLP" #Colombia
url.ec <- "/serie-a-primera-etapa/marktwerteverein/wettbewerb/ECPE" #Ecuador
url.ur <- "/primera-division-apertura/marktwerteverein/wettbewerb/URU1" #Uruguay
url.ch <- "/primera-division-apertura/marktwerteverein/wettbewerb/CHLA" #Chile
url.cr <- "/primera-division-invierno/marktwerteverein/wettbewerb/CRPD" #C Rica
url.pe <- "/torneo-descentralizado-clausura-entscheidungsspiel/marktwerteverein/wettbewerb/PERD" #Peru
url.ar <- "/primera-division/marktwerteverein/wettbewerb/AR1N" #Argentina
# Add all the american leagues
url.all <- c ( url.us , url.br , url.ar , url.co ,
url.ec , url.ur , url.ch , url.mx )
# Here is the link in case you want to add "La Liga" from Spain
url.es <- "/primera-division/marktwerteverein/wettbewerb/ES1"
For visualization purposes we are limiting the list to the 20 most expensive soccer teams in the continent.
team.value <- transfermarkt ( url.all )
draft <- cleansing ( team.value )
final.draft <- color.column ( draft , team_rank = 20 )
final.draft <- arrange ( final.draft , ( total ), ( value.adj ))
league.plot ( final.draft , "Most Expensive Soccer teams from America" )
You can find a larger version of this plot here
And finally - since we already have the data loaded - here is the list of the soccer players with the highest salaries in the continent:
draft <- cleansing ( team.value )
final.draft <- color.column ( draft )
final.table <- select ( final.draft , No = number , name , position , team , USD_MM = value.adj )
final.table <- arrange ( final.table , desc ( USD_MM ))
knitr :: kable ( final.table [ 1 : 10 ,], digits = 2 , row.names = TRUE , align = "c" , format = "markdown" , padding = 40 )
No
name
position
team
USD_MM
1
10
Carlos Tévez
Media punta
Club Atlético Boca Juniors
16.35
2
10
Gabriel
Extremo izquierdo
Santos FC
13.08
3
-
Maicon
Defensa central
São Paulo Futebol Clube
10.90
4
-
Gerson
Medio centro ofensivo
Fluminense Football Club
10.90
5
10
Sebastian Giovinco
Media punta
Toronto FC
10.36
6
10
Ganso
Medio centro ofensivo
São Paulo Futebol Clube
9.81
7
20
Lucas Lima
Medio centro ofensivo
Santos FC
9.81
8
10
Gonzalo Martínez
Medio centro ofensivo
Club Atlético River Plate
9.81
9
10
Giovani dos Santos
Media punta
Los Angeles Galaxy
8.72
10
7
Elias
Medio centro
Sport Club Corinthians Paulista
8.72
You can find the entire rmarkdown document here
Please enable JavaScript to view the comments powered by Disqus.