Map Time: Cities Hosting United States MNT Matches (2014 through March 2017)

Buoyed by the confidence from my recent progress with map plots, I decided to try my hand with some non-MLS-related data for the first time. I have some ideas for working with and presenting data related to the United States national teams, but for now I'm having a difficult time finding a comprehensive database of US national team results in a format compatible with my current R abilities. Until I take those steps in my R education, I'll have to settle for the disappointingly limited collection of USMNT results and USWNT results available on US Soccer's website. The women's team results go back only to 2012, while the men's team results go back only as far as 2014. This post is about the USMNT map I created; the USWNT map will be presented in a later post.

The map of USMNT fixtures from 2014 through the end of March 2017 looks like this:


Curiously, in the three and a quarter years covered by this dataset, the USMNT has not played any matches farther east than Cyprus, which is why that part of the world is not included in this map. With a grand total of 4 matches, Carson, CA, was the most frequent location for USMNT internationals. No other city hosted more than 2 matches during that period.

To produce this map, I had to do a little more cleaning of the dataset than I did for my previous projects. Stadium, City, and State/Nation were grouped together in one column. I used the "separate" function to split this column into a Stadium column and a combined City and State/Nation column. Then I used the "count" function to determine how many times the USMNT played in each city. I then split City and State/Nation into separate columns so that I could pass them to the "geocode" function to obtain latitudes and longitudes, although I suspect this splitting step might not have been necessary. I also had to use na.omit and, for the first time, rename a specific column. The code I used to rename the count column (default name "n") to "Matches" was the following:

> names(usmnt5)[names(usmnt5) == "n"] <- "Matches"
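For anyone wanting to try the same cleaning steps, here is a minimal sketch using tidyr's separate and dplyr's count on a few made-up rows. It is not the original script: the column name "Venue", the ", " separator, and all stadium and city values are hypothetical assumptions, and the geocoding call is left as a comment because it needs a registered Google API key.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw data: one column holding "Stadium, City, State/Nation".
# These rows are placeholders, not real USMNT results.
usmnt <- data.frame(
  Venue = c("Stadium A, Carson, CA",
            "Stadium A, Carson, CA",
            "Stadium B, City B, TX",
            "Stadium C, City C, Nation C",
            NA),
  stringsAsFactors = FALSE
)

venues <- usmnt %>%
  na.omit() %>%                                # drop rows with a missing venue
  separate(Venue, into = c("Stadium", "CityState"),
           sep = ", ", extra = "merge") %>%    # split at the first ", " only
  count(CityState) %>%                         # matches per city, stored in column "n"
  separate(CityState, into = c("City", "State"),
           sep = ", ", remove = FALSE)         # keep the combined "City, ST" column too

# Same base-R rename as in the post: "n" becomes "Matches"
names(venues)[names(venues) == "n"] <- "Matches"

# Geocoding (not run here): with a registered Google API key,
# ggmap::geocode(venues$CityState) should return lon/lat for each
# "City, ST" string directly, which suggests the City/State split is optional.
print(venues)
```

A dplyr-style alternative to the base-R rename would be rename(venues, Matches = n), which reads as new name = old name.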

These steps were followed by modified versions of the get_googlemap and ggmap code lines that I used on previous maps. This included enlarging the dots by adding "scale_size" alongside the geom_point command, so even the smallest dots are now easier to spot.
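As a rough illustration of the dot-size change, the sketch below plots placeholder coordinates with plain ggplot2 (no basemap) and uses scale_size to raise the size range above ggplot2's default of c(1, 6). The data frame, its coordinates, and the chosen range c(3, 8) are hypothetical assumptions, not values from the original map.

```r
library(ggplot2)

# Hypothetical geocoded output: lon/lat and match counts are placeholders.
cities <- data.frame(
  City    = c("City A", "City B", "City C"),
  lon     = c(-118.3, -95.4, -80.2),
  lat     = c(33.8, 29.8, 25.8),
  Matches = c(4, 2, 1)
)

p <- ggplot(cities, aes(x = lon, y = lat, size = Matches)) +
  geom_point(colour = "red", alpha = 0.7) +
  # Map the smallest count to size 3 and the largest to size 8,
  # so even single-match cities get a clearly visible dot.
  scale_size(range = c(3, 8))

print(p)
```

On a Google basemap, the same layers could be added to ggmap(get_googlemap(...)) with data = cities passed to geom_point; that route assumes an API key has been registered with register_google().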

I'm pleased with this map, but there is so much more to do. Most of it relates to obtaining all of the USMNT match data back to 1885. That will require me to learn many new R skills, which is a huge part of what this is all about. Hopefully it won't be too painful!
