Map of American Soccer Players in Foreign Leagues
After the success I had creating my map of MLS player production by state, I thought it would be interesting to plot a different dataset on a world map. I decided to do this for Americans playing in foreign leagues, using the list of US players abroad found at Soccer Way, with minimal supplementation based on the list of players at Yanks Abroad. Both of these lists include only male players. I'm planning to produce a similar map for female players in the future.
The resulting map uses gradient shading based on the number of Americans playing in leagues in each country or territory. The redder the color, the more players in that country. Countries shaded gray have no Americans playing in their leagues.
[Image: map of American soccer players in foreign leagues]
Here are the top ten countries and territories on the map, with "freq" representing the number of players in that country or territory:
    x            freq
1   Germany      39
2   UK           23
3   Mexico       20
4   Sweden       20
5   Finland      18
6   Iceland      16
7   Denmark      11
8   Austria      6
9   Netherlands  6
10  Puerto Rico  6
I was not surprised to see Germany at the top of this particular list. However, I was surprised to learn how many Americans are playing in Iceland. The Germany connection I understand, but why Iceland? I'm sure it's a lovely place, and I have plans to visit someday, but I had no idea that Iceland is such a draw for American soccer players.
I also expected to see darker shading (that is, more players) in the United Kingdom, in large part because I was unable to plot the data for England, Scotland, Wales, and Northern Ireland separately, so all four are combined. I hope to crack that in a future iteration of this map, although bizarrely the world dataset I used does not include "England", instead using "Great Britain" but also separately including "Scotland" and "Northern Ireland".
"But what about Canada?" I hear you ask. The Americans playing in Canada, at least according to the two data sources I used, play in US-based leagues that have teams in Canada. These leagues are MLS, USL, and NASL. As they are not foreign leagues but rather American leagues operating in a foreign country, I excluded players in those leagues from the map.
Slightly more complicated is Puerto Rico, which is included in the Soccer Way list of Americans abroad despite Puerto Rico being a US territory and Puerto Ricans being Americans. I think that the Soccer Way list for Puerto Rico includes only those American players who are not from Puerto Rico, since the list is quite short. From this list, I excluded those players who are in Puerto Rico as part of the NASL but kept those players who play in the LNFPR. I know that geopolitically, the LNFPR is not really a "foreign" league, but as it operates entirely within the territory of Puerto Rico and not in any of the 50 US states, I kept it on the list for purposes of this project.
There are a few more issues with this map. One is that Trinidad and Tobago were separated into "Trinidad" and "Tobago" in the "world" dataset I used, so I had to pick one to shade. (Sorry, Tobago!) Another is that there is one American playing in Hong Kong, but again due to my inability to plot "subregions" (to use the term applied in the world dataset to places like Hong Kong, England, and Scotland) as well as regions, I had to shade all of China instead. And lastly, the Soccer Way list might not be complete, and I did not double-check all of the names on the shorter Yanks Abroad list, so there might be some omissions from the map. Again, I hope to be able to resolve these issues in a future map.
Creating this map was more difficult than I thought it would be. I initially used virtually the same R code that I used for the US map of MLS players and youth soccer registration. This included the rworldmap, RColorBrewer, and ggplot2 packages. After loading my dataset of Americans abroad, I started with these two lines of code:
> world <- map_data("world")
> world <- subset(world, region!="Antarctica")
The second line was a neat discovery: it allows for the removal of any particular region from the map. In this case, I chose to remove Antarctica, which improved the resulting map immensely.
While coding the map, I discovered how to include all countries on the map, not just those with players. The secret was to include "all=TRUE" in the merge function code when combining the map data and my player data, like so:
> worldplot <- merge(world, countries, all=TRUE, by="region")
I even figured out how to include the borders of the countries on the map, which is accomplished by adding the following argument inside the parentheses of geom_polygon in the ggplot code:
geom_polygon(color="white")
Everything was going swimmingly until I ran the ggplot code to produce the map. This is what I received for my troubles:
[Image: Streak Map]
Try as I might, I could not get rid of these streaks across the map. I eventually determined that the problem was with the "world" dataset, in particular the data it includes for Russia. When I removed Russia from the map, the lines went away. I found a discussion about the same problem, but the solution offered was to find a better dataset. However, by following the link to the other post of the same question, I learned that I could switch the map from a Mercator projection to a Cartesian projection simply by removing coord_map() from the ggplot code. Crossing my fingers, I ran the new lines of code, and it worked! No more streaks across the map!
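For anyone who wants to try this, here is a minimal sketch of how the steps described in this post might fit together: dropping Antarctica, merging with all=TRUE, drawing white borders, and leaving out coord_map(). It is not the exact code used for the map above. The "countries" data frame is rebuilt from the top-ten table only, its column names are assumed, the fill colors are placeholders rather than the RColorBrewer palette, and the re-sorting step is a precaution I added rather than something from the original workflow.

```r
library(ggplot2)  # map_data() also needs the 'maps' package installed

# Player counts copied from the top-ten table in this post; the full
# dataset has more countries. The column names here are assumptions.
countries <- data.frame(
  region = c("Germany", "UK", "Mexico", "Sweden", "Finland",
             "Iceland", "Denmark", "Austria", "Netherlands", "Puerto Rico"),
  freq   = c(39, 23, 20, 20, 18, 16, 11, 6, 6, 6),
  stringsAsFactors = FALSE
)

# Load the world outlines and drop Antarctica.
world <- map_data("world")
world <- subset(world, region != "Antarctica")

# all = TRUE keeps countries with no players; their freq becomes NA.
worldplot <- merge(world, countries, all = TRUE, by = "region")

# Precaution (not from the post): merge() does not promise to keep the
# original row order, and polygons are drawn in row order, so restore it.
worldplot <- worldplot[order(worldplot$group, worldplot$order), ]

# No coord_map() here, so ggplot2 uses its default Cartesian coordinates.
p <- ggplot(worldplot, aes(x = long, y = lat, group = group, fill = freq)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(low = "mistyrose", high = "red", na.value = "grey80")

print(p)
```

Region names in "countries" have to match the names used in the "world" dataset exactly, which is worth checking with unique(world$region) before merging.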
There is an alternate world map dataset called "world2" that works with a Mercator projection without the lines, but it is centered on the Pacific Ocean, which was not aesthetically desirable for this map. Besides, the Cartesian projection isn't bad at all. I will have to explore other ways of plotting world maps like this, though. Learning that the data included in some packages might be defective was a valuable lesson for me.
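A quick way to see the difference between the two datasets is to compare their longitude ranges. This is only a sketch, assuming the "world" and "world2" maps supplied by the maps package and loaded through map_data(). As far as I understand it, "world2" measures longitude eastward from zero, which is what puts the Pacific in the middle.

```r
library(ggplot2)  # map_data() also needs the 'maps' package installed

world  <- map_data("world")
world2 <- map_data("world2")

# Compare the longitude ranges of the Atlantic-centered and
# Pacific-centered datasets.
range(world$long)
range(world2$long)
```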