Word Cloud: First Names of MLS Players with US Hometowns

Today's task was to make a word cloud using R for the first time. I wanted to make one using the two-letter state abbreviations, but for reasons I still haven't worked out, the code that I used would not accept these pairs of capital letters. After much trial and error, I gave up and tried hometowns instead, but too few hometowns were shared by more than one player to generate much of a word cloud. The same (surprisingly) was true for surnames. So I settled on using the players' first names. I adapted the code presented in this example, making some modifications such as changing the color palette, increasing the number of colors available, and setting the argument colorblindFriendly to TRUE. Here is the result:
I'm reasonably pleased with the way this turned out, and I already have an idea for a more complex word cloud project.

This project required the tm, NLP, SnowballC, wordcloud, and RColorBrewer packages, although NLP and RColorBrewer were automatically installed as dependencies of tm and wordcloud, respectively.
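
For anyone who wants to try something similar, here is a minimal sketch of how these packages can fit together. It is not the code I adapted: the first_names vector is made-up data, the "YlGnBu" palette is just one of the options RColorBrewer flags as colorblind friendly, and the wordLengths setting is my own addition. By default tm drops terms shorter than three characters, which may be why two-letter state abbreviations were rejected, although I have not confirmed that this was the cause.

```r
library(tm)
library(wordcloud)
library(RColorBrewer)

# Hypothetical data: one first name per player (not the real MLS data)
first_names <- c("Chris", "Chris", "Chris", "Matt", "Matt", "Al")

# Treat each name as its own tiny document
corpus <- VCorpus(VectorSource(first_names))

# tolower = FALSE keeps the capitalization of the names.
# wordLengths = c(1, Inf) keeps short terms; tm's default of c(3, Inf)
# would silently drop two-letter terms such as "Al" or "CA".
tdm <- TermDocumentMatrix(
  corpus,
  control = list(tolower = FALSE, wordLengths = c(1, Inf))
)

# Total count per name, most common first
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# List the palettes RColorBrewer marks as colorblind friendly
colorblind_palettes <- rownames(brewer.pal.info)[brewer.pal.info$colorblind]

# Draw the cloud; set.seed() makes the layout repeatable
set.seed(42)
wordcloud(
  words = names(freq),
  freq = freq,
  min.freq = 1,
  random.order = FALSE,
  colors = brewer.pal(9, "YlGnBu")
)
```

Calling display.brewer.all(colorblindFriendly = TRUE) plots the same set of palettes that colorblind_palettes lists by name.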

Another thing I could never figure out was that despite there being only seven different name frequency levels (1, 2, 3, 4, 5, 7, and 10, the last of which was for the most common name, "Chris"), I had to increase the number of available palette colors to 9 in order to see seven different colors in the word cloud. I would have thought that setting the number of available colors to 7 would have been sufficient, but it was not, and neither was setting it to 8. I'll have to do some research to figure out why this is the case.
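
One possible explanation, sketched below in base R: as far as I can tell, wordcloud does not give each distinct frequency its own color. Instead it scales each frequency by the maximum and picks palette position ceiling(n_colors * freq / max(freq)). Treat that formula as an assumption about the package's internals rather than documented behavior. The color_index function is my own hypothetical helper for illustration, not part of any package.

```r
# The seven frequency levels observed in the first-name data
freq_levels <- c(1, 2, 3, 4, 5, 7, 10)

# Assumed mapping from frequency to palette position:
# scale by the largest frequency, multiply by the palette size, round up
color_index <- function(freq, n_colors) {
  ceiling(n_colors * (freq / max(freq)))
}

for (n in 7:9) {
  idx <- color_index(freq_levels, n)
  cat(n, "palette colors:", paste(idx, collapse = " "),
      "->", length(unique(idx)), "distinct colors\n")
}
# 7 palette colors: 1 2 3 3 4 5 7 -> 6 distinct colors
# 8 palette colors: 1 2 3 4 4 6 8 -> 6 distinct colors
# 9 palette colors: 1 2 3 4 5 7 9 -> 7 distinct colors
```

Under this assumption, two frequency levels collide on the same palette position with 7 colors (3 and 4) and with 8 colors (4 and 5), and 9 is the smallest palette size at which all seven levels land on different colors, which matches what I saw.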
