Comparison of Selected USWNT Players' Goal Scoring over Time

With the United States playing Russia tonight in Texas, I thought it would be a good time to take a look at some more USWNT data. As with Donovan and Dempsey on the men's side, I wanted to compare the top goal scorers for the women's team as a function of caps. Unfortunately, of the top ten goal scorers in USWNT history, I could find a list of goals and caps only for Abby Wambach. Conversely, and to my great surprise, Mia Hamm's Wikipedia page doesn't have a complete list of her goals at all, with or without caps included. Of the rest of the top ten, only Carli Lloyd and Alex Morgan have lists of their international goals included on their Wikipedia pages. I was able to find such lists (again, without caps) for Sydney Leroux, Christen Press, and Crystal Dunn, as well.

Of course, the problem became deciding what variable I should use to plot cumulative goals. Outside of the odd Wikipedia page, I have not been able to find any lists of individual players' appearances for the men or the women. Although caps should be the independent variable on these graphs, I had no choice but to plot career goals against time. This is admittedly a poor substitute for caps, as it is subject to distortions such as scheduling differences, injury absences, and pregnancy absences, all of which stretch or compress a player's line relative to the others. But the only other choice was to abandon the project altogether, so I decided to press on. Here is the result:

Most of the data came from Wikipedia, but I had to add some information from other sources to fill out the dataset. The starting point for each of the line plots is the date of the first appearance of each player on the field for the USWNT.

At least for this set of players, only Alex Morgan and Christen Press are on anything close to a course to catch Abby Wambach should their careers last as many weeks as hers. Even then, they would have to play a lot more matches than Wambach did in the same amount of time given their goals per game figures (Wambach: 0.72; Morgan: 0.59; Press: 0.49). This only goes to show the problem with using chronological length of career rather than number of actual matches played.
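To put a rough number on "a lot more matches", the goals per game figures above can be turned into a ratio. This is only a back-of-the-envelope sketch: it assumes each player keeps scoring at her quoted rate, and the variable names are my own.

```r
# Goals-per-game figures quoted above.
rate <- c(Wambach = 0.72, Morgan = 0.59, Press = 0.49)

# Matches each player would need, per Wambach match, to score as many goals
# as Wambach did (assuming the quoted rates hold steady).
matches_ratio <- rate[["Wambach"]] / rate
round(matches_ratio, 2)
```

At these rates, Morgan would need about 1.22 matches and Press about 1.47 matches for every one that Wambach played.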

As the players all started their national team careers in different games, I changed the match dates into seconds (which R counts from 01/01/1970), subtracted the seconds value of each player's debut date from every row, and then turned those very large numbers into weeks. This gave a starting point of 0 for each player, and turned the x-axis into weeks of each player's career. Here is an example of the R code that I used:

> lloyd_goals$secs <- as.numeric(as.POSIXct(gsub("\\[m.*", "", lloyd_goals$Date)))
> lloyd_goals$secs <- lloyd_goals$secs - 1120968000
> lloyd_goals$Weeks <- lloyd_goals$secs / (60 * 60 * 24 * 7)
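The same offset can also be computed without a hard-coded number of seconds by working with Date objects. What follows is a minimal sketch, not the code I actually ran: the data frame, its dates, and the debut date are made up for illustration, and it assumes ISO-formatted dates with the footnote marker trailing the date.

```r
# Hypothetical stand-in for a scraped goals table; the dates are illustrative only.
goals <- data.frame(
  Date = c("2006-03-14[m 1]", "2006-11-12[m 2]", "2007-04-14"),
  stringsAsFactors = FALSE
)

# Assumed debut date for this made-up player.
debut <- as.Date("2005-07-10")

# Strip any trailing footnote marker, then parse as calendar dates.
goals$Date <- as.Date(sub("\\[m.*", "", goals$Date))

# Weeks elapsed since the debut, as a plain number.
goals$Weeks <- as.numeric(difftime(goals$Date, debut, units = "weeks"))

# Running goal total, assuming one row per goal in chronological order.
goals$Goals <- seq_len(nrow(goals))
```

Working in whole dates also sidesteps time zones. As far as I can tell, as.POSIXct reads a bare date as midnight in the session's local time zone, so a constant like 1120968000 depends on where the code is run.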

This was my first practical experience with R's POSIX date-time classes, which continue to confuse me. I also had to contend with footnotes in the original data, all of which followed the dates and started "[m..." I had to remove these before I could convert the dates into seconds, and I was very pleased with myself when I remembered that in gsub patterns, putting \\ in front of a punctuation mark such as [ tells R to treat it as literal text to match rather than as regular-expression syntax. Otherwise, this

> lloyd_goals$secs<-as.numeric(as.POSIXct(gsub("[m.*", "", lloyd_goals$Date)))

results in an error, because R reads the unescaped [ as the start of a regular-expression character class and looks for a closing ] in the pattern rather than in the data and, of course, that ] is not to be found.
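For future reference, here are a few ways to match a literal opening bracket. This is a small sketch using a made-up date string rather than a row from my data.

```r
x <- "2006-03-14[m 1]"  # hypothetical value with a trailing footnote marker

# Escape the bracket so the regex engine treats it as a literal character.
escaped <- sub("\\[m.*", "", x)

# Put the bracket inside a character class, which also makes it literal.
bracketed <- sub("[[]m.*", "", x)

# Or skip regular expressions and cut the string at the first "[".
split_off <- strsplit(x, "[", fixed = TRUE)[[1]][1]
```

All three leave just the date, "2006-03-14". Since each value has only one footnote marker, sub does the same job here as gsub.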

This project was surprisingly challenging for me. My R code from the Donovan and Dempsey project would not work for reasons I could not figure out. And there are other parts of my code with which I am far from satisfied, but there will be other days for figuring out how to do this better.

