The Birth of the USISSL

After many years, I've decided to relaunch my blog. This time, however, the focus will be different. I've been trying to teach myself the statistical programming language R for a number of years, with little success. After reading about the experiences of others, the same recommendation kept popping up: the best way to learn R is by slogging through one's own dataset rather than relying on exercises. But what dataset? I knew it had to be something I was actually interested in spending time examining. And what to do with it? Equally, I wanted the ends to be interesting to me. Then I hit upon an idea: I could use this to make my own fantasy sports league, one that I wish were a reality but know will never happen.

Domestic soccer in the United States (at least in the era of Major League Soccer) is not structured the way I would like it to be. In my world, every one of the 50 states would have a league of at least two tiers, ideally of 18 clubs in each tier. Each state's league would be composed entirely of players from that state (although I'm willing to entertain the idea of a limited number of "foreign" players on each club, be they from other states or other nations). There would be intra-state promotion and relegation. There would be no intra-state playoffs to determine the state champion. Instead, the 1st division champion would be the club that was top of the table at the end of a season that saw each club play home-and-away against every other club in that division. And when I say "club" I mean an association football CLUB, not a franchise of a corporate oligopoly (e.g., NFL) or single entity (e.g., MLS). Each state's champion would then enter a nationwide champions league akin to the UEFA Champions League in its structure. The winner of that competition would be the national champion.

Of course, this could never happen for a host of reasons, including financial, legal, demographic, and even constitutional concerns. Even a lesser model in which each state fielded its own team made up of players from that state and played the other state teams in a league format is similarly impossible. But that doesn't mean I can't dream!

Here's where R comes in. After searching but not finding what I wanted, I decided to create my own dataset using MLS data. Not knowing where else to build it, I used Excel. My dataset includes all current MLS players with US hometowns, according to their teams' own websites. For each of these players, I have included the player's name, hometown, home state, whether or not they are a goalkeeper, and current MLS team (as of the end of week 4 of the 2017 MLS season). I am going to use this dataset to construct my United States Inter-State Soccer League (USISSL) using R. The dataset includes 324 players as of March 28, 2017. Ideally, the more R I learn, the more efficient I can be in running the league.

To enter a team in the USISSL, a state must have at least 10 field players and at least 1 goalkeeper. These players might never actually play in an MLS game, but as long as they are on an MLS roster, they count. The points scored by each player in USISSL are the same points that each player is awarded by MLS's fantasy league. Each state team's points total is the sum of the cumulative MLS fantasy league points of the top ten field players and top goalkeeper for each state (top 10:1). The USISSL champion will be the state with the highest points total at the end of the MLS regular season based on the top 10:1 formula described above. MLS playoffs do not count toward the USISSL championship. My plan is to use R to evaluate the data and compile the league table as often as possible, ideally every week of the MLS season.
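
For what it's worth, the top 10:1 formula could be sketched in base R roughly as follows. This is only a sketch under assumptions: the data frame layout, the column names (state, is_gk, points), the function names top_n_sum and usissl_table, and the toy numbers are all made up for illustration and are not my actual dataset.

```r
# Hypothetical helper: sum of the n largest values in x.
# sort() drops NAs by default, so missing scores are ignored.
top_n_sum <- function(x, n) {
  sum(head(sort(x, decreasing = TRUE), n))
}

# Build a league table from a player-level data frame.
# Assumed columns: state (character), is_gk (logical), points (numeric).
usissl_table <- function(players, n_field = 10, n_gk = 1) {
  by_state <- split(players, players$state)
  rows <- lapply(names(by_state), function(s) {
    p <- by_state[[s]]
    field <- p$points[!p$is_gk]
    gk <- p$points[p$is_gk]
    # A state only enters if it has enough field players and goalkeepers
    if (length(field) < n_field || length(gk) < n_gk) return(NULL)
    data.frame(state = s,
               points = top_n_sum(field, n_field) + top_n_sum(gk, n_gk),
               stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) {
    return(data.frame(state = character(0), points = numeric(0)))
  }
  league <- do.call(rbind, rows)
  league <- league[order(-league$points), ]
  rownames(league) <- NULL
  league
}

# Invented toy data, scored as "top 2:1" to keep it small
toy <- data.frame(
  state  = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  is_gk  = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  points = c(10, 7, 3, 5, 8, 8, 2, 20, 20),
  stringsAsFactors = FALSE
)
result <- usissl_table(toy, n_field = 2, n_gk = 1)
print(result)  # A scores 10 + 7 + 5 = 22, B scores 8 + 8 + 2 = 18
```

State C is left out of the toy table because it has no goalkeeper, which mirrors the entry rule above. With the real data, the defaults (n_field = 10, n_gk = 1) would apply.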

As I was putting this dataset together, I realized there are numerous and varied analyses other than constructing my own fantasy league that I can try to perform using R. This is exciting and daunting, but hopefully in the weeks and months to come, I will be able to post some of the results here (or post pleas for help!). In the meantime, here are a few observations that I found interesting after considering the data that I accumulated:

  1. Only 34 states (and the District of Columbia) have even one "hometown" MLS player.
  2. Of these 34 (plus 1), only 10 have at least 11 MLS players. DC is not one of them.
  3. Of these 10 states, only 6 have at least 10 field players and at least 1 goalkeeper. The others either have no goalkeepers (Colorado and Illinois) or not enough field players (Florida and Georgia).
  4. Several states are just short of the total number of players needed to enter the USISSL. Should they gain more players during the course of the MLS season, those states will be added to the league.
  5. California accounts for 66 of the 324 players included in the dataset, or approximately 20%. That is twice as many players as the next state (Texas) and far in excess of the 12% of the entire US population that lives in California.
  6. New York has surprisingly few players (16) given its population, which in 2014 was slightly more than half of California's.
  7. Many cities are the hometowns of multiple players, but they aren't always in the states with the most players. For example, four of the six MLS players from Missouri hail from St. Louis, while Federal Way is the hometown for three of Washington's nine MLS players. Perhaps there will be enough of these clusters to form a five-a-side league...
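
Most of the observations above come down to counting players per state. A minimal sketch of that counting in base R might look like the following, assuming a hypothetical data frame with state and is_gk columns. The function names state_counts and eligible_states and the toy rows are invented for illustration; only the 66 and 324 figures come from my dataset.

```r
# Count field players and goalkeepers per state.
# Assumed columns: state (character), is_gk (logical).
state_counts <- function(players) {
  field <- tapply(!players$is_gk, players$state, sum)
  gk    <- tapply(players$is_gk, players$state, sum)
  data.frame(state = names(field),
             field = as.integer(field),
             gk    = as.integer(gk),
             total = as.integer(field + gk),
             stringsAsFactors = FALSE)
}

# States that meet the entry rule (10 field players and 1 goalkeeper by default)
eligible_states <- function(counts, min_field = 10, min_gk = 1) {
  counts$state[counts$field >= min_field & counts$gk >= min_gk]
}

# Invented toy data: X has 3 field players and 1 goalkeeper,
# Y has 3 field players and no goalkeeper, Z has only a goalkeeper
toy <- data.frame(
  state = c("X", "X", "X", "X", "Y", "Y", "Y", "Z"),
  is_gk = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
counts <- state_counts(toy)
eligible <- eligible_states(counts, min_field = 3, min_gk = 1)
print(counts)
print(eligible)

# California's share of the dataset, using the figures quoted above
ca_share <- round(100 * 66 / 324, 1)
print(ca_share)
```

In the toy data only X qualifies, since Y has no goalkeeper and Z has no field players.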

So, as I said, this is about learning R. I don't remember how to make tables in R just yet. For now, though, here is the USISSL table as of the end of Week 4 of the 2017 MLS season as rendered in Excel. (Forgive me.)


Week 4

State Team      Points
California      206
Texas           144
Ohio            109
New Jersey      96
New York        85
Pennsylvania    77
