Wednesday, March 17, 2021

College Basketball - Rankings, Brackets, Subjective Probabilities and Not Enough Data

Like many fans of Illinois, including some non-true ones like me who haven't watched for many years, I've become taken with this year's team.  Not only would I watch the game, but during the Big Ten Tournament I'd go online and look for what was made available about the game once it was over.  I discovered postgame press conferences held over Zoom, with coaches and players from each team.  I actually found that listening to the opposing coach was more informative - I already knew a lot about the Illini without hearing it from the players and coaches, though that was entertaining too.  The bit from the coaches of the other team that caught my attention was how much they talked about the players learning, even at this late stage of the season, so that their performance would improve and their team would get better.  The expectation of such learning seemed paramount.  And this was from coaches of teams that would make the NCAA tournament.  Indeed, Iowa and Ohio State were each given a 2 seed in their respective brackets.  How much better can they get?

Unlike just about every other fan, I engage in math modeling of social/political/economic situations.  It's what I learned to do in graduate school and later as a faculty member, when my research was in economic theory. Somewhat surprising to me at the time, I continued to do this when I switched careers and became an administrator in educational technology.  There were some obvious changes with the career switch.  I no longer tried to publish these models.  And for the most part they were remarkably simple, much more so than the models I did try to publish earlier in my career.  The simplicity helped in communicating the ideas to others who weren't versed in modeling of this sort.  And they helped me enormously in supporting my thinking.  Indeed, often I would generate such a model with no apparent effort whatsoever.  

I'm going to do something similar here - no math, just a talkie discussion of the modeling issues.  I hope that anybody else who reads this gets the gist of what I'm saying, which is based on listening to those coaches talking about player and team learning.

Let's talk a little about inference via chains of connection. A plays B (and some outcome is determined).  Then B plays C (another outcome).   And C plays D (still another outcome).  At this point some inference can be made between A and D, which of them is better or are they approximately equal.  This happens even though A never played D.   Chain inferences of this sort underlie the ratings. 
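
For readers who like to see such things spelled out, here is a minimal sketch of one way a chain inference might be computed.  It is not how any actual rating system works.  It assumes that each outcome is summarized by a point margin and that margins simply add along the chain.  The function name, the team labels and the margins are all made up for illustration.

```python
def chain_margin(results, chain):
    """Add up point margins along a chain of teams such as A, B, C, D.

    `results` maps (winner, loser) to the winner's margin of victory.
    A positive total suggests the first team in the chain is better than
    the last.  Additivity of margins is an assumption of this sketch.
    """
    total = 0
    for first, second in zip(chain, chain[1:]):
        if (first, second) in results:
            # `first` beat `second`, so the margin counts in favor of the chain's head.
            total += results[(first, second)]
        elif (second, first) in results:
            # `second` beat `first`, so the margin counts against the chain's head.
            total -= results[(second, first)]
        else:
            raise KeyError(f"no game between {first} and {second}")
    return total


# Hypothetical outcomes: A beat B by 5, C beat B by 3, C beat D by 10.
results = {("A", "B"): 5, ("C", "B"): 3, ("C", "D"): 10}
implied = chain_margin(results, ["A", "B", "C", "D"])
print(implied)
```

With these made-up margins the chain favors A over D, even though the two never met.  A longer chain, or a second chain that disagrees with the first, would make the inference shakier.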

The regular season is broken up into two main components.  The first is the preseason, with about 1/3 of the total number of regular season games.  Preseason games form chains among all the teams within Division 1. (With Google I learned that there are 350 teams in Division 1 and 7 more that are transitioning from Division 2.)  The remaining 2/3 of the games are played within a Conference in which the school is a member.  (There are 7 Independents.  I believe there used to be many more, but it then became difficult to schedule games with other schools in a Conference during the Conference season.)  There are roughly 30 games in the entire season and then the postseason begins with the Conference Tournament.  Once the Conference Season starts, we get a lot of information about how Conference teams compare with one another, but we no longer get fresh information about how they compare with out of Conference teams, until the NCAA Tournament is played. 

Regarding the outcome of any particular game, apart from which team wins, I would break the outcomes into three categories: 1) nail-biter,  2) convincing win, and 3) blowout.   I'm not going to try to come up with formal ways to differentiate these but among the things I look at instinctively are: a) score at the 2:00 minute mark compared with final score, b) when the subs who otherwise don't play much get put into the game, and c) body language of the opposing team late in the game.  My view is that these categories serve as a useful summary of the outcome. 

With this background, let's get to the model.  Every team is characterized by its quality, q, a vertical parameter.  When two teams play, the one with the higher value of q is more likely to win.  It's the difference in the quality parameters across the teams that determines the distribution over outcomes.  This is an abstract characterization.  Let's consider what it rules out.  For one, home team advantage is not considered.  In a non-Covid season, home team advantage clearly matters: the fans spur the players on and the refs sometimes get caught up in that. I don't disagree but it's not in the model, because if it were that would complicate things and we want to keep things simple.  Another factor that is sometimes mentioned is matchup - if a team plays well against man-to-man defense but struggles against the zone, then the type of defense the opponent plays matters for the outcome. That too is not included, again to keep things simple.  
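
To make the abstract characterization a bit more concrete, here is a small sketch of one possible form the model could take.  The post only says that the difference in q determines the distribution over outcomes.  The logistic curve and the `scale` parameter below are assumptions of mine for the sake of illustration, not something the argument depends on.

```python
import math


def win_probability(q_a, q_b, scale=1.0):
    """Probability that team A beats team B, given only the gap in quality.

    The logistic shape and `scale` are illustrative assumptions.  Home court
    and matchup effects are left out on purpose, as in the model.
    """
    return 1.0 / (1.0 + math.exp(-(q_a - q_b) / scale))


# Equal quality gives a coin flip; a higher q makes a win more likely.
print(win_probability(1.0, 1.0))
print(win_probability(2.0, 1.0))
```

Because only the difference enters, adding the same amount to every team's q changes nothing.  What matters is how far apart the teams are.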

One other point to consider is to translate the rankings into the quality of teams.   The rankings are ordinal.  Number 1 is the best, but might number 2 be very close or not?  The rankings would be the same either way.  So these quality parameters have more information in them than the rankings do.  The quality parameters give a cardinal ranking in that differences in quality do matter. 

Returning to the model, the quality parameter evolves over the course of the season.  Teams get better via the learning the coaches describe.  There may be reasons for quality to drop, which we should consider as well. One obvious reason is a significant injury to a key player on the team.  Another might be attitudinal.  If a team is on a losing streak because of the luck of the draw in the scheduling as well as other random factors, it might lose some of its competitive edge.  (I'm writing this after reading about Indiana and Minnesota in the Big Ten firing their coaches.  For players who are not graduating this year, the prospect of a new coach may be daunting.  While the firings happened after the end of the season for those teams, if the players anticipate that possibility earlier it could impact their play.)

Now the key point.  The learning effect, which says that quality will grow, may be different from team to team.  To make progress here, let's focus on the conference season.  Then we might break the overall learning effect for a team into a conference growth component, common to all teams in the same conference, and a team idiosyncratic effect.  The idiosyncratic effects average out to zero across the conference, so that in aggregate the learning of the teams in the conference happens at the conference rate.  Those rates may differ from one conference to another. 
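
Here is a rough sketch of that decomposition, for those who find a few lines of code easier to follow than a paragraph.  It assumes growth is additive, which the post doesn't insist on, and it forces the idiosyncratic effects to average to zero by subtracting their mean.  The function name, the team labels and every number are hypothetical.

```python
def end_of_season_quality(start_q, conference_growth, raw_effects):
    """Return each team's quality after the conference season.

    start_q:           team -> quality when conference play begins
    conference_growth: learning common to every team in the conference
    raw_effects:       team -> team-specific learning, before centering

    Additive growth is an assumption of this sketch.
    """
    # Center the team effects so they average to zero within the conference.
    mean_effect = sum(raw_effects.values()) / len(raw_effects)
    return {
        team: start_q[team] + conference_growth + (raw_effects[team] - mean_effect)
        for team in start_q
    }


# Hypothetical three-team conference.
start_q = {"X": 1.0, "Y": 0.5, "Z": 0.0}
raw_effects = {"X": 0.3, "Y": 0.1, "Z": -0.1}
end_q = end_of_season_quality(start_q, conference_growth=0.4, raw_effects=raw_effects)

gains = {team: end_q[team] - start_q[team] for team in start_q}
average_gain = sum(gains.values()) / len(gains)
print(gains, average_gain)
```

Individual teams gain more or less than 0.4, but the conference as a whole gains exactly the conference rate, which is the point of the decomposition.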

Now we'll take another step back.  If one team has very high quality and it is playing another team of mediocre quality,  the first team might learn a lot from a loss or a nail-biter win, the proverbial wake up call, but there is little to no learning from a convincing win or a blowout of the other team.  So at the team level, growth will be higher on average when even the weaker teams in the conference have a decent chance at winning and where most games are reasonably close.  

Now let's consider individual player growth.  Much of this happens from one season to the next, where the player works on his game during the off season.  This sort of thing happens at any level.  But for a freshman who gets a fair amount of playing time, the adjustment from high school basketball to college basketball is huge.  Such players might grow the most during their first season, although not all of them will rise to the occasion.  Transfer students might be next on the list.  They played under one system at their old school and then have to adjust to the new system and their role on it.  Then there is the situation where a key player goes down with a severe injury and the roles of others on the team must change as a consequence. This can produce a lot of learning, both by the players and by the coaching staff, which hasn't seen the players perform in their new roles.  This gives some reason why the rate of team learning may depart from the average for the conference. 

Of course there is another reason this year.  Covid has knocked many teams off their ordinary rhythms.  With some players testing positive and teams having to quarantine, so unable to practice and engage in ordinary team camaraderie outside of practice, that surely made for a setback for those teams who have gone through it.  Afterwards, could they eventually return to the level of play they had achieved before?  Who knows?

I'm going to try to wrap up now with two observations about this year's tournament.  What quality does Gonzaga have? It is undefeated and the number 1 seed overall.  That is agreed upon.  But does it have the highest quality? Or might Illinois, which was good but not great last December, have passed Gonzaga in quality because the Big Ten Conference produced a lot more team learning, and Illinois had the right mix of freshmen, transfers and upperclassmen to have a very high learning rate?  In particular, the blowout loss to Michigan State, where Ayo Dosunmu had a concussion and his nose was broken, might have jump-started team learning for all the other Illini players regarding all-in effort on defense and a raised efficacy on offense.  As an enthusiastic fan, I watched this performance.  I want it to be true that Illinois now has the highest q, no doubt. But on what evidence could I make that determination?  Indeed, I just looked through Gonzaga's season record.  While they had some good early wins, many of their preseason games were cancelled due to Covid, including a game against Baylor.  They did have a close win against West Virginia in December, where in the first half they fell behind. And West Virginia played Baylor in the Big 12, losing a heartbreaker.  The Illini lost to Baylor in a game that was close in the first half but then Baylor took control.  On this chain connection, one might infer that Gonzaga is the best team, so has the highest q.  But the Illini in early December were nowhere near as good as the Illini who just won the Big Ten Tournament.  What about a chain connection now to make a proper comparison?  Alas, there isn't one. 
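
The worry about stale chains can be put in numbers with a tiny sketch.  Everything in it is invented: the two teams are anonymous, the December qualities and weekly growth rates are made up, and steady linear growth is an assumption of convenience.  It only shows that a December ordering need not survive until March.

```python
def quality_at(q_december, weekly_growth, weeks):
    """Quality after `weeks` of steady learning (linear growth is assumed)."""
    return q_december + weekly_growth * weeks


# Hypothetical numbers: team one starts ahead but learns slowly,
# team two starts behind but learns quickly.
team_one_december, team_two_december = 2.0, 1.5
team_one_march = quality_at(team_one_december, weekly_growth=0.02, weeks=12)
team_two_march = quality_at(team_two_december, weekly_growth=0.08, weeks=12)

print(team_one_december > team_two_december)  # ordering implied by December chains
print(team_one_march > team_two_march)        # ordering by March
```

A chain built from December games would rank team one ahead.  Without fresh games that connect the two teams, there is nothing in the data to reveal that the ordering has flipped.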

Now let me turn to the brackets.  The NCAA Committee that determines each bracket and the seedings within doesn't go strictly by a full seeding of all the teams in the tournament.  When there was a real pod system, particularly for the round of 64 and round of 32 games, with those sites distributed geographically around the country, it made sense for teams to play within driving distance of their home campus, if at all possible.  This would encourage fans of the team to attend the Tournament.  That is but one factor where the brackets would be jiggled to accommodate fan interest.  It also seems that the committee would encourage possible matchups that the fans seem to want but the schools are reluctant to schedule.  In the Tournament that starts this week, it is possible for Illinois to meet Loyola of Chicago in the round of 32.  That's one example of such a fan interest game.  

A different one is in the top seeds.  The Big Ten has four teams in the top eight.  Illinois and Michigan are #1 seeds in their respective brackets.  Iowa and Ohio State are #2 in their brackets.  None of these teams are in the same bracket.  So it is possible, if incredibly unlikely, for all of them to make the Elite Eight and, if each of them wins there, for the Final Four to be an all Big Ten affair. As I said, this took some jiggling of the overall seeds.  The Big Ten already has an advantage with the location of the Tournament this year.  Indianapolis is approximately at the geographic center of the entire Big Ten.  More broadly the state of Indiana plays that role. And the Big Ten Tournament was played in the same venue where they'll host the Final Four.  Why do this other thing as well?  I can only guess that the Committee wanted to give fans something else to think about.  If all the top seeds make it through to the Sweet Sixteen, I'm sure it will get more than a mention. 

I do not bet on the games.  I think if I did it would detract from my own interest as a fan.  But it's clear that many other people do bet.  Based on the betting, Gonzaga is the overwhelming favorite.  Do those people who are betting on Gonzaga know something about what's going on that I don't?  Or are they caught up in the undefeated season narrative, which in itself might not have much bearing on how the Tournament turns out?  I do recall back in 2005 that the Illinois team was undefeated till the last game of the regular season, where it lost to Ohio State.  As a fan, that was disheartening.  But it might very well have been a blessing in disguise, with the unbeaten streak something of a burden to maintain.  That team did make it to the NCAA Championship game.  I surely don't know how Gonzaga is handling things now.  After the Tournament is over I'd be interested in reading stories about that, but until then it's just another thing to speculate about.

That's why they play the games.  I'm chomping at the bit for the tournament to start.

1 comment:

Lanny Arvan said...

I didn't say this in the post itself, but the various rankings may be a very good thing in terms of increasing fan interest, but may be far less good as predictors of NCAA Tournament outcomes. I know there are many variants of these rankings that are presumably data based, but I don't see how any of those get past the critique in this post, except by assuming the learning effect is small.

That is not a good assumption, at least not in all cases.