Paved with Good Intentions

On Sunday, November 8, with 3:00 remaining in the fourth quarter, the Atlanta Falcons, trailing 17-13 with two remaining timeouts, faced fourth down and goal from the San Francisco 49ers’ two yard line. Rather than use their remaining down to attempt to score a touchdown and take the lead, the Falcons elected to take an essentially guaranteed three points by sending out Matt Bryant to kick a 19-yard field goal. From a variety of perspectives, ranging from intuitive (four is greater than three, after all) to quite rigorous, this decision was the wrong one, resulting in a significant reduction in the Falcons’ chance of winning the game. Here we attempt to address some questions:

  • To what extent was this decision counterproductive? More specifically, what are our best approximations for the Falcons’ two distinct win probabilities, respectively conditioned on their two available options? We utilize a quick, heuristic approach as well as an advanced model.
  • What factors contribute to this and similar errors? We focus on the persistence of traditionalist, “conservative” analytical fallacies.
  • As a fun thought experiment, can we approximate the maximum degree to which the aforementioned fallacies can negatively impact win probability, in any sport? In other words, what is the worst genuinely conceivable single analytical decision that a well-meaning party could make under the pretense of attempting to maximize win probability? In particular, does the example at hand approach this extreme?
  • In an era and climate where the competitive and financial stakes of high level sporting events are so high, and so much emphasis is placed on advanced analytics in other facets of operations, most notably personnel decisions, how and why do errors like this continue to be made with such frequency?

Disclaimer: This is not a hit piece. While this particular decision was presumably made by the Falcons’ head coach, with advice from coordinators and other trusted assistants, we choose not to specify names or assign blame, and will hereafter refer to the decision as made simply by the team as a whole. Even as we expand our discussion to other scenarios and other sports, it is not our goal to assert who is or is not good at their job, or who is or is not worthy of lucrative employment, a devolution that is rampant in such discussions. This analysis comes from a perspective not of frustration or criticism, but rather of genuine academic interest.

Crunching the Numbers: An Elementary Heuristic

Before we expand our gaze, let’s get down to business. How could one, with only an intermediate knowledge of football, estimate the two relevant win probabilities in real time? For this rough heuristic, we do not take into consideration the specific teams or personnel, so this is more like a discussion of what the correct decision is in this generic situation. We begin by defining the following parameters:

p = the probability of the Falcons scoring a touchdown on fourth and goal from the 2, should they attempt to do so

q = the probability of the 49ers regaining the lead and winning should the Falcons score a touchdown

r = the probability of the Falcons regaining possession and scoring a touchdown should they attempt and fail to do so initially

s = the probability of the Falcons regaining possession and scoring a field goal or touchdown should they choose to kick a field goal initially

We note that there is no defined parameter for the probability of the success of the field goal attempt, as that is an effective certainty. A field goal attempt snapped from the two yard line is equivalent to the traditional (pre-2015) extra point kick, and for some context, Matt Bryant was 222-222 on such kicks in his previous six seasons with the Falcons. A quick consideration reveals that the two relevant win probabilities are roughly P = p(1-q)+(1-p)r and s, as there would most likely be insufficient time for meaningful possessions beyond those indicated.

The most easily estimated value is p. In the NFL and NCAA, two point conversion tries are snapped from the two yard line, a spot determined to approximately balance the play’s expected value with the essentially guaranteed extra point kick. In practice, the conversion rate is slightly below 50%, but for our purposes that figure serves as a convenient and sufficient approximation. This yields the formula P = 0.5 + 0.5(r-q).

The individual approximations of r and q are highly nontrivial. However, given the limited time remaining, it is reasonable to assert that both probabilities are low, likely less than 0.25, and also that the probabilities are comparable to each other. To support the latter assertion, the scenario defining q features a team starting a drive with under three minutes to play, most likely from their own 20 yard line, whereas the scenario defining r features a team on defense, but in immensely favorable field position. Without getting into too much detail, the competing factors of possession versus 80 yards of field position counteract each other, though the limited remaining time adds additional value to possession. In summary, we assume that r-q is a relatively small, though likely negative, quantity. In particular, we arrive at a well-motivated estimate of P ranging from 0.4 to 0.5.
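
As a quick check on that range, here is a minimal Python sketch of the formula P = p(1-q)+(1-p)r. The function name and the specific (q, r) pairs are illustrative assumptions chosen to satisfy the constraints described above (both below 0.25, r-q small and non-positive); they are not measured values.

```python
def go_for_it_win_prob(p: float, q: float, r: float) -> float:
    """Heuristic win probability when going for it: P = p(1 - q) + (1 - p) r."""
    return p * (1 - q) + (1 - p) * r


# p = 0.5 follows the two point conversion approximation in the text.
# The (q, r) pairs below are hypothetical guesses, not data.
for q, r in [(0.25, 0.05), (0.20, 0.10), (0.15, 0.15)]:
    print(f"q={q:.2f}, r={r:.2f} -> P={go_for_it_win_prob(0.5, q, r):.3f}")
```

With p fixed at 0.5, only the gap r-q matters, which is why these guesses land between 0.4 and 0.5.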

An estimate for s is more challenging. We first assume that the 49ers’ ensuing possession begins at the 20 yard line, as most NFL kickers can record touchbacks essentially at will, particularly since the kickoff placement was moved to the 35 yard line, noting that this assumption results in, if anything, a slight overestimate for s. We note that given Atlanta’s retention of two timeouts, as well as the two minute warning, the 49ers must achieve two first downs to prevent Atlanta from ever regaining possession, and the achievement of one first down significantly limits Atlanta’s remaining time for an ensuing drive after a punt.

League wide, across all game situations, approximately 20% of all drives stall before achieving a first down. Assuming independence once a first down is achieved, this would indicate that the 49ers have approximately a 64% chance of winning the game without ever giving back the football, a 16% chance of punting with less than a minute remaining, and a 20% chance of punting with exactly two minutes remaining. However, built into these assumptions about the clock is that the 49ers will not throw any incomplete passes, which turn into de facto extra timeouts. Taking into account the 49ers’ likely conservative run-only play calling strategy, the argument could be made that the 20% figure should be significantly higher; let’s liberally assume 40%, which results in a distribution of probabilities of 36% game over, 24% punt with under a minute, and 40% punt with two minutes.

In the event of a punt, the Falcons must then drive the ball into field goal range, likely a distance of at least 30 yards, and successfully kick a field goal (or score a long touchdown) in the remaining time. Considering the aforementioned data and time constraints, it would be somewhat liberal to estimate the probability of achieving this goal (again, with no consideration of specific teams or personnel) as 50% with two minutes remaining and 25% with less than a minute remaining. The resulting estimate of s is (0.4)(0.5)+(0.24)(0.25)=0.26. Given the rough nature of these probability assignments, we estimate that s lies between 0.2 and 0.3.
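
The arithmetic behind s can be sketched in a few lines of Python. The function and parameter names are illustrative assumptions; the inputs are the stall rate and the two scoring probabilities assumed above.

```python
def field_goal_path_win_prob(stall: float, score_2min: float, score_late: float) -> float:
    """Estimate s under the text's assumptions.

    stall      -- chance a 49ers series ends without a first down
                  (treated as independent from one series to the next)
    score_2min -- chance Atlanta scores after a punt with two minutes left
    score_late -- chance Atlanta scores after a punt with under a minute left
    """
    punt_at_two_minutes = stall                # stopped on the first series
    punt_under_a_minute = (1 - stall) * stall  # one first down, then stopped
    # The remaining (1 - stall) ** 2 is two first downs: Atlanta never gets the ball.
    return punt_at_two_minutes * score_2min + punt_under_a_minute * score_late


print(f"{field_goal_path_win_prob(0.40, 0.50, 0.25):.2f}")  # liberal 40% stall rate
print(f"{field_goal_path_win_prob(0.20, 0.50, 0.25):.2f}")  # league-wide 20% stall rate
```

Running the same function with the league-wide 20% stall rate gives a noticeably smaller value, which is why the 40% figure is described as liberal.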

The interested reader may inquire: Couldn’t the consideration of factors specific to these teams, their quality, personnel, playing style, etc., lead to an advanced analysis that supports the field goal as the optimal strategy? For example, wasn’t Blaine Gabbert playing quarterback for the 49ers? Hadn’t they barely moved the football at all in the second half?

In theory, such reversals are certainly possible, but in this particular case the differential is rather cavernous. Additionally, these refined considerations are likely to affect each win probability in a similar way. For example, the consideration of San Francisco’s offensive ineffectiveness would certainly lead to an increased likelihood of Atlanta regaining possession, and hence an increase in the value of s. However, the same consideration would also lead to a corresponding decrease in the value of q and increase in the value of r, hence an increase in P. More thoroughly, since both offensive and defensive situations are in consideration for both teams, any asserted advantage for the Falcons would result in (likely comparable) increases in both P and s, while any asserted advantage for the 49ers would result in (likely comparable) decreases in both P and s.

Bringing out the Big Guns

While the preceding section was a pleasant exploration of how a casual observer could approximate these particular win probabilities without assistance, it is a fortunate reality of the current sports analytics climate that we don’t have to rely on such rough, off the cuff calculations. In particular, a statistical model that is ideally suited to the current discussion is the Pro Football Reference (PFR) Win Probability Calculator. The formula takes as input the time remaining in the game, the current point differential for the team in possession, down, distance, and field position, as well as the game’s original point spread, and produces as output the win probability for the team in possession. A detailed explanation of the formula can be found here.

For generic teams, as assumed in our heuristic, we input 0 for the point spread, and the calculator produces the following probabilities:

If Atlanta goes for it: Atlanta wins 42.17% of the time.

If Atlanta kicks a FG (and kicks a touchback on the ensuing kickoff):  Atlanta wins 25.61% of the time.

Hey, we did pretty well!

Factoring in that the Falcons were actually 7.5 point favorites in the game, the results are below:

If Atlanta goes for it: Atlanta wins 52.55% of the time

If Atlanta kicks a FG (and kicks a touchback on the ensuing kickoff): Atlanta wins 33.33% of the time.

In both considerations, and for all point spreads in between, a team faced with Atlanta’s predicament is approximately 1.6 times as likely to win the game if they attempt to score a touchdown as opposed to kicking the short field goal. Phrased from the reciprocal perspective, the decision to kick left the Falcons about 40% less likely to win the game. (To clarify, we mean that the team sacrificed 40% of its winning outcomes, not that the difference in win probability was 40%. For example, if a decision dropped your win probability from 10% to 1%, we would say that while the difference in win probability is 9%, you are 90% less likely to win. As we continue our discussion, we will consider both ways of measuring the impact of a decision: difference in win probabilities and ratio of win probabilities.)
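
The two measures can be made concrete with a short Python sketch. The function name is an illustrative assumption; the inputs are the calculator outputs quoted above, plus the 10% to 1% example.

```python
def decision_impact(wp_best: float, wp_chosen: float) -> tuple[float, float]:
    """Return (difference in win probability, portion of winning outcomes sacrificed).

    Both inputs are win probabilities in percent.
    """
    difference = wp_best - wp_chosen
    sacrificed = difference / wp_best
    return difference, sacrificed


cases = {
    "0 point spread": (42.17, 25.61),
    "Falcons favored by 7.5": (52.55, 33.33),
    "10% to 1% example": (10.0, 1.0),
}
for label, (go, kick) in cases.items():
    diff, sacrificed = decision_impact(go, kick)
    print(f"{label}: difference {diff:.2f} points, "
          f"ratio {go / kick:.2f}, sacrificed {sacrificed:.0%}")
```

The difference is measured in percentage points, while the sacrificed portion is relative to the better option, so the two can diverge sharply when the starting win probability is small.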

The Risk of the Sure Thing

If not informed by a straightforward consideration of conditional win probabilities, what could have led to such a counterproductive course of action for Atlanta?

Imagine a person is presented with the following game: he must either risk $2 to win $2 in a coin flip, or he must unconditionally give his opponent $1. Clearly, from a pure expected value perspective, the coin flip is the right choice, as in the long run his wins and losses should roughly balance out, and he would do much better than losing $1 on every round. However, if given the choice only once, he must take into account the volatility of the coin flip, and his personal utility in risking $2 for the sake of improved expected value versus the security of losing only $1. Maybe he really needs that second dollar. Long story short, even the shrewdest of statisticians could not declare his choice to surrender $1 as an objectively bad choice, as it may be the case that his personal utility function is highly intolerant of the increased variance of the coin flip. This is an example where the sacrifice of expected value for the sake of decreased variance is most certainly defensible.

However, imagine a second game, in which the player repeats the trial from the first game ten times, but instead of dollars, we just keep track of points. In each round, the player can either risk two points to win two points on a coin flip, or he can unconditionally surrender one point. After ten rounds, if the player has positive or zero points, he wins $100, but if he has a negative score, he gets nothing. It is very much still the case that for each round, the lower variance decision is to surrender a point. However, the game ultimately only has two outcomes, a win or a loss, and any “conservative” decisions made during the course of the game have no tangible benefit in defeat. Imagine a player who, in each round, resolutely declares, “I’m not much one for risk taking, I’ll just surrender a point.” After ten rounds and his inevitable defeat, he is no better off than he would have been in the remarkably unlikely event that he risked and lost every single coin flip. Sure, his score is -10 instead of a wildly unlucky -20, but that is not relevant to the conditions of the game. He is a loser, 100% of the time. He would be the worst player for this game imaginable.

For a less extreme example, suppose the player flips coins during his first seven rounds, winning four and losing three, leaving him with two points. He recognizes that 0 points is still a winning score, so he surrenders points in rounds 8 and 9, leaving the fate of his game to rest on a round 10 coin flip, a win probability of 50%. Had he instead just flipped all three coins in rounds 8-10, he would have only needed to win one of the three to ultimately win the game, a win probability of 87.5%. In other words, taking the “sure thing” for those two rounds is by far the RISKIER decision. The only conceivable benefit of surrendering points in rounds 8 and 9 was that he avoided his lowest possible score of -4 points that would result from three straight lost coin flips, but remember, avoiding a particularly bad score means absolutely nothing in the context of this game. There are only two true outcomes, and all that matters is optimizing win probability.
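
For readers who want to verify those two figures, here is a small Python sketch that enumerates every coin-flip outcome for a fixed plan over the remaining rounds. The function name and the "F"/"S" plan encoding are assumptions made for illustration.

```python
from itertools import product


def plan_win_prob(start: int, plan: str) -> float:
    """Win probability for a fixed plan; 'F' = flip (+2 or -2), 'S' = surrender (-1).

    The player wins if the final score is zero or positive.
    """
    flips = plan.count("F")
    base = start - plan.count("S")  # surrendered rounds cost one point each
    wins = sum(
        1
        for outcome in product((2, -2), repeat=flips)  # all equally likely
        if base + sum(outcome) >= 0
    )
    return wins / 2 ** flips


print(plan_win_prob(2, "SSF"))  # surrender rounds 8 and 9, flip round 10
print(plan_win_prob(2, "FFF"))  # flip rounds 8 through 10
```

Because the plan is fixed in advance, the order of flips and surrenders does not change the result; only the counts matter.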

To clarify, surrendering the point isn’t always wrong. If the player flips coins in rounds 1-6 and wins four of them, then he stands with 4 points, and he can simply surrender points in rounds 7-10 and guarantee victory. At risk of broken record status, a low variance decision in a two outcome game is not inherently correct or incorrect. What matters, and ALL that matters, is how that decision impacts win probability.
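
One way to see when surrendering is right and when it is not is to compute the best achievable win probability from every state of the game. The recursive sketch below is an illustration under the rules described above; the function name is an assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def best_win_prob(rounds_left: int, score: int) -> float:
    """Best achievable win probability given rounds remaining and current score."""
    if rounds_left == 0:
        return 1.0 if score >= 0 else 0.0
    # Option 1: flip a fair coin for +2 or -2 points.
    flip = 0.5 * best_win_prob(rounds_left - 1, score + 2) \
         + 0.5 * best_win_prob(rounds_left - 1, score - 2)
    # Option 2: surrender one point for certain.
    surrender = best_win_prob(rounds_left - 1, score - 1)
    return max(flip, surrender)


print(best_win_prob(4, 4))  # 4 points with rounds 7-10 left: surrendering guarantees a win
print(best_win_prob(3, 2))  # 2 points with rounds 8-10 left
```

The choice at each state is made purely by comparing win probabilities; variance never enters the calculation directly.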

The astute reader can likely predict where this discussion is headed. A game of football, or any other sport whose season is separated into discrete win/loss outcome events, is not like the first scenario. It would be closer to it if the standings at the end of the season were determined by total point differential, but they aren’t. A game of football is like the second scenario, ultimately a two outcome proposition. Those three points the Falcons scored with 3:00 to play in Week 9 are not deposited into some sort of account that could be in any way useful later. A great example of the opposite dynamic is professional golf, where tiered prize money is awarded to each player that makes the cut in each tournament, and it makes perfect sense for a player to make a variance-lowering decision based on personal preference at virtually any time. Football is not golf.

This flawed logic of “conservative” play rears its head most often, though not exclusively, when a team is trailing and considering a low variance decision that will leave them STILL trailing (for example: a field goal when trailing by more than three or a punt when trailing at all; we are not insinuating that these are never appropriate actions, only that these scenarios are disproportionately represented in the collection of bad decisions that teams make in reality). The explanation is simple: expected value being equal, if the variance of possible scores is lowered, then the lack of volatility makes it more likely for the lead to remain where it is. Put another way, if a team is trailing and is not in a situation where a low variance decision can gain them the lead (for example: down by 2, lining up a short field goal), then that team should actually take measures to RAISE variance. The “sure thing” is, in some sense, aptly named: the team is assuring themselves that they remain behind.
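
The claim that a trailing team benefits from higher variance can be illustrated with a deliberately crude model. The sketch below assumes, purely for illustration, that the final scoring margin is normally distributed; the mean of -4 and the standard deviations are hypothetical numbers, not estimates from any data.

```python
from math import erf, sqrt


def trailing_win_prob(mean_margin: float, sd: float) -> float:
    """P(final margin > 0) if the margin is Normal(mean_margin, sd). Assumed model."""
    return 0.5 * (1 + erf(mean_margin / (sd * sqrt(2))))


# Same expected margin (trailing by 4 on average), increasing volatility.
for sd in (2, 4, 8):
    print(f"sd={sd}: win probability {trailing_win_prob(-4, sd):.3f}")
```

Under this assumed model, holding the expected margin fixed at a deficit, the win probability rises as the spread of outcomes widens, which is the sense in which the trailing team should seek variance.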

Aside from a misguided attraction to “traditional” or “old-school” strategy, a key psychological motivator for these crippling decisions appears to be a paralyzing aversion to a “fatal” outcome, an apparent preference for a slow, drawn out, more certain death over a potentially quick, yet less likely, instant execution. Even at the expense of a substantial portion of their win probability, teams often favor the route of “extending the game”. That phrase can be used to denote positive strategic techniques employed by a trailing team to maximize the utility of remaining time, by using timeouts, favoring passes to runs, getting out of bounds, etc., but that’s not how we mean it here. Rather, a trailing team faced with its mortality, perhaps in the form of a manageable fourth down situation, may be overwhelmed by the fear of the fact that if that single conversion is not made, then the game is effectively over, whereas a punt or a field goal is guaranteed to leave some conceivable path to victory, no matter how unlikely, still on the table.

To go back to a coin-flipping scenario, imagine a player is faced with two options. He can choose to flip one coin, heads he wins, tails he loses, for a clear 50% win probability. Alternatively, he can choose to flip ten coins, and he wins if at least six of the flips are heads, a win probability of 37.7%. In a pure win/loss scenario, the second option is clearly inferior, but it spares the player from staring down the barrel of an immediate determination of his fate. NFL teams take this second option a lot.
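
The 37.7% figure is a binomial tail probability, and a short Python sketch confirms it. The function name is an illustrative assumption.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of at least k heads in n independent flips with heads chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(f"{prob_at_least(6, 10):.3f}")  # ten coins, need at least six heads
print(f"{prob_at_least(1, 1):.3f}")   # one coin, need heads
```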

Does it Get Any Worse?

Here is a rough transcript of a conversation between me and Daniel Garver, close friend, EPA scientist, and sports analytics enthusiast, shortly after the Falcons-49ers game in question.

Daniel: I’m trying to think of comparable examples of this kind of decision in other sports…

Alex: Here’s one for basketball – my team is down by 3 with 7 seconds left, and I miraculously steal an inbounds pass and get a breakaway opportunity. Instead of pulling up for a wide open three, I go in and dunk.

Daniel: Yeah, that’s a good one. I was going to say it’s like having the bases loaded, 1 out, Miguel Cabrera at the plate, and running a squeeze play.

Alex: Exactly, except you’re forgetting some key details: It’s the bottom of the ninth and YOU’RE DOWN BY 2!

Daniel: These definitely seem like scenarios that have a greater negative impact than the Falcons play today, but we’ve already probably crossed into the realm of ‘things that would never actually happen’.

Daniel is likely correct about these examples, and some lengthy consideration should convince the reader that, at least among major team sports, the discrete nature of football and baseball leaves them more susceptible to purely human decision (as opposed to execution)-based win probability swings. Among those two, the time constraint and fourth down decision components of football contribute to an increased prevalence of the aforementioned “fatality risk” scenarios that can psychologically warp the competitors into counterproductive decision making. Putting all of these factors together, it is reasonable to think that the most extreme, genuinely conceivable examples of decision-based reductions in win probability are likely to occur in football. So what can we come up with? How bad can one decision really be?

For the remainder of this section, we will not consider specific teams or personnel, and all appeals to the PFR calculator will include a 0 point spread.

An Extremized Falcons-49ers Scenario: This entire post was inspired by the fact that the Falcons’ decision against the 49ers was in some sense perfectly ill-informed, making it inherently hard to top, even hypothetically. However, we can certainly take the core spirit of this scenario, and push some of the details to the extreme.

For example, would the Falcons’ decision-making process have been considerably different if they had been at the 1 yard line instead of the 2? Analytically speaking of course this is a big difference, but we have already established that the Falcons were clearly motivated by fallacious, “game-extending”, anti-analytic logic, and it is conceivable that this small perturbation would not swing them toward the light.

In a similar vein, what if there had been slightly less time? The Falcons likely took into account that they had the two minute warning ahead of them to supplement their two remaining timeouts. Is it possible that this perceived security blanket would have still felt sufficient if there had been, say, 2:10 remaining instead of 3:00? After all, that still leaves time for a field goal, a touchback, and a 49ers first down play before the two minute warning. Maybe the reduced time would change Atlanta’s mind, but it’s conceivable that it wouldn’t.

Here’s what the PFR calculator has to say:

Falcons down 4, 4th and goal from the SF 1, 2:10 remaining.

Go for it: win probability 50.19%

Kick a FG (and a touchback): win probability 13.53%

Win Probability Difference: 36.66%

Portion of Winning Outcomes Sacrificed: 73.04%

Note: The PFR calculator does not have a parameter for remaining timeouts, which is admittedly important for extremely late game scenarios, but not THAT important. In particular, it would still be the case that one first down for the 49ers ends the game.

Just Out of “Range”: Suppose Team A is down by two points, facing a fourth down and 1 on Team B’s 40 yard line. A potential go-ahead field goal would measure 57-58 yards, and a miss, or a failed fourth down conversion attempt, would result in a time, possession, and field position predicament that, even with a timeout or two and the two minute warning, could be perceived as nearly fatal. However, given those available time stoppages, pinning Team B deep into its own territory with a punt doesn’t seem quite as dire. The potential would remain to get a quick stop, regain possession with good field position, and set up an easier field goal try. This is a classic “extend the game” logical fallacy, and this is genuinely something that an NFL team might do.

Of course, whether the ideal course of action would be to kick a field goal or go for a first down, likely to set up a shorter field goal attempt upon success, is highly dependent on Team A’s kicker, but for the generic situation, the PFR calculator produces the following win probabilities for the various situations, depending on the success of Team A’s punt. What we find is that, in addition to a super deep punt being very difficult, it doesn’t make a huge impact in this scenario.

Team A: 4th and 1 on Team B 40, 2:10 remaining

Win Probability: 39.83%

Team A: Punts to the Team B 1

Win Probability: 12.41%

Team A: Punts to the Team B 10

Win Probability: 11.35%

Team A: Punts into the endzone for a touchback

Win Probability: 10.25%

Win Probability Difference: 27.42-29.58%

Portion of Winning Outcomes Sacrificed: 68.84-74.27%

Note: In both of the scenarios outlined in this section, the negative impact of the decision would be significantly greater if made with less remaining time, but in trying to stay in the realm of feasibility we use the two-minute warning as a convenient delineation, as its presence often serves as fallacious bait for a “game-extending” decision.

While the parameters of this discussion are fairly nebulous, and the discussion remains widely open, it may indeed be the case that the decision made by the Falcons was not only an analytical error, but in fact quite close, just some small detail perturbations away, from the WORST POSSIBLE purely human decision-based mistake that a team in any sport could conceivably make, outside of intentional sabotage.

To contrast with what we are going for here, let’s discuss a recent, much-derided NFL decision, namely the Seahawks’ goal line play call in the closing moments of Super Bowl XLIX. In particular, there was no “outer layer” decision to be made in that scenario, as it was only second down, the Seahawks trailed by 4, and it was a foregone conclusion that the Seahawks would attempt to score a touchdown. As for the “inner layer” decision to call a pass play, most everyone, from the viewers to the announcers to even the Seahawks players, was in agreement that this was not optimal, particularly with two downs and a timeout remaining and an elite short yardage running back. However, the perception of this decision as one of the worst in NFL history is almost completely informed by the result of the play, an interception, which was by any objective measure unlikely. If one were to perform the (presumably difficult) analysis to determine the win probability for Seattle conditioned on calling a passing play on that second down, versus the win probability for Seattle conditioned on handing the ball to Marshawn Lynch on that second down, it is likely that both probabilities would have been quite high, well over 50%, and that their difference would be quite small. While the stage and the outcome magnified what was probably an error, it was not an analytical mistake to anywhere near the degree of the others outlined here.

How do these things still happen?

In case the point hasn’t been made clearly enough, this decision made by the Falcons is in no way isolated. NFL teams make these kinds of mistakes A LOT, most notably in fourth down situations. A great source of this data is the New York Times 4th Down Bot, which breaks down all of the possessing team’s options in every fourth down situation encountered, including discussions of probability of success for fourth down conversions and field goals, and win probabilities before and after each potential decision.

While we can continue to dissect the psychological and historical reasons for these errors, the question remains: How do these things still happen, with such frequency, in a post-analytics explosion 2015? Thousands of man hours, millions of dollars, and numerous graduate dissertations have been poured into the effort of perfectly evaluating personnel, projecting their performance and value, and building the ideal roster based on complicated financial constraints. If a major league baseball player is deemed to be worth one additional win over the course of a 162-game season, his market value salary could see a sharp increase. However, it may well be the case that simply having a reasonably statistically inclined intern with an iPhone on an NFL sideline is worth upwards of 1 win per season, the equivalent of 10 wins in major league baseball. Moreover, if teams went the extra mile in employing full-time in-game analytics experts, perhaps with their own proprietary, further advanced and nuanced statistical models, the same way teams do for personnel and other operations, the impact could be equivalent to that of a Pro Bowl level player, assuming the coaching staff would consistently heed their advice.
