
TigerEye Review – Week 4

By on September 27th, 2017 in Football, News 14 Comments »
Week4_TEV_title

Massaging the Outlier Data Point Edition

The bane of all statistical research is the outlier data point. Take the sinking of the Titanic: researchers are handed a huge number, 1500+ passenger-liner lives lost to an iceberg collision, in an era with historically low collision incidents (they peaked in the 1870’s–80’s just before the advent of steam power). Passenger liner losses wouldn’t come anywhere close to that figure until later decades, when the number of incidents with lives lost rose due to a rapid increase in affordable passenger traffic (NOT counting the WWI German unrestricted U-boat campaign, which is a separate data silo). Yet the story of that event remains fresh, inspiring everything from good books to bad movies to seemingly incessant Internet memes. That one event has stayed amazingly present in our national psyche. Simply mention the word “Titanic” in a conversation a full 105 years later, and everyone listening immediately assumes you’re talking about the proper name and not the word as an adjective.

Likewise, try as you might to avoid such an event in any analysis of data, one or several troubling outlier points remain, gleefully taunting the amateur statistician, fans and professional bean counters alike. But in my experience, you dismiss these singular points outside the comfortable mean at your peril. They still happened, and if you’re serious about the craft, you need to find a way to account for them.

So what do we do with Missouri’s ten-touchdown day against Southern Missouri State and its three total touchdowns in games since then? How do we handle Vanderbilt’s outstanding defensive numbers prior to that brutal chainsaw massacre of an Alabama game? And what do we make of South Carolina’s numbers on both sides of the ball vis a vis its W–L record? Or even Auburn’s outstanding yards per play—except for the dismal 1.2 YPP of the Clemson game?

In my mind, you can’t dismiss any of it. It all goes into the analysis as raw numbers for the very simple reason that it occurred. Understand that once you start down the path of selecting data under “valid” or “invalid” criteria, you leave the path of real analysis and start down one of prejudicial distortion. However good your intentions, you’ll never recover objectivity and clear interpretation unless all the data is collected and included in how you view the subject.

What is implied in reviewing early data is a brief skewing of any analysis in the short term. As you may have noticed in my previous posts, I’ve been careful to use words and phrases like “if this continues” or “if this is accurate” and so on. This is with the understanding that we won’t have enough data to completely understand the season until it ends. In the meantime (no pun intended), what we collect is a slowly normalizing set of data points, outliers included, that will point with increasing certainty to the real interpretation as we approach mid-season and beyond. By season’s end, the outlier will cease to be a problem or to skew the data to the large degree it did the first few weeks.
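For anyone who wants to see that normalizing effect in numbers, here is a minimal sketch in Python. The scores are invented for illustration (only the 72 echoes the kind of early blowout discussed in this post; the rest are not any team's real results), and the function names are my own. It tracks the running points-per-game average and how far the one big game pulls that average away from what it would be without it.

```python
from statistics import mean

def running_means(scores):
    """Average points per game after each week, outliers included."""
    return [mean(scores[:n]) for n in range(1, len(scores) + 1)]

def outlier_pull(scores):
    """Gap between the running average with and without game 1.

    Assumes the outlier is the first game in the list. Returns one
    value per week, starting at week 2.
    """
    return [
        mean(scores[:n]) - mean(scores[1:n])
        for n in range(2, len(scores) + 1)
    ]

# Hypothetical season: one 72-point opener followed by ordinary games.
scores = [72, 13, 14, 3, 21, 17, 24, 10]

for week, (avg, pull) in enumerate(
    zip(running_means(scores)[1:], outlier_pull(scores)), start=2
):
    print(f"week {week}: ppg {avg:.1f}, outlier adds {pull:.1f}")
```

With these made-up numbers the opener inflates the average by 29.5 points after week 2 and by about 7 points after week 8. The game never leaves the data set; it simply carries less weight as more games are added.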

So, when is the tipping point? In previous years, I usually didn’t get a strong feel for where things were going until midway in the season and conference play. Right about game six overall, and games three and four in-conference, you can start to give credence to what the analysis is showing. Some early data will be confirmed and verified as valid from the start. Some outlier Titanic-like numbers will be recognized only in retrospect for what they were. The proper way to treat early data is to acknowledge it and wait and see how it pans out. Now that we’ve reached the first third of the season mark, the next couple of weeks will, hopefully, point out what is what going forward.

By the numbers – what a championship level SEC team should be:

Week4_TEV_STDs

Week4_TEV_more

Strong reactions to outliers are often based purely upon perspective

The SEC West

Offense

Week4_TEV1

There is movement, and not all of it is upward. Auburn is improving game by game, but those Clemson game numbers will take some time to overcome. A couple more strong showings in the right direction will move us up. Likewise, those early Mississippi State numbers might also be outliers yet to be impacted by better data. Or maybe the Georgia game was the outlier. The same goes for those troubling Alabama red zone issues. They might turn out to be nothing to worry about, but they’ve still only had 11 TD’s in 19 trips to the red zone, against more than just that now 0–2 FSU defense.

But it is enough to drop Alabama’s offense just one tick off of the elite rating. This means that in the entire SEC West, we have only “average” offenses – Alabama and Texas A&M plus five other squads struggling to normalize their numbers in key areas. Auburn is indeed leading all SEC teams in third-down conversions, but our yards per play are dismal, and points per game and TD’s in the red zone haven’t recovered from Week 2.
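The Alabama red zone figure quoted above (11 TD’s in 19 trips) can be worked out with a short Python sketch. The function name is hypothetical, and the "one more trip" line is only an illustration of how much a single possession still moves the rate this early in the season, not a prediction.

```python
def red_zone_td_rate(touchdowns, trips):
    """Share of red zone trips that ended in a touchdown."""
    if trips <= 0:
        raise ValueError("need at least one red zone trip")
    return touchdowns / trips

current = red_zone_td_rate(11, 19)
print(f"{current:.1%}")  # 57.9%

# Illustration only: one more trip that ends in a touchdown.
next_week = red_zone_td_rate(12, 20)
print(f"{next_week:.1%}")  # 60.0%
```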

Defense

Week4_TEV2

Defensively, Auburn took a hit with those two garbage-time Mizzou TD’s in the red zone. But again, I cannot exclude them. We’ll have those 2nd-team guys in late in the season in critical games, and to dismiss their performance here would be a mistake if this is a true trend. So in they stay, and Auburn goes down a peg from the top. Likewise, Alabama continues to prove its worth. Vandy never got past midfield, and despite all its improvement from last year to this, it was still an Alabama–Vandy game like every other time they’ve met.

As for the rest of the division, who knows? Every other team has issues in some facet of defensive performance, and from the look of it, no team but the top three has anything approaching stability. Performances are seemingly all over the place as the numbers come in.

The SEC East

Offense

Week4_TEV3

Stability in the East is a curse as offensive performance so far has been consistently bad. No one east or north of the state of Alabama is showing anything like progress towards good numbers in any facet of the game. Georgia may score in the red zone, but everything else it is doing is pedestrian. Florida also shows great numbers in the red zone, but that’s only because it only made it there four times in three games. All the rest of its scoring has been on big plays that rival the prayer at Jordan-Hare and the Kick Six in surprise levels. And 50% of its scores are from its own side of the field in the last minute of play in two games. Amazing, yes, but that just can’t continue to be a thing. Someone at some point is going to start playing these guys hard on defense.

Defense

Week4_TEV4

That being said, it might have to wait until the Florida–Georgia classic game. No one else is showing signs of life or ability in the SEC East on defense. All six teams that are not based in Athens have severe numbers issues in the first four games of the season, even if their W–L records may belie it. These teams are not fully functional in all aspects of the game, with yards per play and scoring being the biggest areas of concern across the board.

By these numbers, Georgia should have the division locked up, and its new top-ten ranking is a reflection of this. But that Florida jinx in the Jacksonville game is a hard one to overcome if history is any judge. It remains to be seen if the Gator luck holds out for that game, and if the division race in the East is still on.

The State of the Conference

Week4_TEV_state

…is fast leaning to being just the state of Alabama or Georgia. With Auburn on the move offensively and catching momentum, we could be well in the mix for the championship with games against the only two elite teams in the conference in the late season Amen Corner. The other “maybe” teams of Mississippi State, LSU and TAMU have all shown some disturbing trends that are still surfacing as we progress in the season. You can’t dismiss them all as outliers when they happen this often, game after game.

It’s still early enough for them and Auburn to right the ship and play like champions on both sides of the ball, but the performance slope is steepening as the season progresses. With outliers being what they are, you can never fully count them out. Neither can you count ON them going forward. Getting a 72-point performance out of your offense in conference play is just not that common. Not saying it can’t happen, but there is just less and less impact such a game will have as this data builds over the next two weeks.

Week4_TEV_end

No matter what Lady Macbeth might say, keep your hands off this and wait for the numbers to settle out.

14 Comments

  1. ATL_AU_FAN ATL_AU_FAN says:

    Good stuff as always, Sully!
    Now, if only my brain could keep up.

  2. Acid Reign Acid Reign says:

    …..And too, the level of competition in non-conference games is wildly inconsistent across different schedules. Most teams have only played one SEC game, a handful two. I enjoy looking at those tables!

  3. WarSamEagle WarSamEagle says:

    Love these articles but afterwards I be like…..

  4. Pine Mt Tiger Pine Mt Tiger says:

    “Some early data will be confirmed and verified as valid from the start.”

    Here’s to hoping Auburn’s later data will look more like Mizzou than Clemson!

    Thanks for the great post Sully.

  5. neonbets says:

    With regards to outliers, the problem as I see it is that unweighted inclusion of outliers is fine in a macro sense–i.e. after all the results are in. But outliers cause massive distortions when included along the way as you look at each game.

    For example, your 38.5 ppg benchmark does factor epic outlier-blowouts, so I get why you are including them for this season’s analysis. But the 38.5 ppg also represents a figure after the results are in, so it’s really a macro figure.

    When we use it as a benchmark metric each week (micro) it’s quite misleading.

    Look at it another way: even in the confines of your model, a 70 point win in week 2 has an exponentially more dramatic impact than the same 70 point win in week 10. In using this model in a week-to-week manner, whether you like it or not, the week of play is probably the most important statistic. Not ppg, pa, nor anything else.

    Medians would help, but for the small sample sizes. It seems that incorporating point spreads and making weighted adjustments off them would really help.

    Nonetheless, keep up the great work, Sully. I really enjoy your thoughtful presentations.

    • sullivan013 sullivan013 says:

      The benchmarks in the opening table are from the last 10 years of the SEC championship teams and include certain key performance indicators that I found to model poor, good, above average and championship level teams accurately in the last decade. I started this back in 2013 with my first Cafe Malzahn article – (http://trackemtigers.com/dining-on-elephant-at-cafe-malzahn/), and have continued to tweak it over time. While not entirely analytic or scientific in nature, I chose these particular indicators because they seemed to be the most relevant and easy to analyze of the wealth of data one could collect and review. I’m sure there are much more accurate and insightful models out there, but this is the one I’ve chosen to follow.

      I’ve mentioned over the last couple of years that the overall numbers are dropping season to season, but I’ve purposely kept those benchmarks at that level to gauge those very trends. While that may seem too arbitrary, I still think it’s important to the discussion of where the conference is going and how each team ranks relative to each other and the recent past.

      • ATL_AU_FAN ATL_AU_FAN says:

        Sully, your KPI stuff, I think, is spot on. The metrics upon which the KPI’s are measured is what changes.

      • neonbets says:

        Don’t get me wrong–the benchmarks are very interesting. I’m simply responding to the issue you raised about what to do with week-to-week outliers.

        You come across a 10 touchdown explosion from Missouri–what do you do with it? As you stated, you dismiss these singular points outside the comfortable mean at your peril. They still happened, and if you’re serious about the craft, you need to find a way to account for them.

        Your problem–as I see it–is what do you do with a 10 touchdown outlier in week 2. It skews all your data. So much so, in fact, including that result renders analysis of Missouri in week 2 meaningless. But like you said–you can’t just dismiss it, either.

        Your problem–as I see it–you’re not comparing SEC champs as they happened to fare in the week of review. [In other words, we’re reviewing Week 4 so what was the ppg average of the typical SEC Champ in Week 4? Was it at 38.5 ppg? I doubt it.] Yet you’re stuck analyzing a single Week 4 performance against 10 years of 12 week averages.

        Again, I’m not being critical, I enjoy your work. The outliers in a small sample size turn this into a conundrum.

        • neonbets says:

          Sorry for muddled paragraphs. I was interrupted while writing.

          • Tiger Tiger says:

            Don’t worry about muddling up your paragraphs, neon. They’re nothing we aren’t already used to seeing. Haha.

        • sullivan013 sullivan013 says:

          I understand, but my view is this:

          I’m reminded of the two methods of land navigation taught at the Infantry Center at Fort Benning to officer candidates. One by the cadre of the Infantry Center and the one re-taught by myself and other TAC Officers at Officer Candidate School.

          On the one hand, the Infantry Center cadre taught students to alternate which direction they went whenever they sidestepped an obstacle while following an azimuth for several hundred yards in a southern pine forest. That way, when the candidate reached the end of their pace count, they would be in a smaller error circle of doubt if the point they were searching for wasn’t there.

          To us TAC officers, this was absurd. The error could be a full 360 degree circle from where the candidate’s pace count stopped and any search of that area, especially at night, would result in them NEVER FINDING IT.

          Our solution was to sidestep in the same direction each time (to the right) so that at the end of their pace count, the error was in a known direction – to the left and forward (pace counts are nearly always short), thereby limiting the search in one direction for the candidate.

          The same applies here in my opinion. The known error is that the ideal is the standard and the early data will be somewhat skewed. Seeing what Alabama did in 2012 or Auburn in 2010 against their second opponent of those seasons doesn’t seem to be that informative to me. That only means I’d be comparing skewed data to potentially other skewed data from those seasons, rather than a ‘golden standard’ established by what exceptional teams did for full seasons.

          I just feel this is a better guide to judge team performance week to week.

          • neonbets says:

            Thanks for taking the time to respond, Sully. These posts you create are hard work, and the last thing you need is to spend even more time explaining your philosophy. My comments weren’t made to criticize your method, but rather they were made in response to what I thought was your challenge/frustration in what to do with outliers. I hope I didn’t give you the wrong impression.

            I know it’s not for everybody, but I just find this stuff to be so interesting (and confounding). Thanks again for your work.

          • dyingculture dyingculture says:

            When a 2LT finds himself lost on the Land Navigation course he knows there is only one man who can save him: his NCO.
