Previously, I dug fairly deep into data from FiveThirtyEight’s Facebook Primary to suggest that, properly adjusted to USPD, it predicts a very significant upset for Bernie Sanders in Michigan today. In fact, there isn’t a single county outside Wayne County (Detroit and suburbs) where the model predicts a victory, or even a close contest, for Clinton. This includes Genesee (Flint)! The model, furthermore, predicts only a very slight edge for Clinton in Wayne County, with suburban and non-black voters making up for an approximately 80-20 preference for Clinton among African Americans in Detroit.
So far, adjusting 538’s Facebook Primary to USPD has fairly accurately predicted the outcome between Clinton and Sanders at the county level 97% of the time^ for counties with more than 1,000 voters.
^86% of the time the model predicted the right county-level winner between Bernie Sanders and Hillary Clinton. In an additional 11% of races, the model predicted a tight contest but picked the wrong eventual winner.
Since “Can the Facebook Primary Predict Michigan’s Democratic Primary?” I’ve gone yet deeper into the Facebook Primary weeds, comparing Detroit (Wayne County) very carefully with Fulton County (Atlanta) and Suffolk County (Boston), along with using the data we have on African Americans going for Clinton at rates of three, four, five, and often even six to one. This included analyzing all the available 538 Facebook data at the zip code level, particularly for Atlanta and especially its southwestern portion, and comparing the Facebook Primary against the actual county-level results.
Last night and today, I began comparing actual county-level results from 100 counties, all in states that have already voted, against what Facebook Primary data, adjusted by USPD, would have predicted. I am just over 70% done, but patterns are evident already. So here is a sketch of preliminary results.
The county-level results for states that have already voted appear to be around 97% accurate. In Michigan, Clinton is losing every county outside Wayne County when the Facebook Primary is adjusted by USPD. Wayne County is too close to call, with a slight edge for Clinton (+137.5 USPD for Clinton to +134.7 USPD for Sanders).
Clinton is definitely winning by a wide margin in predominantly black areas in Detroit (almost the entire city save “Mexicantown,” some mixed black-white neighborhoods downtown, and along the western edge of Detroit where it bleeds into suburbia less cleanly than Eight Mile to the north). Non-black voters in the Detroit suburbs of Wayne County are going for Sanders at a rate high enough to make the results almost dead even. Do I think the suburbs will show up at the polls at the same rate as Detroit itself?
Not a chance.
Clinton will win Wayne County, perhaps pick up a county or two where there are fewer than 1,000 voters, pull off an upset (per these tabulations) in Genesee if she is having a great night, and will lose in virtually every other county. Of course, this may turn out to be nonsense on stilts and a perfect example that even very large datasets can be horribly skewed in a way that analysis simply cannot untangle. If Sanders loses by more than about 5 points tonight, that is the case.
The model would predict a 5 point loss for Sanders if A) African Americans are more than 30% of turnout, B) they vote for Clinton at a 6:1 ratio, and C) the rest of Michigan goes for Sanders, but only at a rate similar to Nebraska rather than New Hampshire, Kansas, or somewhere in between Nebraska and New Hampshire. This is a live question for various reasons, including where Michigan is situated at the mid-point of the U.S. economy for the 4th quarter of 2015 versus Nebraska (top 10), New Hampshire (top 20), and Kansas (bottom 10).
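For anyone who wants to play with that scenario, here is a rough sketch of the arithmetic in Python. It assumes a strict two-candidate race and treats the Sanders share among everyone else as an input you choose; the function name, the parameter names, and the sample shares in the loop are my own placeholders, not figures from the model.

```python
def clinton_margin(black_turnout_share, clinton_to_sanders_ratio, sanders_share_elsewhere):
    """Clinton's statewide margin in percentage points (negative means Sanders wins).

    Assumes only two candidates, so each group's shares sum to 1.
    """
    # A 6:1 ratio means Clinton takes 6 of every 7 votes in that group.
    clinton_among_black = clinton_to_sanders_ratio / (clinton_to_sanders_ratio + 1)
    rest_of_turnout = 1 - black_turnout_share
    clinton_total = (
        black_turnout_share * clinton_among_black
        + rest_of_turnout * (1 - sanders_share_elsewhere)
    )
    # With two candidates, margin = Clinton share minus Sanders share = 2 * Clinton share - 1.
    return (2 * clinton_total - 1) * 100


if __name__ == "__main__":
    # Hypothetical Sanders shares among the remaining voters, for illustration only.
    for share in (0.55, 0.60, 0.65):
        print(f"Sanders at {share:.0%} elsewhere: Clinton margin {clinton_margin(0.30, 6, share):+.1f}")
```

Moving the third argument up or down shows how sensitive the statewide result is to how the rest of Michigan votes once the turnout share and the 6:1 ratio are fixed.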
The basics of the data: of the 71 counties I’ve analyzed, the Facebook Primary adjusted by USPD accurately predicted who would win in 57 of them, for a conservative accuracy rate of 80%. In another 8 cases, the model suggested the county-level race would be very close but picked the wrong winner. Of the 14 wrong picks, an additional 5 are for counties where fewer than 1,000 (sometimes even fewer than 100) people voted, meaning the sample sizes for the model are far too small, and the counties themselves matter little for projecting the winner on a statewide basis.
Counting the 8 predictions of a close contest (wrong winner) as “nearly right,” and eliminating the 5 tiny-vote counties, the model has an accuracy rate of 64 out of 66 counties, or 97%! I have some questions about both instances where the model picked wrongly. In one of them (< 1200 voters), there was either a clerical error by FiveThirtyEight or the use of rounding means the model actually picked the right winner, even if it said the race was going to be close where it wasn’t. In the second instance, a smallish (~2500 voters), heavily Latino (42.5%) county picked Clinton by 5% where the model predicts a modest Sanders victory. I am tempted to say the model is right virtually 100% of the time, with some caveats for not applying to counties with fewer than 1,000 voters and allowing for statistical anomalies. It’s been right or nearly right 100% of the time so far for counties I’ve analyzed with more than 2,500 votes. Still, I will stick with “right or nearly right 97% of the time” for now. We’ll definitely know more late tonight!
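To make the two headline percentages easy to check, here is a small Python sketch that takes the county counts exactly as stated above and turns them into rates. The variable and function names are mine, purely for illustration.

```python
def accuracy(hits, total):
    """Share of counties counted as hits, as a fraction between 0 and 1."""
    return hits / total


# Counts as stated in the post.
counties_analyzed = 71
correct_winner = 57
tiny_counties_excluded = 5   # counties with fewer than 1,000 voters
right_or_nearly_right = 64   # the post's tally after excluding the tiny counties

counties_kept = counties_analyzed - tiny_counties_excluded
strict_rate = accuracy(correct_winner, counties_analyzed)
lenient_rate = accuracy(right_or_nearly_right, counties_kept)

if __name__ == "__main__":
    print(f"Strict (right winner, all counties): {strict_rate:.1%}")
    print(f"Lenient (right or nearly right, {counties_kept} counties): {lenient_rate:.1%}")
```

The strict rate counts only outright correct winners across all 71 counties; the lenient rate uses the “right or nearly right” tally over the counties that remain once the tiny ones are dropped.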
Note 1: I picked which counties to use somewhat at random, while also making sure to include some major cities in each state along with very small counties, places that Clinton won (if any), places that Sanders won (if any), ties or near ties, and geographically diverse counties within each state.
Note 2: I will definitely have a full write-up if Sanders upsets Clinton, overturning FiveThirtyEight’s >99% chance that Clinton wins. This write-up would include the names and associated numbers for each of the counties. If this is all wishy-washy dream crap on my part – totally possible, maybe even probable – I’ll likely just fold up shop here in terms of Sanders postings. If he loses by more than he lost in Massachusetts, there is no hope. And it’s hard for me to see a path to victory even with a tie.
Note 3: I’ve noted the floor as a 5% Clinton victory for the model to have any chance at being worthy; with some fancy work, it might be 10%. The ceiling? 20-25% or more. *ducks* (not a chance)