I'm a political junkie and I spend way too much time following the polling results on FiveThirtyEight's election forecast.

A couple of days ago I was surprised that FiveThirtyEight gave Trump a 13.7% chance of winning, which seemed too high to be consistent with the state-by-state breakdowns. After reading their methodology I learned that this was due to them not assuming that state outcomes were independent. In other words, if one swing state breaks for Trump this might increase the likelihood that other swing states also break for Trump.

However, I still wanted to do the exercise to ask: what would be Hillary's chance of winning if each state's probability of winning was truly independent of one another? Let's write a program to find out!

#### Raw data

A couple of days ago (2016-10-24) I collected the state-by-state data from FiveThirtyEight's website (by hand) and recorded:

- the name of the state
- the chance that Hillary Clinton would win the state
- the number of electoral college votes for that state

I recorded this data as a list of 3-tuples:

```
probabilities :: [(String, Double, Int)]
probabilities =
    [ ("Alabama"              , 0.003,  9)
    , ("Alaska"               , 0.300,  3)
    , ("Arizona"              , 0.529, 11)
    , ("Arkansas"             , 0.012,  6)
    , ("California"           , 0.999, 55)
    , ("Colorado"             , 0.889,  9)
    , ("Connecticut"          , 0.977,  7)
    , ("Delaware"             , 0.948,  3)
    , ("District of Columbia" , 0.999,  3)
    , ("Florida"              , 0.731, 29)
    , ("Georgia"              , 0.259, 16)
    , ("Hawaii"               , 0.996,  4)
    , ("Idaho"                , 0.026,  4)
    , ("Illinois"             , 0.994, 20)
    , ("Indiana"              , 0.121, 11)
    , ("Iowa"                 , 0.491,  6)
    , ("Kansas"               , 0.089,  6)
    , ("Kentucky"             , 0.042,  8)
    , ("Louisiana"            , 0.009,  8)
    , ("Maine"                , 0.852,  2)
    , ("Maine - 1"            , 0.944,  1)
    , ("Maine - 2"            , 0.517,  1)
    , ("Maryland"             , 0.999, 10)
    , ("Massachusetts"        , 0.998, 11)
    , ("Michigan"             , 0.929, 16)
    , ("Minnesota"            , 0.886, 10)
    , ("Mississippi"          , 0.034,  6)
    , ("Missouri"             , 0.168, 10)
    , ("Montana"              , 0.119,  3)
    , ("Nebraska"             , 0.040,  2)
    , ("Nebraska - 1"         , 0.154,  1)
    , ("Nebraska - 2"         , 0.513,  1)
    , ("Nebraska - 3"         , 0.014,  1)
    , ("Nevada"               , 0.703,  6)
    , ("New Hampshire"        , 0.868,  4)
    , ("New Jersey"           , 0.981, 14)
    , ("New Mexico"           , 0.941,  5)
    , ("New York"             , 0.998, 29)
    , ("North Carolina"       , 0.689, 15)
    , ("North Dakota"         , 0.070,  3)
    , ("Oklahoma"             , 0.002,  7)
    , ("Ohio"                 , 0.563, 18)
    , ("Oregon"               , 0.957,  7)
    , ("Pennsylvania"         , 0.880, 20)
    , ("Rhode Island"         , 0.974,  4)
    , ("South Carolina"       , 0.086,  9)
    , ("South Dakota"         , 0.117,  3)
    , ("Tennessee"            , 0.025, 11)
    , ("Texas"                , 0.166, 38)
    , ("Utah"                 , 0.067,  6)
    , ("Vermont"              , 0.984,  3)
    , ("Virginia"             , 0.943, 13)
    , ("Washington"           , 0.975, 12)
    , ("West Virginia"        , 0.006,  5)
    , ("Wisconsin"            , 0.896, 10)
    , ("Wyoming"              , 0.021,  3)
    ]
```

Note that some states (like Maine) apportion electoral votes in a weird way:

```
probabilities :: [(String, Double, Int)]
probabilities =
    ...
    , ("Maine"                , 0.852,  2)
    , ("Maine - 1"            , 0.944,  1)
    , ("Maine - 2"            , 0.517,  1)
    ...
    ]
```

Maine apportions two of its electoral votes based on a state-wide vote (i.e. `"Maine"` in the above list) and then two further electoral votes based on its two districts (i.e. `"Maine - 1"` and `"Maine - 2"`). FiveThirtyEight computes the probabilities for each subset of electoral votes, so we just record them separately.

#### Combinatorial explosion

So how might we compute Hillary's chance of winning, assuming the independence of each state's outcome?

One naïve approach would be to loop through all possible electoral outcomes and compute the probability and electoral vote for each outcome. Unfortunately, that's not very efficient since the number of possible outcomes doubles with each additional entry in the list:

```
>>> 2 ^ length probabilities
72057594037927936
```

... or approximately `7.2 * 10^16` outcomes. Even if I spent only a single CPU cycle per outcome (which is unrealistic) on a 2.5 GHz processor, computing them all would take almost a year. The election is only a couple of weeks away, so I don't have that kind of time or computing power!
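To make the blowup concrete, here is a brute-force sketch of the naïve approach (the `bruteForce` name and structure are my own, purely illustrative): it enumerates every win/lose combination and sums the probability of each combination that reaches 270 votes. The `outcomes` list doubles with every entry, so this is only feasible for a handful of states.

```haskell
-- Illustrative brute force: enumerate all 2^n win/lose combinations.
-- Each outcome is a (probability, electoral votes) pair.
bruteForce :: [(String, Double, Int)] -> Double
bruteForce states = sum [ p | (p, votes) <- outcomes states, votes >= 270 ]
  where
    outcomes [] = [(1, 0)]
    outcomes ((_, p, v) : rest) =
        [ (q * p      , n + v) | (q, n) <- outcomes rest ]  -- win this state
     ++ [ (q * (1 - p), n    ) | (q, n) <- outcomes rest ]  -- lose this state
```

Running this on the full 56-entry list would enumerate all `7.2 * 10^16` outcomes, which is exactly what we want to avoid.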

#### Distributions

Fortunately, we can do *much* better than that! We can efficiently solve this using a simple "divide-and-conquer" approach where we subdivide the large problem into smaller problems until the solution is trivial.

The central data structure we'll use is a probability distribution, which we'll represent as a `Vector` of `Double`s:

```
import Data.Vector.Unboxed (Vector)

newtype Distribution = Distribution (Vector Double)
    deriving (Show)
```

This `Vector` will always have 539 elements, one per possible final electoral vote count that Hillary might get (0 through 538, since there are 538 electoral votes in total). Each element is a `Double` representing the probability of the corresponding electoral vote count. We will maintain the invariant that all the probabilities (i.e. the elements of the `Vector`) sum to `1`.

For example, if the `Distribution` were:

`[1, 0, 0, 0, 0 ... ]`

... that would represent a 100% chance of Hillary getting 0 electoral votes and a 0% chance of any other outcome. Similarly, if the `Distribution` were:

`[0, 0.5, 0, 0.5, 0, 0, 0 ... ]`

... then that would represent a 50% chance of Hillary getting 1 electoral vote and a 50% chance of Hillary getting 3 electoral votes.
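As an aside, the sum-to-1 invariant could be enforced with a smart constructor. The `mkDistribution` helper below is a hypothetical addition of my own (the rest of the code constructs `Distribution`s directly), but it shows what checking the invariant would look like:

```haskell
import Data.Vector.Unboxed (Vector)
import qualified Data.Vector.Unboxed

newtype Distribution = Distribution (Vector Double)
    deriving (Show)

-- Hypothetical smart constructor: accept the vector only if it has one
-- slot per possible electoral vote count and its elements sum to 1
-- (up to floating point tolerance)
mkDistribution :: Vector Double -> Maybe Distribution
mkDistribution xs
    | Data.Vector.Unboxed.length xs == 539
        && abs (Data.Vector.Unboxed.sum xs - 1) < 1e-9 =
        Just (Distribution xs)
    | otherwise = Nothing
```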

In order to simplify the problem we need to subdivide it into smaller problems. For example, to compute the final electoral vote probability distribution for all 50 states, we can break that down into smaller problems:

- Split the 50 states into two sub-groups of 25 states each
- Compute an electoral vote probability distribution for each sub-group of 25 states
- Combine probability distributions for each sub-group into the final distribution

In order to do that, I need to define a function that combines two smaller distributions into a larger distribution:

```
import qualified Data.Vector.Unboxed

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
                Data.Vector.Unboxed.unsafeIndex xs j
            *   Data.Vector.Unboxed.unsafeIndex ys (i - j)
```

The `combine` function takes two input distributions named `xs` and `ys` and generates a new distribution named `zs`. To compute the probability of getting `i` electoral votes in the composite distribution, we just add up all the different ways we can get `i` electoral votes from the two sub-distributions.

For example, to compute the probability of getting 4 electoral votes for the entire group, we add up the probabilities for the following 5 outcomes:

- We get 0 votes from our 1st group and 4 votes from our 2nd group
- We get 1 vote from our 1st group and 3 votes from our 2nd group
- We get 2 votes from our 1st group and 2 votes from our 2nd group
- We get 3 votes from our 1st group and 1 vote from our 2nd group
- We get 4 votes from our 1st group and 0 votes from our 2nd group
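This is just a discrete convolution of the two distributions. A throwaway list-based version makes the index bookkeeping explicit (illustrative only: it assumes equal-length inputs, like the post's 539-element vectors, and quadratic `!!` indexing makes lists a poor choice for real code):

```haskell
-- Entry i of the result sums the products of every pair of entries whose
-- indices add up to i (j votes from the first group, i - j from the second)
convolve :: [Double] -> [Double] -> [Double]
convolve xs ys =
    [ sum [ (xs !! j) * (ys !! (i - j)) | j <- [0 .. i] ]
    | i <- [0 .. length xs - 1]
    ]
```

For example, `convolve [0.5, 0.5] [0.5, 0.5]` gives a 25% chance of 0 votes and a 50% chance of 1 vote, matching the two ways of getting one vote total.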

The `probabilityOfEachOutcome` function computes the probability of each one of these outcomes and then the `totalProbability` function `sum`s them all up to compute the total probability of getting `i` electoral votes.

We can also define an empty distribution representing the probability distribution of electoral votes given zero states:

```
empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1
    makeElement _ = 0
```

This distribution says that given zero states you have a 100% chance of getting zero electoral college votes and 0% chance of any other outcome. This empty distribution will come in handy later on.

#### Divide and conquer

There's no limit to how many times we can subdivide the problem. In the extreme case we can subdivide the problem down to individual states (or districts for weird states like Maine and Nebraska):

- subdivide our problem into 56 sub-groups (one group per state or district)
- compute the probability distribution for each sub-group, which is trivial
- `combine` all the probability distributions to retrieve the final result

In fact, this extreme solution is surprisingly efficient!

All we're missing is a function that converts each entry in our original `probabilities` list into a `Distribution`:

```
toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
    Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1 - probability
    makeElement i | i == votes = probability
    makeElement _ = 0
```

This says that the probability distribution for a single state should have two possible outcomes:

- Hillary Clinton has probability `x` of winning `n` votes for this state
- Hillary Clinton has probability `1 - x` of winning `0` votes for this state
- Hillary Clinton has 0% probability of any other outcome for this state

Let's test this out on a couple of states:

```
>>> toDistribution ("Alaska" , 0.300, 3)
Distribution [0.7,0.0,0.0,0.3,0.0,0.0,...
>>> toDistribution ("North Dakota", 0.070, 3)
Distribution [0.9299999999999999,0.0,0.0,7.0e-2,0.0...
```

This says that:

- Alaska has a 30% chance of giving Clinton 3 votes and 70% chance of 0 votes
- North Dakota has a 7% chance of giving Clinton 3 votes and a 93% chance of 0 votes

We can also verify that `combine` works correctly by combining the electoral vote distributions of both states. We expect the new distribution to be:

- 2.1% chance of 6 votes (the probability of winning both states)
- 65.1% chance of 0 votes (the probability of losing both states)
- 32.8% chance of 3 votes (the probability of winning just one of the two states)

... and this is in fact what we get:

```
>>> let alaska = toDistribution ("Alaska" , 0.300, 3)
>>> let northDakota = toDistribution ("North Dakota", 0.070, 3)
>>> combine alaska northDakota
Distribution [0.6509999999999999,0.0,0.0,0.32799999999999996,0.0,0.0,2.1e-2,0.0,...
```

#### Final result

To compute the total probability of winning, we just transform each element of the list to the corresponding distribution:

```
distributions :: [Distribution]
distributions = map toDistribution probabilities
```

... then we reduce the list to a single value by repeatedly applying the `combine` function, falling back on the `empty` distribution if the entire list is empty:

```
import qualified Data.List

distribution :: Distribution
distribution = Data.List.foldl' combine empty distributions
```
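A left fold works fine here, but since `combine` is associative we are free to reduce in any shape. A balanced pairwise reduction (the `treeFold` name is my own, not from the post) mirrors the divide-and-conquer picture more literally, and is the shape you would want if you later parallelized the work:

```haskell
-- Reduce a list by repeatedly combining adjacent pairs, halving the list
-- each round, until one value remains; the fallback value is used only
-- for the empty list.  Only valid when `f` is associative.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z
treeFold _ _ [x] = x
treeFold f z xs  = treeFold f z (pairUp xs)
  where
    pairUp (a : b : rest) = f a b : pairUp rest
    pairUp rest           = rest
```

For `Distribution`s, `treeFold combine empty distributions` computes the same result as the left fold above.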

... and if we want to get Clinton's chances of winning, we just add up the probabilities for all outcomes greater than or equal to 270 electoral college votes:

```
chanceOfClintonVictory :: Double
chanceOfClintonVictory =
    Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
  where
    Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory
```

If we compile and run this program we get the final result:

```
$ stack --resolver=lts-7.4 build vector
$ stack --resolver=lts-7.4 ghc -- -O2 result.hs
$ ./result
0.9929417642334847
```

In other words, Clinton has a 99.3% chance of winning if each state's outcome is independent of every other outcome. This is significantly higher than the probability estimated by FiveThirtyEight at that time: 86.3%.

These results differ for the same reason I noted above: FiveThirtyEight assumes that state outcomes are not necessarily independent, so a Trump win in one state could correlate with Trump wins in other states. This possibility of correlated victories favors the candidate who is behind in the race.

As a sanity check, we can also verify that the final probability distribution has probabilities that add up to approximately 1:

```
>>> let Distribution xs = distribution
>>> Data.Vector.Unboxed.sum xs
0.9999999999999994
```
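Other summary statistics fall out of the same vector. For instance, the expected number of electoral votes is the probability-weighted sum of the counts (`expectedVotes` is my own addition, not part of the post's program):

```haskell
import Data.Vector.Unboxed (Vector)
import qualified Data.Vector.Unboxed

-- Expected electoral vote count: the sum over every index i of
-- i * P(exactly i votes)
expectedVotes :: Vector Double -> Double
expectedVotes xs = Data.Vector.Unboxed.sum (Data.Vector.Unboxed.imap weight xs)
  where
    weight i p = fromIntegral i * p
```

Applying this to the final `distribution` would give Clinton's expected electoral vote total under the independence assumption.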

**Exercise:** Expand on this program to plot the probability distribution.

#### Efficiency

Our program is also efficient, running in 30 milliseconds:

```
$ bench ./result
benchmarking ./result
time 30.33 ms (29.42 ms .. 31.16 ms)
0.998 R² (0.997 R² .. 1.000 R²)
mean 29.43 ms (29.13 ms .. 29.81 ms)
std dev 710.6 μs (506.7 μs .. 992.6 μs)
```

This is a significant improvement over a year's worth of running time.

We could even speed this up further using parallelism. Thanks to our divide-and-conquer approach we can subdivide this problem among up to 56 CPUs (one per state or district) to accelerate the solution. However, after a certain point the overhead of splitting up the work might outweigh the benefits of parallelism.

#### Monoids

People more familiar with Haskell will recognize that this solution fits cleanly into a standard Haskell interface known as the `Monoid` type class. In fact, many divide-and-conquer solutions tend to be `Monoid`s of some sort.

The `Monoid` typeclass is defined as:

```
class Monoid m where
    mappend :: m -> m -> m

    mempty :: m

-- An infix operator that is a synonym for `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y
```

... and the `Monoid` class has three rules that every implementation must obey, which are known as the "`Monoid` laws".

The first rule is that `mappend` (or the equivalent `(<>)` operator) must be associative:

`x <> (y <> z) = (x <> y) <> z`

The second and third rules are that `mempty` must be the "identity" of `mappend`, meaning that `mempty` does nothing when combined with other values:

```
mempty <> x = x
x <> mempty = x
```

A simple example of a `Monoid` is integers under addition, which we can implement like this:

```
instance Monoid Integer where
    mappend = (+)

    mempty = 0
```

... and this implementation satisfies the `Monoid` laws thanks to the laws of addition:

```
(x + y) + z = x + (y + z)
0 + x = x
x + 0 = x
```
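As a side note, `base` deliberately ships no `Monoid Integer` instance, since addition and multiplication are equally valid choices; instead, `Data.Monoid` provides the `Sum` and `Product` newtype wrappers, each of which commits to one interpretation:

```haskell
import Data.Monoid (Sum(..), Product(..))

-- `Sum` interprets `mappend` as addition, `Product` as multiplication
added :: Integer
added = getSum (Sum 2 `mappend` Sum 3)

multiplied :: Integer
multiplied = getProduct (Product 2 `mappend` Product 3)
```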

However, `Distribution`s are `Monoid`s, too! Our `combine` and `empty` definitions both have the correct types to implement the `mappend` and `mempty` methods of the `Monoid` typeclass, respectively:

```
instance Monoid Distribution where
    mappend = combine

    mempty = empty
```
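One payoff of writing the instance is that the standard `mconcat` (which folds an entire list with `mappend`, starting from `mempty`) now works on `Distribution`s, so `Data.List.foldl' combine empty distributions` can be written as `mconcat distributions`. The same function works for any `Monoid`; here it is on the list monoid, where `mappend` is `(++)`:

```haskell
-- mconcat = foldr mappend mempty
joined :: String
joined = mconcat ["divide", "-and-", "conquer"]
```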

Both `mappend` and `mempty` for `Distribution`s satisfy the `Monoid` laws:

- `mappend` is associative (proof omitted)
- `mempty` is the identity of `mappend`

We can prove the identity law using the following rules for how `Vector`s behave:

```
-- These rules assume that all vectors involved have 539 elements

-- If you generate a vector by just indexing into another vector, you just get
-- back the other vector
Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs

-- If you index into a vector generated by a function, that's equivalent to
-- calling that function
Data.Vector.Unboxed.unsafeIndex (Data.Vector.Unboxed.generate 539 f) i = f i
```

Equipped with those rules, we can then prove that `mappend xs mempty = xs`:

```
mappend (Distribution xs) mempty

-- mappend = combine
= combine (Distribution xs) mempty

-- Definition of `mempty`
= combine (Distribution xs) (Distribution ys)
  where
    ys = Data.Vector.Unboxed.generate 539 makeElement
      where
        makeElement 0 = 1
        makeElement _ = 0

-- Definition of `combine`
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
                Data.Vector.Unboxed.unsafeIndex xs j
            *   Data.Vector.Unboxed.unsafeIndex ys (i - j)

    ys = Data.Vector.Unboxed.generate 539 makeElement
      where
        makeElement 0 = 1
        makeElement _ = 0

-- Data.Vector.Unboxed.unsafeIndex (Data.Vector.Unboxed.generate 539 f) i = f i
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
                Data.Vector.Unboxed.unsafeIndex xs j
            *   makeElement (i - j)

        makeElement 0 = 1
        makeElement _ = 0

-- Case analysis on `j`
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j
            | j == i =
                Data.Vector.Unboxed.unsafeIndex xs j * 1  -- makeElement (i - j) = makeElement 0 = 1
            | otherwise =
                Data.Vector.Unboxed.unsafeIndex xs j * 0  -- makeElement (i - j) = 0

-- x * 1 = x
-- y * 0 = 0
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j
            | j == i    = Data.Vector.Unboxed.unsafeIndex xs j
            | otherwise = 0

-- Informally: "The sum of a vector with one non-zero element is just that element"
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i = Data.Vector.Unboxed.unsafeIndex xs i

-- Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs
= Distribution xs
```

**Exercise:** Prove the associativity law for `combine`

#### Conclusion

I hope people find this an interesting example of how you can apply mathematical design principles (in this case: `Monoid`s) in service of simplifying and speeding up programming problems.

If you would like to test this program out yourself the complete program is provided below:

```
import Data.Vector.Unboxed (Vector)

import qualified Data.List
import qualified Data.Vector.Unboxed

probabilities :: [(String, Double, Int)]
probabilities =
    [ ("Alabama"              , 0.003,  9)
    , ("Alaska"               , 0.300,  3)
    , ("Arizona"              , 0.529, 11)
    , ("Arkansas"             , 0.012,  6)
    , ("California"           , 0.999, 55)
    , ("Colorado"             , 0.889,  9)
    , ("Connecticut"          , 0.977,  7)
    , ("Delaware"             , 0.948,  3)
    , ("District of Columbia" , 0.999,  3)
    , ("Florida"              , 0.731, 29)
    , ("Georgia"              , 0.259, 16)
    , ("Hawaii"               , 0.996,  4)
    , ("Idaho"                , 0.026,  4)
    , ("Illinois"             , 0.994, 20)
    , ("Indiana"              , 0.121, 11)
    , ("Iowa"                 , 0.491,  6)
    , ("Kansas"               , 0.089,  6)
    , ("Kentucky"             , 0.042,  8)
    , ("Louisiana"            , 0.009,  8)
    , ("Maine"                , 0.852,  2)
    , ("Maine - 1"            , 0.944,  1)
    , ("Maine - 2"            , 0.517,  1)
    , ("Maryland"             , 0.999, 10)
    , ("Massachusetts"        , 0.998, 11)
    , ("Michigan"             , 0.929, 16)
    , ("Minnesota"            , 0.886, 10)
    , ("Mississippi"          , 0.034,  6)
    , ("Missouri"             , 0.168, 10)
    , ("Montana"              , 0.119,  3)
    , ("Nebraska"             , 0.040,  2)
    , ("Nebraska - 1"         , 0.154,  1)
    , ("Nebraska - 2"         , 0.513,  1)
    , ("Nebraska - 3"         , 0.014,  1)
    , ("Nevada"               , 0.703,  6)
    , ("New Hampshire"        , 0.868,  4)
    , ("New Jersey"           , 0.981, 14)
    , ("New Mexico"           , 0.941,  5)
    , ("New York"             , 0.998, 29)
    , ("North Carolina"       , 0.689, 15)
    , ("North Dakota"         , 0.070,  3)
    , ("Oklahoma"             , 0.002,  7)
    , ("Ohio"                 , 0.563, 18)
    , ("Oregon"               , 0.957,  7)
    , ("Pennsylvania"         , 0.880, 20)
    , ("Rhode Island"         , 0.974,  4)
    , ("South Carolina"       , 0.086,  9)
    , ("South Dakota"         , 0.117,  3)
    , ("Tennessee"            , 0.025, 11)
    , ("Texas"                , 0.166, 38)
    , ("Utah"                 , 0.067,  6)
    , ("Vermont"              , 0.984,  3)
    , ("Virginia"             , 0.943, 13)
    , ("Washington"           , 0.975, 12)
    , ("West Virginia"        , 0.006,  5)
    , ("Wisconsin"            , 0.896, 10)
    , ("Wyoming"              , 0.021,  3)
    ]

newtype Distribution = Distribution { getDistribution :: Vector Double }
    deriving (Show)

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
                Data.Vector.Unboxed.unsafeIndex xs j
            *   Data.Vector.Unboxed.unsafeIndex ys (i - j)

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1
    makeElement _ = 0

instance Monoid Distribution where
    mappend = combine

    mempty = empty

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
    Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1 - probability
    makeElement i | i == votes = probability
    makeElement _ = 0

distributions :: [Distribution]
distributions = map toDistribution probabilities

distribution :: Distribution
distribution = mconcat distributions

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
    Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
  where
    Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory
```
