Selection Bias in Poker

In venture we often talk about selection bias (both positive and adverse) in many areas: the companies in a specific incubator, what happens if you offer blanket deal terms to every employee in a company, or any number of other scenarios. But because we rarely quantify these biases, it’s hard to know exactly what their impact is, or to have an intuitive feel for how to calibrate and adjust for them.

I was bored one night and thinking about other areas with selection bias that’d be fun to look at, preferably ones with quantifiable data. Poker came to mind. We all have an intuitive sense that hands that get played in poker are better than the average hand dealt. But most casual players likely can’t estimate how much better they should expect the hands that weren’t folded to be.

I looked at a dataset of ~7k hands of poker and focused on the hole cards (the two cards dealt to each player that only they can see). I wanted to see how the distribution of all hands the player was dealt differed from the distribution of hands they kept at least until the next round of cards was dealt (as opposed to hands they folded immediately). You can see the data source and methodology at the bottom of this essay. To be clear, there are lots of reasons this dataset shouldn’t be taken as generalizable or as precise statistics. But for my purposes, it’s illustrative enough.

In 26% of the ~7k hands, the player stayed in and didn’t immediately fold. The question is how the hands that are kept differ from the hands that are folded.

Below is the percentage occurrence of each card rank among all hands dealt, as well as among hands kept. As you would expect, all the ranks are dealt roughly equally. However, there is a wide range among hands kept, with higher-value cards showing up significantly more often than lower-value cards. An Ace is kept over five times as often as a Two.
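As a rough sketch of how a tally like this could be computed, the snippet below counts how often each rank appears among all hole cards dealt and among hole cards in hands that were kept. The `hands` list and its `(card1, card2, kept)` layout are made up for illustration; the real dataset’s format is not shown in this essay and will differ.

```python
from collections import Counter

RANKS = "23456789TJQKA"

# Hypothetical sample: (first hole card, second hole card, kept past the first round?)
# Cards are rank + suit, e.g. "As" = Ace of spades. This is not real data.
hands = [
    ("As", "Kd", True),
    ("7c", "2h", False),
    ("Qs", "Qh", True),
    ("9d", "4c", False),
    ("Ah", "3s", True),
    ("2d", "8s", False),
]

def rank_shares(hands, kept_only=False):
    """Return {rank: share of hole cards that have that rank}."""
    counts = Counter()
    for card1, card2, kept in hands:
        if kept_only and not kept:
            continue  # skip folded hands when only kept hands are wanted
        counts[card1[0]] += 1
        counts[card2[0]] += 1
    total = sum(counts.values())
    return {r: counts[r] / total for r in RANKS} if total else {}

dealt = rank_shares(hands)
kept = rank_shares(hands, kept_only=True)
for r in RANKS:
    print(f"{r}: dealt {dealt[r]:.1%}  kept {kept[r]:.1%}")
```

Comparing the two columns rank by rank is what exposes the skew: a flat distribution among cards dealt, and a tilted one among cards kept.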

 

This analysis is not particularly useful because we don’t think of each hole card in isolation. After all, having a matching pair of cards can be far more valuable than two higher but non-matching cards.

Instead let’s look at some common types of desirable hands. Having face cards is great as is having a matching pair of cards. Even better is having a pair of face cards. Of course there are other attractive hands like a flush or straight draw—but for simplicity we’ll focus on face cards and pairs.

Below is the probability that a player dealt one of these combinations of cards would play it to at least the next round rather than fold.

Again, these results are not surprising, but they help quantify our intuition. A player with a pair of face cards is virtually guaranteed not to fold. In our dataset a pair of face cards was dealt over a hundred times, and only once did the player fold them. Similarly, when the player had two face cards or a matching pair of cards, they played them over 75% of the time. Without any of these, they were far less likely to keep their hand.

 

Most interesting is looking at the distribution of each of these types of hands among all hands dealt—compared to the distribution among hands kept.

There is only a 13% chance for a player to be dealt two face cards or a pair. Yet among hands played, there is a 40% chance it’s two face cards or a pair. Let that sink in. Even though the chance of being dealt a pair or two face cards is low, it’s not far from even odds that anyone who doesn’t fold has one. If more than one opponent stays in, the chance that at least one of them has such a hand climbs above 50%.
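The 13% figure can be sanity-checked by enumerating all 1,326 two-card combinations. The sketch below assumes “face cards” here means J, Q, K and A. That definition is an assumption, chosen because it reproduces roughly 13%; counting only J, Q and K gives about 9.5%. It then plugs the rounded figures from this essay (26% of hands kept, 40% of kept hands being a pair or two face cards) into Bayes’ rule and, treating opponents’ hands as independent (a simplification), estimates the chance that at least one of two players still in holds such a hand.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "shdc"
HIGH = set("JQKA")  # assumed meaning of "face cards" in this essay

deck = [r + s for r in RANKS for s in SUITS]
combos = list(combinations(deck, 2))  # every possible two-card starting hand

def is_premium(hand, high=HIGH):
    """True if the hand is a pair, or both cards have a rank in `high`."""
    r1, r2 = hand[0][0], hand[1][0]
    return r1 == r2 or (r1 in high and r2 in high)

p_premium = sum(is_premium(h) for h in combos) / len(combos)
p_premium_jqk = sum(is_premium(h, set("JQK")) for h in combos) / len(combos)

# Rounded figures quoted in the essay.
p_kept = 0.26
p_premium_given_kept = 0.40

# Bayes' rule: P(kept | premium) = P(premium | kept) * P(kept) / P(premium)
p_kept_given_premium = p_premium_given_kept * p_kept / p_premium

def at_least_one(n, p=p_premium_given_kept):
    """Chance that at least one of n players who stayed in has a premium hand,
    treating their hands as independent (a simplification)."""
    return 1 - (1 - p) ** n

print(f"P(premium), J through A: {p_premium:.1%}")
print(f"P(premium), J through K: {p_premium_jqk:.1%}")
print(f"P(kept | premium):       {p_kept_given_premium:.0%}")
print(f"At least one of 2:       {at_least_one(2):.0%}")
```

Under these assumptions the implied keep rate for such hands comes out just under 80%, in line with the “over 75%” reported above, and two players staying in already pushes the chance that at least one holds such a hand to about 64%.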

Again, nothing surprising. But interesting to be able to quantify the impact of the selection bias.

While it’s hard to quantify selection bias in the real world, there’s a lot more we could be doing to get better at it. And we should. It’s hard to adjust for a bias when we don’t have a shared sense of exactly what impact it has.

 

Sidenote: Stack rank of hands by probability of being played

Turns out the same data also gives you a stack-ranked list of each hand and its probability of being kept vs. folded. It’s a pretty useful list for new players learning to judge how strong their hands are.
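A list like this could be built by collapsing each pair of hole cards into standard starting-hand shorthand (AKs for suited Ace-King, AKo for offsuit, 77 for a pair) and sorting by how often each was kept. This is a minimal sketch on made-up records; the `(card1, card2, kept)` layout and the `min_count` parameter are hypothetical, not the dataset’s real schema.

```python
from collections import defaultdict

RANKS = "23456789TJQKA"

def hand_name(card1, card2):
    """Collapse two hole cards into shorthand such as '77', 'AKs' or 'AKo'."""
    (r1, s1), (r2, s2) = card1, card2
    # Put the higher rank first so "Kd Ah" and "Ah Kd" get the same name.
    if RANKS.index(r1) < RANKS.index(r2):
        r1, r2 = r2, r1
    if r1 == r2:
        return r1 + r2
    return r1 + r2 + ("s" if s1 == s2 else "o")

def stack_rank(hands, min_count=1):
    """Return [(hand, keep_rate, times_dealt)] sorted by keep rate, highest first."""
    dealt = defaultdict(int)
    kept = defaultdict(int)
    for card1, card2, was_kept in hands:
        name = hand_name(card1, card2)
        dealt[name] += 1
        kept[name] += was_kept  # True counts as 1, False as 0
    rows = [(n, kept[n] / dealt[n], dealt[n]) for n in dealt if dealt[n] >= min_count]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical records, for illustration only.
sample = [
    ("As", "Ks", True), ("Kd", "Ah", True), ("Ac", "Kh", False),
    ("7c", "7d", True), ("2h", "7s", False), ("7h", "2c", False),
]
for name, rate, n in stack_rank(sample):
    print(f"{name:<4} kept {rate:.0%} of {n}")
```

A `min_count` threshold is worth setting on real data: with roughly 7k hands spread over 169 distinct starting hands, some hands will have been dealt only a couple dozen times, so their keep rates are noisy.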

Methodology

Getting Texas Hold’em data was harder than I expected, which is surprising since scraping poker sites or videos of poker seems very doable. Apparently, people used to scrape and buy poker datasets to get a direct edge over other players by having data on their *specific* opponents, which may be part of the reason for the stigma around datasets or the crackdown on them. And most free datasets have the actual hole cards obfuscated.

The dataset I used was ~7k hands of poker from a Kaggle dataset. The data can be found here. Since I needed to know the hole cards, all my data is from one player: the dataset only discloses the hole cards of the player who collected the data.

There are lots of reasons not to overgeneralize from this data. Besides the data being from one player, the analysis also doesn’t factor in hands where the player was the big blind or where there were no bets. Adjusting for these would likely skew the data even more toward only high-value hands being kept.

But I think the general trends it shows are illustrative.

 

Further studies

  • How do these probability distributions differ depending on the number of players? We’d expect people to play only stronger hands the more players there are in a game.
  • How do these probability distributions differ depending on how many blinds the player can afford to play?
  • How do these probability distributions differ depending on whether the player is a pro or an amateur?
  • Probably more important is how we can get better selection bias data in other, more consequential areas.
  • Honestly, poker’s great—but I’d much rather have statistics like this collected for Avalon! ESPN for Avalon. Looking at you Eugene.

3 thoughts on “Selection Bias in Poker”

  1. I played pro poker for 11 years and am still surprised how many biases translate into the real world quite easily from poker (even when not quantifiable).

    Data and HUDs (heads-up displays) became a must for playing multi-table online poker as far back as 2006-07. Random number generation for balancing ranges became commonplace. For example, if I decide on a certain flop with a flush draw I need to raise 1/3 and call 2/3 then an RNG is useful to make sure I stay unexploitable with my preferences. More and more those calculations are being made by tools like https://www.piosolver.com/ to play game theory optimal.

    Nice write up. Invite me to the next home game though 🙂

    1. Is there a good writeup on the evolution of these kinds of aids for players? Would love to read

      Love these specific built apps like piosolver.

      And haha–you can’t lead with all these and ever be invited to a home game😂
