Minimax with Α/Β Pruning

Notation / Algorithm
Node kinds / Return values
Search trees / Predictions

> Notation

We’ll notate the successors of a node in a game tree or search tree with node·succ[k] from k = 0 to k = count(node·succ) − 1. Nodes are subtly different than positions; a position may appear multiple times in a game tree or search tree due to repetitions or transpositions, but a node only appears once and has a unique predecessor. The initial value of alpha at node is node·alpha[0], the value of alpha immediately before the k-th successor is searched is node·alpha[k], and the value of alpha immediately after the k-th successor is searched is node·alpha[k+1]. The value of beta at node is node·beta .

Note that node·succ[0]·alpha[0] = −node·beta and node·succ[0]·beta = −node·alpha[0], and more generally, node·succ[k]·alpha[0] = −node·beta and node·succ[k]·beta = −node·alpha[k]. In other words, we’ll be using the negamax form of minimax. (I’m of the opinion that negamax is an implementation of minimax, not a distinct algorithm in its own right.)

The return value of the search of node is node·ret . The maximum of −node·succ[u]·ret from u = 0 to u = k is node·best[k], so that

node·ret

= node·best[s−1]

= max(−node·succ[u]·ret

for

u = 0 ,

. . . , s

−1)

where s is the number of successors searched and s−1 is the index of the last successor that was searched. Note that s may be less than count(node·succ).

The score of node is node·score. We will always use the term “score” to refer to an evaluation (for terminal nodes) or the theoretical result of a minimax search without α/β pruning. We will not use it as a synonym for the return value of a search with pruning.

> Algorithm

Using this terminology, minimax with α/β pruning is

def search!(node)
  if count(node.succ) = 0
    node.ret = evaluate(node.position)
    return
  end
  mutable retval ← unknown
  for k from 0 to count(node.succ) − 1
    node.succ[k].alpha[0] = −node.beta
    node.succ[k].beta = −node.alpha[0]
    search!(node.succ[k])
    if k = 0
      node.best[0] = −node.succ[0].ret
    else
      node.best[k] = max(node.best[k−1], −node.succ[k].ret)
    end
    retval ← node.best[k]
    break if retval * node.beta
    node.alpha[k+1] = max(node.alpha[k], −node.succ[k].ret)
  end
  node.ret = retval
end

where * is either > or ≥ (this is the cutoff criterion). When the flow of execution breaks out of the loop, we say that a cutoff has occurred and the k-th successor caused the cutoff. Note that we break before alpha is updated (that is, we never define alpha[k+1] when the k-th successor causes a cutoff) in order to maintain the invariant that values of alpha are always less than or equal to beta. This is fairly irrelevant, as alpha[s] is not used in any case.

In more typical presentation, where there is not an explicit object for each node, this is

def search(position, alpha, beta)
  iterator = moves(position)
  mutable move ← next!(iterator)
  return evaluate(position) if move = none
  mutable alpha ← alpha
  mutable retval ← unknown
  while move ≠ none
    x = −search(apply(position, move), −beta, −alpha)
    retval ← x if retval = unknown or x > retval
    break if retval * beta
    alpha ← max(alpha, x)
    move ← next!(iterator)
  end
  return retval
end

where * is either > or ≥ again.

> Kinds of nodes

Every node is classified as being one of three kinds: Pv, Cut, or All. The name “pv” is somewhat unfortunate, because a pv node is not necessarily part of the principal variation of any of its ancestors. (Only when move ordering is perfect is every pv node part of the pv of its predecessor.) The kind of a node cannot generally be determined until after the node has been searched (however, it can be predicted).

Pv nodes

A node is a pv node when the negated return value of at least one successor meets or exceeds alpha and none of its successors cause a cutoff. In other words, none of the negated return values of its successors exceed beta (or if the cutoff criterion is ret ≥ beta rather than ret > beta , none meet or exceed beta). This implies that all its successors are searched.

A pv node has a return value that is exact: the return value is greater than or equal to alpha and less than or equal to beta, and it is equal to the score of the node.

                                       ╭──┴──╮
                                       │  w  │
                                       ╰──┬──╯
               ┌─────────────────┬────────┴────────┬──╶╶╶
    w.beta=+7  │      w.beta=+7  │      w.beta=+7  │
w.alpha[0]=−3  │  w.alpha[1]=−3  │  w.alpha[2]=+5  │
               │   w.best[0]=−4  │   w.best[1]=+5  │
╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴
               │                 │x.ret=−5         │
               │        x.beta=+3│                 │
               │    x.alpha[0]=−7│                 │
            ╭╴╴┴╶╶╮           ╭──┴──╮           ╭╴╴┴╶╶╮
            ╷     ╷           │  x  │           ╷     ╷
            ╵     ╵           │ p-v │           ╵     ╵
            ╰╴╴┬╶╶╯           ╰──┬──╯           ╰╴╴┬╶╶╯
                                 │
                        ┌────────┴────────┐
             x.beta=+3  │      x.beta=+3  │      x.beta=+3
         x.alpha[0]=−7  │  x.alpha[1]=−5  │  x.alpha[2]=−5
                        │   x.best[0]=−5  │   x.best[1]=−5
╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴
                        │y.ret=+5         │z.ret=+9
               y.beta=+7│        z.beta=+5│
           y.alpha[0]=−3│    z.alpha[0]=−3│
                     ╭──┴──╮           ╭──┴──╮
                     │  y  │           │  z  │
                     ╰──┬──╯           ╰──┬──╯

Cut nodes

A node is a cut node when one of its successors causes a cutoff. This implies that some of its successors may not be searched.

A cut node has a return value that is

greater than beta, from its own perspective (because one of its successors caused a cutoff).
less than alpha, from the perspective of its predecessor (so it fails to raise alpha).

Or greater than or equal to beta and less than or equal to alpha, respectively, if the cutoff criterion is ret ≥ beta rather than ret > beta .

The score of a cut node is unknown; the return value is a bound on its score. Specifically, the score of a cut node is

greater than its return value, from its own perspective.
less than its return value, from the perspective of its predecessor.

Or greater than or equal to and less than or equal to, respectively, if the cutoff criterion is ret ≥ beta rather than ret > beta .

                                       ╭──┴──╮
                                       │  w  │
                                       ╰──┬──╯
               ┌─────────────────┬────────┴────────┬──╶╶╶
    w.beta=+7  │      w.beta=+7  │      w.beta=+7  │
w.alpha[0]=−3  │  w.alpha[1]=−3  │  w.alpha[2]=−3  │
               │   w.best[0]=−4  │   w.best[1]=−4  │
╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴
               │                 │x.ret=+5         │
               │        x.beta=+3│                 │
               │    x.alpha[0]=−7│                 │
            ╭╴╴┴╶╶╮           ╭──┴──╮           ╭╴╴┴╶╶╮
            ╷     ╷           │  x  │           ╷     ╷
            ╵     ╵           │ cut │           ╵     ╵
            ╰╴╴┬╶╶╯           ╰──┬──╯           ╰╴╴┬╶╶╯
                                 │
                        ┌────────┴────────┐
             x.beta=+3  │                 │
         x.alpha[0]=−7  │                 │
                        │   x.best[0]=+5  │
╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴
                        │y.ret=−5
               y.beta=+7│
           y.alpha[0]=−3│
                     ╭──┴──╮
                     │  y  │
                     ╰──┬──╯

All nodes

A node is an all node when none of the negated return values of its successors meet or raise alpha. This implies that all of its successors are searched.

An all node has a return value that is

less than alpha, from its own perspective (because none of its successors raised alpha).
greater than beta, from the perspective of its predecessor (so it causes a cutoff).

Or less than or equal to alpha and greater than or equal to beta, respectively, depending on the cutoff criterion, as above.

The score of an all node is unknown; the return value is a bound on its score. Specifically, the score of an all node is

less than its return value, from its own perspective.
greater than its return value, from the perspective of its predecessor.

Or less than or equal to and greater than or equal to, respectively, depending on the cutoff criterion, as above.

                                       ╭──┴──╮
                                       │  w  │
                                       ╰──┬──╯
               ┌─────────────────┬────────┴────────┐
    w.beta=+7  │      w.beta=+7  │                 │
w.alpha[0]=−3  │  w.alpha[1]=−3  │                 │
               │   w.best[0]=−4  │   w.best[1]=+8  │
╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴
               │                 │x.ret=−8
               │        x.beta=+3│
               │    x.alpha[0]=−7│
            ╭╴╴┴╶╶╮           ╭──┴──╮
            ╷     ╷           │  x  │
            ╵     ╵           │ all │
            ╰╴╴┬╶╶╯           ╰──┬──╯
                                 │
                        ┌────────┴────────┐
             x.beta=+3  │      x.beta=+3  │      x.beta=+2
         x.alpha[0]=−7  │  x.alpha[1]=−7  │  x.alpha[2]=−7
                        │   x.best[0]=−8  │   x.best[1]=−8
╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴
                        │y.ret=+8         │z.ret=+11
               y.beta=+7│        z.beta=+7│
           y.alpha[0]=−3│    z.alpha[0]=−3│
                     ╭──┴──╮           ╭──┴──╮
                     │  y  │           │  z  │
                     ╰──┬──╯           ╰──┬──╯

> Values returned by search

`alpha[0] < ret < beta`

When the value returned by a search is strictly between the initial value of alpha and beta, the value is exact (the score is the return value).

`ret > beta`

When the value returned by a search is strictly above beta, the value is a lower bound of the score (but the score is unknown). In other words, the score is the return value or greater.

The example below shows a search with pruning on the left and a search without it on the right.

            x.beta=+3 │ x.ret=+5                      │ x.score=+9
        x.alpha[0]=−7 │                               │
                   ╭──┴──╮                         ╭──┴──╮
                   │  x  │                         │  x  │
                   │ cut │                         ╰──┬──╯
                   ╰──┬──╯                            │
              ┌───────┴───────┐               ┌───────┴───────┐
    x.beta=+3 │               │               │               │
x.alpha[0]=−7 │ x.best[0]=+5  │               │ x.best[0]=+5  │ x.best[1]=+9
╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴      ╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴
    y.beta=+7 │ y.ret=−5                      │ y.score=−5    │ z.score=−9
y.alpha[0]=−3 │                               │               │
           ╭──┴──╮                         ╭──┴──╮         ╭──┴──╮
           │  y  │                         │  y  │         │  z  │
           ╰─────╯                         ╰─────╯         ╰─────╯

Here the search with pruning return +5, but the score is actually +9.

`ret < alpha[0]`

When the value returned by a search is strictly below the initial value of alpha, the value is an upper bound of the score (and the score is unknown). In other words, the score is the return value or less.

The example below shows a search with pruning on the left and a search without it on the right.

            x.beta=+7 │ x.ret=−5                      │ x.score=−9
        x.alpha[0]=−3 │                               │
                   ╭──┴──╮                         ╭──┴──╮
                   │  x  │                         │  x  │
                   │ all │                         ╰──┬──╯
                   ╰──┬──╯                            │
            x.beta=+7 │                               │
        x.alpha[0]=−3 │ x.best[0]=−5                  │ x.best[0]=−9
╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴      ╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴
            y.beta=+3 │ y.ret=+5                      │ y.score=+9
        y.alpha[0]=−7 │                               │
                   ╭──┴──╮                         ╭──┴──╮
                   │  y  │                         │  y  │
                   ╰──┬──╯                         ╰──┬──╯
              ┌───────┴───────┐               ┌───────┴───────┐
    y.beta=+3 │               │               │               │
y.alpha[0]=−7 │ y.best[0]=+5  │               │ y.best[0]=+5  │ &.best[1]=+9
╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴      ╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴│╴╴╴╴╴╴╴╴╴╴╴╴╴╴
    z.beta=+7 │ z.ret=−5                      │ y.score=−5    │ &.score=−9
z.alpha[0]=−3 │                               │               │
           ╭──┴──╮                         ╭──┴──╮         ╭──┴──╮
           │  z  │                         │  z  │         │  &  │
           ╰─────╯                         ╰─────╯         ╰─────╯

Here the search with pruning return −5, but the score is actually −9.

`ret = alpha[0]` or `ret = beta`

When the value returned by a search is equal to alpha or beta, whether the value is exact or a bound of the score depends on the criterion for cutoff.

When the criterion for cutoff is ret > beta, a return value equal to alpha[0] or beta is exact. A position is unreachable when ret > beta. For theoretical matters, this is the convention I prefer. For example, it implies that when alpha = beta we know the score exactly, as one would intuitively expect; after all, if alpha[0] ≤ score ≤ beta and alpha[0] = beta, then alpha[0] = score = beta. But this is not necessarily true if the cutoff criterion is ret ≥ beta because the position may be unreachable when ret = beta, and score ≤ beta does not hold for unreachable positions!

When the criterion for cutoff is ret ≥ beta, whether a return value equal to alpha[0] or beta is exact or a bound is unknown. A position is unreachable when ret > beta and may be unreachable when ret = beta.

However, when the cutoff criterion is ret ≥ beta, whether the return value is exact or a bound (upper or lower) can be determined if an additional field is also returned:

def invert(d)
  ※ if upper, exact, and lower are represented with
  ※ −1, 0, and +1 respectively, this is simply −d
  return upper if d = lower
  return lower if d = upper
  return exact
end

def join(d0, d1)
  ※ if upper, exact, and lower are represented with
  ※ −1, 0, and +1 respectively, this is simply max(d0, d1)
  return lower if d0 = lower or d1 = lower
  return exact if d0 = exact or d1 = exact
  return upper
end

def search(position, alpha, beta)
  iterator = moves(position)
  mutable move ← next!(iterator)
  return ⟨evaluate(position), exact⟩ if move = none
  mutable alpha ← alpha
  mutable retval ← unknown
  mutable bound ← unknown
  while move ≠ none
    x', d' = search(apply(position, move), −beta, −alpha)
    x, d = −x', invert(d')
    if retval = unknown or x ≥ retval
      when x = retval then bound ← join(bound, d)
      when x > retval then bound ← d
      retval ← x
    end
    move ← next!(iterator)
    break if retval ≥ beta
    alpha ← max(alpha, x)
  end
  ※ move ≠ none means not all successors were searched
  bound ← lower if retval ≥ beta and move ≠ none
  return ⟨retval, bound⟩
end

> What search trees look like

With perfect move ordering, search subtrees have this structure:

     p-v           cut         all
 ┌────┼────┐        │        ┌──┴──┐
p-v  cut  ...      all      cut   ...

where nodes are searched from left to right and an ellipsis indicates zero or more nodes of the preceding type.

With imperfect move ordering, they have this structure:

              p-v                       cut            all
  ┌───────┬────┼────┬────┐        ┌──────┼────┐      ┌──┴──┐
p-v/cut  ...  p-v  cut  ...    p-v/cut  ...  all    cut   ...

A tree with perfect move ordering then looks like:

                                        p-v
                    ┌───────────┬────────┼────────┐
                   p-v         cut      cut      cut
        ┌──────┬────┼────┐      │        │        │
       p-v    cut  cut  cut    all      all      all
 ┌───┬──┼──┐   │    │    │   ┌──┼──┐  ┌──┼──┐  ┌──┼──┐
p-v  c  c  c  all  all  all  c  c  c  c  c  c  c  c  c
 ╎   ╎  ╎  ╎   ╎    ╎    ╎   ╎  ╎  ╎  ╎  ╎  ╎  ╎  ╎  ╎

Actual search trees are quite good: 90+% of cut nodes have only a single successor and in the majority of pv nodes the first successor is the best move. This is partially explained by iterative deepening with a transposition table: the tree is continually being reëvaluated, and at each iteration the best moves are now ordered first. And since deeper searches are more likely to be accurate, the closer a node is to the root, the more likely it is for its first move to be ordered correctly. So the interior of the tree looks nearly ideal, but one might expect the leaves to be garbage. Over the next few iterations they will be massively refined, but they will no longer be leaf nodes. What’s amazing, though, is that the overall statistics are still good even given the fact that ordinarily half or more of all nodes are leaf nodes.

> Predicting the kind of a node

If you believe a node is an all node, you should predict that its successors are cut nodes.

If you believe a node is a cut node, you should predict that its successors are all nodes.

If you believe a node is a pv node, you should predict that its first successor is a pv node and that every subsequent successor is a cut node. There are conditions under which you might revise your beliefs:

If you strongly believe the move ordering is perfect and the first successor fails to raise alpha (meaning it was a cut node), then you should consider this node to instead be an all node.
If you suspect the move ordering is imperfect (or you strongly believe this is in fact a pv node) and the first successor fails to raise alpha, then you should predict the next successor is a pv node (and not a cut node) and that every successor thereafter is a cut node (and so on until a successor raises alpha).

{注意} When discussing principal variation search, the terms pv and cut are commonly used to mean expected-pv and expected-cut (and a different decision procedure is used to predict the successors of pv nodes), or the terms are simply conflated with whether zero-window pruning is attempted or not.