
Remember, this is just a collection of thoughts.

These research papers skim through fundamental computer science topics fairly quickly. Look at this terse summary:

A heap is a binary tree with a property that the key at any node has higher priority than the keys at its children (if they exist). An array representation of a heap is the most space efficient: the root of the heap occupies location 1 and the left and right children of the node at location i occupy the locations 2i and 2i+1 respectively.
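
The index arithmetic described above fits in a few lines. A quick sketch (class and method names are mine, not from the paper):

```java
public class HeapIndex {
    // 1-based array layout: root at index 1.
    static int parent(int i) { return i / 2; }      // integer division
    static int left(int i)   { return 2 * i; }
    static int right(int i)  { return 2 * i + 1; }

    public static void main(String[] args) {
        // e.g. node 5's children sit at 10 and 11, and its parent at 2
        System.out.println(left(5) + " " + right(5) + " " + parent(5));  // 10 11 2
    }
}
```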

Paper Commentary

A look at their competition:

  • Coarse-grained locking. Yawn.
  • Biswas and Browne use maintenance processes that dequeue sub-operations from a FIFO queue. Overhead.
  • Rao and Kumar use top-down insertions (to avoid deadlocks).
  • Jones uses skew heaps. What the heck are those?

So what does this paper do differently?

  1. Allows concurrent insertions and deletions in opposite directions without deadlocking.
  2. Fancy 'bit-reversal' technique to reduce contention when operating on the bottom of the heap.

This algorithm uses locks: one on the heap-size variable, and one for each node in the heap. Each node also has a tag taking one of the following values:

  • EMPTY - nothing here, move along
  • AVAILABLE - fully inserted; can be deleted or swapped
  • pid - the item here is still being moved by the process with this id
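
One way to encode these tags (my own sketch; the paper doesn't prescribe a representation) is to reserve two negative sentinels alongside real, non-negative process ids:

```java
public class NodeTag {
    // Hypothetical encoding: real pids are >= 0, sentinels are negative.
    static final int EMPTY = -1;      // nothing here, move along
    static final int AVAILABLE = -2;  // fully inserted; safe to swap/delete

    static boolean isPid(int tag) { return tag >= 0; }

    public static void main(String[] args) {
        int tag = 42;  // node currently being moved by process 42
        System.out.println(isPid(tag));  // true
    }
}
```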

Operations

Delete operation: Top down. The locks of two nodes are held at any time. The lock for the size is also held. (Q: Is the size lock held throughout the entire swapping charade? Does it matter?)

Insert operation: Bottom up. The locks of two nodes are still required, but locking upwards would cause a deadlock against top-down deletions (see below). So "in every step of the bottom-up comparison, the lock of the parent is acquired first, followed by the lock on the inserted node. After comparison and swapping (if necessary), both locks are released." This opens up a window of vulnerability: when the inserting thread comes back to re-lock its node, the item could be anywhere! This is where the tags come into play. This can be a little confusing in the paper; hopefully I can do it better service.

When inserting:

  1. If TAG(current) != pid then our item must have moved above us. Look upwards.
  2. Else if TAG(parent) == AVAILABLE then compare priorities and swap if necessary; if no swap is needed, the item is in place, so mark it AVAILABLE and finish.
  3. Else if TAG(parent) == EMPTY then the inserted node is at the root, all done.
  4. (implicit) If TAG(parent) == some other pid then release your locks to allow that other process to make progress, then retry.

Q: The paper only considers a lost item to have moved upwards, couldn't it also move downwards if another insert overtakes it?

A: Good thinking, but it can't.

Proof. Assume for the sake of contradiction that an inserting thread B has passed
inserting thread A. For B to pass A, A's node a must at some point have been above
B's node b, with A having just released its locks. For B to swap its node b with
A's node a, it must acquire a's lock and then switch the values. But since a is not
marked AVAILABLE (it still carries A's pid), B cannot perform the swap.
Contradiction.

Here's a look at the pseudo-code:

while i > 1 do
  parent := i div 2; LOCK(parent); LOCK(i)
  old_i := i; old_parent := parent   { remember which nodes we locked }
  if TAG(parent) = AVAILABLE and TAG(i) = pid then
    if PRIORITY(i) > PRIORITY(parent) then
      swap_items(i, parent); i := parent
    else
      TAG(i) := AVAILABLE; i := 0
  else if TAG(parent) = EMPTY then
    i := 0
  else if TAG(i) != pid then
    i := parent
  UNLOCK(old_i); UNLOCK(old_parent)
enddo

When a process detects another ahead of it, it releases its locks and then immediately reacquires them. Wait, but how does the other process acquire the locks? Luck, I guess. This scheme works if the lock implementation treats the aspiring lock-holders as a queue, but there's no way to guarantee this. The paper uses a test-and-test-and-set lock, which offers no such guarantee. Here's the Java TTAS lock from Herlihy's TAoMP:

public class TTASLock implements Lock {
  AtomicBoolean state = new AtomicBoolean(false);  // true = lock held
  public void lock() {
    while (true) {
      while (state.get()) {};        // spin on the cached value until the lock looks free
      if (!state.getAndSet(true))    // then try to grab it with one atomic read-modify-write
        return;                      // got it
    }
  }
  public void unlock() {
    state.set(false);
  }
}

The use of this lock means the algorithm isn't starvation-free.

The Bit Reversal in a Nutshell

The goal of the 'bit reversal technique' is to generate the following pattern (starting from 8 here): 8, 12, 10, 14, 9, 13, 11, 15

The goal of this pattern is to spread the insertions along the base of the tree. That is, reduce the number of collisions or lock-contention. It's a good idea, and worth drawing out on a sheet of paper.

Here they are in binary:

08: 1000
12: 1100
10: 1010
14: 1110
09: 1001
13: 1101
11: 1011
15: 1111

A key to understanding this bit-hack is to realize that all of these numbers live on the 4th level of the tree. This means they are all at least 8 (and less than 16). Let's ignore that most significant bit.

08 - 8: 000
12 - 8: 100
10 - 8: 010
14 - 8: 110
09 - 8: 001
13 - 8: 101
11 - 8: 011
15 - 8: 111

OK, this is starting to look familiar. Wait, bit reversal you say? See how the numbers count upwards if we 'reverse' the bits, swapping the most and least significant ones? This means that consecutive insertions are placed on different sides of the tree. The closer two indices are numerically, the fewer generations up their branches meet. For instance 8 and 9 are right next to each other (siblings), so they are more likely to collide than 8 and 12, whose paths only meet at the root. Likewise 8 and 10 have a common ancestor only 2 generations away. Bit reversal keeps consecutive insertion points numerically far apart, and therefore far apart in the tree. This is the intuition.
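
A throwaway sketch of the idea (the paper's counter maintains the reversed value incrementally; here I just reverse the low bits of a running count, which is enough to reproduce the pattern):

```java
public class BitReversal {
    // Reverse the low `bits` bits of x.
    static int reverseLowBits(int x, int bits) {
        int r = 0;
        for (int b = 0; b < bits; b++) {
            r = (r << 1) | ((x >> b) & 1);
        }
        return r;
    }

    public static void main(String[] args) {
        // The i-th insertion on the level starting at 8 (= 2^3) lands at
        // 8 + (the low 3 bits of i, reversed).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i > 0) sb.append(", ");
            sb.append(8 + reverseLowBits(i, 3));
        }
        System.out.println(sb);  // 8, 12, 10, 14, 9, 13, 11, 15
    }
}
```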