Published on
Tue Feb 14, 2017

Python Priority Queues with priorityq

The power of Nostalgia

I really miss Weewar! Weewar for those who don’t remember was a very very simple and light themed, turn-based, strategy war game. There have been many clones since then but nothing comes close. Here’s a video to give an idea of the mayhem:

So it all came flooding back and I embarked on what I would do when I am flooded with nostalgia. I recreate it! Plus now I had the added incentive of exposing my kids to some good old fashioned fun and strategy!

Getting there fast!

The first thing I wanted to teach my kid was around the concept of finding your way, say in a maze. And here again, Weewar provides us with good life lessons. Weewar, like other turn-based war games, offers different kinds of units (eg Soldier, Tank, Plane, Mortar etc) that, like other turn-based war games, can move around on different kinds of terrain (eg river, mountain, plains, swamp etc). Each unit has several movement points (MP) and each terrain type has different costs (that would eat away at a unit’s movement cost). So clearly it is strategically paramount that you get your units to the enemy base as quickly as possible.

Given the different costs and movement points and speeds, it requires a bit of careful thought. You cannot adopt an easy approach where you pick the terrain with the best cost and hope it would get you there as that could be a trap that leads you to a “costly” mountain! There needs to be a way to compute the cheapest/shortest path between a given two points and this is where Dijkstra’s shortest path algorithm comes in! It looks something like (courtesy of Skienna’s The Algorithm Design Manual

 1def ShortestPath-Dijkstra(G,s,t):
 2    known = {s}
 3    for i = 1 to n, dist[i] = 
 4    for each edge (s,v), dist[v] = w(s,v)
 5    last = s
 6    while (last != t)
 7        select vnext, the unknown vertex minimizing dist[v]
 8        for each edge (vnext,x), dist[x] = min[dist[x],dist[vnext] + w(vnext,x)]
 9        last = vnext
10        known = known  {vnext}

In line 7, in order to find a vertex with the smallest distance, a Priority Queue is usually used. And in line 8, as the distances of new nodes are modified, they will be reorganized within the Priority Queue based on these distances (this is the decrease-key method on a Priority Queue).

Current hurdles

A python implementation of the above algorithm highlights two problems:

  1. Finding an element is inefficient (O(n))
  2. Adjusting the position of an item in the heap after the item has been modified (the decrease-key operation) is not possible.

All that can be done now is the child node will have to be removed (which is an O(n) operation) and then have to be re-inserted into the heap. In the python’s heapq module a node cannot also be easily removed.

PriorityQ to the rescue

To fix this I have developed the priorityq python module. This module has the following advantages over the traditional heapq module:

  1. Finding a value is an O(1) operation
  2. Once a value is updated, it can be re-adjusted without a removal.
  3. The heap storage strategy can be changed transparently without needing to peek into the internals of the implementation.
  4. Seamless integration of custom comparators.

The new implementation with the priorityq module is very simple and can be found here.

How it works?

The reason this works is delightfully simple. Current implementations refer directly to the object that is stored in the heap. What PriorityQ does is slightly different. Instead of dealing with the objects directly, the objects are wrapped in an opaque handle which is what is returned to the user (of the library).

The user, in turn, can pass these handles back to the queue to, well, do things! Though a value can be modified, the handle remains the same and more importantly, the handle (opaquely) contains all the information required by the custom heap storage implementation to find and identify an existing element.

Benchmarks

The examples folder (in the package) contains a few examples that can be used on real use cases. Thankfully the 9th DIMACS international challenge was just the arena that provided a few really good graphs.

The example can be run as follows:

python examples/dijkstra/runner.py

eg:

python examples/dijkstra/runner.py examples/dijkstra/data/USA-road-d.BAY.gr.gz   10

The above runs 10 queries over randomly selected source and destination nodes in a graph that contains about 320,000 nodes and 800,000 edges. The times (as expected) were proportional to the number of nodes processed (ie searched and adjusted). Given the different heap storage types the following times were recorded:

{' '} {' '} {' '} {' '} {' '} {' '} {' '} {' '}
Storage Strategy Total Nodes Processed Average Nodes Per Second
List Heap 3,800,404 3,610
Binary Heap 4,857,736 26,361
Binomial Heap TBD TBD
Pairing Heap TBD TBD

What coming next?

In the upcoming parts of this series, I will talk about:

  1. The JS (and Swift) ports of this library
  2. More heap storage strategies (like binomial, pairing and Fibonacci heaps).
  3. Adding more “standard” algorithms that can leverage the “find” and “decrease-key” features.

Python Heapq Module

Java PriorityQueue

PriorityQueue Java Source

The skyline problem