Project Description

Probabilistic Data structures are powerful when dealing with heavy data processing.

They became more popular with the expansion of distributed data stores technologies such as Hadoop, Cassandra, Redis etc.. Nevertheless, they're still little-known and considered as "exotic".

The adavantage of these randomly balanced trees is that their implementation is simpler than the very famous "Red Black Trees" and they give a good performance-memory consumption tradeoff.

Note that the rebalancing of these trees has less cost than for R-B Trees

Treap

It's simply a binary Tree with properties of a heap (for each node, its childs priorities are superior to its priority). This tree is just balanced by keeping its heap structure stable, and priorities are chosen randomly.

In my implementation ,the tree-traversal function does not use recursion at all neither stack data structure, which makes code less readable but more performant and less memory consuming.

Otherwhise, functions such as delete and add use recursion. To completely avoid this when doing binary tree traversal, we have to use additional data structures like stacks or lists to track visited nodes. Or tree node has to have extra metadata such as a reference to its parent and have a "visited" flag.

SkipList

A skip list represents an ordered set of (key,value) pairs. It's simply a linked list with extra "levels" of increasingly sparse linked lists.


Related Papers

- http://msdn.microsoft.com/en-us/library/ms379573%28v=vs.80%29.aspx#datastructures204topic4
- Original skiplist paper : ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf
- Other Probabilistic structures http://pages.cs.wisc.edu/~sekar/research/swat98.pdf

Last edited Aug 23, 2013 at 3:09 PM by yacineb, version 4