This is the first in a series of posts on new features that will be introduced with our 1.1 release. That release will feature support for much of SPARQL 1.1 and includes sophisticated new data structures and algorithms for fast, scalable analytic queries. The highlights of our forthcoming 1.1 release are:
- SPARQL 1.1 support (Sesame 2.5) (except property paths, minus and update).
- A new query optimizer (the RTO).
- Scalable analytic operators (hash joins).
- New extensible hash tree index (the HTree).
- A 100% native Java solution that gets data off the JVM object heap (the Memory Manager).
This article will begin at the bottom of the stack, focusing on the memory manager and its role in supporting scalable query.
Due to byte code optimization, Java applications can run as fast as hand-coded C applications. However, there is a hidden cost associated with Java applications: the maintenance of the JVM object heap. For many applications the cost of object creation, retention, and garbage collection may be negligible. However, as illustrated in the diagram below, an application with a high object creation and retention rate can get into trouble with the Garbage Collector.
As the application-induced “heap pressure” increases, the garbage collector must run more and more frequently. Depending on the mode in which the garbage collector is running, it may either take cores away from the application or freeze out the application entirely during GC cycles. As the application heap pressure increases, the GC cycle time increases as well. Eventually, the garbage collector runs more than the application and application throughput plummets. Larger heaps can only mask this problem temporarily, since larger heaps require more GC time.
With our 1.1 release, we are introducing the Bigdata MemoryManager class. We have been using the NIO package and native buffer allocations for some time in the clustered database, but with the introduction of the MemoryManager and the MemStore, these native heap allocations are now easily reused for other purposes. The memory manager utilizes the Java NIO package to allocate large blocks of RAM on the native process heap using ByteBuffer.allocateDirect(…) and thus remains a 100% Java solution! These allocations are outside of the JVM managed memory and impose NO GC overhead. They are basically regions of the native C heap allocated by malloc inside the JVM process. Such allocations cannot be deterministically released, so we maintain a pool of such native buffers and recycle buffers once they are no longer required for a specific purpose.
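The pooling idea described above can be sketched in a few lines of Java. This is only an illustration of recycling direct buffers instead of waiting for them to be released; the class name DirectBufferPoolSketch and its methods are hypothetical and are not the Bigdata implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical pool of fixed-size direct buffers (illustration only, not a Bigdata class). */
final class DirectBufferPoolSketch {
    private final int bufferCapacity;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private int allocated = 0;

    DirectBufferPoolSketch(int bufferCapacity) {
        this.bufferCapacity = bufferCapacity;
    }

    /** Reuse a recycled buffer when one is available, otherwise allocate off the JVM heap. */
    synchronized ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        if (b == null) {
            b = ByteBuffer.allocateDirect(bufferCapacity);
            allocated++;
        }
        b.clear(); // resets position and limit; it does not zero the contents
        return b;
    }

    /** Return a buffer to the pool so that a later acquire() can reuse it. */
    synchronized void release(ByteBuffer b) {
        if (!b.isDirect() || b.capacity() != bufferCapacity) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        free.push(b);
    }

    /** Number of native buffers allocated so far (recycled buffers are not counted twice). */
    synchronized int allocatedCount() {
        return allocated;
    }
}
```

A caller would acquire() a buffer, use it, and release() it when done. Because released buffers are handed out again, the number of native allocations stays bounded by the peak demand rather than growing with the number of requests.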
The memory manager is basically the RWStore technology repackaged for main memory. Like the RWStore, it can scale to terabytes. The key interface is com.bigdata.rwstore.sector.IMemoryManager. The IMemoryManager interface provides for hierarchical nesting of “allocation contexts” which share the same pool of backing buffers. This hierarchical allocation model makes it easier to ensure that allocations for the same purpose are grouped together on the same native buffers, and that all allocations are released no later than when their top allocation context goes out of scope. For example, an IRunningQuery on the QueryEngine has a top-level allocation context. Various operators in the query plan may create inner allocation contexts in which they store data, typically on an HTree instance. Since we always clear the top-level allocation context when the IRunningQuery is done, all such hash indices will be released no later than when the IRunningQuery is done.
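To make the nesting concrete, here is a toy model of hierarchical allocation contexts. It only tracks byte counts and does not touch native memory; the class AllocationContextSketch and its methods are hypothetical and do not reflect the actual IMemoryManager signatures. The point it shows is that clearing the top-level context also clears everything allocated in its inner contexts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical allocation context; models nesting only, not the IMemoryManager API. */
final class AllocationContextSketch {
    private final List<AllocationContextSketch> children = new ArrayList<>();
    private long bytesHeld = 0;

    /** Create an inner context, e.g. one per operator in a query plan. */
    AllocationContextSketch createChild() {
        AllocationContextSketch child = new AllocationContextSketch();
        children.add(child);
        return child;
    }

    /** Record an allocation made directly in this context. */
    void allocate(int nbytes) {
        if (nbytes <= 0) {
            throw new IllegalArgumentException("nbytes must be positive");
        }
        bytesHeld += nbytes;
    }

    /** Bytes held by this context plus all nested contexts. */
    long totalBytes() {
        long total = bytesHeld;
        for (AllocationContextSketch child : children) {
            total += child.totalBytes();
        }
        return total;
    }

    /** Release everything in this context and, recursively, in all inner contexts. */
    void clear() {
        for (AllocationContextSketch child : children) {
            child.clear();
        }
        children.clear();
        bytesHeld = 0;
    }
}
```

In this model, a query would own the root context and each operator would call createChild() on it. A single clear() on the root when the query finishes is then enough to release every operator's allocations, even if an operator never cleaned up after itself.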
The IMemoryManager is a low-level interface which manages a logical address space over native buffers and provides methods to allocate, read, and delete slices of data on those buffers. The MemStore class in the same package provides a higher-level IRawStore abstraction, the same abstraction which is used by the journal and the B+Tree and HTree classes. The MemStore makes it easy to use persistence-capable data structures over native buffers. The main use of the MemStore in the 1.1 release is to support high-level language constructs such as DISTINCT, scalable default graph evaluation, and highly efficient hash join operators. All of those features rely on the HTree operating over a MemStore. However, other applications are certainly possible, including very large application-specific caches.
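The notion of a logical address space over a native buffer can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class SliceStoreSketch, its method names, and the choice to pack the offset and length of a slice into a single long are not taken from the Bigdata code. The sketch uses a simple bump allocator and does not reclaim the space of deleted slices, which a real allocator would do.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical slice store over one direct buffer (illustration only). */
final class SliceStoreSketch {
    private final ByteBuffer buf;
    private final Set<Long> live = new HashSet<>();
    private int next = 0; // next free byte offset (bump allocation)

    SliceStoreSketch(int capacity) {
        buf = ByteBuffer.allocateDirect(capacity);
    }

    /** Copy the data into the buffer and return a logical address for the slice. */
    long write(byte[] data) {
        if (data.length == 0 || data.length > buf.capacity() - next) {
            throw new IllegalStateException("empty record or store is full");
        }
        int offset = next;
        ByteBuffer view = buf.duplicate(); // independent position, shared content
        view.position(offset);
        view.put(data);
        next += data.length;
        // Assumed encoding: offset in the high 32 bits, length in the low 32 bits.
        long addr = ((long) offset << 32) | data.length;
        live.add(addr);
        return addr;
    }

    /** Read back the slice identified by a logical address. */
    byte[] read(long addr) {
        if (!live.contains(addr)) {
            throw new IllegalArgumentException("address is not allocated");
        }
        int offset = (int) (addr >>> 32);
        int length = (int) addr;
        byte[] out = new byte[length];
        ByteBuffer view = buf.duplicate();
        view.position(offset);
        view.get(out);
        return out;
    }

    /** Mark the slice as deleted; this sketch does not reuse the freed space. */
    void delete(long addr) {
        if (!live.remove(addr)) {
            throw new IllegalArgumentException("address is not allocated");
        }
    }
}
```

Callers hold only the long address, never a reference to the bytes themselves, so the record data contributes nothing to the JVM object heap beyond the address. A higher-level structure such as an index can then store these addresses in place of object references.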