Publications

ART That Lasts: Persistent Multiversion Adaptive Radix Trees with Fast Atomic Range Queries

To appear in ACM SIGMOD International Conference on Management of Data (SIGMOD), June 2026

Indexes are essential for efficient query processing in database systems, with ordered indexes particularly suited for supporting range queries. Non-volatile memory (NVM) technology offers fast, byte-addressable access and persistence across system failures, enabling rapid recovery without costly index reconstruction. While recent efforts have focused on building durable NVM-based indexes for point queries and updates, they often lack efficient, correct support for concurrent range queries. We present PermART, a persistent, versioned adaptive radix tree that surpasses current NVM-based indexes in point query and update performance and supports efficient, consistent range queries under concurrency. By integrating multiversioning into ART, we enable efficient, linearizable range queries while improving point-operation performance through in-node logging. Our evaluation shows that PermART matches or exceeds the performance of current NVM-based indexes for point queries and updates, while offering correct, linearizable range queries, which existing indexes do not provide.
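The core idea of using multiversioning for linearizable range scans can be illustrated with a toy in-memory sketch: writers append timestamped versions, and a scan fixes a snapshot timestamp so it observes exactly the committed state at that instant, regardless of concurrent updates. This is illustrative only; PermART's actual radix-tree layout, in-node logging, and NVM persistence are not modeled, and all names below are hypothetical.

```python
import itertools

class MultiversionIndex:
    """Toy ordered key-value index with per-key version chains."""

    def __init__(self):
        self._clock = itertools.count(1)   # logical timestamp source
        self._versions = {}                # key -> list of (ts, value or None)

    def put(self, key, value):
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def delete(self, key):
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, None))  # tombstone
        return ts

    def snapshot(self):
        """Fix a snapshot timestamp; later writes are invisible to it."""
        return next(self._clock)

    def range_scan(self, lo, hi, snap):
        """Return (key, value) pairs in [lo, hi] as of snapshot `snap`."""
        out = []
        for key in sorted(self._versions):
            if lo <= key <= hi:
                visible = [v for ts, v in self._versions[key] if ts <= snap]
                if visible and visible[-1] is not None:
                    out.append((key, visible[-1]))
        return out
```

A scan holding an old snapshot keeps returning the state it started from even after concurrent updates commit, which is the consistency property the paper's range queries provide.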

[Paper]

Sampling-based Predictive Database Buffer Management

Published in International Conference on Very Large Data Bases (VLDB), January 2026

Systems often need to support analytical (OLAP) workloads that perform concurrent scans of data on secondary storage. The buffer manager is tasked with fetching data into the database system’s buffer pool and caching it there so as to increase the hit rate on these data, thereby lowering query latencies. This paper presents a database buffer caching policy that uses information about long-running scans to estimate future accesses. These estimates are used to approximate an optimal buffer caching policy that would otherwise infeasibly require knowledge of future accesses. Since a buffer caching policy must be efficient with low overhead, we present sampling-based predictive buffer management techniques in which buffer eviction considers only a small random sample of buffers and access time estimates are used to select a victim from the sample. This design is advantageous as it is easily tuned by adjusting the sample size, and easily modified to improve access time estimates and to expand the set of workload types that can be predicted effectively. We evaluate our techniques both through simulation studies on real Amazon Redshift workload traces and through an implementation in the open-source PostgreSQL database system on the TPC-H and YCSB benchmarks. We show that our approach delivers substantial performance improvements for workloads with scans, reducing I/O volume by up to 40% over PostgreSQL’s clock-sweep policy and over prior predictive approaches for workloads using sequential scans and index accesses.
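The sampling-based eviction step described above can be sketched in a few lines; the function and parameter names here are illustrative, not the paper's API. The idea is that evicting the sampled page with the farthest predicted next access approximates Belady's optimal policy while examining only a constant number of buffers per eviction.

```python
import random

def pick_eviction_victim(pool, estimated_next_access, sample_size=16):
    """Choose an eviction victim from a small random sample of the pool.

    `estimated_next_access(page)` returns the predicted time of the page's
    next access, derived (in the paper's setting) from the progress of
    long-running scans; a page no scan is expected to touch again can
    report float('inf'). Only `sample_size` buffers are examined, so the
    per-eviction overhead is constant and tunable via the sample size.
    """
    sample = random.sample(pool, min(sample_size, len(pool)))
    return max(sample, key=estimated_next_access)
```

Raising the sample size trades eviction cost for closeness to the (infeasible) clairvoyant policy, which is the tuning knob the abstract refers to.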

[Paper]

Practical Hardware Transactional vEB Trees

Published in Principles and Practice of Parallel Programming (PPoPP), March 2024

van Emde Boas (vEB) trees are sequential data structures optimized for extremely fast predecessor and successor queries. Such queries are an important incentive to use ordered sets or maps such as vEB trees. All operations in a vEB tree are doubly logarithmic in the universe size. Attempts to implement concurrent vEB trees have either simplified their structure in a way that eliminated their ability to perform fast predecessor and successor queries, or have otherwise compromised on doubly logarithmic complexity. In this work, we leverage Hardware Transactional Memory (HTM) to implement vEB tree-based sets and maps in which operations are doubly logarithmic in the absence of contention. Our proposed concurrent vEB tree is the first to implement recursive summaries, the key algorithmic component of fast predecessor and successor operations. Through extensive experiments, we demonstrate that our algorithm outperforms state-of-the-art concurrent maps by an average of 5× in a moderately skewed workload, and the single-threaded C++ standard ordered map and its unordered map by 70% and 14%, respectively. It does so while using two orders of magnitude less memory than traditional vEB trees.
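The recursive summaries the abstract highlights can be seen in a minimal sketch of the classic sequential vEB tree (not the paper's HTM-based concurrent version): each node stores its min/max directly, splits remaining keys into clusters by their high-order bits, and keeps a recursive summary over the non-empty clusters, which is what makes predecessor queries O(log log u); successor is symmetric.

```python
class VEB:
    """Sequential van Emde Boas tree over the universe {0, ..., 2**bits - 1}."""

    def __init__(self, bits):
        self.bits = bits
        self.min = None
        self.max = None
        if bits > 1:
            self.lo_bits = bits // 2
            self.hi_bits = bits - self.lo_bits
            self.summary = None   # VEB(hi_bits) over non-empty cluster indices
            self.clusters = {}    # cluster index -> VEB(lo_bits)

    def _split(self, x):
        return x >> self.lo_bits, x & ((1 << self.lo_bits) - 1)

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x          # min is kept out of the clusters
            return
        if x == self.min:
            return
        if x < self.min:
            self.min, x = x, self.min        # push the old min down instead
        if x > self.max:
            self.max = x
        if self.bits > 1:
            hi, lo = self._split(x)
            if hi not in self.clusters:
                self.clusters[hi] = VEB(self.lo_bits)
                if self.summary is None:
                    self.summary = VEB(self.hi_bits)
                self.summary.insert(hi)      # record the newly non-empty cluster
            self.clusters[hi].insert(lo)

    def predecessor(self, x):
        """Largest stored key strictly less than x, or None."""
        if self.max is not None and x > self.max:
            return self.max
        if self.min is None or x <= self.min:
            return None
        if self.bits == 1:
            return self.min                  # here min < x <= max
        hi, lo = self._split(x)
        cluster = self.clusters.get(hi)
        if cluster is not None and cluster.min is not None and lo > cluster.min:
            return (hi << self.lo_bits) | cluster.predecessor(lo)
        # Jump to the nearest non-empty cluster to the left, found via the
        # recursive summary rather than a linear scan over clusters.
        prev = self.summary.predecessor(hi) if self.summary is not None else None
        if prev is None:
            return self.min                  # only the node's min lies below x
        return (prev << self.lo_bits) | self.clusters[prev].max
```

Each predecessor call recurses into exactly one substructure (a cluster or the summary) whose universe has half the bits, giving the doubly logarithmic bound; the concurrent version in the paper makes these recursive steps safe under HTM.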

[Paper] | [Slides]