DILI: A Distribution-Driven Learned Index

Pengfei Li, Hua Lu, Rong Zhu, Bolin Ding, Long Yang, Gang Pan

Publikation: Bidrag til tidsskriftKonferenceartikelForskningpeer review

Abstract

Targeting in-memory one-dimensional search keys, we proposea novel DIstribution-driven Learned Index tree (DILI), where aconcise and computation-efficient linear regression model is usedfor each node. An internal node’s key range is equally divided by itschild nodes such that a key search enjoys perfect model predictionaccuracy to find the relevant leaf node. A leaf node uses machine learning models to generate searchable data layout and thus accurately predicts the data record position for a key. To construct DILI, we first build a bottom-up tree with linear regression models according to global and local key distributions. Using the bottom-up tree, we build DILI in a top-down manner, individualizing the fanouts for internal nodes according to local distributions. DILI strikes a good balance between the number of leaf nodes and the height of the tree, two critical factors of key search time. Moreover, wedesign flexible algorithms for DILI to efficiently insert and delete keys and automatically adjust the tree structure when necessary. Extensive experimental results show that DILI outperforms the state-of-the-art alternatives on different kinds of workloads.

OriginalsprogEngelsk
TidsskriftProceedings of the VLDB Endowment
Vol/bind16
Udgave nummer9
Sider (fra-til)2212-2224
Antal sider13
ISSN2150-8097
DOI
StatusUdgivet - 2023
Begivenhed49th International Conference on Very Large Data Bases, VLDB 2023 - Vancouver, Canada
Varighed: 28 aug. 20231 sep. 2023

Konference

Konference49th International Conference on Very Large Data Bases, VLDB 2023
Land/OmrådeCanada
ByVancouver
Periode28/08/202301/09/2023

Citer dette