Advanced techniques for efficient data integrity checking

Davide Martinenghi

    Publication: Book/anthology/thesis/report - Ph.D. thesis

    Abstract

    Integrity constraint checking, understood as the verification of data correctness
    and well-formedness conditions that must be satisfied in any state of a database, is not fully supported by current database technology. In a typical scenario, a database is required to comply with given semantic criteria (the integrity constraints) and to maintain this compliance each time the data are updated.
    Since the introduction of the SQL2 standard, the SQL language has supported assertions, which allow one to define general data consistency requirements expressing arbitrarily complex “business rules” that go beyond predefined constraints such as primary keys and foreign keys. General integrity constraints are, however, far from being widely available in commercial systems; in fact, their use is commonly discouraged, since the database management system would not be able to evaluate them incrementally. Given the size of today’s data repositories and the frequency at which updates may occur, any non-incremental approach, even for conditions whose complexity is only linear in the size of the database, may prove infeasible in practice.
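    As an illustration of such a general constraint (the employee/department schema and the rule below are hypothetical examples, not taken from the thesis), the SQL2 CREATE ASSERTION statement can express the rule that no employee may earn more than the salary cap of his or her department:

        -- Hypothetical business rule, expressed as a standard SQL assertion.
        CREATE ASSERTION salary_within_cap CHECK (
            NOT EXISTS (
                SELECT *
                FROM employee e, department d
                WHERE e.dept_id = d.dept_id
                  AND e.salary > d.salary_cap
            )
        );

    Evaluated naively, such a condition requires re-computing the join of the two tables after every update, which is exactly the non-incremental behaviour described above.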
    Typically it is the database designer and the application programmer who take care of enforcing integrity via hand-coded program fragments that run either at the application level or within the DBMS (e.g., triggers). These solutions are, however, both difficult to maintain and error-prone: small changes in a database schema may require subtle modifications to such programs.
    In this respect, database management systems need to be extended with means to verify, automatically and incrementally, that no violation of integrity is introduced by database updates. For this purpose we develop a procedure aimed at producing incremental checks whose satisfaction guarantees data integrity. A so-called simplification procedure takes as input a set of constraints and a pattern of updates to be executed on the data, and outputs a set of optimized constraints that are as incremental as possible under the hypothesis that the database is initially consistent. In particular, the proposed approach allows the compilation of incremental checks at database design time, thus without burdening run-time database performance with expensive optimization operations. Furthermore, integrity verification may take place before the execution of the update, which means that the database never reaches an illegal state and, thus, rollbacks as well as repair actions are virtually unnecessary.
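    As a sketch of the idea on the hypothetical schema above, suppose the update pattern is the insertion of a single employee tuple, with :dept_id and :salary standing for the inserted values. Under the hypothesis that the database already satisfies the assertion, the check can be restricted to the new tuple alone:

        -- Simplified, incremental check compiled at design time for the
        -- update pattern INSERT INTO employee VALUES (..., :dept_id, :salary).
        -- Only the department row of the new employee is inspected.
        SELECT *
        FROM department d
        WHERE d.dept_id = :dept_id
          AND d.salary_cap < :salary;
        -- An empty result means the insertion preserves integrity and may be
        -- executed; a non-empty result means it would violate the rule.

    Because this test refers only to the values being inserted and to a single department row, it can be run before the update is applied, in line with the pre-test strategy just described.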
    The simplification process is unavoidably bound to a function that gives an approximate measure of the cost of evaluating the simplified constraints in actual database states, and it is natural to characterize as optimal a simplification with minimal cost. It is shown that, for any sensible cost function, no simplification procedure exists that returns optimal results in all cases. In spite of this negative result, which holds in the most general setting, important contexts can be found in which optimality can indeed always be guaranteed. Furthermore, non-optimal simplification may imply a slight loss of efficiency, but it still represents a great improvement over non-incremental checking.
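    One possible way to make this optimality notion precise is sketched below; the symbols Simp, Correct and cost are illustrative and need not coincide with the notation used in the thesis:

        % Simp(Gamma, U): the check produced by the simplification procedure
        %                 for constraints Gamma and update pattern U
        % Correct(Gamma, U): the set of all correct incremental checks for
        %                 Gamma and U
        % cost(Phi, D): estimated cost of evaluating Phi in database state D
        \forall \Gamma,\, U,\, D:\quad
          \mathrm{cost}\bigl(\mathrm{Simp}(\Gamma, U),\, D\bigr)
          \;\le\;
          \min_{\Phi \,\in\, \mathrm{Correct}(\Gamma, U)} \mathrm{cost}(\Phi, D)

    In this reading, the negative result says that no procedure Simp can satisfy such a condition in every case, for any sensible choice of cost.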
    Finally, we extend the applicability of simplification to a number of different contexts, such as recursive databases, concurrent database systems, data integration systems and XML document collections, and provide a performance evaluation of the proposed model.
    Original language: English
    Place of publication: Roskilde
    Publisher: Roskilde Universitet
    Number of pages: 153
    Status: Published - 2005
    Series: Roskilde Universitetscenter. Datalogisk Afdeling. Datalogiske Skrifter
    Number: 105
    ISSN: 0109-9779
