Improving the Quality of Large-Scale Database-Centric Software Systems by Analyzing Database Access Code


Due to the emergence of cloud computing and big data applications, modern software systems increasingly depend on the underlying database management systems (DBMSs) for data integrity and management. Because DBMSs are complex and each technology has implementation-specific differences, software developers usually treat DBMSs as black boxes, which allows better adaptation to, and abstraction of, different database technologies. For example, Object-Relational Mapping (ORM) is one of the most popular database abstraction approaches that developers use. Using ORM, objects in object-oriented languages are mapped to records in the DBMS, and object manipulations are automatically translated to SQL queries. Despite ORM’s convenience, there are impedance mismatches between the object-oriented paradigm and relational DBMSs. Such impedance mismatches may result in developers writing inefficient and/or incorrect database access code. Thus, this thesis proposes several approaches to improve the quality of database-centric software systems by analyzing the application source code. We focus on troubleshooting and detecting inefficient (i.e., performance problems) and incorrect (i.e., functional problems) database accesses in the source code, and we prioritize the detected problems based on severity. Through case studies on large commercial and open source systems, we plan to demonstrate the value of improving the quality of database-centric software systems from a new perspective: helping developers access the database more efficiently and accurately.

International Conference on Data Engineering Workshop