Database systems consist of complicated data structures and relations. To make it easier for users to access the data, these complications are kept hidden, and only the relevant part of the database is made accessible to the users. This method of masking data is called data abstraction.
Data abstraction refers to selecting a set of data which is relevant to the user and masking the remaining unwanted data.
During requirements gathering, we identify the business processes, the actors in each process, and the data each actor needs at each step. From this information, we build the data model, which describes the entities, the levels of entities, and the relations between them. Each step down from one level of the data model to the next adds detail to the objects. The database design thus passes through three levels of abstraction as we move from user requirements to DBMS implementation.
Levels of Data Abstraction
Data abstraction also hides the implementation details of the data from the user. There are three levels of data abstraction. Let’s take a look at these levels, starting from the top:
Database security and integrity are crucial to any organization. The right people must have access to the right data for the smooth functioning of the organization. This makes it mandatory to implement view level abstraction.
The view level, also called the external level, is the highest of the three levels of data abstraction. At this level, users can access only the part of the data that is relevant to them.
To understand this better, let’s take the example of a college DBMS. Here, the users would be the students, teachers, accounts department staff, etc. At the view level, a student will not have access to the details of other students or of the members of staff. The head of department (HOD) might not have access to the account details but will have access to the academic information of all the students. Likewise, members of staff would not have access to the data of their peers.
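The college example can be sketched with SQL views, here using Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the `student` table, its columns, the sample rows, and the view names `hod_view` and `accounts_view` are all hypothetical. SQLite has no per-user privileges, so the sketch shows only what each view exposes; a server DBMS would typically pair such views with access rights for each user.

```python
import sqlite3


def build_college_db():
    """Create an in-memory college database with one base table and two views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical base table holding both academic and account data.
        CREATE TABLE student (
            id       INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            marks    INTEGER,
            fees_due REAL
        );
        INSERT INTO student VALUES (1, 'Asha', 82, 0.0);
        INSERT INTO student VALUES (2, 'Ravi', 74, 1500.0);

        -- View for the HOD: academic data only, no account details.
        CREATE VIEW hod_view AS
            SELECT id, name, marks FROM student;

        -- View for the accounts department: no academic data.
        CREATE VIEW accounts_view AS
            SELECT id, name, fees_due FROM student;
    """)
    return conn


def columns_of(conn, relation):
    """Return the column names that a table or view exposes."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]


def student_self_view(conn, student_id):
    """Return only the row that belongs to the given student."""
    return conn.execute(
        "SELECT id, name, marks, fees_due FROM student WHERE id = ?",
        (student_id,),
    ).fetchall()


conn = build_college_db()
print(columns_of(conn, "hod_view"))       # ['id', 'name', 'marks']
print(columns_of(conn, "accounts_view"))  # ['id', 'name', 'fees_due']
print(student_self_view(conn, 1))         # [(1, 'Asha', 82, 0.0)]
```

Each user group queries its own view, so the HOD never sees `fees_due` and the accounts staff never see `marks`, even though both views read from the same table.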
The logical level is also called the conceptual level. At this level, we can see what data is stored in the database without knowing implementation details such as the underlying data structures. This level also describes the relationships between the different fields and database tables. A relationship can be one-to-one, one-to-many, many-to-one, or many-to-many.
Here, we can see an overview of the data flow inside the organization. The database administrators have full access to this level of data. Taking the example of a college DBMS again, the relationship between an entity ‘Professor’ and another entity ‘Student’ can be one-to-many. The fields describing the entity ‘Student’ would be the same as those describing a general entity ‘Person’, along with fields such as subjects, marks, rank, etc.
Changes at this level should not affect the external or the physical level. Say we add a field “skill” to the entity Student: this should change neither the way a user accesses the data nor the way the data is stored in the database.
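The “skill” example can be tried out in a few lines. The sketch below uses Python's built-in sqlite3 module and assumes a hypothetical schema: `professor` and `student` tables linked one-to-many, plus a `student_marks` view standing in for what a user sees. It checks that adding the new column leaves the user's query result unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE student (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        marks        INTEGER,
        -- One professor can be referenced by many students (one-to-many).
        professor_id INTEGER REFERENCES professor(id)
    );
    INSERT INTO professor VALUES (1, 'Dr. Rao');
    INSERT INTO student VALUES (1, 'Asha', 82, 1);
    INSERT INTO student VALUES (2, 'Ravi', 74, 1);

    -- Hypothetical view-level window onto the student data.
    CREATE VIEW student_marks AS
        SELECT name, marks FROM student;
""")


def read_marks(conn):
    """Query the view exactly as a view-level user would."""
    return conn.execute(
        "SELECT name, marks FROM student_marks ORDER BY name"
    ).fetchall()


def column_names(conn, table):
    """List the columns of a table at the logical level."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


before = read_marks(conn)

# Logical-level change: add the new field to the Student entity.
conn.execute("ALTER TABLE student ADD COLUMN skill TEXT")

after = read_marks(conn)

print(before == after)                # True: the user's view is unaffected
print(column_names(conn, "student"))  # now includes 'skill'
```

The view names its columns explicitly, which is what shields it from the new field; a view defined with `SELECT *` would be a weaker shield.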
The physical level is at the bottom of the abstraction levels. It describes how data is actually stored in the database. Implementation details such as indexing methods (for example, B+ trees or hashing) and access methods (sequential or random access) are described at this level.
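As a rough illustration of a physical-level decision, the sketch below adds an index to a hypothetical `student` table in SQLite (via Python's sqlite3 module) and compares the query plan and the query result before and after. The table, the data, and the index name `idx_student_marks` are assumptions made for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO student (name, marks) VALUES (?, ?)",
    [(f"student{i}", i % 100) for i in range(1000)],
)

QUERY = "SELECT name FROM student WHERE marks = 42 ORDER BY name"


def plan(conn, sql):
    """Return the human-readable steps of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


rows_before = conn.execute(QUERY).fetchall()
plan_before = plan(conn, QUERY)

# Physical-level change: add an index on the column used for lookup.
conn.execute("CREATE INDEX idx_student_marks ON student(marks)")

rows_after = conn.execute(QUERY).fetchall()
plan_after = plan(conn, QUERY)

print(plan_before)                # a full scan of the table
print(plan_after)                 # a search that uses idx_student_marks
print(rows_before == rows_after)  # True: same answer, different access path
```

The query text and its result stay the same; only the access path chosen by the DBMS changes, which is the separation the physical level is meant to provide.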
Basically, what we do is map the conceptual model to the characteristics and constraints of the selected database model. Because this mapping is a separate step, the conceptual model stays independent of the internal model. Let’s assume that we decide to use a relational database management system; then our conceptual model would be mapped to the internal level, and the entities from the ER model would be mapped to tables.
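A minimal sketch of the entity-to-table mapping is shown below. The `Entity` class and the `to_create_table` helper are hypothetical names invented for this illustration, not part of any library, and the sketch assumes that every referenced entity uses `id` as its key. Real ER-to-relational mapping also has to handle many-to-many relationships, composite keys, and constraints, which are left out here.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A hypothetical, simplified description of an ER-model entity."""
    name: str
    attributes: dict          # attribute name -> SQL type
    key: str = "id"           # attribute acting as the primary key
    references: dict = field(default_factory=dict)  # attribute -> entity name


def to_create_table(entity):
    """Translate one entity into a CREATE TABLE statement."""
    columns = []
    for attr, sql_type in entity.attributes.items():
        column = f"{attr} {sql_type}"
        if attr == entity.key:
            column += " PRIMARY KEY"
        if attr in entity.references:
            # Assumption: the referenced entity's key is named 'id'.
            column += f" REFERENCES {entity.references[attr]}(id)"
        columns.append(column)
    return f"CREATE TABLE {entity.name} ({', '.join(columns)})"


professor = Entity("professor", {"id": "INTEGER", "name": "TEXT"})
student = Entity(
    "student",
    {"id": "INTEGER", "name": "TEXT", "marks": "INTEGER",
     "professor_id": "INTEGER"},
    references={"professor_id": "professor"},
)

conn = sqlite3.connect(":memory:")
for entity in (professor, student):
    ddl = to_create_table(entity)
    print(ddl)
    conn.execute(ddl)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)  # ['professor', 'student']
```

The one-to-many relationship between Professor and Student ends up as a foreign-key column on the “many” side, which is the usual relational representation.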
Data abstraction is an important concept with many applications. We have looked at only one example, but there are many more: wherever data is mined, managed, or used, data abstraction is essential. Hence, it cannot be ignored and must be incorporated during the initial design and development phase of any database.