Kerala Plus Two Computer Science Notes Chapter 8 Database Management System
Database Management System (DBMS) provide an environment that is both convenient and efficient to use in retrieving and storing database.
Concept of Database
Approach of data storing in conventional file management system has many drawbacks. They are:
- We must keep more copies of the same data for different applications. This storage leads to duplication of data.
- There is no mechanism to protect the data from inconsistent changes made by different users accessing the database simultaneously.
- If the data are not properly organised, retrieval of information will be difficult, and time-consuming and there may be chances of inaccuracy in the information.
- There is no way to ensure that data is restored to a consistent state if the system crashes while changes are being made.
- Operating systems provide only a password mechanism for security. This is not sufficiently flexible to enforce security policies in data.
- There is no standardisation on data.
Need of Database
Database is an organized collection of interrelated data stored together with minimum redundancy, in a manner that makes them accessible for multiple applications.
Database Management System(DBMS) software is a set of programs which facilitates storage, retrieval and management of database.
Advantages of DBMS
Database systems are designed to manage large amounts data. DBMS involves both the database structures definition for the storage of data and the provision for the manipulation of data. The main advantages of DBMS are:
Controlling data redundancy: In file management systems, data may be placed in many files. The storing of the same data in multiple locations or duplication of data is known as data redundancy.
This leads to the higher cost in storage and data access. But database system do not maintain redundant data, instead all the data is kept at one place in a centralized manner.
Data redundancy may lead to data inconsistency; i.e., the various copies of the same data show different values in different files. By controlling data redundancy, data consistency is obtained. If a data item appears only once, any update to its value has to be performed only once and the updated value is immediately available to all users.
Efficient data access:
A DBMS utilises a variety of techniques to store and retrieve data efficiently.
It refers to the overall completeness, accuracy and consistency of data in the database. This can be indicated by an absence of any alteration in data between two updates of a data record.
Data integrity is imposed within a database at its design stage through the use of standard rules and procedures. It can be maintained through the use of error checking and validation routines.
It refers to the protection of data against accidental or intentional disclosure or unauthorised destruction or modification by unauthorised persons. The various programs and users may share data in common. But access to specific in information can be limited to selected users by setting the access rights,
Sharing of data:
The data stored in the database can be shared among several us ers or programs even simultaneously and each may use it for different purposes.
Enforcement of standards:
With cen tral control of the database, a Database Administrator (DBA) defines and enforces the necessary standards. Standards can be defined for data formats to facilitate ex] change of data between systems.
When a system crashes, all or portion of the data can become un stable. DBMS provides some mechanism to recover data from the crashes. Thus the DBMS protects data from the effects of svs tern failures.
Components of the DBMS java Environment DBMS have several components, each per forming very significant tasks in its envi ronment.
It is the actual computer sys tern used for storage of data. It includes computers, storage devices, network de vices and other supporting devices for keep ing and retrieval of data.
It consists of the actual DBMS, application programs and utilities. DBMS acts as a bridge between the user and the database. All the request from users for ac! cess to the database are handled by the DBMS. Its function is shielding of database from complex hardwarelevel detail. DBMS helps to maintain the consistency of the data. Utilities are the software tools used to ] help manage the database system. I
It is the most important compo ! nent. The database contains operational ! data and metadata (data about data). The ! ! database should contain all the data needed by the organization. For effective storage and retrieval of information, data is orga nized as fields, records and files.
- Fields: A field is the smallest unit of stored. data. Each field consists of data of a specific type, eg., Name, Marks.
- Record: A record is a collection of related fields.
- File: A file is a collection of all occurrence of same type of records.
There are a number of users who can access data on demand using application programs. The different categories of users depending on the mode of their interactions with DBMS are Database Administrator (DBA), Application Programmers, Sophisticated users and Naive Users.
It refers to the instructions and rules that govern the design and use of database. The users of the system and the person that manages the database require documented procedures on how to use or run the system. These consist of instruction on how to:
- log onto the DBMS.
- use a particular DBMS facility or application program.
- start and stop the DBMS.
- make backup copies of the database or handle hardware or software failures.
- reorganise the database across multiple disks, improve performance or archive data to secondary storage.
The need for efficiency led designers to use complex data structures to represent data I in the database. Since many database systems users are not computer trained, developers hide the complexity from users through several levels of abstraction. The three levels of abstraction are:
1. Physical level: It is the lowest level of abstraction. The physical level describes complex lowlevel data structures in detail. We must decide what file organisations are to be used tostore the relations and create auxiliary data structures, called indexes, to speed up data retrieved operations. A sample physical schema for the SWS database follows:
- Store all relations as unsorted files of record.
- Create indexes on the first column of the files.
2. Logical level:
It is is the next higher level of abstraction describes what data is stored in the database, and what relationship exist among those data. It describes the entire database in terms of a small number of relatively simple structures. The user of the logical level does not need to be aware of the complexity. The logical level is also referred to as a conceptual level.
3. View level:
It is the highest level of database abstraction and is closest to the users. It is concerned with the way in which individual users view the data. It describes only a part of the entire database. Most of the users are not concerned with all the information that is contained in the database.
The ability to modify the schema definition (data structure definition) in one level without affecting the schema definition at the next level is called data independence. There are two levels of data independence.
- Physical data independence: It refers to the ability to modify the schema followed at the physical level without affecting the schema followed at the conceptual level. So the application programs remain the same even though the schema at physical level gets modified.
- Logical data independence: It refers to the ability to modify a conceptual schema without causing any changes in the schema followed at view (external) level. It ensures that the application programs remain the same.
It is more difficult to achieve logical data independence than physical data independence because the application programs are heavily dependent on the logical structure of the database.
Users of Database
Depending on the degrees of expertise or the mode of the interactions with DBMS, the users of database system can be classified into:
1. Database Administrator (DBA):
It is the person who is responsible for the centralized and shared database. The tasks of DBA are:
- Design of the conceptual and physical schemas: The DBA is responsible for interacting with the users of the system to understand what data is to be stored in DBMS and how it is likely to be used. Based on this knowledge, the DBA must design the conceptual schema and the physical schema.
- Security and authorization: The DBA is responsible for ensuring authorized access of data.
- Data availability and recovery from failures: The DBA must take steps to restore the data to a consistent state when the system fails to complete a transaction or in case of a system crash. The DBA is responsible for implementing procedures to back up the data periodically and maintain logs of system activity.
2. Application Programmers:
They are computer professionals who interact with the DBMS through application programs. Application programs are programs written in any host languages (VB, C, Java, etc.) and interact with the DBMS through DML (Data Manipulation Language). Appliction programs should ideally access data through the external schema.
3. Sophisticated Users:
Sophisticated users include engineers, scientists, business analysts and others who are thoroughly familiar with the facilities of the DBMS. They interact with the systems through their own queries (a request to a database) to meet their complex requirements.
4. Naive Users:
They interact with the system by invoking one of the application programs that were written previously. They are not concerned with or even aware of the details of the DBMS. Naive users deal only with the higher level of abstraction.
Relational Data Model
It represents database as a collection of tables called relations each of which is assigned a unique name. In this model, both data and the relationship among them are represented in tabular form. This representation helps the beginners to understand the concepts of a database easily.
The database products based on the relational model are known as Relational DataBase Management System (RDBMS), The main advantages is its simple data representation and the ease with which even complex queries can be expressed. Popular RDBMS are Oracle, Microsoft SQL Server, MySQL, DB2, Informix, etc. Examples of 1 query languages are Structured Query Language (SQL), query by example (QBE) or Datalog.
Terminologies in RDBMS
Some terminologies in RDBMS are:
- Entity: An entity is a person or a thing in the real world that is distinguishable from others, eg., each student is an entity.
- Relation: It is a collection of data elements organised in terms of rows and columns. A relation is also called Table.
- Tuple: The rows (records) of a relation are generally referred to as tuples. A row consists of a complete set of values used to represent a particular entity.
- Attribute: The columns of a relation are called attributes. The values of each attribute are taken from the range of possible values called domain.
- Degree: The number of attributes in a relationship determines the degree of relation.
- Cardinality: It is the number of rows or tuples in a relation.
- Domain: It is a pool of values from which actual values appearing in a given column 1 are drawn.
- Description: The description or structure of a database is called database schema, In the relational model, the schema for a relation specifies its name, the name of each column, and the type of each column.
- Instance: It is a set of tuples in which each tuple has the same number of fields as the relational schema.
A key is an attribute or a collection of at tributes in a relation that uniquely distinguishes each tuple from other tuples in a ; given relation. If a key consists of more than one attribute then it is called a composite key. In the extreme, the entire tuple is the i key since each tuple in the relation is guaranteed to be unique. Different keys are:
It is minimal set of attributes that uniquely identifies a row in a relation, There may be more than one candidate key in a relation. A candidate key need not be just one single attribute. It can be compos 5 ite key.
2. Primary key:
It is a set of one or more attributes that can uniquely identify tuples within the relation. It cannot contain null value and duplicate value, Candidate keys are considered as candidates for primary key position. From i the candidate keys the one with the least f number of attributes may be selected as primary key.
3. Alternate key:
A candidate key that is not primary key is called alternate key. In the case of two or more, candidate keys, only one of them can serve as the primary key. The rest of them are alternate key.
4. Foreign key:
A key in a table can be called foreign key if it is a primary key in 1 another table. Since a foreign key can be used to link two or more tables it is also called a reference key.
The collection of operations that is used to manipulate the entire relations of a data’ erations are performed with the help of a special language associated with the relational model called query language. The fundamental operations in relational algebra are SELECT, PROJECT, UNION, INTERSECTION, SET DIFFERENCE, CARTESIAN PRODUCT, etc.
eg., Consider the relation STUDENT given in table.
1. SELECT operation:
It is used to select rows from a relation that satisfies a given predicate. The predicate is a user defined condition to select rows of user’s choice. This operation is denoted by sigma (σ). Its general form is,
The relational algebra uses comparison operators like <, <=, >, >=, =, <> (not equal to) and logical operators like v (OR), A (AND), (NOT).
eg., To select all the students who are eligible for higher studies in a relation STUDENT.
2. PROJECT operation:
It selects certain attributes from the table and forms a new relation. If the user is interested in selecting the values of a few attributes, rather than all the attributes of the relation, then use PROJECT operation. It is denoted by π. The general format is,
Here At, A2, An are various attributes that would make up the relation specified.
eg., Select Name, Result and Marks attributes in relation STUDENT.
It is possible to combine SELECT and PROJECT operations into a single statement.
eg., To select admission number and name of students who are eligible for higher studies.
3. UNION operation:
It is a binary operation and it returns a relation containing all tuples appearing in either or both of the two specified relations. It is denoted by ∪. If two relations are union-compatible, then they have the same number of attributes and corresponding attributes taken in order from left to right, have the same domain.
4. INTERSECTION operation:
It is a binary operation and it returns a relation containing the tuples appearing in both of the two specified relations. It is denoted by ∩.
5. SET DIFFERENCE operation:
It is a binary operation and it returns a relation containing the tuples appearing in the first relation but not in the second relation. It is denoted by-.
Union and Intersection operations are commutative, that is the order of relation is not important.
6. CARTESIAN PRODUCT operation:
It returns a relation consisting of all possible combinations of tuples from two relations. In this, the degree (number of attributes) equal to to the sum of the degrees of the two relations operated upon. It is denoted by cross (x). It is also called CROSS PRODUCT. All the tuples of the first relation are concatenated with tuples of the second relation to form tuples of the new relation.
It defines the logical design of data. This model describes the relationships between different parts of the data. Different models are hierarchical model, network model, relational model and object-oriented model.