Intro to Database Systems - Part 3 : The Relational Model

The entity-relationship model described in the previous lecture is a diagram. The main purpose is to clearly picture the constraints of the database to someone that is not necessarily coming from a computer science field.

The model that’s the most popular in Database Systems is the relational model: a representation of structuring data through relations between rows and columns. As seen in Lecture 1, relational database is composed by:

The schema for the database Students can be represented as Students(studID:string, name:string, login:string, faculty:string, gpa:real). It must contain:

The schema of the database is defined by the DDL : data definition language. It is used mainly to create and destroy databases and databases objects. The main operations are :

As a practical example, we can have :

The last DDL that we’ll cover is the DROP operation: it removes the entire database objects from the Database System.

exploits_of_a_mom.png

The Data Manipulation Language is used to manipulate data itself (vs Data Definition Language which is used to define data structures). It is incorporated most of the times in SQL databases. The main commands are :

Accessibility in terms of querying time is one of the main advantages of using the relational model. The most common tool for it is SQL : Structured Query Language. It covers both defining relations (DDL) and writing/querying data (DML).

While defining data in the relational model, it is important to notice that every attribute must have a data type. SQL, as any programming language, provides various data types :

This are 2 examples of using data types in SQL. Notice the fact that SQL is case sensitive, and also that we can attribute default values:

To manipulate data, we can show 3 examples:

We’ve already discussed constraints in the previous lecture. An integrity constraint ensures that changes made by authorized users do not result in a loss of data consistency. The designer specifies them when the DB is created and the DBMS checks if the condition is true when relations are modified. The domain definition is an example of integrity constraint.

A general example of an integrity constraint is the key declaration: a set of attributes forming a candidate key for the entity set. In a database, we can have 3 types of keys:

A candidate key is a set of columns allowing us to distinguish rows. It can uniquely identify a record without mentioning any other data. The attribute “First Name” won’t identify any record in a database of students. In this database, you can have Dan Crisan, but also Dan Smith. To make sure that you have the required row, the required student, you will also have to take a look at the “Last Name”. Hence, the combination “First name” + “Last name” is a candidate key.

In a banking database, the combination of the customer’s ID and a sequential number for each of his bank accounts is a valid candidate key. Also, the ISBN (product number) is a valid candidate key. The ISBN number is also an example of a primary key: a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be a Primary Key.

Here are few similarities between candidate keys and primary keys:

To differentiate a candidate key from a primary key, we can notice that:

In addition of candidate keys and primary key, databases also have foreign keys: they refer to a primary key in another table. It’s a key to define the relationships between tables. Say we have a StudentEvent table where the primary key is the EventID. The personInCharge is actually storing studentID… which is the foreign key to the Student table. It defines the relationship between Student table and StudentEvent table.

Another example is where in database Enrolled, studentID is a foreign key referring to database Students. This can illustrate the concept of referential integrity: when one table has a foreign key to another table, the concept of referential integrity states that you can’t add a record to the table that contains the foreign key unless there is a corresponding record in the linked table. In this example, we can not add a student to the database Enrolled if the database Students doesn’t contain this student. Referential integrity is achieved if we don’t have this inconsistency.

In a scenario where we have Managers and Employees as different tables, referential integrity maintains the following 3 rules:

This is an example of how we mention keys between the database Enrolled and the database Students in SQL:

Now what are the steps to go from an entity-relationship model to the relational model?

1) From entity sets to relations:

entitySet.PNG

2) From weak entities to relations

weak.PNG

3) From relationship sets to relations (without constraints)

relSet.PNG

4) From ISA Hierarchies to relations

5) From aggregation to relations
*Remember that aggregation is a summary of our database: hence, we will have the primary key of each database and the descriptive attribute of the relationship between the relations (in this example : ssn, did, pid AND until).

aggregation.PNG

6A) From relationship sets to relations (with key constraints - METHOD 1)

keyContraint.PNG

primKey.PNG

6B) From relationship sets to relations (with key constraints - METHOD 2)

7) From relationship sets to relations (with key and participation constraints)

manager.PNG

As a last word, referential integrity (consistency in data manipulation) is supported in SQL trough 4 options:

As an example, if we have 2 related databases Enrolled and Students, deleting the studID from one table will also delete the studID from the other table, if we mention ON DELETE CASCADE. This is done in order to preserve data consistency (referential integrity).

Now that we are done with the rules to translate ER to relational model, here is a quick summary of the relation model:

 
55
Kudos
 
55
Kudos

Now read this

Intro to Database Systems : Indexing Part 2 - B+ trees

In the previous section, Indexing Part 1, we’ve seen that building an index for frequently used attributes considerably increases the efficiency of a query. In this section we’ll discuss the most widely used index implementation: the B+... Continue →