January 29, 2015

Intro to Database Systems - Part 4 & 5 : Relational Algebra

Let’s start by understanding what relational algebra is:

it is a query language (not a programming language)
takes instances of relations as input (basically rows from a table, also called tuples)
yields instances of relations as output
uses operators to perform queries (unary - applied on a single relation, or binary - applied on two relations).
its role? define database operations in terms of algebraic expressions

A query language allows manipulation and retrieval of data from a database. It is useful to explain how a SQL query is executed internally.

Understanding relational algebra is basically understanding SQL. Some basic operations from relational algebra are:

Select (σ)
Project (∏)
Rename (ρ)
Union (∪)
Intersection (∩)
Set difference (-)
Cross product (X) - also called Cartesian product or Cross join
Condition join - also called Theta-join
Equi-join

The select operation is an unary operation noted σc( R ). It returns a subset of rows from the input relation R, satisfying condition C. This constraint is expressed as the predicate. The predicate is a boolean expression whose operators are:

logical connectives : and, or, not
arithmetic comparisons

Considering a table of employees E, this is an example of the select operation in SQL:

example 1 : SELECT * FROM E where salary < 200

example 2 : SELECT * FROM E where salary < 200 and nr >= 7

Under the relation algebra form, we’ll have:

σ(salary < 200)(E)
σ(salary < 200 and nr >= 7)(E)

It is important to notice that:

the condition C is related to the attributes of relation R
there are no duplicates in the output
it’s an easy procedure : for each row, check if condition C is satisfied. If that’s the case, add that row to the result relation.

The project operation is an unary operation noted ∏L ( R ), where L is a list of attributes (A1, A2, A3, …, An) and R is the input relation. It extracts a subset of columns (the attributes) in a relation and discards the rest. If the user is interested in selecting the values of a few attributes, rather than selecting all attributes of the relation/table, one should go for the project operation.

This is an example of the project operation in SQL:

SELECT salary FROM E

SELECT nr, salary FROM E

Under the relational algebra form, we’ll have:

∏(salary)(E)
∏(nr, salary)(E)

Note that again, the process is simple:

extract the columns
eliminate duplicated rows (NOTE: real systems don’t eliminate duplicated, unless explicitly asked. This operation is very costly.)

We can also use operator composition: embed the result of one operation as an input to another relation. Example:

in SQL we’ll have : SELECT nr, salary FROM E where salary < 200 and nr >= 7

in relational algebra we’ll have : ∏(nr, salary)(σ(salary < 200 and nr >= 7)(E))

The rename operator is a unary operator noted ρ( X(x1,x2,…,xn), Y(y1, y2, …, yn)) where X is identical to the initial relation Y, but with a new relation name X and new attribute names (x1 to xn). Example:

in SQL we’ll have : SELECT nr AS ‘Employee_Number’, salary AS ‘Monthly_Salary’ FROM E
in relational algebra we’ll have : ρ( EMPLOYEE(Employee_Number, Monthly_Salary), E(nr, salary))

The union operator takes as input two relations R and S.

it joins or includes all tuples (rows) that are in R or in S
duplicate tuples (rows) are eliminated
both relations should have the same number of attributes (columns)
attribute domains must be compatible

As an example,

in SQL we’ll have : SELECT * FROM R UNION SELECT * FROM S
in relational algebra we’ll have : R ∪ S

The intersection operator is applied when only rows shared by both A and B are included in the result. It has the same preconditions as the union operator (no duplicates, same number of attributes, domain compatibility).

in SQL we’ll have: SELECT * FROM A INTERSECT * FROM B
more particularly, we can also an intersection of a specific attribute : **SELECT Surname FROM Graduates INTERSECT Surname FROM Managers
in relational algebra, the general case is represented by : A ∩ B

The set difference operator between two relations A - B results in a relation containing rows that are in A but not in B.

in SQL we’ll have: SELECT * FROM A MINUS SELECT * FROM B
again, we can have a particular case where we consider only certain attributes (columns): SELECT Surname FROM Graduates MINUS Surname FROM Managers
in relational algebra, the general case is represented by : A - B

The cross product operator between two relations A X B is the set of pairs that can be formed by pairing each row of A with each row of B.

in SQL we’ll have: SELECT * FROM A, B
in relational algebra we’ll have: A X B

If it happens that attributes have the same name in the resulting relation, we should prefix the similar attributes with the initial relation name (example : id is an attribute of relation school, but also of relation library. In the resulting relation, there should be an attribute for school id and another one for library id).

An easy way to implement the cross product operator is a nested loop. For each row from table A, pair it with a row from table B.

The condition join is just a cross product satisfying a condition. We apply the nested loop method in order to create the pairs, but only in the case where the condition is satisfied. It’s the selection operation applied to a cross product.

In SQL we’ll have : SELECT * FROM Employee, Department WHERE dept = depNr

The equi-join is simply a condition join where the condition is an equality.

The natural-join is an equi-join if values from columns with the same name are equal. If relation A has attribute (column) studID and relation B has attribute studID, rows will be paired only if their value are identical (relation A has a student with studID 99 and relation B has a student with stutID 99). In other words, we need 2 conditions to apply the paring:

column name is the same in both relations
row value under that column is the same in both relations.

As a conclusion, here are some properties of relational algebra:

Commutativity between project and select, if the condition C considers attributes of the subset L.

Commutativity between cross product inputs

Associativity between cross product inputs

Idempotence of project operator, if L2 is a subset of L1

Idempotence of select operator

Kudos

Intro to Database Systems - Part 4 & 5 : Relational Algebra

Now read this

Intro to Database Systems : Indexing