March 18, 2015

Intro to Database Systems - Part 11 to 13 : Integrity Constraints

Constraints are there to impose restrictions on what data is allowed in the database. The basic E/R model introduces us to 2 types of constraints:

KEYS
Multiplicities

The relational model allows us to be more flexible. It imposes bounds on attributes. For now, we are able to force an attribute to be:

non null
of a specific domain (integer, string, etc)

But what if we need to express the situation of a student that can only take some courses if his GPA > 3.0? What if, in order to be a teacher assistant, you need to have successfully passed that course with an A or you need to have a 3.0 of cumulative GPA? By introducing the SQL keyword CHECK, we introduce 2 new notions:

attribute-based checks : fancy name to mention that we are imposing a constraint on a column
tuple-based checks: imposing constraints on rows.

We are introducing checks (and in general, constraints), for the following main reasons:

to catch data-entry errors (we can only input a GPA between 0.0 and 4.0)
as correctness criteria for database updates (GPA can only be an int)
to enforce consistency across data (by setting primary keys)

Here are a few examples of an attribute-based check:

CREATE TABLE tableName (column columnType CHECK (anyPossibleCondition)
CREATE TABLE Student (GPA float CHECK (GPA <= 40 AND GPA > 0));
CREATE TABLE Sells ( bar CHAR(20), beer CHAR(20) CHECK ( beer IN (SELECT name FROM Beers)), price REAL CHECK ( price <= 5.00 ) );
CREATE TABLE Student ( major char(10) CHECK(major NOT IN (SELECT dept-name FROM Dept WHERE status = “full”) );

It is important to notice that:

checks are performed on update or insert, not on delete
if condition is violated, the system will reject the update/edit
conditions may include subqueries that evaluate as conditions

For the cases where we play with more than one one column, we need to use tuple-based checks (basically row checks). But aren’t rows composed by columns? If we want to check on a row, why can’t we simply do the following:

CREATE table….. column1 CHECK(column1 >= 1,000), column2 CHECK(column2 >12000)

We can, only that the previous command checks column1 AND column2, not column1 OR column2. What if we want to validate a check on one OR another attribute? That’s where tuple-based checks are useful. The syntax is slightly different:

CREATE table….. column1 …, column2…., CHECK(column1 >= 1,000 or column2 >12000)

What if we want to modify our constraints, or remove them? What if for this year you don’t need to have a 3.0 GPA to be accepted to an MBA, but a 2.6? In that case, we need to use a naming constraint convention.

In the following example, we have 2 constraints:

skaterID is an independent key
rating should be between 1 and 10

If we want to name our constraints or to edit them, we’ll do the following:

CREATE TABLE Skaters ( skaterID INT NOT NULL, skaterName VARCHAR(20), rating INT CONTRAINT rat CHECK (rating > 0 and rating < 11) age INT, CONTRAINT primKey PRIMARY KEY (sid), CONTRAINT rateAndAge CHECK (rating <= 4 OR age >5) );

If we want to remove a constraint:

ALTER TABLE Skaters DROP CONSTRAINT rateAndAge;

If we want to add a constraint:

ALTER TABLE Skaters ADD CONSTRAINT rateAngAge CHECK (rating <= 5 OR age >5);

Kudos

Intro to Database Systems - Part 11 to 13 : Integrity Constraints

Now read this

Intro to Database Systems - Part 2 : The Entity-Relationship Model