Intro to Database Systems - Part 2 : The Entity-Relationship Model
This page will cover the following keywords : E/R diagram, E/R schema, entities, their attributes, entity sets, keys, ISA hierarchies, overlap constraints, covering constraints (specialization, generalization), relationships, relationship sets, key-constraints (many-to-many, one-to-many, many-to-one, one-to-one), participation constraints (partial / total). You should be able to explain them in your own words at the end of this page.
Before we start, let’s clarify what “Entity-Relationship Model” is. In a nutshell, the ERM is a “GUI for DBs” : a graphical representation of the requirements for a database. Within this lecture, we will notice it under different names, but they all mean the same:
- ER diagram
- ER schema
- semantic schema
A little confusion might occur between Entity-Relationship schema and relational schema. The relational schema refers to the schema of the relation (in red) discussed in the last post. The relational schema describes the columns in a database table.
On the other hand, an Entity-Relationship schema is only the graphical representation of the entities and their relationship. It has to be :
- the representation of the data model
- understandable by everyone
- easy to go from it to the relational schema
Hence, a relational schema is NOT a synonym of a semantic schema. The first is a table, the second is a diagram.
Following a process to design a database makes it easier to come up with the relational schema. The following are the 3 steps in database design:
- Analyzing the requirement : 1) data reqs - what’s the data 2) functional reqs - what are the operations applied on the data
- Modeling the requirements : using an Entity-Relationship model, a diagram connecting entities trough relationships
- Implementing the requirements: implementing the relational schema
The requirements can be summarized to 2 questions :
- What are the entities and the operations executed on them?
- What are the relationships between the entities and the operations describing them?
The requirements are often mentioned in plain text. They contain data and their functionality.
An example of requirements of a database is a db of courses. The different courses offered during the winter semester are the data. The functionality will be the fact that students can register (class is not full yet), courses get canceled, TA is assigned.
Another example of requirements is a database of books. Titles, prices, authors are the data. The fact that a book might be discounted, out of stock or lost is the functionality.
Now that we’ve got an essence of the design process of the DB, let’s dig in into more precise vocabulary. We’ve already defined a few of those:
- entity: an object. It has to be distinguishable: a nurse and a doctor are 2 different entities, even if they are both employees.
- an entity set : collection of entities (employees in an hospital, all the account entities for a bank). They all have the same attributes (each employee has an id, a salary, number of hours due, etc).
- attribute : a property/characteristic of an entity or an entity set (employee id).
- key : set of one or more attributes allowing to distinguish an entity from another entity within an entity set (teachers have the first 4 digits of their employee id starting with 9999, teacher-assistants with 3333; vacation days for TAs can’t be greater than 0; graduation year, etc)
- domain : set of values that may be assigned to the attribute (in the case of size: positive integer; name: string[30]; passed: yes or no; political party: democrat, republican, independent, other)
- ISA Hierarchies : inheritance for the Entity Relationship Model (presence of subclasses/superclasses).
An example of an ISA (is a) hierarchy is a database where we have employees as full time and also as contractual. We say an employee is a hourly employee. Another example is a database of living things. A living thing is a plant, but we also have living things as animals in the db. An animal is a fish, but we also have birds. A bird is a land-bird, but we also have aquatics birds. An aquatic bird is a great-blue-heron.
If an entity A is also an entity B, the key is only in B, the superclass.
We have 2 types of constraints on ISA hierarchies:
- overlap constraint (allowed/disallowed) : is it allowed to have an entity being part of more than one subclass? (example : a full time employee implies that he is paid hourly; a contractual employee usually gets paid with a fixed salary; hence, a full time employee can’t be a contract employee : overlap disallowed).
- covering constraint (yes/no) : does every employees entity also have to be an hourly employee or a contract employee? Does every hourly employee and contract employee are employees of this organization? (yes).
We implement ISA Hierarchies because we want to:
- add descriptive attributes specific to a subclass
- identify entities participating in a relationship
ISA Hierarchies also have 2 important properties:
- Generalization : a bottom-up design, we extract shared characteristics from 2 or more classes and combine them into a general superclass. An account is a saving account and a checking account.
- Specialization : a top-down design, we create new subclasses from an existing class. We start with the superclass, we define our subclasses and we find their specific attributes and relationships.
Weak entities are entities that can only exist when owned by another one. A ROOM can only exist in a BUILDING. A QUESTION is a strong entity and an ANSWER is a weak entity because the answer needs a question to exist.
- Weak entities are represented in bold
- Partial keys are represented with dashed lines (in the example below, we’re illustrating that multiple sections can have books with the same reference number, but each book within a section will have an unique reference number)
A relationship is an association between two or more entities.
A relationship set is a collection of similar relationships: “increase”, “decrease”, “maintain” are all relationships part of the set “progression”.
Relationships are described by :
- types : unary, binary, ternary;
- constraints : multiplicity (key-constraints), participation.
A relationship type can be classified by the number of entity types involved:
- unary relationship type : a relationship type between entities in a single entity type (“friendship” within the entity type PERSON);
- binary relationship type : a relationship type between entities in one entity type and entities in another entity type (“purchased” between entity types CUSTOMER and PRODUCT);
- ternary relationship type : a relationship involving more than 2 types (a PROFESSOR “recommends” a TEXTBOOK on a certain COURSE, we have a relationship connecting 3 types).
A relationship type also has a multiplicity constraint (key constraint), which is the maximum and the minimum elements allowed in the set.
- one-to-many multiplicity: one entity of A is related by R to many entities of B AND one B is related (by R’s inverse) to one entity A. R will be called “one to many”.
In the same time, many entities (in our case of type B) are related to one entity (in our case of type A). We call this many-to-one and we depict it with an arrow from the element that’s not unique (in our case of type B) to the element that’s unique (in our case of type A).
We depict it with an arrow from both elements.
In the case where we have a relationship between products and invoice, to generally derive a one:many relationship we should ask those 2 questions, both starting with the keyword one:
- one customer (entity A) has received ( relationship ) how many invoices (entity B). [zero or more]
- one invoice was sent to how many customers? [one]
In the case where we have a relationship between customers and products, to generally derive a many-many relationship, we should ask those 2 questions:
- one customer purchased how many product types? [one or more]
- one product type was purchased by how many users? [zero or more]
In the case where we have a relationship between a birth certificate and its owner, to generally to derive a one-to-one relationship, we should ask those 2 questions:
- how many birth certificates has a person? [one]
- how many persons is a birth certificate owned by? [one]
Another constraint that relationships have are participation constraints. The participation can be total or partial. An example of total participation is the fact that every department has a manager. This is represented by a thick line. An example of partial participation is the fact that not every employee is a manager.
As design principles, we have to make sure to :
- keep it simple : avoid redundancies
- capture as many constraints as possible.
We need to make sure that we have the correct structure in place:
- entities or attributes?
- attributes or relationships?
- what type of relationships we have? (binary, ternary, any key-constraints?)
That’s all folks!
Thank you for reading,
@dandancrisan