Intro to Database Systems - Part 6 & 7 : Basic SQL
This part won’t cover the notions of passing from relational algebra to SQL since this has already been covered in the previous post. In this section, we will start by introducing a main overview of a basic SQL query. Then, we structure this overview into 10 keywords and 6 short notions commonly used in SQL :
- 1) When to use * (star)
- 2) Operators
- 3) The DISTINCT keyword
- 4) Renaming
- 5) Evaluations within queries
- 6) Constraints
- 7) The ORDER BY keyword
- 8) Range variables
- 9) The JOIN keyword
- 10) The UNION keyword
- 11) The INTERSECT keyword
- 12) The EXCEPT keyword
- 13) The ALL keyword : sets & multisets
- 14) The IN keyword
- 15) The EXISTS keyword
- 16) The ALL/ANY/SOME keywords : quantifiers
The most basic form of a SQL query looks like the following :
- SELECT CustomerName FROM Customers WHERE Country=“Mexico”
More generally, we’ll have:
- SELECT (what columns we want to keep?) FROM (from which database, which relations?) WHERE (under which conditions, if any)
Before moving on with the common notions, it’s important to have in mind that an SQL query returns a set of the initial table, not a multi-set: the returning table doesn’t contain duplicates, unless specified otherwise. This is a major difference with the relational algebra.
1) When to use * (star) ?
Simply when you want to extract all columns satisfying a certain condition from that db. Example:
- SELECT * FROM Customers WHERE Country=“Mexico”
- This will extract all info on the customers originating from Mexico.
2) OPERATORS
We’ll notice that the WHERE clause has the following form:
- WHERE column_name operator value
The usual arithmetic operators don’t really need an introduction. Let’s spend some time understanding the most particular ones: BETWEEN, LIKE and IN.
The BETWEEN keyword expresses values within a range.
- SELECT * FROM Products WHERE Price BETWEEN 10 AND 20;
- SELECT * FROM Products WHERE ProductName BETWEEN ‘C’ AND ‘M’ : select all products with a ProductName beginning with any of the letters BETWEEN ‘C’ and ‘M’
The LIKE keyword searches a specific pattern in a column.
- SELECT * FROM Customers WHERE City LIKE “a%” : selects all the customers coming from a city starting with the letter “a”. % means a substitute for 0 or more characters. It’s a way of saying “we don’t are about whatever comes after”.
- SELECT * FROM Customers WHERE City LIKE “ber%”;
- SELECT * FROM Customers WHERE City LIKE “%es%”: all the customers coming from a city containing the string “es”
- SELECT * FROM Customers WHERE City LIKE “ _ erlin”: selects all the customers coming from a city starting with any letter, followed by “erlin”._ means a substitute for a single character.
- SELECT * FROM Customers WHERE City LIKE “[bsp]%” : selects all the customers coming from a city starting with “b”, “s” or “p”.
- SELECT * FROM Customers WHERE City LIKE “[a-c]%” : selects all the customers coming from a city starting with “a”. “b” or “c”.
- SELECT * FROM Customers WHERE City LIKE “[!bsp]%” : selects all the customers coming from a city NOT starting with “b”, “s” or “p”.
The IN keyword allows us to force pick a value from the columns.
- SELECT * FROM Customers WHERE City IN (“Paris”, “London”) : selects all the customers coming from either Paris or London.
3) The DISTINCT keyword
We use DISTINCT to list different values:
- SELECT DISTINCT city, state FROM Suppliers : displays 2 columns with the different cities and the different states of the db “Suppliers” .
4) Renaming
Renaming has the following form:
- renaming columns : SELECT column_name(s) AS alias_name FROM table_name;
- renaming db : SELECT column_name(s) FROM table_name AS alias_name;
Renaming to a column name that requires spaces involves either using square brackets or “ ”.
- SELECT CustomerName AS Customer, ContactName AS [Contact Person] FROM Customers;
We can also merge columns in a single one under a single name. The new column will contain values separated by commas.
- SELECT CustomerName, Address+‘, ’+City+‘, ’+PostalCode+‘, ’+Country AS Address FROM Customers : the four columns (Address, City, PostalCode, and Country) are combined under an alias named “Address”.
5) Evaluations within queries
The following is an example of how we can add 1 to all the columns selected:
- SELECT sname, rating+1 AS upgrade FROM Skaters
6) Constraints
SQL provides also a possibility to force constants default values .
- SELECT rating AS reality, ‘10’ AS dream FROM Skaters
7) The ORDER BY keyword
This keyword is used to sort columns in ascending order by default. We need to add ASC or DESC if we want to precise the order.
- SELECT * FROM Customers ORDER BY Country : select all customers sorted by “Country”
- SELECT * FROM Customers ORDER BY Country DESC : selection sorted in descending order of the country.
8) Range variables
Range variables are just a way to precise which columns we want in the case where we have 2 databases containing the same attribute.
- SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate FROM Orders, Customers WHERE Orders.CustomerID=Customers.CustomerID : Customer.ID is an attribute of Orders but also of Customers, hence we need a way to differentiate it.
This SQL command will produce something similar to the following:
9) The JOIN keyword
We use JOIN to subset from a merge of two tables. We subset the specific rows when there is at least one match in BOTH tables.
Let’s illustrate the procedure with an example. If we want the orderIDs, customerNames, and the customerCountries of all the customers that order something, we will :
- SELECT Orders.OrderID, Customers.CustomerName, Customers.Country FROM Orders INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
10) The UNION keyword
We use the UNION keyword when we want to merge 2 relations.
Those relations imply 2 things:
- participating tables need to have the same number of columns
- attributes, taken in order, should have the same domain
Again, let’s illustrate the process with an example: we need to find the skaters that have participated in a regional OR a local competition. We can translate this in 2 ways:
- SELECT Participants.sid FROM Participants, Competition WHERE Participants.sid = Participants.sid AND (Competition.type = “regional” OR Competition.type = “local”)
- SELECT Participants.sid FROM Participants, Competition WHERE Participants.sid = Competition.sid AND Competition.type = “local” UNION SELECT Participants.sid FROM Participates, Competition WHERE Participants.sid = Competition.cid AND Competition.type = “regional”
11) The INTERSECT keyword
We use the INTERSECT keyword when we want to have queries satisfying a condition that’s valid in both intersected relations. If a record exists in one query and not in the other, it will be omitted from the INTERSECT results.
Again, those relations imply 2 things:
- participating tables need to have the same number of columns
- attributes, taken in order, should have the same domain
Let’s illustrate the process with an example: we need to find the skaters that have participated in a regional AND a local competition. We can translate this in 2 ways:
- SELECT Participants.sid FROM Participants, Competition WHERE Participants.sid = Participants.sid AND (Competition.type = “regional” AND Competition.type = “local”)
- SELECT Participants.sid FROM Participants, Competition WHERE Participants.sid = Competition.sid AND Competition.type = “local” INTERSECT SELECT Participants.sid FROM Participates, Competition WHERE Participants.sid = Competition.cid AND Competition.type = “regional”
12) The EXCEPT keyword
We use the EXCEPT keyword when we want to have queries satisfying a condition that’s valid in both intersected relations. If a record exists in one query and not in the other, it will be omitted from the INTERSECT results.
Again, those relations imply 2 things:
- participating tables need to have the same number of columns
- attributes, taken in order, should have the same domain
Let’s illustrate the process with an example: we need to find skaters that have participated in a local but not in a regional competition
- SELECT Participants.sid FROM Participants, Competition WHERE Participants.sid = Competition.sid AND Competition.type = “local” EXCEPT SELECT Participants.sid FROM Participates, Competition WHERE Participants.sid = Competition.cid AND Competition.type = “regional”
13) The ALL keyword : sets & multisets
Returning multisets imply returning duplicates. SLQ queries return sets. We can force the SQL query to return a multiset by using the keyword ALL (UNION ALL, INTERSECT ALL, EXCEPT ALL). We’ll obtain:
- Multiset Union : {1, 2, 2} ∪ {1, 2, 3, 3} = {1, 1, 2, 2, 2, 3, 3}
- Multiset Intersection : {1, 2, 2} ∩ {1, 1, 2, 2, 3, 3} = {1, 2,2}
- Multiset Difference : {1, 2, 2} - {1, 2, 3, 3} = {2}
14) The IN keyword
We already mentioned that the IN keyword forces the WHERE column_name IN (value1,value2,…) condition to pick within a predefined value.
The best example to illustrate this scenario is the following: find skaters who have NOT participated in competition 103.
- SELECT Skaters.name FROM Skaters WHERE Skaters.sid NOT IN (SELECT Participants.sid FROM Participants WHERE Participants.cid = 103)
15) The EXISTS keyword
The EXISTS keyword works like the IN keyword, but on a subquery. We will have WHERE EXISTS ( subquery ); . The main query returns a result if the subquery is non empty.
If we need to find names of skaters who have participated in competition 103, we will :
- SELECT Skaters.name FROM Skaters WHERE EXISTS (SELECT * FROM Participants WHERE Participants.cid = 103 *AND * Participants.sid = Skaters.sid)
16) The ALL/ANY/SOME keywords : quantifiers
Those keywords work just like the EXISTS keyword. They are optimizers. the following examples are equivalent of the same scenario.
- SELECT emplNo, sal FROM Employees WHERE sal > ALL (2000, 3000, 4000);
- SELECT emplNo, sal FROM Employees WHERE sal > 2000 AND sal > 3000 AND sal > 4000;
Here is a quick overview of the scenarios where ALL can be used :
- “x = ALL (…)”: The value must match all the values in the list to evaluate to TRUE.
- “x != ALL (…)”: The value must not match any values in the list to evaluate to TRUE.
- “x > ALL (…)”: The value must be greater than the biggest value in the list to evaluate to TRUE.
- “x < ALL (…)”: The value must be smaller than the smallest value in the list to evaluate to TRUE.
- “x >= ALL (…)”: The value must be greater than or equal to the biggest value in the list to evaluate to TRUE.
- “x <= ALL (…)”: The value must be smaller than or equal to the smallest value in the list to evaluate to TRUE.
- If a subquery returns zero rows, the condition evaluates to TRUE.
For the keyword ANY or SOME, both are equivalent and interchangeable:
- “x = ANY (…)”: The value must match one or more values in the list to evaluate to TRUE.
- “x != ANY (…)”: The value must not match one or more values in the list to evaluate to TRUE.
- “x > ANY (…)”: The value must be greater than the smallest value in the list to evaluate to TRUE.
- “x < ANY (…)”: The value must be smaller than the biggest value in the list to evaluate to TRUE.
- “x >= ANY (…)”: The value must be greater than or equal to the smallest value in the list to evaluate to TRUE.
- “x <= ANY (…)”: The value must be smaller than or equal to the biggest value in the list to evaluate to TRUE.
- If a subquery returns zero rows, the condition evaluates to FALSE.
We will continue with more advanced SQL queries in the next post.