Querying Databases in SQL Part 2
July 04, 2018
Objectives
- Filter data using advanced SQL commands (Group By, Having).
- Modify tables using Update, Drop, and Truncate
- Creating relationships between tables using Joins (Inner, outer, union)
Filtering Data using Advanced Commands
For the following example queries, continue to use the WholeSale Database from the previous section on SqlFiddle.
Historical Sales by Client (Sum
, AS
, Group By
)
Impressed by your quick turnaround, your boss wants you to send him all historical sales data per client. In this query we will learn how to give names to columns in the final report.
To do this, simply add AS custom_column_name
after the column_name
. In the query below, you can see that client_name
will be named as Client.
Next, we will perform our first calculation. All sales are under the Total_Cost
column in the sales table. The goal is to sum the Total_Cost
column by client. In order to do that you have to do two things:
- First, write
sum(column_name)
to indicate which calculation needs to be performed on which column - Next, after specifying which table to retrieve data from, write
group by column_name
to indicate which column should be aggregated to sum the cost.
In order to sort the result to show client with highest sale first, simply use order by
. As you can see, instead of specifying column name, you can simply put a number which in this case would mean to sort data by second column, i.e. Sale
. Using desc
at the end indicates it should be sorted in the descending order.
Query:
SELECT client_name AS Client, sum(total_cost) AS Sale
FROM sales
GROUP BY client_name
ORDER BY 2 DESC;
Result:
Historical Sales by Client by Month
Let’s take the above example a step further.
We will now display total sale by client by month. Since our date of sale is stored as a regular date with day, month and year, we have to extract month from it.
Here we would like to discuss two functions: month
and monthname
.
- To get the numeric value of the month use the
month
function. - To get the name of the month, use the
monthname
function.
For example, if date is ’03-26-2018’, the month
function will extract 3
from the date and the monthname
function will output March
.
Next, since we are aggregating across two columns both of them should be mentioned in the group by
clause. Finally, the order by
clause is an excellent example of how you can you both column name and a number to indicate which column to sort and in what order. The reason we use month for Order By
clause is that you want to sort it by the month’s numeric value instead of name. If you used month name in the clause then it would be ordered as April, February and March instead of February, March and April.
Query:
SELECT client_name AS Client, monthname(sale_date) AS Sale_Month, sum(total_cost) AS Total_Sale
FROM sales
GROUP BY client_name, month(sale_date)
ORDER BY month(sale_date), 3 DESC;
Result:
List of Total Product Sale by Client
Let’s preview an advanced query. For this query you will pull data from both the Product
and Sales
table. The goal of this query is to view which product each client ordered and the total amount spent per product. First, in order to select data from multiple tables they should be specified after from
and separated by commas. Since tables can have columns that share a name, after select each column will be specified as table_name.column_name
. If you wanted to take all data from one table, you can write table_name.*
. This would pull all columns from that table.
As learned earlier, you can use as
to provide a custom name to any column for the final output.
Here is where ID is important. Similar to vlookup
in Excel, there should a column that’s a reference point across both tables. The Product_ID
column in the Sales
table are in fact IDs from the ID
column of the Product
table. In order to pull data from both tables, we write a where clause to indicate that ID
from Product
table should match Product_ID
of the Sales table.
Query:
SELECT sales.client_name, product.product, sum(sales.total_cost) as Total_Sale
FROM Sales, Product
WHERE Sales.Product_ID = Product.ID
GROUP BY sales.client_name, product.product;
Result:
Having
Now let’s say you only want to see records where the total sale was more than $200. You cannot use where
or and
clause since the column you want to filter is an aggregate column. In this case, the HAVING
clause can be used. At the end of the above query, you simply add the condition that you only want results that satisfies the condition that total sale is greater than 200.
Query:
SELECT sales.client_name, product.product, sum(sales.total_cost) as Total_Sale
FROM Sales, Product
WHERE Sales.Product_ID = Product.ID
GROUP BY sales.client_name, product.product
HAVING Total_Sale > 200;
Result:
Modifying Tables using Update, Drop, and Truncate
Updating Entries
Your sales manager notified you that the database incorrectly shows that we sold 6 boxes to Panda Restaurant. He wants you to update the database to reflect that only 5 boxes were sold. In order to do that we will use the UPDATE
function.
Please note that code should be written in the schema (left) panel to add or change data. Since incorrect code can result in more or less updates than desired or worse case corrupts the database, best practice is to first write a select statement. Running the select statement will show you which records will be updated. In our example, we want to select the record where client is Panda Restaurant and Product ID is 2 since 2 corresponds to Box.
Query:
SELECT * FROM Sales
WHERE client_name = 'Panda Restaurant' and Product_ID = 2;
Result:
The select statement is referring to the correct entry. Since the Quantity
changes, it will also impact Total_Cost
. You want to update both the Quantity
to 5 and Total_Cost
to 250 (5 boxes at $50/box).
Instead of selecting, you will start the query with “UPDATE” and before using the where clause to define conditions you will specify the column names that need to change and their corresponding values.
Syntax:
UPDATE Table_Name
SET Column_Name_1 = Updated Value_1, …., Column_Name_n = Updated Value_n,
WHERE condition to get correct record(s);
Query:
On the Left Panel, type the following below all of the other CREATE
and INSERT
statements, and then hit “Build Schema”.
UPDATE Sales
SET Quantity = 5, Total_Cost = 250
WHERE client_name = 'Panda Restaurant' and Product_ID = 2;
Let’s run our SELECT
query again to make sure it worked. Type the following into the right Query Panel.
SELECT * FROM Sales
WHERE client_name = 'Panda Restaurant' and Product_ID = 2;
Result:
Drop Table
As the database grows there may be obsolete tables. An annual cleanup is recommended to ensure we are not wasting database spaces with old or archaic data which in turn affects performance of the code. Let’s delete the entire Sales
table.
Syntax:
DROP TABLE table_name;
Query:
DROP TABLE Sales;
Run the above in the left panel!
If you click Browser on the left panel, you can see that we only have the Product table now. Sales table has been deleted. Be extremely careful before you drop a table — make sure everything is right, i.e. the table name, etc.
Truncate Table
There could be a scenario where you want to delete all data but still keep the table. When dealing with large datasets, database designers will often have a historical table. An example could be a current and historical revenue table. Current revenue table will have this year’s revenue data while a historical revenue table will have all revenue data since inception. After moving data from current to historical table, you will want to delete all data from current table. In order to that, you use the “TRUNCATE” function.
Syntax:
TRUNCATE TABLE table_name;
For practice, we will delete data from our Sales table. Since we deleted the Sales table above, let’s go ahead and rebuild the entire schema. Copy and paste the CREATE
and INSERT
statements from the setup section above into the left panel. Once it’s built, type below code on the left panel, after the above statements.
Query:
TRUNCATE TABLE Sales;
If you click Browser, you can see that we still have both tables.
Result:
Write the below query on the right panel. From the result you can see that table exists but it has no records.
Let’s summarize the key difference between Drop and Truncate function. The former deletes the table and all content while the latter only deletes the content of the table.
Joins
SQL can return data across multiple tables. Joins are used to specify which data is needed from the tables. Join has two parts to it. First you specify a common value that should match between tables to return value. Next, you specify which values to return. Below are the types of joins.
- Inner Join
- Right Join
- Left Join
- Full Outer Join
With inflation rising and after reviewing client reviews, the company has decided to raise prices for some products and add some new products too.
Let’s use a new schema for this portion. Copy the following into the left panel, and then click “Build Schema”.
CREATE TABLE Product_Old
(`ID` int, `Product` varchar(128), `Cost` int)
;
INSERT INTO Product_Old
(`ID`, `Product`, `Cost`)
VALUES
(1, 'Cup', 10),
(2, 'Box', 50),
(3, 'Plate', 15),
(4, 'Spoon', 7),
(5, 'Knife', 7),
(6, 'Fork', 7),
(7, 'Paper Napkin', 15),
(8, 'Ketchup', 9),
(9, 'Salt', 2),
(10, 'Pepper', 2)
;
CREATE TABLE Product_New
(`ID` int, `Product` varchar(128), `Cost` int)
;
INSERT INTO Product_New
(`ID`, `Product`, `Cost`)
VALUES
(1, 'Cup', 12),
(2, 'Box', 55),
(3, 'Plate', 15),
(4, 'Spoon', 8),
(5, 'Knife', 8),
(6, 'Fork', 8),
(7, 'Paper Napkin', 15),
(8, 'Ketchup', 9),
(9, 'Salt', 2),
(10, 'Pepper', 2),
(11, 'Mustard', 3),
(12, 'Mayonnaise', 3),
(13, 'Silver Spoon', 20),
(14, 'Silver Fork', 20),
(15, 'Star Wars napkin', 17)
;
Inner Join
It only returns values that match in both tables. This is similar to the intersection concept in Set theory.
Query:
SELECT *
FROM Product_old
INNER JOIN Product_new ON Product_old.ID = Product_new.ID;
Result:
If you wanted to see the list of products with a change in cost you can use a where clause learned earlier in the lesson. Using the not equal to, <>
, operator in the where clause, you can specify that the results should only include products where the cost doesn’t match.
Query:
SELECT *
FROM Product_old
INNER JOIN Product_new ON Product_old.ID = Product_new.ID
where Product_old.cost <> Product_new.cost;
Result:
Right Join
This will give you all values that match between both tables as well as any extra values in the second table.
Query:
SELECT *
FROM Product_old
RIGHT JOIN Product_New ON Product_old.ID = Product_new.ID;
Result:
Since we’ve added products to the new table, you can see that it includes all the values from the new table. Additionally, it shows null values for the old table to indicate that there are no matching values from old table.
Left Join
This will give you all values that match between both tables as well as any extra values in the first table.
Query:
SELECT *
FROM Product_old
LEFT JOIN Product_New ON Product_old.ID = Product_new.ID;
Result:
Since there are no products in the old table that aren’t present in the new table, the query shows all products from both tables.
Full Outer Join
This will give you all values from both tables. This is similar to the union concept of set theory. MySQL doesn’t recognize this function so we will use an alternative way of getting all values. This operation is called is “UNION”. It combines the results from two select statements.
Query:
SELECT *
FROM Product_old
RIGHT JOIN Product_New ON Product_old.ID = Product_new.ID
UNION
SELECT *
FROM Product_old
LEFT JOIN Product_New ON Product_old.ID = Product_new.ID;
Result:
Joins are an excellent way to analyze data across tables. You can view which data points changed over time, get a more holistic view of your data and understand business drivers.
Ask SQL!
SQL can answer your data related question that can directly impact your business’ bottom line.
- Who are my Top 5 clients? This information can be used to offer incentives to keep your top clients loyal.
- What is the busiest week or busiest month? This can help manage staff and your time better.
- Send me a notification when my inventory for cups drops below 100 cases. This ensures you always have all your products in stock.
- Which product is high in stock and which clients are the most likely to buy them? Marketing division can send promotions for those products to those clients.
Leveraging data you collect to help you make important sales and marketing decisions without any prior programming knowledge — this is the reason SQL is used widely across industries.
We will learn how to write more queries with a larger database to get such answers in the next article. Stay tuned!