SQL Query Showcase
Introduction:
This is a showcase of sample SQL queries that I made for practice in learning the language. The sources listed before each query link to the direct place the content came from, unless noted otherwise. Each heading represents a concept of SQL that I touched on.
Primary Sources: DataLemur by Nick Singh SQL-practice.com by Boolean-algebra
Common Table Expression (CTE)
Source: Previous Interview Assessment
Context: A restaurant company wants to use this dataset to inform management of trends with how customer order their products
Table: Customers
Index | customerOloRef | firstName | lastName | emailAddress | phoneNumber |
---|---|---|---|---|---|
0 | 1210852832 | Ashlee | Black | 718798@test.com | 18539137645 |
1 | 1210862959 | Karen | Taylor | 908278@test.com | 17118714180 |
2 | 1211222409 | Rachel | Glover | 990342@test.com | 14160108199 |
3 | 1210884227 | Kevin | Barnett | 816118@test.com | 16375662719 |
4 | 1211205145 | Andrew | Oneal | 694351@test.com | 18535103663 |
5 | 1210892053 | Matthew | Cobb | 403207@test.com | 12141628213 |
6 | 1210878061 | Julie | Rice | 107345@test.com | 11470213737 |
7 | 1210868677 | Joshua | Williams | 922872@test.com | 10627821511 |
8 | 1210850046 | Kimberly | Miller | 734453@test.com | 14769450538 |
…
Table: Orders
Index | orderOloRef | customerOloRef | storeRef | timePlaced | orderedFromFave | source | handoff | paymentType | zipcode |
---|---|---|---|---|---|---|---|---|---|
0 | 1526136030 | 1212424032 | 25 | 2021-08-10 | FALSE | CallCenter | CurbsidePickup | CreditCard | 60134 |
1 | 1636426442 | 1194539172 | 44 | 2021-08-10 | FALSE | CallCenter | CurbsidePickup | CreditCard | 65708 |
2 | 2116129966 | 1193104823 | 22 | 2021-08-10 | FALSE | API | CounterPickup | CreditCard | 60134 |
3 | 2112894463 | 1192380993 | 24 | 2021-08-10 | FALSE | API | Dispatch | CreditCard | 60543 |
4 | 1513983208 | 1194885843 | 13 | 2021-08-10 | FALSE | API | Dispatch | CreditCard | 60018 |
5 | 1645861726 | 1192678698 | 9 | 2021-08-10 | FALSE | CallCenter | Dispatch | CreditCard | 60004 |
6 | 1868589889 | 1220411684 | 27 | 2021-08-10 | FALSE | API | CurbsidePickup | CreditCard | 60431 |
7 | 1874887693 | 411086145 | 15 | 2021-08-10 | FALSE | API | Dispatch | CreditCard | 60107 |
8 | 1917609011 | 1191607575 | 37 | 2021-08-10 | FALSE | CallCenter | Dispatch | CreditCard | 60452 |
…
Table: Products
Index | orderOloRef | orderProduct | quantity | cost | orderedFromFave | source | handoff | paymentType | zipcode |
---|---|---|---|---|---|---|---|---|---|
0 | 1332720190 | Chicken Tenders 6 Piece | 1 | 8.74 | FALSE | CallCenter | CurbsidePickup | CreditCard | 60134 |
1 | 1377148167 | Big Beef Sandwich | 1 | 10.86 | FALSE | CallCenter | CurbsidePickup | CreditCard | 65708 |
2 | 1674285799 | Cheeseburger | 1 | 0 | FALSE | API | CounterPickup | CreditCard | 60134 |
3 | 1560588602 | Caesar Salad | 1 | 0 | FALSE | API | Dispatch | CreditCard | 60543 |
4 | 1514784642 | Large French Fries | 3 | 3.74 | FALSE | API | Dispatch | CreditCard | 60018 |
5 | 1731086029 | Cheese Sauce | 2 | 1.24 | FALSE | CallCenter | Dispatch | CreditCard | 60004 |
6 | 1927656374 | Chocolate Cake Slice | 2 | 4.36 | FALSE | API | CurbsidePickup | CreditCard | 60431 |
7 | 1621969249 | *New Classic Beef Bowl | 1 | 8.49 | FALSE | API | Dispatch | CreditCard | 60107 |
8 | 1781607712 | Italian Beef Sandwich | 1 | 8.11 | FALSE | CallCenter | Dispatch | CreditCard | 60452 |
…
Step 1: Tell which customers order French Fries in their past orders.
SELECT *,
CASE WHEN orderproduct LIKE '%French Fries%'
THEN 'TRUE' ELSE 'FALSE' END AS isFrenchFry
FROM "Products"
GROUP BY index, orderproduct, orderoloref, quantity, cost;
Step 2: What is the purchasing frequency of customers that order certain products?
SELECT DISTINCT p1.orderproduct as "product_a",
p2.orderproduct as "product_b",
COUNT(*) as PurchaseFrequency,
COUNT(*) * 100.0/ SUM(COUNT(*)) OVER () as PercentFreq
FROM "Products" AS p1
INNER JOIN "Products" AS p2
ON p1.orderproduct <> p2.orderproduct
AND p1.orderoloref = p2.orderoloref
GROUP BY p1.orderproduct, p2.orderproduct
ORDER BY PurchaseFrequency DESC;
Step 3 & 4: What customers ordered the biggest quanitity of the most popular items on the menu?
WITH porders AS
(
SELECT CONCAT(cust.firstname, ' ', cust.lastname) AS name,
cust.emailaddress,
cust.phonenumber,
p1.orderproduct as "product_a",
p2.orderproduct as "product_b",
SUM(p1.quantity) as "total_quantity",
ROUND(SUM(p1.cost::numeric),2) as "total_cost",
COUNT(*) as PurchaseFrequency,
ROUND(COUNT(*) * 100.0/ SUM(COUNT(*)) OVER (), 3) as PercentFreq
FROM "Products" AS p1
INNER JOIN "Products" AS p2
ON p1.orderproduct <> p2.orderproduct
AND p1.orderoloref = p2.orderoloref
JOIN "Orders" AS ord
ON ord.orderoloref = p1.orderoloref
JOIN "Customers" AS cust
ON cust.customeroloref = ord.customeroloref
WHERE p1.orderproduct
LIKE '%French Fries%'
GROUP BY p1.orderproduct, p2.orderproduct, p1.cost, cust.firstname, cust.lastname, cust.emailaddress, cust.phonenumber
ORDER BY PurchaseFrequency DESC
)
SELECT *
FROM porders;
…
Source: Previous Interview Assessment
Context: What are the total revenue across months for the organization and what are noticable changes that I can observe from the data overtime for the last calendar year?
WITH CTE AS(
SELECT DATE_TRUNC('month', CURRENT_MONTH) AS month,
YEAR(DATE_TRUNC('month', CURRENT_MONTH)) AS year,
ROUND((SUM(unit_value)) OVER (PARTITION BY DATE_TRUNC('month', CURRENT_MONTH)), 2) AS total_monthly_unit_value,
ROUND((SUM(mrr_value)) OVER (PARTITION BY DATE_TRUNC('month', CURRENT_MONTH)), 2) AS total_mrr_value,
ROUND((SUM(transaction_value)) OVER (PARTITION BY DATE_TRUNC('month', CURRENT_MONTH)), 2) AS total_monthly_transaction_value
FROM DATA_ANALYST_INTERVIEW.PUBLIC.HISTORICAL_MRR_TRANSACTIONS
WHERE CURRENT_MONTH BETWEEN '2021-01-01' AND '2022-12-01'
GROUP BY DATE_TRUNC('month', CURRENT_MONTH)
ORDER BY DATE_TRUNC('month', CURRENT_MONTH)
)
SELECT MONTHNAME(month) AS DATE, year, total_monthly_unit_value,
ROUND(((total_monthly_unit_value/lag(total_monthly_unit_value, 1) OVER (ORDER BY DATE_TRUNC('month', month))) - 1) * 100, 1) AS UV_percentage_change,
total_mrr_value,
ROUND(((total_mrr_value/lag(total_mrr_value, 1) OVER (ORDER BY DATE_TRUNC('month', month))) - 1) * 100, 1) AS MRR_percentage_change,
total_monthly_transaction_value,
ROUND(((total_monthly_transaction_value/lag(total_monthly_transaction_value, 1) OVER (ORDER BY DATE_TRUNC('month', month))) - 1) * 100, 1) AS TV_percentage_change,
ROUND((UV_PERCENTAGE_CHANGE + MRR_PERCENTAGE_CHANGE + TV_PERCENTAGE_CHANGE)/3, 1) as AVERAGE_PERCENTAGE_CHANGE
FROM CTE
GROUP BY month, year, total_monthly_unit_value, total_mrr_value, total_monthly_transaction_value;
-- What is the total revenue by account type?
SELECT pda.account_type, ROUND(SUM(hmrr.transaction_value), 2) AS Revenue
FROM DATA_ANALYST_INTERVIEW.PUBLIC.HISTORICAL_MRR_TRANSACTIONS AS hmrr
JOIN DATA_ANALYST_INTERVIEW.PUBLIC.DIM_ACCOUNTS AS pda ON pda.owner_user_id = hmrr.owner_id
GROUP BY account_type
ORDER BY pda.account_type;
…
Source: (https://datalemur.com/questions/sql-spare-server-capacity)
WITH demands AS (
SELECT
datacenter_id,
SUM(monthly_demand) AS total_demand
FROM forecasted_demand
GROUP BY datacenter_id)
SELECT demands.datacenter_id, demands.total_demand,
centers.name, centers.monthly_capacity AS total_capacity
FROM demands
INNER JOIN datacenters AS centers ON demands.datacenter_id = centers.datacenter_id;
Subquery example
Source: (https://datalemur.com/questions/sql-spare-server-capacity)
SELECT centers.datacenter_id, (centers.monthly_capacity -
demands.total_demand) AS spare_capacity
FROM ( SELECT datacenter_id,
SUM(monthly_demand) AS total_demand FROM forecasted_demand GROUP BY
datacenter_id) AS demands
INNER JOIN datacenters AS centers ON
demands.datacenter_id = centers.datacenter_id
ORDER BY centers.datacenter_id;
Aggregate Functions with SUM
Source: (https://www.sql-practice.com/)
Keyword: SUM, Difficulty: HARD
SELECT CASE WHEN (patient_id % 2 = 0) THEN 'Yes'
ELSE 'No'
END AS has_insurance,
SUM(
CASE WHEN patient_id % 2 = 0 THEN 10
ELSE 50
END) AS cost_after_insurance
FROM admissions
GROUP BY has_insurance
Show the provinces that has more patients identified as ‘M’ than ‘F’…
Keyword: HAVING, Difficulty: HARD
--Query A
SELECT pr.province_name
FROM patients AS pa
JOIN province_names AS pr ON pa.province_id = pr.province_id
GROUP BY pr.province_name
HAVING SUM(gender = 'M') \> SUM(gender = 'F');
--Query B, using COUNT
SELECT pr.province_name
FROM patients AS pa
JOIN province_names AS pr ON pa.province_id = pr.province_id
GROUP BY pr.province_name
HAVING COUNT(CASE WHEN gender = 'M' THEN 1 END) \> COUNT( CASE WHEN gender = 'F' THEN 1 END);
--Query C, using a subquery and SUM
SELECT province_name
FROM ( SELECT province_name, SUM(gender = 'M') AS n_male, SUM(gender = 'F') AS n_female
FROM patients pa
JOIN province_names pr ON pa.province_id = pr.province_id
GROUP BY province_name )
WHERE n_male \> n_female
Aggregate and then display as a percentage
Keyword: COUNT, Difficulty: HARD
SELECT concat( ROUND( ( SELECT COUNT(*) FROM patients WHERE gender = 'M'
) / CAST(COUNT(*) as FLOAT), 4 ) \* 100, '%' ) AS percent_of_male_patients
FROM patients
WHERE gender = 'M'
GROUP BY gender
--or...
--Query B
SELECT Concat(round(100 \* avg(gender = 'M'), 2),'%') AS percent_of_male_patients
FROM patients;
--Query C
SELECT round(100 \* avg(gender = 'M'), 2) \|\| '%' AS percent_of_male_patients
FROM patients;
Window function example RANK()
Source: (https://datalemur.com/questions/sql-third-transaction)
SELECT user_id, spend, transaction_date
FROM (SELECT user_id, spend, transaction_date, RANK() OVER (PARTITION BY user_id ORDER BY
transaction_date) AS rank_num FROM transactions) AS trans_num
WHERE rank_num = '3'
GROUP BY user_id, spend, transaction_date
Sending vs. Opening Snaps (CTE, CASE WHEN, Aggregates)
Source: (https://datalemur.com/questions/time-spent-snaps)
WITH activity AS ( SELECT ab.age_bucket,
SUM(CASE WHEN act.activity_type = 'open' THEN act.time_spent ELSE 0 END) AS time_open,
SUM(CASE WHEN act.activity_type = 'send' THEN act.time_spent ELSE 0 END) AS time_sent,
SUM(act.time_spent) as total_time FROM activities act JOIN age_breakdown ab ON act.user_id = ab.user_id
WHERE act.activity_type IN ('open', 'send')
GROUP BY ab.age_bucket)
SELECT age_bucket, ROUND(((time_sent / total_time)\* 100.0),2) AS sent_perc,
ROUND(((time_open / total_time)\* 100.0),2) AS open_perc
FROM activity
ORDER BY age_bucket DESC
Window functions for calculating rolling averages based on tweets
Source: (https://datalemur.com/questions/rolling-average-tweets)
SELECT user_id, tweet_date, TRUNC(AVG(tweet_count)
OVER(PARTITION BY user_id ORDER BY tweet_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS rolling_avg_3days
FROM (SELECT user_id, tweet_date, count(tweet_date) as tweet_count FROM tweets GROUP BY user_id, tweet_date) AS daily_tweets
GROUP BY user_id, tweet_date, tweet_count
ORDER BY user_id ASC, tweet_date ASC;
Compensation Outliers
Source: (https://datalemur.com/questions/compensation-outliers)
WITH payout
AS
(SELECT
employee_id,
salary,
title,
(AVG(salary) OVER (PARTITION BY title)) * 2 AS doub_avg,
(AVG(salary) OVER (PARTITION BY title)) / 2 AS half_avg
FROM employee_pay)
SELECT
employee_id,
salary,
CASE
WHEN salary > doub_avg THEN 'Overpaid'
WHEN salary < half_avg THEN 'Underpaid'
END AS status
FROM payout
GROUP BY
employee_id, salary, doub_avg, half_avg
HAVING
salary > doub_avg OR
salary < half_avg
ORDER BY
employee_id;
Photoshop Revenue Analysis
Source: (https://datalemur.com/questions/photoshop-revenue-analysis)
SELECT customer_id,
SUM(revenue) as revenue
FROM adobe_transactions
WHERE customer_id IN (SELECT customer_id FROM adobe_transactions WHERE product = 'Photoshop')
AND product <> 'Photoshop'
GROUP BY customer_id
ORDER BY customer_id ASC
Unique Money Transfer Relationships
Source: (https://datalemur.com/questions/money-transfer-relationships)
SELECT
COUNT(payer_id) / 2
FROM (SELECT
payer_id, recipient_id
FROM payments
INTERSECT
SELECT
recipient_id, payer_id
FROM payments)
as relationships;