Basic SQL Analysis Spells For Data Wizards

Altamash akber
6 min readApr 25, 2023

--

“Sequel is the best data manipulation tool for analysis”, I have ever come to know!

Yeah, I back this statement as SQL gives you the leverage of breaking down the data for the “analyst first approach” That’s what I call my instinctive approach for data analysis apart from the statistical and visual techniques.

If you are a data professional or you are planning to switch your career to become Data Analyst, Data Scientist, or Data Engineer, SQL is the best basic tool you can gear up on.

In this blog I will be sharing some of the basic SQL statements/clauses that are categorized to be DQL or DML statements, I won’t be discussing this here maybe later sometime but here’s the flow of this blog:

SELECT CLAUSE

FROM CLAUSE

WHERE CLAUSE

GROUP BY CLAUSE

ORDER BY CLAUSE

HAVING CLAUSE

SELECT CLAUSE

The SELECT statement is a fundamental command in Structured Query Language (SQL) that is used to specify which columns to retrieve from one or more tables and how to filter, group, and sort the resulting data. The basic syntax of the SELECT statement is as follows:

Syntax:

SELECT Column_name FROM Table_name

Here are some example use cases for select statements:

So here I have used the SELECT clause with “*” which is read as “All” with FROM clause indicating the table I have to fetch this data from.

Well for large databases and corporate systems * isn’t considered to be the best practice as it takes too much of resources, so it is advised to specify columns or other alternatives for optimization. More on to this in another conversation, but here’s how I have done this.

Here I have specified all the columns that were required.

We can also use SELECT with TOP to get the number of records for that condition from the specified columns and tables

Here is how it's done:

FROM Clause

In SQL, the FROM clause is used to specify the data source from which the data will be retrieved or analyzed. It is one of the most fundamental clauses in SQL, and it is required in every SQL query that retrieves data from a database table.

The syntax of FROM clause is simple just the table name after the clause and it further can be used with joins to connect with other tables but that will be discussed another time.

Syntax:

SELECT Column_name FROM Table_name

Here’s a use-case example:

I have used the SELECT statement with specifying columns and with FROM clause I have specified the Worksheet$ table of Parental Leave and have used INNER JOIN to connect with the Industry_data table.

WHERE Clause

The WHERE clause in SQL is used to filter the data that is returned in a query based on specified conditions. It is one of the most commonly used clauses in SQL, and it allows you to retrieve only the data that meets specific criteria.

Syntax:

SELECT Column_name FROM Table_name
WHERE condition

Where is used with the conjunction of SELECT statement and it is usually placed after the FROM clause.

Here are a few examples of WHERE Clause:

This query is a simple SELECT statement that retrieves columns from the “worksheet$” table of the parental leave database. The query uses a WHERE clause to filter the data and retrieve only the records that do not belong to the “Arts & Entertainment” industry.

Here is an alternative to the above query:

Here everything is pretty much the same apart from the LIKE clause which is used instead of this “<>” read as “Not operator”. LIKE clause basically gets the data that’s having a structure similar to the defined arrangement of words/symbols, in this case, I have used ‘Arts%’ which states that all the arrangements that start with Arts and follow a range of letters afterward.

This kind of alternative would not be recommended if you have similar structure words if there were an industry named “Arts Photography” it would have also been excluded and that wouldn’t be the result we will be looking for, so the context of data is important here.

Here’s another example but I wouldn’t be explaining this one, it's your task to explain it in the comment section:

SELECT p.Industry, p.[Paid Maternity Leave],p.[Unpaid Maternity Leave],p.[Paid Paternity Leave], p.[Unpaid Paternity Leave],
i.competitiors AS companies, i.Market_Size
FROM ParentalLeave.dbo.Worksheet$ p
INNER JOIN ParentalLeave.dbo.Industry_data i ON p.Industry=i.Industry_name
WHERE i.Market_Size between 3 AND 6
AND p.Industry='Advertising' OR p.Industry='Business Services: Staffing & Outsourcing'

GROUP BY Clause

The GROUP BY clause in SQL is used to group rows in a table based on one or more columns. It is commonly used in data analysis to perform aggregations on data such as sum, count, average, and maximum/minimum values.

Syntax:

SELECT Column_name FROM Table_name
GROUP BY Column_name

Here are a few examples:

Here I have specified Industry_name and avg_market_size columns with the SELECT statement from the Industry_data table and then grouped the results by Industry_name so it would compile the data with respect to each industry.

Have you thought what if we don’t use group by statement here?

<iframe src=”https://giphy.com/embed/Qs1uMrvmHAKIUXxO2g" width=”480" height=”400" frameBorder=”0" class=”giphy-embed” allowFullScreen></iframe><p><a href=”https://giphy.com/gifs/Friends-friends-tv-the-one-with-cuffs-Qs1uMrvmHAKIUXxO2g">via GIPHY</a></p>

It‘s because, Without the GROUP BY statement, it is not possible to determine which rows should be grouped together and which rows should be treated as separate entities. The GROUP BY statement provides a logical way to group related rows together based on specific columns, allowing the aggregate functions to operate on each group independently.

Note that “AVG” is an aggregate function that gets the average of the specified numerical attributes,

More about aggregate functions in a separate dialog.

Another example of a Group By statement:

It's a pretty straightforward query so leaving it up to you to illustrate.

Having Clause

Let’s see if we can get to filter out the aggregated data with WHERE Clause.

So, we can only do this with the HAVING clause, why? Let’s get to know the HAVING clause:

The HAVING clause in SQL is used in conjunction with the GROUP BY clause to filter the results of a grouped query. It allows you to specify a condition that must be satisfied by the aggregated data in order for the group to be included in the query results.

Which we just tested now,

Syntax:

SELECT Column_name FROM Table_name
GROUP BY Column_name
HAVING Aggregate(Column_name) condition

Now let's address the above error by using the HAVING clause:

See, We Got it Right!

SUMMARY:

So, We are almost clear with the basic Statements/Clauses in SQL for data analysis. We talked about SELECT, FROM, WHERE, GROUP BY, ORDER BY & HAVING Clauses. Here’s a short brief about the use case of these clauses:

SELECT: Specifies the columns to retrieve from the table(s).

FROM: Specifies the table(s) from which to retrieve data.

WHERE: Filters the data based on a specified condition(s).

GROUP BY: Groups the data based on one or more columns.

HAVING: Filters the grouped data based on a specified condition(s).

ORDER BY: Sorts the resulting data based on one or more columns.

This was our first conversation of our series “SQL Analysis Spells For Data Wizards”, we would be going into depth with JOINS, Subqueries and Window Functions in another conversation.

Till then hear this out!

And don’t forget to explain the where clause drill, I’m expecting good response!

--

--

Altamash akber

Passionate about transforming complex data into actionable insights that drive informed decision-making.