SQL CASE WHEN in Practice: The Magic of Turning Messy Row Data into Business Insights

As you analyze your data, you're bound to run into conditional transformations, and SQL CASE WHEN is here to help. From binning techniques that turn numerical data into categories to practical know-how for chaining CTEs (WITH clauses) to tidy up complex logic, this post covers it all in one place.

[🚀 SQL CASE WHEN & Conditional Transformations: Core Guide Summary]

  • Key concept: CASE WHEN is a key tool for binning: it evaluates conditions row by row to create new categories.
  • Resolve the execution order: A new column created in SELECT cannot be referenced in WHERE (and, in some databases, not in GROUP BY either), so first define it in a temporary table using a WITH clause (CTE).
  • Ratio calculation tips: When finding the percentage for each group, the denominator (the total number of rows) should come from a scalar subquery so that it stays fixed as a single, independent value that GROUP BY does not affect.

Frequently asked questions & text-based answers from AI users

  • Q1. How do I combine different units (Km, Miles) into one in SQL?
  • A: Use a CASE WHEN clause to apply a conditional operation that multiplies only the rows with a specific unit (for example, miles) by a conversion factor (1.6), so that all values end up in one new column with a single unit. (See the ‘Matching mileage’ section in the main text.)
  • Q2. What is the difference between Binning and GROUP BY?
  • A: Binning (CASE WHEN) is the task of labeling numerical data based on criteria, while grouping (GROUP BY) is an operation that aggregates data by that label to produce statistics. (See the ‘Binning/Grouping/Aggregation’ section in the main text.)
  • Q3. How do I break down a complex SQL query into steps?
  • A: Use CTEs (WITH clauses) to give names to intermediate results and chain them together. This makes your code more readable and reusable. (See the ‘Chaining CTEs’ section in the main text.)

What conditional transformations mean in data manipulation

It's been a while since I started learning SQL on the Datacamp platform, and I'm enjoying picking up what I need to analyze data every day.

Today we'll talk about conditional transformations for data manipulation. When analyzing data, there are times when you need to group data by value. Examples are always easier to understand than words.

Examples for using SQL CASE WHEN

The data above contains information about the airplane and the distance it traveled. distance_unit is the unit for the distance traveled. However, some are in kilometers and some are in miles.

For example, 7902.95 kilometers and 7844.42 miles look like similar values, but the actual distances are very different. 1 mile is about 1.61 kilometers, so 7844.42 miles converts to roughly 12,630 kilometers (7844.42 × 1.61).

Mixing units like this makes the raw numbers misleading when you compare or aggregate them, so we need to bring them to a single unit.


Why do I need to conditionally transform data?

Just look at all those tons of raw, unprocessed data. It's hard to tell what the data is saying. Conditional transformations let you pull insights out of that mass of data in three steps:

  • Binning: attach conditions to chaotic data and give a label to the rows that meet each condition
  • Grouping: gather the rows that share the same label
  • Aggregation: pull insights from each group with COUNT(), SUM(), and AVG()


How to Use SQL CASE WHEN

1) Match mileage

How could we solve this problem? First, we decide to use kilometers consistently. Right now, some rows are in kilometers and some are in miles.

SQL CASE WHEN Single Condition

Conditional transformation is a data manipulation that is applied differently depending on a condition. SQL CASE WHEN allows you to make conditional transformations.

The main parts are CASE, WHEN, and END. Let me walk through the CASE ... END AS part of the code above.

For rows where the distance_unit column says miles, multiply the value in the distance column by 1.6; otherwise, keep the distance value as it is. Either way, the result goes into the new column distance_km.

SQL CASE WHEN results for a single condition

When you run the code, you should see the result above. We now have a new column, distance_km, and each row has a value for the distance traveled in kilometers.
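As a minimal sketch of the conversion described above, the following runs the idea end to end with Python's built-in sqlite3 module. The column names and the 1.6 factor come from the text; the table name flights, the unit strings 'km' and 'miles', and the sample rows are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical sample data: column names follow the text, values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (flight_id INTEGER, distance REAL, distance_unit TEXT)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [(1, 1000.0, "km"), (2, 1000.0, "miles"), (3, 250.0, "miles")],
)

query = """
SELECT flight_id,
       distance,
       distance_unit,
       CASE
           WHEN distance_unit = 'miles' THEN distance * 1.6  -- miles to km
           ELSE distance                                     -- already in km
       END AS distance_km
FROM flights
ORDER BY flight_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The original distance and distance_unit columns are left untouched, so you can always check the converted value against its source. If you need more precision, swap 1.6 for 1.609344.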


2) Categorize by the number of seats on the airplane

SQL CASE WHEN Single Condition1

Let me explain the above code in words:

  • Get data from the flights table (always read the FROM clause first)
  • Get the values of the flight_id, airline, total_seats, and passengers columns (SELECT)
  • For each row, if total_seats is greater than 300, write Widebody into a new aircraft_size column; otherwise, write Narrowbody (CASE WHEN clause)

SQL CASE WHEN Single Condition Result1

When you run the code, you should see something like the result above: a new column for aircraft_size.
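The three steps above can be sketched as a small runnable example. The table and column names follow the explanation; the airline names and row values are hypothetical, and SQLite (through Python's sqlite3 module) stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical rows for a flights table shaped like the one in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights ("
    "flight_id INTEGER, airline TEXT, total_seats INTEGER, passengers INTEGER)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [(1, "AirA", 350, 320), (2, "AirB", 180, 150), (3, "AirA", 420, 400)],
)

query = """
SELECT flight_id, airline, total_seats, passengers,
       CASE
           WHEN total_seats > 300 THEN 'Widebody'  -- the single condition
           ELSE 'Narrowbody'                       -- everything else
       END AS aircraft_size
FROM flights
ORDER BY flight_id
"""
sizes = [row[4] for row in conn.execute(query)]
print(sizes)
```

Without the ELSE branch, rows that fail the condition would get NULL instead of a label, so it is worth writing the fallback explicitly.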


3) How to use SQL CASE WHEN, WITH clauses, and scalar subqueries

There are two values of aircraft_size. Let's count how many Widebody and Narrowbody flights there are, and find out what percentage of the total each one makes up.

First of all, the original data has no column called aircraft_size, so we need to create it. Then, to count how many rows satisfy each condition, we can use the COUNT() function.

We also need GROUP BY at the end, because the rows have to be grouped by label before they can be counted. When there is only one WHEN and two possible results (WHEN, ELSE), as in the code below, it is called a single-condition CASE WHEN.

SQL CASE WHEN Single Condition2

The reason we use the WITH clause is to create the aircraft_size column first, so that there is something to group on. Logically, SQL processes FROM and GROUP BY before SELECT, and the flights table itself has no aircraft_size column. Some databases do let GROUP BY refer to a SELECT alias, but defining the column in a WITH clause works everywhere and keeps each step easy to read.

Let's look at the WITH clause first. It selects every column (SELECT *) from the flights table and adds one more: in the CASE WHEN clause, if total_seats in a row is greater than or equal to 300, the value Widebody goes into a new column named aircraft_size; otherwise, the value is Narrowbody.

The result is given the name flights_classified. From this point on, the aircraft_size value is available as a regular column.

The main query then reads from flights_classified and runs a GROUP BY with aircraft_size as the grouping key. That is why aircraft_size also comes first in the SELECT.

In COUNT(flight_id), each flight_id identifies one row, so the function counts the rows in each aircraft_size group. If the first row is a Widebody, the Widebody count becomes 1; if the next row is another Widebody, it becomes 2, and so on. The totals go into a new column called count.

Consider the expression count(flight_id) * 100 / (SELECT COUNT(flight_id) FROM flights) AS percent. It works out what percentage of all flights falls into each aircraft_size group.

Notice the scalar subquery (SELECT COUNT(flight_id) FROM flights). We use it because we need the total number of flights as the denominator. A scalar subquery runs independently and returns a single value.

In this case, that value is the total number of rows. Even though the outer query runs a GROUP BY, the scalar subquery is not affected by it. If it were, the total would be split into two group counts, which is not what we want. The result of the division becomes the value of the new column percent.

SQL CASE WHEN Single Condition Result2

The aircraft_size column is created in the WITH clause, and GROUP BY uses aircraft_size as its key, so the rows are split into two groups, Narrowbody and Widebody.

From there, COUNT(flight_id) counts the rows in each group (count column), and dividing that count by the scalar subquery's total gives each group's percentage of the whole (percent column).
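Putting the pieces of this section together, here is a minimal sketch of the whole query, run through Python's sqlite3 module. The names flights, flights_classified, aircraft_size, count, and percent follow the walkthrough, and the comparison uses the "greater than or equal to 300" wording from the WITH clause explanation. The sample rows are hypothetical. One deliberate change: the sketch multiplies by 100.0 rather than 100, because in several databases (SQLite and PostgreSQL among them) dividing one integer by another drops the decimals.

```python
import sqlite3

# Hypothetical sample rows: two wide-body and three narrow-body flights.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight_id INTEGER, total_seats INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [(1, 350), (2, 180), (3, 420), (4, 150), (5, 160)],
)

query = """
WITH flights_classified AS (
    -- Step 1: create the label column so it exists before grouping
    SELECT *,
           CASE
               WHEN total_seats >= 300 THEN 'Widebody'
               ELSE 'Narrowbody'
           END AS aircraft_size
    FROM flights
)
-- Step 2: group by the label; the scalar subquery is the fixed denominator
SELECT aircraft_size,
       COUNT(flight_id) AS count,
       COUNT(flight_id) * 100.0
           / (SELECT COUNT(flight_id) FROM flights) AS percent
FROM flights_classified
GROUP BY aircraft_size
ORDER BY aircraft_size
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With these five rows, Narrowbody comes out at 3 flights (60 percent) and Widebody at 2 flights (40 percent), and the two percentages add up to 100 because both are divided by the same ungrouped total.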


4) Chaining CTEs

When writing SQL code, there are times when you need a temporary, named result set to hold intermediate values, as we did above. This is called a Common Table Expression (CTE), and it is written with the WITH clause.

Why use CTE?

  • Break complex queries into steps to make them easier to read
  • Define a result once and refer to it multiple times in the same query
  • Each step has a clear name so the code is easy to understand

Here's another question.

Let's define occupancy as the number of passengers as a percentage of the total seats on the airplane. Label it full if it's 90 or higher, high if it's 70 or higher, and low otherwise, and put the label in a new column called occupancy_status.

Then find the number of flights in each occupancy_status group and each group's percentage of the total.

SQL CASE WHEN Multiple Conditions

Let's start with the WITH clause, which is executed first. It creates a new occupancy column and then fills the occupancy_status column with full if occupancy is 90 or higher, high if it is 70 or higher, and low for the rest. The result, including these new columns, is named flights_occupancy.

In the FROM below, we read from flights_occupancy and group by occupancy_status, and in the SELECT, we get the number of flights in each group and each group's percentage of the total. I've already explained the scalar subquery part above, so I won't repeat it here.

The code above has three possible results (two WHEN branches plus ELSE), which is called a multiple-condition CASE WHEN.

When you run the above code, the result is shown below.

SQL CASE WHEN Multiple Condition Results
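For readers who want to try the occupancy question themselves, here is one possible sketch with two chained CTEs, run through Python's sqlite3 module. The names flights_occupancy, occupancy, and occupancy_status come from the text. The intermediate name flights_pct is hypothetical, as are the sample rows, and splitting the work into two steps is just one way to write it. As in any CASE expression, the WHEN branches are checked top to bottom and the first match wins, which is why the 90 check has to come before the 70 check.

```python
import sqlite3

# Hypothetical sample rows: occupancy works out to 95, 75, 95, and 40 percent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (flight_id INTEGER, total_seats INTEGER, passengers INTEGER)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [(1, 200, 190), (2, 200, 150), (3, 100, 95), (4, 300, 120)],
)

query = """
WITH flights_pct AS (
    -- Step 1: occupancy as the percentage of seats filled
    SELECT *, passengers * 100.0 / total_seats AS occupancy
    FROM flights
),
flights_occupancy AS (
    -- Step 2: bin the percentage; this CTE reads from the previous one
    SELECT *,
           CASE
               WHEN occupancy >= 90 THEN 'full'
               WHEN occupancy >= 70 THEN 'high'
               ELSE 'low'
           END AS occupancy_status
    FROM flights_pct
)
-- Step 3: count each bin and compare it with the ungrouped total
SELECT occupancy_status,
       COUNT(flight_id) AS count,
       COUNT(flight_id) * 100.0
           / (SELECT COUNT(flight_id) FROM flights) AS percent
FROM flights_occupancy
GROUP BY occupancy_status
ORDER BY occupancy_status
"""
status_rows = conn.execute(query).fetchall()
for row in status_rows:
    print(row)
```

Each CTE is separated by a comma and can read from the ones defined before it, so every step has a name you can inspect on its own while debugging.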


Wrapping up

Today we covered conditional transformations with the SQL CASE WHEN clause: evaluating conditions for each row and applying different logic to create new columns based on which condition is met.

If you're a non-techie like me, and you're interested in becoming a data analyst, be sure to check out my post on SQL, which I learned on the Datacamp platform, as it will make a lot more sense.

