Filtering SQL Data When complex conditions lead to incorrect results? The ‘middle column’ design is the answer

Filtering is a technique for extracting only the rows you need from a large amount of data. SQL data filtering methods range from simple to complex. If complex AND/OR conditions are making your head hurt, Intermediate Columns are the answer, as they make zero calculation errors.

[🚀 SQL Data Filtering Practical Guide: Key Takeaways].

Q: How do I filter complex AND/OR conditions without making mistakes?

  • A: Instead of writing the entire condition all at once, break up each logical unit into Boolean expressions called **’Intermediate Columns’**. This makes it more readable and easier to validate individual conditions.

Q: How do I resolve an error that occurs when filtering a new column created in a SELECT on a WHERE clause?

  • A: This is due to the execution order of SQL (FROM → WHERE → SELECT). This is solved by first creating a temporary table with the intermediate columns using a **WITH clause (CTE)** and then filtering on it in the main query.

Q: What should I be aware of when finding the union (opposite condition) of multiple conditions?

  • A: De Morgan's LawDepending on the NOTwhen applying ANDORTo, ORANDoperator is replaced with. NOT (A OR B)NOT A AND NOT Bworks the same as.

Q: What should I be aware of when filtering NULL data?

  • A: NULLis different from zero or the empty string. It means no data at all, so = operator, not the IS NULL or IS NOT NULLshould be used.

Filtering SQL data

Currently I have Datacamp Platformto learn intermediate SQL to manipulate data in a database. Today's topic is filtering SQL data. If you're interested in previous data-related posts, check out the DATA category on this website.

If you've ever used Excel, you're familiar with filters. You can check off a bunch of boxes and see exactly the data you need.

Filtering in Excel

Filtering SQL data is the act of extracting only the part of the data that meets certain conditions. Filtering serves two main purposes. Sometimes filtering alone answers a question, and sometimes it's used as a preparatory step to do something else with the filtered data.


How to Do Basic Filtering on SQL Data? WHERE Clause

Filtering SQL data is accomplished by using the WHERE clause. This is the default filtering for data manipulation.

SELECT company,

sector,

revenue,

FROM fortune_50

WHERE sector = ‘Technology’;

In the fortune_50 table, you want to show the company, sector, and revenue columns in the rows that satisfy the condition that the data value in the sector column is ‘Technology’.

If we didn't have a WHERE clause here, we would have just shown the company, sector, and revenue column data for all rows.


1) Filtering logical expressions

This time, instead of the above WHERE sector = ‘Technology’;, let's use WHERE profit < 0;.

The part after WHERE, sector = ‘Technology' and profit < 0, is called a logical expression. Logical expressions are divided into three parts

  • Left (LHS): Usually a column reference (e.g: profitmarket_cap)
  • Logical operators: Compare operation (=!=<> ><=>=)
  • Right-hand side (RHS): The value to be compared (e.g: 01000000'Technology')

For each row of data, this logical expression is evaluated to determine if it is true or false, and only the true ones are output to our eyes, which is what we call SQL data filtering. Note that != is a logical operator, meaning it is not equal.


2) Filter only the data that contains this word? LIKE and wildcard %

Sometimes you want to filter rows that contain only some words, not all words.UnitedHealth Group, Cigna Group This means that sometimes you only want to pull out rows that contain groups like this. This is where LIKE and the wildcard % come in handy.

SELECT company,

sector

FROM fortune_50

WHERE company LIKE ‘%Group’

The meaning of WHERE company LIKE ‘%Group’ is that you want to show all rows in the company column that satisfy the criteria that anything can be in the first part of the column, but end in Group.

  • '%Group%': include ‘Group' in any position
  • 'Group%' : Start with ‘Group'
  • '%Group' : ends with ‘Group'

Depending on where you put the wildcard %, you can filter rows that meet the criteria. However, LIKE pattern matching is case sensitive, so be sure to type it correctly. (BigQuery, PostgreSQL, Oracle)

3) Find rows with missing (NULL) data values in a specific column

If you look closely at the data you get from your database, you may notice that certain column data values are missing in each row. There are two main reasons for this: one is that the value doesn't exist in the first place, and the second is that the value is NULL, which is caused by not actually writing it.

  • The market capitalization is a data value that only occurs when a company actually goes public. If the company is privately held, it's not surprising that it's blank.
  • If the number of employees in an organization is not populated, it's because you didn't actually fill it out and it's blank.

SELECT company,

employees

FROM fortune_50

WHERE employees IS NULL;

It's easy to find rows that are missing a specific column data value. Just specify employees as the column name and IS NULL. Conversely, to find rows that are not missing, you can use IS NOT NULL.


4) Most popular filtering, Top N filtering

Top N is my favorite way to filter through tons of data. I also seem to use it a lot for filtering to find the top 10, or top 3, etc.

Finding the Top N while filtering SQL data


How to create filtering multiple conditions (more than one condition)?

Venn diagrams

So far, we've seen SQL data filtering on a single condition. While there are certainly cases of filtering like this, you'll probably have more filtering that requires multiple conditions to be satisfied. This is called filtering multiple conditions.


1) Logical relationship operators AND, OR, and NOT in conditions

From the company data, output rows that satisfy the following two conditions: the company is a tech company and the current CEO is a founder (using the AND operator)

SELECT company,

rank_2025,

sector,

is_founder_ceo,

ceo

FROM fortune_50

WHERE sector = ‘Technology’

AND is_founder_ceo = TRUE;

Filtering is about showing what satisfies the conditions. Here we see two (multiple) conditions. One is sector = ‘Technology' and the other is_founder_ceo = TRUE.

The sector is tech, and the founder is still the CEO. These are two conditions. And we see the operator AND, which organizes the logical relationship between these conditions.

  • AND : The preceding must be true, and the following must be true. (True if both are true)
  • OR: either before or after (True if only one is satisfied)
  • NOT: Something neither of you are happy with

Let's understand how it works

  • Evaluate each condition individually → True/False
  • Combine conditions with AND/OR → Final True/False
  • Select only rows with final True → Return results

The two conditions sector = ‘Technology' is True and is_founder_ceo = TRUE. AND will only output TRUE if both are true. OR, on the other hand, gives the output even if only one is true.


2) Organize the execution order when there are more than 3 conditions

If there are more than three conditions, two of them are obviously AND, OR, and NOT.

A AND B OR C, A OR B OR C, A OR B AND C, A OR B AND C. Just by looking at these examples, you can see that more than one operator is used. But which one should be executed first is important because the order in which they are executed determines whether you get the result you want or not.

SELECT company,

sector,

CEO,

is_founder_ceo

FROM fortune_50

WHERE is_founder_ceo = TRUE

AND (sector = ‘Technology’ OR sector = ‘Automotive’);

Think carefully about your conditions and use parentheses around the parts that should be executed first. Parentheses make the order of execution clear.


How to avoid making complex filtering mistakes?

1) Create a middle column

The more conditions we have, the greater the risk that we will make a mistake. Sure, we keep our eyes open and double-check, but we're human, and mistakes can happen.

For complex filtering, you can create new intermediate columns to represent each part of the condition, and then use them in the final filter to make it easier to write, read, and validate the condition.

The question is to find companies in the top 10 that are profitable or have increased their ranking in 2025 compared to 2024.

I know it's a multi-condition, but I don't know where to start in my head, so I need an intermediate column.

In the image above, there were originally no Is_top10, is_profitable, or Is_rank_improved. These are newly created intermediate columns. The idea is to put the result of each condition as the data value in the middle column.

This is where Boolean Columns come in. You can think of Boolean columns as having values between True and False.

The Is_top10 middle column asks for a true or false indication if a company is in the top 10. The code below illustrates this. rank_2025 <= 10 AS is_top10 The data value is True if it is in the top 10 in 2025, which can be understood as put this value into the newly created middle column is_top10

SELECT company,

rank_2025,

rank_2024,

sector,

profit,

rank_2025 <= 10 AS is_top10, - create an intermediate column in this part

profit > 0 AS is_profitable,

rank_2025 < rank_2024 AS is_rank_improved

FROM fortune_50;

WHERE is_top10

AND (is_profitable OR is_rank_improved);

The is_profitable and is_rank_improved columns can likewise be created with True and False values. Imagine what would be left, all the data values would be True and False.

For example, let's look at the WHERE clause and say that is_top10 is True, is_profitable is False, and is_rank_improved is True.

This row is output because it is True AND ( False OR True) -> True AND (True) -> True. Conversely, if it were False, it would not be output.


2) Execution order FROM , WHERE, SELECT

I thought that using an intermediate column like the one above would solve the complex filtering problem, and when I tried it, I got an error.

Anyone who knows the SQL code sequence will know where the error occurs in the code above.

SELECT the columns above (I won't list them all for clarity)

FROM fortune_50;

WHERE is_top10

AND (is_profitable OR is_rank_improved);

The SQL code execution order is FROM -> WHERE -> SELECT. But can WHERE recognize is_top10, is_profitable, and is_rank_improved, which have not been created yet?

That's why we get the error. However, we do know a code trick that allows us to put the execution order first. It's the WITH clause. If you're interested in learning more about the WITH clause, be sure to read the post below.

Using the WITH clause, we can first create is_top10, is_profitable, and is_rank_improved. Then, in the WHERE clause, we can recognize the middle column above and continue the execution.

Filtering SQL data up to the with clause

The WITH clause is executed before any other code. The WITH clause creates an intermediate column and stores it in the enriched_fortune_50 table. Then, in the original execution order, we get the table from FROM, execute the WHERE clause, and then SELECT * to show all the columns.


3) Mistakes that still happen (and what to do about them)

Even when I solved it with an intermediate column (Boolean column), I made a lot of mistakes. Here are the mistakes I made

  • No spaces are allowed when naming after AS. Use _ to create
  • With clause select gel Do not put , after the last column of the gel
  • If you want to say A or B, use IN (’A’, ‘B’), and be sure to use ‘ because A and B are text
  • If you only specify one in a SELECT, you don't need to put the , after the column


Validation steps that can't be overemphasized

With the above method, we got all the rows we wanted. However, for complex filtering, we need the following steps for validation

  • Building - verify that each intermediate column is working correctly individually
  • After you're done - verify that the final filtering result includes the correct rows and excludes the incorrect rows

1) Summarize the validation process

  • Step 1 (in progress): Sample each intermediate column to make sure it's correct individually
  • Step 2 (after creation): In the final result, click the
  • Inclusion check: Ensure that the rows included in the result satisfy the full condition
  • Check exclusions: Verify that the excluded rows do not meet the full condition

Validation is simply taking a few samples and checking them against the conditions in the intermediate column and the final result.


Inverting filtering conditions and compound inversions

1) Reverse the condition

Inverting a condition is finding something outside of the original condition. So far, we've been looking for rows that satisfy the condition, but now we want to look for rows that don't satisfy the condition.

FROM fortune_50

WHERE NOT (rank_2025 0.25);

() was the original condition, and by simply prefixing it with NOT, we get rows that don't satisfy the condition. Note that NOT is the most prioritized of the logical operators.


2) Compound inversion

When creating conditions, you sometimes use logical operators like AND and OR. When you want to find something outside of what satisfies these compound conditions, it's called compound inversion.

  • not (a and b)` → `not a or not b`
  • not (a or b)` → `not a and not b`

This is the only thing you need to keep in mind because the operator in parentheses will change. This part is called De Morgan's Law.


Finalize

Today we've talked about filtering SQL data, which is crucial for extracting the data we need.

The data shown by filtering is not all the data we have. It's just the rows that meet our criteria. You should always keep that in mind when doing further filtering or creating charts and graphs in your dashboard.

Sometimes, filtering doesn't bring up the rows of data I'm looking for, or the charts often show values I wasn't expecting.


Similar Posts