A Data Analyst's Guide to SQL Data Transformation: Calculating Business Key Metrics (pCVR, CPO) and Utilizing WITH Clauses

As you study data analytics, you will naturally encounter key business metrics. Understanding basic business performance and metrics terminology can add depth to your analysis, so let's take a look at a quick glossary of terms and the functions used to convert SQL data.

[🚀 SQL & Business Metrics Q&A from Data Analysts].

Q1. How can I avoid the ‘Division by zero’ error in SQL division operations?

  • A: In standard SQL, the NULLIF(denominator, 0)to convert 0 to NULL, and BigQuery uses SAFE_DIVIDE(numerator, denominator) function to return NULL instead of an error so that the query doesn't break.

Q2. What if I want to use the newly created column (alias) within the query for the next calculation?

  • A: **You must first create a temporary table using a **WITH clause (CTE). SQL does not immediately recognize aliases defined within the same SELECT statement, so it is best practice to store intermediate data in a WITH clause and reference it in the final SELECT.

Q3. How do I resolve an error that occurs when calculating individual row data and an overall sum (SUM) on a single line?

  • A: Utilize **Scalar Subquery**. (SELECT SUM(column) FROM table) form first pulls out the sum value as a single scalar value, SQL can perform row-by-row calculations (such as percent of total) without confusion.

SQL data transformation and business-critical metrics

Adding a new column with SQL data transformation

SQL 데이터 변환(Data transformation)은 기존 컬럼으로부터 새로운 컬럼을 도출할 수 있게 해주는 기본적인 데이터 조작 작업입니다.

즉, 데이터 변환 작업을 거치면, 결과가 표시되는 테이블 마지막 끝에 새 컬럼이 추가된 형태가 된다는 의미다.

SELECT *,

Expression AS New_Column_Name

FROM table_name;

The part that says expression can be a column name and an aggregation operation, or it can be expressed in a variety of formats. In any case, making it AS a new column name is called a data transformation. You can create more than just one of these.

In the following, we'll discuss the business-critical metrics and formulas to get the new column names that are added due to data transformation.


Business-critical metrics frequently generated by data transformation

To study data analytics, you can still use the datacampin the app. I thought it would be just code or queries, but there is a part of it that tells me key business metrics, so I thought I'd summarize it since it's likely to be used in the future.

Converting SQL data creates opportunities to create additional business metrics, such as the following.

1) Business metrics

Paid Conversion Rate (pCVR)

A report card on whether the people you spent money marketing to actually bought something.

pCVR = 100 * Number of Paid Orders / Number of Paid Visits

%를 확인할 때에는 단순히 숫자만 보면 안된다. 비율 확인 시에는 어느 정도 분모의 크기가 커야 신뢰성에 힘이 실린다.

2명 중에 1명이면 50%, 1000명 중 10명이면 1%이지만, 1000명의 분모 사이즈의 비율이 훨씬 신뢰성을 가지는 것은 나만 느끼는 것은 아닐 것이다.

Cost per order (CPO)

Average ad spend required to get one paid order

CPO = Total Ad Spend / Total Paid Orders

예를 들자면, 광고비 $1000을 들여서 유료 주문 수 50건을 만들어 냈다. 그렇다면 CPO는 $20이라는 얘기이다.

하나의 주문을 받기 위해서는 평균적으로 $20의 광고 비용이 든다는 얘기이다. CPO가 낮을수록 더 효율적으로 고객을 획득하고 있다는 뜻이다

Why you should know your CPO : Key metrics for setting ad budgets and managing profitability

  • Determining profitabilityIf your product costs $100 and your CPO is $20, you make a margin of $80 per order. However, if the CPO is $150, you lose
  • Optimize your ad budget: To achieve a target revenue of $100,000, you can calculate the required ad spend by multiplying the expected number of orders by the CPO
  • Compare channelsIf your Instagram CPO is $20 and your TikTok CPO is $50, it's clearer where to put more of your budget
  • Monitoring trendsIf you notice a trend of increasing CPO ($20 → $33), like our data, it's a sign to revisit your strategy.

Profit multiplier (return to principal): How many times you get back your investment on each trade

A metric to see how much more you made than you put in, with a base of 1. A value of 1.5 means you made back 1.5 times your investment (501 TP3T profit), while a value of 0.8 means you only made back 801 TP3T (201 TP3T loss).

Return Multiple = Sell Price / Avg Buy Price

I don't use %, so it's intuitive. The profit multiple is an indicator of how much the chunk (principal) of my investment has grown.

Price Change Ratio : Percentage change in price from point of purchase to point of sale

What percentage of my principal is this net gain? to find the

Price Change Ratio = ((Plate Value - Acid Value) / Acid Value) * 100

For example, let's say you bought a stock and the principal amount is 100,000 won.

Selling priceRevenue multiples (chunks)Price change rate
(net profit margin)
What the results mean
150,000 KRW1.5x50%1.5x gold, earned 501 TP3T
200,000 KRW2x100%You doubled your principal and earned 1001 TP3T
80,000 KRW0.8x-20%Only 0.8x principal left, lost 20%


Ad Contribution : What's actually left in my pocket after ad spend

총 매출에서 광고비와 이행 비용(택배비, 포장비, 인건비 등)을 제외한 순수익이다. 단순히 매출만 보면 장사가 잘 되는 것처럼 보여질 수 있다.

하지만, 광고 기여도로 계산을 하면 실제로 내 주머니에 얼마가 남는 지 알게 되기 때문에 매출보다 더 중요하다.

Ad Contribution = Revenue - Ad spend - (total number of paid orders * incremental cost per order)

  • Determine true profitability: Revenue may be ‘fictitious,‘ but ad attribution is ’real’
  • Optimize your ad spend: Analyze the characteristics of high ad contribution days to enhance your ad strategy for the day

Ads Contribution Rate) : Ad Attribution Percentage of total revenueas a percentage (%)

광고 기여도는 그 날의 매출 금액에 따라 결정되는 경우가 많다. 매출이 아주 큰 날과 아주 적은 날의 경우 비교하기가 어렵다.

기여율은 이렇게 규모가 다른 날들을 공정하게 비교하기 위함이다.

Contribution Rate = (Ad Contribution / Revenue) * 100

DateSalesAd attributionAd Attribution Rate
A day1002020%
Day B100010010%

At first glance, it looks like Day B made more money than Day A because its revenue and ad contributions were larger, but when you look at ad yields, you realize that Day A's ad strategy was much more efficient.

  • Profitability guidelines: For example, you might have an ad contribution rate of 151 TP3T, and when it drops below that, it becomes the basis for decision-making to discontinue ads or modify your strategy.


2) The difference between margin and ad contribution

광고 기여도를 공부한 후 약간의 의문이 들었다. 광고 기여도도 어쨌든 광고를 집행한 이후의 순수익을 나타낸다.

그리고 우리가 기본적으로 알고 있는 마진은 매출에서 원가 등 제반 비용을 빼고 나서 남는 순수익을 의미한다. 둘 다 순수익이라 명확하게 할 필요가 있었다

This one sentence makes the difference between the two clear.

“Margins determine ‘what’ to sell, ad attribution determines ‘how’ to sell.”

  • Margin: The intrinsic value of your product
  • Ad Attribution: How Marketing Really Works

Margin is the profit you make from the act of getting the item and selling it, regardless of advertising. You take your revenue and subtract things like acquisition costs, platform fees, and so on, and you get your net profit.

The analytics point of view is the profitability of the product, should we keep selling this product? is the point.

A product that costs $100,000 for meat, but leaves you with too little money after subtracting this and that, or even loses money, is a product that shouldn't be sold as a business.

Whereas, The analytical perspective of ad attribution is the effectiveness of the channel/campaign, which makes you wonder if you should continue with this ad. In other words, ad attribution is the final revenue (net revenue) left over after burning an ad.

It doesn't make sense to spend $20000 on an ad when you have a margin of $10000 left. It follows that this ad strategy should be modified or discontinued.

Margin percentage : If the money I earned (margin) is less than How many % for the money I spent (my investment principal)?

Margin ratio = (Margin / Total cost) * 100

ItemA commodity (wine)B goods (beer)Analysis results
Principal amount invested$100$10-.
Margin$20$5Margins are like wine
Margin percentage20%50%Beer is much more effective

얼핏 보면, 마진이 $20인 와인을 좋은 것 같다. 하지만 맥주의 경우 단 돈 $10를 가지고 $5를 벌어다 주는 50% 마진 비율을 가지고 있다.

즉, 내 돈(자본)을 훨씬 효율적으로 사용한다는 의미다. 자본 사용의 효율성을 보여지는 지표인 것이다.

  • Prioritize investments: spend more on high-margin products to grow the company faster

그리고, 또 한 가지 우리는 위에서 수익 배수를 배운 적이 있다. 내가 낸 돈보다 얼마나 더 벌었는 지를 알아보는 지표이다.

그러면 내 자본을 얼마나 효율적으로 썼는지를 보여주는 마진 비율과 비슷한 것이 아닐까 하는 생각이 들었다. (수익 배수 vs 마진 비율)

Revenue multiples show the change in overall chunk, while margin ratios show how much pure profit you're making compared to your investment

This means that if the revenue multiple is 1.2x, the margin ratio is 20%. It's just a matter of whether you call the same phenomenon a multiple or a percentage.


3) Percentage of total, revenue, and margin

Percentage of total sales chart
  • Percent of total: (individual item value / total item value) * 100
  • Why use it: Compare data of different scales by normalizing them to a common measure called specific gravity.
  • Percentage of total sales: (line item sales / total line item sales) * 100
  • Why use it: Metrics that show which products are driving market share and company appearance growth
  • Margin as a percentage of total: (Line item margin / total line item margin) * 100
  • Why use it: Your company's Actual revenue (boudoir)Metrics that show who contributes the most to

[Why use business-critical metrics].

  • Paid Conversion Rate (pCVR) Measure the accuracy of your ad targeting by determining if you're just getting a lot of visitors, or if they're qualified customers who will actually make a purchase.
  • Cost per order (CPO) Set a profitable ad budget by making sure that the cost of creating one order is not higher than your product margin.
  • Return Multiple Quickly understand your investment performance with the chunk concept of "how many times has my money multiplied?
  • Price change rate Compare different amounts of investments on the same line (%) to evaluate which investment is better.
  • Ad Contribution Don't be fooled by the illusion of revenue, and be able to calmly determine if a campaign is actually contributing to the company's bank balance.
  • Margin Ratio (%) Determine which products to focus your resources on by measuring the price/performance ratio - how much profit you make with less capital.
  • Percentage of total Understand the forest for the trees by identifying the core drivers of revenue or margin.

When calculating ratios, you must make sure that there are no negative numbers in the data values. If you have a negative number, the sum of all the items will be reduced, making the positive margin a larger percentage of the total, sometimes exceeding 1001 TP3T.

When calculating ratios, discard negative data values when they are negative.


Functions for transforming SQL data

1) nullif & safe_divide

Many key business metrics involve finding ratios. Finding a ratio means dividing by an operation, but you can't find a ratio if the denominator is empty.

In this case, even if only one of these cases occurs, it's an error in terms of the overall data, so even if there is a valid data value, it's still considered an error.

SELECT *,

100 * paid_orders / NULLIF(paid_sessions, 0) AS pcvr,

ad_spend / NULLIF(paid_orders, 0) AS cpo

FROM campaign_performance

ORDER BY campaign_day;

Look only at the part that is relevant to the denominator. The meaning of NULLIF(paid_sessions, 0) is that if the data value in the paid_sessions column is empty, mark it as null; otherwise, use the original data value.

This ensures that the denominator is never empty and the data value is retrieved without causing an error. However, in Bigquery, a special function called SAFE_DIVIDE is used.

SELECT *,

SAFE_DIVIDE(100 * paid_orders, paid_sessions) AS pcvr,

SAFE_DIVIDE(ad_spend, paid_orders) AS cpo

FROM campaign_performance

ORDER BY campaign_day;

SAFE_DIVIDE(100 * paid_orders, paid_sessions) means that SAFE_DIVIDE(x,y) essentially means to divide x by y.

In the y example, this means that if the data in the paid_sessions column is empty, it should be recognized as null and the result should be displayed as null. However, if there is data, it means to proceed with the division as normal.

Using Nullif and the Safe_divide function, you can avoid errors due to division.


2) WITH Clause

To explain the WITH clause, let's go back to the ad attribution, attribution rate part first

ad_contribution (Ad Contribution)

= revenue - ad_spend - (paid_orders×avg_fulfillment_cost) (fulfillment cost)

contribution_rate =100 × ad_contribution/revenue

To get the ad contribution rate, we first need to get the ad contribution (ad_contribution) using a SQL data transformation operation. Then we get the contribution rate.

However, you can't apply it all at once in a single piece of code like this

SELECT *,

revenue - ad_spend - (paid_orders * avg_fulfillment_cost) AS ad_contribution

100 × (ad_contribution/revenue) AS contribution_rate

FROM campaign_performance

In the second line, get the ad contribution (ad_contribution), then in the third line, add the ad contribution and get the ad contribution rate.

However, this results in an error: we can't get the ad_contribution data from the third line because the data value for ad_contribution hasn't been saved yet, so we get an error

This is where the WITH clause comes in. What it does is wrap the original query and give the result a name

WITH campaign_performance_extended AS (

SELECT *,

revenue - ad_spend - (paid_orders * avg_fulfillment_cost) AS ad_contribution

FROM campaign_performance

)

SELECT *,

SAFE_DIVIDE(100 * ad_contribution, revenue) AS contribution_rate

FROM campaign_performance_extended;

The WITH clause is responsible for storing the problematic ad_contribution values in the campaign_performance_extended temporary table

And when we get the contribution_rate in the next line, we'll be able to get the ad_contribution stored in the campaign_performance_extended temporary table.

Let's recap

  • WITH clause creates a temporary table named as the result of the query
  • Temporary table contains both original and newly calculated columns
  • final SELECTfetches data from a temporary table rather than the original table
  • WITH The column alias for the step is the final SELECTdirectly by name in the


3) Scalar Subquery

스칼라 서브쿼리는 다른 쿼리 안에 중첩된 쿼리로, 단일 값 하나만 반환한다. 집계 함수를 행 단위 계산에서 사용할 때 유용하다.

Scalars are called scalars because they return a single value (a scalar) rather than a table or multiple rows. Consider the query below.

SELECT *,

SAFE_DIVIDE(100 * revenue, SUM(revenue)) AS revenue_pot,

FROM category_sales;

Find the percentage of total revenue and put that value into revenue_pot. You might think that's a fine query.

SELECT *, is asking to see all the individual rows, and SUM(revenue) is the sum of all the rows, so you can think of it as a single row. SQL gets confused here, which is why you get the error

The scalar subquery first computes SUM(revenue) here and gets a single value, which SQL can then run without error on the individual rows and any specific value (which is no longer SUM(revenue)).

SELECT *,

SAFE_DIVIDE(100 * revenue, (SELECT SUM(revenue) FROM category_sales)) AS revenue_pot,

FROM category_sales;

The part in red is the scalar subquery. You can think of it as a query within a query. The subquery is computed first, resulting in a specific value. There is no more SUM.

4) Round() function

To adjust the number of decimal places for readability, use the ROUND() function. The example below shows up to two decimal places.

SELECT *,

ROUND(100 * col_a / col_b, 2) AS percentage

FROM table_name;


Finalize

In today's post, we're going to look at some of the key business metrics that can be created through SQL data transformation.

Since data analytics is often used specifically for business, I thought it would be more useful to understand business metrics to be able to tell stories in future dashboards.

We also learned about Nullif, Safe_divide, WITH clauses, scalar subqueries, and other errors that can occur when converting data.

I don't believe that simply learning code and keywords is the first step to becoming a data analyst. I think it's more important to be able to understand and interpret what the data is telling you, and to do that, you need to understand the key metrics that are often used in the business world.

Similar Posts