英文:
How to make the combination of products that a client has in SQL
问题 {#heading}
| Client | Product | |--------|---------| | 1 | A | | 1 | B | | 1 | C | | 2 | A | | 2 | B | | 2 | D | | 3 | A | | 3 | B |
如何在AWS Athena中编写查询以获取这些结果? Athena不支持存储过程。
在这里,我们有以下组合:
AB:3个案例(客户1、2、3)
AC:1个案例(客户1)
AD:1个案例(客户2)
BC:1个案例(客户1)
BD:1个案例(客户2)
CD:0个案例
ABC:1个案例(客户1)
ABD:1个案例(客户2)
我有一个查询可以获取AB的组合,但要获取其他组合,我需要更改过滤器(WHERE)并运行N次。 如果有数百种组合,这是不可行的。 英文:
I have a table thar contains many row with clients and produts that they have.
It is like that:
| Client | Product | |--------|---------| | 1 | A | | 1 | B | | 1 | C | | 2 | A | | 2 | B | | 2 | D | | 3 | A | | 3 | B |
How can a write a query to get that resultrs in AWS Athena?
The Athena doesn't support procedure.
Here, we have the following combinations:
AB: 3 cases(clients 1, 2, 3)
AC: 1 case (client 1)
AD: 1 case (client 2)
BC: 1 case (client 1)
BD: 1 case(client 2)
CD: 0 cases
ABC: 1 cases (client 1)
ABD: 1 case (client 2)
I have a query that get the combination of AB but to get the other I need to change de filters (WHERE) and run N times. If I have hundred of combinations it is unfeasible to do.
答案1 {#1}
得分: 1
你可以尝试使用 combinations
函数,如果在 Athena 中可用的话(它在 Athena 基于的 Trino 中是存在的):
WITH dataset (Client, Product) AS (
values (1, 'A'),
(1, 'B'),
(1, 'C'),
(2, 'A'),
(2, 'B'),
(2, 'D'),
(3, 'A'),
(3, 'B')
)
select p.comb, count(Client) count
from (select comb
from (select array_distinct(array_agg(Product)) arr
from dataset) p,
unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) p -- 构建所有产品组合
left join (select client, comb
from (select client, array_distinct(array_agg(Product)) as arr
from dataset
group by client),
unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) cp -- 为具体客户构建所有产品组合
on cp.comb = p.comb
group by p.comb
order by cardinality(p.comb), p.comb;
输出结果:
| comb | count | |-------------|-------| | [A, B] | 3 | | [A, C] | 1 | | [A, D] | 1 | | [B, C] | 1 | | [B, D] | 1 | | [C, D] | 0 | | [A, B, C] | 1 | | [A, B, D] | 1 | | [A, C, D] | 0 | | [B, C, D] | 0 |
请注意,combinations
函数在输出中限制为最多 100000 条记录,因此如果实际的产品组合数量更高,这个方法可能不起作用(无论如何,在相对较大数量的产品上,这都会是一个相当昂贵的查询)。
英文:
You can try using combinations
function if it is available in Athena (it is present in Trino on which Athena is based upon):
WITH dataset (Client, Product) AS (
values (1, 'A'),
(1, 'B'),
(1, 'C'),
(2, 'A'),
(2, 'B'),
(2, 'D'),
(3, 'A'),
(3, 'B')
)
select p.comb, count(Client) count
from (select comb
from (select array_distinct(array_agg(Product)) arr
from dataset) p,
unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) p -- build all product combinations
left join (select client, comb
from (select client, array_distinct(array_agg(Product)) as arr
from dataset
group by client),
unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) cp -- build all product combinations for concrete client
on cp.comb = p.comb
group by p.comb
order by cardinality(p.comb), p.comb;
Output:
| comb | count | |-------------|-------| | [A, B] | 3 | | [A, C] | 1 | | [A, D] | 1 | | [B, C] | 1 | | [B, D] | 1 | | [C, D] | 0 | | [A, B, C] | 1 | | [A, B, D] | 1 | | [A, C, D] | 0 | | [B, C, D] | 0 |
Note that combinations is limited to only 100000 records in output, so if actual number of product combination is higher this will not work (also either way this would be quite expensive query on any relatively big number of products).
答案2 {#2}
得分: 0
根据您提供的信息,我理解您想在Amazon Athena中计算每个客户的所有可能的产品组合。这是一个组合问题,可以通过使用自连接来解决。但是,对于大量唯一产品,这种解决方案在计算上会变得非常昂贵,因为自连接的数量会随着要组合的产品数量呈指数增长。
对于2个产品:
WITH combinations_2 AS (
SELECT a.client AS client,
a.product AS product1,
b.product AS product2
FROM your_table_name a
JOIN your_table_name b
ON a.client = b.client
WHERE a.product < b.product
)
SELECT product1, product2, COUNT(DISTINCT client) AS num_clients
FROM combinations_2
GROUP BY product1, product2;
对于3个产品:
WITH combinations_3 AS (
SELECT a.client AS client,
a.product AS product1,
b.product AS product2,
c.product AS product3
FROM your_table_name a
JOIN your_table_name b
ON a.client = b.client
JOIN your_table_name c
ON a.client = c.client
WHERE a.product < b.product
AND b.product < c.product
)
SELECT product1, product2, product3, COUNT(DISTINCT client) AS num_clients
FROM combinations_3
GROUP BY product1, product2, product3;
由于AWS Athena不支持存储过程或用户定义函数来处理递归或循环,因此使用纯SQL表达任意大小的组合非常具有挑战性。没有一种简单有效的方法来生成未知数量产品的所有组合。 英文:
Based on what you provided, I understood that you want compute all possible combinations of products per client in Amazon Athena. This is a combinatorial problem and can be approached using self-joins. However, this kind of solutions becomes computationally expensive for large numbers of unique products, as the number of self-joins would grow exponentially with the number of products you want to combine.
For 2 products :
WITH combinations_2 AS (
SELECT a.client AS client,
a.product AS product1,
b.product AS product2
FROM your_table_name a
JOIN your_table_name b
ON a.client = b.client
WHERE a.product < b.product
)
SELECT product1, product2, COUNT(DISTINCT client) AS num_clients
FROM combinations_2
GROUP BY product1, product2;
And for 3 products:
WITH combinations_3 AS (
SELECT a.client AS client,
a.product AS product1,
b.product AS product2,
c.product AS product3
FROM your_table_name a
JOIN your_table_name b
ON a.client = b.client
JOIN your_table_name c
ON a.client = c.client
WHERE a.product < b.product
AND b.product < c.product
)
SELECT product1, product2, product3, COUNT(DISTINCT client) AS num_clients
FROM combinations_3
GROUP BY product1, product2, product3;
AWS Athena which don't have the support for stored procedures or user-defined functions to handle recursion or loops, expressing combinations of an arbitrary size is quite challenging. There's no simple, efficient way to generate all combinations of an unknown number of products using pure SQL.
答案3 {#3}
得分: 0
我不太了解Athena,但我想提供一个使用STRING_AGG和CTE的示例,以阐述我可能会如何处理这个问题,以防有所帮助。
`WITH prod_mix AS (
SELECT client,
STRING_AGG(product,' ORDER BY product') as pmix
FROM client_product
GROUP BY client
),
combined AS (
SELECT A.*, B.pmix
FROM client_product A
INNER JOIN prod_mix B
ON A.client = B.client
) ,
client_mix AS (
SELECT pmix,
STRING_AGG(DISTINCT client,' ORDER BY client') AS cmix
FROM combined
GROUP BY pmix
)
SELECT * FROM client_mix;
`
英文:
I don't know much about Athena, but I though I'd throw an example of how I might approach that using STRING_AGG and CTE's, just in case it helps.
WITH prod_mix AS (
SELECT client,
STRING_AGG(product,'' ORDER BY product) as pmix
FROM client_product
GROUP BY client
),
combined AS (
SELECT A.*, B.pmix
FROM client_product A
INNER JOIN prod_mix B
ON A.client = B.client
) ,
client_mix AS (
SELECT pmix,
STRING_AGG(DISTINCT client,'' ORDER BY client) AS cmix
FROM combined
GROUP BY pmix
)
SELECT * FROM client_mix;