
PySpark using OR operator in filter


Question {#heading}


I have an array that I am indexing to filter for data from cities of California.

This filter works:

raw_df_2 = raw_df_1.filter(array_contains(col("country.state.city"), 'San Diego'))

However, when I expand to include other cities:

raw_df_2 = raw_df_1.filter(array_contains(col("country.state.city"), 'San Diego') || array_contains(col("country.state.city"), 'Sacramento') || array_contains(col("country.state.city"), 'Los Angeles'))

I get SyntaxError: invalid syntax.

I have also tried:

raw_df_2 = raw_df_1.filter(array_contains(col("country.state.city"), 'San Diego' || 'Sacramento' || 'Los Angeles'))

but this also returns SyntaxError: invalid syntax.

What is the correct usage of the OR operator in Spark to filter data from Californian cities?

Answer 1 {#1}

Score: 0


Logical OR uses a single vertical bar (|). Python has no || operator, which is why both attempts raise SyntaxError: invalid syntax. Also note that | binds more tightly than comparison operators in Python, so when the combined conditions are comparisons (e.g. col("x") == 1) rather than array_contains calls, each one must be wrapped in its own parentheses.

raw_df_2 = raw_df_1.filter(array_contains(col("country.state.city"), 'San Diego') | array_contains(col("country.state.city"), 'Sacramento') | array_contains(col("country.state.city"), 'Los Angeles'))
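If the list of cities grows, the chained | conditions can also be built programmatically by folding them with functools.reduce and operator.or_, which calls the same Column.__or__ that the | operator uses. This is a sketch, not part of the original answer; the PySpark form is shown in comments (it assumes raw_df_1 and the pyspark.sql.functions imports from the question), and the runnable part illustrates the fold with plain booleans standing in for Spark Columns:

```python
from functools import reduce
from operator import or_

cities = ['San Diego', 'Sacramento', 'Los Angeles']

# PySpark form (assumes raw_df_1 exists and that col and array_contains
# are imported from pyspark.sql.functions):
#
#   condition = reduce(or_, [array_contains(col("country.state.city"), c)
#                            for c in cities])
#   raw_df_2 = raw_df_1.filter(condition)

# Pure-Python illustration of the same fold, with booleans in place of Columns:
row = ['Sacramento', 'Fresno']                       # hypothetical city array
condition = reduce(or_, [c in row for c in cities])  # False | True | False
print(condition)  # True, because 'Sacramento' matches
```

Folding with reduce keeps the filter in one place when the city list is long or comes from configuration, instead of hand-writing a | chain.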
