Changing order in array struct column and adding new elements – SPARK SCALA

Question {#heading}

I have this schema1:

[Image of schema1 not preserved]

I'm trying to reorder the elements of the struct and also add some new elements as NULL at any position, as in schema2:

[Image of schema2 not preserved]

I tried this:

df.withColumn("sample",
    expr("transform(sample, x -> struct(x.elem1, x.elem2, 'NULL' as elem2, x.elem3, x.elem4, x.elem5, x.elem6, x.elem7))"))

I got this error:

> Only foldable string expressions are allowed to appear at odd position, got: NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder,NamePlaceholder; line 1 pos 45;

Any help? Thanks

Answer 1 {#1}

Score: 1

sparkSession.sql("select array(named_struct('elem4', 4, 'elem1', 1, 'elem2', 2)) sample").
      selectExpr("transform(sample, x -> named_struct('elem1', x.elem1, 'elem2', x.elem2, 'elem3', null, 'elem4', x.elem4)) sample_rewired").
      show()

This works, producing:

+-----------------+
|   sample_rewired|
+-----------------+
|[{1, 2, null, 4}]|
+-----------------+

Using struct instead will introduce generated column names for the nulls, so:

sparkSession.sql("select array(named_struct('elem4', 4, 'elem1', 1, 'elem2', 2)) sample").
      selectExpr("transform(sample, x -> struct(x.elem1, x.elem2, null, x.elem4)) sample_rewired").
      selectExpr("explode(sample_rewired) no_array").selectExpr("no_array.*").
      show()

yields:

+-----+-----+----+-----+
|elem1|elem2|col3|elem4|
+-----+-----+----+-----+
|    1|    2|null|    4|
+-----+-----+----+-----+

As such, you'll want to specify the names via named_struct.

As for the error you got, I have no idea; it looks odd. The same expression used with withColumn and expr also doesn't fail for me. What Spark version are you on?
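For a struct with many fields, writing the named_struct call by hand gets tedious. Below is a minimal sketch of a helper that generates the expression text from the desired field order. The object StructRewire, the function rewireExpr and its parameters are hypothetical names made up for this illustration, not part of Spark. It also assumes that the added fields should be typed nulls (cast(null as string) by default) rather than bare null, since a bare null gives the field a null type, which may cause trouble later, for example when writing to some file formats.

```scala
// Hypothetical helper (not a Spark API): builds the SQL text of a
// transform(...) call that rebuilds every struct in an array column.
object StructRewire {

  /** @param column   name of the array-of-struct column
    * @param target   field names in the desired output order
    * @param existing field names already present in the input struct
    * @param nullType SQL type given to the newly added null fields (an assumption)
    */
  def rewireExpr(
      column: String,
      target: Seq[String],
      existing: Set[String],
      nullType: String = "string"
  ): String = {
    val fields = target.map { name =>
      // Existing fields are read from the lambda variable x;
      // missing ones become typed nulls so the field has a concrete type.
      val value =
        if (existing.contains(name)) s"x.$name"
        else s"cast(null as $nullType)"
      s"'$name', $value"
    }
    s"transform($column, x -> named_struct(${fields.mkString(", ")}))"
  }
}
```

The resulting string can then be passed to expr (from org.apache.spark.sql.functions), for example df.withColumn("sample", expr(StructRewire.rewireExpr("sample", Seq("elem1", "elem2", "elem3", "elem4"), Set("elem1", "elem2", "elem4")))). Field names are inserted into the SQL text unescaped, so this sketch only suits simple, trusted names.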


