Explode Function in PySpark
Let's create a DataFrame with a few sample sets of data.

Example-1:
from pyspark.sql.functions import split, explode
df = spark.createDataFrame([(1, "A", [1,2,3]), (2, "B", [3,5])],["col1", "col2", "col3"])
df.show()
+----+----+---------+
|col1|col2|     col3|
+----+----+---------+
|   1|   A|[1, 2, 3]|
|   2|   B|   [3, 5]|
+----+----+---------+
df.withColumn("col3", explode(df.col3)).show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   A|   1|
|   1|   A|   2|
|   1|   A|   3|
|   2|   B|   3|
|   2|   B|   5|
+----+----+----+
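Conceptually, explode turns each element of the array column into its own row, repeating the other columns. The same expansion can be sketched in plain Python (no Spark needed) to make the semantics clear:

```python
# Plain-Python sketch of what explode does to the rows above.
rows = [(1, "A", [1, 2, 3]), (2, "B", [3, 5])]

# Each array element becomes its own row; col1 and col2 are duplicated.
exploded = [(c1, c2, value)
            for (c1, c2, values) in rows
            for value in values]

print(exploded)
# [(1, 'A', 1), (1, 'A', 2), (1, 'A', 3), (2, 'B', 3), (2, 'B', 5)]
```

Note that explode drops rows whose array is empty or null; if you need to keep such rows (with a null in the exploded column), use explode_outer, available since Spark 2.2.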
Example-2:
df = sc.parallelize([(1, 2, 3, 'a b c'),
                     (4, 5, 6, 'd e f'),
                     (7, 8, 9, 'g h i')]).toDF(['col1', 'col2', 'col3', 'col4'])
df.show()
+----+----+----+-----+
|col1|col2|col3| col4|
+----+----+----+-----+
|   1|   2|   3|a b c|
|   4|   5|   6|d e f|
|   7|   8|   9|g h i|
+----+----+----+-----+
df.withColumn('col4', explode(split('col4', ' '))).show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   1|   2|   3|   a|
|   1|   2|   3|   b|
|   1|   2|   3|   c|
|   4|   5|   6|   d|
|   4|   5|   6|   e|
|   4|   5|   6|   f|
|   7|   8|   9|   g|
|   7|   8|   9|   h|
|   7|   8|   9|   i|
+----+----+----+----+
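Here split first turns the space-separated string into an array, and explode then fans that array out into rows. The two-step pipeline can be sketched in plain Python:

```python
# Plain-Python sketch of split(' ') followed by explode.
rows = [(1, 2, 3, 'a b c'),
        (4, 5, 6, 'd e f'),
        (7, 8, 9, 'g h i')]

# str.split(' ') stands in for Spark's split(col4, ' ');
# the inner loop stands in for explode.
exploded = [(c1, c2, c3, word)
            for (c1, c2, c3, text) in rows
            for word in text.split(' ')]

print(len(exploded))  # 9 rows: 3 input rows x 3 words each
```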
Example-3:
df = sc.parallelize([('a','karnatak')]).toDF(['name','city'])
df.show()
+----+--------+
|name|    city|
+----+--------+
|   a|karnatak|
+----+--------+
df.select(explode(split(df["city"], "")), "name").show()
+---+----+
|col|name|
+---+----+
|  k|   a|
|  a|   a|
|  r|   a|
|  n|   a|
|  a|   a|
|  t|   a|
|  a|   a|
|  k|   a|
|   |   a|
+---+----+
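Splitting on the empty pattern breaks the string into individual characters, and because Spark's split follows Java's regex split with no limit, a trailing empty string is also emitted, which is the blank row at the bottom of the output. A plain-Python sketch of the resulting rows:

```python
# Plain-Python sketch of split(city, "") followed by explode.
city = "karnatak"

# Each character becomes an element, plus one trailing empty string,
# mirroring the blank last row in the Spark output above.
pieces = list(city) + [""]
rows = [(piece, "a") for piece in pieces]

print(len(rows))  # 9 rows: 8 characters + 1 empty string
```

If you do not want the empty-string row, filter it out after exploding, e.g. with `df.filter(df["col"] != "")`.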