Explode Function in PySpark

The explode function creates a new row for each element of an array (or each key/value pair of a map) column. Let's create a DataFrame with a few sample sets of data:

Example-1:


from pyspark.sql.functions import split, explode

# col3 holds an array of integers in each row
df = spark.createDataFrame([(1, "A", [1, 2, 3]), (2, "B", [3, 5])], ["col1", "col2", "col3"])

df.show()
+----+----+---------+                                                    
|col1|col2|     col3|
+----+----+---------+
|   1|   A|[1, 2, 3]|
|   2|   B|   [3, 5]|
+----+----+---------+

df.withColumn("col3", explode(df.col3)).show()

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   A|   1|
|   1|   A|   2|
|   1|   A|   3|
|   2|   B|   3|
|   2|   B|   5|
+----+----+----+
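
If a row's array is empty or null, explode simply drops that row. Here is a minimal sketch of the difference (df2 is just an illustrative DataFrame, and explode_outer, which keeps such rows with a null value, assumes Spark 2.2+; the DDL-string schema assumes Spark 2.3+):

from pyspark.sql.functions import explode, explode_outer

df2 = spark.createDataFrame(
    [(1, "A", [1, 2, 3]), (2, "B", []), (3, "C", None)],
    "col1 int, col2 string, col3 array<int>")

df2.withColumn("col3", explode(df2.col3)).show()        # rows 2 and 3 are dropped
df2.withColumn("col3", explode_outer(df2.col3)).show()  # rows 2 and 3 kept with null col3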

Example-2:

Combining split with explode turns a space-delimited string column into one row per token:

df = sc.parallelize([(1, 2, 3, 'a b c'),
                     (4, 5, 6, 'd e f'),
                     (7, 8, 9, 'g h i')]).toDF(['col1', 'col2', 'col3','col4'])

df.show()
+----+----+----+-----+
|col1|col2|col3| col4|
+----+----+----+-----+
|   1|   2|   3|a b c|
|   4|   5|   6|d e f|
|   7|   8|   9|g h i|
+----+----+----+-----+


df.withColumn('col4', explode(split('col4', ' '))).show()

+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   1|   2|   3|   a|
|   1|   2|   3|   b|
|   1|   2|   3|   c|
|   4|   5|   6|   d|
|   4|   5|   6|   e|
|   4|   5|   6|   f|
|   7|   8|   9|   g|
|   7|   8|   9|   h|
|   7|   8|   9|   i|
+----+----+----+----+
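
If the position of each token is also needed, posexplode returns a pos column alongside the value. A minimal sketch using the same df as above:

from pyspark.sql.functions import posexplode, split

df.select('col1', 'col2', 'col3', posexplode(split('col4', ' '))).show()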


Example-3:

Splitting on the empty string explodes a string column into its individual characters:

df = sc.parallelize([('a', 'karnatak')]).toDF(['name', 'city'])


df.show()
+----+--------+
|name|    city|
+----+--------+
|   a|karnatak|
+----+--------+

df.select(explode(split(df["city"], "")),"name").show()
+---+----+
|col|name|
+---+----+
|  k|   a|
|  a|   a|
|  r|   a|
|  n|   a|
|  a|   a|
|  t|   a|
|  a|   a|
|  k|   a|
|   |   a|
+---+----+
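
Note the blank last row: splitting on the empty string leaves a trailing empty token in this Spark version. A minimal sketch of filtering it out (the letter alias is just for illustration):

from pyspark.sql.functions import col, explode, split

df.select(explode(split(df["city"], "")).alias("letter"), "name") \
    .where(col("letter") != "") \
    .show()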
