countByValue() And countByKey()

March 20, 2020

Tags: countByValue/countByKey, PySpark, WordCount

WordCount Example using countByValue()

countByValue():

countByValue() is an action that returns the count of each distinct value in the RDD. The result is a dictionary on the driver (a Map in the Scala API), not an RDD.

data_split.countByValue()

defaultdict(<type 'int'>, {u'good': 2, u'hello': 2, u'morning': 2})
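The data_split RDD is not defined in this post. As a rough sketch of what countByValue() computes, the plain-Python model below counts each partition locally and then merges the partial dictionaries, as the driver would. The input lines, the one-partition-per-line split, and the helper names count_partition and merge_counts are assumptions made for illustration. The code only mimics the behaviour without a Spark cluster and is not Spark's own code.

```python
from collections import defaultdict
from functools import reduce

def count_partition(values):
    """Count occurrences of each value within one partition."""
    counts = defaultdict(int)
    for value in values:
        counts[value] += 1
    return counts

def merge_counts(left, right):
    """Merge two partial count dictionaries into a new one."""
    merged = defaultdict(int, left)
    for value, n in right.items():
        merged[value] += n
    return merged

# Hypothetical input standing in for data_split: two lines split into words.
lines = ["hello good morning", "hello good morning"]
partitions = [line.split() for line in lines]  # one partition per line (assumption)

word_counts = reduce(merge_counts, map(count_partition, partitions))
print(dict(word_counts))
```

Because the merged dictionary lives in a single process, this approach (like countByValue() itself) only suits data with a modest number of distinct values.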

countByKey():

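Counts the number of elements for each key.

countByKey() works on an RDD whose elements are two-element (key, value) tuples. For each distinct key it counts how many elements carry that key; the values themselves are ignored. It is an action, so the result is returned to the driver as a dictionary of key: count entries, and calling .items() on it gives the (key, count) pairs. Because the whole result is held on the driver, use it only when the number of distinct keys is small.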

Example-1:

rdd1 = sc.parallelize((("HR",5),("RD",4),("ADMIN",5),("SALES",4),("SER",6),("MAN",8)))

rdd1.collect()

[('HR', 5), ('RD', 4), ('ADMIN', 5), ('SALES', 4), ('SER', 6), ('MAN', 8)]

rdd1.countByKey().items()

[('SER', 1), ('HR', 1), ('SALES', 1), ('RD', 1), ('ADMIN', 1), ('MAN', 1)]
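Every key in Example-1 gets a count of 1 because each key appears once; the values 5, 4, 6 and 8 play no part. A minimal plain-Python sketch of that distinction follows, contrasting a count per key with a sum per key (the kind of result a reduceByKey with addition would be expected to give). The helper names count_by_key and sum_by_key are hypothetical and only model the behaviour.

```python
from collections import defaultdict

# Same pairs as rdd1 in Example-1.
pairs = [("HR", 5), ("RD", 4), ("ADMIN", 5), ("SALES", 4), ("SER", 6), ("MAN", 8)]

def count_by_key(pairs):
    """Model of countByKey(): how many elements have each key (values ignored)."""
    counts = defaultdict(int)
    for key, _value in pairs:
        counts[key] += 1
    return dict(counts)

def sum_by_key(pairs):
    """Model of summing the values per key, for contrast."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

key_counts = count_by_key(pairs)
key_totals = sum_by_key(pairs)
print(key_counts["HR"], key_totals["HR"])  # count of 'HR' elements vs. sum of 'HR' values
```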

Example-2:

data = sc.parallelize(("hello", "hello", "how", "how", "are", "are", "you", "hi"))

data.map(lambda x:(x,1)).countByKey().items()

[('how', 2), ('you', 1), ('hi', 1), ('hello', 2), ('are', 2)]
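In Example-2 each word is mapped to a (word, 1) pair only so that countByKey() has a key to count; the 1 is never used. The same counts should therefore come from calling countByValue() on the words directly. The sketch below checks that equivalence in plain Python, using collections.Counter as a stand-in for both Spark actions (an assumption made so it runs without a cluster).

```python
from collections import Counter

# Same words as the RDD in Example-2.
words = ["hello", "hello", "how", "how", "are", "are", "you", "hi"]

# Model of data.map(lambda x: (x, 1)).countByKey(): count the keys, ignore the 1s.
pairs = [(word, 1) for word in words]
counts_via_key = Counter(key for key, _one in pairs)

# Model of data.countByValue(): count the elements directly.
counts_via_value = Counter(words)

print(counts_via_key == counts_via_value)
```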
