pyspark.RDD.top

RDD.top(num: int, key: Optional[Callable[[T], S]] = None) → List[T]

Get the top N elements from an RDD.

New in version 1.0.0.

Parameters

num (int): the number of top elements to return (N)
key (function, optional): a function that generates the key used to compare elements

Returns

list: the top N elements

See also

RDD.takeOrdered()
RDD.max()
RDD.min()

Notes

This method should only be used if the resulting list is expected to be small, as the entire result is loaded into the driver’s memory.

It returns the list sorted in descending order.

Examples

>>> sc.parallelize([10, 4, 2, 12, 3]).top(1)
[12]
>>> sc.parallelize([2, 3, 4, 5, 6], 2).top(2)
[6, 5]
>>> sc.parallelize([10, 4, 2, 12, 3]).top(3, key=str)
[4, 3, 2]
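
The descending order and the effect of `key` shown above can be reproduced without a cluster. The following is a minimal sketch, assuming each partition is reduced to its own top N candidates, which are then merged. `local_top` is a hypothetical helper written for illustration and is not part of the PySpark API; the partitions are plain Python lists standing in for an RDD.

```python
import heapq
from functools import reduce


def local_top(partitions, num, key=None):
    """Hypothetical local stand-in for RDD.top (not a PySpark function)."""
    # Keep at most `num` candidates from each partition, largest first.
    per_partition = [heapq.nlargest(num, part, key=key) for part in partitions]
    # Merge the candidate lists pairwise, keeping only the largest `num` each time.
    return reduce(
        lambda a, b: heapq.nlargest(num, a + b, key=key),
        per_partition,
        [],
    )


# Two partitions, mirroring sc.parallelize([2, 3, 4, 5, 6], 2).top(2)
print(local_top([[2, 3], [4, 5, 6]], 2))           # [6, 5]
# key=str compares elements as strings, so "4" > "3" > "2" > "12" > "10"
print(local_top([[10, 4, 2, 12, 3]], 3, key=str))  # [4, 3, 2]
```

Because every intermediate list holds at most `num` elements, only small candidate lists are merged in this sketch, which is why the method suits small values of N.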
