pyspark.pandas.Series.nlargest

Series.nlargest(n: int = 5) → pyspark.pandas.series.Series

Return the largest n elements.

Parameters
n : int, default 5
Returns
Series

The n largest values in the Series, sorted in decreasing order.

See also

Series.nsmallest

Get the n smallest elements.

Series.sort_values

Sort Series by values.

Series.head

Return the first n rows.

Notes

In pandas, this method is faster than .sort_values(ascending=False).head(n) when n is small relative to the size of the Series object.

In pandas-on-Spark, thanks to Spark’s lazy execution and query optimizer, the two have the same performance.
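
As a rough illustration of why the two approaches can differ in cost outside Spark, the sketch below contrasts a full sort followed by a slice with a heap-based selection, using only the Python standard library. It is an analogy under stated assumptions, not how pandas or pandas-on-Spark implement nlargest; the helper names top_n_by_full_sort and top_n_by_selection are hypothetical, and the data is the example list from this page with the NaN entry left out.

```python
import heapq


def top_n_by_full_sort(values, n=5):
    """Sort every value in descending order, then keep the first n."""
    return sorted(values, reverse=True)[:n]


def top_n_by_selection(values, n=5):
    """Select the n largest values with a heap instead of sorting everything."""
    return heapq.nlargest(n, values)


# Hypothetical sample data: the example values from this page, without NaN.
data = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0]

# Both helpers return the same values in decreasing order.
print(top_n_by_full_sort(data, 3))
print(top_n_by_selection(data, 3))
```

Both helpers return the same result; only the amount of work differs, since the selection variant never has to order the values that fall outside the top n.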

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> data = [1, 2, 3, 4, np.nan, 6, 7, 8]
>>> s = ps.Series(data)
>>> s
0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
5    6.0
6    7.0
7    8.0
dtype: float64

The n largest elements where n=5 by default.

>>> s.nlargest()
7    8.0
6    7.0
5    6.0
3    4.0
2    3.0
dtype: float64
>>> s.nlargest(n=3)
7    8.0
6    7.0
5    6.0
dtype: float64