pyspark.sql.functions.arrays_overlap

pyspark.sql.functions.arrays_overlap(a1, a2)

Collection function: Returns a boolean column indicating whether the input arrays have at least one non-null element in common. The result is true if they do; null if they share no element, both arrays are non-empty, and at least one of them contains a null element; and false otherwise. If either input array is itself null, the result is null.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
a1, a2 : Column or str

The columns, or the names of the columns, that contain the input arrays.

Returns
Column

A new Column of Boolean type, where each value indicates whether the corresponding arrays from the input columns have any non-null element in common. The value can be NULL when null elements or null arrays make the comparison indeterminate.

Examples

Example 1: Basic usage of arrays_overlap function.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b"], ["b", "c"]), (["a"], ["b", "c"])], ['x', 'y'])
>>> df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                true|
|               false|
+--------------------+

Example 2: Usage of arrays_overlap function with arrays containing null elements.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None], ["b", None]), (["a"], ["b", "c"])], ['x', 'y'])
>>> df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                NULL|
|               false|
+--------------------+

Example 3: Usage of arrays_overlap function with arrays that are null.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(None, ["b", "c"]), (["a"], None)], ['x', 'y'])
>>> df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                NULL|
|                NULL|
+--------------------+

Example 4: Usage of arrays_overlap on arrays with identical elements.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b"], ["a", "b"]), (["a"], ["a"])], ['x', 'y'])
>>> df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                true|
|                true|
+--------------------+
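
The three-valued result described at the top of this page can be easier to follow as ordinary code. The sketch below is a plain-Python model of that rule, not Spark's implementation: the helper name arrays_overlap_model is hypothetical, Python None stands in for SQL NULL, and elements are assumed to be hashable.

```python
from typing import Optional, Sequence


def arrays_overlap_model(a1: Optional[Sequence], a2: Optional[Sequence]) -> Optional[bool]:
    """Hypothetical model of the documented rule; None plays the role of NULL."""
    # A null array on either side gives a null result (see Example 3).
    if a1 is None or a2 is None:
        return None

    # Only non-null elements can count as a match.
    non_null_1 = {x for x in a1 if x is not None}
    non_null_2 = {y for y in a2 if y is not None}
    if non_null_1 & non_null_2:
        return True

    # No match: the answer is unknown if both arrays are non-empty
    # and at least one of them contains a null element.
    has_null = any(x is None for x in a1) or any(y is None for y in a2)
    if a1 and a2 and has_null:
        return None

    return False


print(arrays_overlap_model(["a", "b"], ["b", "c"]))    # True
print(arrays_overlap_model(["a", None], ["b", None]))  # None
print(arrays_overlap_model(["a"], ["b", "c"]))         # False
print(arrays_overlap_model(None, ["b", "c"]))          # None
```

The four calls mirror rows from Examples 1 to 3 above. Under this model an empty array paired with an array that contains a null yields False rather than None, because the rule requires both arrays to be non-empty.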