pyspark.sql.functions.arrays_overlap
- pyspark.sql.functions.arrays_overlap(a1, a2)
Collection function: returns a boolean column indicating whether the two input arrays share at least one non-null element. The result is true if they do; null if they share no elements, both arrays are non-empty, and at least one of them contains a null element; and false otherwise. If either input array is itself null, the result is null.
New in version 2.4.0.
Changed in version 3.4.0: Supports Spark Connect.
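The three-valued result described above can be modeled in plain Python. This is only an illustrative sketch of the documented semantics, not Spark's implementation: `arrays_overlap_model` is a hypothetical helper, Python `None` stands in for SQL NULL, and the elements are assumed to be hashable.

```python
from typing import Hashable, Optional, Sequence


def arrays_overlap_model(
    a1: Optional[Sequence[Optional[Hashable]]],
    a2: Optional[Sequence[Optional[Hashable]]],
) -> Optional[bool]:
    """Hypothetical plain-Python model of the documented arrays_overlap result."""
    # A null array on either side yields null.
    if a1 is None or a2 is None:
        return None

    # Only non-null elements can count as common elements.
    non_null_1 = {x for x in a1 if x is not None}
    non_null_2 = {x for x in a2 if x is not None}
    if non_null_1 & non_null_2:
        return True

    # No common element: the answer is unknown (null) when both arrays are
    # non-empty and at least one of them contains a null element.
    if len(a1) > 0 and len(a2) > 0 and (None in a1 or None in a2):
        return None

    return False


print(arrays_overlap_model(["a", "b"], ["b", "c"]))    # True
print(arrays_overlap_model(["a", None], ["b", None]))  # None
print(arrays_overlap_model(["a"], ["b", "c"]))         # False
```

The model makes the order of the checks explicit: a shared non-null element always wins, and a null element only matters when no shared element was found.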
- Parameters
  - a1, a2 : Column or str
    The names of the columns that contain the input arrays.
- Returns
  - Column
    A new Column of Boolean type, where each value indicates whether the corresponding arrays from the input columns contain any common elements.
Examples
Example 1: Basic usage of arrays_overlap function.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b"], ["b", "c"]), (["a"], ["b", "c"])], ['x', 'y'])
>>> df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                true|
|               false|
+--------------------+
Example 2: Usage of arrays_overlap function with arrays containing null elements.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None], ["b", None]), (["a"], ["b", "c"])], ['x', 'y'])
>>> df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                NULL|
|               false|
+--------------------+
Example 3: Usage of arrays_overlap function with arrays that are null.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(None, ["b", "c"]), (["a"], None)], ['x', 'y'])
>>> df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                NULL|
|                NULL|
+--------------------+
Example 4: Usage of arrays_overlap on arrays with identical elements.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b"], ["a", "b"]), (["a"], ["a"])], ['x', 'y'])
>>> df.select(sf.arrays_overlap(df.x, df.y)).show()
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                true|
|                true|
+--------------------+