Master NumPy array filtering & joining for efficient data preprocessing and manipulation in Machine Learning. Learn Boolean indexing and array concatenation.

Filtering and Joining Arrays in NumPy

NumPy provides powerful tools for manipulating arrays, including filtering elements based on conditions and joining multiple arrays into a single entity. This guide covers these essential operations.

Filtering Arrays in NumPy

Filtering arrays in NumPy allows you to extract specific elements based on given conditions. This capability is vital for tasks such as data preprocessing, analysis, and transformation. NumPy's Boolean indexing is a cornerstone for this, enabling the creation of masks to select only the elements satisfying defined criteria.

Basic Filtering with Boolean Indexing

Boolean indexing works by creating a mask: a Boolean array, usually of the same shape as the original array. True values in the mask mark elements that meet a condition, and False values mark those that do not. When the mask is used to index the original array, only the elements at True positions are returned.

Example 1: Filtering Elements Greater Than 10

import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])
condition = array > 10
filtered_array = array[condition]

print("Original Array:", array)
print("Filtered Array (elements > 10):", filtered_array)

Output:

Original Array: [ 1  5  8 12 20  3]
Filtered Array (elements > 10): [12 20]

Filtering with Multiple Conditions

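You can combine multiple conditions using NumPy's element-wise logical operators. Wrap each condition in parentheses, because &, | and ~ bind more tightly than comparison operators such as > and <. Python's and, or, and not keywords do not work element-wise on arrays.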

  • &: Logical AND. Both conditions must be true.

  • |: Logical OR. At least one condition must be true.

  • ~: Logical NOT. Inverts the Boolean result of a condition.
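The AND case is covered by the example that follows; the OR and NOT operators can be sketched in the same way. This is a minimal illustration that reuses the sample array from Example 1, and the variable names either and not_large are chosen here only for the illustration.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# OR: keep elements smaller than 5 or larger than 15
either = array[(array < 5) | (array > 15)]

# NOT: invert a mask to keep everything that is not greater than 10
not_large = array[~(array > 10)]

print(either)     # [ 1 20  3]
print(not_large)  # [1 5 8 3]
```

Note that each comparison sits in its own pair of parentheses before it is combined.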

Example 2: Filtering Elements Between 5 and 15

To select elements that are greater than 5 AND less than 15:

import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])
condition = (array > 5) & (array < 15)
filtered_array = array[condition]

print("Filtered Array (5 < elements < 15):", filtered_array)

Output:

Filtered Array (5 < elements < 15): [ 8 12]

Filtering with Functions

Besides writing a comparison directly, you can build a filter with NumPy helper functions such as np.where(), or with your own function that returns a Boolean value for each element.

Example 3: Using np.where() to Filter Elements > 10

Called with only a condition, np.where() returns the indices of the elements that satisfy it, as a tuple containing one index array per dimension. These indices can then be used to index the original array.

import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])
condition = array > 10
filtered_indices = np.where(condition)
filtered_array = array[filtered_indices]

print("Filtered Array (elements > 10):", filtered_array)

Output:

Filtered Array (elements > 10): [12 20]
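It can help to look at what np.where() actually returns. The sketch below, which again uses the sample array from the earlier examples, prints the index array and also shows the three-argument form of np.where(), which replaces values instead of filtering them. The name capped is only illustrative.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# With only a condition, np.where returns a tuple with one index array per dimension
indices = np.where(array > 10)
print(indices[0])  # [3 4]

# With three arguments, np.where keeps the shape and picks a value per element:
# 10 where the condition is True, the original element otherwise
capped = np.where(array > 10, 10, array)
print(capped)  # [ 1  5  8 10 10  3]
```

For plain filtering, indexing with the Boolean mask directly (array[array > 10]) gives the same result as indexing with the np.where() indices.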

Example 4: Using a Custom Function to Filter Prime Numbers

This example demonstrates filtering an array to find prime numbers using a custom is_prime function.

import numpy as np

array = np.array([10, 15, 20, 25, 30, 35])

def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, int(np.sqrt(num)) + 1):
        if num % i == 0:
            return False
    return True

# Create a Boolean mask by applying is_prime to each element
mask = np.array([is_prime(x) for x in array])
filtered_array = array[mask]

print("Filtered Array (prime numbers):", filtered_array)

Output:

Filtered Array (prime numbers): []

(Note: The array in Example 4 contains no prime numbers, so the mask is all False and the result is an empty array.)
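To see the same kind of filter return a non-empty result, here is a small variation with a different, hypothetical input array that does contain primes. It also swaps the list comprehension for np.vectorize, which is one possible way to apply a Python function across an array; it is a convenience wrapper and still loops in Python, so it should not be expected to run faster.

```python
import numpy as np

def is_prime(num):
    """Return True if num is a prime number (simple trial division)."""
    if num <= 1:
        return False
    for i in range(2, int(num ** 0.5) + 1):
        if num % i == 0:
            return False
    return True

# Hypothetical sample data that includes some primes
values = np.array([2, 3, 4, 11, 15, 17, 21])

# Wrap the function so it can be called on the whole array at once
is_prime_vec = np.vectorize(is_prime, otypes=[bool])
mask = is_prime_vec(values)

print(values[mask])  # [ 2  3 11 17]
```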

Filtering in Multi-dimensional Arrays

Boolean indexing can be applied to multi-dimensional arrays to filter rows, columns, or elements based on conditions across specific dimensions.

Example 5: Filtering Rows Based on Column Condition

This example keeps the rows whose element in the second column (column index 1) is greater than 25.

import numpy as np

array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])

# Condition applied to the second column (index 1)
condition = array[:, 1] > 25
filtered_array = array[condition]

print("Filtered Array:\n", filtered_array)

Output:

Filtered Array:
 [[20 30 40]]
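Filtering columns or individual elements follows the same pattern. The sketch below reuses the array from Example 5; the conditions themselves are arbitrary choices made for illustration.

```python
import numpy as np

array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])

# Column filter: keep columns whose first-row value is greater than 15
column_mask = array[0, :] > 15
filtered_columns = array[:, column_mask]
print(filtered_columns)
# [[20 30]
#  [25 35]
#  [30 40]]

# Element filter: a mask with the full 2D shape returns a flattened 1D result
filtered_elements = array[array > 25]
print(filtered_elements)  # [30 35 30 40]
```

A 1D mask selects whole rows or columns and preserves the 2D structure, while a full-shape mask selects individual elements and returns them in row-major order.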

Joining Arrays in NumPy

Joining refers to combining two or more arrays into one. NumPy offers several functions for this purpose, depending on the desired orientation and structure of the combined array.

Using np.concatenate()

np.concatenate() joins arrays along an existing axis. The arrays must have the same number of dimensions and the same shape in every dimension except the one being joined. The default is axis=0.

Example 1: Concatenating 1D Arrays

import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
result = np.concatenate((array1, array2))

print("Concatenated Array:", result)

Output:

Concatenated Array: [1 2 3 4 5 6]

Example 2: Concatenating 2D Arrays Along Axis 0 and Axis 1

  • axis=0: Stacks arrays vertically (row-wise).

  • axis=1: Stacks arrays horizontally (column-wise).

import numpy as np

array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6], [7, 8]])

result_axis_0 = np.concatenate((array1, array2), axis=0)
result_axis_1 = np.concatenate((array1, array2), axis=1)

print("Concatenated along Axis 0:\n", result_axis_0)
print("\nConcatenated along Axis 1:\n", result_axis_1)

Output:

Concatenated along Axis 0:
 [[1 2]
  [3 4]
  [5 6]
  [7 8]]

Concatenated along Axis 1:
 [[1 2 5 6]
  [3 4 7 8]]

Example 3: Concatenating Different Dimensional Arrays

np.concatenate() requires all inputs to have the same number of dimensions, so to join a 1D array with a 2D array you first reshape the 1D array into a compatible 2D shape.

import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([[4, 5, 6], [7, 8, 9]])

# Reshape array1 to be a 2D array (1 row, 3 columns)
array1_reshaped = array1.reshape(1, -1)

result = np.concatenate((array1_reshaped, array2), axis=0)

print("Concatenated Array:\n", result)

Output:

Concatenated Array:
 [[1 2 3]
  [4 5 6]
  [7 8 9]]
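When the shapes do not line up, np.concatenate() raises a ValueError. The following sketch uses small hypothetical arrays a, b, and c to show one failing and one working combination.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])      # shape (2, 2)
b = np.array([[5, 6, 7]])           # shape (1, 3)

try:
    np.concatenate((a, b), axis=0)  # column counts differ: 2 vs 3
except ValueError as exc:
    print("Cannot concatenate:", exc)

# Arrays that differ only along the join axis are fine
c = np.array([[5, 6]])              # shape (1, 2)
joined = np.concatenate((a, c), axis=0)
print(joined.shape)  # (3, 2)
```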

Using np.stack()

np.stack() joins a sequence of arrays along a new axis, so the result has one more dimension than the inputs. All input arrays must have the same shape.

Example 4: Stacking 2D Arrays into 3D

Here, two 2D arrays of shape (2, 2) are stacked to form a 3D array of shape (2, 2, 2). axis=2 places the new axis last, so corresponding elements of the two arrays end up side by side along it.

import numpy as np

array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6], [7, 8]])

stacked_array = np.stack((array1, array2), axis=2)

print("Stacked Array:\n", stacked_array)

Output:

Stacked Array:
 [[[1 5]
   [2 6]]

  [[3 7]
   [4 8]]]
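Because both inputs in Example 4 are 2 x 2, every choice of axis produces the same output shape, which hides what the axis argument does. The sketch below uses two hypothetical arrays of shape (3, 4) so that the position of the new axis is visible in the result shape.

```python
import numpy as np

# Two hypothetical arrays of shape (3, 4), chosen so each axis length is distinct
first = np.zeros((3, 4))
second = np.ones((3, 4))

for axis in (0, 1, 2):
    stacked = np.stack((first, second), axis=axis)
    print(f"axis={axis}: shape {stacked.shape}")
# axis=0: shape (2, 3, 4)
# axis=1: shape (3, 2, 4)
# axis=2: shape (3, 4, 2)

# np.concatenate keeps the number of dimensions instead of adding one
print(np.concatenate((first, second), axis=0).shape)  # (6, 4)
```

The new axis always has length 2 here, because two arrays were stacked.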

Example 5: Stacking Multiple 1D Arrays

Stacking multiple 1D arrays along axis=0 results in a 2D array where each original array becomes a row.

import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
array3 = np.array([7, 8, 9])

stacked_array = np.stack((array1, array2, array3), axis=0)

print("Stacked Array:\n", stacked_array)

Output:

Stacked Array:
 [[1 2 3]
  [4 5 6]
  [7 8 9]]

Using np.hstack() for Horizontal Stacking

np.hstack() (horizontal stack) is a convenience function that stacks arrays column-wise. For arrays with two or more dimensions it is equivalent to np.concatenate() along axis=1; for 1D arrays it concatenates along axis=0.

Example 6: Horizontally Stacking 2D Arrays

import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])

hstacked_array = np.hstack((array1, array2))

print("Horizontally Stacked Array:\n", hstacked_array)

Output:

Horizontally Stacked Array:
 [[ 1  2  3  7  8  9]
  [ 4  5  6 10 11 12]]
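The behavior of np.hstack() for 1D and 2D inputs can be checked with a short sketch. The arrays a, b, m1, and m2 are hypothetical sample data.

```python
import numpy as np

# 1D inputs: hstack joins along axis 0, the only axis available
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
flat = np.hstack((a, b))
print(flat)  # [1 2 3 4 5 6]

# 2D inputs: hstack matches concatenate along axis 1
# (the row counts must agree; the column counts may differ)
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5], [6]])
same = np.array_equal(np.hstack((m1, m2)), np.concatenate((m1, m2), axis=1))
print(same)  # True
```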

Using np.vstack() for Vertical Stacking

np.vstack() (vertical stack) is a convenience function that stacks arrays row-wise. It is equivalent to np.concatenate() along axis=0, after any 1D arrays of shape (N,) have been reshaped to (1, N).

Example 7: Vertically Stacking 2D Arrays

import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])

vstacked_array = np.vstack((array1, array2))

print("Vertically Stacked Array:\n", vstacked_array)

Output:

Vertically Stacked Array:
 [[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]
  [10 11 12]]
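With 1D inputs, np.vstack() produces a 2D result in which each input becomes a row. A minimal sketch with two hypothetical 1D arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 1D inputs of shape (3,) are treated as rows of shape (1, 3) before stacking
stacked = np.vstack((a, b))
print(stacked.shape)  # (2, 3)
print(stacked)
# [[1 2 3]
#  [4 5 6]]
```

For 1D inputs like these, the result matches np.stack((a, b), axis=0).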

Splitting Arrays After Joining

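NumPy provides several functions to split an array, either to recover the arrays that were joined or to create new segments:

  • np.split(): Splits an array into multiple sub-arrays along a specified axis. When given an integer number of sections, the array must divide evenly or a ValueError is raised.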

  • np.array_split(): Similar to np.split(), but allows for unequal splits.

  • np.hsplit(): Splits an array horizontally (column-wise).

  • np.vsplit(): Splits an array vertically (row-wise).

  • np.dsplit(): Splits an array along the third axis (depth).

Example 8: Splitting Vertically Stacked Arrays

This example uses np.vsplit() to split an array that was previously stacked vertically. The second argument, 2, requests two equal parts.

import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
vstacked_array = np.vstack((array1, array2))

# Split the vertically stacked array into 2 equal parts
split_arrays = np.vsplit(vstacked_array, 2)

print("Split Arrays:")
for arr in split_arrays:
    print(arr)

Output:

Split Arrays:
[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
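The difference between np.split() and np.array_split() listed above shows up when the array length does not divide evenly. The sketch below uses a hypothetical 10-element array and also demonstrates splitting at explicit indices and splitting columns with np.hsplit().

```python
import numpy as np

array = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# np.array_split tolerates a length that does not divide evenly
parts = np.array_split(array, 3)
print([p.tolist() for p in parts])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# np.split with an integer requires an even division and raises otherwise
try:
    np.split(array, 3)
except ValueError as exc:
    print("np.split failed:", exc)

# np.split also accepts explicit split indices
left, middle, right = np.split(array, [2, 7])
print(left.tolist(), middle.tolist(), right.tolist())  # [0, 1] [2, 3, 4, 5, 6] [7, 8, 9]

# np.hsplit divides a 2D array column-wise
matrix = np.arange(12).reshape(3, 4)
left_cols, right_cols = np.hsplit(matrix, 2)
print(left_cols.shape, right_cols.shape)  # (3, 2) (3, 2)
```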