Filtering and Joining Arrays in NumPy
NumPy provides powerful tools for manipulating arrays, including filtering elements based on conditions and joining multiple arrays into a single entity. This guide covers these essential operations.
Filtering Arrays in NumPy
Filtering arrays in NumPy allows you to extract specific elements based on given conditions. This capability is vital for tasks such as data preprocessing, analysis, and transformation. NumPy's Boolean indexing is a cornerstone for this, enabling the creation of masks to select only the elements satisfying defined criteria.
Basic Filtering with Boolean Indexing
Boolean indexing works by creating a mask: a Boolean array of the same shape as the original array. True values in the mask mark elements that meet a condition, while False values mark those that do not. When the mask is applied to the original array, only the elements at True positions are returned.
Example 1: Filtering Elements Greater Than 10
import numpy as np
array = np.array([1, 5, 8, 12, 20, 3])
condition = array > 10
filtered_array = array[condition]
print("Original Array:", array)
print("Filtered Array (elements > 10):", filtered_array)
Output:
Original Array: [ 1  5  8 12 20  3]
Filtered Array (elements > 10): [12 20]
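It can help to look at the mask itself before applying it. The following is a minimal sketch that reuses the array from Example 1; the variable names mask, n_selected, and filtered are illustrative choices, not part of the original example.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# The comparison runs element by element and yields a Boolean mask
mask = array > 10
print("Mask:", mask)
print("Mask dtype:", mask.dtype)
print("Mask shape matches array:", mask.shape == array.shape)

# Counting the True values shows how many elements will pass the filter
n_selected = np.count_nonzero(mask)
print("Number selected:", n_selected)

# Boolean indexing returns a copy, so editing the result
# leaves the original array unchanged
filtered = array[mask]
filtered[0] = -1
print("Filtered copy after edit:", filtered)
print("Original array:", array)
```

Because the result is a copy rather than a view, changes to it do not propagate back. To modify the selected elements in place, assign through the mask instead, for example array[mask] = 0.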
Filtering with Multiple Conditions
You can combine multiple conditions using logical operators:
&: Logical AND. Both conditions must be true.
|: Logical OR. At least one condition must be true.
~: Logical NOT. Inverts the Boolean result of a condition.
Example 2: Filtering Elements Between 5 and 15
To select elements that are greater than 5 AND less than 15:
import numpy as np
array = np.array([1, 5, 8, 12, 20, 3])
condition = (array > 5) & (array < 15)
filtered_array = array[condition]
print("Filtered Array (5 < elements < 15):", filtered_array)
Output:
Filtered Array (5 < elements < 15): [ 8 12]
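The other two operators follow the same pattern. The sketch below, which reuses the same array, shows | and ~; the names outside and not_large are made up for illustration. Each comparison is wrapped in parentheses because &, |, and ~ bind more tightly than comparison operators in Python.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# OR: keep elements that are smaller than 5 or larger than 15
outside = array[(array < 5) | (array > 15)]

# NOT: invert a mask to keep everything that is not greater than 10
not_large = array[~(array > 10)]

print("Elements < 5 or > 15:", outside)
print("Elements not > 10:", not_large)
```

Python's and, or, and not keywords are not a substitute here: applied to arrays with more than one element, they raise a ValueError because the truth value of a whole array is ambiguous.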
Filtering with Functions
You can define custom functions that return a Boolean value for each element and then use these functions to create filters.
Example 3: Using np.where() to Filter Elements > 10
np.where() is useful for finding the indices of elements that satisfy a condition. These indices can then be used for filtering.
import numpy as np
array = np.array([1, 5, 8, 12, 20, 3])
condition = array > 10
filtered_indices = np.where(condition)
filtered_array = array[filtered_indices]
print("Filtered Array (elements > 10):", filtered_array)
Output:
Filtered Array (elements > 10): [12 20]
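When called with only a condition, np.where() returns a tuple containing one index array per dimension. The sketch below shows this for the 1D array above and for a small 2D array; the 2D array named matrix is an assumption introduced only for this illustration.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# For a 1D array the result is a tuple holding a single index array
indices = np.where(array > 10)
print("Tuple length:", len(indices))
print("Positions:", indices[0])

# For a 2D array there is one index array per axis
matrix = np.array([[1, 12],
                   [20, 3]])
rows, cols = np.where(matrix > 10)
print("Row indices:", rows)
print("Column indices:", cols)

# Indexing with both arrays returns the matching elements
selected = matrix[rows, cols]
print("Selected values:", selected)
```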
Example 4: Using a Custom Function to Filter Prime Numbers
This example demonstrates filtering an array to find prime numbers using a custom is_prime function.
import numpy as np
array = np.array([10, 15, 20, 25, 30, 35])
def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, int(np.sqrt(num)) + 1):
        if num % i == 0:
            return False
    return True
# Create a Boolean mask by applying is_prime to each element
mask = np.array([is_prime(x) for x in array])
filtered_array = array[mask]
print("Filtered Array (prime numbers):", filtered_array)
Output:
Filtered Array (prime numbers): []
(Note: The array in Example 4 does not contain any prime numbers, so the result is empty.)
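To see the same technique return a non-empty result, the sketch below applies it to an array that does contain primes. The candidates array is a made-up input, and math.isqrt is swapped in for np.sqrt as one way to keep the loop bound in integer arithmetic; neither comes from the original example.

```python
import math
import numpy as np

def is_prime(num):
    """Return True if num is prime, using trial division up to its integer square root."""
    if num <= 1:
        return False
    for i in range(2, math.isqrt(int(num)) + 1):
        if num % i == 0:
            return False
    return True

# Hypothetical input that mixes primes and composites
candidates = np.array([2, 3, 4, 9, 11, 15, 17, 25])

# Build the mask one element at a time; dtype=bool keeps it a Boolean mask
mask = np.array([is_prime(x) for x in candidates], dtype=bool)
primes = candidates[mask]

print("Mask:", mask)
print("Prime numbers:", primes)
```

The list comprehension calls a Python function once per element, so it does not benefit from NumPy's vectorized speed. For large arrays, a condition expressed directly with array operations is usually preferable where one exists.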
Filtering in Multi-dimensional Arrays
Boolean indexing can be applied to multi-dimensional arrays to filter rows, columns, or elements based on conditions across specific dimensions.
Example 5: Filtering Rows Based on Column Condition
This example keeps only the rows whose element in the second column (index 1) is greater than 25.
import numpy as np
array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])
# Condition applied to the second column (index 1)
condition = array[:, 1] > 25
filtered_array = array[condition]
print("Filtered Array:\n", filtered_array)
Output:
Filtered Array:
[[20 30 40]]
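The same idea extends to columns and to individual elements. The following sketch reuses the array from Example 5; the thresholds and the names column_mask, selected_columns, and selected_elements are illustrative assumptions.

```python
import numpy as np

array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])

# Filter columns: keep those whose value in the first row is greater than 15
column_mask = array[0, :] > 15
selected_columns = array[:, column_mask]
print("Selected columns:")
print(selected_columns)

# Filter elements: a mask with the full shape of the array
# returns the matching values as a flat 1D array
selected_elements = array[array > 25]
print("Elements > 25:", selected_elements)
```

Note that element-wise filtering discards the 2D layout, because the selected values generally no longer form a rectangle.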
Joining Arrays in NumPy
Joining refers to combining two or more arrays into one. NumPy offers several functions for this purpose, depending on the desired orientation and structure of the combined array.
Using np.concatenate()
np.concatenate() joins arrays along an existing axis. The arrays must have the same number of dimensions and the same shape along every axis except the one being joined.
Example 1: Concatenating 1D Arrays
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
result = np.concatenate((array1, array2))
print("Concatenated Array:", result)
Output:
Concatenated Array: [1 2 3 4 5 6]
Example 2: Concatenating 2D Arrays Along Axis 0 and Axis 1
axis=0: Stacks arrays vertically (row-wise).
axis=1: Stacks arrays horizontally (column-wise).
import numpy as np
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6], [7, 8]])
result_axis_0 = np.concatenate((array1, array2), axis=0)
result_axis_1 = np.concatenate((array1, array2), axis=1)
print("Concatenated along Axis 0:\n", result_axis_0)
print("\nConcatenated along Axis 1:\n", result_axis_1)
Output:
Concatenated along Axis 0:
[[1 2]
[3 4]
[5 6]
[7 8]]
Concatenated along Axis 1:
[[1 2 5 6]
[3 4 7 8]]
Example 3: Concatenating Different Dimensional Arrays
To concatenate arrays with different dimensions, you might need to reshape them first to have compatible shapes.
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([[4, 5, 6], [7, 8, 9]])
# Reshape array1 to be a 2D array (1 row, 3 columns)
array1_reshaped = array1.reshape(1, -1)
result = np.concatenate((array1_reshaped, array2), axis=0)
print("Concatenated Array:\n", result)
Output:
Concatenated Array:
[[1 2 3]
[4 5 6]
[7 8 9]]
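To illustrate why the reshape is needed, the sketch below first attempts the concatenation without it and catches the resulting error, then adds the missing axis with np.newaxis as an alternative to reshape(1, -1). The failed flag is a hypothetical helper used only to record the outcome.

```python
import numpy as np

array1 = np.array([1, 2, 3])                 # shape (3,)
array2 = np.array([[4, 5, 6], [7, 8, 9]])    # shape (2, 3)

# Joining a 1D array with a 2D array directly raises a ValueError
try:
    np.concatenate((array1, array2), axis=0)
    failed = False
except ValueError as exc:
    failed = True
    print("Concatenation failed:", exc)

# Indexing with np.newaxis turns shape (3,) into shape (1, 3)
array1_2d = array1[np.newaxis, :]
result = np.concatenate((array1_2d, array2), axis=0)

print("Shape after adding an axis:", array1_2d.shape)
print("Result shape:", result.shape)
```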
Using np.stack()
np.stack() joins a sequence of arrays along a new axis. All input arrays must have the same shape, and the result has one more dimension than the inputs.
Example 4: Stacking 2D Arrays into 3D
Here, two 2D arrays are stacked to form a 3D array. axis=2 means the new axis will be the last dimension.
import numpy as np
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6], [7, 8]])
stacked_array = np.stack((array1, array2), axis=2)
print("Stacked Array:\n", stacked_array)
Output:
Stacked Array:
[[[1 5]
[2 6]]
[[3 7]
[4 8]]]
Example 5: Stacking Multiple 1D Arrays
Stacking multiple 1D arrays along axis=0 results in a 2D array where each original array becomes a row.
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
array3 = np.array([7, 8, 9])
stacked_array = np.stack((array1, array2, array3), axis=0)
print("Stacked Array:\n", stacked_array)
Output:
Stacked Array:
[[1 2 3]
[4 5 6]
[7 8 9]]
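The effect of the axis argument is easiest to see in the output shape. The sketch below stacks three placeholder arrays of shape (2, 4), chosen only so that every dimension has a distinct size, and records where the new axis of length 3 ends up.

```python
import numpy as np

# Three hypothetical inputs with identical shape (2, 4)
arrays = [np.zeros((2, 4)) for _ in range(3)]

# The new axis has length 3 (the number of inputs) and is
# inserted at the position given by `axis`
shapes = {}
for axis in (0, 1, 2, -1):
    shapes[axis] = np.stack(arrays, axis=axis).shape
    print(f"axis={axis}: shape {shapes[axis]}")
```

With these inputs, axis=0 gives shape (3, 2, 4), axis=1 gives (2, 3, 4), and both axis=2 and axis=-1 give (2, 4, 3).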
Using np.hstack() for Horizontal Stacking
np.hstack() (horizontal stack) is a convenience function that stacks arrays column-wise. For arrays with two or more dimensions it is equivalent to np.concatenate along axis=1; 1D arrays are joined along axis=0 instead.
Example 6: Horizontally Stacking 2D Arrays
import numpy as np
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
hstacked_array = np.hstack((array1, array2))
print("Horizontally Stacked Array:\n", hstacked_array)
Output:
Horizontally Stacked Array:
[[ 1 2 3 7 8 9]
[ 4 5 6 10 11 12]]
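The behavior of np.hstack() depends on the dimensionality of its inputs. The sketch below contrasts the 1D and 2D cases using small made-up arrays, and checks the 2D result against np.concatenate.

```python
import numpy as np

# 1D inputs are joined end to end
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
flat = np.hstack((a, b))
print("1D result:", flat)

# 2D inputs are joined column-wise; the row counts must match,
# but the column counts may differ
m1 = np.array([[1, 2], [3, 4]])   # shape (2, 2)
m2 = np.array([[5], [6]])         # shape (2, 1)
wide = np.hstack((m1, m2))
print("2D result:")
print(wide)

# For 2D inputs this matches concatenation along axis=1
same = np.array_equal(wide, np.concatenate((m1, m2), axis=1))
print("Matches np.concatenate(axis=1):", same)
```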
Using np.vstack() for Vertical Stacking
np.vstack() (vertical stack) is a convenience function that stacks arrays row-wise. It is equivalent to np.concatenate along axis=0, after any 1D array of shape (N,) has been treated as a single row of shape (1, N).
Example 7: Vertically Stacking 2D Arrays
import numpy as np
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
vstacked_array = np.vstack((array1, array2))
print("Vertically Stacked Array:\n", vstacked_array)
Output:
Vertically Stacked Array:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
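Unlike np.concatenate(), np.vstack() also accepts 1D inputs and treats each one as a row. The following sketch shows this with small made-up arrays, including a mix of a 2D array and a 1D array.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Each 1D array becomes one row of the 2D result
rows = np.vstack((a, b))
print("Stacked 1D arrays:")
print(rows)

# A 1D array can be appended as a new row to an existing 2D array
matrix = np.array([[7, 8, 9], [10, 11, 12]])
extended = np.vstack((matrix, a))
print("Matrix with an extra row:")
print(extended)

# For 1D inputs the result matches np.stack along axis=0
same = np.array_equal(rows, np.stack((a, b), axis=0))
print("Matches np.stack(axis=0):", same)
```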
Splitting Arrays After Joining
NumPy provides several functions to split an array back into its original pieces or into new segments:
np.split(): Splits an array into multiple sub-arrays along a specified axis. When given a number of sections, the array must divide evenly.
np.array_split(): Similar to np.split(), but allows for unequal splits.
np.hsplit(): Splits an array horizontally (column-wise).
np.vsplit(): Splits an array vertically (row-wise).
np.dsplit(): Splits an array along the third axis (depth).
Example 8: Splitting Vertically Stacked Arrays
This demonstrates how to split an array that was previously vertically stacked using np.vsplit(). The 2 argument indicates splitting into 2 equal parts.
import numpy as np
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
vstacked_array = np.vstack((array1, array2))
# Split the vertically stacked array into 2 equal parts
split_arrays = np.vsplit(vstacked_array, 2)
print("Split Arrays:")
for arr in split_arrays:
    print(arr)
Output:
Split Arrays:
[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
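The difference between np.split() and np.array_split(), and the column-wise counterpart np.hsplit(), can be sketched as follows. The input arrays and the split_failed flag are made up for this illustration.

```python
import numpy as np

array = np.arange(10)   # 10 elements cannot be divided evenly into 3 parts

# np.array_split allows parts of unequal length
parts = np.array_split(array, 3)
for part in parts:
    print(part)

# np.split raises a ValueError when an equal division is impossible
try:
    np.split(array, 3)
    split_failed = False
except ValueError as exc:
    split_failed = True
    print("np.split failed:", exc)

# np.hsplit divides a 2D array column-wise
matrix = np.arange(12).reshape(3, 4)
left, right = np.hsplit(matrix, 2)
print("Left half:")
print(left)
print("Right half:")
print(right)
```

Here np.array_split() places the extra element in the first part, producing pieces of length 4, 3, and 3.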