Advanced Indexing

Master NumPy advanced indexing for precise array manipulation in ML, AI, and data science. Learn conditional filtering and element selection, and why advanced indexing returns copies rather than views.

NumPy Advanced Indexing

NumPy's advanced indexing provides precise, flexible ways to access and manipulate array elements. Unlike basic slicing, which returns a view of the original array, advanced indexing returns a copy of the selected elements. The distinction matters: modifying the array returned by an advanced index does not affect the original array. Assigning through an advanced index (for example, arr[mask] = 0) is a different operation and does update the original in place.
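The view-versus-copy distinction can be checked directly. The following is a minimal sketch; the array and the names basic and fancy are illustrative and not part of the original examples.

```python
import numpy as np

a = np.arange(6)            # [0 1 2 3 4 5]

basic = a[1:4]              # basic slice: a view onto a's memory
fancy = a[[1, 2, 3]]        # integer-list index: an independent copy

basic[0] = 100              # writes through to a
fancy[1] = -1               # changes only the copy

print(a)                            # [  0 100   2   3   4   5]
print(np.shares_memory(a, basic))   # True
print(np.shares_memory(a, fancy))   # False

# Assignment *through* an advanced index still updates the original in place.
a[[4, 5]] = 0
print(a)                            # [  0 100   2   3   0   0]
```

np.shares_memory is a convenient way to confirm whether two arrays overlap in memory when it is unclear which kind of indexing produced a result.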

Advanced indexing is essential for:

  • Conditional data filtering

  • Selecting specific rows, columns, or elements based on logical criteria or explicit indices

  • Performing complex data transformations

What is Advanced Indexing in NumPy?

Advanced indexing occurs when you use:

  • Integer arrays: Sequences of integers to specify exact positions.

  • Boolean arrays: Arrays of True/False values to filter elements based on conditions.

  • A tuple containing at least one sequence or an ndarray of integer or Boolean type: This applies when indexing multi-dimensional arrays with multiple integer or Boolean arrays.

It helps you extract data based on logic or exact index positions, making it invaluable in data science, machine learning, and scientific computation workflows.
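As a rough illustration of the index forms listed above, the sketch below applies each one to a small array. The array x and the variable names are assumptions made for this example.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

by_int_array = x[np.array([0, 2])]                # rows 0 and 2
by_bool_array = x[np.array([True, False, True])]  # the same rows, chosen by a mask
by_tuple = x[[0, 2], [1, 3]]                      # elements (0, 1) and (2, 3)
mixed = x[1:, [0, 3]]                             # slice combined with an integer list

print(by_int_array.shape)   # (2, 4)
print(by_tuple)             # [ 1 11]
print(mixed)
# [[ 4  7]
#  [ 8 11]]
```

Note that a single sequence inside the index tuple is enough to trigger advanced indexing, so mixed is a copy even though one of its indices is a slice.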

Types of Advanced Indexing

NumPy supports two primary types of advanced indexing:

  1. Integer Indexing

  2. Boolean Indexing

1. Integer Indexing in NumPy

Integer indexing allows you to select elements from an array using their precise indices. This is particularly powerful when working with multi-dimensional arrays, enabling you to pick elements from specific locations.

When indexing with a tuple of integer arrays, the output is shaped according to the broadcasted shape of these index arrays. The resulting array's element at [i, j, ...] is arr[row_indices[i, j, ...], col_indices[i, j, ...], ...].
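The broadcasting rule can be seen with index arrays of different shapes. This is a small sketch under assumed inputs: the 4x5 array and the chosen indices are illustrative only.

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

row_indices = np.array([[0], [3]])      # shape (2, 1)
col_indices = np.array([0, 2, 4])       # shape (3,)

result = arr[row_indices, col_indices]  # index arrays broadcast to shape (2, 3)
print(result.shape)                     # (2, 3)
print(result)
# [[ 0  2  4]
#  [15 17 19]]

# Check the element-wise rule against the explicitly broadcast index arrays.
r, c = np.broadcast_arrays(row_indices, col_indices)
for i in range(2):
    for j in range(3):
        assert result[i, j] == arr[r[i, j], c[i, j]]
```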

Example 1: Selecting Elements Using Integer Arrays

This example demonstrates selecting specific elements from a 2D array using corresponding row and column indices.

import numpy as np

x = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Select elements at (0,0), (1,1), and (2,0)
row_indices = np.array([0, 1, 2])
col_indices = np.array([0, 1, 0])
y = x[row_indices, col_indices]

print("Original array:\n", x)
print("Selected elements:\n", y)

Output:

Original array:
 [[1 2]
 [3 4]
 [5 6]]
Selected elements:
 [1 4 5]

Example 2: Selecting Corner Elements of a Matrix

This example shows how to select elements from the corners of a larger matrix using integer arrays.

import numpy as np

x = np.array([[ 0,  1,  2],
              [ 3,  4,  5],
              [ 6,  7,  8],
              [ 9, 10, 11]])

# Define row and column indices for the corners
rows = np.array([[0, 0],  # Top corners: both in row 0
                 [3, 3]]) # Bottom corners: both in row 3
cols = np.array([[0, 2],
                 [0, 2]])

y = x[rows, cols]
print("Selected corner elements:\n", y)

Output:

Selected corner elements:
 [[ 0  2]
 [ 9 11]]

Example 3: Accessing Specific Student Marks

This example simulates accessing the marks of specific students from a 2D array representing student scores.

import numpy as np

marks = np.array([[85, 99, 88],
                  [78, 93, 85],
                  [86, 45, 90]])

# Select student 0's mark in subject 0 and student 1's mark in subject 1
selected_marks = marks[[0, 1], [0, 1]]
print("Selected marks:\n", selected_marks)

Output:

Selected marks:
 [85 93]
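Paired index lists pick one element per (row, column) pair rather than a rectangular block. If the first two subjects for both students are wanted instead, one option is np.ix_, sketched here on the same marks array; the names paired and block are illustrative.

```python
import numpy as np

marks = np.array([[85, 99, 88],
                  [78, 93, 85],
                  [86, 45, 90]])

paired = marks[[0, 1], [0, 1]]          # elements (0, 0) and (1, 1)
block = marks[np.ix_([0, 1], [0, 1])]   # rows 0-1 crossed with columns 0-1

print(paired)   # [85 93]
print(block)
# [[85 99]
#  [78 93]]
```

For a contiguous block like this one, the basic slice marks[:2, :2] gives the same values as a view; np.ix_ is mainly useful when the rows or columns are not contiguous.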

Handling Index Out of Bounds

When using integer indexing, any index outside the valid range for its axis (from -n to n - 1 for an axis of length n) raises an IndexError.

import numpy as np

x = np.array([[0, 1],
              [2, 3],
              [4, 65]])

# Attempting to access row 3, which does not exist (valid rows are 0 to 2)
try:
    print(x[[0, 3], [1, 1]])
except IndexError as e:
    print(f"Caught expected error: {e}")

Output:

Caught expected error: index 3 is out of bounds for axis 0 with size 3
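Negative indices are in bounds as long as they stay within the axis length, and np.take offers clip and wrap modes when out-of-range indices should be tolerated rather than raise. The sketch below assumes that clipping or wrapping is acceptable for the data at hand, which is not always the case.

```python
import numpy as np

x = np.array([[0, 1],
              [2, 3],
              [4, 65]])

last_row = x[[-1]]           # negative indices count from the end
print(last_row)              # [[ 4 65]]

rows = np.array([0, 3])      # 3 is out of range for axis 0
clipped = np.take(x, rows, axis=0, mode="clip")   # 3 is clipped to 2
wrapped = np.take(x, rows, axis=0, mode="wrap")   # 3 wraps around to 0

print(clipped)
# [[ 0  1]
#  [ 4 65]]
print(wrapped)
# [[0 1]
#  [0 1]]
```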

2. Boolean Indexing in NumPy

Boolean indexing, also known as boolean mask indexing, filters elements from an array based on a condition. You create a Boolean mask (an array of True/False values) whose shape matches the array, or at least its leading dimensions. Elements corresponding to True in the mask are selected. When the mask has the same shape as the array, the result is a 1-D array of the selected elements.
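A short sketch of how the mask's shape affects the result. It assumes a 4x3 example array, and it also uses a mask that covers only the first axis, in which case whole rows are selected.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

full_mask = x % 2 == 0        # same shape as x, so the result is 1-D
print(x[full_mask])           # [ 0  2  4  6  8 10]

row_mask = np.array([True, False, False, True])   # one flag per row
print(x[row_mask])
# [[ 0  1  2]
#  [ 9 10 11]]

# A boolean mask behaves like the integer indices of its True entries.
assert np.array_equal(x[full_mask], x[np.nonzero(full_mask)])
```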

Example 4: Boolean Mask for Selection

This example shows basic boolean indexing where a Boolean array directly determines which elements are selected.

import numpy as np

arr = np.array([10, 20, 30, 40, 50])
bool_mask = np.array([True, False, True, False, True])
selected = arr[bool_mask]

print("Original array:", arr)
print("Boolean mask:", bool_mask)
print("Selected elements:", selected)

Output:

Original array: [10 20 30 40 50]
Boolean mask: [ True False  True False  True]
Selected elements: [10 30 50]

Example 5: Filtering Items Greater Than a Value

This is a common use case where you filter elements that satisfy a specific condition.

import numpy as np

x = np.array([[ 0,  1,  2],
              [ 3,  4,  5],
              [ 6,  7,  8],
              [ 9, 10, 11]])

# Create a boolean mask where elements are greater than 5
mask_greater_than_5 = x > 5
selected_elements = x[mask_greater_than_5]

print("Original array:\n", x)
print("Elements > 5:\n", selected_elements)

Output:

Original array:
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
Elements > 5:
 [ 6  7  8  9 10 11]
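Conditions can be combined into a single mask with the element-wise operators &, | and ~. The sketch below reuses the same 4x3 values; the mask names are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

# Parentheses are required because & and | bind more tightly than comparisons.
in_range = (x > 2) & (x < 9)
outside = (x <= 2) | (x >= 9)

print(x[in_range])    # [3 4 5 6 7 8]
print(x[outside])     # [ 0  1  2  9 10 11]
print(x[~in_range])   # [ 0  1  2  9 10 11]
```

Python's and/or keywords do not work here: they try to reduce each array to a single truth value, which raises a ValueError for arrays with more than one element.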

Example 6: Removing NaN Values from an Array

Boolean indexing is very effective for cleaning data, such as removing NaN (Not a Number) values.

import numpy as np

a = np.array([np.nan, 1, 2, np.nan, 3, 4, 5])

# Create a boolean mask for non-NaN values
mask_not_nan = ~np.isnan(a)
cleaned_array = a[mask_not_nan]

print("Original array:", a)
print("Cleaned array (NaNs removed):", cleaned_array)

Output:

Original array: [nan  1.  2. nan  3.  4.  5.]
Cleaned array (NaNs removed): [1. 2. 3. 4. 5.]
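For 2D data, removing individual NaN values would break the rectangular shape, so a common alternative is to drop whole rows or to overwrite the NaNs. The following sketch shows both under assumed data; filling with 0.0 is only an example choice, not a recommendation.

```python
import numpy as np

data = np.array([[1.0, 2.0],
                 [np.nan, 3.0],
                 [4.0, 5.0]])

# Flag every row that contains at least one NaN, then keep the others.
rows_with_nan = np.isnan(data).any(axis=1)
complete_rows = data[~rows_with_nan]
print(complete_rows)
# [[1. 2.]
#  [4. 5.]]

# Alternatively, overwrite the NaNs in a copy by assigning through the mask.
filled = data.copy()
filled[np.isnan(filled)] = 0.0
print(filled)
# [[1. 2.]
#  [0. 3.]
#  [4. 5.]]
```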

Example 7: Filtering Complex Numbers

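You can combine NumPy's element-wise test functions with boolean indexing to filter elements by their properties. np.iscomplex, for example, returns True for each element whose imaginary part is non-zero. Because the array below contains complex values, NumPy stores every element as complex, so the test depends on each value rather than on its type.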

import numpy as np

a = np.array([1, 2 + 6j, 5, 3.5 + 5j, 7])

# Create a boolean mask for values with a non-zero imaginary part
mask_is_complex = np.iscomplex(a)
complex_values = a[mask_is_complex]

print("Original array:", a)
print("Complex numbers:", complex_values)

Output:

Original array: [1. +0.j 2. +6.j 5. +0.j 3.5+5.j 7. +0.j]
Complex numbers: [2. +6.j 3.5+5.j]
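The difference between a value-based test and a type-based test can be checked on the same array. This sketch uses np.iscomplexobj and np.isreal, which are not part of the original example.

```python
import numpy as np

a = np.array([1, 2 + 6j, 5, 3.5 + 5j, 7])

print(a.dtype)                # complex128: every element is stored as complex
print(np.iscomplex(a))        # [False  True False  True False]
print(np.iscomplexobj(a))     # True: checks the array's type, not its values

# Select the elements with a zero imaginary part and keep only their real part.
real_values = a[np.isreal(a)].real
print(real_values)            # [1. 5. 7.]
```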

Example 8: Extracting Even Numbers

This demonstrates filtering elements that satisfy a mathematical condition.

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Create a boolean mask for even numbers
mask_is_even = x % 2 == 0
even_numbers = x[mask_is_even]

print("Original array:", x)
print("Even numbers:", even_numbers)

Output:

Original array: [1 2 3 4 5 6 7 8]
Even numbers: [2 4 6 8]
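The same kind of mask can also be used to modify values instead of extracting them. The sketch below contrasts np.where, which builds a new array, with assignment through the mask, which changes the array in place; halving and zeroing are arbitrary example transformations.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# np.where builds a new array: halve the even values, keep the odd ones.
halved = np.where(x % 2 == 0, x // 2, x)
print(halved)   # [1 1 3 2 5 3 7 4]
print(x)        # [1 2 3 4 5 6 7 8]

# Assignment through the mask changes x itself.
x[x % 2 == 0] = 0
print(x)        # [1 0 3 0 5 0 7 0]
```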

Summary of Key Differences: Basic vs. Advanced Indexing

| Feature | Basic Indexing | Advanced Indexing |
| :--------------- | :--------------------------------- | :------------------------------------------ |
| Return Type | View of the original array | Copy of selected data |
| Modifications | Affect the original array | Do not affect the original array |
| Index Type | Slice objects (:), integers (int) | Integer arrays, Boolean arrays, tuples thereof |
| Use Case | Simple subarrays, single elements | Complex conditions, specific element selection |

Conclusion

Advanced indexing in NumPy is a fundamental technique for precise data manipulation and selection. Whether you are selecting elements by their exact positions using integer arrays or filtering data based on complex conditions with Boolean arrays, mastering these concepts will significantly enhance your ability to manage and preprocess data efficiently in any data science or machine learning workflow.