Color Normalization
Master Matplotlib colormaps and normalization for effective data visualization in ML. Learn how to map data to colors consistently so your plots and model insights are easy to interpret.
Colormaps and Their Normalization in Matplotlib
Normalization is a crucial process in data visualization, particularly when mapping numerical data to colors. It involves rescaling data values to a common, typically fixed, range. This ensures consistency and allows for meaningful interpretation of visual representations across different datasets and scales.
In Matplotlib, normalization plays a key role in how data points are assigned colors from a colormap. Different normalization techniques cater to various data distributions and visualization goals.
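As a quick illustration of the general pattern (a minimal sketch; the colormap name and sample value are arbitrary choices, and mpl.colormaps assumes Matplotlib 3.5 or newer), a norm and a colormap can be combined by hand to turn a raw data value into an RGBA color, which is essentially what plotting functions such as imshow and pcolormesh do internally:
import matplotlib as mpl
from matplotlib.colors import Normalize
## A norm rescales data to the 0-1 range; a colormap turns that range into RGBA colors
norm = Normalize(vmin=0, vmax=50)
cmap = mpl.colormaps['viridis']
## Map a raw data value to a color in two explicit steps
value = 25
rgba = cmap(norm(value))
print('RGBA color:', rgba)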
Types of Colormap Normalization in Matplotlib
Matplotlib offers a variety of built-in normalization classes, each designed for specific data characteristics:
Linear Normalization (Default Behavior)
Logarithmic Normalization
Centered Normalization
Symmetric Logarithmic Normalization
Power-law Normalization
Discrete Bounds Normalization
Two-Slope Normalization (colors.TwoSlopeNorm; a short sketch follows the CenteredNorm section below)
Custom Normalization (a short sketch appears just before the conclusion)
Let's explore the commonly used types:
1. Linear Normalization (colors.Normalize)
Linear normalization is the default method used by Matplotlib. It maps data values linearly to the range of a colormap, typically from 0 to 1. This mapping is controlled by the vmin (minimum value) and vmax (maximum value) parameters. Any value below vmin is mapped to the lowest color in the colormap, and any value above vmax is mapped to the highest color.
Formula:
$ \text{normalized\_value} = \frac{\text{data\_value} - \text{vmin}}{\text{vmax} - \text{vmin}} $
Example:
from matplotlib.colors import Normalize
## Create a Normalize object with a specified range
norm = Normalize(vmin=-1, vmax=1)
## Normalize a value
normalized_value = norm(0)
## Display the normalized value
print('Normalized Value:', normalized_value)
Output:
Normalized Value: 0.5
Linear normalization is suitable for most datasets with a relatively uniform distribution. However, for data with extreme outliers or highly skewed distributions, non-linear mappings can provide more nuanced and informative visualizations.
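One detail worth knowing (a small sketch with arbitrary values): by default Normalize does not clamp out-of-range data, so the normalized result can fall outside [0, 1], and the clipping to the end colors only happens once the colormap is applied. Passing clip=True makes the norm itself clamp the result.
from matplotlib.colors import Normalize
norm = Normalize(vmin=-1, vmax=1)
print(norm(2.0))  # 1.5 -- outside [0, 1]; the colormap later maps it to its highest color
norm_clip = Normalize(vmin=-1, vmax=1, clip=True)
print(norm_clip(2.0))  # 1.0 -- clamped by the norm itself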
2. Logarithmic Normalization (colors.LogNorm)
Logarithmic normalization transforms data using a logarithmic scale (typically base-10). This is extremely useful when visualizing data that spans several orders of magnitude or exhibits exponential growth. It compresses larger values, making it easier to distinguish details in smaller values.
Example: Logarithmic vs. Linear Normalization
This example demonstrates how LogNorm can reveal details in data that Normalize might obscure due to scale differences.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
## Sample Data
X, Y = np.meshgrid(np.linspace(-3, 3, 128), np.linspace(-3, 3, 128))
Z = (1 - X/2 + X**5 + Y**3) * np.exp(-X**2 - Y**2)
## Scale data to have positive values for LogNorm
data_to_plot = Z**2 * 100 + 0.01 # Add a small offset to handle potential zeros
## Create subplots
fig, ax = plt.subplots(1, 2, figsize=(7, 4), layout='constrained')
## Logarithmic Normalization
## vmin is set to a small positive number to avoid log(0)
pc = ax[0].imshow(data_to_plot, cmap='plasma', norm=colors.LogNorm(vmin=0.01, vmax=100))
fig.colorbar(pc, ax=ax[0], extend='both')
ax[0].set_title('Logarithmic Normalization')
## Linear Normalization
pc = ax[1].imshow(data_to_plot, cmap='plasma', norm=colors.Normalize(vmin=0.01, vmax=100))
fig.colorbar(pc, ax=ax[1], extend='both')
ax[1].set_title('Linear Normalization')
plt.show()
3. Centered Normalization (colors.CenteredNorm)
Centered normalization is ideal for data that is symmetric around a central value, such as positive and negative anomalies relative to a mean. The CenteredNorm class automatically maps the central value (0 by default) to the midpoint of the colormap (0.5), with deviations extending symmetrically in both directions.
Example: Centered Normalization
This example shows how CenteredNorm effectively visualizes data with both positive and negative values using a diverging colormap.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors, cm
## Sample Data
X, Y = np.meshgrid(np.linspace(-3, 3, 128), np.linspace(-3, 3, 128))
Z = (1 - X/2 + X**5 + Y**3) * np.exp(-X**2 - Y**2)
## Select a diverging colormap
cmap = cm.coolwarm
## Create subplots
fig, ax = plt.subplots(1, 2, figsize=(7, 4), layout='constrained')
## Default Linear Normalization
## With the default Normalize, vmin/vmax come from the data, so 0 need not land at the colormap's midpoint
pc = ax[0].pcolormesh(X, Y, Z, cmap=cmap)
fig.colorbar(pc, ax=ax[0])
ax[0].set_title('Default Normalize')
## Centered Normalization
## CenteredNorm centers the colormap on vcenter (0 by default) and infers a symmetric halfrange from the data
pc = ax[1].pcolormesh(X, Y, Z, norm=colors.CenteredNorm(), cmap=cmap)
fig.colorbar(pc, ax=ax[1])
ax[1].set_title('CenteredNorm()')
plt.show()
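Closely related to CenteredNorm is colors.TwoSlopeNorm, listed earlier, which also places a chosen center value at the colormap midpoint but lets the two sides span different data ranges. This helps when the data is asymmetric around the center. A minimal sketch (the vmin/vcenter/vmax values are arbitrary):
import numpy as np
from matplotlib import colors
## The data range is asymmetric: -1 to 4, centered on 0
norm = colors.TwoSlopeNorm(vmin=-1.0, vcenter=0.0, vmax=4.0)
## vcenter maps to 0.5; each side is rescaled independently to fill its half of the colormap
print(norm(np.array([-1.0, -0.5, 0.0, 2.0, 4.0])))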
4. Symmetric Logarithmic Normalization (colors.SymLogNorm)
Symmetric logarithmic normalization is a powerful tool for data that contains both positive and negative values, and where a logarithmic scaling is beneficial for both. It extends the concept of logarithmic scaling to negative numbers by treating them symmetrically around zero. Key parameters include:
linthresh: The range around zero within which the mapping is linear. This avoids issues with log(0).
linscale: Stretches the linear region relative to the logarithmic region.
base: The base of the logarithm (10 by default).
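As a quick numeric check (a minimal sketch with arbitrarily chosen values), data inside the ±linthresh band is mapped linearly, while larger magnitudes on both sides of zero are compressed logarithmically:
import numpy as np
from matplotlib import colors
norm = colors.SymLogNorm(linthresh=0.1, vmin=-10, vmax=10, base=10)
## 0 maps to the colormap midpoint (0.5); the small values near zero sit on the linear part,
## while the large positive and negative values are compressed by the logarithmic part
print(norm(np.array([-10.0, -0.05, 0.0, 0.05, 10.0])))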
Example: Symmetric Logarithmic Normalization
This comparison highlights how SymLogNorm handles both positive and negative data with a logarithmic transformation, offering a different perspective than standard linear normalization.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors, cm
X, Y = np.mgrid[-3:3:complex(0, 128), -2:2:complex(0, 128)]
Z = (1 - X/2 + X**5 + Y**3) * np.exp(-X**2 - Y**2)
## Create subplots
fig, ax = plt.subplots(1, 2, figsize=(7, 4), layout='constrained')
## Symmetric Logarithmic Normalization
## linthresh is crucial for avoiding log(0) and defining the linear region around zero.
pcm = ax[0].pcolormesh(X, Y, Z, norm=colors.SymLogNorm(linthresh=0.03, linscale=0.03, vmin=-1.0, vmax=1.0, base=10), cmap='plasma', shading='auto')
fig.colorbar(pcm, ax=ax[0])
ax[0].set_title('SymLogNorm()')
## Default Linear Normalization for comparison
pcm = ax[1].pcolormesh(X, Y, Z, cmap='plasma', vmin=-np.max(np.abs(Z)), vmax=np.max(np.abs(Z)), shading='auto') # Ensure symmetric range for linear
fig.colorbar(pcm, ax=ax[1])
ax[1].set_title('Normalize')
plt.show()
5. Power-law Normalization (colors.PowerNorm)
Power-law normalization remaps colors using a power-law relationship, defined by the gamma parameter. This is useful for data distributions that are not well represented by linear or logarithmic scales, allowing flexible adjustments to the color mapping.
Example: Power-law Normalization
This example shows how PowerNorm with gamma=0.5 (a square-root mapping) can alter the color mapping compared to linear normalization.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors, cm
X, Y = np.meshgrid(np.linspace(-3, 3, 128), np.linspace(-3, 3, 128))
Z = (1 + np.sin(Y * 10.)) * X**2
## Create subplots
fig, ax = plt.subplots(1, 2, figsize=(7, 4), layout='constrained')
## Power-law Normalization
pcm = ax[0].pcolormesh(X, Y, Z, norm=colors.PowerNorm(gamma=0.5), cmap='PuBu_r', shading='auto')
fig.colorbar(pcm, ax=ax[0])
ax[0].set_title('PowerNorm(gamma=0.5)')
## Default Linear Normalization
pcm = ax[1].pcolormesh(X, Y, Z, cmap='PuBu_r', shading='auto')
fig.colorbar(pcm, ax=ax[1])
ax[1].set_title('Normalize')
plt.show()
6. Discrete Bounds Normalization (colors.BoundaryNorm)
Discrete bounds normalization is used when you want to map data values to specific, discrete color bins rather than a continuous gradient. It defines a set of boundaries, and values falling between consecutive boundaries are assigned a single color. The ncolors parameter gives the number of colormap entries those bins are drawn from.
Example: Discrete Bounds Normalization
This example illustrates how BoundaryNorm creates distinct color bands based on specified data ranges.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors as colors
X, Y = np.meshgrid(np.linspace(-3, 3, 128), np.linspace(-3, 3, 128))
Z = (1 + np.sin(Y * 10.)) * X**2
fig, ax = plt.subplots(2, 2, figsize=(7, 6), layout='constrained')
ax = ax.flatten()
## Default norm: Continuous gradient
pcm = ax[0].pcolormesh(X, Y, Z, cmap='RdBu_r')
fig.colorbar(pcm, ax=ax[0], orientation='vertical')
ax[0].set_title('Default norm')
## BoundaryNorm with specified bounds:
## 7 boundaries define 6 discrete bins; ncolors=256 spreads the bins evenly across the full colormap.
bounds = np.linspace(-1.5, 1.5, 7)
norm = colors.BoundaryNorm(boundaries=bounds, ncolors=256)
pcm = ax[1].pcolormesh(X, Y, Z, norm=norm, cmap='RdBu_r')
fig.colorbar(pcm, ax=ax[1], extend='both', orientation='vertical')
ax[1].set_title('BoundaryNorm: 7 boundaries')
## Another example with coarser bins to emphasize the discrete nature
bounds_2 = [-1.5, -0.75, 0, 0.75, 1.5]
norm_2 = colors.BoundaryNorm(boundaries=bounds_2, ncolors=256)  # 4 bins, each given one color drawn from across the colormap
pcm = ax[2].pcolormesh(X, Y, Z, norm=norm_2, cmap='viridis')
fig.colorbar(pcm, ax=ax[2], extend='both', orientation='vertical')
ax[2].set_title('BoundaryNorm: 4 discrete colors')
## The same coarse binning with a different colormap
bounds_3 = [-2, -1, 0, 1, 2]
norm_3 = colors.BoundaryNorm(boundaries=bounds_3, ncolors=256)  # again 4 bins spread across the full colormap
pcm = ax[3].pcolormesh(X, Y, Z, norm=norm_3, cmap='coolwarm')
fig.colorbar(pcm, ax=ax[3], extend='both', orientation='vertical')
ax[3].set_title('BoundaryNorm: 4 discrete colors (coolwarm)')
plt.show()
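Finally, custom normalization, the last item in the list at the top of this article, covers cases where none of the built-in classes fit. A common approach, following the pattern in Matplotlib's documentation, is to subclass colors.Normalize and override __call__. The sketch below is illustrative only (the class name MidpointNormalize and the chosen midpoint are hypothetical): it maps vmin, a midpoint, and vmax to 0, 0.5, and 1 with piecewise-linear interpolation.
import numpy as np
import matplotlib.colors as colors
class MidpointNormalize(colors.Normalize):
    ## Hypothetical custom norm: piecewise-linear mapping of [vmin, midpoint, vmax] to [0, 0.5, 1]
    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        super().__init__(vmin, vmax, clip)
    def __call__(self, value, clip=None):
        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return np.ma.masked_array(np.interp(value, x, y))
## Usage: emphasize structure around a midpoint of 0.2 instead of the range's center
norm = MidpointNormalize(vmin=-1, vmax=1, midpoint=0.2)
print(norm(np.array([-1.0, 0.2, 1.0])))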
Conclusion
Matplotlib provides a robust suite of normalization techniques to effectively visualize diverse datasets. The choice of normalization method is critical and should be guided by the underlying data distribution, the presence of outliers, and the specific insights you aim to convey. Understanding these options allows for more accurate, informative, and aesthetically pleasing data representations.