Multiprocessing Techniques
Boost your data visualization! Learn how to use Python multiprocessing with Matplotlib.
Multiprocessing is a powerful technique for executing multiple processes concurrently, effectively leveraging multi-core processors to significantly improve performance. While Matplotlib is traditionally single-threaded, integrating it with Python's multiprocessing module unlocks several benefits:
Parallel Plot Generation: Create multiple plots simultaneously, drastically reducing the time required for generating numerous visualizations.
Concurrent Figure Saving: Save multiple figures at the same time, enhancing the efficiency of exporting plots.
Decoupled Data Processing and Plotting: Separate data generation and processing from the plotting tasks, leading to a more organized and performant workflow.
1. Creating Multiple Matplotlib Plots in Parallel
Generating plots sequentially can be a bottleneck, especially when dealing with large datasets or complex visualizations. Multiprocessing speeds up this process by allowing the creation of multiple plots concurrently.
Example: Creating Multiple Plots Using Multiprocessing
This example demonstrates how to create and display four scatter plots in parallel.
import matplotlib.pyplot as plt
import numpy as np
import multiprocessing

# Function to create and display a single plot
def plot(datax, datay, name):
    """Creates a scatter plot with a given dataset and label."""
    x = datax
    y = datay**2  # Example: plotting y as the square of datay
    plt.scatter(x, y, label=name)
    plt.title(f"Plot for {name}")
    plt.xlabel("X-axis")
    plt.ylabel("Y-axis (X^2)")
    plt.legend()
    plt.show()  # This will display each plot individually

# Function to initiate multiprocessing
def multiP():
    """Starts multiple plot processes."""
    print("Starting parallel plot generation...")
    processes = []
    for i in range(4):
        # Create a process for each plot
        # Using simple sequential data for demonstration
        p = multiprocessing.Process(
            target=plot,
            args=(np.arange(i * 10, (i + 1) * 10),
                  np.arange(i * 10, (i + 1) * 10),
                  f"Dataset {i+1}"),
        )
        processes.append(p)
        p.start()
    # Wait for all processes to complete
    for p in processes:
        p.join()
    print("All plots generated.")

if __name__ == "__main__":
    # Ensure this block runs only when the script is executed directly
    input('Press Enter to start parallel plotting...')
    multiP()
Output:
This code will launch four separate windows, each displaying a unique scatter plot generated concurrently. Each plot will correspond to a different dataset and will be labeled accordingly.
2. Saving Multiple Matplotlib Figures Concurrently
Saving multiple figures one by one can be time-consuming. Multiprocessing significantly improves this by allowing you to save several plots simultaneously, boosting overall efficiency.
Example: Saving Multiple Figures Using Multiprocessing
This example uses a multiprocessing.Pool to generate and save four distinct figures concurrently.
import matplotlib.pyplot as plt
import numpy.random as random
from multiprocessing import Pool
import os

# Function to generate and save a plot
def do_plot(number):
    """Generates random data, creates a scatter plot, and saves it."""
    try:
        # Create a new figure for each plot
        fig = plt.figure(number)
        # Generate random data
        a = random.randint(0, 1000, 100)  # 100 random integers between 0 and 999
        b = random.randint(0, 1000, 100)
        plt.scatter(a, b)
        plt.title(f"Random Scatter Plot {number}")
        plt.xlabel("Random Data A")
        plt.ylabel("Random Data B")
        # Define the filename with zero-padding
        filename = f"{number:03d}.jpg"
        # Save the figure (JPEG output requires the Pillow package)
        plt.savefig(filename)
        plt.close(fig)  # Close the figure to free up memory
        print(f"Image {filename} saved successfully...")
        return f"Saved {filename}"
    except Exception as e:
        print(f"Error saving image {number}: {e}")
        return f"Failed to save {number}"

if __name__ == '__main__':
    print("Starting concurrent figure saving...")
    # Using a Pool with the default number of workers (usually the CPU count)
    with Pool() as pool:
        # Map the do_plot function to a range of numbers
        results = pool.map(do_plot, range(1, 5))  # Process numbers 1 through 4
    print("\n--- Saving Results ---")
    for res in results:
        print(res)
    print("All specified images saved.")
    # Optional: List the saved files to verify
    print("\nSaved files:")
    for f in os.listdir("."):
        if f.endswith(".jpg"):
            print(f)
Output:
This script will create and save four image files: 001.jpg, 002.jpg, 003.jpg, and 004.jpg. The saving process for these files will occur concurrently, demonstrating the efficiency gain.
3. Generating Data in One Process and Plotting in Another
Multiprocessing allows for a clear separation of concerns, such as performing computationally intensive data generation in one process while handling the plotting in a separate, dedicated process. This approach can lead to smoother interactive experiences and better resource management.
Example: Using Multiprocessing for Data Generation and Plotting
This advanced example uses inter-process communication (via multiprocessing.Pipe) to send data from a generator process to a plotting process that updates a Matplotlib figure dynamically.
import multiprocessing as mp
import time
import matplotlib.pyplot as plt
import numpy as np

# Ensure reproducible random data generation
np.random.seed(19680801)

class ProcessPlotter:
    """
    Manages the plotting process. Receives data via a pipe and updates
    a Matplotlib plot dynamically.
    """
    def __init__(self):
        self.x_data = []
        self.y_data = []
        self._is_running = True

    def terminate(self):
        """Closes all plot windows."""
        print("Terminating plotter...")
        plt.close('all')
        self._is_running = False

    def call_back(self):
        """
        Callback function for the timer. Checks the pipe for new data
        and updates the plot.
        """
        if not self._is_running:
            return False  # Stop the timer if the process is terminated
        while self.pipe.poll():  # Check if there's data in the pipe
            command = self.pipe.recv()
            if command is None:  # Signal to terminate
                self.terminate()
                return False
            else:
                # Append new data point
                self.x_data.append(command[0])
                self.y_data.append(command[1])
                # Update the plot data
                self.ax.plot(self.x_data, self.y_data, 'ro-')  # 'ro-' for red circles and lines
        self.fig.canvas.draw()  # Redraw the canvas to show updates
        return True  # Continue the timer

    def __call__(self, pipe):
        """
        The main execution method for the plotter process.
        Sets up the Matplotlib figure and timer.
        """
        print('Plotter process: Starting...')
        self.pipe = pipe
        self.fig, self.ax = plt.subplots()
        self.ax.set_title("Dynamic Plotting with Multiprocessing")
        self.ax.set_xlabel("Time (arbitrary units)")
        self.ax.set_ylabel("Random Value")
        # Create a timer to periodically call the callback function
        # (interval in milliseconds)
        timer = self.fig.canvas.new_timer(interval=500)
        timer.add_callback(self.call_back)
        timer.start()
        print('Plotter process: ...done')
        plt.show()  # Display the plot window and start the event loop
        print('Plotter process: exited plt.show()')

class NBPlot:
    """
    Manages the creation of a separate process for plotting and
    sends data to it via a pipe.
    """
    def __init__(self):
        # Create a pipe for communication: plot_pipe stays in this process,
        # plotter_pipe goes to the plotter process
        self.plot_pipe, plotter_pipe = mp.Pipe()
        # Instantiate the plotter class
        self.plotter = ProcessPlotter()
        # Create a separate process for the plotter.
        # The plotter process will execute the plotter instance's __call__ method.
        # daemon=True allows the main process to exit even if this one is running.
        self.plot_process = mp.Process(
            target=self.plotter, args=(plotter_pipe,), daemon=True
        )
        self.plot_process.start()
        print("Main process: Plotting process started.")

    def plot(self, finished=False):
        """
        Sends data to the plotter process or a termination signal.
        """
        send_to_plotter = self.plot_pipe.send
        if finished:
            send_to_plotter(None)  # Send None to signal termination
            print("Main process: Sent termination signal.")
        else:
            # Generate a random 2D data point
            data = np.random.random(2)
            send_to_plotter(data)  # Send the data point

# Main function to coordinate data generation and plotting
def main_with_multiprocessing():
    """Orchestrates the data generation and plotting using multiprocessing."""
    nb_plotter = NBPlot()
    print("Main process: Generating data points for 10 seconds...")
    for _ in range(20):  # Generate 20 data points over time
        nb_plotter.plot()
        time.sleep(0.5)  # Wait for half a second between data points
    print("Main process: All data points generated. Sending termination signal.")
    nb_plotter.plot(finished=True)  # Signal the plotter to finish
    # Give the plotter process a moment to receive the signal and close
    time.sleep(2)
    print("Main process: Exiting.")

if __name__ == '__main__':
    # On some systems (like macOS), the 'forkserver' or 'spawn' start methods
    # are preferred for multiprocessing, especially when using GUI toolkits
    # like Matplotlib; 'fork' can sometimes lead to issues.
    if plt.get_backend() == "MacOSX":
        mp.set_start_method("forkserver", force=True)
        print("Set multiprocessing start method to 'forkserver'.")
    elif plt.get_backend() in ["Qt5Agg", "TkAgg", "GTK3Agg"]:  # Other GUI backends may also benefit
        mp.set_start_method("spawn", force=True)  # 'spawn' is generally safe
        print(f"Set multiprocessing start method to 'spawn' for backend {plt.get_backend()}.")
    input('Press Enter to start the integrated data generation and plotting example...')
    main_with_multiprocessing()
Output:
When executed, this example will:
Open a Matplotlib window.
A separate process will continuously generate random (x, y) data points.
These data points will be sent to the plotting process via a pipe.
The plotting process will receive the data and dynamically update the Matplotlib plot by adding new points and connecting them with lines.
After a short period, a termination signal will be sent, causing the plot to finalize and the windows to close.
Conclusion
By employing Python's multiprocessing module with Matplotlib, developers can unlock significant performance improvements and create more sophisticated, efficient, and responsive data visualization applications. The key benefits include:
Accelerated Visualization: Achieve faster execution by generating and displaying multiple plots in parallel.
Streamlined Export: Improve workflow efficiency by saving numerous figures concurrently.
Enhanced Performance Architectures: Design applications where data processing and visualization tasks are cleanly separated, leading to better modularity and performance.