Multiprocessing Techniques
Boost your data visualization! Learn how to use Python multiprocessing with Matplotlib.
Multiprocessing is a powerful technique for executing multiple processes concurrently, effectively leveraging multi-core processors to significantly improve performance. While Matplotlib is traditionally single-threaded, integrating it with Python's multiprocessing module unlocks several benefits:
Parallel Plot Generation: Create multiple plots simultaneously, drastically reducing the time required for generating numerous visualizations.
Concurrent Figure Saving: Save multiple figures at the same time, enhancing the efficiency of exporting plots.
Decoupled Data Processing and Plotting: Separate data generation and processing from the plotting tasks, leading to a more organized and performant workflow.
1. Creating Multiple Matplotlib Plots in Parallel
Generating plots sequentially can be a bottleneck, especially when dealing with large datasets or complex visualizations. Multiprocessing speeds up this process by allowing the creation of multiple plots concurrently.
Example: Creating Multiple Plots Using Multiprocessing
This example demonstrates how to create and display four scatter plots in parallel.
import matplotlib.pyplot as plt
import numpy as np
import multiprocessing

# Function to create and display a single plot
def plot(datax, datay, name):
    """Creates a scatter plot with a given dataset and label."""
    x = datax
    y = datay**2  # Example: plotting y as the square of datay
    plt.scatter(x, y, label=name)
    plt.title(f"Plot for {name}")
    plt.xlabel("X-axis")
    plt.ylabel("Y-axis (X^2)")
    plt.legend()
    plt.show()  # This will display each plot individually

# Function to initiate multiprocessing
def multiP():
    """Starts multiple plot processes."""
    print("Starting parallel plot generation...")
    processes = []
    for i in range(4):
        # Create a process for each plot
        # Using simple sequential data for demonstration
        p = multiprocessing.Process(
            target=plot,
            args=(np.arange(i * 10, (i + 1) * 10),
                  np.arange(i * 10, (i + 1) * 10),
                  f"Dataset {i+1}"),
        )
        processes.append(p)
        p.start()
    # Wait for all processes to complete
    for p in processes:
        p.join()
    print("All plots generated.")

if __name__ == "__main__":
    # Ensure this block runs only when the script is executed directly
    input('Press Enter to start parallel plotting...')
    multiP()
Output:
This code will launch four separate windows, each displaying a unique scatter plot generated concurrently. Each plot will correspond to a different dataset and will be labeled accordingly.
2. Saving Multiple Matplotlib Figures Concurrently
Saving multiple figures one by one can be time-consuming. Multiprocessing significantly improves this by allowing you to save several plots simultaneously, boosting overall efficiency.
Example: Saving Multiple Figures Using Multiprocessing
This example uses a multiprocessing.Pool to generate and save four distinct figures concurrently.
import matplotlib.pyplot as plt
import numpy.random as random
from multiprocessing import Pool
import os

# Function to generate and save a plot
def do_plot(number):
    """Generates random data, creates a scatter plot, and saves it."""
    try:
        # Create a new figure for each plot
        fig = plt.figure(number)
        # Generate random data
        a = random.randint(0, 1000, 100)  # 100 random integers between 0 and 999
        b = random.randint(0, 1000, 100)
        plt.scatter(a, b)
        plt.title(f"Random Scatter Plot {number}")
        plt.xlabel("Random Data A")
        plt.ylabel("Random Data B")
        # Define the filename with zero-padding
        filename = f"{number:03d}.jpg"
        # Save the figure (JPEG output requires the Pillow package)
        plt.savefig(filename)
        plt.close(fig)  # Close the figure to free up memory
        print(f"Image {filename} saved successfully...")
        return f"Saved {filename}"
    except Exception as e:
        print(f"Error saving image {number}: {e}")
        return f"Failed to save {number}"

if __name__ == '__main__':
    print("Starting concurrent figure saving...")
    # Using a Pool with the default number of workers (usually the CPU count)
    with Pool() as pool:
        # Map the do_plot function to a range of numbers
        results = pool.map(do_plot, range(1, 5))  # Process numbers 1 through 4
    print("\n--- Saving Results ---")
    for res in results:
        print(res)
    print("All specified images saved.")
    # Optional: List the saved files to verify
    print("\nSaved files:")
    for f in os.listdir("."):
        if f.endswith(".jpg"):
            print(f)
Output:
This script will create and save four image files: 001.jpg, 002.jpg, 003.jpg, and 004.jpg. The saving process for these files will occur concurrently, demonstrating the efficiency gain.
3. Generating Data in One Process and Plotting in Another
Multiprocessing allows for a clear separation of concerns, such as performing computationally intensive data generation in one process while handling the plotting in a separate, dedicated process. This approach can lead to smoother interactive experiences and better resource management.
Example: Using Multiprocessing for Data Generation and Plotting
This advanced example uses inter-process communication (via multiprocessing.Pipe) to send data from a generator process to a plotting process that updates a Matplotlib figure dynamically.
import multiprocessing as mp
import time
import matplotlib.pyplot as plt
import numpy as np

# Ensure reproducible random data generation
np.random.seed(19680801)

class ProcessPlotter:
    """
    Manages the plotting process. Receives data via a pipe and updates
    a Matplotlib plot dynamically.
    """
    def __init__(self):
        self.x_data = []
        self.y_data = []
        self._is_running = True

    def terminate(self):
        """Closes all plot windows."""
        print("Terminating plotter...")
        plt.close('all')
        self._is_running = False

    def call_back(self):
        """
        Callback function for the timer. Checks the pipe for new data
        and updates the plot.
        """
        if not self._is_running:
            return False  # Stop the timer if the process is terminated
        while self.pipe.poll():  # Check if there's data in the pipe
            command = self.pipe.recv()
            if command is None:  # Signal to terminate
                self.terminate()
                return False
            else:
                # Append new data point
                self.x_data.append(command[0])
                self.y_data.append(command[1])
                # Update the plot data
                self.ax.plot(self.x_data, self.y_data, 'ro-')  # 'ro-' for red circles and lines
        self.fig.canvas.draw()  # Redraw the canvas to show updates
        return True  # Continue the timer

    def __call__(self, pipe):
        """
        The main execution method for the plotter process.
        Sets up the Matplotlib figure and timer.
        """
        print('Plotter process: Starting...')
        self.pipe = pipe
        self.fig, self.ax = plt.subplots()
        self.ax.set_title("Dynamic Plotting with Multiprocessing")
        self.ax.set_xlabel("Time (arbitrary units)")
        self.ax.set_ylabel("Random Value")
        # Create a timer to periodically call the callback function
        # (interval in milliseconds)
        timer = self.fig.canvas.new_timer(interval=500)
        timer.add_callback(self.call_back)
        timer.start()
        print('Plotter process: ...done')
        plt.show()  # Display the plot window and start the event loop
        print('Plotter process: exited plt.show()')

class NBPlot:
    """
    Manages the creation of a separate process for plotting and
    sends data to it via a pipe.
    """
    def __init__(self):
        # Create a pipe for communication: plot_pipe stays in this process,
        # plotter_pipe goes to the plotter process
        self.plot_pipe, plotter_pipe = mp.Pipe()
        # Instantiate the plotter class
        self.plotter = ProcessPlotter()
        # Create a separate process for the plotter.
        # The plotter process will execute the plotter instance's __call__ method.
        # daemon=True allows the main process to exit even if this one is running.
        self.plot_process = mp.Process(
            target=self.plotter, args=(plotter_pipe,), daemon=True
        )
        self.plot_process.start()
        print("Main process: Plotting process started.")

    def plot(self, finished=False):
        """
        Sends data to the plotter process or a termination signal.
        """
        send_to_plotter = self.plot_pipe.send
        if finished:
            send_to_plotter(None)  # Send None to signal termination
            print("Main process: Sent termination signal.")
        else:
            # Generate a random 2D data point
            data = np.random.random(2)
            send_to_plotter(data)  # Send the data point

# Main function to coordinate data generation and plotting
def main_with_multiprocessing():
    """Orchestrates the data generation and plotting using multiprocessing."""
    nb_plotter = NBPlot()
    print("Main process: Generating data points for 10 seconds...")
    for _ in range(20):  # Generate 20 data points over time
        nb_plotter.plot()
        time.sleep(0.5)  # Wait for half a second between data points
    print("Main process: All data points generated. Sending termination signal.")
    nb_plotter.plot(finished=True)  # Signal the plotter to finish
    # Give the plotter process a moment to receive the signal and close
    time.sleep(2)
    print("Main process: Exiting.")

if __name__ == '__main__':
    # On some systems (like macOS), the 'forkserver' or 'spawn' start methods
    # are preferred for multiprocessing, especially when using GUI toolkits
    # like Matplotlib; 'fork' can sometimes lead to issues.
    if plt.get_backend() == "MacOSX":
        mp.set_start_method("forkserver", force=True)
        print("Set multiprocessing start method to 'forkserver'.")
    elif plt.get_backend() in ["Qt5Agg", "TkAgg", "GTK3Agg"]:  # Other GUI backends may also benefit
        mp.set_start_method("spawn", force=True)  # 'spawn' is generally safe
        print(f"Set multiprocessing start method to 'spawn' for backend {plt.get_backend()}.")
    input('Press Enter to start the integrated data generation and plotting example...')
    main_with_multiprocessing()
Output:
When executed, this example will:
Open a Matplotlib window.
A separate process will continuously generate random (x, y) data points.
These data points will be sent to the plotting process via a pipe.
The plotting process will receive the data and dynamically update the Matplotlib plot by adding new points and connecting them with lines.
After a short period, a termination signal will be sent, causing the plot to finalize and the windows to close.
Conclusion
By employing Python's multiprocessing module with Matplotlib, developers can unlock significant performance improvements and create more sophisticated, efficient, and responsive data visualization applications. The key benefits include:
Accelerated Visualization: Achieve faster execution by generating and displaying multiple plots in parallel.
Streamlined Export: Improve workflow efficiency by saving numerous figures concurrently.
Enhanced Performance Architectures: Design applications where data processing and visualization tasks are cleanly separated, leading to better modularity and performance.