Today I tested a simple Python script that uses the Pool and ThreadPool classes from Python's multiprocessing module.
The main goal was to measure Python's multiprocessing performance on my computer.
NumPy releases the GIL for many of its operations, which means you can use multiple CPU cores even with threads.
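To illustrate that point, here is a minimal sketch that splits an array into chunks and sums each chunk in a separate thread. The helper name threaded_sum and the n_threads parameter are my own, chosen for illustration, and whether this is actually faster than a single np.sum call depends on the array size and the number of cores.

```python
from multiprocessing.pool import ThreadPool

import numpy as np


def threaded_sum(arr, n_threads=4):
    """Sum a 1-D array by giving each worker thread one chunk (hypothetical helper)."""
    # array_split returns views of the original data, so nothing is copied.
    chunks = np.array_split(arr, n_threads)
    with ThreadPool(n_threads) as pool:
        # Each np.sum call can run while the GIL is released.
        partial_sums = pool.map(np.sum, chunks)
    # Convert to plain Python ints before adding the partial results.
    return sum(int(s) for s in partial_sums)


if __name__ == "__main__":
    data = np.ones(10_000_000, dtype=np.uint8)
    print("Threaded sum:", threaded_sum(data))
```

The threads share the same memory, so the chunks are handed to the workers without any serialization.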
Processing large amounts of data with Pandas can be difficult, and the Polars dataframe library is a potential solution.
Sciagraph gives you both performance profiling and peak memory profiling information.
Let's test only these two classes:
The multiprocessing.pool.Pool class provides a process pool in Python.
The multiprocessing.pool.ThreadPool class in Python provides a pool of reusable threads for executing ad hoc tasks.
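Both classes expose the same interface (apply, map and so on), so the same code can run against either one. The following is only a small sketch of that idea: the helper square_roots and the sample values are assumptions made for illustration and are not part of the test itself.

```python
import math
import multiprocessing as mp
from multiprocessing.pool import ThreadPool


def square_roots(pool, values):
    """Apply math.sqrt to every value using whichever pool is passed in (hypothetical helper)."""
    return pool.map(math.sqrt, values)


if __name__ == "__main__":
    values = [1, 4, 9, 16]

    # Threads run inside the current process and share its memory.
    with ThreadPool(2) as thread_pool:
        print("ThreadPool:", square_roots(thread_pool, values))

    # Worker processes are separate interpreters; arguments and results
    # are pickled and sent between them.
    with mp.get_context("spawn").Pool(2) as process_pool:
        print("Pool:", square_roots(process_pool, values))
```

The only difference in the calling code is which pool object is created.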
This is the Python script I tested:
from time import time
import multiprocessing as mp
from multiprocessing.pool import ThreadPool
import numpy as np
import pickle


def main():
    # 1024 * 1024 * 1024 one-byte elements, i.e. a 1 GiB array of ones.
    arr = np.ones((1024, 1024, 1024), dtype=np.uint8)
    expected_sum = np.sum(arr)

    # Thread pool: the worker thread shares memory with the main thread,
    # so the array is passed without being copied.
    with ThreadPool(1) as threadpool:
        start = time()
        assert (
            threadpool.apply(np.sum, (arr,)) == expected_sum
        )
        print("Thread pool:", time() - start)

    # Process pool: the array has to be pickled and sent to the worker process.
    with mp.get_context("spawn").Pool(1) as process_pool:
        start = time()
        assert (
            process_pool.apply(np.sum, (arr,))
            == expected_sum
        )
        print("Process pool:", time() - start)


if __name__ == "__main__":
    main()
This is the result:
python thread_process_pool_001.py
Thread pool: 1.6689703464508057
Process pool: 11.644825458526611
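
One likely contributor to the gap is that a process pool has to pickle the array, send it to the worker process, and unpickle it there, while a thread pool just shares the memory. The sketch below times only that serialization step in the current process. The helper name pickle_round_trip_seconds is hypothetical, and the array is deliberately much smaller than the 1 GiB one used in the script, so the number it prints is not comparable to the result above.

```python
import pickle
from time import perf_counter

import numpy as np


def pickle_round_trip_seconds(arr):
    """Time pickling and unpickling arr in the current process (hypothetical helper)."""
    start = perf_counter()
    restored = pickle.loads(pickle.dumps(arr))
    elapsed = perf_counter() - start
    # Sanity check: the copy must hold the same data as the original.
    assert np.array_equal(arr, restored)
    return elapsed


if __name__ == "__main__":
    # 16 MiB of ones; an assumption made to keep the example light.
    small = np.ones((16, 1024, 1024), dtype=np.uint8)
    print("Pickle round trip:", pickle_round_trip_seconds(small))
```

This leaves out the cost of moving the bytes between processes, so it only gives a lower bound on the overhead.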