from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples = 10_000_000,
                  centers = 15,
                  n_features = 2,
                  random_state = 42)
Yes! And no GPU involved. Intel has made an extension that can accelerate your Scikit-Learn code without any change to Scikit-Learn's API. This means you can apply it to existing code without rewriting that code and still get better performance.
How does this work?
Basically, Intel has written its own CPU-optimized algorithms while keeping full conformance with all Scikit-Learn APIs and algorithms. The extension replaces Scikit-Learn's algorithms with Intel's optimized versions of the same algorithms. This is also known as patching.
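To make the idea of patching concrete, here is a toy sketch in plain Python. It is not Intel's implementation: the module name `toy_lib` and both functions are hypothetical stand-ins invented for illustration. It only shows how swapping an attribute on a module lets existing calling code pick up a different implementation without being modified.

```python
import types

# Hypothetical stand-in for a library module (not scikit-learn itself).
toy_lib = types.ModuleType("toy_lib")


def slow_sum_of_squares(values):
    """Reference implementation: an explicit Python loop."""
    total = 0
    for v in values:
        total += v * v
    return total


def fast_sum_of_squares(values):
    """Drop-in replacement with the same signature and result."""
    return sum(v * v for v in values)


toy_lib.sum_of_squares = slow_sum_of_squares
_original = toy_lib.sum_of_squares  # kept so the patch can be undone


def patch_toy_lib():
    """Swap in the replacement function under the same public name."""
    toy_lib.sum_of_squares = fast_sum_of_squares


def unpatch_toy_lib():
    """Restore the original function."""
    toy_lib.sum_of_squares = _original


def user_code(values):
    # The caller looks the function up on the module at call time,
    # so this code never needs to change.
    return toy_lib.sum_of_squares(values)


patch_toy_lib()
patched_result = user_code([1, 2, 3])
patched_impl = toy_lib.sum_of_squares

unpatch_toy_lib()
restored_result = user_code([1, 2, 3])
restored_impl = toy_lib.sum_of_squares
```

The sketch also hints at why import order matters with this kind of patching: code that grabbed a direct reference to the function before the swap would keep using the old one.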
So, does it work? Let's see how KMeans does, starting by creating a synthetic dataset of 10 million samples using Scikit-Learn's make_blobs.
Now all we need to do to use the optimized algorithm is import the patch and apply it before importing KMeans from Scikit-Learn.
from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.cluster import KMeans
from timeit import default_timer as timer
start = timer()
kmeans = KMeans(n_clusters=15, random_state=12).fit(X)
time_diff = timer() - start
f"The elapsed time for the optimized model is: {time_diff:.2f} s"
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
'The elapsed time for the optimized model is: 24.25 s'
But what about the original algorithm?
To get back to the original Scikit-Learn algorithms, all we need to do is unpatch. Make sure to reimport from Scikit-Learn after you unpatch, just as the import had to come after patching.
Let's now see how much time the original algorithm takes.
from sklearnex import unpatch_sklearn
unpatch_sklearn()
from sklearn.cluster import KMeans
start = timer()
kmeans = KMeans(n_clusters=15, random_state=12).fit(X)
time_diff = timer() - start
f"The elapsed time for Scikit-Learn's original model is: {time_diff:.2f} s"
"The elapsed time for Scikit-Learn's original model is: 254.50 s"
The optimized algorithm is about 10 times faster than the original one in this run (24.25 s versus 254.50 s). For a more informative comparison, check the Benchmarking Intel Extension for Scikit-learn article by Intel.
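The comparison can be reduced to a small helper. The sketch below is one possible way to do it, not part of the extension: `time_call` and `speedup` are hypothetical names introduced here. It times a single call with the same `default_timer` used above, then computes the ratio from the two timings reported in this post.

```python
from timeit import default_timer as timer


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = timer()
    result = func(*args, **kwargs)
    return result, timer() - start


def speedup(baseline_seconds, optimized_seconds):
    """How many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


# Timings reported above for the two KMeans fits.
ratio = speedup(254.50, 24.25)
print(f"Speedup: {ratio:.2f}x")  # prints "Speedup: 10.49x"
```

With the dataset from earlier in memory, a fit could be timed as `time_call(KMeans(n_clusters=15, random_state=12).fit, X)`. A single run is only a rough measurement; timings will vary with hardware and load.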
This is great but…
Unfortunately, not all algorithms in Scikit-Learn are affected by the patch. For the list of supported algorithms, check the Supported Algorithms page in the extension's documentation.
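If the extension is installed, the list can also be inspected from Python. The sketch below assumes that the installed version of `sklearnex` exposes a `get_patch_names` function; `supported_estimators` is a hypothetical wrapper written for this post, and it returns `None` when the package or that function is unavailable, so treat the documentation as the authoritative source.

```python
def supported_estimators():
    """Return a sorted list of patchable names, or None if unavailable.

    Assumption: sklearnex provides get_patch_names() in the installed
    version. If the import fails for any reason, None is returned.
    """
    try:
        from sklearnex import get_patch_names
    except ImportError:
        return None
    return sorted(get_patch_names())


names = supported_estimators()
if names is None:
    print("sklearnex (or get_patch_names) is not available here.")
else:
    print(f"{len(names)} patchable names, for example: {names[:5]}")
```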
Citation
@misc{hamdy2022,
author = {Mohammed Hamdy},
title = {Faster {Scikit-Learn} Using Only Two Lines of Code},
date = {2022-12-19},
url = {https://hypothesis-space.netlify.app/posts/faster_sklearn},
langid = {en}
}