Hypothesis Space - Faster Scikit-Learn using only two lines of code

Yes! and no GPU involved. Intel has made an extension that can accelerate your scikit-learn code without any change to scikit-learn’s api. This means that you can apply it to an already existing code without changing anything and still get an enhanced performance.

How does this work?

Basically Intel has wrote its own CPU optimized algorithms but kept full conformance with all Scikit-Learn APIs and algorithms. What the introduced code does is replace Scikit-Learn’s algorithms with Intel’s new optimized versions of these algorithms. This is also known as patching.

So, does it work?. Let’s see how KMeans is doing, starting with creating our own synthetic dataset of 10 million samples using Scikit-Learn’s make_blobs.

from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples = 10_000_000, 
                  centers = 15, 
                  n_features = 2, 
                  random_state = 42)

Now all we need to do in order to use the optimized algorithm is to import the patch and use it.

Note

Make sure to import scikit-learn after importing and calling the patch. Otherwise, the patching will not affect the original scikit-learn estimators.

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.cluster import KMeans
from timeit import default_timer as timer

start = timer()
kmeans = KMeans(n_clusters=15, random_state=12).fit(X)
time_diff = timer() - start
f"The elapsed time for the optimized model is: {time_diff:.2f} s"

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)

'The elapsed time for the optimized model is: 24.25 s'

But what about the original algorithm?

To get back to using our original Scikit-Learn algorithms, all we need to do is just opt out of using the patch by simply unpatching. Again, make sure to reimport Scikit-Learn after you unpatch.

Let’s now see how much time will it take the original algorithm.

from sklearnex import unpatch_sklearn
unpatch_sklearn()

from sklearn.cluster import KMeans

start = timer()
kmeans = KMeans(n_clusters=15, random_state=12).fit(X)
time_diff = timer() - start
f"The elapsed time for Scikit-Learn's original model is: {time_diff:.2f} s"

"The elapsed time for Scikit-Learn's original model is: 254.50 s"

The optimized algorithm is almost 10 times faster than the original one. For more informative comparison, check Benchmarking Intel Extension for Scikit-learn article by Intel.

This is great but…

Unfortunately, not all algorithms in Scikit-Learn are affected by the patch. For a list of supported algorithms check: Supported Algorithms

Citation

BibTeX citation:

@misc{hamdy2022,
  author = {Mohammed Hamdy},
  title = {Faster {Scikit-Learn} Using Only Two Lines of Code},
  date = {2022-12-19},
  url = {https://hypothesis-space.netlify.app/posts/faster_sklearn},
  langid = {en}
}

For attribution, please cite this work as:

Mohammed Hamdy. 2022. “Faster Scikit-Learn Using Only Two Lines of Code.” Hypothesis Space. https://hypothesis-space.netlify.app/posts/faster_sklearn.