Speeding Up Machine Learning Model Performance with Intel® Extension for Scikit-learn*

In this report, we will look at methods that can be used to speed up the performance of our machine learning models.


Prerequisites

What Is Scikit-learn (Sklearn)?

  • Scikit-learn (Sklearn) is one of the most widely used and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a Python interface.

  • This library, which is largely written in Python, is built upon NumPy and SciPy, and uses Matplotlib for its plotting utilities.

  • Scikit-learn is a library in Python that provides many unsupervised and supervised learning algorithms.

  • The functionality that Scikit-learn provides includes:

  • Regression, including Linear Regression.

  • Classification, including Logistic Regression and K-Nearest Neighbors.

  • Clustering, including K-Means and K-Means++.

  • Model selection.

  • Preprocessing, including Min-Max Normalization.
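To make the list above concrete, here is a minimal sketch that touches each area with the stock Scikit-learn API. The bundled Iris and diabetes datasets and the specific estimator settings are illustrative choices, not something this report prescribes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Classification data: the Iris dataset bundled with Scikit-learn (illustrative choice)
X, y = load_iris(return_X_y=True)

# Model selection: hold out 30% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Preprocessing: Min-Max normalization, fitted on the training split only
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Classification: K-Nearest Neighbors on the scaled features
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_scaled, y_train)
knn_accuracy = knn.score(X_test_scaled, y_test)

# Clustering: K-Means with k-means++ initialization (the labels y are not used)
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X_train_scaled)

# Regression: Linear Regression on the bundled diabetes dataset
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_reg, y_reg)
r2 = reg.score(X_reg, y_reg)  # in-sample R^2

print(f"KNN test accuracy: {knn_accuracy:.2f}")
print(f"K-Means cluster sizes: {np.bincount(cluster_ids).tolist()}")
print(f"Linear Regression in-sample R^2: {r2:.2f}")
```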


Speed Up Performance with Intel Extension for Scikit-Learn

  • Scikit-learn performance is not always optimal. Much of it is implemented in Python, so some ML algorithms can take hours to run on large datasets, which is expensive.

  • Intel Extension for Scikit-learn can deliver significant performance improvements just by adding a couple of lines of code. It’s also open source.

  • In this report, we look at how to use the extension to speed up our algorithms, saving both time and money.


Why Should Developers Care?

  • Well, it really boils down to performance:

  • Depending on the dataset and the algorithm you use, training with scikit-learn-intelex can be 1–3 orders of magnitude (roughly 10x to 1,000x) faster.

  • Intel Extension for Scikit-learn provides optimized implementations of many Scikit-learn algorithms (table below), which conform to the original versions and produce the same results.

  • When you use algorithms or parameters that the extension does not support, the package simply falls back to the original Scikit-learn. This makes the user experience seamless: your application works as before, or faster, without any need to rewrite the code.
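The fallback behavior also means one script can serve both environments. The sketch below assumes you want that portability: it guards the import of sklearnex with try/except, so on a machine without scikit-learn-intelex it simply runs stock Scikit-learn with identical user code. The synthetic make_classification data and the ACCELERATED flag name are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Enable the extension if it is installed; otherwise keep stock Scikit-learn.
# ACCELERATED is an illustrative flag name, not part of either library.
try:
    from sklearnex import patch_sklearn

    patch_sklearn()
    ACCELERATED = True
except ImportError:
    ACCELERATED = False

# Import the estimator AFTER the (optional) patch so the optimized version is picked up
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only)
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=5,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The user code is identical in both cases: same estimator, same API
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Extension enabled: {ACCELERATED}, test accuracy: {accuracy:.3f}")
```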


Gallery of Algorithms

  • By now you are probably familiar with the two main branches of Machine Learning: Unsupervised and Supervised Learning.

  • Unsupervised Learning: there are no known outputs to predict; you are trying to discover the structure that the data itself reveals.

  • Supervised Learning: the learning algorithm analyzes a labeled training set and produces an inferred function that predicts the output values for new data.

  • Applying Intel® Extension for Scikit-learn* will impact the following Scikit-learn algorithms:


Setting Up:

Installation and Usage

Intel Extension for Scikit-learn supports Linux, Windows, and Mac systems on x86 architectures. It can be downloaded using either PyPI or Anaconda Cloud (available from main, conda-forge, and intel channels):

Install from PyPI (general installation):

pip install scikit-learn-intelex

Install from Anaconda:

  • Conda-Forge channel (recommended by default):

conda install scikit-learn-intelex -c conda-forge

  • Intel channel:

conda install scikit-learn-intelex -c intel

  • Defaults channel (recommended for users who prefer the main channel):

conda install scikit-learn-intelex

Install from Container:

Note that a Docker Hub account may be required to access the image.

To install the latest Intel® Extension for Scikit-learn as a Docker container, use the following command:

docker pull intel/intel-optimized-ml:scikit-learn
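After installing, you can confirm from Python which packages are available. This is a small sketch using only the standard library's importlib; it assumes the distribution is registered under the name scikit-learn-intelex (the name used in the pip and conda commands above), and the INSTALLED and version variable names are illustrative.

```python
import importlib.metadata
import importlib.util

import sklearn

# Is the sklearnex package importable in the current environment?
INSTALLED = importlib.util.find_spec("sklearnex") is not None

version = None
if INSTALLED:
    try:
        # "scikit-learn-intelex" is the distribution name used by pip and conda
        version = importlib.metadata.version("scikit-learn-intelex")
    except importlib.metadata.PackageNotFoundError:
        version = "unknown"  # importable, but no package metadata was found

print(f"scikit-learn {sklearn.__version__}")
print(f"scikit-learn-intelex: {version if INSTALLED else 'not installed'}")
```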

Getting Started with Intel® Extension for Scikit-Learn*:

Patching

  • Patching lets you keep the stock version of Scikit-learn around for everyday use; when you want the extra performance boost, you turn the optimizations on at the beginning of your code with a patch_sklearn() function call:

################# Insert Patch here ###########
from sklearnex import patch_sklearn
patch_sklearn()
###############################################
  • Any subsequent import from sklearn will then load the optimized version of that Scikit-learn algorithm.

  • Once you have called the patch function, any sklearn imports in that cell, or in any cell run afterwards, import the Intel-optimized version:

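# Importing the optimized version of LogisticRegression (the patch above must run first)
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# X_train_sm, y_train_sm (SMOTE-resampled training data), X_test and y_test
# are assumed to have been prepared earlier in the notebook

# Creating an object for the model and fitting it on the training data set
logmodel = LogisticRegression()
logmodel.fit(X_train_sm, y_train_sm)

# Predicting the target variable
predicted = logmodel.predict(X_test)

# Classification report
report = metrics.classification_report(y_test, predicted)
print(report)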

🕯️ The import order is very important: patch BEFORE you import the targeted Scikit-learn* library!

  • Patching Scikit-learn with Intel® Extension for Scikit-learn* means replacing stock Scikit-learn algorithms with the optimized versions provided by the extension. You can always undo the patch.
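The import-order rule can be checked directly. The sketch below, which assumes scikit-learn-intelex is installed (otherwise it only prints a message), binds LogisticRegression once before and once after patch_sklearn(), compares the two classes, and then calls unpatch_sklearn() to show that a fresh import returns the stock class. The alias names are illustrative, and the exact module names it prints depend on the installed extension version.

```python
try:
    from sklearnex import patch_sklearn, unpatch_sklearn

    HAVE_SKLEARNEX = True
except ImportError:
    HAVE_SKLEARNEX = False

replaced = restored = None
if HAVE_SKLEARNEX:
    unpatch_sklearn()  # make sure we start from stock Scikit-learn

    # Imported BEFORE patching: this name stays bound to the stock class
    from sklearn.linear_model import LogisticRegression as StockLogReg

    patch_sklearn()

    # Imported AFTER patching: resolves to the extension's replacement
    from sklearn.linear_model import LogisticRegression as PatchedLogReg

    replaced = PatchedLogReg is not StockLogReg
    print("imported before patch:", StockLogReg.__module__)
    print("imported after patch: ", PatchedLogReg.__module__)

    # Undo the patch: a fresh import returns the stock class again,
    # while PatchedLogReg still refers to the optimized class
    unpatch_sklearn()
    from sklearn.linear_model import LogisticRegression as RestoredLogReg

    restored = RestoredLogReg is StockLogReg
else:
    print("scikit-learn-intelex is not installed; nothing to compare")
```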

Patching Alternatives

There are different ways to patch Scikit-learn to enable the Intel® Extension for Scikit-Learn* optimizations:

  • Without editing the code of a Scikit-learn application, by running it through the sklearnex module with the -m command-line flag:

python -m sklearnex my_application.py
  • Inside script or Jupyter Notebook:

from sklearnex import patch_sklearn
patch_sklearn()
  • Inside script or Jupyter Notebook (Unpatching):

from sklearnex import unpatch_sklearn
unpatch_sklearn()
  • Patching surgically:

# Patch a single algorithm
from sklearnex import patch_sklearn
patch_sklearn("SVC")

# Patch a list of algorithms
patch_sklearn(["SVC", "PCA"])
  • Unpatching surgically:

from sklearnex import unpatch_sklearn
unpatch_sklearn("SVC")
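Surgical patching can be observed the same way. This sketch assumes the extension is installed and accepts "SVC" as a patch name, as in the example above; it patches only SVC, checks that LogisticRegression is left untouched, and then unpatches SVC alone. The variable names are illustrative.

```python
import sklearn.linear_model
import sklearn.svm

try:
    from sklearnex import patch_sklearn, unpatch_sklearn

    HAVE_SKLEARNEX = True
except ImportError:
    HAVE_SKLEARNEX = False

svc_patched = logreg_untouched = svc_restored = None
if HAVE_SKLEARNEX:
    unpatch_sklearn()  # start from stock Scikit-learn
    stock_svc = sklearn.svm.SVC
    stock_logreg = sklearn.linear_model.LogisticRegression

    # Patch only SVC; every other algorithm keeps its stock implementation
    patch_sklearn("SVC")
    svc_patched = sklearn.svm.SVC is not stock_svc
    logreg_untouched = sklearn.linear_model.LogisticRegression is stock_logreg

    # Undo just that one patch
    unpatch_sklearn("SVC")
    svc_restored = sklearn.svm.SVC is stock_svc

print(f"SVC patched: {svc_patched}, LogisticRegression untouched: {logreg_untouched}, "
      f"SVC restored: {svc_restored}")
```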

Getting a List of Optimized Functions

from sklearnex import get_patch_names
get_patch_names()

>>
['pca', 'kmeans', 'dbscan', 'distances', 'linear', 'ridge', 'elasticnet', 'lasso',
 'logistic', 'log_reg', 'knn_classifier', 'nearest_neighbors',
 'knn_regressor', 'random_forest_classifier', 'random_forest_regressor',
 'train_test_split', 'fin_check', 'roc_auc_score', 'tsne', 'logisticregression',
 'kneighborsclassifier', 'nearestneighbors', 'kneighborsregressor',
 'randomforestclassifier', 'randomforestregressor', 'svr', 'svc', 'nusvr',
 'nusvc', 'set_config', 'get_config', 'config_context']
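The returned names can be used to check, before running an experiment, whether a given estimator will be accelerated. The is_patchable helper below is hypothetical, not part of sklearnex: it does a case-insensitive lookup in the list returned by get_patch_names(), and falls back to an empty list when the extension is not installed.

```python
from typing import Iterable

try:
    from sklearnex import get_patch_names

    patch_names = list(get_patch_names())
except ImportError:
    patch_names = []  # extension not installed: nothing will be optimized


def is_patchable(algorithm: str, names: Iterable[str]) -> bool:
    """Hypothetical helper: True if `algorithm` appears in the patch list (case-insensitive)."""
    return algorithm.lower() in {name.lower() for name in names}


for algorithm in ("SVC", "LogisticRegression", "GradientBoostingClassifier"):
    status = "optimized" if is_patchable(algorithm, patch_names) else "stock"
    print(f"{algorithm}: {status}")
```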

Global Patching

Use global patching to patch all your Scikit-learn applications without any additional actions.

To patch all supported algorithms, run:

python -m sklearnex.glob patch_sklearn

Patching and Imports: The Order

  • The order in which you do things is important:

  • If you want the patch to work for you, you have to call patch_sklearn() before you import anything from Sklearn that you want to optimize.

################# Patch ####################################
from sklearnex import patch_sklearn
patch_sklearn()
################# Import ###################################
from sklearn.model_selection import train_test_split

# Split the data (X and y are assumed to be loaded already)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

🕯️ These patching methods are interchangeable. They support different enabling scenarios while producing the same result.


Running Patch vs Unpatch on CreditCardFraudDetection Data

  • So, I’ve been using the Scikit-learn Extension on my models.

  • To see how much time and money you can save by using Intel® Extension for Scikit-learn, we compare the patched Scikit-learn with the original package (unpatch) for ML training and inference.

Unpatch

Here we can see the result of running the stock Logistic Regression algorithm. It was run after an unpatch_sklearn() function call, that is, with the native, off-the-shelf version of Sklearn, and completed in 35.5 seconds.

################# Unpatch here ################################
from sklearnex import unpatch_sklearn
unpatch_sklearn()
###############################################################
# Import
import time

from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# Start timing (covers both fitting and prediction)
start_time = time.time()

# Creating an object for the model and fitting it on the training data set
logmodel = LogisticRegression()
logmodel.fit(X_train_sm, y_train_sm)

# Predicting the target variable
predicted = logmodel.predict(X_test)
unpatched_time = time.time() - start_time
print("Time to fit and predict with \033[1mLogisticRegression in unpatched scikit-learn: {:4.1f}\033[0m seconds".format(unpatched_time))

# Classification report
report = metrics.classification_report(y_test, predicted)
print(f"Classification report for Logistic Regression with SMOTE:\n{report}\n")
>>

Fig 1: Vanilla (unpatched) Logistic Regression.

Patch

Here we can see the result of running the Intel-optimized version of the Logistic Regression algorithm, this time after a patch_sklearn() function call: an execution time of 7.1 seconds, about 5x faster than the stock run!

Fig 2: Intel Scikit-learn Extension in action.
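The patched and unpatched runs can be reproduced side by side in a single script. This is a sketch, not the report's benchmark: it assumes a synthetic make_classification dataset in place of the credit card fraud data (so its timings will not match the 35.5 s and 7.1 s reported here), times fit plus predict as the report's code does, and skips the patched run when scikit-learn-intelex is not installed. The helper name time_logistic_regression is hypothetical.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

try:
    from sklearnex import patch_sklearn, unpatch_sklearn

    HAVE_SKLEARNEX = True
except ImportError:
    HAVE_SKLEARNEX = False


def time_logistic_regression(X_train, y_train, X_test):
    """Hypothetical helper: time fit + predict with whichever LogisticRegression sklearn exposes now."""
    # Importing here (not at the top) respects the current patch state
    from sklearn.linear_model import LogisticRegression

    start = time.perf_counter()
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    model.predict(X_test)
    return time.perf_counter() - start


# Synthetic stand-in for the credit card fraud data (an assumption, not the report's dataset)
X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, test_size=0.3, random_state=0)

timings = {}

# Stock run: make sure no patch is active
if HAVE_SKLEARNEX:
    unpatch_sklearn()
timings["unpatched"] = time_logistic_regression(X_train, y_train, X_test)

# Optimized run, only possible when the extension is installed
if HAVE_SKLEARNEX:
    patch_sklearn()
    timings["patched"] = time_logistic_regression(X_train, y_train, X_test)
    unpatch_sklearn()  # restore stock Scikit-learn afterwards

for label, seconds in timings.items():
    print(f"{label:>9}: {seconds:.3f} s")
if "patched" in timings:
    print(f"speedup: {timings['unpatched'] / timings['patched']:.1f}x")
```

Re-importing LogisticRegression inside the helper is what makes the comparison work: the class is looked up after each patch_sklearn() or unpatch_sklearn() call, which is the same import-order rule described in the patching section.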


Summary

Intel Extension for Scikit-learn:

  • Optimizes the performance of common ML algorithms

  • Saves money by reducing ML training and inference time

  • Offers a seamless experience (just add two lines of code to enable acceleration)

The project is growing fast, so there are improvements in every release. Follow on Medium and GitHub to keep up with the latest updates.

Learn more about other machine learning and AI-optimized end-to-end workloads at intel.com/oneAPI-AIKit.

Thank you for taking the time to follow this report.

