Speeding Up Machine Learning Model Performance with Intel(R) Extension for Scikit Learn*
In this report, we will get familiar with methods for speeding up the performance of our machine learning models.
Prerequisites
What is Scikit-Learn (SKlearn)?
Scikit-learn (Sklearn) is one of the most widely used machine learning libraries in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent Python interface.
This library, which is largely written in Python, is built upon NumPy, SciPy, and Matplotlib.
It covers both supervised and unsupervised learning algorithms.
The functionality that Scikit-learn provides includes:
Regression, including Linear and Logistic Regression.
Classification, including K-Nearest Neighbors.
Clustering, including K-Means (with k-means++ initialization).
Model selection.
Preprocessing, including Min-Max Normalization.
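To make the list above concrete, here is a minimal sketch that chains three of those pieces: Min-Max preprocessing, K-Nearest Neighbors classification, and a train/test split from model selection. The synthetic dataset, the function name run_sklearn_demo, and the parameter values are assumptions chosen for illustration only; they are not taken from the report's own experiments.

```python
import importlib.util


def run_sklearn_demo(seed=0):
    """Train a small scaler + KNN pipeline and return its test accuracy.

    Returns None when scikit-learn is not installed, so the sketch can be
    run safely in any environment.
    """
    if importlib.util.find_spec("sklearn") is None:
        return None

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler

    # Synthetic data stands in for a real dataset (illustrative assumption).
    X, y = make_classification(n_samples=500, n_features=10, random_state=seed)

    # Model selection: hold out 30% of the rows for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    # Preprocessing + classification chained in one estimator.
    model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
    model.fit(X_train, y_train)

    # Mean accuracy on the held-out rows, a float between 0 and 1.
    return model.score(X_test, y_test)


accuracy = run_sklearn_demo()
print("Test accuracy:", accuracy)
```

Wrapping the scaler and the classifier in a pipeline keeps the scaling parameters fitted on the training rows only, which avoids leaking test data into preprocessing.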
Speed Up Performance with Intel Extension for Scikit-Learn
Scikit-learn performance is not always optimal. It’s mostly implemented in Python so some ML algorithms can take hours to run, which is expensive.
Intel Extension for Scikit-learn can deliver significant performance improvements just by adding a couple of lines of code. It’s also open source.
In this report we will get familiar with methods for speeding up our algorithms, saving both time and money.
Why Should Developers Care?
Well, it really boils down to performance:
You can get 1–3 orders of magnitude of improvement by training your algorithm with scikit-learn-intelex, depending on the dataset and the algorithm you use.
Intel Extension for Scikit-learn provides optimized implementations of many Scikit-learn algorithms (table below), which are conformant with the original versions and produce the same results.
When you use algorithms or parameters that the extension does not support, the package simply falls back to the original Scikit-learn. This makes the user experience seamless: your application works as before, or faster, without any need to rewrite the code.
Gallery of Algorithms
By now you are probably familiar with the two main branches of Machine Learning (Unsupervised and Supervised Learning).
Unsupervised Learning: there are no known target values to predict; you are just trying to understand what structure the data itself reveals.
Supervised Learning: by analyzing a known, labeled training set, the learning algorithm produces an inferred function that makes predictions about the output values.
Applying Intel® Extension for Scikit-learn* will impact the following Scikit-learn algorithms:
Setting Up:
Installation and Usage
Intel Extension for Scikit-learn supports Linux, Windows, and Mac systems on x86 architectures. It can be downloaded using either PyPI or Anaconda Cloud (available from main, conda-forge, and intel channels):
Install from PyPI (general installation):
pip install scikit-learn-intelex
Install from Anaconda:
Conda-Forge channel (recommended for users by default):
conda install scikit-learn-intelex -c conda-forge
Intel channel (recommended for Intel® Distribution for Python users):
conda install scikit-learn-intelex -c intel
Defaults channel (recommended for users that prefer the main channel):
conda install scikit-learn-intelex
Install from Container:
Note that a DockerHub account is required to properly access the links.
In order to install the latest Intel® Extension for Scikit-Learn as a Docker container, please use the following command:
docker pull intel/intel-optimized-ml:scikit-learn
Getting Started with Intel® Extension for Scikit-Learn*:
Patching
Patching lets you keep the stock version of Scikit-learn around for use and, when you want the extra performance boost, turn the optimizations on at the beginning of your code with a patch_sklearn() function call:
################# Insert Patch here ###########
from sklearnex import patch_sklearn
patch_sklearn()
###############################################
Any import from sklearn that follows the patch will pull in the optimized version of that Scikit-learn algorithm. Once you call the patch function, imports in the same cell, or in any cell that runs after the patch has been invoked, resolve to the Intel-optimized version:
# Importing the sklearn-optimized version of LogisticRegression
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# X_train_sm, y_train_sm, X_test and y_test are assumed to be defined earlier
# Creating an object for the model and fitting it on a training data set
logmodel = LogisticRegression()
logmodel.fit(X_train_sm, y_train_sm)

# Predicting the target variable
predicted = logmodel.predict(X_test)

# Classification report
report = metrics.classification_report(y_test, predicted)
🕯️ The import order is very important: patch BEFORE you import the targeted Scikit-learn* library!
To patch Scikit-learn with Intel® Extension for Scikit-learn* is to replace stock Scikit-learn algorithms with their optimized versions provided by the extension. You can always undo the patch.
Patching Alternatives
There are different ways to patch Scikit-learn to enable the Intel® Extension for Scikit-Learn* Optimisations:
Without editing the code of a Scikit-learn application by using the following command line flag:
python -m sklearnex my_application.py
Inside script or Jupyter Notebook:
from sklearnex import patch_sklearn
patch_sklearn()
Inside script or Jupyter Notebook (Unpatching):
from sklearnex import unpatch_sklearn
unpatch_sklearn()
Patching surgically:
# patching a specific function
from sklearnex import patch_sklearn
patch_sklearn("SVC")

# patch a list
patch_sklearn(["SVC", "PCA"])
Unpatching surgically:
from sklearnex import unpatch_sklearn
unpatch_sklearn("SVC")
Getting a list of Optimised Functions
from sklearnex import get_patch_names
get_patch_names()
>> ['pca', 'kmeans', 'dbscan', 'distances', 'linear', 'ridge', 'elasticnet', 'lasso',
    'logistic', 'log_reg', 'knn_classifier', 'nearest_neighbors', 'knn_regressor',
    'random_forest_classifier', 'random_forest_regressor', 'train_test_split',
    'fin_check', 'roc_auc_score', 'tsne', 'logisticregression', 'kneighborsclassifier',
    'nearestneighbors', 'kneighborsregressor', 'randomrorestclassifier',
    'randomforestregressor', 'svr', 'svc', 'nusvr', 'nusvc', 'set_config',
    'get_config', 'config_context']
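The list above can also be queried programmatically before you decide whether patching is worth it for a given estimator. The helper below is a hypothetical convenience function (is_optimized is not part of sklearnex); it assumes only that get_patch_names() returns lowercase names, as in the listing above, and that the set of names may differ between releases.

```python
import importlib.util


def is_optimized(name):
    """Report whether `name` appears in the extension's list of patch names.

    Hypothetical helper for illustration. Returns None when sklearnex is
    not installed, otherwise True or False.
    """
    if importlib.util.find_spec("sklearnex") is None:
        return None

    from sklearnex import get_patch_names

    # The names in the listing are lowercase, so normalize the query first.
    return name.lower() in get_patch_names()


for estimator in ("SVC", "KMeans", "GaussianNB"):
    print(estimator, "->", is_optimized(estimator))
```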
Global Patching
Use global patching to patch all your Scikit-learn applications without any additional actions.
To patch all supported algorithms, run:
python -m sklearnex.glob patch_sklearn
Patching and Imports: The Order
The order in which you do things is important:
If you want the patch to work for you, you have to call patch_sklearn() before you import anything else from Sklearn that you want to optimize.
################# Patch ####################################
from sklearnex import patch_sklearn
patch_sklearn()
############################################################

# Import
from sklearn.model_selection import train_test_split

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3)
🕯️ These patching methods are interchangeable. They support different enabling scenarios while producing the same result.
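One way to see why the import order matters is to look at where a class is defined after each step. The sketch below patches, imports KMeans, unpatches, and imports it again, recording the class's __module__ each time. The function name describe_import_order is hypothetical, and the exact module path of the patched class is an implementation detail that may differ between releases, so the code only prints it rather than relying on it.

```python
import importlib.util


def describe_import_order():
    """Return (patched_module, stock_module) for sklearn.cluster.KMeans.

    Returns None when sklearnex is not installed.
    """
    if importlib.util.find_spec("sklearnex") is None:
        return None

    from sklearnex import patch_sklearn, unpatch_sklearn

    # Patch first, then import: the name now refers to the optimized class.
    patch_sklearn()
    from sklearn.cluster import KMeans
    patched_module = KMeans.__module__

    # Undo the patch and import again: the name is rebound to the stock class.
    unpatch_sklearn()
    from sklearn.cluster import KMeans
    stock_module = KMeans.__module__

    return patched_module, stock_module


modules = describe_import_order()
print("KMeans defined in (patched, stock):", modules)
```

Note that a name imported before patching keeps pointing at the stock class, and a name imported while patched keeps pointing at the optimized one; re-running the import statement is what rebinds it.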
Running Patch vs Unpatch on CreditCardFraudDetection Data
So, I’ve been using the Scikit-learn Extension on my models.
To see how much time and money you can save by using Intel® Extension for Scikit-learn, we compare the patched Scikit-learn with the original (unpatched) package for ML training and inference.
Unpatch
Here we can see the result of running the stock Logistic Regression algorithm. It was run after the unpatch_sklearn() function call, so this is the native, off-the-shelf version of Sklearn. It completed in 35.5 seconds.
################# Insert Patch here ############################
from sklearnex import unpatch_sklearn
unpatch_sklearn()
##########################################################

# Import
import time
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# Start timing the fit and the prediction
start_time = time.time()

# Creating an object for the model and fitting it on the training data set
logmodel = LogisticRegression()
logmodel.fit(X_train_sm, y_train_sm)

# Predicting the target variable
predicted = logmodel.predict(X_test)
unpatched_time = time.time() - start_time
print("Time to calculate \033[1m logmodel.predict in Unpatched scikit-learn {:4.1f}\033[0m seconds".format(unpatched_time))

# Classification report
report = metrics.classification_report(y_test, predicted)
print(f"Classification report for Logistic Regression with SMOTE:\n{report}\n")
Fig 1: Vanilla (unpatched) Logistic Regression.
Patch
Here we can see the result of running the Intel-optimized version of the Logistic Regression algorithm. It was run after the patch_sklearn() function call, which enables the Intel-optimized version of Sklearn. Execution time was 7.1 seconds!
Fig 2: Intel Scikit-learn Extension in action.
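If you want to repeat this kind of comparison on your own machine, a small timing harness along the following lines may help. It is only a sketch: it uses a synthetic dataset rather than the credit card fraud data, the function names and sizes are assumptions, and the timings it prints will not match the 35.5 and 7.1 seconds reported above.

```python
import importlib.util
import time


def fit_predict_seconds(X_train, y_train, X_test):
    """Time one LogisticRegression fit + predict, in seconds."""
    # Imported at call time so that it picks up the patched or the stock
    # class, depending on whether patch_sklearn() is currently active.
    from sklearn.linear_model import LogisticRegression

    start = time.perf_counter()
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    model.predict(X_test)
    return time.perf_counter() - start


def compare_stock_vs_patched(n_samples=5000, n_features=20, seed=0):
    """Return timings for stock and patched runs on synthetic data.

    Returns None without scikit-learn; the "patched" entry is None when
    sklearnex is not installed.
    """
    if importlib.util.find_spec("sklearn") is None:
        return None

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, random_state=seed
    )
    X_train, X_test, y_train, _ = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    # Stock run first, before any patching has happened.
    timings = {
        "stock": fit_predict_seconds(X_train, y_train, X_test),
        "patched": None,
    }

    if importlib.util.find_spec("sklearnex") is not None:
        from sklearnex import patch_sklearn, unpatch_sklearn

        patch_sklearn()
        try:
            timings["patched"] = fit_predict_seconds(X_train, y_train, X_test)
        finally:
            # Restore stock scikit-learn for any code that runs afterwards.
            unpatch_sklearn()

    return timings


timings = compare_stock_vs_patched()
print(timings)
```

A single run is noisy, so for a fair comparison it is worth repeating each measurement several times and using a dataset large enough for the difference to be measurable.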
Summary
Intel Extension for Scikit-learn:
Optimizes the performance of common ML algorithms
Saves money by reducing ML training and inference time
Offers a seamless experience (just add two lines of code to enable acceleration)
The project is growing fast, so there are improvements in every release. Follow on Medium and GitHub to keep up with the latest updates.
Learn more about other machine learning and AI-optimized end-to-end workloads at intel.com/oneAPI-AIKit.
Thank you for taking the time to follow this report.