{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Support Vector Machines Workshop.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# Support Vector Machines \n", "\n", "(This notebook was created for a workshop at RV College of Engineering on 6th Sep 2021)\n", "\n", "- A support vector machine (SVM) is a supervised learning algorithm that can be used for both classification and regression. \n", "- An SVM's main goal is to find a line, or more generally a hyperplane (the decision boundary), that separates the data points in an n-dimensional space, so that new data points can be assigned to one of the classes.\n", "- The hyperplane is determined by the training points that lie closest to it. These points are known as support vectors, which is where the name support vector machine comes from. The whole SVM algorithm can be visualized as - \n", "\n", "" ], "metadata": { "id": "QpROp4WgrIoq" } }, { "cell_type": "markdown", "source": [ "## Linear SVM" ], "metadata": { "id": "KpfWJwhasN3g" } }, { "cell_type": "markdown", "source": [ "- Consider this separation of classes. 
The two classes are fairly straightforward to separate with a single straight line in 2D" ], "metadata": { "id": "5xdbwFLLsdTj" } }, { "cell_type": "markdown", "source": [ "" ], "metadata": { "id": "VX6z1N1Xsbhc" } }, { "cell_type": "markdown", "source": [ "- One possible set of lines - \n", "\n", "" ], "metadata": { "id": "LKrMifwOsrKL" } }, { "cell_type": "markdown", "source": [ "- For each candidate line, the algorithm finds the points of each class that lie closest to it.\n", "- The distance between these closest points and the hyperplane is known as the margin.\n", "- The goal is to maximise this margin.\n", "- Therefore the green line, which has the largest margin, is the hyperplane used for classification.\n", "\n", "" ], "metadata": { "id": "nK_qMCkXs96W" } }, { "cell_type": "code", "execution_count": null, "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "from sklearn.svm import *\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import confusion_matrix # computes a confusion matrix\n", "from sklearn.metrics import plot_confusion_matrix # draws a confusion matrix (removed in scikit-learn 1.2; newer versions use ConfusionMatrixDisplay)\n", "from sklearn import datasets" ], "outputs": [], "metadata": { "id": "tzqUEOp5qcWq" } }, { "cell_type": "code", "execution_count": null, "source": [ "dir(datasets)" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['__all__',\n", " '__builtins__',\n", " '__cached__',\n", " '__doc__',\n", " '__file__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '__path__',\n", " '__spec__',\n", " '_base',\n", " '_california_housing',\n", " '_covtype',\n", " '_kddcup99',\n", " '_lfw',\n", " '_olivetti_faces',\n", " '_openml',\n", " '_rcv1',\n", " '_samples_generator',\n", " '_species_distributions',\n", " '_svmlight_format_fast',\n", " '_svmlight_format_io',\n", " '_twenty_newsgroups',\n", " 'clear_data_home',\n", " 'dump_svmlight_file',\n", " 'fetch_20newsgroups',\n", " 'fetch_20newsgroups_vectorized',\n", " 
'fetch_california_housing',\n", " 'fetch_covtype',\n", " 'fetch_kddcup99',\n", " 'fetch_lfw_pairs',\n", " 'fetch_lfw_people',\n", " 'fetch_olivetti_faces',\n", " 'fetch_openml',\n", " 'fetch_rcv1',\n", " 'fetch_species_distributions',\n", " 'get_data_home',\n", " 'load_boston',\n", " 'load_breast_cancer',\n", " 'load_diabetes',\n", " 'load_digits',\n", " 'load_files',\n", " 'load_iris',\n", " 'load_linnerud',\n", " 'load_sample_image',\n", " 'load_sample_images',\n", " 'load_svmlight_file',\n", " 'load_svmlight_files',\n", " 'load_wine',\n", " 'make_biclusters',\n", " 'make_blobs',\n", " 'make_checkerboard',\n", " 'make_circles',\n", " 'make_classification',\n", " 'make_friedman1',\n", " 'make_friedman2',\n", " 'make_friedman3',\n", " 'make_gaussian_quantiles',\n", " 'make_hastie_10_2',\n", " 'make_low_rank_matrix',\n", " 'make_moons',\n", " 'make_multilabel_classification',\n", " 'make_regression',\n", " 'make_s_curve',\n", " 'make_sparse_coded_signal',\n", " 'make_sparse_spd_matrix',\n", " 'make_sparse_uncorrelated',\n", " 'make_spd_matrix',\n", " 'make_swiss_roll']" ] }, "metadata": {}, "execution_count": 9 } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "gzvfzI5SgXcl", "outputId": "0abf2eab-a4e6-4f31-c6e9-6fd613ba3297" } }, { "cell_type": "code", "execution_count": null, "source": [ "# load iris dataset\n", "iris = datasets.load_iris()\n", "iris_df=pd.DataFrame(iris.data)\n", "iris_df['class']=iris.target\n", "\n", "iris_df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']\n", "iris_df" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "
|   | sepal_len | sepal_wid | petal_len | petal_wid | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 2 |

150 rows × 5 columns
\n", "