{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Principal Component Analysis\n",
"\n",
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np \n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"\n",
"Breast cancer wisconsin (diagnostic) dataset\n",
"--------------------------------------------\n",
"\n",
"**Data Set Characteristics:**\n",
"\n",
" :Number of Instances: 569\n",
"\n",
" :Number of Attributes: 30 numeric, predictive attributes and the class\n",
"\n",
" :Attribute Information:\n",
" - radius (mean of distances from center to points on the perimeter)\n",
" - texture (standard deviation of gray-scale values)\n",
" - perimeter\n",
" - area\n",
" - smoothness (local variation in radius lengths)\n",
" - compactness (perimeter^2 / area - 1.0)\n",
" - concavity (severity of concave portions of the contour)\n",
" - concave points (number of concave portions of the contour)\n",
" - symmetry\n",
" - fractal dimension (\"coastline approximation\" - 1)\n",
"\n",
" The mean, standard error, and \"worst\" or largest (mean of the three\n",
" worst/largest values) of these features were computed for each image,\n",
" resulting in 30 features. For instance, field 0 is Mean Radius, field\n",
" 10 is Radius SE, field 20 is Worst Radius.\n",
"\n",
" - class:\n",
" - WDBC-Malignant\n",
" - WDBC-Benign\n",
"\n",
" :Summary Statistics:\n",
"\n",
"| Feature | Min | Max |\n",
"| --- | --- | --- |\n",
"| radius (mean) | 6.981 | 28.11 |\n",
"| texture (mean) | 9.71 | 39.28 |\n",
"| perimeter (mean) | 43.79 | 188.5 |\n",
"| area (mean) | 143.5 | 2501.0 |\n",
"| smoothness (mean) | 0.053 | 0.163 |\n",
"| compactness (mean) | 0.019 | 0.345 |\n",
"| concavity (mean) | 0.0 | 0.427 |\n",
"| concave points (mean) | 0.0 | 0.201 |\n",
"| symmetry (mean) | 0.106 | 0.304 |\n",
"| fractal dimension (mean) | 0.05 | 0.097 |\n",
"| radius (standard error) | 0.112 | 2.873 |\n",
"| texture (standard error) | 0.36 | 4.885 |\n",
"| perimeter (standard error) | 0.757 | 21.98 |\n",
"| area (standard error) | 6.802 | 542.2 |\n",
"| smoothness (standard error) | 0.002 | 0.031 |\n",
"| compactness (standard error) | 0.002 | 0.135 |\n",
"| concavity (standard error) | 0.0 | 0.396 |\n",
"| concave points (standard error) | 0.0 | 0.053 |\n",
"| symmetry (standard error) | 0.008 | 0.079 |\n",
"| fractal dimension (standard error) | 0.001 | 0.03 |\n",
"| radius (worst) | 7.93 | 36.04 |\n",
"| texture (worst) | 12.02 | 49.54 |\n",
"| perimeter (worst) | 50.41 | 251.2 |\n",
"| area (worst) | 185.2 | 4254.0 |\n",
"| smoothness (worst) | 0.071 | 0.223 |\n",
"| compactness (worst) | 0.027 | 1.058 |\n",
"| concavity (worst) | 0.0 | 1.252 |\n",
"| concave points (worst) | 0.0 | 0.291 |\n",
"| symmetry (worst) | 0.156 | 0.664 |\n",
"| fractal dimension (worst) | 0.055 | 0.208 |\n",
"\n",
" :Missing Attribute Values: None\n",
"\n",
" :Class Distribution: 212 - Malignant, 357 - Benign\n",
"\n",
" :Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian\n",
"\n",
" :Donor: Nick Street\n",
"\n",
" :Date: November, 1995\n",
"\n",
"This is a copy of the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset.\n",
"https://goo.gl/U2Uwz2\n",
"\n",
"Features are computed from a digitized image of a fine needle\n",
"aspirate (FNA) of a breast mass. They describe\n",
"characteristics of the cell nuclei present in the image.\n",
"\n",
"Separating plane described above was obtained using\n",
"Multisurface Method-Tree (MSM-T) [K. P. Bennett, \"Decision Tree\n",
"Construction Via Linear Programming.\" Proceedings of the 4th\n",
"Midwest Artificial Intelligence and Cognitive Science Society,\n",
"pp. 97-101, 1992], a classification method which uses linear\n",
"programming to construct a decision tree. Relevant features\n",
"were selected using an exhaustive search in the space of 1-4\n",
"features and 1-3 separating planes.\n",
"\n",
"The actual linear program used to obtain the separating plane\n",
"in the 3-dimensional space is that described in:\n",
"[K. P. Bennett and O. L. Mangasarian: \"Robust Linear\n",
"Programming Discrimination of Two Linearly Inseparable Sets\",\n",
"Optimization Methods and Software 1, 1992, 23-34].\n",
"\n",
"This database is also available through the UW CS ftp server:\n",
"\n",
"ftp ftp.cs.wisc.edu\n",
"cd math-prog/cpo-dataset/machine-learn/WDBC/\n",
"\n",
".. topic:: References\n",
"\n",
" - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction \n",
" for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on \n",
" Electronic Imaging: Science and Technology, volume 1905, pages 861-870,\n",
" San Jose, CA, 1993.\n",
" - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and \n",
" prognosis via linear programming. Operations Research, 43(4), pages 570-577, \n",
" July-August 1995.\n",
" - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques\n",
" to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) \n",
" 163-171."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('../DATA/cancer_tumor_data_features.csv')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mean radius</th>\n",
" <th>mean texture</th>\n",
" <th>mean perimeter</th>\n",
" <th>mean area</th>\n",
" <th>mean smoothness</th>\n",
" <th>mean compactness</th>\n",
" <th>mean concavity</th>\n",
" <th>mean concave points</th>\n",
" <th>mean symmetry</th>\n",
" <th>mean fractal dimension</th>\n",
" <th>...</th>\n",
" <th>worst radius</th>\n",
" <th>worst texture</th>\n",
" <th>worst perimeter</th>\n",
" <th>worst area</th>\n",
" <th>worst smoothness</th>\n",
" <th>worst compactness</th>\n",
" <th>worst concavity</th>\n",
" <th>worst concave points</th>\n",
" <th>worst symmetry</th>\n",
" <th>worst fractal dimension</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>17.99</td>\n",
" <td>10.38</td>\n",
" <td>122.80</td>\n",
" <td>1001.0</td>\n",
" <td>0.11840</td>\n",
" <td>0.27760</td>\n",
" <td>0.3001</td>\n",
" <td>0.14710</td>\n",
" <td>0.2419</td>\n",
" <td>0.07871</td>\n",
" <td>...</td>\n",
" <td>25.38</td>\n",
" <td>17.33</td>\n",
" <td>184.60</td>\n",
" <td>2019.0</td>\n",
" <td>0.1622</td>\n",
" <td>0.6656</td>\n",
" <td>0.7119</td>\n",
" <td>0.2654</td>\n",
" <td>0.4601</td>\n",
" <td>0.11890</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>20.57</td>\n",
" <td>17.77</td>\n",
" <td>132.90</td>\n",
" <td>1326.0</td>\n",
" <td>0.08474</td>\n",
" <td>0.07864</td>\n",
" <td>0.0869</td>\n",
" <td>0.07017</td>\n",
" <td>0.1812</td>\n",
" <td>0.05667</td>\n",
" <td>...</td>\n",
" <td>24.99</td>\n",
" <td>23.41</td>\n",
" <td>158.80</td>\n",
" <td>1956.0</td>\n",
" <td>0.1238</td>\n",
" <td>0.1866</td>\n",
" <td>0.2416</td>\n",
" <td>0.1860</td>\n",
" <td>0.2750</td>\n",
" <td>0.08902</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>19.69</td>\n",
" <td>21.25</td>\n",
" <td>130.00</td>\n",
" <td>1203.0</td>\n",
" <td>0.10960</td>\n",
" <td>0.15990</td>\n",
" <td>0.1974</td>\n",
" <td>0.12790</td>\n",
" <td>0.2069</td>\n",
" <td>0.05999</td>\n",
" <td>...</td>\n",
" <td>23.57</td>\n",
" <td>25.53</td>\n",
" <td>152.50</td>\n",
" <td>1709.0</td>\n",
" <td>0.1444</td>\n",
" <td>0.4245</td>\n",
" <td>0.4504</td>\n",
" <td>0.2430</td>\n",
" <td>0.3613</td>\n",
" <td>0.08758</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>11.42</td>\n",
" <td>20.38</td>\n",
" <td>77.58</td>\n",
" <td>386.1</td>\n",
" <td>0.14250</td>\n",
" <td>0.28390</td>\n",
" <td>0.2414</td>\n",
" <td>0.10520</td>\n",
" <td>0.2597</td>\n",
" <td>0.09744</td>\n",
" <td>...</td>\n",
" <td>14.91</td>\n",
" <td>26.50</td>\n",
" <td>98.87</td>\n",
" <td>567.7</td>\n",
" <td>0.2098</td>\n",
" <td>0.8663</td>\n",
" <td>0.6869</td>\n",
" <td>0.2575</td>\n",
" <td>0.6638</td>\n",
" <td>0.17300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>20.29</td>\n",
" <td>14.34</td>\n",
" <td>135.10</td>\n",
" <td>1297.0</td>\n",
" <td>0.10030</td>\n",
" <td>0.13280</td>\n",
" <td>0.1980</td>\n",
" <td>0.10430</td>\n",
" <td>0.1809</td>\n",
" <td>0.05883</td>\n",
" <td>...</td>\n",
" <td>22.54</td>\n",
" <td>16.67</td>\n",
" <td>152.20</td>\n",
" <td>1575.0</td>\n",
" <td>0.1374</td>\n",
" <td>0.2050</td>\n",
" <td>0.4000</td>\n",
" <td>0.1625</td>\n",
" <td>0.2364</td>\n",
" <td>0.07678</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 30 columns</p>\n",
"</div>"
],
"text/plain": [
" mean radius mean texture mean perimeter mean area mean smoothness \\\n",
"0 17.99 10.38 122.80 1001.0 0.11840 \n",
"1 20.57 17.77 132.90 1326.0 0.08474 \n",
"2 19.69 21.25 130.00 1203.0 0.10960 \n",
"3 11.42 20.38 77.58 386.1 0.14250 \n",
"4 20.29 14.34 135.10 1297.0 0.10030 \n",
"\n",
" mean compactness mean concavity mean concave points mean symmetry \\\n",
"0 0.27760 0.3001 0.14710 0.2419 \n",
"1 0.07864 0.0869 0.07017 0.1812 \n",
"2 0.15990 0.1974 0.12790 0.2069 \n",
"3 0.28390 0.2414 0.10520 0.2597 \n",
"4 0.13280 0.1980 0.10430 0.1809 \n",
"\n",
" mean fractal dimension ... worst radius worst texture worst perimeter \\\n",
"0 0.07871 ... 25.38 17.33 184.60 \n",
"1 0.05667 ... 24.99 23.41 158.80 \n",
"2 0.05999 ... 23.57 25.53 152.50 \n",
"3 0.09744 ... 14.91 26.50 98.87 \n",
"4 0.05883 ... 22.54 16.67 152.20 \n",
"\n",
" worst area worst smoothness worst compactness worst concavity \\\n",
"0 2019.0 0.1622 0.6656 0.7119 \n",
"1 1956.0 0.1238 0.1866 0.2416 \n",
"2 1709.0 0.1444 0.4245 0.4504 \n",
"3 567.7 0.2098 0.8663 0.6869 \n",
"4 1575.0 0.1374 0.2050 0.4000 \n",
"\n",
" worst concave points worst symmetry worst fractal dimension \n",
"0 0.2654 0.4601 0.11890 \n",
"1 0.1860 0.2750 0.08902 \n",
"2 0.2430 0.3613 0.08758 \n",
"3 0.2575 0.6638 0.17300 \n",
"4 0.1625 0.2364 0.07678 \n",
"\n",
"[5 rows x 30 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Manual Construction of PCA\n",
"\n",
"\n",
"### Scaling Data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"scaler = StandardScaler()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"scaled_X = scaler.fit_transform(df)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1.09706398, -2.07333501, 1.26993369, ..., 2.29607613,\n",
" 2.75062224, 1.93701461],\n",
" [ 1.82982061, -0.35363241, 1.68595471, ..., 1.0870843 ,\n",
" -0.24388967, 0.28118999],\n",
" [ 1.57988811, 0.45618695, 1.56650313, ..., 1.95500035,\n",
" 1.152255 , 0.20139121],\n",
" ...,\n",
" [ 0.70228425, 2.0455738 , 0.67267578, ..., 0.41406869,\n",
" -1.10454895, -0.31840916],\n",
" [ 1.83834103, 2.33645719, 1.98252415, ..., 2.28998549,\n",
" 1.91908301, 2.21963528],\n",
" [-1.80840125, 1.22179204, -1.81438851, ..., -1.74506282,\n",
" -0.04813821, -0.75120669]])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaled_X"
]
},
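{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, the scaling step can be reproduced by hand. This is a sketch and not part of the original lesson. `StandardScaler` subtracts each column's mean and divides by that column's standard deviation, computed with `ddof=0`. The cell below assumes that scikit-learn's bundled copy of the dataset (`load_breast_cancer`) holds the same 569 × 30 feature matrix as the CSV loaded above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Assumption: the bundled copy of the data matches the CSV used above\n",
"X = load_breast_cancer()['data']\n",
"\n",
"# z = (x - column mean) / column standard deviation (population std, ddof=0)\n",
"manual_scaled = (X - X.mean(axis=0)) / X.std(axis=0)\n",
"sklearn_scaled = StandardScaler().fit_transform(X)\n",
"\n",
"print(np.allclose(manual_scaled, sklearn_scaled))\n",
"# Every standardized column should have mean 0 and standard deviation 1\n",
"print(np.allclose(sklearn_scaled.mean(axis=0), 0), np.allclose(sklearn_scaled.std(axis=0), 1))"
]
},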
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Centering the Data and Computing the Covariance Matrix"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# StandardScaler already centered each column, so this line changes nothing here.\n",
"# It is kept because PCA needs mean-centered data, and unscaled data would not be centered yet.\n",
"scaled_X -= scaled_X.mean(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1.09706398, -2.07333501, 1.26993369, ..., 2.29607613,\n",
" 2.75062224, 1.93701461],\n",
" [ 1.82982061, -0.35363241, 1.68595471, ..., 1.0870843 ,\n",
" -0.24388967, 0.28118999],\n",
" [ 1.57988811, 0.45618695, 1.56650313, ..., 1.95500035,\n",
" 1.152255 , 0.20139121],\n",
" ...,\n",
" [ 0.70228425, 2.0455738 , 0.67267578, ..., 0.41406869,\n",
" -1.10454895, -0.31840916],\n",
" [ 1.83834103, 2.33645719, 1.98252415, ..., 2.28998549,\n",
" 1.91908301, 2.21963528],\n",
" [-1.80840125, 1.22179204, -1.81438851, ..., -1.74506282,\n",
" -0.04813821, -0.75120669]])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaled_X"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Compute the covariance matrix; rowvar=False treats each column (feature) as a variable\n",
"covariance_matrix = np.cov(scaled_X, rowvar=False)"
]
},
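{
"cell_type": "markdown",
"metadata": {},
"source": [
"For mean-centered data, the sample covariance matrix is `X.T @ X / (n - 1)`. The minimal sketch below uses the same assumption as earlier: scikit-learn's bundled copy of the data stands in for the CSV. It checks that `np.cov(..., rowvar=False)` matches this formula. `StandardScaler` divides by `n`, while `np.cov` divides by `n - 1`, so the diagonal entries come out as 569/568 ≈ 1.0018 rather than exactly 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Assumption: the bundled copy of the data stands in for the CSV\n",
"X = StandardScaler().fit_transform(load_breast_cancer()['data'])\n",
"X_centered = X - X.mean(axis=0)\n",
"n_samples = X_centered.shape[0]\n",
"\n",
"# Sample covariance of centered data: X^T X / (n - 1)\n",
"manual_cov = X_centered.T @ X_centered / (n_samples - 1)\n",
"numpy_cov = np.cov(X_centered, rowvar=False)  # columns are the variables\n",
"\n",
"print(manual_cov.shape)  # (30, 30)\n",
"print(np.allclose(manual_cov, numpy_cov))\n",
"# Diagonal is n / (n - 1) because StandardScaler used ddof=0\n",
"print(np.allclose(np.diag(manual_cov), n_samples / (n_samples - 1)))"
]
},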
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Get the eigenvalues and eigenvectors (stored as columns) of the covariance matrix\n",
"eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)"
]
},
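{
"cell_type": "markdown",
"metadata": {},
"source": [
"`np.linalg.eig` works on any square matrix. A covariance matrix is symmetric, though, so `np.linalg.eigh` is a common alternative. It returns real eigenvalues in ascending order and orthonormal eigenvectors as columns. The sketch below uses the same dataset assumption as above. It checks three things: the defining property `C v = λ v`, the orthonormality of the eigenvectors, and that the eigenvalues sum to the total variance. Eigenvector signs are arbitrary, so `eig` and `eigh` may return vectors that differ by a factor of -1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Assumption: the bundled copy of the data stands in for the CSV\n",
"X = StandardScaler().fit_transform(load_breast_cancer()['data'])\n",
"cov = np.cov(X, rowvar=False)\n",
"\n",
"# eigh is specialised for symmetric matrices: it returns real eigenvalues in\n",
"# ascending order and orthonormal eigenvectors stored as columns\n",
"vals, vecs = np.linalg.eigh(cov)\n",
"\n",
"# Defining property: cov @ v_i == lambda_i * v_i for every column v_i\n",
"print(np.allclose(cov @ vecs, vecs * vals))\n",
"# The eigenvectors form an orthonormal basis\n",
"print(np.allclose(vecs.T @ vecs, np.eye(cov.shape[0])))\n",
"# The eigenvalues sum to the total variance (the trace of cov)\n",
"print(np.isclose(vals.sum(), np.trace(cov)))"
]
},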
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# Choose the number of principal components to keep\n",
"num_components=2"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# Indices of the num_components largest eigenvalues, in descending order\n",
"sorted_key = np.argsort(eigen_values)[::-1][:num_components]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# Keep only the top num_components eigenvalues and their matching eigenvectors\n",
"eigen_values, eigen_vectors = eigen_values[sorted_key], eigen_vectors[:, sorted_key]"
]
},
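{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each eigenvalue is the variance of the data along its eigenvector. Dividing each one by the sum of all eigenvalues therefore gives the fraction of the total variance that each component explains. This can help judge whether `num_components=2` keeps enough information. The cell below is a minimal sketch under the same dataset assumption as above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Assumption: the bundled copy of the data stands in for the CSV\n",
"X = StandardScaler().fit_transform(load_breast_cancer()['data'])\n",
"\n",
"# Eigenvalues of the covariance matrix, sorted from largest to smallest\n",
"eigen_values = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]\n",
"\n",
"# Fraction of the total variance captured by each component\n",
"explained_ratio = eigen_values / eigen_values.sum()\n",
"cumulative = np.cumsum(explained_ratio)\n",
"\n",
"print(explained_ratio[:2])  # share explained by PC1 and PC2\n",
"print(cumulative[1])        # share kept when num_components = 2"
]
},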
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# Multiplying the centered data by the selected eigenvectors gives the principal component scores.\n",
"# This is the \"projection\" of the original points onto the principal components\n",
"principal_components=np.dot(scaled_X,eigen_vectors)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 9.19283683, 1.94858307],\n",
" [ 2.3878018 , -3.76817174],\n",
" [ 5.73389628, -1.0751738 ],\n",
" ...,\n",
" [ 1.25617928, -1.90229671],\n",
" [10.37479406, 1.67201011],\n",
" [-5.4752433 , -0.67063679]])"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"principal_components"
]
},
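{
"cell_type": "markdown",
"metadata": {},
"source": [
"The manual result can be cross-checked against `sklearn.decomposition.PCA`. This sketch uses the same dataset assumption as above and sets `svd_solver='full'` so that the comparison uses an exact SVD. It recomputes the projection with `np.linalg.eigh` and lets each component differ by sign, because eigenvectors are defined only up to a factor of -1. It also checks that `PCA.explained_variance_` equals the two largest covariance eigenvalues."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Assumption: the bundled copy of the data stands in for the CSV\n",
"X = StandardScaler().fit_transform(load_breast_cancer()['data'])\n",
"\n",
"# Manual route: top-2 eigenvectors of the covariance matrix, then project\n",
"vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))\n",
"top = np.argsort(vals)[::-1][:2]\n",
"manual_pc = X @ vecs[:, top]\n",
"\n",
"# Library route, with an exact (full) SVD\n",
"pca = PCA(n_components=2, svd_solver='full')\n",
"sklearn_pc = pca.fit_transform(X)\n",
"\n",
"# Eigenvectors are defined only up to sign, so flip columns to agree before comparing\n",
"signs = np.sign(np.sum(manual_pc * sklearn_pc, axis=0))\n",
"print(np.allclose(manual_pc * signs, sklearn_pc))\n",
"# PCA.explained_variance_ holds the same top eigenvalues\n",
"print(np.allclose(vals[top], pca.explained_variance_))"
]
},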
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'Second Principal Component')"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 576x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(8,6))\n",
"plt.scatter(principal_components[:,0],principal_components[:,1])\n",
"plt.xlabel('First Principal Component')\n",
"plt.ylabel('Second Principal Component')"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_breast_cancer"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"# Loads the copy of the dataset bundled with scikit-learn (no download needed)\n",
"# Used here for its 'target' labels: 0 = malignant, 1 = benign\n",
"cancer_dictionary = load_breast_cancer()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cancer_dictionary.keys()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,\n",
" 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,\n",
" 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,\n",
" 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,\n",
" 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,\n",
" 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,\n",
" 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,\n",
" 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,\n",
" 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,\n",
" 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,\n",
" 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,\n",
" 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,\n",
" 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,\n",
" 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,\n",
" 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,\n",
" 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cancer_dictionary['target']"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'Second Principal Component')"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 576x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(8,6))\n",
"plt.scatter(principal_components[:,0],principal_components[:,1],c=cancer_dictionary['target'])\n",
"plt.xlabel('First Principal Component')\n",
"plt.ylabel('Second Principal Component')"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}