
<html> <head> </head>

___

Copyright by Pierian Data Inc. For more information, visit us at www.pieriandata.com

Introduction to DBSCAN

Let's take a brief visual look at how DBSCAN differs from other clustering techniques, such as K-Means clustering.
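For intuition before the plots, here is a minimal from-scratch sketch of the idea behind DBSCAN in plain NumPy. It is not scikit-learn's implementation. The function name `dbscan_sketch` is hypothetical, and the O(n²) distance matrix is only suitable for small examples.

The sketch works as follows:

- A point is a *core* point if at least `min_samples` points (itself included) lie within distance `eps`.
- Clusters grow outward from core points.
- Anything that is never reached is labeled -1 (noise).

Unlike K-Means, no number of clusters is given up front.

```python
import numpy as np

def dbscan_sketch(X, eps=0.5, min_samples=5):
    """Hypothetical minimal DBSCAN: one label per row, -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory, fine for small examples)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each point's eps-neighborhood, including the point itself
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)  # start with every point marked as noise
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue  # already assigned, or not dense enough to seed a cluster
        labels[i] = cluster_id
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # reachable point joins the cluster
                if is_core[j]:
                    stack.extend(neighbors[j])  # only core points keep expanding
        cluster_id += 1
    return labels

# Two tight groups of three points plus one isolated point
X = np.array([[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 0]])
print(dbscan_sketch(X, eps=0.5, min_samples=3).tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (10, 0) has no neighbors within `eps`, so it stays labeled as noise rather than being forced into a cluster, which K-Means would always do.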

DBSCAN and Clustering Examples

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
blobs = pd.read_csv('../DATA/cluster_blobs.csv')
In [3]:
blobs.head()
Out[3]:
          X1        X2
0   4.645333  6.822294
1   4.784032  6.422883
2  -5.851786  5.774331
3  -7.459592  6.456415
4   4.918911  6.961479
In [4]:
sns.scatterplot(data=blobs,x='X1',y='X2')
[Figure: scatter plot of the blobs data, X1 vs. X2]
In [5]:
moons = pd.read_csv('../DATA/cluster_moons.csv')
In [6]:
moons.head()
Out[6]:
         X1         X2
0  0.674362  -0.444625
1  1.547129  -0.239796
2  1.601930  -0.230792
3  0.014563   0.449752
4  1.503476  -0.389164
In [23]:
sns.scatterplot(data=moons,x='X1',y='X2')
[Figure: scatter plot of the moons data, X1 vs. X2]
In [8]:
circles = pd.read_csv('../DATA/cluster_circles.csv')
In [9]:
circles.head()
Out[9]:
          X1         X2
0  -0.348677   0.010157
1  -0.176587  -0.954283
2   0.301703  -0.113045
3  -0.782889  -0.719468
4  -0.733280  -0.757354
In [10]:
sns.scatterplot(data=circles,x='X1',y='X2')
[Figure: scatter plot of the circles data, X1 vs. X2]
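The notebook reads its three datasets from CSV files under `../DATA/`, which are not included here. If those files are unavailable, data with similar shapes can be generated with scikit-learn's `make_blobs`, `make_moons`, and `make_circles`. This is an assumption. The sample counts, noise levels, `factor`, and `random_state` below are illustrative guesses, not the settings used to produce the original CSVs. The helper name `to_frame` is also hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_blobs, make_moons, make_circles

def to_frame(X):
    # Hypothetical helper: match the column names used throughout the notebook
    return pd.DataFrame(X, columns=['X1', 'X2'])

# Illustrative parameters (assumption): the original CSVs may differ
blobs_X, _ = make_blobs(n_samples=1500, centers=3, random_state=42)
moons_X, _ = make_moons(n_samples=1500, noise=0.05, random_state=42)
circles_X, _ = make_circles(n_samples=1500, factor=0.5, noise=0.05, random_state=42)

blobs = to_frame(blobs_X)
moons = to_frame(moons_X)
circles = to_frame(circles_X)
print(blobs.shape, moons.shape, circles.shape)
```

The resulting DataFrames can be passed to `sns.scatterplot` and to the clustering cells exactly like the loaded CSVs. Note that `eps` values tuned on the original files may need adjusting for this synthetic data.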

Label Discovery

In [11]:
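def display_categories(model,data):
    # Fit the model and return one cluster label per row (DBSCAN labels noise points as -1)
    labels = model.fit_predict(data)
    # Plot the points, colored by their assigned cluster label
    sns.scatterplot(data=data,x='X1',y='X2',hue=labels,palette='Set1')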
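`display_categories` only plots the labels. When comparing models, it can also help to count what each one found. The helper below, `summarize_labels`, is a hypothetical addition and not part of the original notebook. It counts clusters and noise points, relying on the scikit-learn convention that DBSCAN assigns the label -1 to noise.

```python
import numpy as np

def summarize_labels(labels):
    """Hypothetical helper: return (number of clusters, number of noise points)."""
    labels = np.asarray(labels)
    # DBSCAN marks noise with -1; K-Means never produces -1
    n_noise = int(np.sum(labels == -1))
    # Every distinct label other than -1 is a cluster
    n_clusters = len(set(labels.tolist()) - {-1})
    return n_clusters, n_noise

print(summarize_labels([0, 0, 1, 1, -1, 2]))  # (3, 1)
```

With a fitted model, this could be called as `summarize_labels(model.fit_predict(blobs))`.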

K-Means Results

In [12]:
from sklearn.cluster import KMeans
model = KMeans(n_clusters=2)
In [27]:
display_categories(model,moons)
In [14]:
model = KMeans(n_clusters=3)
display_categories(model,blobs)
In [25]:
model = KMeans(n_clusters=2)
display_categories(model,circles)
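K-Means struggles with the moons and circles because of how it assigns points. Each point goes to its nearest cluster center, so the cluster regions form a Voronoi partition with straight-line boundaries and cannot wrap around curved or nested shapes. The sketch below checks this on a synthetic moons dataset. The data is generated with `make_moons` as an assumed stand-in for the CSV, using illustrative parameters. The check compares `KMeans.predict` with a hand-computed nearest-centroid assignment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Synthetic stand-in for the moons CSV (assumed, illustrative parameters)
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = model.cluster_centers_

# Distance from every point to every center: shape (n_points, n_clusters)
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
manual_labels = dists.argmin(axis=1)

# K-Means assigns each point to its nearest center, so these should agree
print(np.array_equal(manual_labels, model.predict(X)))
```

Because the boundary between two centers is the perpendicular bisector of the segment joining them, a two-cluster K-Means can only cut the moons with a single straight line.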

DBSCAN Results

In [16]:
from sklearn.cluster import DBSCAN
In [17]:
model = DBSCAN(eps=0.6)
In [18]:
display_categories(model,blobs)
In [28]:
model = DBSCAN(eps=0.15)
plt.figure(figsize=(10,6),dpi=150)
display_categories(model,moons)
In [20]:
display_categories(model,circles)
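The plots give a qualitative comparison. When ground-truth labels are available, as they are for synthetic data, the difference can also be measured. One option is scikit-learn's `adjusted_rand_score`: 1.0 means the clustering matches the true grouping exactly, and values near 0 mean chance-level agreement.

The sketch below makes two assumptions:

- `make_moons` with illustrative parameters stands in for the CSV data.
- `eps=0.2` is chosen for this synthetic data's scale, not taken from the notebook.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Synthetic moons with known labels (assumed, illustrative parameters)
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# eps=0.2 is an assumption suited to this synthetic data's scale
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# ARI: 1.0 = perfect match with the true grouping, ~0 = chance level
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {kmeans_ari:.2f}")
print(f"DBSCAN ARI:  {dbscan_ari:.2f}")
```

Any points DBSCAN labels as noise (-1) are treated as their own group by the ARI, so many noise points would lower DBSCAN's score.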

Next, let's explore DBSCAN's hyperparameters in more detail!
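As a small preview of DBSCAN's hyperparameters, note that the notebook used `eps=0.6` for the blobs but `eps=0.15` for the moons and circles. `eps` is a distance, so a good value depends on the scale and spacing of the data. A quick way to see its effect is to sweep `eps` with `min_samples` fixed (here at 5, scikit-learn's default) and count the resulting clusters and noise points. This sketch uses a synthetic `make_moons` stand-in, and the `eps` grid is an arbitrary illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Synthetic stand-in for the moons CSV (assumed, illustrative parameters)
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

results = {}
for eps in [0.01, 0.05, 0.1, 0.2, 0.5, 1.0]:
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels.tolist()) - {-1})  # -1 marks noise
    n_noise = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_noise)
    print(f"eps={eps:<5} clusters={n_clusters:<3} noise={n_noise}")
```

A very small `eps` leaves almost every point as noise. A very large `eps` merges both moons into a single cluster, because the gap between them is smaller than the neighborhood radius.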

</html>