Fuzzy C-Means Clustering

Fuzzy C-Means is a clustering method that gives every sample a degree of membership in each cluster and refines those memberships iteratively.

Steps

1. Prepare the Environment:

from pandas import DataFrame, read_csv
import random
import numpy as np
from IPython.display import HTML, display
from tabulate import tabulate
from math import log
from sklearn.feature_selection import mutual_info_classif

def table(df): display(HTML(tabulate(df, tablefmt='html', headers='keys', showindex=False)))

2. Prepare the Input Data (D = n x m):

  • n: Number of samples
  • m: Number of features
  • c: Number of clusters
  • w: Fuzziness exponent (usually 2)
  • T: Maximum number of iterations (usually 10)
  • e: Convergence tolerance (usually 0.1)
  • Pt: Value of the objective function at iteration t
  • t: Current iteration
Data = read_csv('leaf.csv', sep=',')
Data = Data[['Eccentricity','Solidity', 'Lobedness', 'Entropy']].sample(6, random_state=42)
D = Data.values
print("Table (D) >>")
table(D)
Table (D) >>
0        1        2          3
0.99593  0.80662  2.7342     0.27303
0.50692  0.53024  3.0788     0.67289
0.24465  0.56524  2.854      0.8331
0.86545  0.82443  0.40204    1.0136
0.82866  0.9418   0.11857    1.8038
0.72719  0.99388  0.0016019  0.9805
n, m, c, w, T, e, P0, t = *D.shape, 3, 2, 10, 0.1, 0, 1
print("Variables >>")
print(" n = %d\n m = %d\n c = %d\n w = %d\n T = %d\n e = %f\n P0 = %d\n t = %d" % (n, m, c, w, T, e, P0, t))
Variables >>
 n = 6
 m = 4
 c = 3
 w = 2
 T = 10
 e = 0.100000
 P0 = 0
 t = 1

3. Prepare the Membership Matrix (U = n x c):

U is filled with random values or with the result of a previous run.

random.seed(42)
U = np.array([[random.uniform(0, 1) for _ in range(c)] for _ in range(n)])
print("U >>\n")
print(U)
U >>

[[0.6394268  0.02501076 0.27502932]
 [0.22321074 0.73647121 0.67669949]
 [0.89217957 0.08693883 0.42192182]
 [0.02979722 0.21863797 0.50535529]
 [0.02653597 0.19883765 0.64988444]
 [0.54494148 0.22044062 0.58926568]]
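
Most Fuzzy C-Means formulations require each sample's memberships to sum to 1, which raw uniform draws like the ones above do not satisfy. The notebook still works because step 6 renormalizes every row, but the initial matrix can also be normalized up front. The sketch below shows one way to do that; the names `U_raw` and `U_norm` and the use of `numpy.random.default_rng` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

n, c = 6, 3  # samples and clusters, as in the example above

# Assumption: NumPy's generator is used here instead of random.uniform.
rng = np.random.default_rng(42)
U_raw = rng.uniform(0.0, 1.0, size=(n, c))

# Divide every row by its sum so that each sample's memberships add up to 1.
U_norm = U_raw / U_raw.sum(axis=1, keepdims=True)

print(U_norm.sum(axis=1))  # every entry should be 1 up to floating-point rounding
```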

4. Compute the Centroid of Each Cluster (V = c x m):

V_{yx} = \frac{\sum^n_{i=1}(U_{iy})^w\times{D_{ix}}}{\sum^n_{i=1}(U_{iy})^w}
# Note: V is stored as V[y, x] (row = cluster y, column = feature x), matching V_{kj} in the later formulas
def cluster(U, D, x, y): return sum([U[i,y]**w*D[i,x] for i in range(n)])/sum([U[i,y]**w for i in range(n)])
V = np.array([[cluster(U,D,x,y) for x in range(m)] for y in range(c)])
print("V >>\n")
print(V)
V >>

[[0.54370379 0.70992788 2.28168401 0.7092545 ]
 [0.56356398 0.60788268 2.50132412 0.78491767]
 [0.67635682 0.7819355  1.31182068 1.05856209]]
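
The centroid formula is a membership-weighted mean, so it can also be written with matrix operations instead of nested loops. The following is a minimal sketch under that reading; the function name `centroids` and the demo arrays are hypothetical and only serve as a small check.

```python
import numpy as np

def centroids(U, D, w=2):
    """Return V with shape (c, m): row k is the centroid of cluster k."""
    Uw = U ** w                   # (n, c) memberships raised to the fuzziness exponent
    weighted_sums = Uw.T @ D      # (c, m) numerator: sum_i U_ik^w * D_ij
    weights = Uw.sum(axis=0)      # (c,)   denominator: sum_i U_ik^w
    return weighted_sums / weights[:, None]

# Tiny check: two samples, one cluster, equal memberships -> centroid is the plain mean.
D_demo = np.array([[0.0, 2.0], [2.0, 4.0]])
U_demo = np.array([[0.5], [0.5]])
print(centroids(U_demo, D_demo))  # [[1. 3.]]
```

With equal memberships the weights cancel, which is why the result equals the ordinary column means.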

5. Compute the Objective Function at Iteration t (Pt):

P_t = \sum^n_{i=1}\sum^c_{k=1}\left(\left[\sum^m_{j=1}\left(D_{ij}-V_{kj}\right)^2\right](U_{ik})^w\right)
def objective(V,U,D): return sum([sum([sum([(D[i,j]-V[k,j])**2 for j in range(m)])*(U[i,k]**w) for k in range(c)]) for i in range(n)])
Pt = objective(V,U,D)
print("Pt >>\n")
print(Pt)
Pt >>

7.165764247017886
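
The objective is the sum of squared sample-to-centroid distances, each weighted by the corresponding membership raised to w. As a rough illustration, it can be computed with NumPy broadcasting; `objective_vec` and the demo arrays are hypothetical names introduced only for this sketch.

```python
import numpy as np

def objective_vec(V, U, D, w=2):
    # dist2[i, k] = squared Euclidean distance between sample i and centroid k
    dist2 = ((D[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    # Weight each distance by U_ik^w and add everything up.
    return float((dist2 * U ** w).sum())

# Two samples at distance 1 from a single centroid, both with membership 1.
D_demo = np.array([[0.0, 0.0], [2.0, 0.0]])
V_demo = np.array([[1.0, 0.0]])
U_demo = np.array([[1.0], [1.0]])
print(objective_vec(V_demo, U_demo, D_demo))  # 2.0
```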

6. Recompute the Membership Matrix (U = n x c):

U_{ik} = \frac{\left[\sum^m_{j=1}(D_{ij}-V_{kj})^2\right]^{\frac{-1}{w-1}}}{\sum^c_{l=1}\left[\sum^m_{j=1}(D_{ij}-V_{lj})^2\right]^{\frac{-1}{w-1}}}
def converge(V,D,i,k): return (sum([(D[i,j]-V[k,j])**2 for j in range(m)])**(-1/(w-1)))/sum([sum([(D[i,j]-V[l,j])**2 for j in range(m)])**(-1/(w-1)) for l in range(c)])
print("U >>\n")
np.array([[converge(V,D,i,k) for k in range(c)] for i in range(n)])
array([[0.42661745, 0.47867606, 0.09470648],
       [0.32401778, 0.61139512, 0.0645871 ],
       [0.31857727, 0.62718924, 0.05423349],
       [0.16315857, 0.13281473, 0.7040267 ],
       [0.20677417, 0.18023246, 0.61299337],
       [0.20507176, 0.17092863, 0.62399961]])
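
The update above gives each sample a membership proportional to the inverse squared distance to each centroid (raised to 1/(w-1)), then normalizes across clusters. A vectorized sketch is shown below. The name `update_memberships` and the small `eps` clamp are assumptions added here: the clamp avoids a division by zero when a sample coincides with a centroid, a case the formula itself does not define.

```python
import numpy as np

def update_memberships(D, V, w=2, eps=1e-12):
    # dist2[i, k] = squared Euclidean distance between sample i and centroid k
    dist2 = ((D[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    # Assumption: clamp tiny distances so the negative power stays finite.
    dist2 = np.maximum(dist2, eps)
    inv = dist2 ** (-1.0 / (w - 1))
    # Normalize across clusters so that every row sums to 1.
    return inv / inv.sum(axis=1, keepdims=True)

# One feature, centroids at 0 and 4, samples at 0, 1 and 4.
D_demo = np.array([[0.0], [1.0], [4.0]])
V_demo = np.array([[0.0], [4.0]])
U_new = update_memberships(D_demo, V_demo)
print(U_new.round(3))
```

In this toy case the sample at 1 has squared distances 1 and 9, so its memberships come out as 0.9 and 0.1, while the samples that sit on a centroid belong almost entirely to that cluster.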

7. Check Whether to Stop or Loop Again

If $ |P_t - P_{t-1}| \leq e $ or $ t \geq T $, STOP.

Otherwise, repeat from step 4 (Compute the Centroid of Each Cluster).

def iterate(U):
    V = np.array([[cluster(U, D, x, y) for x in range(m)] for y in range(c)])
    return np.array([[converge(V,D,i,k) for k in range(c)] for i in range(n)]), objective(V,U,D)

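def fuzzyCM(U):
    # U: initial membership matrix (random values or the result of an earlier run)
    # P2 holds the current objective value, P the previous one (P0 = 0 before the first pass)
    U, P2, P, t = *iterate(U), 0, 1
    # Repeat until the objective changes by at most e or the iteration limit T is reached
    while abs(P2 - P) > e and t < T:
        U, P2, P, t = *iterate(U), P2, t+1
    return U, t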

FuzzyResult, FuzzyIters = fuzzyCM(U)
print("Iterating %d times, fuzzy result >> \n" % FuzzyIters)
print(FuzzyResult)
Iterating 5 times, fuzzy result >>

[[9.99946808e-01 4.84760420e-05 4.71615530e-06]
 [5.97625571e-02 9.36333265e-01 3.90417747e-03]
 [3.61938911e-02 9.59438085e-01 4.36802436e-03]
 [1.97778963e-02 1.70437266e-02 9.63178377e-01]
 [3.11702255e-02 3.00196318e-02 9.38810143e-01]
 [1.40843238e-02 1.23997912e-02 9.73515885e-01]]

8. Take the Column with the Largest Membership as the Cluster of Each Data Record

table(DataFrame([D[i].tolist()+[np.argmax(FuzzyResult[i].tolist())] for i in range(n)], columns=Data.columns.tolist()+["Cluster Index"]))
Eccentricity Solidity Lobedness Entropy Cluster Index
0.99593 0.80662 2.7342 0.27303 0
0.50692 0.53024 3.0788 0.67289 1
0.24465 0.56524 2.854 0.8331 1
0.86545 0.82443 0.40204 1.0136 2
0.82866 0.9418 0.11857 1.8038 2
0.72719 0.99388 0.0016019 0.9805 2
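
The same hard assignment can be obtained in one call with `argmax` along the cluster axis. The sketch below copies the final membership matrix printed in step 7 into a hypothetical variable `memberships` so that it runs on its own.

```python
import numpy as np

# Final membership matrix copied from the output of step 7.
memberships = np.array([
    [9.99946808e-01, 4.84760420e-05, 4.71615530e-06],
    [5.97625571e-02, 9.36333265e-01, 3.90417747e-03],
    [3.61938911e-02, 9.59438085e-01, 4.36802436e-03],
    [1.97778963e-02, 1.70437266e-02, 9.63178377e-01],
    [3.11702255e-02, 3.00196318e-02, 9.38810143e-01],
    [1.40843238e-02, 1.23997912e-02, 9.73515885e-01],
])

# For every row (sample), pick the column (cluster) with the largest membership.
labels = memberships.argmax(axis=1)
print(labels)  # [0 1 1 2 2 2]
```

The labels match the "Cluster Index" column in the table above.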