Sunday, 23 March 2025

Centroid in Machine Learning

A centroid represents the center point of a cluster. It is the arithmetic mean of the data points in the cluster, which makes it the point with the smallest total squared distance to those points. Centroids are used in a variety of machine learning algorithms, such as k-means clustering.
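
As a minimal sketch of this idea, the snippet below computes the centroid of a small cluster as the coordinate-wise mean and compares its total squared distance with that of another candidate point. The four data points and the helper name sum_squared_distances are made up for illustration.

```python
import numpy as np

# Hypothetical cluster of four 2-D points (illustrative values only)
cluster = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 3.0],
    [2.0, 1.0],
])

# The centroid is the mean of each coordinate over all points
centroid = cluster.mean(axis=0)


def sum_squared_distances(point, points):
    """Sum of squared Euclidean distances from point to every row of points."""
    return float(np.sum((points - point) ** 2))


# Cost of the centroid versus an arbitrary nearby candidate point
centroid_cost = sum_squared_distances(centroid, cluster)
other_cost = sum_squared_distances(np.array([2.5, 2.5]), cluster)

print("Centroid:", centroid)
print("Cost at centroid:", centroid_cost)
print("Cost at other point:", other_cost)
```

Moving the candidate point anywhere away from the centroid can only increase the cost, which is the property k-means relies on when it updates its centroids.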

What is the centroid of a triangle?

It is the point where the three medians intersect. Its coordinates are the averages of the coordinates of the three vertices.
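
The link between the medians and the average of the vertices can be checked numerically. The sketch below relies on the standard fact that the centroid lies two thirds of the way along each median, measured from the vertex; the helper name point_on_median is hypothetical, and the triangle is the same one used in the plotting program in this section.

```python
import numpy as np

# Vertices of the triangle
A = np.array([0.0, 0.0])
B = np.array([4.0, 0.0])
C = np.array([2.0, 3.0])


def point_on_median(vertex, p, q, t=2 / 3):
    """Point a fraction t of the way from vertex to the midpoint of side pq."""
    midpoint = (p + q) / 2
    return vertex + t * (midpoint - vertex)


# The point two thirds of the way along each of the three medians
from_a = point_on_median(A, B, C)
from_b = point_on_median(B, A, C)
from_c = point_on_median(C, A, B)

# The mean of the three vertices
vertex_mean = (A + B + C) / 3

print(from_a, from_b, from_c, vertex_mean)
```

All four printed points should coincide, which is why taking the mean of the vertices is enough to locate the centroid.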

 

The following program draws a triangle and marks its centroid.

 


centroid_of_a_triangle.py

import matplotlib.pyplot as plt
import numpy as np

# Define the vertices of the triangle
vertices = np.array([
    [0, 0],
    [4, 0],
    [2, 3]
])

# Calculate the centroid of the triangle
centroid = np.mean(vertices, axis=0)

# Create a scatter plot of the triangle vertices
plt.scatter(vertices[:, 0], vertices[:, 1], c='blue', marker='o', label='Vertices')

# Plot the centroid
plt.scatter(centroid[0], centroid[1], c='red', marker='x', s=150, label='Centroid')

# Add labels and legend
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.title('Triangle with Centroid')

# Connect the vertices to form the triangle
triangle = plt.Polygon(vertices, closed=True, fill=False, edgecolor='blue')
plt.gca().add_patch(triangle)

# Display the plot
plt.show()

 

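Output: a plot titled 'Triangle with Centroid' showing the triangle outlined in blue, its three vertices as blue dots, and the centroid at (2, 1) as a red x.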


 

What is the centroid of a circle?

It is the center of the circle.
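
One way to see this numerically, as a rough sketch rather than a proof: sample evenly spaced points on the circle and average them. The center and radius match the plotting program in this section; the sample count of 360 is an arbitrary choice.

```python
import numpy as np

center = np.array([3.0, 3.0])
radius = 2.0

# Evenly spaced angles around the circle (endpoint excluded to avoid a duplicate point)
n_points = 360
angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)

# Points on the circle boundary, one row per angle
boundary = center + radius * np.column_stack((np.cos(angles), np.sin(angles)))

# The mean of the boundary points lands on the center, up to rounding error
boundary_mean = boundary.mean(axis=0)
print(boundary_mean)
```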

 

centroid_of_a_circle.py

import matplotlib.pyplot as plt
import numpy as np

# Define the center and radius of the circle
center = (3, 3)  # Center coordinates (x, y)
radius = 2

# Create a figure and axis
fig, ax = plt.subplots()

# Plot the circle
circle = plt.Circle(center, radius, fill=False, color='blue')
ax.add_patch(circle)

# Plot the center
plt.scatter(center[0], center[1], c='red', marker='o', s=100, label='Centroid')

plt.legend()
plt.title('Circle with Center')

# Set axis limits and an equal aspect ratio so the circle is not drawn as an ellipse
ax.set_xlim(0, 6)
ax.set_ylim(0, 6)
ax.set_aspect('equal')

plt.show()

 

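Output: a plot titled 'Circle with Center' showing a blue circle of radius 2 with its center at (3, 3) marked by a red dot.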


 


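What is the centroid of a set of data points in a machine learning algorithm?

The centroid of a set of data points is their arithmetic mean: each coordinate of the centroid is the average of that coordinate over all the points. It is generally not equidistant from the points; rather, it is the point that minimizes the sum of squared distances to them.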

 

The following example generates 100 random two-dimensional data points with coordinates between 1 and 10, and divides them into two clusters by assigning each point to the nearer of two predefined centroid positions.

 

centroid.py

import matplotlib.pyplot as plt
import numpy as np

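def rand_array(low, high, *shape):
    """Return an array of the given shape with uniform random values in [low, high)."""
    return low + (high - low) * np.random.rand(*shape)


# Seed the random number generator so the example is reproducible
np.random.seed(47)

# Generate 100 rows of 2 columns, with values between 1 and 10
# (named low/high to avoid shadowing the built-in min() and max())
low = 1
high = 10
data_points = rand_array(low, high, 100, 2)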

# Define fixed centroids for the two clusters
centroid1 = np.array([3, 6])
centroid2 = np.array([6, 8])

# Assign each data point to the nearer centroid (ties go to cluster 2)
dist1 = np.linalg.norm(data_points - centroid1, axis=1)
dist2 = np.linalg.norm(data_points - centroid2, axis=1)
closer_to_1 = dist1 < dist2
cluster1 = data_points[closer_to_1]
cluster2 = data_points[~closer_to_1]

# Create a scatter plot of data points in cluster1 and cluster2
plt.scatter(cluster1[:, 0], cluster1[:, 1], c='blue', label='Cluster 1')
plt.scatter(cluster2[:, 0], cluster2[:, 1], c='red', label='Cluster 2')

# Plot the centroids
plt.scatter(centroid1[0], centroid1[1], c='blue', marker='x', s=100, label='Centroid 1')
plt.scatter(centroid2[0], centroid2[1], c='red', marker='x', s=100, label='Centroid 2')

# Add labels and legend
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.title('Data Points and Centroids')

# Display the plot
plt.show()

 

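Output: a scatter plot titled 'Data Points and Centroids' showing the points of cluster 1 in blue and those of cluster 2 in red, with each centroid marked by an x in its cluster's color.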
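
The program above keeps the two centroids fixed, so they are not necessarily the means of the clusters they define. In k-means clustering, each centroid is then moved to the mean of the points assigned to it. The sketch below shows one such assignment and update step on the same data; it is a simplified illustration that assumes every cluster receives at least one point, and it is not a full k-means implementation.

```python
import numpy as np

# Recreate the same data: 100 points with coordinates between 1 and 10
np.random.seed(47)
data_points = 1 + (10 - 1) * np.random.rand(100, 2)

# Starting centroids, one row per cluster
centroids = np.array([[3.0, 6.0], [6.0, 8.0]])

# Assignment step: distance from every point to every centroid, shape (100, 2)
distances = np.linalg.norm(
    data_points[:, None, :] - centroids[None, :, :], axis=2
)
labels = np.argmin(distances, axis=1)

# Update step: move each centroid to the mean of its assigned points
# (assumes no cluster is empty)
new_centroids = np.array([
    data_points[labels == k].mean(axis=0) for k in range(len(centroids))
])

print("Starting centroids:")
print(centroids)
print("Updated centroids:")
print(new_centroids)
```

Repeating these two steps until the centroids stop moving is the core loop of k-means.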


 

 
