A Simple Guide To Centroid Based Clustering (With Python Code)
This article was published as a part of the Data Science Blogathon.
Introduction
Clustering is the process of grouping similar data points together. It falls under the category of unsupervised learning, that is, the input data does not have labeled responses. Clustering algorithms find applications in various fields such as finance, medicine, and e-commerce. In e-commerce, for example, a possible use case is grouping customers into segments based on their purchasing styles in order to give them targeted offers or discounts.
In the kind of clustering discussed here (hard clustering), every data point belongs to some cluster, but a single data point cannot be present in more than one cluster. The performance of a clustering algorithm can be measured by metrics such as the Dunn index (DI). Clusters with a large inter-cluster distance (well separated) and a small intra-cluster distance (compact) have a higher DI.
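As a rough illustration, the Dunn index can be computed as the smallest distance between points in different clusters divided by the largest distance between two points in the same cluster (the largest cluster diameter). This is one common variant of the definition, and others measure separation and diameter differently. The sketch below assumes each cluster is given as a list of coordinate tuples, that there are at least two clusters, and that Euclidean distance is used. The function name `dunn_index` is hypothetical.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn index of a clustering given as a list of clusters of point tuples."""
    # Compactness: the largest distance between two points in the same cluster.
    max_diameter = max(
        (math.dist(p, q) for cluster in clusters for p, q in combinations(cluster, 2)),
        default=0.0,
    )
    # Separation: the smallest distance between points in different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    if max_diameter == 0:
        return math.inf  # every cluster is a single point (or exact duplicates)
    return min_separation / max_diameter


compact = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
loose = [[(0, 0), (0, 4)], [(5, 0), (5, 4)]]
print(dunn_index(compact))  # 5.0
print(dunn_index(loose))    # 1.25
```

Both clusterings have the same separation (5). The tighter clusters have a smaller diameter, and so they score higher.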
Different approaches can be employed in clustering depending on your dataset, such as:
Centroid based clustering
Hierarchical clustering/Connectivity based clustering
Density-based clustering
We will focus on centroid-based clustering in this article.
Centroid based clustering
The k-means algorithm is one of the centroid-based clustering algorithms. Here k is the number of clusters and is a hyperparameter of the algorithm. The core idea is to find k centroids and k sets of points, with each point grouped with its nearest centroid, so that the sum of squared distances from the points in each cluster to their centroid is minimized. Solving this optimization exactly is computationally hard (NP-hard in general), so heuristics are used to approximate the solution. One such heuristic is Lloyd's algorithm, which works as follows.
Implementation:
Select k points at random as the initial centroids (cluster centers).
Assign each data point to the closest centroid based on Euclidean distance.
Recalculate the centroid of each cluster as the mean of all points assigned to it.
Repeat the assignment and update steps until convergence, that is, until the same points are assigned to the same clusters in consecutive iterations.
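The four steps above can be sketched in plain Python. This is a minimal illustration of Lloyd's algorithm, not the scikit-learn implementation used later in the article. The function name `kmeans`, the `seed` parameter, and the choice to keep a centroid in place when its cluster becomes empty are all assumptions made for this sketch.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if the cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds can lead to different results on harder data. This is the initialization sensitivity discussed next.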
One problem with this procedure, however, is that it is sensitive to initialization: selecting different centroids in the initialization stage can produce different clusters. Workarounds to the problem include:
Repeating k-means multiple times with different initializations and selecting the best result (the one with the lowest sum of squared distances).
Using a smarter initialization process such as k-means++ instead of random initialization.
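k-means++ picks the first centroid uniformly at random from the data. It then picks each subsequent centroid with probability proportional to its squared distance from the nearest centroid already chosen, so the initial centers tend to be spread out. Below is a minimal pure-Python sketch of this seeding step only. It assumes the data contains at least k distinct points, because otherwise all sampling weights can become zero. The function name is hypothetical.

```python
import math
import random


def kmeans_plus_plus_init(points, k, seed=0):
    """Choose k initial centroids from points using k-means++ seeding."""
    rng = random.Random(seed)
    # First centroid: a data point chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Sample the next centroid with probability proportional to d2, so
        # points far from the existing centroids are more likely to be picked.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = kmeans_plus_plus_init(points, k=2)
print(seeds)
```

The returned centroids would then replace the random initialization in step 1 of Lloyd's algorithm.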
In some cases centroids are difficult to interpret. For example, if you are dealing with text data, an averaged centroid does not correspond to any real document. One way to deal with this is the k-medoids algorithm. It selects the most central member of each cluster as the cluster center and is generally more robust to outliers than k-means.
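To see why a medoid is more robust than a mean, the sketch below picks the medoid of a single cluster: the member with the smallest total distance to all other members. It covers only this center-selection step, not a full k-medoids algorithm such as PAM. The function name `medoid` is hypothetical. With the outlier at x = 100, the mean of the five points would be x = 21.2, while the medoid stays at (2, 0).

```python
import math


def medoid(points):
    """Return the member of points with the smallest total distance to the others."""
    # Unlike a mean, the medoid is always an actual data point.
    return min(points, key=lambda m: sum(math.dist(m, p) for p in points))


cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (100, 0)]
print(medoid(cluster))  # (2, 0)
```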
You may wonder how the best value of k is selected. The elbow (or knee) method is commonly used: run k-means for multiple values of k and plot the sum of squared distances from the points to their centroids (the loss function) against k. The elbow of the curve, where it visibly bends, is selected as the optimum k.
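Here is a minimal sketch of the elbow method with scikit-learn. It assumes the same kind of synthetic data as the example later in this article (make_blobs with 100 points and 3 centers). The values random_state=42 and n_init=10 and the range k = 1 to 7 are arbitrary choices. KMeans exposes the sum of squared distances as `inertia_`. Plotting these values against k (for example with matplotlib) gives the elbow curve, and this sketch simply prints them.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 100 points in 2D drawn around 3 centers.
X, _ = make_blobs(n_samples=100, n_features=2, centers=3, random_state=42)

inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    model.fit(X)
    # inertia_: sum of squared distances of samples to their closest centroid.
    inertias[k] = model.inertia_

for k, sse in inertias.items():
    print(f"k={k}: SSE={sse:.1f}")
```

On data like this, the SSE typically drops sharply up to k = 3 and flattens afterward, which is where the elbow appears.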
Advantages of using k-means:
Each iteration runs in time linear in the number of data points, and in practice only a few iterations are usually needed.
It is simple and intuitive to understand.
Limitations of using k-means:
The number of clusters needs to be known beforehand.
It is not very robust to outliers.
It does not work very well with non-convex cluster shapes.
It tends to generate clusters of similar size.
Let's run through a code example of k-means in action.
As input, I have generated a dataset in Python using sklearn.datasets.make_blobs.
The input data has a hundred sample points and two features, generated around three cluster centers.
We then fit the data to the k-means clustering model implemented in sklearn.
We set the initialization method to k-means++ instead of random and set k to 3.
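The original code listing is not reproduced on this page. The following is a minimal sketch consistent with the description above. The random_state values and n_init=10 are assumptions added for reproducibility and are not taken from the original.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 100 points, 2 features, 3 blob centers.
X, y_true = make_blobs(n_samples=100, n_features=2, centers=3, random_state=0)

# k-means++ initialization with k = 3.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.cluster_centers_)  # the three learned centroids, shape (3, 2)
print(kmeans.labels_)           # cluster index (0, 1 or 2) for each of the 100 points
```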
We can see the three cluster centers printed, and kmeans.labels_ gives the cluster that each of the hundred points is assigned to.
To visualize this better, the clusters can be plotted graphically.
We can see that the k-means algorithm did a very good job of clustering our dataset.
Thank you for reading along. I am Alicia Ghantiwala, a data science enthusiast and a software engineer at BNP Paribas.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.
The Dir() Function In Python: A Complete Guide (With Examples)
In Python, the dir() function returns a list of the attributes and methods that belong to an object. This can be useful for exploring and discovering the capabilities of an object, as well as for debugging and testing.
For example, let’s list the attributes associated with a Python string:
```python
# List the attributes and methods of a string
dir('Hello')
```
The result is a complete list of all the methods and attributes associated with the str data type in Python. If you’ve dealt with strings before, you might see methods that look familiar to you, such as split, join, or upper.
This is a comprehensive guide to what the dir() function in Python is. You will learn how to call the function on any object and, more importantly, how to analyze its output. You'll also learn when you're most likely to need the dir() function.
What Is the dir() Function in Python?
The dir() function in Python lists the attributes and methods of an object. It takes an object as an argument and returns a list of strings, which are the names of the object's attributes and methods. The dir() function is useful for inspecting objects to get a better understanding of what they do.
For example, you can use dir() to list the attributes of a built-in data type like a list or a dictionary, or you can use it on a custom class to see what it contains. You can also use dir() to explore a poorly documented module or library.
Syntax and Parameters
The syntax of the dir() function in Python is as follows:
```python
dir(object)
```
The dir() function takes a single, optional parameter:
object: This is the object or type whose attributes you're interested in. It can be any Python object, such as an instance of a built-in data type like a list or a dictionary, a user-defined class, or a module. If you call dir() without an argument, it returns the names in the current local scope.
The function returns an alphabetically sorted list of strings that represent the methods and attributes of that object.
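Two details from the parameter description are easy to check in practice: calling dir() with no argument returns the names in the current local scope, and the returned list is sorted. The function name `show_local_names` below is a hypothetical example.

```python
def show_local_names():
    a = 1
    b = "two"
    # Called without an argument, dir() returns the names in the local scope.
    return dir()


print(show_local_names())  # ['a', 'b']

# The list returned by dir() is sorted alphabetically.
names = dir("Hello")
print(names == sorted(names))  # True
```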
Examples
Let's call the dir() function on the string 'Hello':
```python
# List the attributes and methods of a string
print(dir('Hello'))
```
The above result is a list that has all the attributes and methods of a string object.
For example, the upper() method can be used to convert a string to uppercase, and the find() method can be used to search for a substring within a string. Both these methods belong to the string class and are thus present in the output of the dir() function call.
Speaking of the variety of string methods in Python, make sure to read my complete guide to Strings in Python.
Notice that you can call the dir() function on any object in Python, not just built-in types like strings. In other words, you can also list the attributes of your own custom classes. You'll find more examples of this later on.
What Are the Double Underscore Methods (E.g. ‘__add__’)?
In the previous example, you saw a bunch of names that start with __, such as __add__ or __class__.
The double underscore methods that appear in the output of the dir() function are called "magic methods" or "dunder methods" in Python.
These are special methods whose names are defined by the Python language itself. They are used to implement the built-in behavior of objects, and your own classes can define them too.
For example, the __len__() method is called when you use the len() function on an object, and it returns the length of the object. The __add__() method is called when you use the + operator on two objects, and it returns the result of the operation.
Dunder methods are usually not meant to be called directly. Instead, the Python interpreter invokes them automatically when certain operations are performed on an object.
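To see dunder methods in action, here is a small sketch of a hypothetical `Playlist` class that defines `__len__` and `__add__`. With these methods defined, len() and the + operator work on its instances, and both names show up in dir().

```python
class Playlist:
    """Hypothetical class used only to illustrate dunder methods."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # Invoked by len(playlist)
        return len(self.songs)

    def __add__(self, other):
        # Invoked by playlist + other_playlist
        return Playlist(self.songs + other.songs)


a = Playlist(["Intro", "Theme"])
b = Playlist(["Outro"])
combined = a + b          # Python calls a.__add__(b)
print(len(combined))      # 3, via combined.__len__()
print("__len__" in dir(combined), "__add__" in dir(combined))  # True True
```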
Anyway, let’s go back to the topic.
Using dir() to Inspect Classes and Instances
The dir() function is useful if you want to inspect the attributes of classes and their instances.
When you use dir() with a class, it lists the attributes defined in the class, including any attributes and methods inherited from the class's superclasses.
In the earlier examples, you called the dir() function on a Python string instance. But you can call it on any other class instance, including custom classes created by you. More importantly, you can call the dir() function directly on a class instead of an instance of it.
Here’s an example of using dir() to inspect the attributes and methods of a custom class you’ve just created:
```python
class MyClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def my_method(self):
        return self.x + self.y

print(dir(MyClass))
```
In this example, there's a custom class called MyClass that has an __init__() method and a my_method() method. Calling the dir() function returns a long list of strings, which are all the attributes of the custom class MyClass.
The very last name in the resulting list is 'my_method'. This is the method we defined in MyClass. But notice that the variables x and y aren't there. This is because you called the dir() function on the class, not on an instance of it. The instance variables x and y are only created when __init__() runs, that is, when an object is instantiated.
As another example, let’s call dir() on a class instance instead of a class:
```python
class MyClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def my_method(self):
        return self.x + self.y

my_instance = MyClass(1, 2)
print(dir(my_instance))
```
Output:
```text
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'my_method', 'x', 'y']
```
Again, toward the end of the list, you can see the attribute my_method. But this time, because you called dir() on an instance, you also see the variables x and y.
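A quick way to isolate the difference between the two calls is to subtract one set of names from the other. The names that exist only on the instance are the instance variables set in __init__().

```python
class MyClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def my_method(self):
        return self.x + self.y


my_instance = MyClass(1, 2)
# Names visible on the instance but not on the class itself.
instance_only = sorted(set(dir(my_instance)) - set(dir(MyClass)))
print(instance_only)  # ['x', 'y']
```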
When to Use the dir() Function?
The dir() function is useful in Python for finding out what attributes an object has. This is helpful when you are working with an object that you are not familiar with, or when you just want to see all of the available methods of an object.
Another common use for the dir() function is to find out what attributes and methods are available in a particular module or package.
For example, to see what attributes and methods are available in the math module, call dir() on it:
```python
import math
print(dir(math))
```
This prints out all the functions, constants, and other attributes available in the math module. It's helpful for quickly finding out what is available in a module, and it might save you time when you are working with a module that you are not familiar with. This is especially true if the module is poorly documented (unlike the math module, though).
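The raw output of dir() includes many dunder names, so a common next step is to filter it. The sketch below keeps only public names (those not starting with an underscore) and splits them into callables and non-callables with getattr() and callable(). This filtering convention is a practical choice and is not part of dir() itself.

```python
import math

# Drop private and dunder names such as '__doc__' and '__name__'.
public = [name for name in dir(math) if not name.startswith("_")]

# Separate functions (callable) from constants (not callable).
functions = [name for name in public if callable(getattr(math, name))]
constants = [name for name in public if not callable(getattr(math, name))]

print(constants)
print(len(functions), "functions")
```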
Summary
In conclusion, the dir() function is a valuable tool in Python for finding out what attributes and methods are available for a given object. It can be used to quickly explore new objects and modules, and can save time when you are working with an object or module that you are not familiar with.
Thanks for reading. Happy coding!
Your Simple Guide To Twitter #Hashtags
There’s no doubt Twitter has changed the world we live in.
From its launch in July 2006 to its current state 13 years later, the microblogging website has become the ultimate news source, outreach platform, meme supplier, political soapbox, and so, so much more to so many people.
One of the many results of Twitter and its growing popularity was the rise of the hashtag.
Twitter famously helped create the hashtag in 2007, when Chris Messina first used it, which changed not just Twitter, but all of social media – and much of the world around it – in a big way.
What Are Hashtags?
A hashtag is a keyword indexing tool: a keyword or phrase written without spaces and prefixed with the # (pound) symbol, used to refer to a specific topic, idea, or trend.
After debuting on Twitter thanks to Messina, hashtags flourished, first on Twitter, then on other social media platforms like Instagram, Facebook, and even business-oriented LinkedIn.
Hashtags have become a staple on most social media platforms and are embedded in the everyday fabric of social media.
And, thankfully, they’ve made categorization in a world of data overload easier than ever before.
How to Use Hashtags
Hashtags help categorize content among a plethora of information, making it easier than ever before to find and sort specific bits of information as they are published across Twitter.
Twitter has become a legitimate source of breaking news, official statements, campaign launches, and even jarring photos and videos that have led to arrests and accusations, as well as other unexpected, unprecedented, and unbelievable interactions.
When using hashtags – either ones that are already trending or new ones you are trying to kickstart for a specific reason, campaign, or idea – there are basic guidelines for using the right one, at the right time, with the right content. Following them limits the potential for unintentional blowback and, later, damage control.
Creating a new hashtag and hopping on an existing one are drastically different moves and need to be handled as such. But they’re both helpful and are skills all quality social media marketers (and Twitter users) should understand.
Creating Hashtags
Creating a hashtag can be tricky.
Like most “viral” content on the web, some of the strangest ones will find a way to break through the surface and become a multi-day Twitter trend.
Others will quickly fall by the wayside.
Even the best hashtags benefit from influencer piggybacking, overall timing, and general luck in becoming a common trend on Twitter.
In addition to those aspects, you should follow a few other rules when creating a new hashtag if you want it to catch on and become popular.
The three most important rules for creating hashtags are:
Keep It Simple
Keeping it simple is the most important aspect when it comes to creating a hashtag.
If it’s too complicated or elaborate, it will likely not catch on.
It also can’t be so vague that it’s impossible to separate it from other, unrelated hashtags with similar keywords or ideas.
Keep It Memorable
Clever hashtags tend to get legs more easily than ones that are not.
If it’s witty and easy to remember, not only will the hashtag likely catch on and be used, but it will also likely have a longer shelf life than a hashtag that is not that memorable.
Give It the ‘Common Sense Check’
This is just as critical as the first two rules for creating hashtags, if not more.
Does the hashtag you’re trying to create make sense?
Can it be confused with another topic or hashtag that has nothing to do with your goal?
Most of all, does it offend, confuse, or lean toward the idea that this isn’t the best hashtag for your unique messaging?
A simple common sense check should help you determine whether your newly developed hashtag is going to be a winner or a disaster waiting to happen.
Using Existing Hashtags
When using hashtags that others are already using on the platform, there are some important rules to consider as well, but they are a bit different from those for creating new hashtags.
The three most important rules for using existing hashtags are:
Research the Hashtag Before Adopting It
It may not mean what you think it means.
Your first step to ensuring it is the hashtag you’re looking for is to research it; look at other tweets using the hashtag and make sure they are in line with your thinking.
Too many times, users miss the mark with this one and adopt a hashtag that really means something completely different than what they intend.
Just ask DiGiorno’s Pizza about #WhyIStayed.
Make Sure It’s Relevant
Once you know what it means, make sure it makes sense to use for your messaging. Miss the mark and suffer the consequences.
Be Clever
Be sure to use your wit and personality and put your brand/personal spin on it.
Remember, the right hashtag has been used hundreds or thousands of times before you. This is the chance for you to stand out in a crowded room. Do it!
The most important thing to realize and remember is that if hashtags are used incorrectly, they can come back to hurt the brand.
Being associated with a poor user experience is a quick and easy way to lose followers, fans, and even customers.
Potential Hashtag Nightmares
Just like anything else on the internet, there are people who will try to manipulate the system to gain an edge while doing less than others.
When it comes to hashtags, lazy (and bad) marketers will piggyback on popular and trending hashtags to gain increased visibility, sometimes compromising the integrity of the hashtag if misleading tweets aren’t filtered out.
These piggybackers are rarely, if ever, rewarded. Brands that try it suffer the backlash of the public, and then of the history books (e.g., American Apparel's Hurricane Sandy Sale and other piggybacking disasters).
Like most things in the digital marketing realm, make sure what you’re doing is ethical and sensible. It’s unlikely you’d be penalized for that.
When to Use Hashtags
Hashtags have a time and place. They can be used in every tweet a brand publishes, but they don't need to be.
Be genuine in your messaging and use hashtags to help categorize information, not to manipulate or deceive. Customers will remember it and they know what they want.
Why Use Hashtags
Simply put, hashtags improve your messages' general visibility on Twitter (typically).
In addition to the increased organic visibility, hashtag users also tend to see increased engagement on the platform, increased brand awareness, and increased customer feedback, among other things, when effectively (and properly) using hashtags – all of which result in increased visibility.
Heap Sort Algorithm (With Code In Python And C++)
What Is the Heap Sort Algorithm?
Heap sort is a popular and efficient comparison-based sorting algorithm that runs in O(n log n) time. It's built on the binary heap, which is a complete binary tree. In a max heap, the maximum element is kept at the top of the tree (the root), and every parent node is greater than or equal to its child nodes.
Let's say the following array is given: data = [10, 5, 7, 9, 4, 11, 45, 17, 60].
In the array, if the node at index i (i = 0, 1, 2, 3, …) is a parent node, then its left and right children are at indices 2i+1 and 2i+2. Creating a complete binary tree from this array will look like this:
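Because the tree figure is not reproduced here, the sketch below prints the same parent-child structure directly from the index rule. The helper name `children` is hypothetical. Only indices 0 to 3 have children, which is why 4, 11, 45, 17, and 60 turn out to be the leaf nodes.

```python
data = [10, 5, 7, 9, 4, 11, 45, 17, 60]


def children(i, n):
    """Indices of the left and right children of node i (0-based), if they exist."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]


n = len(data)
for i in range(n):
    kids = [data[c] for c in children(i, n)]
    if kids:
        print(f"parent {data[i]} -> children {kids}")
# parent 10 -> children [5, 7]
# parent 5 -> children [9, 4]
# parent 7 -> children [11, 45]
# parent 9 -> children [17, 60]
```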
Initially, if we convert the array to a tree, it looks like the above. We can see that it doesn't maintain any heap property (min heap or max heap). To build a max heap, we run the heapify process on every parent (non-leaf) node. We start from the last parent node and move back toward the root at index 0. Once the heap is built, we get the sorted array by repeatedly extracting the maximum.
Application of Heap Sort
Here are some uses of heap sort and the heap data structure:
Priority queues are typically implemented with a heap. After each insertion or removal, a heap keeps the highest-priority (maximum or minimum) element at the root without keeping the whole collection sorted.
A heap is efficient for finding the kth largest element in a given array.
The Linux kernel uses heap sort as its default sorting algorithm, partly because it needs only O(1) extra space.
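As a sketch of the kth-largest use case, Python's standard heapq module (a min heap) can keep the k largest values seen so far. The root of that small heap is then the kth largest value overall. This runs in O(n log k) time with O(k) extra space. The function name is hypothetical, and heapq.nlargest(k, values)[-1] would give the same answer.

```python
import heapq


def kth_largest(values, k):
    """Return the kth largest value using a min heap of size k."""
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # Replace the smallest of the current top-k with v.
            heapq.heapreplace(heap, v)
    return heap[0]


data = [10, 5, 7, 9, 4, 11, 45, 17, 60]
print(kth_largest(data, 3))  # 17
```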
Create Heap Sort with Example
Here, we will construct a max heap from the following complete binary tree.
The leaf nodes are 17, 60, 4, 11, and 45. They don’t have any child nodes. That is why they are leaf nodes. So, we will start the heapify method from their parent node. Here are the steps:
Step 1) Select the left-most sub-tree on the lowest parent level (the one rooted at the last parent node). If a child node is greater than the parent, swap the parent with its larger child.
Here the parent node is 9, and the child nodes are 17 and 60. As 60 is the largest, 60 and 9 are swapped to maintain the max heap property.
Step 2) The left-most subtree is now heapified. The next parent node is 7. This parent has two child nodes, 11 and 45, and the largest is 45. So, 45 and 7 are swapped.
Step 3) Nodes 60 and 4 have the parent node 5. As 5 is smaller than the child node 60, they are swapped.
Step 4) Now, node 5 has the child nodes 17 and 9. This violates the max heap property, so 5 is swapped with 17.
Step 5) Node 10 is swapped with 60 (the larger of its children, 60 and 45), and then with 17. The process will look like the following.
Step 6) After step 5, we have a max heap: every parent node is larger than its child nodes, and the root node has the maximum value (60).
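Steps 1 to 5 can be reproduced with a short bottom-up construction. The sketch visits parent indices 3, 2, 1, 0 (nodes 9, 7, 5, 10) in that order and sifts each one down, which performs exactly the swaps described above. The function names are hypothetical, and this sift-down is written iteratively rather than recursively.

```python
def sift_down(heap, n, i):
    """Move heap[i] down until it is not smaller than its children (max heap)."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def build_max_heap(values):
    heap = list(values)
    # Start at the last parent node and walk back to the root (index 0).
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, len(heap), i)
    return heap


print(build_max_heap([10, 5, 7, 9, 4, 11, 45, 17, 60]))
# [60, 17, 45, 10, 4, 11, 7, 5, 9]
```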
Note: To create the sorted array, we repeatedly remove the maximum-valued node (the root), replace it with the last node of the heap, and heapify again.
This process is called “extract max”. As 60 is the max node, we take it out of the heap (in the in-place version, by swapping it with the last element of the array) and rebuild the heap without node 60.
Step 7) With 60 removed, the next maximum value, 45, rises to the root. We perform “Extract Max” again, this time on node 45.
This time we extract 45, and after heapifying, 17 becomes the new root.
We need to perform “Extract Max” until all the elements are sorted.
After doing these steps until we extract all the max values, we will get the following array.
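The order in which repeated “extract max” removes the values can be checked with Python's heapq module. heapq only provides a min heap, so this sketch stores negated values to simulate a max heap. That is a different mechanism from the in-place swapping described above, but it yields the same extraction order. The function name is hypothetical.

```python
import heapq


def extract_max_order(values):
    """Return values in the order repeated extract-max would remove them."""
    # heapq is a min heap, so negate values to make the largest come out first.
    heap = [-v for v in values]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(-heapq.heappop(heap))  # "extract max"
    return order


print(extract_max_order([10, 5, 7, 9, 4, 11, 45, 17, 60]))
# [60, 45, 17, 11, 10, 9, 7, 5, 4]
```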
What Is a Binary Heap?
A binary heap is a complete binary tree data structure in which every parent node is either greater than or equal to, or less than or equal to, its child nodes. If each parent node is smaller than or equal to its children, the heap is called a “Min Heap”. If each parent node is greater than or equal to its children, the heap is called a “Max Heap”.
Here are examples of a min heap and a max heap.
Min Heap and Max Heap
In the figure above, notice that in the “Min Heap” each parent node is smaller than its child nodes. At the root of the tree, we find the smallest value, 10.
Similarly, for the “Max Heap”, the parent node is always larger than the child nodes. The maximum element is present at the head node for the “Max Heap”.
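The min heap and max heap definitions translate directly into a check over every parent index i and its children 2i+1 and 2i+2. Below is a small sketch with hypothetical helper names. The example arrays are the data array from the start of this section and the max heap that results from steps 1 to 5 above.

```python
import operator


def satisfies_heap_property(arr, compare):
    """True if compare(parent, child) holds for every parent-child pair."""
    n = len(arr)
    return all(
        compare(arr[i], arr[c])
        for i in range(n)
        for c in (2 * i + 1, 2 * i + 2)
        if c < n
    )


def is_max_heap(arr):
    return satisfies_heap_property(arr, operator.ge)


def is_min_heap(arr):
    return satisfies_heap_property(arr, operator.le)


print(is_max_heap([60, 17, 45, 10, 4, 11, 7, 5, 9]))  # True
print(is_max_heap([10, 5, 7, 9, 4, 11, 45, 17, 60]))  # False
print(is_min_heap([4, 5, 7, 9, 10, 11, 17, 45, 60]))  # True
```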
What Is “Heapify”?
“Heapify” is the process that moves each node into the right position so that the heap property holds. In a max heap, the parent-child relationship must always be maintained: each parent node is larger than (or equal to) its child nodes.
For example, if a new node is added, we may need to reshape the heap by swapping nodes or otherwise rearranging the array. This process of reshaping a heap is called “heapify”.
Here is an example of how heapify works:
Adding a new node and heapify
Here are the steps for heapify:
Step 1) Added node 65 as the right child of node 60.
Step 2) Check if the newly added node is greater than the parent.
Step 3) As it’s greater than the parent node, we swapped the right child with its parent.
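The insertion steps above (append the new node, then swap it with its parent while it is larger) are often called “sift up”. The heap in the figure is not reproduced here, so the sketch instead inserts 65 into the max heap produced by steps 1 to 5 earlier. There, 65 also rises all the way to the root. The function name `heap_insert` is hypothetical.

```python
def heap_insert(heap, value):
    """Append value to a max heap stored in a list, then sift it up."""
    heap.append(value)
    i = len(heap) - 1
    # Swap with the parent while the new node is larger than its parent.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] <= heap[parent]:
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


heap = [60, 17, 45, 10, 4, 11, 7, 5, 9]
heap_insert(heap, 65)
print(heap)  # [65, 60, 45, 10, 17, 11, 7, 5, 9, 4]
```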
How to Build the Heap
Before building the heap or heapifying a tree, we need to know how we will store it. As the heap is a complete binary tree, it's convenient to use an array to hold the data of the heap.
Let's say an array contains a total of n elements. If the node at index i is a parent node, then its left child is at index 2i+1 and its right child is at index 2i+2. We are assuming that the array index begins from 0.
Using this, let's store a max heap in an array as follows:
Array-based representation of the max heap.
The heapify algorithm maintains the heap property. If the parent does not hold the extreme value (the largest in a max heap, the smallest in a min heap), it is swapped with its most extreme child node.
Here are the steps to heapify a max heap:
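Step 1) Start from the last parent node (the parent of the last leaf node).
Step 2) Find the maximum among the parent and its children.
Step 3) Swap the nodes if a child node has a larger value than the parent, and continue checking downward from the swapped position.
Step 4) Move to the previous parent node, going up the tree.
Step 5) Repeat steps 2, 3, and 4 until we have processed index 0 (the root).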
Here’s the pseudo-code for recursive heapify (max heap):
```text
heapify(array, n, i):              // n = number of elements in the heap
    largest = i
    left = 2*i + 1
    right = 2*i + 2
    if left < n and array[largest] < array[left]:
        largest = left
    if right < n and array[largest] < array[right]:
        largest = right
    if largest != i:
        swap(array[i], array[largest])
        heapify(array, n, largest)
```
Pseudo Code for Heap Sort
Here's the pseudo-code for the heap sort algorithm:
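```text
Heapify(numbers, n, i):
    largest = i
    left = 2*i + 1
    right = 2*i + 2
    if left < n and numbers[left] > numbers[largest]:
        largest = left
    if right < n and numbers[right] > numbers[largest]:
        largest = right
    if largest != i:
        swap(numbers[i], numbers[largest])
        Heapify(numbers, n, largest)

HeapSort(numbers):
    n = numbers.size()
    for i from n/2 - 1 down to 0:      // build the max heap
        Heapify(numbers, n, i)
    for i from n - 1 down to 1:        // move the max to the end, shrink the heap
        swap(numbers[0], numbers[i])
        Heapify(numbers, i, 0)
```
Example of Heap Sort Code in C++
Note that, unlike the max-heap pseudo-code above, the comparisons in this implementation keep the smaller element on top (a min heap). Each extraction therefore moves the current minimum to the end of the array, which is why the output is in descending order.
```cpp
#include <iostream>
#include <utility>
using namespace std;

// Print the array elements separated by tabs.
void display(int arr[], int n) {
    for (int i = 0; i < n; i++) {
        cout << arr[i] << "\t";
    }
    cout << endl;
}

// Sift numbers[i] down within the first n elements (min heap ordering).
void heapify(int numbers[], int n, int i) {
    int smallest = i;
    int left = 2 * i + 1;
    int right = 2 * i + 2;
    if (left < n && numbers[left] < numbers[smallest]) {
        smallest = left;
    }
    if (right < n && numbers[right] < numbers[smallest]) {
        smallest = right;
    }
    if (smallest != i) {
        swap(numbers[i], numbers[smallest]);
        heapify(numbers, n, smallest);
    }
}

void heapSort(int numbers[], int n) {
    // Build the heap, starting from the last parent node.
    for (int i = n / 2 - 1; i >= 0; i--) {
        heapify(numbers, n, i);
    }
    // Move the root to the end and restore the heap on the remaining part.
    for (int i = n - 1; i > 0; i--) {
        swap(numbers[0], numbers[i]);
        heapify(numbers, i, 0);
    }
}

int main() {
    int numbers[] = {10, 5, 7, 9, 4, 11, 45, 17, 60};
    int size = sizeof(numbers) / sizeof(numbers[0]);
    cout << "Initial Array:\t";
    display(numbers, size);
    heapSort(numbers, size);
    cout << "Sorted Array (descending order):\t";
    display(numbers, size);
    return 0;
}
```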
Output:
```text
Initial Array: 10 5 7 9 4 11 45 17 60
Sorted Array (descending order): 60 45 17 11 10 9 7 5 4
```
Example of Heap Sort Code in Python
Like the C++ version, this code keeps the smaller element on top (a min heap), so the list ends up in descending order.
```python
def display(arr):
    # Print the elements separated by tabs.
    for i in range(len(arr)):
        print(arr[i], end="\t")
    print()


def heapify(numbers, n, i):
    # Sift numbers[i] down within the first n elements (min heap ordering).
    smallest = i
    left = 2 * i + 1
    right = 2 * i + 2
    if left < n and numbers[left] < numbers[smallest]:
        smallest = left
    if right < n and numbers[right] < numbers[smallest]:
        smallest = right
    if smallest != i:
        numbers[i], numbers[smallest] = numbers[smallest], numbers[i]
        heapify(numbers, n, smallest)


def heapSort(items, n):
    # Build the heap from the last parent node back to the root.
    for i in range(n // 2 - 1, -1, -1):
        heapify(items, n, i)
    # Move the root to the end and re-heapify the shrunken heap.
    for i in range(n - 1, 0, -1):
        items[0], items[i] = items[i], items[0]
        heapify(items, i, 0)


numbers = [10, 5, 7, 9, 4, 11, 45, 17, 60]
print("Initial List:\t", end="")
display(numbers)
print("After HeapSort:\t", end="")
heapSort(numbers, len(numbers))
display(numbers)
```
Output:
```text
Initial List: 10 5 7 9 4 11 45 17 60
After HeapSort: 60 45 17 11 10 9 7 5 4
```
Time and Space Complexity Analysis of Heap Sort
We can analyze both the time complexity and the space complexity of heap sort. For time complexity, we consider the following cases:
Best Case
Average Case
Worst Case
The heap is implemented as a complete binary tree, so the bottom level holds the most nodes. Each level holds roughly half as many nodes as the level below it: if the bottom level has m nodes, the level above it has about m/2 nodes.
In this example, level 3 has four items, level 2 has two items, and level 1 has one item. If there are n items in total, the height of the tree is about log2(n). So, inserting a single element takes at most about log2(n) swaps.
When we want the maximum value from the heap, we simply take the root node, which takes O(1) time. We then run heapify again to restore the heap, which takes O(log n) time, so each extraction costs O(log n) overall.
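Best Case Time Complexity for Heap Sort Algorithm
Building the heap with the bottom-up heapify procedure takes O(n) time for any input. If the array is already in heap order (for example, sorted in descending order for a max heap), each heapify call during construction finishes in O(1) time.
However, the extraction phase still performs n − 1 extractions. For distinct elements, each extraction costs O(log n) even in the best case, so the overall best-case time complexity of heap sort is O(n log n). Only in special cases, such as when all elements are equal, does the whole sort finish in O(n) time.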
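The claim that bottom-up heap construction takes O(n) time for any input follows from summing the heapify cost level by level. A complete binary tree with n nodes has at most ⌈n / 2^(h+1)⌉ nodes at height h, and heapify from such a node costs at most c·h for some constant c. This is the standard textbook bound, sketched here using the identity that the sum of h/2^h over all h ≥ 0 equals 2:

$$
T_{\text{build}}(n) \;\le\; \sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil c\,h
\;=\; O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
\;=\; O(2n) \;=\; O(n)
$$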
Average Case Time Complexity for Heap Sort Algorithm
Inserting an item or extracting the maximum costs O(log n) time. Since heap sort extracts all n elements, the average-case time complexity of heap sort is O(n log n).
Worst Case Time Complexity for Heap Sort Algorithm
As in the average case, in the worst-case scenario we perform heapify about n times, and each heapify costs O(log n) time. So, the worst-case time complexity is O(n log n).
Space Complexity for Heap Sort Algorithm
Heap sort is an in-place algorithm: it needs no extra array or temporary list to sort the data. In the implementations above, we used swap() to exchange nodes, and no other list or array was needed. Strictly speaking, the recursive heapify uses O(log n) call-stack space. With an iterative heapify, the auxiliary space is O(1).
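To make the O(1) auxiliary space concrete, here is a sketch of an ascending-order heap sort that uses a max heap (as in the walkthrough) with an iterative sift-down instead of recursion. It only rearranges elements of the input list, so no extra list, array, or recursion stack is used. The function name `heap_sort` is hypothetical.

```python
def heap_sort(a):
    """Sort list a in place in ascending order using a max heap."""
    n = len(a)

    def sift_down(i, size):
        # Iterative sift-down: a loop instead of recursion, so O(1) extra space.
        while True:
            largest, left, right = i, 2 * i + 1, 2 * i + 2
            if left < size and a[left] > a[largest]:
                largest = left
            if right < size and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):  # build the max heap
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # move the max into a[end], shrink heap
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)


nums = [10, 5, 7, 9, 4, 11, 45, 17, 60]
heap_sort(nums)
print(nums)  # [4, 5, 7, 9, 10, 11, 17, 45, 60]
```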
Clustering In Power Bi And Python: How It Works
Below are two visuals with clusters created in Power BI. The one on the left is a table and the other on the right is a scatter plot.
Our scatter plot uses two-dimensional clustering, which creates clusters from two fields of the shopping data set (which consists of customer ID, annual income, age, and spending score). Meanwhile, our table uses multi-dimensional clustering, which uses all the fields.
To demonstrate how it works, I will need to eliminate the clusters so we can start with each visual from scratch. Once you create these clusters in Power BI, they become available as little parameters or dimensions in your data set.
We’ll delete the multi-dimensional clusters using this process and then get our table and scatter plot back, starting with the latter.
If we choose Age and Spending Score from our data set, Power BI will automatically summarize them into two dimensions inside our scatter plot.
If we add our Customer ID to our Values by dragging it from the Fields section to the Values section, we will get that scatter plot back, just like in the image below.
In the Clusters window, we can enter a Name for our clusters, select a Field, write a Description, and choose the Number of Clusters.
We will name our clusters Shopping Score Age, select CustomerID for the field, and input Clusters for CustomerID as a description. We’ll then set the number of clusters to Auto.
The current dimensions in our table, which you can find in the column headers, are Customer ID, Annual Income, Age, and Spending Score. A dimension we didn’t bring in is Gender.
Let’s bring this dimension into our table and scatter plot by dragging it from the Fields section to the Values section, as we did when we added our Customer ID.
As you can see above, we now have a Gender dimension that indicates whether the person is Male or Female. However, if we go to Automatically find clusters to create a cluster for this dimension, it will result in a “Number of fields exceeded” response.
There are two things we can do to get around this roadblock. We can turn the values Male and Female into 0 and 1, giving them numerical values, or we can remove the Gender field. However, removing it means that this dimension will no longer be part of our clustering consideration.
Let's try the second method and remove Gender by unselecting or unchecking it in the Fields section. We then go to the ellipsis (...) menu and select Automatically find clusters.
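If you prepare the data in Python instead, the first option (turning Male and Female into 0 and 1) is a one-line mapping with pandas. The rows below are hypothetical sample values shaped like the shopping data set, not the data used in this article, and which label maps to 0 or 1 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical rows shaped like the shopping data set.
df = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3],
        "Gender": ["Male", "Female", "Female"],
        "Age": [19, 21, 20],
    }
)

# Encode Gender numerically so it can be used as a clustering feature.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})
print(df)
```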
Now let's move on to clustering with Python, where we'll run a script over the data and create a new data set. We'll be using an unsupervised machine-learning model that will give you similar results for your multi-dimensional clustering. I will also show you how to try different algorithms and tweak them along the way.
We first need to open a command prompt (in this example, the Anaconda Prompt) and install a package called PyCaret. After opening the Anaconda Prompt, we enter pip install pycaret to install the package.
We'll put the machine learning algorithm into our Python script using a few lines of code. We start by entering from pycaret.clustering import * to import the package. In the next line, we type dataset = get_clusters(), which calls the function get_clusters and replaces the data set with its result.
We want the function to work on our data set, so we pass the data set as an argument inside the parentheses. Next, we add our model, K-Means, and set the number of clusters for the model.
Before we run our Python script, let me show you the different models available in PyCaret. We’re using K-Means, which follows the same center-point logic described earlier. PyCaret also offers kmodes, a similar type of clustering designed for categorical data.
The other clustering models suit different needs, and several are more flexible because they don’t assume blob-shaped clusters. If you have a different data set and feel that your Power BI model isn’t working, you can try any of these models by specifying the one you want in the Python script.
Now we can run our Python script using the K-Means unsupervised machine learning algorithm. Each time the script runs on new data, K-Means recomputes its center points, so the clusters keep up with your data.
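To make the center-point idea concrete, here is a minimal sketch of Lloyd’s algorithm, the classic heuristic behind K-means, in plain Python. It is an illustration under simple assumptions (2-D points, Euclidean distance, a fixed random seed), not PyCaret’s implementation; kmeans and the sample points are hypothetical. Each pass assigns every point to its nearest center and then moves each center to the mean of its points, stopping once no center moves.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; return (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen input points
    labels = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


# Made-up (income, spending score) pairs forming two loose groups.
data = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centers, labels = kmeans(data, k=2)
print(sorted(centers))  # [(1.5, 1.5), (8.5, 8.5)]
print(labels)
```

Running the function again on an updated list of points recomputes the centers from scratch, which is what re-running the script on refreshed data amounts to.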
Python allows you to assign better names for your clusters to make them more digestible to your users, a feature absent when clustering in Power BI.
Python Post Request: A Concise Guide To Implementing And Optimizing
Python is a versatile, widely used programming language with an extensive ecosystem of libraries for various tasks. One such library is Requests, an elegant, simple, and user-friendly third-party HTTP library that can send POST requests, which are how clients share data with servers for processing.
A POST request is an HTTP method used to send data to a server. It’s typically used when you need to submit form data or upload a file. In Python, you can send POST requests with the Requests library’s post() method, which sends data to a specified URL. You can send data in various formats, including dictionaries, lists of tuples, bytes, or file objects.
Understanding Python POST requests is fundamental if you’re a web developer or programmer working with web APIs (application programming interfaces). By learning how to send data using the Requests library, you can maximize the potential of Python while simplifying and streamlining the code.
This article is a comprehensive guide for developers looking to master the use of POST requests in Python. We’ll start with the basics, exploring what a POST request is and how it interacts with servers or APIs. We’ll then take a deep dive into practical applications, discussing how to correctly implement POST requests using Python’s Requests library.
Let’s dive in!
Before writing code with the Requests module, let’s quickly go over the protocol basics you should be aware of when sending POST requests.
HTTP (Hypertext Transfer Protocol) is the protocol that defines how clients and servers exchange requests and responses.
Two common HTTP methods are GET and POST requests.
HTTP GET Request: Used for retrieving data from a server. Example: Fetching a webpage.
HTTP POST Request: Used for sending data to a server. Example: Submitting a form.
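The practical difference is where the data travels. The sketch below uses the Requests library (installation is covered next) to build, but not send, one GET and one POST request via Request(...).prepare(), then prints the resulting URL and body. The URL and payload are hypothetical placeholders.

```python
import requests

# Hypothetical endpoint; nothing is sent, the requests are only prepared.
URL = "https://example.com/search"
payload = {"q": "clustering", "page": "2"}

# GET: the data travels in the URL's query string, and there is no body.
get_req = requests.Request("GET", URL, params=payload).prepare()
print(get_req.method, get_req.url)  # GET https://example.com/search?q=clustering&page=2
print(get_req.body)                 # None

# POST: the same data travels in the request body instead of the URL.
post_req = requests.Request("POST", URL, data=payload).prepare()
print(post_req.method, post_req.url)     # POST https://example.com/search
print(post_req.body)                     # q=clustering&page=2
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
```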
In Python, the Requests library is used for making HTTP requests. It abstracts the complexities of making requests behind a simple API, allowing you to focus on interacting with services and consuming data in your application.
To install the Requests library, use the following command in your terminal:
pip install requests
Once installed, you can use the library to make POST requests. Here’s a simple example:
import requests
url = "https://example.com/api"  # placeholder endpoint; replace with your own
data = {'key': 'value'}
headers = {'Content-Type': 'application/json'}
response = requests.post(url, json=data, headers=headers)
print(response.status_code)
print(response.json())
In this example, we import the requests library, define the url we want to make the POST request to, and create a data dictionary to send as the request body.
We also define a custom headers dictionary that sets the Content-Type to JSON; passing json= already sets this header automatically, so the explicit dictionary is optional here. Finally, we make the request using the requests.post() method, passing the url, json, and headers. We then print the response status code and JSON content. Note that response.json() raises an error if the response body isn’t valid JSON.
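The example above needs a live endpoint behind url. To try the full request/response cycle offline, the sketch below starts a throwaway server from Python’s standard-library http.server module and posts to it. EchoHandler is a hypothetical stand-in for a real API that echoes back the JSON it receives. The sketch uses a Session with trust_env = False only so that proxy environment variables can’t reroute the local call; session.post() accepts the same arguments as requests.post().

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in API: answers a POST with the JSON it received."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received = json.loads(self.rfile.read(length) or b"{}")
        reply = json.dumps({"received": received}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS pick a free port, so the sketch runs offline.
server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/post"

with requests.Session() as session:
    session.trust_env = False  # ignore any *_PROXY variables for this local call
    # json= serializes the dict and sets Content-Type: application/json.
    response = session.post(url, json={"key": "value"}, timeout=5)

print(response.status_code)              # 200
print(response.headers["Content-Type"])  # application/json
print(response.json())                   # {'received': {'key': 'value'}}

server.shutdown()
server.server_close()
```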
When working with POST requests in Python, you need to consider the following aspects before writing your request code:
HTTP methods (POST, GET)
URLs and endpoints
Constructing request data (e.g., dictionaries, JSON)
Custom headers and connection settings
Working with response data (e.g., status codes, content)
Once you have these pieces defined, you can go ahead and use the Requests library to send or retrieve data.
To send a POST request using the Requests library, you’ll first need to import the requests module.
You can do this using the following code:
import requests
Once you have imported the requests library, you can use the requests.post method to send a POST request.
The method accepts various parameters, such as the URL and the data you want to send. Here’s an example of using the requests.post method:
url = "https://example.com/api"  # placeholder endpoint; replace with your own
data = {"key": "value"}
response = requests.post(url, data=data)
In this example, we’re sending a POST request to the specified url with the data as our payload.
The requests.post method returns a response object containing information about the server’s response to our request.
When working with the response object, you might want to check for the status code, headers, and the content of the response.
You can access these properties using the following attributes:
response.status_code: This attribute provides the HTTP status code returned by the server.
response.headers: This attribute provides a dictionary containing the HTTP headers sent by the server.
response.text or response.content: These attributes provide the content of the response, either as a text string or as binary data, respectively.
Here’s an example of how to access these attributes for the response object:
print("Status code:", response.status_code)
print("Headers:", response.headers)
print("Content:", response.text)
In addition, the response object provides useful methods for handling JSON responses, such as the response.json() method, which parses the JSON content and returns a Python object.
For example:
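if response.headers.get("Content-Type", "").startswith("application/json"):
    print("JSON data:", response.json())
Using startswith() also matches values such as application/json; charset=utf-8, and .get() avoids a KeyError when the header is missing. When the check passes, response.json() returns the parsed Python object, which is then printed to the console.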
In this section, we’ll discuss how to work with data and parameters when making POST requests using Python’s Requests library.
Specifically, we’ll discuss the following:
JSON Data and Dictionaries
URL Encoded Form Data
Multipart File Uploads
Let’s get into it!
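Before going through each format, note that a quick way to see how they differ is to prepare a request without sending it and inspect its body and Content-Type header. In this sketch, URL and the payloads are hypothetical, and nothing goes over the network.

```python
import json

import requests

URL = "https://example.com/api"  # hypothetical endpoint; nothing is sent
data = {"key1": "value1", "key2": "value2"}

# json= serializes the dict to a JSON body and sets the JSON content type.
as_json = requests.Request("POST", URL, json=data).prepare()
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body))         # {'key1': 'value1', 'key2': 'value2'}

# data= with the same dict produces a URL-encoded form body instead.
as_form = requests.Request("POST", URL, data=data).prepare()
print(as_form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(as_form.body)                     # key1=value1&key2=value2

# A list of tuples also goes through data= and, unlike a dict, can repeat a key.
repeated = requests.Request("POST", URL, data=[("tag", "a"), ("tag", "b")]).prepare()
print(repeated.body)                    # tag=a&tag=b
```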
When making a POST request with JSON data, you can use the json parameter to pass a Python dictionary object.
Requests will automatically encode the dictionary as JSON and set the correct Content-Type header.
For example:
import requests
data = {
    "key1": "value1",
    "key2": "value2"
}
response = requests.post(url, json=data)
If you need to send JSON data as a string, you can use json.dumps along with the data= parameter and set the Content-Type header manually.
For example:
import requests
import json
data = {
    "key1": "value1",
    "key2": "value2"
}
headers = {"Content-Type": "application/json"}
response = requests.post(url, data=json.dumps(data), headers=headers)
When sending data as application/x-www-form-urlencoded, you can pass a dictionary, a list of tuples, or bytes using the data= parameter.
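For dictionaries and lists of tuples, Requests will encode the data and set the Content-Type header to application/x-www-form-urlencoded (raw bytes are sent as-is, without that header). For example: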
import requests
data = {
    "key1": "value1",
    "key2": "value2"
}
response = requests.post(url, data=data)
Alternatively, you can pass a list of tuples as well:
data = [("key1", "value1"), ("key2", "value2")]
response = requests.post(url, data=data)
To upload files using a multipart POST request, pass a dictionary of file objects (or (filename, file object) tuples) using the files= parameter.
Requests will set the Content-Type header to multipart/form-data, along with the boundary string that separates the parts.
For example:
import requests
with open("filename.txt", "rb") as f:  # the with block closes the file afterwards
    files = {"file": ("filename.txt", f)}
    response = requests.post(url, files=files)
If you need to send additional data along with the file, use the data= parameter:
data = {
    "key1": "value1",
    "key2": "value2"
}
with open("filename.txt", "rb") as f:
    files = {"file": ("filename.txt", f)}
    response = requests.post(url, data=data, files=files)
Beyond the request body, Requests provides options that make HTTP requests more flexible in other scenarios, such as handling timeouts and redirections, as well as working with proxies and certificates.
When making a POST request, you may need to set a timeout value, which determines how long the request should wait before giving up.
To set a timeout, you can use the timeout argument with the desired number of seconds:
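import requests
data = {"key": "value"}
response = requests.post(url, data=data, timeout=5)
In the example above, Requests raises a requests.exceptions.Timeout exception if it can’t connect to the server, or stops receiving data from it, for 5 seconds. The timeout is not a limit on the total download time. If you don’t set a timeout, Requests waits indefinitely, so a request might hang and cause issues in your application.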
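A timeout is only useful if your code expects the exception it raises. The sketch below wraps the call and returns None on requests.exceptions.Timeout; post_with_timeout and SlowHandler are hypothetical names, and SlowHandler is a local test server that deliberately answers too late so the timeout can be observed offline. Requests also accepts a (connect, read) tuple such as timeout=(3.05, 27) when you want separate limits for connecting and for waiting on data.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that waits 1 second before answering."""

    def do_POST(self):
        time.sleep(1)
        try:
            self.send_response(200)
            self.end_headers()
        except OSError:
            pass  # the client may already have given up and closed the socket

    def log_message(self, *args):
        pass


def post_with_timeout(session, url, data, seconds):
    """Return the response, or None if the server did not answer in time."""
    try:
        return session.post(url, data=data, timeout=seconds)
    except requests.exceptions.Timeout:
        return None


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

with requests.Session() as session:
    session.trust_env = False  # reach the local server directly, not via a proxy
    result = post_with_timeout(session, url, {"key": "value"}, seconds=0.2)

print("timed out" if result is None else result.status_code)  # timed out
server.shutdown()
server.server_close()
```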
Redirections, on the other hand, occur when the server directs the client to a new URL. By default, Requests follows redirects for every method except HEAD. However, you can disable this by setting the allow_redirects parameter to False:
response = requests.post(url, data=data, allow_redirects=False)
If your application needs to make requests via a proxy server, you can specify the proxy using the proxies parameter as shown below:
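proxies = {"https": "http://proxy.example.com:8080"}  # placeholder proxy address
response = requests.post(url, data=data, proxies=proxies)
In the example above, the proxies parameter contains a dictionary that maps the https scheme to the proxy URL used for HTTPS requests.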
Additionally, you may need to verify the server’s TLS certificate to ensure a secure connection. Requests verifies the server’s certificate by default.
However, you can disable this verification by setting the verify parameter to False like the following:
response = requests.post(url, data=data, verify=False)
Disabling certificate verification is not recommended, as it may expose your application to security risks.
Instead, if the server uses a self-signed or custom certificate, pass the path to that certificate (or to the CA bundle that signed it) through the verify parameter, like the following:
tls_certificate = "path/to/certificate.pem"
response = requests.post(url, data=data, verify=tls_certificate)
In this example, Requests uses the certificate file at tls_certificate when verifying the server’s identity. The cert parameter serves a different purpose: it supplies a client-side certificate (a single file, or a (cert, key) tuple of file paths) when the server requires the client to authenticate itself.
In the previous sections, you learned how to make a POST request in Python. However, it’s common to run into errors when making POST requests.
Therefore, in this section, we’ll go over some common errors that you might encounter when performing POST requests using Python.
When working with the Python requests library, it’s important that you handle errors and debug your application. One of the key aspects to consider is working with response status codes.
Whenever a request is made to a server, it returns a status code that indicates the success or failure of the request.
The requests library provides an attribute called .status_code that allows you to access the status code of a response.
For example:
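import requests
url = "https://example.com/api"  # placeholder endpoint; replace with your own
response = requests.post(url, data={"key": "value"})
print(response.status_code)
The above code will print the status code to the console.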
To ensure your application handles errors appropriately, you can check whether a request succeeded using an if statement and the .ok attribute.
This attribute returns a boolean value: True when the status code is below 400, and False for client (4xx) and server (5xx) errors.
The code below lets you handle the status code:
if response.ok:
    print("Request was successful")
else:
    print("Request failed")
It is good practice to handle specific status codes using conditional statements or by defining custom exceptions in your application.
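Instead of checking .ok by hand, you can ask Requests to raise an exception for error responses: response.raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx status codes. The sketch below combines it with a catch-all for other requests.exceptions.RequestException errors such as connection failures. submit and NotFoundHandler are hypothetical names, and the local server, which always replies 404, exists only so the example runs offline.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that answers every POST with 404 Not Found."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # drain the body
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


def submit(session, url, data):
    """POST the data and describe the outcome instead of crashing."""
    try:
        response = session.post(url, data=data, timeout=5)
        response.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    except requests.exceptions.HTTPError as err:
        return f"HTTP error: {err.response.status_code}"
    except requests.exceptions.RequestException as err:
        return f"Request failed: {err}"  # connection errors, timeouts, etc.
    return "Request was successful"


server = ThreadingHTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with requests.Session() as session:
    session.trust_env = False  # reach the local server directly, not via a proxy
    outcome = submit(session, f"http://127.0.0.1:{server.server_port}/missing", {"key": "value"})

print(outcome)  # HTTP error: 404
server.shutdown()
server.server_close()
```

Catching HTTPError before RequestException matters, because HTTPError is a subclass of RequestException and would otherwise be swallowed by the broader handler.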
To debug issues in your application, it’s helpful to inspect the request and response objects returned by the Python requests library.
You can use various attributes and methods to get more information about the request and response objects.
Following are some of the attributes of Python POST request and response objects that you should be familiar with:
Request object:
request.headers: Provides a dictionary of request headers.
request.url: Returns the URL of the request.
request.method: Indicates the HTTP method used for the request (e.g., ‘POST’).
Response object:
response.headers: Returns a dictionary of response headers.
response.content: Provides the response content as bytes.
response.text: Returns the response content as a string.
The following example demonstrates how you can inspect request and response objects:
import requests
url = "https://example.com/api"  # placeholder endpoint; replace with your own
response = requests.post(url, data={"key": "value"})
# Inspect request object
print("Request headers:", response.request.headers)
print("Request URL:", response.request.url)
print("Request method:", response.request.method)
# Inspect response object
print("Response headers:", response.headers)
print("Response content:", response.content)
print("Response text:", response.text)
By inspecting the request and response objects, you can debug and handle errors in your application using the Python requests library.
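Printing attributes by hand gets tedious once an application makes many requests. Requests supports event hooks: a function registered under the "response" key is called with every response, and response.request holds the prepared request that was actually sent. The sketch below logs both sides of each exchange made through a Session; log_exchange, exchange_log, and CreatedHandler are hypothetical names, and the local server only makes the example runnable offline.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

exchange_log = []  # hypothetical in-memory debug log


def log_exchange(response, *args, **kwargs):
    """Response hook: record what was sent and what came back."""
    sent = response.request  # the PreparedRequest that went over the wire
    exchange_log.append(f"{sent.method} {sent.url} -> {response.status_code}")
    exchange_log.append(f"request body: {sent.body!r}")


class CreatedHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that accepts any POST with 201 Created."""

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # drain the body
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), CreatedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with requests.Session() as session:
    session.trust_env = False  # reach the local server directly, not via a proxy
    session.hooks["response"].append(log_exchange)  # runs for every response
    session.post(f"http://127.0.0.1:{server.server_port}/items", data={"name": "demo"}, timeout=5)

server.shutdown()
server.server_close()
print("\n".join(exchange_log))
```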
Mastering the use of POST requests in Python opens up a world of possibilities. As you dive deeper into the realm of web development, data science, or automation, you’ll realize that POST requests are an essential tool in your toolkit.
This is because POST requests allow you to interact with web pages and services in a meaningful way. They let you create new data on the server, whether that’s uploading a file, submitting a form, or just sending a simple message.
Continue practicing, experimenting, and refining your skills, and don’t be afraid to delve deeper and explore the vast capabilities that Python has to offer in the realm of web interactions. Happy coding!