Question-18. What is the best way to ensure that the k-means algorithm will find a good clustering of a collection of vectors?
Correct Answer: D
Explanation: k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions, since both algorithms use an iterative refinement approach. Both also use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

This question is about the properties that make k-means an effective clustering heuristic, which primarily deal with ensuring that the initial centers are far away from each other. Because Lloyd's algorithm only converges to a local optimum, the quality of the result depends on where the centers start. This is how modern k-means variants such as k-means++ obtain their guarantee: by seeding Lloyd's algorithm with centers that tend to be spread out, the expected cost of the resulting clustering is within an O(log k) factor of the optimal clustering for each k.
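The seeding idea described above can be illustrated with a minimal sketch, assuming squared Euclidean distance and points stored as tuples. The helper names (`sq_dist`, `cost`, `kmeans_pp_init`, `lloyd`) and the synthetic data are hypothetical and chosen only for illustration; this is not taken from any particular library. Each new center is sampled with probability proportional to its squared distance from the nearest center already chosen, so far-away points are favored, and Lloyd's iterations then refine the result.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans_pp_init(points, k, rng):
    """Pick k initial centers using k-means++ style seeding.

    Assumes the data contains at least k distinct points.
    """
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # Squared distance from each point to its nearest chosen center so far.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Sample the next center with probability proportional to d2,
        # so points far from the existing centers are more likely.
        nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


def lloyd(points, centers, n_iter=20):
    """Run a fixed number of Lloyd iterations from the given centers."""
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centers)
        ]
    return centers


# Synthetic example data (an assumption for illustration): three 2-D blobs.
rng = random.Random(0)
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for _ in range(30)
]
init = kmeans_pp_init(points, 3, rng)
final = lloyd(points, init)
print(cost(points, init), cost(points, final))
```

The sketch runs a fixed number of iterations rather than testing for convergence, which keeps it short; a practical implementation would typically stop once the assignments no longer change.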