Abstract | FINDING effective clustering methods for a high dimensional dataset is challenging due to the curse of dimensionality. These challenges can usually make the most of basic common algorithms fail in highdimensional spaces from tackling problems such as large number of groups, and overlapping. Most domains uses some parameters to describe the appearance, geometry and dynamics of a scene. This has motivated the implementation of several techniques of a high-dimensional data for finding a low-dimensional space. Many proposed methods fail to overcome the challenges, especially when the data input is high-dimensional, and the clusters have a complex. REGULARLY in high dimensional data, lots of the data dimensions are not related and might hide the existing clusters in noisy data. High-dimensional data often reside on some low dimensional subspaces. The problem of subspace clustering algorithms is to uncover the type of relationship of an objects from one dimension that are related in different subsets of another dimensions. The state-of-the-art methods for subspace segmentation which included the Low Rank Representation (LRR) and Sparse Representation (SR). The former seeks the global lowest-rank representation but restrictively assumes the independence among subspaces, whereas the latter seeks the clustering of disjoint or overlapped subspaces through locality measure, which, however, causes failure in the case of large noise. THIS thesis aims are to identify the key problems and obstacles that have challenged the researchers in recent years in clustering high dimensional data, then to implement an effective subspace clustering methods for solving high dimensional crimes domains for both real events and synthetic data which has complex data structure with 168 different offence crimes. As well as to overcome the disadvantages of existed subspace algorithms techniques. To this end, a Low-Rank Sparse Representation (LRSR) theory, the future will refer to as Criminal Data Analysis Based on LRSR will be examined, then to be used to recover and segment embedding subspaces. The results of these methods will be discussed and compared with what already have been examined on previous approaches such as K-mean and PCA segmented based on K-means. The previous approaches have helped us to chose the right subspace clustering methods. The Proposed method based on subspace segmentation method named Low Rank subspace Sparse Representation (LRSR) which not only recovers the low-rank subspaces but also gets a relatively sparse segmentation with respect to disjoint subspaces or even overlapping subspaces. BOTH UCI Machine Learning Repository, and crime database are the best to find and compare the best subspace clustering algorithm that fit for high dimensional space data. We used many Open-Source Machine Learning Frameworks and Tools for both employ our machine learning tasks and methods including preparing, transforming, clustering and visualizing the high-dimensional crime dataset, we precisely have used the most modern and powerful Machine Learning Frameworks data science that known as SciKit-Learn for library for the Python programming language, as well as we have used R, and Matlab in previous experiment. |
---|