School Clustering Through Machine Learning and Geospatial Analysis

Publisher

ANTACOM A.C.; SIP-IPN; UPIITA-IPN

Abstract

Schools and their populations are exposed to safety and health risks when they are located near hazardous features, whether natural or infrastructural. In Mexico, an official standard specifies which features to consider when selecting a site for a school, and how close each feature may be to the site without posing a threat. This work applied geospatial analysis and unsupervised machine learning to detect the hazardous features near each school in Mexico and to group the schools according to the patterns in the data. To this end, a data set containing Mexico’s schools and the hazardous features proximal to each school was built by spatially combining multiple official data sets. The K-Modes partitional clustering algorithm was then applied to this data set. Multiple clustering models were built by testing various values of K (the number of clusters), and their quality was measured with internal clustering evaluation metrics. The highest-quality model grouped Mexico’s schools into 11 clusters, each characterized by the hazardous feature(s) most common among its schools. The evaluation metrics for this model were a Silhouette Score of 0.72, a Calinski-Harabasz Index of 32863.96, and a Davies-Bouldin Index of 0.72, indicating strong clustering results with well-separated, cohesive clusters. The study followed the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology, which consists of multiple phases and tasks for executing data mining projects. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
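
To make the K-selection procedure described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each school is encoded as a tuple of 0/1 hazard flags, uses a simplified K-Modes with Hamming dissimilarity and random initial modes, and scores each K with a silhouette coefficient computed on Hamming distance. The toy data, the function names (`hamming`, `column_modes`, `k_modes`, `silhouette`), and the range of K values are all hypothetical. The Calinski-Harabasz and Davies-Bouldin indices reported in the abstract are not reproduced here.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value per attribute across the given records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, n_iter=20, seed=0):
    """Simplified K-Modes: assign records to the nearest mode, then update modes."""
    rng = random.Random(seed)
    # Assumption: initial modes are k distinct records drawn at random.
    modes = rng.sample(sorted(set(data)), k)
    labels = []
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: hamming(row, modes[c])) for row in data
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous mode if a cluster is empty
                modes[c] = column_modes(members)
    return labels, modes


def silhouette(data, labels):
    """Mean silhouette coefficient under Hamming distance (singletons score 0)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(hamming(data[i], data[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(hamming(data[i], data[j]) for j in idx) / len(idx)
            for other, idx in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(data)


# Hypothetical toy data: one tuple per school, one 0/1 flag per hazard type
# (for example gas station, river, highway, power line within the allowed distance).
schools = (
    [(1, 0, 0, 0)] * 3 + [(1, 1, 0, 0)]
    + [(0, 1, 1, 0)] * 4
    + [(0, 0, 0, 1)] * 3 + [(0, 0, 1, 1)]
)

# Build one model per candidate K and keep the K with the highest silhouette.
scores = {}
for k in range(2, 5):
    labels, modes = k_modes(schools, k)
    scores[k] = silhouette(schools, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because the initial modes are random, a single run per K can land in a poor local optimum; a more careful version would likely repeat each K with several seeds and keep the lowest-cost result before comparing scores.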

Keywords

Clustering, Data mining, Geospatial analysis, Machine learning, Schools, Unsupervised learning, Adversarial machine learning, Contrastive learning, Federated learning, Clustering model, Data set, Evaluation metrics, Mexico, Safety and health, Unsupervised machine learning, Health risks
