Modelo predictivo del consumo de alcohol en estudiantes de la Universidad de Córdoba a partir de la minería de datos

López Gaviria, Lila Patricia

dc.audience	Comunidad Universidad de Medellín	spa
dc.contributor.advisor	Hernández Leal, Emilcy Juliana
dc.contributor.advisor	Orozco Duque, Andrés Felipe
dc.contributor.author	López Gaviria, Lila Patricia
dc.coverage.spatial	Lat: 06 15 00 N degrees minutes Lat: 6.2500 decimal degrees Long: 075 36 00 W degrees minutes Long: -75.6000 decimal degrees
dc.date.accessioned	2025-05-27T15:56:30Z
dc.date.available	2025-05-27T15:56:30Z
dc.date.issued	2025-03-05
dc.description.abstract	El consumo de alcohol entre estudiantes universitarios es un problema cada vez más frecuente en las Instituciones de Educación Superior (IES). Esto se suma al hecho de que las bebidas alcohólicas están presentes en todo tipo de celebraciones y reuniones sociales. En este contexto, se propone una metodología para la clasificación del riesgo de consumo de alcohol basada en modelos de machine learning. Los modelos evaluados incluyen Logistic Regression, Random Forest, Perceptrón Multicapa y Support Vector Machine (SVM). Random Forest mostró el mejor desempeño general, con un F1-score de 0.45 después de la optimización de hiperparámetros y la selección de características relevantes. El modelo SVM se destacó en la métrica de recall, detectando hasta el 86% de los casos de consumo de alcohol tras la aplicación de técnicas de balanceo como SMOTE y RUS; no obstante, esto incrementó el número de falsos positivos. Por su parte, los modelos de Regresión Logística y Perceptrón Multicapa presentaron un rendimiento moderado en comparación con los anteriores. El uso de diversas técnicas de balanceo, como SMOTE, ADASYN, RUS, Cluster Centroids, SMOTEENN y Tomek Links, contribuyó a mejorar significativamente el desempeño de los modelos, especialmente en términos de recall, permitiendo así la detección de los estudiantes consumidores de alcohol.	spa
dc.description.abstract	Alcohol consumption among university students is an increasingly frequent problem in Higher Education Institutions (HEIs), compounded by the presence of alcoholic beverages at all kinds of celebrations and social gatherings. In this context, a methodology for classifying alcohol consumption risk based on machine learning models is proposed. The models evaluated include Logistic Regression, Random Forest, Multilayer Perceptron, and Support Vector Machine (SVM). Random Forest showed the best overall performance, with an F1-score of 0.45 after hyperparameter optimization and selection of relevant features. The SVM model stood out in recall, detecting up to 86% of alcohol consumption cases after applying balancing techniques such as SMOTE and RUS; however, this also increased the number of false positives. The Logistic Regression and Multilayer Perceptron models performed moderately compared with the other two. Using various balancing techniques (SMOTE, ADASYN, RUS, Cluster Centroids, SMOTEENN, and Tomek Links) significantly improved model performance, especially recall, and thus the detection of students who consume alcohol.	eng
dc.description.degreelevel	Maestría	spa
dc.description.degreename	Magíster en Ingeniería de Software	spa
dc.format.extent	91 páginas	spa
dc.format.medium	Recurso en Línea	spa
dc.format.mimetype	application/pdf
dc.identifier.instname	instname:Universidad de Medellín	spa
dc.identifier.local	T 0627 2024
dc.identifier.reponame	reponame:Repositorio Institucional Universidad de Medellín	spa
dc.identifier.uri	https://hdl.handle.net/11407/8933
dc.language.iso	spa
dc.publisher	Universidad de Medellín	spa
dc.publisher.faculty	Facultad de Ingenierías	spa
dc.publisher.place	Medellín	spa
dc.publisher.program	Maestría en Ingeniería de Software	spa
dc.relation.citationendpage	91
dc.relation.citationstartpage	1
dc.relation.references	Afifi, H., Pochaba, S., Boltres, A., Laniewski, D., Haberer, J., Leonard, P., . . . others (2024). Machine learning with computer networks: Techniques, datasets and models. IEEE access.
dc.relation.references	Ahumada-Cortez, J. G., Gámez-Medina, M. E., y Valdez-Montero, C. (2017). El consumo de alcohol como problema de salud pública. Ra Ximhai, 13 (2), 13–24.
dc.relation.references	Arteaga Yánez, Y. L., Peraza de Aparicio, C. X., Ortega Guevara, N. M., Luna Álvarez, H. E., Zurita Barrios, N. Y., López Gamboa, Y., . . . others (2022). Cuidados de enfermería en la salud mental. Quito, Universidad Metropolitana.
dc.relation.references	Batista, G. E., Prati, R. C., y Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD explorations newsletter , 6 (1), 20–29.
dc.relation.references	Betancourth-Zambrano, S., Tacán-Bastidas, L., y Cordoba-Paz, E. G. (2017). Consumo de alcohol en estudiantes universitarios colombianos. Universidad y salud, 19 (1), 37–50.
dc.relation.references	Chawla, N. V., Bowyer, K. W., Hall, L. O., y Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16 , 321–357.
dc.relation.references	Cruz, E., González, M., y Rangel, J. C. (2022). Técnicas de machine learning aplicadas a la evaluación del rendimiento y a la predicción de la deserción de estudiantes universitarios, una revisión. Prisma Tecnológico, 13 (1), 77–87.
dc.relation.references	Elhassan, T., y Aljurf, M. (2016). Classification of imbalance data using tomek link (t-link) combined with random under-sampling (rus) as a data reduction method. Global J Technol Optim S, 1 , 2016.
dc.relation.references	Fayyad, U., Piatetsky-Shapiro, G., y Smyth, P. (1996, Mar.). From data mining to knowledge discovery in databases. AI Magazine, 17 (3), 37. Descargado de https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/1230 doi: 10.1609/aimag.v17i3.1230
dc.relation.references	Ferrera Perera, M. Y., y cols. (2019). Opinión de los/as empresarios/as de la hostelería sobre posibles medidas preventivas en relación al abuso y dependencia al alcohol.
dc.relation.references	Fierro, F. S., Castañeda, J., y Revelo-Aldás, M. (2022, 6). Modelos predictivos para la estimación de adolescentes con tendencia al alcoholismo. AXIOMA, 1 , 74-79. doi:10.26621/ra.v1i26.779
dc.relation.references	Frawley, W. J., Piatetsky-Shapiro, G., y Matheus, C. J. (1992). Knowledge discovery in databases: An overview. AI magazine, 13 (3), 57–57.
dc.relation.references	García, J., Molina, J., Berlanga, A., Patricio, M., Bustamante, A., y Padilla, W. (2018). Ciencia de datos : técnicas analíticas y aprendizaje estadístico. Bogotá, Colombia. Publicaciones altaria, sl.
dc.relation.references	García-Carretero, M. A., Moreno-Hierro, L., Martínez, M. R., de los Ángeles Jordán-Quintero, M., Morales-García, N., y O’Ferrall-González, C. (2019, 9). Alcohol consumption patterns of university students of health sciences. Enfermería Clínica, 29 , 291-296. doi: 10.1016/j.enfcli.2019.01.003
dc.relation.references	Gerard, C. (2021). Practical machine learning in javascript: Tensorflow. Js for web developers. Springer.
dc.relation.references	Gironés, J., Quiles, R. C., Roma, J. C., Alfonso, J. M., Casas, J., y Minguillón, J. (2017). Minería de datos: modelos y algoritmos. Editorial UOC.
dc.relation.references	He, H., Bai, Y., García, E. A., y Li, S. (2008). Adasyn: Adaptive synthetic sampling approach for imbalanced learning. En 2008 IEEE International joint conference on neural networks (ieee world congress on computational intelligence) (pp. 1322–1328).
dc.relation.references	He, H., y Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on knowledge and data engineering, 21 (9), 1263–1284.
dc.relation.references	Ishaq, A., Sadiq, S., Umer, M., Ullah, S., Mirjalili, S., Rupapara, V., y Nappi, M. (2021). Improving the prediction of heart failure patients’ survival using smote and effective data mining techniques. IEEE access, 9 , 39707–39716.
dc.relation.references	Kharabsheh, M., Meqdadi, O., Alabed, M., Veeranki, S., Abbadi, A., y Alzyoud, S. (2019). A machine learning approach for predicting nicotine dependence. International Journal of Advanced Computer Science and Applications, 10 (3).
dc.relation.references	Kitchenham, B., y Charters, S. M. (2007). Guidelines for performing systematic literature reviews in software engineering. Descargado de https://www.researchgate.net/publication/302924724
dc.relation.references	Lamprou, S. (2021). A study in alcohol: A comparison of data mining methods for identifying binge drinking risk factors in university students.
dc.relation.references	Marcon, G., de Ávila Pereira, F., Zimerman, A., da Silva, B. C., von Diemen, L., Passos, I. C., y Recamonde-Mendoza, M. (2021, 9). Patterns of high-risk drinking among medical students: A web-based survey with machine learning. Computers in Biology and Medicine, 136 . doi: 10.1016/j.compbiomed.2021.104747
dc.relation.references	Minsalud, M. d. J. y. d. D. O. d. D. d. C. y. M. d. E. N. (2016). Estudio nacional de consumo de sustancias psicoactivas en población escolar Colombia. SD [cited 2016 Noviembre 16. Available from: https://www.minjusticia.gov.co/programasco/ODC/Publicaciones/Publicaciones/CO03142016estudioconsumoescolares 2016.
dc.relation.references	OEA, C. I. p. e. C. d. A. d. D. C., Organización de los Estados Americanos. (2019). Informe sobre el consumo de drogas en las Américas.
dc.relation.references	OMS, O. M. d. l. S. (2018). El consumo nocivo de alcohol mata a más de 3 millones de personas al año, en su mayoría hombres. OMS Ginebra.
dc.relation.references	Pascual Pastor, F. (2012). 3. conceptos y diagnóstico del alcoholismo. MONOGRAFÍA SOBRE, 121.
dc.relation.references	Reátegui, R., Torres-Carrión, P., López, V., Galárraga, A., Grondona, G., y Núñez, C. L. (2020). Cluster analysis base on psychosocial information for alcohol, tobacco and other drugs consumers. En Communications in computer and information science (Vol. 1194 CCIS, p. 269-283). Springer. doi: 10.1007/978-3-030-42520-3_22
dc.relation.references	Rodriguez de la Cruz, P. J., González-Angulo, P., Salazar-Mendoza, J., Camacho-Martínez, J. U., y López-Cocotle, J. J. (2022). Percepción de riesgo de consumo de alcohol y tabaco en universitarios del área de salud. Sanus, 7 .
dc.relation.references	Samhsa. (1994, octubre). https://www.bvscolombia.org/tamizaje-y-evaluacion-spa/. BVS Colombia. (Accessed: 2025-1-9)
dc.relation.references	Saunders, J. B., Aasland, O. G., Babor, T. F., De la Fuente, J. R., y Grant, M. (1993). Development of the alcohol use disorders identification test (audit): Who collaborative project on early detection of persons with harmful alcohol consumption-ii. Addiction, 88 (6), 791–804.
dc.relation.references	Singh, A., Singh, V., Gourisaria, M. K., y Sharma, A. (2022). Alcohol consumption rate prediction using machine learning algorithms. En 2022 oits international conference on information technology (ocit) (p. 85-90). doi: 10.1109/OCIT56763.2022.00026
dc.relation.references	Swamynathan, M. (2017). Mastering machine learning with python in six steps: A practical implementation guide to predictive data analytics using python. Springer.
dc.relation.references	Tan, P.-N., Steinbach, M., y Kumar, V. (2006). Data mining introduction. People’s Posts and Telecommunications Publishing House, Beijing.
dc.relation.references	Tomek, I. (1976). Two modifications of cnn. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6(11), 769-772. doi: 10.1109/TSMC.1976.4309452
dc.relation.references	Valdiviezo-Diaz, P., Torres-Carrión, P., Bustamante-Granda, B. F., y Sánchez-Puertas, R. N. (2020, 11). Aplicación de técnicas de minería de datos para la predicción del consumo de tabaco y alcohol en estudiantes universitarios. Revista Ibérica de Sistemas e Tecnologias de Informação / Iberian Journal of Information Systems and Technologies, 32 , 242-255.
dc.relation.references	Whitney, A. W. (1971). A direct method of nonparametric measurement selection. IEEE transactions on computers, 100 (9), 1100–1103.
dc.relation.references	Yen, S.-J., y Lee, Y.-S. (2009). Cluster-based under-sampling approaches for imbalanced data distributions. Expert Systems with Applications, 36 (3), 5718–5727.
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.rights.creativecommons	Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0
dc.subject	Data mining	eng
dc.subject	Minería de datos	spa
dc.subject	Alcohol drinking	eng
dc.subject	Aprendizaje automático	spa
dc.subject	Machine learning	eng
dc.subject	Consumo de alcohol	spa
dc.subject	Prediction	eng
dc.subject	Predictivo	spa
dc.subject	Classification algorithms	eng
dc.subject	Algoritmos de clasificación	spa
dc.subject.lemb	Algoritmos	spa
dc.subject.lemb	Aprendizaje automático (Inteligencia artificial)	spa
dc.subject.lemb	Consumo de bebidas alcohólicas	spa
dc.subject.lemb	Estudiantes universitarios	spa
dc.subject.lemb	Minería de datos	spa
dc.title	Modelo predictivo del consumo de alcohol en estudiantes de la Universidad de Córdoba a partir de la minería de datos	spa
dc.type	info:eu-repo/semantics/masterThesis
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc
dc.type.hasversion	publishedVersion
dc.type.hasversion	info:eu-repo/semantics/acceptedVersion
dc.type.local	Tesis de Maestría	spa

Files

Original bundle

Name: T_MIS_888.pdf
Size: 6.68 MB
Format: Adobe Portable Document Format

Collections

Theses