Use this identifier to cite or access this item: https://doi.org/10.25824/redu/XTQQEL
DOI: https://doi.org/10.25824/redu/XTQQEL
Title: FSGA-MuRTe: evolutionary feature selection in multi-representational text environments for unsupervised text clustering
Subject: Computer and Information Science
Description: FSGA-MuRTe (Feature Selection Using Genetic Algorithm in Multi-Representational Text Environments) is an evolutionary framework for unsupervised feature selection in high-dimensional, multi-representational text clustering environments. This repository contains the materials associated with the Master's Dissertation developed at the Faculty of Technology (FT), University of Campinas (UNICAMP), entitled “An Evolutionary Feature Selection Approach in Multi-Representational Text Environments”.
The framework investigates the use of Genetic Algorithms (GA) to reduce redundancy and improve clustering quality in heterogeneous textual feature spaces. The proposed approach combines multiple textual representations through an early-fusion strategy: Bag-of-Words (BoW), Word2Vec, FastText, Doc2Vec, BERT/SBERT embeddings, Part-of-Speech (POS) representations, LIWC psycholinguistic features, and Medical Research Council (MRC) psycholinguistic representations. Each representation is independently L2-normalized, and the normalized representations are concatenated into a unified feature space before evolutionary feature selection. Feature subsets are optimized by a Genetic Algorithm that uses the Dunn Index as its fitness function, and the selected features are then clustered with K-Means. Clustering quality is evaluated before and after feature selection using internal validation metrics: the Dunn Index (DI), the Calinski–Harabasz Index (CH), and the Davies–Bouldin Index (DB).
Experiments were conducted across multiple textual domains, including scientific abstracts, news articles, reviews, fake news datasets, sentiment analysis datasets, and short-text corpora obtained from publicly available sources. The datasets were preprocessed, filtered, cleaned, and sampled to support reproducible clustering experiments in heterogeneous text environments.
The results showed consistent improvements in cluster compactness and separation after evolutionary feature selection, indicating that bio-inspired optimization is an effective strategy for mitigating redundancy and improving clustering quality in multi-representational text environments.
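The pipeline described above (per-representation L2 normalization, early fusion, GA search guided by the Dunn Index, K-Means clustering) can be sketched as follows. This is a minimal illustration and not the dissertation's implementation. The function names, the GA operators (binary tournament selection, uniform crossover, bit-flip mutation, elitism), all parameter values, the choice of the classic Dunn Index variant (minimum between-cluster distance over maximum cluster diameter), and the synthetic data are assumptions made for the example. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize


def early_fusion(blocks):
    """L2-normalize each representation row-wise, then concatenate column-wise."""
    return np.hstack([normalize(b, norm="l2") for b in blocks])


def dunn_index(X, labels):
    """Classic Dunn Index (assumed variant): min separation / max diameter."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    if same.all():
        return 0.0  # a single cluster has no separation to measure
    max_diameter = dist[same].max()
    min_separation = dist[~same].min()
    return min_separation / max_diameter if max_diameter > 0 else 0.0


def fitness(mask, X, k, seed):
    """Cluster the selected columns with K-Means and score them with Dunn."""
    if mask.sum() == 0:
        return 0.0  # an empty feature subset cannot be clustered
    Xs = X[:, mask]
    labels = KMeans(n_clusters=k, n_init=3, random_state=seed).fit_predict(Xs)
    return dunn_index(Xs, labels)


def ga_select(X, k, pop_size=10, generations=5, p_mut=0.05, seed=0):
    """Hypothetical GA over boolean feature masks; returns (best_mask, score)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5
    pop[0] = True  # include the full feature set as a baseline individual
    scores = np.array([fitness(m, X, k, seed) for m in pop])
    for _ in range(generations):
        children = []
        while len(children) < pop_size - 1:
            # Binary tournament selection for each of the two parents
            i, j = rng.integers(pop_size, size=2)
            p1 = pop[i] if scores[i] >= scores[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)
            p2 = pop[i] if scores[i] >= scores[j] else pop[j]
            child = np.where(rng.random(n) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(n) < p_mut  # bit-flip mutation
            children.append(child)
        elite = pop[scores.argmax()].copy()  # elitism: keep the best mask
        pop = np.array([elite] + children)
        scores = np.array([fitness(m, X, k, seed) for m in pop])
    best = scores.argmax()
    return pop[best], scores[best]


# Synthetic stand-in for two representations: one informative, one pure noise.
rng = np.random.default_rng(42)
centers = rng.normal(size=(3, 4)) * 5
informative = np.vstack([c + rng.normal(size=(20, 4)) for c in centers])
noise = rng.normal(size=(60, 8))
X = early_fusion([informative, noise])

full = np.ones(X.shape[1], dtype=bool)
mask, score = ga_select(X, k=3)
print("Dunn Index, all features:     ", fitness(full, X, 3, 0))
print("Dunn Index, selected features:", score)
print("Selected columns:", np.flatnonzero(mask))
```

Because the initial population contains the full feature set and the best individual is carried over unchanged between generations, the returned score in this sketch should never fall below the all-features baseline for the same seed. A real run on text data would replace the synthetic blocks with the actual representation matrices, one row per document.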
Author(s): Gaspar, João Pedro Vicente
Carvalho, Marco Antonio Garcia de
URI: https://doi.org/10.25824/redu/XTQQEL
https://redu.unicamp.br/dataset.xhtml?persistentId=doi:10.25824/redu/XTQQEL
Other identifiers:  
Funding: No Funder
Project number: 0000
Terms of use:  
Date: 27-May-2026
Date available: 29-May-2026
Format: application/octet-stream
application/zip
text/markdown
Type:  
Publisher / Event / Institution: Gaspar, João
Language:  
Appears in collections: Repositório de Dados de Pesquisa da UNICAMP



Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.