Use this identifier to cite or access this item: https://doi.org/10.25824/redu/YI280E
DOI: https://doi.org/10.25824/redu/YI280E
Title: Replication data for: Evaluating named entity recognition - a comparative analysis of mono- and multilingual transformer models on a novel brazilian corporate earnings call transcripts dataset
Subject: Computer and Information Science
Description: This package contains a dataset of 384 earnings call transcripts from Brazilian banks, along with the Jupyter notebooks used for preprocessing, annotation, and fine-tuning. The notebooks are designed for fine-tuning BERT- and T5-based transformer models on financial Named Entity Recognition (NER).
The submission is organized into two main files:
- File SourceCode.zip contains the original PDF files of the transcripts and a series of Jupyter notebooks (Python) that document the step-by-step methodology of the study: 1) text extraction and sentence pre-processing; 2) weak supervision for annotation; 3) generation of train, validation, and test splits; and 4) fine-tuning of the Transformer models.
- File Datasets.zip contains a single CSV file with all raw sentences extracted from the PDFs, plus a subfolder with the annotated sentences, already divided into standard training, validation, and test sets to facilitate reproducible research.
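As a starting point for working with the archive layout described above, the following is a minimal sketch for inspecting Datasets.zip without unpacking it. The helper names `list_csv_members` and `read_csv_member` are hypothetical, and the UTF-8 encoding, comma delimiter, and header row are assumptions; the record does not state the CSV file names or their columns, so the sketch discovers them rather than hard-coding them.

```python
import csv
import io
import zipfile
from itertools import islice


def list_csv_members(zip_path):
    """Return the sorted names of all CSV files inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".csv")
        )


def read_csv_member(zip_path, member, limit=5):
    """Read up to `limit` rows of one CSV member as a list of dicts.

    Assumes (not confirmed by the record) that the file is UTF-8 encoded,
    comma-delimited, and starts with a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            return list(islice(reader, limit))
```

Calling `list_csv_members("Datasets.zip")` on a local copy should reveal the raw-sentence file and the files in the annotated subfolder; `read_csv_member` can then preview a few rows of any of them before choosing a loader for the full splits.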
Author(s): Abilio, Ramon Simões
Coelho, Guilherme Palermo
Silva, Ana Estela Antunes da
URI: https://doi.org/10.25824/redu/YI280E
https://redu.unicamp.br/dataset.xhtml?persistentId=doi:10.25824/redu/YI280E
Other identifiers:  
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Project number: CAPES: 001
Terms of use:  
Date: 19-Feb-2026
Date available: 21-Feb-2026
Format: application/zip
application/zip
Type:  
Publisher / Event / Institution: Abilio, Ramon Simões
Language:  
Appears in collections: Repositório de Dados de Pesquisa da UNICAMP



Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.