Description:
<p>This package contains a dataset of 384 earnings call transcripts from Brazilian banks, together with the Jupyter notebooks used for preprocessing, annotation, and fine-tuning. The notebooks fine-tune BERT- and T5-based Transformer models for financial Named Entity Recognition (NER).</p>
<p>The submission is organized into two main files:</p>
<ul>
<li>
<strong>File: SourceCode.zip</strong> – This file includes the original PDF files of the transcripts and a series of Jupyter notebooks (Python) that document the step-by-step methodology of the study: 1) text extraction and sentence pre-processing; 2) weak supervision for annotation; 3) generation of train, validation, and test splits; and 4) fine-tuning of the Transformer models.
</li>
<li>
<strong>File: Datasets.zip</strong> – This file contains a single CSV file with all raw sentences extracted from the PDFs, as well as a subfolder with the annotated sentences, already divided into standard training, validation, and testing sets to facilitate reproducible research.
</li>
</ul>
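<p>As a rough illustration of step 3 (generating train, validation, and test splits), the sketch below shuffles a list of sentences with a fixed seed and cuts it into three parts. The function name <code>split_sentences</code>, the 80/10/10 ratios, and the seed are assumptions made for this example only; the description does not state the procedure used in the notebooks. To reproduce the study, use the splits shipped in Datasets.zip rather than regenerating them.</p>

```python
import random


def split_sentences(sentences, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle sentences reproducibly and split them into three sets.

    Hypothetical helper: the ratios and seed are illustrative defaults,
    not the values used by the original notebooks.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")

    shuffled = list(sentences)
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        # Whatever remains after the train and validation cuts is the test set.
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    demo = [f"sentence {i}" for i in range(100)]
    splits = split_sentences(demo)
    print({name: len(rows) for name, rows in splits.items()})
```

<p>Fixing the seed makes the split repeatable across runs, which is the property the pre-divided sets in Datasets.zip provide out of the box.</p>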