Use this identifier to cite or access this item:
https://doi.org/10.25824/redu/RGPRFD
| DOI: | https://doi.org/10.25824/redu/RGPRFD |
| Title: | Experimental results dataset for the master's dissertation: "A quantitative and comparative analysis of single-pass stream-based active learning query algorithms" |
| Subject: | Computer and Information Science |
| Descrição: | <h2>1. Overview</h2> <p> This dataset contains the aggregated and structured results of a large-scale benchmark evaluating twelve single-pass stream-based active learning query strategies. This is the experimental results dataset for the master's dissertation: <strong>"A Quantitative and Comparative Analysis of Single-Pass Stream-Based Active Learning Query Algorithms".</strong> </p> <p>The experiments span:</p> <ul> <li><strong>82 datasets</strong></li> <li><strong>5 machine learning models</strong></li> <li><strong>12 stream-based query strategies</strong></li> <li><strong>5 labeling budgets</strong>: 5%, 10%, 20%, 50%, and 100%</li> <li><strong>20,000+ experimental runs</strong></li> </ul> <p> Each row represents a single experimental configuration, defined by: </p> <pre> (dataset, model, hyperparameters, query strategy, labeling budget) </pre> <p> This file is designed for statistical analysis, ranking, and comparative evaluation of strategies under constrained labeling scenarios. </p> <hr> <h2>2. File Structure</h2> <ul> <li><strong>Granularity:</strong> One row per experimental run</li> <li><strong>Primary metric:</strong> Final model accuracy</li> <li><strong>Evaluation setting:</strong> Single-pass stream-based active learning</li> </ul> <hr> <h2>3. 
Column Dictionary</h2> <p>Below is the semantic definition of each column in the dataset.</p> <hr> <h3><code>dataset</code></h3> <ul> <li><strong>Type:</strong> String</li> <li><strong>Description:</strong> Dataset used in the experiment.</li> <li><strong>Scope:</strong> 82 unique datasets.</li> <li><strong>Purpose:</strong> Enables cross-dataset robustness analysis.</li> </ul> <hr> <h3><code>model_name</code></h3> <ul> <li><strong>Type:</strong> String</li> <li><strong>Description:</strong> Machine learning algorithm used.</li> <li><strong>Scope:</strong> 5 model families.</li> <li><strong>Purpose:</strong> Allows studying model–strategy interaction.</li> </ul> <hr> <h3><code>model_params</code></h3> <ul> <li><strong>Type:</strong> String (serialized dictionary)</li> <li><strong>Description:</strong> Hyperparameters used for the model.</li> <li><strong>Example:</strong></li> </ul> <pre> {'C': 0.01} </pre> <ul> <li><strong>Recommendation:</strong> Parse into a dictionary for reproducibility or hyperparameter grouping.</li> </ul> <hr> <h3><code>query_strategy</code></h3> <ul> <li><strong>Type:</strong> String</li> <li><strong>Description:</strong> Active learning strategy used in the stream.</li> <li><strong>Scope:</strong> 12 strategies.</li> <li><strong>Purpose:</strong> Main variable of interest for comparative evaluation.</li> </ul> <hr> <h3><code>budget</code></h3> <ul> <li><strong>Type:</strong> Float</li> <li><strong>Values:</strong> <ul> <li>0.05</li> <li>0.10</li> <li>0.20</li> <li>0.50</li> <li>1.00</li> </ul> </li> <li><strong>Description:</strong> Fraction of instances allowed to be labeled.</li> <li><strong>Interpretation:</strong> Controls labeling cost.</li> </ul> <hr> <h3><code>initial_score</code></h3> <ul> <li><strong>Type:</strong> Float</li> <li><strong>Description:</strong> Baseline performance before applying active learning.</li> <li><strong>Purpose:</strong> Reference point for measuring improvement.</li> </ul> <hr> 
<h3><code>percentage_queried</code></h3> <ul> <li><strong>Type:</strong> Float</li> <li><strong>Description:</strong> Actual fraction of instances labeled.</li> <li><strong>Note:</strong> <ul> <li>May slightly differ from the defined budget due to stream dynamics.</li> <li>Reflects real labeling consumption.</li> </ul> </li> </ul> <hr> <h3><code>final_accuracy</code></h3> <ul> <li><strong>Type:</strong> Float</li> <li><strong>Description:</strong> Final model performance after active learning.</li> <li><strong>Metric:</strong> Classification accuracy.</li> <li><strong>Primary evaluation metric.</strong></li> </ul> <hr> <h2>4. Summary</h2> <p> <code>experiment_results.csv</code> is a large-scale benchmark dataset for evaluating stream-based active learning strategies under varying labeling budgets. </p> <p>It supports:</p> <ul> <li>Cross-dataset comparisons</li> <li>Strategy ranking</li> <li>Budget sensitivity analysis</li> <li>Model–strategy interaction studies</li> <li>Efficiency and robustness evaluation</li> </ul> <p> The structure is analysis-ready and designed for statistical benchmarking and research publication purposes. </p> |
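The column dictionary above maps directly onto a small loading routine. The following is a minimal sketch, assuming the file is tab-separated (as declared in the Format field) and uses exactly the documented column names; the two sample rows are hypothetical and only illustrate the schema.

```python
# Minimal sketch for loading experiment_results.csv and ranking strategies.
# Assumptions: tab-separated values, documented column names, and
# model_params serialized as a Python dict literal (e.g. {'C': 0.01}).
import ast
import csv
import io
from collections import defaultdict


def load_runs(text):
    """Parse the TSV text into a list of typed row dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # model_params is a serialized dictionary; literal_eval is safe
        # for dict/str/number literals, unlike eval().
        row["model_params"] = ast.literal_eval(row["model_params"])
        for col in ("budget", "initial_score", "percentage_queried", "final_accuracy"):
            row[col] = float(row[col])
        rows.append(row)
    return rows


def mean_accuracy_by_strategy(rows):
    """Average final_accuracy per query strategy, for a simple ranking."""
    by_strategy = defaultdict(list)
    for row in rows:
        by_strategy[row["query_strategy"]].append(row["final_accuracy"])
    return {s: sum(v) / len(v) for s, v in by_strategy.items()}


# Hypothetical two-row sample in the documented schema.
sample = (
    "dataset\tmodel_name\tmodel_params\tquery_strategy\tbudget\t"
    "initial_score\tpercentage_queried\tfinal_accuracy\n"
    "iris\tLogisticRegression\t{'C': 0.01}\trandom\t0.10\t0.60\t0.11\t0.85\n"
    "iris\tLogisticRegression\t{'C': 0.01}\tuncertainty\t0.10\t0.60\t0.09\t0.90\n"
)

runs = load_runs(sample)
ranking = mean_accuracy_by_strategy(runs)
```

For the real file, replace the inline sample with `open("experiment_results.csv").read()`; the same ranking dictionary then supports the budget-sensitivity and model–strategy analyses described above by grouping on `budget` or `model_name` as well.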
| Author(s): | Chacon, Guilherme Silva; Wainer, Jacques |
| URI: | https://doi.org/10.25824/redu/RGPRFD https://redu.unicamp.br/dataset.xhtml?persistentId=doi:10.25824/redu/RGPRFD |
| Other identifiers: | |
| Funding: | No Funder |
| Project Number: | 0000 |
| Terms of use: | |
| Date: | 25-Feb-2026 |
| Date Available: | 27-Feb-2026 |
| Format: | text/tab-separated-values |
| Type: | |
| Publisher / Event / Institution: | Chacon, Guilherme Silva |
| Language: | |
Appears in collections: | Repositório de Dados de Pesquisa da UNICAMP
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.