Description:
This dataset contains approximately four million Telegram posts collected from 119 prominent Brazilian anti-vaccine channels between 2020 and 2025.
The dataset includes message content, metadata, associated media, and a classification flag marking vaccine-related posts, enabling researchers to examine how false or misleading information spreads, evolves, and influences public sentiment.
The collection period coincides with a significant decline in national vaccination coverage and a concurrent "infodemic" circulating on digital platforms.
Total Messages: 3,998,633
Time Period: January 2020 – June 2025
Total Data Volume: 5.5 TB (including media)
License: Creative Commons BY-NC-SA 4.0
General Statistics
Total Channels/Groups: 119
Unique Anonymized Users: 71,672
Messages with Text: 3,345,088 (83.6%)
Vaccine-Related Posts: 407,723 (10.2%)
Main Languages: Portuguese (58.3%), English (8.0%), Spanish (1.7%)
Data Collection
Data was collected using a custom Python tool built on the Telethon library. The target channels were identified using seed lists from prior literature and keyword searches including terms like "Vacina" (vaccine), "mRNA," "Nova Ordem Mundial" (New World Order), and "Efeitos adversos" (adverse effects). Only public channels with at least 1,000 members were monitored.
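The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the collection tool itself: the ChannelInfo record and its field names are hypothetical, and the keyword tuple contains only the four example terms quoted above, not the full search list.

```python
from dataclasses import dataclass

# Example search terms quoted in the description; the real list was longer.
SEED_KEYWORDS = ("vacina", "mrna", "nova ordem mundial", "efeitos adversos")
MIN_MEMBERS = 1_000  # membership floor stated in the description


@dataclass
class ChannelInfo:
    """Hypothetical record for a candidate channel (field names are assumptions)."""
    title: str
    description: str
    is_public: bool
    member_count: int


def matches_keywords(channel: ChannelInfo) -> bool:
    """Case-insensitive substring match against the example search terms."""
    text = f"{channel.title} {channel.description}".casefold()
    return any(keyword in text for keyword in SEED_KEYWORDS)


def meets_monitoring_criteria(channel: ChannelInfo) -> bool:
    """Apply the stated rule: public channels with at least 1,000 members."""
    return channel.is_public and channel.member_count >= MIN_MEMBERS
```

Keyword matching and the monitoring rule are kept separate because channels taken from the literature seed lists would not necessarily match any keyword.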
Annotation & Processing
Language Detection: Performed using langdetect with a confidence threshold of 0.5.
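A minimal sketch of how such a threshold might be applied is shown here. It assumes that the top-ranked langdetect candidate is kept only when its probability is at least 0.5 and that the language is otherwise left undefined; the description does not say whether the comparison is inclusive, and the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.5  # threshold stated in the description


def pick_language(candidates: Iterable[Tuple[str, float]],
                  threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the most probable language code, or None if it is below the threshold."""
    best = max(candidates, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best[0]


def detect_language(text: str) -> Optional[str]:
    """Run langdetect on `text` and apply the threshold (needs `pip install langdetect`)."""
    # Imported lazily so pick_language stays usable without the dependency.
    from langdetect import detect_langs
    from langdetect.lang_detect_exception import LangDetectException

    try:
        candidates = [(c.lang, c.prob) for c in detect_langs(text)]
    except LangDetectException:  # raised for empty or feature-less text
        return None
    return pick_language(candidates)
```

Note that langdetect is non-deterministic by default; setting DetectorFactory.seed before the first call makes results reproducible.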
Topic Classification: The is_vaccine_related field was generated using the Sabiá-3 large language model, which achieved an F1-score of 90% against human annotations.
Criteria: Mentions of vaccines/immunization, efficacy/safety discussions, policy discussions, conspiracy theories, or hesitancy.
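For readers who want to repeat this kind of validation on their own annotated sample, the following sketch computes a binary F1-score for the positive (vaccine-related) class. It assumes the reported figure is a positive-class F1, which the description does not specify, and the function name is illustrative.

```python
from typing import Sequence


def f1_score(human: Sequence[bool], model: Sequence[bool]) -> float:
    """F1 for the positive class, treating the human labels as ground truth."""
    if len(human) != len(model):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(human, model))
    tp = sum(1 for h, m in pairs if h and m)        # both say vaccine-related
    fp = sum(1 for h, m in pairs if not h and m)    # model only
    fn = sum(1 for h, m in pairs if h and not m)    # human only
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 is the harmonic mean of precision and recall, so it penalizes a classifier that trades one heavily for the other.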
Limitations
1. Engagement Metrics: Telegram introduced the "reactions" feature only in late 2021, so messages posted before December 30, 2021 carry no reaction metadata.
2. File Size: Media files larger than 50 MB were excluded from collection.
3. Deleted Content: Content removed by channel admins or the Brazilian Supreme Court during the collection period may be missing or altered.
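The first limitation matters in analysis: a missing reaction count before the cutoff means "not recorded", not "zero engagement". The sketch below keeps the two cases apart. The field names date and reactions, the ISO 8601 date format, and the integer reaction count are assumptions about the metadata layout, not documented facts.

```python
from datetime import date, datetime
from typing import Optional

REACTIONS_AVAILABLE_FROM = date(2021, 12, 30)  # cutoff stated in limitation 1


def reaction_count(message: dict) -> Optional[int]:
    """Return the reaction count, or None when the message predates the cutoff."""
    # Assumes an ISO 8601 timestamp such as "2022-01-05T08:30:00".
    posted = datetime.fromisoformat(message["date"]).date()
    if posted < REACTIONS_AVAILABLE_FROM:
        return None  # not recorded, which is different from zero reactions
    return int(message.get("reactions") or 0)
```

Returning None for early messages lets downstream code exclude them from engagement averages instead of biasing those averages toward zero.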
The metadata is in a .json file, which can be opened in any simple text editor.
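For programmatic access, the file can also be read with Python's standard json module. The sketch below assumes the file holds a top-level array of message objects; only the is_vaccine_related field name comes from this description, and the file name in the usage line is hypothetical.

```python
import json
from pathlib import Path
from typing import List


def load_vaccine_posts(path: Path) -> List[dict]:
    """Load the metadata file and keep records flagged as vaccine-related."""
    with path.open(encoding="utf-8") as handle:
        records = json.load(handle)  # assumes a top-level JSON array
    return [record for record in records if record.get("is_vaccine_related")]
```

Usage, with a hypothetical file name: posts = load_vaccine_posts(Path("metadata.json")). For a file covering millions of messages, a streaming parser may be preferable to loading everything into memory at once.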