<h2>Description</h2>
<p>
This dataset is associated with the master's dissertation entitled
<em>“AirCEP – An Application for Monitoring Air Quality with Complex Event Processing”</em>,
developed at the Institute of Computing, University of Campinas (UNICAMP), Brazil, in 2025.
</p>
<h3>Study Overview</h3>
<p>
The study addresses the challenges of air quality monitoring in regions with limited infrastructure
and high operational costs, proposing a modular and scalable architecture called <strong>AirCEP</strong>.
The system integrates Edge Computing, configurable data filtering mechanisms, Complex Event Processing (CEP),
and real-time visualization to reduce network traffic, computational resource consumption, and alert latency.
</p>
<p><strong>AirCEP was designed to:</strong></p>
<ul>
<li>Minimize network bandwidth usage through configurable data filters applied at the edge.</li>
<li>Reduce CPU and memory consumption during real-time stream processing.</li>
<li>Detect complex environmental events from continuous air quality measurements.</li>
<li>Generate real-time alerts for adverse air quality conditions.</li>
<li>Provide a monitoring dashboard for visualization and decision support.</li>
</ul>
<p><strong>The architecture is composed of three main components:</strong></p>
<ol>
<li>
<strong>Data Router (Edge Layer):</strong> Applies configurable filters to reduce data volume before transmission.
</li>
<li>
<strong>Stream Processing Engine:</strong> Implements Complex Event Processing using Apache Flink to analyze continuous data streams and detect patterns of interest.
</li>
<li>
<strong>Visualization Dashboard:</strong> Built with Grafana to display real-time metrics, historical data, and alerts.
</li>
</ol>
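<p>
The edge-side filtering step can be illustrated with a minimal sketch. The description above does not state which filter types the Data Router implements, so the example assumes a simple deadband (minimum-change) filter. The class <code>DeadbandFilter</code>, the function <code>route</code>, the record shape, and the threshold values are all hypothetical and are not taken from the AirCEP code.
</p>

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (timestamp, pollutant, value).
Reading = Tuple[str, str, float]


@dataclass
class DeadbandFilter:
    """Forward a reading only if it moved at least `min_delta` away from
    the last value that was forwarded for the same pollutant."""

    min_delta: Dict[str, float]
    _last_sent: Dict[str, float] = field(default_factory=dict, init=False)

    def accept(self, pollutant: str, value: float) -> bool:
        last = self._last_sent.get(pollutant)
        threshold = self.min_delta.get(pollutant, 0.0)
        # The first reading of each pollutant is always forwarded.
        if last is None or abs(value - last) >= threshold:
            self._last_sent[pollutant] = value
            return True
        return False


def route(readings: Iterable[Reading], flt: DeadbandFilter) -> Iterator[Reading]:
    """Yield only the readings that pass the filter (what the edge would transmit)."""
    for timestamp, pollutant, value in readings:
        if flt.accept(pollutant, value):
            yield (timestamp, pollutant, value)


# Illustrative values only, not measurements from the dataset.
readings = [
    ("t0", "PM2.5", 10.0),
    ("t1", "PM2.5", 10.4),
    ("t2", "PM2.5", 11.5),
    ("t3", "PM10", 20.0),
]
sent = list(route(readings, DeadbandFilter(min_delta={"PM2.5": 1.0})))
```

<p>
In this sketch the reading at <code>t1</code> is dropped because it changed by less than the configured delta, which is the kind of volume reduction a configurable edge filter is meant to achieve.
</p>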
<h3>Dataset Description</h3>
<p>
The dataset includes the measurements and experimental results used to evaluate the AirCEP architecture.
It contains:
</p>
<ul>
<li><strong>Air quality measurements</strong> from monitoring stations and/or simulated sensors.</li>
<li>
<strong>Pollutants monitored:</strong>
<ul>
<li>PM2.5</li>
<li>PM10</li>
<li>SO₂ (Sulfur Dioxide)</li>
<li>NO₂ (Nitrogen Dioxide)</li>
<li>O₃ (Ozone)</li>
<li>CO (Carbon Monoxide)</li>
</ul>
</li>
<li>Timestamped readings collected as continuous data streams.</li>
<li>
Experimental logs from two deployment scenarios:
<ul>
<li><strong>Local environment:</strong> All components deployed on the same machine.</li>
<li><strong>Remote environment:</strong> Sensors physically separated from the processing unit.</li>
</ul>
</li>
<li>
<strong>Performance metrics collected during experiments:</strong>
<ul>
<li>Network traffic (bytes transmitted)</li>
<li>CPU utilization</li>
<li>Memory consumption</li>
<li>End-to-end latency</li>
</ul>
</li>
<li>Filter configurations and CEP rule definitions used for event detection.</li>
</ul>
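<p>
To give a sense of what a CEP rule for event detection might look like, the sketch below detects a sustained exceedance: a configurable number of consecutive readings above a threshold. It is written in plain Python for illustration and does not use the Apache Flink API. The function name <code>sustained_exceedance</code>, the threshold, and the sample values are hypothetical and do not come from the rule definitions shipped with the dataset.
</p>

```python
from typing import Iterable, Iterator, Tuple


def sustained_exceedance(
    readings: Iterable[Tuple[str, float]],
    threshold: float,
    count: int,
) -> Iterator[str]:
    """Yield the timestamp at which `count` consecutive values have
    exceeded `threshold`. Each uninterrupted run raises one alert."""
    run = 0
    for timestamp, value in readings:
        # Extend the run on an exceedance, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run == count:
            yield timestamp


# Illustrative (timestamp, value) stream; the numbers are made up.
stream = [("t0", 30.0), ("t1", 60.0), ("t2", 70.0),
          ("t3", 80.0), ("t4", 20.0), ("t5", 90.0)]
alerts = list(sustained_exceedance(stream, threshold=50.0, count=3))
```

<p>
Here a single alert is raised at <code>t3</code>, the third consecutive reading above the threshold. The isolated exceedance at <code>t5</code> does not trigger one.
</p>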
<h3>Data Structure</h3>
<p>
The dataset is structured to support reproducibility and reuse:
</p>
<ul>
<li>Time-series records, each consisting of a timestamp and the corresponding pollutant measurements.</li>
<li>System performance logs aligned with experimental runs.</li>
<li>Configuration files defining filtering thresholds and CEP rules.</li>
<li>Scenario identification metadata (local vs. remote deployment).</li>
</ul>
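<p>
As a rough guide to working with records of this shape, the following sketch parses timestamped pollutant readings from CSV text. The file format, the column names (<code>timestamp</code>, <code>scenario</code>, <code>pm25</code>, and so on), and the sample rows are assumptions made for illustration. The actual files in the dataset may use a different layout and should be checked before adapting this code.
</p>

```python
import csv
import io
from datetime import datetime
from typing import Dict, List, TextIO

# Hypothetical column names for the six monitored pollutants.
POLLUTANTS = ["pm25", "pm10", "so2", "no2", "o3", "co"]

# Placeholder rows, not real measurements from the dataset.
SAMPLE = """timestamp,scenario,pm25,pm10,so2,no2,o3,co
2025-01-01T00:00:00,local,12.0,20.5,3.1,15.2,40.0,0.4
2025-01-01T00:01:00,local,12.4,21.0,3.0,15.0,41.2,0.4
"""


def load_records(stream: TextIO) -> List[Dict[str, object]]:
    """Parse CSV rows into dicts with a datetime timestamp,
    a scenario label, and float pollutant values."""
    records = []
    for row in csv.DictReader(stream):
        record: Dict[str, object] = {
            "timestamp": datetime.fromisoformat(row["timestamp"]),
            "scenario": row["scenario"],
        }
        for name in POLLUTANTS:
            record[name] = float(row[name])
        records.append(record)
    return records


records = load_records(io.StringIO(SAMPLE))
```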
<h3>Methodology Context</h3>
<p>The experiments compare:</p>
<ul>
<li>Baseline transmission (without filtering) versus filtered transmission.</li>
<li>Resource consumption (CPU and memory) across the local and remote deployment scenarios.</li>
<li>Latency impact of physical distance between sensors and processing nodes.</li>
</ul>
<p>
The results show:
</p>
<ul>
<li>Up to ~30% reduction in network traffic.</li>
<li>Up to ~19% reduction in CPU consumption.</li>
<li>Latency is influenced more strongly by the physical distance between sensors and processing nodes than by filtering itself.</li>
</ul>
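<p>
Reductions such as these are typically expressed relative to the unfiltered baseline. The sketch below shows that calculation under the assumption that each figure compares a baseline run against a filtered run of the same experiment. The function name <code>percent_reduction</code> and the byte counts are illustrative, not values from the experimental logs.
</p>

```python
def percent_reduction(baseline: float, filtered: float) -> float:
    """Return the reduction of `filtered` relative to `baseline`, in percent.

    A positive result means the filtered run used less of the resource.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - filtered) / baseline


# Made-up byte counts for a baseline run and a filtered run.
traffic_reduction = percent_reduction(baseline=1000.0, filtered=700.0)
```

<p>
The same function applies to CPU utilization or memory consumption, provided both runs are measured over the same workload and duration.
</p>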
<h3>Reuse Potential</h3>
<p>
This dataset can be reused for:
</p>
<ul>
<li>Research in real-time stream processing.</li>
<li>Complex Event Processing evaluation.</li>
<li>Edge Computing performance studies.</li>
<li>Network optimization experiments.</li>
<li>Benchmarking of air quality monitoring systems.</li>
<li>Smart city and IoT research.</li>
<li>Comparative analysis between cloud-centric and edge-based architectures.</li>
</ul>
<p>
Researchers may replicate the experimental setup, validate performance trade-offs,
benchmark alternative stream processing engines, or test new event detection rules.
</p>