<h1>GPUZIP v2.0 Reproducibility Dataset</h1>
<p>This dataset provides all the necessary materials to reproduce the results presented in the GPUZIP v2.0 article. It is organized into folders, each containing a <code>README.md.txt</code> file that describes its contents and explains how to interpret the files.</p>
<p><strong>Note:</strong> This dataset is organized as a directory structure, so for better visualization change the "View type" to "Tree" before exploring the dataset through this web application.</p>
<h2>Types of Files</h2>
<p>The repository contains the following file types:</p>
<ul>
<li><b>.md.txt</b>: Markdown-formatted README files. For best readability, open them in a Markdown viewer such as VS Code (see <a href="https://github.com/mundimark/awesome-markdown-editors">this list of Markdown editors</a> for alternatives); any plain-text reader (e.g., Notepad, <code>cat</code>, <code>vi</code>, <code>nano</code>) can also display them.</li>
<li><b>.*.zipfile</b>: ZIP-compressed files (standard ZIP archives that use the <code>.zipfile</code> extension instead of <code>.zip</code>). A file named <code>name.extension.zipfile</code> (e.g., <code>large-mod.su.zipfile</code>) must be unzipped to recover the original file (e.g., <code>large-mod.su</code>). Throughout the documentation, files are always referred to by their uncompressed extensions (e.g., <code>.su</code>). To ensure consistency and avoid confusion, unzip all <code>.zipfile</code> files before exploring the repository; the unzipping scripts provided below do this in one step.</li>
<li><b>.xlsx</b>: Excel files. Compatible with LibreOffice, Google Sheets, and Numbers.</li>
<li><b>.par</b>: Configuration files for proprietary RTM runs. Readable with any text editor.</li>
<li><b>.hdr</b>: Header files for velocity models. Refer to <code>Datasets/HowToReadDatasetFiles.md.txt</code> for details.</li>
<li><b>.bin</b>: Raw binary data files containing velocity models in float format. See <code>Datasets/HowToReadDatasetFiles.md.txt</code> for parsing instructions.</li>
<li><b>.data</b>: Binary data files, similar to <code>.bin</code>.</li>
<li><b>.su</b>: Seismic Unix files containing seismic traces. Refer to <code>Datasets/HowToReadDatasetFiles.md.txt</code> for details.</li>
<li><b>.png, .jpg, .jpeg, .gif</b>: Rendered visuals of velocity models or diagrams.</li>
<li><b>.qdrep</b>: Nsight Systems profiling files. Compatible with Nsight Systems 2024.01.1.</li>
</ul>
<h2>Root Directory Contents</h2>
<h3><b>Datasets/</b></h3>
<p>Contains input datasets, including velocity models, seismic traces, and configurations. Detailed information is provided in <code>Datasets/HowToReadDatasetFiles.md.txt</code>.</p>
<h3><b>DataWarmUp/</b></h3>
<p>Holds results from compressor calibration experiments, including raw data, logs, and the compiled <code>.xlsx</code> summaries. Experiments were conducted with two shots. See <code>DataWarmUp/README.md.txt</code> for more information.</p>
<h3><b>GeometryScript/</b></h3>
<p>Utility script for rendering the shot distributions of the datasets, useful for visualizing the experimental setups.</p>
<h3><b>NSight/</b></h3>
<p>This folder contains a subset of Nsight profiling files for the Marmousi3D dataset, covering all compressors and a cache size of two across all checkpointing algorithms. If needed, contact the authors for additional profiling data.</p>
<h3><b>Quality/</b></h3>
<p>Contains the quality-assessment results for all shots (Section 7.6). See <code>Quality/README.md.txt</code>.</p>
<h3><b>TimeBreakdown/</b></h3>
<p>Complete results for Section 7.4 of the GPUZIP v2.0 article. This folder includes detailed breakdowns of two-shot experiments. See <code>TimeBreakdown/README.md.txt</code> for details.</p>
<h3><b>SpeedupAndMemory.xlsx</b></h3>
<p>Comprehensive data used to generate Figure 6 and Table 4 (Sections 7.2 and 7.1) of the article.</p>
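<h2>Extra: Utility for Unzipping All Files</h2>
<p>The scripts below recursively search the current directory for <code>.zipfile</code> files, extract each one into the folder where it resides, and delete the archive only if extraction succeeds. Run them from the repository root to make data exploration more fluid.</p>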
<h3>Windows (.bat)</h3>
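<pre><code>@echo off
rem Recursively extract every *.zipfile next to itself and delete the archive
rem only after a successful extraction. Uses the tar.exe bundled with
rem Windows 10 version 1803 and later, because PowerShell's Expand-Archive
rem rejects archives whose extension is not .zip.
setlocal
for /r %%f in (*.zipfile) do (
    echo Decompressing: %%f
    rem "%%~dpf." avoids a trailing backslash right before the closing quote
    tar -xf "%%f" -C "%%~dpf."
    if not errorlevel 1 (
        echo Decompressed successfully: %%f
        del "%%f"
    ) else (
        echo Failed to decompress: %%f
    )
)
echo All zip files processed.
pause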
</code></pre>
<h3>Shell script (macOS, Linux, Unix)</h3>
<pre><code>#!/bin/bash
# Recursively extract every *.zipfile next to itself; delete the archive
# only if unzip reports success.
# IFS= keeps leading/trailing spaces in file names intact.
find . -type f -name "*.zipfile" | while IFS= read -r zipfile; do
    echo "Decompressing: $zipfile"
    if unzip -o "$zipfile" -d "$(dirname "$zipfile")"; then
        echo "Successfully decompressed: $zipfile"
        rm -- "$zipfile"
    else
        echo "Failed to decompress: $zipfile"
    fi
done
echo "All zip files processed."
</code></pre>
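<p>Because both scripts delete each archive after extracting it, you may prefer to preview the archives first. The following sketch (the function name <code>preview_zipfiles</code> is hypothetical) lists every <code>.zipfile</code>, the name it should expand to under the <code>name.extension.zipfile</code> convention described above, and the contents reported by <code>unzip -l</code>, without extracting or deleting anything.</p>
<pre><code>#!/bin/bash
# Dry run: show what each *.zipfile contains without extracting or deleting it.
preview_zipfiles() {
    local root="${1:-.}"
    find "$root" -type f -name "*.zipfile" | while IFS= read -r zipfile; do
        echo "Archive:  $zipfile"
        # Strip the .zipfile suffix, e.g. large-mod.su.zipfile -> large-mod.su
        echo "Expected: ${zipfile%.zipfile}"
        # List the archive members (names, sizes, dates) only
        unzip -l "$zipfile"
        echo
    done
}

# Usage: bash preview_zipfiles.sh [DIRECTORY]   (defaults to the current folder)
preview_zipfiles "${1:-.}"
</code></pre>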
<h2>How Do I Read .bin, .data, and .su Files?</h2>
<p>See: <code>Datasets/HowToReadDatasetFiles.md.txt</code></p>
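<p>For a quick sanity check before following that guide, the shell sketch below prints the size of a <code>.bin</code> or <code>.data</code> file and its first few values. It assumes the file holds headerless 32-bit IEEE floats in the host machine's byte order, and the function name <code>peek_floats</code> is hypothetical. The actual layout is documented in <code>Datasets/HowToReadDatasetFiles.md.txt</code>; if it differs from these assumptions, adjust the <code>od</code> options accordingly.</p>
<pre><code>#!/bin/bash
# Sketch: inspect a raw .bin/.data file.
# ASSUMPTION: headerless 32-bit IEEE floats in the host's byte order.
peek_floats() {
    local f="$1" count="${2:-8}" bytes
    # wc -c prints "SIZE NAME" (possibly with leading spaces); keep SIZE only
    bytes=$(wc -c "$f" | awk '{print $1}')
    echo "file:   $f"
    echo "bytes:  $bytes"
    echo "floats: $((bytes / 4))"
    if [ $((bytes % 4)) -ne 0 ]; then
        echo "warning: size is not a multiple of 4; the float assumption may be wrong"
    fi
    # -A d: decimal byte offsets, -t f4: 4-byte floats, -N: stop after COUNT values
    od -A d -t f4 -N "$((count * 4))" "$f"
}

# Run only when given a file: bash peek_floats.sh FILE [COUNT]
if [ $# -gt 0 ]; then
    peek_floats "$@"
fi
</code></pre>
<p>For example, <code>bash peek_floats.sh path/to/model.bin 10</code> (the path is a placeholder) prints the file size, the implied number of floats, and the first 10 values. A file size that is not a multiple of 4, or values that look implausible for a velocity model, suggests the assumptions above do not hold for that file.</p>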
<h2>How Do I Read .par and .hdr Files?</h2>
<p>See: <code>Datasets/HowToReadDatasetFiles.md.txt</code></p>
<h2>How Do I Interpret Log Files?</h2>
<p>To analyze cache hits, misses, and memory consumption, refer to the logs in the <code>TimeBreakdown</code> folder (<code>decom-*.txt</code> files). Key metrics can be extracted as follows:</p>
<ul>
<li><b>Cache Hits</b>: Search for <code>RET_HIT</code>.</li>
<li><b>Cache Misses</b>: Search for <code>RET_MIS</code>.</li>
<li><b>Prefetched Items</b>: Search for <code>===> Prefetching:</code>.</li>
<li><b>Prefetch Action Vector (PAV)</b>: Search for <code>PAV:</code>.</li>
<li><b>Memory Consumption</b>: Search for <code>[MEM_TRACK]</code>.</li>
<li><b>Checkpoint Pool Size</b>: Search for <code>Checkpoint Pool Size</code>.</li>
</ul>
<p>Each log file concludes with a summary from Nsight.</p>
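<p>The searches listed above can be batched. The shell sketch below walks <code>TimeBreakdown/</code> (or a folder passed as the first argument) and, for every <code>decom-*.txt</code> log, counts the lines containing each marker, then prints the first <code>Checkpoint Pool Size</code> line and the last <code>[MEM_TRACK]</code> line verbatim (the latter is the final memory sample only if entries are written chronologically, which is an assumption). It relies on nothing beyond the marker strings above; note that <code>grep -c</code> counts matching lines, so a line reporting several prefetched items counts once. The function name <code>summarize_log</code> is hypothetical.</p>
<pre><code>#!/bin/bash
# Sketch: summarize the log markers for each decom-*.txt file.
# NOTE: grep -c counts matching LINES, not individual occurrences.
summarize_log() {
    local log="$1"
    echo "== $log"
    printf '  cache hits (RET_HIT):   %s\n' "$(grep -c -F 'RET_HIT' "$log")"
    printf '  cache misses (RET_MIS): %s\n' "$(grep -c -F 'RET_MIS' "$log")"
    printf '  prefetching lines:      %s\n' "$(grep -c -F '===> Prefetching:' "$log")"
    printf '  PAV lines:              %s\n' "$(grep -c -F 'PAV:' "$log")"
    # Print these raw lines as-is; their exact format is not assumed here.
    grep -F 'Checkpoint Pool Size' "$log" | head -n 1
    grep -F '[MEM_TRACK]' "$log" | tail -n 1
}

# Run from the repository root: bash summarize_logs.sh [FOLDER]
find "${1:-TimeBreakdown}" -type f -name 'decom-*.txt' | while IFS= read -r log; do
    summarize_log "$log"
done
</code></pre>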