A stream-processing tool for filtering terabytes of GitHub Archive data on consumer hardware. Outputs to Parquet/JSONL with zero storage overhead.
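
As an illustration of the streaming approach described above, here is a minimal Python sketch, not the tool's actual implementation. It assumes the input is a gzip-compressed stream with one JSON event per line (the layout of GH Archive's hourly dumps) and covers only the JSONL output path. The names `stream_events` and `filter_to_jsonl` are hypothetical and chosen for this example. Events are decompressed, filtered, and written one at a time, so the raw archive is never held in memory or staged on disk.

```python
import gzip
import io
import json
from typing import BinaryIO, Callable, Iterator, TextIO


def stream_events(raw: BinaryIO) -> Iterator[dict]:
    """Yield one event per line from a gzip-compressed JSONL stream.

    Hypothetical helper: decompression happens incrementally, so only
    the current line is held in memory.
    """
    with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def filter_to_jsonl(
    raw: BinaryIO,
    keep: Callable[[dict], bool],
    out: TextIO,
) -> int:
    """Write every event accepted by `keep` to `out` as JSONL.

    Hypothetical helper: returns the number of events written.
    """
    written = 0
    for event in stream_events(raw):
        if keep(event):
            out.write(json.dumps(event) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # Small in-memory stand-in for one hourly archive file.
    sample = [
        {"type": "PushEvent", "repo": {"name": "octo/demo"}},
        {"type": "WatchEvent", "repo": {"name": "octo/demo"}},
    ]
    payload = gzip.compress(
        "\n".join(json.dumps(e) for e in sample).encode("utf-8")
    )
    sink = io.StringIO()
    count = filter_to_jsonl(
        io.BytesIO(payload),
        lambda e: e.get("type") == "WatchEvent",
        sink,
    )
    print(count, sink.getvalue(), end="")
```

In a real run, `raw` could plausibly be an HTTP response body for an hourly file such as `https://data.gharchive.org/2015-01-01-15.json.gz`, opened with `urllib.request.urlopen`, and `out` a file opened for writing. Parquet output is left out of this sketch because it needs a third-party library.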