Benchmarks

Quick Jump: Latest Snapshot | Performance | Correctness

Among other metrics, the PaSh continuous integration infrastructure tracks the performance and correctness of the main branch over time. Some of these results, covering the most recent commits, are shown below; longer commit history, other branches, and more configurations are available through the control subsystem.

One can also view these results in textual form. For example, to get the latest 5 builds, run curl -s ci.binpa.sh | head; to get details about a specific commit, run curl -s ci.binpa.sh/f821b4b.

Latest Snapshot

The next few plots show results only for the latest commit on the main branch.

Classic Unix One-liners

This benchmark set contains 9 pipelines written by Unix experts: a few pipelines come from Unix legends (e.g., Top-N, Spell), one from a book on Unix scripting, and a few from top Stack Overflow answers. Pipelines contain 2-7 stages (avg.: 5.2), ranging from scalable, CPU-intensive stages (e.g., the grep stage in Nfa-regex) to non-parallelizable ones (e.g., the diff stage in Diff). Inputs are script-specific and average 10GB per benchmark.
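To illustrate the style of these one-liners (this is a sketch in the classic McIlroy Top-N tradition, not necessarily the exact script in the suite), a "most frequent words" pipeline looks roughly like:

```shell
# Sample input standing in for the ~10GB benchmark inputs.
printf 'To be or not to be that is the question\n' > input.txt

# Top-N: print the N most frequent words in a text stream.
tr -cs 'A-Za-z' '\n' < input.txt |   # split into one word per line
  tr 'A-Z' 'a-z' |                   # normalize case
  sort |                             # group identical words
  uniq -c |                          # count each group
  sort -rn |                         # most frequent first
  head -n 10                         # keep the top 10
```

The grep-like stages of such pipelines process each line independently and thus parallelize well, whereas a stage like diff needs both inputs in full.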

Unix50 from Bell Labs

This benchmark includes 36 pipelines solving the Unix 50 game. The pipelines were designed to highlight Unix's modular philosophy, make extensive use of standard commands under a variety of flags, and appear to be written by non-experts. PaSh executes each pipeline as-is, without any modification.

Mass-transit analytics during COVID-19

This benchmark set contains 4 pipelines that were used to analyze real telemetry data from bus schedules during the COVID-19 response in a large European city. The pipelines compute several average statistics on the transit system per day, such as daily serving hours and daily number of vehicles. Pipelines range between 9 and 10 stages (avg.: 9.2), use typical Unix staples such as sed, sort, and uniq, and operate on a fixed 34GB dataset that contains mass-transport data collected over a single year.
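A sketch of a "daily number of vehicles" computation in this style, assuming a hypothetical date,vehicle_id CSV layout (the real telemetry schema is not shown here):

```shell
# Hypothetical telemetry sample: one (date, vehicle) record per sighting.
printf '2020-03-01,bus42\n2020-03-01,bus42\n2020-03-01,bus7\n2020-03-02,bus7\n' > telemetry.csv

sort -u telemetry.csv |   # drop duplicate (date, vehicle) pairs
  cut -d ',' -f 1 |       # keep only the date
  uniq -c                 # distinct vehicles seen per day
```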

Performance

The next few plots show the performance of the benchmarks on the main branch over time, as a time series indexed by commit ID.

Classic Unix One-liners

This is the same benchmark set described under Latest Snapshot above.

Unix50 from Bell Labs

This is the same benchmark set described under Latest Snapshot above.

Mass-transit analytics during COVID-19

This is the same benchmark set described under Latest Snapshot above.

Dependency Untangling

While the JIT engine operates as if invoked on every region, PaSh is engineered to spawn a long-running, stateful compilation server just once, feeding it compilation requests until the execution of the script completes. This design has two benefits: (1) it reduces run-time overhead by avoiding reinitializing the compiler for each compilation request; and (2) it allows maintaining and querying past compilation results when compiling a new fragment. The latter lets PaSh untangle dependencies across regions, finding and exploiting opportunities for cross-region parallel execution. This set contains several benchmarks, including log processing/parsing, media conversion, genome computation, and compression applications.
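The long-running-server pattern described above can be sketched in shell with a named pipe; this is NOT PaSh's actual protocol, just a minimal illustration of initializing a "compiler" once and feeding it requests until the run completes:

```shell
rm -f /tmp/compile_requests /tmp/compile_results
mkfifo /tmp/compile_requests

# "Server": initialize once, then handle requests until told to quit.
while read -r region; do
  [ "$region" = quit ] && break
  echo "compiled: $region" >> /tmp/compile_results
done < /tmp/compile_requests &

# "Client": hold the pipe open on fd 3 and feed requests one by one,
# so the server sees a single stream rather than one EOF per request.
exec 3> /tmp/compile_requests
echo region1 >&3
echo region2 >&3
echo quit    >&3
exec 3>&-
wait
cat /tmp/compile_results
```

Keeping fd 3 open for the whole run is what makes the server stateful across requests; opening and closing the pipe per request would end the server's read loop after the first write.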

Average Temperature

Contains a large script that downloads and processes multi-year temperature data from across the US.
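A sketch of the final averaging step, assuming a hypothetical whitespace-separated layout with the temperature in column 2 (the real script also downloads and unpacks the multi-year archives first):

```shell
# Hypothetical station readings: name, then temperature.
printf 'station1 50\nstation2 70\nstation3 60\n' > temps.txt

# Average the second column.
awk '{ sum += $2; n++ } END { print sum / n }' temps.txt
```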

NLP

Contains several scripts from Unix for Poets, a tutorial on building natural-language-processing programs out of standard Unix utilities.
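One technique from Unix for Poets is bigram counting: put one word per line, then pair each word with its successor and count the pairs. A minimal sketch (the suite's scripts may differ in detail):

```shell
printf 'to be or not to be\n' > poem.txt
tr -sc 'A-Za-z' '\n' < poem.txt > words.txt   # one word per line
tail -n +2 words.txt |                        # the same words, shifted by one
  paste words.txt - |                         # word<TAB>next-word pairs
  sort | uniq -c | sort -rn                   # most frequent bigram first
```

(The final word pairs with an empty field; a real script would filter it out.)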

WebIndex

A large multi-stage script for web crawling and indexing, using a variety of third-party and built-in utilities.

Correctness

This section tracks statistics across the test suites that check the correctness of individual PaSh components.

Benchmark          Passed    Failed   Untested  Unresolved  Unsupported  Other
Compiler Tests     54/54     0/54     0/54      0/54        0/54         0/54
Intro Tests        2/2       0/2      0/2       0/2         0/2          0/2
Interface Tests    39/39     0/39     0/39      0/39        0/39         0/39
Annotation Tests   1/1       1/1      1/1       1/1         1/1          1/1
Aggregator Tests   108/109   1/109    0/109     0/109       0/109        0/109
Smoosh Tests       10/10     0/10     0/10      0/10        0/10         0/10
Posix Tests        375/494   41/494   31/494    6/494       40/494       1/494