Sanzu: A Data Science Benchmark

With the rapid growth in data, the need to analyze data efficiently has become paramount. As a result, data science, which provides a systematic approach for processing and analyzing data, is rising in importance. Although a number of frameworks and data systems have emerged to support the data science workflow, there is no standard benchmark to evaluate them. We developed Sanzu, a benchmark for data science. It includes a micro benchmark to test individual operations and a macro benchmark to represent real-world use cases.

Sanzu is publicly available to researchers and academic users. The latest release (version 0.7) can be downloaded here.