Sanzu: A Data Science Benchmark
With the rapid growth in data, the need to efficiently analyze data has become
paramount. As a result, data science is rising in importance. Data
science provides a systematic approach for processing and analyzing
data. Although a number of frameworks and data systems have emerged to
support the data science workflow, there is no standard benchmark to
evaluate them. We developed Sanzu, a benchmark for data science. It
includes a micro benchmark to test individual operations and a macro
benchmark to represent real-world use-cases.
Sanzu is publicly available to researchers and academic users. The latest release (version 0.7) can be downloaded here.