FURI | Spring 2025
A Benchmark Suite for Evaluating AI/ML Inference Queries

Inference queries, which nest AI/ML model inferences inside SQL queries (for example, SELECT tid, cid FROM transactions, customers WHERE transactions.cid = customers.id AND dnn_predict(transactions.*, customers.*) = FALSE), are in high demand across industries such as retail, healthcare, finance, and homeland security, where they power personalized product recommendation, conversational intelligence, fraud detection, and more. These applications all rely on performing AI/ML inferences over relational databases. Many systems supporting inference queries have emerged recently, including Amazon Redshift, Google BigQuery, GaussML, Raven, EvaDB, and PostgresML. However, existing AI/ML benchmarks such as MLPerf focus on model performance, while database benchmarks such as TPC-H and TPC-DS focus on query performance. There is no high-quality benchmark for inference queries that can compare the performance of different systems that support and optimize such queries, or of different optimization algorithms that co-optimize SQL and ML.
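The query pattern above can be sketched in a few lines. The following is a minimal, hypothetical example (not part of the proposed benchmark) that registers a toy stand-in for dnn_predict as a SQL user-defined function in SQLite and runs the joined inference query over illustrative tables; in a real system the UDF would invoke a trained deep neural network.

```python
import sqlite3

def dnn_predict(amount, avg_spend):
    # Hypothetical rule standing in for model inference:
    # flag transactions far above the customer's average spend as fraud.
    return amount > 5 * avg_spend  # True = predicted fraud

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, avg_spend REAL)")
conn.execute("CREATE TABLE transactions (tid INTEGER PRIMARY KEY, cid INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 100.0), (2, 40.0)])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [(10, 1, 90.0), (11, 1, 800.0), (12, 2, 35.0)])

# Register the "model" as a SQL user-defined function so it can
# appear directly in a WHERE clause, just like dnn_predict above.
conn.create_function("dnn_predict", 2, dnn_predict)

# Inference query: join transactions with customers and keep only rows
# the model predicts as non-fraudulent (predicate evaluates to FALSE).
rows = conn.execute(
    "SELECT tid, cid FROM transactions, customers "
    "WHERE transactions.cid = customers.id "
    "AND dnn_predict(transactions.amount, customers.avg_spend) = FALSE"
).fetchall()
print(rows)  # transaction 11 (800.0 vs. avg 100.0) is filtered out
```

Even this toy version shows why co-optimizing SQL and ML matters: the optimizer must decide, for instance, whether to evaluate the (potentially expensive) model predicate before or after the join.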
To address this gap, our research will present a new and comprehensive benchmark suite providing 100 inference queries that range from simple to complex in both query logic and model complexity. The queries combine varying numbers and types of AI/ML models with varying numbers and types of relational operators, and they are built on real-world datasets such as IMDB. They will cover realistic AI/ML models ranging from simple linear regression and decision tree models to deep neural networks, convolutional models, transformers, and large language models (LLMs).