Hi @hchen, to add to what @aznable said: currently queries mostly execute single file (one at a time), but we employ all the resources of the system to answer each query as quickly as possible (so we are not single-threaded — in fact, queries on a big system often run on hundreds of thousands of GPU threads and tens of CPU threads!). This can be the optimal choice for maximum throughput, in the sense that queries don't have to share resources or suffer coordination/locking overhead from other queries. In short, we've made a conscious choice to prioritize intra-query data parallelism over inter-query parallelism.
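To make "intra-query data parallelism" concrete, here is a minimal illustrative sketch (plain Python, not OmniSci code): a single query's scan is partitioned across a pool of workers, and the per-worker partial results are reduced at the end. The function names and the worker count are made up for illustration.

```python
# Illustrative sketch only: one query's work is split across all
# available workers, rather than running multiple queries side by side.
from concurrent.futures import ThreadPoolExecutor

def count_matching(chunk, predicate):
    # Each worker scans its own partition of the column data.
    return sum(1 for value in chunk if predicate(value))

def parallel_count(column, predicate, workers=8):
    # Split the column into one contiguous partition per worker.
    step = max(1, len(column) // workers)
    chunks = [column[i:i + step] for i in range(0, len(column), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_matching, chunks, [predicate] * len(chunks))
    # Reduce the per-worker partial counts into the final answer.
    return sum(partials)
```

Because every worker is dedicated to the same query, there is no cross-query contention; the trade-off is that a second query arriving mid-scan has to wait its turn.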
That said, we are on a continuous quest to maximize throughput and minimize latency in our system. Part of our effort in this regard is to reduce query execution times, which both reduces latency and increases throughput. While OmniSci Core is in general quite fast, there are definitely known areas we plan to improve (e.g. large reductions between GPUs and compilation overhead).
Regarding inter-query parallelism, we already have a small measure of it: the parse/optimize pass, as well as Thrift result serialization and/or PNG creation (for render queries), can occur in parallel across multiple queries. This may sound minor, but for our typical fast-path queries, which can take under 20ms, it adds a significant amount of parallelism/throughput to the system.
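A rough sketch of that overlap, under my own simplified assumptions (this is not the actual server code): even if execution itself is serialized behind a single lock, the parse step of one query can proceed while another query is executing, which is a meaningful win when queries finish in milliseconds.

```python
# Hedged sketch: execution is single file (one lock), but parsing and
# result handling for different queries can overlap it.
import threading
import time

execute_lock = threading.Lock()  # models single-file query execution

def handle_query(sql, results):
    parsed = sql.strip().lower()   # parse/optimize: runs in parallel
    with execute_lock:             # execute: one query at a time
        time.sleep(0.001)          # stand-in for kernel execution
        row = f"result-of:{parsed}"
    results.append(row)            # serialization could also overlap

def run_concurrently(queries):
    results = []
    threads = [threading.Thread(target=handle_query, args=(q, results))
               for q in queries]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With, say, a 5ms parse and a 15ms execute, two concurrent queries finish in roughly 35ms instead of 40ms, because the second query's parse hides behind the first query's execution.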
That said, we recognize that there are times when fuller inter-query parallelism is warranted, particularly so that queries which under-utilize system resources can run in parallel. Stay tuned on this one.