Multi-thread query


Hello, everyone.

I want to know whether I can execute multi-threaded queries on the same table in MapD or not.

I modified the file mapdql.cpp to call the function thrift_with_retry from multiple threads, but the results looked as if the queries were executed serially.


Hi @hchen,

You are right: queries on the system run in serial mode. If you run multiple queries, the first one is executed and the others are placed on a queue, so that a single query can use all the resources available on your hardware.


Thank you very much.

And I also have another question. If I execute a SQL statement on a machine with 8 GPUs, and the query touches fewer than 32 million rows (a single fragment) but the size of that data exceeds the memory of a single GPU, the query cannot run.

So I want to know whether MapD balances data across GPUs or not, and if so, what the balancing plan is.


I’m not exactly an expert with multi-GPU systems, but as you said, I guess a single fragment is going to be processed on a single GPU only.

To change this behavior you have two options:

  1. re-create the table with a smaller fragment size
  2. re-create the table partitioned with sharding
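As a sketch of the two options in MapD's `CREATE TABLE` DDL (the table and column names here are just illustrative, and the sizes are examples — the default fragment size is 32 million rows):

```sql
-- Option 1: a smaller fragment size, so the data splits into more
-- fragments that can be spread across the GPUs
CREATE TABLE flights_small_frags (
  carrier TEXT ENCODING DICT,
  dep_delay INTEGER
) WITH (fragment_size = 4000000);

-- Option 2: shard the table on a key column
CREATE TABLE flights_sharded (
  carrier TEXT ENCODING DICT,
  dep_delay INTEGER,
  SHARD KEY (carrier)
) WITH (shard_count = 8);
```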

Here is a link to the docs.

If you are joining two tables with a significant number of records, sharding would be the right choice; you need to partition both tables with the same key.


Thank you for your reply.

As you said, a single fragment is executed on a single GPU only. So when I execute a query, the first fragment of the data is loaded into GPU0, then the second fragment into GPU1, and so on.

So it is likely that GPU0 will be full while the other GPUs still have plenty of free memory, or are even empty. In that case, what does MapD do to balance data between GPUs?



As I said before, you could use sharding; if you have 8 GPUs, use a shard count of 8 when you create the table.


Hi @hchen, to add to what @aznable said: currently most query execution proceeds single file, but we generally employ all the resources of the system to answer each query as quickly as possible (meaning we are not single-threaded — in fact, queries on a big system often run on hundreds of thousands of GPU threads and tens of CPU threads!). This can be the optimal choice for maximum throughput, in the sense that queries don’t have to share resources or suffer coordination/locking overhead from other queries. In short, we’ve made a conscious choice to prioritize intra-query data parallelism over inter-query parallelism.

That said, we are on a continuous quest to maximize throughput and minimize latency in our system. Part of our effort in this regard is reducing query execution times, which both reduces latency and increases throughput. While OmniSci Core is in general quite fast, there are definitely known areas that we plan to improve (e.g., large reductions between GPUs or compilation overhead).

Regarding inter-query parallelism, we already have a small measure of it, in the sense that the parser/optimization pass, as well as Thrift result serialization and/or PNG creation (for render queries), can occur in parallel for multiple queries. This may sound small, but for our typical fast-path queries, which can take under 20ms, it adds a significant amount of parallelism/throughput to the system.

That said, we recognize that there are times when more full inter-query parallelism is warranted, particularly so that queries that under-utilize all system resources can run in parallel. Stay tuned on this one. :slight_smile:


Thank you for your reply


Thank you for your detailed reply. I will keep an eye on the progress of OmniSci, especially query optimization.


If I set the shard count when I create the table, the number of fragments will equal the shard count. Is that true?


Well, the easy answer is yes, at least from what I can monitor in the system. But you should take special care when sharding tables, because you have to choose the right column to get a uniform distribution of rows across the shards.

Sharding is especially useful when you have to join two tables. In this case, you should shard on the join key, so that only the rows of the outer table needed for the join are sent to each GPU, speeding up operations and saving memory, bandwidth, and system resources.
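A sketch of what sharding both sides of a join on the same key looks like (the table and column names are hypothetical):

```sql
-- Both tables sharded on the same key with the same shard count,
-- so matching rows land on the same GPU and the join stays local.
CREATE TABLE orders (
  customer_id INTEGER,
  amount FLOAT,
  SHARD KEY (customer_id)
) WITH (shard_count = 8);

CREATE TABLE customers (
  customer_id INTEGER,
  name TEXT ENCODING DICT,
  SHARD KEY (customer_id)
) WITH (shard_count = 8);

-- A join on the shard key can then run per-GPU without shuffling
-- the full outer table across devices:
SELECT c.name, SUM(o.amount)
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name;
```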