Batched execution for large data sets


#1

Hi,

I have the following questions regarding query processing in MapD-core:

  1. I understand that if the data involved in query processing is larger than the GPU memory, MapD-core throws the following exception:

Exception: Query couldn’t keep the entire working set of columns in GPU memory

Is there an easy way to add batched execution, i.e. dividing the input data into smaller batches so that each batch fits in GPU memory, and combining the results after all batches are done? I would appreciate some pointers, such as which files would be involved in such a change.

  2. I believe that at query execution time, if the required data is not in GPU memory, it is transferred from the CPU. Could you please let me know where the functions that implement this transfer are located?

Thanks,
Harshada


#2

You can try setting the enable-watchdog parameter to false, either in mapd.conf or as a switch when you launch the mapd server.

There is also another parameter, allow-cpu-retry, that you can experiment with.
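For illustration, the two settings might look like this in mapd.conf. This is only a sketch: it assumes the usual `key = value` layout of that file, and the value shown for allow-cpu-retry is just an example to experiment with, not a recommendation.

```
# mapd.conf (fragment, assumed key = value layout)
enable-watchdog = false   # let queries run even if the working set does not fit in GPU memory
allow-cpu-retry = true    # example value: allow a failed GPU query to be retried on CPU
```

The server needs to be restarted for changes in mapd.conf to take effect.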


#3

Thank you.

I think turning enable-watchdog off is what I am looking for.

-Harshada


#4

Hi,

We do support a mode like the one you describe, where processing is done in batches. You must disable the watchdog to allow it to happen: --enable-watchdog=false.

There are potentially still issues executing down this path, as some of the cardinality checking will still expect to be able to get a full column into memory.

I would recommend you explore the \cpu mode of query operation when you know your query is going to cause thrashing of all the GPU-cached columns due to a very large batch operation. In mapdql you can use \cpu to tell the MapD engine to run the following queries in CPU-only mode. To go back to GPU mode, the command is \gpu.
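To make the general idea of batched execution concrete, here is a minimal Python sketch. It is not MapD's implementation and uses no MapD API; all names (`PartialAgg`, `batches`, `aggregate_batch`, `batched_average`) are hypothetical. It shows an aggregate computed batch by batch, with each batch small enough to fit in a limited memory budget, and the partial results merged at the end.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


@dataclass
class PartialAgg:
    """Partial aggregate state that can be merged across batches."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: "PartialAgg") -> "PartialAgg":
        # COUNT and SUM combine by simple addition.
        return PartialAgg(self.count + other.count, self.total + other.total)


def batches(values: Sequence[float], batch_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most batch_size elements."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


def aggregate_batch(batch: Sequence[float]) -> PartialAgg:
    """Stand-in for the per-batch work that would run on the device."""
    return PartialAgg(len(batch), float(sum(batch)))


def batched_average(values: Sequence[float], batch_size: int) -> Optional[float]:
    """Compute AVG(values) without ever holding more than one batch at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    result = PartialAgg()
    for batch in batches(values, batch_size):
        result = result.merge(aggregate_batch(batch))
    # AVG is derived from the merged SUM and COUNT, not by averaging averages.
    return result.total / result.count if result.count else None
```

Aggregates such as COUNT, SUM, MIN and MAX merge this way directly. Operations that need a global view of a column, such as the cardinality checks mentioned above, are harder to split into independent batches.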

regards

