MapD CPU-GPU workload balance


#1

Hi, how does MapD balance the workload between CPU and GPU? I am working with MapD, and when I run a specific query it displays the following error message: “Query couldn’t keep the entire working set of columns in GPU memory”.

The dataset I have is around 1,150M rows and I am working with an 11 GB GPU (a little bit short for that dataset size, I know). The query is the following:
SELECT a, b, c FROM myTable WHERE a = 'a' AND b = 2 AND c = 3.0 GROUP BY a, b, c;
Here a is a dictionary-encoded string (4 bytes), b is a SMALLINT (2 bytes), and c is a FLOAT (4 bytes). Therefore, the data involved in the query is about 1,150M rows × 10 bytes ≈ 11.5 GB, which is more than fits in the 11 GB of GPU memory (not all of which is available to the query in any case).
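For context, a table definition matching those column types would look roughly like this (just a sketch; assuming the default 4-byte DICT(32) encoding for a, which is where the 4 bytes above come from):

CREATE TABLE myTable (
  a TEXT ENCODING DICT(32),
  b SMALLINT,
  c FLOAT
);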

So I can more or less understand the error message. However, I have read in some papers that MapD should offload part of the query to the CPU in order to run it. Am I wrong? Did I miss something? Or should I tune MapD in some way to allow this balancing?

Thanks in advance.


#2

By default, the entire filtering/grouping/joining has to be performed on the GPU.

The CPU does just the projection and possibly a final reduction.
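As a quick sanity check you can also force the whole query onto the CPU for your session (assuming your mapdql build supports the \cpu and \gpu commands; the query below is just yours again):

mapdql> \cpu
mapdql> SELECT a, b, c FROM myTable WHERE a = 'a' AND b = 2 AND c = 3.0 GROUP BY a, b, c;
mapdql> \gpu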

Try to use:

enable-watchdog = false (this should allow the data to be sent to the GPU in chunks)
enable-cpu-retry = true
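For completeness, a minimal sketch of how those options might look in the server config file (assuming a standard mapd.conf read by mapd_server at startup; the port and data path below are only the usual defaults, keep your own values and restart the server afterwards):

port = 9091
data = "/var/lib/mapd/data"
enable-watchdog = false
enable-cpu-retry = true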

For reference, see this forum topic: TSocket::write_partial() send() <Host: localhost Port: 9091> Broken pipe
#3

Can confirm. I also had this issue and changing this setting corrected the problem.