800gb data set Query


#1

Hi,
I have an 800GB dataset loaded on MapD.
It has 2,059,791,295 Rows.
I have been trying to run the following query.

SELECT Descr_Column,Date_dt ,sum(Price)/sum(Quantity) as Price from Data_Week04
where Quantity >0
group by 1,2
The query didnt complete once.
On the web server, I had got the error
Exception: Failed to run the cardinality estimation query: Query couldn’t keep the entire working set of columns in GPU memory

The configuration I am using on GCP was
n1-highmem-64 (64 vCPUs, 416 GB memory)
4 x NVIDIA Tesla P100

I also had changed the flags
in Mapd.conf
allow-cpu-retry = True
allow-loop-joins

I am wondering if MapD is meant for smaller datasets than 800GB or am I doing something wrong.

Regards,
Anant


#2

hi @anantmb,

assuming Data_Week04 is a table, doing a fast (and pessimistic) math the query would need just 28 bytes per row so the memory needed to keep the columns involved in memory would be around 54GB and 4 Tesla P100 has a total memory of 64GB, your query is failing because of combined cardinality of Desc_column and Date_dt.

You can try to disable watchdog setting the parameter to enable-watchdog = false or running the query in cpu mode