Discussions

800GB data set query

  • 1.  800GB data set query

    Posted 02-07-2019 08:43

    Hi,
    I have an 800GB dataset loaded into MapD.
    It has 2,059,791,295 rows.
    I have been trying to run the following query:

    SELECT Descr_Column, Date_dt, SUM(Price)/SUM(Quantity) AS Price
    FROM Data_Week04
    WHERE Quantity > 0
    GROUP BY 1, 2
    The query didn't complete even once.
    On the web server, I got the error:
    Exception: Failed to run the cardinality estimation query: Query couldn’t keep the entire working set of columns in GPU memory

    The configuration I am using on GCP is:
    n1-highmem-64 (64 vCPUs, 416 GB memory)
    4 x NVIDIA Tesla P100

    I also changed the following flags in mapd.conf:
    allow-cpu-retry = true
    allow-loop-joins

    I am wondering whether MapD is meant for datasets smaller than 800GB, or whether I am doing something wrong.

    Regards,
    Anant



  • 2.  RE: 800GB data set query

    Posted 02-07-2019 09:50

    Hi @anantmb,

    Assuming Data_Week04 is a table, some quick (and pessimistic) math says the query needs just 28 bytes per row, so keeping the columns involved in GPU memory would take around 54GB, while the 4 Tesla P100s have a combined 64GB. Since the raw columns fit, your query is most likely failing because of the combined cardinality of Descr_Column and Date_dt: the GROUP BY needs one output slot per distinct (Descr_Column, Date_dt) pair, and it is that working set that no longer fits.
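
    For reference, here is that back-of-envelope math as a small Python sketch. The per-column widths (4 bytes for a dictionary-encoded Descr_Column, 8 bytes each for Date_dt, Price, and Quantity) are assumptions about typical MapD column types, not read from your actual schema:

    rows = 2_059_791_295                  # row count from the original post
    bytes_per_row = 4 + 8 + 8 + 8         # Descr_Column + Date_dt + Price + Quantity (assumed widths)
    gib = rows * bytes_per_row / 1024**3
    print(round(gib, 1))                  # -> 53.7, i.e. roughly 54GB

    Because MapD stores data column-wise, only these four referenced columns need to be resident, not the full 800GB table.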

    You can try disabling the watchdog by setting enable-watchdog = false in mapd.conf, or running the query in CPU mode.
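
    For example (a sketch; restart the server after editing the conf file):

    # in mapd.conf
    enable-watchdog = false

    Alternatively, to run just this query on CPU, the mapdql client has a \cpu command that switches the session to CPU execution:

    \cpu
    SELECT Descr_Column, Date_dt, SUM(Price)/SUM(Quantity) AS Price
    FROM Data_Week04
    WHERE Quantity > 0
    GROUP BY 1, 2;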