Discussions

Expand all | Collapse all

Query Elapsed Time does not increase linearly.

  • 1.  Query Elapsed Time does not increase linearly.

    Posted 19 days ago
    Query Elapsed Time does not increase linearly as the scale factor increases. What do you think is the biggest difference when scale factor increases from 32G to 64G?

    Below is a table that measures the query execution time when in warm cache.

    System Specification
    CPU:i7-8700K(3.7GHz)
    GPU:GTX 1080Ti 11G
    RAM:32G


    #Core

    ------------------------------
    Jeon Woohyuck
    [[Unknown]]
    ------------------------------


  • 2.  RE: Query Elapsed Time does not increase linearly.

    Posted 19 days ago
    Edited by Candido Dessanti 19 days ago
    Hi @Jeon Woohyuck,

    Assuming you are talking about TPC-H schema and you are on mapd 4.4.x, it's likely that on a 64GB scale the data needed by the queries didn't fit on RAM of your GPU, so the optimizer, to avoid a CPU fallback, execute the query on several trips, unloading and loading the data to the GPU memory as needed OR you are on another release and the optimizer is falling back to a CPU execution

    First scenario:

    with the Q1 on of 30GB

    SELECT l_returnflag, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice*(1-l_discount)) AS sum_disc_price, sum(l_extendedprice*(1-l_discount)*(1+l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM lineitem WHERE l_shipdate <= DATE '1996-12-01' - INTERVAL '90' DAY (3) GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag;

    the query takes 1248ms and the memory allocated is 6415.57 MB and it's running on a single trip

    turning on the enable-debug-timer parameter you will see in output like that:

    I0305 09:50:09.712998 129067 measure.h:73] Timer start execution_dispatch_run operator(): 1056
    I0305 09:50:09.713070 129067 measure.h:73] Timer start fetchChunks fetchChunks: 1682
    I0305 09:50:09.713325 129067 measure.h:80] Timer end fetchChunks fetchChunks: 1682 elapsed 0 ms
    I0305 09:50:09.713392 129067 measure.h:73] Timer start executePlanWithGroupBy executePlanWithGroupBy: 2092
    I0305 09:50:09.713407 129067 measure.h:73] Timer start lauchGpuCode launchGpuCode: 97
    I0305 09:50:10.844782 129067 measure.h:80] Timer end lauchGpuCode launchGpuCode: 97 elapsed 1131 ms
    I0305 09:50:10.844820 129067 measure.h:80] Timer end executePlanWithGroupBy executePlanWithGroupBy: 2092 elapsed 1131 ms
    I0305 09:50:10.844846 129067 measure.h:80] Timer end execution_dispatch_run operator(): 1056 elapsed 1131 ms

    Adding 30gb of data to the table the GPU memory needed is 12831.14GB, so the execution is and the query's time increase to 3357ms (2.7x) and in the log you will find several messages from the memory manager. Here is extract

    I0305 10:00:08.908902 94987 measure.h:80] Timer end execution_dispatch_run operator(): 1056 elapsed 886 ms
    I0305 10:00:08.909087 94993 measure.h:73] Timer start fetchChunks fetchChunks: 1682
    I0305 10:00:08.909134 94993 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 2875000 Number pages requested 500000 Best Eviction Start Slab 0 GPU_MGR:0
    I0305 10:00:08.933188 94993 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 0 Number pages requested 500000 Best Eviction Start Slab 1 GPU_MGR:0
    I0305 10:00:08.957132 94993 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 0 Number pages requested 500000 Best Eviction Start Slab 4 GPU_MGR:0
    I0305 10:00:08.981112 94993 BufferMgr.cpp:425] ALLOCATION failed to find 32000000B free. Forcing Eviction. Eviction start 3752714 Number pages requested 62500 Best Eviction Start Slab 0 GPU_MGR:0
    I0305 10:00:08.984266 94993 BufferMgr.cpp:425] ALLOCATION failed to find 32000000B free. Forcing Eviction. Eviction start 937500 Number pages requested 62500 Best Eviction Start Slab 5 GPU_MGR:0
    I0305 10:00:08.987442 94993 BufferMgr.cpp:425] ALLOCATION failed to find 128000000B free. Forcing Eviction. Eviction start 250000 Number pages requested 250000 Best Eviction Start Slab 3 GPU_MGR:0
    I0305 10:00:08.999441 94993 measure.h:80] Timer end fetchChunks fetchChunks: 1682 elapsed 90 ms
    I0305 10:00:08.999529 94993 measure.h:73] Timer start executePlanWithGroupBy executePlanWithGroupBy: 2092
    I0305 10:00:08.999537 94993 measure.h:73] Timer start lauchGpuCode launchGpuCode: 97
    I0305 10:00:09.149390 94993 measure.h:80] Timer end lauchGpuCode launchGpuCode: 97 elapsed 149 ms
    I0305 10:00:09.149436 94993 measure.h:80] Timer end executePlanWithGroupBy executePlanWithGroupBy: 2092 elapsed 149 ms
    I0305 10:00:09.149449 94993 measure.h:80] Timer end execution_dispatch_run operator(): 1056 elapsed 1127 ms
    I0305 10:00:09.149472 94992 measure.h:73] Timer start fetchChunks fetchChunks: 1682
    I0305 10:00:09.149579 94992 measure.h:80] Timer end fetchChunks fetchChunks: 1682 elapsed 0 ms
    I0305 10:00:09.149632 94992 measure.h:73] Timer start executePlanWithGroupBy executePlanWithGroupBy: 2092
    I0305 10:00:09.149646 94992 measure.h:73] Timer start lauchGpuCode launchGpuCode: 97
    I0305 10:00:09.370874 94992 measure.h:80] Timer end lauchGpuCode launchGpuCode: 97 elapsed 221 ms
    I0305 10:00:09.370887 94992 measure.h:80] Timer end executePlanWithGroupBy executePlanWithGroupBy: 2092 elapsed 221 ms
    I0305 10:00:09.370935 94992 measure.h:80] Timer end execution_dispatch_run operator(): 1056 elapsed 1348 ms
    I0305 10:00:09.370981 94990 measure.h:73] Timer start fetchChunks fetchChunks: 1682
    I0305 10:00:09.371073 94990 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 500000 Number pages requested 500000 Best Eviction Start Slab 4 GPU_MGR:0
    I0305 10:00:09.395375 94990 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 1000000 Number pages requested 500000 Best Eviction Start Slab 4 GPU_MGR:0
    I0305 10:00:09.417973 94990 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 1500000 Number pages requested 500000 Best Eviction Start Slab 0 GPU_MGR:0
    I0305 10:00:09.439924 94990 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 2164416 Number pages requested 500000 Best Eviction Start Slab 3 GPU_MGR:0
    I0305 10:00:09.462505 94990 measure.h:80] Timer end fetchChunks fetchChunks: 1682 elapsed 91 ms
    I0305 10:00:09.462582 94990 measure.h:73] Timer start executePlanWithGroupBy executePlanWithGroupBy: 2092
    I0305 10:00:09.462592 94990 measure.h:73] Timer start lauchGpuCode launchGpuCode: 97
    I0305 10:00:09.610553 94990 measure.h:80] Timer end lauchGpuCode launchGpuCode: 97 elapsed 147 ms
    I0305 10:00:09.610569 94990 measure.h:80] Timer end executePlanWithGroupBy executePlanWithGroupBy: 2092 elapsed 147 ms
    I0305 10:00:09.610580 94990 measure.h:80] Timer end execution_dispatch_run operator(): 1056 elapsed 1588 ms
    I0305 10:00:09.610630 94991 measure.h:73] Timer start fetchChunks fetchChunks: 1682
    I0305 10:00:09.610745 94991 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 2664416 Number pages requested 500000 Best Eviction Start Slab 3 GPU_MGR:0
    I0305 10:00:09.634245 94991 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 2250000 Number pages requested 500000 Best Eviction Start Slab 1 GPU_MGR:0
    I0305 10:00:09.656515 94991 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 1250000 Number pages requested 500000 Best Eviction Start Slab 2 GPU_MGR:0
    I0305 10:00:09.678789 94991 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 1750000 Number pages requested 500000 Best Eviction Start Slab 2 GPU_MGR:0
    I0305 10:00:09.701083 94991 measure.h:80] Timer end fetchChunks fetchChunks: 1682 elapsed 90 ms
    I0305 10:00:09.701160 94991 measure.h:73] Timer start executePlanWithGroupBy executePlanWithGroupBy: 2092
    I0305 10:00:09.701169 94991 measure.h:73] Timer start lauchGpuCode launchGpuCode: 97
    I0305 10:00:09.921865 94991 measure.h:80] Timer end lauchGpuCode launchGpuCode: 97 elapsed 220 ms
    I0305 10:00:09.921881 94991 measure.h:80] Timer end executePlanWithGroupBy executePlanWithGroupBy: 2092 elapsed 220 ms
    I0305 10:00:09.921892 94991 measure.h:80] Timer end execution_dispatch_run operator(): 1056 elapsed 1899 ms
    I0305 10:00:09.921943 94994 measure.h:73] Timer start fetchChunks fetchChunks: 1682
    I0305 10:00:09.922060 94994 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 2250000 Number pages requested 500000 Best Eviction Start Slab 2 GPU_MGR:0
    I0305 10:00:09.947115 94994 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 1500000 Number pages requested 500000 Best Eviction Start Slab 4 GPU_MGR:0
    I0305 10:00:09.971230 94994 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 0 Number pages requested 500000 Best Eviction Start Slab 0 GPU_MGR:0
    I0305 10:00:09.995249 94994 BufferMgr.cpp:425] ALLOCATION failed to find 256000000B free. Forcing Eviction. Eviction start 500000 Number pages requested 500000 Best Eviction Start Slab 0 GPU_MGR:0
    I0305 10:00:10.025403 94994 BufferMgr.cpp:425] ALLOCATION failed to find 128000000B free. Forcing Eviction. Eviction start 3820642 Number pages requested 250000 Best Eviction Start Slab 0 GPU_MGR:0
    I0305 10:00:10.037401 94994 measure.h:80] Timer end fetchChunks fetchChunks: 1682 elapsed 115 ms
    I0305 10:00:10.037493 94994 measure.h:73] Timer start executePlanWithGroupBy executePlanWithGroupBy: 2092
    I0305 10:00:10.037501 94994 measure.h:73] Timer start lauchGpuCode launchGpuCode: 97
    I0305 10:00:10.187234 94994 measure.h:80] Timer end lauchGpuCode launchGpuCode: 97 elapsed 149 ms
    I0305 10:00:10.187294 94994 measure.h:80] Timer end executePlanWithGroupBy executePlanWithGroupBy: 2092 elapsed 149 ms
    I0305 10:00:10.187307 94994 measure.h:80] Timer end execution_dispatch_run operator(): 1056 elapsed 2165 ms

    As said before looks the executor processing one fragment after another adding some overhead to the process (memory manager, multiples GPUs launches and so on) making the linearity of the execution time impossible.

    In the second scenario the optimizer is falling back the execution to CPU, so for each fragment (12 if you are using the default fragments size) a CPU thread is because you have just 4C/8T on your CPU the execution time of Q1 and Q6 is slower

    Cutting the cores/thread on my system get 3613ms for Q1 and 6119ms for Q6.

    If you want to be sure what's happening on your system, I suggest to turn the enable-debug-timer to true on your and search for launchGpuCode or launchCpuCode on log

    On omnisci 4.5 the behavior is different and query fall back to CPU execution. You can also save memory using the new encoding for the date datatype and use the appropriate precision for decimal datatypes.

    This DDL would lead to a quite big memory saving

    CREATE TABLE (
    L_ORDERKEY INTEGER,
    L_PARTKEY INTEGER,
    L_SUPPKEY INTEGER,
    L_LINENUMBER INTEGER,
    L_QUANTITY DECIMAL(8,2) ENCODING FIXED(32),
    L_EXTENDEDPRICE DECIMAL(8,2) ENCODING FIXED(32),
    L_DISCOUNT DECIMAL(4,2) ENCODING FIXED(16),
    L_TAX DECIMAL(4,2) ENCODING FIXED(16),
    L_RETURNFLAG TEXT ENCODING DICT(8),
    L_LINESTATUS TEXT ENCODING DICT(8),
    L_SHIPDATE DATE ENCODING FIXED(16),
    L_COMMITDATE DATE ENCODING FIXED(16),
    L_RECEIPTDATE DATE ENCODING FIXED(16),
    L_SHIPINSTRUCT TEXT ENCODING DICT(8),
    L_SHIPMODE TEXT ENCODING DICT(8),
    L_COMMENT TEXT ENCODING DICT(32))

    ------------------------------
    Candido Dessanti
    Dba
    consulting
    Rome
    ------------------------------



  • 3.  RE: Query Elapsed Time does not increase linearly.

    Posted 18 days ago

    My results are like this.





    I don't have the following log.

    Did you use the 'allow-cpu-retry=true' option?

    ------------------------------
    Jeon Woohyuck
    [[Unknown]]
    ------------------------------



  • 4.  RE: Query Elapsed Time does not increase linearly.

    Posted 18 days ago
    Hi @Jeon Woohyuck,

    the allow_cpu_retry parameter has to be set explicitly to true on mapd_server with versions less than 4.5; from 4.5 it's turned on by default


    ------------------------------
    Candido Dessanti
    Dba
    Crismatica consulting
    Rome
    ------------------------------