GPU Memory Vs Normal Memory

  • 1.  GPU Memory Vs Normal Memory

    Posted 21 days ago
    Hi,

    I am doing a POC for OmniSci on a GPU card.

    Question 1: How can we find out what data volume can run on the GPU?
    Question 2: When I deleted records from the table, the memory requirement still did not change. What could be the reason?

    POC Description
    GPU Card: NVIDIA TESLA P4 (8GB)
    CPU: Intel Xeon Silver 4114 (10C/20T) x 2
    RAM: 192 GB
    HDDs: 600GB (SAS II) x 6

    Test Data
    Telecom system access Data
    Data Volume : 600 Million to 1000 Million Records
    Data Volume : 35GB

    ------

    omnisql> select count(*)/1000000 from TELECOM_CDR;
    1015

    select SERVICE_DATE, sum(TOTAL_BYTES)/1024/1024/1024 from TELECOM_CDR group by 1;

    I0802 17:28:14.137711 306323 MapDHandler.cpp:5620] stdlog MapDHandler.cpp 748 sql_execute 857 reportdb report 803-c8fQ {"query_str","execution_time_ms","total_time_ms"} {"select count(*)/1000000 from TELECOM_CDR;","856","857"}
    I0802 17:28:33.225072 306323 Calcite.cpp:432] User report catalog reportdb sql 'select SERVICE_HOUR, sum(TOTAL_BYTES)/1024/1024/1024 from TELECOM_CDR group by 1;'
    I0802 17:28:33.240665 306323 Calcite.cpp:452] Time in Thrift 1 (ms), Time in Java Calcite server 14 (ms)
    W0802 17:28:33.247790 306323 QueryFragmentDescriptor.cpp:265] Not enough memory on device 0 for input chunks totaling 7072000000 bytes (available device memory: 7062729523 bytes)
    I0802 17:28:33.248360 306323 RelAlgExecutor.cpp:64] Query unable to run in GPU mode, retrying on CPU
    I0802 17:28:34.323828 306854 BufferMgr.cpp:291] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
    I0802 17:28:35.772059 306850 BufferMgr.cpp:291] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
    I0802 17:28:37.243136 306841 BufferMgr.cpp:291] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
    I0802 17:28:37.524581 306323 MapDHandler.cpp:5620] stdlog MapDHandler.cpp 748 sql_execute 4300 reportdb report 803-c8fQ {"query_str","execution_time_ms","total_time_ms"} {"select SERVICE_HOUR, sum(TOTAL_BYTES)/1024/1024/1024 from TELECOM_CDR group by 1;","4300","4300"}


    omnisql> select count(*)/1000000 from TELECOM_CDR;
    694
    omnisql>
    omnisql> select SERVICE_HOUR, sum(TOTAL_BYTES)/1024/1024/1024 from TELECOM_CDR group by 1;

    I0802 17:28:33.225072 306323 Calcite.cpp:432] User report catalog reportdb sql 'select SERVICE_HOUR, sum(TOTAL_BYTES)/1024/1024/1024 from TELECOM_CDR group by 1;'
    I0802 17:28:33.240665 306323 Calcite.cpp:452] Time in Thrift 1 (ms), Time in Java Calcite server 14 (ms)
    W0802 17:28:33.247790 306323 QueryFragmentDescriptor.cpp:265] Not enough memory on device 0 for input chunks totaling 7072000000 bytes (available device memory: 7062729523 bytes)
    I0802 17:28:33.248360 306323 RelAlgExecutor.cpp:64] Query unable to run in GPU mode, retrying on CPU
    I0802 17:28:34.323828 306854 BufferMgr.cpp:291] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
    I0802 17:28:35.772059 306850 BufferMgr.cpp:291] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
    I0802 17:28:37.243136 306841 BufferMgr.cpp:291] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
    I0802 17:28:37.524581 306323 MapDHandler.cpp:5620] stdlog MapDHandler.cpp 748 sql_execute 4300 reportdb report 803-c8fQ {"query_str","execution_time_ms","total_time_ms"} {"select SERVICE_HOUR, sum(TOTAL_BYTES)/1024/1024/1024 from TELECOM_CDR group by 1;","4300","4300"}


  • 2.  RE: GPU Memory Vs Normal Memory

    Posted 21 days ago
    Edited by Candido Dessanti 20 days ago
    Hi,
    You are being blocked by the fragment memory manager introduced in release 4.5. It is falling back to CPU execution because it estimates that the query is going to need more than 90% of GPU memory.

    To change this behavior, you can try setting gpu-input-mem-limit to 1 in the omnisci.conf file, then restart the instance.

    The memory needed by the query is the number of records multiplied by the width of the columns needed for the group by, the aggregates and the filtering. Judging by the estimate, the second query is probably using datatypes totaling 8 to 12 bytes per row, which multiplied by 1015M rows would need more than 8 to 12 GB.
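
    As a rough sketch of that arithmetic: the snippet below multiplies a row count by an assumed per-row width and compares the result with the "available device memory" figure from the log. The column widths and the choice of an INTEGER group-by column with a BIGINT aggregate are assumptions, since the DDL for TELECOM_CDR was not shared, and the function names are only illustrative.

```python
# Rough estimate of GPU input memory for a query: rows x bytes per row.
# The widths below are assumed fixed-width sizes, not values read from the server.
ASSUMED_WIDTH_BYTES = {
    "INTEGER": 4,
    "BIGINT": 8,
    "TEXT ENCODING DICT(32)": 4,
    "DATE ENCODING DAYS(32)": 4,
}


def estimate_input_bytes(row_count, column_types):
    """Return row_count multiplied by the summed width of the touched columns."""
    bytes_per_row = sum(ASSUMED_WIDTH_BYTES[t] for t in column_types)
    return row_count * bytes_per_row


def fits_on_gpu(needed_bytes, available_bytes):
    """True when the estimated input size is within the available device memory."""
    return needed_bytes <= available_bytes


# Assumed query shape: GROUP BY on an INTEGER column, SUM over a BIGINT column,
# which gives 12 bytes per row over the 1015 million rows reported by count(*).
needed = estimate_input_bytes(1_015_000_000, ["INTEGER", "BIGINT"])
available = 7_062_729_523  # "available device memory" value copied from the log
print(needed, fits_on_gpu(needed, available))  # prints: 12180000000 False
```

    Under these assumptions the estimate is about 12.2 GB, well above what the card offers, so narrower datatypes or fewer rows would be needed for the query to stay on the GPU.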

    Could you share your DDLs? Changing the datatypes might help.

    After a massive delete you should run an optimize table statement. This removes the logically deleted rows.

    https://docs.omnisci.com/latest/5_tables.html




  • 3.  RE: GPU Memory Vs Normal Memory

    Posted 13 days ago
    Hi @Sumit Srivastava

    In reply to this message:

    Objective: Performance Testing on Omnisci Database
    Issue Observed: Performance Slowness
    Hardware Setup Detail
    Intel X86
    CPU : 2 x Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
    RAM: 128 GB
    HDD: 5 x 600GB (SAS II)
    GPU: NVIDIA P4 (8GB)
    omnisql> \memory_summary
    OmniSci Server CPU Memory Summary:
                MAX            USE      ALLOCATED           FREE
       102854.44 MB        0.00 MB        0.00 MB        0.00 MB
    OmniSci Server GPU Memory Summary:
    [GPU]            MAX            USE      ALLOCATED           FREE
      [0]     7483.94 MB        0.00 MB        0.00 MB        0.00 MB
    POC Test
    No of 01
    Table Detail
    omnisql> \d TELCO_CDR
    CREATE TABLE TELCO_CDR (
    SE_DATE DATE ENCODING DAYS(32),
    SE_HOUR INTEGER,
    SE_NAME TEXT ENCODING DICT(32),
    MOBILE TEXT ENCODING DICT(32),
    TOTAL_BYTES BIGINT,
    BYTES_IN BIGINT,
    BYTES_OUT BIGINT)
    omnisql>
    Data Loading
    -- Loading .csv Files
    -- Data 6GB
    -- No of Records: 100 Million
    Parallel Activities
    -- Data Loading
    -- Running 1 and 2 Aggregate query in parallel
    Non Aggregate query
    -- select SUM(TOTAL_BYTES) from TELCO_CDR where MOBILE = 'XXXX';
    -- Running this query for 3000 Unique Mobile Numbers with sleep
    Aggregate Query
    -- select SE_HOUR, SUM(TOTAL_BYTES) from TELCO_CDR group by 1 order by 1; [10 Second Sleep between each query]
    -- Select SE_NAME, SUM(TOTAL_BYTES)/1024/1024/1024  from TELCO_CDR group by 1 having SUM(TOTAL_BYTES)/1024/1024/1024 >10 order by 1  [15 seconds sleep between each query]
    When I run the above 4 tasks in parallel, system performance gets bad and many times I do not get an answer.

    Given the cardinality of se_hour and se_name, I think the problem is the performance of ORDER BY, which at the moment runs really slowly on OmniSci.

    On my workstation, while data is being loaded into the table, the timings are the following:

    q1 between 90 and 160ms (1 row returned)
    q2 between 120 and 240ms (1693 rows returned)
    q3 between 2600 and 3500ms (14144 rows returned)

    se_hour has 1693 distinct values
    mobile and se_name have 7897225 distinct values

    Without ORDER BY:
    q2 between 112 and 160ms
    q3 between 2400 and 2500ms

    Everything runs on the GPU because the memory needed is around 2 GB.

    My configuration is 2x RTX 2080 Ti and a 12c/24t CPU with 128 GB of RAM.