Discussions

Expand all | Collapse all

Problem with joining two tables

  • 1.  Problem with joining two tables

    Posted 09-12-2019 03:12

    Hello,

    I'm going to try to explain my problem as much as I can.

    So, basically, table A has 588360324 rows, and table B has 146991045. By simple filtering table A for sID = 999888 (select count(*) from A where sID=999888) I'm getting 35424 results.

    When I'm trying to execute query:


    select sum(A.val) val1,A.sDate,A.sID from A join B on B.sID=A.sID and B.id=A.bID where A.sID=999888 group by A.sDate,A.sID;

    I'm getting an error:

    Projection query output result set on table(s): B would contain 146991045 rows, which is more than the current system limit of 128000000.

    Is there any option or solution which I can use to solve my problem and execute query properly? Maybe I can change the limit? When I'm trying to query both tables separately (without joins) there is no problem.

    My hardware is PowerEdge R720 with 2xK80 (seen as 4 cards 11 GB RAM each), 40 core processor, 128 GB RAM. I'm using opensource version.


    #Uncategorized
    #Core
    #General


  • 2.  RE: Problem with joining two tables

    Posted 09-12-2019 04:21
    Hi,

    it's the watchdog exception that's blocking your query, so you can disable the watchdog, then retry running your query.
    To disable add the line enable-watchdog=false to you omnisci.conf file, then restart the server.

    Said that looks a but to me because I don't see any reason why there would be an intermediate projection of the B table in such a simple query.

    Have you tried to shard the tables? Sharding could be beneficial for performance


  • 3.  RE: Problem with joining two tables

    Posted 09-12-2019 05:02
    Edited by Archi K 09-12-2019 05:06
    Hi!

    Thank You for the answer. Now server is trying to execute this query. After couple of seconds I'm getting an error:

    Exception: Fatal error while attempting to build hash tables for join: Hash tables with more than 2B entries not supported yet

    I'm new to omnisci to be honest and I'm testing its capabilities. I've read about sharded tables but haven't used it yet. Do You think this error is caused by the fact my tables are not sharded? If it is, I will try sharding later but as I understand I have to reload my data to them. I planned sharding my tables later at performance optimization stage, but I'm struggling to perform such a simple queries which is frustrating :)


  • 4.  RE: Problem with joining two tables

    Posted 09-12-2019 06:08
    Hi,

    the table sharding wanted to be a suggestion to improve the performance, so I cannot saidd that sharding tables the query will run.

    The fact you are waiting a couple of seconds before get an error it doesn't sound good. I run without any problem query like that

    select r_name,n_name,sum(o_totalprice), count() from orders_1500m join customers_150m on o_custkey = c_custkey join nation on n_nationkey = c_nationkey join region on n_regionkey=r_regionkey group by 1,2

    in less than 1000ms (filtered or not) with just 2 gaming GPUs. The big difference is that you are using two keys to join tables.  Could you add verbose=yes and enable-debug-timer = true, re-run the query and share the [mapd_storage]/data/mapd_log/omnisci_server.INFO, to better undestand what's happening?


  • 5.  RE: Problem with joining two tables

    Posted 29 days ago
    Edited by Archi K 29 days ago
    Hello!

    I've performed those operations and attached requested file.

    omnisci_server.INFO contents:

    https://pastebin.com/QyLuPjqC

    If there is no way to solve this problem, next thing I'm thinking about is to use one keys instead of two as You mentioned and check if it's going to give any results. To do this I have to denormalize the data and merge those keys to have one composed of both of them as I've done before to analyse the data in Analysis Services. Or maybe put those two tables together into one (all columns together) and then import data to omnisci and analyse in omnisci this way. 

    But now I'll just wait for You, maybe there is an answer in my INFO file :)

    I want to thank You for Your help and great commitment in my issue!




  • 6.  RE: Problem with joining two tables

    Posted 28 days ago
    Hi @Archi K,

    thanks for posting the log and sorry for the late reply

    enabling the filter pushdown setting the parameter enable-filter-push-down to true could help to lower the cardinality of the join, making the query run, but te query performance will not be optimal, because the large projection on the right table, so it would be better, if possible to have a single key column to join such high cardinalities tables.

    if the enable filter pushdown is working, you can to the ETL work needed in omnisci database using the pseudo column that every omnisci's table has.

    so you can do a create table as select for the bigger table using a narrow key-range, then complete with some IAS.

    e.g.

    create table a as
    select f1,f2,f3...,fn, b.rownum as fk_to_table_b
    from a join b on b.sID=a.sID and b.id=a.bID 
    where b.sID between 1 and 10;
    
    insert into table a
    select f1,f2,f3...,fn, b.rownum as fk_to_table_b
    from a join b on b.sID=a.sID and b.id=a.bID 
    where b.sID between 11 and 20;
    
    and so on.

    the do the same for the b table without the join.

    as I told you before, sharding tables on the new FK/PK created; also changing the fragment size to 64000000 would help speeding up the join


  • 7.  RE: Problem with joining two tables

    Posted 27 days ago
    Hello @Candido Dessanti,

    Unfortunately, enabling filter pushdown setting doesn't work as well, exception message is the same. So the next thing I'm going to do is to use one pregenerated key instead of those two as I did it in MS Analysis Services. It has been successfully tested before for data integrity so this step is behind me already :) I'll be back with info if it works.

    Again, thank You for Your help!



  • 8.  RE: Problem with joining two tables

    Posted 26 days ago
    Hi,

    I'm sorry it's not working; I'll try to reproduce doing some because I haven't any dataset with two keys with such cardinality.

    With just one key I didn't see any reason why I shouldn't work


  • 9.  RE: Problem with joining two tables

    Posted 26 days ago
    Hello!

    I think it will be fine. For now I have to postpone the tests a little as I have to prepare the data, setup replications and it requires some processes to be finished. I'll write a post as soon as I finish with all the one-key omnisci test database.

    All the best!