Exception: Hash tables with more than 2B entries not supported yet


I am running into exception “Exception: Hash tables with more than 2B entries not supported yet” when trying to join two tables with 4b rows each. Does this mean at this time I can not run joins which results in more than 2b operations?


no it means you cant join a table with another one with such cardinality because the software is unable to build an hash table with more than 2b of distinct values

so if you have a fact table with 10B of rows and you join with a dimension table with 1 millions of rows the query will work because it has to build an hash table on outer table with just 1 million entries, but it will fail with two table of 3b rows (1-1 relation)

this is an educated guess, maybe i am wrong.

anyway you can try with key sharding; create both tables with columns involved on join as shard key (never tried, i dont know if it works on single GPUs system) with a shard count big enough to makes the number of key per shard lower than the hard limit you are hitting