I’d like to scan a large 40 million row table of IP addresses for a set of suspicious addresses that I am interested in. I formulated the query/subquery below - and this works perfectly on another database I’m using (postgres) but it does not work on mapd (both databases are running on the same server).
mapdql> select sip_dip_string from suspicious_ips where sip_dip_string not in (select sip_dip_string from netflow_data);
Exception: Query would require a scan without a limit on table(s): netflow_data
Is there a way to increase the limit on how many rows can be returned by a subquery? Also, how can I go about measuring how much of my 40 million row table I can fit into GPU memory. Is there documentation I could consult to begin to get a deeper level of the tuning characteristics of mapd and how my data set fits in available memory?