Hi @pmdy,
We try to reply to the community questions ASAP.
Anyway I changed a little the DDL sharing dictionaries of the text columns you are using to do the semi join and the query is performing margially better (256ms instead of 400ms on a GTX 1050ti using GPU mode and a quad core i7 on CPU mode)
the modified DDL is
CREATE TABLE nyc_trees_2015_683k_sd (
created_at DATE ENCODING DAYS(16),
tree_id TEXT ENCODING NONE,
block_id TEXT ENCODING DICT(32),
the_geom TEXT ENCODING NONE,
tree_dbh SMALLINT,
stump_diam SMALLINT,
curb_loc TEXT ENCODING DICT(8),
status TEXT ENCODING DICT(8),
health TEXT ENCODING DICT(8),
spc_latin TEXT ENCODING DICT(8),
spc_common TEXT ENCODING DICT(8),
steward TEXT ENCODING DICT(8),
guards TEXT ENCODING DICT(8),
sidewalk TEXT ENCODING DICT(8),
user_type TEXT ENCODING DICT(8),
problems TEXT ENCODING DICT(8),
root_stone TEXT ENCODING DICT(8),
root_grate TEXT,
root_other TEXT,
trnk_wire TEXT ENCODING DICT(8),
trnk_light TEXT ENCODING DICT(8),
trnk_other TEXT ENCODING DICT(8),
brnch_ligh TEXT ENCODING DICT(8),
brnch_shoe TEXT ENCODING DICT(8),
brnch_othe TEXT ENCODING DICT(8),
address TEXT ENCODING DICT(32),
zipcode TEXT ENCODING DICT(8),
zip_city TEXT ENCODING DICT(8),
cb_num TEXT ENCODING DICT(8),
borocode TEXT ENCODING DICT(8),
boroname TEXT ENCODING DICT(8),
cncldist TEXT ENCODING DICT(8),
st_assem TEXT ENCODING DICT(8),
st_senate TEXT ENCODING DICT(8),
nta TEXT ENCODING DICT(8),
nta_name TEXT ENCODING DICT(8),
boro_ct TEXT ENCODING DICT(16),
state TEXT ENCODING DICT(8),
latitude FLOAT,
longitude FLOAT,
x_sp FLOAT,
y_sp FLOAT,
SHARED DICTIONARY (root_grate) REFERENCES nyc_trees_2015_683k(root_stone),
SHARED DICTIONARY (root_other) REFERENCES nyc_trees_2015_683k(root_stone));
to reload the table an IAS is quite fast
insert into nyc_trees_2015_683k_sd select * from nyc_trees_2015_683k;
probably it would be better to share dictionaries of trnk and brnch too.
You can read about shared dictionaries here
to further speed-up the query (but I haven’t checked the results in full), I re-wrote the query against the dictionary shared table (It’s not going to work on the original table) this way
select
ma.block_id,
(
select
avg(mac.latitude)
from
nyc_trees_2015_683k_sd as mac
where (
KEY_FOR_STRING( mac.root_grate) = KEY_FOR_STRING(ma.root_stone)
or KEY_FOR_STRING(mac.root_grate) = KEY_FOR_STRING(ma.root_other)
)
) as test
from nyc_trees_2015_683k_sd as ma
the key is the use of KEY_FOR_STRING function against the dictionary encoded strings used for the semi-join.
rewriting the query
now the query is returning back in 70-110ms.
If you need further assistance, please ask.
Have fun 