Exception: Cast from dictionary-encoded string to none-encoded would be slow

Running this script yields me an error message. I am using the sample data table, nyc_trees_2015_683k that came out of the box.

select 
  ma.block_id,
  (
    select 
      avg(p.latitude)
    from 
      (
        select 
          mac.latitude
        from 
          nyc_trees_2015_683k as mac 
        where (
          mac.root_grate = ma.root_stone
          or mac.root_grate = ma.root_other // commenting this line, script works
        )
      ) p
  ) as test
from nyc_trees_2015_683k as ma;

Replacing the ‘or’ operator with ‘and’ made the script run without error.

where (
      mac.root_grate = ma.root_stone
      and mac.root_grate = ma.root_other 
)

Hi @pmdy,

welcome to our community; we will hope you are enjoying omniscidb.

Your query is being stopped by the watchdog, that’ is turned on by default , so to run you query without any modification, you should start the server with disabling it.

if you are running he omnisci_server by command line just append
--enable-watchdog false
tou your command, while if you are starting the omnisci_server as a service, add
enable-watchdog=false
on omnisci.conf file located by default on /var/lib/omnisci/

Anyway the performance of this query isn’t going to be stellar, and probably the query would be rewitten

Regards

That worked. I was expecting an answer within days. Awesome forum! Thank you @candido.dessanti

Hi @pmdy,

We try to reply to the community questions ASAP.

Anyway I changed a little the DDL sharing dictionaries of the text columns you are using to do the semi join and the query is performing margially better (256ms instead of 400ms on a GTX 1050ti using GPU mode and a quad core i7 on CPU mode)

the modified DDL is

CREATE TABLE nyc_trees_2015_683k_sd (
  created_at DATE ENCODING DAYS(16),
  tree_id TEXT ENCODING NONE,
  block_id TEXT ENCODING DICT(32),
  the_geom TEXT ENCODING NONE,
  tree_dbh SMALLINT,
  stump_diam SMALLINT,
  curb_loc TEXT ENCODING DICT(8),
  status TEXT ENCODING DICT(8),
  health TEXT ENCODING DICT(8),
  spc_latin TEXT ENCODING DICT(8),
  spc_common TEXT ENCODING DICT(8),
  steward TEXT ENCODING DICT(8),
  guards TEXT ENCODING DICT(8),
  sidewalk TEXT ENCODING DICT(8),
  user_type TEXT ENCODING DICT(8),
  problems TEXT ENCODING DICT(8),
  root_stone TEXT ENCODING DICT(8),
  root_grate TEXT,
  root_other TEXT,
  trnk_wire TEXT ENCODING DICT(8),
  trnk_light TEXT ENCODING DICT(8),
  trnk_other TEXT ENCODING DICT(8),
  brnch_ligh TEXT ENCODING DICT(8),
  brnch_shoe TEXT ENCODING DICT(8),
  brnch_othe TEXT ENCODING DICT(8),
  address TEXT ENCODING DICT(32),
  zipcode TEXT ENCODING DICT(8),
  zip_city TEXT ENCODING DICT(8),
  cb_num TEXT ENCODING DICT(8),
  borocode TEXT ENCODING DICT(8),
  boroname TEXT ENCODING DICT(8),
  cncldist TEXT ENCODING DICT(8),
  st_assem TEXT ENCODING DICT(8),
  st_senate TEXT ENCODING DICT(8),
  nta TEXT ENCODING DICT(8),
  nta_name TEXT ENCODING DICT(8),
  boro_ct TEXT ENCODING DICT(16),
  state TEXT ENCODING DICT(8),
  latitude FLOAT,
  longitude FLOAT,
  x_sp FLOAT,
  y_sp FLOAT,
  SHARED DICTIONARY (root_grate) REFERENCES nyc_trees_2015_683k(root_stone),
  SHARED DICTIONARY (root_other) REFERENCES nyc_trees_2015_683k(root_stone));

to reload the table an IAS is quite fast

insert into nyc_trees_2015_683k_sd select * from nyc_trees_2015_683k;

probably it would be better to share dictionaries of trnk and brnch too.

You can read about shared dictionaries here

to further speed-up the query (but I haven’t checked the results in full), I re-wrote the query against the dictionary shared table (It’s not going to work on the original table) this way

select 
  ma.block_id,
  (
        select 
          avg(mac.latitude)
        from 
          nyc_trees_2015_683k_sd as mac 
        where (
          KEY_FOR_STRING( mac.root_grate) = KEY_FOR_STRING(ma.root_stone)
          or KEY_FOR_STRING(mac.root_grate) = KEY_FOR_STRING(ma.root_other) 
        )
      
  ) as test
from nyc_trees_2015_683k_sd as ma

the key is the use of KEY_FOR_STRING function against the dictionary encoded strings used for the semi-join.
rewriting the query

now the query is returning back in 70-110ms.

If you need further assistance, please ask.

Have fun :wink: