What kind of queries use bimap?


#1

What kind of queries use bimap,Are there only queries from distinct count?


#2

also the queries with IN clause will use a bitmap (or bytemap; we talked about that on another topic) if the number of values exceed a threshold and I guess as in subquery like select … where filed in (select …).


#3

hi @aznable,
thinks,Can you enumerate such query statements for me?


#4

hi @tianhz,

you are remainding me the times I was a student…lol.

I can’t enumerate all the kind of queries that use a bitmap, because I’m not the author of this software so i can give you an accurate response (and I like to be accurate when replying to a such specific question)

AFIK the queries that involve a bitmap/bytemap are the following

  1. count distinct
  2. a select with an in clause with more than 3 values, this isn’t always true, because depend by the datatype you are using

e.g.
select whatever your want from table where numeric field in (1,2,3) will generate a llvm code like that
%3 = call i8 @eq_int16_t_nullable(i16 %2, i16 %arg_literal_0, i64 -32768, i8 -128)
%4 = call i8 @logical_or(i8 0, i8 %3, i8 -128)
%5 = call i8 @eq_int16_t_nullable(i16 %2, i16 %arg_literal_2, i64 -32768, i8 -128)
%6 = call i8 @logical_or(i8 %4, i8 %5, i8 -128)
%7 = call i8 @eq_int16_t_nullable(i16 %2, i16 %arg_literal_4, i64 -32768, i8 -128)
%8 = call i8 @logical_or(i8 %6, i8 %7, i8 -128)

just adding a value so select whatever your want from table where numeric field/dictionary encoded text in (1,2,3,4) will use a bitmap insted of simple comparison

%4 = call i8 @bit_is_set(i64 %arg_literal_0, i64 %3, i64 1, i64 4, i64 -32768, i8 -128)
%5 = icmp sgt i8 %4, 0

but if you use a different datatype as a timestamp the bitmap won’t be used whetever the number of values
select whatever your want from table where timstamp field in
(‘20110101 00:00:00’,‘20110102 00:00:00’,‘20110103 00:00:00’,‘20110104 00:00:00’,‘20110105 00:00:00’,‘20110106 00:00:00’,‘20110107 00:00:00’,‘20110108 00:00:00’,‘20110109 00:00:00’,‘20110110 00:00:00’,‘20110111 00:00:00’,‘20110112 00:00:00’,‘20120101 00:00:00’,‘20120102 00:00:00’,‘20120103 00:00:00’,‘20120104 00:00:00’,‘20120105 00:00:00’,‘20120106 00:00:00’,‘20120107 00:00:00’,‘20120108 00:00:00’,‘20120109 00:00:00’,‘20120110 00:00:00’,‘20120111 00:00:00’,‘20120112 00:00:00’)

this will generate a number of calls like that
%4 = call i8 @eq_int64_t_nullable(i64 %3, i64 %arg_literal_0, i64 -9223372036854775808, i8 -128)
%5 = call i8 @logical_or(i8 0, i8 %4, i8 -128)

corresponding to the number of distinct values contained in the in filter

this makes sense because the resulting bitmap would be huge (the timestamp(0) datatype is stored as number of seconds since an epoch so we are talking about quite big numbers)

  1. Maybe a select from in (select will end using a bitmap but i’m not sure about that.

I’m unaware if bitmap are also used on other operations