I have a few questions about the function "check_total_bitmap_memory".


#1

void check_total_bitmap_memory(const QueryMemoryDescriptor& query_mem_desc) {
  const int32_t groups_buffer_entry_count = query_mem_desc.getEntryCount();
  if (g_enable_watchdog) {
    checked_int64_t total_bytes_per_group = 0;
    const size_t num_count_distinct_descs =
        query_mem_desc.getCountDistinctDescriptorsSize();
    for (size_t i = 0; i < num_count_distinct_descs; i++) {
      const auto count_distinct_desc = query_mem_desc.getCountDistinctDescriptor(i);
      if (count_distinct_desc.impl_type_ != CountDistinctImplType::Bitmap) {
        continue;
      }
      // "total_bytes_per_group" represents the total number of predicted rows (bits)
      // converted to bytes
      total_bytes_per_group += count_distinct_desc.bitmapPaddedSizeBytes();
    }
    int64_t total_bytes{0};
    // Need to use OutOfHostMemory since it's the only type of exception
    // QueryExecutionContext is supposed to throw.
    try {
      total_bytes =
          static_cast<int64_t>(total_bytes_per_group * groups_buffer_entry_count);
    } catch (...) {
      // Absurd amount of memory, merely computing the number of bits overflows int64_t.
      // Don't bother to report the real amount, this is unlikely to ever happen.
      throw OutOfHostMemory(std::numeric_limits<int64_t>::max() / 8);
    }
    if (total_bytes >= 2 * 1000 * 1000 * 1000L) {
      throw OutOfHostMemory(total_bytes);
    }
  }
}

I have two questions:
(1)

total_bytes_per_group += count_distinct_desc.bitmapPaddedSizeBytes();

The comment says "total_bytes_per_group" represents the total number of predicted rows (bits) converted to bytes, so why does "total_bytes" multiply it by the entry count again?
(2) What is the basis of "2 * 1000 * 1000 * 1000L"?


#2
  1. You can have multiple groups where you are doing a count(distinct).

e.g.

select count(distinct field1) from table1

Obviously here you have just one group, so the groups_buffer_entry_count will be 1.

select field2, count(distinct field1) from table1 group by 1

Here you will have multiple groups, and the groups_buffer_entry_count will be different; each group slot needs its own copy of the bitmap.

  2. Maybe it's related to the maximum slab size on GPUs (2GB); the operation actually uses a single slab, so it makes sense for the watchdog to raise an exception.

#3

hi @aznable,
thank you! I think CountDistinctDescriptor::bitmap_sz_bits represents the largest number of rows; if a row has data, the corresponding bit is set to 1. Is that the right idea? When I add "group by", the system needs to multiply by the maximum number of rows.


#4

Hi @tianhz,

a bitmap with the relative row position would make sense for filtering, but for a count distinct it makes more sense to turn on the bit corresponding to the contents of the column.

Imagine a column with a range between 0 and 10 containing 4 distinct values (1, 2, 6, 10) and you want to do a count distinct; for every row you just have to turn on the corresponding bit, then count the number of bits turned on. In our case the final result would be

1100010001

Doing so, the memory footprint is quite small and the operation is quite fast; I did something like that in the '90s to sort (well… it was more like a rank) on an Amiga that used a Motorola 68k clocked at 7 MHz (lol)


#5

hi @aznable,
thanks! I have no more doubts about this question. I'm studying the mapd bitmap, but it is too complicated to analyze from the code. Could you give me some advice on how to analyze this process?


#6

Hi @tianhz,

Yes, studying the code this way is hard and it's easy to feel overwhelmed, but I can suggest a different approach.
If you want to see what the software calls to perform a count distinct, you can take a look at the LLVM code generated for a simple query like this

select count(distinct f1) from t1;

then take a look at the relevant code

; Function Attrs: alwaysinline
define i32 @row_func(i64* %out, i64* %agg_init_val, i64 %pos, i64* %frag_row_off, i64* %num_rows_per_scan, i8* %literals, i8* %col_buf0, i64* %join_hash_tables) #20 {
entry:
  %0 = load i64, i64* %frag_row_off
  br i1 true, label %filter_true, label %filter_false

filter_true:                                      ; preds = %entry
  %1 = call i64 @fixed_width_int_decode(i8* %col_buf0, i32 4, i64 %pos)
  %2 = trunc i64 %1 to i32
  %3 = call i64 @cast_int32_t_to_int64_t_nullable(i32 %2, i32 -2147483648, i64 -9223372036854775808)
  %4 = bitcast i8* %literals to i64*
  %5 = getelementptr i64, i64* %4, i32 -1
  %6 = load i64, i64* %5
  %7 = bitcast i8* %literals to i64*
  %8 = getelementptr i64, i64* %7, i32 -2
  %9 = load i64, i64* %8
  call void @agg_count_distinct_bitmap_skip_val_gpu(i64* %out, i64 %3, i64 19700101, i64 -9223372036854775808, i64 %6, i64 %9, i64 1, i64 151372688)
  br label %filter_false

then you can look up what agg_count_distinct_bitmap_skip_val_gpu does in the GitHub repository.

After that you can experiment with more complex queries.


#7

hi @aznable,
Will bitmaps be used for normal queries?
such as:
select * from table1;
select table1.id,table2.id from table1,table2 where table1.id = table2.id;