Query plan massively slower for the same query on MapD 4.7.1

  • 1.  Query plan massively slower for the same query on MapD 4.7.1

    Posted 20 days ago

    Hi everybody, 

    I was trying to upgrade OmniSciDB from 4.5.0 to 4.7.0, and I noticed that one of my queries got dramatically slower, going from 5 seconds to about 4 minutes to return the data.
    The query is the following:

    SELECT foo_id,
           COUNT(1)            count_entries,
           SUM(CASE
                   WHEN (NOT (foo_column_1 = 'foo_column_1_bar_1' or (foo_column_1 is null and (foo_column_2 in ('foo_column_2_bar_1', 'foo_column_2_bar_2', 'foo_column_2_bar_3', 'foo_column_2_bar_4'))) or
                              (foo_column_1 in ('foo_column_1_bar_2', 'foo_column_1_bar_3') or (foo_column_1 is null and (foo_column_2 = 'foo_column_2_bar_5'))) or
                              (foo_column_2 is null and (foo_column_1 is null) and (foo_column_3 like '%baz%'))) OR
                         foo_column_1 is null and (foo_column_2 not in
                                                ('foo_column_2_bar_1', 'foo_column_2_bar_2', 'foo_column_2_bar_3', 'foo_column_2_bar_4') OR foo_column_3 is null and (foo_column_1 is null)) THEN 1
               ELSE 0 END) conditional_count
    FROM bar
    where TRUE
      and (ts >= '2019-01-01 00:00:00.0' and ts < '2019-06-21 00:00:00.0')
    GROUP BY foo_id
    ORDER BY count_entries DESC
    LIMIT 10
        OFFSET 0 ROWS;
    The table definition is: 
    CREATE TABLE bar (
    foo_id INTEGER NOT NULL,
    foo_column_1 TEXT ENCODING DICT(8),
    foo_column_2 TEXT ENCODING DICT(16),
    foo_column_3 TEXT NOT NULL ENCODING NONE,
    ts TIMESTAMP(0) NOT NULL ENCODING FIXED(32))
    The amount of data queried is around 250 GB, with roughly 800 million rows in the queried period.
    While extracting the query plan I noticed that it changed between the two versions; the most noticeable changes are the following (a small sketch of the rewrite follows the list):
    1. The LIKE condition got negated:
    AND(IS NULL($11), IS NULL($12), LIKE($9, '%baz%')))) 4.5.0
    OR(IS NOT NULL($11), IS NOT NULL($12), NOT(LIKE($9, '%baz%')))) 4.7.0
    2. The IN condition changed from:
    AND(IS NULL($12), OR(=($11, ....)...))
    to
    OR(IS NOT NULL($12), AND(<>($11, ...),<>($11, ...)...))
    3. The initial condition had the negation at the beginning:
    CASE(OR(NOT(OR(...))))
    but now it is propagated into every single sub-condition:
    CASE(OR(AND...))
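    To make the rewrite concrete, here is a minimal sketch of what the change amounts to (the column and literal values are only illustrative):
    -- 4.5.0 keeps the negation on the outside of the IN:
    NOT (foo_column_2 IN ('foo_column_2_bar_1', 'foo_column_2_bar_2'))
    -- 4.6+ pushes the NOT inward, expanding the list into one <> per element:
    foo_column_2 <> 'foo_column_2_bar_1' AND foo_column_2 <> 'foo_column_2_bar_2'
    The two forms are logically equivalent, but the expanded one emits one comparison per list element, which matches the much longer condition chain in the new plan.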
    I know the query can be improved, for example by materializing the condition into a column (see the sketch below), but is it possible that an issue with the query planner was introduced in version 4.6+?
    Finally, is there a performance difference between the <> and = operators from MapD's point of view?
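    For context, the materialization I have in mind would look roughly like this (just a sketch, assuming a one-off CTAS is acceptable; bar_flagged and condition_flag are hypothetical names and the CASE predicate is abbreviated):
    -- one-off precomputation, so the hot query no longer evaluates the CASE per row
    CREATE TABLE bar_flagged AS
    SELECT foo_id,
           ts,
           CASE WHEN /* same predicate as in the query above */ THEN 1 ELSE 0 END AS condition_flag
    FROM bar;
    -- the aggregation then reduces to a plain SUM over the precomputed flag
    SELECT foo_id,
           COUNT(1)            count_entries,
           SUM(condition_flag) conditional_count
    FROM bar_flagged
    WHERE ts >= '2019-01-01 00:00:00.0' AND ts < '2019-06-21 00:00:00.0'
    GROUP BY foo_id
    ORDER BY count_entries DESC
    LIMIT 10;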

    Thank you,
    Francesco.
    #Core


  • 2.  RE: Query plan massively slower for the same query on MapD 4.7.1

    Posted 19 days ago
    Edited by Candido Dessanti 19 days ago
    Hi @Francesco Ferretti,

    Does the query work written this way?

    SELECT
    foo_id,
    COUNT( 1 ) count_entries,
    SUM( CASE
    WHEN(
    ( foo_column_1 = 'foo_column_1_bar_1'
    OR( foo_column_1 IS NULL AND( foo_column_2 IN( 'foo_column_2_bar_1', 'foo_column_2_bar_2', 'foo_column_2_bar_3', 'foo_column_2_bar_4' )))
    OR( foo_column_1 IN( 'foo_column_1_bar_2', 'foo_column_1_bar_3' ) OR( foo_column_1 IS NULL AND( foo_column_2 = 'foo_column_2_bar_5' )))
    OR( foo_column_2 IS NULL AND( foo_column_1 IS NULL ) AND( foo_column_3 LIKE '%baz%' )))
    OR not(foo_column_1 IS NULL AND( foo_column_2
    NOT IN( 'foo_column_2_bar_1', 'foo_column_2_bar_2', 'foo_column_2_bar_3', 'foo_column_2_bar_4' ) )
    OR foo_column_3 IS NULL AND( foo_column_1 IS NULL )))
    THEN 0 ELSE 1 END )
    conditional_count
    FROM bar
    WHERE TRUE AND( ts >= '2019-01-01 00:00:00.0' AND ts < '2019-06-21 00:00:00.0' )
    GROUP BY foo_id ORDER BY count_entries DESC LIMIT 10 OFFSET 0 ROWS;

    The problem looks to be the way Calcite 1.19, which was introduced in the 4.7 release, rewrites the NOT condition.
    Generally speaking, there isn't any meaningful performance difference between the equality and inequality operators.
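    If you manage to reproduce it on a smaller dataset, the quickest check is to compare what the two versions produce for the same statement (a sketch; run the same commands against both servers):
    EXPLAIN CALCITE SELECT ... ;  -- prints the relational algebra, i.e. the AND/OR/NOT tree you quoted
    EXPLAIN SELECT ... ;          -- prints the generated device IR
    Posting both outputs here would make it easier to confirm whether the regression comes from the Calcite rewrite or from the generated code.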



  • 3.  RE: Query plan massively slower for the same query on MapD 4.7.1

    Posted 19 days ago
    Hi Candido, 

    Unfortunately I terminated the machine running the code because it was very expensive. 
    I will try to run it on a smaller dataset so we can compare the results.
    However, before terminating it I was able to grab the EXPLAIN output, and the two plans look quite different: the old one is 221 lines and the new one is 319 lines for the same query.
    4.5.0:
    IR for the GPU:
    ===============
    
    ; Function Attrs: uwtable
    define void @query_group_by_template(i8** nocapture readnone %byte_stream, i8* nocapture readonly %literals, i64* nocapture readnone %row_count_ptr, i64* nocapture readonly %frag_row_off_ptr, i32* %max_matched_ptr, i64* %agg_init_val, i64** %group_by_buffers, i64** %small_groups_buffer, i32 %frag_idx, i64* %join_hash_tables, i32* %total_matched, i32* %error_code) #20 {
    .entry:
      %0 = getelementptr i8*, i8** %byte_stream, i32 0
      %1 = load i8*, i8** %0
      %2 = getelementptr i8*, i8** %byte_stream, i32 1
      %3 = load i8*, i8** %2
      %4 = getelementptr i8*, i8** %byte_stream, i32 2
      %5 = load i8*, i8** %4
      %6 = getelementptr i8*, i8** %byte_stream, i32 3
      %7 = load i8*, i8** %6
      %8 = getelementptr i8*, i8** %byte_stream, i32 4
      %9 = load i8*, i8** %8
      %10 = getelementptr i8*, i8** %byte_stream, i32 5
      %11 = load i8*, i8** %10
      %12 = getelementptr i8*, i8** %byte_stream, i32 6
      %13 = load i8*, i8** %12
      %14 = getelementptr i8*, i8** %byte_stream, i32 7
      %15 = load i8*, i8** %14
      %16 = getelementptr i8, i8* %literals, i16 0
      %17 = bitcast i8* %16 to i64*
      %literal_0 = load i64, i64* %17
      %18 = getelementptr i8, i8* %literals, i16 8
      %19 = bitcast i8* %18 to i64*
      %literal_8 = load i64, i64* %19
      %20 = getelementptr i8, i8* %literals, i16 16
      %21 = bitcast i8* %20 to i32*
      %literal_16 = load i32, i32* %21
      %22 = getelementptr i8, i8* %literals, i16 24
      %23 = bitcast i8* %22 to i64*
      %literal_24 = load i64, i64* %23
      %24 = getelementptr i8, i8* %literals, i16 32
      %25 = bitcast i8* %24 to i32*
      %literal_32 = load i32, i32* %25
      %26 = getelementptr i8, i8* %literals, i16 36
      %27 = bitcast i8* %26 to i32*
      %literal_36 = load i32, i32* %27
      %28 = getelementptr i8, i8* %literals, i16 40
      %29 = bitcast i8* %28 to i32*
      %literal_40 = load i32, i32* %29
      %30 = getelementptr i8, i8* %literals, i16 44
      %31 = bitcast i8* %30 to i32*
      %32 = load i32, i32* %31
      %33 = and i32 %32, -65536
      %34 = lshr i32 %33, 16
      %literal_44_length = and i32 %32, 65535
      %literal_44_start_address = getelementptr i8, i8* %literals, i32 %34
      %35 = getelementptr i8, i8* %literals, i16 48
      %36 = bitcast i8* %35 to i64*
      %literal_48 = load i64, i64* %36
      %37 = getelementptr i8, i8* %literals, i16 56
      %38 = bitcast i8* %37 to i16*
      %literal_56 = load i16, i16* %38
      %39 = getelementptr i8, i8* %literals, i16 58
      %40 = bitcast i8* %39 to i16*
      %literal_58 = load i16, i16* %40
      %41 = getelementptr i8, i8* %literals, i16 64
      %42 = bitcast i8* %41 to i64*
      %literal_64 = load i64, i64* %42
      %row_count = load i64, i64* %row_count_ptr, align 8
      %43 = load i32, i32* %max_matched_ptr, align 4
      %crt_matched = alloca i32
      %old_total_matched = alloca i32
      %44 = call i32 @pos_start_impl(i32* %error_code)
      %45 = call i32 @pos_step_impl()
      %46 = call i32 @group_buff_idx_impl()
      %47 = sext i32 %44 to i64
      %48 = getelementptr i64*, i64** %group_by_buffers, i32 %46
      %49 = load i64*, i64** %48, align 8
      %50 = call i64* @init_shared_mem_nop(i64* %49, i32 0)
      %51 = icmp slt i64 %47, %row_count
      br i1 %51, label %.loop.preheader, label %.exit
    
    .loop.preheader:                                  ; preds = %.entry
      %52 = sext i32 %45 to i64
      br label %.forbody
    
    .forbody:                                         ; preds = %.forbody, %.loop.preheader
      %pos = phi i64 [ %47, %.loop.preheader ], [ %54, %.forbody ]
      %53 = call i32 @row_func_hoisted_literals(i64* %50, i64* null, i32* %crt_matched, i32* %total_matched, i32* %old_total_matched, i64* %agg_init_val, i64 %pos, i64* %frag_row_off_ptr, i64* %row_count_ptr, i8* %literals, i8* %1, i8* %3, i8* %5, i8* %7, i8* %9, i8* %11, i8* %13, i8* %15, i64* %join_hash_tables, i64 %literal_64, i16 %literal_58, i64 %literal_24, i32 %literal_16, i64 %literal_8, i64 %literal_0, i64 0, i8* %literal_44_start_address, i32 %literal_44_length, i32 %literal_32, i32 %literal_36, i32 %literal_40, i64 %literal_48, i16 %literal_56)
      %54 = add i64 %pos, %52
      %55 = icmp slt i64 %54, %row_count
      br i1 %55, label %.forbody, label %._crit_edge
    
    ._crit_edge:                                      ; preds = %.forbody
      br label %.exit
    
    .exit:                                            ; preds = %._crit_edge, %.entry
      call void @write_back_nop(i64* %49, i64* %50, i32 0)
      ret void
    }
    
    ; Function Attrs: alwaysinline
    define i32 @row_func_hoisted_literals(i64* %group_by_buff, i64* %small_group_by_buff, i32* %crt_match, i32* %total_matched, i32* %old_total_matched, i64* %agg_init_val, i64 %pos, i64* %frag_row_off, i64* %num_rows_per_scan, i8* %literals, i8* %col_buf0, i8* %col_buf1, i8* %col_buf2, i8* %col_buf3, i8* %col_buf4, i8* %col_buf5, i8* %col_buf6, i8* %col_buf7, i64* %join_hash_tables, i64 %arg_literal_64, i16 %arg_literal_58, i64 %arg_literal_24, i32 %arg_literal_16, i64 %arg_literal_8, i64 %arg_literal_0, i64, i8* %arg_literal_44_start_address, i32 %arg_literal_44_length, i32 %arg_literal_32, i32 %arg_literal_36, i32 %arg_literal_40, i64 %arg_literal_48, i16 %arg_literal_56) #21 {
    entry:
      %1 = load i64, i64* %frag_row_off
      %2 = call i64 @fixed_width_int_decode(i8* %col_buf5, i32 4, i64 %pos)
      %3 = icmp sge i64 %2, %arg_literal_0
      %4 = and i1 true, %3
      %5 = icmp slt i64 %2, %arg_literal_8
      %6 = and i1 %4, %5
      br i1 %6, label %filter_true, label %filter_false
    
    filter_true:                                      ; preds = %entry
      %7 = call i64 @fixed_width_int_decode(i8* %col_buf3, i32 4, i64 %pos)
      %8 = trunc i64 %7 to i32
      %9 = sext i32 %8 to i64
      %10 = call i64* @get_group_value_fast(i64* %group_by_buff, i64 %9, i64 1, i64 0, i32 9)
      %11 = getelementptr i64, i64* %10, i32 0
      %12 = sext i32 %8 to i64
      call void @agg_id_shared(i64* %11, i64 %12)
      %13 = getelementptr i64, i64* %10, i32 1
      %14 = bitcast i64* %13 to i32*
      %15 = atomicrmw add i32* %14, i32 1 monotonic
      %16 = call i64 @fixed_width_unsigned_decode(i8* %col_buf7, i32 1, i64 %pos)
      %17 = trunc i64 %16 to i32
      %18 = trunc i32 %17 to i8
      %19 = call i32 @cast_uint8_t_to_int32_t_nullable(i8 %18, i8 -1, i32 -2147483648)
      %20 = call i8 @eq_int32_t_nullable_lhs(i32 %19, i32 %arg_literal_16, i64 -2147483648, i8 -128)
      %21 = icmp eq i32 %19, -2147483648
      %22 = call i64 @fixed_width_unsigned_decode(i8* %col_buf6, i32 2, i64 %pos)
      %23 = trunc i64 %22 to i32
      %24 = trunc i32 %23 to i16
      %25 = call i32 @cast_uint16_t_to_int32_t_nullable(i16 %24, i16 -1, i32 -2147483648)
      %26 = sext i32 %25 to i64
      %27 = call i8 @bit_is_set(i64 %arg_literal_24, i64 %26, i64 9, i64 289, i64 -2147483648, i8 -128)
      %28 = zext i1 %21 to i8
      %29 = call i8 @logical_and(i8 %28, i8 %27, i8 -128)
      %30 = call i8 @logical_or(i8 %20, i8 %29, i8 -128)
      %31 = call i8 @eq_int32_t_nullable_lhs(i32 %19, i32 %arg_literal_32, i64 -2147483648, i8 -128)
      %32 = call i8 @logical_or(i8 %30, i8 %31, i8 -128)
      %33 = call i8 @eq_int32_t_nullable_lhs(i32 %19, i32 %arg_literal_36, i64 -2147483648, i8 -128)
      %34 = call i8 @logical_or(i8 %32, i8 %33, i8 -128)
      %35 = icmp eq i32 %19, -2147483648
      %36 = call i8 @eq_int32_t_nullable_lhs(i32 %25, i32 %arg_literal_40, i64 -2147483648, i8 -128)
      %37 = zext i1 %35 to i8
      %38 = call i8 @logical_and(i8 %37, i8 %36, i8 -128)
      %39 = call i8 @logical_or(i8 %34, i8 %38, i8 -128)
      %40 = icmp eq i32 %25, -2147483648
      %41 = icmp eq i32 %19, -2147483648
      %42 = and i1 %40, %41
      %43 = call i64 @string_decode(i8* %col_buf4, i64 %pos)
      %44 = call i8* @extract_str_ptr(i64 %43)
      %45 = call i32 @extract_str_len(i64 %43)
      %46 = call i1 @string_like_simple(i8* %44, i32 %45, i8* %arg_literal_44_start_address, i32 %arg_literal_44_length)
      %47 = and i1 %42, %46
      %48 = zext i1 %47 to i8
      %49 = call i8 @logical_or(i8 %39, i8 %48, i8 -128)
      %50 = call i8 @logical_not(i8 %49, i8 -128)
      %51 = icmp eq i32 %19, -2147483648
      %52 = sext i32 %25 to i64
      %53 = call i8 @bit_is_set(i64 %arg_literal_48, i64 %52, i64 2, i64 289, i64 -2147483648, i8 -128)
      %54 = call i8 @logical_not(i8 %53, i8 -128)
      %55 = zext i1 %51 to i8
      %56 = call i8 @logical_and(i8 %55, i8 %54, i8 -128)
      %57 = call i8 @logical_or(i8 %50, i8 %56, i8 -128)
      %58 = icmp eq i32 %25, -2147483648
      %59 = icmp eq i32 %19, -2147483648
      %60 = and i1 %58, %59
      %61 = zext i1 %60 to i8
      %62 = call i8 @logical_or(i8 %57, i8 %61, i8 -128)
      %63 = icmp sgt i8 %62, 0
      br i1 %63, label %then_case, label %when_case
    
    filter_false:                                     ; preds = %end_case1, %entry
      ret i32 0
    
    end_case:                                         ; preds = %when_case, %then_case
      %64 = phi i16 [ %arg_literal_56, %then_case ], [ %arg_literal_58, %when_case ]
      %65 = getelementptr i64, i64* %10, i32 2
      %66 = sext i16 %64 to i64
      %67 = call i64 @agg_sum_skip_val_shared(i64* %65, i64 %66, i64 -32768)
      %68 = call i64 @fixed_width_int_decode(i8* %col_buf0, i32 4, i64 %pos)
      %69 = trunc i64 %68 to i32
      %70 = getelementptr i64, i64* %10, i32 3
      %71 = sext i32 %69 to i64
      %72 = call i64 @agg_sum_shared(i64* %70, i64 %71)
      %73 = getelementptr i64, i64* %10, i32 4
      %74 = sext i32 %69 to i64
      %75 = call i64 @agg_sum_shared(i64* %73, i64 %74)
      %76 = getelementptr i64, i64* %10, i32 5
      %77 = sext i32 %69 to i64
      %78 = call i64 @agg_count_shared(i64* %76, i64 %77)
      %79 = call i64 @fixed_width_int_decode(i8* %col_buf2, i32 4, i64 %pos)
      %80 = trunc i64 %79 to i32
      %81 = getelementptr i64, i64* %10, i32 6
      %82 = sext i32 %80 to i64
      %83 = bitcast i8* %literals to i64*
      %84 = getelementptr i64, i64* %83, i32 -1
      %85 = load i64, i64* %84
      %86 = bitcast i8* %literals to i64*
      %87 = getelementptr i64, i64* %86, i32 -2
      %88 = load i64, i64* %87
      call void @agg_approximate_count_distinct_gpu(i64* %81, i64 %82, i32 11, i64 %85, i64 %88)
      %89 = call i64 @fixed_width_int_decode(i8* %col_buf1, i32 8, i64 %pos)
      %90 = icmp eq i64 %89, -9223372036854775808
      %91 = xor i1 %90, true
      br i1 %91, label %then_case2, label %when_case3
    
    then_case:                                        ; preds = %filter_true
      br label %end_case
    
    when_case:                                        ; preds = %filter_true
      br label %end_case
    
    end_case1:                                        ; preds = %when_case3, %then_case2
      %92 = phi i64 [ %89, %then_case2 ], [ %arg_literal_64, %when_case3 ]
      %93 = getelementptr i64, i64* %10, i32 7
      %94 = call i64 @agg_sum_skip_val_shared(i64* %93, i64 %92, i64 -9223372036854775808)
      br label %filter_false
    
    then_case2:                                       ; preds = %end_case
      br label %end_case1
    
    when_case3:                                       ; preds = %end_case
      br label %end_case1
    }

    4.6.0+:
    Explanation
    IR for the GPU:
    ===============
    
    ; Function Attrs: uwtable
    define void @query_group_by_template(i8** nocapture readnone %byte_stream, i8* nocapture readonly %literals, i64* nocapture readnone %row_count_ptr, i64* nocapture readonly %frag_row_off_ptr, i32* %max_matched_ptr, i64* %agg_init_val, i64** %group_by_buffers, i32 %frag_idx, i64* %join_hash_tables, i32* %total_matched, i32* %error_code) #22 {
    .entry:
      %0 = getelementptr i8*, i8** %byte_stream, i32 0
      %1 = load i8*, i8** %0
      %2 = getelementptr i8*, i8** %byte_stream, i32 1
      %3 = load i8*, i8** %2
      %4 = getelementptr i8*, i8** %byte_stream, i32 2
      %5 = load i8*, i8** %4
      %6 = getelementptr i8*, i8** %byte_stream, i32 3
      %7 = load i8*, i8** %6
      %8 = getelementptr i8*, i8** %byte_stream, i32 4
      %9 = load i8*, i8** %8
      %10 = getelementptr i8*, i8** %byte_stream, i32 5
      %11 = load i8*, i8** %10
      %12 = getelementptr i8*, i8** %byte_stream, i32 6
      %13 = load i8*, i8** %12
      %14 = getelementptr i8*, i8** %byte_stream, i32 7
      %15 = load i8*, i8** %14
      %16 = getelementptr i8*, i8** %byte_stream, i32 8
      %17 = load i8*, i8** %16
      %18 = getelementptr i8, i8* %literals, i16 0
      %19 = bitcast i8* %18 to i64*
      %literal_0 = load i64, i64* %19
      %20 = getelementptr i8, i8* %literals, i16 8
      %21 = bitcast i8* %20 to i64*
      %literal_8 = load i64, i64* %21
      %22 = getelementptr i8, i8* %literals, i16 16
      %23 = bitcast i8* %22 to i32*
      %literal_16 = load i32, i32* %23
      %24 = getelementptr i8, i8* %literals, i16 20
      %25 = bitcast i8* %24 to i32*
      %literal_20 = load i32, i32* %25
      %26 = getelementptr i8, i8* %literals, i16 24
      %27 = bitcast i8* %26 to i32*
      %literal_24 = load i32, i32* %27
      %28 = getelementptr i8, i8* %literals, i16 28
      %29 = bitcast i8* %28 to i32*
      %literal_28 = load i32, i32* %29
      %30 = getelementptr i8, i8* %literals, i16 32
      %31 = bitcast i8* %30 to i32*
      %literal_32 = load i32, i32* %31
      %32 = getelementptr i8, i8* %literals, i16 36
      %33 = bitcast i8* %32 to i32*
      %literal_36 = load i32, i32* %33
      %34 = getelementptr i8, i8* %literals, i16 40
      %35 = bitcast i8* %34 to i32*
      %literal_40 = load i32, i32* %35
      %36 = getelementptr i8, i8* %literals, i16 44
      %37 = bitcast i8* %36 to i32*
      %literal_44 = load i32, i32* %37
      %38 = getelementptr i8, i8* %literals, i16 48
      %39 = bitcast i8* %38 to i32*
      %literal_48 = load i32, i32* %39
      %40 = getelementptr i8, i8* %literals, i16 52
      %41 = bitcast i8* %40 to i32*
      %literal_52 = load i32, i32* %41
      %42 = getelementptr i8, i8* %literals, i16 56
      %43 = bitcast i8* %42 to i32*
      %literal_56 = load i32, i32* %43
      %44 = getelementptr i8, i8* %literals, i16 60
      %45 = bitcast i8* %44 to i32*
      %literal_60 = load i32, i32* %45
      %46 = getelementptr i8, i8* %literals, i16 64
      %47 = bitcast i8* %46 to i32*
      %literal_64 = load i32, i32* %47
      %48 = getelementptr i8, i8* %literals, i16 68
      %49 = bitcast i8* %48 to i32*
      %literal_68 = load i32, i32* %49
      %50 = getelementptr i8, i8* %literals, i16 72
      %51 = bitcast i8* %50 to i32*
      %literal_72 = load i32, i32* %51
      %52 = getelementptr i8, i8* %literals, i16 76
      %53 = bitcast i8* %52 to i32*
      %literal_76 = load i32, i32* %53
      %54 = getelementptr i8, i8* %literals, i16 80
      %55 = bitcast i8* %54 to i32*
      %literal_80 = load i32, i32* %55
      %56 = getelementptr i8, i8* %literals, i16 84
      %57 = bitcast i8* %56 to i32*
      %58 = load i32, i32* %57
      %59 = and i32 %58, -65536
      %60 = lshr i32 %59, 16
      %literal_84_length = and i32 %58, 65535
      %literal_84_start_address = getelementptr i8, i8* %literals, i32 %60
      %61 = getelementptr i8, i8* %literals, i16 88
      %62 = bitcast i8* %61 to i16*
      %literal_88 = load i16, i16* %62
      %63 = getelementptr i8, i8* %literals, i16 90
      %64 = bitcast i8* %63 to i16*
      %literal_90 = load i16, i16* %64
      %65 = getelementptr i8, i8* %literals, i16 96
      %66 = bitcast i8* %65 to i64*
      %literal_96 = load i64, i64* %66
      %row_count = load i64, i64* %row_count_ptr, align 8
      %67 = load i32, i32* %max_matched_ptr, align 4
      %crt_matched = alloca i32
      %old_total_matched = alloca i32
      %68 = call i32 @pos_start_impl(i32* %error_code)
      %69 = call i32 @pos_step_impl()
      %70 = call i32 @group_buff_idx_impl()
      %71 = sext i32 %68 to i64
      %72 = getelementptr i64*, i64** %group_by_buffers, i32 %70
      %col_buffer = load i64*, i64** %72, align 8
      %result_buffer = call i64* @init_shared_mem_nop(i64* %col_buffer, i32 0)
      %73 = icmp slt i64 %71, %row_count
      br i1 %73, label %.loop.preheader, label %.exit
    
    .loop.preheader:                                  ; preds = %.entry
      %74 = sext i32 %69 to i64
      br label %.forbody
    
    .forbody:                                         ; preds = %.forbody, %.loop.preheader
      %pos = phi i64 [ %71, %.loop.preheader ], [ %76, %.forbody ]
      %75 = call i32 @row_func_hoisted_literals(i64* %result_buffer, i32* %crt_matched, i32* %total_matched, i32* %old_total_matched, i64* %agg_init_val, i64 %pos, i64* %frag_row_off_ptr, i64* %row_count_ptr, i8* %literals, i8* %1, i8* %3, i8* %5, i8* %7, i8* %9, i8* %11, i8* %13, i8* %15, i8* %17, i64* %join_hash_tables, i64 %literal_96, i32 %literal_40, i32 %literal_36, i32 %literal_32, i32 %literal_28, i32 %literal_24, i32 %literal_44, i16 %literal_90, i64 %literal_0, i32 %literal_20, i64 %literal_8, i32 %literal_16, i32 %literal_48, i32 %literal_52, i32 %literal_56, i32 %literal_60, i32 %literal_64, i32 %literal_68, i32 %literal_72, i32 %literal_76, i32 %literal_80, i64 0, i8* %literal_84_start_address, i32 %literal_84_length, i16 %literal_88)
      %76 = add i64 %pos, %74
      %77 = icmp slt i64 %76, %row_count
      br i1 %77, label %.forbody, label %._crit_edge
    
    ._crit_edge:                                      ; preds = %.forbody
      br label %.exit
    
    .exit:                                            ; preds = %._crit_edge, %.entry
      call void @write_back_nop(i64* %col_buffer, i64* %result_buffer, i32 0)
      ret void
    }
    
    ; Function Attrs: alwaysinline
    define i32 @row_func_hoisted_literals(i64* %group_by_buff, i32* %crt_match, i32* %total_matched, i32* %old_total_matched, i64* %agg_init_val, i64 %pos, i64* %frag_row_off, i64* %num_rows_per_scan, i8* %literals, i8* %col_buf0, i8* %col_buf1, i8* %col_buf2, i8* %col_buf3, i8* %col_buf4, i8* %col_buf5, i8* %col_buf6, i8* %col_buf7, i8* %col_buf8, i64* %join_hash_tables, i64 %arg_literal_96, i32 %arg_literal_40, i32 %arg_literal_36, i32 %arg_literal_32, i32 %arg_literal_28, i32 %arg_literal_24, i32 %arg_literal_44, i16 %arg_literal_90, i64 %arg_literal_0, i32 %arg_literal_20, i64 %arg_literal_8, i32 %arg_literal_16, i32 %arg_literal_48, i32 %arg_literal_52, i32 %arg_literal_56, i32 %arg_literal_60, i32 %arg_literal_64, i32 %arg_literal_68, i32 %arg_literal_72, i32 %arg_literal_76, i32 %arg_literal_80, i64, i8* %arg_literal_84_start_address, i32 %arg_literal_84_length, i16 %arg_literal_88) #23 {
    entry:
      %1 = load i64, i64* %frag_row_off
      %2 = call i64 @fixed_width_int_decode(i8* %col_buf8, i32 1, i64 %pos)
      %3 = trunc i64 %2 to i8
      %4 = icmp sgt i8 %3, 0
      br i1 %4, label %is_deleted, label %is_not_deleted
    
    is_deleted:                                       ; preds = %entry
      ret i32 0
    
    is_not_deleted:                                   ; preds = %entry
      %5 = call i64 @fixed_width_int_decode(i8* %col_buf5, i32 4, i64 %pos)
      %6 = icmp sge i64 %5, %arg_literal_0
      %7 = and i1 true, %6
      %8 = icmp slt i64 %5, %arg_literal_8
      %9 = and i1 %7, %8
      br i1 %9, label %filter_true, label %filter_false
    
    filter_true:                                      ; preds = %is_not_deleted
      %10 = call i64 @fixed_width_int_decode(i8* %col_buf3, i32 4, i64 %pos)
      %11 = trunc i64 %10 to i32
      %12 = sext i32 %11 to i64
      %13 = call i64* @get_group_value_fast(i64* %group_by_buff, i64 %12, i64 2, i64 0, i32 9)
      %agg_col_ptr = getelementptr i64, i64* %13, i32 0
      %14 = sext i32 %11 to i64
      call void @agg_id_shared(i64* %agg_col_ptr, i64 %14)
      %15 = getelementptr i64, i64* %13, i32 1
      %16 = bitcast i64* %15 to i32*
      %17 = atomicrmw add i32* %16, i32 1 monotonic
      %18 = call i64 @fixed_width_unsigned_decode(i8* %col_buf7, i32 1, i64 %pos)
      %19 = trunc i64 %18 to i32
      %20 = trunc i32 %19 to i8
      %21 = call i32 @cast_uint8_t_to_int32_t_nullable(i8 %20, i8 -1, i32 -2147483648)
      %22 = call i8 @ne_int32_t_nullable_lhs(i32 %21, i32 %arg_literal_16, i64 -2147483648, i8 -128)
      %23 = icmp eq i32 %21, -2147483648
      %24 = xor i1 %23, true
      %25 = call i64 @fixed_width_unsigned_decode(i8* %col_buf6, i32 2, i64 %pos)
      %26 = trunc i64 %25 to i32
      %27 = trunc i32 %26 to i16
      %28 = call i32 @cast_uint16_t_to_int32_t_nullable(i16 %27, i16 -1, i32 -2147483648)
      %29 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_20, i64 -2147483648, i8 -128)
      %30 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_24, i64 -2147483648, i8 -128)
      %31 = call i8 @logical_and(i8 %29, i8 %30, i8 -128)
      %32 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_28, i64 -2147483648, i8 -128)
      %33 = call i8 @logical_and(i8 %31, i8 %32, i8 -128)
      %34 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_32, i64 -2147483648, i8 -128)
      %35 = call i8 @logical_and(i8 %33, i8 %34, i8 -128)
      %36 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_36, i64 -2147483648, i8 -128)
      %37 = call i8 @logical_and(i8 %35, i8 %36, i8 -128)
      %38 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_40, i64 -2147483648, i8 -128)
      %39 = call i8 @logical_and(i8 %37, i8 %38, i8 -128)
      %40 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_44, i64 -2147483648, i8 -128)
      %41 = call i8 @logical_and(i8 %39, i8 %40, i8 -128)
      %42 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_48, i64 -2147483648, i8 -128)
      %43 = call i8 @logical_and(i8 %41, i8 %42, i8 -128)
      %44 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_52, i64 -2147483648, i8 -128)
      %45 = call i8 @logical_and(i8 %43, i8 %44, i8 -128)
      %46 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_56, i64 -2147483648, i8 -128)
      %47 = call i8 @logical_and(i8 %45, i8 %46, i8 -128)
      %48 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_60, i64 -2147483648, i8 -128)
      %49 = call i8 @logical_and(i8 %47, i8 %48, i8 -128)
      %50 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_64, i64 -2147483648, i8 -128)
      %51 = call i8 @logical_and(i8 %49, i8 %50, i8 -128)
      %52 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_68, i64 -2147483648, i8 -128)
      %53 = call i8 @logical_and(i8 %51, i8 %52, i8 -128)
      %54 = zext i1 %24 to i8
      %55 = call i8 @logical_or(i8 %54, i8 %53, i8 -128)
      %56 = call i8 @logical_and(i8 %22, i8 %55, i8 -128)
      %57 = call i8 @ne_int32_t_nullable_lhs(i32 %21, i32 %arg_literal_72, i64 -2147483648, i8 -128)
      %58 = call i8 @logical_and(i8 %56, i8 %57, i8 -128)
      %59 = call i8 @ne_int32_t_nullable_lhs(i32 %21, i32 %arg_literal_76, i64 -2147483648, i8 -128)
      %60 = call i8 @logical_and(i8 %58, i8 %59, i8 -128)
      %61 = icmp eq i32 %21, -2147483648
      %62 = xor i1 %61, true
      %63 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_80, i64 -2147483648, i8 -128)
      %64 = zext i1 %62 to i8
      %65 = call i8 @logical_or(i8 %64, i8 %63, i8 -128)
      %66 = call i8 @logical_and(i8 %60, i8 %65, i8 -128)
      %67 = icmp eq i32 %28, -2147483648
      %68 = xor i1 %67, true
      %69 = icmp eq i32 %21, -2147483648
      %70 = xor i1 %69, true
      %71 = or i1 %68, %70
      %72 = call i64 @string_decode(i8* %col_buf4, i64 %pos)
      %73 = call i8* @extract_str_ptr(i64 %72)
      %74 = call i32 @extract_str_len(i64 %72)
      %75 = call i1 @string_like_simple(i8* %73, i32 %74, i8* %arg_literal_84_start_address, i32 %arg_literal_84_length)
      %76 = xor i1 %75, true
      %77 = or i1 %71, %76
      %78 = zext i1 %77 to i8
      %79 = call i8 @logical_and(i8 %66, i8 %78, i8 -128)
      %80 = icmp eq i32 %21, -2147483648
      %81 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_80, i64 -2147483648, i8 -128)
      %82 = zext i1 %80 to i8
      %83 = call i8 @logical_and(i8 %82, i8 %81, i8 -128)
      %84 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_20, i64 -2147483648, i8 -128)
      %85 = call i8 @logical_and(i8 %83, i8 %84, i8 -128)
      %86 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_24, i64 -2147483648, i8 -128)
      %87 = call i8 @logical_and(i8 %85, i8 %86, i8 -128)
      %88 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_28, i64 -2147483648, i8 -128)
      %89 = call i8 @logical_and(i8 %87, i8 %88, i8 -128)
      %90 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_32, i64 -2147483648, i8 -128)
      %91 = call i8 @logical_and(i8 %89, i8 %90, i8 -128)
      %92 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_36, i64 -2147483648, i8 -128)
      %93 = call i8 @logical_and(i8 %91, i8 %92, i8 -128)
      %94 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_40, i64 -2147483648, i8 -128)
      %95 = call i8 @logical_and(i8 %93, i8 %94, i8 -128)
      %96 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_44, i64 -2147483648, i8 -128)
      %97 = call i8 @logical_and(i8 %95, i8 %96, i8 -128)
      %98 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_48, i64 -2147483648, i8 -128)
      %99 = call i8 @logical_and(i8 %97, i8 %98, i8 -128)
      %100 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_52, i64 -2147483648, i8 -128)
      %101 = call i8 @logical_and(i8 %99, i8 %100, i8 -128)
      %102 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_56, i64 -2147483648, i8 -128)
      %103 = call i8 @logical_and(i8 %101, i8 %102, i8 -128)
      %104 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_60, i64 -2147483648, i8 -128)
      %105 = call i8 @logical_and(i8 %103, i8 %104, i8 -128)
      %106 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_64, i64 -2147483648, i8 -128)
      %107 = call i8 @logical_and(i8 %105, i8 %106, i8 -128)
      %108 = call i8 @ne_int32_t_nullable_lhs(i32 %28, i32 %arg_literal_68, i64 -2147483648, i8 -128)
      %109 = call i8 @logical_and(i8 %107, i8 %108, i8 -128)
      %110 = call i8 @logical_or(i8 %79, i8 %109, i8 -128)
      %111 = icmp eq i32 %28, -2147483648
      %112 = icmp eq i32 %21, -2147483648
      %113 = and i1 %111, %112
      %114 = zext i1 %113 to i8
      %115 = call i8 @logical_or(i8 %110, i8 %114, i8 -128)
      %116 = icmp sgt i8 %115, 0
      br i1 %116, label %then_case, label %when_case
    
    filter_false:                                     ; preds = %end_case20, %is_not_deleted
      ret i32 0
    
    end_case:                                         ; preds = %when_case, %then_case
      %117 = phi i16 [ %arg_literal_88, %then_case ], [ %arg_literal_90, %when_case ]
      %agg_col_ptr15 = getelementptr i64, i64* %13, i32 2
      %118 = icmp eq i16 %117, -32768
      %119 = sext i16 %117 to i64
      %120 = select i1 %118, i64 -9223372036854775808, i64 %119
      %121 = call i64 @agg_sum_skip_val_shared(i64* %agg_col_ptr15, i64 %120, i64 -9223372036854775808)
      %122 = call i64 @fixed_width_int_decode(i8* %col_buf0, i32 4, i64 %pos)
      %123 = trunc i64 %122 to i32
      %agg_col_ptr16 = getelementptr i64, i64* %13, i32 3
      %124 = sext i32 %123 to i64
      %125 = call i64 @agg_sum_shared(i64* %agg_col_ptr16, i64 %124)
      %agg_col_ptr17 = getelementptr i64, i64* %13, i32 4
      %126 = sext i32 %123 to i64
      %127 = call i64 @agg_sum_shared(i64* %agg_col_ptr17, i64 %126)
      %agg_col_ptr18 = getelementptr i64, i64* %13, i32 5
      %128 = sext i32 %123 to i64
      %129 = call i64 @agg_count_shared(i64* %agg_col_ptr18, i64 %128)
      %130 = call i64 @fixed_width_int_decode(i8* %col_buf2, i32 4, i64 %pos)
      %131 = trunc i64 %130 to i32
      %agg_col_ptr19 = getelementptr i64, i64* %13, i32 6
      %132 = sext i32 %131 to i64
      %133 = bitcast i8* %literals to i64*
      %134 = getelementptr i64, i64* %133, i32 -1
      %135 = load i64, i64* %134
      %136 = bitcast i8* %literals to i64*
      %137 = getelementptr i64, i64* %136, i32 -2
      %138 = load i64, i64* %137
      call void @agg_approximate_count_distinct_gpu(i64* %agg_col_ptr19, i64 %132, i32 11, i64 %135, i64 %138)
      %139 = call i64 @fixed_width_int_decode(i8* %col_buf1, i32 8, i64 %pos)
      %140 = icmp eq i64 %139, -9223372036854775808
      %141 = xor i1 %140, true
      br i1 %141, label %then_case21, label %when_case22
    
    then_case:                                        ; preds = %filter_true
      br label %end_case
    
    when_case:                                        ; preds = %filter_true
      br label %end_case
    
    end_case20:                                       ; preds = %when_case22, %then_case21
      %142 = phi i64 [ %139, %then_case21 ], [ %arg_literal_96, %when_case22 ]
      %agg_col_ptr23 = getelementptr i64, i64* %13, i32 7
      %143 = call i64 @agg_sum_skip_val_shared(i64* %agg_col_ptr23, i64 %142, i64 -9223372036854775808)
      br label %filter_false
    
    then_case21:                                      ; preds = %end_case
      br label %end_case20
    
    when_case22:                                      ; preds = %end_case
      br label %end_case20
    }




  • 4.  RE: Query plan massively slower for the same query on MapD 4.7.1

    Posted 19 days ago
    Do you mean the IR you get with EXPLAIN? The new one is longer, but not by that much.



    Attachment(s)

    471ir.txt (3K)
    441ir.txt (2K)


  • 5.  RE: Query plan massively slower for the same query on MapD 4.7.1

    Posted 19 days ago
    The query has to be different then; you have multiple SUMs, a COUNT, and an approximate count distinct in the IR you posted.


  • 6.  RE: Query plan massively slower for the same query on MapD 4.7.1

    Posted 19 days ago
    The query is slightly different, but I don't think any of the other fields are causing the issue, because if I comment out the CASE condition it runs in the same amount of time as on 4.5.0.
    Also, your comparison uses 4.4.1, but I am running 4.5.0.
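    For reference, the stripped-down query I used for that comparison looks roughly like this (same filter and grouping, with the CASE aggregate removed):
    SELECT foo_id,
           COUNT(1) count_entries
    FROM bar
    WHERE ts >= '2019-01-01 00:00:00.0' AND ts < '2019-06-21 00:00:00.0'
    GROUP BY foo_id
    ORDER BY count_entries DESC
    LIMIT 10;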