Complex Query Fails

Apologies for the complexity of the query, but I could not simplify it any further and still have it fail. I’m getting the following error, which leads me to believe that the Omnisci server is failing on my query and not reporting the true error back to the JDBC client (DBeaver in this case).

TIOStreamTransport.java:read:130:org.apache.thrift.transport.TTransportException: Socket is closed by peer.]

select *
FROM
(
    -- dataset_join_node(out, [out_pop_out])
    SELECT r1_1.store__id AS store__id, r1_1.month__id AS month__id,
        r1_1.percent_chg AS "metric1"
    FROM
    (
        -- dataset_join_node(out_pop_out, [out_current, out_prior, out_d])
--        SELECT COALESCE(r2_3.month__id, r2_1.month__id, r2_2.month__id) AS month__id, COALESCE(r2_1.store__id, r2_2.store__id) AS store__id,
        SELECT COALESCE(r2_1.month__id, r2_2.month__id) AS month__id, COALESCE(r2_1.store__id, r2_2.store__id) AS store__id,
            CASE WHEN r2_1.expr IS NULL THEN -1 WHEN r2_2.expr IS NULL THEN NULL WHEN COALESCE(r2_2.expr, 0) = 0 THEN 0 ELSE (r2_1.expr - r2_2.expr) / (1.0 * ABS(r2_2.expr)) END AS "percent_chg", r2_1.expr AS "expr_current", 
                r2_2.expr AS "expr_prior", COALESCE(r2_1.expr, 0) - COALESCE(r2_2.expr, 0) AS "expr_delta"
        FROM
        (
            -- leaf_node(out_current, fact_sales)
            SELECT max_dim_date.the_year AS period_seq__id, max_dim_date.the_month AS month_of_year__id, max_dim_date.month_year_name AS month__id, fact_sales.store_id AS store__id,
                SUM(fact_sales.sales_amt) AS expr
            FROM fact_sales
            JOIN max_dim_date
              ON max_dim_date.the_date = fact_sales.the_date
             AND (max_dim_date.the_date BETWEEN '2015-01-01' AND '2015-12-31')
            GROUP BY max_dim_date.the_year, max_dim_date.the_month, max_dim_date.month_year_name, fact_sales.store_id
        ) AS r2_1
        LEFT OUTER JOIN
        (
            -- leaf_node(out_prior, fact_sales)
            SELECT max_dim_date.the_year + 1 AS period_seq__id, max_dim_date.the_month AS month_of_year__id, max_dim_date.month_year_name AS month__id, fact_sales.store_id AS store__id,
                SUM(fact_sales.sales_amt) AS expr
            FROM fact_sales
            JOIN max_dim_date
              ON max_dim_date.the_date = fact_sales.the_date
             AND (max_dim_date.the_date BETWEEN '2014-01-01' AND '2014-12-31')
            GROUP BY max_dim_date.the_year + 1, max_dim_date.the_month, max_dim_date.month_year_name, fact_sales.store_id
        ) AS r2_2
          ON r2_2.period_seq__id = r2_1.period_seq__id
         AND r2_2.month_of_year__id = r2_1.month_of_year__id
         AND r2_2.store__id = r2_1.store__id
        LEFT OUTER JOIN
        (
            -- leaf_node(out_d, max_dim_date)
            SELECT max_dim_date.the_year AS period_seq__id, max_dim_date.the_month AS month_of_year__id, max_dim_date.month_year_name AS month__id,
                COUNT(*) AS "count"
            FROM max_dim_date
            WHERE (max_dim_date.the_date BETWEEN '2015-01-01' AND '2015-12-31')
            GROUP BY max_dim_date.the_year, max_dim_date.the_month, max_dim_date.month_year_name
        ) AS r2_3
          ON r2_3.period_seq__id = COALESCE(r2_1.period_seq__id, r2_2.period_seq__id)
         AND r2_3.month_of_year__id = COALESCE(r2_1.month_of_year__id, r2_2.month_of_year__id)
    ) AS r1_1
) AS r
LEFT OUTER JOIN dim_store
  ON dim_store.store_id = r.store__id
LEFT OUTER JOIN
(
    SELECT DISTINCT month_year_name, month_year_sequence
    FROM max_dim_date
) AS max_dim_date_ord
  ON max_dim_date_ord.month_year_name = r.month__id

The query runs if I comment out the following lines from the inner query:

--        LEFT OUTER JOIN
--        (
--            -- leaf_node(out_d, max_dim_date)
--            SELECT max_dim_date.the_year AS period_seq__id, max_dim_date.the_month AS month_of_year__id, max_dim_date.month_year_name AS month__id,
--                COUNT(*) AS "count"
--            FROM max_dim_date
--            WHERE (max_dim_date.the_date BETWEEN '2015-01-01' AND '2015-12-31')
--            GROUP BY max_dim_date.the_year, max_dim_date.the_month, max_dim_date.month_year_name
--        ) AS r2_3
--          ON r2_3.period_seq__id = COALESCE(r2_1.period_seq__id, r2_2.period_seq__id)
--         AND r2_3.month_of_year__id = COALESCE(r2_1.month_of_year__id, r2_2.month_of_year__id)

Also, it’s definitely affected by the presence of both of the following joins in the root node:

LEFT OUTER JOIN dim_store
  ON dim_store.store_id = r.store__id
LEFT OUTER JOIN
(
    SELECT DISTINCT month_year_name, month_year_sequence
    FROM max_dim_date
) AS max_dim_date_ord
  ON max_dim_date_ord.month_year_name = r.month__id

Because if I comment out one or the other (or both) and leave in the original LEFT OUTER JOIN for the leaf_node(out_d, max_dim_date), it works!

Don’t know how you’re going to debug what’s going on here, but wanted to send it your way. I would be happy to provide more details if possible.

Thanks for taking a look.

-Larry

Here’s a simpler query that gives the generic Socket is closed by peer error:

select *
FROM
(
    -- leaf_node(out, fact_sales)
    SELECT dim_store.first_sales_date AS first_sales_date__id,
        SUM(fact_sales.sales_amt) AS amount
    FROM fact_sales
    JOIN dim_store
      ON dim_store.store_id = fact_sales.store_id
    GROUP BY dim_store.first_sales_date
) AS r
LEFT OUTER JOIN max_dim_date
  ON max_dim_date.the_date = r.first_sales_date__id

If I remove the LEFT OUTER JOIN, the query works:

select *
FROM
(
    -- leaf_node(out, fact_sales)
    SELECT dim_store.first_sales_date AS first_sales_date__id,
        SUM(fact_sales.sales_amt) AS amount
    FROM fact_sales
    JOIN dim_store
      ON dim_store.store_id = fact_sales.store_id
    GROUP BY dim_store.first_sales_date
) AS r
--LEFT OUTER JOIN max_dim_date
--  ON max_dim_date.the_date = r.first_sales_date__id

Hi @Larry_Parker,

I tried a query similar to your simpler query and I canì’t reproduce thhe issue

SELECT * 
FROM (
              SELECT airplanes_fk,carrier_name,
                             sum(depdelay) 
              FROM    flights_sk 
              JOIN      carriers_sk 
              ON        ìì carriers_fk=carriers_id 
              GROUP BY 1,2) as r 
              LEFT OUTER JOIN airplanes_sk 
              ON airplanes_fk=airplanes_id

tried various queries using datatypes similar to yours with no luck.

Could you post the server log

@candido.dessanti it works when an inner JOIN is used instead of a LEFT OUTER JOIN.

But I’ll see if I can get you the server logs for when it fails.

-Larry

Yap, but on my query, that’s similar to yours, I am using a left outer join so the issue has something to do with tables data

Here’s the error we just got on the server:

2020-09-22T19:33:22.126640 F 45401 2 ColumnIR.cpp:512 Check failed: hash_pos_it != cgen_state_->scan_idx_to_hash_pos_.end()

Here’s another SQL example that probably falls under the category of “complex query fails”.

    -- leaf_node(a, fact_sales)
    SELECT dim_product."brand_id" AS "brand__id",
        SUM(fact_sales."sales_amt") AS "amount"
    FROM fact_sales
    JOIN dim_product
      ON dim_product.product_id = fact_sales.product_id
    JOIN max_dim_date
      ON max_dim_date.the_date = fact_sales.the_date
    JOIN
    (
        -- join_qualifier_node(q)
        SELECT r."brand__id",
            r."first_date"
        FROM
        (
            -- leaf_node(q, fact_sales)
            SELECT dim_product."brand_id" AS "brand__id",
                MIN(max_dim_date."the_date") AS "first_date"
            FROM fact_sales
            JOIN dim_product
              ON dim_product.product_id = fact_sales.product_id
            JOIN dim_store
              ON dim_store.store_id = fact_sales.store_id
            JOIN dim_region
              ON dim_region.region_id = dim_store.region_id
             AND (dim_region.name  ILIKE 'southeast')
            JOIN max_dim_date
              ON max_dim_date.the_date = fact_sales.the_date
             AND (max_dim_date.the_year = 2015)
            GROUP BY dim_product."brand_id"
        ) AS r
    ) AS q
      ON q.brand__id = dim_product.brand_id
    -- the WHERE clause causes the query to fail
    WHERE (max_dim_date.the_date = q.first_date)
    GROUP BY dim_product."brand_id"

If I remove the WHERE clause, the query runs.