Query is crashing omnisql

Hi,

I have installed omniscidb 5.9 on a Centos 7 server to test it.
While running the following query from omnisql:

CREATE TABLE last_date AS
(SELECT alpha, beta, DATEDIFF(‘DAY’, CURRENT_DATE(), max(daydt)) AS last_dt
FROM hits
WHERE hits_val>0 AND
(alpha,beta) IN (SELECT DISTINCT alpha,beta FROM base WHERE CURRENT_DATE()=daydt AND base_val=1) AND
beta NOT IN (‘1000’)
GROUP BY alpha,beta);

I got this output:
Thrift error: No more data to read.
Thrift connection error: No more data to read.
Retrying connection
Thrift error: No more data to read.
Thrift connection error: No more data to read.
Retrying connection
Thrift: Sat Dec 11 00:14:41 2021 TSocket::write_partial() send() <Host: localhost Port: 6274>: Broken pipe
Thrift error: write() send(): Broken pipe
Thrift connection error: write() send(): Broken pipe
Retrying connection
Thrift: Sat Dec 11 00:14:49 2021 TSocket::write_partial() send() <Host: localhost Port: 6274>: Broken pipe
Thrift error: write() send(): Broken pipe
Thrift connection error: write() send(): Broken pipe
Retrying connection

Looking at the logs I saw:
2021-12-10T23:40:09.458391 F 10663 0 2 RelAlgDagBuilder.cpp:55 Check failed: size_t(1) == row_set->colCount() (1 == 2)

The tables used are the following.
CREATE TABLE hits (
daydt DATE NOT NULL,
alpha TEXT NOT NULL,
beta TEXT NOT NULL,
hits_val SMALLINT NOT NULL
);

CREATE TABLE base (
daydt DATE NOT NULL,
alpha TEXT NOT NULL,
beta TEXT NOT NULL,
base_val BOOLEAN NOT NULL
);

The query runs with no issues, when I modify it to:

CREATE TABLE last_date AS
(SELECT alpha, beta, DATEDIFF(‘DAY’, CURRENT_DATE(), max(daydt)) AS last_dt
FROM hits
WHERE hits_val>0 AND
(alpha) IN (SELECT DISTINCT alpha FROM base WHERE CURRENT_DATE()=daydt AND base_val=1) AND
beta NOT IN (‘1000’)
GROUP BY alpha,beta);

Can somebody help me out?

Many thanks.

Hi @amok,

Could you check the file system that you are using to store data? Probably it’s full so it’s crashing the server.

In case it’s full you should allocate more space to it, or move omnisci_storage in a more capable file system.

If you need some assistance on such an operation feel free to ask

Regards,
Candido

Hi candido,

thank you for the prompt reply.
I don’t think it is a storage issue:

If you look at the filesystem, there is plenty of free space everywhere. Especially in the mount point where omnisci storage is located, there is around 4.2T free:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 252G 0 252G 0% /dev
tmpfs 252G 24K 252G 1% /dev/shm
tmpfs 252G 82M 252G 1% /run
tmpfs 252G 0 252G 0% /sys/fs/cgroup
/dev/nvme0n1p4 100G 59G 42G 59% /
/dev/nvme0n1p7 100G 64M 100G 1% /tmp
/dev/nvme0n1p6 100G 22G 79G 22% /home
/dev/nvme0n1p9 5.1T 918G 4.2T 18% /core/omnisci
/dev/nvme0n1p2 509M 123M 387M 24% /boot
/dev/nvme0n1p1 256M 12M 245M 5% /boot/efi
tmpfs 51G 0 51G 0% /run/user/1005

Anything else I should check?

The error in the log, that I mentioned in the previous post, looks suspicious:
2021-12-10T23:40:09.458391 F 10663 0 2 RelAlgDagBuilder.cpp:55 Check failed: size_t(1) == row_set->colCount() (1 == 2)

By the way, I am using the CPU version for my initial testing.

Many thanks!

Hi,

I’m truly sorry for the wrong suggestion; I just didn’t see the error message you posted in your message.

I’m going to try to reproduce your error and eventually file a Jira to our backend engineering team because it looks like a bug.

Regards,
Candido

Hi,

here are sample files to help you in your testing:
hits.csv
2021-05-04,121,1000,1
2020-04-20,131,2000,1
2020-04-21,131,2000,0
2020-04-22,131,2000,2
2020-04-23,131,2000,3
2020-04-24,131,2000,2
2020-04-25,131,2000,0
2020-04-26,131,2000,4
2020-04-27,131,2000,5
2020-04-28,131,2000,1

base.csv
2021-05-04,121,1000,1
2021-12-10,131,2000,1
2021-12-11,131,2000,1
2021-12-12,131,2000,1
2021-12-13,131,2000,1
2021-12-14,131,2000,1
2021-12-15,131,2000,1
2021-12-16,131,2000,1
2021-12-17,131,2000,1
2021-12-18,131,2000,1

Thanks.

Hi,

I will try with this data.

I already tried with some data(50m rows for both tables) I derived by another table, with omniscidb version from 5.5 to 5.9 and I don’t get any error with a query or with cats (CPU or GPU is the same)

Which version are you using?
Which are the cardinality of alpha and beta (I am on 4.8 million distinct rows on 50m total) and the size of tables?
How many records is the query itself is returning (I am around 100)

Sorry for the massive number of questions.

Candido.

P.s. I can share my data, just to know if it’s a problem with the data

Hi,

thanks for your support.
This is the version I have:
omnisql> \status
Server Version : 5.9.0-20211123-d294f1e842

I get the error, even with this super slim dataset I shared (10 lines per table).
The query is not returning anything because it is crashing.

Another thing I just noticed. Every time I connect to omnisql, I get this:
Thrift: Sat Dec 11 19:25:26 2021 TSocket::open() connect() <Host: localhost Port: 6274>: Connection refused
Do you get something like this?

Please, let me know how can I help?

Hi Candido,

I have tested the scenario above, using the docker image (two different servers) and I get the some error.
Steps:

  1. Followed the instructions from here Docker Open Source CPU - OmniSci Docs to install docker
  2. Connected to the container’s bash and added a folder
    mkdir …/omnisci-storage/data/mapd_import/
    added file base.csv and hits.csv with just the 10 lines.
  3. Connected to the container’s omnisql:
    CREATE TABLE hits ( daydt DATE NOT NULL, alpha TEXT NOT NULL, beta TEXT NOT NULL, hits_val SMALLINT NOT NULL );
    CREATE TABLE base ( daydt DATE NOT NULL, alpha TEXT NOT NULL, beta TEXT NOT NULL, base_val SMALLINT NOT NULL );
    copy base from ‘/omnisci-storage/data/mapd_import/base.csv’ with (header = ‘false’);
    copy hits from ‘/omnisci-storage/data/mapd_import/hits.csv’ with (header = ‘false’);
    CREATE TABLE last_date AS (SELECT alpha,beta,DATEDIFF(‘daydt’, CURRENT_DATE(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 and (alpha,beta) IN (SELECT DISTINCT alpha,beta FROM base WHERE CURRENT_DATE()=daydt and base_val=1) and beta NOT IN (‘1000’) GROUP BY alpha,beta);

After the last query the container crashed.

Regards.

Hi,

tried with your data and everything worked without issues

omnisql> \d base
CREATE TABLE base (
  daydt DATE NOT NULL ENCODING DAYS(32),
  alpha TEXT NOT NULL ENCODING DICT(32),
  beta TEXT NOT NULL ENCODING DICT(32),
  base_val SMALLINT NOT NULL);
omnisql> \d hits
CREATE TABLE hits (
  daydt DATE NOT NULL ENCODING DAYS(32),
  alpha TEXT NOT NULL,
  beta TEXT NOT NULL,
  hits_val SMALLINT NOT NULL,
  SHARED DICTIONARY (alpha) REFERENCES hits(alpha),
  SHARED DICTIONARY (beta) REFERENCES hits(beta));
mnisql> select * from base;
daydt|alpha|beta|base_val
2021-05-04|121|1000|1
2021-12-10|131|2000|1
2021-12-11|131|2000|1
2021-12-12|131|2000|1
2021-12-13|131|2000|1
2021-12-14|131|2000|1
2021-12-15|131|2000|1
2021-12-16|131|2000|1
2021-12-17|131|2000|1
2021-12-18|131|2000|1
10 rows returned.
Execution time: 26 ms, Total time: 28 ms
10 rows returned.
Execution time: 62 ms, Total time: 64 ms
omnisql> select * from hits;
daydt|alpha|beta|hits_val
2021-05-04|121|1000|1
2020-04-20|131|2000|1
2020-04-21|131|2000|0
2020-04-22|131|2000|2
2020-04-23|131|2000|3
2020-04-24|131|2000|2
2020-04-25|131|2000|0
2020-04-26|131|2000|4
2020-04-27|131|2000|5
2020-04-28|131|2000|1
10 rows returned.
Execution time: 26 ms, Total time: 28 ms
omnisql> \cpu
omnisql> create table last_date as SELECT /*+ cpu_mode */ alpha, beta, DATEDIFF('DAY', current_date(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha) IN (SELECT alpha FROM base WHERE current_date()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta;
omnisql> SELECT /*+ cpu_mode */ alpha, beta, DATEDIFF('DAY', current_date(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha) IN (SELECT alpha FROM base WHERE current_date()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta;
alpha|beta|last_dt
131|2000|-592
1 rows returned.
Execution time: 54 ms, Total time: 54 ms

except for the huge time, the query tools to return everything is ok.

Ok, tomorrow I will try with a docker installation, because I am trying on an Ubuntu 20 tarball installation.

Candido

Thanks Candido,
This is what I get:

omnisql> \d base
CREATE TABLE base (
daydt DATE NOT NULL ENCODING DAYS(32),
alpha TEXT NOT NULL ENCODING DICT(32),
beta TEXT NOT NULL ENCODING DICT(32),
base_val SMALLINT NOT NULL);
omnisql> \d hits
CREATE TABLE hits (
daydt DATE NOT NULL ENCODING DAYS(32),
alpha TEXT NOT NULL ENCODING DICT(32),
beta TEXT NOT NULL ENCODING DICT(32),
hits_val SMALLINT NOT NULL);
omnisql> select * from base;
daydt|alpha|beta|base_val
2021-05-04|121|1000|1
2021-12-10|131|2000|1
2021-12-11|131|2000|1
2021-12-12|131|2000|1
2021-12-13|131|2000|1
2021-12-14|131|2000|1
2021-12-15|131|2000|1
2021-12-16|131|2000|1
2021-12-17|131|2000|1
2021-12-18|131|2000|1
10 rows returned.
Execution time: 19 ms, Total time: 22 ms
omnisql> select * from hits;
daydt|alpha|beta|hits_val
2021-05-04|121|1000|1
2020-04-20|131|2000|1
2020-04-21|131|2000|0
2020-04-22|131|2000|2
2020-04-23|131|2000|3
2020-04-24|131|2000|2
2020-04-25|131|2000|0
2020-04-26|131|2000|4
2020-04-27|131|2000|5
2020-04-28|131|2000|1
10 rows returned.

I don’t get the lines “SHARED…”

Hi @amok

I have reproduced your error but changed the query in this way.

SELECT alpha, beta, DATEDIFF(‘DAY’, CURRENT_DATE(), MAX(daydt)) AS last_dt FROM hits WHERE hits_val > 0 AND (alpha, beta) IN (SELECT DISTINCT alpha, beta FROM base WHERE CURRENT_DATE() = daydt AND base_val = 1) AND beta NOT IN (‘1000’) GROUP BY alpha, beta;

So using more than one column in the IN clause against a subquery.

The software needs just one column returned by the subquery, so a fatal error is issued, and the server has to stop, and with just one column into IN clause, it’s almost impossible to get such an error.

Can you try to run this query?

SELECT alpha, beta, DATEDIFF('DAY', current_date(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha) IN (SELECT alpha FROM base WHERE current_date()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta;

And if you are getting an error, please post a more detailed log, starting from the start of statement execution like this one.

2021-12-12T09:21:19.656922 I 149 0 3 DBHandler.cpp:1273 stdlog_begin sql_execute 25 0 omnisci admin 843-0UvY {"query_str"} {"CREATE TABLE last_date AS (SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha,beta) IN (SELECT DISTINCT alpha,beta FROM base WHERE CURRENT_DATE()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta);"}
2021-12-12T09:21:19.658816 I 149 0 3 Calcite.cpp:553 User calcite catalog omnisci sql 'CREATE TABLE last_date AS (SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha,beta) IN (SELECT DISTINCT alpha,beta FROM base WHERE CURRENT_DATE()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta);'
2021-12-12T09:21:19.679722 I 149 0 3 Calcite.cpp:588 Time in Thrift 1 (ms), Time in Java Calcite server 19 (ms)
2021-12-12T09:21:19.681614 I 149 0 3 Calcite.cpp:553 User calcite catalog omnisci sql '(SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), MAX(daydt)) AS last_dt FROM hits WHERE hits_val > 0 AND ROW(alpha, beta) IN (SELECT DISTINCT alpha, beta             FROM base             WHERE CURRENT_DATE() = daydt AND base_val = 1) AND beta NOT IN ('1000') GROUP BY alpha, beta)'
2021-12-12T09:21:19.724882 I 149 0 3 Calcite.cpp:588 Time in Thrift 1 (ms), Time in Java Calcite server 42 (ms)
2021-12-12T09:21:19.725740 I 149 0 3 Calcite.cpp:553 User calcite catalog omnisci sql '(SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), MAX(daydt)) AS last_dt FROM hits WHERE hits_val > 0 AND ROW(alpha, beta) IN (SELECT DISTINCT alpha, beta             FROM base             WHERE CURRENT_DATE() = daydt AND base_val = 1) AND beta NOT IN ('1000') GROUP BY alpha, beta)'
2021-12-12T09:21:19.760475 I 149 0 3 Calcite.cpp:588 Time in Thrift 0 (ms), Time in Java Calcite server 34 (ms)
2021-12-12T09:21:19.802247 F 149 0 3 RelAlgDagBuilder.cpp:55 Check failed: size_t(1) == row_set->colCount() (1 == 2)
2021-12-12T09:21:19.940335 I 149 0 11 MapDServer.cpp:317 Interrupt signal (6) received.

Regards,
Candido

Hi Candido,

thank you for your support.

In my original post I have mentioned that using (alpha, beta) IN (SELECT…) crashes the server, but if I change it to (alpha) IN (SELECT…) it works.

Could you please explain if this is a bug or the software is designed to work like this? If we can’t have two columns in the IN clause, could you please explain how we should write the queries? Is there some documentation I could read?

Apologies for all the questions, but I would like to understand if the software will fit my use case.

Hi @amok ,

Don’t worry, if possible you can re-write your query with an EXIST instead the IN this way

SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits h WHERE hits_val>0 AND exists (SELECT 1 FROM base b WHERE CURRENT_DATE()=daydt AND base_val=1 AND h.alpha=b.alpha AND h.beta=b.beta) AND beta NOT IN ('1000') GROUP BY alpha,beta;

Candido

Many thanks Candido,

your recommendation works fine.

Is there a way to do CONCAT?

Well let me know if the performances are fine, because we can rewrite the query in other ways.

about CONCAT of strings, you can enable using that switch enable-interoperability=true, but it’s a developer option and it’s not working on aggregates, but just in projections queries.

so you should run a CTAS and then a select to make it works

I am not sure I understood your suggestion for CONCAT :smiley:
Could you provide an example?

You have to set up the parameter I mentioned in the previous post and then you can use combat using a double pipe on projections or updates (simple. and against a text encoding none column).

So

Select alpha||'-'||beta
from hits

You cannot use it with the group by queries, so it’s not so useful for your queries

Candido