"Cannot connect to OmniSci Server" error when using GPU on an Ubuntu instance

I followed the steps at https://docs.omnisci.com/v5.2.1/4_ubuntu-apt-gpu-os-recipe.html to install OmniSci on an Ubuntu GPU instance. After the data is ingested, running a SELECT query produces the following error. CUDA is installed. How can I fix the problem?
Thrift error: No more data to read.

Thrift connection error: No more data to read.

Retrying connection

Thrift error: No more data to read.

Thrift connection error: No more data to read.

Retrying connection

Thrift: Sun Aug 16 03:20:41 2020 TSocket::write_partial() send() <Host: localhost Port: 6274>: Broken pipe

Thrift error: write() send(): Broken pipe

Thrift connection error: write() send(): Broken pipe

Retrying connection

Thrift: Sun Aug 16 03:20:49 2020 TSocket::write_partial() send() <Host: localhost Port: 6274>: Broken pipe

Thrift error: write() send(): Broken pipe

Thrift connection error: write() send(): Broken pipe

Retrying connection

Cannot connect to OmniSci Server.

Hi @Youan_Lu,

I suggest checking the logs for errors; they are located in /var/lib/omnisci/mapd_logs.

You probably ran out of free disk space while loading the data, but I can’t be sure; you may also be hitting a bug. I can’t say without seeing the table’s DDL, the query, or the logs.
If you think the problem is on the GPU side, you can try running the query in CPU mode, using either the /*+ cpu_mode */ hint or the \cpu command in omnisql.

e.g.

select /*+ cpu_mode */ col1,sum(col2) from table1 group by col1;

or

\cpu
select col1,sum(col2) from table1 group by col1;

regards.

If I switch to CPU mode, everything works. The GPU version hits this error, though, and I need to do some work comparing CPU and GPU.
Here’s the error log:
2020-08-16T03:20:33.642922 F 13654 4 NvidiaKernel.cpp:122 Check failed: cuLinkAddData_v2(link_state, CU_JIT_INPUT_PTX, static_cast<void*>(const_cast<char*>(ptx.c_str())), ptx.length() + 1, 0, 0, nullptr, nullptr) == CUDA_SUCCESS (218 == 0)
I’m using the sample flights data with 10k rows and this query:
SELECT origin_city AS "Origin", dest_city AS "Destination", AVG(airtime) AS "Average Airtime" FROM flights_2008_10k WHERE distance < 175 GROUP BY origin_city, dest_city;
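For anyone triaging a similar crash, the fatal log line above can be picked apart mechanically. The sketch below is only an illustration: `parse_fatal_line` is a hypothetical helper, the field layout (timestamp, severity letter, two numeric ids, source location, message) is inferred from the single line quoted in this thread, and the lookup table holds just a few values from the `CUresult` enum of the CUDA driver API. In that enum, 218 is `CUDA_ERROR_INVALID_PTX`, which suggests the driver rejected the generated PTX.

```python
import re

# Small excerpt of CUresult codes from the CUDA driver API (cuda.h).
# Only the values needed for this illustration are listed.
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    209: "CUDA_ERROR_NO_BINARY_FOR_GPU",
    218: "CUDA_ERROR_INVALID_PTX",
}

# Assumed layout, inferred from the one log line quoted in the thread:
# <timestamp> <severity> <number> <number> <file>:<line> <message>
LOG_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<severity>[A-Z])\s+\d+\s+\d+\s+"
    r"(?P<source>\S+:\d+)\s+(?P<message>.*)$"
)

# A failed check ends with "(<actual> == <expected>)".
CHECK_RE = re.compile(r"\((?P<actual>\d+) == (?P<expected>\d+)\)\s*$")


def parse_fatal_line(line):
    """Return details of a severity-F log line, or None if it is not one."""
    match = LOG_RE.match(line.strip())
    if match is None or match.group("severity") != "F":
        return None
    info = {
        "timestamp": match.group("timestamp"),
        "source": match.group("source"),
        "cuda_code": None,
        "cuda_name": None,
    }
    check = CHECK_RE.search(match.group("message"))
    if check is not None:
        code = int(check.group("actual"))
        info["cuda_code"] = code
        # Codes outside the excerpt above are reported as unknown.
        info["cuda_name"] = CU_RESULT_NAMES.get(code, "unknown (not in excerpt)")
    return info
```

Feeding each line of the server log through `parse_fatal_line` and keeping the non-`None` results gives a short list of fatal checks to include when asking for help.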

Hi,

Which GPU are you using, and which driver version?
From 5.2 onwards, the required driver version is 418.39 or later.

NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 Product name: GRID K520.
I think the driver version meets the requirement?

I’m not sure about the K520 with CUDA 11, but the problem is probably the GRID configuration; I have never tried to run our database in such an environment.

I’ve asked internally, by the way.

What environment do you use? Is there a tutorial for the installation?

Hi @Youan_Lu,

Our head engineer pointed me in the right direction: the K520 isn’t supported. Although the card uses the Kepler architecture, it is based on the older of the two Kepler generations and is limited to CUDA compute capability 3.0, while we generate code for at least compute capability 3.5. That is why the server crashes when you try to run the query.

To try out the database with GPU acceleration, you can use any instance with a Kepler K80 or any Pascal, Turing or Volta card. I strongly recommend skipping the K80 and going for Pascal (GTX, Quadro or Tesla, it doesn’t matter which), and upgrading to Volta (Tesla) if you plan to use geo functions heavily, because of its increased number of FP64 cores.

My personal hardware spans from a small Pascal GPU in my gaming notebook to a server Turing GPU in my workstation.

Hope this helps.

That helps a lot, but I’m not that familiar with GPUs. Do these GPUs work with the database? NVIDIA V100, NVIDIA K80, NVIDIA M60, NVIDIA T4.

Except for the M60 (the Maxwell architecture isn’t fully supported, so it’s better to skip it), all the other cards work. The Kepler card (K80) is the least capable of the group (some queries will fall back to CPU) and its days are numbered; NVIDIA is going to end support for such old cards.
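The compute capability reasoning in this thread can be sketched as a small check. This is an illustration, not an official compatibility list: the table keys are labels chosen for this example, the capability values are the ones NVIDIA publishes for those cards, and the 3.5 minimum is the figure quoted earlier in the thread. Passing the check is necessary but not sufficient, since the M60 clears 3.5 and is still not recommended above.

```python
# Published CUDA compute capabilities for the cards named in this thread.
# The keys are illustrative labels, not exact nvidia-smi product strings.
COMPUTE_CAPABILITY = {
    "GRID K520": (3, 0),   # Kepler, first generation
    "Tesla K80": (3, 7),   # Kepler, second generation
    "Tesla M60": (5, 2),   # Maxwell
    "Tesla V100": (7, 0),  # Volta
    "Tesla T4": (7, 5),    # Turing
}

# Minimum capability the generated code targets, as stated in the thread.
MIN_COMPUTE_CAPABILITY = (3, 5)


def meets_minimum(gpu_name, minimum=MIN_COMPUTE_CAPABILITY):
    """Return True if the card's compute capability is at least `minimum`.

    Tuples compare element by element, so (3, 7) >= (3, 5) is True
    and (3, 0) >= (3, 5) is False.
    """
    return COMPUTE_CAPABILITY[gpu_name] >= minimum


def cards_below_minimum(minimum=MIN_COMPUTE_CAPABILITY):
    """List the cards in the table that fall below `minimum`."""
    return [name for name in COMPUTE_CAPABILITY if not meets_minimum(name, minimum)]


if __name__ == "__main__":
    for name, (major, minor) in COMPUTE_CAPABILITY.items():
        status = "ok" if meets_minimum(name) else "below minimum"
        print(f"{name}: compute capability {major}.{minor} ({status})")
```

Comparing `(major, minor)` tuples avoids floating-point comparisons of version numbers.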

Thank you so much! I’m using a K80 now and the problem is solved!

I’m happy everything is working, but with the K80 it is likely that some aggregates will fall back to CPU, because it lacks some features, such as atomic operations on double-precision data types.