mapdql hangs for SQL queries that return a big result set


#1

Hi,

I am running the open source version of MapD Core with CPU using mapdql.

mapdql frequently (but not always) hangs when I issue a SQL query that returns a big result set (specifically, a query that generates a 6-million-row result). However, queries against the same table that return a small result set behave well.

The interesting thing is that the “mapd_server.INFO” log shows that the query has already finished, with something like:
“sql_execute-COMPLETED Total: 12330 (ms), Execution: 2056 (ms)”

But mapdql hangs and does not return the result to me. I set “enable-dynamic-watchdog” to “true” and set “dynamic-watchdog-time-limit” to some value, but that still did not solve the problem.
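For reference, the timings in a log line like the one quoted above can be pulled out with a small script. This is only a minimal sketch: the regex is based solely on that single quoted line, and the names `TIMING_RE` and `parse_timing` are made up for illustration, not part of MapD. Other server versions or commands may log a different format.

```python
import re

# Pattern derived only from the one log line quoted in this post;
# it is an assumption that other completed queries are logged the same way.
TIMING_RE = re.compile(
    r"sql_execute-COMPLETED Total: (\d+) \(ms\), Execution: (\d+) \(ms\)"
)


def parse_timing(line):
    """Return (total_ms, execution_ms), or None if the line does not match."""
    match = TIMING_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "sql_execute-COMPLETED Total: 12330 (ms), Execution: 2056 (ms)"
total_ms, execution_ms = parse_timing(sample)
# Total, Execution, and the part of Total not covered by Execution.
print(total_ms, execution_ms, total_ms - execution_ms)  # 12330 2056 10274
```

The same function could be applied to each line of “mapd_server.INFO” to collect the timings of repeated runs.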

Currently, I issue the query multiple times until it succeeds (the chance of success is low).

It would be great if anyone could help with this.

Thanks.


#2

Just to clarify, you’re trying to return 6 million rows via mapdql, not using export?

https://www.omnisci.com/docs/v4.0.0/6_exporting_data.html


#3

Yes, I am doing performance testing, so I am not saving any results.

Adding “COPY TO” made mapdql stable! Thank you!

However, I cannot get the timing information for the SQL query alone after adding “COPY TO”.
Instead, I can only see the “Total Time” for the whole “COPY TO” command in the “mapd_server.INFO” file, which may not be accurate enough for me.

Do you have a good way to address that?


#4

Sorry to keep answering your questions with different questions, but what are you trying to benchmark? A CPU-only build is the slowest configuration OmniSci will run in, and returning 6 million rows is generally not going to be a high-performance query (since I presume this is either a select * query or a join to build a table).


#5

Yes, I am benchmarking, and CPU-only mode is one of the configurations I am trying to benchmark.

The reason the result set has 6 million rows is that I am testing some really big tables. Not only “SELECT *” or “JOIN” queries produce this many rows, but also meaningful queries such as a GROUP BY on high-cardinality columns.

I am still trying to figure out this problem. Is there any way to let the user execute the SQL without producing any result?

Thanks.