What's the use of scan_limit?


#1

When executing a query, the system needs to evaluate a value for scan_limit. Can anyone tell me what scan_limit does?


#2

Hi @tianhz -

The purpose of scan_limit is to avoid runaway queries. OmniSci is optimized for returning results in sub-second time, and we have functionality called ‘watchdog’ that estimates how many results a query would return. So if you write a query like select * from largetable, you might trigger the watchdog, because that query would be expected to take many seconds, minutes, or hours.

The way to avoid this is to add a limit clause to the query, e.g. select * from largetable limit 1000000, if that’s really what you mean to do. Additionally, we have a parameter you can set when the server is started to disable watchdog altogether, as described here:
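(For what it’s worth, in the open-source OmniSciDB builds I believe the relevant startup option is the boolean --enable-watchdog flag, so something like omnisci_server --enable-watchdog=false should do it; please treat the exact flag name as an assumption and check the docs for your version.)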


#3

Hi @randyzwitch,
Thank you very much for answering my question, but my actual problem is a little more complicated. scan_limit tells the system how many rows to return. There are two ways scan_limit gets calculated:
(1) Specify a “limit” in the query, e.g. “select * from largetable limit 1000000”
(2) If we do not specify a limit, the system automatically evaluates scan_limit, e.g. for select table1.ID from table1, table2 where table1.ID = table2.ID
When two tables are joined, the system calculates a “count” for this query, and the value of that “count” becomes scan_limit. However, I commented out this part of the calculation and modified the detection logic of the “watchdog”, and the query results are still correct and unaffected. I don’t understand what the purpose of the extra “count” operation is.


#4

I still think it’s part of watchdog:

But I will see if I can find an internal engineer to answer the question more precisely.


#5

Have you tried what happens when you have an n-n relationship in the join?


#6

Hi @tianhz,

I may be misunderstanding your question – to clarify, are you referring to this count?

Let me try to give a little background on all the different counts. In a parallel executor such as ours, you need some way to estimate the size of your output buffers so you can have multiple threads writing to preallocated space at the same time. We have some additional requirements: we don’t want to interrupt the CUDA kernel to fetch more memory (though we are looking into that for certain queries), and we want to be as conservative as possible due to the limited memory typically available on GPUs. So, for projections with filters, we run a preflight or “filtered” count so we can properly size the output buffer.
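To make that concrete, here is a minimal CPU-side sketch in C++ of the two-pass idea (my own illustration, not OmniSci code; the filter and data are made up): a preflight count sizes the buffer, then parallel writers claim slots in the preallocated space.

    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <thread>
    #include <vector>

    int main() {
      std::vector<int64_t> column = {3, 17, 8, 42, 11, 99, 5, 23};
      const int64_t threshold = 10;  // hypothetical filter: x > 10

      // Pass 1: preflight ("filtered") count -- how many rows pass?
      std::size_t match_count = 0;
      for (int64_t v : column) {
        if (v > threshold) ++match_count;
      }

      // The output buffer is allocated exactly once, sized by that count.
      // On a GPU this happens before the kernel launches and cannot grow.
      std::vector<int64_t> out(match_count);

      // Pass 2: parallel projection -- each thread claims slots atomically,
      // so multiple writers fill the preallocated space at the same time.
      std::atomic<std::size_t> write_pos{0};
      auto project = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
          if (column[i] > threshold) {
            out[write_pos.fetch_add(1)] = column[i];
          }
        }
      };
      std::thread t1(project, 0, column.size() / 2);
      std::thread t2(project, column.size() / 2, column.size());
      t1.join();
      t2.join();
    }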

For group by queries we can just use the expression range for the column, which we get from chunk metadata (from disk, essentially), so we usually skip the preflight count. And if the query has a low enough limit, we just use that, e.g. SELECT * FROM my_table WHERE x > 10 LIMIT 1000. If you commented out the filtered count above but kept a limit, I would expect the query to continue working correctly. Take the limit off, though, and you are likely to run into a bunch of memory errors.
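And a similarly hedged sketch of the group-by case (numbers and names are illustrative): the entry count comes straight from the column’s min/max in the chunk metadata, so no scan of the data is needed to size the buffer.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    int main() {
      // Chunk metadata already records each column's min and max on disk,
      // which bounds the number of distinct group keys.
      const int64_t col_min = 0;
      const int64_t col_max = 9999;

      // One slot per possible key: the buffer is sized up front without
      // touching the data. With a low LIMIT N, N itself can serve instead.
      const std::size_t entry_count =
          static_cast<std::size_t>(col_max - col_min + 1);
      std::vector<int64_t> group_by_buffer(entry_count, 0);
      (void)group_by_buffer;
    }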

The join here is largely irrelevant, I believe. We build the hash table independently of dispatching kernels to execute the query, and once the hash table is built the kernel for query execution just probes the hash table as part of its normal workload.
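A toy build/probe illustration of that split (assumed structure, not OmniSci’s actual hash join): the table is built once up front, and probing is just part of each row’s normal workload. Note how, echoing post #5, an n-n relationship lets one probe yield several matches, which is exactly why the result size is not knowable from the input row counts alone.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    int main() {
      std::vector<int64_t> table1_id = {1, 2, 3, 4};
      std::vector<int64_t> table2_id = {2, 4, 4, 6};

      // Build phase: hash the inner table's join column once.
      std::unordered_multimap<int64_t, std::size_t> hash_table;
      for (std::size_t i = 0; i < table2_id.size(); ++i) {
        hash_table.emplace(table2_id[i], i);
      }

      // Probe phase: each outer row looks up its matches -- what the
      // query kernel does as part of its per-row work.
      for (std::size_t i = 0; i < table1_id.size(); ++i) {
        auto [begin, end] = hash_table.equal_range(table1_id[i]);
        for (auto it = begin; it != end; ++it) {
          std::cout << "table1 row " << i << " matches table2 row "
                    << it->second << "\n";
        }
      }
    }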

On the subject of watchdog, some of those checks are getting to be a bit outdated. We are working on modifying some of those checks and deploying a more dynamic version of watchdog which will instrument the query and halt or interrupt long running queries based on actual run time. We hope to roll out both of these features in the first half of this year.

Hope this helps – let us know if you have more questions!


#7

Hi @alexb,
Thank you for solving my problem. I went on to look at the code and have two new questions:
(1) Where is the memory for the output buffer allocated?
(2) Can the system perform only one GPU computation in one query?


#8

Hi @randyzwitch,
Thank you. I have seen this part of the code; it is part of the problem, but it is not my fundamental problem. @alexb has solved it for me.


#9

On (1), I am not sure I understand the question. The scan limit is saved in the RelAlgExecutionUnit. For group by, we calculate the size based on the expression range of the column; that info gets stored in the QueryMemoryDescriptor. Then QueryExecutionContext and ExecutionDispatch use both structures to allocate memory.
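A heavily simplified sketch of that flow (struct and field names here are illustrative stand-ins, not the real OmniSci definitions): the scan limit rides along in the execution unit, the memory descriptor turns it into an entry count, and the dispatch layer allocates from the descriptor’s size.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct ExecutionUnitSketch {   // stand-in for RelAlgExecutionUnit
      int64_t scan_limit;          // how many rows the query may return
    };

    struct MemoryDescriptorSketch {  // stand-in for QueryMemoryDescriptor
      std::size_t entry_count;       // from the scan limit, or from the
                                     // expression range for group by
      std::size_t row_width_bytes;
      std::size_t bufferSize() const { return entry_count * row_width_bytes; }
    };

    int main() {
      ExecutionUnitSketch exe_unit;
      exe_unit.scan_limit = 1000;

      MemoryDescriptorSketch qmd;
      qmd.entry_count = static_cast<std::size_t>(exe_unit.scan_limit);
      qmd.row_width_bytes = 16;

      // The dispatch layer sizes the actual allocation from the descriptor.
      std::vector<uint8_t> output_buffer(qmd.bufferSize());
      (void)output_buffer;
    }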

For (2) we can currently only run one query step on GPU at a time. Specifically, we only dispatch one work unit at a time. However, a work unit can have multiple operations – e.g. a RelCompound work unit could have a filter, aggregate, and project all in the same kernel. Since most queries execute in a few hundred ms, we can easily pipeline steps and make better use of GPU compute resources (GPU schedulers don’t really like multiple workloads, though with Volta/Turing this is getting a lot better).
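To illustrate the fused work unit idea, here is a toy single-threaded analogue (not the real RelCompound codegen): filter, aggregate, and project all happen in one pass over the data, the way a single fused kernel would handle them, rather than as three separate steps with intermediate buffers.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    int main() {
      std::vector<int64_t> x = {3, 17, 8, 42, 11};

      int64_t sum = 0;    // aggregate state lives across the whole pass
      int64_t count = 0;
      for (int64_t v : x) {  // one pass == one "kernel"
        if (v > 10) {        // filter
          sum += v;          // aggregate
          ++count;
        }
      }
      if (count > 0) {       // project the final aggregate
        std::cout << "AVG(x) WHERE x > 10 = " << sum / count << "\n";
      }
    }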