Mapd_server pegging CPU on trivial queries


#1

Hi all,
I’ve downloaded and am trying out

Community Edition
Core v4.4.2
Backend Rendering Enabled

In trying some trivial queries against the included flights_2008_7M table, I’m seeing odd performance where the CPU is pegging at 100% for queries which should be pretty trivial for Omnisci (this is running in GPU mode btw). I’m running the queries through the mapdql client so it doesn’t have to do with the dashboarding gui or web server.

Some information on my setup:

Here’s a screen capture showing cpu and gpu usage and response times on the queries: https://drive.google.com/file/d/1qPWqpESX95_wyHFQjWJCQ1tcZwWUC3kZ/view?usp=sharing

For example I’m seeing times like this, even after having already pulled data from disk on earlier queries:
mapdql> select max(arrtime) from flights_2008_7M;
EXPR$0
2400
1 rows returned.
Execution time: 12720 ms, Total time: 12720 ms

Any help appreciated.


#2

Hi @livfaast -

While OmniSci accelerates queries using GPUs, we’re not solely limited to using GPUs for all parts of the process. In this case, I suspect the behavior you are seeing is due to our LLVM compilation step, which happens each time a new query type is seen.

CPU usage can also occur when swapping data from GPU to CPU, which can occur when the data doesn’t fit completely in GPU RAM, during some projection operations (i.e. adding columns), and other places as well.

If you run the same query several times (note that select * from table where col1 = 12 and select * from table where col1 = 15 are the “same query” from the purposes of LLVM compilation), do you still see this level of CPU usage?


#3

Hi @livfaast

To identify the problematic step on query execution, you should enable the debug timer; to do that you have to add enable-debug-timer=true to you mapd.conf file and restart the server

into the mapd_server.INFO file, you should check for lines like that

    I0104 18:19:19.341622 26475 measure.h:80] Timer end                           parse_to_ra                         parse_to_ra: 4644 elapsed 22 ms  --parse time
    I0104 18:19:19.470356 26475 measure.h:80] Timer end               execution_dispatch_comp                     executeWorkUnit: 1033 elapsed 91 ms --time to compile the plan generated
    I0104 18:19:19.633479 27549 measure.h:80] Timer end                           fetchChunks                         fetchChunks: 1683 elapsed 163 ms --time to load data from disk and populate caches
    I0104 18:19:19.777536 27549 measure.h:80] Timer end                          lauchGpuCode                       launchGpuCode:   97 elapsed 43 ms --time to execute the query on gpu
    I0104 18:19:19.781117 26475 measure.h:80] Timer end                          convert_rows                        convert_rows: 4095 elapsed 3 ms --time the cpu spent to decode text dictionary strings and do the finale projection
    I0104 18:19:19.781141 26475 MapDHandler.cpp:757] sql_execute-COMPLETED Total: 361 (ms), Execution: 358 (ms)

the log would help everyone to identify the bottleneck you are hitting.

on the same release, you are using I have normal execution times (this is a fresh started instance, so nothing cached and no query plans precompiled)

mapd@zion-ubuntu:/opt/mapd/mapd-ce-4.4.2-20190109-6fb58bf441-Linux-x86_64-render$ bin/mapdql -u mapd -p HyperInteractive --db mapd -q -t -n -s 192.168.1.10
mapdql> select count(distance) from flights_2008_7M;
7009728
1 rows returned.
Execution time: 1075 ms, Total time: 1077 ms
mapdql> select count(distinct distance) from flights_2008_7M;
1435
1 rows returned.
Execution time: 130 ms, Total time: 131 ms
mapdql> select min(distance) from flights_2008_7M;
11
1 rows returned.
Execution time: 100 ms, Total time: 100 ms
mapdql> select max(distance) from flights_2008_7M;
4962
1 rows returned.
Execution time: 98 ms, Total time: 100 ms
mapdql> select max(arrtime) from flights_2008_7M;
2400
1 rows returned.
Execution time: 64 ms, Total time: 65 ms
mapdql> select min(arrtime) from flights_2008_7M;
1
1 rows returned.
Execution time: 14 ms, Total time: 14 ms
mapdql> select count(arrtime) from flights_2008_7M;
6858079
1 rows returned.
Execution time: 12 ms, Total time: 13 ms

in the attach a snap of the log of mapd_server.INFO containing the debug timer section; you can compare with the one of your system to get an idea about the problem

log_timer.txt (36.3 KB)


#4

Thanks to you both. @aznable I’ll alter the config and let you know what I see-- probably later today or tomorrow. Last time I had a look at the conf file it was read-only. I believe it was also showing this after I had stopped mapd_server (but maybe I’m not remembering that right). Could you let me know what I need to do to have it editable?


#5

Could you try clearing the driver’s JIT cache?

rm -rf $HOME/.nv/ComputeCache

where $HOME is for the user running mapd_server.

As @randyzwitch mentioned, we use LLVM to generate & compile GPU kernels on the fly when distinct queries come in, which sometimes causes issues with the GPU kernel cache. Clearing the cache and restarting the service usually brings things back to the sub-second range.


#6

The mapd.conf would be editable all the time by the user that owns and run the software (typically mapd)


#7

debug timer mapd_server-log.txt (15.6 KB)

@aznable compared the first 3 queries with your attachment. Looks like the problem is at step execution_dispatch_comp

I haven’ tried @andrew’s suggestion yet-- will do that tomorrow.

Thanks for your help.


#8

It should be the compilation stage, so @andrew suggestion would resolve your problem