Benchmarking basic aggregation query


#1

Hello:

I run a regular query like:

SELECT count(* ) as c , SUM( time_numerical ) , answer_by_agent_textual, extract(year from date_temporal ) as yy
FROM oid_stream
GROUP BY answer_by_agent_textual, yy
ORDER BY c DESC
;

on 800K documents.
The first time it takes 4.5 seconds ( I will skip the machine specs the exact number is not important right now )
The second time it takes 21.5 milliseconds.
Much faster so I assume it is faster because the table is loaded into the GPU’s memory

To validate this, I change the query slight to make sure I am no just getting a cached response to this:

SELECT count(* ) as c , AVG( time_numerical ) , answer_by_agent_textual, extract(year from date_temporal ) as yy
FROM oid_stream
GROUP BY answer_by_agent_textual, yy
ORDER BY c DESC
;
I changed sum to average and that cannot be computed with the cached answer so it needs to run the query again. However I still expect the query to finish in about 30 milliseconds because it should have the table loaded into GPU memory. Instead, it takes 4.5 seconds again.

What is going on ?

To continue testing this, I will install a dedicated CUDA card. Right now I’ve been using the graphics card GPU so may the GPU memory gets evicted by the Video driver ?


#2

Hi @jfvillal. We’re all curious about this one. Can you give us those machine specs, and possibly some sample data? We’d like to try and repro this issue.


#3

Hi @jfvillal,

you are right, the first time you launch a query on a cold started Mapd DB, the Memory Manager will load from disk the columns needed by the query itself into CPU’s memory, then the columns involved in filtering, grouping and aggregate ops will be transferred into GPU’s memory, then the optimizer generates and compile LLVM code for query’s execution and this operation would take, more or less 100ms on this simple query (this is because changing the SUM with an AVG will force the optimizer to generate different code), but 4500ms is absolutely unjustified by this operation; looks like the database has been restarted or the caches have been manually cleared, so it has to re-read data from disk.

Mapd hasn’t a result cache, it just caches columns on CPU/CPU memory and query plans (to avoid un-necessarily LLVM generation); it’s also unlikely the GPU memory has been evicting by video driver, and even it were, the memory transfer between CPU and GPU memory would take so much (every record on you query would take 10bytes, are you sure about just 800k records?).


#4

Sure. I’ll be switching my testing to public data, to reproduce the error then share all information necessary so you can duplicate the state.

My machine has a
GeForce GTX 950 which doubles as video card and GPU.
12-core i7-6800K CPU @ 3.40GHz
128GB of memory
An SSD is used for the database.
Running Ubuntu 16.04 ( kernel 4.15.0-36-generic )
Nvidia driver 384.130
MapDQL Version: 4.1.3-20180926-66c2aee949


#5

@jfvillal Thanks. We were unable to repro on our own hardware with the flights dataset, but seeing your specs helps. And, if you’re able to scrub your data, that could also be useful for repro attempts.

The first thing we noticed is that we wish we could buy you some new hardware, and your drivers are a bit out of date. But we’re not sure that’s actually the problem. We think there might be an issue here, and a few of our engineers want to investigate, and perhaps chat with you in more depth.


#6

Also, one more thing you could try for us would be running your queries in CPU mode just to see if the issue is related to memory management and caching.


#7

I updated the GPU to a 780 Ti. OmniSci did not work after reboot. Updating the configuration to only use the new GPU ( as opposed to both the old card and the new one ) got it back to working.
Now, when I run the slightly modified query I get 100ms longer runs, which fitx @aznable 's description
Time : 0.0467448234558 ( query compiled )
Time : 0.142632007599 ( query not compiled )

Thanks for the help.


#8

Hi @jfvillal,
I want to point out, that the query has to be compiled of the structure of the query is unknown to the system, but the compiled code will be reused, resulting in faster execution times if you change just literals.

The data of the columns used on the following queries are on GPU memory, but the queries have to be compiled

so

mapdql> select c1_year,avg(arrdelay) from flights where c1_year between 1989 and 2000 group by 1;
[cut]
12 rows returned.
Execution time: 111 ms, Total time: 112 ms
mapdql> select c1_year,avg(arrdelay),sum(arrdelay) from flights where c1_year between 1989 and 2000 group by 1;
[cut]
12 rows returned.
Execution time: 135 ms, Total time: 136 ms

changing just the literals used for filtering will reuse the compiled plan of the queries

mapdql> select c1_year,avg(arrdelay) from flights where c1_year between 1993 and 2007 group by 1;
[cut]
15 rows returned.
Execution time: 32 ms, Total time: 33 ms

mapdql> select c1_year,avg(arrdelay),sum(arrdelay) from flights where c1_year between 1989 and 2001 group by 1;
[cut]
13 rows returned.
Execution time: 38 ms, Total time: 38 ms

#9

@jfvillal I’m going to give you the credit for solving your own issue since it was mostly about upgrading your hardware and drivers, but we’d love to hear if the suggestions from @aznable are helpful as well.