Is my GPU too weak?



I’m testing the sample flights database with 7M, 14M, 21M and 117M records.
I measured the response time in both CPU mode and GPU mode.
In my results, CPU mode was faster with every dataset.
I think this is because my GPU’s specs are not good enough. Is that right?
Should I upgrade the GPU?
Or do I need to do some tuning for MapD?


SQL statement:

select avg(arrdelay), origin_city from flights_nM group by origin_city;

Response time (msec)

| DB records |      CPU mode       |      GPU mode       |
|            | 1st time | 2nd time | 1st time | 2nd time |
|         7M |    1,619 |       53 |    1,701 |      112 |
|        14M |    1,662 |       70 |    1,805 |      190 |
|        21M |    1,718 |       82 |    1,934 |      265 |
|       117M |    2,684 |      235 |    3,548 |    1,150 |


Xeon E5-2620 (6core) ×2
DDR3-1333 512G

CUDA 192 Cores
GPU Memory 2.0 GB DDR3
Memory Interface 128-bit
Memory Bandwidth 28.5 GB/sec

fragment size = 2M, database is on HDD (RAID10).


It would be nice if users posted their performance on a standard dataset, to help us choose the right hardware and to uncover performance or configuration problems.

Anyway, here are some numbers from my configuration, on a view joining the flights data with the airports and carriers data.

I am assuming your first-time figures are with all the data on disk.

number of records: 123534969

loading time from csv: 168241 ms

first run:
CPU 3600 ms
GPU 3030 ms

second run:
CPU: between 214 and 293 ms
GPU: between 55 and 145 ms (with Pascal gaming cards you can’t fix the P-State, so the frequencies change a lot)

My Config

ddr3-1600 32GB

CUDA cores 3584
GPU Memory 11GB
Memory Interface 352 bit
Memory Bandwidth 484 GB/sec

Anyway, I guess you are not using all your CPU processing power if you created the tables with default parameters. As far as I know (I hope I’m not wrong), the columns are fragmented into “extents” of 32 million rows, and each CPU core processes one such chunk of data exclusively; so with 117M records you are using just 4 cores, and with 7M/14M/21M just one.

Your GPU isn’t exactly powerful, but you could try fixing the P-State. Here is a link



@kiuchi Yes, your machine is not very balanced at the moment. You have a reasonably powerful CPU but a very ‘limited’ GPU.

You have 24 CPU threads on that machine, so with the 2M fragment size you specified, you do not fully utilize all of your CPU cores until your data size reaches 48M (fragment size * number of threads), because we do not allocate less than a single fragment to a core for processing. This means that at the 7M scale you are only using 4 cores, at 14M you are using 7 cores, and at 21M you are using 11 cores. (Side note: depending on the version of the Intel chip, hyperthreaded cores may or may not be useful for reducing wall-clock time, but for this discussion we’ll assume they are useful.) So the good CPU scaling you see as the volume increases is to be expected.
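The core counts above can be reproduced with a small sketch. It assumes only the rule stated in this thread (one whole fragment is the smallest unit given to a thread); `busy_threads` is a hypothetical helper for illustration, not a MapD function.

```python
import math

def busy_threads(rows, fragment_size, cpu_threads):
    """Threads that receive work if each thread takes at least one whole fragment."""
    fragments = math.ceil(rows / fragment_size)
    return min(fragments, cpu_threads)

FRAGMENT_SIZE = 2_000_000  # fragment size from the original post
CPU_THREADS = 24           # 2 sockets x 6 cores, hyperthreads assumed useful

for rows in (7_000_000, 14_000_000, 21_000_000, 117_000_000):
    print(f"{rows:>11,} rows -> {busy_threads(rows, FRAGMENT_SIZE, CPU_THREADS)} threads")
```

Passing 32,000,000 as `fragment_size` instead gives 1 thread for the three smaller tables and 4 for the 117M table, which matches the estimate made earlier in the thread for default-sized fragments.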

Currently your GPU performance is roughly linear in the data volume, as you would expect, since all the work is being done on a single GPU. If you had more GPUs, you would expect to see the same kind of improvement as you see with the CPU scaling.
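One rough way to compare the two scaling behaviours is to normalise the second-run times from the original post by table size. This is only a sketch over the posted numbers; `ms_per_million` is a hypothetical helper.

```python
# Second-run (warm) response times in ms, copied from the table in the original post.
# Key: table size in millions of rows; value: (CPU mode, GPU mode).
warm_ms = {
    7: (53, 112),
    14: (70, 190),
    21: (82, 265),
    117: (235, 1150),
}

def ms_per_million(millions, ms):
    """Response time normalised by table size."""
    return ms / millions

for millions, (cpu, gpu) in warm_ms.items():
    print(f"{millions:>4}M  CPU {ms_per_million(millions, cpu):5.1f}  "
          f"GPU {ms_per_million(millions, gpu):5.1f}  (ms per million rows)")
```

On these figures the GPU cost per million rows stays within a factor of two across the four tables, while the CPU cost per million rows drops by almost a factor of four as more threads get work.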



@aznable @dwayneberry Thanks for your reply.

Thank you so much for explaining the fragment details.
I did not know about the P-State setting.
Is it better to set a minimum frequency for the CPU as well?
I’ll try the same GPU card as yours (GTX 1080 Ti) this week.
Does MapD response time depend on FP32 or FP64 performance?
I’d also like to know whether Tesla is the best choice for MapD.



For minimum latency you should keep your CPU in the C0 or C2 state, because CPUs are slow to react when changing state. I regularly change all these parameters on servers where I install Oracle or other in-memory systems such as QlikView, because the servers come configured in power-save mode. In QlikView I got performance improvements of 25%; with Oracle RAC, cluster transfer times were cut in half and in-memory operations improved noticeably. With MapD I still have to try this on multiprocessor systems.
The trade-off is increased power consumption when the server is idle.



It depends on the query and the datatypes as to whether we are using 64 or 32 bit, so there is no single answer to “MapD response time depends on FP32 or FP64?”.

On a production Red Hat or CentOS server where absolute maximum performance is the goal, we recommend setting the tuned-adm profile to latency-performance. This will increase the power usage of the server, but we assume the server is going to be very heavily used and so should always be in a high-performance state.
If you are using Tesla cards, we also recommend setting the GPU clock speeds to maximum via the nvidia-smi -ac command.

For production server deployments we currently recommend Tesla P40 cards, which have 24GB of memory per card.