I’m currently running MapD Core with 1.2 billion rows of the NYC taxi dataset on the following hardware:
CPU: Intel i7-4790 CPU @ 3.60GHz
Memory: 16GB DDR3 RAM
GPU: NVIDIA GTX 980
I’m trying to run some of the example queries from Mark’s benchmark blog (http://tech.marksblogg.com/billion-nyc-taxi-rides-aws-ec2-p2-8xlarge-mapd.html), but I encounter problems on query 4:
extract(year from pickup_datetime) AS pickup_year,
cast(trip_distance as int) AS distance,
count(*) AS the_count
GROUP BY passenger_count,
ORDER BY pickup_year,
I understand the hardware is far from ideal, especially for a dataset this size, but from what I’ve read about MapD it should still be able to run queries by spilling into RAM and onto the hard disk.
On execution in GPU mode the above query errors with the exception:
Exception: Failed to run the cardinality estimation query: Query couldn’t keep the entire working set of columns in GPU memory
It does, however, run to completion in CPU mode.
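As a rough sanity check on the "working set of columns" mentioned in the error, the following sketch estimates how much memory the three columns used by the query would need if held uncompressed. The per-column byte widths are assumptions made for illustration (they are not taken from the actual table schema), and the GPU limit is the MAX value reported by \memory_summary further down.

```python
# Back-of-the-envelope estimate of the query's column working set.
# ASSUMPTION: the byte widths below are illustrative guesses, not values
# read from the real table definition.
ROWS = 1_200_000_000
GPU_MAX_MB = 3432.91  # "MAX" column from the memory summary in this post

ASSUMED_WIDTH_BYTES = {
    "passenger_count": 2,  # assumed 16-bit integer
    "pickup_datetime": 8,  # assumed 64-bit timestamp
    "trip_distance": 8,    # assumed 64-bit decimal
}


def working_set_mb(rows, widths):
    """Return the uncompressed size in MB of the given columns."""
    return rows * sum(widths.values()) / (1024 * 1024)


needed = working_set_mb(ROWS, ASSUMED_WIDTH_BYTES)
print(f"estimated working set: {needed:,.0f} MB")
print(f"GPU memory limit:      {GPU_MAX_MB:,.0f} MB")
print(f"fits on GPU at once:   {needed <= GPU_MAX_MB}")
```

Under these assumed widths the three columns together are several times larger than the GPU memory available, which would be consistent with the error message, although the real figure depends on the actual column types and encodings.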
I’ve already read the thread “Is there a memory replacement mechanism for mapd?” and set both flags mentioned there. My config file is as follows:
port = 9091
http-port = 9090
data = "/home/mapd/MAPD_STORAGE/data"
null-div-by-zero = true
gpu = true
allow-cpu-retry = true
enable-watchdog = false
[web]
port = 9092
frontend = "/home/mapd/mapd/frontend"
After the query returns with the error, running \memory_summary displays the following:
MapD Server Memory Usage
CPU RAM IN BUFFER USE : 2990.72 MB
GPU VRAM USAGE (in MB’s)
GPU      MAX     ALLOC    IN-USE     FREE
  0  3432.91  3086.68*   2990.72    95.96
As you can see, it fails with lots of free RAM still available. Could anyone suggest why this is?
Additionally, could anyone help diagnose the problem and suggest a fix that would allow the query to run to completion in GPU mode?
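For anyone reading the memory summary above, here is a small sketch that parses the GPU row and derives the remaining headroom. The column meanings are an assumption based on the header names (MAX as the most the server may allocate, ALLOC as what is currently allocated, IN-USE as what is occupied); the meaning of the asterisk is not known here, so it is simply stripped.

```python
# GPU row copied from the \memory_summary output in this post.
GPU_ROW = "0 3432.91 3086.68* 2990.72 95.96"


def parse_gpu_row(row):
    """Split a GPU summary row into named fields; sizes are in MB."""
    gpu, max_mb, alloc, in_use, free = row.replace("*", "").split()
    return {
        "gpu": int(gpu),
        "max": float(max_mb),
        "alloc": float(alloc),
        "in_use": float(in_use),
        "free": float(free),
    }


stats = parse_gpu_row(GPU_ROW)

# Free space inside what is already allocated (should match the FREE column).
free_in_alloc = stats["alloc"] - stats["in_use"]
# Space that could still be allocated before hitting MAX (assumed meaning).
unallocated = stats["max"] - stats["alloc"]

print(f"free within allocation: {free_in_alloc:.2f} MB")
print(f"not yet allocated:      {unallocated:.2f} MB")
print(f"total GPU headroom:     {free_in_alloc + unallocated:.2f} MB")
```

If this reading of the columns is right, the GPU itself has only a few hundred MB of headroom left, while most of the unused memory is on the CPU side.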