We are doing a quick proof of concept to see whether MapD could replace our OLAP cube.
We've been able to load most of our data from the database fairly easily, and the performance seems acceptable.
However, another part of our data load needs to fetch some static data from the MapD database to enrich each record before we insert it into a fact table with about 150 columns.
So the flow looks like this:
- get a record
- enrich it with data already in MapD
- insert it (one record at a time)
This process is extremely slow.
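To make the cost of this flow concrete, here is a minimal Java sketch of the per-record loop. `DimensionLookup`, `FactWriter`, the `dim_*` table names and the one-query-per-table-per-record shape are all hypothetical stand-ins (in-memory stubs, not the MapD client API). The only point is to count how many server round trips each record triggers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerRecordFlow {
    // Hypothetical stand-in for one query against a MapD dimension table.
    interface DimensionLookup {
        String lookup(String table, String key);
    }

    // Hypothetical stand-in for a single-row INSERT into the fact table.
    interface FactWriter {
        void insert(Map<String, String> row);
    }

    // Hypothetical names for the five dimension tables we enrich from.
    static final String[] DIMENSION_TABLES = {"dim_a", "dim_b", "dim_c", "dim_d", "dim_e"};

    static int lookupCalls = 0;
    static int insertCalls = 0;

    static void process(List<String> keys, DimensionLookup db, FactWriter writer) {
        for (String key : keys) {                      // get a record
            Map<String, String> row = new HashMap<>();
            row.put("key", key);
            for (String table : DIMENSION_TABLES) {    // enrich: one query per table per record
                row.put(table, db.lookup(table, key));
            }
            writer.insert(row);                        // insert one record at a time
        }
    }

    public static void main(String[] args) {
        // In-memory stubs that only count calls; no server involved.
        DimensionLookup db = (table, key) -> { lookupCalls++; return table + ":" + key; };
        FactWriter writer = row -> insertCalls++;
        process(List.of("r1", "r2", "r3"), db, writer);
        System.out.println(lookupCalls + " lookups, " + insertCalls + " inserts");
    }
}
```

With five dimension tables, every record costs six round trips in this shape (five lookups plus one insert), so three records already produce 15 lookups and 3 inserts.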
We then tried batching records:
- get n records
- enrich all of them in parallel
- batch insert them
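The batched variant can be sketched the same way. Again, the interfaces and table names are hypothetical stubs rather than MapD's client API; the sketch only shows the shape: enrichment runs on Java's common ForkJoinPool via `parallelStream()`, and the whole batch goes to the server in a single insert call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BatchedFlow {
    // Hypothetical stand-in for one query against a MapD dimension table.
    interface DimensionLookup {
        String lookup(String table, String key);
    }

    // Hypothetical stand-in for one multi-row INSERT of a whole batch.
    interface BatchWriter {
        void insertBatch(List<Map<String, String>> rows);
    }

    static final String[] DIMENSION_TABLES = {"dim_a", "dim_b", "dim_c", "dim_d", "dim_e"};

    // Atomic counters: lookups run concurrently on several threads.
    static final AtomicInteger LOOKUPS = new AtomicInteger();
    static final AtomicInteger BATCHES = new AtomicInteger();
    static final AtomicInteger ROWS = new AtomicInteger();

    static Map<String, String> enrich(String key, DimensionLookup db) {
        Map<String, String> row = new HashMap<>();
        row.put("key", key);
        for (String table : DIMENSION_TABLES) {
            row.put(table, db.lookup(table, key));
        }
        return row;
    }

    static void processBatch(List<String> keys, DimensionLookup db, BatchWriter writer) {
        // Enrich every record of the batch in parallel (common ForkJoinPool).
        List<Map<String, String>> rows = keys.parallelStream()
                .map(key -> enrich(key, db))
                .collect(Collectors.toList());
        // Send the whole batch in a single insert call.
        writer.insertBatch(rows);
    }

    public static void main(String[] args) {
        DimensionLookup db = (table, key) -> { LOOKUPS.incrementAndGet(); return table + ":" + key; };
        BatchWriter writer = rows -> { BATCHES.incrementAndGet(); ROWS.addAndGet(rows.size()); };
        // "get n records": 20 fake keys, matching the batch size we ended up using.
        List<String> keys = IntStream.range(0, 20).mapToObj(i -> "r" + i).collect(Collectors.toList());
        processBatch(keys, db, writer);
        System.out.println(LOOKUPS.get() + " lookups, " + BATCHES.get() + " batch insert(s), " + ROWS.get() + " rows");
    }
}
```

Batching collapses the inserts to one call per batch, but in this shape the number of lookups is unchanged (five per record); they just run concurrently.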
If we only do the first and last steps (get and batch insert, with no enrichment) with 5000 messages, we get decent performance. Using Dropwizard metrics to measure performance, this is what we get:
So on average it takes about half a second to insert 5000 records, which is not too bad considering the number of columns.
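As a quick sanity check on those numbers, the arithmetic below converts the measured batch time into throughput. The 150-column figure is the approximate fact-table width mentioned earlier, so the values-per-second number is only a rough estimate.

```java
import java.util.Locale;

public class InsertThroughput {
    // Records inserted per second, given a batch size and its average insert time.
    static double recordsPerSecond(int records, double seconds) {
        return records / seconds;
    }

    // Average time spent per record, in milliseconds.
    static double millisPerRecord(int records, double seconds) {
        return seconds * 1000.0 / records;
    }

    public static void main(String[] args) {
        int records = 5000;      // batch size used for the insert-only test
        double seconds = 0.5;    // average insert time measured with Dropwizard
        int columns = 150;       // approximate width of the fact table
        double rps = recordsPerSecond(records, seconds);
        System.out.printf(Locale.ROOT, "%.0f records/s, %.2f ms per record, %.0f values/s%n",
                rps, millisPerRecord(records, seconds), rps * columns);
    }
}
```

That is roughly 10,000 records per second, or about 0.1 ms per record, for the insert alone.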
Now, when we add the pre-processing (enrichment) step before the insert, we cannot handle nearly as many messages per batch, because pre-processing is extremely slow.
So this time we drop the batch size from 5000 to 20 messages.
Pre-processing 7500 messages (in batches of 20, each batch processed in parallel with Java parallel streams), we get these timings:
On average it takes 700 ms to enrich our data.
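The timing above does not state whether 700 ms is per batch or per message. Assuming it is per batch of 20 (an assumption, not something the measurements say), and that batches run one after another, the arithmetic works out as follows.

```java
import java.util.Locale;

public class EnrichmentCost {
    // Number of batches needed to cover all messages (ceiling division).
    static int batches(int messages, int batchSize) {
        return (messages + batchSize - 1) / batchSize;
    }

    // Average enrichment time per message, assuming the timing is per batch.
    static double millisPerMessage(double millisPerBatch, int batchSize) {
        return millisPerBatch / batchSize;
    }

    public static void main(String[] args) {
        int messages = 7500;
        int batchSize = 20;
        double millisPerBatch = 700.0; // ASSUMPTION: the 700 ms average is per batch of 20
        int n = batches(messages, batchSize);
        // ASSUMPTION: batches are processed sequentially, one after another.
        double totalSeconds = n * millisPerBatch / 1000.0;
        System.out.printf(Locale.ROOT, "%d batches, %.1f ms per message, about %.1f s in total%n",
                n, millisPerMessage(millisPerBatch, batchSize), totalSeconds);
    }
}
```

Under that assumption, enrichment costs about 35 ms per message, roughly 350 times the ~0.1 ms per record that the batch insert alone takes.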
We enrich from 5 tables:
- 2 columns, 41 records, default fragment_size
- 6 columns, 20k records, default fragment_size
- 28 columns, 600k records, fragment_size 30k
- 64 columns, 200k records, fragment_size 10k
- 7 columns, 11M records, fragment_size 500k
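Since these tables hold static data, one option we could test is loading the smaller ones into memory once per load and enriching from local maps instead of querying MapD for every record. This sketch assumes the tables do not change during a load and fit in the client's heap (plausible for the 41-row and 20k-row tables, less obviously for the 11M-row one). `TableLoader` and the table names are hypothetical, and the loader is a stub standing in for one `SELECT` per table.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DimensionCache {
    // Hypothetical loader: in a real setup this would run one
    // "SELECT key, value FROM <table>" against MapD; here it is a stub.
    interface TableLoader {
        List<String[]> loadAll(String table);
    }

    private final Map<String, Map<String, String>> cache = new HashMap<>();

    // Load each small, static dimension table once, before processing starts.
    DimensionCache(List<String> tables, TableLoader loader) {
        for (String table : tables) {
            Map<String, String> byKey = new HashMap<>();
            for (String[] row : loader.loadAll(table)) {
                byKey.put(row[0], row[1]);
            }
            cache.put(table, byKey);
        }
    }

    // Enrichment becomes an in-memory lookup: no server round trip per record.
    String lookup(String table, String key) {
        Map<String, String> byKey = cache.get(table);
        return byKey == null ? null : byKey.get(key);
    }

    static int loadCalls = 0;

    public static void main(String[] args) {
        TableLoader loader = table -> {
            loadCalls++;
            return List.of(new String[]{"k1", table + "-v1"}, new String[]{"k2", table + "-v2"});
        };
        DimensionCache dims = new DimensionCache(List.of("small_dim", "medium_dim"), loader);
        for (int i = 0; i < 7500; i++) {
            dims.lookup("small_dim", (i % 2 == 0) ? "k1" : "k2");
        }
        System.out.println(loadCalls + " load queries for 7500 lookups; " + dims.lookup("medium_dim", "k2"));
    }
}
```

The maps are only read after construction, so a single `DimensionCache` can be shared by the threads of a parallel stream, provided it is fully built before the stream starts.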
The only other setting we changed is cpu-buffer-mem-bytes, which we set to 4 GB.
The machine running MapD server CE 4.0.2 has:
- 16 cores with hyper-threading (32 logical CPUs)
- 380 GB of memory
- I'm not sure about the disk, but I'm fairly sure it is an SSD
Running a profiler showed a lot of time spent in Thrift.
I understand that this is the CPU-only version, but I was not expecting such poor performance.
Is there anything else we can do to improve performance, other than trying GPUs?
Thank you very much