I have noticed poor select performance on our MapD table compared to MySQL, particularly when the table has been populated with individual inserts. I understand that individual inserts generate a lot of extra metadata that occasionally has to be read from disk during subsequent selects, and that bulk inserts are recommended instead.
I could not find much about this metadata in the documentation except for this forum post:
Over time, with many inserts (bulk or individual), will table performance continue to degrade as more metadata accumulates? Is there a way to clean up or compact this metadata so that MapD treats the table as if all the data had just been bulk inserted?
If we want data to stream in steadily all day, say at 1000 rows per second, which approach would be best:
a) Individual insert for each and every row.
b) Batching up 1000 rows and inserting them in bulk once per second.
c) Batching up X rows and inserting in bulk once every Y seconds (please advise on X and Y, or the reasoning behind choosing them).
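For context on what I mean by options (b) and (c), here is a minimal sketch of the batching layer I have in mind. The `bulk_insert` callback is a hypothetical stand-in for whatever bulk-load call the client library provides (e.g. a multi-row `INSERT` or a load-table API); the row/time thresholds are the X and Y I am asking about.

```python
import time

class InsertBatcher:
    """Accumulates rows and flushes them as one bulk insert when either
    the batch size (X rows) or the time window (Y seconds) is reached."""

    def __init__(self, bulk_insert, max_rows=1000, max_seconds=1.0):
        self.bulk_insert = bulk_insert  # callback taking a list of rows
        self.max_rows = max_rows        # X: rows per batch
        self.max_seconds = max_seconds  # Y: seconds between flushes
        self.rows = []
        self.last_flush = time.monotonic()

    def add(self, row):
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.last_flush >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.rows:
            self.bulk_insert(self.rows)
            self.rows = []
        self.last_flush = time.monotonic()


# Usage: collect flushed batches in a list instead of hitting a database.
batches = []
b = InsertBatcher(batches.append, max_rows=5, max_seconds=60)
for i in range(12):
    b.add((i, "payload"))
b.flush()  # flush the remainder at shutdown
print([len(x) for x in batches])  # → [5, 5, 2]
```

The open question is what X and Y should be so that metadata growth stays manageable while ingest latency stays acceptable.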
Why is a second select operation, run right after the first, slower on a system where the entire table fits into GPU memory?