About the execution time


There is a /timing option to show the execution time. When I run a SQL query for the first time and then run the same query a second time, the second execution is much quicker. What makes the difference? And when we test the performance of MapD, should we just use the result from the first run of a query?



The first time you refer to a table in a MapD query, there is some overhead to load and prepare the metadata about the table. MapD also loads any string dictionaries required for the dictionary-encoded columns of this table. This is a one-time cost per table you access.

The first time you refer to a column in an SQL query, the data needs to be transferred from disk into memory. The large additional time you see is this disk-to-memory cost; it is a one-time cost per column you access.

The first time MapD sees a new query, it builds a new kernel to execute it. The overhead of building the kernel adds a small amount of time (tens of ms) to the execution time. This is a one-time cost per query pattern: literals are hoisted, so the same kernel can be reused for any similar queries seen later.
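To make the idea of a "query pattern" concrete, here is a minimal Python sketch of literal hoisting. It is only an illustration of the concept, not MapD's actual implementation: the regular expression is a rough stand-in for a real SQL parser, and the `flights` table with its columns is a hypothetical example.

```python
import re

# Matches quoted strings or standalone numeric literals.
# Illustrative only; this is not a full SQL lexer.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def hoist_literals(sql):
    """Return (pattern, literals) with each literal replaced by '?'."""
    literals = []

    def _swap(match):
        literals.append(match.group(0))
        return "?"

    pattern = _LITERAL.sub(_swap, sql)
    return pattern, literals


# Hypothetical table and columns, used only for illustration.
q1 = "SELECT COUNT(*) FROM flights WHERE distance > 100 AND carrier = 'AA'"
q2 = "SELECT COUNT(*) FROM flights WHERE distance > 2500 AND carrier = 'DL'"

p1, l1 = hoist_literals(q1)
p2, l2 = hoist_literals(q2)

# Both queries reduce to the same pattern, so a kernel built for the
# first could in principle be reused for the second.
print(p1 == p2)
```

The two queries differ only in their literal values, so they share one pattern and would pay the kernel build cost once.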

When we benchmark we normally keep the first query time separate from other query executions.

The first time is useful for getting an idea of how fast the disk subsystem is.
The subsequent times are useful for measuring how fast the MapD query engine is.
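A benchmark harness along these lines could keep the two numbers apart. This is a rough sketch, assuming `run_query` is whatever callable you use to send SQL to the server (it is a hypothetical placeholder, not a MapD API). Note that the first run is only truly "cold" if the table and columns have not been touched since the server started.

```python
import time
from statistics import median


def time_query(run_query, sql, repeats=5):
    """Time one first run and `repeats` later runs of `sql`, in ms.

    `run_query` is a caller-supplied function that executes the SQL
    and blocks until the result is back (hypothetical placeholder).
    """

    def _once():
        start = time.perf_counter()
        run_query(sql)
        return (time.perf_counter() - start) * 1000.0

    cold_ms = _once()                            # includes one-time load costs
    warm_ms = [_once() for _ in range(repeats)]  # engine speed only
    return {
        "cold_ms": cold_ms,
        "warm_ms": warm_ms,
        "warm_median_ms": median(warm_ms),
    }
```

Reporting `cold_ms` and `warm_median_ms` side by side keeps the disk-bound and engine-bound measurements from being averaged together.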

If you are assuming the faster subsequent timings come from query result caching, that is not the case. There is no query result caching in MapD, so each query is re-evaluated every time it is seen. You can confirm this by running similar queries with different filter ranges.
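One way to run that check is to generate a handful of queries that share a shape but use different filter ranges, then time each one. The sketch below only builds the query strings; the `flights` table and `distance` column are hypothetical names used for illustration.

```python
def range_variants(template, start, width, count):
    """Yield `count` queries whose filter window slides by `width`."""
    for i in range(count):
        lo = start + i * width
        yield template.format(lo=lo, hi=lo + width)


# Hypothetical table and column, for illustration only.
template = "SELECT COUNT(*) FROM flights WHERE distance BETWEEN {lo} AND {hi}"
queries = list(range_variants(template, start=0, width=500, count=3))

for q in queries:
    print(q)
```

If these variants all run about as fast as a repeated identical query, the speed is coming from data already being in memory, not from cached results.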



When we do 'copy table from file', are we not copying the records into GPU RAM?
Also, the MapD hardware schedule says one NVIDIA P40 can handle 417M of hot data, so can we manually put records into GPU RAM to reduce the IO time cost?



COPY FROM prepares and loads the data onto the disk system used by MapD; it does not move it into memory or GPU VRAM. It is quite likely that your queries do not need most of the loaded columns to be on the GPU. We are careful with GPU RAM and so do not load a column onto it unless we know it is needed (the column is used in a calculation, aggregation or filter).

If you wish to preload the columns required for a known set of queries, you can ask MapD to run a series of queries on startup, which in turn will load the data into the appropriate memory levels. This option is db-query-list, but it will not remove the IO cost; it just moves it to when the server first starts up.
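As a rough sketch, the warm-up statements themselves could be generated from the list of columns your real queries touch. The table and column names here are hypothetical, and the exact file layout that db-query-list expects is not shown; check the MapD documentation for that before pasting these statements into a query list file.

```python
def warmup_queries(table, columns):
    """Build one query per column that references it in a filter.

    Assumption: filtering on a column is enough to make the server
    load it, as described above for columns used in filters.
    """
    return [
        f"SELECT COUNT(*) FROM {table} WHERE {col} IS NOT NULL;"
        for col in columns
    ]


# Hypothetical table and columns, for illustration only.
for stmt in warmup_queries("flights", ["distance", "carrier"]):
    print(stmt)
```

Listing only the columns that your known queries actually use keeps the warm-up from pulling unneeded data into memory.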