Fine-grained performance analysis through operator timings



I’m interested in better understanding MapD performance on my GPU. Is there a way to get a more detailed execution-time breakdown? I would like execution times at the operator level (operators from the EXPLAIN CALCITE output, such as LogicalAggregate, LogicalFilter, LogicalJoin, etc.). Is there an easy way to get these numbers? Or does anyone have suggestions on how to instrument the codebase to obtain operator timings (and where in the codebase/which files to look)?



Hi @efurst,

Thanks for your interest in MapD. Since MapD compiles its queries (via LLVM) rather than interpreting them, we can fuse many relational algebra operators into a single function, so that a thread (on CPU or GPU) executes them in one pass while keeping intermediate results in register memory, which gives significant performance benefits. Certain operations, such as joins or subqueries, do require their inputs to be materialized and thus execute as multiple kernels/functions, but in general you won’t be able to get granular timings for a single operator.
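To make the fusion point concrete, here is an illustrative sketch (plain Python, not MapD's actual generated code): a fused kernel evaluates the filter and the aggregate per row in one pass, so there is no separable per-operator time to measure, whereas a materialized plan runs two passes over an intermediate buffer, and each pass could be timed on its own.

```python
def fused_filter_sum(rows, predicate):
    """Filter and aggregate in a single loop; the filtered value never
    leaves the 'register' (here, a local variable)."""
    total = 0
    for v in rows:
        if predicate(v):   # the LogicalFilter...
            total += v     # ...and the LogicalAggregate, fused into one pass
    return total

def materialized_filter_sum(rows, predicate):
    """Same result, but the filter output is materialized into a buffer,
    so each operator could be timed independently."""
    filtered = [v for v in rows if predicate(v)]  # pass 1: filter
    return sum(filtered)                          # pass 2: aggregate

data = [1, 5, 10, 20]
assert fused_filter_sum(data, lambda v: v >= 5) == \
       materialized_filter_sum(data, lambda v: v >= 5)
```

Both functions compute the same answer; the difference is only where the intermediate result lives, which is exactly why a fused plan has no per-operator boundary to instrument.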

You can do a bit of probing on the cost of different operators by comparing timings of queries that only differ by a single operator (say a filter). You might also be able to use a profiler to examine the cost of individual parts of the generated code. (As an aside, if you ever want to see the generated code for a query, just put EXPLAIN before the query in mapdql.)
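The differential-timing approach above can be sketched as a small harness. Here `run_query` is a placeholder for however you execute SQL against MapD (e.g. via a client library); the function names and parameters below are assumptions for illustration, not a MapD API.

```python
import time

def time_query(run_query, sql, warmup=1, runs=5):
    """Return the median wall-clock time of `sql`, after warm-up runs
    so that query compilation and caching do not skew the measurement."""
    for _ in range(warmup):
        run_query(sql)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query(sql)
        samples.append(time.perf_counter() - start)
    return sorted(samples)[len(samples) // 2]

def operator_cost(run_query, base_sql, with_op_sql, **kwargs):
    """Approximate the cost of one operator as the timing difference
    between two queries that differ only by that operator."""
    return (time_query(run_query, with_op_sql, **kwargs)
            - time_query(run_query, base_sql, **kwargs))
```

For example, comparing `SELECT COUNT(*) FROM t` against `SELECT COUNT(*) FROM t WHERE x > 5` gives a rough cost for the added filter, with the usual caveat that the optimizer may treat the two queries differently.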

Recently we added a framework for logging timings of different sections of the executor, which you can enable via the --enable-debug-timer flag. Currently this is only available on master in our GitHub repo, but it will be in the next release. As explained above, this won’t help with timing the execution of individual RA operators, but it can be useful for timing other parts of the execution path, such as memory allocations or query compilation.
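For reference, the flag is passed when launching the server. This is a sketch assuming a from-source build of master; the binary location and data directory below are placeholders and may differ in your install.

```shell
# Start the server with section-level debug timings enabled
# (paths are examples only)
./bin/mapd_server --data /var/lib/mapd/data --enable-debug-timer
```

The timing output then appears in the server logs alongside the normal log lines.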

I hope this helps, and let us know if you’d like to chat more in depth.