I’m currently testing MapD as a potential supplement to BigQuery. I’ve ported a typical query that we run on BQ to MapD and was wondering whether the results I’m getting are reasonable or whether there is room for optimisation.
We have two tables with the following number of rows and unique user_ids:
user_stats (facts table)
users (dimensions table)
The field user_id is defined in both tables as “user_id TEXT ENCODING DICT”.
We then run the following test query:
SELECT stats.date, u.age, count(1) FROM user_stats_90d stats LEFT JOIN users u USING(user_id) GROUP BY 1,2 LIMIT 10
This query takes about 10 s to finish (when the data is already in memory).
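To make the "already in memory" figure reproducible, here is a minimal sketch of how a warm timing can be taken: run the query once untimed, then time a few repeats. It assumes a Python DB-API connection; sqlite3 and the three-row tables are only stand-ins so the snippet runs on its own, and the helper name time_warm_query and the column types are hypothetical, not our real setup.

```python
import sqlite3
import time


def time_warm_query(conn, sql, warmups=1, runs=3):
    """Return the best wall-clock time in seconds over `runs` executions,
    after `warmups` untimed executions that let the data get loaded."""
    cur = conn.cursor()
    for _ in range(warmups):
        cur.execute(sql)
        cur.fetchall()  # not timed: the first run pays the load cost
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in data (assumption): tiny tables with the same column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id TEXT, age INTEGER);
    CREATE TABLE user_stats_90d (date TEXT, user_id TEXT);
    INSERT INTO users VALUES ('a', 30), ('b', 41);
    INSERT INTO user_stats_90d VALUES
        ('2018-01-01', 'a'), ('2018-01-01', 'b'), ('2018-01-02', 'a');
""")

QUERY = """
    SELECT stats.date, u.age, count(1)
    FROM user_stats_90d stats
    LEFT JOIN users u USING (user_id)
    GROUP BY 1, 2
    LIMIT 10
"""

elapsed = time_warm_query(conn, QUERY)
print(f"warm query time: {elapsed:.4f}s")
```

Against MapD the same helper would be given a connection from whichever client is in use; taking the minimum over several runs keeps one-off noise out of the number.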
Is that the expected performance in this case, or am I doing something wrong?
Our setup:
- Latest community edition of MapD (CPU-only version on Ubuntu)
- Google Cloud Compute Engine instance with 64 CPUs and 420 GB of memory (nothing else running on it)
- CPU platform: Intel Skylake
- 415 GB of free memory after the test
Looking forward to your replies.