I am currently evaluating MapD with datasets of various sizes.
I am running MapD 3.6.1 Community Edition in AWS on a p2.8xlarge instance. The specs for this machine are as follows:-
Nvidia K80 - 8 GPUs, 32 vCPUs, 488 GiB RAM
When I run this with a 1.25-billion-row dataset, Immerse works as expected, much like the demos on the MapD website.
However, when I try a 5-billion-row dataset, Immerse becomes unresponsive and in some cases returns errors. I suspect this is because the dataset is too big for the box to handle, but I would like a better way of measuring and confirming this.
My first question is how to work out more accurately when problems are caused by the dataset being too big for the hardware provisioned.
Will MapD log somewhere when the Engine or Immerse cannot cope?
Are there other/better ways to identify when a MapD installation needs scaling?
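For context, the only check I have found so far is to watch per-GPU memory with nvidia-smi while an Immerse dashboard is rendering. Below is the small polling sketch I use; the nvidia-smi query flags are standard, but the 2-second interval and the idea that GPU memory pressure is the right signal to watch are just my assumptions, so please correct me if there is a better indicator (e.g. something in the mapd_server logs).

```python
import subprocess
import time

def watch_gpu_memory(interval_seconds=2):
    """Print per-GPU memory usage every few seconds.

    Uses standard nvidia-smi query flags; the interval is arbitrary.
    """
    while True:
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=index,memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.strip().splitlines():
            idx, used, total = (v.strip() for v in line.split(","))
            print(f"GPU {idx}: {used} MiB used of {total} MiB")
        print("-" * 32)
        time.sleep(interval_seconds)

if __name__ == "__main__":
    watch_gpu_memory()
```

If GPU memory maxes out at roughly the moment Immerse goes unresponsive, that at least correlates with the dataset being too large, but it still feels indirect.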
Does anybody have any thoughts on how far the p2.8xlarge and p2.16xlarge instances would scale before I would need to scale out rather than up?
How many billion rows could I expect the p2.8xlarge instance to scale to, and how many for the p2.16xlarge? (My rough sums are below.)
The p2.16xlarge instance is:-
Nvidia K80 - 16 GPUs, 64 vCPUs, 732 GiB RAM
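To give a sense of the rough sums I have been doing (and would like sanity-checked): each K80 GPU has about 12 GiB of memory, so the p2.8xlarge has roughly 96 GiB of GPU memory and the p2.16xlarge roughly 192 GiB. If I assume the point map only needs the longitude/latitude columns hot on the GPU, at 4 bytes per float column (my assumption, not something I have confirmed against MapD's storage format), the arithmetic looks like this:

```python
GIB = 1024 ** 3

def rows_that_fit(num_gpus, gib_per_gpu=12, bytes_per_row=8, headroom=0.8):
    """Very rough estimate of how many rows fit in GPU memory.

    All figures are my own assumptions, not MapD's:
      - gib_per_gpu: ~12 GiB per K80 GPU
      - bytes_per_row: two float columns (lon/lat) at 4 bytes each
      - headroom: keep ~20% free for render buffers / intermediates
    """
    usable_bytes = num_gpus * gib_per_gpu * GIB * headroom
    return usable_bytes / bytes_per_row

print(f"p2.8xlarge : ~{rows_that_fit(8) / 1e9:.1f} billion rows")
print(f"p2.16xlarge: ~{rows_that_fit(16) / 1e9:.1f} billion rows")
```

If other columns (timestamps, fares, anything used in filters or measures) also need to be resident, bytes_per_row grows and the ceiling drops quickly, which might explain hitting problems at 5 billion rows. I would appreciate confirmation of how MapD actually decides what stays in GPU memory.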
The dataset I am testing with is a variation of the taxi data, and I am mostly working with a point map in Immerse.
Any help or guidance would be greatly appreciated.