MapD Charting Will Only Render Exactly 10 Million Geographic Coordinates After Table Size Exceeds Threshold


#1

Hello,

We are using MapD Charting to render geographic coordinates on a map.

We are encountering an issue where we get this error once we load enough rows into our table:

Exception: Not enough OpenGL memory to render the query results

We have tried setting enable-watchdog to false and allow-cpu-retry to true. However, we are seeing a strange behavior: once the number of rows in the table passes a certain threshold, it can only load exactly 10 million points. As soon as I go to 10,000,001 points, I receive the error. We control this by setting a .cap() on the rasterLayer("points").

However, if the number of rows in the table is relatively small, it can load more than 10 million points. Why is it behaving like this? We noticed that the map will only render 10 million rows once the table size exceeds approximately 75 million rows.
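To show how we set this up, here is a simplified sketch of the chart code (the crossfilter/connection wiring, the dimension variables, and the DOM element are placeholders rather than our exact code; the .cap() on rasterLayer("points") is the setting in question):

// Simplified sketch - surrounding wiring and variable names are illustrative;
// only the .cap() on the "points" raster layer is the setting under discussion.
var pointLayer = dc.rasterLayer("points")
  .crossfilter(crossFilter)   // crossfilter over our table
  .xDim(xDim)                 // dimension on device_lon
  .yDim(yDim)                 // dimension on device_lat
  .cap(10000000);             // 10,000,000 renders; 10,000,001 triggers the OpenGL error

var pointMap = dc.rasterChart(document.getElementById("map"), true) // true = backend rendering
  .con(connection)            // connection to the MapD server
  .width(1410)
  .height(256)
  .pushLayer("points", pointLayer);

dc.renderAllAsync();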

For reference, we are running MapD on a p2.8xlarge on an AWS EC2 Instance.

Is our hardware not equipped to deal with such large datasets? Are there any configuration options we can set to render more points on the map?

Our goal is to be able to load approximately 6 billion rows of data, with each having its own latitude/longitude coordinates.


#2

We also tried adjusting the render-mem-bytes but it doesn’t seem to help at all.
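For reference, the lines we are setting look roughly like this (a sketch, assuming the usual mapd.conf key = value syntax; the render-mem-bytes value shown is just one of the values we tried):

# Settings we have been experimenting with in the server config
enable-watchdog = false         # disable the query watchdog
allow-cpu-retry = true          # allow queries to retry on CPU
render-mem-bytes = 3000000000   # render buffer size in bytes; we have tried several values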


#3

And this is our table DDL:

CREATE TABLE test_table ( column TEXT ENCODING DICT(32), column TEXT ENCODING DICT(16), column TIMESTAMP, column TEXT ENCODING DICT(8), column INTEGER ENCODING FIXED(16), column INTEGER ENCODING FIXED(16), column TEXT ENCODING DICT(8), column TEXT ENCODING DICT(16), column SMALLINT, column SMALLINT, column DECIMAL(12,7), column DECIMAL(12,7), column TEXT ENCODING DICT(8), column SMALLINT, column TEXT ENCODING DICT(32), column SMALLINT, column FLOAT, column SMALLINT, column SMALLINT, column INTEGER ENCODING FIXED(8), column TEXT ENCODING DICT(8), column TEXT ENCODING DICT(32), column BOOLEAN, column TEXT ENCODING DICT(16), column TEXT ENCODING DICT(32), column TEXT ENCODING DICT(32), column TEXT ENCODING DICT(16), column INTEGER ENCODING FIXED(8), column TEXT ENCODING DICT(8), column INTEGER ENCODING FIXED(8), column INTEGER ENCODING FIXED(16), column TEXT ENCODING DICT(8), column TEXT ENCODING DICT(16), column INTEGER ENCODING FIXED(8), column INTEGER ENCODING FIXED(8), column TEXT ENCODING DICT(8), column TEXT ENCODING DICT(8), column TEXT ENCODING DICT(8), column INTEGER ENCODING FIXED(8), column TEXT ENCODING DICT(8), column SMALLINT, column TEXT ENCODING DICT(16), column BOOLEAN, column INTEGER ENCODING FIXED(8), column INTEGER ENCODING FIXED(8), column BOOLEAN, column TEXT ENCODING DICT(8));


#4

Hi,

Is the render being called from MapD Immerse?

How high did you set render-mem-bytes? Did you go to 2000000000?

Regards


#5

No, the render is being called via MapD Charting. I went up to 3000000000.


#6

Hi,

Could you share the vega call from the mapd_server.INFO log for the render call being made when you see the failure?

Regards


#7

These are the log lines for when I receive the error:

I1011 05:27:44.498344 58077 MapDHandler.cpp:2008] render_vega :rjLfmMT6xKYyOQcTeKCz50yOYVTtzF6t:widget_id:1:compressionLevel:3:vega_json:{"width":1410,"height":256,"data":[{"name":"pointtable","sql":"SELECT conv_4326_900913_x(device_lon) as x,conv_4326_900913_y(device_lat) as y,table_name.rowid FROM table_name WHERE (device_lon >= -179.9999999999995 AND device_lon <= 179.99999999999898) AND (device_lat >= -1.2078378628389714 AND device_lat <= 53.84876223007643) LIMIT 10000001"}],"scales":[{"name":"x","type":"linear","domain":[-20037508.340039942,20037508.340039887],"range":"width"},{"name":"y","type":"linear","domain":[-134465.85559934264,7141565.541647498],"range":"height"}],"marks":[{"type":"points","from":{"data":"pointtable"},"properties":{"x":{"scale":"x","field":"x"},"y":{"scale":"y","field":"y"},"size":1,"fillColor":"blue"}}]}
I1011 05:27:44.498411 58077 QueryRenderManager.cpp:323] Active render session [userId: rjLfmMT6xKYyOQcTeKCz50yOYVTtzF6t, widgetId: 1]
I1011 05:27:44.499716 58077 Calcite.cpp:247] User mapd catalog mapd sql 'SELECT conv_4326_900913_x(device_lon) as x,conv_4326_900913_y(device_lat) as y,table_name.rowid FROM table_name WHERE (device_lon >= -179.9999999999995 AND device_lon <= 179.99999999999898) AND (device_lat >= -1.2078378628389714 AND device_lat <= 53.84876223007643) LIMIT 10000001'
I1011 05:27:44.514883 58077 Calcite.cpp:260] Time in Thrift 1 (ms), Time in Java Calcite server 14 (ms)
W1011 05:27:44.515233 58077 QueryBuffer.cpp:100] QueryBuffer 0: couldn't successfully map all 3000000000 bytes. Was only able to map 18446744072414584320 bytes for cuda.
W1011 05:27:44.515383 58077 QueryBuffer.cpp:100] QueryBuffer 1: couldn't successfully map all 3000000000 bytes. Was only able to map 18446744072414584320 bytes for cuda.
W1011 05:27:44.515874 58077 QueryBuffer.cpp:100] QueryBuffer 2: couldn't successfully map all 3000000000 bytes. Was only able to map 18446744072414584320 bytes for cuda.
W1011 05:27:44.515950 58077 QueryBuffer.cpp:100] QueryBuffer 3: couldn't successfully map all 3000000000 bytes. Was only able to map 18446744072414584320 bytes for cuda.
W1011 05:27:44.516019 58077 QueryBuffer.cpp:100] QueryBuffer 4: couldn't successfully map all 3000000000 bytes. Was only able to map 18446744072414584320 bytes for cuda.
W1011 05:27:44.516082 58077 QueryBuffer.cpp:100] QueryBuffer 5: couldn't successfully map all 3000000000 bytes. Was only able to map 18446744072414584320 bytes for cuda.
W1011 05:27:44.516145 58077 QueryBuffer.cpp:100] QueryBuffer 6: couldn't successfully map all 3000000000 bytes. Was only able to map 18446744072414584320 bytes for cuda.
W1011 05:27:44.516203 58077 QueryBuffer.cpp:100] QueryBuffer 7: couldn't successfully map all 3000000000 bytes. Was only able to map 18446744072414584320 bytes for cuda.
E1011 05:27:44.555598 58077 MapDHandler.cpp:2134] Exception: Not enough OpenGL memory to render the query results


#8

It looks like it's not using all of the CPU + GPU memory when I see this error on the p2.16xlarge in AWS.

I would think I would be able to render more geographic coordinates, given my hardware. I’m only able to load approximately 130 million geographic points before it errors out.

MapD Server CPU Memory Summary:
           MAX           USE     ALLOCATED          FREE
  588404.38 MB    2602.04 MB    4096.00 MB    1493.96 MB

MapD Server GPU Memory Summary:
[GPU]         MAX          USE    ALLOCATED         FREE
 [0]   8262.62 MB    488.28 MB   2048.00 MB   1559.72 MB
 [1]   8262.62 MB    488.28 MB   2048.00 MB   1559.72 MB
 [2]   8262.62 MB    488.28 MB   2048.00 MB   1559.72 MB
 [3]   8262.62 MB    488.28 MB   2048.00 MB   1559.72 MB
 [4]   8262.62 MB    488.28 MB   2048.00 MB   1559.72 MB
 [5]   8262.62 MB    160.63 MB   2048.00 MB   1887.37 MB
 [6]   8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [7]   8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [8]   8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [9]   8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [10]  8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [11]  8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [12]  8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [13]  8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [14]  8262.62 MB      0.00 MB      0.00 MB      0.00 MB
 [15]  8262.62 MB      0.00 MB      0.00 MB      0.00 MB


#9

Hi,

The issue you are experiencing is not related to the overall scale of your machine. It is related to how much we can render in a call and how the OpenGL memory is used.

Please read through "How to control OpenGL memory?" as a starting point.

I would recommend you add a .sampling(true) option to your mapd-charting request for that chart, and maybe drop the .cap you have set down to, say, 5M.
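On the charting side that would look something like this (a sketch only; "pointLayer" is a placeholder for the rasterLayer("points") instance you already have):

// Sketch: enable sampling and lower the cap on the existing point layer.
pointLayer
  .sampling(true)   // let the server sample the result set instead of truncating it
  .cap(5000000);    // push at most ~5M points to the renderer per request

dc.redrawAllAsync(); // re-draw the chart with the new settings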

regards


#10

Hello,

Thanks for your reply.

I read through that article and I tried adjusting the render-mem-bytes configuration. I adjusted it to go from 200,000,000,000 to 400,000,000,000. It did help me process more points (from 80 million geographic points to 130 million geographic points). However, anything beyond that did not help at all.

I did add a .sampling(true) option, but I'm wondering how setting the cap to 5 million would help me. Our eventual goal is to display more than a billion geographic points. The original problem was that if I loaded 70 million geographic points, I was able to see all 70 million. However, when I loaded 140 million geographic points into the table, it would fail. And when I set a .cap on the 140 million row table, the highest cap that still rendered was exactly 10 million rows. If I set a .cap of 20 million rows, I would still get that OpenGL error.

And I read these responses in that link you posted:

The memory buffer should be considered as a scratch area for all the render processing, not just the buffer for the final render. In the MapD world a point can also include additional attributes which need to be ‘blended’ at the render level, so it's not quite so simple to say there are only 315 points.

Just to add a bit more detail regarding memory requirements, even though there are only 315 distinct airport locations Immerse currently only allows rendering by non-grouped measures, which means that we need enough memory to render all 1.4M records. We have on our roadmap functionality to allow “grouped” rendering, i.e. rendering only one point per airport, which would take much less memory in this use case. We also have some optimizations in mind to have rendering take less memory and thus require smaller buffer sizes.

So you’re saying that you guys need memory to render all records in a table, correct? So does that mean I would need to limit my table size?


#11

So are you saying that we can’t render anything more in our call? Are there any potential solutions to this?


#12

Hi,

What is the end goal of rendering 1B points? What will you render that many points on? A regular monitor only has ~4 million pixels, so it is probably going to look like a solid mass.

The idea of sampling and capping gives you a good visual to work with, and then as you drill into the data via zoom you can get the fidelity you need. When you look at our tweetmap demo with 400M geo points you are, I believe, seeing at most 5M points actually rendered on the screen.

If you are looking to aggregate the query points and generate a heat map or density-style visual, I can understand that, but it still would not produce 1B rendered points.

regards


#13

That’s a fair point. We are now exploring alternatives to displaying all 1 billion + rows.

But is there a way we can get past that 10 million row limit? And is there some solution to that OpenGL problem? Can we change some configuration to render more points?

Thanks


#14

Hi,

You do not need to limit your table size. You need to limit the total number of records you push from the query to the renderer. As mentioned above, if you set your cap to, say, 10M and set .sampling(true) you will get good visuals.

I did some testing on a 640M dataset with a point every square kilometer of the earth's surface, and with the chart capped at 10M with sampling it looks like this:

Without more understanding of how you intend to consume more points, it's hard to understand the issue.

Maybe I am misunderstanding you: is it that you still see issues when capped at 10M with sampling?

regards


#15

Hello,

Thanks for your reply.

  1. To answer your question above: yes, I see issues when capped at 10 M with sampling. I noticed that when I had 1 billion rows in my table, the map looked like it had more points (more dense) than a table that had 200 M rows, even though the .cap(10000000) and the .sampling(true) were set the same. Why would that be?

  2. And where does that 10 M limit come from? When the table size is 90 M rows, it has no problem rendering all 90 M points on the map. However, when I increase the table size to 110 M rows, I have to set .cap(10000000) or I get the OpenGL error. And if I set .cap(10000001) or anything greater, I still get an OpenGL error.

  3. Another question: why does render-mem-bytes stop helping after I set it to greater than 4,000,000,000? I noticed that when I set render-mem-bytes to anything greater, I get an OpenGL error no matter what once my table size is bigger.


#16

Here is a picture of my map at 1 B rows:

And here is a picture of my map at 200 M Rows:

The .cap(10000000) and .sampling(true) are set on both. But it looks like the table with 1 B rows has more points than the table with 200 M rows. Is this due to the .sampling(true)?


#17

And to add to the list of questions above:

  1. Is there any way to fix this OpenGL issue and possibly render more points, like 20 M points? I know you said there wouldn't be much point in loading all billion points on a map, since it will look like a big blob. But is rendering 1 billion points on a map not feasible due to memory/CPU/GPU limits? Or do you deliberately prevent that from happening? Is it even possible?

#18

Hi

  1. Sampling picks a random set of points from the total set, so the larger the data size being sampled from, the more likely it is that a picked point lands at a location not already covered by an existing point, adding an additional viewable point.

  2. It is not a chosen 10M limit; it is what the memory allocation and the algorithms being used are finding as the operational maximum. The rows are projected and passed to the render process, which is basically having issues processing the data into an image once the input set goes over a particular size. For a simple render like yours we had expected to be able to process more points, so we will be reviewing why it stops at 10M in your case; in my simple case I was able to go to 20M. We are always refining and improving this process.

The second part of your question, and a real mystery to me, is your ability to render 90M rows when the table only contained 90M rows. My suspicion here is that even though your table had 90M rows, once the filter was applied you were not actually getting that many points passed to the render process. We could confirm this suspicion, if you still have the 90M table, by running a COUNT query on your table with the same filters as the render call:

SELECT COUNT(*) FROM table_name WHERE (device_lon >= -179.9999999999995 AND device_lon <= 179.99999999999898) AND (device_lat >= -1.2078378628389714 AND device_lat <= 53.84876223007643);

  3. Not sure on that one. I will need to look at the code; anything near a 4G magic number makes me suspicious of a 32-bit integer being used.

  4. As mentioned above, it is just a matter of working the problem. Time to work the problem is based on priorities; when we get enough requests where people (and, as you would expect, paying customers' wants rate higher) are seeing issues around this space, we will prioritize it.

Part of our approach is to let you zoom into the fine-grained data, so you can quickly zoom into your 1B data set and see the points in a finer-grained visual at a lower level. Slowing down the render process to actually plot 1B points at the world level is probably not something everyone would see as a win (unless it was super quick). If you look at the ships demo, we are looking at 18B rows of geo data, which you can zoom into down to individual ship tracks. We would need to understand why what you are doing is so different that this approach is inappropriate.

regards


#19

One more question: I am attempting to load up to 6 billion data points into a table using a SQL script piped to mapdql. However, after some time, it fails with this error:

Thrift: Thu Oct 12 17:11:13 2017 TSocket::write_partial() send() <Host: localhost Port: 9091>Broken pipe
Thrift: Thu Oct 12 17:11:13 2017 TSocket::open() connect() <Host: localhost Port: 9091>Connection refused
Thrift: Thu Oct 12 17:11:13 2017 TSocket::open() connect() <Host: localhost Port: 9091>Connection refused
Thrift error: connect() failed: Connection refused

Do you happen to know why this would happen?


#20

I tried it again, and it seems like I can't load past exactly 1,073,708,701 rows. Do you know why that is?