I have a table with lat, lon and total sales per record (each lat, lon is a record) and I want to compute averages and total records for all points that are inside a polygon (as defined in a shapefile).
In my understanding this general case is not currently supported, due to ST_CONTAINS not currently supported by MapD core (Spatial intersections, https://github.com/mapd/mapd-core/issues/1, https://github.com/mapd/mapd-core/issues/58, https://github.com/mapd/mapd-core/blob/master/ROADMAP.md#gis).
In light of this I decided to change the problem a little bit to what I believe is a much simpler problem that can use ST_DISTANCE:
Divide the US in a fishnet (grid) of x miles by x miles and make it into a mapD table of this type:
grid_cell_name, grid_lat_centroid, grid_lon_centroid, geometry - similar to this: http://www.giscourse.com/creating-a-fishnet-grid-using-arcgis-10/
Then assign to each record (lat, lon) a cell within the grid by calculating the distance for each record to each centroid point in all cells in the grid - note: here of course multiple records will be assign to the same grid. In this case my “shapefile” is basically the fishnet.
I think I can use ST_DISTANCE which seems to be supported and do a SELECT MIN DISTANCE to assign the corresponding grid. In theory sounds doable but I wanted to get some advise as of which query to use - please assume one table is called us_grid and the other one is called records.
note: If I have billions of records this could be expensive since I’d have to compute n*m distances and then get the minimum for each record - I assume some trick on along the lines of sorting both tables, records table and grid table would make for a better approach.
All advise is welcome here, thanks!