Working in tableau errors - mapdql also errors


#1

Hi,

I am using MapD core 3.5.0 enterprise version on a p2.xlarge node. I am also using an ODBC driver provided by the enterprise version.

I can connect to MapD core w/o any issue and I can see the tables. Currently for testing purposes I am trying to use reproduce a tableau dashboard.

However, when I try to do some visualizations I keep getting an error (though sometimes it’d work, and sometimes it wouldn’t and I have no idea why).

The table has 23.6 MM rows, and right now it’s the only table in the mapd db.
mapdql> \memory_summary
MapD Server CPU Memory Summary:
MAX USE ALLOCATED FREE
49125.02 MB 3153.73 MB 12288.00 MB 9134.27 MB

MapD Server GPU Memory Summary:
[GPU] MAX USE ALLOCATED FREE
[0] 10835.04 MB 698.33 MB 8192.00 MB 7493.67 MB

The SNOTE (simplest non trivial example) or most basic table where this fails:

CREATE TABLE natgas__nominationsjoineda2015 (
facility TEXT ENCODING DICT(8),
loc_name TEXT ENCODING DICT(16),
gas_day DATE ENCODING FIXED(32),
rhettdate DATE,
net_sch_capacity DOUBLE,
latdec DECIMAL(19,10),
londec DECIMAL(19,10));

So the following steps work:
(1) put lat/lon points on a map
(2) put different facilities and name on the map
(3) size them according to capacity


(4) FAILS: size them according to capacity/ distinct count for DAY variable (i.e., average capacity across time).
image

I get the same error on mapdql:
mapdql> SELECT COUNT(DISTINCT “natgas__nominationsjoineda2015”.“gas_day”) AS “TEMP_Calculation_625437446534221826__1847537827__0_”, SUM(“natgas__nominationsjoineda2015”.“net_sch_capacity”) AS “TEMP_Calculation_625437446534221826__3856207910__0_”, AVG(“natgas__nominationsjoineda2015”.“latdec”) AS “avg_latdec_ok”, AVG(“natgas__nominationsjoineda2015”.“londec”) AS “avg_londec_ok”, “natgas__nominationsjoineda2015”.“facility” AS “facility”, “natgas__nominationsjoineda2015”.“loc_name” AS “loc_name” FROM “natgas__nominationsjoineda2015” GROUP BY 5, 6 ;
Exception: Not enough host memory to execute the query


#2

you can try adding those paramters

enable-watchdog= false
allow-cpu-retry = true

to mapd.conf file, then restart the mapd server, but i am not so confident the query will run because the way this kinda of query are resolved by the engine.

as a workaround for this Viz you can add a column to the table containing the gas_day of text type and applying the count distinct to this new column and using the date type column for other Vizs


#3

Hi @aznable. Thanks for the help I’ll try this workaround and report back. In the meantime would you mind explaining me further this:

“not so confident the query will run because the way this kinda of query are resolved by the engine”.

It seems you have a good understanding as of what the issue is. Thanks!


#4

I am confident the workaround works with very good performance; anyway a feedback is always appreciated.

i dont think disabling watchdog would help here, because the watchdog is stopping the query on a memory estimate (you should find an error message on log with the estimation and it would be greater that 2GB) and based on a personal assumption that the engine is needing a bitmap for each resulting group large as max value of the field; the date type on mapd is a date-time so with a max value of 2016-07-18 would be 1468800000 bits per group resulting on a massive request of memory and subpar performance.


Exception: Hash join failed, reason(s): Not enough memory for the columns involved in join
#5

@aznable You are the boss. I am happy to report that the date->string workaround worked.


#6

Hi @aznable I hit another wall :frowning:

Now when I try to do the same for time series (not map) plots I error on both, the string trick and the date.

(1) When I use date, same error as before, which is not surprising. Host mem error.
SELECT COUNT(DISTINCT “natgas__nominationsjoineda2015”.“gas_day_str”) AS “TEMP_Calculation_625437446427807744__1414298084__0_”,
SUM(“natgas__nominationsjoineda2015”.“net_sch_capacity”) AS “TEMP_Calculation_625437446427807744__3856207910__0_”,
“natgas__nominationsjoineda2015”.“loc_name” AS “loc_name”,
“natgas__nominationsjoineda2015”.“name” AS “name”,
{fn CONVERT(“natgas__nominationsjoineda2015”.“gas_day”, SQL_DATE)} AS “tdy_gas_day_ok”
FROM “natgas__nominationsjoineda2015”
GROUP BY 3,
4,
5

(2) When I use the string on tableau it does not know it’s a date and therefore it sends the wrong query to tableau - which also sends an error but albeit a different one the real problem is not the error but rather the query being not what one would want…

SELECT (CASE WHEN (MIN(“natgas__nominationsjoineda2015”.“gas_day_str”) IS NULL) THEN 0 WHEN NOT (MIN(“natgas__nominationsjoineda2015”.“gas_day_str”) IS NULL) THEN 1 ELSE NULL END) AS “TEMP_Calculation_625437446427807744__1414298084__0_”,
SUM(“natgas__nominationsjoineda2015”.“net_sch_capacity”) AS “TEMP_Calculation_625437446427807744__3856207910__0_”,
“natgas__nominationsjoineda2015”.“gas_day_str” AS “gas_day_str”,
“natgas__nominationsjoineda2015”.“loc_name” AS “loc_name”,
“natgas__nominationsjoineda2015”.“name” AS “name”
FROM “natgas__nominationsjoineda2015”
GROUP BY 3,
4,
5


#7

You should use the string version of field for count distinct only, for the rest of operations you should use the date version of field (min,max etc.); on this kind operation the performance on date field is ok and this would fix the second query.

The first query is using { fn convert( … etc. ; I got this problem using mapd ce with a odbc-jdbc bridge and a customized JDBC driver last summer, but I don’t remember exactly how I resolved the problem (a rewrite on the driver or parameters you can set on generic odbc on tableau). Can’t check now because I am on the phone, sorry


#8

nice, good hint to know it’s only on the count distinct… the first query is basically the one that tableau is sending to mapd when using date type to see time series…

Sure, when you have time just take a look, your help is really appreciated here :slight_smile:


Testing issue / -not enought host memory to execute the query
#9

i just sent a message with a tds and tdc file i used last summer with tableau 10.3, but i just noticed you are having a network error; a problem in driver or in the server?!