When will MapD support "UPDATE table FROM another table"?


#1

I checked the docs:
UPDATE
[ WITH [ RECURSIVE ] with_query [, …] ]
UPDATE [ ONLY ] table_name [ * ] [ [ AS ] alias ]
SET { column_name = { expression | DEFAULT } |
( column_name [, …] ) = [ ROW ] ( { expression | DEFAULT } [, …] ) |
( column_name [, …] ) = ( sub-SELECT )
} [, …]
[ FROM from_list ]
[ WHERE condition | WHERE CURRENT OF cursor_name ]
[ RETURNING * | output_expression [ [ AS ] output_name ] [, …] ]

But when I try UPDATE t1 SET f1 = t2.f1 FROM t2 WHERE t1.f1 = t2.f1, it throws an exception.


#2

Obviously, the doc is wrong there: you can't use correlated subqueries in DML operations, so you can't pull data from another query in an UPDATE. You can only use subqueries as a filter.

You can, however, use a query in a CTAS (CREATE TABLE AS SELECT).
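For example, sticking to what does work today (the table names, column names, and join key below are hypothetical):

-- A subquery used only as a filter is supported:
UPDATE t1 SET f1 = 0 WHERE key IN (SELECT key FROM t2);

-- And a query joining two tables can populate a new table via CTAS:
CREATE TABLE t1_new AS
SELECT t1.key, t2.f1
FROM t1 JOIN t2 ON t1.key = t2.key;

This is a sketch of the filter-subquery and CTAS routes described above, not a replacement for a true UPDATE ... FROM.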


#3

Do you have a timeline for supporting this?


#4

UPVOTE - This should be a huge priority.

I concur that the (sub-SELECT) in the documentation is clearly wrong.
Sub-selects generally seem to fail across the board. I'm often blocked by the inability to run a simple sub-select.

I need to update data on a regular basis, and it seems my only real option currently is to maintain a SQL server elsewhere to get the job done and then recreate my tables in MapD. And then I get the joy of rebuilding all the dashboards built on that data, because the coupling between dashboards and their underlying tables is unusually strict.

Rant over.
My question: does anyone have a strategy to work around this odd shortcoming? Have you found success in a scenario where data changes periodically (weekly or bi-weekly), as opposed to just appending new data?


#5

Hey @davidrubert , with regards to this question:

My question: does anyone have a strategy to work around this odd shortcoming? Have you found success in a scenario where data changes periodically (weekly or bi-weekly), as opposed to just appending new data?

It won't work for all cases, but in many you can use the COPY command to export a SELECT query to a file, then use COPY to import that file into the table that needs the updated data, and finally delete the duplicate rows. It's not ideal, but it's a workaround in some cases, and it may obviate the need for the separate SQL server.

e.g., to drop the older duplicates afterwards:
DELETE FROM mytable WHERE rowid NOT IN (SELECT MAX(rowid) FROM mytable GROUP BY myuniqueid);
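The full round trip might be sketched like this (the file path, staging_table, mytable, and myuniqueid names are placeholders, and this assumes MAX(rowid) identifies the most recently imported copy of each row):

-- 1. Export the updated rows from wherever they were prepared:
COPY (SELECT * FROM staging_table) TO '/tmp/updated_rows.csv';

-- 2. Import them into the target table, creating duplicates:
COPY mytable FROM '/tmp/updated_rows.csv';

-- 3. Keep only the newest copy of each unique id:
DELETE FROM mytable
WHERE rowid NOT IN (SELECT MAX(rowid) FROM mytable GROUP BY myuniqueid);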

HTH


#6

Good suggestions there. I have a couple of key fields that could provide a reliable enough reference for an approach that creates "duplicates" and then deletes the older dup.

Thanks


#7

BTW @davidrubert, just to clarify: rowid in the query above is a built-in OmniSci unique id present in all tables, so as long as you have a single column you can group by (myuniqueid in the example), the query should work for you.