I have billions of records with columns A, B, C, D. I’d like to keep track of only the unique combinations of those columns. Is there an efficient way to do that? In other words, I’d like to insert a new record only if its combination of values has never been seen before.
I read through the “Avoiding Duplicate Rows” thread. In my case, all the data will come from a single table in another database.
Ideally, what I eventually want to build is a way to detect when a new combination of values comes in. Perhaps I can seed the mapd table initially by deduplicating the data with some other tool before loading, and then use mapd only to run queries on new records as they come in, to see whether they already exist in mapd…
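
To make the idea concrete, here is a rough Python sketch of the seed-then-check logic I have in mind. The in-memory set is only a stand-in for the mapd table (billions of rows would obviously not be handled this way), and the names `seed` and `new_combinations` are made up for illustration; they are not part of any mapd API.

```python
from typing import Iterable, Iterator, Set, Tuple

# One (A, B, C, D) combination. String columns are an assumption.
Row = Tuple[str, str, str, str]


def seed(rows: Iterable[Row]) -> Set[Row]:
    """Deduplicate the historical data once, before the initial load."""
    return set(rows)


def new_combinations(seen: Set[Row], incoming: Iterable[Row]) -> Iterator[Row]:
    """Yield each incoming combination not seen before, and record it."""
    for row in incoming:
        if row not in seen:
            seen.add(row)  # remember it so later repeats are skipped
            yield row


# Historical data containing one duplicate combination.
historical = [
    ("a", "b", "c", "d"),
    ("a", "b", "c", "d"),
    ("a", "x", "c", "d"),
]
seen = seed(historical)

# New records arriving later: one already known, one new (sent twice).
incoming = [
    ("a", "b", "c", "d"),
    ("q", "b", "c", "d"),
    ("q", "b", "c", "d"),
]
fresh = list(new_combinations(seen, incoming))
print(fresh)  # [('q', 'b', 'c', 'd')]
```

In the real setup, the `row not in seen` check would be a lookup against the table, and `seen.add(row)` would be the insert.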