Getting random sample of rows from OmniSQL

Hi,
I was wondering whether anyone found a way to get a sample that is not deterministic for each row. The normal solution of using rand() is not possible and the usage of sample_ratio() is deterministic for each row which disqualifies it for my purposes.

I would happy to hear any suggestions.

Hi @ldnker,

Thanks for joining our forum.

We haven’t implemented the rand function yet, but you can re-use the procedure we use to distribute the point evenly in our point maps, making it randomly.

try this

for 0.1 percent

SELECT col1,col2,..., col2 
FROM table
WHERE MOD( MOD (rowid, 2147483648) * extract(epoch from now()) , 4294967296) < 4294967296/1000 
AND rowid != 0
LIMIT 1000;

for 1 percent

SELECT col1,col2,..., col2 
FROM table
WHERE MOD( MOD (rowid, 2147483648) * extract(epoch from now()) , 4294967296) < 4294967296/100
AND rowid != 0 
LIMIT 1000;

the LIMIT is optional, and with a little more complicated query you can get a proximate number of records close to the one needed.

so if you want around 1000 rows

SELECT col1,col2,..., col2 
FROM table
WHERE MOD( MOD (rowid, 2147483648) * extract(epoch from now()) , 4294967296) <
(SELECT (4294967296/count(*))*1002 from table)
AND rowid != 0 
LIMIT 1000;

I used a projection query as an example, but it fits also for aggregate queries.

being based on epoch that’s is returned in seconds, if you run the query in the same seconds, the same rows are returned, but you can change the filter adding in the mix microseconds or nanoseconds

something like EXTRACT(epoch FROM now()+EXTRACT(NANOSECOND from now())

Regards,
Candido

Hi Candido,

thank you so much for your help. That is very helpful.

The credits for the idea go to @todd, I just elaborated his idea.

to makes the syntax simpler, using UDFs would help, but as you can read in the docs, additional software is needed and I would check the performances of the implementation.