How to pass ARRAY[] parameters in sql and python?

Hi Experts,

I’ve implemented a UDF which takes 2 parameters, both are ARRAY float.
I was wondering how to pass values to it via pymapd and sql?

SQL:
I’ve tried this one
select feature, float_dot(feature, ARRAY[1.25 ,1.25 ]) from samplevector ;
and got the following error
Exception: Function float_dot(FLOAT, NUMERIC(3,2)) not supported.
Existing extension function implementations:
float_dot({float*, i64, i8}, {float, i64, i8}*) -> float

seems like the array was converted to NUMERIC by default.
and then I tried another
select * , float_dot(feature, ARRAY[CAST(1.25 AS FLOAT),CAST(1.25 AS FLOAT)]) from samplevector where float_dot(feature, ARRAY[CAST(1.25 AS FLOAT),CAST(1.25 AS FLOAT)]) >5 ;

it run successfully for the first time
d|feature|rowid|EXPR$3

1|{1.2, 3.4}|0|5.75

Howeveer if I run it second time, I will get error:

Thrift error: No more data to read.

Thrift connection error: No more data to read.

Retrying connection

Could you please give some suggestion?

Also How to use pymapd to pass array?

I tried in Pyson:

import pymapd
import pandas as pd
con = pymapd.connect(user=“admin”, password=“HyperInteractive”, host=“localhost”)
data=[1.23,1.25]
query = “”“select feature, float_dot(feature, :testdata) from samplevector”""
df = pd.read_sql(query, con,params={‘testdata’: data})
Traceback (most recent call last):
File “/home/faceos/.local/lib/python3.7/site-packages/pymapd/cursor.py”, line 117, in execute
nonce=None, first_n=-1, at_most_n=-1)
File “/home/faceos/.local/lib/python3.7/site-packages/omnisci/mapd/MapD.py”, line 1598, in sql_execute
return self.recv_sql_execute()
File “/home/faceos/.local/lib/python3.7/site-packages/omnisci/mapd/MapD.py”, line 1627, in recv_sql_execute
raise result.e
omnisci.mapd.ttypes.TMapDException: TMapDException(error_msg=‘Exception: Function float_dot(FLOAT, NULL) not supported.\n Existing extension function implementations:\n float_dot({float*, i64, i8}, {float, i64, i8}*) -> float\n’)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File “/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py”, line 1586, in execute
cur.execute(args, *kwargs)
File “/home/faceos/.local/lib/python3.7/site-packages/pymapd/cursor.py”, line 119, in execute
raise _translate_exception(e) from e
pymapd.exceptions.Error: Exception: Function float_dot(FLOAT[], NULL) not supported.
Existing extension function implementations:
float_dot({float
, i64, i8}
, {float*, i64, i8}*) -> float

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py”, line 1590, in execute
self.con.rollback()
AttributeError: ‘Connection’ object has no attribute ‘rollback’

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File “”, line 1, in
File “/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py”, line 412, in read_sql
chunksize=chunksize,
File “/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py”, line 1633, in read_query
cursor = self.execute(args)
File “/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py”, line 1595, in execute
raise ex from inner_exc
pandas.io.sql.DatabaseError: Execution failed on sql: select feature, float_dot(feature, :testdata) from samplevector
Exception: Function float_dot(FLOAT[], NULL) not supported.
Existing extension function implementations:
float_dot({float
, i64, i8}, {float, i64, i8}*) -> float

unable to rollback

Any suggestions?

Best,
Zipepng

Hi @zipeng -

Currently, we do not support using variable-width functions in UDFs. So passing arrays, strings or geospatial types won’t work. Using integers, float, boolean should work.

As we develop the user-defined function functionality further, we hope to support these other types, but unfortunately I don’t know the timeline for that.

Best,
Randy

Edited based on @alexb clarification

Thanks a lot Randy for the clarification. Looking forward to new stuff coming.

1 Like

Hi @zipeng,

Thanks for the question! To clarify @randyzwitch’s response, we do support variable-length datatypes as UDF inputs – namely arrays and geospatial types (we do not yet support string inputs). We also support varlen array outputs now, in C++ – see https://github.com/omnisci/omniscidb/blob/master/Tests/UdfTest.cpp#L473 for an example.

However, we do not support the ARRAY function yet. Our variable length runtime is built to read directly from input buffers. ARRAY attempts to mimic an input buffer, but isn’t quite identical. We have a project underway to move the varlen runtime from reading from input buffers to having its own intermediate representation (the representation already exists, called ArrayDatum), but it is performance sensitive so we have to be a little careful.

I’ll post here once we have some progress – should be weeks to months, not months to years.

Thanks,
Alex

1 Like