Python Integration

Hi Team,

I have been trying to optimize my workflow between Python & OmniSci Core DB using pymapd & cudf, but I am running into some roadblocks (use case = export to cudf dataframe, export to csv):

1- Is pymapd the right library? (I am aware that pyomnisci is available, but I am running into compatibility issues when trying to conda install it alongside cudf in a dedicated conda environment.)

2- Using the select_ipc_gpu method (or select_ipc), it throws out an exception due to text columns being included (“TOmniSciException(error_msg=‘TEXT is not supported in Arrow result sets.’)”).

3- When trying another export method by using execute("COPY (SELECT * FROM table) TO '/home/extract.csv';"), date fields aren’t showing as dates. Is there an option to force this in the “WITH” clause?

My current workflow works fine using pandas.read_sql and pandas.to_csv, though it’s on the slow(ish) side. What would be your recommendation(s) to improve this type of workflow?

Thanks,
Laurent

Hi @Laurent,

  1. The correct library is pyomnisci, the latest version of the driver. We know about the compatibility issues, and we will soon resolve them so that it works with pyarrow 5.0.0, which the newest cuDF version needs.
    You will probably run into some trouble using pymapd as well, by the way.

  2. Using pyomnisci, I don’t get any exception from the select_ipc method when the text column is dictionary encoded, while the exception is returned for a field whose text is not encoded. So I guess you have to change your DDL so that the text columns used in your queries are dictionary encoded.

e.g.

>>> query = "SELECT depdelay, arrdelay,plane_type FROM flights_2008_10k limit 100"
>>> df = con.select_ipc(query)
>>> df
    depdelay  arrdelay   plane_type
0        4.0     -10.0  Corporation
1       -1.0       2.0  Corporation
2        1.0      -8.0  Corporation
3       10.0       4.0  Corporation
4       15.0       1.0  Corporation
..       ...       ...          ...
95      -3.0     -18.0  Corporation
96      58.0      52.0  Corporation
97      22.0      25.0  Corporation
98      -1.0     -14.0  Corporation
99      -1.0       3.0  Corporation
  3. The COPY TO command exports dates, timestamps, and times as epochs (so seconds, ms, etc., depending on the precision of the field), and right now you cannot change the output format of these fields.
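
Since the export writes date fields as epochs, one possible workaround is to post-process the CSV on the Python side. The following is only a minimal sketch using the standard library, not an OmniSci feature: the helper name `epochs_to_iso`, its `unit_divisor` parameter, and the column name `dep_date` are hypothetical, and it assumes the exported file has a header row, that empty cells mean NULL, and that the values should be read as UTC.

```python
import csv
import io
from datetime import datetime, timezone


def epochs_to_iso(src, dst, epoch_columns, unit_divisor=1):
    """Copy CSV rows from src to dst, rewriting epoch columns as ISO-8601 UTC strings.

    unit_divisor is 1 for epochs in seconds and 1000 for milliseconds
    (hypothetical parameter, chosen to match the precision of the field).
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in epoch_columns:
            # Assumption: an empty cell is a NULL and is left untouched.
            if row[col] != "":
                seconds = int(row[col]) / unit_divisor
                row[col] = datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()
        writer.writerow(row)


if __name__ == "__main__":
    # Made-up sample data standing in for the exported file.
    sample = io.StringIO("id,dep_date\n1,1222819200\n2,\n")
    converted = io.StringIO()
    epochs_to_iso(sample, converted, ["dep_date"])
    print(converted.getvalue())
```

With real files, `src` and `dst` would be handles from `open(..., newline="")`, so the rows are streamed rather than loaded into memory.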

Regards,
Candido

Thanks for the quick response Candido, appreciated. One of my text columns wasn’t dictionary encoded; when I tried with dictionary encoded text columns, it did work (even in pymapd, but it is time to upgrade).
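
For tables where the DDL cannot be changed right away, another option may be to leave the non-encoded text columns out of the query that goes to select_ipc. The sketch below is a plain helper under stated assumptions, not part of pymapd or pyomnisci: the function name `build_arrow_safe_query` is hypothetical, and so are the `"STR"` type label, the `"NONE"` encoding label, and the `notes` column, so check them against whatever column metadata your driver actually reports.

```python
def build_arrow_safe_query(table, columns, limit=None):
    """Build a SELECT that skips text columns without dictionary encoding.

    columns is an iterable of (name, type, encoding) tuples. The "STR" and
    "NONE" labels are assumptions about how the metadata describes a
    non-encoded text column.
    """
    kept = [
        name
        for name, col_type, encoding in columns
        if not (col_type == "STR" and encoding == "NONE")
    ]
    if not kept:
        raise ValueError("no columns left after dropping non-encoded text")
    query = f"SELECT {', '.join(kept)} FROM {table}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


if __name__ == "__main__":
    # Made-up column metadata for illustration only.
    cols = [
        ("depdelay", "SMALLINT", "NONE"),
        ("plane_type", "STR", "DICT"),
        ("notes", "STR", "NONE"),
    ]
    print(build_arrow_safe_query("flights_2008_10k", cols, limit=100))
```

The resulting string can then be passed to select_ipc or select_ipc_gpu in the same way as the query in the example above.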

Best,
Laurent

Hi,

Looking at the code, I can see the corrections needed to correctly support Arrow 5.0.0, so a version of pyomnisci with cudf support is likely to land soon.