C++ API and ML Library integration

I am building an app that requires a GPU database, and OmniSci seems to fit the bill quite well. I’ve only just started looking into OmniSci and am still at the stage of feeling overwhelmed by all the new information, so I’m looking for some advice to help focus my efforts in the right direction.

I have a C++ server and am looking for a way to interact with OmniSci from C++.
Ideally, I would like my C++ server to completely control all interactions with OmniSci; there will be no direct user interaction with the database. I would like the server to be able to:
create/delete databases,
create/alter/delete tables,
set up schemas/indexes (if these are used),
run queries,
extract query results.
I would also like to send functions as part of the queries. I am led to believe that the LLVM compiler might allow this, but I’m not sure exactly how that would work yet.

My server also sends workloads to a C++ PyTorch API, so I would like to be able to pass data from OmniSci to my PyTorch library. The simple way to do this would be to query for some data and then send the query results to my ML libraries, but I suspect there might be a way to leverage Arrow and do a zero-copy transfer of data to my ML libraries.

My question is: how many of the above requirements could be met using pre-existing APIs or connectivity such as Thrift, and if so, what do I need to look into? Do you have any dedicated C++ APIs?

Which, if any, of the above would require me to work directly with the OmniSci source code? I’d like to avoid that if I can.

Any information that helps clear up my thoughts would be greatly appreciated.


You can do everything you need through the C++ Thrift layer; everything required should already be among the dependencies needed to compile the server.

You can find examples in some languages in this subdirectory.

particularly in

About sending functions: I’m not sure you can do it from C++, but you can do something similar using Python and the RBC project; check out this example:

euclidean.ipynb · GitHub.

Alternatively, you can use UDFs (though not at runtime).

You can also define UDTFs in C++, but this is undocumented and I’m not sure it has reached master yet; if needed, I can put together an example.

On Arrow, you are right: you can run a query locally on the machine and get an Arrow data frame (recently this has also become possible remotely, though it is obviously slower; if you’re interested, take a look at this recent blog post by Joel Clay). While the examples are written in Python, they are likely based on a Thrift C++ endpoint.

I have used GPU Data Frames in Python through the pymapd connector, and they are quite effective, so it should be worthwhile to use them from a C++ server as well.

Except for the C++ UDTFs, everything I wrote about is usable right now without having to compile OmniSci Database yourself.

I hope this answers your questions satisfactorily.