OmniSci and RAPIDS: An End-to-End Open-Source Data Science Workflow

When: Jun 13, 2019 from 10:00 AM to 11:00 AM (PT)
In this session, attendees will learn how the OmniSci GPU-accelerated SQL engine fits into the overall RAPIDS partner ecosystem for open-source GPU analytics. Using open bike-share data, attendees will learn how to ingest streaming data from Apache Kafka into OmniSci, compute descriptive statistics and engineer features using both SQL and cuDF in Python, and return the results as a GPU DataFrame. By the end of the session, attendees should feel confident that an entire data science workflow can be accomplished with tools from the RAPIDS ecosystem, all without the data ever leaving the GPU.

Topics to be highlighted:
- What is RAPIDS? (discussion of the NVIDIA open-source RAPIDS project, how it relates to Apache Arrow, etc.)
- What is OmniSci, and how does it fit into the RAPIDS ecosystem?
- Example:
  - Ingesting a data stream from Apache Kafka into OmniSci
  - Using pymapd (Python) to query data from OmniSci and do basic visualizations
  - Using cuDF to do data cleaning and feature engineering
  - Showing how cuDF DataFrames can be passed to machine learning libraries like TensorFlow, PyTorch or XGBoost
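
As a rough illustration of the query and feature-engineering steps in the example above, here is a minimal sketch, not the code presented in the session. The table name `bike_trips`, the column names, and the connection settings are hypothetical placeholders (the credentials shown are assumed to be OmniSci's defaults). `fetch_stats_on_gpu` needs an NVIDIA GPU, a running OmniSci server, and the pymapd and cudf packages.

```python
def build_trip_stats_query(table: str = "bike_trips") -> str:
    """Return SQL that computes per-station descriptive statistics.

    The table and column names are hypothetical placeholders.
    """
    return (
        "SELECT start_station_id, "
        "COUNT(*) AS n_trips, "
        "AVG(duration_sec) AS avg_duration_sec "
        f"FROM {table} "
        "GROUP BY start_station_id"
    )


def fetch_stats_on_gpu(host: str = "localhost"):
    """Run the query in OmniSci and return the result as a cuDF DataFrame.

    Assumes a reachable OmniSci server and default-style credentials;
    adjust user, password, and dbname for a real deployment.
    """
    # Imported lazily so this module can be loaded on a machine without a GPU.
    from pymapd import connect

    con = connect(
        user="admin",
        password="HyperInteractive",
        host=host,
        dbname="omnisci",
    )
    # select_ipc_gpu returns the result set as a GPU DataFrame (cuDF).
    gdf = con.select_ipc_gpu(build_trip_stats_query())
    # Simple feature engineering on the GPU: convert seconds to minutes.
    gdf["avg_duration_min"] = gdf["avg_duration_sec"] / 60.0
    return gdf
```

Calling `fetch_stats_on_gpu()` on a suitably configured machine would return a cuDF DataFrame that can be processed further on the GPU.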

Speaker: Randy Zwitch, Senior Developer Advocate at OmniSci


Online Instructions:
Login: Register for the webinar.