Dataset Library

Get started with OmniSci Cloud using one of the datasets from our dataset library! This page collects datasets of varying category and size, covering demographic and corporate data. With these datasets, you can start creating your own charts and dashboards in OmniSci. If you have an interesting dataset that you think would make a great addition to our library, feel free to contact us at community@omnisci.com.


# Dataset Title Description Category Relevant Links
1 MLB Baseball Pitch Data (2015-2018) This dataset contains over 2.87 million rows, one for every pitch thrown during the 2015-2018 Major League Baseball regular seasons. The information contained in this dataset includes the type of pitch, the names of the batter and pitcher, the speed of the pitch, and more.

Data types: int float str
Sports Download Blog Post
2 Ford GoBike Data This dataset contains 519,700 rows and 15 columns of Ford GoBike data from 2017. The information contained in this dataset includes bike user attributes (gender, age, user type), starting and ending bike station, duration of the ride, and more.

Data types: int timestamp float str smallint
Sports Download Blog Post
3 NYC 311 Service Calls Data This dataset currently contains over 21.2 million rows and 41 columns of data on 311 service calls made in New York City since 2010, and is updated daily. The information contained in this dataset includes location, complaint type, complaint description, resolution date, and more.

Data types: str float int date smallint bigint
Social Services Download Blog Post
4 Probe Vehicle Data This dataset currently contains 317,000 rows and 34 columns of data from the field testing conducted in Fairfax County, Virginia, which used a fleet of 10 vehicles for the Advanced Messaging Concept Development (AMCD) project. The information contained in this dataset includes speed, elevation, latitude/longitude values, and more for all PVDs generated during the AMCD field testing program.

Data types: str float int smallint bigint
Transportation Download Blog Post
5 Spotify's Worldwide Daily Song Ranking Data This dataset contains 3.4 million rows and 7 columns of data detailing the daily ranking of the 200 most streamed songs in 53 countries from 2017 to 2018 by Spotify users. The information contained in this dataset includes the artist, track name, number of streams, and Spotify country code.

Data types: smallint str int date
Music Download Blog Post
6 Part D Prescriber Data CY 2015 Data This dataset contains 24.5 million rows and 21 columns of data on prescription drugs that were prescribed by individual physicians and other health care providers and paid for under the Medicare Part D Prescription Drug Program.

Data types: str int float
Healthcare Download Blog Post
7 Caltrans Traffic Data Multiple telemetry data series are available, ranging from speed to traffic incident information. The data is recorded by stations located throughout the freeways, which collect a variety of measurements including speed (in miles per hour) and occupancy (percent that the lane is full).

Data types: int timestamp float str smallint
Transportation Download Blog Post
8 Big Data Bowl Data This dataset contains player tracking data for one 2017 game, along with the player, play, and game-level data that correspond to the tracking data.

Data types: int timestamp float str smallint bigint
Sports Download Blog Post
9 Uber Movement Data + San Francisco TAZ Data The Uber Movement data contains one month (plus a few days from the subsequent month) of rides around the San Francisco area.

TAZ stands for Traffic Analysis Zone, and each zone has an ID that the Uber dataset uses. The TAZ dataset contains the polygon for each traffic zone and can be joined to the Uber data on that ID to create a choropleth describing ride movement.

Data types: double int str multipolygon bigint
Transportation Download Uber Download TAZ Blog Post
10 Jekyll Island LIDAR Data This dataset contains LiDAR data from Jekyll Island, one of the barrier islands off the coast of the state of Georgia. It also includes Microsoft Building Footprint data for the buildings (polygons) on Jekyll Island, joined with parcel tax information from Glynn County to estimate building value.

Data types:
LIDAR Download Blog Post
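
As an illustration of the join described for dataset 9 (Uber Movement Data + San Francisco TAZ Data), here is a minimal Python sketch that matches ride rows to zone polygons on the zone ID and averages a value per zone, which is the kind of per-polygon figure a choropleth would color. The column names (`source_taz_id`, `mean_travel_time_s`, `taz_id`, `polygon`), the sample rows, and the helper `travel_time_by_zone` are hypothetical and are not taken from the actual downloads, so check the real files before adapting it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real column names and values in the
# Uber Movement download may differ.
rides = [
    {"source_taz_id": 1, "dest_taz_id": 2, "mean_travel_time_s": 420.0},
    {"source_taz_id": 1, "dest_taz_id": 3, "mean_travel_time_s": 780.0},
    {"source_taz_id": 2, "dest_taz_id": 1, "mean_travel_time_s": 450.0},
    {"source_taz_id": 9, "dest_taz_id": 1, "mean_travel_time_s": 300.0},  # no such zone
]

# Hypothetical zone records; the real TAZ data stores multipolygons,
# simplified here to a single ring of (x, y) points.
zones = [
    {"taz_id": 1, "polygon": [(0, 0), (0, 1), (1, 1), (1, 0)]},
    {"taz_id": 2, "polygon": [(1, 0), (1, 1), (2, 1), (2, 0)]},
    {"taz_id": 3, "polygon": [(2, 0), (2, 1), (3, 1), (3, 0)]},
]


def travel_time_by_zone(rides, zones):
    """Inner-join rides to zones on the zone ID, then average per source zone."""
    zone_by_id = {zone["taz_id"]: zone for zone in zones}
    times = defaultdict(list)
    for ride in rides:
        zone_id = ride["source_taz_id"]
        if zone_id in zone_by_id:  # rides without a matching zone are dropped
            times[zone_id].append(ride["mean_travel_time_s"])
    return [
        {
            "taz_id": zone_id,
            "polygon": zone_by_id[zone_id]["polygon"],
            "avg_travel_time_s": mean(values),
        }
        for zone_id, values in sorted(times.items())
    ]


choropleth_rows = travel_time_by_zone(rides, zones)
for row in choropleth_rows:
    print(row["taz_id"], row["avg_travel_time_s"])
```

Each resulting row pairs one polygon with one aggregated value. Zones with no rides starting in them are absent from the result, so a left join from the zones side may be a better fit if every polygon should still be drawn.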