Loading parquet files from a non-HDFS file system


Is it possible to load a series of parquet files into a MapD database/table when the files reside on a non-HDFS file system (i.e. a regular Linux or Windows disk)? If so, which command should I use?



We can pull data from HDFS parquet files via sqoop, but I am not sure how that would work if your parquet files are already sitting in a non-HDFS file system.

I think you may have to convert your parquet files to CSV prior to the load, now that you have them on a Linux file system. Looking at the literature, the easiest way to do this appears to be via a spark-shell process.
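
For what it's worth, here is a rough sketch of the convert-then-load idea. It uses Python with pandas rather than the spark-shell route mentioned above, purely because it is shorter to show. It assumes pandas plus a parquet engine (pyarrow or fastparquet) is installed and that each file fits in memory. The function names, directory paths and the table name are hypothetical. The generated statement uses MapD's `COPY ... FROM` CSV import, and the CSV path has to be readable by the MapD server.

```python
from pathlib import Path


def parquet_dir_to_csv(parquet_dir, csv_dir):
    """Convert every *.parquet file in parquet_dir to a CSV file in csv_dir.

    Assumption: pandas and a parquet engine (pyarrow or fastparquet) are
    installed, and each file is small enough to load into memory.
    """
    import pandas as pd  # imported here so the helper below works without pandas

    out = Path(csv_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(parquet_dir).glob("*.parquet")):
        dst = out / (src.stem + ".csv")
        # index=False keeps the pandas row index out of the CSV
        pd.read_parquet(src).to_csv(dst, index=False)
        written.append(dst)
    return written


def build_copy_statement(table, csv_path):
    """Build a MapD COPY statement for one CSV file that has a header row.

    Assumption: the target table already exists with matching columns.
    """
    return f"COPY {table} FROM '{csv_path}' WITH (header='true');"


if __name__ == "__main__":
    # Hypothetical locations and table name, for illustration only.
    for csv_file in parquet_dir_to_csv("/data/parquet", "/data/csv"):
        print(build_copy_statement("my_table", csv_file))
```

The printed statements could then be run through mapdql or any other SQL client connected to the MapD server. For large data sets the spark-shell approach is probably still the better fit, since pandas reads each file fully into memory.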



What if the parquet files are in an S3 bucket? Can I use the sqoop command or the copy command to load the data into a table?



I suspect that if you ran from an Amazon EMR environment you would be able to use sqoop to export from S3 to MapD, but I am unaware of anyone having tried this.

The sqoop command would look something like this if I were going to try it:

```
  --connect "jdbc:mapd:<MapDServer>:mapd" \
  --driver com.mapd.jdbc.MapDDriver --username mapd \
  --password HyperInteractive --direct --batch
```

It would expect the table to already exist in the MapD database.