CREATE FOREIGN Usage

Hi Team,

I was trying to use FOREIGN storage from s3. I tried enabling the flag and while using CREATE FOREIGN TABLE SQL is throwing parsing error.

I’m compiling the OmnisciDB from scratch and enabled ENABLE_S3_FSI. Am I missing anything. I example of creating a table from S3 would be really helpful.

Thanks
Gopinath

Hi @Gopinath_Jaganmohan,

I just asked, but have you tried to use the s3 parameters of copy command docs here when you created the server for foreign data wrapper?

Could you share the syntax you used to try to load your external file?

regards,
Candido

Hi @Gopinath_Jaganmohan

FOREIGN storage with S3 is not currently available. We’ll notify you when it is.

Regards,
Pey

Thanks for the Reply.

Ok, I just tried with one of the examples for this forum. Will the below example work? What I have to do to enable this while compiling the code.

"create foreign table flight_fsi_en ( flight_year SMALLINT, flight_mmonth SMALLINT, flight_dayofmonth SMALLINT, flight_dayofweek SMALLINT, deptime SMALLINT, crsdeptime SMALLINT, arrtime SMALLINT, crsarrtime SMALLINT, uniquecarrier TEXT ENCODING NONE, flightnum SMALLINT, tailnum TEXT ENCODING NONE, actualelapsedtime SMALLINT, crselapsedtime SMALLINT, airtime SMALLINT, arrdelay SMALLINT, depdelay SMALLINT, origin TEXT ENCODING NONE, dest TEXT ENCODING NONE, distance SMALLINT, taxiin SMALLINT, taxiout SMALLINT, cancelled SMALLINT, cancellationcode TEXT ENCODING NONE, diverted SMALLINT, carrierdelay SMALLINT, weatherdelay SMALLINT, nasdelay SMALLINT, securitydelay SMALLINT, lateaircraftdelay SMALLINT, dep_timestamp TIMESTAMP(0) ENCODING FIXED(32), arr_timestamp TIMESTAMP(0) ENCODING FIXED(32), carrier_name TEXT ENCODING NONE, plane_type TEXT ENCODING NONE, plane_manufacturer TEXT ENCODING NONE, plane_issue_date DATE ENCODING DAYS(16), plane_model TEXT ENCODING NONE, plane_status TEXT ENCODING NONE, plane_aircraft_type TEXT ENCODING NONE, plane_engine_type TEXT ENCODING NONE, plane_year SMALLINT, origin_name TEXT ENCODING NONE, origin_city TEXT ENCODING NONE, origin_state TEXT ENCODING NONE, origin_country TEXT ENCODING NONE, origin_lat FLOAT, origin_lon FLOAT, dest_name TEXT ENCODING NONE, dest_city TEXT ENCODING NONE, dest_state TEXT ENCODING NONE, dest_country TEXT ENCODING NONE, dest_lat FLOAT, dest_lon FLOAT) server test_server with (file_path=‘flights/flights.csv.gz’);

We have requirement to use the FOREIGN storage at least to start with I’m good with local files. I found test case but not sure functionalities are implemented.

Hi @Gopinath_Jaganmohan,

As @Pey.Silvester stated the FSI on s3 storage isn’t ready yet, but it shouldn’t take a long time to be generally available

About the functionality on local files, it should be available but, if I am not wrong, you have .to enable the feature at database startup explicitly

The parameter is called enable-fsi and has to be set to true.

so first of all, start the server with the parameter enable-fsi turned to true (or add the parameter on omnisci.conf)

bin/omnisci_server --data /mapd_storage/data_test --enable-fsi=true

then connect to the database with omnisql, create your SERVR then the table

omnisql> CREATE SERVER test_server FOREIGN DATA WRAPPER omnisci_csv  
WITH (storage_type = 'LOCAL_FILE', base_path = '/home/candido/');
omnisql> CREATE FOREIGN TABLE test_table_fsi 
(test_column_1 INTEGER NOT NULL,
test_column_2 text encoding dict(32),
test_column_3 date not null) 
SERVER test_server 
WITH (file_path='test_fsi_table.csv', header='false');
omnisql> SELECT * FROM test_table_fsi LIMIT 2;
test_column_1|test_column_2|test_column_3
1|Hello|2020-10-12
2|Goddbye|2021-01-13
2 rows returned.
Execution time: 157 ms, Total time: 170 ms

If you are starting the server from the command line, be sure to have the number of files opened set to unlimited; you can change with limit -n 65535.

basically, the options are the same as the COPY command, so if you have a pipe-delimited file, you have to specify the delimiter in the WITH clause

omnisql> CREATE FOREIGN TABLE test_table_fsi_s 
(test_column_1 INTEGER NOT NULL,
test_column_2 text encoding dict(32),
test_column_3 date not null) 
SERVER test_server 
with (file_path='test_fsi_table_s.csv', 
header='false', 
delimiter='|');
omnisql> select * from test_table_fsi_s limit 2;
test_column_1|test_column_2|test_column_3
1|Hello|2020-10-12
2|Goddbye|2021-01-13
2 rows returned.
Execution time: 129 ms, Total time: 152 ms

If you have further questions, don’t hesitate to ask.

Regards,
Candido

Thank you very much, Candido. It worked I thought it enabled by default. I know it’s difficult, but when is the expected timeline for S3 FSI.

Some clarification: What does “CREATE SERVER test_server FOREIGN DATA WRAPPER omnisci_csv” command achieve?

Also, I’m planning to use Arrow instead of CSV, Pls can you provide an example of Using Arrow format too.

Regards
Gopinath

Hi,

It’s not enabled by default because it’s a quite new feature, that needs some refining.

The S3 storage type could be ready for the next release, but it’s still in the works so it could shift.
Anyway, we will notify thru the community forum when will be released.

The CREATE SERVER is needed to describe where the files are located, which kind of file is expected, and other fixed attributes (e.g. username, URL, the key of S3 storage) that would be annoying to specify in each CREATE FOREIGN TABLE statement.
So if you have a fixed local path with parquet files you will issue a command like

CREATE SERVER myfiles_srv FOREIGN DATA WRAPPER omnisci_parquet WITH ( storage_type = 'LOCAL_FILE' , base_path = '/some_path' );

So the users that have granted the use of that server just need to know the name and the fields of parquet files without the need to specify that the files are in parquet format and to know where the files are located in the server.
In the case of S3, you don’t need to give any part or credential to end-users to make them able to load the files.

If you need to move the files in another filesystem you have just to alter the server definition (I haven’t tried it yet), without the need of altering all the foreign tables.

Right now we support CSV and PARQUET files only; I will check if and when we will extend the support to arrow files too, and I will report back.

Regards,
Candido

Hi @candido.dessanti ,

Thank you very much. Once I tried to implement Arrow directly I think in release 5.4.1 or the first version of Create Temp with CSV file. I think I can contribute to the Arrow part.

Regards
Gopinath

Thanks,

Anyway I will ask internally, because it would be a nice feature to have in the database.

If the disk space is the concern you can read arrow files and load into a temporary table using pymapd; the temporary tables are IM so they won’t use disk space

Thanks all, I appreciate this discussion! :+1: