According the 'Whitepaper ’ of mapd,i want to get the performance of mapd with 1.2B rows of data from
http://stat-computing.org/dataexpo/2009/the-data.html. I executed 5 SQL statements, but query 3 and query 5 could not be executed successfully.It told me
cannot apply 'PG_DATE_TRUNC' to arguements of type "PG_DATE_TRUNC(<CHAR(5)>,<INTEGER>)',supported form(x):'PG_DATE_TRUNC(<STRING>,<DATETIME>)'". The query 3 is
select data_trunc(month,dep_timestamp) as ym,avg(arrdelay) as del from flights group by ym. But the type of ‘dep_timestamp’ is INTEGER. Should i transform the type of ‘dep_timestamp’ from INTEGER to DATETIME?
Hi @hchen -
As provided by the url you posted, the data aren’t in timestamp form, so as you noticed using the
date_trunc function isn’t going to work. As the data are provided by the website, there are no timestamp columns.
However, if you’ve created the timestamp columns yourself but they are
integer, you should reimport the data with the column type as a timestamp. This will then allow you to use the
date_trunc function. Unfortunately, casting from integer to timestamp within a query is not supported yet. If you are interested in seeing which types of casting we do support, please see the ‘Type Cast Support’ table at the following link:
The datasetyou are using is the one I use to test the performance of jois in an small dataset, but as @randyzwitch already pointed out, it doesn’t contains any timestamp/date field.
anyway I can help you in two ways:
Supplying a simple script that uses awk to add to the original files a date/timestamp field or giving you access to the files with the timestamp/date field added an the ddl for the tables on a public s3 storage
Thank you very much.
I have solved the problem by changing the type of
dep_timestamp from INTEGER to TIMESTAMP when i was creating the table. The type of the column was transformed from INTEGER to TIMESTAMP automatically.
Thank you very much, Mr randyzwitch. I solved the problem with your suggestion.
Besides, i also have a question. when i was inserting data into mapd, i found that using copy statements was faster than using \copy statements. I tried executing \copy statements by multi-threads, and the speed was faster than using one thread. So i want to know whether copy is consist of several \copy statements? Thank you.
They are different commands. The copy command works with files local to the server (you can also use files located on s3) and uses by default a number of threads equals to the number of cores on the system.
Following up on @aznable comment that they are two different commands, here is our documenation around importing data using the copy commands:
As you found,
\copy is expected to be slower, not due to the number of threads per se, but because of where the data is expected to be loaded from (
\copy on the client side,
copy on the server)
Thank you,Mr randyzwitch.
Thank you for your answer.