Exception Error : TEXT is not supported in Arrow result sets

  • 1.  Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-01-2019 08:55
    Hi

    I am using OmniSci Core with pymapd.

    I have a table that loaded without any problems, but I get this message when I run a simple select statement:

    TMapDException: TMapDException(error_msg='Exception: TEXT is not supported in Arrow result sets.')
    Statement:
    SELECT * FROM mytable WHERE DecisionReference is not null LIMIT 5
    There is a mixture of column data types in the table (some are strings); surely this isn't a problem?

    many thanks

    Michael
    #Core

    ------------------------------
    Michael Kay
    ------------------------------


  • 2.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-01-2019 15:15
    Hi @Michael Kay

    I tried a similar pymapd call using OmniSci 4.5, querying a TEXT-encoded field in the demo flights dataset (flights_2008_7M) that we provide with OmniSci. For reference, the field I am using in this example is plane_aircraft_type, which is defined as:
    plane_aircraft_type TEXT ENCODING DICT(32)

    I ran the query using pymapd with both "is not null" and "is null"; you can see the output below.

    Connection(mapd://mapd:***@35.236.9.189:6274/mapd?protocol=binary)

    select * from flights_2008_7M WHERE plane_aircraft_type is not null LIMIT 2

    2

    [(2008, 1, 3, 4, 2037, 2020, 2129, 2115, u'WN', 1230, u'N677AA', 52, 55, 36, 14, 17, u'MCI', u'STL', 237, 4, 12, 0, None, 0, None, None, None, None, None, datetime.datetime(2008, 1, 3, 20, 37), datetime.datetime(2008, 1, 3, 21, 29), u'Southwest Airlines', u'Corporation', u'BOEING', datetime.date(1986, 1, 14), u'737-3A4', u'Valid', u'Fixed Wing Multi-Engine', u'Turbo-Jet', 1985, u'Kansas City International', u'Kansas City', u'MO', u'USA', 39.297603607177734, -94.71390533447266, u'Lambert-St Louis International', u'St Louis', u'MO', u'USA', 38.74768829345703, -90.35999298095703, -10543504.0, 4764391.0, -10058828.0, 4685594.0), (2008, 1, 3, 4, 901, 905, 948, 1000, u'WN', 1488, u'N461WN', 47, 55, 36, -12, -4, u'MCI', u'STL', 237, 4, 7, 0, None, 0, None, None, None, None, None, datetime.datetime(2008, 1, 3, 9, 1), datetime.datetime(2008, 1, 3, 9, 48), u'Southwest Airlines', u'Corporation', u'BOEING', datetime.date(2004, 6, 8), u'737-7H4', u'Valid', u'Fixed Wing Multi-Engine', u'Turbo-Fan', 2004, u'Kansas City International', u'Kansas City', u'MO', u'USA', 39.297603607177734, -94.71390533447266, u'Lambert-St Louis International', u'St Louis', u'MO', u'USA', 38.74768829345703, -90.35999298095703, -10543504.0, 4764391.0, -10058828.0, 4685594.0)]

    =====

    Connection(mapd://mapd:***@35.236.9.189:6274/mapd?protocol=binary)

    select * from flights_2008_7M WHERE plane_aircraft_type is null LIMIT 2

    2

    [(2008, 1, 5, 6, 1111, 1115, 1405, 1410, u'WN', 147, u'N321SW', 114, 115, 94, -5, -4, u'BNA', u'PHL', 675, 7, 13, 0, None, 0, None, None, None, None, None, datetime.datetime(2008, 1, 5, 11, 11), datetime.datetime(2008, 1, 5, 14, 5), u'Southwest Airlines', None, None, None, None, None, None, None, None, u'Nashville International', u'Nashville', u'TN', u'USA', 36.12447738647461, -86.67818450927734, u'Philadelphia Intl', u'Philadelphia', u'PA', u'USA', 39.871952056884766, -75.24114227294922, -9648971.0, 4317762.5, -8375805.5, 4847352.0), (2008, 1, 5, 6, 1848, 1830, 1935, 1910, u'WN', 793, u'N320SW', 107, 100, 87, 25, 18, u'BOI', u'LAS', 520, 13, 7, 0, None, 0, 7, 0, 7, 0, 11, datetime.datetime(2008, 1, 5, 18, 48), datetime.datetime(2008, 1, 5, 19, 35), u'Southwest Airlines', None, None, None, None, None, None, None, None, u'Boise Air Terminal', u'Boise', u'ID', u'USA', 43.56444549560547, -116.2227783203125, u'McCarran International', u'Las Vegas', u'NV', u'USA', 36.080360412597656, -115.15233612060547, -12937860.0, 5398284.5, -12818699.0, 4311684.5)]
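
    For reference, here is a rough sketch of the pymapd calls behind the output above (the host and credentials are masked, so substitute your own):

    import pymapd

    # Connect over the binary protocol, matching the Connection string shown above
    con = pymapd.connect(user='mapd', password='***', host='35.236.9.189',
                         port=6274, dbname='mapd', protocol='binary')

    query = ("select * from flights_2008_7M "
             "WHERE plane_aircraft_type is not null LIMIT 2")

    # execute() returns a cursor; iterating it yields plain Python tuples
    rows = list(con.execute(query))
    print(len(rows))
    print(rows)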

    Regards,
    Veda



    ------------------------------
    Veda Shankar
    OmniSci
    ------------------------------



  • 3.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-02-2019 01:58
    Thanks Veda
    I am going to investigate with a reduced number of columns to see if I can isolate the error and will post back.

    The CREATE TABLE statement defined the data type of the text columns as text encoding dict (see below), NOT text encoding dict(32).

    Could that be the reason behind the error message?

    CREATE TABLE mytable (AccomType text encoding dict,AccomTypeCount integer,AccomTypeUnique text encoding dict,AddressMoveDate text encoding dict,AddressMoveDateCount integer,AddressMoveDateUnique text encoding dict,TargetTestVariant text encoding dict,Age DOUBLE,AgeCount integer, ........


    Cheers
    Michael

    ------------------------------
    Michael Kay
    ------------------------------



  • 4.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-02-2019 03:02
    Hi Veda

    As an update: I have just recreated the table using "text encoding dict(32)", but I am getting the same error message.

    Do you know what this message means exactly? Is there some limitation or issue with Arrow that would produce that message?

    thanks for your support

    cheers
    Michael

    ------------------------------
    Michael Kay
    ------------------------------



  • 5.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-02-2019 05:21
    Hi @Michael Kay, I suspect this is a bug, or a holdover from earlier versions of Arrow having issues with text. Would you mind filing an issue on the pymapd repo so that we can track this down for you?

    ------------------------------
    Randy Zwitch
    Senior Director of Developer Advocacy
    OmniSci
    Philadelphia PA
    ------------------------------



  • 6.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-02-2019 15:33
    Hi Randy

    Happy to do this, but I'm unsure how to file the issue on the pymapd repo; please can you send me a link and quick instructions?

    Given that Veda has shown that this Arrow conversion for text is clearly possible for at least the demo flights dataset, do we have an idea why it would fail in my case?

    Can you give me any steer on whether this problem can be resolved?

    I am building code to hopefully replicate this problem from scratch on a small toy dataset and will post here soon.

    cheers and thanks for looking into this.

    best
    Michael




  • 7.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-03-2019 05:38
    Sorry about that @Michael Kay, here's the link: https://github.com/omnisci/pymapd/issues

    It's subtle, but Veda's example isn't actually doing the same thing as yours. If you are getting an Arrow error, that means you are using the `select_ipc` method. Veda used `execute`, which returns a list of tuples.

    So if you could write up a toy example, then either I or another one of our maintainers can step through a debugger and figure out the issue.
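
    To make the difference concrete, here is a short sketch (it assumes an open pymapd connection `con` and your table from earlier):

    # execute() returns a cursor; iterating it gives plain Python tuples,
    # so TEXT columns come back fine this way
    rows = list(con.execute(
        'SELECT * FROM mytable WHERE DecisionReference IS NOT NULL LIMIT 5'))

    # select_ipc() builds an Arrow result set and returns a pandas DataFrame;
    # this is the call that can raise "TEXT is not supported in Arrow result sets"
    df = con.select_ipc(
        'SELECT * FROM mytable WHERE DecisionReference IS NOT NULL LIMIT 5')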

    Best,
    Randy


  • 8.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-04-2019 07:29
    Hi @Randy Zwitch and @Michael Kay,

    I installed the latest driver, and selecting data containing strings using Arrow doesn't look to be a problem:

    >>> import pymapd
    >>> import sys
    >>> import datetime
    >>> import time
    >>> import humanfriendly
    >>> from psutil import virtual_memory
    >>> import gc
    >>> import os
    >>> conn=pymapd.connect(
    ... user=connection_attrs['user'],
    ... password=connection_attrs['password'],
    ... host=connection_attrs['host'],
    ... port=connection_attrs['port'],
    ... dbname=connection_attrs['dbname'])
    >>> df = conn.select_ipc('select id,abbr,name from omnisci_states');
    >>> df = conn.select_ipc('select * from flights_2008_10k');
    >>> df.head()
    flight_year flight_month flight_dayofmonth flight_dayofweek deptime ... dest_lon origin_merc_x origin_merc_y dest_merc_x dest_merc_y
    0 2008 1 3 4 2003.0 ... -82.533249 -8622341.0 4713729.5 -9187559.0 3245881.75
    1 2008 1 3 4 754.0 ... -82.533249 -8622341.0 4713729.5 -9187559.0 3245881.75
    2 2008 1 3 4 628.0 ... -76.668198 -9606247.0 4824950.0 -8534665.0 4746827.50
    3 2008 1 3 4 926.0 ... -76.668198 -9606247.0 4824950.0 -8534665.0 4746827.50
    4 2008 1 3 4 1829.0 ... -76.668198 -9606247.0 4824950.0 -8534665.0 4746827.50

    [5 rows x 56 columns]
    >>> df = conn.select_ipc('select * from flights_2008_7M');
    >>> df.head()
    flight_year flight_month flight_dayofmonth flight_dayofweek deptime ... dest_lon origin_merc_x origin_merc_y dest_merc_x dest_merc_y
    0 2008 1 9 3 1849.0 ... -90.359993 -10606397.0 3458052.75 -10058828.0 4685594.00
    1 2008 1 9 3 933.0 ... -90.359993 -10606397.0 3458052.75 -10058828.0 4685594.00
    2 2008 1 9 3 1451.0 ... -82.533249 -10606397.0 3458052.75 -9187559.0 3245881.75
    3 2008 1 9 3 1903.0 ... -82.533249 -10606397.0 3458052.75 -9187559.0 3245881.75
    4 2008 1 9 3 752.0 ... -82.533249 -10606397.0 3458052.75 -9187559.0 3245881.75

    [5 rows x 56 columns]
    >>> df.tail()
    flight_year flight_month flight_dayofmonth flight_dayofweek deptime ... dest_lon origin_merc_x origin_merc_y dest_merc_x dest_merc_y
    7009723 2008 12 3 3 710.0 ... -95.339722 -8223461.0 4979540.5 -10613169.0 3501040.00
    7009724 2008 12 3 3 903.0 ... -118.408073 -9111433.0 5073138.5 -13181127.0 4021088.75
    7009725 2008 12 3 3 1338.0 ... -117.189659 -10613169.0 3501040.0 -13045493.0 3859991.25
    7009726 2008 12 3 3 919.0 ... -117.189659 -10613169.0 3501040.0 -13045493.0 3859991.25
    7009727 2008 12 3 3 1258.0 ... -81.849396 -10613169.0 3501040.0 -9111433.0 5073138.50

    [5 rows x 56 columns]



  • 9.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-04-2019 07:40
    This is a good point @Candido Dessanti...we've been making a lot of improvements to pymapd, and the Arrow project has been as well. So the first step should be to try the newest version of pymapd (currently 0.10) and see if that resolves the problem. And if not, please file a bug, and we'll get it straightened out.


  • 10.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-05-2019 23:11
    Thanks, all, for your help and suggestions.
    I have tracked down the error to a bug in my code.
    The "TEXT is not supported" error also occurs if you query a table that has not been properly created, as I found out!
    So the treatment of text columns turned out to be a red herring...
    Maybe that is a useful item of information for somebody else in the future anyway.
    Thanks again


  • 11.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-06-2019 03:25
    Edited by Candido Dessanti 04-06-2019 14:37
    Hi @Michael Kay,
    What do you mean by a table not properly created? A table containing text fields that are not encoded?


  • 12.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-08-2019 05:48
    Hi @Candido Dessanti
    To be honest I am not sure. The error occurred with a table that was built automatically (CREATE TABLE statement) using a dictionary of column names and data types. I have now re-written the code and it works. I have a feeling the error was due to the fact that a query was run against a non-existent table.


  • 13.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-08-2019 07:29
    Hi @Michael Kay

    Don't worry, I already reproduced the error.
    You have probably created a table with a text encoding none field; in that case the driver fails when you try to run a query on the table.

    The error from querying a nonexistent table is completely different ;)
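
    For reference, a minimal sketch of the case I mean (the connection details and table name are just placeholders for your own):

    import pymapd

    con = pymapd.connect(user='mapd', password='HyperInteractive',
                         host='localhost', port=6274, dbname='mapd')

    con.execute('DROP TABLE IF EXISTS arrow_text_test')
    con.execute('CREATE TABLE arrow_text_test ('
                'dict_col TEXT ENCODING DICT(32), '
                'none_col TEXT ENCODING NONE)')
    con.execute("INSERT INTO arrow_text_test VALUES ('a', 'b')")

    # execute() returns plain tuples, so both columns come back fine
    print(list(con.execute('SELECT * FROM arrow_text_test')))

    # select_ipc() builds an Arrow result set; with the TEXT ENCODING NONE
    # column in the projection it raises
    # "TEXT is not supported in Arrow result sets."
    df = con.select_ipc('SELECT * FROM arrow_text_test')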


  • 14.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-09-2019 13:17
    Hi @Candido Dessanti
    Please can you explain what a text encoding none field is? Any examples of good practice to avoid errors when creating a table would be good. I am still getting inexplicable error messages when I try to append data from pandas to a pre-existing table.
    many thanks


  • 15.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-09-2019 20:57
    Hi @Michael Kay,

    When you create a table with string data types, whatever alias you use, the database will by default create a dictionary-encoded text field.

    Example:

    omnisql> create table test_strings (string1 varchar(40), string2 char(10), string3 text, string4 text encoding none) ;
    omnisql> \d test_strings
    CREATE TABLE test_strings (
    string1 TEXT ENCODING DICT(32),
    string2 TEXT ENCODING DICT(32),
    string3 TEXT ENCODING DICT(32),
    string4 TEXT ENCODING NONE)


    Dictionary-encoded text fields can be used in GROUP BY and joins to save memory/processing power on GPUs, while text encoding none fields could be used as filter or detail columns.
    You can get an idea about the data types in the OmniSci database by following this link:
    omnisci datatypes
    You can use both, but dictionary-encoded text fields are generally preferable because they save a lot of memory and disk space and are more flexible to use.
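
    If you want to double-check how the text columns of an existing table were encoded, a quick sketch (assuming an open pymapd connection `con`):

    # Dictionary-encoded TEXT columns in this thread report comp_param=32 in
    # get_table_details(); print it for every string column to inspect them.
    for col in con.get_table_details('test_strings'):
        if col.type == 'STR':
            print(col.name, col.type, 'comp_param =', col.comp_param)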

    Which errors are you getting?




  • 16.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-10-2019 03:40
    Hi @Candido Dessanti
    Thanks for your explanation
    All the text columns in my table have been created as "text encoding dict(32)".

    The table has been created with a "CREATE TABLE ...." statement using pymapd:

    query =  "CREATE TABLE adobe(AccomType text encoding dict(32), AccomTypeCount integer, AccomTypeUnique text encoding dict(32), AddressMoveDate text encoding dict(32), AddressMoveDateCount integer, AddressMoveDateUnique text encoding dict(32), AdobeTargetTestVariant text encoding dict(32), Age DOUBLE, AgeCount integer, AgeUnique text encoding dict(32), AgreementId DOUBLE, AppRefKey text encoding dict(32), ApplicationJourney text encoding dict(32), BestTimeContact text encoding dict(32), Browser text encoding dict(32), Browser_height integer, Browser_width integer, Campaign text encoding dict(32), Channel text encoding dict(32), CompletionStatus DOUBLE, Connection_type text encoding dict(32), CustomerStatus DOUBLE, DecibelInsightId text encoding dict(32), DecisionReference DOUBLE, DecisionType DOUBLE, Domain text encoding dict(32), Duration integer, Email text encoding dict(32), EmailCount integer, EmailUnique text encoding dict(32), FirstPaymentDate text encoding dict(32), GenerateApplicationId text encoding dict(32), GeoCity text encoding dict(32), GeoCityCount integer, GeoCityUnique text encoding dict(32), GeoCountry text encoding dict(32), GeoCountryCount integer, GeoCountryUnique text encoding dict(32), GeoDma integer, GeoDmaCount integer, GeoDmaUnique text encoding dict(32), GeoRegion text encoding dict(32), GeoRegionCount integer, GeoRegionUnique text encoding dict(32), GeoZip text encoding dict(32), GeoZipCount integer, GeoZipUnique text encoding dict(32), HaveCreditCards text encoding dict(32), HaveCreditCardsCount integer, HaveCreditCardsUnique text encoding dict(32), IncomeFrequency text encoding dict(32), IncomeFrequencyCount integer, IncomeFrequencyUnique text encoding dict(32), IncomePaymentMethod text encoding dict(32), IncomePaymentMethodCount integer, IncomePaymentMethodUnique text encoding dict(32), IncomeSource text encoding dict(32), IncomeSourceCount integer, IncomeSourceUnique text encoding dict(32), IncomeValue DOUBLE, IncomeValueCount integer, IncomeValueUnique text encoding dict(32), Ip text encoding dict(32), IpCount integer, IpUnique text encoding dict(32), LoanCalcInteractionValuesCount integer, LoanCalcInteractionValuesLast text encoding dict(32), LoanCalcInteractionValuesUnique text encoding dict(32), LoanCalcInterest DOUBLE, LoanCalcInterestCount integer, LoanCalcInterestUnique text encoding dict(32), LoanCalcTerm DOUBLE, LoanCalcTermCount integer, LoanCalcTermUnique text encoding dict(32), LoanUse text encoding dict(32), LoanUseCount integer, LoanUseUnique text encoding dict(32), MaritalStatus text encoding dict(32), MaritalStatusCount integer, MaritalStatusUnique text encoding dict(32), Max_time TIMESTAMP(0), Min_time TIMESTAMP(0), NewRepeatSelfIdentify text encoding dict(32), NewRepeatSelfIdentifyCount integer, NewRepeatSelfIdentifyUnique text encoding dict(32), NumDependents text encoding dict(32), NumDependentsCount integer, NumDependentsUnique text encoding dict(32), OtherLoans text encoding dict(32), OtherLoansCount integer, OtherLoansUnique text encoding dict(32), OutgoingsCredit DOUBLE, OutgoingsCreditCount integer, OutgoingsCreditUnique text encoding dict(32), OutgoingsOther DOUBLE, OutgoingsOtherCount integer, OutgoingsOtherUnique text encoding dict(32), Page_url text encoding dict(32), PersonId DOUBLE, PrefPayDow text encoding dict(32), Ref_type text encoding dict(32), ReferringDomain text encoding dict(32), RentMortgage DOUBLE, RentMortgageCount integer, RentMortgageUnique text encoding dict(32), Resolution text encoding dict(32), Search_engine 
text encoding dict(32), Title text encoding dict(32), TitleCount integer, TitleUnique text encoding dict(32), User_agent text encoding dict(32), Visit_keywords text encoding dict(32), company text encoding dict(32), mcvisid text encoding dict(32), monthvisitid integer, monthyear integer, pagename text encoding dict(32))"​

    I have confirmed the table exists and looked at the table details with con.get_table_details()

    ColumnDetails(name='AccomType', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='AccomTypeCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='AccomTypeUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='AddressMoveDate', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='AddressMoveDateCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='AddressMoveDateUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='AdobeTargetTestVariant', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Age', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='AgeCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='AgeUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='AgreementId', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='AppRefKey', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='ApplicationJourney', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='BestTimeContact', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Browser', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Browser_height', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='Browser_width', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='Campaign', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Channel', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='CompletionStatus', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='Connection_type', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='CustomerStatus', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='DecibelInsightId', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='DecisionReference', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='DecisionType', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='Domain', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Duration', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='Email', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='EmailCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='EmailUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='FirstPaymentDate', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GenerateApplicationId', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GeoCity', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GeoCityCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='GeoCityUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), 
ColumnDetails(name='GeoCountry', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GeoCountryCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='GeoCountryUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GeoDma', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='GeoDmaCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='GeoDmaUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GeoRegion', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GeoRegionCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='GeoRegionUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GeoZip', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='GeoZipCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='GeoZipUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='HaveCreditCards', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='HaveCreditCardsCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='HaveCreditCardsUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='IncomeFrequency', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='IncomeFrequencyCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='IncomeFrequencyUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='IncomePaymentMethod', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='IncomePaymentMethodCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='IncomePaymentMethodUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='IncomeSource', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='IncomeSourceCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='IncomeSourceUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='IncomeValue', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='IncomeValueCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='IncomeValueUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Ip', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='IpCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='IpUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='LoanCalcInteractionValuesCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='LoanCalcInteractionValuesLast', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='LoanCalcInteractionValuesUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='LoanCalcInterest', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), 
ColumnDetails(name='LoanCalcInterestCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='LoanCalcInterestUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='LoanCalcTerm', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='LoanCalcTermCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='LoanCalcTermUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='LoanUse', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='LoanUseCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='LoanUseUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='MaritalStatus', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='MaritalStatusCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='MaritalStatusUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Max_time', type='TIMESTAMP', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='Min_time', type='TIMESTAMP', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='NewRepeatSelfIdentify', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='NewRepeatSelfIdentifyCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='NewRepeatSelfIdentifyUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='NumDependents', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='NumDependentsCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='NumDependentsUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='OtherLoans', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='OtherLoansCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='OtherLoansUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='OutgoingsCredit', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='OutgoingsCreditCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='OutgoingsCreditUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='OutgoingsOther', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='OutgoingsOtherCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='OutgoingsOtherUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Page_url', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='PersonId', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='PrefPayDow', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Ref_type', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='ReferringDomain', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='RentMortgage', type='DOUBLE', nullable=True, precision=0, scale=0, comp_param=0), 
ColumnDetails(name='RentMortgageCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='RentMortgageUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Resolution', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Search_engine', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Title', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='TitleCount', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='TitleUnique', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='User_agent', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='Visit_keywords', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='company', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='mcvisid', type='STR', nullable=True, precision=0, scale=0, comp_param=32), ColumnDetails(name='monthvisitid', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='monthyear', type='INT', nullable=True, precision=0, scale=0, comp_param=0), ColumnDetails(name='pagename', type='STR', nullable=True, precision=0, scale=0, comp_param=32)]:​


    Then I try to load the data with:

    con.load_table(tablename, pandasdataframe, preserve_index=False)


    and I get this message:

    TMapDException                            Traceback (most recent call last)
    <ipython-input-89-dccfa205baba> in <module>()
    ----> 1 con.load_table(tablename,dfpdecx,preserve_index=False)
    
    ~/anaconda3/lib/python3.6/site-packages/pymapd/connection.py in load_table(self, table_name, data, method, preserve_index, create)
        458         if method == 'infer':
        459             if (_is_pandas(data) or _is_arrow(data)) and _HAS_ARROW:
    --> 460                 return self.load_table_arrow(table_name, data)
        461 
        462             elif _is_pandas(data):
    
    ~/anaconda3/lib/python3.6/site-packages/pymapd/connection.py in load_table_arrow(self, table_name, data, preserve_index)
        560                                            preserve_index=preserve_index)
        561         self._client.load_table_binary_arrow(self._session, table_name,
    --> 562                                              payload.to_pybytes())
        563 
        564     def render_vega(self, vega, compression_level=1):
    
    ~/anaconda3/lib/python3.6/site-packages/mapd/MapD.py in load_table_binary_arrow(self, session, table_name, arrow_stream)
       2378         """
       2379         self.send_load_table_binary_arrow(session, table_name, arrow_stream)
    -> 2380         self.recv_load_table_binary_arrow()
       2381 
       2382     def send_load_table_binary_arrow(self, session, table_name, arrow_stream):
    
    ~/anaconda3/lib/python3.6/site-packages/mapd/MapD.py in recv_load_table_binary_arrow(self)
       2402         iprot.readMessageEnd()
       2403         if result.e is not None:
    -> 2404             raise result.e
       2405         return
       2406 
    
    TMapDException: TMapDException(error_msg='Exception: Expected binary col')
    
    

    I thought it could be a problem with certain columns, so I changed the "CREATE TABLE" to just 9 columns from the original 100, reduced the pandas dataframe accordingly, and that loaded OK.

    So I thought it could be a problem with nulls

    But it turns out that some of the 9 columns that loaded OK did actually have null values.

    So now I am confused.

    The error message doesn't help me narrow down the problem column(s) in the data I am trying to load:

    TMapDException(error_msg='Exception: Expected binary col')

    Do you have any idea where the problem could be or why I get this error message generally?

    Many thanks for your help








  • 17.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-10-2019 04:11
    Just as an update, I reduced the number of rows in the dataframe to be loaded to 50 (from approx 1M).
    Instead of getting the "binary col expected" message, this time I get this:
    ArrowNotImplementedError                  Traceback (most recent call last)
    <ipython-input-38-49ae081e7898> in <module>()
    ----> 1 con.load_table('adobe',dfpx,preserve_index=False)
    
    ~/anaconda3/lib/python3.6/site-packages/pymapd/connection.py in load_table(self, table_name, data, method, preserve_index, create)
        458         if method == 'infer':
        459             if (_is_pandas(data) or _is_arrow(data)) and _HAS_ARROW:
    --> 460                 return self.load_table_arrow(table_name, data)
        461 
        462             elif _is_pandas(data):
    
    ~/anaconda3/lib/python3.6/site-packages/pymapd/connection.py in load_table_arrow(self, table_name, data, preserve_index)
        558         from ._pandas_loaders import _serialize_arrow_payload
        559         payload = _serialize_arrow_payload(data, metadata,
    --> 560                                            preserve_index=preserve_index)
        561         self._client.load_table_binary_arrow(self._session, table_name,
        562                                              payload.to_pybytes())
    
    ~/anaconda3/lib/python3.6/site-packages/pymapd/_pandas_loaders.py in _serialize_arrow_payload(data, table_metadata, preserve_index)
        139 
        140     if isinstance(data, pa.RecordBatch):
    --> 141         writer.write_batch(data)
        142     elif isinstance(data, pa.Table):
        143         writer.write_table(data)
    
    ipc.pxi in pyarrow.lib._RecordBatchWriter.write_batch()
    
    error.pxi in pyarrow.lib.check_status()
    
    ArrowNotImplementedError: Unable to convert type: null


    So there must be some problem here with nulls?


  • 18.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-10-2019 05:37
    Edited by Candido Dessanti 04-10-2019 06:52
    Hi @Michael Kay

    I'm not exactly the right person to talk about pandas.
    Anyway, I did some tests, and the first error, TMapDException(error_msg='Exception: Expected binary col'), looks like a type mismatch between the database column and the pandas DataFrame column. About the second one I have no idea, but I'm confident it's not a null problem; it looks like the type simply cannot be translated.

    We might be working with different releases, so the results could be different (I'm using 0.10.0).

    Could you try to launch con.load_table(table_name, df, preserve_index=False) with a non-existent table name? A new table will be created with types matching the pandas DataFrame.

    e.g.
    >>> df = pandas.read_csv('1986.csv',parse_dates=[0])
    >>> df.head()
    Year-Month-DayofMonth Year Month DayofMonth DayOfWeek DepTime ... Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
    0 1987-10-14 1987 10 14 3 741.0 ... 0 NaN NaN NaN NaN NaN
    1 1987-10-15 1987 10 15 4 729.0 ... 0 NaN NaN NaN NaN NaN
    2 1987-10-17 1987 10 17 6 741.0 ... 0 NaN NaN NaN NaN NaN
    3 1987-10-18 1987 10 18 7 729.0 ... 0 NaN NaN NaN NaN NaN
    4 1987-10-19 1987 10 19 1 749.0 ... 0 NaN NaN NaN NaN NaN

    [5 rows x 30 columns]
    >>> conn.load_table('flights',df,preserve_index=False)

    For whatever reason the driver interpreted the DepTime field as decimal, while in OmniSci's table it is defined as smallint, so I get an error similar to yours:

    Tmapd.ttypes.TMapDException: TMapDException(error_msg='Exception: Expected int16 type')

    Loading into a non-existent table, the table is created and the data is loaded (even though I didn't like the inferred types):
    conn.load_table('flights2',df,preserve_index=False)

    omnisql> select * from flights2 limit 10;

    Flight_timestamp|c1_Year|c2_Month|DayofMonth|DayOfWeek|DepTime|CRSDepTime|ArrTime|CRSArrTime|UniqueCarrier|FlightNum|TailNum|ActualElapsedTime|CRSElapsedTime|AirTime|ArrDelay|DepDelay|Origin|Dest|Distance|TaxiIn|TaxiOut|Cancelled|CancellationCode|Diverted|CarrierDelay|WeatherDelay|NASDelay|SecurityDelay|LateAircraftDelay
    1987-10-24 00:00:00|1987|10|24|6|1226|1230|1610|1613|UA|496|NULL|164|163|NULL|-3|-4|SLC|ORD|1249|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-25 00:00:00|1987|10|25|7|1230|1230|1603|1613|UA|496|NULL|153|163|NULL|-10|0|SLC|ORD|1249|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-26 00:00:00|1987|10|26|1|1228|1230|1613|1613|UA|496|NULL|165|163|NULL|0|-2|SLC|ORD|1249|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-13 00:00:00|1987|10|13|2|1258|1300|1550|1548|UA|74|NULL|112|108|NULL|2|-2|ORD|LGA|733|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-14 00:00:00|1987|10|14|3|1256|1300|1543|1548|UA|74|NULL|107|108|NULL|-5|-4|ORD|LGA|733|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-27 00:00:00|1987|10|27|2|1226|1230|1620|1613|UA|496|NULL|174|163|NULL|7|-4|SLC|ORD|1249|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-28 00:00:00|1987|10|28|3|1229|1230|1615|1613|UA|496|NULL|166|163|NULL|2|-1|SLC|ORD|1249|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-29 00:00:00|1987|10|29|4|1225|1230|1627|1613|UA|496|NULL|182|163|NULL|14|-5|SLC|ORD|1249|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-15 00:00:00|1987|10|15|4|1256|1300|1545|1548|UA|74|NULL|109|108|NULL|-3|-4|ORD|LGA|733|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL
    1987-10-30 00:00:00|1987|10|30|5|1227|1230|1612|1613|UA|496|NULL|165|163|NULL|-1|-3|SLC|ORD|1249|NULL|NULL|0|NULL|0|NULL|NULL|NULL|NULL|NULL






  • 19.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-10-2019 06:26
    At this point @Michael Kay, it's probably best if you file an issue on the pymapd repo: https://github.com/omnisci/pymapd/issues

    Not that you'll get different answers (I do a lot of the pymapd maintenance), but rather this will help us track the issue and fix it.


  • 20.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-10-2019 13:01
    Thanks, guys, for your help.
    I am going to try to find the column in my table that is causing the problem using an iterative loading procedure, and see which column trips the error.
    I will then come back to you and log an issue.
    cheers
    Michael


  • 21.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-10-2019 13:14
    Edited by Candido Dessanti 04-10-2019 13:45
    Before trying this brute-force approach, take a look at your dataset using df.dtypes to figure out the problem; you can save time and energy (your time is as precious as the planet's environment 😉). A quick sketch of the check is below.
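
    Something like this (df is your pandas DataFrame; the dtype to cast to is only an example, so pick whatever fits your data):

    # Print the pandas dtype of every column; object columns and columns that
    # contain only nulls are the usual suspects for the Arrow conversion
    # errors shown earlier in this thread.
    print(df.dtypes)

    # A column that is entirely null can be inferred as the Arrow "null" type,
    # which would match the ArrowNotImplementedError above; giving it an
    # explicit dtype before load_table() is one possible workaround.
    for col in df.columns:
        if df[col].isnull().all():
            print('all-null column:', col)
            df[col] = df[col].astype('float64')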


  • 22.  RE: Exception Error : TEXT is not supported in Arrow result sets

    Posted 04-16-2019 13:10
    Hi Guys
    Just a final note to say that I have managed to find a solution to loading data from pandas. The error was somewhere in my previous code; I have cleaned it up and the error messages are gone. I now have a Jupyter notebook with code that automatically builds CREATE TABLE statements and iteratively loads one column at a time, using a dictionary generated from the original pandas dataframe. That is important if, like in my case, you have a table with lots of columns and one of them might be causing a problem; a rough outline of the approach is below. Happy to share if there is interest. Thanks for your help with this issue and the feedback. Much appreciated. Cheers
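
    Roughly (the table name and the pandas-to-SQL type mapping are illustrative; `con` is an open pymapd connection and `df` the source DataFrame):

    # Map pandas dtypes to OmniSci column types; extend as needed for your data
    type_map = {'object': 'TEXT ENCODING DICT(32)',
                'int64': 'BIGINT',
                'float64': 'DOUBLE',
                'datetime64[ns]': 'TIMESTAMP(0)'}

    # Load one column at a time into a throwaway table so the column that
    # trips the error is easy to spot
    for col in df.columns:
        sql_type = type_map.get(str(df[col].dtype), 'TEXT ENCODING DICT(32)')
        con.execute('DROP TABLE IF EXISTS one_col_test')
        con.execute('CREATE TABLE one_col_test (%s %s)' % (col, sql_type))
        try:
            con.load_table('one_col_test', df[[col]], preserve_index=False)
            print('OK:', col)
        except Exception as exc:
            print('FAILED:', col, '->', exc)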