Using GPU on AWS AMI

  • 1.  Using GPU on AWS AMI

    Posted 05-27-2019 02:32
    Hello,

    I've been testing OmniSci, along with other tools, for a month now on AWS instances with GPUs. We are particularly interested in pymapd since our codebase is mainly Python. But I am completely unable to use the GPU from pymapd. The execute and select_ipc functions work, but I haven't managed to use select_ipc_gpu: it successfully returns a cudf DataFrame object, but I can't do anything with it without getting an error.

    At first I thought I might have messed up my CUDA installation or that my GPU wasn't compatible, so I switched my test machine to a p2.xlarge instance initialized with the OmniSci AMI (4.6.1 Open Source Edition) and just added pymapd with conda, following the documentation.

    Then I fixed the missing library, libcudart.9.2.

    And I always end up with the same CUDA error:
    RuntimeError: CUDA error encountered at: /conda/envs/gdf/conda-bld/libcudf_1558047478465/work/cpp/src/bitmask/bitmask_ops.cu:147: 48 cudaErrorNoKernelImageForDevice no kernel image is available for execution on the device
    It's clearly a cudf error, but I can't find any related solution on the web, and since I'm on a machine pre-configured by OmniSci, I think you are the best placed to answer. There must be something I have missed, but I don't know what.


    Summary:
    • AWS p2.xlarge with the latest OmniSci AMI (Open Source Edition)
    • Install Anaconda (with its .sh installer)
    • Install pymapd ("conda install -c nvidia/label/cuda10.0 -c rapidsai/label/cuda10.0 -c numba -c conda-forge -c defaults cudf" or "conda install -c nvidia -c rapidsai -c numba -c conda-forge -c defaults cudf"; I tested both, each in its own virtual environment)
    • Install the missing library, libcudart.9.2
    • Run the Getting Started code

    I also note that the AMI installs CUDA 10.1 by default. I did not change it, as I assumed that was intentional; on my previous machine I had tested with all CUDA versions and ended up with the same error.
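When comparing setups like these (several CUDA versions, pip vs conda), it helps to report the exact package versions in each environment. A minimal sketch, assuming Python 3.8+ for importlib.metadata (the Python 3.7 environment in this thread would need the importlib_metadata backport instead); the helper names and the package list are illustrative, not part of pymapd:

```python
import platform
from importlib import metadata  # Python 3.8+


def package_versions(names):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(names=("pymapd", "cudf", "numba", "pyarrow")):
    """Build a plain-text summary suitable for pasting into a forum post."""
    lines = [f"python {platform.python_version()} on {platform.system()}"]
    for name, version in package_versions(names).items():
        lines.append(f"{name}: {version or 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Packages installed by conda without Python metadata may show up as "not installed", so treat the report as a hint rather than proof.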


  • 2.  RE: Using GPU on AWS AMI

    Posted 05-28-2019 04:35
    Message From: Candido Dessanti

    Hi @Stanislas Deneuville,

    Maybe I'm going to say something incredibly wrong, but AFAIK a cuDF DataFrame is a pointer to GPU memory, and the Kepler architecture doesn't allow GPU memory to be transparently mapped into system memory.
    Just an idea.


  • 3.  RE: Using GPU on AWS AMI

    Posted 05-28-2019 04:40
    I ran slightly different tests, following a colleague's guess that maybe only the head() function doesn't work. I found out that some operations still work, and that head() actually works too, but only if the DataFrame contains neither null values nor OmniSci-specific types.

    For example, loading from and saving to files or pandas is broken too, and queries from the OmniSci database don't work with all OmniSci types.

    There is way too much trouble for it to be usable, but it somehow half works. I still guess it's a cudf/CUDA/hardware incompatibility problem, but in that case I don't understand why this architecture is listed as the recommended one on the AWS Marketplace.
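Since execute and select_ipc work on this machine while GPU results break, one way to keep scripts running is to try select_ipc_gpu and fall back to the CPU (pandas) path only for this specific CUDA error. A minimal sketch; `select_with_fallback` is a hypothetical helper, and `conn` is assumed to be a pymapd Connection exposing the select_ipc_gpu and select_ipc methods used in this thread:

```python
def select_with_fallback(conn, sql):
    """Try a GPU (cudf) result first; fall back to a CPU (pandas) result.

    Returns (dataframe, used_gpu). Only the "no kernel image" CUDA error
    triggers the fallback; any other error is re-raised unchanged.
    """
    try:
        df = conn.select_ipc_gpu(sql)
        # Force string conversion so errors that only surface when printing
        # are caught here too; later cudf operations may still fail.
        str(df)
        return df, True
    except RuntimeError as exc:
        if "cudaErrorNoKernelImageForDevice" not in str(exc):
            raise
        return conn.select_ipc(sql), False
```

This only hides the symptom; on an unsupported GPU the CPU path is the one actually doing the work.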


  • 4.  RE: Using GPU on AWS AMI

    Posted 05-28-2019 04:36
    Hi @Stanislas Deneuville,

    I did a brief test on my workstation with OmniSci 4.5 and pymapd 0.10, then moved to OmniSci 4.6/4.6.1 with the latest version of pymapd, and I'm not experiencing any issues with text-encoded strings or NULL values, though I'm not doing much more than selecting and printing data.

    >>> df = conn.select_ipc_gpu("select case when uniquecarrier = 'OH' then null else uniquecarrier end, avg(depdelay) from flights_2008_7M group by 1")
    >>> print(df)
        EXPR$0              EXPR$1
    0      AQ -1.3977829337458108
    1      YV  12.000675279875033
    2      XE  11.395866476493499
    3      HA  0.4552013450206487
    4      F9   5.919601516833923
    5      B6  12.653395748122113
    6          11.536153117856601
    7      AA  13.280898264437912
    8      MQ  10.695641776641581
    9      US   5.717489671893907
    [10 more rows]
    >>> df = conn.select_ipc_gpu("select cast(nasdelay/100 as integer),count() from flights_2008_7M group by 1")
    >>> print(df.sort_values("EXPR$0").tail())
       EXPR$0   EXPR$1
    10      10        3
    11      11        2
    12      12        2
    13      13        2
    14          5484993
    Maybe it's a problem with nulls when converting the cudf.dataframe.dataframe.DataFrame to pandas? I'm not sure.
    I'm on Ubuntu 18.04 with CUDA 10.0 and Turing cards; I installed everything from PyPI.
    Can you share an example of what isn't working on your environment?



  • 5.  RE: Using GPU on AWS AMI

    Posted 05-28-2019 05:47
    Hi @Stanislas Deneuville

    It appears that cudf doesn't support Kepler architecture:

    https://github.com/rapidsai/cudf/issues/1283#issuecomment-481106527
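A quick way to confirm this before installing anything else is to check the GPU's compute capability. A minimal sketch, assuming numba is importable (cudf uses it) and assuming the RAPIDS requirement of compute capability 6.0 (Pascal) or newer; the Tesla K80 in a p2 instance is Kepler, compute capability 3.7. The helper names are hypothetical:

```python
# Assumption: cudf requires compute capability >= 6.0 (Pascal or newer);
# check the RAPIDS requirements for the version you install.
MIN_CUDF_COMPUTE_CAPABILITY = (6, 0)


def meets_cudf_requirement(cc, minimum=MIN_CUDF_COMPUTE_CAPABILITY):
    """cc is a (major, minor) tuple, e.g. (3, 7) for a Tesla K80."""
    return tuple(cc) >= minimum


def current_gpu_compute_capability():
    """Return (major, minor) of the current CUDA device, or None if unavailable."""
    try:
        from numba import cuda
    except ImportError:
        return None
    if not cuda.is_available():
        return None
    return cuda.get_current_device().compute_capability


if __name__ == "__main__":
    cc = current_gpu_compute_capability()
    if cc is None:
        print("No CUDA device visible to numba")
    else:
        status = "OK" if meets_cudf_requirement(cc) else "too old for cudf"
        print(f"compute capability {cc[0]}.{cc[1]}: {status}")
```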

    As for the other parts of pymapd, you are correct that some OmniSci data types don't work with every method. There are various reasons for this, and we're trying to work through them in OmniSciDB, or to work around them when the issue is upstream of us (such as a type not being supported in Apache Arrow).

    Best,
    Randy


  • 6.  RE: Using GPU on AWS AMI

    Posted 05-28-2019 06:01
    Actually, I'm not sure it's the null values' fault, as too many basic DataFrame manipulations end up crashing everything. I only tested quickly, since I'm pretty sure the Python code isn't at fault, so I didn't really go deeper than dropping all columns/rows with NaN values to "solve" the issue in some cases.

    As for the lines you used, the first one failed immediately on my machine:
    >>> df = conn.select_ipc_gpu("select case when uniquecarrier = 'OH' then null else uniquecarrier end, avg(depdelay) from flights_2008_10K group by 1")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/pymapd/connection.py", line 306, in select_ipc_gpu
        df = _parse_tdf_gpu(tdf)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/pymapd/_parsers.py", line 208, in _parse_tdf_gpu
        for k, v in reader.to_dict().items():
      File "cudf/bindings/gpuarrow.pyx", line 221, in cudf.bindings.gpuarrow.GpuArrowReader.to_dict
      File "cudf/bindings/gpuarrow.pyx", line 165, in cudf.bindings.gpuarrow.GpuArrowNodeReader.make_series
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/series.py", line 83, in __init__
        dtype=dtype)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/columnops.py", line 235, in as_column
        data = data.set_mask(mask)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/column.py", line 245, in set_mask
        return self.replace(mask=mask, null_count=null_count)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/numerical.py", line 49, in replace
        return super(NumericalColumn, self).replace(**kwargs)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/column.py", line 358, in replace
        return type(self)(**params)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/numerical.py", line 43, in __init__
        super(NumericalColumn, self).__init__(**kwargs)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/columnops.py", line 37, in __init__
        super(TypedColumnBase, self).__init__(**kwargs)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/column.py", line 140, in __init__
        self._update_null_count(null_count)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/column.py", line 148, in _update_null_count
        size=len(self)
      File "cudf/bindings/cudf_cpp.pyx", line 367, in cudf.bindings.cudf_cpp.count_nonzero_mask
      File "cudf/bindings/cudf_cpp.pyx", line 377, in cudf.bindings.cudf_cpp.count_nonzero_mask
    RuntimeError: CUDA error encountered at: /conda/envs/gdf/conda-bld/libcudf_1558047478465/work/cpp/src/bitmask/bitmask_ops.cu:147: 48 cudaErrorNoKernelImageForDevice no kernel image is available for execution on the device
    and the second one failed at the print step:
    >>> df = con.select_ipc_gpu("select cast(nasdelay/100 as integer),count() from flights_2008_10K group by 1")
    >>> print(df)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/dataframe.py", line 441, in __str__
        return self.to_string(nrows=nrows, ncols=ncols)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/dataframe.py", line 430, in to_string
        cols[h] = self[h].values_to_string(nrows=nrows)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/series.py", line 341, in values_to_string
        values = self[:nrows]
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/series.py", line 287, in __getitem__
        col = self._column[arg]         # slice column
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/column.py", line 410, in __getitem__
        col = self.replace(data=subdata, mask=submask)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/numerical.py", line 49, in replace
        return super(NumericalColumn, self).replace(**kwargs)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/column.py", line 358, in replace
        return type(self)(**params)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/numerical.py", line 43, in __init__
        super(NumericalColumn, self).__init__(**kwargs)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/columnops.py", line 37, in __init__
        super(TypedColumnBase, self).__init__(**kwargs)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/column.py", line 140, in __init__
        self._update_null_count(null_count)
      File "/home/centos/anaconda3/envs/cuda_env/lib/python3.7/site-packages/cudf/dataframe/column.py", line 148, in _update_null_count
        size=len(self)
      File "cudf/bindings/cudf_cpp.pyx", line 367, in cudf.bindings.cudf_cpp.count_nonzero_mask
      File "cudf/bindings/cudf_cpp.pyx", line 377, in cudf.bindings.cudf_cpp.count_nonzero_mask
    RuntimeError: CUDA error encountered at: /conda/envs/gdf/conda-bld/libcudf_1558047478465/work/cpp/src/bitmask/bitmask_ops.cu:147: 48 cudaErrorNoKernelImageForDevice no kernel image is available for execution on the device
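Both tracebacks end in cudf's count_nonzero_mask, reached from _update_null_count while a column is being built with a null mask. So the crash happens whenever cudf needs a GPU kernel to count nulls, which is consistent with head() working only on data without nulls. For illustration only, here is a CPU sketch of that computation, assuming the Apache Arrow validity bitmap layout (one bit per row, least-significant bit first, 1 = valid); it is not cudf's actual code, and `null_count` is a hypothetical name:

```python
def null_count(bitmask: bytes, size: int) -> int:
    """Count nulls in the first `size` rows of an Arrow-style validity bitmap.

    Row i is stored in byte i // 8 at bit position i % 8 (LSB first);
    a set bit means the row is valid, a cleared bit means it is null.
    """
    if len(bitmask) * 8 < size:
        raise ValueError("bitmask too short for the requested size")
    valid = 0
    for i in range(size):
        if (bitmask[i // 8] >> (i % 8)) & 1:
            valid += 1
    return size - valid
```

For example, null_count(bytes([0b00001011]), 4) counts one null (row 2).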


    I have the default OmniSci AMI, i.e. CentOS with CUDA 10.1 and a Tesla K80 card, where I added pymapd, pyspark and pyyaml with pip (and later the same with conda, since it didn't work).
    Before switching to this machine, I had an Ubuntu 18.04 instance with a Tesla M60 card (also on AWS), where I tried CUDA 10.0 and 9.2 alternately, with both PyPI and conda. I ended up with the same errors on similar, yet simpler, queries on my data.

    I'm not really going deep into debugging, since I think the problem is a wrong initial setup, but I don't know what I could have done wrong, as I have done almost nothing on this new machine.


    EDIT: OK, so I guess I absolutely need to upgrade the GPU.
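Both GPUs tried in this thread fall below the Pascal requirement: the Tesla K80 (p2) is Kepler and the Tesla M60 (g3) is Maxwell. A minimal sketch for picking an instance family, assuming the GPU-per-family mapping below (as of mid-2019) and a minimum compute capability of 6.0 for cudf; the table and helper names are illustrative:

```python
# Assumptions: GPU per AWS instance family as of mid-2019, and cudf
# requiring compute capability >= 6.0 (Pascal or newer).
AWS_GPU_FAMILIES = {
    "p2": ("Tesla K80", (3, 7)),   # Kepler
    "g3": ("Tesla M60", (5, 2)),   # Maxwell
    "p3": ("Tesla V100", (7, 0)),  # Volta
}
MIN_CUDF_CC = (6, 0)


def family_of(instance_type: str) -> str:
    """Map an instance type such as 'p2.xlarge' to its family, 'p2'."""
    return instance_type.split(".", 1)[0]


def cudf_capable_families(families=AWS_GPU_FAMILIES, minimum=MIN_CUDF_CC):
    """Return the instance families whose GPU meets the minimum capability."""
    return sorted(f for f, (_, cc) in families.items() if cc >= minimum)
```

With this table, cudf_capable_families() leaves only "p3" among the three families.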


  • 7.  RE: Using GPU on AWS AMI

    Posted 05-28-2019 07:26
    Edited by Candido Dessanti 05-28-2019 23:19
    I'll have to try on an AWS p2 instance, then.
    Somewhere at home I have a Maxwell card, but it isn't supported by the database.