How to use NvLink in omnisci-server?


We are testing TPC-H queries on two RTX 2080 Ti GPUs connected with NVLink.

We set the utilization counters to count specific NVLink transactions with ‘nvidia-smi nvlink -sc 0bz’, then watched the link utilization counters with ‘watch -n 1 nvidia-smi nvlink -g 0’. However, while running the TPC-H queries on our omnisci-server, there appeared to be no transactions between the GPUs:
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-add48b1f-7e86-b076-36ed-19c21f43cc78)
Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes
Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-7ba62428-ecac-9bd9-a546-40e2a748daed)
Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes
Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes
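
For anyone who wants to check these counters from a script rather than by eye, here is a minimal sketch that parses output in the format shown above and totals the traffic. The function names (`parse_nvlink_counters`, `total_kbytes`) are hypothetical, and the regular expressions assume exactly the layout pasted in this post; other driver versions may print something different.

```python
import re

# Sample text copied from the output above; replace with live output as needed.
SAMPLE = """\
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-add48b1f-7e86-b076-36ed-19c21f43cc78)
Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes
Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-7ba62428-ecac-9bd9-a546-40e2a748daed)
Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes
Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes
"""

# Assumed line formats, based only on the output pasted in this thread.
GPU_RE = re.compile(r"^\s*GPU (\d+):")
LINK_RE = re.compile(r"^\s*Link (\d+): Rx0: (\d+) KBytes, Tx0: (\d+) KBytes")


def parse_nvlink_counters(text):
    """Return {gpu_index: {link_index: (rx_kbytes, tx_kbytes)}}."""
    counters = {}
    current_gpu = None
    for line in text.splitlines():
        gpu_match = GPU_RE.match(line)
        if gpu_match:
            current_gpu = int(gpu_match.group(1))
            counters[current_gpu] = {}
            continue
        link_match = LINK_RE.match(line)
        if link_match and current_gpu is not None:
            link = int(link_match.group(1))
            rx, tx = int(link_match.group(2)), int(link_match.group(3))
            counters[current_gpu][link] = (rx, tx)
    return counters


def total_kbytes(counters):
    """Sum Rx and Tx over every link of every GPU."""
    return sum(rx + tx for links in counters.values() for rx, tx in links.values())


if __name__ == "__main__":
    parsed = parse_nvlink_counters(SAMPLE)
    print("Total NVLink traffic (KBytes):", total_kbytes(parsed))
```

To run it against live counters, the text could come from `subprocess.run(["nvidia-smi", "nvlink", "-g", "0"], capture_output=True, text=True).stdout` instead of `SAMPLE`. Comparing the total before and after a query shows whether any peer traffic occurred.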

I can’t find any setup or optimization options for this in the documentation.
Can I use NVLink in omnisci-server?
Thank you.

Hi @wnn156,

I’m not sure whether P2P transfers are used in the software, but I can’t think of a specific scenario where they would increase performance dramatically.
The CPU and GPU memory pools are caches for data that resides in the storage layer (disk).
If a chunk of data that lives in the CPU’s pool is needed, it is transferred over the PCIe bus as a host-to-device copy rather than a device-to-device one.
Also, the reductions are done mainly by the CPU, so a P2P transfer is unlikely to happen.

Naturally, if the GPUs are connected to the host through an NVLink bus, NVLink will be used to transfer data between the host and the devices and vice versa.
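
To illustrate the point about reductions, here is a conceptual sketch of the pattern described above: each GPU produces a partial aggregate, the partials are copied to the host, and the CPU merges them. This is not OmniSci's actual code. Plain Python functions stand in for the GPU kernels, and every name here (`partial_sum_by_key`, `reduce_on_host`, the sample rows) is hypothetical.

```python
def partial_sum_by_key(rows):
    """Stand-in for a per-GPU kernel: aggregate only the rows held by one device."""
    partial = {}
    for key, value in rows:
        partial[key] = partial.get(key, 0) + value
    return partial


def reduce_on_host(partials):
    """Merge the per-device partial results on the CPU."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return merged


# Hypothetical data split across two devices.
gpu0_rows = [("A", 10), ("B", 5), ("A", 1)]
gpu1_rows = [("B", 7), ("C", 3)]

# Each device works independently; only its partial result travels to the host.
partials = [partial_sum_by_key(gpu0_rows), partial_sum_by_key(gpu1_rows)]
result = reduce_on_host(partials)

if __name__ == "__main__":
    print(result)
```

In this pattern the only transfers are host to device (input chunks) and device to host (partial results), so the GPU-to-GPU link counters would stay at zero.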

Echoing what @candido.dessanti said, we currently don’t make use of NVLink for peer-to-peer GPU connectivity. NVLink would primarily be useful for intra/inter-GPU reductions. A few months ago we moved all our reduction code to LLVM, which enables us to run reductions on the GPU (we currently do this with GPU shared memory). This opens the door to using NVLink, which is on our roadmap. But first we’re looking at achieving maximum throughput from host to GPU by making better use of CUDA streams, etc.