This part of the lab is presented as a Hands-on Labs Interactive Simulation. This will allow you to experience steps which are too time-consuming or resource intensive to do live in the lab environment. In this simulation, you can use the software interface as if you are interacting with a live environment.
The orange boxes show where to click, and the left and right arrow keys can also be used to move through the simulation in either direction.
In this lab there will be a lot of typing in a PuTTY window. Each command that needs to be typed is highlighted in blue.
This iSim module walks you through running a TensorFlow job on a VM that has no GPU, using the GPU on another VM. With the Bitfusion FlexDirect resource scheduler running on a GPU server VM, a client VM can run a TensorFlow job that uses the GPU on the remote GPU server VM. The GPU VM is configured with an NVIDIA P100 in Passthrough mode (for more information on configuring a GPU in Passthrough mode, refer to Module 3: Using GPUs in Passthrough Mode).
Start Bitfusion FlexDirect Resource Scheduler on a GPU VM
- Note the bf-gpuvm-01 VM's IP address, 172.16.31.181
- Click on the PuTTY icon in the menu bar at the bottom of the screen
- Type the VM's IP address, 172.16.31.181, in the Host Name box
- Click on Open
- Log on to the VM with:
- Username = root
- Password = (no password; just press Enter)
Note: if the text stops, hit the tab key for command autocompletion
We will verify that the NVIDIA GPU is enabled on this VM
Type nvidia-smi and press Enter
We now need to start the FlexDirect Resource Scheduler on this VM
Type nohup flexdirect resource_scheduler &> /dev/null & and press Enter
- Click on X to close the PuTTY window
- Click OK to end your PuTTY session
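The scheduler start command above combines three shell features: nohup keeps the process alive after the terminal session closes, the redirection to /dev/null discards its output, and the trailing & runs it in the background. A minimal sketch of the same pattern follows. It uses sleep 30 as a hypothetical stand-in for the scheduler (an assumption for illustration, so it can be tried on any Linux machine) and the portable > /dev/null 2>&1 spelling of bash's &> /dev/null.

```shell
# Placeholder for a long-running service. `sleep 30` is NOT the FlexDirect
# scheduler; it only stands in for any process that should outlive the login.
nohup sleep 30 > /dev/null 2>&1 &

# $! holds the PID of the most recently started background job.
job_pid=$!

# `kill -0` sends no signal; it only reports whether the process exists.
if kill -0 "$job_pid" 2>/dev/null; then
    job_state="running"
else
    job_state="not running"
fi
echo "background job is $job_state"

# Clean up the placeholder process.
kill "$job_pid"
```

Checking the PID this way is one simple way to confirm that a background process actually started before closing the terminal.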
Configure the CPU VM to point to the GPU VM
- Use the scroll bar to find the bf-cpuvm-01 VM in the Bitfusion-CPU-VMs resource pool
- Click on bf-cpuvm-01
- Click on ACTIONS
- Click on Power
- Click on Power On
- Take note of this VM's IP address, 172.16.31.185
- Click on the PuTTY icon in the menu bar at the bottom of the screen
- Type the VM's IP address, 172.16.31.185, in the Host Name box
- Click on Open
- Log on to the VM with:
- Username = root
- Password = (no password; just press Enter)
Note: if the text stops, hit the tab key for command autocompletion
Verify that the CPU VM can see the GPU on the remote GPU VM
Type nvidia-smi and press Enter, to show there is no NVIDIA controller installed on this VM
Type echo '172.16.31.181' > /etc/bitfusionio/servers.conf and press Enter, to add the GPU VM to the servers.conf file so it can be used by the CPU VM
Type cat /etc/bitfusionio/servers.conf and press Enter, to verify that the GPU VM is listed in the server configuration
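The two commands above overwrite servers.conf with a single line and then print it back. The sketch below repeats that write-then-verify pattern against a temporary file rather than /etc/bitfusionio/servers.conf, so it can be tried without root access or a FlexDirect installation. The variable names and the temporary path are assumptions for illustration.

```shell
gpu_server_ip='172.16.31.181'   # GPU VM address used in this lab
conf_file="$(mktemp)"           # stand-in for /etc/bitfusionio/servers.conf

# '>' truncates the file first, so it ends up holding exactly one line.
echo "$gpu_server_ip" > "$conf_file"

# -x matches whole lines only, -F treats the address as a literal string
# (otherwise each '.' would match any character), -q suppresses output.
if grep -qxF "$gpu_server_ip" "$conf_file"; then
    conf_check="listed"
else
    conf_check="missing"
fi
echo "GPU server $gpu_server_ip is $conf_check in $conf_file"
```

Note that > replaces the whole file, whereas >> would append a line to it.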
Run a TensorFlow ML Workload on the CPU VM
Type flexdirect list_gpus and press Enter, to confirm the CPU VM can see the GPU on the GPU VM
Type flexdirect run -n 1 nvidia-smi and press Enter, to verify the CPU VM can use the GPU on the GPU VM
Type flexdirect run -n 1 -- python /m1_data/nzhang/benchmarks/scripts/tf_cnn_benchmarks.py and press Enter, to execute the TensorFlow ML Workload
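The three commands in this section build on one another, so when scripting them it helps to stop at the first failure. A rough sketch of chaining them is shown below. It uses only the flexdirect invocations from the steps above. The function and variable names, the guard that skips everything when flexdirect is not on the PATH, and the expectation that flexdirect returns a non-zero exit status on failure are all assumptions added for illustration.

```shell
run_remote_gpu_checks() {
    # Hypothetical guard: skip cleanly on machines without the FlexDirect client.
    if ! command -v flexdirect > /dev/null 2>&1; then
        checks_status="skipped"
        return 0
    fi

    # && runs each step only if the previous one exited with status 0
    # (assumes flexdirect signals failure through its exit status).
    if flexdirect list_gpus &&
       flexdirect run -n 1 nvidia-smi &&
       flexdirect run -n 1 -- python /m1_data/nzhang/benchmarks/scripts/tf_cnn_benchmarks.py; then
        checks_status="passed"
    else
        checks_status="failed"
    fi
}

run_remote_gpu_checks
echo "remote GPU checks: $checks_status"
```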
To return to the lab, click the link in the top right corner or close this browser tab.