Channel: Intel Developer Zone Blogs

Working with Mellanox* InfiniBand Adapter on System with Intel® Xeon Phi™ Coprocessors


InfiniBand* is a network communications protocol commonly used in high-performance computing (HPC) because it offers very high throughput and low latency. Intel and Mellanox* are among the most popular InfiniBand* adapter manufacturers. In this blog, I share my experience installing and testing Mellanox* InfiniBand* adapter cards on systems containing Intel® Xeon Phi™ coprocessors, using three different versions of OFED* (OpenFabrics Enterprise Distribution): OpenFabrics OFED-1.5.4.1, OpenFabrics OFED-3.5.2-mic, and Mellanox* OFED 2.1.

To allow native applications on the coprocessors to communicate with the Mellanox* InfiniBand adapters, the Coprocessor Communication Link (CCL) must be enabled. All three OFED stacks mentioned above support CCL when used with Mellanox* InfiniBand adapters.

1. Hardware Installation

Two systems were used, each equipped with an Intel® Xeon® E5-2670 2.60 GHz processor and two Intel® Xeon Phi™ coprocessors. Both systems ran RHEL 6.3, had Gigabit Ethernet adapters, and were connected through a Gigabit Ethernet router.

Before the test, both systems were powered off and one Mellanox* ConnectX-3 VPI InfiniBand adapter card was installed in an empty PCIe slot in each machine. Since there were only two systems, the adapter ports were connected directly with an InfiniBand cable, with no intervening switch (a back-to-back connection). After the two systems were powered up, the “lspci” command was run on each system to check that the Mellanox* InfiniBand adapter card was correctly detected:

# lspci | grep Mellanox
84:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

The first field of the output (84:00.0) is the PCI address of the device in bus:device.function form; the second field (Network controller) is the device class; the remainder identifies the device, which in this case is the Mellanox* ConnectX-3 InfiniBand adapter (MT27500 family).
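To make the field layout concrete, the Python sketch below splits one line of default lspci output into its bus, device, function, class, and description. It assumes the default, domain-less output format shown above; parse_lspci_line and find_mellanox are hypothetical helpers, not part of pciutils, MPSS, or OFED.

```python
import re
from typing import List, NamedTuple


class PciDevice(NamedTuple):
    bus: str
    device: str
    function: str
    device_class: str
    description: str


# Default lspci line: "BB:DD.F Class: Vendor Device" (the PCI domain is omitted).
LSPCI_LINE = re.compile(
    r"^(?P<bus>[0-9a-f]{2}):(?P<device>[0-9a-f]{2})\.(?P<function>[0-7])\s+"
    r"(?P<device_class>[^:]+):\s+(?P<description>.+)$"
)


def parse_lspci_line(line: str) -> PciDevice:
    """Split one line of default `lspci` output into its fields."""
    match = LSPCI_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized lspci line: {line!r}")
    return PciDevice(**match.groupdict())


def find_mellanox(lspci_output: str) -> List[PciDevice]:
    """Parse only the lines that mention Mellanox, like `lspci | grep Mellanox`."""
    return [parse_lspci_line(line) for line in lspci_output.splitlines() if "Mellanox" in line]


if __name__ == "__main__":
    sample = "84:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]"
    print(find_mellanox(sample))
```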

2. Installing MPSS with OFED Support

After the Mellanox* adapters were installed, each of the three OFED stacks that MPSS 3.3 supports for use with Mellanox* InfiniBand adapters (OFED-1.5.4.1, OFED-3.5.2-mic, and Mellanox* OFED 2.1) was installed following the directions in the MPSS User’s Guide shipped with MPSS 3.3. Each OFED stack was installed on a different hard drive, so the versions could be verified independently without completely uninstalling the previous version between tests.

2.1 OFED 1.5.4.1

  • Download OFED-1.5.4.1 from www.openfabrics.org, following the steps described in section 2.3 of the User’s Guide. The zlib-devel and tcl-devel packages are required to install OFED-1.5.4.1.

  • Install the base MPSS 3.3 packages as described in section 2.2 of the readme file.

  • Install the Intel MPSS OFED packages from the mpss-3.3/ofed folder. The signature warning shown in the output below is expected and can be ignored.

# cd mpss-3.3
# cp ofed/modules/*`uname -r`*.rpm ofed
# rpm -Uvh ofed/*.rpm
warning: ofed/ofed-ibpd-3.3-r0.glibc2.12.2.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID d536787c: NOKEY
Preparing...             ########################################### [100%]
1:dapl                   ########################################### [ 11%]
2:ofed-ibpd              ########################################### [ 22%]
3:ofed-driver-2.6.32-279.
                         ########################################### [ 33%]
4:libibscif              ########################################### [ 44%]
5:libibscif-devel        ########################################### [ 56%]
6:ofed-driver-devel-2.6.3
                         ########################################### [ 67%]
7:dapl-devel             ########################################### [ 78%]
8:dapl-utils             ########################################### [ 89%]
9:dapl-devel-static      ########################################### [100%]
  • Reboot the system
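The backquoted `uname -r` in the cp command expands to the running kernel release, so only the kernel-module RPMs built for that kernel are copied into ofed/ before installation. As a hypothetical illustration, the same selection can be written in Python; the package names in the usage example are assumed, modeled on the ofed-driver names that appear elsewhere in this post.

```python
import fnmatch
import platform
from typing import Iterable, List


def select_module_rpms(filenames: Iterable[str], kernel_release: str) -> List[str]:
    """Keep the names matched by the shell glob *<kernel_release>*.rpm.

    This mirrors the cp command above: only kernel-module packages built for
    the given kernel release are selected.
    """
    pattern = f"*{kernel_release}*.rpm"
    # fnmatchcase is case-sensitive, like the shell glob on Linux.
    return sorted(name for name in filenames if fnmatch.fnmatchcase(name, pattern))


def running_kernel_release() -> str:
    """On Linux this is the same string that `uname -r` prints."""
    return platform.release()
```

For example, with the hypothetical names ofed-driver-2.6.32-279.el6.x86_64-3.3-1.x86_64.rpm and ofed-driver-2.6.32-358.el6.x86_64-3.3-1.x86_64.rpm, only the first is kept when the running kernel is 2.6.32-279.el6.x86_64.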

2.2 OFED 3.5.2-MIC

Follow the instructions in User’s Guide Section 2.4 to install OFED-3.5.2-MIC from https://www.openfabrics.org/downloads/ofed-mic/ofed-3.5-2-mic/, then reboot the system. Unlike the OFED-1.5.4.1 installation, this one does not require installing the additional packages in mpss-3.3/ofed/.

2.3 Mellanox* OFED 2.1

Follow the instructions in User’s Guide Section 2.5 to install Mellanox* OFED 2.1.x:

  • From www.mellanox.com, navigate to Products > Software > InfiniBand VPI Driver, and download the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) package: MLNX_OFED_LINUX-2.1-1.0.6-rhel6.3-x86_64.tgz

  • Untar the package and read the documentation:

    # tar xvf MLNX_OFED_LINUX-2.1-1.0.6-rhel6.3-x86_64.tgz
    # cd MLNX_OFED_LINUX-2.1-1.0.6-rhel6.3-x86_64

  • Install the tcl, tk, and libnl-devel packages from the RHEL installation disk.

  • Install the stack:

    # ./mlnxofedinstall
  • Install the Intel MPSS OFED ibpd RPM:

    # rpm -U mpss-3.3/ofed/ofed-ibpd*.rpm
  • Rebuild the dapl, libibscif, and ofed-driver source RPMs from the mpss-3.3/src folder:

    # rpmbuild --rebuild --define "MOFED 1" mpss-3.3/src/dapl*.src.rpm mpss-3.3/src/libibscif*.src.rpm mpss-3.3/src/ofed-driver*.src.rpm
  • Install the resulting RPMs, which are now in $HOME/rpmbuild/RPMS/x86_64:

    # ls $HOME/rpmbuild/RPMS/x86_64
    dapl-2.0.42.2-1.el6.x86_64.rpm
    dapl-devel-2.0.42.2-1.el6.x86_64.rpm
    dapl-devel-static-2.0.42.2-1.el6.x86_64.rpm
    dapl-utils-2.0.42.2-1.el6.x86_64.rpm
    libibscif-1.0.0-1.el6.x86_64.rpm
    libibscif-devel-1.0.0-1.el6.x86_64.rpm
    ofed-driver-2.6.32-279.el6.x86_64-3.3-1.x86_64.rpm
    ofed-driver-devel-2.6.32-279.el6.x86_64-3.3-1.x86_64.rpm
    # rpm -U $HOME/rpmbuild/RPMS/x86_64/*.rpm
  • Reboot the system
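In the rpmbuild step, the options must be typed with plain double hyphens and straight quotes, and "MOFED 1" has to reach rpmbuild as a single argument (it defines the MOFED macro with the value 1). As a sketch, assuming the mpss-3.3 directory layout shown above, the command can be built as an argument list in Python so that no shell quoting is involved; find_srpms and rpmbuild_mofed_command are hypothetical helper names.

```python
import glob
import os
from typing import List, Sequence


def find_srpms(mpss_dir: str) -> List[str]:
    """Collect the dapl, libibscif and ofed-driver source RPMs from <mpss_dir>/src."""
    found: List[str] = []
    for prefix in ("dapl", "libibscif", "ofed-driver"):
        pattern = os.path.join(mpss_dir, "src", f"{prefix}*.src.rpm")
        found.extend(sorted(glob.glob(pattern)))
    return found


def rpmbuild_mofed_command(srpms: Sequence[str]) -> List[str]:
    """Build the argv for rebuilding the source RPMs against Mellanox OFED.

    "MOFED 1" is a single argument following --define; passing a list to
    subprocess keeps it intact without any shell quoting.
    """
    if not srpms:
        raise ValueError("no source RPMs to rebuild")
    return ["rpmbuild", "--rebuild", "--define", "MOFED 1", *srpms]
```

Run from the directory that contains mpss-3.3, `subprocess.run(rpmbuild_mofed_command(find_srpms("mpss-3.3")), check=True)` would perform the same rebuild as the rpmbuild command above.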

3. Starting MPSS with OFED Support

The process for starting the MPSS with OFED support is the same for all the OFED versions discussed here. This section describes the steps used to bring up the MPSS and all InfiniBand-related services. Beforehand, password-less SSH login to the coprocessors was set up following the instructions in section 2.5 of the MPSS readme file. The following steps were performed before testing each version of OFED:

First, if the MPSS service is not running, start it:

# service mpss start
Starting Intel(R) MPSS:                                    [  OK  ]
mic0: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic1: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)

Next, start the InfiniBand and HCA service:

# service openibd start
Starting psmd:                                             [  OK  ]
Setting up InfiniBand network interfaces:                  [  OK  ]
No configuration found for ib0
Setting up service network . . .                           [  done  ]

On one (and only one) of the systems, start the subnet manager for the InfiniBand network:

# service opensmd start
Starting IB Subnet Manager.                                [  OK  ]

Start the ibscif virtual adapter for the coprocessors:

# service ofed-mic start
Starting OFED Stack:
host                                                       [  OK  ]
mic0                                                       [  OK  ]
mic1                                                       [  OK  ]

Finally, start the CCL-proxy service:

# service mpxyd start
Starting mpxyd daemon:                                     [  OK  ]
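The ordering above (MPSS, openibd, the subnet manager on one host only, ofed-mic, then mpxyd) can be captured in a small Python sketch so that it is repeated identically for each OFED stack. This is an illustration under stated assumptions, not an Intel or Mellanox tool: the helper names are hypothetical, and unlike the first step above it does not check whether MPSS is already running before starting it.

```python
import subprocess
from typing import List

# Service order from the steps above. opensmd is started on one host only.
BEFORE_SM = ["mpss", "openibd"]
AFTER_SM = ["ofed-mic", "mpxyd"]


def start_commands(run_subnet_manager: bool) -> List[List[str]]:
    """Return the `service <name> start` commands in the order used above."""
    services = list(BEFORE_SM)
    if run_subnet_manager:
        services.append("opensmd")
    services.extend(AFTER_SM)
    return [["service", name, "start"] for name in services]


def start_all(run_subnet_manager: bool, dry_run: bool = True) -> None:
    """Print each command; unless dry_run, also run it and stop on the first failure."""
    for cmd in start_commands(run_subnet_manager):
        print("#", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

On the host chosen to run the subnet manager, `start_all(run_subnet_manager=True, dry_run=False)` would issue all five commands; on the other host, `run_subnet_manager=False` skips opensmd.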

4. Basic Testing

For each version of OFED tested, after the MPSS was started with OFED support as described in section 3, the InfiniBand devices on each host were queried with the “ibv_devinfo” command. The output from one of the hosts lists a virtual device, scif0, and a physical adapter, mlx4_0:

# ibv_devinfo
hca_id: scif0
        transport:                      iWARP (1)
        fw_ver:                         0.0.1
        node_guid:                      4c79:baff:fe14:0033
        sys_image_guid:                 4c79:baff:fe14:0033
        vendor_id:                      0x8086
        vendor_part_id:                 0
        hw_ver:                         0x1
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               1000
                        port_lmc:               0x00
                        link_layer:             Ethernet
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.30.8000
        node_guid:                      f452:1403:007c:bd30
        sys_image_guid:                 f452:1403:007c:bd33
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       MT_1100120019
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             2048 (4)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand
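When this check is repeated for every OFED stack, host, and coprocessor, it can help to parse the ibv_devinfo output instead of reading it by eye. The sketch below is a hypothetical helper, not part of OFED, that groups the "key: value" lines under each hca_id. It merges port fields into the device entry, which is only adequate for single-port devices such as the ones shown here; the same function could be applied to the output of `ssh mic0 ibv_devinfo`.

```python
from typing import Dict, List, Optional


def parse_ibv_devinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Map each hca_id in `ibv_devinfo` output to its "key: value" fields.

    Device and port fields are merged into one dictionary per device, which is
    adequate for single-port devices like those shown above.
    """
    devices: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank or non key/value line
        key, value = key.strip(), value.strip()
        if key == "hca_id":
            current = devices.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return devices


def active_mtu(fields: Dict[str, str]) -> int:
    """Numeric active MTU, e.g. 2048 from "2048 (4)"."""
    return int(fields["active_mtu"].split()[0])


def active_devices(devices: Dict[str, Dict[str, str]]) -> List[str]:
    """Names of devices whose port state is PORT_ACTIVE."""
    return sorted(name for name, f in devices.items() if f.get("state", "").startswith("PORT_ACTIVE"))
```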

Similarly, running ibv_devinfo on coprocessors mic0 and mic1 of each system produced the following output. Note that the physical adapter mlx4_0 appears on the coprocessors because CCL is enabled:

# ssh mic0 ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.30.8000
        node_guid:                      f452:1403:007c:bd30
        sys_image_guid:                 f452:1403:007c:bd33
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             2048 (4)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand

hca_id: scif0
        transport:                      SCIF (2)
        fw_ver:                         0.0.1
        node_guid:                      4c79:baff:fe14:0032
        sys_image_guid:                 4c79:baff:fe14:0032
        vendor_id:                      0x8086
        vendor_part_id:                 0
        hw_ver:                         0x1
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               1001
                        port_lmc:               0x00
                        link_layer:             SCIF

# ssh mic1 ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.30.8000
        node_guid:                      f452:1403:007c:bd30
        sys_image_guid:                 f452:1403:007c:bd33
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             2048 (4)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand

hca_id: scif0
        transport:                      SCIF (2)
        fw_ver:                         0.0.1
        node_guid:                      4c79:baff:fe1a:03da
        sys_image_guid:                 4c79:baff:fe1a:03da
        vendor_id:                      0x8086
        vendor_part_id:                 0
        hw_ver:                         0x1
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               1002
                        port_lmc:               0x00
                        link_layer:             SCIF

The presence of the InfiniBand HCA (mlx4_0) and the virtual SCIF adapter (scif0) in /sys/class/infiniband was verified on the host and on both coprocessors:

# ls /sys/class/infiniband
mlx4_0  scif0

# ssh mic0 ls /sys/class/infiniband
mlx4_0
scif0

# ssh mic1 ls /sys/class/infiniband
mlx4_0
scif0
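The same sysfs check can be scripted. The sketch below is a hypothetical helper that reports which expected device names have no entry under /sys/class/infiniband; it assumes that an absent directory (for example, when the InfiniBand core modules are not loaded) means every expected device is missing. On the host above, `missing_ib_devices(["mlx4_0", "scif0"])` would return an empty list.

```python
import os
from typing import Iterable, List


def missing_ib_devices(expected: Iterable[str], sysfs_dir: str = "/sys/class/infiniband") -> List[str]:
    """Return the expected RDMA device names that have no entry in sysfs_dir."""
    try:
        present = set(os.listdir(sysfs_dir))
    except FileNotFoundError:
        # Assumed meaning: no InfiniBand class directory, so nothing is registered.
        present = set()
    return sorted(set(expected) - present)
```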

To display all InfiniBand host nodes, the command “ibhosts” was used. In this case, it shows two hosts, knightscorner5 and knightscorner7:

# ibhosts
Ca      : 0xf4521403007d2b90 ports 1 "knightscorner5 HCA-1"
Ca      : 0xf4521403007cbd30 ports 1 "knightscorner7 HCA-1"
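Note that ibhosts reports node GUIDs (ending in 2b90 and bd30 here), whereas the iblinkinfo, ibstat, and ibping steps work with port GUIDs. A minimal, hypothetical parser for the ibhosts lines, assuming the exact format shown above, could look like this:

```python
import re
from typing import List, NamedTuple


class IbHost(NamedTuple):
    node_guid: int
    ports: int
    description: str


# Matches lines such as: Ca      : 0xf4521403007d2b90 ports 1 "knightscorner5 HCA-1"
IBHOSTS_LINE = re.compile(r'^Ca\s*:\s*(0x[0-9a-f]+)\s+ports\s+(\d+)\s+"([^"]*)"')


def parse_ibhosts(text: str) -> List[IbHost]:
    """Return one IbHost per channel-adapter line of `ibhosts` output."""
    hosts = []
    for line in text.splitlines():
        match = IBHOSTS_LINE.match(line.strip())
        if match:
            guid, ports, desc = match.groups()
            hosts.append(IbHost(int(guid, 16), int(ports), desc))
    return hosts
```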

The “iblinkinfo” command was used to display link information. In this case, the output shows one port on knightscorner5 (0xf4521403007d2b91) linked to one port on knightscorner7 (0xf4521403007cbd31):

# iblinkinfo
CA: knightscorner5 HCA-1:
      0xf4521403007d2b91      2    1[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>       1    1[  ] "knightscorner7 HCA-1" ( )
CA: knightscorner7 HCA-1:
      0xf4521403007cbd31      1    1[  ] ==( 4X       14.0625 Gbps Active/  LinkUp)==>       2    1[  ] "knightscorner5 HCA-1" ( )
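The “4X” and “14.0625 Gbps” figures describe four lanes, each signaling at 14.0625 Gb/s (FDR). Multiplying them gives 4 × 14.0625 = 56.25 Gb/s, which corresponds to the “Rate: 56” that ibstat reports for the knightscorner7 port. Because FDR uses 64b/66b encoding, the usable data rate is about 56.25 × 64/66 ≈ 54.5 Gb/s. The hypothetical helper below performs this arithmetic for one iblinkinfo line; the encoding rule in it (8b/10b for SDR, DDR, and QDR; 64b/66b for FDR10 and faster) is stated as an assumption in the code.

```python
import re
from typing import Tuple

# Width and per-lane signaling rate as printed by iblinkinfo, e.g. "4X       14.0625 Gbps".
LINK_RE = re.compile(r"(\d+)X\s+([\d.]+)\s+Gbps")


def link_rates(line: str) -> Tuple[float, float]:
    """Return (signaling rate, data rate) in Gb/s for one iblinkinfo link line."""
    match = LINK_RE.search(line)
    if match is None:
        raise ValueError(f"no link width/speed in {line!r}")
    lanes, lane_gbps = int(match.group(1)), float(match.group(2))
    signaling = lanes * lane_gbps
    # Assumed rule: SDR/DDR/QDR (up to 10 Gb/s per lane) use 8b/10b encoding;
    # FDR10 (10.3125), FDR (14.0625) and EDR use 64b/66b.
    efficiency = 64 / 66 if lane_gbps >= 10.3125 else 8 / 10
    return signaling, signaling * efficiency
```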

The “ibstat” command was used to query the status of the local InfiniBand link only; in this case, it reports the port as Active with a physical state of “LinkUp”:

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.30.8000
        Hardware version: 0
        Node GUID: 0xf4521403007cbd30
        System image GUID: 0xf4521403007cbd33
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251486a
                Port GUID: 0xf4521403007cbd31
                Link layer: InfiniBand
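The Port GUID that ibstat reports on a host is the value to pass to `ibping -G` when that host is the ping target. As a sketch, the hypothetical helper below reads the fields of the first port from ibstat output and checks that the link is up; it assumes the single-port layout shown above.

```python
import re
from typing import Dict

PORT_HEADER = re.compile(r"^Port \d+:$")


def parse_ibstat_port(text: str) -> Dict[str, str]:
    """Collect the "Key: value" lines that follow the first "Port N:" header."""
    fields: Dict[str, str] = {}
    in_port = False
    for raw in text.splitlines():
        line = raw.strip()
        if PORT_HEADER.match(line):
            if in_port:
                break  # stop at a second port, if any
            in_port = True
            continue
        if in_port and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def link_is_up(fields: Dict[str, str]) -> bool:
    """True when the port is Active and its physical state is LinkUp."""
    return fields.get("State") == "Active" and fields.get("Physical state") == "LinkUp"
```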

Finally, the “ibping” utility (the InfiniBand counterpart of the traditional IP ping utility) was used to test connectivity between the InfiniBand nodes. The ibping server must first be started on the system to be pinged (knightscorner5 in this case):

# ibping -S

Then, on knightscorner7, ibping was started with the port GUID of the destination system, knightscorner5 (0xf4521403007d2b91). This is the Port GUID that ibstat reports on knightscorner5, and it also appears in the iblinkinfo output above:

# ibping -G 0xf4521403007d2b91
Pong from knightscorner5.(none) (Lid 2): time 0.106 ms
Pong from knightscorner5.(none) (Lid 2): time 0.105 ms
Pong from knightscorner5.(none) (Lid 2): time 0.135 ms
. . . . . . . . . . . . . . . . . . . . . . . . . . .
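With the three replies shown, the average round trip is (0.106 + 0.105 + 0.135) / 3 ≈ 0.115 ms. For longer runs, a hypothetical helper such as the one below can collect the times from the “Pong from” lines and summarize them; it assumes the reply format shown above and ignores every other line.

```python
import re
from statistics import mean
from typing import List, Tuple

PONG = re.compile(r"^Pong from .*: time ([\d.]+) ms$")


def pong_times_ms(output: str) -> List[float]:
    """Round-trip times, in ms, from the 'Pong from ...' lines of ibping output."""
    times = []
    for line in output.splitlines():
        match = PONG.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return times


def summarize(times: List[float]) -> Tuple[float, float, float]:
    """Return (min, mean, max) round-trip time in ms."""
    if not times:
        raise ValueError("no pong replies found")
    return min(times), mean(times), max(times)
```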

 

This confirmed the InfiniBand connectivity between the two systems.

 

5. Conclusion

This blog briefly showed how Mellanox* InfiniBand adapter cards were installed in two systems and connected back-to-back with a cable. Three different OFED stacks that support coprocessors were then installed with the Intel® MPSS, and a set of standard commands was used to bring up the services needed to enable CCL. Finally, the configuration and connectivity were checked with simple InfiniBand test commands to confirm that the hardware was working.

 

