Machine details: smaug-3

Machine: smaug-3

CPU: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz
Number of cores: 576
NUMA configuration: 16x36

Topology Information

TODO: likwid input here

Pairwise Data

The topology information gives us a rough understanding of the expected performance. We complement this with real measurements conducted on the hardware. For this purpose, we use ''pairwise'' a micro benchmark that ping-pongs messages between any combination of cores.

The benchmark measures the send, receive and roundtrip times, i.e. the time it takes until smlt_qp_send() or smlt_qp_recv() return.

Receive

Send

RTT latencies on Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz

RTT

Message Passing micro benchmark

A comparison of this benchmark can be found on this page.

We now show the results of our micro benchmarks. For reference, see bench/ab-bench in the Smelt directory.

Multicast benchmark

A comparison of this benchmark can be found on this page.

We now show the results of our micro benchmarks for multicasts. For reference, see bench/ab-bench-scale in the Smelt directory.

Showing plot ab.

Showing plot reduction.

Showing plot barriers.

Showing plot agreement.

EPCC benchmark

A comparison of this benchmark can be found on this page.

Execution of the EPCC benchmark with gcc's unmodified OpenMP compared to an instance using Smelt's barrier.