Home/

Machine details: tomme

Machine: tomme

CPU: Intel Core Bloomfield processor
Number of cores: 16
NUMA configuration: 2x8

Topology Information

TODO: likwid input here

Pairwise Data

The topology information gives us a rough understanding of the expected performance. We complement this with real measurements conducted on the hardware. For this purpose, we use ''pairwise'' a micro benchmark that ping-pongs messages between any combination of cores.

The benchmark measures the send, receive and roundtrip times, i.e. the time it takes until smlt_qp_send() or smlt_qp_recv() return.

Send latencies on Intel Core Bloomfield processor
Send
RTT latencies on Intel Core Bloomfield processor
RTT

Message Passing micro benchmark

A comparison of this benchmark can be found on this page.

We now show the results of our micro benchmarks. For reference, see bench/ab-bench in the Smelt directory.

Multicast benchmark

A comparison of this benchmark can be found on this page.

We now show the results of our micro benchmarks for multicasts. For reference, see bench/ab-bench-scale in the Smelt directory.

Showing plot ab.

Showing plot reduction.

Showing plot barriers.

Showing plot agreement.

Collective operations

A comparison of this benchmark can be found on this page.

The following is a benchmark for collective operations in MPI, OpenMP and Smelt.

EPCC benchmark

A comparison of this benchmark can be found on this page.

Execution of the EPCC benchmark with gcc's unmodified OpenMP compared to an instance using Smelt's barrier.

Showing plot csv.

PARSEC Streamcluster

A comparison of this benchmark can be found on this page.

PARSEC Streamcluster solves the online clustering problem. We execute it with various barrier implementations and report the runtime.