Opterons, NUMA, and subdomains, oh my!

Looking for new hardware to run WRF? Intel or AMD? Check this forum.

Post by pattim » Sun Jun 24, 2012 8:45 pm

I have a NUMA Opteron machine, and although I think Linux nowadays is pretty good at keeping data in the right memory for each CPU (48 of them accessing 4 different memory banks), I think it's important to make sure that adjacent computational subdomains (divided up for parallel processing with OpenMPI or whatever) are placed on adjacent processors, so that no data is forced to hop twice to get across a boundary to a logically adjacent subdomain. Can someone please correct me if I'm wrong? I'm trying to get the most out of my 48-core Opteron box.
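
To put a rough number on the placement question, here is a small Python sketch that counts how many subdomain boundaries cross between memory banks for two different rank layouts. Everything in it is an assumption for illustration: an 8x6 decomposition into 48 subdomains, 12 cores per memory bank (48 cores over the 4 banks mentioned above), a 4-point nearest-neighbour halo exchange, and the hypothetical helper names `cross_node_edges`, `row_major`, and `tiled`. It says nothing about how WRF or any MPI launcher actually places ranks.

```python
from itertools import product

PX, PY = 8, 6          # assumed 8 x 6 grid of subdomains (48 ranks)
CORES_PER_NODE = 12    # assumed: 48 cores spread over 4 memory banks


def cross_node_edges(px, py, node_of):
    """Count neighbouring subdomain pairs (4-point stencil) that sit on different NUMA nodes."""
    crossing = 0
    for i, j in product(range(px), range(py)):
        # Look only "down" and "right" so each neighbour pair is counted once.
        for di, dj in ((1, 0), (0, 1)):
            ni, nj = i + di, j + dj
            if ni < px and nj < py and node_of(i, j) != node_of(ni, nj):
                crossing += 1
    return crossing


def row_major(i, j):
    """Ranks numbered row by row and packed onto nodes in rank order."""
    rank = i * PY + j
    return rank // CORES_PER_NODE


def tiled(i, j):
    """Each node owns one 4 x 3 tile of subdomains (nodes arranged 2 x 2)."""
    return (i // 4) * 2 + (j // 3)


row_major_crossings = cross_node_edges(PX, PY, row_major)
tiled_crossings = cross_node_edges(PX, PY, tiled)
print("row-major layout:", row_major_crossings, "cross-node boundaries")
print("tiled layout:    ", tiled_crossings, "cross-node boundaries")
```

Under these assumptions the tiled layout has fewer boundaries crossing between banks than the row-major one, but it also changes which banks talk to each other, so whether it helps depends on which links are the slow ones on the actual machine.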


EDIT: Here's why I'm concerned - the bandwidth between die0 and die3 is half that between adjacent dies! (in the 2P configuration)
http://www.anandtech.com/show/2978/amd- ... ore-xeon/2

More bad news! There are no HT3 x16 links between sockets in the Magny-Cours 4P configuration (but there are in the 2P configuration). I wonder if InfiniBand could be used to "crossbar" the I/O channel's HT3 links between sockets 0->2 and 1->3 in the 4P setup?

You know, like:
Socket0 -> HT3 -> I/O -> HT3 -> Socket2
Socket1 -> HT3 -> I/O -> HT3 -> Socket3

That is probably a fantasy, but it sort of makes me wish my company had purchased two 2P Magny-Cours x12 boxes plus InfiniBand instead of a single 4P Magny-Cours x12 box.

