Is this the world coming full circle and reintroducing the mini-PCs of the '80s and '90s?
I understand, I think, the usefulness of having these in a data centre: each customer gets their own DPU, which presents to them as a bare-metal device.
I understand, I think, the crypto guys loving this for compute power, since it's easily expandable.
I also understand this is not for average consumers... but this is HN... what other uses can we put this to, and what advantages does this physical architecture give us?
If anyone could elucidate...?
It looks exciting, but I'm not sure of the scope.
Nvidia's DPU is based on its Mellanox acquisition
AMD's DPU is based on its Pensando acquisition
Intel has an in-house DPU
Cadence summary: https://community.cadence.com/cadence_blogs_8/b/breakfast-by...
James Hamilton summary: https://perspectives.mvdirona.com/2019/02/aws-nitro-system/
The founder of Annapurna (the team behind Nitro) is now at https://www.lightbitslabs.com/, which created the NVMe-oF (NVMe over Fabrics, i.e. Ethernet or Fibre Channel) standard. This is implemented via a software driver or a hardware accelerator. Both DPUs and NVMe-oF can be viewed as attempts to standardize the "composable data center" architecture pioneered by AWS Nitro.
> but this is HN... what other uses can we put through this / what advantages does this physical architecture give us?
A good place to start is the decade-old NetFPGA SmartNIC research project from the University of Cambridge, now in its 5th generation of hardware, with earlier boards sometimes available on eBay. https://netfpga.org/
> A line-rate, flexible, and open platform for research, and classroom experimentation. More than 3,500 NetFPGA systems have been deployed at over 300 institutions in over 60 countries around the world.
I just recently ran an experiment on the difference between non-Nitro and Nitro-enabled instances (m4.xlarge vs. m5.xlarge) in a production-ish, trendy setup: Ceph running on Kubernetes (managed by Rook), used to back Postgres, benchmarked with pgbench.
The increase in performance was around +45% TPS, just from switching to the Nitro-enabled instance. The absolute TPS wasn't high because the setup was untuned, but simply switching made quite a difference.
On top of all that, the Nitro instance was actually just slightly cheaper than the non-nitro instance.
AWS hit it out of the park with Nitro -- they've supposedly been working on it since 2014, and they certainly built something worth replicating.
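If anyone wants to reproduce a rough version of this, here's a minimal sketch of the comparison (my own framing, not the original setup): it assumes pgbench is installed locally, Postgres is already reachable and seeded with pgbench -i on both instances, and the hostnames, user and sizing flags below are placeholders to swap out.

```python
"""Minimal sketch of the m4-vs-m5 comparison above (my framing, not the
original setup). Assumes pgbench is installed locally, Postgres is reachable
on both instances, and the database was seeded with `pgbench -i`."""
import re
import subprocess

ENDPOINTS = {
    "m4.xlarge (non-Nitro)": "m4-test.example.internal",   # placeholder host
    "m5.xlarge (Nitro)":     "m5-test.example.internal",   # placeholder host
}

def run_pgbench(host: str) -> float:
    """Run a short pgbench and return the TPS it reports."""
    out = subprocess.run(
        ["pgbench", "-h", host, "-U", "postgres",
         "-c", "16", "-j", "4", "-T", "60", "postgres"],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"tps = ([\d.]+)", out)
    return float(match.group(1)) if match else 0.0

results = {name: run_pgbench(host) for name, host in ENDPOINTS.items()}
for name, tps in results.items():
    print(f"{name}: {tps:.1f} TPS")
```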
> Based on Custom Intel® Xeon® Platinum 8175M series processors running at 2.5 GHz, the M5 instances are designed for highly demanding workloads and will deliver 14% better price/performance than the M4 instances on a per-core basis.
If we take AWS at their word here, then all other things being equal, ~30% is still a pretty decent result (with absolutely no tuning) -- I'll update the post with this caveat!
All together, though, this still supports the point that Nitro is well worth the money.
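To make the adjustment explicit, here's the back-of-the-envelope version (my arithmetic, not the poster's; it just nets the quoted CPU uplift out of the measured gain two different ways):

```python
# Two rough ways to net the CPU uplift out of the measured gain
# (my arithmetic, not the original poster's).
observed = 0.45    # ~+45% TPS measured, m4.xlarge -> m5.xlarge
cpu_uplift = 0.14  # AWS-quoted ~14% per-core price/performance improvement

print(f"simple subtraction: {observed - cpu_uplift:.0%}")                   # ~31%
print(f"multiplicative:     {(1 + observed) / (1 + cpu_uplift) - 1:.0%}")   # ~27%
# Either way, something in the high-20s/low-30s percent is left over for Nitro itself.
```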
This comment (as well as the article) is quite heavy on storage-specific terminology and products, so to someone who isn't in that area it all reads as very opaque: there's no context on what this actually solves or why it's needed. I know I've made this mistake plenty of times myself when writing, and it's fine when you have a very specific audience, but for a wider audience like HN you lose a lot of potential interest.
Line-rate (10, 25, 100, 400 Gbps) packet processing enables software control ("virtualization") of networking and storage. Instead of needing a human to make a physical connection between a server and a storage or network device, it can be automated via software. This saves data centers money and allows new products to be launched quickly. It also allows customer self-service for purchasing and managing storage/network capacity.
Line-rate packet processing also enables automated (authorized) intercept of storage and network traffic.
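To make the "software instead of a human with a cable" point concrete: on AWS, attaching block storage to a running server is literally an API call. A minimal boto3 sketch follows; the region, instance ID and device name are placeholders, not real resources.

```python
"""Sketch of "composable" storage: attaching network block storage to a
running server with an API call instead of physical cabling."""
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Provision 100 GiB of block storage...
volume = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100, VolumeType="gp3")

# ...wait for it to become available...
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# ...and "plug it in" to a running instance. On a Nitro-class system the DPU
# presents this remote volume to the host as if it were a local NVMe device.
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder
    Device="/dev/sdf",
)
```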
This is a demo and is otherwise pretty absurd, but the main use case of these devices can be to do some pre-processing on network traffic before handing it off to the host. In addition to the ARM cores, these cards have hardware capable of packet filtering etc., so they're useful in scenarios such as DDoS mitigation, where the card can filter out malicious traffic even at rates of millions of packets per second and only pass legitimate traffic through to the host server.
It also significantly decreases load on the main CPU, which makes a huge difference for any file/network operations that would otherwise saturate the CPU just handling I/O.
More - https://www.redhat.com/en/blog/optimizing-server-utilization...
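As a toy sketch of the filtering policy described a couple of comments up (drop sources that blow through a packets-per-second budget, forward the rest to the host): a real DPU applies this kind of rule in hardware flow tables or via DPDK/eBPF at line rate, so the Python below only shows the decision logic, with made-up numbers.

```python
"""Toy illustration of the DDoS-filtering policy described above: drop sources
that exceed a per-second packet budget, forward everything else to the host.
This only models the decision logic; it is nothing like real DPU code."""
import time
from collections import defaultdict

PPS_LIMIT = 1000  # illustrative per-source packets/second budget

window_start = time.monotonic()
counts = defaultdict(int)

def should_forward(src_ip: str) -> bool:
    """Return True if a packet from src_ip should be passed to the host CPU."""
    global window_start, counts
    now = time.monotonic()
    if now - window_start >= 1.0:         # start a fresh 1-second window
        window_start, counts = now, defaultdict(int)
    counts[src_ip] += 1
    return counts[src_ip] <= PPS_LIMIT    # over budget -> drop on the card

# A flood from one source gets dropped; normal traffic still passes.
for _ in range(2000):
    should_forward("203.0.113.7")         # simulated attacker
print(should_forward("203.0.113.7"))      # False (dropped)
print(should_forward("198.51.100.5"))     # True  (forwarded)
```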
Another decent comparison would be older SBCs. Going that route you could build a big storage JBOD that presents block devices or blob storage over a network. That's much cheaper than building a big x86 box with a big chassis, x86 CPU, RAM, etc. And now you have a cheap, fast NAS for your DC full of virtual hosts.
Random aside: the Commodore 1541 drive (all of their drives, really) had its own 65xx CPU and some RAM (2K for the 1541, IIRC). There was copy/backup software where you could hook up two drives, load the program, and then disconnect the drives from the computer. You could put the master in the first drive and a blank disk in the second, and every time you swapped in a new blank it would make another copy.
An electrical engineer looks at a computer design and says "If only we had faster switching transistors!" A computer scientist looks at a computer design and says "If only my compiler could optimize code better!"
A computer engineer looks at state of the art across the board and says "This architecture will allow us to maximize performance with currently available components."
If networking or I/O is being held back by the CPU (or vice versa)... there's performance to be gained by doing things differently.
It's nowhere near the Nvidia one, but it does have hardware acceleration for network-related tasks so if you're looking to offload networking/packet filtering/etc, it will do the job perfectly.
Also, JFYI, all of the BlueField SKUs are passively cooled, so they normally aren't a great fit for a homelab due to the airflow requirements...
Everything was run from the backplane. Want a new CPU? Plug one in!
AWS needs Nitro/DPUs for isolation and security, but should self-hosted setups go with more blades rather than beefy boxes plus DPUs?