Serving 2.7 billion people each month across a family of apps and services isn't easy; just ask Facebook. In recent years, the Menlo Park tech giant has migrated away from general-purpose hardware in favor of specialized accelerators that promise performance, power, and efficiency gains across its data centers, particularly for AI. To that end, it today announced Zion, a "next-generation" platform for AI model training, along with custom application-specific integrated circuits (ASICs) optimized for AI inference (Kings Canyon) and video transcoding (Mount Shasta).
Facebook says the trio of platforms, which it's donating to the Open Compute Project, will dramatically accelerate AI training and inference. "AI is used across a range of services to help people in their daily interactions and provide them with unique, personalized experiences," Facebook engineers Kevin Lee, Vijay Rao, and William Christie Arnold wrote in a blog post. "AI workloads are used throughout Facebook's infrastructure to make our services more relevant and improve the experience of people using our services."
Zion, which is tailored to handle a "spectrum" of neural network architectures including CNNs, LSTMs, and SparseNNs, comprises three parts: a server with eight NUMA CPU sockets, an eight-accelerator chipset, and Facebook's vendor-agnostic OCP accelerator module (OAM). It boasts high memory capacity and bandwidth, thanks to two high-speed fabrics (a coherent fabric that connects all CPUs, and a fabric that connects all accelerators), and a flexible architecture that can scale to multiple servers within a single rack using a top-of-rack (TOR) network switch.
"Since accelerators have high memory bandwidth but low memory capacity, we want to effectively use the available aggregate memory capacity by partitioning the model in such a way that the data that is accessed more frequently resides on the accelerators, while data accessed less frequently resides on DDR memory with the CPUs," Lee, Rao, and Arnold explain. "The computation and communication across all CPUs and accelerators are balanced and occur efficiently over both high- and low-speed interconnects."
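The partitioning idea in that quote can be sketched in a few lines: rank a model's tables by how often they are accessed, then greedily place the hottest ones on the accelerators until their memory budget runs out. This is a minimal illustration, not Facebook's actual placement logic; the table names, sizes, access rates, and the capacity figure are all hypothetical.

```python
# Sketch of frequency-based model partitioning: hot data goes to the
# accelerators' high-bandwidth (but small) memory, cold data stays in
# the CPUs' DDR. All numbers below are illustrative.

ACCEL_CAPACITY_GB = 16  # hypothetical aggregate accelerator memory budget

def partition_tables(tables, accel_capacity_gb):
    """Greedily assign the most frequently accessed tables to the
    accelerators until the budget is exhausted; the rest go to DDR."""
    # Hottest-first: the tables hit most often gain the most from
    # the accelerators' high memory bandwidth.
    ranked = sorted(tables, key=lambda t: t["accesses_per_sec"], reverse=True)
    placement, used_gb = {}, 0.0
    for t in ranked:
        if used_gb + t["size_gb"] <= accel_capacity_gb:
            placement[t["name"]] = "accelerator"
            used_gb += t["size_gb"]
        else:
            placement[t["name"]] = "cpu_ddr"
    return placement

tables = [
    {"name": "user_ids",   "size_gb": 12.0, "accesses_per_sec": 90_000},
    {"name": "page_ids",   "size_gb": 6.0,  "accesses_per_sec": 40_000},
    {"name": "rare_feats", "size_gb": 20.0, "accesses_per_sec": 500},
]

print(partition_tables(tables, ACCEL_CAPACITY_GB))
# → {'user_ids': 'accelerator', 'page_ids': 'cpu_ddr', 'rare_feats': 'cpu_ddr'}
```

A real system would balance bandwidth and compute as well as capacity, but the greedy sketch captures the stated principle: frequently accessed data lives with the accelerators, the rest with the CPUs.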
As for Kings Canyon, which was designed for inference tasks, it's split into four components: Kings Canyon inference M.2 modules, a Twin Lakes single-socket server, a Glacier Point v2 carrier card, and Facebook's Yosemite v2 chassis. Facebook says it's collaborating with Esperanto, Habana, Intel, Marvell, and Qualcomm to develop ASIC chips that support both INT8 and high-precision FP16 workloads.
Each server combines M.2 Kings Canyon accelerators and a Glacier Point v2 carrier card, which connect to a Twin Lakes server; two of these are installed into a Yosemite v2 sled (which has more PCIe lanes than the first-gen Yosemite) and linked to a TOR switch via a NIC. Kings Canyon modules include an ASIC, memory, and other supporting components, with the CPU host communicating with the accelerator modules over PCIe lanes, while Glacier Point v2 packs an integrated PCIe switch that lets the server access all of the modules at once.
"With the proper model partitioning, we can run very large deep learning models. With SparseNN models, for example, if the memory capacity of a single node isn't enough for a given model, we can further shard the model among two nodes, boosting the amount of memory available to the model," Lee, Rao, and Arnold said. "Those two nodes are connected via multi-host NICs, allowing for high-speed transactions."
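The two-node sharding described above can be illustrated with a toy example: split one oversized embedding table's rows across two nodes and route each lookup to the node that holds that row. The `Node` class, modulo placement, and table contents are all hypothetical stand-ins, not Facebook's implementation.

```python
# Toy sketch of sharding one embedding table across two inference nodes
# when a single node's memory can't hold the whole model.

NUM_NODES = 2

class Node:
    """Stands in for one inference node's local memory."""
    def __init__(self):
        self.rows = {}

def owner(row_id):
    # Simple modulo placement: each node holds half the table's rows.
    return row_id % NUM_NODES

nodes = [Node() for _ in range(NUM_NODES)]

# Load a toy 8-row "embedding table" across the two nodes.
for row_id in range(8):
    nodes[owner(row_id)].rows[row_id] = [row_id] * 4  # dummy 4-dim embedding

def lookup(row_id):
    # In production, a lookup that lands on the other node would cross
    # the multi-host NICs mentioned in the post.
    return nodes[owner(row_id)].rows[row_id]

print(lookup(5))  # → [5, 5, 5, 5], served by node 1
```

Each node stores only its shard, so the aggregate memory available to the model doubles, at the cost of a network hop for rows owned by the other node.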
So what about Mount Shasta? It's an ASIC developed in partnership with Broadcom and Verisilicon that's built for video transcoding. Within Facebook's data centers, it'll be installed on M.2 modules with integrated heat sinks, in a Glacier Point v2 (GPv2) carrier card that can house multiple M.2 modules.
The company says that, on average, it expects the chips to be "many times" more efficient than its current servers. It's targeting encoding of at least two simultaneous 4K input streams at 60fps within a 10W power envelope.
"We expect that our Zion, Kings Canyon, and Mount Shasta designs will address our growing workloads in AI training, AI inference, and video transcoding respectively," Lee, Rao, and Arnold wrote. "We will continue to improve on our designs through hardware and software co-design efforts, but we cannot do this alone. We welcome others to join us in the process of accelerating this kind of infrastructure."