Machine Learning in Data Center Architectures
Having the right architecture is crucial for successfully adding machine learning to data centers. The data center’s physical infrastructure is key to enabling the IT architecture’s functionality, since most of the data the facility handles is passed through or sourced by that architecture.
When developing machine learning for the data center, all factors affecting the facility’s performance, scalability, and resiliency should be considered up front, during planning. To accomplish this, system designers typically focus on a flexible architecture that can support new applications quickly.
Failure to address these aspects during planning can lead to inefficient or inaccurate data architectures, ultimately causing total system and power failures that cost data center operators massive volumes of critical data.
Several factors can cause this kind of failure, including miscalculating power requirements, choosing the wrong power equipment, and poorly designing the automatic transfer mechanism. To optimize the capabilities of machine learning in today’s data center architecture, system designers should consider the common issues behind system failures and the components that can address them.
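As a simplified illustration of the kind of power-requirement calculation that is easy to get wrong, the sketch below estimates provisioned power per rack from per-server draw, power-supply losses, and a redundancy margin. The helper function and every figure in it are hypothetical assumptions for illustration, not TE or industry reference values.

```python
# Hypothetical sketch of a rack power-requirement estimate. The rack
# counts, per-server draw, PSU efficiency, and redundancy margin below
# are illustrative assumptions only.

def rack_power_kw(servers_per_rack: int,
                  watts_per_server: float,
                  psu_efficiency: float = 0.94,
                  redundancy_margin: float = 0.25) -> float:
    """Estimate provisioned power per rack, including power-supply
    losses and a redundancy margin."""
    it_load_w = servers_per_rack * watts_per_server
    at_the_wall_w = it_load_w / psu_efficiency          # account for PSU losses
    provisioned_w = at_the_wall_w * (1 + redundancy_margin)
    return provisioned_w / 1000.0

# A dense machine-learning rack draws far more than a typical enterprise rack:
print(f"Enterprise rack:  {rack_power_kw(20, 450):.1f} kW provisioned")   # ~12 kW
print(f"ML training rack: {rack_power_kw(4, 6500):.1f} kW provisioned")   # ~35 kW
```

Underestimating any of these terms, or sizing the power equipment and automatic transfer mechanism to the raw IT load rather than the provisioned figure, is exactly the kind of planning error described above.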
One issue in today’s data center industry is that the terms artificial intelligence (AI) and machine learning are commonly used interchangeably, which can lead to confusion and inaccuracies. What is key to remember is that machine learning is typically a subset of AI. With AI, the focus is on developing machines that can “think,” while machine learning is usually about defining algorithms that allow machines to “learn” through repetitive functions. While machine learning is not a new concept, one of the changes over the last several years is the increase in computer processing power alongside a decrease in the cost per bit. This enables designers to find opportunities to make machine learning a greater part of our everyday lives, from the recommendation engines behind services such as Google and Netflix to integrated social media apps and fingerprint and facial recognition on smartphones.
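To make “learning through repetitive functions” concrete, the following minimal sketch fits a one-parameter model by repeatedly nudging it to reduce its prediction error. The data points and learning rate are arbitrary illustrative choices, not drawn from any real system.

```python
# Minimal sketch of machine "learning" as repetition: fit y = w * x
# by nudging w downhill on the squared error, one small step per pass.
# The data points and learning rate are arbitrary illustrations.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # roughly y = 2x
w = 0.0          # initial guess for the slope
lr = 0.01        # learning rate

for epoch in range(500):                     # the repetition is the "learning"
    for x, y in data:
        error = w * x - y                    # prediction error on this point
        w -= lr * 2 * error * x              # gradient step on squared error

print(f"learned slope w = {w:.2f}")          # converges near 2.0
```

Nothing in this loop “thinks”; the parameter simply improves because the same correction is applied many times, which is the sense in which machine learning is narrower than AI.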
Since the emergence of the earliest machine learning applications in the aerospace and airline industries, which enabled aircraft designers to innovate in wing design, many designers have evolved their thinking about machine learning's uses toward a more comprehensive understanding of its potential and its tradeoffs when optimizing a design.
This shift is having an impact in the data center, where we now often see the use of AI and machine learning transitioning from limited applications in standard computers and standalone boxes to a range of silicon-specific applications. This can help open the door for the industry to focus on optimizing the data center network, using AI and machine learning in every part of these networks to expand functionality over time.
Over the next five years, the use of machine learning is expected to shift away from dedicated infrastructures to more flexible infrastructures that can scale, change, and diversify instantly. While one of the primary functions of machine-learning data gathering centers on human-to-machine interactions, the opportunity is moving toward machine-to-machine interactions, which may not require human data or inputs. When machines begin creating their own communication paths, these systems could process and transmit data in ways that give us new insights into larger volumes of data.
The process of adding machine learning into data systems such as servers and data racks often varies, depending on what the system designer is attempting to achieve, as well as on the data center operator’s workload when adding machine learning into the core.
Adding machine learning into new or existing data centers is commonly done to solve a known problem, such as issues in large learning pods or network problems diagnosed by algorithms developed from received data. Most solutions today tend to be customized to match the size of the problem.
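As a hedged illustration of an “algorithm developed from received data,” the sketch below learns a normal range from past network latency telemetry and flags new readings that fall outside it. The sample values and the three-sigma threshold are invented for this example.

```python
# Hypothetical sketch: learn a "normal" band from received latency
# telemetry, then flag new samples that deviate too far. The samples
# and the 3-sigma threshold are illustrative assumptions.

import statistics

history = [0.42, 0.45, 0.44, 0.41, 0.47, 0.43, 0.46, 0.44]  # past latencies (ms)
mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_anomalous(latency_ms: float, k: float = 3.0) -> bool:
    """Flag samples more than k standard deviations from the learned mean."""
    return abs(latency_ms - mean) > k * stdev

for sample in [0.45, 0.44, 0.95]:
    print(sample, "anomalous" if is_anomalous(sample) else "normal")
```

Even a trivial detector like this one is shaped entirely by the data it received, which is why such solutions tend to be customized to the size of the problem.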
Some challenges in designing custom solutions include optimizing power distribution, reducing thermal levels, and improving high-speed, low-latency performance in the interconnects. Because nearly everything in the data center must be interconnected, the system components should offer the flexibility to be installed and to work in small spaces, and should address the expected speed requirements without increasing thermal output.
Often, designers need to account for the close proximity of hardware when planning the architecture. Getting this balance right can require tradeoffs in design, cost, and power and cooling structures. Normally, a ceiling exists for how much power can be properly controlled and cooled in a data center; that ceiling gives designers guidance on making tradeoffs and can push them to develop more efficient accelerators and system designs and to adopt advanced thermal solutions.
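That ceiling can be treated as a simple budget check. The sketch below, with invented capacity figures, shows how a designer might test whether a proposed rack layout fits under a facility’s power and cooling limits before committing to it.

```python
# Illustrative budget check against a facility's power/cooling ceiling.
# The ceilings and per-rack figures are invented for this sketch.

FACILITY_POWER_CEILING_KW = 1200.0    # assumed deliverable power
FACILITY_COOLING_CEILING_KW = 1100.0  # assumed heat-removal capacity

def fits_ceiling(num_racks: int, kw_per_rack: float) -> bool:
    total_kw = num_racks * kw_per_rack
    # Nearly all electrical power ends up as heat, so the same total
    # is checked against both the power and cooling ceilings.
    return (total_kw <= FACILITY_POWER_CEILING_KW
            and total_kw <= FACILITY_COOLING_CEILING_KW)

print(fits_ceiling(40, 25.0))   # 1000 kW: fits both ceilings
print(fits_ceiling(40, 30.0))   # 1200 kW: power fits, cooling does not
```

When the check fails, the tradeoff choices described above come into play: fewer or less dense racks, more efficient accelerators, or advanced thermal solutions that raise the cooling ceiling.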
At TE, we partner with our customers to help design and manufacture components that address machine learning requirements in data centers, from hyperscale and colocation facilities to those enabling edge computing. We offer solutions engineered for high speed and power efficiency in data center architectures, including direct-attach external copper cable assemblies with interfaces, our extra large array (XLA) socket technology, card edge connectors, Strada Whisper backplane connector cable assemblies, and internal high-speed copper cables. Customers typically choose these products based, more than anything else, on the system design in which they will be used. Our power designs include power cable assemblies, busbars, power distribution systems, and thermal management solutions.
In partnering with TE, our customers can expect reliable, durable, high-performance solutions, along with expert guidance from engineers who can address the architecture issues they need to solve. Through these collaborations, our customers can develop efficient, next-generation solutions that scale quickly to integrate machine learning into their data system core. Our engineers help tackle performance problems so our customers can focus on other priorities, such as developing projects at the application level and solving abstract software-related concerns.
TE Authors
- Mike Tryson, Vice President and Chief Technology Officer, Data and Devices
- Erin Byrne, Vice President and Chief Technology Officer, Sensors
- Dave Helster, TE Engineering Fellow, Data and Devices
- Jonathan Lee, Senior Engineering Manager – Global Bulk Cable, Data and Devices
- Christopher Blackburn, Technologist – System Architecture, Data and Devices