AI Bridging Cloud Infrastructure Implications On Deep Learning

Among all emerging technologies, AI has arguably the most disruptive potential. It has implications across industries and fields, including robotics, Big Data, IoT, and advanced biotech. In recent years, the focus of the AI community has shifted toward practical applications.

Pairing thousands of sensors, feeding big data into AI algorithms, and backing AI with enormous computing power are all exciting prospects. Especially interesting is a subfield of Machine Learning called Deep Learning. Today, we will talk about deep learning, AI Bridging Cloud Infrastructure, and how the two are connected.

What is Deep Learning?

You are probably familiar with the concept of Machine Learning (ML). It's one of the most popular data science fields, exploring and building algorithms that learn and improve automatically through data and experience.

Deep Learning is a subfield of ML. It is also about building algorithms that automatically learn and improve, but this time the algorithms are inspired by the structure and function of the brain: they are built around artificial neural networks.

Deep learning is also one of the newer concepts in the ML field. We had to wait for advances in hardware to start benefiting from it. Another prerequisite we were not meeting until recently was data sets large enough to train these large artificial neural networks.

Deep learning excels at scalability and feature learning. The good old ML algorithms often reach a performance plateau: at this point, even if you continue to feed data into them, they won't improve in a statistically significant way. Deep learning algorithms, on the other hand, are based on artificial neural networks and don't have a similar performance plateau.

Feature learning is also something traditional ML algorithms can't achieve. Deep learning models can automatically extract features from raw data and discover good representations at multiple levels of abstraction.
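A toy example makes both points concrete. The tiny two-layer network below (plain NumPy, nothing ABCI-specific) learns XOR, a mapping no single linear model can represent; the hidden layer effectively learns intermediate features from the raw inputs on its own:

```python
import numpy as np

# Illustrative sketch: a minimal neural network learning XOR. The hidden
# layer discovers useful intermediate features automatically, which is
# exactly what "feature learning" means.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)           # learned hidden features
    out = sigmoid(h @ W2 + b2)         # prediction
    # Backpropagation for squared error, learning rate 1.0
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out; b2 -= d_out.sum(0)
    W1 -= X.T @ d_h;   b1 -= d_h.sum(0)

print((out > 0.5).astype(int).ravel())  # the four XOR labels, ideally [0 1 1 0]
```

No feature engineering was done by hand; the network found the representation itself, which is the property that scales up to deep networks on real data.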

What is AI Bridging Cloud Infrastructure?

AI Bridging Cloud Infrastructure is just a fancy way of saying supercomputer, but what is a supercomputer in this day and age? We are speaking about petaflops upon petaflops of processing power. 

Until recently, humanity's most powerful supercomputer was Sunway TaihuLight, a Chinese system with a performance rating of 93 petaFLOPS. That rating, however, comes from LINPACK, a benchmark built on numerical linear algebra: it measures double-precision performance, which doesn't reflect the lower-precision arithmetic that dominates AI workloads.

To move beyond what double-precision LINPACK-style ratings capture, we needed something brand new. The plan was hatched in Japan, where scientists decided to build a supercomputer system dedicated to AI, ML, and DL. Long story short: they built it. It delivers 550 AI-petaFLOPS (half precision) and 37 petaFLOPS (double precision), with a power consumption of 2.3 megawatts.
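The gap between the half-precision and double-precision ratings is easy to demonstrate: a float16 value occupies 2 bytes versus 8 for float64, so the same hardware can store, move, and multiply roughly four times as many of them. Deep learning tolerates the reduced precision; classical HPC workloads often cannot. A quick NumPy check:

```python
import numpy as np

# Same 1024x1024 matrix in double and half precision
a64 = np.ones((1024, 1024), dtype=np.float64)
a16 = a64.astype(np.float16)

# float64 takes 8 bytes per element, float16 takes 2
print(a64.nbytes // a16.nbytes)  # prints 4
```

This is why "AI-petaFLOPS" (half precision) figures run far ahead of the traditional double-precision rating on the same machine.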

Thus AI Bridging Cloud Infrastructure (ABCI) came into the world as the first large-scale Open AI Computing Infrastructure: a high-performance computing (HPC) system able to perform complex calculations and process big data at ultra-high speeds. Japan's National Institute of Advanced Industrial Science and Technology (AIST) constructed it and operates it.

So, what is the role of ABCI in Deep Learning projects? Here is everything you need to know to understand the importance of ABCI when it comes to pushing the frontiers of DL.

Docker and Singularity Container Engines

Deep Learning algorithms process large data sets. They need to read hundreds of thousands of files and write back the results, so they benefit from running on infrastructure that supports containers. The great thing about ABCI is that it supports both the Docker and Singularity container engines.

Docker isolates user IDs and filesystems by default, which can be problematic on shared HPC systems, yet some developers still prefer it over other container engines. Singularity is a newer container engine built around integration with the host system rather than the isolation we see in Docker.

Although it has its own image file format, Singularity can load images straight from Docker registries. It also stores images as plain files on the filesystem.

Both Docker and Singularity are great container engines for batch data processing, and having access to both facilitates the execution of different DL projects via ABCI. Since ABCI is designed as a multiuser system, it uses Univa Grid Engine (now Altair Grid Engine) for job scheduling, which supports both Docker and Singularity container engines.

Thanks to its container-based software ecosystem, ABCI makes it easy for developers to publish and share anything they build on it.

Distributed Computing

Every deep learning project revolves around data and processing power. One needs access to both to keep training deep learning algorithms. However, very few enterprises can afford such an infrastructure.

Even then, the question is whether they will pour all their money into a deep learning project. It’s the major challenge when it comes to pushing the limits in the AI and ML field and coming up with solutions that can benefit our society as a whole.

ABCI solves this problem by being a cloud-native platform that delivers distributed computing. Researchers and industrial users can now run AI-powered deep learning workloads across domains on ABCI. Thanks to distributed computing, they can overcome single-machine performance limits and train DL algorithms much faster.

ABCI tackles the big data processing challenges by introducing both Apache Hadoop and Apache Spark. Apache Hadoop streamlines distributed processing of big data sets across clusters of computers. It uses the MapReduce programming model to deliver a framework for processing and storing huge data sets.
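The MapReduce model Hadoop implements is simple at heart: a map phase turns each input record into key-value pairs, and a reduce phase combines the pairs that share a key. The pure-Python word-count sketch below shows the model itself (illustrative only; real Hadoop distributes these phases across a cluster):

```python
from collections import defaultdict

# Map phase: turn one input record into (key, value) pairs
def map_phase(line):
    return [(word, 1) for word in line.split()]

# Reduce phase: combine all values that share a key
def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["deep learning at scale", "learning at ultra high speed"]
# In Hadoop, mapping and reducing run in parallel across many machines
intermediate = [pair for line in lines for pair in map_phase(line)]
word_counts = reduce_phase(intermediate)
print(word_counts["learning"], word_counts["at"])  # prints: 2 2
```

Because each record is mapped independently and each key is reduced independently, the work splits cleanly across a cluster, which is what makes the model a good fit for big data.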

Apache Spark is an analytics engine for data processing at a large scale. Developers can use it to program clusters with fault tolerance and implicit data parallelism. Thanks to Hadoop and Spark, developers can make their DL models learn faster through high-performance streaming and batch data processing, and they can tap computation and storage resources outside of ABCI as well.

On the hardware side, ABCI runs the latest generation of NVIDIA GPUs and deploys the NGC container registry to give researchers, developers, and data scientists access to GPU-accelerated software. NGC containers are optimized to work well with NVIDIA GPUs, including in the cloud.

Developers will be happy to know that ABCI supports both single-node and distributed deep learning through frameworks such as NNabla and ChainerMN.

Great Portability

Researchers and industrial users often have to run deep learning projects across multiple platforms. Portability was a major challenge in the niche, one that ABCI successfully overcame. NGC and the Singularity container engine enable users to work on different platforms simultaneously.

They can develop, test, and deploy their projects across platforms and scale up and down on demand. It’s a huge thing in the world of DL. Developers can now take their projects to a new level and efficiently and seamlessly use their in-house systems along with ABCI.

Ultra-Fast Deep Learning Speed

The success of every DL project depends on computing speed. Not all computers or data centers can deliver the performance that artificial neural networks demand; they need access to a lot of computing power. ABCI was built to advance the social utilization of AI technologies and to accelerate the research, development, and verification of AI technologies.

To do so, ABCI leverages unparalleled computing power. When we say unparalleled, we mean it. The latest ABCI challenge report clearly showcases that ABCI achieved the world's fastest learning speed. If you want your DL initiative to advance at a rapid pace, ABCI will deliver the power you need.

A research group at Sony Corporation ran the speed test with ResNet-50, a 50-layer deep convolutional neural network. Using 2,176 GPUs on ABCI, they completed ResNet-50 training on the ImageNet image classification dataset in 3.7 minutes.
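Results like this rest on synchronous data-parallel training: each worker computes gradients on its own slice of the global batch, the gradients are averaged across all workers (an "all-reduce"), and every worker applies the same update. The hedged NumPy sketch below shows the idea on a toy linear model; the worker count and batch size are illustrative, not Sony's actual configuration:

```python
import numpy as np

# Conceptual sketch of synchronous data-parallel SGD on a linear model.
rng = np.random.default_rng(1)
w = np.zeros(3)                        # shared model parameters
X = rng.normal(size=(2176, 3))         # one "global batch"
y = X @ np.array([1.0, -2.0, 0.5])     # synthetic targets, true weights known

def local_gradient(w, Xs, ys):
    # Gradient of mean squared error on this worker's shard
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

workers = 8
shards = np.array_split(np.arange(len(X)), workers)
for _ in range(200):
    grads = [local_gradient(w, X[idx], y[idx]) for idx in shards]
    g = np.mean(grads, axis=0)         # "all-reduce": average across workers
    w -= 0.05 * g                      # identical update on every worker

print(np.round(w, 2))  # approaches the true weights [1, -2, 0.5]
```

Because the averaged gradient equals the full-batch gradient (equal-sized shards), adding workers divides the per-worker compute without changing what the model learns, which is what lets thousands of GPUs cooperate on one training run.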

A Variety of Machine Learning Libraries and Deep Learning Frameworks

Imagine having to install and set up deep learning frameworks and machine learning libraries from scratch. It is tedious work that requires experience, knowledge, and a lot of time. On top of that, both ML libraries and DL frameworks are regularly updated.

Keeping them updated makes this work even more complicated. However, it has to be done to let artificial neural networks learn at an optimal pace.

ABCI delivers the latest ML libraries and DL frameworks to end-users, including:

  • Caffe and Caffe2
  • TensorFlow
  • Theano
  • Torch
  • PyTorch
  • CNTK
  • MXNet
  • Chainer
  • Keras
  • NVIDIA GPU Cloud (NGC)

It turns out that ABCI plays a vital role in the global Deep Learning initiative. Researchers and industry users now have access to a cutting-edge supercomputer that can meet even the highest requirements of deep learning projects of any scale. The ultra-fast computing speeds, distributed computing, variety of development tools, and abundance of ML libraries and DL frameworks make ABCI an MVP in the DL game.