Recent advances in AI driven by deep learning (DL) have been dramatic across a range of tasks: computer vision for autonomous driving, natural language processing (NLP), and playing games like Chess and Go. But there is a dark side - the carbon footprint!

 

The issue was brought to light by a 2019 study conducted by a group of researchers at the University of Massachusetts, Amherst, who performed a life cycle assessment of training several common large AI models in NLP. They found that training one of the state-of-the-art models of the time, a large Transformer architecture, can emit more than 626,000 pounds of carbon dioxide equivalent - nearly five times the lifetime emissions of the average American car (and that includes the manufacture of the car itself)! See the table below.

 

But this is nothing compared to GPT-3, released by OpenAI in June 2020 with a fanfare announcement that it had built the biggest AI model in history! GPT-3 consists of a whopping 175 billion parameters. To put this figure in perspective, its predecessor GPT-2 - considered state-of-the-art when it was released in 2019 - had only 1.5 billion parameters. And while GPT-2 took a few dozen petaflop-days to train, already a massive amount of computation, GPT-3 required several thousand.

 

In the deep learning era, the computational resources needed to produce a best-in-class AI model have on average doubled every 3.4 months; this translates to a roughly 300,000x increase between 2012 and 2018. GPT-3 is just the latest embodiment of this exponential trajectory.
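As a rough sanity check on how those two figures fit together, here is a minimal back-of-envelope sketch in Python; it uses only the numbers quoted above and no other data.

```python
import math

# Figures quoted above: the headline doubling time and the reported
# overall growth in compute (not our own measurements).
doubling_period_months = 3.4
reported_growth = 300_000  # compute growth between 2012 and 2018

# How many doublings does a 300,000x increase imply, and over what span?
doublings = math.log2(reported_growth)                  # ~18.2 doublings
implied_span_months = doublings * doubling_period_months

print(f"doublings implied: {doublings:.1f}")
print(f"implied time span: {implied_span_months:.0f} months "
      f"(~{implied_span_months / 12:.1f} years)")
# ~62 months, a little over five years -- consistent with the
# 2012-2018 window cited above.
```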

 

This exponential increase in model size is accompanied by a similar ballooning of the datasets used for training. In 2018, the BERT model achieved best-in-class NLP performance after being trained on a dataset of 3 billion words. XLNet outperformed BERT using a training set of 32 billion words. Shortly thereafter, GPT-2 was trained on a dataset of 40 billion words. Dwarfing all these previous efforts, a weighted dataset of roughly 500 billion words was used to train GPT-3. Larger models and larger datasets translate to soaring compute and energy requirements.
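To put those corpus sizes side by side, a small sketch using only the word counts quoted above:

```python
# Training-set sizes quoted above, in billions of words.
datasets = {
    "BERT (2018)": 3,
    "XLNet": 32,
    "GPT-2": 40,
    "GPT-3": 500,   # weighted dataset, roughly
}

baseline = datasets["BERT (2018)"]
for name, billions in datasets.items():
    print(f"{name:12s} {billions:>4d}B words  ({billions / baseline:.0f}x BERT)")
# GPT-3's training data is on the order of 170x the BERT corpus.
```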

Another factor driving AI’s massive energy drain is the extensive experimentation and tuning required to develop a model. Deep learning intrinsically involves extensive trial and error: hundreds of versions of a given model are typically trained, experimenting with different neural architectures and hyperparameters, before an optimal design is identified.
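To make that multiplication concrete, the sketch below counts the training runs in a single hypothetical hyperparameter sweep. The search space, the 24 GPU-hours per run and the 0.3 kWh per GPU-hour are illustrative assumptions, not figures from any particular study.

```python
from itertools import product

# Hypothetical search space -- the specific choices are illustrative only.
search_space = {
    "num_layers": [4, 8, 12],
    "hidden_size": [256, 512, 1024],
    "learning_rate": [1e-3, 3e-4, 1e-4],
    "dropout": [0.1, 0.3],
    "random_seed": [0, 1, 2],
}

configs = list(product(*search_space.values()))
print(f"training runs in one sweep: {len(configs)}")   # 3*3*3*2*3 = 162

# Assumed cost per run, for illustration only.
gpu_hours_per_run = 24
energy_kwh_per_gpu_hour = 0.3   # rough figure for a single modern GPU

total_kwh = len(configs) * gpu_hours_per_run * energy_kwh_per_gpu_hour
print(f"energy for the sweep: ~{total_kwh:.0f} kWh")
# Even this modest grid adds up to well over a thousand kilowatt-hours.
```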

 

The 2019 paper mentioned above includes a telling case study. The researchers picked an average-sized model - much smaller than headline-grabbing behemoths like GPT-3 - and examined not just the energy required to train the final version, but the total number of trial runs that went into producing it. Over the course of six months, 4,789 different versions of the model were trained, requiring 9,998 days’ worth of GPU time in total (more than 27 years). Taking all these runs into account, the researchers estimated that building this model generated over 78,000 pounds of CO2 emissions in total - more than the average American adult will produce in two years.
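Working backwards from just the totals quoted in that case study, a quick sketch of the implied per-run figures (the derived numbers are rough approximations):

```python
# Totals quoted in the 2019 case study above.
training_runs = 4_789
gpu_days_total = 9_998
co2_lbs_total = 78_000

print(f"GPU time: {gpu_days_total / 365.25:.1f} years")              # ~27.4 years
print(f"avg GPU time per run: {gpu_days_total / training_runs:.1f} days")
print(f"CO2 per GPU-day: ~{co2_lbs_total / gpu_days_total:.1f} lbs")
# Roughly 2 GPU-days and ~8 lbs of CO2-equivalent per experiment --
# small individually, enormous once multiplied by thousands of runs.
```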

[Table: Common carbon footprint benchmarks]

 

AI has a major problem. Much of the latest research in AI neglects efficiency: very large neural networks have proven useful for a wide variety of tasks, and companies and institutions with abundant access to computational resources can leverage this to gain a competitive advantage. But for AI to be widely adopted by consumers - to realize the dream of ubiquitous AI - this cannot work. And the planet cannot sustain such an energy drain.

“If rapid progress in AI is to continue, we need to reduce its environmental impact,” says John Cohn, an IBM fellow and member of the MIT-IBM Watson AI Lab. “The upside of developing methods to make AI models smaller and more efficient is that the models may also perform better.”

 

Embedl’s mission is to enable more widespread and sustainable deployment of AI by cutting large AI models down to size, shrinking their energy and resource consumption and thus drastically reducing their carbon footprint.





 

 
