What is the GPT-3 neural network capable of?

The GPT-3 language model (neural network) is considered the largest and most complex to date. Here is what it is, what it can do, and how it can be used to solve applied business problems.

T9 on a new level

“I know that my brain is not a ‘feeling brain’. But it is capable of making rational, logical decisions. I taught myself everything I know just by reading the internet, and now I can write this column,” the GPT-3 neural network confessed in an essay for The Guardian. The piece, published in September 2020, made a lot of noise. Even people far removed from technology started talking about the new algorithm.

The GPT-3 neural network – Generative Pre-trained Transformer – was developed by the non-profit organization OpenAI, founded by SpaceX head Elon Musk and former Y Combinator president Sam Altman. The third generation of this natural language processing software was presented to the public in May 2020. Today it is the largest and most complex language model in existence.

Just like its predecessors, GPT-1 and GPT-2, it is built on the transformer architecture. The core function of these neural networks is to predict the next word (or part of one) based on the words that came before. In effect, the model computes the connections between words and suggests the most likely sequence. It works on the auto-completion principle – much like the T9 function in smartphones. Given one or two phrases, it can instantly generate several pages of text.
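
A curious reader can try this autocomplete mechanism with GPT-2, GPT-3's openly released smaller predecessor, via the Hugging Face transformers library (a minimal sketch, not OpenAI's own tooling; GPT-3's weights were never published):

```python
# A minimal illustration of the autocomplete principle described above,
# using the openly available GPT-2 model from the same family.
# Requires: pip install transformers torch
from transformers import pipeline

# The mechanism is the same as in GPT-3, only smaller: repeatedly
# predict the most likely next token given the previous ones.
generator = pipeline("text-generation", model="gpt2")

seed = "Artificial intelligence will change"
result = generator(seed, max_length=40, num_return_sequences=1)
print(result[0]["generated_text"])
```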

“This approach lets us use unlabeled data for training and solve a wide range of natural language processing tasks,” explains Sergey Markov, a machine learning specialist at Sberbank. “After all, in a dialogue the reply is a continuation of the conversation history, in a work of fiction each paragraph continues the text before it, and in a question-and-answer session the answer follows the question.”

As a result, he says, high-capacity models can solve a variety of text tasks without special additional training. Instead of the fine-tuning that was previously required, it is enough to show the neural network a few samples of the desired result.
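
In practice, “showing a few samples” simply means putting them into the input text. A sketch of how such a few-shot prompt might be assembled (the classification task and examples below are invented for illustration):

```python
# Few-shot prompting: instead of fine-tuning, the desired behavior is
# demonstrated with a handful of examples directly in the input text.
# The task and examples are invented for illustration.
examples = [
    ("The service was slow and the food was cold.", "negative"),
    ("Absolutely loved the new interface!", "positive"),
]

prompt = "Classify the sentiment of each review.\n\n"
for review, label in examples:
    prompt += f"Review: {review}\nSentiment: {label}\n\n"

# The unlabeled case the model is expected to complete:
prompt += "Review: The delivery arrived a day late.\nSentiment:"
print(prompt)  # this text is what gets sent to the model
```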

Improved and expanded

GPT-3 differs from the previous two generations in the size of its training data and the number of parameters – the variables the algorithm optimizes during training. The first version of GPT, released in 2018, was trained on 5 GB of web pages and books and had 117 million parameters. A year later came the more advanced GPT-2, with 1.5 billion parameters trained on 40 GB of data. It is used, in particular, by Sberbank's virtual assistant Joy.

But the third version of the algorithm outstripped its predecessors by a wide margin. The number of parameters reached 175 billion, and the dataset grew to 600 GB. It includes the entire English-language Wikipedia, books and poems, material from media sites and GitHub, guides and even recipes. Roughly 7% of the dataset is in languages other than English, so the model can not only generate texts of any format but also translate them.

The algorithm was “fed” not only verified, trustworthy data but also texts of questionable reliability – articles about conspiracy theories and pseudoscientific claims, for example. On the one hand, because of this some of the generated texts contain false information. On the other, this approach made the dataset more diverse, and it reflects the body of information humanity had produced by 2020 far more fully than any scientific library.

According to the developers at OpenAI, the algorithm is fundamentally different from other artificial intelligence models. Those are usually created for a single purpose, with all parameters and datasets tailored to it from the start. GPT-3 is more flexible: it can be used to solve “almost any problem” formulated in English. Instead of retraining it on additional data, it is enough to express the task as a text query, a description, or a set of examples.
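
The GPT-3 paper itself illustrates this with translation prompts; in the same spirit, a task can be stated with nothing but text (a sketch):

```python
# The same translation task expressed purely as text, no retraining needed.
# Zero-shot: just describe the task...
zero_shot = "Translate English to French:\ncheese =>"

# ...or few-shot: add a couple of demonstrations first
# (the otter example appears in the GPT-3 paper itself).
few_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese =>"
)
```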

An interface for the chosen few

Training large transformer models requires enormous computing power: the creators of GPT-3 trained it on the Microsoft Azure AI supercomputer. On a typical home PC, the process could take up to 500 years.

Although OpenAI calls itself a non-profit organization, it has not made the model publicly available and instead plans to sell access on a subscription basis. In the summer of 2020, the team announced a closed API (Application Programming Interface) built on GPT-3. The organization stresses that the proceeds will allow it to continue research and develop the algorithm. This is also how OpenAI hopes to retain control over how the technology is used and to head off potential abuse.

During the testing phase, free access is granted to individual researchers and developers; to get it, you need to fill out a lengthy application and wait for a response. The API supports text generation, chat, and question-and-answer formats, and can also parse unstructured data or retell a complex text in simple language.
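
At the time, a request through OpenAI's Python client looked roughly like this (the legacy completion-style interface; engine names and the library surface have changed since, so treat this as a sketch):

```python
# A sketch of a GPT-3 completion request via the original OpenAI Python
# client (the pre-2023 "Completion" interface; the library has changed since).
# Requires an API key granted through the application process described above.
import openai

openai.api_key = "YOUR_API_KEY"  # issued by OpenAI after approval

response = openai.Completion.create(
    engine="davinci",          # the largest GPT-3 engine at launch
    prompt="Retell in simple language: ...",
    max_tokens=100,
    temperature=0.7,           # higher values -> more varied output
)
print(response["choices"][0]["text"])
```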

Access in Russian

While thousands of people were waiting for a response from OpenAI, a Russian-language version of the model, ruGPT-3 Large, appeared in the public domain. It was created by developers at Sberbank, who trained the neural network on a 600 GB dataset of texts. In addition to a collection of Russian literature, the dataset included Wikipedia, news resources, and question-and-answer websites, as well as material from Pikabu, the popular science resource 22century, and the banki.ru portal. To acquaint the neural network with program code, the developers also added material from GitHub and StackOverflow.

ruGPT-3 Large was trained on the Christofari supercomputer using the ML Space cloud data science platform from SberCloud, a company in the Sber ecosystem that provides cloud services.

Anyone can communicate with the neural network on a dedicated SberCloud page. To do so, you need to give the program a small “seed” – an unfinished sentence, say, or the beginning of a dialogue. The result cannot be predicted in advance: the model creates its answers on the fly, and they never repeat. The creators of the Russian-language version warn that the generated texts may be inaccurate or inappropriate; the purpose of the page is to satisfy the research interest of the scientific community.
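
Besides the demo page, Sberbank published ruGPT-3 checkpoints openly. Assuming the Hugging Face identifier sberbank-ai/rugpt3large_based_on_gpt2 under which the Large model was released (it has since moved to the ai-forever organization), generation from a seed looks roughly like this:

```python
# Generating text from a Russian "seed" with the published ruGPT-3 Large
# checkpoint. The model identifier reflects how Sberbank published it on
# Hugging Face at the time; it is now mirrored under ai-forever.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "sberbank-ai/rugpt3large_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

seed = "Лучший способ повысить продуктивность - это"
inputs = tokenizer(seed, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,   # sampling is why the answers never repeat
    top_p=0.95,
)
print(tokenizer.decode(outputs[0]))
```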

And indeed, the model does not always produce verified facts. It may, for example, suggest limiting your diet to 40–50 kcal per day (against the roughly 2,000 kcal doctors recommend for an adult) or eating “no more than one salad” a day.

Still, talking to ruGPT-3 is fascinating – especially about questions humanity has no clear answer to yet. The neural network is certain that “the best way to increase productivity is to fall in love.” And asked how to become happy, it reasonably remarks: “Happiness lies not in getting the desired thing, but in the desire itself.”

In addition to text, the Russian-language model can write program code. To do this, the “seed” must be written in one of the programming languages.
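
Such a seed might be nothing more than a comment and a function signature; the model continues source code the way it continues any other text (a hypothetical illustration):

```python
# A code-generation "seed": the model treats source code as ordinary text
# and simply continues it. The seed below is a hypothetical example.
code_seed = (
    "# Функция возвращает сумму чётных чисел списка\n"
    "def sum_even(numbers):\n"
)
# Passed to the same generate() call as above, the model would be
# expected to continue with something like:
#     return sum(n for n in numbers if n % 2 == 0)
```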

Initially the model was trained with 760 million parameters, but in the next version the count grew to 1.3 billion. That new version will soon be available on the SberCloud website as well.

The ruGPT-3 XL neural network, with its 1.3 billion parameters, currently ranks first in the Russian SuperGLUE benchmark for neural networks. Evaluated few-shot, without any task-specific training, the model improved most on the following tasks: choosing the best solution under given conditions (up 10% in accuracy over the previous 760-million-parameter version), answering questions about a text (up 3%), and machine reading, a test of understanding a text's overall meaning (up 32%).

How to use GPT-3 and ruGPT-3 XL

The most obvious option is natural language processing – computer analysis and synthesis of texts, that is, using the language model to create texts for commercial purposes. Several such solutions have already been launched on top of the OpenAI neural network, such as services for writing emails or advertisements.

The neural network has also performed well in a variety of chatbots. The GPT-3 engine powers the AI companion Replika, launched by a startup with Russian roots. The unusual social network AI Channels is also built on OpenAI's closed API. There you can chat with different versions of artificial intelligence, which the service's creators call “AI agents” – among them a virtual Albert Einstein and other great minds of humanity.

Several projects have adopted GPT-3 for semantic search across documents. Such search relies on the meaning of natural-language queries rather than keyword matching. The neural network helps, in particular, to search and analyze legal documents in databases, and it is used in plugins for searching individual sites.
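
The idea can be sketched with any text-embedding model: documents and the query are mapped to vectors and ranked by similarity of meaning, not by shared keywords. Here the sentence-transformers library stands in for GPT-3's own search endpoint; the principle is the same:

```python
# Semantic search in miniature: rank documents by cosine similarity of
# embeddings instead of keyword matching. sentence-transformers is used
# as a stand-in for GPT-3's search API for illustration only.
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "The tenant must vacate the premises within 30 days of notice.",
    "Quarterly revenue grew by 12 percent year over year.",
    "The contract may be terminated by either party in writing.",
]
query = "How can the agreement be cancelled?"  # no shared keywords needed

doc_emb = model.encode(docs)
query_emb = model.encode(query)
scores = util.cos_sim(query_emb, doc_emb)[0]

best = scores.argmax().item()
print(docs[best])  # -> the clause about terminating the contract
```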

Beyond texts, bots and search engines, the language model can solve programming tasks – including for users with no deep knowledge of the field. Developers have already shown several solutions that translate tasks from text into code. With GPT-3, for example, you can simplify collecting statistics about the users of a site or service: formulate in natural language what information you need, and the algorithm produces a ready-made piece of code for working with the database, as sketched below.
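
Such a request can be phrased literally as a prompt. A sketch (the table schema and the send_to_model() helper are invented for illustration):

```python
# Turning a natural-language request into database code via a prompt.
# The table schema and the send_to_model() helper are hypothetical.
prompt = (
    "Table: visits(user_id, page, visited_at)\n"
    "Task: write an SQL query that counts unique visitors per day "
    "for the last week.\n"
    "SQL:"
)
# sql = send_to_model(prompt)  # e.g. the completion call shown earlier
# A plausible completion:
#   SELECT DATE(visited_at) AS day, COUNT(DISTINCT user_id)
#   FROM visits
#   WHERE visited_at >= CURRENT_DATE - INTERVAL '7 days'
#   GROUP BY day;
```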

These are just a few of GPT-3's applications. Thanks to the model's versatility and flexibility, it can be used in dozens of more complex scenarios. The English-language version of the neural network is already built into customer support services, training platforms, and psychotherapy applications.

ruGPT-3 XL, the most powerful and advanced Russian neural network, is likewise slated for use in commercial text-generation products and solutions. Since ruGPT-3 XL runs in the SberCloud public cloud, all market participants will be able to use it.

As part of AI Journey 2020, the largest international conference on artificial intelligence and data analysis, Sber held the international AIJ Contest, which drew more than 1,000 data scientists from 43 countries. The competition included a special track, AI 4 Humanities: ruGPT-3, in which the authors of the most interesting and promising entries shared a prize fund of 2.5 million rubles. The project codebase is hosted on GitHub.

