MIL-OSI Russia: Stir, Don’t Shake: HSE and AIRI Accelerate Further Training of Neural Networks

Translation. Region: Russian Federation

Source: State University Higher School of Economics

Researchers from the Higher School of Economics and AIRI have proposed a method for quickly fine-tuning neural networks: parameters are processed in groups, which are then shuffled in an optimal way to improve their interaction. The method outperforms comparable approaches at image generation and analysis and at fine-tuning text models, while requiring less memory and training time. The results were presented at the NeurIPS 2024 conference.

The larger a neural network, the harder it is to quickly adjust it to a new task. Retraining a model from scratch is slow and expensive, so developers look for low-cost ways to adapt it to a specific task while preserving the overall quality of the original version.

One such approach is fine-tuning with orthogonal matrices: unlike alternative methods, they preserve important properties of the original model. But popular choices such as block-diagonal or butterfly matrices have drawbacks: they are either too restrictive or computationally expensive.

Researchers from the HSE Faculty of Computer Science and AIRI have proposed a new method for constructing such matrices, which they call "Group-and-Shuffle". Instead of working with all parameters at once, they divide them into small groups, process each group separately, and then shuffle the results together. This structure turned out to be both flexible and compact: it helps the model adapt to the task more accurately while requiring less computation and memory.
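The group-and-shuffle idea can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the block size, the perfect-shuffle permutation, and all function names are illustrative assumptions. It builds a block-diagonal orthogonal factor (the "groups"), a permutation (the "shuffle"), and a second block-diagonal factor, and checks that the product is still orthogonal:

```python
import numpy as np

def random_orthogonal(n, rng):
    # Random orthogonal block via QR decomposition
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def gs_matrix(n, block_size, rng):
    """Sketch of a group-and-shuffle style orthogonal matrix:
    block-diagonal orthogonal factor -> permutation -> second factor.
    (Illustrative; block sizes and permutation are assumptions.)"""
    k = n // block_size

    def block_diag():
        # Each small group gets its own independent orthogonal block
        out = np.zeros((n, n))
        for i in range(k):
            s = i * block_size
            out[s:s + block_size, s:s + block_size] = random_orthogonal(block_size, rng)
        return out

    # "Shuffle": a perfect-shuffle permutation that interleaves
    # elements across groups so the second factor mixes them
    perm = np.arange(n).reshape(k, block_size).T.reshape(-1)
    P = np.eye(n)[perm]
    return block_diag() @ P @ block_diag()

rng = np.random.default_rng(0)
M = gs_matrix(8, 2, rng)
# A product of orthogonal factors is orthogonal
assert np.allclose(M @ M.T, np.eye(8))
```

Because each factor only stores small dense blocks, such a matrix needs far fewer parameters than a full dense orthogonal matrix of the same size, while the shuffle lets information flow between groups.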

Based on GS matrices, the researchers developed GSOFT, a new implementation of orthogonal fine-tuning of neural networks. Unlike previous approaches, GSOFT uses fewer parameters yet maintains stability and training quality even with a small amount of data. The team also proposed a two-sided version of the method, Double GSOFT, which applies the transformation on both sides of the weights at once, increasing the flexibility and accuracy of the model.
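A small numpy sketch shows why orthogonal fine-tuning, one-sided or two-sided, preserves key properties of the pretrained weights. The weight matrix and variable names here are hypothetical, and dense QR-generated orthogonal factors stand in for the structured GS matrices described above:

```python
import numpy as np

rng = np.random.default_rng(1)

def orthogonal(n):
    # Dense random orthogonal matrix (stand-in for a structured GS factor)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

# Hypothetical pretrained weight matrix
W = rng.standard_normal((6, 4))

# One-sided orthogonal fine-tuning: multiply from one side
Q = orthogonal(6)
W_one = Q @ W

# Two-sided ("double") variant: orthogonal factors on both sides
L, R = orthogonal(6), orthogonal(4)
W_two = L @ W @ R

# Orthogonal transforms preserve the singular values of W,
# which is one way key properties of the original model survive
sv = np.linalg.svd(W, compute_uv=False)
assert np.allclose(np.linalg.svd(W_one, compute_uv=False), sv)
assert np.allclose(np.linalg.svd(W_two, compute_uv=False), sv)
```

The two-sided form has more trainable factors, which is the sense in which it trades a little extra computation for added flexibility.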

“We figured out how to form orthogonal matrices using just two matrices of a special type, rather than five or six as in previous approaches. This saves resources and training time,” explains Nikolai Yudin, a research intern at the Laboratory of Matrix and Tensor Methods in Machine Learning at HSE University.

The researchers tested the approach on three types of tasks. When fine-tuning the RoBERTa language model, the method performed better with a comparable number of parameters. In image generation, where the model must preserve the features of the original while adapting to the user’s request, GSOFT and Double GSOFT outperformed popular approaches such as LoRA and BOFT, while requiring less memory and training time.

The authors also tested the approach on convolutional neural networks, which are most often used to analyze images and video, for example in face recognition. They adapted GS matrices even for cases where the model needs to be highly robust to noise and distortion.

Please note: This information is raw content directly from the source of the information. It is exactly what the source states and does not reflect the position of MIL-OSI or its clients.

MIL OSI Russia News