Exploring Models with LoRA and Ray: A Journey into Advanced Model Search

Atsushi Hara
4 min read · Dec 18, 2023

LLMs (Large Language Models) have come to dominate recent machine learning thanks to their versatility, and within this realm there is a growing trend of using fine-tuned models. Various fine-tuning techniques have emerged, and among them, LoRA (Low-Rank Adaptation) has become a significant technology.

Why Is LoRA Important?

The importance of LoRA can be attributed to several factors:

  1. Enhancement of Transfer Learning: LoRA is a parameter-efficient form of transfer learning: the pre-trained weights stay frozen, and only small low-rank update matrices are trained to adapt the model to a new task. This increases the potential to adapt to new tasks with minimal labeled data.
  2. Effectiveness in Domain Adaptation: Domain adaptation involves applying a model trained in one domain to a different domain. Through its low-rank update matrices, LoRA lets the model adjust its learned features to the new domain, improving the effectiveness of domain adaptation.
  3. Efficient Use of Data: Even when labeled data is scarce, LoRA can use the available data effectively for model refinement, because only a small number of parameters are trained. This mitigates data sparsity and enables efficient learning.

In summary, LoRA is gaining attention as a technique that allows machine learning models to adapt flexibly and effectively to different tasks and domains.
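At its core, LoRA replaces the full weight update of fine-tuning with the product of two small matrices. Here is a minimal sketch in plain NumPy; the dimensions are illustrative, not taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 768, 768, 8          # weight matrix shape and LoRA rank (illustrative)

W = rng.normal(size=(d, k))    # frozen pre-trained weight
A = rng.normal(size=(r, k))    # trainable low-rank factor
B = np.zeros((d, r))           # initialized to zero, so training starts exactly from W

# The adapted weight is W + B @ A; only A and B receive gradients.
delta_W = B @ A
W_adapted = W + delta_W

full_params = W.size             # 768 * 768 = 589,824
lora_params = A.size + B.size    # 8 * 768 * 2 = 12,288, about 2% of the original
print(full_params, lora_params)
```

Because `B @ A` has rank at most `r`, the number of trainable parameters shrinks from `d * k` to `r * (d + k)`, which is where the data and compute efficiency above comes from.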

I've linked the paper's arXiv page here.

Challenges in LoRA

On the other hand, there are several challenges. I'll highlight two of them here.

  1. Hyperparameter Selection: LoRA introduces several hyperparameters (such as the rank, the scaling factor, and dropout), and finding optimal values can be challenging. Poor settings may degrade performance.
  2. Computational Cost: Although the LoRA update matrices themselves are small, the frozen base model is still a large language model, so each fine-tuning run remains expensive, and a hyperparameter search multiplies that cost. This can be a challenge when resources or time are constrained.

Addressing these challenges requires careful experimentation, adjustment of parameters, and a deep understanding of the domain.
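To make the first challenge concrete, the main LoRA hyperparameters appear directly in the `LoraConfig` of Hugging Face's peft library. The values below are common starting points for illustration, not recommendations:

```python
from peft import LoraConfig, TaskType

# The main LoRA hyperparameters a search has to cover.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification (e.g. review ratings)
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling factor applied to the update
    lora_dropout=0.1,            # dropout on the LoRA branch
    bias="none",                 # whether bias terms are also trained
)
```

Each of these values interacts with the learning rate and the task, which is exactly why a systematic search is worth automating.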

Here comes Ray!

Ray is a toolkit for building distributed applications in Python, best known as a distributed computing and reinforcement learning library. Here are some of its key features and use cases:

  1. Distributed Computing: Ray provides a framework for executing Python code on multiple machines or clusters, enabling efficient parallel processing and large-scale computations.
  2. Parallel Processing: Ray supports the parallel execution of functions and class methods, making it easy to achieve parallelism and accelerate processing.
  3. Reinforcement Learning: Ray is used for implementing and experimenting with reinforcement learning algorithms. A subpackage called RLlib offers a variety of advanced algorithms for different reinforcement learning tasks.
  4. Distributed Training: Ray includes features for distributing the training of machine learning models across multiple machines.

Ray is particularly notable in distributed machine learning: it provides a robust framework for distributing model training across multiple machines or clusters, accelerating training and making larger datasets tractable by parallelizing computation. With Ray's support for distributed computing, practitioners can seamlessly scale their machine learning workflows.
Ray can also be used for training with Hugging Face Transformers, and its characteristics make hyperparameter search for models fine-tuned with LoRA efficient. Furthermore, Ray makes it easy to change the learning environment: with minimal code modifications, one can shift from a single GPU to multiple GPUs or even a GPU cluster. This adaptability simplifies scaling up the training environment and offers flexibility in the choice of hardware for model training.

Let’s combine these elements and move forward!

The code for this project was developed using Google Colab, and here are the key points for this session:

  • The dataset used is the Yelp Review dataset.
  • Hugging Face’s peft library was employed for LoRA training. peft lets you apply LoRA to large language models by specifying a few configuration values.
  • Ray was used for the hyperparameter search; however, the Google Colab environment provides only a single GPU, so I relied exclusively on grid search.
  • To expedite the process, the parameter search space and the training data size were reduced.

For further details and explanations, I provided some comments in the notebook. Feel free to check the notebook for more comprehensive information.

Also, the notebook includes a link to open it in Google Colab, so please try running it from there as well.

After Finding the Parameters, Now What?

After identifying what appear to be the optimal hyperparameters, what should we do next? Various scenarios are conceivable: in some experiments, for instance, the goal might simply be to trace how the parameters and the LLM perform on the data. In this blog, we proceed to fine-tune the LLM with the discovered parameters, aiming to obtain a robust model. I also implemented the inference step and visualized the fine-tuned LLM's performance with a confusion matrix; for details, please refer to the Inference Time! section in the notebook within the repository.
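For reference, a confusion matrix of the kind used in the Inference Time! section can be produced with scikit-learn. The labels below are hypothetical stand-ins for the five Yelp star ratings, not outputs from the notebook:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 5-class star-rating task (class indices 0-4).
y_true = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
y_pred = [0, 1, 2, 3, 4, 3, 3, 2, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# Rows are true classes, columns are predicted classes; the diagonal counts
# correct predictions, so off-diagonal mass shows which classes get confused.
print(cm)
```

In the notebook the matrix is plotted rather than printed, which makes systematic confusions (e.g. adjacent star ratings) easy to spot at a glance.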

Conclusion

In this blog, I introduced an implementation approach that uses Ray Tune to search the hyperparameters that need to be explored when fine-tuning an LLM with LoRA. I have also summarized the entire workflow in a notebook, covering fine-tuning with the parameters obtained from the hyperparameter search. If you found this article helpful, please consider giving the GitHub repository a star!
