Introduction to Ludwig
Advances in Natural Language Processing (NLP) and Artificial Intelligence (AI) have produced Large Language Models (LLMs) that can understand and generate human-like text, enabling applications like chatbots and document summarization. However, to fully utilize their capabilities, they need to be fine-tuned for specific use cases. Ludwig, a low-code framework, is designed for creating custom AI models, including LLMs and deep neural networks. This article is a step-by-step guide to fine-tuning LLMs with Ludwig, from environment setup through training, evaluation, and deployment.
Learning Outcomes
- Understand the significance of fine-tuning Natural Language Processing (NLP) and Artificial Intelligence (AI) models for specific use cases.
- Learn about Ludwig, a low-code framework designed for creating custom AI models, including Large Language Models (LLMs) and deep neural networks.
- Explore Ludwig’s key features, including training, fine-tuning, hyperparameter optimization, model visualization, and deployment.
- Gain proficiency in preparing for LLM fine-tuning, including environment setup, data preparation, and YAML configuration.
- Master the steps involved in fine-tuning LLMs using Ludwig, including model training, evaluation, and deployment.
- Understand how to extend and adapt the fine-tuning process for various NLP tasks beyond instruction tuning, showcasing the flexibility of the Ludwig framework.
This article was published as a part of the Data Science Blogathon.
Understanding Ludwig: A Low Code Framework For LLM Fine Tuning
Ludwig, known for its user-friendly, low-code approach, supports a wide array of machine learning (ML) and deep learning applications. This flexibility makes it an ideal choice for developers and researchers aiming to build custom AI models without deep programming requirements. Ludwig’s capabilities include but are not limited to training, fine-tuning, hyperparameter optimization, model visualization, and deployment.
Key Features of Ludwig
- Training and Fine-Tuning: Ludwig supports a range of training paradigms, including full training and fine-tuning of pre-trained models.
- Model Configuration: Utilizing YAML files for configuration, Ludwig allows detailed specification of model parameters, making it highly customizable and flexible.
- Hyperparameter Tuning: Ludwig integrates tools for automatic hyperparameter optimization, enhancing model performance.
- Explainable AI: Tools within Ludwig provide insights into model decisions, promoting transparency.
- Model Serving and Benchmarking: Ludwig makes it easy to serve models and benchmark their performance under different conditions.
Preparing for Fine-Tuning
Before we start, let’s get familiar with Ludwig and its ecosystem. As introduced earlier, Ludwig is a low-code framework for building custom AI models, such as Large Language Models and other deep neural networks. Technically, Ludwig can be used to train and fine-tune any neural network and supports a wide range of machine learning and deep learning use cases. Ludwig also supports visualizations, hyperparameter tuning, explainable AI, model benchmarking, and model serving.
Ludwig uses a YAML file in which all configuration is specified: the model name, the type of task to perform, the number of epochs to run when fine-tuning, the hyperparameters for training and fine-tuning, quantization settings, and so on. Ludwig supports a wide range of LLM-focused tasks, such as zero-shot batch inference, RAG, adapter-based fine-tuning for text generation, and instruction tuning. In this article, we will fine-tune the Mistral 7B model to follow human instructions. We will also explore how to define a YAML configuration for Ludwig.
It’s critical to understand the prerequisites and the setup required:
- Environment Setup: Installing the necessary software and packages.
- Data Preparation: Selecting and preprocessing the appropriate datasets.
- YAML Configuration: Defining model parameters and training options in a YAML file.
- Model Training and Evaluation: Executing the fine-tuning and assessing model performance.
Detailed Steps for Fine-Tuning LLMs with Ludwig
Setting Up the Development Environment: I used a VSCode environment to run this code, but it can also be run in a Kaggle notebook, on a Jupyter server, or in Google Colab.
Step1: Install Necessary Packages
Install the packages below. The final command, which pins transformers, is only needed if you hit a Transformers version runtime error.
%pip install ludwig==0.10.0 ludwig[llm]
%pip install torch==2.1.2
%pip install PyYAML==6.0
%pip install datasets==2.18.0
%pip install pandas==2.1.4
%pip install transformers==4.30.2
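After installing, it can help to confirm which versions actually ended up in the environment, since a mismatched transformers release is the error this step warns about. The sketch below uses only the standard library; the helper name installed_versions is hypothetical, and the package list simply mirrors the install commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each distribution name."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


# Distribution names taken from the pip commands in this step.
PINNED = ["ludwig", "torch", "PyYAML", "datasets", "pandas", "transformers"]

for name, version in installed_versions(PINNED).items():
    print(f"{name}: {version or 'not installed'}")
```

If a reported version differs from the one you pinned, restarting the notebook kernel after reinstalling is usually the first thing to try.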
Step2: Import Necessary Libraries and Dependencies
import yaml
import logging
import torch
import datasets
import pandas as pd
from ludwig.api import LudwigModel
Step3: Data Preparation and Pre-Processing
For this guide, we will use the Alpaca dataset from Stanford, specifically designed for instruction-based fine-tuning of LLMs. The dataset, created using OpenAI’s text-davinci-003 engine, comprises 52,000 entries with columns for the instruction, an optional input that gives the task further context, and the LLM-generated output.
We’ll focus on the first 5,000 rows to keep computational demands manageable. The full dataset is loaded through Hugging Face’s datasets library and converted into a pandas DataFrame; the 5,000-row subset is taken when training starts.
data = datasets.load_dataset("tatsu-lab/alpaca")
df = pd.DataFrame(data["train"])
df = df[["instruction", "input", "output"]]
df.head()
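To make the three columns concrete, here is a small standard-library sketch that mimics the row layout with hand-written stand-in rows (they are illustrative, not actual Alpaca records). It shows that the input field can be empty and how a leading subset is taken; summarize_rows is a hypothetical helper.

```python
# Hand-written stand-ins with the same three fields as the Alpaca columns.
rows = [
    {"instruction": "Name two primary colors.", "input": "", "output": "Red and blue."},
    {"instruction": "Translate the word to French.", "input": "cat", "output": "chat"},
    {"instruction": "Write a one-line greeting.", "input": "", "output": "Hello there!"},
]


def summarize_rows(rows, limit):
    """Keep the first `limit` rows and count how many have an empty input."""
    subset = rows[:limit]
    empty_inputs = sum(1 for row in subset if not row["input"].strip())
    return {"rows": len(subset), "empty_input": empty_inputs}


print(summarize_rows(rows, limit=2))
```

With the pandas DataFrame from this step, the equivalent subset is df[:5000], which is what the training call later passes to Ludwig.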
Step4: Create YAML Configuration
Create a YAML configuration file named model.yaml to set up a model for fine-tuning using Ludwig. The configuration includes:
- Model Type: Identified as an LLM.
- Base Model: Uses ‘mistralai/Mistral-7B-Instruct-v0.2’ from Hugging Face’s repository, although local model checkpoints can also be specified.
- Input and Output Features: Defines ‘instruction’ and ‘output’ as text types for handling dataset inputs and model outputs respectively.
- Prompt Template: Specifies how each row’s instruction and input are formatted into the prompt the model receives.
- Text Generation Parameters: Sets the temperature to 0.1 to keep response generation close to deterministic and max_new_tokens to 64, balancing response completeness and training efficiency.
- Adapter and Quantization: Utilizes the LoRA adapter and 4-bit quantization to manage model size and computational efficiency.
- Data Preprocessing: Sets global_max_sequence_length to 512 to cap the length of tokenized sequences and uses a random split with probabilities 0.95, 0, and 0.05 for the training, validation, and test sets.
- Trainer Settings: Configures the model to fine-tune for one epoch using a batch size of 1 with 16 gradient accumulation steps, a paged_adam optimizer, and a cosine learning rate scheduler with a warmup phase.
This YAML configuration organizes and specifies all necessary parameters for effective model training and fine-tuning. For additional customization, refer to Ludwig’s documentation.
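To see what the prompt template described above does, the sketch below fills the {instruction} and {input} placeholders for a single row using plain Python string formatting. Ludwig handles this substitution itself during preprocessing; the render_prompt helper is hypothetical and only illustrates the idea.

```python
TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n"
    "### Instruction: {instruction}\n"
    "### Input: {input}\n"
    "### Response:"
)


def render_prompt(row, template=TEMPLATE):
    """Fill the template placeholders with the values from one dataset row."""
    return template.format(instruction=row["instruction"], input=row["input"])


example_row = {
    "instruction": "Name two famous authors from the 18th century.",
    "input": "",
}
print(render_prompt(example_row))
```

Because the template ends with "### Response:", the model's generated text is the continuation that follows it.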
Define the YAML Settings Inline
Below is an example of how to define these settings inline, as a YAML string that is parsed in Python:
import os
import logging

import yaml
from ludwig.api import LudwigModel

# Set your Hugging Face authentication token here
hugging_face_token = "<your_huggingface_api_token>"
os.environ["HUGGING_FACE_HUB_TOKEN"] = hugging_face_token

# Parse the inline YAML string into a Python dict that Ludwig accepts as a config
qlora_fine_tuning_config = yaml.safe_load(
"""
model_type: llm
base_model: mistralai/Mistral-7B-Instruct-v0.2

input_features:
  - name: instruction
    type: text

output_features:
  - name: output
    type: text

prompt:
  template: >-
    Below is an instruction that describes a task, paired with an input
    that provides further context. Write a response that appropriately
    completes the request.

    ### Instruction: {instruction}

    ### Input: {input}

    ### Response:

generation:
  temperature: 0.1
  max_new_tokens: 64

adapter:
  type: lora

quantization:
  bits: 4

preprocessing:
  global_max_sequence_length: 512
  split:
    type: random
    probabilities:
      - 0.95
      - 0
      - 0.05

trainer:
  type: finetune
  epochs: 1  # Typically, you want to set this to 3 epochs for instruction fine-tuning
  batch_size: 1
  eval_batch_size: 2
  optimizer:
    type: paged_adam
  gradient_accumulation_steps: 16
  learning_rate: 0.0004
  learning_rate_scheduler:
    decay: cosine
    warmup_fraction: 0.03
"""
)
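The trainer settings interact: with a batch size of 1 and 16 gradient accumulation steps, each optimizer update effectively sees 16 examples. The sketch below works through that arithmetic for the 5,000-row subset and shows a generic linear-warmup-plus-cosine-decay schedule. It is a textbook formulation under the stated assumptions, not Ludwig's internal scheduler, and the function name lr_at is hypothetical.

```python
import math

# Values copied from the configuration and the 5,000-row subset.
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
BASE_LR = 0.0004
WARMUP_FRACTION = 0.03
TRAIN_FRACTION = 0.95
NUM_ROWS = 5000

train_rows = round(NUM_ROWS * TRAIN_FRACTION)
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS
updates_per_epoch = math.ceil(train_rows / effective_batch)
warmup_updates = max(1, round(updates_per_epoch * WARMUP_FRACTION))


def lr_at(update, total=updates_per_epoch, warmup=warmup_updates, base_lr=BASE_LR):
    """Learning rate at a given optimizer update (assumed schedule shape)."""
    if update < warmup:
        # Linear ramp up to the base learning rate.
        return base_lr * (update + 1) / warmup
    # Cosine decay from base_lr down to zero over the remaining updates.
    progress = (update - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"training rows: {train_rows}, effective batch size: {effective_batch}")
print(f"optimizer updates per epoch: {updates_per_epoch}, warmup updates: {warmup_updates}")
for update in (0, warmup_updates, updates_per_epoch // 2, updates_per_epoch):
    print(f"update {update}: lr = {lr_at(update):.6f}")
```

Raising epochs to 3, as the comment in the configuration suggests, multiplies the total number of updates accordingly.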
Step5: LLM Fine Tuning with LoRA (Low Rank Adaptation)
To begin training, create a LudwigModel object, passing it the YAML configuration defined previously and a logging level to track the fine-tuning progress. Then call model.train().
Install the following transformers runtime if you get an error:
%pip install transformers==4.30.2
model = LudwigModel(
    config=qlora_fine_tuning_config,
    logging_level=logging.INFO
)
results = model.train(dataset=df[:5000])
With just two calls, we have started fine-tuning the LLM, using only the first 5,000 rows to save compute time and memory. Here, I used Kaggle’s P100 GPU as an accelerator, which you can also use to speed up fine-tuning.
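A rough calculation shows why LoRA with 4-bit quantization fits on a single GPU. The sketch below counts the parameters LoRA adds to one weight matrix and estimates weight storage at different bit widths. The 4096 x 4096 shape, rank 8, and a round 7 billion parameters are illustrative assumptions, and the estimate ignores quantization overhead, activations, and optimizer state.

```python
def lora_params(d_out, d_in, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # LoRA learns two low-rank factors: (d_out x rank) and (rank x d_in).
    return rank * (d_in + d_out)


def weight_memory_gb(num_params, bits):
    """Approximate weight storage in gigabytes (10**9 bytes), ignoring overhead."""
    return num_params * bits / 8 / 1e9


# Illustrative shapes: a 4096 x 4096 projection and rank 8 are assumptions.
full = 4096 * 4096
added = lora_params(4096, 4096, rank=8)
print(f"full matrix: {full:,} params, LoRA adds: {added:,} ({added / full:.2%})")

# Assumes a round 7 billion parameters for a "7B" model.
for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(7e9, bits):.1f} GB")
```

The frozen base weights dominate memory, so quantizing them to 4 bits matters far more than the small number of trainable adapter parameters.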
Step6: Evaluating the Model’s Performance
# Build a few test prompts with the same columns the prompt template expects
test_examples = pd.DataFrame([
    {
        "instruction": "Name two famous authors from the 18th century.",
        "input": "",
    },
    {
        "instruction": "Develop a list of possible outcomes of given scenario",
        "input": "A fire has broken out in an old abandoned factory.",
    },
    {
        "instruction": "Tell me what you know about mountain ranges.",
        "input": "",
    },
    {
        "instruction": "Compose a haiku describing the summer.",
        "input": "",
    },
    {
        "instruction": "Analyze the given legal document and explain the key points.",
        # Adjacent string literals are concatenated into one long string
        "input": (
            'The following is an excerpt from a contract between '
            'two parties, labeled "Company A" and "Company B": \n\n"Company A '
            'agrees to provide reasonable assistance to Company B in ensuring '
            'the accuracy of the financial statements it provides. '
            'This includes allowing Company A reasonable access to personnel and '
            "other documents which may be necessary for Company B's review. "
            'Company B agrees to maintain the document provided by '
            'Company A in confidence, and will not disclose the information '
            "to any third parties without Company A's explicit permission."
        ),
    },
])

# predict() returns a tuple; the first element is the predictions DataFrame
predictions = model.predict(
    test_examples,
    generation_config={"max_new_tokens": 64, "temperature": 0.1},
)[0]

for input_with_prediction in zip(
    test_examples['instruction'],
    test_examples['input'],
    predictions['output_response'],
):
    print(f"Instruction: {input_with_prediction[0]}")
    print(f"Input: {input_with_prediction[1]}")
    print(f"Generated Output: {input_with_prediction[2][0]}")
    print("\n\n")
Deploy the Fine-tuned Model to Hugging Face
Let us now deploy the fine-tuned model to Hugging Face. Follow the steps below:
Step1: Create a Model Repository on Hugging Face
- Navigate to the Hugging Face website and log in.
- Click on your profile icon and select “New Model.”
- Fill in the necessary details and specify a name for your model.
Step2: Generate a Hugging Face API Key
- Still on the Hugging Face website, click your profile icon, then go to “Settings.”
- Select “Access Tokens” and click on “New Token.”
- Choose “Write” access when generating the token.
Step3: Authenticate with Hugging Face CLI
- Open your command line interface.
- Use the following command to log in, replacing <API_KEY> with your generated API key:
huggingface-cli login --token <API_KEY>
Step4: Upload Your Model to Hugging Face
Use the command below, replacing <repo-id> with your model repository ID and <model-path> with the local path to your saved model:
ludwig upload hf_hub --repo_id <repo-id> --model_path <model-path>
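If you prefer to trigger the upload from a script, the command above can be assembled as an argument list. The sketch below only builds and prints the command; build_upload_command is a hypothetical helper, the repository ID and path are placeholders, and the check that a repository ID has the form user-or-org/model-name is an assumption about how your repository is named.

```python
import shlex


def build_upload_command(repo_id, model_path):
    """Return the `ludwig upload hf_hub` command as an argument list."""
    # Assumed format: exactly one slash separating namespace and model name.
    if repo_id.count("/") != 1 or not all(repo_id.split("/")):
        raise ValueError("repo_id should look like '<user-or-org>/<model-name>'")
    return ["ludwig", "upload", "hf_hub", "--repo_id", repo_id, "--model_path", model_path]


# Both values below are placeholders, not real repositories or paths.
command = build_upload_command("your-username/your-model", "path/to/saved/model")
print(shlex.join(command))
```

To actually run it, the list can be passed to subprocess.run(command, check=True) after logging in with the CLI as described in Step3.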
Extending and Adapting the Fine-Tuning Process
This section expands on how the fine-tuning process can be adapted and extended for various applications, showcasing the flexibility and robustness of the Ludwig framework.
The code and configurations provided can be adapted to a wide range of NLP tasks beyond instruction tuning. Here’s how you can modify the process:
- Data Source Flexibility: Adjust the data preparation step to incorporate different datasets as needed for your specific task.
# Huggingface datasets and tokenizers
from datasets import load_dataset
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace
- Task Customization: Modify the YAML configuration to reflect the new task requirements by changing the input and output features and adapting the prompt template as necessary.
- Model Selection and Adaptation: Choose a different base model from Hugging Face’s model repository that better suits the new task, adjusting the model parameters accordingly.
- Hyperparameter Optimization: Utilize Ludwig’s built-in tools for hyperparameter tuning to optimize the model further based on the new task’s specific needs.
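As a concrete illustration of the task customization point, the sketch below retargets a trimmed-down copy of the instruction-tuning configuration to a summarization task by swapping the feature names and the prompt template. The document and summary column names, the template wording, and the adapt_config helper are all hypothetical; the remaining keys mirror the configuration used earlier.

```python
import copy

# A trimmed-down stand-in for the instruction-tuning configuration.
base_config = {
    "model_type": "llm",
    "base_model": "mistralai/Mistral-7B-Instruct-v0.2",
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "prompt": {"template": "### Instruction: {instruction}\n### Response:"},
    "adapter": {"type": "lora"},
    "quantization": {"bits": 4},
}


def adapt_config(config, input_name, output_name, template):
    """Return a copy of `config` retargeted to different columns and prompt."""
    new_config = copy.deepcopy(config)  # leave the original untouched
    new_config["input_features"] = [{"name": input_name, "type": "text"}]
    new_config["output_features"] = [{"name": output_name, "type": "text"}]
    new_config["prompt"] = {"template": template}
    return new_config


# Hypothetical summarization task with `document` and `summary` columns.
summarization_config = adapt_config(
    base_config,
    input_name="document",
    output_name="summary",
    template="Summarize the following text.\n### Text: {document}\n### Summary:",
)
print(summarization_config["input_features"], summarization_config["output_features"])
```

The resulting dictionary can be passed to LudwigModel in the same way as the configuration parsed from YAML, provided the dataset contains the named columns.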
Conclusion
Ludwig’s low-code framework offers a streamlined pathway for fine-tuning Large Language Models (LLMs) to specific tasks, combining ease of use with powerful customization options. By utilizing Ludwig’s comprehensive feature set for model development, training, and evaluation, developers can create robust, high-performance AI models that are tailored to meet the demands of a wide array of real-world applications.
Key Takeaways
- Ludwig is a low-code framework designed for creating custom AI models, including Large Language Models (LLMs) and deep neural networks, making AI development more accessible to developers and researchers.
- Fine-tuning LLMs using Ludwig involves steps such as environment setup, data preparation, YAML configuration, model training, evaluation, and deployment.
- Ludwig offers key features such as training, fine-tuning, hyperparameter optimization, model visualization, and deployment, providing a comprehensive solution for AI model development.
- By leveraging Ludwig’s capabilities, developers can create robust and high-performance AI models tailored to specific use cases, such as document summarization, chatbots, and instruction-based tasks.
- The flexibility of Ludwig allows for the adaptation and extension of the fine-tuning process to various NLP tasks beyond instruction tuning, ensuring versatility in AI model development.
References and Further Reading
This extended guide provides a detailed walkthrough of the LLM fine-tuning process using Ludwig, covering both technical details and practical applications to ensure developers and researchers can fully leverage this powerful framework for their AI model development endeavors.
By Analytics Vidhya, May 8, 2024.