Lecture Notes:
1. Concepts
What are Model Metrics?
- Metrics are quantitative measures used to evaluate the performance of a model. They help assess how well a model is performing, both during training and after fine-tuning.
- They capture different aspects of performance, such as accuracy, precision, recall, and F1-score.
Why are Metrics Important?
- Metrics guide model improvements, provide insight into whether fine-tuning has been successful, and identify areas where the model can be further enhanced.
- The evaluation process helps determine if the model can generalize well to new, unseen data or if it’s overfitting to the training data.
Key Types of Metrics for NLP Models:
- Accuracy: The percentage of correct predictions over the total predictions.
- Precision: The proportion of positive predictions that are actually correct.
- Recall: The proportion of actual positives that were correctly predicted.
- F1-Score: The harmonic mean of precision and recall, providing a balance between the two.
- BLEU (Bilingual Evaluation Understudy): Measures n-gram precision of generated text against reference text (with a brevity penalty); used primarily for machine translation, and sometimes for other generation tasks such as summarization.
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Used for evaluating the quality of summaries by comparing the overlap of n-grams between the model output and a reference summary.
- Loss Function: Measures how far the model’s predictions are from the actual output. During fine-tuning, the goal is to minimize the loss.
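To make the first four metrics concrete, here is a minimal sketch (with made-up binary labels) that computes precision, recall, and F1 by hand and cross-checks the result with scikit-learn:
from sklearn.metrics import precision_score, recall_score, f1_score
# Toy ground-truth and predicted labels (1 = positive, 0 = negative); purely illustrative
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
# Count true positives, false positives, and false negatives
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
precision = tp / (tp + fp)  # correct positive predictions / all positive predictions
recall = tp / (tp + fn)     # correct positive predictions / all actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
print(precision, recall, f1)
# The same values via scikit-learn
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))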
2. Key Aspects of Metrics & Evaluation
- Choosing the Right Metric:
- The right metric depends on the task. For tasks like summarization, ROUGE and BLEU are often used. For classification tasks, accuracy, precision, and recall are more relevant.
- Overfitting vs. Generalization:
- Overfitting happens when a model performs well on training data but poorly on new data. Evaluating the model on both training and validation data helps detect overfitting.
- Generalization refers to how well the model performs on unseen data.
- Evaluation Datasets:
- Use validation and test datasets to evaluate the model.
- Validation Set: Used during training to tune hyperparameters and prevent overfitting.
- Test Set: Used only after training to evaluate the final performance of the model.
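Putting the validation/test distinction above into code, here is a minimal sketch using scikit-learn's train_test_split (the 80/10/10 split and the placeholder examples are assumptions, not a requirement):
from sklearn.model_selection import train_test_split
# Hypothetical list of (input, summary) examples
examples = [{"input": f"paper {i}", "output": f"summary {i}"} for i in range(100)]
# Hold out 20% of the data, then split the holdout evenly into validation and test sets
train_data, holdout = train_test_split(examples, test_size=0.2, random_state=42)
val_data, test_data = train_test_split(holdout, test_size=0.5, random_state=42)
print(len(train_data), len(val_data), len(test_data))  # 80 10 10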
- Model Evaluation Pipeline:
- Step 1: Prepare the evaluation dataset.
- Step 2: Generate predictions using the fine-tuned model.
- Step 3: Compare the model’s predictions to the true outputs using metrics.
3. Implementation of Evaluation and Metrics
Prerequisites:
- Fine-tuned model (e.g., a PDF summarization model).
- Evaluation dataset (e.g., PDFs with summaries or web-scraped content).
Example: Evaluating a Fine-Tuned Model
Step 1: Set Up Metrics (Accuracy, Precision, Recall, F1, BLEU, ROUGE)
You’ll use scikit-learn for the traditional metrics (accuracy, precision, recall, F1), rouge-score for ROUGE, and nltk for BLEU.
pip install scikit-learn rouge-score nltk
Step 2: Generate Predictions
Assume you have a fine-tuned model that generates summaries for research papers. Here’s how to evaluate it:
from transformers import LlamaForCausalLM, LlamaTokenizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from rouge_score import rouge_scorer
# Load model and tokenizer
model = LlamaForCausalLM.from_pretrained("./fine_tuned_model")
tokenizer = LlamaTokenizer.from_pretrained("./fine_tuned_model")
# Define evaluation data (text of research papers and their corresponding summaries)
eval_data = [
    {"input": "Research paper content 1", "output": "Summary of paper 1"},
    {"input": "Research paper content 2", "output": "Summary of paper 2"},
    # Add more samples for evaluation
]
# Generate predictions using the fine-tuned model
def generate_summary(input_text):
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True)
    summary_ids = model.generate(inputs["input_ids"], max_new_tokens=100, num_beams=2, early_stopping=True)
    # For a causal LM the generated ids begin with the prompt, so slice it off before decoding
    summary = tokenizer.decode(summary_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return summary
predictions = [generate_summary(d['input']) for d in eval_data]
actuals = [d['output'] for d in eval_data]
Step 3: Calculate Evaluation Metrics
Now, let’s calculate some key metrics.
- Accuracy:
- Check whether the generated summary exactly matches the reference summary; for free-form text this is a very strict criterion, so expect values near zero.
# Simple exact match accuracy
accuracy = accuracy_score(actuals, predictions)
print(f"Accuracy: {accuracy:.4f}")
- Precision, Recall, F1-Score:
- These apply when the outputs are class labels (binary or multi-class) rather than free text; in that case compute precision, recall, and F1 with a suitable averaging scheme.
precision = precision_score(actuals, predictions, average="macro")
recall = recall_score(actuals, predictions, average="macro")
f1 = f1_score(actuals, predictions, average="macro")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")
- ROUGE Score:
- ROUGE scores compare the overlap between the model’s generated summary and the reference summary.
# Using the rouge_score library
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge_scores = [scorer.score(actual, pred) for actual, pred in zip(actuals, predictions)]
# Print ROUGE scores
for i, score in enumerate(rouge_scores):
    print(f"Example {i+1}: ROUGE-1: {score['rouge1'].fmeasure:.4f}, ROUGE-2: {score['rouge2'].fmeasure:.4f}, ROUGE-L: {score['rougeL'].fmeasure:.4f}")
- BLEU Score:
- BLEU is commonly used for evaluating machine translation or text generation tasks.
from nltk.translate.bleu_score import sentence_bleu
# Compute BLEU score
bleu_scores = [sentence_bleu([actual.split()], pred.split()) for actual, pred in zip(actuals, predictions)]
print(f"BLEU Score: {sum(bleu_scores) / len(bleu_scores):.4f}")
Step 4: Visualize the Results (Optional)
Visualizing the performance of your model can give you a clearer understanding of its strengths and weaknesses.
import matplotlib.pyplot as plt
# Example: Plot ROUGE Scores for different examples
rouge_1_scores = [score['rouge1'].fmeasure for score in rouge_scores]
rouge_2_scores = [score['rouge2'].fmeasure for score in rouge_scores]
rouge_L_scores = [score['rougeL'].fmeasure for score in rouge_scores]
plt.plot(rouge_1_scores, label='ROUGE-1')
plt.plot(rouge_2_scores, label='ROUGE-2')
plt.plot(rouge_L_scores, label='ROUGE-L')
plt.legend()
plt.title("ROUGE Scores for Each Example")
plt.xlabel("Example Index")
plt.ylabel("ROUGE Score")
plt.show()
4. Real-Life Example: Evaluating PDF Summarization
Consider a scenario where you have a fine-tuned model that summarizes research papers (PDFs).
- Objective: Evaluate how well the model generates summaries by comparing them to human-provided summaries.
- Metrics: Use accuracy, ROUGE, and BLEU to evaluate the performance. ROUGE is well suited to summarization because it captures recall of important content words, while BLEU's n-gram precision gives a rough signal of fluency and phrasing.
Step 1: Scrape and Label PDF Data
Use PyPDF2 to extract text from PDFs and manually label a few examples with reference summaries.
import PyPDF2
def extract_text_from_pdf(pdf_path):
    with open(pdf_path, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            # extract_text() can return None for pages with no extractable text
            text += page.extract_text() or ""
    return text
pdf_text = extract_text_from_pdf("sample_paper.pdf")
print(pdf_text[:500]) # Print first 500 characters of extracted text
Step 2: Fine-Tune and Evaluate Model
Fine-tune the model with PDF data and evaluate the performance using the metrics described above.
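As a rough end-to-end sketch (the reference summary below is a placeholder you would write yourself), the extracted PDF text can be fed straight into the evaluation code from Section 3:
# Pair the extracted PDF text with a human-written reference summary (placeholder)
reference_summary = "Human-written summary of sample_paper.pdf"
eval_data = [{"input": pdf_text, "output": reference_summary}]
# Reuse generate_summary and the ROUGE scorer defined in Section 3
prediction = generate_summary(eval_data[0]["input"])
scores = scorer.score(eval_data[0]["output"], prediction)
print(f"ROUGE-1: {scores['rouge1'].fmeasure:.4f}, ROUGE-L: {scores['rougeL'].fmeasure:.4f}")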
5. Code Summary
from transformers import LlamaForCausalLM, LlamaTokenizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu
import matplotlib.pyplot as plt
# Load model and tokenizer
model = LlamaForCausalLM.from_pretrained("./fine_tuned_model")
tokenizer = LlamaTokenizer.from_pretrained("./fine_tuned_model")
# Example: Evaluation Data
eval_data = [{"input": "Research paper content 1", "output": "Summary of paper 1"}]
# Generate predictions (generate_summary is the helper defined in Step 2)
predictions = [generate_summary(d['input']) for d in eval_data]
actuals = [d['output'] for d in eval_data]
# Evaluate with Accuracy, Precision, Recall, F1-Score (precision/recall/F1 are meaningful only when outputs are class labels)
accuracy = accuracy_score(actuals, predictions)
precision = precision_score(actuals, predictions, average="macro")
recall = recall_score(actuals, predictions, average="macro")
f1 = f1_score(actuals, predictions, average="macro")
# ROUGE Scores
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge_scores = [scorer.score(actual, pred) for actual, pred in zip(actuals, predictions)]
# BLEU Score
bleu_scores = [sentence_bleu([actual.split()], pred.split()) for actual, pred in zip(actuals, predictions)]
# Visualization of ROUGE Scores
plt.plot([score['rouge1'].fmeasure for score in rouge_scores], label='ROUGE-1')
plt.legend()
plt.show()
---
### **6. Summary**
- **Concepts Covered**: Metrics for evaluation, including accuracy, precision, recall, F1-score, ROUGE, BLEU, and loss functions.
- **Key Aspects**: Evaluation ensures that models generalize well to new data and do not overfit. Different metrics are suited for different types of tasks (summarization, classification).
- **Real-Life Example**: Evaluating a PDF summarization model using ROUGE, BLEU, and traditional metrics.
- **Implementation**: Code for calculating various metrics using Python and common libraries like `sklearn`, `rouge-score`, and `nltk`.
---
### **7. Homework/Practice**
1. Evaluate your fine-tuned model using the above metrics on a new test set of PDFs or web-scraped data.
2. Experiment with different evaluation strategies such as using multiple BLEU references or adjusting the length of summaries.
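As a starting point for exercise 2, sentence_bleu accepts several reference token lists for one candidate; here is a small sketch with made-up sentences:
from nltk.translate.bleu_score import sentence_bleu
# Two alternative human references for the same (hypothetical) generated summary
references = [
    "the model summarizes research papers accurately".split(),
    "the model produces accurate summaries of research papers".split(),
]
candidate = "the model summarizes research papers".split()
# n-gram counts in the candidate are clipped against the maximum count found in any reference
score = sentence_bleu(references, candidate)
print(f"Multi-reference BLEU: {score:.4f}")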