Fine-tuning for GPT-3.5 turbo is finally here! The latest update gives OpenAI users the ability to create their own custom GPT-3.5 model that has been tuned towards a particular dataset.

This feature means we can teach GPT-3.5 the language and terminology of our niche domain (like finance or tech), or train it to reply in Italian or to always respond with JSON. Fine-tuning is one of many ways to take our LLMs to the next level of performance.

In the past, we'd need to spend hours or even days tweaking prompts to get the behavior we needed, just to see it work 80% of the time at best. Now, we can gather examples of our ideal conversations and feed them to GPT-3.5 directly. These examples act as built-in "guidelines", replacing that frustrating prompt engineering process and in most cases producing much better results.

Fine-tuning GPT-3.5

Video walkthrough for fine-tuning gpt-3.5-turbo

We'll get started by diving right into fine-tuning. For those of you who are interested, we'll discuss the methodology behind building the dataset in an upcoming article.

First, let's take a look at the data format required by the OpenAI fine-tuning endpoints. It's a JSON lines format: each line is a JSON object with a single key "messages", whose value is a list of chat message dictionaries. The full list represents a single conversation.

{"messages": [{"role": "system", "content": "..."}, ...]}
{"messages": [{"role": "system", "content": "..."}, ...]}
...

Each message dictionary contains two keys:

  • The "role" — can be system, user, or assistant. Tells us where the message came from.
  • The "content" — simply the text content of the message.
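As a minimal sketch of this format, the snippet below builds one conversation record and checks it against the structure described above. The helper name `validate_record` and the sample messages are illustrative assumptions, not part of the OpenAI tooling, and the check is stricter than the API itself may be.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(line: str) -> bool:
    """Return True if one JSON lines row matches the layout described above."""
    record = json.loads(line)
    if not isinstance(record, dict):
        return False
    messages = record.get("messages")
    # "messages" must be a non-empty list of message dictionaries
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        # each message carries exactly the two keys "role" and "content"
        if set(message) != {"role", "content"}:
            return False
        if message["role"] not in ALLOWED_ROLES:
            return False
        if not isinstance(message["content"], str):
            return False
    return True


# Hypothetical example conversation, serialized as a single line
example = json.dumps({
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Can you summarize this paper?"},
        {"role": "assistant", "content": "Sure, here is a short summary."},
    ]
})
```

Running `validate_record` over every line of a file before uploading is a cheap way to catch malformed rows early.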

We have a prebuilt training dataset in this format stored on Hugging Face Datasets. We download it and store it in a variable named data.

When submitting this data to the OpenAI API we'll be loading it from file, so we save the dataset as a JSON lines file.

data.to_json("conversations.jsonl")

To upload the data we need the updated openai client, which we install with pip install openai==0.27.9. From there, we upload the file with openai.File.create.

We'll need the file ID generated by OpenAI, which is returned in the "id" field of the upload response.

It can take some time for the file to finish processing. If it hasn't completed, the next step will return an error (but you can just retry until it works). We now use our training file_id and the openai.FineTuningJob.create function to begin fine-tuning.
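The upload and job-creation steps can be sketched as follows. This is a minimal sketch, assuming `client` is the imported `openai` module (v0.27.9) with an API key already set. The function name `start_fine_tuning`, the retry count, and the wait time are hypothetical, and treating any exception as "file not ready yet" is a simplifying assumption.

```python
import time


def start_fine_tuning(client, path="conversations.jsonl",
                      base_model="gpt-3.5-turbo", retries=5, wait_seconds=30):
    """Upload a JSON lines file, then start a fine-tuning job on it."""
    # Upload the training file; the response carries the file ID under "id"
    with open(path, "rb") as f:
        upload = client.File.create(file=f, purpose="fine-tune")
    file_id = upload["id"]

    # Job creation can fail while the file is still processing, so retry
    for attempt in range(retries):
        try:
            job = client.FineTuningJob.create(
                training_file=file_id, model=base_model
            )
            return file_id, job["id"]
        except Exception:  # assumption: any error here means "not ready yet"
            if attempt == retries - 1:
                raise
            time.sleep(wait_seconds)
```

With the real client this would be called as `file_id, job_id = start_fine_tuning(openai)`. The returned `job_id` is what we pass to `openai.FineTuningJob.retrieve` when checking on the job.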

Our fine-tuned model will not be available for use until the fine-tuning job is complete. We can see in the response that the job is not complete from the two null fields for "finished_at" and "fine_tuned_model". The "fine_tuned_model" field is where we'll find the model ID that we'll use for calling our fine-tuned model later.

For now, we can check the status of our running job with:

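from time import sleep

# job_id is the "id" field returned by openai.FineTuningJob.create
while True:
    res = openai.FineTuningJob.retrieve(job_id)
    # "finished_at" stays null until the job is done
    if res["finished_at"] is not None:
        break
    print(".", end="")
    sleep(100)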

(Note: OpenAI will also send you an email once fine-tuning is complete)

After completion, we can see the fine-tuned model ID in "fine_tuned_model". We grab that value and use it as our new model identifier, replacing "gpt-3.5-turbo" in our code.
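Grabbing the model ID amounts to reading one field from the job response. Here's a small sketch; the helper name `get_fine_tuned_model` is hypothetical, and it only assumes the response can be indexed like a dictionary.

```python
def get_fine_tuned_model(job):
    """Return the fine-tuned model ID from a job response.

    Raises ValueError if the job has not produced a model yet, in which
    case "fine_tuned_model" is still null.
    """
    model_id = job["fine_tuned_model"]
    if model_id is None:
        raise ValueError("fine-tuning job has not produced a model yet")
    return model_id
```

Once the job has finished, `ft_model = get_fine_tuned_model(res)` gives us the identifier we pass as `model_name` later on.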

Using Fine-Tuned Models in LangChain

With our new model ready, let's see how to use it. We fine-tuned GPT-3.5 to be a better conversational agent, specifically focusing on its usage of a "Vector Search Tool". To test the model, we'll need to initialize a conversational agent that has access to this tool.

Conversational agents require multiple components: an LLM, conversational memory, and their tools. Let's initialize the LLM and memory first.

from langchain.chat_models import ChatOpenAI  # !pip install langchain==0.0.274
from langchain.memory import ConversationBufferWindowMemory

llm = ChatOpenAI(
    temperature=0.5,
    model_name=ft_model
)

memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,
    return_messages=True,
    output_key="output"
)

Note that the llm loaded here is our fine-tuned model. All we do to use it is switch our typical model_name value of "gpt-3.5-turbo" for our fine-tuned model ID. Next, we need to initialize our tool. The tool will retrieve documents from an external knowledge base (a Pinecone vector DB). Therefore, to run this, we do need to construct the knowledge base.

Building the Tool's Knowledge Base

Building the knowledge base is simple; we first get a free API key and use it to initialize our connection to Pinecone.

import pinecone  # !pip install pinecone-client

pinecone.init(
    api_key="YOUR_API_KEY",  # app.pinecone.io
    environment="YOUR_ENV"
)

We create a new index to store the information to be retrieved:

index_name = "llama-2-arxiv-papers"

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        metric="cosine",
        dimension=1536
    )
    
index = pinecone.Index(index_name)

Now we encode and insert the data from our dataset into our index:

# dataset: the source dataset of document chunks, loaded beforehand
data = dataset.to_pandas()

batch_size = 32

for i in range(0, len(data), batch_size):
    i_end = min(len(data), i+batch_size)
    batch = data.iloc[i:i_end]
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed: an embedding model whose output size matches the index (1536)
    embeds = embed.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone as (id, vector, metadata) tuples
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
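The loop above relies on two objects that aren't defined in this section: `dataset` and `embed`. To make the shape of each upserted record concrete, here's a self-contained sketch that uses plain dictionaries and a stand-in embedder in their place. `FakeEmbedder`, `batched`, `build_vectors`, and the sample rows are all hypothetical; a real run would use an actual embedding model that returns 1536-dimensional vectors.

```python
class FakeEmbedder:
    """Stand-in for a real embedding model; returns fixed-size dummy vectors."""

    def __init__(self, dimension=1536):
        self.dimension = dimension

    def embed_documents(self, texts):
        # One dummy vector per text, sized to match the index dimension
        return [[float(len(t))] * self.dimension for t in texts]


def batched(rows, batch_size=32):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def build_vectors(rows, embedder):
    """Turn rows into (id, vector, metadata) tuples ready for an upsert."""
    ids = [f"{r['doi']}-{r['chunk-id']}" for r in rows]
    embeds = embedder.embed_documents([r["chunk"] for r in rows])
    metadata = [
        {"text": r["chunk"], "source": r["source"], "title": r["title"]}
        for r in rows
    ]
    return list(zip(ids, embeds, metadata))
```

Each tuple pairs a unique ID with its vector and the metadata we want returned at query time, which is why the chunk text itself is stored under the `text` key.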

With that done, we can create our tool and initialize the agent.

Vector Search Tool and Conversational Agent

The code needed by our Vector Search Tool is stored in a separate chains.py file. We import it into our code and use it to initialize the tool, which we name vdb_tool.
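The chains.py file itself isn't reproduced here. As a rough idea of what a vector search tool function could look like, the sketch below embeds a query, searches the index, and joins the retrieved text. The function name `vector_search`, its `top_k` default, and the separator are assumptions for illustration, not the contents of chains.py. It relies only on an `embed_query` method on the embedding object and on the Pinecone index's `query` method.

```python
def vector_search(query, index, embed, top_k=3, text_key="text"):
    """Retrieve the top_k most similar chunks for a query as one string."""
    # Encode the query with the same model used to build the index
    xq = embed.embed_query(query)
    # Ask the index for the nearest vectors, including stored metadata
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)
    # Pull the chunk text back out of each match's metadata
    texts = [match["metadata"][text_key] for match in res["matches"]]
    return "\n---\n".join(texts)
```

A function like this could then be wrapped in a LangChain `Tool` with a name and description, giving the agent something to call when it decides a search is needed.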

Now we initialize the agent!

from langchain.agents import AgentType, initialize_agent

agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools=[vdb_tool],
    llm=llm,
    verbose=True,
    max_iterations=3,
    early_stopping_method="generate",
    memory=memory,
    return_intermediate_steps=True
)

With that, we're ready to begin talking to our new agent.

We can see the agent successfully using the vector search tool, formatting both JSON blocks (tool and final answer) correctly. To continue the conversation, we simply make more calls to the agent.
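The agent's output isn't reproduced here. As a rough illustration of the two JSON blocks, the sketch below assumes the format that LangChain's chat-conversational agent expects: a JSON object with an "action" key and an "action_input" key, where the final answer uses the action "Final Answer". The helper name `parse_agent_block` and both sample strings are hypothetical.

```python
import json


def parse_agent_block(text):
    """Parse one agent JSON block into an (action, action_input) pair."""
    block = json.loads(text)
    # A well-formed block has exactly these two keys
    if not isinstance(block, dict) or set(block) != {"action", "action_input"}:
        raise ValueError("expected keys 'action' and 'action_input'")
    return block["action"], block["action_input"]


# Hypothetical tool call: the agent asks the tool to run a search
tool_call = '{"action": "Vector Search Tool", "action_input": "Llama 2"}'

# Hypothetical final answer: the agent replies to the user
final_answer = '{"action": "Final Answer", "action_input": "Here is a summary."}'
```

A model that drifts from this format (extra keys, missing quotes, free text around the JSON) is exactly the failure that fine-tuning on well-formed conversations is meant to reduce.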

With that, we have our own fine-tuned GPT-3.5-Turbo model, accessible as easily as our standard gpt-3.5-turbo model. Stay tuned for further updates to this article, including our walkthrough for dataset building with GPT-4.
