Microsoft Azure Unveils AI Foundry, Revolutionizing Customization of OpenAI Large Language Models

Jordan Vega

January 02, 2025 · 6 min read

Microsoft Azure has announced the launch of AI Foundry, a groundbreaking toolkit designed to simplify the customization of OpenAI large language models (LLMs) for specific applications. This innovative solution empowers developers to fine-tune these powerful models, enhancing their performance and accuracy while reducing costs and latency.

Large language models, such as those offered through Azure OpenAI, are general-purpose tools for building generative AI-powered applications, including chatbots and agent-powered workflows. However, their effectiveness relies heavily on prompt engineering: crafting the instructions that shape a model's responses. While prompt engineering is essential, it has limitations, because the same prompt must be delivered with every request, alongside the user's query and any associated data. This can push against the maximum size of a model's context window, increasing costs and latency.

Azure AI Foundry addresses these limitations by providing a framework for fine-tuning models using Low-Rank Adaptation (LoRA). Rather than retraining every parameter, this technique freezes the base model's weights and trains a small set of low-rank adapter matrices, enabling higher-quality results with fewer tokens and a reduced risk of prompt overruns and incorrect results. By focusing the underlying model and tuning it to work with specific data, developers can achieve better outcomes with fewer resources.
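
To make the idea concrete, here is a minimal NumPy sketch of the general LoRA mechanism; it illustrates the technique itself, not Azure's internal implementation. The base weight matrix stays frozen, while two small low-rank factors, initialized so the adapter starts as a no-op, carry all the trainable parameters.

```python
import numpy as np

# Minimal illustration of LoRA's core idea (not Azure's implementation).
d_in, d_out, rank = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen base weights
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))                   # zero-initialized, so the
                                              # adapter starts as a no-op
scaling = 1.0 / rank

def forward(x):
    # Effective weights are W + B @ A; only B and A are trained, adding
    # 2*r*d parameters instead of the d*d in W itself.
    return (W + scaling * (B @ A)) @ x

x = rng.standard_normal(d_in)
print(forward(x).shape)  # (512,)
```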

To get started with Azure OpenAI fine-tuning, users need a supported Azure OpenAI model in a region that allows fine-tuning, along with an account that has the Cognitive Services OpenAI Contributor role to upload training data. Many Azure OpenAI models support fine-tuning, including GPT-3.5 Turbo and GPT-4o. Region support is limited, with North Central US and Sweden Central being the most suitable options.
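
If the role still needs to be granted, it can be assigned with the Azure CLI; the assignee and scope values below are placeholders for your own subscription and resource:

```bash
# Grant the role needed to upload fine-tuning data (placeholder IDs).
az role assignment create \
  --assignee "<user-or-service-principal-id>" \
  --role "Cognitive Services OpenAI Contributor" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<aoai-resource>"
```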

The fine-tuning process involves sourcing and preparing training and validation data, then using the Create custom model wizard in Azure AI Foundry. The wizard guides users through the basic steps of fine-tuning: uploading data and setting task parameters before running a training session. Formatting training data is a crucial step, as different models require specific formats. For GPT-3.5 Turbo or GPT-4o models, data needs to be in JSONL format, following the structure of the Azure OpenAI Chat Completions API.
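
Each line of the JSONL file holds one complete example conversation. A minimal, purely illustrative sample in the chat-completions format:

```jsonl
{"messages": [{"role": "system", "content": "You are a support assistant for Contoso hardware."}, {"role": "user", "content": "How do I reset my router?"}, {"role": "assistant", "content": "Hold the recessed reset button for ten seconds until the status light blinks, then release."}]}
{"messages": [{"role": "system", "content": "You are a support assistant for Contoso hardware."}, {"role": "user", "content": "Why is my status light red?"}, {"role": "assistant", "content": "A solid red light means the router cannot reach the internet; check the WAN cable first."}]}
```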

Microsoft recommends having thousands of examples to tune a model effectively, making tools like OpenAI's data preparation script useful for generating fine-tuning training sets from large data sets. The quality of training data is critical: poor-quality data can mistrain the model, reduce accuracy, and increase errors. Building a high-quality, clean data set requires both data science and subject matter expertise to construct the necessary prompts and expected answers.
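
Before uploading, it is worth sanity-checking the file. This short script is an illustrative sketch rather than Microsoft's or OpenAI's tooling, and the file name and rules are assumptions to adapt to your own data:

```python
import json

# Structural sanity check for a chat-completions JSONL training file.
path = "training_data.jsonl"
valid_roles = {"system", "user", "assistant"}

with open(path, encoding="utf-8") as f:
    for n, line in enumerate(f, start=1):
        if not line.strip():
            continue                      # tolerate blank lines
        example = json.loads(line)        # raises on malformed JSON
        messages = example["messages"]
        assert messages, f"line {n}: empty conversation"
        assert all(m["role"] in valid_roles for m in messages), \
            f"line {n}: unexpected role"
        assert messages[-1]["role"] == "assistant", \
            f"line {n}: example must end with the assistant's answer"

print("training file looks structurally valid")
```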

Once the prerequisites are in place, users can employ the Create custom model tool in Azure AI Foundry to initiate the fine-tuning process. This involves selecting the training data, optionally uploading validation data (formatted as JSONL, like the training data), and setting tuning parameters. Validation data can be useful but is not required, and users can skip this stage if they haven't created a suitable data set.
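
The same workflow can also be driven programmatically. Here is a sketch using the openai Python package against an Azure OpenAI resource; the endpoint, key, file names, and base-model version are placeholders to adapt:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                    # placeholder
    api_version="2024-10-21",
)

# Upload training (and optional validation) data, then start the job.
training = client.files.create(
    file=open("training_data.jsonl", "rb"), purpose="fine-tune"
)
validation = client.files.create(
    file=open("validation_data.jsonl", "rb"), purpose="fine-tune"
)

job = client.fine_tuning.jobs.create(
    model="gpt-4o-2024-08-06",      # a fine-tunable base model version
    training_file=training.id,
    validation_file=validation.id,  # optional; omit if you have none
)
print(job.id, job.status)
```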

The fine-tuning process is a batch operation that requires significant resources, and users may need to wait for their job to be queued. Once accepted, a run can take several hours, especially when working with large, complex models and large training data sets. Azure AI Foundry's tools allow users to monitor the status of a fine-tuning job, displaying results, events, and hyperparameters used.
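
Status can likewise be polled from code. A minimal sketch with a placeholder job ID, assuming the AzureOpenAI client configured in the earlier example:

```python
import time

# `client` is the AzureOpenAI client configured in the earlier sketch.
job_id = "<your-fine-tuning-job-id>"  # returned when the job was created

# Poll until the batch job reaches a terminal state; queued jobs can
# take hours on large models and training sets.
while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    print("status:", job.status)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

# Review the most recent training events (loss reports, checkpoints, ...).
for event in client.fine_tuning.jobs.list_events(job_id, limit=10):
    print(event.created_at, event.message)
```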

Each pass through the training data produces a checkpoint, which is a usable version of the model with the current state of tuning. Users can evaluate these checkpoints with their code before the fine-tuning job completes, and they will always have access to the last three outputs to compare different versions before deploying their final choice.
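
Checkpoints are exposed through the same API surface. A brief sketch, again assuming the client configured earlier and a placeholder job ID:

```python
# `client` is the AzureOpenAI client configured in the earlier sketch.
checkpoints = client.fine_tuning.jobs.checkpoints.list("<your-fine-tuning-job-id>")
for cp in checkpoints:
    # Each checkpoint is a usable snapshot of the model mid-training.
    print(cp.step_number, cp.fine_tuned_model_checkpoint)
```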

Microsoft's AI safety rules apply to fine-tuned models: they are not made public unless their owner explicitly chooses to publish them, with testing and evaluation taking place in private workspaces. Training data remains private and is not stored alongside the model, reducing the risk of confidential data leaking through prompt attacks. Microsoft scans training data before use to ensure it doesn't contain harmful content and will abort a job if unacceptable content is found.

Once a model has been tuned and tested, users can find it in their Azure AI Foundry portal, ready for deployment as a standard AI endpoint using familiar Azure AI APIs and SDKs. A tuned model supports only one deployment at a time, and if a deployment goes unused for 15 days, it is removed and the model must be redeployed. Deployed models can run in any region that supports fine-tuning, not just the one used to train the model.
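
Calling the deployed model then works like any other Azure OpenAI deployment. A sketch with placeholder resource and deployment names; note that the model argument takes the deployment name, not the underlying base-model name:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_key="<your-api-key>",                                    # placeholder
    api_version="2024-10-21",
)

# `model` is the deployment name chosen at deployment time.
response = client.chat.completions.create(
    model="<your-fine-tuned-deployment>",
    messages=[{"role": "user", "content": "How do I reset my router?"}],
)
print(response.choices[0].message.content)
```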

Microsoft supports continuous fine-tuning, treating existing tunings as a base model and running the same process using new training data. This enables more open-ended use cases, such as the preview of Direct Preference Optimization (DPO), which uses human preferences to manage tuning, with training data provided as sample conversations with "preferred" and "non-preferred" outputs.
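
Each DPO training example pairs an input conversation with a preferred and a non-preferred completion. A purely illustrative line, using the field names from OpenAI's published preference fine-tuning format; check the Azure preview documentation for the exact schema:

```jsonl
{"input": {"messages": [{"role": "user", "content": "Summarize the refund policy."}]}, "preferred_output": [{"role": "assistant", "content": "Refunds are available within 30 days of purchase with a valid receipt."}], "non_preferred_output": [{"role": "assistant", "content": "Refunds might be possible; contact someone in support."}]}
```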

The costs for fine-tuning a model vary by region and model. For example, tuning a GPT-4o model in North Central US costs $27.50 per 1 million training tokens, plus $1.70 per hour to host the model once training is complete. Inferencing is priced at $2.75 per million input tokens and $11 per million output tokens. While the costs may seem significant, they can be justified by improved accuracy and a reduced risk of errors and reputational damage.
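
As a quick back-of-the-envelope check using those published rates, here is a small calculation; the training volume, hosting duration, and traffic figures are hypothetical:

```python
# Hypothetical monthly estimate at the North Central US GPT-4o rates above.
train_tokens = 2_000_000          # e.g. two passes over a 1M-token data set
hosting_hours = 730               # roughly one month of continuous hosting
in_tokens, out_tokens = 50_000_000, 10_000_000  # assumed monthly traffic

training_cost = train_tokens / 1_000_000 * 27.50        # $55.00
hosting_cost = hosting_hours * 1.70                     # $1,241.00
inference_cost = (in_tokens / 1_000_000 * 2.75
                  + out_tokens / 1_000_000 * 11.00)     # $247.50
print(f"first-month total: ${training_cost + hosting_cost + inference_cost:,.2f}")
# first-month total: $1,543.50
```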

In conclusion, Azure AI Foundry's fine-tuning capabilities revolutionize the customization of OpenAI large language models, enabling developers to focus these powerful models on specific responses and applications. By reducing costs, latency, and the risk of incorrect results, this innovative solution has the potential to transform the development of generative AI-powered applications.
