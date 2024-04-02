Running fine-tuned inference with LoRA adapters

Workers AI now supports fine-tuned inference with adapters trained with Low-Rank Adaptation. This feature is in open beta and free during this period.

We only support LoRAs for the following models (must not be quantized):
@cf/meta-llama/llama-2-7b-chat-hf-lora
@cf/mistral/mistral-7b-instruct-v0.2-lora
@cf/google/gemma-2b-it-lora
@cf/google/gemma-7b-it-lora

The adapter must be trained with rank r <= 8. You can check the rank of a pre-trained LoRA adapter in the adapter’s config.json file.

LoRA adapter files must be named adapter_config.json and adapter_model.safetensors exactly.

You can test up to 30 LoRA adapters per account.
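The file-name and rank limits above can be checked locally before uploading. The following is a minimal sketch, not an official tool: `check_adapter_folder` is a hypothetical helper, and it assumes the rank is stored under the `r` key of the config file, as adapters saved by the Hugging Face PEFT library do.

```python
import json
from pathlib import Path

# Values taken from the limits listed above.
MAX_RANK = 8
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def check_adapter_folder(folder):
    """Return a list of problems found in a local LoRA adapter folder."""
    folder = Path(folder)
    problems = []

    # Both files must exist under exactly these names.
    for name in REQUIRED_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing file: {name}")

    config_path = folder / "adapter_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text())
        # Assumption: the rank lives under the "r" key (PEFT convention).
        rank = config.get("r")
        if not isinstance(rank, int):
            problems.append("adapter_config.json has no integer 'r' field")
        elif rank > MAX_RANK:
            problems.append(f"rank r={rank} exceeds the limit of {MAX_RANK}")

    return problems
```

An empty list means the folder passed these checks; it does not confirm that the base model is one of the supported ones.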

​​ Choosing compatible LoRA adapters

​​ Finding open-source LoRA adapters

We have started a Hugging Face Collection that lists a few LoRA adapters that are compatible with Workers AI. Generally, any LoRA adapter that fits our limitations above should work.

​​ Uploading LoRA adapters

To run inference with LoRAs on Workers AI, you’ll need to create a new fine tune on your account and upload your adapter files. You should have an adapter_model.safetensors file with the model weights and an adapter_config.json file with your config information. Note that we only accept adapter files of these two types.

Right now, you can’t edit a fine tune’s asset files after you upload them. We will support this soon, but for now you will need to create a new fine tune and upload the files again if you would like to use a new LoRA.

Before you upload your LoRA adapter, you’ll need to edit your adapter_config.json file to include model_type as one of mistral, gemma, or llama, like below.

adapter_config.json
{
  "alpha_pattern": {},
  "auto_mapping": null,
  ...
  "target_modules": ["q_proj", "v_proj"],
  "task_type": "CAUSAL_LM",
  "model_type": "mistral"
}
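Editing the config by hand is easy to get wrong (a trailing comma makes the file invalid JSON), so the change can also be scripted. This is a minimal sketch under the assumption that the config is a plain JSON object; `set_model_type` is a hypothetical helper name, not part of any Cloudflare tooling.

```python
import json
from pathlib import Path

# The three values named in the text above.
ALLOWED_MODEL_TYPES = {"mistral", "gemma", "llama"}


def set_model_type(config_path, model_type):
    """Add or overwrite "model_type" in an adapter config and return the result."""
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"model_type must be one of {sorted(ALLOWED_MODEL_TYPES)}")

    path = Path(config_path)
    config = json.loads(path.read_text())
    config["model_type"] = model_type
    # json.dumps never emits a trailing comma, so the file stays valid JSON.
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `set_model_type("my-adapter/adapter_config.json", "mistral")` rewrites the file in place and keeps every other key untouched.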

You can create a finetune and upload your LoRA adapter via wrangler with the following commands:

wrangler CLI
npx wrangler ai finetune create <model_name> <finetune_name> <folder_path>

npx wrangler ai finetune list
┌──────────────────────────────────────┬─────────────────┬─────────────┐
│ finetune_id                          │ name            │ description │
├──────────────────────────────────────┼─────────────────┼─────────────┤
│ 00000000-0000-0000-0000-000000000000 │ test-lora       │             │
└──────────────────────────────────────┴─────────────────┴─────────────┘

​​ REST API

Alternatively, you can use our REST API to create a finetune and upload your adapter files. You will need a Cloudflare API Token with Workers AI: Edit permissions to make calls to our REST API, which you can generate via the Cloudflare Dashboard.

​​ Creating a fine-tune on your account

cURL
curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/ \
  -H "Authorization: Bearer {API_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "SUPPORTED_MODEL_NAME",
    "name": "FINETUNE_NAME",
    "description": "OPTIONAL_DESCRIPTION"
  }'
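The same request can be assembled in Python with only the standard library. This is a sketch that mirrors the cURL call above: `build_create_finetune_request` is a hypothetical helper, and the account id, token, and names passed to it are placeholders you supply. The function only builds the request; sending it is left as a commented-out step because the response shape is not described here.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_create_finetune_request(account_id, api_token, model, name, description=""):
    """Build (but do not send) the POST request that creates a fine-tune."""
    payload = {"model": model, "name": name, "description": description}
    return urllib.request.Request(
        url=f"{API_BASE}/accounts/{account_id}/ai/finetunes/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it (requires a real account id and API token):
# request = build_create_finetune_request(ACCOUNT_ID, API_TOKEN, MODEL, NAME)
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before spending an API call.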

​​ Uploading your adapter weights and config

You have to call the upload endpoint each time you want to upload a new file, so you usually run this once for adapter_model.safetensors and once for adapter_config.json. Make sure you include the @ before the path to your file.

You can use either the name or the id of the fine tune you created.

cURL
curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/{FINETUNE_ID}/finetune-assets/ \
  -H 'Authorization: Bearer {API_TOKEN}' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file_name=adapter_model.safetensors' \
  -F 'file=@{PATH/TO/adapter_model.safetensors}'
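For readers scripting the upload without cURL, the multipart body that `-F` produces can be built by hand. The sketch below uses only the standard library and sends the same two form fields as the cURL call, `file_name` and `file`. The helpers `encode_multipart` and `build_upload_request` are hypothetical names, and the `application/octet-stream` part type is an assumption, since the text above does not specify one.

```python
import uuid
import urllib.request
from pathlib import Path

API_BASE = "https://api.cloudflare.com/client/v4"


def encode_multipart(file_name, file_bytes, boundary=None):
    """Encode the file_name and file fields as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file_name"\r\n\r\n'
        f"{file_name}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + file_bytes + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(account_id, api_token, finetune_id, file_path):
    """Build (but do not send) the upload request for one adapter file."""
    path = Path(file_path)
    body, content_type = encode_multipart(path.name, path.read_bytes())
    return urllib.request.Request(
        url=f"{API_BASE}/accounts/{account_id}/ai/finetunes/{finetune_id}/finetune-assets/",
        data=body,
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": content_type},
        method="POST",
    )
```

Call `build_upload_request` once per file, matching the two-call pattern described above, and pass each result to `urllib.request.urlopen` to send it.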

​​ List fine-tunes in your account

You can call this method to confirm which fine-tunes you have created in your account.

cURL
curl -X GET https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/ \
  -H 'Authorization: Bearer {API_TOKEN}'

Example JSON output
{
  "success": true,
  "result": [
    [{
      "id": "00000000-0000-0000-0000-000000000",
      "model": "@cf/meta-llama/llama-2-7b-chat-hf-lora",
      "name": "llama2-finetune",
      "description": "test"
    },
    {
      "id": "00000000-0000-0000-0000-000000000",
      "model": "@cf/mistralai/mistral-7b-instruct-v0.2-lora",
      "name": "mistral-finetune",
      "description": "test"
    }]
  ]
}
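In the example output above, the records sit inside a nested list under `result`, so a script that looks up a fine-tune id by name has to flatten them first. The sketch below assumes the response has the shape shown in the example and also accepts a flat list; `list_finetunes` and `find_finetune_id` are hypothetical helper names.

```python
import json


def list_finetunes(response_text):
    """Return the fine-tune records from a list response as a flat list of dicts."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("list request did not succeed")

    records = []
    for item in payload.get("result", []):
        # The example output nests the records one list deeper; accept both shapes.
        if isinstance(item, list):
            records.extend(item)
        else:
            records.append(item)
    return records


def find_finetune_id(records, name):
    """Return the id of the fine-tune with the given name, or None if absent."""
    for record in records:
        if record.get("name") == name:
            return record.get("id")
    return None
```

With the example output, `find_finetune_id(list_finetunes(text), "mistral-finetune")` returns that record's `id` string.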

​​ Running inference with LoRAs

To make inference requests and apply the LoRA adapter, you will need your model and finetune name or id. You should use the chat template that your LoRA was trained on, but you can try running it with raw: true and the messages template like below.