yaml文件/mnt/workspace/LLaMA-Factory/vllm_api.yaml

model_name_or_path: /mnt/workspace/LLaMA-Factory/models/Qwen2-VL-2B-Instruct
adapter_name_or_path: /mnt/workspace/LLaMA-Factory/saves/Qwen2-VL-2B-Instruct/lora/train_2025-01-19-16-13-00
template: qwen2_vl
finetuning_type: lora
infer_backend: vllm
vllm_enforce_eager: true

# llamafactory-cli chat lora_vllm.yaml
# llamafactory-cli webchat lora_vllm.yaml
# API_PORT=8000 llamafactory-cli api lora_vllm.yaml

终端运行

API_PORT=8000 llamafactory-cli api vllm_api.yaml

下载环境

重点:vllm 版本小于等于0.6.5

还有其他环境下载可以看报错一个个地下载

运行实例

本地下载

import requests
import base64

# 请求 URL
url = "http://localhost:8000/v1/chat/completions"

# 请求头
headers = {
    "Content-Type": "application/json"
}

# 请求体(包含模型、消息和图片数据)
data = {
    "model": "Qwen/Qwen2-VL-2B-Instruct",
    "messages": [{
        'role': 'user',
        'content': [{
            'type': 'text',
            'text': '请描述这张图片',
        }, {
            'type': 'image_url',
            "image_url": {
            "url": f"/mnt/workspace/LLaMA-Factory/data/mllm_demo_data/page1.jpg"
             }#使用本地图片的形式
        }],
    }]
}


# 发送 POST 请求
response = requests.post(url, headers=headers, json=data)

# 输出响应结果
if response.status_code == 200:
    print("请求成功!")
    print(response.json())
else:
    print(f"请求失败,状态码: {response.status_code}")
    print(response.text)

网络下载

import requests
import requests
 
url = "http://localhost:8000/v1/chat/completions"
 
# 请求头
headers = {
    "Content-Type": "application/json"
}
 
# 请求体(包含模型、消息和图片数据)
data = {
    "model": "Qwen/Qwen2-VL-2B-Instruct",
    "messages": [{
        'role':
        'user',
        'content': [{
            'type': 'text',
            'text': '中文描述一下这张图片',
        }, {
            'type': 'image_url',
            'image_url': {
                'url':
                'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg',
            },
        }],
    }]
}
 
# 发送 POST 请求
response = requests.post(url, headers=headers, json=data)
 
# 输出响应结果
if response.status_code == 200:
    print("请求成功!")
    print(response.json())
else:
    print(f"请求失败,状态码: {response.status_code}")
    print(response.text)

 

 

Logo

加入社区!打开量化的大门,首批课程上线啦!

更多推荐