How to run inference and batch audio recognition with Xiaohongshu's FireRedASR model

Open-source project page: https://fireredteam.github.io/demos/firered_asr/ Its results are fairly close to Seed-ASR's.

m0_51602793 · Published 2025-09-19 19:58:23

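Notes on the code: the WeNet community is also adding support for this model. The current open-source release handles only the plain ASR task: there is no AST (speech translation), no punctuation, and no ITN (inverse text normalization). For a 1.1B-parameter model, it would be even better if it supported multiple speech tasks the way SenseVoice does (speech recognition plus speech translation, with punctuation).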

Creating and starting the environment:

cd  ./firered-asr/FireRedASR
conda create --name fireredasr python=3.10
conda activate fireredasr
pip install -r requirements.txt
nvidia-smi



# Manually activate Conda (assuming it is installed at /root/miniconda3)
source /root/miniconda3/bin/activate
# Verify that Conda is available
conda --version  # should print a version number (e.g. conda 23.11.0)
conda activate fireredasr

cd  ./firered-asr/FireRedASR
cd examples

Single-GPU inference script:

run.sh (AED model):

# # Specify the GPU and skip WER
# ./inference_fireredasr_aed-jlll_phone.sh --input /path/to/your_input.scp --cuda 1 --no_wer
# # Basic usage (default device CUDA 0; WER is computed unless --no_wer is given)
# ./run.sh --input /path/to/your_input.scp
# # Explicit GPU, with WER computation
# ./run.sh --input /path/to/your_input.scp --cuda 0

#!/bin/bash

# Usage example:
# ./run.sh --input /path/to/input.scp --cuda 0 --no_wer

# Default parameters
cuda_device=0
input_file=""
calculate_wer=1

# Parse command-line arguments
while [[ $# -gt 0 ]]; do
    case "$1" in
        --input) 
            input_file="$2"
            shift 2
            ;;
        --cuda) 
            cuda_device="$2"
            shift 2
            ;;
        --no_wer) 
            calculate_wer=0
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            exit 1
            ;;
    esac
done

# Check required arguments
if [[ -z "$input_file" ]]; then
    echo "An input file is required: --input"
    exit 1
fi

# Derive the output path from the input path (extension replaced with .aed)
output_file="${input_file%.*}.aed"

# Environment setup
export PATH="$PWD/fireredasr/:$PWD/fireredasr/utils/:$PATH"
export PYTHONPATH="$PWD/:$PYTHONPATH"

# Model path
model_dir="$PWD/pretrained_models/FireRedASR-AED-L"

# Input
wavs="--wav_scp $input_file"

# Decoding parameters
decode_args="
--batch_size 8 --beam_size 3 --nbest 1
--decode_max_len 0 --softmax_smoothing 1.25 --aed_length_penalty 0.6
--eos_penalty 1.0
"

# Create the output directory
mkdir -p "$(dirname "$output_file")"

# Run speech recognition
echo "Starting speech recognition (CUDA:$cuda_device)..."
CUDA_VISIBLE_DEVICES=$cuda_device \
speech2text.py --asr_type "aed" \
--model_dir "$model_dir" \
$decode_args \
$wavs \
--output "$output_file"

# Compute WER only when it has not been disabled
if [[ $calculate_wer -eq 1 ]]; then
    echo "Computing WER..."
    ref="${input_file%.*}.text"  # assumes the reference transcript sits next to the input file
    wer.py --print_sentence_wer 1 --do_tn 1 --rm_special 1 \
    --ref "$ref" \
    --hyp "$output_file" > "${output_file}.wer" 2>&1
    tail -n8 "${output_file}.wer"
else
    echo "WER computation skipped"
fi
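
run.sh takes a list file through --input, so batch recognition starts with building that list. A minimal Python sketch, assuming the Kaldi-style layout of one `<utt_id> <wav_path>` pair per line; whether speech2text.py accepts exactly this layout is an assumption, and `build_wav_scp` is a hypothetical helper, not part of FireRedASR:

```python
from pathlib import Path


def build_wav_scp(wav_dir, scp_path):
    """Write one '<utt_id> <absolute_wav_path>' line per .wav file found under wav_dir."""
    lines = []
    for wav in sorted(Path(wav_dir).rglob("*.wav")):
        # Assumption: the file name without its extension is a unique utterance id.
        lines.append(f"{wav.stem} {wav.resolve()}")
    Path(scp_path).write_text("".join(f"{ln}\n" for ln in lines), encoding="utf-8")
    return len(lines)
```

With `--input data/wav.scp` (a hypothetical path), run.sh writes its hypotheses to data/wav.aed and, unless --no_wer is passed, looks for the reference transcript at data/wav.text.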

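Launching every GPU on one machine in a single run:

#!/bin/bash

file_path="/split_data_55/"
file_name="wav.scp.ok"
job_id=${1:-0}   # index of this job; defaults to 0 when no argument is given
hostgpus=$(nvidia-smi -L | grep -c GPU)
echo "hostgpus:$hostgpus"

for i in $(seq 0 $((hostgpus-1))); do
    # Each job covers hostgpus consecutive shards, so shard indices never overlap across jobs
    cur_input="${file_path}${file_name}.$((job_id*hostgpus+i)).scp"
    echo "GPU$i with $cur_input"
    ./run.sh --input "$cur_input" --cuda "$i" --no_wer &
    # To keep a per-GPU log, end the line above with:
    # > gpu_${i}_wav.$((job_id+i)).scp.log 2>&1 &
done
wait  # block until every background job has finished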
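
The launcher above expects the list to be pre-split into numbered shards named like wav.scp.ok.0.scp, wav.scp.ok.1.scp, and so on. A minimal Python sketch that produces shards with that naming pattern; `split_scp` is a hypothetical helper and the round-robin assignment is my own choice, not something the original post specifies:

```python
from pathlib import Path


def split_scp(scp_path, num_shards):
    """Split scp_path into num_shards files named '<scp name>.<k>.scp', k = 0..num_shards-1."""
    scp_path = Path(scp_path)
    text = scp_path.read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    shard_paths = []
    for k in range(num_shards):
        # Round-robin assignment keeps shard sizes within one line of each other.
        shard = lines[k::num_shards]
        out = scp_path.with_name(f"{scp_path.name}.{k}.scp")
        out.write_text("".join(f"{ln}\n" for ln in shard), encoding="utf-8")
        shard_paths.append(out)
    return shard_paths
```

Choosing the number of shards as (number of jobs) x (GPUs per job) gives every GPU exactly one shard.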

Pretrained model directory:

/pretrained_models# tree ./
./
├── FireRedASR-AED-L
│   ├── README.md
│   ├── cmvn.ark
│   ├── cmvn.txt
│   ├── config.yaml
│   ├── dict.txt
│   ├── gitattributes
│   ├── model.pth.tar
│   └── train_bpe1000.model
├── FireRedASR-LLM-L
│   ├── Qwen2-7B-Instruct -> ../Qwen2-7B-Instruct
│   ├── README.md
│   ├── asr_encoder.pth.tar
│   ├── cmvn.ark
│   ├── cmvn.txt
│   ├── config.yaml
│   ├── gitattributes
│   └── model.pth.tar
├── Qwen2-7B-Instruct
│   ├── LICENSE
│   ├── README.md
│   ├── config.json
│   ├── generation_config.json
│   ├── gitattributes
│   ├── merges.txt
│   ├── model-00001-of-00004.safetensors
│   ├── model-00002-of-00004.safetensors
│   ├── model-00003-of-00004.safetensors
│   ├── model-00004-of-00004.safetensors
│   ├── model.safetensors.index.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── README.md
└── bak

5 directories, 30 files
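
Before launching a long batch job it can help to confirm that a model folder matches the listing above. A minimal sketch; the file names are copied from the tree, but which of them the loader actually reads is not verified here, and `missing_model_files` is a hypothetical helper:

```python
from pathlib import Path

# File names taken from the directory listing above (not from the FireRedASR source code).
EXPECTED = {
    "aed": ["model.pth.tar", "cmvn.ark", "config.yaml", "dict.txt", "train_bpe1000.model"],
    "llm": ["model.pth.tar", "asr_encoder.pth.tar", "cmvn.ark", "config.yaml", "Qwen2-7B-Instruct"],
}


def missing_model_files(model_dir, asr_type):
    """Return the expected entries that are absent from model_dir."""
    model_dir = Path(model_dir)
    # Path.exists() follows symlinks, so a dangling Qwen2-7B-Instruct link is reported as missing.
    return [name for name in EXPECTED[asr_type] if not (model_dir / name).exists()]
```

An empty list means every listed entry is present; for the LLM model a non-empty result often points at a missing or broken Qwen2-7B-Instruct symlink.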

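Base setup, and problems encountered while fetching the code and configuring the environment:

Usage

Download the model files from Hugging Face and place them in the pretrained_models folder.

To use FireRedASR-LLM-L, you also need to download Qwen2-7B-Instruct and place it in the pretrained_models folder. Then go into the FireRedASR-LLM-L folder and run $ ln -s ../Qwen2-7B-Instruct

Setup

Create a Python environment and install the dependencies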

git clone https://github.com/FireRedTeam/FireRedASR.git
cd FireRedASR

conda create --name fireredasr python=3.10
conda activate fireredasr
pip install -r requirements.txt

Set the Linux PATH and PYTHONPATH

$ export PATH=$PWD/fireredasr/:$PWD/fireredasr/utils/:$PATH
$ export PYTHONPATH=$PWD/:$PYTHONPATH

Convert audio to 16 kHz, 16-bit, mono PCM format

ffmpeg -i input_audio -ar 16000 -ac 1 -acodec pcm_s16le -f wav output.wav
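
To find out which files in a batch still need the ffmpeg conversion above, the header of each WAV file can be inspected with Python's standard wave module. A minimal sketch; `is_16k_mono_pcm16` is a hypothetical helper name:

```python
import wave


def is_16k_mono_pcm16(wav_path):
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM."""
    # wave.open raises wave.Error for files that are not uncompressed PCM WAV.
    with wave.open(str(wav_path), "rb") as wf:
        return (
            wf.getframerate() == 16000
            and wf.getnchannels() == 1
            and wf.getsampwidth() == 2  # sample width in bytes, so 2 means 16 bits
        )
```

Files for which this returns False, or which raise wave.Error, are the ones to pass through ffmpeg.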

Quick start

$ cd examples
$ bash inference_fireredasr_aed.sh
$ bash inference_fireredasr_llm.sh

Command-line usage

$ speech2text.py --help
$ speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "aed" --model_dir pretrained_models/FireRedASR-AED-L
$ speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "llm" --model_dir pretrained_models/FireRedASR-LLM-L

Python usage

from fireredasr.models.fireredasr import FireRedAsr

batch_uttid = ["BAC009S0764W0121"]
batch_wav_path = ["examples/wav/BAC009S0764W0121.wav"]

# FireRedASR-AED
model = FireRedAsr.from_pretrained("aed", "pretrained_models/FireRedASR-AED-L")
results = model.transcribe(
    batch_uttid,
    batch_wav_path,
    {
        "use_gpu": 1,
        "beam_size": 3,
        "nbest": 1,
        "decode_max_len": 0,
        "softmax_smoothing": 1.25,
        "aed_length_penalty": 0.6,
        "eos_penalty": 1.0
    }
)
print(results)


# FireRedASR-LLM
model = FireRedAsr.from_pretrained("llm", "pretrained_models/FireRedASR-LLM-L")
results = model.transcribe(
    batch_uttid,
    batch_wav_path,
    {
        "use_gpu": 1,
        "beam_size": 3,
        "decode_max_len": 0,
        "decode_min_len": 0,
        "repetition_penalty": 3.0,
        "llm_length_penalty": 1.0,
        "temperature": 1.0
    }
)
print(results)
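
The Python API above can also drive batch recognition directly, without going through run.sh. A minimal sketch, assuming a list file with one `<utt_id> <wav_path>` pair per line and assuming `transcribe` returns a list with one entry per utterance; `read_scp`, `transcribe_in_batches` and `run_aed` are hypothetical helpers, and the decoding parameters are the AED values shown above:

```python
def read_scp(scp_path):
    """Parse '<utt_id> <wav_path>' lines (assumed format) into two parallel lists."""
    uttids, wav_paths = [], []
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split(maxsplit=1)
            if len(parts) == 2:  # skip blank or malformed lines
                uttids.append(parts[0])
                wav_paths.append(parts[1].strip())
    return uttids, wav_paths


def transcribe_in_batches(model, uttids, wav_paths, decode_args, batch_size=8):
    """Call model.transcribe on consecutive slices of at most batch_size utterances."""
    results = []
    for start in range(0, len(uttids), batch_size):
        end = start + batch_size
        results.extend(model.transcribe(uttids[start:end], wav_paths[start:end], decode_args))
    return results


def run_aed(scp_path, model_dir="pretrained_models/FireRedASR-AED-L"):
    # Imported lazily so the helpers above work without the package installed.
    from fireredasr.models.fireredasr import FireRedAsr

    model = FireRedAsr.from_pretrained("aed", model_dir)
    decode_args = {
        "use_gpu": 1,
        "beam_size": 3,
        "nbest": 1,
        "decode_max_len": 0,
        "softmax_smoothing": 1.25,
        "aed_length_penalty": 0.6,
        "eos_penalty": 1.0,
    }
    uttids, wav_paths = read_scp(scp_path)
    return transcribe_in_batches(model, uttids, wav_paths, decode_args)
```

Loading the model once and reusing it across batches avoids paying the checkpoint load time for every group of files.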

Problem 1:

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Solution:

Re-download the model file and make sure no network problem corrupts it during the download.
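
The error message shows that PyTorch is reading the checkpoint as a zip archive, so a damaged download can be detected without loading the model. A minimal sketch using the standard zipfile module; `looks_like_intact_checkpoint` is a hypothetical helper, and it only applies to checkpoints saved in PyTorch's zip-based format:

```python
import zipfile


def looks_like_intact_checkpoint(path):
    """Return True if path is a zip archive whose members all pass the CRC check."""
    # is_zipfile looks for the end-of-central-directory record at the end of the file,
    # which is exactly what a truncated download loses.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the name of the first corrupt member, or None if all are fine.
        return zf.testzip() is None
```

The CRC pass reads the whole file, so on a multi-gigabyte checkpoint it takes a while; comparing the file size with the size shown on the download page is a quicker first check.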

Problem 2:

WeightsUnpickler error: Unsupported global: GLOBAL argparse.Namespace was not an allowed global by default. Please use `torch.serialization.add_safe_globals([Namespace])` or the `torch.serialization.safe_globals([Namespace])` context manager to allowlist this global if you trust this class/function.

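Solution:

Allowlist the argparse.Namespace global.
If you want to keep using weights_only=True (the default since PyTorch 2.6), add argparse.Namespace to the allowlist.
In fireredasr.py, call torch.serialization.add_safe_globals([argparse.Namespace]) before the call to torch.load.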
