PyPI Stats

Qwen3 Python Packages

Python packages with the GitHub topic qwen3, sorted by relevance. The three figures per entry are monthly downloads, GitHub stars, and forks.
vllm-project/vllm
    A high-throughput and memory-efficient inference and serving engine for LLMs
    8.9M · 79K · 16K

modelscope/ms-swift
    Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).
    176K · 14K · 1K

vllm-project/vllm-tpu
    A high-throughput and memory-efficient inference and serving engine for LLMs
    145K · 79K · 16K

hud-evals/hud-python
    OSS RL environment + evals toolkit
    89K · 248 · 57

n24q02m/qwen3-embed
    Lightweight ONNX inference for Qwen3 embedding and reranking models
    12K · 2 · 0

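The embed-then-rerank pattern this package serves can be sketched in pure Python. A toy illustration using bag-of-words vectors and cosine similarity; the real package runs actual Qwen3 ONNX models, and every function name below is hypothetical:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words token counts. A real model such as
    # a Qwen3 embedding model returns a dense float vector instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query: str, docs: list[str]) -> list[tuple[str, float]]:
    # Score every document against the query, highest similarity first.
    q = embed(query)
    scored = [(d, cosine(q, embed(d))) for d in docs]
    return sorted(scored, key=lambda x: x[1], reverse=True)

docs = ["qwen3 embedding models", "cooking pasta at home", "onnx runtime inference"]
print(rerank("qwen3 onnx inference", docs)[0][0])  # onnx runtime inference
```

Swapping `embed` for a real model call leaves the reranking logic unchanged, which is the point of the pattern.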
NVIDIA/nemo-automodel
    🚀 Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
    4K · 479 · 140

julep-ai/steadytext
    Deterministic text generation and embeddings with zero configuration
    1K · 43 · 2

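"Deterministic generation" means the same prompt always yields byte-identical output. A toy sketch of the idea using a prompt-seeded RNG; steadytext achieves this with a real model and fixed seeds, so everything below is an illustrative stand-in:

```python
import random

def hash_prompt(prompt: str) -> int:
    # Stable string hash. Python's built-in hash() is salted per
    # process, so it cannot be used for cross-run determinism.
    return sum(ord(c) * 31**i for i, c in enumerate(prompt)) % (2**32)

def generate(prompt: str, vocab: list[str], n: int = 5) -> str:
    # Toy "generation": the RNG is seeded from the prompt, so the same
    # prompt produces the same continuation on every run and machine.
    rng = random.Random(hash_prompt(prompt))
    return " ".join(rng.choice(vocab) for _ in range(n))

vocab = ["the", "cat", "sat", "on", "mat"]
assert generate("hello", vocab) == generate("hello", vocab)
```

The same property makes outputs cacheable and testable with plain equality assertions.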
thc1006/taiwan-asr-toolkit
    Production-grade Traditional Chinese / Taiwan Mandarin speech-to-text. Qwen3-ASR + MediaTek Breeze-ASR-25, hot-word injection, LLM polish, speaker diarization. RTF up to 1554x on RTX 5090, 56 TDD tests.
    759 · 1 · 0

jimnoneill/obsidian-umbra
    Turn any Obsidian vault into a Zettelkasten graph — locally, with a dozen years of notes in minutes. 4-phase pipeline: daily splitter (Qwen3-4B) → semantic backlinks (Potion-32M) → keyword linker → synonym clustering (GTE-large + HDBSCAN). Zero cloud.
    692 · 3 · 0

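The "keyword linker" phase of such a pipeline can be sketched in a few lines: invert notes into a keyword index, then link every pair of notes sharing a non-stopword keyword. A minimal sketch of that one phase only; the actual pipeline also uses semantic embeddings and clustering, and these names are hypothetical:

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for"}

def keyword_links(notes: dict[str, str]) -> dict[str, set[str]]:
    # Build keyword -> notes index, then connect notes that share a keyword.
    index = defaultdict(set)
    for name, text in notes.items():
        for tok in set(text.lower().split()) - STOPWORDS:
            index[tok].add(name)
    links = defaultdict(set)
    for names in index.values():
        for n in names:
            links[n] |= names - {n}
    return links

notes = {
    "a.md": "zettelkasten method for notes",
    "b.md": "graph view of the zettelkasten",
    "c.md": "pasta recipes",
}
print(keyword_links(notes)["a.md"])  # {'b.md'}
```

Indexing first keeps the pass linear in total tokens instead of comparing every note pair directly.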
zilliztech/deepsearcher
    (no description)
    680 · 8K · 752

vllm-project/vllm-xft
    A high-throughput and memory-efficient inference and serving engine for LLMs
    485 · 79K · 16K

vllm-project/vllm-acc
    A high-throughput and memory-efficient inference and serving engine for LLMs
    484 · 79K · 16K

vllm-project/vllm-hust
    A high-throughput and memory-efficient inference and serving engine for LLMs
    480 · 79K · 16K

GGUFloader/ggufloader
    GGUF model loader with Agentic Mode and a floating button | open source & offline. Supports Mistral, DeepSeek, Llama, Gemma, and Qwen.
    462 · 42 · 11

Keyvanhardani/german-ocr
    High-performance German document OCR - Local & Cloud with GPU/CPU support
    439 · 94 · 6

vllm-project/ai-dynamo-vllm
    A high-throughput and memory-efficient inference and serving engine for LLMs
    437 · 79K · 16K

vllm-project/wxy-test
    A high-throughput and memory-efficient inference and serving engine for LLMs
    394 · 2K · 1K

FluffyAIcode/kakeyalattice
    Discrete Kakeya cover for LLM KV cache: D4/E8 nested-lattice quantisation realising a Kakeya-style tube-cover over the direction sphere. 2.4x-2.8x compression at <1% perplexity loss on Qwen3, Llama-3, DeepSeek, GLM-4, Gemma. Drop-in transformers.DynamicCache. pip install kakeyalattice.
    386 · 7 · 2

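One building block named in that description, nearest-point quantisation to the D4 lattice (integer 4-vectors with even coordinate sum), has a well-known closed-form algorithm: round each coordinate, and if the sum is odd, re-round the coordinate with the largest rounding error. A toy sketch of just that step, not of the package's full Kakeya cover:

```python
def quantize_d4(x: list[float]) -> list[int]:
    # Nearest D4 lattice point (Conway-Sloane style decoder).
    # D4 = { v in Z^4 : sum(v) is even }.
    r = [round(v) for v in x]
    if sum(r) % 2 != 0:
        # Parity is wrong: flip the coordinate whose rounding error is
        # largest, moving it to its second-nearest integer. That repairs
        # parity at the smallest possible increase in squared error.
        errs = [v - q for v, q in zip(x, r)]
        i = max(range(4), key=lambda k: abs(errs[k]))
        r[i] += 1 if errs[i] > 0 else -1
    return r

p = quantize_d4([0.6, 0.1, -0.2, 0.1])
print(p)  # [0, 0, 0, 0] -- the all-zero point has even sum and least distance
```

Replacing each KV-cache vector with a lattice index rather than raw floats is what yields the compression; the quality of the lattice controls the quantisation error.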
vllm-project/nextai-vllm
    A high-throughput and memory-efficient inference and serving engine for LLMs
    378 · 79K · 16K

vllm-project/vllm-consul
    A high-throughput and memory-efficient inference and serving engine for LLMs
    306 · 79K · 16K

vllm-project/vllm-npu
    A high-throughput and memory-efficient inference and serving engine for LLMs
    281 · 79K · 16K

vllm-project/vllm-musa
    A high-throughput and memory-efficient inference and serving engine for LLMs
    279 · 79K · 16K

ChaokunHong/metascreener
    Open-source multi-LLM ensemble tool for systematic review workflows
    248 · 1K · 48

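The simplest form of multi-LLM ensembling for screening is a majority vote over per-model include/exclude decisions. A minimal sketch of that aggregation step; metascreener's actual aggregation may differ, and the model names here are placeholders:

```python
from collections import Counter

def ensemble_screen(votes: dict[str, str], threshold: float = 0.5) -> str:
    # Aggregate per-model decisions: include the record if the fraction
    # of models voting "include" exceeds the threshold.
    tally = Counter(votes.values())
    include_frac = tally["include"] / len(votes)
    return "include" if include_frac > threshold else "exclude"

votes = {"model_a": "include", "model_b": "include", "model_c": "exclude"}
print(ensemble_screen(votes))  # include
```

Raising the threshold trades recall for precision, which matters in systematic reviews where missed studies are costly.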
vllm-project/vllm-emissary
    A high-throughput and memory-efficient inference and serving engine for LLMs
    188 · 79K · 16K

    • Data from PyPI, GitHub, ClickHouse, and BigQuery