PyPI Stats

LLM Inference Python Packages

Python packages with the GitHub topic llm-inference, sorted by relevance. Each entry shows monthly downloads, GitHub stars, and forks.
ray-project
ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

52.9M 42K 8K
flashinfer-ai
flashinfer-python

FlashInfer: Kernel Library for LLM Serving

4.1M 6K 948
flashinfer-ai
flashinfer-cubin

FlashInfer: Kernel Library for LLM Serving

2.7M 6K 948
openvinotoolkit
openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

1.4M 10K 3K
bentoml
bentoml

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more.

197K 9K 959
openvinotoolkit
openvino-dev

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

170K 10K 3K
kserve
kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

115K 5K 1K
nomic-ai
gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

77K 77K 8K
ray-project
ant-ray-cpp-nightly

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

48K 42K 8K
ray-project
ray-cpp

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

40K 42K 8K
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

39K 386 42
feifeibear
yunchang

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference

38K 666 79
MekayelAnik
vllm-cpu

Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets

30K 6 0
bentoml
openllm

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.

17K 12K 807
monocle2ai
monocle-apptrace

Monocle is a framework for tracing GenAI app code. This repo contains the implementation of Monocle for GenAI apps written in Python.

17K 94 32
character-ai
prompt-poet

Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.

15K 1K 95
lightning-AI
litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

14K 13K 1K
codelion
optillm

Optimizing inference proxy for LLMs

12K 3K 266
ray-project
ant-ray-nightly

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

12K 42K 8K
predibase
lorax-client

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

8K 4K 312
neuralmagic
deepsparse

Sparsity-aware deep learning inference runtime for CPUs

6K 3K 192
vroomfondel
dgxarley

Ansible playbooks for a 3-node K3s cluster with NVIDIA DGX Spark nodes for distributed LLM inference

6K 1 0
intel
intel-extension-for-transformers

Repository of the Intel® Extension for Transformers

6K 2K 217
stratusadv
dandy

Dandy is an intelligence framework for developing programmatic solutions using artificial intelligence.

6K 4 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery