PyPI Stats

Search Packages

Search Python packages by name, description, or GitHub topic, or filter them by metrics.
Each result shows the package's PyPI downloads, followed by the GitHub stars and forks of its upstream repository.

  • vllm-project / vllm: A high-throughput and memory-efficient inference and serving engine for LLMs (9.4M downloads · 79K stars · 16K forks)
  • basetenlabs / truss: The simplest way to serve AI/ML models in production (632K downloads · 1K stars · 102 forks)
  • vllm-project / vllm-omni: A framework for efficient model inference with omni-modality models (477K downloads · 5K stars · 867 forks)
  • basetenlabs / truss-transfer: The simplest way to serve AI/ML models in production (300K downloads · 1K stars · 102 forks)
  • bentoml / bentoml: The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! (198K downloads · 9K stars · 959 forks)
  • basetenlabs / baseten-performance-client: The simplest way to serve AI/ML models in production (182K downloads · 1K stars · 102 forks)
  • vllm-project / vllm-tpu: A high-throughput and memory-efficient inference and serving engine for LLMs (143K downloads · 79K stars · 16K forks)
  • kserve / kserve: Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes (114K downloads · 5K stars · 1K forks)
  • mlrun / mlrun: MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications. (53K downloads · 2K stars · 303 forks)
  • tensorchord / envd: 🏕️ Reproducible development environment for humans and agents (39K downloads · 2K stars · 167 forks)
  • mlrun / mlrun-pipelines-kfp-common: MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications. (36K downloads · 2K stars · 303 forks)
  • mlrun / mlrun-pipelines-kfp-v1-8: MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications. (36K downloads · 2K stars · 303 forks)
  • mosecorg / mosec: A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine (18K downloads · 899 stars · 72 forks)
  • clearml / clearml-serving: ClearML - Model-Serving Orchestration and Repository Solution (11K downloads · 164 stars · 50 forks)
  • NimbleBoxAI / nbox: The official python package for NimbleBox. Exposes all APIs as CLIs and contains modules to make ML 🌸 (9K downloads · 87 stars · 13 forks)
  • openvinotoolkit / ovmsclient: A scalable inference server for models optimized with OpenVINO™ (9K downloads · 870 stars · 251 forks)
  • predibase / lorax-client: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs (8K downloads · 4K stars · 312 forks)
  • vllm-project / vllm-ascend: Community maintained hardware plugin for vLLM on Ascend (7K downloads · 2K stars · 1K forks)
  • google / google-jetstream: JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome). (4K downloads · 432 stars · 64 forks)
  • notAI-tech / fastdeploy: Deploy DL/ML inference pipelines with minimal extra code. (4K downloads · 103 stars · 17 forks)
  • logicalclocks / hsml: Hopsworks Machine Learning Api 🚀 Model management with a model registry and model serving (3K downloads · 8 stars · 20 forks)
  • FedML-AI / fedml: FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale. (3K downloads · 4K stars · 766 forks)
  • aniketmaurya / chitra: A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment. (3K downloads · 234 stars · 37 forks)
  • kubeflow / kfserving: Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes (3K downloads · 5K stars · 1K forks)
Data from PyPI, GitHub, ClickHouse, and BigQuery.