PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics.
| Org | Package | Description | Downloads | Stars | Forks |
|---|---|---|---|---|---|
| ray-project | ray | Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads. | 52.7M | 42K | 8K |
| vllm-project | vllm | A high-throughput and memory-efficient inference and serving engine for LLMs. | 9.4M | 79K | 16K |
| skypilot-org | skypilot | Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access and manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem). | 1.8M | 10K | 1K |
| skypilot-org | skypilot-nightly | Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access and manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem). | 482K | 10K | 1K |
| bentoml | bentoml | The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more. | 198K | 9K | 959 |
| vllm-project | vllm-tpu | A high-throughput and memory-efficient inference and serving engine for LLMs. | 143K | 79K | 16K |
| ray-project | ant-ray-cpp-nightly | Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads. | 49K | 42K | 8K |
| ray-project | ray-cpp | Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads. | 39K | 42K | 8K |
| mosecorg | mosec | A high-performance ML model serving framework, offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources. | 18K | 899 | 72 |
| bentoml | openllm | Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud. | 18K | 12K | 807 |
| skypilot-org | trainy-skypilot-nightly | Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access and manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem). | 17K | 10K | 1K |
| NVIDIA | tensorrt-llm | TensorRT LLM provides an easy-to-use Python API to define large language models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also contains components to create Python and C++ runtimes that orchestrate inference execution performantly. | 16K | 14K | 2K |
| ray-project | ant-ray-nightly | Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads. | 12K | 42K | 8K |
| predibase | lorax-client | Multi-LoRA inference server that scales to thousands of fine-tuned LLMs. | 8K | 4K | 312 |
| vllm-project | vllm-ascend | Community-maintained hardware plugin for vLLM on Ascend. | 7K | 2K | 1K |
| ray-project | ant-ray | Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads. | 6K | 42K | 8K |
| bentoml | openllm-core | Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud. | 5K | 12K | 807 |
| friendliai | friendli-client | [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI. | 5K | 50 | 7 |
| bentoml | openllm-client | Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud. | 4K | 12K | 807 |
| superduper-io | superduper-openai | Superduper allows users to work with OpenAI API models. | 3K | 5K | 538 |
| PaddlePaddle | fastdeploy-python | Deployment toolkit for deep learning models. | 2K | 4K | 744 |
| superduper-io | superduper-framework | Superduper: end-to-end framework for building custom AI applications and agents. | 2K | 5K | 538 |
| bentoml | bentoml-unsloth | The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more. | 2K | 9K | 959 |
| unaidedelf8777 | faster-outlines | Faster, lazy backend for the `Outlines` library. | 1K | 5 | 0 |
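As a hedged sketch of where figures like those above could come from: the public pypistats.org JSON API (`/api/packages/<name>/recent`) serves per-package download counts, which is an assumption about this site's data pipeline rather than a documented fact. The `abbreviate` helper below is hypothetical, added only to mimic the compact `52.7M`-style formatting shown in the results.

```python
import json
import urllib.request

# Assumption: the public pypistats.org JSON API; fetching requires network access.
PYPISTATS_RECENT = "https://pypistats.org/api/packages/{pkg}/recent"


def fetch_recent_downloads(pkg: str) -> dict:
    """Return recent download counts for a package,
    e.g. {"last_day": ..., "last_week": ..., "last_month": ...}."""
    with urllib.request.urlopen(PYPISTATS_RECENT.format(pkg=pkg)) as resp:
        return json.load(resp)["data"]


def abbreviate(n: int) -> str:
    """Hypothetical helper: render a count compactly, e.g. 52_700_000 -> '52.7M'."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            return f"{n / threshold:.1f}{suffix}".replace(".0", "")
    return str(n)


# Offline example: format the counts shown for ray-project/ray.
print(abbreviate(52_700_000), abbreviate(42_000), abbreviate(8_000))  # → 52.7M 42K 8K
```

Stars and forks would come from a separate source (e.g. the GitHub API's `stargazers_count` and `forks_count` repository fields), which matches the footer's credit to both PyPI and GitHub.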
Data from PyPI, GitHub, ClickHouse, and BigQuery.