Inference Engine Python Packages

astroid

A common base representation of Python source code for pylint and other projects

50.6M 575 323

qai-hub-models

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

41K 1K 175

onediff

An out-of-the-box acceleration library for diffusion models

30K 2K 129

experta

Expert Systems for Python

12K 188 46

nocturnusai

Verified knowledge for AI agents. Compress context, extract and store facts, define rules, and ask questions — get deterministic answers with proof, not LLM guesses. Connect agents via MCP, Python SDK, TypeSc

8K 2 0

aphrodite-engine

Large-scale LLM inference engine

8K 2K 194

qai-hub-models-cli

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

7K 1K 175

onediffx

OneDiff: An out-of-the-box acceleration library for diffusion models.

7K 2K 129

krasis

Krasis is no longer distributed via PyPI. Install from GitHub: https://github.com/brontoguana/krasis

5K 447 22

friendli-client

[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI

5K 50 7

nobodywho

NobodyWho is an inference engine that lets you run LLMs locally and efficiently on any device.

4K 861 56

neurobrix

Universal Deep Learning Inference Engine — execute any AI model without model-specific code

3K 8 1

fedml

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

3K 4K 766

m5-infer

Extraordinary speed, extraordinary quality — an MLX-based inference engine for Apple Silicon.

2K 0 1

exxa

Exa - PyTorch

2K 26 4

zllm-zse

The inference engine the open-source world built for itself.

1K 151 2

para-attn

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

1K 426 45

kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

1K 902 107

opencv-python-inference-engine

Wrapper package for OpenCV with Inference Engine python bindings.

734 34 6

yolomosaic

A Python library for visualizing YOLO detections and segmented instances on large orthomosaic images, with the ability to generate shapefiles for GIS integration

721 0 0