PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
loicgrobol
zeldarose

Train transformer-based models.

1K 28 3
vincentzed
decontaminate

`decon`, but with Python API bindings.

803 2 0
alea-institute
alea-preprocess

Accessible, efficient data preprocessing library for pretrain and SFT datasets, including KL3M

766 1 0
4thel00z
ccdown

A Rust-based, resumable downloader CLI and Python library for Common Crawl data

599 0 0
lpalbou
forgellm

A comprehensive toolkit for end-to-end continued pre-training, fine-tuning, monitoring, testing and publishing of language models with MLX-LM

530 4 0
MatthewK78
rose-opt

🌹 Rose: Range-Of-Slice Equilibration PyTorch optimizer. Stateless optimization through range-normalized gradient updates.

519 58 4
PaddlePaddle
fleet-x

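PaddlePaddle large-model development suite, providing an end-to-end development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.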

458 479 165
a-r-j
proteinworkshop

Benchmarking framework for protein representation learning. Includes a large number of pre-training and downstream task datasets, models and training/task utilities. (ICLR 2024)

376 271 22
dean0x
autoevolve

Multi-agent research competition orchestrator for autoresearch

275 5 1
kyo-takano
chinchilla

A toolkit for scaling law research ⚖

254 63 4
dean0x
autojudge

Smarter experiment evaluation for autoresearch — replaces eyeballing val_bpb with statistical verdicts

200 5 1
dean0x
autosteer

Research direction generator for autoresearch — analyzes experiment history and suggests next steps

195 5 1
anto18671
lumenspark

Lumenspark: A Transformer Model Optimized for Text Generation and Classification with Low Compute and Memory Requirements.

195 1 0
AI-sandbox
iltm

iLTM: Integrated Large Tabular Model

188 19 0
PaddlePaddle
fleet-lightning

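PaddlePaddle large-model development suite, providing an end-to-end development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.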

183 479 165
open-sciencelab
graphg

GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

144 1K 82
adrien-lagesse
ngab

Benchmarking and generating PE for GNNs via the Graph Alignment task. Code for our paper: Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings

104 0 0
marian-nmt
sotastream

A library for data streaming and augmentation

103 21 4
PaddlePaddle
paddle-fleet

Distributed Training Package Based on PaddlePaddle

67 479 165
    • Data from PyPI, GitHub, ClickHouse, and BigQuery