Pretraining Python Packages

zeldarose

Train transformer-based models.

1K 28 3

decontaminate

`decon`, but with Python API bindings.

803 2 0

alea-preprocess

Accessible, efficient data preprocessing library for pretrain and SFT datasets, including KL3M

766 1 0

ccdown

A Rust-based, resumable downloader CLI and Python library for Common Crawl data

599 0 0

forgellm

A comprehensive toolkit for end-to-end continued pre-training, fine-tuning, monitoring, testing and publishing of language models with MLX-LM

530 4 0

rose-opt

🌹 Rose: Range-Of-Slice Equilibration PyTorch optimizer. Stateless optimization through range-normalized gradient updates.

519 58 4

fleet-x

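PaddlePaddle large-model development suite, providing end-to-end development toolchains for large language models, cross-modal large models, biocomputing large models, and other domains.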

458 479 165

proteinworkshop

Benchmarking framework for protein representation learning. Includes a large number of pre-training and downstream task datasets, models and training/task utilities. (ICLR 2024)

376 271 22

autoevolve

Multi-agent research competition orchestrator for autoresearch

275 5 1

chinchilla

A toolkit for scaling law research ⚖

254 63 4

autojudge

Smarter experiment evaluation for autoresearch — replaces eyeballing val_bpb with statistical verdicts

200 5 1

autosteer

Research direction generator for autoresearch — analyzes experiment history and suggests next steps

195 5 1

lumenspark

Lumenspark: A Transformer Model Optimized for Text Generation and Classification with Low Compute and Memory Requirements.

195 1 0

iltm

iLTM: Integrated Large Tabular Model

188 19 0

fleet-lightning

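PaddlePaddle large-model development suite, providing end-to-end development toolchains for large language models, cross-modal large models, biocomputing large models, and other domains.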

183 479 165

graphg

GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

144 1K 82

ngab

Benchmarking and generating positional encodings (PE) for GNNs via the Graph Alignment task. Code for our paper: Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings

104 0 0

sotastream

A library for data streaming and augmentation

103 21 4

paddle-fleet

Distributed Training Package Based on PaddlePaddle

67 479 165
