PyPI Stats

Extract Python Packages

Python packages with the GitHub topic "extract", sorted by relevance and shown with stars and monthly downloads.
dlt-hub
dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

7.1M 5K 498
tavily-ai
tavily-python

The Tavily Python SDK allows for easy interaction with the Tavily API, offering the full range of our search, extract, crawl, map, and research functionalities directly from your Python programs. Easily integrate smart search, content extraction, and research capabilities into your applications, harnessing Tavily's powerful features.

4.9M 1K 152
lipoja
urlextract

URLExtract is a Python class for collecting (extracting) URLs from a given text by locating TLDs.

815K 277 64
nexB
extractcode

A mostly universal file extraction library and CLI tool to extract almost any archive in a reasonably safe way on Linux, macOS and Windows.

76K 38 23
OmkarPathak
pyresparser

A simple resume parser used for extracting information from resumes

7K 957 448
MicheleCotrufo
pdf2doi

A Python library and command-line tool to extract the DOI or other identifiers of a scientific paper from a PDF file.

5K 135 28
Breaka84
spooq

Spooq is a PySpark-based helper library for ETL data ingestion pipelines in data lakes.

4K 10 2
fedecalendino
pysub-parser

Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt).

4K 53 5
dlt-hub
dlt-core

dlt is an open-source python-first scalable data loading library that does not require any backend to run.

3K 5K 498
jlu5
icoextract

Extract icons from Windows PE files (.exe/.dll)

2K 150 9
hrushikeshrv
docxlatex

A Python library for extracting equations, text, and images from .docx files

2K 20 3
JoshuaMKW
pyisotools

Python library for working with GameCube ISOs (GCM)

1K 45 9
Mellow-Artificial-Intelligence
openextract

Extract structured data from documents, images, audio, and video using LLMs.

1K 16 2
Junbo-Zheng
miwear

Python Miwear tools for extracting and handling archives/logs

1K 5 1
0xMassi
webclaw

Python SDK for the Webclaw web extraction API

1K 1 0
camelot-dev
excalibur-py

A web interface to extract tabular data from PDFs

1K 2K 237
MicheleCotrufo
pdf2bib

A Python library and command-line tool to quickly and automatically generate BibTeX data from the PDF file of a scientific publication.

989 89 11
myifeng
article-parser

Extract an article or news story from a URL or HTML, parse the title and content, and output it in Markdown format.

823 50 6
dopstar
ftransc

The Audio Converter

756 17 1
xiaohuohumax
auto-unpack

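An automatic archive extraction tool that supports many archive formats. By combining plugins and orchestrating workflows, it can cover everyday extraction needs.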

750 21 4
vishaltanwar96
aadhaar-py

Extract embedded information from Aadhaar Secure QR Code.

582 15 1
SermetPekin
pdfsp

pdfsp is a Python package that extracts tables from PDF files and saves them to Excel. It also provides a simple Streamlit app for interactive viewing of the extracted data.

519 1 0
jlw4049
automaticdemuxer

Automatically demux tracks from media files

482 2 0
voidful
wikiext

Extract knowledge from a wiki dump file

449 6 3
Data from PyPI, GitHub, ClickHouse, and BigQuery