Extract phoneme-level timestamps from speech audio.
A Python library that generates speech data with transcriptions by collecting data from YouTube.