Automated Transcription Pipeline

Project Specs

The video short includes a demo of the pipeline. For more information, including the repository and download instructions, see https://github.com/catatwork217

Speech-to-text automated transcription pipeline and end-to-end data management

OpenAI Whisper CLI

- Early iterations combined shell scripts with the OpenAI Whisper CLI in a Bash terminal to provide logging and on-demand speech-to-text transcription, producing plain-text, caption-ready, and JSON output files.
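
As a rough illustration of the on-demand step, the sketch below assembles a Whisper CLI invocation in Python. The helper name `build_whisper_command`, the `transcripts` output folder, and the `base` model are assumptions made for this example and are not taken from the project. Only the `--model`, `--output_dir`, and `--output_format` flags of the `whisper` command are relied on.

```python
import shlex
from pathlib import Path


def build_whisper_command(audio_path, output_dir="transcripts", model="base"):
    """Return the argument list for one Whisper CLI run (hypothetical helper)."""
    return [
        "whisper",
        str(Path(audio_path)),
        "--model", model,
        "--output_dir", str(output_dir),
        "--output_format", "all",  # asks Whisper for every format: txt, vtt, srt, tsv, json
    ]


if __name__ == "__main__":
    cmd = build_whisper_command("interview.mp3")
    # Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.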

Pipeline enhancement in Python

- Extended the pipeline in Python to manage transcription and record creation end to end, writing each transcription record to SQLite with word count, keyword metadata, and logging.
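
A minimal sketch of what record creation could look like with Python's built-in `sqlite3` module follows. The table name `transcripts`, its columns, and the frequency-based keyword rule are assumptions chosen for illustration; the project's actual schema and keyword logic may differ.

```python
import re
import sqlite3
from collections import Counter


def create_record(conn, source_file, text, top_n=5):
    """Insert one transcription record and return its row id (hypothetical helper)."""
    words = re.findall(r"[a-z']+", text.lower())
    # Naive keyword rule (an assumption): most frequent words longer than three letters.
    frequent = Counter(w for w in words if len(w) > 3).most_common(top_n)
    keywords = [word for word, _ in frequent]
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcripts (
               id INTEGER PRIMARY KEY,
               source_file TEXT NOT NULL,
               word_count INTEGER NOT NULL,
               keywords TEXT,
               transcript_text TEXT NOT NULL
           )"""
    )
    cur = conn.execute(
        "INSERT INTO transcripts (source_file, word_count, keywords, transcript_text) "
        "VALUES (?, ?, ?, ?)",
        (source_file, len(words), ",".join(keywords), text),
    )
    conn.commit()
    return cur.lastrowid
```

Calling `create_record(sqlite3.connect("transcripts.db"), "demo.mp3", text)` would store the transcript with its word count and keywords in a local database file.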

Data management in the pipeline

- Added data management capabilities such as duplicate validation, exception tracking, and record adjustment.
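
One possible way to implement duplicate validation is to hash each audio file and let a uniqueness constraint reject repeats, logging the exception when it happens. The sketch below assumes that approach; the `processed_audio` table, the `register_audio` helper, and the choice of SHA-256 are illustrative and not confirmed by the project.

```python
import hashlib
import logging
import sqlite3

logger = logging.getLogger("transcription")


def register_audio(conn, source_file, audio_bytes):
    """Return True if the file is new, False if its content was already processed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_audio ("
        "content_hash TEXT PRIMARY KEY, source_file TEXT NOT NULL)"
    )
    digest = hashlib.sha256(audio_bytes).hexdigest()
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO processed_audio (content_hash, source_file) VALUES (?, ?)",
                (digest, source_file),
            )
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY rejected a repeated hash: log it and skip the file.
        logger.warning("Duplicate audio skipped: %s", source_file)
        return False
```

Hashing the file content rather than comparing file names means a renamed copy of the same recording is still detected as a duplicate.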

Advanced data management in PostgreSQL & pgAdmin

- Integrated end-to-end transcription data management in a PostgreSQL database into the Python script, enabling server configuration, word search across table records, and analysis.
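
The word search and analysis step might look something like the sketch below, which uses PostgreSQL full-text search through a parameterized query. It assumes a `transcripts` table with `id`, `source_file`, `word_count`, and `transcript_text` columns, and a DB-API connection such as one returned by `psycopg2.connect`. The table layout and both function names are hypothetical.

```python
# Assumed schema: transcripts(id, source_file, word_count, transcript_text).
SEARCH_SQL = """
    SELECT id, source_file, word_count
    FROM transcripts
    WHERE to_tsvector('english', transcript_text) @@ plainto_tsquery('english', %s)
    ORDER BY id
"""


def search_transcripts(conn, term):
    """Return the rows whose transcript text matches the search term."""
    with conn.cursor() as cur:
        # The term is passed as a parameter, never formatted into the SQL string.
        cur.execute(SEARCH_SQL, (term,))
        return cur.fetchall()


def word_count_summary(rows):
    """Return (number of matches, average word count) for the matched rows."""
    counts = [row[2] for row in rows]
    average = sum(counts) / len(counts) if counts else 0.0
    return len(counts), average
```

With a live server, `word_count_summary(search_transcripts(conn, "pipeline"))` would report how many transcripts mention the term and their average length. Server settings such as host, database name, and credentials would be supplied when the connection is created.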