I want to experiment with a forced aligner and create visualizations for songs and poems, potentially showing an image for each word as it is sung or spoken and stringing those images together into a music video. The word-level timing data from the aligner could also be used in reverse: index the words found in recordings of particular people, then splice the matching clips together into a video of them speaking whatever text the user inputs.
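To make both ideas concrete, here is a minimal sketch in plain Python. It assumes the aligner's output has already been parsed into (word, start, end) records with times in seconds; real aligners each have their own output format, so the `AlignedWord` type, the `source` field, the function names, and the image paths are all hypothetical and only for illustration. It produces timing plans only; actual image or video rendering is left out.

```python
from collections import defaultdict
from typing import NamedTuple


class AlignedWord(NamedTuple):
    """One word from a forced alignment (hypothetical record layout)."""
    word: str
    start: float      # seconds from the start of the recording
    end: float        # seconds from the start of the recording
    source: str = ""  # assumed: which recording this word came from


def slideshow_plan(words, image_for):
    """Return (image, start, duration) per word.

    Each image is held until the next word begins, so silent gaps
    between words do not leave a blank frame.
    """
    plan = []
    for i, w in enumerate(words):
        next_start = words[i + 1].start if i + 1 < len(words) else w.end
        plan.append((image_for(w.word), w.start, round(next_start - w.start, 3)))
    return plan


def build_index(words):
    """Map each lowercased word to every place it was spoken."""
    index = defaultdict(list)
    for w in words:
        index[w.word.lower()].append(w)
    return index


def clips_for_text(text, index):
    """Pick one recorded clip per input word; report words never recorded."""
    clips, missing = [], []
    for token in text.lower().split():
        hits = index.get(token)
        if hits:
            clips.append(hits[0])  # naive choice: first occurrence
        else:
            missing.append(token)
    return clips, missing


# Made-up alignment data for a three-word line.
words = [
    AlignedWord("Roses", 0.0, 0.4, "poem.wav"),
    AlignedWord("are", 0.5, 0.7, "poem.wav"),
    AlignedWord("red", 0.8, 1.3, "poem.wav"),
]

plan = slideshow_plan(words, lambda w: f"images/{w.lower()}.png")
clips, missing = clips_for_text("red roses are blue", build_index(words))
```

The first plan could be handed to a video tool as a list of stills with durations, and the second as a list of (source, start, end) cuts. The tokenizer here splits on whitespace only, so punctuation in the user's text would need stripping first, and words that were never recorded come back in `missing` rather than being silently dropped.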