Hello to all. I'm organizing an international online music competition, and since we are living in an age where AI is flourishing, I'm using it to promote the competition in different languages.
Now the best part: as you guessed, I don't speak those languages. I speak Portuguese and English, and understand Spanish. To create the videos, I recorded a 2-minute video of myself speaking nonsense in Portuguese. Then, to clone my voice, I had to read a 4-minute text in Portuguese. The lip-sync video with my cloned voice was processed in HeyGen. While HeyGen has a connection to ChatGPT for translation, I got better results elsewhere, so the texts were translated automatically in DeepL. Next, I double-checked them with native speakers and sent them an audio recording from HeyGen for feedback on whether it sounded natural. For most of the languages it was fine; some required minor corrections.

The automatic lip sync from text and generated audio didn't work as expected in Portuguese: in the current version, the language variant is detected by HeyGen and for now the user has no control over it. So the text started in European Portuguese but gradually drifted into the Brazilian variant. I therefore did the lip sync from an audio file recorded with my own voice; however, some sounds were probably not recognized correctly, so the lip sync isn't perfect in the Portuguese version. Something similar happened with Catalan, which came out mixed with a French accent, and the Norwegian version sounded like Danish. Even with these less-than-expected results, overall the videos are very professional, were made at a fraction of the usual cost, and were generated very quickly (3-4 minutes to generate 1 minute of video).
Then I exported the subtitles from HeyGen in SRT format, imported them into https://editingtools.io/subtitles/, and converted them into Final Cut Pro X titles, because that gives more control over size and font types.
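For anyone curious what the conversion tools are working with: the SRT files are plain text in the standard SubRip layout (index, timing line, text, blank line). As a rough sketch (the function names here are my own, not part of HeyGen or editingtools.io), a few lines of Python can read the cues out of an SRT export:

```python
import re

# Standard SubRip timing format: HH:MM:SS,mmm
TIME = re.compile(r"(\d+):(\d+):(\d+),(\d+)")

def to_seconds(ts: str) -> float:
    """Convert an SRT timestamp string to seconds."""
    h, m, s, ms = map(int, TIME.match(ts).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def parse_srt(srt: str):
    """Parse SRT text into (start_seconds, end_seconds, text) cues."""
    cues = []
    for block in srt.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = (to_seconds(t.strip()) for t in lines[1].split("-->"))
        cues.append((start, end, " ".join(lines[2:])))
    return cues

sample = """1
00:00:01,000 --> 00:00:03,500
Welcome to the competition!

2
00:00:04,000 --> 00:00:06,250
Entries close next month.
"""
print(parse_srt(sample))
```

Once the cues are in this form, it's easy to inspect timings or dump the text for proofreading by native speakers before the conversion step.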
Some languages run longer than others, sometimes 10 seconds more on a 1-minute video. This required manual adjustment of the animations, and the subtitles had to be synced manually. However, because there is a pause at the end of each sentence, it was quite easy to follow the speech by following the breaks in the waveform.
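When a whole language version just needs to slide earlier or later, the bulk of that manual re-sync can be done with a uniform timestamp shift before fine-tuning individual cues in Final Cut. A minimal sketch, assuming standard SRT timestamps (the function names are mine, for illustration only):

```python
import re

# Standard SubRip timing format: HH:MM:SS,mmm
TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_timestamp(ts: str, offset_ms: int) -> str:
    """Shift one SRT timestamp by offset_ms milliseconds (clamped at 0)."""
    h, m, s, ms = map(int, TIME.match(ts).groups())
    total = max(0, (h * 3600 + m * 60 + s) * 1000 + ms + offset_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def shift_srt(text: str, offset_ms: int) -> str:
    """Shift every timestamp in an SRT file by the same offset."""
    return TIME.sub(lambda m: shift_timestamp(m.group(0), offset_ms), text)

line = "00:00:04,000 --> 00:00:06,250"
print(shift_srt(line, 1500))
```

A uniform shift only helps when the drift is constant; where one language stretches gradually over the video, the per-cue nudging against the waveform breaks is still needed.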