10/14/2024, duration: 15:58
In this episode of Debugging Dan, I discuss the challenges I faced last week, including technical difficulties with my camera, which prevented me from recording visuals for this podcast. I share my progress on automating my social media outreach to make it easier to post content across various platforms. I've been working on a Next.js app that integrates with Telegram for storing raw video files and uses Python with MoviePy for video manipulation and captioning. My goal is to streamline the process, allowing me to easily share video snippets by automating tasks like trimming, adding subtitles, and generating different output formats for social media. I emphasize the importance of creating an efficient workflow to maintain engagement with my audience while juggling my full-time job and side projects.
In this week's episode, the camera isn't working, unfortunately, but I dive into what I did last week and how I'm automating my socials. Welcome to Debugging Dan, where I share my weekly journey balancing life, a full-time job, and side projects. I'm Dan, your host. Let's dive in. Episode 19 of Debugging Dan.
And for people who are watching via YouTube or other video means: this week my camera failed, so I'm not able to record my facecam. You'll only hear my voice and see what I display on the screen. For people listening to the podcast, you're not missing anything; the audio is good enough.
So, in last week's episode, where I coined the term side-life balance, I mentioned that I was a little bit stuck: not doing that well, not making progress, and not feeling good about it. As I also mentioned during that episode, I decided to let go for a while, so I didn't do the get-it-done episode. Instead, I figured I'd focus on my socials and getting those up to speed, getting the quality good, similar to what I've been wanting to do with the podcast, which I'm now recording with an improved microphone.
This week I bought a light source so I can light myself better while I'm sitting here. Unfortunately, with no camera this week, I can't really show the results. So here's what I've been doing the past week. I've even been building in Python, because a dependency I needed was only available in Python, and I've started doing some work in Next.js, which is really taking some time to get used to. It works differently than I expected, which forces me to read a lot of documentation. That's not necessarily bad, but it's less intuitive than I would have hoped.
So that's what I've been focusing on the past week, and in this episode I'd like to take you through the grand scheme of things: how I want to do social. The thing is, for me it should be as low-effort as possible, because I don't have a lot of time. If something takes an hour or two, I really need to plan it, and for me that would mean I'd probably not do it, because it takes too much time. What I have in mind is that it should be easy to just pick up my phone, record a video or an audio snippet, and get that ready to be posted on different social media. Even posting the content is already going to take a lot of time with all the different social media apps that are around, like YouTube Shorts, TikTok, Instagram, Threads, and Bluesky.
So I want to keep it as low-effort as possible, and that's why I've been working on automation. On the screen (for those listening via podcast) I'm showing a flow diagram. What I want to do is make it as easy as possible: when I record a video or an audio snippet, I want to be able to upload it to a Telegram bot. I chose that on purpose, because the Telegram bot will serve as storage for the raw video, and after that I can automatically take the file from Telegram and forward it to all kinds of other services. I can easily share it from my phone; I don't need to go to a webpage, click upload, and have issues with large video file uploads and things like that.
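To make that intake idea concrete: the Telegram Bot API exposes any file sent to a bot through its getFile method, so an automated step can pull the raw upload over plain HTTP. Here's a minimal sketch in Python; the real intake will live in the Next.js app, and the token and file names are placeholders:

```python
import requests

TOKEN = "<bot-token>"  # placeholder, not a real token
API = f"https://api.telegram.org/bot{TOKEN}"

# A video sent to the bot shows up in getUpdates as message.video
updates = requests.get(f"{API}/getUpdates").json()["result"]
file_id = updates[-1]["message"]["video"]["file_id"]

# getFile resolves the file_id to a download path on Telegram's servers
# (note: the hosted Bot API caps bot downloads at 20 MB; a self-hosted
# Bot API server lifts that limit considerably)
file_path = requests.get(
    f"{API}/getFile", params={"file_id": file_id}
).json()["result"]["file_path"]

# Fetch the raw bytes and store them for the rest of the pipeline
raw = requests.get(f"https://api.telegram.org/file/bot{TOKEN}/{file_path}").content
with open("raw_video.mp4", "wb") as f:
    f.write(raw)
```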
So I just upload it in Telegram. I'm currently building a Next.js app that connects to Telegram, takes the input, and will be able to process it further. In that Next.js app, I'm going to integrate FFmpeg to be able to manipulate the video: trim the video, extract the audio, replace the audio, add subtitles, things like that. The Next.js app will also connect to the Python service I've been building, which uses MoviePy to create captions and text overlays, and Faster-Whisper to do speech-to-text.
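Those FFmpeg manipulations each map onto a short command. Here's a sketch of the three operations, written as Python subprocess calls for consistency with the other examples (in the app itself they'd be invoked from Next.js, and all file names here are made up):

```python
import subprocess

def ffmpeg(args: list[str]) -> None:
    # Thin wrapper: -y overwrites outputs, check=True raises on failure
    subprocess.run(["ffmpeg", "-y", *args], check=True)

# Trim: seeking before the input plus stream copy is fast and lossless
ffmpeg(["-ss", "00:00:05", "-to", "00:00:30", "-i", "in.mp4",
        "-c", "copy", "trimmed.mp4"])

# Extract the audio track on its own (assumes AAC audio inside the MP4)
ffmpeg(["-i", "in.mp4", "-vn", "-acodec", "copy", "audio.m4a"])

# Replace the audio: video stream from input 0, audio from input 1
ffmpeg(["-i", "in.mp4", "-i", "clean.m4a",
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "copy", "-shortest", "replaced.mp4"])
```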
And speech-to-text with Whisper is pretty good, but it's not perfect. So in the Next.js app I'm also building an interface: after the speech-to-text output has been generated, it will allow you to change the generated text. For example, the way I pronounce Dan, Whisper often interprets as D-E-N instead of D-A-N, and it doesn't always understand the names of my services and things like that correctly either. So you can correct that and then continue with the processing pipeline.
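With faster-whisper, word-level timestamps come straight out of the transcribe call, which is exactly what both the correction interface and the per-word captions need. A minimal sketch (the model size and file name are just example choices):

```python
from faster_whisper import WhisperModel

# "small" on CPU keeps it lightweight; bigger models are more accurate
model = WhisperModel("small", device="cpu", compute_type="int8")
segments, _info = model.transcribe("snippet.mp3", word_timestamps=True)

# Flatten into editable (start, end, text) entries; the correction UI
# would show these so I can fix e.g. "Den" -> "Dan" before captioning
words = [
    {"start": w.start, "end": w.end, "text": w.word.strip()}
    for segment in segments
    for w in segment.words
]
```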
Most of the work in the past week went into creating the Python service. I don't have a lot of Python experience; I've built a service before at my place of work, and I've used that experience, plus a lot of ChatGPT, to get a service running pretty stably that uses MoviePy for the captioning and Whisper for the speech-to-text. Now I'm starting to work on the Next.js app to really get it flowing.
On the Python side, I found a sample Python notebook somebody created that used Faster-Whisper and MoviePy to generate captions, and I'm building on top of that to be able to create clips. I'm showing a sample here; I'll upload it to YouTube too, so you can see it there. I took the introduction I recorded for the previous podcast, and what you see is that the words I'm speaking are overlaid on the image: every word is shown while I'm speaking it, as you often see in TikTok videos and other content. It doesn't look that great yet, just white text over the video itself, but the potential is there. I often implement things like this: I create the technical capability, but since I'm not really a graphics or UI person, I often leave it there.
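The per-word overlay itself is only a little MoviePy: one TextClip per transcribed word, timed to that word's start and end, composited over the video. A sketch using the MoviePy 1.x API (which needs ImageMagick installed for TextClip), with the word timings hardcoded here where the real pipeline would take them from the transcription step:

```python
from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip

# Word timings as the transcription step would produce them (hardcoded demo)
words = [
    {"start": 0.0, "end": 0.4, "text": "Welcome"},
    {"start": 0.4, "end": 0.7, "text": "to"},
    {"start": 0.7, "end": 1.3, "text": "Debugging"},
    {"start": 1.3, "end": 1.8, "text": "Dan"},
]

video = VideoFileClip("intro.mp4")

# One TextClip per word, visible exactly while that word is spoken;
# plain white text for now, styling deliberately deferred
captions = [
    TextClip(w["text"], fontsize=48, color="white")
    .set_start(w["start"])
    .set_duration(w["end"] - w["start"])
    .set_position(("center", 0.8), relative=True)
    for w in words
]

CompositeVideoClip([video, *captions]).write_videofile("captioned.mp4")
```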
As for picking the right font, the right size, the right colors, the right position: I figured, I've built the capability, and when I'm doing the actual implementation of the pipeline for my socials, I'll just figure it out then. So I'm kind of forwarding that challenge to future me, and for now that's okay. I love the feeling of it coming together, of being able to do something in Python that works again. It was fun to build, and the end result is there.
The example I found only did subtitles that highlight a specific word; that's a preset that's in there too, alongside the one I just showed where only the spoken word is displayed, and that's cool. Now, what I'm going to build in the Next.js app, shown on the next slide for those watching on YouTube, but I'll explain it as well, is the presets you can build. The source is a video or an audio file. An audio file then gets transformed into a video, for example by adding an image background and having the video just show or loop that specific image, or it could even be a video whose audio is replaced, and have that generated.
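That audio-to-video preset comes down to a single FFmpeg invocation: loop a still image as the video track and stop when the audio ends. A sketch (the image and audio file names are placeholders):

```python
import subprocess

# Loop one background image for the duration of the audio (-shortest);
# yuv420p keeps the output playable on most players and platforms
subprocess.run([
    "ffmpeg", "-y",
    "-loop", "1", "-i", "background.png",
    "-i", "snippet.mp3",
    "-c:v", "libx264", "-tune", "stillimage",
    "-c:a", "aac", "-pix_fmt", "yuv420p",
    "-shortest", "audio_as_video.mp4",
], check=True)
```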
Then, after I've uploaded it, the application will ask me to pick some presets or flows that I've predefined, and a flow could be several steps chained together. For example, it could remove the audio, enhance it with some audio service, stitch it back into the video, overlay the speech-to-text captions, add an intro and outro clip, and make it ready for Instagram.
And it could crop it and make it square, while something I upload to TikTok I keep portrait, or transform from landscape to portrait. The way I see it, these are all different nodes that each do something (crop, add clip, trim) and that I can string together into a flow. I just upload the file, pick the one or more flows I want to apply to it, it gets to work, and at some point I get a message back saying: hey, we've got this video ready for you, you can download it and upload it to your socials.
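None of this is built yet, but the node idea can be made concrete with very little code: if every node is a function from input path to output path, a flow is just an ordered list of nodes. A hypothetical sketch; the node set and preset names are mine for illustration, not a spec of the actual app:

```python
import subprocess
from typing import Callable

# A node transforms one file into another; a flow is an ordered list of nodes
Node = Callable[[str], str]

def ffmpeg(args: list[str]) -> None:
    subprocess.run(["ffmpeg", "-y", *args], check=True)

def crop_square(path: str) -> str:
    # Centre-crop to 1:1, e.g. for Instagram
    out = path.replace(".mp4", "_square.mp4")
    ffmpeg(["-i", path, "-vf", "crop=ih:ih", out])
    return out

def to_portrait(path: str) -> str:
    # Centre-crop landscape to 9:16, e.g. for TikTok
    out = path.replace(".mp4", "_portrait.mp4")
    ffmpeg(["-i", path, "-vf", "crop=ih*9/16:ih", out])
    return out

def run_flow(path: str, flow: list[Node]) -> str:
    for node in flow:
        path = node(path)
    return path

# Two presets built from the same node set
instagram_flow: list[Node] = [crop_square]
tiktok_flow: list[Node] = [to_portrait]
```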
Part of the work will be sent to the Python service, and part will be done with FFmpeg from Next.js. I could even decide to add some AI with an LLM; that could even be a third input type, where I only describe the video and use some kind of external API to generate it. I'm probably not going to do that, because I want just me on my socials, not some AI-generated thing. But it could be, for example, that I have AI summarize a podcast from the transcript, then have AI generate the script for a summary video, and have that generated and uploaded somewhere.
Hopefully, at some point, I might even upload automatically to the socials that have an API. Twitter is pretty difficult, and Instagram is also not that easy, but other services may have a better API, and to those I could already upload automatically. And within those flows you can have manual steps, as I just explained for the captioning: you fix the captions manually, and when you press the button, the flow continues. So that's what I envision for that part right now, and I'm kind of focusing on it; I'm not focusing on my other products at the moment. I just want to get a good-quality Debugging Dan podcast channel out and be able to promote things on socials. For example, right now I'm working with Next.js, and I'm having a hard time understanding the difference between React server components and React client components, and when to use which.
When I started, I thought you always want server components, because those are generated on the server and that's faster. But as it turns out (I'm building the login form now, for example), if you do that with a server action, then showing the progress of the login is only possible using a hook from a client component. So I already need a client component, while I figured it was best to have only server components. I'm just reading a lot of documentation there, and the things I encounter are what I want to share. I've already recorded something about moving from Preact to Next.js and React, and about the Mantine UI components, where I found out that you can't use the entire Mantine UI layer as server components with Next.js. That was a bummer, but those are things I'm learning, and I want to be able to share them more easily by just creating short clips.
One other thing I'm thinking about is something I've dubbed shortcasting: you take podcasting and replace the word "pod" with "short". The idea is short clips, a minute or two long, where I vent or rant about something, or share something I've learned. I'd easily record an audio snippet, have this platform generate a video or improve the audio for me, and just upload it as a podcast. So besides Debugging Dan, you'd have a collection of short snippets of me venting about stuff. That doesn't really fit the standard Debugging Dan channel, because those episodes are a bit longer, with a beginning, a middle, and an end, but it's something I've been thinking about, and it would really fit with what I'm currently building.
The internal name I'm using is Dan Cuts, like Pro Cuts and all those video editing programs that have "cuts" in the name. I'm not planning to release it as a product anywhere, because it's really specific to my needs. But yeah, that's what I've been working on the past week, and I'll be working on it next week too. It was fun for me to explain what I'm doing. If you have any questions or want to know more, just send me a message, or comment on the video or the podcast; just let me know. I'll speak to you again next week. Bye.
Thanks for tuning in to Debugging Dan. If you enjoyed this episode, please subscribe and leave a review. Stay curious, and see you next week.