The big tip for making the video is to select a capture region with a 16:9 aspect ratio that is not much bigger than the area you want to display. Doing this "fills" the YouTube video box, leaving no black border, and keeps what you see from looking too small. (If you know people will go to full screen when they watch the video you can capture a bigger area, but there is no way to know that in advance.) In this case I captured a region that was 720x405, which is exactly 16:9. I believe that getting exactly the right aspect ratio contributes to the sharpness of the image.
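Here is a minimal sketch of the arithmetic behind picking a capture region, in case it helps. The function names `is_16_by_9` and `fit_16_by_9` are made up for illustration and are not part of any capture program. The idea is just to find the smallest whole-pixel 16:9 rectangle that covers the area you want to show.

```python
from fractions import Fraction

ASPECT = Fraction(16, 9)  # target aspect ratio

def is_16_by_9(width, height):
    """Return True if width x height is exactly 16:9."""
    return Fraction(width, height) == ASPECT

def fit_16_by_9(min_width, min_height):
    """Smallest 16:9 region, in whole pixels, covering min_width x min_height."""
    # Ceiling division: how many 16-pixel (or 9-pixel) units are needed.
    units = max(-(-min_width // 16), -(-min_height // 9))
    return 16 * units, 9 * units

print(is_16_by_9(720, 405))   # the region used in this post
print(fit_16_by_9(700, 400))  # a 700x400 area of interest
```

With these inputs, a 700x400 area of interest rounds up to 720x405, the same size as the region captured here.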
Because the video itself is so short, I produced a transcript directly without trying to use a transcription program first. You play a few seconds of the video and pause, type what you heard, then go back and play the next few seconds. I keep the movie in a window smaller than half the screen, just displaying the control bar and the time. That leaves room above the video window for another window, also smaller than half the screen, for the transcription. So even if you only have one normal-sized monitor, this is pretty easy to do.
Once the transcript is done, you have to go back in and insert the timings. I have this awkward habit of pausing in mid-sentence but then going gangbusters at the end of one sentence and on to the next. So note that the timings allow fractions of seconds. In what I made I only went down to a half second, but the thing allows much finer gradation if you want to put in that effort. In what I've got, each piece of text is on the screen for about 4 seconds, depending on where I pause and how much text is on the screen.
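To make the fractional timings concrete, here is a rough sketch that turns start and end times in seconds into a timed caption entry. The layout (a `start,end` line in `H:MM:SS.mmm` form, followed by the text) is an assumption for illustration, so check what your own caption tool expects. The helper names `format_timestamp` and `caption_cue`, and the sample text, are hypothetical as well.

```python
def format_timestamp(seconds):
    """Format a time in seconds as H:MM:SS.mmm (assumed layout)."""
    total_ms = round(seconds * 1000)  # work in whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"

def caption_cue(start, end, text):
    """Build one caption entry: a timing line, then the caption text."""
    return f"{format_timestamp(start)},{format_timestamp(end)}\n{text}"

# Half-second precision, with the text on screen for about 4 seconds.
print(caption_cue(0.0, 4.5, "First bit of the transcript."))
print(caption_cue(4.5, 8.0, "Next bit, after a pause."))
```

Working in milliseconds means half-second steps like 4.5 come out as `.500`, and finer timings such as 4.25 would work the same way.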
I'm pretty impressed with what can be done now. If students were making these things, they'd catch on to the technology pretty quickly and then could focus their attention on the substance of the message.