the number of times people start, stop, and edit their sentences mid-sentence, and that we subconsciously adjust for while listening, is quite incredible. I was there doing the interviews and I had no idea how many times both the interviewee and I would restate or restart the same sentence, re-pronounce a word, etc.
@liaizon how about transcribing the entire interview as a sort of mindmap? The restarted sentences can be like dead ends on the tree
@liaizon this is one thing that people who have not worked with language often don't really get: language, as it is spoken in reality, does not meaningfully care about grammar. it's actually amazing.
@halcy it actually makes me want to try having some conversations with different sorts of folks, transcribing them exactly as they are spoken, and then looking into what the patterns really are. I am sure there are lots of papers about this but it would be interesting to see how much it varies
@liaizon haven't done any work in actual transcription, but I'd assume there's research into it. I did at some point have to look at some transcribed speech, though, and for fun I just pulled up transcriptions of the Switchboard corpus, opened a random file and:
have uh the thing that that bothers me worse than the credit cards i think is uh you mentioned the gasoline credit card
i don't have that but i've got you know one of the [vocalized-noise] the
uh instant teller cards
it's incredible that during normal conversations unless you pay attention you don't notice the mess that comes out of your mouth because brain good
@liaizon I wonder what the error rate for full transcriptions like that is. Even for regular transcriptions the number I've seen is ~4% for average, non-professional transcribers, and I guess this would be worse
@liaizon Years ago I transcribed a director talking about his film. Since this was part of the promotional materials, I cleaned up all the false starts, random sounds, and fixed the grammar.
The director came back furious that I'd changed what he said, because every single word out of his mouth was a treasure, so I re-transcribed it accurately with all that stuff back in, including random sounds and the grammar that's just a mess when it's written down.
They ended up not using any of it.
@liaizon Oh yeah. It drives you nuts when you first edit podcasts. The only thing you can do is let it go and only edit if the person talking loses their thread.
the personal instance of Liaizon Wakest