It’s getting frothy out there.
The rabid panic over AI among voice actors is coming to a head, with social media brimming with daily posts on the topic, despite very little real-world evidence of synthetic voices impacting the bottom line of working pros, or even amateurs for that matter.
There’s a supposition among the masses that because the technology is improving, its ascension is inevitable, and that by definition it will supplant human voice actors to a highly disruptive degree. It’s easy to get caught up in the terror, but worst-case scenarios (heck, even moderately disruptive scenarios) are based on a lot of assumptions that may very well not hold up in the real world.
Now, there’s no question that numerous companies and platforms want AI voiceover to be an Earth-shattering thing. And, inevitably, we are going to start seeing even well-known casting platforms offer AI voices against or alongside their human talent. Many voice actors are busy creating their own voice clones which they expect to make available through their websites, casting platforms, or through the platforms of the companies creating these artificial voices for them. But this assumes that the demand will be present, and substantial.
It seems equally likely that AI voices will instead be received much the same way that those paid listings at the top of Google are. If you’re like most people I know, you skip right past those until you get to the organic results, right? Why? Because you know they are inauthentic. Paid for and driven by an algorithm, and not a legitimate reflection of the intent of your search. For most people that’s just reflex now, which is why buying search terms has become a loser’s game in most industries.
I believe we are dramatically underestimating the reflexive human tendency to reject that which is inauthentic when it comes to voice clones and other synthetic voices. EVEN IF THEY ARE AMAZING (and I’ve yet to hear one that doesn’t break down into unnatural speech after a minute or so), if they are listed against human talent on a casting platform, human nature will be to skip over them for the real thing.
We are being inundated with daily reports of the impending AI takeover of so many professions and walks of life. But is it actually happening? Take ChatGPT for example. Fun and interesting, sure, but assuming it will be more than just a toy requires multiple leaps of faith, as this article by The New Republic’s Alex Shephard points out: https://newrepublic.com/article/170855/artificial-intelligence-chatgpt-hype-kissinger
It may in fact turn out to be nothing more than a curiosity.
The potential impact of this technology is predicated on a tremendous number of very aspirational assumptions about demand. These are not a fait accompli by any stretch of the imagination. Corporations may be avaricious and cynical, but as a species human beings are not. We collectively and instinctively understand when something is fake, and we reflexively reject fake things.
I frequently ask my clients if they are considering or would consider replacing me with a synthetic voice. The overwhelming majority reply with a flat-out “no.” Are they all being honest or contemplating all possible scenarios? Probably not. But I’ve yet to see this technology make even a marginal impact on the growth of my business. My agents are pushing out more auditions than ever. My manager is out there slaying it. Production companies and ad agencies are still hiring directly every day. And there’s more volume on P2P sites than ever. If the voiceover singularity is rapidly approaching, it’s awfully hard to tell.
Lance Blair says
Great points. AI voiceover doesn’t supply a solution to quality production workflows. Even if you got a voice that did sound reliably/consistently perfect beyond a few seconds, it would be a nuisance to direct it. So far. Will there be a point where the director can give directions to AI voices live and then they can react? That would be a problem for our industry. There are fantastic AI avatars out now that can be used for corporate video but to have bespoke audio it’s $300 a month or more, so it’s not worth it for smaller producers. I cloned my voice (for my own research) with ElevenLabs and it could only sound “human” by setting it to “less stable” on the slider, which is funny when you think of it. However, when I repeatedly ran the text through the AI on “less stable” the voice became more and more angry. We’re not there yet.
Sumara says
Amen, Father J! 🙋‍♀️ 😂
David Holmes says
Those are some enlightening and encouraging thoughts, JMC, thank you.
I agree, in essence, that there are still far too many unknowns in the formula to be able to predict the outcome.
I do think, though, that there will be a long period of acclimatisation and trial & error while clients and suppliers figure out where they will settle.
Thanks for your words and time!
Bobbi Maxwell says
Great article, JMC. I’ll have to agree with you on most of your points here. (By the way, AI to me stands for “alternately inept.”) It just cheapens the product, or becomes less respectful. I liken it to the music industry (I love all genres of music and it’s always been a part of my life in some way). A fair number of artists use auto-tune and many of us pick up on it right away. It’s been around since the mid-90s, and first became mainstream when Cher’s “Believe” hit the charts in ’98. You knew something was computerized about her voice right away. It was unique, yes. But give me the true range of Adele or Christina Aguilera any day. You know it’s human and impressive and a gift. Which is exactly what buyers should think of our voices, right? AI? Yeah, pass.
David Gilbert says
While not disagreeing with the fundamental basis of your POV, I can tell you from personal experience that at least in the world of eLearning, AI voices ARE not only accepted but embraced (though in my mind for the wrong reasons). I can’t tell you how many people I reach out to who use WellSaidLabs voices instead of humans.
I do agree, though, that any project that requires true imperfect humanistic speech or, gasp, some emotive acting, will almost always require a real human.
Todd Ellis says
Thanks for stabilizing words in an unstable time. I agree (and with Lance’s comments). I will say, I’m glad I’m 56 in this game and not 26.
I’ve seen “Maximum Overdrive”.
Greg Staudt says
Great insights as always, J. Michael, and I appreciate the update. I completely agree that we are underestimating human nature in rejecting that which is inauthentic!
Hope all is well!
Bev Standing says
I couldn’t agree more. Folks with little to no budget will embrace it perhaps, but those aren’t our clients. We provide the quality and authenticity that makes what we do engaging and connected.