Experimental composer Holly Herndon built an AI voice clone that anyone can use


This musician built an AI clone of her voice so anyone can sing like her

Experimental composer Holly Herndon says this technology is not here to replace artists – and that the future of creativity belongs to collective intelligence

Holly Herndon at the Serpentine North Gallery in London, October 2024.

Matthew Chattle/Future Publishing via Getty Images

Holly Herndon hears the future of music in data. Herndon came to electronic music after singing in church and choirs in East Tennessee. She earned a master’s degree at Mills College and a doctorate at Stanford University’s Center for Computer Research in Music and Acoustics.

When she started experimenting with machine learning in 2015, the output sounded “scrappy,” but she remembers seeing “the diamond in the rough.” Today, these experiments have evolved into custom models that allow anyone to sing as her.

Scientific American spoke with Herndon about training her AI models and her belief that creativity has always been collective—AI just makes it visible.



(An edited transcript of the interview follows.)

You describe your work as “protocol art.” What does that mean?

In the 20th century, the site of media generation – the paper and pen where music was written – was where the artistic act took place. With protocol art, the creative act occurs upstream of media generation: it consists of creating the set of rules and the conditions under which art is made.

We are very interested in training our own models. I always say “we” because I work with my partner, Mat Dryhurst. We treat each step in the model-making process as a creative intervention. The creation of the data set is part of the artwork. I often write music for training – music not necessarily meant for human ears, but for a computer to learn something from.

Can you give me an example of what it looks like in practice?

We have an exhibition in Berlin right now. We were inspired by Hildegard von Bingen, a medieval composer. We wanted to pretend that polyphony had existed when she was alive. We started with a model of her compositions and added rule sets so that it could generate polyphony in her style. We took these outputs, rearranged them and gave them to human singers to interpret. Then we created a huge installation where performers sing and invite the audience to train with us.

It’s not about typing in “write me a pop song with a guitar.” It’s about using this technology to bring people together to create art in real space.

Most commercial AI models are trained on data scraped from the Internet. Why do you insist on building your own models?

As an electronic musician, I was never one to sample – I always created my own sound palettes. When we started, pre-Suno and pre-all-of-this, we had to create our own data set. It just felt natural, like making my own samples or digital instruments.

One criticism of products such as Suno is that they sound very “average” – they are trained on everything, so they produce the most average result. My models sound unique because I create the training data myself. I also think there is prompting under the hood in Suno that limits it to three-minute songs with a verse-chorus structure. It’s the guardrails that make it boring. I would like them to drop some restrictions.

Has a model ever surprised you?

We did a project called Holly+ around 2021 – a clone of my own voice. We worked with Voctro Labs to train a voice model that works in real time so people can sing with my voice. It was game-changing.

If this works in real time, people can perform each other’s identities live. When we tested it, my partner, who is British, sang into it. I heard my voice with a British accent. It was so creepy I had to leave the room – he was singing as me. That was one of the biggest mental unlocks for how weird and cool this could be.

I think it will take five to ten years to be seamless. But once we can transform the body in real time – imagine you could make a model of a whale voice, and then make a hybrid soprano whale. When you sing high, it’s opera; when you sing low, you’re more whale or Barry White. We are no longer bound to our own larynxes.

Where do you think we will be in 10 years?

A lot of the fear around this technology is actually fear of how today’s Internet works – the attention economy, how difficult it is to be a creator. My partner always says, “Scrolling is for robots, and strolling is for humans.”

Our more optimistic vision is to use agents to handle all the crap and filter through things, and actually bring us together in the real world. That’s why our projects involve people meeting IRL and doing things together. Some of my smartest developer friends are vibe coding with multiple agents while cooking or walking their toddler. Things can be very beautiful if we imagine and build them that way.

Is this technology changing your definition of creativity?

This whole AI thing might force us to see ourselves as perhaps not the only creative actors in the universe. It doesn’t have to be scary – it can be beautiful and liberating.

Creativity happens in swarms, in community. AI is just collective intelligence – aggregated human intelligence. The art model from the 20th century is tied to an individual genius who touches an object and gives it value. That is being turned on its head. I am all for team collective intelligence.
