Neural music instrument cloning from very few samples

Nicolas Jonason, Bob L.T. Sturm

February 2022

GitHub

https://github.com/erl-j/neural-instrument-cloning

We have combined techniques from neural voice cloning with musical instrument synthesis. This makes it possible to produce neural instrument synthesisers from just seconds of target instrument audio.

If you want to try it, we have released a Colab notebook with pretrained models.

A paper explaining the details will be out soon.

Audio examples

Note: All audio examples are rendered at 48 kHz.

Here is a real 16-second saxophone recording:

recording nr: 3 training data.wav

We then train a neural instrument clone on this passage.

Here is the resulting clone synthesising a musical passage it has not seen before. The inputs to the synthesiser are pitch, loudness and pitch confidence contours.

recording nr: 3 unseen estimate.wav

This clone can also be used to modify an excerpt. Consider the following excerpt from the training data:

recording nr: 3 training target.wav

Let’s use the clone to synthesise a quieter version.

recording nr: 3 loudness -12 db.wav
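
As a rough illustration of this kind of edit, the sketch below shifts the loudness contour by a fixed offset and leaves the pitch and pitch confidence contours untouched before they would be passed to the clone. It is a minimal sketch under assumptions, not the code from the repository: the `ControlContours` container, the `shift_loudness` and `db_to_amplitude_ratio` helpers, the example values, and the choice to store loudness in decibels are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ControlContours:
    """Frame-wise synthesiser inputs (hypothetical container, not the repository's API)."""
    pitch_hz: List[float]
    loudness_db: List[float]        # assumption: loudness is stored in decibels
    pitch_confidence: List[float]   # assumption: values lie in [0, 1]


def shift_loudness(contours: ControlContours, offset_db: float) -> ControlContours:
    """Return a copy with every loudness frame shifted by offset_db."""
    shifted = [value + offset_db for value in contours.loudness_db]
    return replace(contours, loudness_db=shifted)


def db_to_amplitude_ratio(offset_db: float) -> float:
    """Convert a decibel offset to a linear amplitude ratio."""
    return 10.0 ** (offset_db / 20.0)


# Made-up contours for three frames, for illustration only.
original = ControlContours(
    pitch_hz=[220.0, 220.0, 246.9],
    loudness_db=[-30.0, -24.0, -27.0],
    pitch_confidence=[0.9, 0.95, 0.8],
)

# A -12 dB offset, matching the quieter example above.
quieter = shift_loudness(original, -12.0)
```

For reference, a -12 dB offset corresponds to an amplitude ratio of about 0.25. The clone receives the shifted contour as an input and synthesises new audio from it, which is a different operation from scaling the original waveform.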

Or make it louder!