Run the following command to auto-generate phrases.npy and files in input_numpy/, input_images/, and midi_corpus/:

python3 setup.py process

From here, you should be able to run and modify lstm.py (many-to-one generation), encoder_decoder.py (seq2seq generation), and textgen.py (encoding and decoding MIDI to text for GPT2 processing).

To use GPT2, we needed a pipeline to translate MIDI into text. Our first approach was to convert each timestep individually by representing individual notes as words (e.g. d2), as well as special keywords like w for timesteps where no note is played.
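To illustrate, here is a minimal sketch of this per-timestep encoding, assuming the preprocessed data is a binary piano roll of shape (timesteps, 128); the helper names and the exact note-word spelling are placeholders rather than the project's actual vocabulary:

```python
# Illustrative sketch only: assumes a binary piano roll of shape
# (timesteps, 128); the note-word spelling ("d2", "ds2", ...) and helper
# names are placeholders, not the project's exact vocabulary.
import numpy as np

NOTE_NAMES = ["c", "cs", "d", "ds", "e", "f", "fs", "g", "gs", "a", "as", "b"]

def pitch_to_word(pitch: int) -> str:
    """Map a MIDI pitch number to a word such as 'd2'."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"

def roll_to_text(roll: np.ndarray) -> str:
    """Encode every timestep as its active notes, or 'w' when silent."""
    words = []
    for step in roll:
        pitches = np.nonzero(step)[0]
        if len(pitches) == 0:
            words.append("w")  # special keyword: no note at this timestep
        else:
            words.extend(pitch_to_word(int(p)) for p in pitches)
    return " ".join(words)
```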
Our second approach encoded note on and note off events instead, so that a note is held until it is explicitly turned off. We can reason that with the original per-timestep preprocessing method there is no concept of notes being held, only notes played at each timestep, so the model erred on the side of too many notes. Qualitatively speaking, music generated with the on/off method sounded more musical, partly because it contained fewer notes and held them for longer. In addition, on/off is closer to representing the way music is actually played, so we reasoned that it was more intuitive to use that method.
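A minimal sketch of such an on/off encoding, under the same piano-roll assumption and reusing the pitch_to_word helper from the sketch above; the token names and the use of w to advance time are assumptions, not the project's exact vocabulary:

```python
# Illustrative sketch only, reusing pitch_to_word from the sketch above;
# the token names ("on_d2", "off_d2") and the use of 'w' to advance time
# are assumptions, not the project's exact vocabulary.
import numpy as np

def roll_to_onoff_text(roll: np.ndarray) -> str:
    """Emit on/off tokens only when a note changes state, so held notes
    are written once instead of once per timestep."""
    words = []
    prev = np.zeros(roll.shape[1], dtype=bool)
    for step in roll.astype(bool):
        for p in np.nonzero(step & ~prev)[0]:   # notes that just started
            words.append(f"on_{pitch_to_word(int(p))}")
        for p in np.nonzero(~step & prev)[0]:   # notes that just ended
            words.append(f"off_{pitch_to_word(int(p))}")
        words.append("w")                       # advance one timestep
        prev = step
    return " ".join(words)
```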
To compensate for our relatively small dataset, we also experimented with training GPT2 on a classical corpus first, and then continuing to train it on our jazz dataset. These results can also be found on our SoundCloud.
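A minimal sketch of that two-stage training, assuming the Hugging Face transformers GPT-2 implementation and plain-text corpora classical.txt and jazz.txt produced by the encoding step; the original experiments may have used a different GPT2 codebase and hyperparameters:

```python
# Hypothetical two-stage fine-tuning: classical corpus first, then jazz.
# File names, output directories, and hyperparameters are placeholders.
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, TextDataset, Trainer,
                          TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

def finetune(model, corpus_path, output_dir, epochs):
    """Run causal-LM fine-tuning of the given model on one text corpus."""
    dataset = TextDataset(tokenizer=tokenizer, file_path=corpus_path, block_size=128)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=2)
    Trainer(model=model, args=args, data_collator=collator,
            train_dataset=dataset).train()
    return model

# Stage 1: adapt GPT2 to the music-as-text encoding on the larger classical corpus.
model = finetune(model, "classical.txt", "out_classical", epochs=3)
# Stage 2: continue training the same weights on the smaller jazz corpus.
model = finetune(model, "jazz.txt", "out_jazz", epochs=3)
model.save_pretrained("gpt2-jazz")
```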