What type of neural network should I use?

I'm starting a project where I use a neural network to generate music. I was wondering what type of network I should consider, given the specifications of my sample. Here is what I am working with.

The music I am training on is meant to be played as a set of instructions by classic Nintendo sound chips, so my training set takes that same format. Here is what one line of instructions looks like:

ROW 00 : E-1 00 F P80 V00 ... : B-0 00 F P80 V00 ... : D-5 00 . P80 : 1-# 00 F V00

which I can parse into values corresponding to the pitch and volume of each instrument being used. That is, we can turn the row into an array like

[16, 15, 11, 15, 90, 1, 15]

or something similar; I'm just ballparking. Anyway, these instructions are fed into the sound chip emulator at a fairly quick rate (around 256 instructions for 3 measures of a song). Thus, the entire song can be represented as one long 2-D array.
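
As a rough illustration of that parsing step, here is a minimal sketch in plain Python. It assumes each channel field is laid out as note, instrument, volume, then effects, and that the noise channel's `1-#` cell carries a hex value in its first character; both are guesses from the single example row, not a confirmed description of the format. The names `row_to_vector`, `song_to_matrix`, and the `MISSING` placeholder are hypothetical.

```python
NOTE_NAMES = ["C-", "C#", "D-", "D#", "E-", "F-", "F#", "G-", "G#", "A-", "A#", "B-"]
MISSING = -1  # placeholder for "no value in this cell" (an assumption)


def parse_note(token):
    # Pitched note such as "E-1": semitone within the octave plus 12 per octave.
    if len(token) == 3 and token[:2] in NOTE_NAMES and token[2].isdigit():
        return NOTE_NAMES.index(token[:2]) + 12 * int(token[2])
    # Noise-channel cell such as "1-#": assume the first character is a hex value.
    if len(token) == 3 and token.endswith("-#"):
        return int(token[0], 16)
    return MISSING


def parse_volume(token):
    # Assume volume is a single hex digit; anything else (e.g. ".") means "not set".
    if len(token) == 1 and token in "0123456789ABCDEF":
        return int(token, 16)
    return MISSING


def row_to_vector(line):
    # Fields are separated by ":"; the first field is the "ROW 00" label.
    fields = [f.strip() for f in line.split(":")]
    vector = []
    for field in fields[1:]:
        tokens = field.split()
        vector.append(parse_note(tokens[0]))
        vector.append(parse_volume(tokens[2]))  # tokens[1] is assumed to be the instrument
    return vector


def song_to_matrix(lines):
    # One vector per row gives the 2-D array that represents the song.
    return [row_to_vector(line) for line in lines if line.startswith("ROW")]


row = "ROW 00 : E-1 00 F P80 V00 ... : B-0 00 F P80 V00 ... : D-5 00 . P80 : 1-# 00 F V00"
print(row_to_vector(row))
```

Effects such as `P80` and `V00` are ignored here; whether they matter for training depends on how much of the chip's behaviour the model is supposed to learn.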

From what I've read, LSTMs are a popular choice for music generation, but I was wondering whether I could instead minimize loss on a 2-D array that represents the entire song. Since so many instructions are sent per song, is it reasonable to use an LSTM? Should I switch from training on full songs to generating a few measures at a time?

I would also like to do this project from scratch, without using a library. I want it to be challenging and true to how neural networks are really built, but not insanely hard. Thanks. If you have any resources on how to approach this kind of thing, let me know!

2 Answers

#1

I would recommend a recurrent neural network (RNN), such as an LSTM, for this problem, since it is designed for sequential data like this. Rather than passing in the entire song, it would be better to split your input into sequences of a fixed length, which makes training more efficient.
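
One way to do that splitting is a sliding window over the rows of the song, pairing each window with the row that follows it as the prediction target. The sketch below is only an illustration: `make_windows`, `seq_len`, and `stride` are hypothetical names, and the toy `song` stands in for the real 2-D array.

```python
def make_windows(song, seq_len, stride=1):
    """Return (inputs, target) pairs: seq_len consecutive rows and the row after them."""
    pairs = []
    # Stop early enough that every window still has a following row to predict.
    for start in range(0, len(song) - seq_len, stride):
        inputs = song[start:start + seq_len]
        target = song[start + seq_len]
        pairs.append((inputs, target))
    return pairs


# Toy "song" with 6 rows of one value each, just to show the shapes.
song = [[0], [1], [2], [3], [4], [5]]
for inputs, target in make_windows(song, seq_len=4):
    print(inputs, "->", target)
```

A larger `stride` gives fewer, less overlapping training examples; a window of a few measures is a plausible starting point, but the right length is something to tune.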

Here are some useful resources I found on the topic:

Training a Recurrent Neural Network to Compose Music
Composing Music With Recurrent Neural Networks

#2

I worked for some time on a project related to music generation, so here are some thoughts:

RNNs are really the best way to predict music.
I tried to determine batches from the original ticks (but you need to figure out how you will get from one tick to the next).
One more idea I found on the web and worked with: splitting the data into note sequences (gaps) and lengths.

Links in the previous answer are a good start.

If you want it to be not insanely hard: split your arrays into note gaps and lengths; split them into batches; add 2 RNN layers for gaps and lengths; train on your data. Sometimes the model can get stuck on a zero gap (predicting one note over and over), but data cleaning may help here.
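
To make the gaps-and-lengths idea concrete, here is a small sketch under one possible reading of it: a "gap" is taken to be the pitch difference in semitones between consecutive notes, and a "length" is the number of rows a note lasts before the next one starts. That reading is an assumption, as are the function names and the convention that `None` marks a row where the previous note is held.

```python
def to_gaps_and_lengths(notes):
    """notes: one entry per row; an int pitch where a note starts, None where it is held."""
    onsets = [(row, pitch) for row, pitch in enumerate(notes) if pitch is not None]
    gaps, lengths = [], []
    for i, (row, pitch) in enumerate(onsets):
        # The first note has no predecessor, so its gap is defined as 0 here.
        prev_pitch = onsets[i - 1][1] if i > 0 else pitch
        # A note lasts until the next onset, or until the end of the sequence.
        next_row = onsets[i + 1][0] if i + 1 < len(onsets) else len(notes)
        gaps.append(pitch - prev_pitch)
        lengths.append(next_row - row)
    return gaps, lengths


def from_gaps_and_lengths(first_pitch, gaps, lengths):
    """Inverse of to_gaps_and_lengths, for turning model output back into rows."""
    notes, pitch = [], first_pitch
    for gap, length in zip(gaps, lengths):
        pitch += gap
        notes.extend([pitch] + [None] * (length - 1))
    return notes


notes = [16, None, None, 18, None, 16, 16, None]
gaps, lengths = to_gaps_and_lengths(notes)
print(gaps, lengths)
```

Under this reading, a model that keeps predicting a gap of 0 repeats the same pitch, which matches the failure mode described above; long runs of repeated notes in the training data are one thing the cleaning step could look for.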

Wish you the best!
