Machine Learning in Music Composition

The successes of machine learning in computer vision and speech recognition have encouraged scientists and researchers to apply these techniques to music information. Music analysis, discovery, and recommendation are the key areas that need further investigation. The main issues in focus are digital audio signals, their processing, and the modeling of an efficient machine learning system.

Many research efforts in the literature contribute to music analysis. These efforts include the detection of music features such as styles, instruments, and genres. The selection of music features depends on the algorithm used in a prediction or analysis system, so different systems use different combinations of features.

Music Features for Composition Models and Algorithms

Musicians use machines to compose music in different ways. For example, they can use them to compose a specific part or an entire piece, with some editing afterwards. Automatic composition requires an algorithm that produces sheet music or synthesizes sound. The most widely used algorithmic models are knowledge-based, learning-based, mathematical, grammar, and hybrid systems. These models are detailed below:

Knowledge-based models: The algorithms in this model first isolate the aesthetic code of a genre. The code is then used to produce a new composition, which will resemble the earlier compositions it is based on.
Learning-based models: These algorithms first learn the genre of music from the provided compositions. They then produce a new composition based on the learned genre.
Mathematical models: These models are mostly based on mathematical equations and variables. Many statistical processes are well suited to this type of model. Markov chains have proven the best choice in many cases, and the Gaussian distribution has also performed well in some tests.
Grammar models: The algorithms in this model produce musical pieces from a set of rules. These rules define composition at the macro level instead of single notes.
Hybrid systems: These systems combine several of the models above for optimal performance. The selection of models depends on the algorithm, which controls the complexity involved. A hybrid system draws on the strongest features of the single models to get the best results.
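
To make the grammar model more concrete, here is a minimal sketch of rule-based generation at the macro level. The rule set, the symbol names (PIECE, VERSE, phrase_a, and so on), and the function names are all hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
import random

# Hypothetical macro-level rules: each symbol expands into sections or
# phrases, not into individual notes.
RULES = {
    "PIECE": [["INTRO", "BODY", "OUTRO"]],
    "BODY": [["VERSE", "CHORUS"], ["VERSE", "CHORUS", "VERSE", "CHORUS"]],
    "VERSE": [["phrase_a", "phrase_a", "phrase_b"]],
    "CHORUS": [["phrase_c", "phrase_c"]],
    "INTRO": [["phrase_a"]],
    "OUTRO": [["phrase_c"]],
}


def expand(symbol, rng):
    """Recursively rewrite a symbol until only terminal phrases remain."""
    if symbol not in RULES:
        return [symbol]  # terminal: a phrase label with no further rule
    production = rng.choice(RULES[symbol])
    result = []
    for part in production:
        result.extend(expand(part, rng))
    return result


def generate_structure(seed=0):
    """Return the phrase-level outline of one piece."""
    return expand("PIECE", random.Random(seed))


structure = generate_structure(seed=0)
```

Each terminal phrase in the outline would then be filled with notes by another component, for example one of the other models described above.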

Implementation of Machine Learning in Music

Machine learning has been applied to music since 1960. The Russian researcher R. Kh. Zaripov [1] published the first paper in the history of this field. Later, in 1965, Ray Kurzweil [2] presented the first piano piece generated by a computer, which was capable of recognizing patterns in many compositions. Since then, researchers have proposed several machine learning techniques and implemented them for music analysis, retrieval, and composition. Some notable current systems are discussed here to give a general idea of the role of machine learning in music prediction.

Biaxial Recurrent Neural Network [3]: This system uses deep neural networks, with multiple nodes and layers, for learning and prediction. It uses the Long Short-Term Memory (LSTM) approach, proposed by Sepp Hochreiter and Jürgen Schmidhuber, to deal with the short-term memory problem. This problem arises because the output value of one step becomes the input for the next. This process goes on unless the same value is output again, as discussed in detail in [4].

MarkovComposer: Uses a second-order Markov chain in its composition process, so the two previous notes determine the next one. The pitch and the spacing between notes are also stored in the Markov chain.
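
As an illustration of the second-order idea, the sketch below builds a transition table keyed by pairs of consecutive notes and samples a continuation from it. It is not MarkovComposer's actual code: the representation of a note as a (MIDI pitch, gap in beats) pair, the toy melody, and the function names are assumptions made for this example.

```python
import random
from collections import defaultdict


def train_second_order(notes):
    """Map each pair of consecutive notes to the notes observed after it."""
    table = defaultdict(list)
    for a, b, c in zip(notes, notes[1:], notes[2:]):
        table[(a, b)].append(c)
    return table


def generate(table, start, length, seed=0):
    """Extend the two starting notes until `length` notes or a dead end."""
    rng = random.Random(seed)
    out = list(start)
    while len(out) < length:
        candidates = table.get((out[-2], out[-1]))
        if not candidates:
            break  # this pair of notes was never followed by anything
        out.append(rng.choice(candidates))
    return out


# Hypothetical toy melody: each note is (MIDI pitch, gap to next note in beats)
melody = [(60, 1.0), (62, 1.0), (64, 1.0), (62, 1.0),
          (60, 1.0), (62, 1.0), (64, 2.0), (60, 1.0)]
table = train_second_order(melody)
piece = generate(table, melody[:2], length=8)
```

Because repeated followers are stored as repeated list entries, `rng.choice` samples them in proportion to how often they occurred in the training melody.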

Lisl’s Stis (RNNs for folk music generation): A system trained on 1,180 tunes in ABC format from a collection published in 1778.

MusicComposer: Provides stochastic music composition and machine learning code. The system is trained on 8-measure fragments extracted from an online collection of classical MIDI music [5].

DopeLearning: Uses the RankSVM algorithm and a deep neural network model for retrieval.

Modeling and generating polyphonic music: Polyphonic music is produced with an RNN-RBM, with systems trained on various collections of MIDI music. The RNN-RBM is an energy-based model for density estimation of temporal sequences.

GRUV: Uses a recurrent neural network to generate music. The project is written in Python.

deepAutoController: Builds deep graphical models and provides details about the code layer of a deep autoencoder. It creates new sounds based on these autoencoders.

Irish folk music generation: A system trained on 23,962 tunes in ABC format from an online collection.

Synthesizing digital audio with RNNs: A system trained on digital audio of music by Madeon.

How to Retrieve the Music Features

A simple approach to unsupervised learning of music features is spherical k-means clustering [6]. In this clustering method, k-means computes the distance between each element vector and the centroids, then reassigns each element to the cluster with the closest centroid. In music information retrieval, Sander Dieleman and Benjamin Schrauwen [7] use this form of k-means: they constrain the means to lie on the unit sphere, so that each has a unit L2 norm. In that case, only one parameter needed tuning. When learning the clusters, each point was assigned to a single cluster, while a linear combination was used when extracting features.
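
The following is a minimal pure-Python sketch of spherical k-means under the description above: hard assignment during learning, unit-norm centroids, and a linear projection for feature extraction. It is an illustrative assumption about how the steps fit together, not the implementation used in [7]; the function names, the iteration count, and the random initialization are choices made for this example.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(points, k, iterations=20, seed=0):
    """Learn k unit-norm centroids from the given points."""
    rng = random.Random(seed)
    data = [normalize(p) for p in points]
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iterations):
        # Hard assignment: each point joins the centroid with the highest
        # cosine similarity (the closest one on the unit sphere).
        labels = [max(range(k), key=lambda j: dot(p, centroids[j]))
                  for p in data]
        for j in range(k):
            members = [p for p, label in zip(data, labels) if label == j]
            if members:
                summed = [sum(column) for column in zip(*members)]
                centroids[j] = normalize(summed)  # project back to the sphere
    return centroids


def extract_features(point, centroids):
    """Linear feature extraction: project the point onto every centroid."""
    p = normalize(point)
    return [dot(p, c) for c in centroids]


# Hypothetical 2-D toy data with two directions of variation
points = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
          [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0]]
centroids = spherical_kmeans(points, k=2)
features = extract_features([1.0, 0.0], centroids)
```

In this sketch the number of clusters `k` is the only setting that shapes the learned dictionary; the iteration count merely controls how long the loop runs.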

Music Production Using Learned Systems

Once the data, in this case the features, have been extracted from the given sources, they can be used for music production. The number and classification of the features depend on the process and model used for feature extraction. For example, dictionary-based techniques are used to model music in terms of a lexicon of motifs and the stochastic processes estimated for them.

For new instances, the model is parsed for a given context. The context is looked up as a motif in the tree; if a match is found, the next symbol is chosen according to the prediction probabilities. If the context is not found, the leftmost symbol is removed and the lookup is repeated. Iterating this step produces a sequence of symbols that should correspond to a new message from the same source [8].

Some recent software applications used for music composition and editing are listed below:

Keykit	Reaktor	Antescofo	Progression
Melody Composer Squared	Max	PetSynth	Rubato Composer
Ocarina	Music Mouse	Music Maker	Nodal
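
The context-matching procedure described earlier in this section (look up the context, drop the leftmost symbol on a miss, repeat) can be sketched as follows. This is a simplified illustration that stores motifs in a flat dictionary, where the method in [8] would use a tree. The maximum context length, the function names, and the toy note sequence are assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def build_motif_dictionary(sequence, max_order=3):
    """Count which symbol follows every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(sequence):
        for order in range(max_order + 1):
            if i - order >= 0:
                counts[tuple(sequence[i - order:i])][symbol] += 1
    return counts


def predict_next(counts, context, rng):
    """Sample the next symbol, shortening the context until it is known."""
    context = tuple(context)
    # The empty context is always present, so this loop terminates.
    while context not in counts:
        context = context[1:]  # remove the leftmost symbol and retry
    symbols, weights = zip(*counts[context].items())
    return rng.choices(symbols, weights=weights)[0]


def generate_sequence(counts, seed_context, length, max_order=3, seed=0):
    """Append `length` predicted symbols to the seed context."""
    rng = random.Random(seed)
    out = list(seed_context)
    for _ in range(length):
        out.append(predict_next(counts, out[-max_order:], rng))
    return out


# Hypothetical source "message": note names from a short melody
source = list("CDECDEGCDE")
counts = build_motif_dictionary(source)
new_message = generate_sequence(counts, ["C", "D"], length=10)
```

Longer matching contexts give more specific predictions, while the fallback to shorter ones ensures the generator always has a symbol to emit.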

Key Takeaways

Music analysis plays a vital role in feature extraction.
Detailed features are significant in music information retrieval, composition, and the training of any associated learning system.
For learning detailed features, deep learning outperforms stochastic processes and other machine learning techniques.

Sylvester Kaczmarek
