We Compared Automatic Audio-to-MIDI Transcription Tools to Aural Transcription by a Music Arranger

How Good Is Audio-to-MIDI Software?

In the ever-evolving landscape of music production, the need for efficient tools to transcribe audio into MIDI has become increasingly apparent. Musicians and producers often find themselves exploring various software solutions to convert their audio recordings into MIDI data for further manipulation and arrangement. However, the question arises: can these audio-to-MIDI transcription tools truly match the precision and musicality of a skilled human arranger relying on their aural skills and knowledge of musical traditions? In this article, we’ll delve into the realm of audio-to-MIDI transcription tools, comparing their advantages and disadvantages, and ultimately exploring why aural transcription by a competent arranger is still the gold standard.

The Rise of Audio-to-MIDI Transcription Tools

Audio-to-MIDI transcription tools have gained popularity for their ability to automate the process of converting audio signals into MIDI data. These tools leverage advanced algorithms and machine learning to analyze audio recordings and generate MIDI information based on detected pitch, rhythm, and other musical elements. Some popular options in the market include Melodyne, Neural Note, Klangio, Logic Pro’s Flex Pitch, AmazingMIDI, and WIDI Recognition System.
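As a rough illustration of the pitch side of that process (and not a description of how any of the tools named above actually work internally), a detected fundamental frequency can be mapped to the nearest MIDI note number using the standard equal-temperament relationship, in which A4 at 440 Hz is note 69. The function name `frequency_to_midi` is our own, chosen for this sketch.

```python
import math


def frequency_to_midi(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Return the MIDI note number nearest to a frequency in Hz (A4 = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Each octave doubles the frequency and spans 12 semitones.
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


# Illustrative usage: concert A and middle C.
a4 = frequency_to_midi(440.0)
middle_c = frequency_to_midi(261.63)
```

Rounding to the nearest semitone is the easy part. Deciding which of the detected frequencies were actually played is where the tools in the examples below run into trouble.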

Automatic Transcription Examples

The file source for this series of examples was originally created in Finale. The Finale MIDI file had been imported to Logic Pro X for a production session. Here’s the audio, bounced down from the MIDI piano track in Logic:

and how it looks when created in Finale:

Example 1: Melodyne

For this example, we instantiated Melodyne on the new piano audio track and did a “Transfer.”

We exported MIDI from Melodyne and opened it up in Finale, and it gave this:

Although not bad from a pitch perspective, the timings run over, and the MIDI would need editing in addition to being split into two staves.

Example 2: Flex Pitch (Logic Pro X)

This plugin/app appears to be monophonic only.

Example 3: Neural Note

Neural Note is open source, under constant development, and worth watching. In this example, it transcribed overtones as actual pitches at the default setting, but the app has onboard filtering tools that enable degrees of selectivity. It can be a useful tool to convert audio into MIDI within a DAW, but requires the operator to go through the same recognition and editing choices that an aural transcriber would while transcribing directly to notation software.
It appears to have a problem discerning duration, as well.
It’s downloadable as a standalone app or plugin in various forms (AU, VST) via GitHub.

Example 4: Klang.io

This looks the best “out of the box,” but, like all the others (except the monophonic Flex Pitch), it transcribed overtones as pitches:

Example 5: Amazing MIDI

We did not test Amazing MIDI ourselves: we’re on Mac here, and it’s Windows-only freeware. We did, however, download their published example of another piano piece, which looks like this on import to Finale:

Example 6: WIDI Recognition System

We chose not to test this, given that they don’t offer a trial version and the minimum price to purchase and test the plugin is $59.90.

Advantages of Audio-to-MIDI Transcription Tools

Speed: The primary advantage of using audio-to-MIDI transcription tools is speed. These tools can process large amounts of audio data in a relatively short time, allowing musicians to accumulate a lot of pitch data quickly during the creative process.
Quantization and Precision: Many tools offer advanced quantization features, enabling the operator to edit the MIDI data to line it up with the grid. These same tools should also be available in a DAW, and can require extensive time and editing whether in the recognition app or in the DAW.
Resulting Data Resides in the Target Environment: Converting audio right into the DAW makes sense if you don’t need music notation. If you want to sing or play your instrument into a DAW as a source for melodic content, it makes sense to use automatic transcription, then edit it to correct timing, velocity, and other control parameters within the DAW environment. Generally speaking, these tools do well with single-line transcription and can speed up the process of getting data into the session.
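The grid alignment described under Quantization and Precision can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the algorithm used by any particular DAW or recognition app: notes are represented as hypothetical `(pitch, start, duration)` tuples measured in beats, and both the start and the end of each note are snapped to the nearest grid line.

```python
import math


def quantize_notes(notes, grid=0.25):
    """Snap (pitch, start, duration) tuples, measured in beats, to a grid.

    grid=0.25 means sixteenth notes when one beat is a quarter note.
    """
    def snap(beats):
        # Round to the nearest grid line (exact halves round up).
        return math.floor(beats / grid + 0.5) * grid

    quantized = []
    for pitch, start, duration in notes:
        new_start = snap(start)
        new_end = snap(start + duration)
        # Keep every note at least one grid step long.
        new_duration = max(new_end - new_start, grid)
        quantized.append((pitch, new_start, new_duration))
    return quantized


# Illustrative usage: two slightly loose notes and one very short one.
loose = [(60, 0.07, 0.9), (64, 1.02, 0.45), (67, 2.0, 0.05)]
tidy = quantize_notes(loose)
```

Snapping like this fixes small timing drift, but it cannot tell a deliberately held note from one whose duration was detected incorrectly, which is why the editing step still takes time.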

Disadvantages of Audio-to-MIDI Transcription Tools

Artificial Sound Artifacts: One common issue with transcription tools is the introduction of spurious artifacts in the MIDI output. A serious drawback of these tools is their inability to definitively recognize the fundamental (root) of a particular sonority, and whether overtones were performed or are simply a natural component of an instrument’s tone. Most of the tools we tested had sensitivity filtering controls, requiring the operator to make the same kinds of choices an aural transcriber would make as to which notes are overtones (pitches above), which are resultant tones (pitches below), and which are the notes composers or musicians would intend as the fundamental. Invariably, automatic transcription requires extensive editing after it does its thing, despite its apparent promise of fast and accurate work. The greater the polyphonic complexity, the less accurate automatic transcription becomes.
Complex Musical Phrasing: Transcription tools may struggle to accurately capture complex musical phrasing, dynamics, and nuances, leading to inaccuracies that require extensive editing to correct in translation to conventional notation.
Limited Interpretation: These tools won’t grasp the musical intent behind certain performances, leading to interpretations that deviate from the artist’s intentions.
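To make the overtone problem concrete, here is a minimal sketch of the kind of judgment an operator applies when filtering: flag a note as a likely overtone when it starts together with a lower, noticeably louder note and sits at a harmonic-series interval above it. Everything here is our own assumption for illustration, including the `Note` class, the interval set, and the thresholds; it is not how any of the tested tools filter their output.

```python
from dataclasses import dataclass

# Semitone distances of harmonics 2 through 6 above a fundamental,
# rounded to equal temperament (octave, twelfth, two octaves, ...).
HARMONIC_INTERVALS = {12, 19, 24, 28, 31}


@dataclass(frozen=True)
class Note:
    pitch: int      # MIDI note number
    start: float    # onset time in seconds
    velocity: int   # MIDI velocity


def flag_likely_overtones(notes, onset_tolerance=0.05, velocity_ratio=0.5):
    """Return notes that look like overtones of a lower, louder note."""
    flagged = []
    for candidate in notes:
        for base in notes:
            if base is candidate:
                continue
            same_onset = abs(candidate.start - base.start) <= onset_tolerance
            harmonic = (candidate.pitch - base.pitch) in HARMONIC_INTERVALS
            quieter = candidate.velocity <= base.velocity * velocity_ratio
            if same_onset and harmonic and quieter:
                flagged.append(candidate)
                break
    return flagged


# Illustrative usage: a bass C with a faint octave above it and a played E.
detected = [Note(48, 0.0, 100), Note(60, 0.01, 30), Note(64, 0.0, 90)]
suspects = flag_likely_overtones(detected)
```

A heuristic like this only produces suspects. A softly played octave and a strong second harmonic look identical to it, so deciding which one the performer intended still falls to a human listener.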

The Aural Advantage

Despite the advancements in audio-to-MIDI transcription tools, the human ear remains unparalleled in its ability to interpret, discern, and understand the nuances of musical structure and performance. A competent arranger possesses the musical knowledge and sensitivity required to accurately capture a performance, recognizing fundamental pitches and adjusting tempo, dynamics, and phrasing in a way that transcription tools cannot replicate.

Why Aural Transcription Beats Automatic for Notation

Musical Sensitivity: Aural transcription calls for human understanding of the musical context, requiring the arranger to visualize a sound or passage in conventional notation, resulting in the creation of notation that will produce the same result when performed by humans or a MIDI interpreter.
Efficiency: Given the amount of editing required, when notation or accuracy is the objective, it can take less time to transcribe it aurally and get it right the first time.
Adaptability: A skilled arranger can adapt to the stylistic nuances of a specific genre or artist, ensuring a more authentic and personalized transcription.
Expressive Interpretation: Aural transcription allows for a more expressive interpretation of the music, capturing the unique qualities of a performance that might be lost in the automated processes of transcription tools.

Conclusion

While audio-to-MIDI transcription tools may be helpful in entering monophonic data into a DAW session, when attempting to transcribe polyphonic source material, the innate musicality and interpretative skills of a competent arranger still save time and money when accuracy is important, and are vital when music notation is part of the end product. Aural transcription goes beyond the capabilities of automated tools, reducing errors and editing time and cost.