An update to 37-year-old MIDI (qz.com)
159 points by pseudolus on Feb 1, 2020 | hide | past | favorite | 101 comments


> In MIDI 1.0, all data was in 7-bit values. That means musical qualities were quantized on a scale of 0 to 127. Features like volume, pitch, and how much of the sound should come out of the right or left speaker are all measured on this scale, with 128 possible points. This is not a lot of resolution. For some really sophisticated listeners, they can clearly hear the steps between points.

This is extremely misleading. Sure, the velocity input into your synth is going to be at 7-bit resolution, but as soon as the synth has it, it can play anything it wants at whatever volume it wants to, based on how you have configured it. There's nothing about the external 7-bit implementation that is really limiting the dynamics of the synth itself.

Higher resolution timing and a greater amount of 'awareness' about the features of the device at the other end so as to facilitate automatic mapping of controls from surfaces to synth parameters is what I would find more useful.


Also, if we talk about the volume parameter.

The human ear's dynamic range is about 120 dB, which includes about 20-30 dB of pain. With 127 bits, we can map that with 1 dB resolution.

16 bit audio ("CD quality") only has a 90 dB dynamic range.

We would almost never want a single instrument to have a 90 dB dynamic range, but if we did, MIDI values could logarithmically encode it with a better than 1 dB per step resolution.
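As a sketch of that arithmetic (illustrative Python; the mapping and names are mine, not from any spec): spreading the 7-bit range logarithmically over a 90 dB span gives about 0.71 dB per step.

```python
# Sketch: map MIDI values 1..127 logarithmically over a 90 dB range.
# Value 127 -> 0 dB (full scale), value 1 -> -90 dB; 0 stays silent.

DB_RANGE = 90.0

def midi_to_db(v: int) -> float:
    """Return gain in dB for a 7-bit MIDI value (1..127)."""
    if not 1 <= v <= 127:
        raise ValueError("expected 1..127")
    return -DB_RANGE * (127 - v) / 126

def midi_to_gain(v: int) -> float:
    """Linear amplitude multiplier for the same value."""
    return 10 ** (midi_to_db(v) / 20)

step = DB_RANGE / 126  # ~0.714 dB per step, under the 1 dB threshold
```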

In a mix, any instrument that is reduced by more than about 20 dB will disappear.

When synthesized music (e.g. electronic drumming) lacks dynamics, it is not because of the encoding of the raw volume parameter. It's due to other factors, like poor synth patches. Poor synth patches use a small number of samples, like say a snare drum being hit in three different ways, and they stretch that over the dynamic range with some naive scaling. A real drum doesn't work that way; it makes a different sound for each intensity with which it is struck. You need samples of it being played at myriad volume levels, sorted by intensity and mapped to the intensity range.

Some instruments don't even change intensity that much when they are played louder; a lot of the perception of loudness comes from changing harmonic content. If you fake it with one sample that is just volume-adjusted, it will not sound right.

Synthesizers have tricks to help with this, like low-pass filters that respond to velocity: hit the key harder, and more high frequencies go through. That's one tool in the box for creating a more dynamic sound from scratch.


The perceptibility of the 128 steps depends on the parameter they influence. If the MIDI parameter influences e.g. some form of pitch, 128 values won't get you very far without the steps being perceptible.

It also has to do with the pressure range of MIDI controllers: 128 steps are few if you have to distribute them between "barely touching" and "hammering on it with full force". When you play a real instrument, you will notice that the range between the quietest and loudest sounds you can manage is usually huge. For MIDI controllers this is kind of limited, so this is a good development.


You can distribute the values non-linearly, though; the difference between smallest and largest value might be big, but I don't think I could hit a pad or key in 128 different ways. Some controller software does offer a selection of velocity profiles.
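A hedged sketch of such velocity profiles (the exponents here are made up for illustration, not taken from any particular controller):

```python
# Sketch of velocity curves: remap 0..127 input velocity through a
# power-law profile so limited controller dynamics cover the range
# musically. Exponents are illustrative.

def apply_curve(v: int, exponent: float) -> int:
    """exponent < 1 boosts soft hits (light touch feels louder),
    exponent > 1 compresses them (you must hit harder)."""
    x = v / 127
    return round(127 * x ** exponent)

soft = [apply_curve(v, 0.5) for v in range(128)]   # easier to reach loud
hard = [apply_curve(v, 2.0) for v in range(128)]   # need to hit harder
```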


It depends on the instrument. 128 positions is plenty for piano. Of the other instruments, drums are the one I know best, and there a single parameter is just not enough: the result depends on velocity, the position where the stick hits the drum head, and, for non-round-tip sticks, the stick angle. For cymbals, in addition to hitting different parts of the cymbal, the stick tip, shaft, and shoulder give different sounds. And so on... There's a good reason why loops sampled from acoustic drums are used even though drum synths exist.


You do not need MIDI 2.0 to solve any of those problems.


No, and I don't expect MIDI 2.0 to help. It was just a response to the idea that a single parameter would be enough if only it had more than 7 bits.


If you want high resolution pitch, then you basically disagree with the whole concept of MIDI. The concept of MIDI is that notes are symbols. MIDI tells an instrument to play A4, not to play a note with a 440 Hz fundamental. MIDI doesn't care how that instrument is tuned. A4 could come out as 430 Hz.

That said, MIDI supports microtonal effects like pitch bending. Pitch bend messages use 14 bits.
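A sketch of how those 14 bits are assembled: a pitch bend message carries two 7-bit data bytes (LSB first on the wire), centered at 8192, and a +/-2 semitone range is a common default. The helper names are mine:

```python
# Pitch bend messages carry 14 bits split across two 7-bit data
# bytes, LSB first in the wire format, centered at 8192.

def decode_pitch_bend(lsb: int, msb: int) -> int:
    """Combine the two data bytes of a pitch bend message (0..16383)."""
    return (msb << 7) | lsb

def bend_to_semitones(value: int, bend_range: float = 2.0) -> float:
    """Map the raw value to semitones; +/-2 semitones is a common default."""
    return (value - 8192) / 8192 * bend_range
```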


You get 7 bits with MIDI, not 127 bits. Also, no one wants to encode audio samples using MIDI; the last time someone (ab)used a volume control for digital sample playback was on the Commodore 64.


That's not true ... http://www.4front-tech.com/pguide/midi/midi8.html describes the standard.

Those of us who were using MIDI in 1990 fondly remember it taking too much time, being not well supported, and generally not working well. "No one wants to" is true, but 30 years ago, many people did want to.


That's just a typo; 128 values.


> 16 bit audio ("CD quality") only has a 90 dB dynamic range.

That's a persistent myth. The channel noise floor at the frequencies of interest of 4x kHz / 16 Bit audio is below -100 dB due to combined noise shaping and dithering.

While this means 16-bit audio is generally sufficient for music, it leaves little room for error; mastering has to be excellent. That's why everyone is recording in 24-bit; it allows you to patch up errors later.


It's not a "persistent myth" - it's absolutely correct.

The "persistent myth" is that signal-to-noise ratio and dynamic range are somehow identical. They aren't.

It's also a persistent myth that the unaltered quantisation noise spectrum is basically white noise. It isn't, except as a poor approximation.

In fact it's very spiky - mathematically it's literally a function related to the sample rate. Some frequencies produce more audible quantisation artefacts than others. This is audible on very good hardware, and it contributes to both harmonic and intermodulation distortion on cheaper hardware.

Dither and noise shaping distract from the effect in a subjectively pleasing way, but technically they're a cheap fix - like blurring a jpeg and pretending this somehow magically removes all of the compression artefacts. The result may be fine for Instagram, but not for commercial photography.

The bottom line is that 24-bit sampling fixes these issues because they simply become irrelevant. The SNR limits are defined by the analog limitations of the converters, and all of the quantisation artefacts remain below audibility.


> The "persistent myth" is that signal-to-noise ratio and dynamic range are somehow identical. They aren't.

Obviously :)

The SNR will stay at around ~96 dB for 16 bit audio. If that noise were white, then that would naturally limit the dynamic range to essentially the same number. But no one said that the quantization noise of the channel has to be white, indeed, the whole point of dithering and noise shaping is to de-correlate the quantization error from the signal and make the noise spectrum anything but white. Hence the dynamic range can be increased.
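A minimal sketch of that de-correlation point, assuming TPDF (triangular) dither; the signal level and sample count are illustrative:

```python
# Minimal sketch of TPDF dither: add triangular noise of +/-1 LSB
# before rounding, so quantization error becomes signal-independent
# noise instead of harmonically correlated distortion.
import random

def quantize(x: float) -> int:
    return round(x)

def quantize_dithered(x: float, rng: random.Random) -> int:
    # Sum of two uniform [-0.5, 0.5) values has a triangular PDF.
    tpdf = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(x + tpdf)

# A constant low-level input quantizes to the same value every time
# (pure distortion); with dither it averages out to the true value.
rng = random.Random(0)
raw = [quantize(0.4) for _ in range(10000)]
dith = [quantize_dithered(0.4, rng) for _ in range(10000)]
```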

> It's not a "persistent myth" - it's absolutely correct.

No. The SNR is 96 dB; the dynamic range for the relevant frequency band is greater.


> This is extremely misleading. Sure, the velocity input into your synth is going to be at 7-bit resolution, but as soon as the synth has it, it can play anything it wants at whatever volume it wants to based on how you have configured it. There's nothing about the external 7-bit implementation that is really limiting the dynamics of the synth itself.

I don't agree. Musical sounds evolve over time, so one should not see a note as a value held constant (zero-order hold) for a period -- consider tremolo, sliding, etc.

Once there's a time dimension to consider rather than just a single point, the limitation of 7 bits shows. Consider a lowpass filter commonly found on a synth. Play a note, and turn the knob all the way down. On an analog device the transition will be smooth and you'll hear the filter gradually "closing down". If the knob is mapped to 0-127, you can easily hear the steps (sounds like discrete i----a----o----u----n----).

Sure, there are non-standard 14-bit encodings (NRPN), but those are manufacturer-specific and don't interoperate.

The article is right that MIDI 2.0 will make it feel a lot more "analog".


There are very few patches where the difference between static filter settings of 64 and 65 is audible and it's literally physically impossible to set a typical panel knob with that level of precision. (It's easier on a medium-sized slider and definitely possible on a long-throw fader, but most synth panels don't have those.)

And it's incredibly easy - and fairly standard now - to add a little interpolation to continuous parameters that are varying.
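A sketch of that kind of interpolation, assuming a simple one-pole smoother (the coefficient is arbitrary, and real synths smooth per audio sample rather than per CC event):

```python
# One-pole smoothing: slew stepped 0..127 CC values toward their
# target so a filter sweep doesn't "zipper" audibly.

def smooth(cc_values, coeff=0.1):
    """coeff per step; smaller = slower, smoother response."""
    out, y = [], float(cc_values[0])
    for target in cc_values:
        y += coeff * (target - y)   # move a fraction toward the target
        out.append(y)
    return out
```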

MIDI 2.0 won't make anything feel more "analog" because virtually no one cares about those kinds of performance possibilities, so most manufacturers won't implement them.

Many of the changes in 2.0 have been driven by the ROLI people. I love my Seaboard, but it's very much a minority instrument for a tiny minority of players, and there's no reason to believe any of the changes to the spec are going to make a transformative difference to mainstream music.

They're mostly relevant for edge cases where people have already been getting by with 1.0 without extreme pain, but some enhancements may be welcome.


> There are very few patches where the difference between static filter settings of 64 and 65 is audible

Parent wasn't talking about a static patch, but a filter sweep. I encounter audible stepping quite often, and many other people do too. Just because it doesn't show up in your workflow doesn't mean it's not a significant limitation for a lot of people.


Contrary to your experience, I find many patches have a narrow sweet spot where the transients created by an overdriven filter produce rich and mesmerizing textures.

Turn up Q and try again? :)


> Sure there are non-standard 14bit encodings

There's nothing nonstandard about 14-bit encodings. Pitch wheel and all of the standard controls are 14-bit.


For completeness, there are a few 14-bit MIDI control standards that I'm aware of, and all of them have major disadvantages.

- MPE pitch wheel: You can only have 16 controls with this method. Otherwise, it's an efficient protocol.

- NRPN/RPN: Requires 4 MIDI messages to send a single value in a proper way (control number, MSB, LSB, NULL control number). Technically you can do it in 3, or 2 if the control number doesn't change.

- CC MSB/LSB: Plows over normal CC messages so you need to explicitly set up your controller / synth to agree with each other. As far as I know the order of MSB/LSB is not defined by any standard, and IMO this makes it a broken standard.

- DX7-style SysEx: Honestly this is my favorite, but because system exclusive messages are supposed to be... system exclusive, nothing else uses this standard (except for some DIY projects I've seen).
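To make the NRPN item above concrete, here is a sketch of the CC sequence (CC numbers 99/98 for parameter select, 6/38 for data entry, and the NULL close-out are from the MIDI 1.0 spec; the count of "4 messages" above groups the pairs loosely, and the channel and parameter numbers here are arbitrary examples):

```python
# Sketch of an NRPN transfer: six Control Change messages to deliver
# one 14-bit value, including the NULL pair that deselects the
# parameter. Fewer are needed if the parameter number is unchanged.

def nrpn_messages(channel: int, param: int, value: int) -> list[tuple[int, int, int]]:
    status = 0xB0 | channel          # Control Change on this channel
    return [
        (status, 99, param >> 7),    # NRPN MSB (CC 99)
        (status, 98, param & 0x7F),  # NRPN LSB (CC 98)
        (status, 6, value >> 7),     # Data Entry MSB (CC 6)
        (status, 38, value & 0x7F),  # Data Entry LSB (CC 38)
        (status, 99, 127),           # NULL: deselect the parameter
        (status, 98, 127),
    ]
```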


> - CC MSB/LSB: Plows over normal CC messages so you need to explicitly set up your controller / synth to agree with each other. As far as I know the order of MSB/LSB is not defined by any standard, and IMO this makes it a broken standard.

It's well defined. The MIDI 1.0 spec says that the range of controller numbers 32 through 63 is reserved for optional LSBs for the corresponding controller numbers in the range 0 through 31. See the "MIDI 1.0 Detailed Specification 4.2" pages 11 and 12.


Your quote doesn't specify the order of LSB and MSB messages sent by controllers. Nowhere in the standard does it specify the order.

The issue is this: if the MSB arrives first, you'd want to reset the LSB to 0, so until the LSB arrives, the value will exist with a 0 LSB for some duration. This creates bad value jitter. A possible solution (on the synth side) is to hold the MSB in temporary memory until the LSB arrives. But if the controller sends LSB and then MSB, this won't work. So you have to deal with the value jitter.
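A toy receiver illustrating that jitter, assuming the receiver resets its stored LSB whenever an MSB arrives (the class and names are mine):

```python
# Toy 14-bit CC receiver: on an MSB, the stored LSB is reset to 0,
# so the combined value momentarily jumps until the LSB arrives.

class Control14:
    def __init__(self) -> None:
        self.msb = 0
        self.lsb = 0

    def on_msb(self, b: int) -> int:
        self.msb = b
        self.lsb = 0          # reset LSB on receipt of MSB
        return self.value

    def on_lsb(self, b: int) -> int:
        self.lsb = b
        return self.value

    @property
    def value(self) -> int:
        return (self.msb << 7) | self.lsb

c = Control14()
c.on_msb(64)
c.on_lsb(100)            # settled at 8292
jumped = c.on_msb(65)    # value jumps to 8320 with LSB forced to 0...
settled = c.on_lsb(5)    # ...until the LSB arrives and it lands at 8325
```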

This was brought up at the committee 40 years ago and they decided "eh, not worth it, the jitter's fine." But it's not.


I understand the problem you describe, but it has nothing to do with the order of MSB/LSB being undefined.

The spec, again on page 12, says that upon receipt of an MSB message, the internal understanding of LSB should be reset to 0, so you can't set LSB first if your intent is to set both. If you want to set both MSB and LSB of a control, you have to send MSB first. If you are expecting both MSB and LSB, according to the spec, you can expect them in that order.

What actually prevents the value jitter mitigation strategy you describe is that you don't have to send the LSB in the first place. Controllers and synthesizers could here manually agree via configuration that the LSB will always be sent so that your mitigation strategy would work, or there could be a low-pass filter on the effect of the control to minimize the impact of value jitter for consecutive MSB+LSB changes.


> - CC MSB/LSB: Plows over normal CC messages so you need to explicitly set up your controller / synth to agree with each other. As far as I know the order of MSB/LSB is not defined by any standard, and IMO this makes it a broken standard.

This was set up to a degree with General MIDI 2, and I've found a -lot- of synths in the past that follow it - even to the point of implementing LSB on other controllers that are not part of the standard. This is in the murky past, however, maybe 20 years+ ago...


CC MSB/LSB is still my absolute least favorite method because you have to tell customers to manually set up their DAW/synth to handle each command, or to load a preset/config that does this for them. Of course, customers are going to mess this up, so you end up with lots of tech support issues of the form "I moved a knob on my controller, and it's setting both CC 7 Volume and CC 39, which I've already configured to control filter attack."


I'd also like to mention that MIDI 1.0 runs on a 31250-baud serial link. This requires a balance between precision and throughput.

The bandwidth gets so tight with CC messages interleaved with clock and SysEx (sample dumps, patch/pattern updates) that in a DIY sequencer we have to do pendulum SysEx updates to avoid disrupting the timing.
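The arithmetic behind that tightness, assuming the usual 10 bits per byte on the wire (start + 8 data + stop):

```python
# Back-of-envelope throughput for MIDI 1.0's 31250-baud link.

BAUD = 31250
BYTE_TIME_MS = 10 / BAUD * 1000          # 0.32 ms per byte on the wire

three_byte_msg_ms = 3 * BYTE_TIME_MS     # ~0.96 ms per note-on or CC
msgs_per_second = BAUD / 10 / 3          # ~1042 full 3-byte messages/s
```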

Care to elaborate on the DX7 SysEx? Most newer manufacturers have a long ID, so the header already goes like F0 00 20 3C 02 00 ...... F7


Sorry, it doesn't look like it's 14-bit, my mistake. https://github.com/asb2m10/dexed/blob/master/Documentation/s...


Thanks!


And it wasn't even uncommon for MIDI controls to be split into a MSB and LSB value. 14 bits is plenty for most audio values.


> This is extremely misleading. Sure, the velocity input into your synth is going to be at 7-bit resolution, but as soon as the synth has it, it can play anything it wants at whatever volume it wants to based on how you have configured it. There's nothing about the external 7-bit implementation that is really limiting the dynamics of the synth itself.

If I have to manually reconfigure every sound to work with the quantization, that's a very impractical workflow.

[With the reservation that I may have misunderstood]


By the way, seems that MIDI playback support in operating systems and browsers is petering out.

About nine or ten years ago, I had little trouble playing back MIDI files on various platforms.

Recently, I sent an old .mid file that I produced almost a decade ago to someone (at Google!) and they couldn't play it. After trying it myself, I was shocked. Browser after browser, system after system; no dice.

I ended up converting to MP3 using Timidity -> Lame.


It is possible for current browsers to play MIDI using a JavaScript library, just as you'd use native players to play MIDI; see MIDI.js.

So, for programming use, it doesn't matter that much.


VLC is a pretty good cross platform solution. You do have to load a soundfont, though.


MIDI 2.0 is awesome and all, but I'd be happy if Firefox supported MIDI 1.0. (only Chrome and Edgium and the like do, Firefox has been saying they will for ages -- https://bugzilla.mozilla.org/show_bug.cgi?id=836897 -- and it seems to be extremely low priority)


I might be hallucinating memories but I'd swear my old MIDI website from 2005 worked perfectly in Firefox, including playing them.


WebMIDI goes beyond just "playing" a MIDI file as audio. It allows you to use MIDI as an input as well as understanding of the SysEx messages that can be used.


I said "including playing" however. Back then, MIDI was handled by the OS, so it didn't matter if your browser supported it or not. All you did was send the raw data to the website's (admittedly crappy since my coding sucks) editor and you had your MIDI track laid down. Adjust quantization, adjust timing (because obviously you aren't doing this realtime back then) and you were golden.


Audio apps should stay out of the browser


I have a Nektar Pacer MIDI controller and was able to find a web-based programming interface for it. https://github.com/francoisgeorgy/pacer-editor

Having something that runs in browser is a benefit, as I don’t need to install a dedicated app to program the unit.

My point is that MIDI is not necessarily synonymous with “audio”.


There are many applications for MIDI that don't necessarily involve rendering audio in the browser. Most useful I've found are various patch editors for hardware synthesizers. I've also experimented with using MIDI for camera control applications, where you'd traditionally use some expensive joystick.

Another application is exactly the opposite of what you fear: offload audio rendering of some musical data to a well-optimized software synthesizer or a hardware synthesizer, via MIDI.


MIDI isn't audio. That was easy to refute.


Why?


Because "the audio thread waits for nothing." Garbage collection is unacceptable, more than a single pointer dereference to get to state is borderline unacceptable, and synthesizers (that a MIDI system would trigger) are up there with some of the most computationally demanding software you can develop, even for toy projects. Especially so, even. A naively coded soft synth in C++ talking directly with drivers can easily crap out with 4-5 voices of polyphony.

Now onto why a browser is a bad place to do this. Your audio subsystem supplies a buffer to you (or you supply a buffer to it, depending on OS) and you need to fill it in a fixed amount of time to avoid a buffer underrun, or increase the buffer size. At a 48kHz sample rate and a buffer size of, say, 512 (default on macOS) you have about ten milliseconds to fill it.
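That deadline arithmetic, spelled out:

```python
# Deadline for a 512-sample buffer at a 48 kHz sample rate.

SAMPLE_RATE = 48_000
BUFFER = 512

deadline_ms = BUFFER / SAMPLE_RATE * 1000   # ~10.7 ms to fill the buffer
```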

But you don't get all that time. As buffer sizes get smaller, the dominant factor becomes how much time it takes to get data from kernel to userspace, and from userspace down to kernel. In a browser now you have to go from kernel to user space to sandbox down to user space down to kernel.

So getting low latency MIDI input to a browser, rendering it to sound, and getting it back, is basically a terrible use case scenario in terms of latency. Yea you can do a lot when you don't care about latency, but then the question is, why would you care about live MIDI input if you don't care about latency?

Firefox's audio engine (or at least WebRender) is actually fairly impressive - I've heard things that they can get sub ms latency. But I don't really trust that you can do that and do serious audio processing, which any non-sine synth is going to entail.


A midi event stream is massively less demanding than a waveform stream. A browser could at least be a source of such events that you would route to your favorite hardware synth.

Interestingly, browsers have been capable of native audio and video playback for years; somehow it wasn't such a problem, given their relaxed latency requirements.


It's because streaming video and audio is simplex, MIDI + rendering is duplex. Round trip latency for a decent app needs to be sub 5ms, it doesn't matter if you have 200ms+ for receiving packets from a video/audio source on a web page, since you're not providing live input to get live output.


Have you actually tried it? I mean, I build music web apps that run great on Chromium browsers. They use Web Audio API to generate sounds (non sine synths.... Web Audio API allows you to do all kinds of crazy stuff with oscillators and convolvers and such), and they take input from a MIDI keyboard. On a decent computer it sounds great and latency isn't noticeable (to me anyway). I've compared to using various native software synths and don't hear a difference in latency.

Then again, I'm not expecting anyone to use my app to do a performance in Carnegie Hall. I want kids and other mere mortals to be able to have fun while learning and making music.

You don't have to use it, of course, but I'm genuinely curious why you'd want to deny this sort of thing to others.


You could make similar argument against webgl.

Sure, it's hard, but take a step back and midi is just a more sophisticated keyboard/mouse (re input, the output is just a few pixels).

Most musicians are OK with as much as 10 ms latency while playing. That's plenty of CPU time. 1 ms is about how long it takes sound to travel 1 foot. So how far you are from your speakers may add more latency than whatever current CPUs are capable of.
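The speed-of-sound arithmetic, spelled out (roughly 343 m/s at room temperature; the 2 m monitor distance is just an example):

```python
# Acoustic latency from speaker distance: sound covers about
# 0.34 m (roughly a foot) per millisecond at ~343 m/s.

SPEED_OF_SOUND_M_S = 343.0

def travel_ms(distance_m: float) -> float:
    return distance_m / SPEED_OF_SOUND_M_S * 1000

monitor_latency = travel_ms(2.0)   # ~5.8 ms from speakers 2 m away
```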

I'm not arguing to make web DAWs and audio plugins. But for learning piano it's perfectly fine.


Web interfaces for video and voice calls are pretty good because I never particularly want to install somebody's conferencing app.

Of course that has nothing to do with midi.


that API is such a mess, though.


Works for me. I'm able to connect a piano / midi controller to a computer, and have it work with a musical web app in Chrome.

Admittedly, I don't do much other than get the notes played on the keyboard (MIDI code, time down, time up, and velocity).

Don't need much more than that to do a whole lot of interesting things.


it can be made to work, it is mostly not literally broken.


The web MIDI API? What do you dislike about it? It's pretty straightforward and easy to use.


well, last I looked:

1) you could eavesdrop on events from instruments and you could publish midi events, but you could not actually register the browser as an instrument or an output that could be seen by the rest of the midi ecosystem

2) binary abstractions that are not very javascripty leaked up into the js objects. I remember seeing hexadecimal MIDI frames in my console that I had to decode myself- no other web API does that
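Decoding those raw frames is just bit masks on the status byte; a sketch in Python for illustration (channel-voice messages only; running status and SysEx are omitted, and the dict shape is my own):

```python
# Decode a raw 3-byte MIDI frame like the hex the console shows.
# Upper nibble of the status byte is the message kind, lower is the
# channel; a note-on with velocity 0 is conventionally a note-off.

def decode(data: bytes) -> dict:
    status, kind, channel = data[0], data[0] & 0xF0, data[0] & 0x0F
    if kind == 0x90 and data[2] > 0:
        return {"type": "note_on", "channel": channel,
                "note": data[1], "velocity": data[2]}
    if kind == 0x80 or (kind == 0x90 and data[2] == 0):
        return {"type": "note_off", "channel": channel, "note": data[1]}
    if kind == 0xB0:
        return {"type": "cc", "channel": channel,
                "controller": data[1], "value": data[2]}
    return {"type": "other", "status": status}
```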


It won't. Most people aren't interested in those musical subtleties; the ones that are either like their music acoustic (from ballads to grand opera) or listen to experimental electronic music like Autechre which liberated itself from the MIDI straitjacket years ago.

It will likely change music performance to some extent, making techno and other highly synchronous styles more fun and interesting to perform live than is currently practical.


imho MIDI 2.0 hits the low-hanging fruit but doesn't go far enough toward fixing the biggest problem in professional audio: deterministic rendering. Basically, the same input makes the same output, which doesn't happen even in totally digital systems - doubly so with live (or recreated) MIDI events.

I'll give an example - you have your PC with $DAW_OF_CHOICE running and plug in $MIDI_2_CONTROLLER to USB and enable a track and hit record, play your stuff, and stop. When you play it back, unless the cosmic forces are exceptionally on your side, it will not sound the same as when you played it. It's subtle and sometimes ignored or even desirable, but it's there.

There are a lot of reasons why this problem hasn't been solved, some technical and others artistic. But imho, a certain grade of equipment (namely, recording/reproduction, of which the MIDI protocol is a key component) should behave identically under the same conditions and be able to reproduce a performance exactly. It's a goal that borders on absurd, but I think we could do it!

Namely - enough of this "transport agnostic" horseshit. Give me a professional event protocol that is sent synchronously with the audio, on the wire and on the board. I want my codec chips interleaving MIDI messages (don't care if they're 16/24/32 bit) as a separate audio channel, even if it is undersampled (e.g. send LRLRLRMLRLRLRM over I2S at the appropriate clock to have audio in time and MIDI undersampled by 3x per word). I want MIDI events in my interrupt and in the callback, synchronized exactly to whatever audio is coming in at the same time.

And give me unadulterated, total, absolute dictatorial control over the audio drivers, using a ping/pong buffer that gets mapped to memory in the audio callback for my professional application and mine alone. Minimize the kernel time spent mapping physical memory to virtual memory, like ASIO drivers but without the bullshit. I want as little overhead as possible between the frame buffer coming in off DMA and my code, without the possibility of pwning the system. Hell, give me a dedicated core!


> I'll give an example - you have your PC with $DAW_OF_CHOICE running and plug in $MIDI_2_CONTROLLER to USB and enable a track and hit record, play your stuff, and stop. When you play it back, unless the cosmic forces are exceptionally on your side, it will not sound the same as when you played it. It's subtle and sometimes ignored or even desirable, but it's there.

I'm sorry, but...what?

I'm the rare non-professional-programmer on this site. I'm a professional composer -- I spend my days writing/producing/mixing music on a computer.

Can you go into more detail of what you're talking about here? Because, no offense, but if you were right, I think I would have noticed by now.


I wouldn't be at all surprised if the process is not bit perfect; the question is: can you hear a difference on your own? Can you hear the difference if it is pointed out ahead of time (through waveform comparison)?


I would be absolutely surprised if it weren't bit perfect.

We're dealing with 7 bits of MIDI data controlling, typically at most, 48,000 samples per second. These numbers are chump change for modern computers. The bit depths can add some more complexity to that in terms of dynamic range (I'm usually working at 32 bit float), but that doesn't apply as much to the situation here.

Yes, real-time performance is a tough task (as is always stated, "the audio thread waits for nothing"), but OP here is talking about recording MIDI data, which -- once recorded -- acts as a static input controlling either a synthesizer or sampler. So let's break that down.

A digital synthesizer, unless designed with some amount of randomness (typically for "analog-like" behavior purposes), by definition is the same every time. The MIDI data in this case is going to be something like CC7, controlling output volume; CC1, controlling some pre-defined parameter (i.e. opening/closing a filter); etc. A solid representation of OP's example in this case would be "CC7 controlling output volume of a synth over a 4 second period, linearly from totally silent upwards to 0 dB." I fail to see how that could possibly change from one playback to the next, unless, again, some amount of randomness-with-same-MIDI-data is a feature of the synth's programming.

A sampler, at its most basic, is just playing back audio. Audio is a static file; MIDI controls which audio plays back. Round-robin sampling (where, say, C4 is sampled N times and one of the similar samples is chosen randomly, to avoid the "machine gun effect" of literally the same sample being retriggered) could account for "it not sounding the same," but like the "analog-like" programming of the synth above, that's on purpose, not a flaw.

I routinely deal with situations where phase cancellation null tests would reveal the kind of behavior that OP is talking about, and I simply have never come across them. And that's not even going into sensitivity in listening, which while subjective and impossible to prove, is something I put a lot of faith in.

Sorry, unless OP can point me to a solid source laying out a further explanation, I call horse shit.


"think I would have noticed by now"

I don't think so.

Many systems are non-reproducible and it's often subtle enough that professionals don't know or don't care.


>I'll give an example - you have your PC with $DAW_OF_CHOICE running and plug in $MIDI_2_CONTROLLER to USB and enable a track and hit record, play your stuff, and stop. When you play it back, unless the cosmic forces are exceptionally on your side, it will not sound the same as when you played it. It's subtle and sometimes ignored or even desirable, but it's there.

I've never encountered this with any modern system; I've been sequencing from the early 90s on Atari, still doing it now, and it makes up most of my (varied) day job - plus recording, etc. I've done tests of my own and not found anything measurable (I was an instrument technician in a nuclear facility before turning to music, FTR). There were definitely issues with early DAWs (Cubase Audio springs to mind), but I've not found anything in the modern era.

Care to share some measurable results? I'm interested in what the problem is.


Why would you ever want perfectly deterministic rendering? I've never met an audio engineer that would care about this, and I've never personally run into the problem of my audio projects being too undeterministic. We're not doing LHC experiments here, we're just making music. Who cares if a MIDI note arrives 100 microseconds late? A musician surely wouldn't care.

If it's truly a problem for you because of phasing or something, why not bake the MIDI track into an audio track before mixing?


> There are a lot of reasons why this problem hasn't been solved, some technical and others artistic. But imho, a certain grade of equipment (namely, recording/reproduction, of which the MIDI protocol is a key component) should behave identically under the same conditions and be able to reproduce a performance exactly. It's a goal that borders on absurd, but I think we could do it!

why should this always be the case? or better yet, why should this often be the case? quite a lot of digital effects attempt to behave like their analog counterparts, not their digital counterparts - we should expect a non-deterministic result.

in digital:

I send 1, then I send 1, that makes 2.

in analog:

I sent 1, then I sent 1, that makes 1 + some other stuff + 1.

it shouldn't necessarily be a goal, bordering on "absurd" or not, it's just a different expression in a different medium.

having more bits (25 more of them) shouldn't change the sound profoundly. when MIDI was introduced, analog was king, and a note pressed was typically different than the same note pressed a second later. this is the same environment that has been pushed forward with hardware (again) via eurorack, and emulated quite effectively in software.

all that we've really done is smooth the digital by adding more steps, which is fantastic, but to try to "solve" a "problem" with this, other than some smoothness, is just silly.

this said as someone on their nth career writing audio software (https://svmodular.com if you're interested).


It's not about quantization error (which is quantifiable as noise) or the kind of nonlinearities you're talking about, but timing concerns.

It's basically the difference between naive automation in a DAW and sample accurate automation, it's not about the granularity of your changes but the fact that sample accuracy allows your system to reproduce the same thing every time. Not so many years ago, online renders in certain DAWs were perceptually and quantifiably different than offline renders because of things like this - you want to be able to tell a user what they hear while they work is the same when they go back and render.

With MIDI 1 and 2.0 that's rather difficult when factoring in live input, because your production system has wack drivers on top of a non-realtime OS and can't provide guarantees. MIDI 2.0 takes a good step in that direction with synchronization, but I have doubts it will be utilized to the point where we can guarantee received events are replicated as played, given the accuracy of reception and clock synchronization. Maybe we'll get it, idk.


> Not so many years ago, online renders in certain DAWs were perceptually and quantifiably different than offline renders because of things like this

and how has moving from 7 bits to 32 bits helped with this? rendering the changes in values across 32 bits is going to take a bit more cpu power than doing it across 7 bits. that's not really relevant here.

moving from 7 bits to 32 bits allows for smoother transitions - which is fantastic, but remember that the sound coming out is the culmination of a lot of different factors: having more bits doesn't change the 1+1 behavior.

> With MIDI 1 and 2.0 that's rather difficult when factoring in live input due to the fact that your production system has wack drivers on top of a non-realtime OS and can't provide guarantees.

great, so adding possibly more instability. I guess that's a "change", but probably not unless the underlying protocol is changed - otherwise no real changes: indeterministic results. and I think I'm ok with that.

midi 2.0: great, but don't expect much to be different - the music world has moved beyond midi (again). it should be interesting to see how midi adapts past 2.0.


I'm not arguing with you, just agreeing in a different way :D

Nonlinearity is fun. I'm a big fan of it, and have spent a lot of time on the DSP side developing NLP that can be predictable and repeatable, and all the garbage associated with making it sound good.

My issue is more that MIDI 2.0 goes towards part of the issue - e.g. if I press N keys at the same time, N messages should be stamped at the same time and be able to be rendered by the synth at the same time - but I'm doubtful that systems will be able to handle this in a deterministic way, both in recording the incoming messages and replicating them in the same way the performer intended while playing.


MIDI addresses a broader category of problems where there might not necessarily even be an audio stream.

Not unusual now: I have an analog desktop synthesizer module with a MIDI-to-CV control interface, and I simply want to connect a MIDI keyboard to it to play it. My application is unconcerned with digital audio streams. Actually, I have a bunch of synthesizers, some analog and some digital. I use a hardware sequencer to sequence them. Some use their own internal sequencers, so they simply use the start/stop/clock real-time messages to stay on beat. The point at which any of this becomes an audio stream is when I start my PC and record from the mixer output.

MIDI 2.0 at least opens up to some improvements to synchronization with fine grained event timestamps. Using that together with timestamped audio buffer delivery at least allows for jitter mitigation.
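One way a receiver can use those timestamps is to trade a small fixed latency for jitter-free timing: render each event at its sender timestamp plus a constant offset, rather than at its (jittery) arrival time. A rough sketch, assuming the Jitter Reduction tick of 1/31250 s from the MIDI 2.0 UMP spec; the 5 ms safety margin is an arbitrary choice for illustration.

```python
JR_TICK = 1.0 / 31250.0  # JR timestamp resolution per the MIDI 2.0 UMP spec (~32 us)
SAFETY = 0.005           # fixed scheduling offset we accept to absorb jitter (assumption)

def schedule(events):
    """Map (jr_ticks, data) pairs to render times based on the sender's
    timestamps, not on when the messages happened to arrive."""
    return [(ticks * JR_TICK + SAFETY, data) for ticks, data in events]

# Two notes the performer played ~10 ms apart stay ~10 ms apart,
# even if transport jitter delivered them bunched together:
incoming = [(0, "note-on A4"), (312, "note-on C5")]  # 312 ticks ~ 10 ms
for when, data in schedule(incoming):
    print(f"{when * 1000:.1f} ms: {data}")
```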


Use an FPGA?

I mean that sounds pretty simple, if you think there's a market let's talk.


Silicon isn't the bottleneck here, you could probably rig it up on existing chips. I can think of a way to do it with a stereo codec and surround codec with a mux in there, it wouldn't be super cheap but for pro gear, who cares.

The big blocker is the drivers and compatibility with existing software. Audio people do not like tooling changes, and what I'm talking about is a rather fundamental change to very low-level components in macOS and Windows. The former is more important and already does things very well, so changing it would take a massive engineering effort with little ROI (CoreAudio is a marvel).

On Linux you could do some impressive shit, and how I'd like to do it is via a hypervisor that hogs a core for audio processing and provides an API back to the system for communication. I know there has been some work to do that already, but incorporating hardware changes to support it would be fairly high cost with fairly low ROI.

This kind of thing could be done, sure, but the money in pro audio and the speed of adoption are non-ideal. You're talking a 3-5 year dev cycle to get a prototype shipped and in stores, asking users to give up a lot of hardware, and all for a subtle change in what they hear.

This would be a project for my free time after an extremely lucrative exit event from my current venture, and I'd have to add it to the list of pro audio paradigm shifts I'd want to work on.


Given your later explanations of what you actually mean, none of what you're talking about can possibly help.

MIDI is a serial protocol without timestamps. It is not possible for it to have "N notes with the same timestamp" because MIDI messages/events do not have timestamps. There is no notion of any time other than "now" in the MIDI protocol.

In the early 2000s there were MIDI hardware interfaces that did accept a timestamped event stream, and claimed to provide much better timing than those that just "send it out ASAP". These gained zero traction in the industry because nobody could actually tell the difference, and it required h/w-specific code, which nobody likes.
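For concreteness, here is what MIDI 1.0 actually puts on the wire (byte layout per the MIDI 1.0 spec): a Note On is exactly three bytes, and there is no field anywhere for a timestamp — the event simply means "now".

```python
def note_on(channel, note, velocity):
    """Encode a MIDI 1.0 Note On: status byte (0x90 | channel), note, velocity.
    That's the entire event -- there is no timestamp field to put anything in."""
    assert 0 <= channel < 16 and 0 <= note < 128 and 0 <= velocity < 128
    return bytes([0x90 | channel, note, velocity])

msg = note_on(0, 60, 100)  # middle C on channel 1, velocity 100
print(msg.hex())           # '903c64'
```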

Your "how I'd like to do it" on Linux is impressively wrong, in the sense that you do not need a hypervisor and you do not need hardware changes. We already do this on Linux, when desired (e.g. embedded Linux in the mixer consoles of several pro-audio companies).

I've spent 20 years writing pro-audio+MIDI software after "an extremely lucrative exit" from a previous venture. I don't think you understand what the actual pro-audio paradigms are, nor how they could be changed.



Why in the world would you reference Adam Neely of all people? He is YouTube famous, but by no means a foremost expert on any of this stuff. He literally just reads the feature list in his video and adds some simple explanation while filming himself walking around a convention. If you want to learn about something like this, talk to one of the contributors or an actual hardware/software developer.

Neely, like all YouTube explainer-celebrities, is primarily concerned with getting views and having "production quality", while leaving the audience with a vague sense of having learned something without actually having learned anything at all. His most popular videos are chock full of non sequiturs and made-up nonsense.


His comment about classical music betrayed this a bit. It was pretty ignorant (in the pure sense of uninformed) and confusing.


This seems to just be an articleized version of Adam’s excellent video from several months ago.


Not a particularly informative or well-written one frankly.

"Also, with more memory, there are simply many more possible features that MIDI 2.0 can try to emulate. More memory should also reduce the chance of the timing between playing a MIDI instrument and digital recording to be slightly off. This should mean music played on MIDI 2.0 instruments will feel more analog, and make it possible for non-keyboard instruments to work better with MIDI."

Eh? What?

"The fact that MIDI 2.0 is bidirectional has two major effects. First, it means that it is backwards compatible, and won’t make the billions of MIDI 1.0 devices already out in the world obsolete."

No, backwards compatibility does not follow from MIDI 2.0 being bidirectional.

"“I think using a MIDI guitar would change the way I make music. The way our brain orients to making music on a guitar is just different to a keyboard layout. I used to have a MIDI guitar instrument, but I don’t have it anymore because I felt like there was a lot of latency and I didn’t really like the results I got. I am hoping [MIDI 2.0] will solve some of the issues I had before.”"

Well, prepare to be disappointed. The problems with digital non-keyboard instruments have little to do with MIDI. In the case of a MIDI guitar, the latency problem is an issue of physics, not the digital transport.
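The physics bound is easy to work out: a pitch tracker has to observe something like a full cycle of the string's fundamental before it can name the pitch, and on the low E string one cycle alone is roughly 12 ms. This is a rough lower bound that ignores the tracker's own processing and the MIDI transport entirely.

```python
def min_pitch_latency_ms(freq_hz, cycles=1):
    """Lower bound on pitch-to-MIDI latency: the tracker must see roughly
    `cycles` full periods of the fundamental before it can identify the note."""
    return cycles * 1000.0 / freq_hz

low_e = 82.41  # standard-tuning low E fundamental, Hz
print(f"{min_pitch_latency_ms(low_e):.1f} ms")  # ~12.1 ms before any transport
```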


This is referencing Adam Neely (from the article). I believe they are referencing this video [1]. Overall, Adam Neely's videos are really great. I personally really like his videos on the Star Spangled Banner and Scotch Snaps.

[1] https://www.youtube.com/watch?v=QvJhLQnuktg


Who is Adam?


Adam Neely, musician and living Jazz meme.


Can you link to the video?


I'm happy about it, but if you're able to use Open Sound Control, it's better. Rather than gibberish channel numbers, OSC lets you label your messages meaningfully, as in "/trumpet/volume 100". And it lets you send lots of data types -- numbers, strings, lists -- rather than only numbers.
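For a sense of what that looks like on the wire, here is a minimal OSC 1.0 encoder (a sketch covering only int and string arguments; per the spec, addresses and type-tag strings are NUL-terminated and padded to 4-byte boundaries, and numbers are big-endian).

```python
import struct

def osc_pad(b):
    """OSC strings are NUL-terminated and padded to a 4-byte boundary."""
    b += b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address, *args):
    """Encode an OSC 1.0 message: padded address, padded type tags, then args."""
    tags, data = ",", b""
    for a in args:
        if isinstance(a, int):
            tags += "i"
            data += struct.pack(">i", a)  # big-endian int32
        elif isinstance(a, str):
            tags += "s"
            data += osc_pad(a.encode())
    return osc_pad(address.encode()) + osc_pad(tags.encode()) + data

msg = osc_message("/trumpet/volume", 100)
```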


Is there any reason why music production in the cloud isn't the standard yet?

High-quality VSTs require a lot of CPU power. Even my 16-inch MBP easily heats up once I add some more advanced VSTs.

I would rather pay X$ per month and have my music production work station in the cloud and interact with it from any old device with a fast internet connection.

Working with a buffer size of 512 samples, I currently have a latency of 11.6ms in Ableton. Adding another 10ms latency through the internet connection wouldn't be a drama for me.
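That 11.6 ms figure is just the buffer math (assuming a 44.1 kHz session, which is what makes the numbers line up):

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Latency contributed by one audio buffer: samples / rate."""
    return 1000.0 * buffer_samples / sample_rate_hz

print(f"{buffer_latency_ms(512, 44100):.1f} ms")  # 11.6 ms -- the Ableton figure
print(f"{buffer_latency_ms(128, 44100):.1f} ms")  # a smaller buffer cuts it to ~2.9 ms
```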

Working in the cloud would allow me to easily upgrade or downgrade my system based on my needs, better collaboration with others, automatic backups, one-click access to new VSTs and samples, etc.

This set up would probably be less ideal for people who actually have to record a lot of 'real' instruments but a lot of music is only created in the box today with VSTs.

But I'm surely missing something here. Why hasn't this been a trend yet?


>Is there any reason why music production in the cloud isn't the standard yet?

Latency springs to mind, firstly. It's hard enough getting a local DAW and audio interface working reliably at low latency under a high CPU load, to the point where a performer is happy with it. Adding in a journey to and from the cloud would, I'd think, make that part of it a non-starter.

Your quoted 10 ms is doubling what you already have, and I'd wager there's more to it than that - particularly once you take both the upstream and downstream connections into account. Put the buffer size up on your Ableton setup to 30 ms, and see if that is playable.


As a musician: what you say sounds nice on the marketing papers, but no thanks. Most of my colleagues and I value reliability and owning the things we play with. Why? Because it is your damn instrument: it shouldn't change unless you want it to, and it should work anywhere, even without internet. Something that needs a network connection to start up is dangerous, but something that relies on a decent internet connection is downright wrong. For home use — maybe — but for live use? Never.

Also 10ms more is already too much. If I had to decide between cool cloud synths and the latency I'd go for latency.


Adding another 10 ms of latency makes keyboards and sample pads unplayable for anyone who can actually play - especially if that latency is variable.


> I currently have a latency of 11.6ms in Ableton. Adding another 10ms latency through the internet connection wouldn't be a drama for me.

Music is all about timing. Latency is crucial. 11.6ms is already too high for playing anything but instruments with slow attacks. Adding 10ms more would make it almost unusable for anyone that actually plays with their fingers.

What you describe could probably be used for music that is programmed rather than played, but that's already a niche product.


If gaming can do it via Stadia, I think music production should be able to do it too. I understand the other comments about latency being particularly crucial for live performance. But there's a big difference between live performance and recording. And I could see a model where as you play live and lay down tracks, it uses a low latency local sample, but then when you playback after the fact (where latency is not important) it can leverage more advanced state of the art VSTs via the cloud.


Audio over the internet would have at least 300ms of latency; not sure where you're getting "10ms". Anything over 10ms is annoying, and 50ms is nearly unplayable.


Why would it be at least 300ms latency?


To send/receive a multi-channel audio/MIDI buffer to/from a server, you have to go through at least a dozen protocol layers, on top of the speed-of-light delay between you and your server. If you're in NY and your server is in LA, for example, that's already 30ms gone just considering the speed of light. Other factors multiply this latency by an order of magnitude.
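A back-of-the-envelope check of that figure (the ~3,940 km great-circle distance is approximate, and real fiber paths run longer than great circles):

```python
def round_trip_ms(distance_km, speed_km_s):
    """Round-trip propagation delay for a signal over a given distance."""
    return 2 * distance_km / speed_km_s * 1000

NY_LA_KM = 3940            # approximate great-circle distance
C_VACUUM = 299_792         # speed of light in vacuum, km/s
C_FIBER = C_VACUUM * 2 / 3 # light in optical fiber travels at roughly 2/3 c

print(f"{round_trip_ms(NY_LA_KM, C_VACUUM):.0f} ms")  # ~26 ms, hard physics floor
print(f"{round_trip_ms(NY_LA_KM, C_FIBER):.0f} ms")   # ~39 ms in realistic fiber
```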


Okay and you would say that if you optimised all of these factors you would end up with a latency around 300ms?

I just set my Ableton Live to 300ms and it was actually o.k. I think the reason is that a lot of people don't actually 'play' their instruments these days - at least in electronic music.

Instead, they program their drums by putting MIDI notes on the grid and then listening to the result. The same goes for synths etc. So when I work this way, the 300ms latency is actually bearable. Of course it would be different if I used drum pads to play my drums 'live'. But honestly I don't know many people who do that, and when I watch tutorials on YouTube almost no one is doing that either. A lot of electronic music producers 'play' their instruments with their mouse button.


Access Analog (https://accessanalog.com/) is already doing something similar, and their system is around 300-2500ms latency (https://accessanalog.com/support/#1534876416634-f660a710-8f4...). A company cannot reliably offer much better than this latency, unless they have servers in all their customer's cities.

To most DAW users, 300ms is unacceptable, so any service that processes audio on a server needs to make this caveat very clear in their documentation. The problem with such a business idea is that local computers run DAWs just fine, so very few people would seek remote audio processing.


Sounds like a business waiting to be born.


My "turing test" is whether or not you can tell the difference between Stan Getz playing it and a midi device playing it.


There are three elements there though:

1) the capture of all the musical and performance input (a wind controller, etc.)

2) the transmission of all that information in real time without latency

3) the conversion of that information into the appropriate sound.

MIDI is only part 2 of that - advances in the other areas are needed for a fully convincing facsimile. But given the incredible improvement in sample-based libraries over the last 10 years or so (even more striking if you compare something like a Spitfire Audio library to a synth string patch), it's possible we'll get there, given the investment of time, brains and money.


What if it's MIDI that was captured from Stan Getz?

What if it's Stan Getz playing live through a MIDI keyboard, into a synth?


Why not just use USB? Standardise all instruments on USB and use adaptors for backwards compatibility.


One advantage that MIDI 1.0 had (and still holds) over USB is its simplicity to implement on low-power controllers and devices.

I got into programming 12 years ago thanks to MIDI, I wanted to control some of my guitar pedals with an Arduino, and the simplicity of the protocol definitely contributed to helping me hack things together and learn. I'm not sure it would have been the case if I had to learn the contrived details of the USB stack just to send a simple ProgramChange message (2 bytes) to my pedal (which uses MIDI DIN plugs, no USB there).
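That Program Change really is just two bytes (byte layout per the MIDI 1.0 spec; the channel and preset number here are arbitrary examples):

```python
def program_change(channel, program):
    """Encode a MIDI 1.0 Program Change: status (0xC0 | channel), then the
    program number -- two bytes total, easy to emit from any microcontroller."""
    assert 0 <= channel < 16 and 0 <= program < 128
    return bytes([0xC0 | channel, program])

print(program_change(0, 5).hex())  # 'c005' -- select preset 5 on channel 1
```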

I can't wait to see how to get those 2.0 extensions into the library I built for Arduino [1]. Hopefully it will remain simple for newcomers to learn and enjoy programming by interacting with their musical instruments, like I did.

[1] https://github.com/FortySevenEffects/arduino_midi_library


Most MIDI seems to be done over USB already. MIDI 2.0 is mostly unrelated to the actual connectors and such, it's more about the protocol.


"MIDI" on your desk may well be done over USB, but MIDI as a hardware protocol is absolutely rock solid and is the only choice for live music. Speaking as an occasional backstage gremlin, I minimise the USB proportion of the signal path because it reduces the amount of testing required (MIDI is generally well implemented on the hardware side, but not all USB MIDI interfaces are made equal).

That and a lot of MIDI is instrument-to-instrument rather than to a PC (Or more realistically a Mac, because you have to have a Mac to be a creative, right...)


Windows audio latency increased to a 35ms floor in Vista and never recovered.


This is why many music apps use ASIO drivers that skip Windows audio. ASIO is awful in many ways, but latency is not one of them.


I love how clickbait gets torn apart on this site.



