The article makes it sound like this is a very new idea, but physical models of musical instruments, including the violin, have been around for over 40 years. Daisy Bell, the first song performed by a computer-synthesized voice, used a physical model of the human singing voice based on measurements of the human vocal tract, and that was done in 1961.
Julius Smith wrote a pretty comprehensive textbook on the subject of building physical models of musical instruments, available online. Here, for example, is a chapter on modeling bowed string sounds: https://ccrma.stanford.edu/~jos/pasp/Bowed_Strings.html
> Daisy Bell, the first song performed by a computer-synthesized voice, used a physical model of the human singing voice based on measurements of the human vocal tract, and that was done in 1961
From the article:
> As a demonstration, the researchers applied the computational violin to play two short excerpts: one from “Bach’s Fugue in G Minor,” and another from “Daisy Bell” — a nod to the first song that was ever produced by a computer-synthesized voice.
Pianoteq has mostly replaced my old Kontakt libraries in my DAW (outside of miking my actual piano, of course).
Also, Audio Modeling has been in the business of creating physically modeled virtual instruments, including the violin (in their SWAM series), for a while now. You can do pretty fun things like mapping a USB breath controller to bow pressure, etc.
I recall that in the late 1990s physical synthesis was thought to be possibly the next big thing, one that might take over synthesis of musical instruments entirely from the wavetable and FM options of the time. It didn't, but my point is that this is where it stood: a prominent alternative that everyone in the relevant fields was aware of and that many people tried to make work, not a recent invention and not just an obscure academic pursuit.
I often use the general algorithm behind "2/1" as my "hello world" when I'm building new generative music systems. You don't need too many ingredients to set it up, and it yields some surprisingly decent-sounding results.
The most recent one [0] I built while playing around with Rust, WASM, and WebAudio. (You'll need to click somewhere to start the sound.)
This Sonic Pi example really blew my mind when I first heard it. Such a rich sound out of three notes.
  use_synth :hollow
  with_fx :reverb, mix: 0.7 do
    # Three loops with different sleep lengths drift in and out of
    # phase, each choosing one of two notes at random.
    live_loop :note1 do
      play choose([:D4, :E4]), attack: 6, release: 6
      sleep 8
    end
    live_loop :note2 do
      play choose([:Fs4, :G4]), attack: 4, release: 5
      sleep 10
    end
    live_loop :note3 do
      play choose([:A4, :Cs5]), attack: 5, release: 5
      sleep 11
    end
  end
Thanks everyone for the suggestions and kind words.
Some details:
The source code for this project can be found on GitHub [0].
I am using an AudioWorklet node with custom DSP using Rust/WebAssembly. Graphics are just done with the Canvas API. The voice leading is done algorithmically using a state machine with some heuristics.
The underlying DSP algorithm is a physical model of the human voice, similar to the model you'd find in Pink Trombone [1], but with some added improvements. The DSP code lives in a small crate [2] I've been working on specifically for singing synthesizers, based on previous work I've done.
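For the curious, the audio wiring looks roughly like this. This is a simplified sketch with illustrative names (initSynth, renderBlock, and the message shape are stand-ins, not the actual exports of the project's crate):

  // voice-processor.ts -- runs on the AudioWorklet rendering thread.
  class VoiceProcessor extends AudioWorkletProcessor {
    private wasm: any = null;

    constructor() {
      super();
      // The main thread posts a compiled WebAssembly.Module over the
      // MessagePort; we instantiate it here on the audio thread.
      this.port.onmessage = async (e: MessageEvent) => {
        const instance = await WebAssembly.instantiate(e.data.module, {});
        this.wasm = instance.exports;
        this.wasm.initSynth(sampleRate); // sampleRate is a worklet global
      };
    }

    process(_inputs: Float32Array[][], outputs: Float32Array[][]): boolean {
      const out = outputs[0][0];
      if (this.wasm) {
        // Let the Rust DSP code render one block into WASM memory,
        // then copy it into the worklet's output buffer.
        const ptr = this.wasm.renderBlock(out.length);
        out.set(new Float32Array(this.wasm.memory.buffer, ptr, out.length));
      }
      return true; // keep the processor alive
    }
  }

  registerProcessor("voice-processor", VoiceProcessor);

The main thread then just constructs an AudioWorkletNode pointing at "voice-processor", connects it to the destination, and posts the compiled module to its port.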
Apologies for the late comment, but I had a query I wanted to share.
Would it be possible for you to create a tool that allows users to mimic human emotional sounds directly in the browser? I'm thinking of sounds like realistic coughs, sighs, gasps, and other vocal expressions like shouting or crying. It would be amazing if the tool could optionally incorporate TTS, but even without it, the functionality would be very valuable for content creators or people who need custom sound effects.
The idea is to let users customize these sounds by adjusting parameters such as intensity, pitch, and duration. It could also include variations for emotional contexts, like a sad sigh, a relieved sigh, a startled gasp, or a soft cough. An intuitive interface with sliders and buttons to tweak and preview sounds in real time would make it super user-friendly, with options to save or export the generated audio, much like the Pink Trombone project.
I'm quite new to this field and only have basic experience with HTML, CSS, and JavaScript. However, I am very much interested in this area, and I was wondering if this is something that could be achieved using tools like Cursor AI or similar AI-based solutions. Or better yet, is it possible for you to create something like this for people like me who aren't very tech-savvy?
What a beautiful idea. Sadly, I do not think I currently have the skills required to build such a tool.
The underlying algorithms and vocal models I'm using here are just good enough to get some singing vowels working. You'd need a far more complex model to simulate the turbulent airflow required for a cough.
If you suspend disbelief and allow for more abstract sounds, I believe you can craft sounds that have a similar emotional impact. A few years ago, I made some non-verbal goblin sounds [0] from very simple synthesizer components and some well-placed control curves. Even though they don't sound realistic, the character definitely comes through.
Dear Zebproj, thank you for the response. I see. Do you believe that tools like Cursor AI or ChatGPT can help? Like you, I too do not have the skills to make such a tool, and while I am trying to get there, it will be quite some time before I can learn those skills and implement them. I really wish someone could make my wish come true. I will still, however, have a look at what you shared. Cheers, Alex
Holding down a note and waiting will cause a second, then a third note to appear. When you move your held note to another pitch, the other pitches will follow, but with a bit of delay. This produces what is known as staggered voice leading, and creates interesting "in-between" chords.
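A rough sketch of that follow-with-delay behavior (purely illustrative; the chord heuristic and names here are hypothetical stand-ins, not the project's actual state machine):

  // Follower voices glide toward chord tones chosen relative to the
  // held note, each reacting on its own delay.
  interface Voice {
    pitch: number;   // current pitch (MIDI note number)
    target: number;  // pitch the voice is heading toward
    delayMs: number; // how long this voice waits before reacting
    timer?: ReturnType<typeof setTimeout>;
  }

  const followers: Voice[] = [
    { pitch: 64, target: 64, delayMs: 400 },
    { pitch: 67, target: 67, delayMs: 900 },
  ];

  function onHeldNoteMoved(held: number) {
    followers.forEach((v, i) => {
      clearTimeout(v.timer);
      // Each follower retargets after its own delay, so for a moment
      // the old and new pitches coexist as an "in-between" chord.
      v.timer = setTimeout(() => {
        v.target = held + [4, 7][i]; // naive heuristic: stack a third and a fifth
      }, v.delayMs);
    });
  }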
Sporth is a stack-based language I wrote a few years ago. Stack-based languages are a great way to build up sound structures. I highly recommend trying it.
Chorth may need some fixes before it can run again. I haven't looked at it in a while, but I had a lot of fun using it when I was in SLOrk.
If you compare codebases, SuperCollider is definitely the more "modern" of the two. SC is written in a reasonably modern version of C++ and has gone through significant refactoring over the years. Csound is mostly implemented in C, with some of the newer bits written in C++, and many parts of it have been virtually untouched since the 90s.
Syntax-wise, Csound very closely resembles the MUSIC-N languages used by early computer musicians in the 60s. "Trapped in Convert" by Richard Boulanger was written in 1979 for Csound's direct predecessor, MUSIC-11, and to this day it is able to run on the latest version of Csound.
Csound and SC are both very capable DSP engines with a good core set of DSP algorithms. You can get a "good" sound out of either if you know what you are doing.
I find people who are more CS-inclined tend to prefer SuperCollider over Csound because it's actually a programming language you can be expressive in. While there have been significant syntax improvements in Csound 6, I'd still call Csound a "text-based synthesizer" rather than a "programming language".
That being said, I also think Csound lends itself to those who have more of a formal background in music. Making an instrument in an Orchestra is just like making a synthesizer patch, and creating events in a Csound score is just like composing notes for an instrument to play.
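For anyone who hasn't seen the paradigm, here's a minimal sketch of a complete Csound file, with a one-oscillator "patch" in the orchestra and two note events in the score (a toy example, not anything from a real piece):

  <CsoundSynthesizer>
  <CsInstruments>
  sr = 44100
  ksmps = 32
  nchnls = 2
  0dbfs = 1

  ; the "synth patch": a sine oscillator with a linear envelope
  instr 1
    kenv linen p5, 0.05, p3, 0.2  ; amp envelope: rise, duration, decay
    asig oscili kenv, p4, 1       ; oscillator: amplitude, frequency, table 1
    outs asig, asig
  endin
  </CsInstruments>
  <CsScore>
  f 1 0 16384 10 1                ; table 1: one cycle of a sine wave
  ;  instr  start  dur  freq  amp
  i  1      0      2    440   0.3
  i  1      2      2    660   0.3
  </CsScore>
  </CsoundSynthesizer>

Rendering it offline is then just a matter of passing an output filename, e.g. csound -o out.wav piece.csd.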
FWIW, I've never managed to get SuperCollider to stick for me. The orchestra/score paradigm of Csound just seems to fit better with how I think about music. It's also easier to render WAV files offline in Csound, which was quite helpful for me.
I have programming experience, but that's actually why I prefer Csound. Since Csound's engine is effectively oriented around building up instruments in a modular way, it can simply be wrapped in a more general-purpose programming language to get a language with the power of a modular synth engine.
You might enjoy my project sndkit [0]. It's a collection of DSP algorithms implemented in C, written in a literate programming style and presented inside a static wiki. There's also a tiny TCL-like scripting language included that allows one to build up patches. This track [1] was made entirely using sndkit.
See my other comments here for more info about the underlying technology.
It is pretty incredible that sophisticated digital physical models of the human vocal tract were being built in the early 60s. This was possible largely thanks to the deep pockets of Bell Labs: a lot of R&D went into the voice and voice transmission.