ASCII Rave in Haskell
I’ve been playing with using words to control the articulation of a physical modelling synthesiser based on the elegant Karplus-Strong algorithm.
The idea is to be able to make instrumental sounds by typing onomatopoeic words. (extra explanation added in the comments)
Here’s my first ever go at playing with it:
For a fuller, more readable experience you’re better off looking at the higher quality avi than the above flash transcoding.
As before, I’m using HSC3 to do the synthesis. If anyone’s interested, I plan to release the full source in September, but the synthesis part is available here
I love it. I once imagined something similar but certainly never attempted it!
alex, this is really cool! can you explain for those of us who are relative laymen and unfamiliar with HSC3, a bit more about how it works? to the uninitiated, it might at first seem to be simply a speech synth + vocoder… but i get the feeling from what you wrote and some of the sounds that it’s not.
very cool in any case, and i think it’s a really apropos and entertaining use of the visual aspect of livecoding!
Well it is like a really broken speech synth.
It seems in this area there's an important balance to be struck. On the one hand, you don't necessarily want to make music with a speech synth, because it's too much like a human voice: it's difficult to stop yourself searching for meaning rather than just listening to the sound.
On the other hand I want the ease of composing sounds with text, where I can easily play around with words, having some idea of what a word will sound like. Also the results are a bit like speech, so hopefully a listener can quickly get used to how the sounds are constructed and relate to each other, because these relationships are similar to human words/speech.
I guess it’s also a bit like the idea of an ‘uncanny valley’ from robotics. Broken speech synthesis sounds nice, but if it’s more like human speech, it just sounds rubbish in comparison, or maybe even menacing.
On the technical side, the Karplus-Strong algorithm is just a delay loop with a filter, with feedback. You put some white noise in the delay loop, it feeds back on itself but because of the filter quickly smooths out (rather than making the usual nasty feedback whistles). This acts and sounds much like a real plucked string does, which is why it’s called physical modelling synthesis.
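The delay-loop-plus-filter idea is only a few lines of code. The actual synth is written in Haskell with HSC3, but here's a rough Python sketch of the plain algorithm, just for illustration (the function name and values are mine, not from the real code):

```python
import random

def karplus_strong(delay_len, n_samples, blend=1.0, seed=0):
    """Generate a plucked-string tone via Karplus-Strong.

    delay_len: delay-line length in samples (sets the pitch).
    blend:     probability of keeping the sign of the averaged sample;
               1.0 gives a string sound, values near 0.5 sound drum-like.
    """
    rng = random.Random(seed)
    # Fill the delay line with white noise (the "pluck").
    buf = [rng.uniform(-1.0, 1.0) for _ in range(delay_len)]
    out = []
    for _ in range(n_samples):
        # Average two adjacent samples: a simple low-pass filter that
        # smooths the noise as it feeds back on itself.
        new = 0.5 * (buf[0] + buf[1])
        # The "blend" extension: occasionally flip the sign,
        # which makes the result noisier and more percussive.
        if rng.random() > blend:
            new = -new
        out.append(buf.pop(0))
        buf.append(new)
    return out

tone = karplus_strong(delay_len=100, n_samples=4000)
```

The noise burst decays into a smooth, pitched tone as the filter repeatedly averages it, which is why it sounds so much like a plucked string.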
Once I made that I took two parameters: the length of the delay loop, and the probability 'blend' value that controls how 'drumlike' it sounds. I picked a pair of values for these parameters for each consonant in the English alphabet, my aim being to find a good range of sounds that are a bit like the letters I'm assigning them to, making fricatives harsher and more percussive than more open sounds.
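In other words, each consonant just looks up a (delay length, blend) pair. A tiny sketch of that mapping, with entirely hypothetical values rather than the ones in the synth:

```python
# Hypothetical (delay length in samples, blend probability) pairs for a
# few consonants -- illustrative values only, not the real assignments.
consonants = {
    's': (40, 0.55),   # fricative: short delay, drum-like blend
    't': (50, 0.60),
    'k': (60, 0.70),
    'l': (180, 0.95),
    'm': (220, 1.0),   # open/nasal sound: longer delay, pure string
}

def params_for(word):
    """Look up the parameter pair for each consonant in a typed word."""
    return [consonants[c] for c in word if c in consonants]

print(params_for("stl"))  # -> [(40, 0.55), (50, 0.6), (180, 0.95)]
```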
For the vowels I’m just applying a formant filter which really does make it sound like human vowels.
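A formant filter is essentially a couple of narrow bandpass filters centred on the resonant frequencies of the vocal tract. As a rough illustration (not the HSC3 implementation), here's a standard biquad bandpass applied at approximate textbook formant frequencies:

```python
import math

def bandpass(signal, f0, fs=44100.0, q=5.0):
    """Biquad bandpass (RBJ cookbook, 0 dB peak gain) centred on f0."""
    w = 2 * math.pi * f0 / fs
    alpha = math.sin(w) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# Rough first/second formant centre frequencies in Hz (approximate
# textbook values, not the ones used in the synth).
formants = {'a': (700, 1200), 'i': (300, 2300), 'u': (350, 800)}

def vowelise(signal, vowel):
    """Sum two formant bands to give the input a vowel-like colour."""
    f1, f2 = formants[vowel]
    return [s1 + s2 for s1, s2 in
            zip(bandpass(signal, f1), bandpass(signal, f2))]
```

Feeding the Karplus-Strong output through something like this is what pushes the sound towards recognisable vowels.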
I think what makes interesting sounds though are ‘articulations’. I’m not switching between the parameter values, but moving between them quickly, creating diphthong type effects.
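The gliding between parameter pairs could be sketched as a simple linear interpolation over a short window (again an illustrative sketch, not the actual code):

```python
def articulate(p_from, p_to, steps):
    """Glide linearly from one (delay, blend) pair to another,
    giving a diphthong-like transition rather than a hard switch."""
    (d0, b0), (d1, b1) = p_from, p_to
    path = []
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 1.0
        # Delay lengths are whole samples, so round them.
        path.append((round(d0 + t * (d1 - d0)), b0 + t * (b1 - b0)))
    return path

# Hypothetical glide from a harsh 's'-like setting to an open 'm'-like one.
glide = articulate((40, 0.55), (220, 1.0), steps=5)
print(glide)
```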
I hope that helps a bit…
An absolutely fantastic piece of software. I like the videos too, particularly the end of the first one….thank you!
I don’t know the precise topic of your MSc thesis and whether you have the time or inclination to explore yet more new territories with this piece of software, but I thought I might make one suggestion in case it inspires.
Have you considered adding any algorithms for altering the human voice due to fatigue from sustained usage? That is, the longer you sing, the easier it is at first to reach certain notes as your vocal cords warm up, and the harder it then becomes if you continue singing and they are overstretched. You could, for example, set up a little bit of code that monitors how regularly the same sorts of sounds/phrases/frequencies are produced by the software, and modifies them bit by bit based on the frequency of their occurrence over time.
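A toy version of this monitoring idea (my own sketch, not part of the software) might just count how often each note is played and detune it accordingly:

```python
from collections import Counter

class FatigueModel:
    """Toy fatigue model: the more often a note is sung,
    the more it drifts flat, mimicking tiring vocal cords."""

    def __init__(self, drift=0.002):
        self.counts = Counter()
        self.drift = drift

    def play(self, freq):
        self.counts[freq] += 1
        # Detune in proportion to how often this note has been used.
        return freq * (1.0 - self.drift * (self.counts[freq] - 1))
```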
And/or have you considered adding any extra oscillatory functions to mimic the ever-wavering tone of the human voice? Something like the simple sine-wave signal modulation and harmonic-frequency addition that the vocoders used in industrial/military telecommunications systems have for re-synthesizing the human voice after it has been compressed and sent over a very low-bandwidth transmission medium.
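The sine-modulation part of this suggestion is essentially vibrato: a slow LFO wobbling the pitch. A minimal sketch (my illustration, not anything in the software):

```python
import math

def vibrato(freq, rate=5.0, depth=0.01, fs=44100, n=44100):
    """Sine oscillator whose frequency is wobbled by a slow sine LFO,
    mimicking the gentle waver of a human voice."""
    phase = 0.0
    out = []
    for i in range(n):
        # LFO at `rate` Hz modulates the base frequency by +/- depth.
        f = freq * (1.0 + depth * math.sin(2 * math.pi * rate * i / fs))
        phase += 2 * math.pi * f / fs
        out.append(math.sin(phase))
    return out
```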
Hey Ben, Nice ideas, thanks! For my MSc project I was just trying to make the simplest system possible as a proof of concept, so I haven’t begun to think around this kind of idea. But yes adding in more stuff from human voice synthesis could be a good direction for my PhD, and the ideas you mention seem like they wouldn’t take much time to implement and would add a lot of depth to the sound…
Very nice sounds and cool code instrument! I have an offside question, after watching the youtube video : how do you record console typing?
Thanks! I make the screencasts with ffmpeg under ubuntu: http://ubuntu.wordpress.com/2006/06/08/how-to-create-a-screencast-in-ubuntu/
alex, I'd love to use your software; I can't program such cute applications myself. Please let me know when it's ready for the public. tx! /a
Hi Alei, I'll be able to release it soon, and will make a post when I do.
Man, this is all I ever did with my first speech synth back on my Atari ST 🙂
(Okay… once I got bored of making it swear…)
Nice to see it being updated and improved!
It’s kind of unrelated, but it might interest you 😉
Really neat stuff. I was just pondering, though: I don't know quite how the speech synthesis works, but what might be interesting is being able to control how speech-like it sounds, so when you want to remove some of the speechiness from the sound you can just turn that parameter down, and when you want to have the loop "say" something (like you did at the end with the "thank you" bit) you can turn it back up. That would be neat for a live performance. 🙂
Cool stuff, keep it up!
Yhancik – yep I have seen scrambled hackz in a basement in dortmund, beautiful stuff
Joe – interesting idea although i think a ‘problem’ is that if you read the text while hearing the sound, you will perceive it as speech whatever. The speech perception overrides normal sound perception on the tiniest of clues 🙂
great stuff ! any chance for compiled binaries for win and os x ? thanks !
Sorry wakax, I’m not really a windows person. I know that the windows version of supercollider is called “psycollider”, available here:
You’d need to get that running as well as getting my code compiled.
Good luck !
I have to say that's very cool indeed.
hey, nice stuff. maybe you already know about this but … have you heard of the “standard beatbox notation” (SBN) ?
it’s a way used by human beatboxers to write down grooves – and it’s pretty close to the onomatopoeic technique you are using there.
Using SBN in your program could be a nice way to compose beats – but I guess the sound synthesis part would have to be much more complex to handle plosives and stuff.
Yep I heard about SBN a couple of months ago and was really impressed by what the beatboxers are doing. You’re right it’s a real inspiration for this kind of work.
I’ve got a new version working with more complex synthesis and interesting phonetics, will add a new video soon.
Is it possible to download your super cool program?
It’s all a bit old and broken, but there’s a version in HaXe available here:
I’m working on a newer version of the haskell stuff at the moment…
Alex could you help a Haskell noob get this running? I would /love/ to look at your source as I learn the ropes. Absolutely beautiful work– Brian Eno would be proud I think.
Email if you wish (you have the address) — Thanks.
Thanks for the note. Sorry I haven’t released the source yet, will do it when I get some time, hopefully next week.
Ah, actually I’ve just realised this post is about my previous version which I did release here:
the code is quite shonky but it should be possible to get it to work, let me know if you get problems.
my newer code uses waveguide synthesis and is more interesting I think — that’s what I plan to release soon.