Haskell patterns ad nauseam

TL;DR I’m now describing algorave music as functions from time ranges to lists of events, with arbitrary time precision, where you can query continuously varying patterns for more detail by specifying narrower time ranges.

For a more practical, demo-based description of my current system, see this post.

I’ve been restructuring and rewriting my Haskell pattern library for quite some time now. I’ve just done it again, and thought it would be a useful point to compare the different approaches I’ve taken. In all of the following my underlying aim has been to get people to dance to my code, while I edit it live (see this video for an example). So the aim has been to make an expressive language for quickly describing periodic musical structures.

First some pre-history – I started by describing patterns with Perl. I wrote about this around ten years ago, and here’s a short video showing it in action. This was quite frustrating, particularly when working with live instrumentalists — an imperative language is just too slow to work with, for a number of reasons.

When I first picked up Haskell, I tried describing musical patterns in terms of a tree structure:

data Event = Sound String
           | Silence
data Structure = Atom Event
               | Cycle [Structure]
               | Polymetry [Structure]

(For brevity, I will just concentrate on the types — in each case there was a fair amount of code to allow the types to be composed together and used).

Cycles structure events into a sequence, and polymetries overlay several structures which, as the name suggests, may have different metres.
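
To make this concrete, a pattern built from this type might look like the following (the sound names are invented for the example, it isn’t code from the library):

-- a two-step cycle overlaid with a three-step one: a simple 2-against-3 polymetry
example :: Structure
example = Polymetry [ Cycle [Atom (Sound "bd"), Atom (Sound "sn")]
                    , Cycle [Atom (Sound "hh"), Atom Silence, Atom (Sound "hh")]
                    ]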

The problem with this structure is that it doesn’t really lend itself to live improvisation. It represents musical patterns as lists embedded within lists, with no random access — to get at the 100th metric cycle (or musical loop) you have to generate the 99 cycles before it. This is fine for off-line batch generation, but not so good for live coding, and is restrictive in other ways — for example transforming events based on future or past events is awkward.

So then I moved on to representing patterns as functions, starting with this:

data Pattern a = Pattern {at :: Int -> [a], period :: Int}

So here a pattern is a function from integers to lists. This was quite a revelation for me, and might have been brought on by reading Conal Elliott’s work on functional reactive programming, though I don’t clearly remember. I still find it strange and wonderful that it’s possible to manipulate this kind of pattern (reversing it, as a trivial example) without first turning it into a list of first-order values. Because these patterns are functions from time to values, you can manipulate time without having to touch the values. You can still generate music from recursive tree structures, but with functions within functions instead of in the datatypes. Great!
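
For instance, reversing each cycle of a pattern can be written purely as a transformation of time (a rough sketch against the type above, not the library’s actual code):

rev :: Pattern a -> Pattern a
rev (Pattern f p) = Pattern f' p
  -- map each queried time point to its mirror image within the same cycle
  where f' t = f ((t `div` p) * p + (p - 1) - (t `mod` p))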

In the above representation, the pattern kept note of its “period”. This was to keep track of the duration of the cycle, useful when combining patterns of different lengths. This made things fiddly though, and was a code smell for an underlying problem — I was representing time with an integer. This meant I always had to work to a predefined “temporal atom” or “tatum”, the lowest possible subdivision.

Having a fixed tatum is fine for acid house and other grid-based musics, but at the time I wanted to make structures more expressive on the temporal level. So in response, I came up with this rather complex structure:

data Pattern a = Atom {event :: a}
                 | Arc {pattern :: Pattern a,
                        onset :: Double,
                        duration :: Maybe Double
                       }
                 | Cycle {patterns :: [Pattern a]}
                 | Signal {at :: Double -> Pattern a}

So lists are back, in the form of Cycles. However, time is now represented with floating point (Double) values: an Arc wraps a pattern (such as a Cycle) with a floating point onset and duration.
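
For illustration, a value of this type might look like the following (invented sound names, not code from the library):

-- a three-step cycle placed at onset 0.0, lasting one whole cycle
example :: Pattern String
example = Arc (Cycle [Atom "bd", Atom "sn", Atom "hh"]) 0.0 (Just 1.0)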

Patterns may also be constructed as a Signal, which represents constantly varying patterns, such as sinewaves. I found this a really big deal – representing discrete and continuous patterns in a single datatype, and allowing them to be composed together into rich structures.

As with all the other representations, this did kind of work, and was tested and developed through live performance and audience/collaborator feedback. But clearly this representation had got complex again, as had the supporting code, and the use of doubles presented the ugly problem of floating point precision.

So simplifying again, I arrived at this:

  data Pattern a = Sequence {arc :: Range -> [Event a]}
                 | Signal {at :: Rational -> [a]}
  type Event a = (Range, a)
  type Range = (Rational, Rational)

This is back to a wholly higher-order representation and is much more straightforward. Now we have Sequences of discrete events (where each event is a value which has a start and end time), and Signals of continuously varying values. Time is now represented as fractions, with arbitrary precision. An underlying assumption is that metric cycles have a duration of 1, so that all time values with a denominator of 1 represent the end of one cycle and the beginning of the next.

A key insight behind the above was that we can represent patterns of discrete events with arbitrary temporal precision, by representing them as functions from time ranges to events. This is important, because if we can only ask for discrete events occurring at particular points in time, we’ll never know if we’ve missed some short-lived events which begin and end in between our “samples” of the structure. When it comes to rendering the music (e.g. sending the events to a synthesiser), we can render the pattern in chunks, and know that we haven’t missed any events.
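
As a sketch of what that chunked rendering could look like (an illustration only, not the actual scheduler, and the quarter-cycle chunk size is arbitrary):

import Data.Ratio ((%))

-- query successive quarter-cycle ranges, so short-lived events cannot slip between samples
renderChunks :: Pattern a -> [[Event a]]
renderChunks (Sequence f) = [ f (n % 4, (n + 1) % 4) | n <- [0 ..] ]
renderChunks (Signal _)   = []  -- continuous patterns are sampled as needed instead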

At this point, things really started to get quite beautiful, and I could delete a lot of housekeeping code. However, I still wasn’t out of the woods..

Having both Sequence and Signal as part of the same type meant that it was somehow not possible to specify patterns as a clean instance of Applicative Functor. It meant the patterns could “change shape” when they were combined in various ways, causing problems. So I split them out into their own types, and defined them as instances of a type class, with lots of housekeeping functions so that they could be treated the same way:

data Sequence a = Sequence {range :: Range -> [Event a]}
data Signal a = Signal {at :: Time -> [a]}

class Pattern p where
  pt :: (p a) -> Time -> [a]
  atom :: a -> p a
  silence :: p a
  toSignal :: p a -> Signal a
  toSignal p = Signal $ \t -> pt p t
  squash :: Int -> (Int, p a) -> p a
  combine' :: p a -> p a -> p a
  mapOnset :: (Time -> Time) -> p a -> p a
  mapTime :: (Time -> Time) -> p a -> p a
  mapTime = mapOnset
  mapTimeOut :: (Time -> Time) -> p a -> p a

I’ll save you the instance declarations, but things got messy. But! Yesterday I had the insight that a continuous signal can be represented as a discrete pattern, which just gets more detailed the closer you look. So both discrete and continuous patterns can be represented with the same datatype:

type Time = Rational
type Arc = (Time, Time)
type Event a = (Arc, a)
data Pattern a = Pattern {arc :: Arc -> [Event a]}

Much simpler! And I could delete about half of the supporting code. Here’s an example of what a “continuous” pattern looks like:

sig :: (Time -> a) -> Pattern a
sig f = Pattern f'
  where f' (s,e) | s > e = []
                 | otherwise = [((s,e), f s)]

sinewave :: Pattern Double
sinewave = sig $ \t -> sin $ pi * 2 * (fromRational t)

It just gives you a single value for the range you ask for (the start value in the range, although on reflection perhaps the middle one or an average value would be better), and if you want more precision you just ask for a smaller range. If you want a value at a particular point, you just give a zero-length range.
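
For example, here’s a hypothetical GHCi session with the definitions above in scope:

> arc sinewave (0, 1/2)
[((0 % 1,1 % 2),0.0)]
> arc sinewave (1/4, 1/4)
[((1 % 4,1 % 4),1.0)]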

I’ve found that this representation actually makes sense as a monad. This has unlocked some exciting expressive possibilities, for example taking one pattern and using it to manipulate a second, in this case changing the second pattern’s density over time:

listToPat [1%1, 2%1, 1%2] >>= (flip density) (listToPat ["a", "b"])

Well this isn’t fully working yet, but I’ll work up some clearer examples soon.
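
For the curious, bind for this representation can be sketched roughly as follows (an illustration of the idea, not necessarily the actual Monad instance in the library):

-- query the outer pattern over the requested arc, then query each
-- resulting inner pattern over that event's own arc
bindP :: Pattern a -> (a -> Pattern b) -> Pattern b
bindP p f = Pattern $ \a -> concat [ arc (f x) a' | (a', x) <- arc p a ]

pureP :: a -> Pattern a
pureP x = Pattern $ \(s, e) -> [((s, e), x)]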

So I hope that’s it for now; it’s taken me a ridiculous amount of effort to get to this point, and I’ve ended up with less code than I began with. I’ve found programming with Haskell a remarkably humbling experience, but an enjoyable one. I really hope that this representation will stick though, so I can concentrate more on making interesting functions for transforming patterns.

In case you’re wondering what the mysterious “a” type is in the above definitions of “Pattern a”, well of course it could be anything. In practice what I end up with is a pattern of hashes, which represent synthesiser control messages. I can represent all the different synthesiser parameters as their own patterns (which are of different types depending on their function), combine them into a pattern of synthesiser events, and manipulate that further until the events eventually end up with a scheduler which sends the messages to the synth. For a close-up look at an earlier version of my system in use, here’s a video.

The current state of the source code is here if you fancy a look; I’ve gone back to calling it “tidal”. It’s not really in a state that other people could use yet, but hopefully one day soon.. Otherwise, it’s coming to an algorave near you soon.

As ever, thanks to those who have given me advice along the way.

14 Comments

  1. While I do like the idea of treating continuous signals as discrete signals with flexible resolution, I’m concerned and wary that your observed audio will be unstable due to aliasing on query ranges.

    Consider visual scene-graphs, and ‘level of detail’ constraints. The idea in (good) visual scene-graphs is that nodes provide some high-level metadata, e.g. to support occlusion, collision detection, volumetric shadows, and pre-GPU clipping. An outer-node may be a container for inner-nodes, and the level of detail pursued on a node is typically determined by its size and camera distance (though one could also heuristically leverage ‘importance’ or perhaps some information about hue and contrast). We can “zoom in” on a visual feature to get more detail. Those details could be procedurally generated, but if so it’s very important that the details don’t shift about or wibble as we zoom – i.e. details can become more detailed, but they shouldn’t be allowed to wibble, pop-in, or vanish as we zoom, pan, or tilt the camera. (Sudden changes in contrast, e.g. a grey stone floor suddenly turning brilliant red due to a thin layer of tiny rose petals, must also be avoided. I’ve occasionally contemplated how to fit such requirements into a type system…)

    An audio description should probably have similar stability characteristics. Important audio features (by whatever heuristic – volume, distinction, awesomeness) ought to be audible with greater precedence. And stability under different frequencies, time-shifts, and composition seems similarly important for audio as it is for visuals.

    I don’t have any solutions in mind. Just concerns. Audio isn’t something I have much experience with compared to visual graphs.

  2. Hi David,

    I don’t think I’ve seen the problem that you hint at. I can see this could be a concern, but I do not see how it is relevant here.

    In the end everything has to be discrete on a digital computer. This system allows continuous structure to be represented, which you can then sample at the limits of perception if you wish (well, twice that). I’m working with discrete structures of sounds (not the audio signal directly), so I just apply the continuous structure to a discrete structure with perfect accuracy. I don’t see where instability comes into it.

    alex

  3. Artifacts of sampling, such as aliasing, can be perceived even if you sample at or above the limits of perception. Depending on the sound being sampled, your model may return perceivably different sounds based on, for example, whether you sample at 44000 Hz or 48000 Hz, or whether you start sampling at ‘t=0’ vs. ‘t=1/7 ms’. (The latter can be an issue for linear composition of sounds.)

    Anyhow, the stability I describe is stability of a model relative to different sampling policies. Whether individual samples are perfect is not the most important concern for continuous models. A little bit of fuzziness – e.g. over-sampling, taking pseudo-random subpixel super-frequency offsets, then averaging – would actually help mask artifacts.

  4. Yes I know about aliasing, thanks.

    I’m still not sure what you mean by stability, but I get a sense of it. I’ve developed this system through practice in tandem with theory, so I find it hard to generalise to different scales/domains. A concrete example might help.

    It would be fun to apply this to the visual domain. I did do a VJ performance a couple of years ago where I used it to describe colour transitions.

  5. So I guess the main question I have is what the list buys you – if you only use sig and fmap, then you’re dealing with Time -> a, not Time -> [a].

    David has a post on why Time -> a is a bad idea: http://awelonblue.wordpress.com/2011/07/19/signals-in-rdp/

    From what I can tell, everything gets passed through seqToRelOnsets – so the final data type which the synthesizer sees is [(Double,OscMap)]. I guess abstracting out Double to Time is a good idea, and OscMap to a, so why not [(Time,a)] as the representation?

    Alternately, you could just borrow David’s signal type: https://github.com/dmbarbour/Sirea/blob/master/src/Sirea/Internal/SigType.hs https://github.com/dmbarbour/Sirea/blob/master/src/Sirea/Internal/DiscreteTimedSeq.hs
    I’m not certain it’s 100% equivalent to [(Time,a)], but a few tweaks should allow equivalent expressiveness.

    One more thing: how does one use Tidal? A cabal file would be helpful. I’ve installed netclock and its dependencies, and compiled Dirt (no idea how to get Jack working on ubuntu, though); does the emacs file do the rest?

  6. Hi,

    Thanks for this, good to have this challenged. I guess I’d be a bit sad if it turned out all the above work is for nothing, and a simple list would be better. But I wouldn’t want to continue in ignorance..

    I use Time -> [a] because I’m dealing with polyphonic music.

    I’m only describing the representation above, not how to use it, except in a couple of cases to help explain the representation. I’m using more than sig and fmap.

    Time is a rational number, to accurately represent subdivisions of a rhythmic cycle without compounding rounding errors through multiple pattern transformations. It is ultimately turned into floating point before sending to the synth for fine-grained scheduling, but accuracy is maintained up to that point.

    Also, the whole structure is not turned into [(Double, a)] – only a small chunk of it is for buffering, I think a quarter of a cycle. If things are going at 120 cycles per minute, that’s half a second at a time. If latency started mattering to me, it could be much less.

    [(Time,a)] would make many pattern transformations difficult.

    I’m assuming you’re suggesting that the list should be ordered by Time, but even then, to find a particular time range you’d have to search forward through the whole list. To manipulate events based on prior events, you’d either have to hold some kind of cache or do a lot of searching through the list from the beginning of time. With granular synthesis, the list would get very big.

    This is made worse because the pattern would be mutable, replaced through the process of live coding, and each time you’d have to regenerate the whole performance to get back to the present moment.

    There are other arguments for representing patterns as functions, but I think the above nails it.

    Have a look at my follow-up post, it might help motivate this a bit more.

    If you still want to convince me towards lazy lists though, I’d love to hear more.

    I think I’m the only Tidal user at the moment. The netclock stuff requires a server running in supercollider, and the darcs repo seems to be offline at the moment. It should be fairly trivial making netclock optional to support a simple single-process mode. I’ll try to get to this soon..

  7. I tried representing Time with Rationals early on, but profiling found it to be a major sink for space and CPU for high-frequency signals (i.e. above 1kHz). I eventually switched to a fixpoint representation of time, currently with nanosecond precision. A fixpoint representation of continuous time is qualitatively different than your temporal atoms; there is no `next` instant since it’s logically continuous (and the precision is subject to transparent change), and the precision is much finer than even one microsound.

    I am interested in representations of sound that allow rapid indexing to a current instant. I find interesting the possibility of modeling sounds on a spirograph (or harmonograph), i.e. where all sounds are cyclic but cycles can be layered within cycles of fixed relative radii, and there is a clear time component. Issues with building implicit state while processing events over time (e.g. with folds, accumulators, integrals) remains the same whether you use time-ordered [(T,a)] or T->a. With cyclic representations, however, we should be able to place upper bounds on how much information is accumulated on a given cycle.

    An accurate record of any live programming performance must record all the code changes/states, and precisely when they apply. The idea of regenerating the performance with the new code doesn’t seem desirable to me. Instead, I’ve developed semantics to help me model code as changing at a particular instant. State is modeled as external, like a database or filesystem. Identifiers for state (e.g. filenames) are stable, very resistant to accidental change. The new code can seamlessly pick up where the old code left off. (Some rework is necessary, e.g. in case of speculative evaluation, caches, memoization. But none of the rework is about recomputing state.)

  8. Hi David,

    Yes, as I’m working at musical control rate rather than audio rate, I have to try quite hard to reach hardware limits, even on my ancient laptops.

    I’m not sure what you mean by “fixpoint representation of time” – I’m guessing it’s related to this? http://en.wikipedia.org/wiki/Fixed_point_(mathematics)

    I’m with you on spirographs though, my conception of time as cycles within cycles came from thinking about gears. One day I’d love to create a hybrid computer, involving a CNC router for printing out gears to live code the analogue aspects of the machine.

    I suppose I’m not really modelling cycles within cycles in the representation though, but with functions.

    I’m also with you on recording code changes. As I think I’ve said before, I don’t have any state apart from the current pattern, and the current time.

  9. I should have used “fixed point”. This is less expressive than rationals, yet (assuming microsecond precision or tighter) more than sufficient to express both music and non-music sound, whether at control rate or audio rate.

    It’s good to minimize state.

    I’ve been trying to take that to an extreme, contemplating how to get rid of start time and current pattern as state, and make sound a function of the code and universal time. Though, I am also interested in ‘virtual’ instruments that might leverage sensors – camera, LEAP motion, myo, emotiv, etc. – in controlling sound, which makes for some difficult latency requirements.

    So far, I’ve not found a good way to be rid of ‘current pattern’ state entirely (can’t abort patterns at arbitrary points in time without audible artifacts). This is one of the reasons I’m interested in stability and cyclic representations, i.e. to ensure smooth sound switching.

  10. Ah yes, I guess fixed point is good enough for ntp.. Rational numbers are a neat fit for me though, and I fear rounding errors would still cause me problems when mixing discrete and continuous patterns. A minuscule rounding error in a continuous pattern could translate to the difference between night and day in a discrete one, right?

    I suppose what I’m doing makes *music* a function of code and universal (or at least unix) time, even if sound is handled by a separate synth doing sample playback. I don’t like the idea of handling separate note on/note off/parameter change messages, so I set all sound parameters with a single sound onset message.

    I tend to introduce code changes at the start of metric cycles, although where polyrhythms are involved, the dominant cycle is often perceptually ambiguous.

    Of course I’m not dealing with audio directly so don’t get audio-level discontinuities. SuperCollider’s proxyspace avoids such things by crossfading between old and new code.

  11. Rounding errors are not a problem for fixed point time. They won’t occur for adds or diffs; only expanding or compressing on the temporal axis can introduce rounding error. And even that error won’t accumulate unless you perform some repeated expand-then-add operation.

    Re: what I’m doing makes *music* a function of code and universal (or at least unix) time

    I had the impression your music was a function of code and relative time (or equivalently: code, universal time, and start time).

    Elimination of start time has a pervasive impact on how we model sound. For example, you cannot “regenerate from the beginning” because there is no beginning of universal time. Nor can you directly add one sound to run ‘after’ another, since that doesn’t make sense for two sounds that are both functions of universal time (though there are indirect ways to model this).

    If we are rid of the traditionally implicit ‘start time’, we don’t need to synchronize it between multiple observers that independently compute the sound. E.g. if we graffiti code on a wall, everyone who looks at it with their AR glasses should get the same sounds at the same time (within the limits of GPS-synchronized hardware clocks).

    I’d be a little upset if someone claimed a behavior is “a function of code and universal time” when they actually mean it’s “a function of code and universal time and a start time”. Those are vastly different situations.

    Re: not dealing with audio directly

    Direct work on audio can be nice, up to the point of firing it off to JACK or PortAudio. There are fewer limits that way. Haskell would be a decent language for building a richly typed alternative to SuperCollider, perhaps leveraging plugins for runtime code generation. Some of the existing intermediate libraries would probably work quite well.

  12. Hi David,

    I do a lot of expanding and compressing along the time axis. For example each discrete note event generally starts off as a value lasting one whole cycle, and repeating forever. To concatenate five patterns together would involve compressing each pattern by a factor of five. This can happen to many levels of depth, so the density function ends up being called a lot:


    density :: Time -> Pattern a -> Pattern a
    density r p = mapResultTime (/ r) $ mapQueryTime (* r) p
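
    (Roughly, those helpers transform the times a pattern is queried with and the times on the events it returns. Something like this sketch, rather than the exact library code:)

    -- apply a time function to the query range
    mapQueryTime :: (Time -> Time) -> Pattern a -> Pattern a
    mapQueryTime f (Pattern q) = Pattern $ \(s, e) -> q (f s, f e)

    -- apply a time function to the arcs of the resulting events
    mapResultTime :: (Time -> Time) -> Pattern a -> Pattern a
    mapResultTime f (Pattern q) = Pattern $ \a -> [ ((f s, f e), x) | ((s, e), x) <- q a ]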

    Not sure if you misunderstood my post – when I was talking about having to regenerate a performance from the start, it was a drawback of an early system that I quickly discarded. At this point I don’t record the start of a performance, and regenerating a whole performance is impossible without reverse engineering the undo-tree elisp output, which I haven’t got around to doing, and which would only have integer second-level precision.

    I do sometimes record edits using my editor (undo-tree-mode in emacs), I suppose when I do the timestamp on the first edit records the start of the performance, but I don’t have access to that from the haskell. It might be interesting if I did, though – I’d love to be able to manipulate history.

    As I remember, Time is in cycles (not “physical time”) relative to the system start time of the clock server. So a global, arbitrary epoch. I think this qualifies as your ‘universal’ time.

    When it comes to mapping Time to “physical time”, the current BPM, the physical time at which it last changed, and the current physical time are needed.

    If I wanted to have one event occur after another, I’d need to fmap (const new-value) old-pattern, then (inter-onset-value ~>), then combine the new pattern with the original.

    I like the idea of doing synthesis with Haskell, if I had more time..

  13. If you use your density function a lot, then rationals will work better than fixpoint for you. OTOH, if you used fixpoint, you’d probably find a way to express the same behavior with at most one use of time-division. (Aside: the type of the first argument to your density function should be ‘Rational’ for documentation reasons. I found it confusing that you were multiplying one time by another until I realized you were treating it as a unitless scalar.)

    Your music seems to be a function of your code and the state of a clock server, which does seem close in nature to universal time. I’m curious whether you could effectively tweak it to actually be universal time: how much does the quality and content of your performance depend on starting with cycle count near zero in the clock server? If the clock server started at some deterministic function of Julian date, and moved at a deterministic rate, could you still render an effective performance? (You’d have less global, stateful control over BPM, but you could model something similar in code.)

    Recording edits at the editor seems awkward and imprecise, e.g. it doesn’t account for compile time or file-system delays. For greatest precision, you need to record when edits logically apply in process, which may require more formally and explicitly modeling when code is swapped into the model.

  14. It would make no difference if I counted cycles from a different epoch, but sharing BPM period/phase is important for multi-process and collaborative performance.

    I agree recording edits at the editor is imprecise, but it’s good enough for me for now. In practice I don’t use the recording at all; I generally just write code for the moment and am happy to delete it after. I see there are some interesting applications ahead for modelling edits properly though!
