Nobody really seems to have gotten it right yet, as far as I'm concerned, mostly because they haven't combined good algorithms with an easy UI.
I gave up on writing Visual DJ myself. Why? Well, it just seemed like too much work for too little payoff. But I put up this webpage in the hope that it may influence other software designers to implement a few of these ideas in their own products. If this page influences your design, all I ask in return is that you provide me with a lifetime of free licenses to the software (smile).
This document is a little rough. I'd like to add some more to it eventually, such as illustrations and so on. We'll see if it happens!
High level:
From a set of songs, select a sequence that "sounds good" together and produces the desired effect on the audience. It often "tells a story".
Low level:
You have two turntables, and a box in the middle which can mix the sounds from these turntables. While turntable #1 is playing, place a record on turntable #2, and
If we represent measures as # and a switch in the pattern as |, then
record #1 ########|########|################|########|########|
record #2 ########|########|########|########
(record #1 is playing) (both records) (record #2)
Problem #1: making two records go the same speed
The primary skill new DJs work on for about the first year is "beat matching": making the two records go the same speed and have the beats hit at the same time.
This is essentially hand-to-ear coordination. It is a surprisingly difficult task to master.
The input device is a pitch control on both turntables, which adjusts the pitch of the record from -8% to +8%. You can also directly manipulate the record, by moving it forwards or backwards. This gives you both a position control and velocity control. As you can imagine, controlling the velocity is difficult.
Let's say the record you are trying to match is around 129.3 beats per minute (bpm). You will move the pitch on the second record back and forth until you zero in on the correct pitch. (An interesting side project would be to find out whether this works like Fitts's law. Does it work the same way for audio? It seems to be an acquired skill, with better DJs able to do it much more quickly than novices.)
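To make the arithmetic concrete, here is a minimal sketch of the pitch change a DJ is hunting for by ear. The 129.3 bpm figure and the 8% pitch range come from the text above; the 126.0 bpm tempo of the second record and the function names are made up for illustration.

```python
def pitch_adjustment_percent(playing_bpm, incoming_bpm):
    """Percent pitch change the incoming record needs to match the playing record."""
    return (playing_bpm / incoming_bpm - 1.0) * 100.0


def can_beat_match(playing_bpm, incoming_bpm, pitch_range_percent=8.0):
    """True if the required change fits inside the turntable's pitch range."""
    needed = pitch_adjustment_percent(playing_bpm, incoming_bpm)
    return abs(needed) <= pitch_range_percent


# Hypothetical second record at 126.0 bpm, matched against 129.3 bpm.
adjust = pitch_adjustment_percent(129.3, 126.0)
print(f"{adjust:+.2f}%")  # prints +2.62%
print(can_beat_match(129.3, 126.0))
```

The computer can get this number directly, while the DJ has to find it by sweeping the pitch slider and listening.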
I've observed that DJs tend to get very bogged down in the technical details of their craft, and as a result are not able to concentrate on the aesthetic portions. It simply requires too much brain power to focus on the technical details and the aesthetic details simultaneously.
They are concentrating on how to do the task, rather than on the task itself. Sounds like something from Bill Buxton's Chunking and Phrasing paper?
Problem #2: phrasing and music structure
After the first year, most DJs will have figured out how to match beats, and make two records go the same speed. What they have not usually mastered is phrasing, which is knowing how the structure of songs works, and matching the phrases of the two records together.
If a DJ understands phrasing, they will know how to add in a new song when it is appropriate. Because most electronic music has similar phrasing, it allows the DJ to layer music and have the resulting effect sound correct.
Phrasing requires an understanding of the structure of the music, and using it requires a deep memory of the structure of thousands of songs already played.
This means that phrasing is something that some DJs don't learn for several years. Precocious DJs will, and they will stand out because their performances will sound better. But again, this is an example of an area where DJs spend so much time mastering a technical skill that they're unable to focus on the task rather than the mechanics of the task.
Surprisingly, nobody seems to have developed a notation system for the structure of the music. At least there is not a system that DJs use.
Essentially, the task is reduced to two steps:
In this stage, the DJ marks up the music, showing as much useful information about the song as possible. In addition, the computer attempts to mark up any part of the music it can by algorithmic means. For example, there are probably algorithms for determining the BPM for a song, and the placement of the beats.
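As a rough sketch of the algorithmic part of the markup, the snippet below estimates a tempo from a list of beat timestamps and lays out a beat grid from it. It assumes the beat times have already been found, whether by the DJ tapping along or by some onset detector that is not shown here. The function names are hypothetical, and this is only one simple way to do it.

```python
from statistics import median


def estimate_bpm(beat_times):
    """Estimate tempo from beat timestamps (in seconds) using the median gap."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beat times")
    intervals = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


def beat_grid(first_beat, bpm, count):
    """Evenly spaced beat positions (seconds) starting at first_beat."""
    seconds_per_beat = 60.0 / bpm
    return [first_beat + i * seconds_per_beat for i in range(count)]


# Made-up input: sixteen beats spaced half a second apart.
beats = [i * 0.5 for i in range(16)]
bpm = estimate_bpm(beats)
grid = beat_grid(beats[0], bpm, 4)
print(bpm, grid)
```

Using the median rather than the mean keeps one or two badly placed beat marks from skewing the estimate.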
Here are the types of markup that might be useful:
Out of the entire collection of a DJ's music, there is a set of music which sounds good with the current song that is playing.
The DJ is shown a list of music that he has identified as sounding good with the current song (song #1-3). He can also choose any other song by name from his collection.
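One possible shape for this part of the markup is sketched below. The `SongMarkup` record, its field names, and the song titles are invented for illustration. Ordering the suggestions by closest tempo is also an assumption, not something the design above requires.

```python
from dataclasses import dataclass, field


@dataclass
class SongMarkup:
    title: str
    bpm: float
    # Titles the DJ has marked as sounding good with this song.
    goes_well_with: set = field(default_factory=set)


def suggestions(current, collection):
    """Songs tagged as good matches for the current song, closest tempo first."""
    tagged = [song for song in collection if song.title in current.goes_well_with]
    return sorted(tagged, key=lambda song: abs(song.bpm - current.bpm))


current = SongMarkup("Song A", 129.3, {"Song B", "Song C"})
collection = [
    SongMarkup("Song B", 126.0),
    SongMarkup("Song C", 130.0),
    SongMarkup("Song D", 129.0),  # not tagged, so it is not suggested
]
names = [song.title for song in suggestions(current, collection)]
print(names)  # prints ['Song C', 'Song B']
```

The full collection stays available for lookup by name. The suggestion list just puts the DJ's own earlier judgments in front of them.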
Information on the structure of the current song, and songs that would go well with the current song, is displayed on the screen.
The DJ can see if the beat structure matches up, because the markup shows where the beats are. They also can see any other visualization we're able to represent meaningfully. (What can be done meaningfully is a good research question.)
The DJ just drags the desired song to the location indicated on the current song, and then they're able to mix the two records.
The computer handles the task of beat matching, and the phrasing is taken care of because the music markup reveals all the structure that phrasing requires. When the DJ drags the second song to the first, it jumps to spots on the song that represent phrase changes.
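The "jumps to spots" behavior could be as simple as snapping the drop position to the nearest marked phrase change. The sketch below assumes phrase changes are stored as measure numbers. The boundaries used here follow the 8, 8, 16, 8, 8 measure pattern of record #1 in the earlier diagram, and the function name is hypothetical.

```python
def snap_to_phrase(drop_measure, phrase_starts):
    """Return the phrase boundary (in measures) closest to where the song was dropped."""
    if not phrase_starts:
        raise ValueError("song has no phrase markup")
    return min(phrase_starts, key=lambda start: abs(start - drop_measure))


# Phrase changes for a song laid out as 8 | 8 | 16 | 8 | 8 measures.
phrase_starts = [0, 8, 16, 32, 40, 48]

print(snap_to_phrase(19, phrase_starts))  # prints 16
print(snap_to_phrase(30, phrase_starts))  # prints 32
```

When the drop lands exactly halfway between two boundaries, this version picks the earlier one. A real interface might instead prefer the next upcoming phrase change.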
The DJ thus has a far easier task than at present. Instead of trying to master hand-to-ear coordination with beat matching, they go through a simple markup process.
They transform the task from how to do it to what to do to sound good.
A Guide to Bird Songs, by Aretas A. Saunders. 1951, Doubleday & Company, Garden City, New York.
The songs of birds vary according to five characters. These are time, pitch, loudness, quality, and phonetics. By combination of these characters we get form, rhythm, accent, and other attributes.
Musical notation describes pitch, time, and loudness, and to such a record notes on quality and phonetics may be added.
I have devised a method by which the time, pitch, and loudness of bird songs may be recorded as accurately as by musical notation, and at the same time more quickly and without the aid or detailed knowledge of musical symbols.
Very clever: uses lines to represent the sounds.
Thickness of lines represents loudness.
Position vertically represents pitch.
Position horizontally represents time.
Gaps represent rhythm.
Wavy lines represent trills.
Curves represent slurs.
Loops represent liquid consonants.
Quality of song is represented in text.
Annotated.