Once the database is built, the user must provide an orchestra definition file that tells SPORCH which combinations of instruments it may use. Any sound recording can then be analyzed and orchestrated by running SPORCH's search engine on the source sound file. When finished, the program outputs the results of its analysis as either a text printout or a C data structure, depending on how the program is used (as the sporch command-line program or by loading libsporch.so and calling its API directly). The results may then be resynthesized for verification or passed to other functions for refinement and output to a notation program.
The orchestration algorithm works as follows:
The algorithm essentially finds instrument/pitch/dynamic-level tuples whose frequency contents add together to form a composite spectrum that has at least a crude resemblance to that of the source. The degree of similarity varies with the source used and the instrumentation specified. In general, sound sources with a strong pitched element produce orchestrations that sound relatively close in pitch and timbre. Sound sources that contain noise are also useful, however, when matched with pitched instruments: the algorithm attempts to approximate the noise by selecting a somewhat random but biased collection of notes.
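The idea of a composite spectrum can be illustrated with a minimal sketch. It assumes a simple additive model in which each note's spectrum is a mapping from analysis bin to amplitude and amplitudes are summed bin by bin; the `Spectrum` type, the `composite_spectrum` function, and the database entries are hypothetical and are not taken from SPORCH itself.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical representation: analysis bin index -> amplitude.
Spectrum = Dict[int, float]


def composite_spectrum(note_spectra: Iterable[Spectrum]) -> Spectrum:
    """Sum the spectra of the chosen notes bin by bin (phase is ignored)."""
    total: Dict[int, float] = defaultdict(float)
    for spectrum in note_spectra:
        for bin_index, amplitude in spectrum.items():
            total[bin_index] += amplitude
    return dict(total)


# Made-up database entries keyed by (instrument, pitch, dynamic level).
database = {
    ("flute", "A4", "mf"): {10: 1.0, 20: 0.5},
    ("clarinet", "E5", "p"): {15: 0.75, 20: 0.25},
}

chord = composite_spectrum(database.values())
```

Bins shared by several notes (bin 20 here) accumulate energy, which is why the composite can approximate a source partial that no single note matches in strength.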
During development of the application, several different methods of peak matching (or different types of error margin) were tried. The ones originally expected to give the best results were based on either the frequency discrimination curve or the critical bandwidth curve; in other words, the error margin changed depending on the frequency values under consideration. The results were much worse than those obtained with the static error margin described above, the most obvious discrepancy being a difference in perceived pitch. Since the results were chords made of pitches quantized to either 12 or 24 equal divisions of the octave, any difference of more than a semitone or quartertone, respectively, between the most prominent pitch components was heard simply as a different pitch. The conclusion was that when forming chords based on semitone divisions of the octave, the error margin must be half a semitone (half of the distance between neighboring pitches). For quartertone scales the margin is half a quartertone.
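As a rough illustration of the half-step margin, the sketch below measures the distance between two peaks in cents (1200 per octave) and accepts a match when it is no more than half a scale step: 50 cents for 12 divisions of the octave, 25 cents for 24. Measuring the margin on a logarithmic frequency scale is an assumption made here, and the function names are hypothetical rather than part of SPORCH.

```python
import math


def cents_between(f1_hz: float, f2_hz: float) -> float:
    """Signed distance from f1_hz to f2_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)


def peaks_match(source_hz: float, candidate_hz: float,
                divisions_per_octave: int = 12) -> bool:
    """True if the candidate lies within half a scale step of the source peak."""
    half_step_cents = 1200.0 / divisions_per_octave / 2.0
    return abs(cents_between(source_hz, candidate_hz)) <= half_step_cents


# 450 Hz is roughly 39 cents above 440 Hz.
semitone_ok = peaks_match(440.0, 450.0, divisions_per_octave=12)
quartertone_ok = peaks_match(440.0, 450.0, divisions_per_octave=24)
```

With these inputs the peak at 450 Hz falls inside the semitone margin but outside the quartertone margin, so the finer scale would treat it as belonging to a different pitch.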
The procedure outlined above is executed when the software is run at its lowest “search depth” setting. When set to a higher level, the algorithm may split the search at any point, considering the best two or three instrument choices rather than selecting just one. Multiple search paths are opened only if their scores are close to each other (within some threshold level), so this heuristic widens the search only when the choices are relatively close. The higher the search depth setting, the more exhaustive the search; the highest setting is a complete search through every possible combination.
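The branching rule might be sketched as follows. The text does not specify how the threshold is defined, so this version assumes a relative threshold on the best score and caps the number of branches with a hypothetical `max_branches` parameter standing in for the search depth setting; the scores are made up for the example.

```python
from typing import List, Sequence, Tuple


def branching_candidates(scored: Sequence[Tuple[str, float]],
                         max_branches: int,
                         threshold: float) -> List[str]:
    """Return the best candidate plus any close runners-up worth exploring.

    scored       -- (candidate, score) pairs, higher score is better
    max_branches -- most paths to open at this point (1 means purely greedy)
    threshold    -- allowed shortfall relative to the best score (assumed form)
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    best_score = ranked[0][1]
    keep = [ranked[0][0]]
    for candidate, score in ranked[1:max_branches]:
        if best_score - score <= threshold * abs(best_score):
            keep.append(candidate)
    return keep


scores = [("flute", 0.90), ("oboe", 0.88), ("horn", 0.50)]
greedy = branching_candidates(scores, max_branches=1, threshold=0.05)
wider = branching_candidates(scores, max_branches=3, threshold=0.05)
```

Here the greedy setting follows only the flute, while the wider setting also follows the oboe because its score is close to the best; the horn is never explored because it falls well outside the threshold.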
SPORCH also assigns a numerical value to each of its matches, which may be interpreted as a confidence value or as a rating of the instrument's contribution to the total match. This value is the relative amount that the instrument/pitch has subtracted from the original, measured with the same scoring procedure described above. It is somewhat related to the dynamic level chosen and is useful for determining which instrument/pitch tuples are the most important. A single, final confidence value is also output, giving the total percentage of the source that was subtracted. Informal listening tests have shown that this value is useful for estimating whether the result is similar to the source with respect to timbre, but not necessarily with respect to pitch.
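The relationship between the per-match values and the final value can be sketched as below, assuming the score of the source and the amount subtracted by each match are already available as plain numbers. The function name, the dictionary layout, and the figures are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, Tuple

Match = Tuple[str, str, str]  # (instrument, pitch, dynamic level)


def confidence_values(source_score: float,
                      subtracted: Dict[Match, float]
                      ) -> Tuple[Dict[Match, float], float]:
    """Return each match's share of the source and the total percentage removed."""
    per_match = {match: amount / source_score
                 for match, amount in subtracted.items()}
    total_percent = 100.0 * sum(subtracted.values()) / source_score
    return per_match, total_percent


# Made-up amounts subtracted from a source whose score is 10.0.
per_match, total_percent = confidence_values(
    10.0,
    {("flute", "C5", "mf"): 4.0, ("oboe", "E4", "p"): 1.0},
)
```

In this example the flute accounts for 0.4 of the source and the oboe for 0.1, so the final value is 50 percent.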
Although the spectrum of the orchestration typically contains many more prominent frequency components than that of the source, most of the original partials are present. The energy of the two sounds is also usually concentrated in the same frequency areas. When the two are compared aurally, the timbre of the orchestrated sound is very similar to that of the original, even though the textures of the two sounds are most likely completely different. The subtraction procedure then accomplishes at least two things that are significant in matching the timbre of the original sound: