Monday, July 4, 2011

Both conditions have been completed. While the first condition tested whether participants could tell the difference between two fully different systems, the second condition tested their ability to detect a difference in a single component: the preamplifier. In this condition, the first system contained the expensive setup from condition one (VPI HRX, Dynavector DRT XV-1s, Kimber KCAG interconnection cables, and a Manley Steelhead preamplifier) and the second system contained the same setup except for the preamplifier, which was replaced with the Pass Labs Xono (originally from the moderately priced setup in condition one).

The second condition was run over two weeks: May 9-13 and May 23-27. The listening room (CIRMMT's Critical Listening Room) was set up in the same configuration as for the first condition (please see previous post).

At present we are waiting for the results to be analyzed, which should be done within the next couple of weeks. Below is a brief summary of the experiment.

This study investigates whether expert listeners can hear differences between phonograph playback systems (PPS) built from closely matched components at different price ranges and, if so, how they describe these differences and which PPS they prefer. Our motivation is as follows: if there is no audible difference between two systems in an optimal listening environment, then the discrepancy in price may not be justified for archival purposes.
For our study, we define a PPS as a system used in the playback of phonograph records. Each system is comprised of the following components: a cartridge, a turntable, a preamplifier, and two pairs of interconnection cables. All components are manufactured by well-known, reputable companies. The cost of the high-end system was approximately five times that of the mid-range system. The high-end system’s components were chosen based on component cost and by industry recommendations for optimal component choices, given the pool of equipment available to the study. The mid-range system components were selected through an iterative process of informal testing and component substitution aimed at minimizing audible differences between the two systems. Both systems were assembled and calibrated with reference to a well-reputed guide.

As the source medium used in this study is analog, the produced audio signal varies subtly with every take. For this reason, we selected unplayed phonographs as audio sources. However, even new phonographs are subject to dust and static. The phonographs were therefore vacuum-cleaned prior to each take to assure that the presence of dust was minimized and that the remaining dust was randomly dispersed across the record. Phonographs were recorded using a high-end analog-to-digital converter at 24-bit 96-kHz. Each phonograph was recorded twice on each system in a counterbalanced fashion, resulting in four Audio Interchange File Format (.aiff) files for each excerpt. Five musical phrases of 5 to 8 seconds representing standard musical genres (e.g., jazz, classical, rock) were selected; only phrases that were free from audible artifacts of the recording process (e.g., pops, hiss) were used.

Participants were professional sound engineers and audiophiles who reported listening to vinyl for several hours per week. Both listening tests were performed in an ITU standard room using an AB preference task. In each trial, participants heard two recordings of the same phonograph record, recorded on either the same or different systems, and selected the one they liked best. Participants also described the perceived differences between the two systems in a post-questionnaire.

Each test comprised a training block and 4 experimental blocks of 12 trials each; each block contained recordings from a different phonograph record. To nullify order effects, block order was counterbalanced between participants, and trial order within each block was randomized. Differences in output levels between the two systems were mitigated by adjusting the gain of the mid-range system’s files using audio editing software.
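The level-matching step can be sketched as follows. This is only an illustration: it assumes the gain was chosen by matching RMS levels between corresponding excerpts, which the summary above does not state, and the function names (`rms`, `matching_gain_db`, `apply_gain`) and the sample values are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def matching_gain_db(reference, target):
    """Gain in dB to apply to `target` so that its RMS equals that of `reference`."""
    return 20.0 * math.log10(rms(reference) / rms(target))

def apply_gain(samples, gain_db):
    """Scale samples by a gain expressed in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]

# Toy data (not measurements from the study): the mid-range excerpt is
# half the amplitude of the high-end excerpt.
high_end = [0.5, -0.5, 0.5, -0.5]
mid_range = [0.25, -0.25, 0.25, -0.25]

gain = matching_gain_db(high_end, mid_range)
matched = apply_gain(mid_range, gain)
```

Matching on RMS is one of several reasonable choices; a loudness-weighted measure would be an alternative if the two systems differ in spectral balance.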

The first test, with eleven expert listeners, investigated experts’ ability to discriminate between the high-end and mid-range systems described above. A second test investigated the effect of varying a single component, the preamplifier, between two otherwise identical PPS. All other components were taken from the high-end system of the first test. The cost of the preamplifier in the more expensive system is approximately three times that of the other preamplifier. Fourteen expert listeners participated in the second test. All results are evaluated using 2-tailed cumulative binomial tests.
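For readers unfamiliar with the statistic, here is a minimal sketch of an exact two-tailed binomial test against chance (p = 0.5). It is not the analysis script used in the study; the function name `binomial_two_tailed_p` and the example counts are hypothetical, and the two-tailed rule used here (summing all outcomes no more probable than the observed one) is one common convention.

```python
from math import comb

def binomial_two_tailed_p(successes, trials, p=0.5):
    """Exact two-tailed p-value for observing `successes` out of `trials`.

    Sums the probabilities of every outcome that is no more likely than
    the observed one under the null hypothesis of success probability p.
    """
    probs = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
             for k in range(trials + 1)]
    observed = probs[successes]
    # Small relative tolerance so floating-point ties are counted.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical example: one system chosen in 30 of 48 cross-system trials.
p_value = binomial_two_tailed_p(30, 48)
```

A p-value near 1 means the choices are indistinguishable from coin flips; a small p-value in either direction means listeners chose one system more consistently than chance would predict.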

Monday, April 11, 2011

Condition 1: underway!

We have begun testing subjects for the first of our test conditions. Here's a brief roundup of the experimental conditions as well as the setup.

Our final systems used for condition 1 are as follows:

The expensive system:
Dynavector DRT XV-1s (moving-coil) Cartridge
VPI HR-X Turntable w/ JMW 12.6 Memorial Tonearm
Kimber Kable 0.5 m KCAG interconnects (from table to preamp)
Manley Steelhead preamplifier (gain setting = 55 dB, load = 50 ohms, load caps = 0 pF)
Kimber Kable 0.5 m KCAG interconnects (from preamp to A-D converter)

The mid-range system:
Ortofon Kontrapunkt A (moving-coil) Cartridge
VPI Aries 2
Kimber Kable Tonik 2.0 m
Pass Labs XONO preamplifier (gain = standard setting (i.e., non-high-output), load = 47 ohms, load caps = off)
Kimber Kable 0.5 m PBJ interconnects (from preamp to A-D converter)

Digitization was performed via a PrismSound ADA-8XR converter (settings: peak input [0 dBFS]=+11.5 dBu, 24-bit, 96kHz, no processing). Audio was recorded in the Logic 7 recording environment, also set to 24-bit and 96kHz. Audio clips (5-8 sec, depending on musical phrasing) were prepared using Audacity. Great care was taken to cut the clips of each musical sample at matching points, so that all clips of the same sample were exactly the same length and free of pops. Each record was cleaned using a VPI HW-17 cleaning machine.

The clips used for the test were taken from the following tracks:
  1. Miles Davis - Blue in Green (Kind of Blue)
  2. Santana - Oye Como Va (Abraxas)
  3. Steely Dan - Aja (Aja)
  4. Holst - Saturn (The Planets)
  5. Pink Floyd - The Great Gig in the Sky (Dark Side of the Moon)
Each record was played twice on both systems (resulting in 4 .aiff files for each record), alternating between systems, in counterbalanced order with at least 5 hours between recordings. During each take, the entire track was recorded; once all 4 takes were completed for a given record, musical phrases of approximately 5 seconds that were not affected by recording artifacts were identified. Only phrases that were free of recording artifacts on all 4 takes were extracted for use as stimuli.

The perceptual study took place at the Critical Listening Laboratory at the Centre for Interdisciplinary Research in Music, Media, and Technology (CIRMMT), McGill University. This is an ITU standard room providing high-quality, controlled listening conditions. The audio recordings were played back through a MAX/MSP patch on a Mac Pro computer. Digital audio (24-bit/96kHz) was sent via TOSLINK to a Grace m906, to a stereo pair of Wilson Watt/Puppy loudspeakers powered via Bryston 14B amp. Below are some images of the Critical Listening Lab and the connections used for this experiment. Here is the view from outside the listening lab:


The front and rear of the Grace m906. Digital TOSLINK enters the Grace m906, and analog audio is output to the Bryston.


The output from the Grace m906 enters the rear of the Bryston 14B:


the output of the Bryston is sent through this patch bay:


and then into the listening laboratory next door:


where they are connected to the Watt loudspeakers.

Procedure:

An A/B preference test was employed to determine whether participants were able to differentiate between excerpts recorded on the mid-range and on the high-end phonograph playback systems. The preference test comprised 4 blocks of 12 trials each; an additional training block of 12 trials, during which participants were acquainted with the user interface, preceded the test. The excerpts within each trial were recorded either on the two different phonograph playback systems, or twice on the same system. Each block corresponded to one of the LPs employed in the study (see table 2). Block order was counterbalanced between participants, and trial order was randomized (double-blind) within each block, to counteract potential ordering effects. A short break of 2 minutes was taken between blocks.
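One way to implement the ordering scheme just described is sketched below. The post does not say which counterbalancing scheme was used, so the cyclic rotation here is an assumption, and the names `rotated_block_order`, `build_session`, and the block labels are hypothetical.

```python
import random

def rotated_block_order(blocks, participant_index):
    """Cyclic rotation of the block list.

    Across every group of len(blocks) consecutive participants, each block
    appears once in each serial position.
    """
    shift = participant_index % len(blocks)
    return blocks[shift:] + blocks[:shift]

def build_session(blocks, trials_per_block, participant_index, seed=None):
    """Return a list of (block, shuffled trial indices) for one participant."""
    rng = random.Random(seed)
    session = []
    for block in rotated_block_order(blocks, participant_index):
        trial_ids = list(range(trials_per_block))
        rng.shuffle(trial_ids)  # trial order randomized within the block
        session.append((block, trial_ids))
    return session

blocks = ["LP1", "LP2", "LP3", "LP4"]  # placeholder labels, one per record
session = build_session(blocks, 12, participant_index=1, seed=42)
```

A plain rotation balances serial position but not which block immediately precedes which; a balanced Latin square would be the next step if carry-over between blocks were a concern.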
During each trial, participants were exposed to two excerpts. A user interface featuring two boxes, labelled ‘A’ and ‘B’, was used to indicate the currently playing excerpt, by placing a cross inside the respective box. Participants first listened to the entirety of ‘A’, followed by the entirety of ‘B’. Participants then had the opportunity to repeat playback of either excerpt, switch in-place between excerpts during playback, and to pause and resume playback. Once ready, participants were asked to indicate their preferred excerpt. The next trial was triggered as soon as a choice was made.

Participants were asked for their preference, rather than whether they could tell a difference between the excerpts. Participants were instructed to choose arbitrarily between ‘A’ and ‘B’ if no difference could be identified. This strategy was chosen to avoid biasing uncertain participants towards saying ‘no’ to differentiability, even if they could have told the difference subconsciously; subconsciously-informed choices are recorded, and truly arbitrary choices are negated in aggregate due to the randomized, counterbalanced presentation of extracts.
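The claim that arbitrary choices cancel out can be made concrete with a small model. Assume, purely for illustration, that on a fraction d of cross-system trials the listener hears a difference and picks the same system each time, and otherwise guesses with probability 0.5. The expected share of choices for that system is then d + (1 - d)/2, which equals 0.5 only when d = 0. The function names below are hypothetical.

```python
import random

def expected_preference_rate(detect_prob):
    """Expected share of choices for the preferred system under the model above."""
    return detect_prob + (1.0 - detect_prob) * 0.5

def simulate_preference_rate(detect_prob, n_trials, seed=0):
    """Monte Carlo version of the same model."""
    rng = random.Random(seed)
    chosen = 0
    for _ in range(n_trials):
        if rng.random() < detect_prob:
            chosen += 1  # informed choice: always the same system
        elif rng.random() < 0.5:
            chosen += 1  # arbitrary choice that happens to land on that system
    return chosen / n_trials
```

Under this model, pure guessing converges on 50% and any consistent deviation from 50% reflects informed choices, which is what a test against chance is meant to detect.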

After the listening portion of the study, participants were asked to complete a questionnaire. This questionnaire recorded basic demographic information; the participants’ level of studio experience, musical experience, and ear training; their familiarity with the stimuli employed in the study (7-point Likert scale, ranging from ‘not familiar’ to ‘very familiar’, see table 2); qualitative descriptions of the perceived differences between the excerpts, for individual blocks and in general (free text entry); a quantitative indication of perceived difficulty in differentiating between the excerpts (7-point Likert scale, ranging from ‘very easy’ to ‘very difficult’); as well as information on the participants’ listening habits, including their preferred musical genres and home listening set-up.

Results from the test will be posted as they become available.

Monday, January 17, 2011

note: a full write-up of the following pilot study will appear soon.

Over the winter break, we ran a pilot test of our perception study. The setup for this test is an A-B preference test, where recordings made on different phonograph playback systems are compared. The systems each consist of a cartridge, turntable, preamplifier, and interconnects. The key difference between each type of component in these two systems is that one is of a significantly higher price range than the other. The 2 systems are detailed below (note: the interconnects between turntable and preamp are Kimber Kable Tonik for both systems; the interconnects listed are between preamplifier and PrismSound A-D converter):

more expensive:
  • cartridge: Ortofon Kontrapunkt B
  • turntable: VPI HR-X w/JMW Memorial tonearm
  • preamplifier: Manley Steelhead
  • interconnects (preamp to A-D conv.): Kimber KCAG
less expensive:
  • cartridge: Ortofon Kontrapunkt A
  • turntable: VPI Aries 2
  • preamplifier: Pass Labs XONO
  • interconnects (preamp to A-D conv.): Kimber PBJ
We recorded five 15-second audio clips with both systems twice, resulting in 20 recordings (four for each audio example). The files were recorded with the use of a PrismSound ADA-8XR analog to digital converter, routed by firewire to Apple Logic 7. Each file was recorded in 24-bit and 96 kHz audio in Logic 7, and time-aligned using Audacity audio editor. The clips used (taken from our collection of vinyl recordings) were:
  1. Michael Jackson - Don't Stop 'Til You Get Enough
  2. Sergei Prokofiev - Dance of the Knights (from Romeo and Juliet)
  3. Pink Floyd - The Great Gig in the Sky
  4. Weather Report - Birdland
  5. Manhattan Transfer - Operator
These clips were selected as they were determined to be the most revealing in the task of distinguishing the two systems during the creation of the systems themselves.

All four recordings of each audio clip were presented in a block, which consists of 10 trials. In each trial, 2 of the recordings (A and B) are first played in their entirety. The participant is then asked to select their preferred clip. Before making a choice, they may use the testing interface controls to either play A or B from the beginning, or to switch between the two files mid-playback, continuing from the same position in the alternate clip. Once all 10 trials have been completed by the participant, they are instructed to take a two-minute break before they continue to the next block. Prior to the start of the actual test, the participant is guided through a training block, which is similar to the testing blocks, except this is the participant's first experience with the interface and audio examples.

The files were played at 24-bit/96 kHz via a Max/MSP patch on a Mac Pro with an RME MADI HDSPe audio interface. Participants listened to the audio in an acoustically-treated facility through 2 Wilson Watt/Puppy speakers, powered by a Bryston 14B amplifier.

In total, 14 participants completed our pilot study. We are now reviewing the results of the test and will use these results to determine the next appropriate steps to take.

Monday, December 13, 2010

The XONO has replaced the GSP combo, and while the comparison has been made slightly closer (i.e., improved bass frequencies), differentiation between systems is still possible. We found that the mid-high frequencies were more clearly represented (increased volume perhaps) with the expensive setup as compared to the less expensive setup. Our next step is to record the less expensive turntable and cartridge through the Steelhead to determine the overall effect of the turntable and cartridge.
Last week David and I met to test out additional audio with the most recently modified setups (i.e., Ortofon Kontrapunkt A on less expensive and Kontrapunkt B on more expensive). In total we tested 4 participants. Using both the Michael Jackson (MJ) clip and a newly recorded Pink Floyd (PF) clip, we were still able to determine a difference between the 2 systems. Interestingly, the preference switched between the clips for all participants. For the MJ clip, those participants that could tell the difference chose the more expensive setup, but for the PF clip, those that could tell a difference preferred the less expensive setup.

As a result of these findings, we have chosen to attempt to replace the GSP combo preamps (Jazz Club and Elevator EXP) with the Pass Labs XONO. We will record the same audio and perform a comparison between the 2 systems as we have done previously. If these systems are also found to be distinct, the next step will be to simply record both outputs of the turntables through the same preamp, to determine the effect of the turntable/cartridge combination.

Tuesday, November 16, 2010

11/16/2010

David and I have met and tried out the patch with newly recorded testing material. The test music consisted of excerpts from Michael Jackson's Off the Wall, Dave Grusin's Discovered Again, Prokofiev's Romeo and Juliet, and Weather Report's Heavy Weather.

The results were that we had difficulty discerning differences in both the Prokofiev and Grusin excerpts, while it was considerably easier to identify which of the test trials were comparing similar systems, e.g., a Michael Jackson recording from the more expensive system compared to a second recording of Michael Jackson from the same system. It is our opinion that this discrepancy in results is founded in the repeatable phrases in the excerpts chosen: the strong recurring rhythms in the Michael Jackson and Weather Report excerpts create a grid within which we were able to repeatably hear and analyze the difference between the two recordings. Moreover, it was easy to use the switch button within a single playback and hear a similar if not the same phrase repeat immediately. This is not to say that the Grusin or the Prokofiev were absent of rhythm! Rather, the rhythmic repetition was either less apparent or on longer time scales.

Now what to do? It's clear that the two systems need to be made closer for this comparison. While they are certainly close, it is our opinion that we must change the components, then retest using these same audio examples. The options available to us for adjusting the systems are as follows:
  1. cartridges: we have already adjusted the cartridge, and in doing so have added a step-up amplifier (because we were using an MM cartridge originally, and are now using an MC cartridge). This change significantly increased the difficulty of the comparison. Perhaps an additional adjustment to another MC cartridge more akin to the Ortofon in use on the expensive system might be an appropriate next step, especially considering that we are using the DL-103, which is on the less expensive side of our cartridge collection (see cartridges here).
  2. preamplifier: before changing cartridges from MM to MC, we compared the expensive turntable and cartridge through the GSP combo (Jazz Club and Elevator EXP) and through the Manley Steelhead using the variable output (set to a perceived loudness similar to that of the GSP combo). The two were very difficult to tell apart.
It is also our opinion that we should purchase a duplicate set of true test records: now that we have determined that the audio used will play a *large* factor in the differentiability of the 2 systems, we think it only appropriate to assess the difficulty of the test on the actual test material, rather than relying on close material.

We already have a duplicate of Dark Side of the Moon. Our plan for this week is to adjust the system setup and retest using the albums of difficulty, then once we are satisfied, also include Dark Side of the Moon.

Monday, November 8, 2010

11/8/2010

Our testing software is now just about ready. As a first step to assure that the patch will in fact behave correctly, we will be conducting the experiment ourselves, using audio recorded from the 2 systems (but not the audio from the actual experiment). If there is a noticeable difference between the 2 systems, we will begin to modify the setup of the less expensive system such that it is more similar to the more expensive one.

Here is a screenshot of the present patch:


The patch opens first for the experimenter to enter the participant's ID. Upon entering the ID and hitting OK, the patch in the background (purple) opens in full screen. Once the participant selects OK, the audio begins for the first trial. During the initial playback of both A and B, all buttons and dialogs are hidden from view, leaving only the purple box with the A and B boxes, trial number and help button visible. Once these 2 files have played, the additional functionality appears and the user may use the patch as required (e.g., replay A or B, switch between the 2 while preserving the timing, and pause). Button presses and selections are recorded and stored with the time elapsed since the start of the trial.

Each block consists of a single musical passage, recorded twice on each system. This creates 4 versions of the passage, which will be compared in 8 trials per block, presented in a double-blind order. The trials are as follows (order is *not* preserved):
  1. A1, A2
  2. A2, A1
  3. B1, B2
  4. B2, B1
  5. A1, B1
  6. B1, A1
  7. A2, B2
  8. B2, A2
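The trial list above can be generated and shuffled with a few lines of code. The sketch below is in Python rather than Max/MSP and is not taken from the patch itself; it reads A and B in the list as the two systems and 1 and 2 as the two takes (which is how the list appears to be organized), and the function name `build_block_trials` is hypothetical.

```python
import random

def build_block_trials(seed=None):
    """Return the 8 trials of one block in shuffled order.

    Each trial is a (first, second) pair of recording labels, where the
    letter is the system and the digit is the take.
    """
    # Same-system comparisons, in both presentation orders.
    same_system = [("A1", "A2"), ("A2", "A1"), ("B1", "B2"), ("B2", "B1")]
    # Cross-system comparisons of matching takes, in both presentation orders.
    cross_system = [("A1", "B1"), ("B1", "A1"), ("A2", "B2"), ("B2", "A2")]
    trials = same_system + cross_system
    rng = random.Random(seed)
    rng.shuffle(trials)
    return trials

trials = build_block_trials(seed=7)
```

Including each pair in both orders means any bias toward the first or second excerpt in a trial affects both systems equally.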
The next blogpost will display the results of the initial experiment as run on Guillaume, David and myself (the developers of the patch), as well as recommendations as needed for a closer comparison between the 2 systems.