The Seventh Annual Interactive
PROJECT BAR-B-Q 2002
Group Report: Maximizing the Resources Available to Achieve Quality Game Audio
Participants: A.K.A. "The Harshmellows"
Clint Bajakian; Bay Area Sound Dept.
Alexander Brandon; ION Storm
Jack Buser; Dolby Laboratories
Mike D'Amore; Kensei Consulting
Mark Griskey; LucasArts
David Javelosa; AET Santa Monica College
Matt Levine; Uncle Vector's Audio Lab
Julian Kwasneski; Bay Area Sound Dept.
Rob Rampley; Line 6
Jay Semerad; University of Michigan
Tom White; MMA
David Zicarelli; Cycling '74
Facilitator: Van Webster; Webster Communications
Game audio professionals feel that audio in games is not given enough attention or resources because the value of audio (and how to maximize that value) is not generally understood by people who are often in charge of those decisions.
A. Write a document which:
B. Create and execute a research project to quantify audience reaction to specific audio values.
Attributes of "Quality Audio"
The working group identified the following attributes that must be addressed when creating a sound track in order to achieve "quality audio" for interactive entertainment:
Style sets the mood of the game's world. Just as the style of the gameplay and graphics is well-defined at the beginning of a project, so should the game audio. It is important for your team to consider the expectations of your audience.
Style applies to all four principal elements of audio.
Appropriate use of style helps establish the scale, mood, and time period of the world. It helps immerse the gamer in a credible environment.
A good stylistic approach establishes:
Original sounds create a unique identity for your product. Original content is an important avenue for expression and gives the content creators maximum control over aspects of the experience such as character development and story reinforcement, which adds to the overall impact of the title.
Key benefits of original content include the following values:
Using original sounds is an opportunity to create an identity for a product. It generally requires far fewer resources to create a memorable and unique melody or sound effect that sticks in the user's mind than to create a unique "look" for a product using graphical imagery. And just as a visual identity is essential to differentiating a product, original elements in the audio allow for branding enhancement. They also enhance the experience of using the product by allowing for thematic and character development through variations on the original content. By contrast, if the action is supposed to be more intense, but the soundtrack stays the same because there is no way to modify the licensed content, the impact of the title will be shortchanged.
Everyone knows the theme to Star Wars or James Bond. If recognizable and distinctive audio content is a part of a title, there may be additional business advantages. For one thing, the material can be repurposed, such as in the form of soundtrack CDs. Plus, copyright and licensing issues are minimized. It all adds up to enhancement of the overall value of the product's assets.
Using audio assets from established products creates comfort, allows you to build on existing emotional connections, adds recognition and aids in the immersive experience.
Using audio assets from existing sources can create a sense of familiarity by leveraging existing emotional connections. While it can be more difficult to create a strong thematic identity with familiar content than with original content, familiar content can provide a point of reference for users, enhancing the overall immersive experience. For example, in a title that belongs to the space / sci-fi genre, it would be silly not to be consistent with the existing vocabulary of spaceship sounds established over the last 50 years. If the title is directly based on a film, the use of sound effects from the film may be extremely important in establishing the authenticity of the relationship between the title and its source. Another use of familiar content is in establishing an appropriate time period for the title. What would be playing on the radio while the action was taking place?
Care should be taken when using audio content originally produced for a different medium. There can be both copyright (composition) licenses and mechanical (reproduction) licenses, and the practice of licensing content for the title may limit the ability to repurpose the title or move it to a future platform.
Listeners respond to variety. Having a dynamic audio environment can increase playtime and create a richer audio experience for the user. Some of the aesthetic issues that create variety are:
When appropriate, audio should change in response to changes in the player's perspective.
Changes in perspective can affect a player's experience with the game in the following ways:
Merriam-Webster's Collegiate Dictionary defines high fidelity as 'the reproduction of an effect (as sound or an image) that is very faithful to the original.'
The American Heritage Dictionary of the English Language defines high fidelity as 'the electronic reproduction of sound, especially from broadcast or recorded sources, with minimal distortion.'
Now that we have a definition of what high fidelity is, let's take a look at how high fidelity audio affects the quality of a game.
Consumers have become accustomed to enjoying CD quality recordings in their home stereos, cars, and home theater systems. All of this is great news for consumers, but it creates a new set of challenges for people who create games. The major game consoles all support DVD playback and surround sound, so games can be compared head to head with the film titles they are derived from. If the quality of the audio in the game is lower than that of the DVD, the game will feel cheap, in spite of the fact that it probably cost twice as much to buy.
It is common for people to use the term 'CD quality' when referring to audio. CDs have a sampling rate of 44.1kHz, a bit depth of 16 bits, and two channels of audio. Those numbers correspond to the following characteristics:
Sampling rate - Determines the highest frequency that can be reproduced during playback. Lowering the sampling rate will reduce or eliminate the high frequencies in the audio.
Bit depth - Lowering the bit depth will reduce the definition and clarity of a recording. Lowering the bit depth significantly will yield a gritty, distorted sound.
Stereo vs. mono - Lowering the channel count from two to one eliminates the spatial relationship between sounds. While not necessary for voice-overs, sound effects and music sound much fuller and more realistic when reproduced in stereo.
Fidelity vs. Resources
In the process of creating a game, file size is often the most challenging obstacle to overcome. Since digital audio can take up a significant amount of space, producers and designers tend to put a lot of pressure on the audio team to keep the audio files small. Reducing the numbers in any of the three categories mentioned above will reduce file size, but care should be taken to reduce only where the results will not significantly impact the quality of the sound. Careful experimentation by an audio engineer can yield significant savings in file size, but it takes time, and should be considered part of the process.
Here is a basic guide for file bandwidth optimization. It is intended to give an idea of the relative importance of the characteristics listed across the top. Note: for all audio assets, it is undesirable to go below 16 bits; 24 bits is preferable if the console supports it and there is room in the game.
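To make the trade-off concrete, here is a minimal sketch of the arithmetic for uncompressed PCM audio. The function name `pcm_bytes` is purely illustrative, and the figures ignore file headers and any compression a real pipeline might apply.

```python
def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of uncompressed PCM audio (no header, no compression)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# One minute of "CD quality" audio: 44.1 kHz, 16 bits, stereo.
cd_minute = pcm_bytes(44_100, 16, 2, 60)

# The same minute at half the sampling rate and in mono.
reduced_minute = pcm_bytes(22_050, 16, 1, 60)

print(cd_minute)                    # 10584000 bytes, roughly 10 MB
print(reduced_minute)               # 2646000 bytes
print(cd_minute // reduced_minute)  # 4
```

Halving the sampling rate and dropping to mono each halve the size, so together they give a fourfold saving. Whether that saving is worth the loss of high frequencies and stereo image is the judgment call described above.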
An analogy can be drawn between frame rate and sampling rate. Here is a rough comparison of the quality of different rates.
When lowering sampling rates to save space, it is important to remember the Nyquist theorem: the sampling rate must be more than twice the highest frequency you want to reproduce. Put another way, content above half the sampling rate cannot be represented, and it must be filtered out before conversion or it will alias into audible artifacts.
One of the best ways to avoid making big sacrifices on audio quality is to involve the audio team in the process as early as possible. There are many different ways to create audio for games, and some use very limited system resources, but knowing the limitations up front will help the audio team to get the best results possible. The overall concept of the game is important for an engineer or composer to understand before they make decisions about how to produce the audio. If there are going to be significant restrictions on file size, polyphony, or any other technical aspect, it is much better for a composer to work within those limitations from the beginning than to make a bunch of great stuff and be forced to degrade or eliminate cues altogether.
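The Nyquist rule can be turned into a small helper for choosing a sampling rate. This is only a sketch: the list of candidate rates and the function names are assumptions for illustration, not part of any console's specification.

```python
# Assumed set of commonly used sampling rates, for illustration only.
CANDIDATE_RATES_HZ = (11_025, 22_050, 32_000, 44_100, 48_000)


def nyquist_limit(sample_rate_hz):
    """Highest frequency (exclusive) representable at this sampling rate."""
    return sample_rate_hz / 2


def lowest_safe_rate(highest_frequency_hz, rates=CANDIDATE_RATES_HZ):
    """Lowest candidate rate whose Nyquist limit exceeds the given frequency.

    Returns None if no candidate rate is high enough.
    """
    for rate in sorted(rates):
        if nyquist_limit(rate) > highest_frequency_hz:
            return rate
    return None


print(lowest_safe_rate(8_000))   # 22050
print(lowest_safe_rate(20_000))  # 44100
```

For example, a sound with nothing important above 8 kHz could be stored at 22.05 kHz, while material that needs the full audible range calls for 44.1 kHz or more.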
Dynamic range is increasingly important in game audio design. While maximizing volume in games was once desirable, it has become a hallmark of substandard audio. Loudness is a relative measurement. If everything is loud, the player will simply turn down the overall level. Quieter sounds can draw the player into a new situation, so that when a big sound hits, it has much more dramatic impact.
In short, the dynamics of gameplay should be conveyed in the audio.
Some things to consider are:
Bass is often unused or overused in games. Either the bass is avoided because it is difficult to mix properly, or it is overused in an attempt to make the music sound fatter. The most effective use of bass tends to have the following characteristics:
Increasingly, consumers are expecting the fidelity of the game to match the fidelity of other consumer media (DVD, DVD-A, etc.). High resolution sonic assets, combined with intelligent sample rate and format decisions, are an important part of establishing high fidelity, and will result in a game with the maximum quality audio experience. Other factors such as effective use of bass and proper use of dynamic range all serve the same purpose.
Game Designers should consider the value of real-time digital signal processing (DSP) in order to enhance the audio aspect of the game play experience. Real-time signal processing can be used to create the sense of a living, breathing world from a series of otherwise static audio events. DSP can be used to simulate more directly the sonic behavior of the "real world" as it provides more paths for interaction with the player, completing the realism and interactivity of the visual environment. DSP is useful for all types of sounds, from dialog to special effects, ambiances and music components. All of these audio sources can then be routed and modified by DSP in order to tailor them specifically at runtime for the specific game area, danger level, intensity, health, etc.
Examples and benefits of real-time signal processing:
Record "dry" (no effects) dialog components of the game and use DSP to effect and place the dialog in the situation. This significantly eases language localization, as the dialog for another language need only be recorded dry, and all effects will be applied by the game audio engine at runtime.
The following lists DSP signal processes that can be very useful in minimizing post-production work on localized dialog file sets, as well as making a game audio track more compelling overall.
Reverb effects are used to simulate the acoustic reflections of an environment. A dry sound effect or dialog line can be placed in a hallway, tiled bathroom, sports arena, cathedral, parking garage, cavern, etc. Real-time control of the reverb parameters can simulate the audio source moving closer or further away from the POV.
Spatialization can be used to position an audio source in three dimensional audio "space". This is effectively the same as 3D audio. Spatialization effects can be used to move a sound effect behind and around the player's POV or create the effect of a sound source approaching from any direction. 3D audio placement is the ideal companion to 3D graphics and helps draw a player into the environment.
Delay is a simple, yet useful, signal processing effect. Delay can be used to simulate the echo of a character's voice or a special effect placed in a canyon or cavernous situation. In music applications delay is also quite useful. Prerecorded audio tracks, such as percussion, can be rhythmically enhanced with slap-back and repetitive effects in one situation and echoed in another situation.
Dynamically controlled pitch shifting can be used to transform dialog from chipmunk to demonic and back again, requiring only a single audio component for a myriad of applications. For musical passages, pitch shifting provides multiple reuse of audio samples for a variety of effects, from simple key changes to pitched sinister effects. Subtle real-time pitch shifting yields welcome variety in the frequent playback of the same sound, such as weapons.
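As an illustration of how little is involved, here is a minimal sketch of a feedback delay (echo) operating on a list of float samples. The function and parameter names are hypothetical; a real game engine would run an equivalent loop on its own audio buffers or dedicated hardware.

```python
def echo(samples, delay_samples, feedback=0.5, mix=0.5):
    """Apply a simple feedback delay to a sequence of float samples.

    delay_samples: length of the delay line, in samples (must be >= 1).
    feedback:      how much of each echo is fed back to produce the next one.
    mix:           level of the delayed signal added to the dry signal.
    """
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    line = [0.0] * delay_samples  # circular delay buffer
    index = 0
    out = []
    for x in samples:
        delayed = line[index]
        out.append(x + mix * delayed)
        line[index] = x + feedback * delayed
        index = (index + 1) % delay_samples
    return out


# A single click followed by silence produces a decaying train of echoes.
click = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(echo(click, delay_samples=2, feedback=0.5, mix=1.0))
# [1.0, 0.0, 1.0, 0.0, 0.5, 0.0, 0.25]
```

Changing `delay_samples` and `feedback` at runtime is what lets one dry recording serve as a tight slap-back in one scene and a canyon echo in another.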
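The subtle variation mentioned for repeated sounds can be sketched with the simplest form of pitch change: playing the sample back at a different rate. Note that this approach also changes the duration of the sound, unlike the more elaborate pitch shifters that preserve length; the function names here are illustrative assumptions.

```python
import random


def semitones_to_ratio(semitones):
    """Playback-rate ratio for a pitch change in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def repitch(samples, ratio):
    """Resample by linear interpolation; ratio > 1 raises pitch and shortens."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out


def vary(samples, max_semitones=0.5, rng=random):
    """Return the sound with a small random pitch offset for each playback."""
    offset = rng.uniform(-max_semitones, max_semitones)
    return repitch(samples, semitones_to_ratio(offset))


print(repitch([0.0, 1.0, 2.0, 3.0, 4.0], 2.0))  # [0.0, 2.0]
print(repitch([0.0, 2.0, 4.0], 0.5))            # [0.0, 1.0, 2.0, 3.0]
```

Calling `vary` each time a weapon sound is triggered would give every shot a slightly different pitch from a single stored sample.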
Chorus, Flanging, and Phasing are examples of pitch effects widely used in musical applications. In a playback environment, these effects can be used to "fatten" a sound from a single audio source.
Distortion is typically related to the aggression or tension of a sound. Runtime control of distortion effects allows a composer to use a single "clean" sound in environments of less tension and reuse those same sounds with more distortion when the environment becomes more tense or aggressive. Distortion is most popularly used with, but certainly not limited to, guitar sounds. Distortion may also be useful to process dialog to sound gritty when coming over a radio or P.A. speaker.
Equalization and Filtering effects (essentially the same thing) play a very important role in shaping a sound for its particular placement in an environment. A dialog sound might be filtered to sound as if it were coming from a radio, or made to sound muffled if it's coming from the other side of a door or wall. In musical contexts, filtering effects are used to create motion in sound loops. With dynamic filtering, a sound loop can be used repeatedly with an ever-engaging motion and spectral placement.
Some modern game platforms have the ability to perform the preceding signal processing techniques in real-time with little or no performance hit to the onboard processor. Others do require some main processor power to accomplish these effects. It is important for a developer to understand the audio processing capabilities of each platform in order to best determine what types of effects will be used, and at what "price" to system resources, which other aspects of the game may be competing for.
In summary, signal processing maximizes product quality by creating a more compelling game play experience. By signal processing audio dynamically in real-time, the game behaves much more like real life sonically and succeeds to a much greater extent in drawing the player into the world. This attention to realism is analogous to the increased emphasis on more realistic lighting and physics, including increased realism of skies and weather. Producers, programmers and audio technicians should collaborate on the strategic decisions for each game platform, using as many onboard DSP signal processes as are feasible in the overall production plan.
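One way to picture runtime control of distortion is a soft clipper whose "drive" follows a game-state value. The sketch below uses a hyperbolic tangent curve, which is one common choice among many; the `tension` parameter and both function names are hypothetical and not taken from any particular engine.

```python
import math


def soft_clip(samples, drive):
    """Soft-clip samples in [-1, 1]; larger drive means heavier distortion."""
    if drive <= 0:
        raise ValueError("drive must be positive")
    norm = math.tanh(drive)  # keeps a full-scale input at full scale
    return [math.tanh(drive * x) / norm for x in samples]


def drive_for_tension(tension, clean=0.5, max_drive=10.0):
    """Map a hypothetical game 'tension' value in [0, 1] to a drive amount."""
    tension = min(1.0, max(0.0, tension))
    return clean + tension * (max_drive - clean)


guitar = [0.0, 0.25, 0.5, 1.0, -0.5]
calm = soft_clip(guitar, drive_for_tension(0.0))   # nearly clean
tense = soft_clip(guitar, drive_for_tension(1.0))  # heavily squared off
```

The same stored "clean" sample is used in both cases; only the drive value changes as the scene becomes more aggressive.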
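The "muffled behind a door" effect can be approximated with a very small low-pass filter. The following is a sketch of a one-pole low-pass, assuming float samples; the cutoff value in the example is an arbitrary illustration rather than a recommended setting.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Attenuate content above cutoff_hz with a one-pole low-pass filter."""
    # Smoothing coefficient derived from the cutoff frequency.
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0
    out = []
    for x in samples:
        y += a * (x - y)  # move the output a fraction of the way to the input
        out.append(y)
    return out


# A steady (low-frequency) signal passes through almost unchanged...
steady = one_pole_lowpass([1.0] * 2000, 1000, 44_100)

# ...while a signal alternating every sample (the highest possible
# frequency at this sampling rate) is strongly attenuated.
buzz = one_pole_lowpass([1.0, -1.0] * 1000, 1000, 44_100)
```

Sweeping `cutoff_hz` as a door opens, or as the player moves around a wall, gives the dynamic filtering described above without storing a second, pre-muffled copy of the dialog.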
Transitions in a game soundtrack are important parts of a consistent game experience. Poorly done transitions will ruin suspension of disbelief, and possibly appear as production or equipment errors to gamers. Transitions occur in user interface screens, cut-scenes, and even game play.
Too many titles use straight cuts (instantly transitioning from one piece of music to another), and this is a well-known practice that gives a very unpolished and unprofessional sound to a product.
The time has long passed when there was not enough memory in a major console or PC title to allow for crossfades (usually, two files need to be loaded into memory at once in order to crossfade between them), so at a minimum this technique should be used for much better sounding transitions.
Examples of transitions are fades, crossfades, and having the music transition as a segment that connects one piece to another within an adaptive soundtrack.
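A crossfade of the kind described above can be sketched in a few lines. This version uses an equal-power curve, which is one common choice for keeping perceived loudness steady through the transition; it assumes both pieces are already in memory as lists of float samples, and the function name is illustrative.

```python
import math


def equal_power_crossfade(outgoing, incoming):
    """Blend the tail of one piece into the head of another.

    Both inputs are the overlapping regions only; the result has the
    length of the shorter one.
    """
    n = min(len(outgoing), len(incoming))
    if n < 2:
        raise ValueError("need at least two samples to crossfade")
    mixed = []
    for i in range(n):
        t = i / (n - 1)                       # 0.0 at start, 1.0 at end
        gain_out = math.cos(t * math.pi / 2)  # fades from 1 to 0
        gain_in = math.sin(t * math.pi / 2)   # fades from 0 to 1
        mixed.append(outgoing[i] * gain_out + incoming[i] * gain_in)
    return mixed


fade = equal_power_crossfade([1.0] * 5, [0.5] * 5)
```

The first output sample is entirely the outgoing piece and the last is (to within rounding) entirely the incoming one, so the result can be spliced between the two without a click.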
There is a lack of awareness of what is necessary to maximize the resources available to achieve quality interactive audio.
Addressing The Solutions:
Interactive audio sounds as though it was intentionally composed for that moment. Interactive audio is the key differentiator between linear and adaptive entertainment. Instead of a player being guided through a pre-sequenced audio performance in a game, a player can influence the outcome of that audio performance in the game, resulting in a more engaging experience.
Common Concepts and Practical Application
Well, in reality, it all depends upon your definition. Historically - and for some unknown reason - MIDI in the game audio world came to be closely associated with the original "audio" for games - FM synthesis, and its ensuing sibling rival, General MIDI (GM). Unfortunately, not ONLY did FM synthesis and GM give game audio a somewhat less than, shall we say, quality-centric reputation - they also confused the marketplace into thinking that the control protocol - MIDI - actually had something to do with the device that created the sound (FM synthesis, sample playback, etc.) and hence the "inside joke" that MIDI sounds bad!
Of course, IN FACT, MIDI DOES sound bad - it actually sounds like any other control protocol made audible - kind of like your 56k modem connecting to AOL! Definitely NOT something that appeals to the masses - but then again - it wasn't intended to be listened to! So when somebody says "MIDI Sounds Bad" - you know that they really are not aware of the reality of the situation and if that person is the person in charge of your audio development - you might want to consider finding another one!
With that said - when your game audio guru brings up the MIDI topic, the question to ask is what is he/she going to control with it! There are many possibilities out there! But before you do that - you need to have at least a "general understanding" of what it is - and what is possible with it.
What is MIDI?
Simply put - MIDI is a standard control protocol for devices which create, record, edit, store, and transmit control data in the digital domain. Period, end of discussion. It has NO predetermined technology usage, and definitely NOT any inherent audio quality attributes. MIDI is just as easy to utilize on a musical instrument, computer, set-top box or game console!
What can it be utilized for?
Typically, and in its most widely utilized form, this data represents individual musical notes, and the physical controllers which affect those notes - for use by electronic musical instruments, tone modules, and/or computer sound cards. When used in this fashion - entire orchestral scores can not only be represented in a very small amount of storage space (kilobytes NOT megabytes) - but edited, modified, and rearranged down to the note level at ANY time during the production process! The benefits don't stop there - but that is deeper and more involved than we want to get into here!
Two of the newest uses of MIDI in this fashion are not only game audio for cell phones - but the brand new market of Ring Tones! Millions of people around the world are purchasing these Ring Tones on a monthly basis to update the ring sound on their cell phones! Companies like Disney, MTV, Sony and Yamaha are all investing heavily in this new cross marketing and selling opportunity - and they are being supported by the vast majority of the cell phone manufacturers and hardware developers! In addition - every single major provider of wireless communication - e.g. AT&T, Sprint, Verizon, etc. - has built the back-end business systems to make it easy for their customers to obtain new ring tones - and for copyright owners to profit! If you're looking to increase player loyalty, AND your ROI in audio - THIS is definitely something you need to be looking into!
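To show why MIDI data is so compact, here is a minimal sketch that builds the raw bytes of the two most common channel messages, Note On and Note Off. The helper names are illustrative; the byte layout (a status byte carrying the channel, then note number and velocity) follows the MIDI 1.0 message format.

```python
def note_on(channel, note, velocity):
    """Raw 3-byte MIDI Note On message (channel 0-15, note/velocity 0-127)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity 0-127")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Raw 3-byte MIDI Note Off message."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity 0-127")
    return bytes([0x80 | channel, note, velocity])


# Middle C (note 60) played on the first channel, then released.
messages = note_on(0, 60, 100) + note_off(0, 60)
print(len(messages))  # 6 bytes of control data

# Half a second of CD quality audio (44.1 kHz, 16-bit, stereo) for comparison.
print(44_100 * 2 * 2 // 2)  # 88200 bytes
```

A stored MIDI file also records timing information between events, so real files are somewhat larger than the raw messages, but the difference from recorded audio remains several orders of magnitude. The sound itself comes from whatever device receives these bytes, which is exactly why the protocol has no inherent audio quality.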
Other uses for MIDI.
Did you watch the Grammys this year (2003) in HDTV / 5.1 Dolby Surround? If not, you missed out. But did you know that the audio console used to accomplish that complex task was MIDI controllable?
Been to Las Vegas Lately? Seen the fountains at the Bellagio? The Show in the EFX Theater at the MGM? Seen Mirage's Volcano? Well - you saw MIDI Show control in action!
And those are just some of the wide variety of uses for the MIDI protocol!
So, MIDI is not such a bad word after all!
As you can see, the uses for the MIDI protocol are as varied as the problems that it solves! AND more importantly - all of these problems are solved, and opportunities presented, utilizing a standardized, mature but evolving protocol, developed by - and for - the audio community! For further information on the MIDI protocol, the MMA (MIDI Manufacturers Association), IA-SIG (Interactive Audio Special Interest Group) or links to experts in this field - visit the MMA at http://www.midi.org
In order to understand the possibilities for increasing the ROI of game music/audio it is important to view the market dynamics of the converging industries within the entertainment space.
The entertainment space within the typical home environment is a very controlled area (i.e. bedroom or family room) in which two diametrically opposed influences collide when it comes to the purchase and/or use of any technology-based products!
In any room that can be viewed by the public (i.e. family room) one typically must balance form (style, fashion, looks etc.) vs. function. So the product must fit as to looks and perform a task that does not already exist within the room. It is very rare to find more than one TV, one audio system (stereo, surround sound etc.), one VCR, one CD player etc.
In rooms which are not "publicly available" (i.e. a child's bedroom), form is not as important; however, the function coefficient still exists. Again, it would be very rare to find more than one of anything besides collectables/toys within a bedroom space.
The Market Dynamics
Prior to the introduction of the PS2, the typical game box was utilized only for games. However, the XBOX brought a new storage medium to the table, the CD-ROM, which included, at no extra cost to the hardware vendor, the ability to play standard audio CDs. This added feature brought the set-top box to a new level within the home entertainment system, as consumers figured out that they could hook it up to the home stereo system in the living room, play the game on the big screen TV AND hear the audio from the best speakers in the house! Since playing audio CDs was not its primary function, and it was admittedly rarely used for that purpose, its promotion to the family room as a high tech game device on a full time basis made it acceptable in form AND function.
In the past 18 months the lowly set-top box has graduated to including a DVD player. THIS function brought along with it higher quality DACs (audio outputs) and Dolby and/or THX surround sound, making it on par, from a playback standpoint, with any DVD video only player on the market.
The form vs. function battle is now on, as the family unit decides if it really needs more than ONE DVD player! The set-top box is already connected to the hi-fi system and the TV. Why buy a product that only does one thing when they already have something that does both?
The Personal Computer
Even before the launch of Microsoft's XP Media Center, high tech consumers had, if not specifically built an entertainment-centric PC, at the very least utilized their computers for entertainment purposes. Above and beyond the sales figures we are aware of in the PC gaming industry - according to a report by Forrester Research, more than three out of five users listened to music on their PC, while nearly one half used it to watch DVDs!
Other Factoids To Consider
DVI to Move Beyond PCs to Consumer Electronics. Though the PC market has been the primary one for Digital Visual Interface (DVI) since its 1999 introduction, the next few years should see DVI move beyond the PC space, according to In-Stat/MDR.
New Harris Poll Shows Personal Computer as Home's Digital Nerve Center. Digital Lifestyle is Becoming Reality. Though they may own different digital technology products, half (50 percent) of U.S. computer owners and 41 percent of European ones say their computer is more important than any other piece of technology in their home. Younger people give even more emphasis to the computer, as 57 percent of Americans 18-34 years of age view the computer as most important, vs. 48 percent of 35-49 year olds and 44 percent of those over 50.
Broadband Internet Tops 15.6 Million in the U.S.
Nielsen Media Research Estimates 106.7 Million TV Households In The U. S.
We can go on and on quoting sources, facts and figures. However, it is simply easier to say that convergence within the entertainment space is here and now. It's not really what the pundits have been predicting, but it does exist!
Until recently, the gaming industry has received minimal attention from academic institutions and serious scientific study. While game audio professionals have long sought to improve the quality of in-game audio content, it may be necessary to appeal to such institutions in order to provide evidence that digital audio quality is important to a positive interactive experience.
It is clear that there are links between audio and visual interpretation within the brain, noticeably in speech perception. Previous research has strongly supported a cross-modal interaction between audio and visual stimuli: a single perceptual channel, taking in audio and visual signals, limits the amount of stimuli that is processed cognitively by our brains (Ordonez 2002). The full extent to which audio and visual stimuli influence each other is not understood; however, initial research has implied a greater influence of video over audio stimuli in subjective report tests. But research has generally consisted of passive viewing of each presentation; no attempts have yet been made to discover a cross-modal interaction for interactive media such as video game play.
Using a computer game where graphic resolution and audio sample resolution can be manipulated, it may be possible to predict results based on previous studies where high and low video quality were paired with high and low audio quality. A unique situation presents itself with interactive audio formats, however. When a subject controls the triggering of sound effects, we might speculate whether more or less attention will be focused on audio. With several randomly distributed subject pools, we look to determine whether these sound effects are more noticeable given a high quality (44kHz-96kHz) or low quality (11kHz-22kHz) sample.
Objective data other than "high score" will be difficult to attain so all data will most likely be subjective, participant-based self-reporting. In order to reduce bias, priming may include presentation of high or low quality audio, listening to the same sounds without visual stimuli or presentation of visual stimuli without sound, or asking subjects to pay particular attention to either graphics or audio. After playing the game for a period of 10-20 minutes, subjects will be asked to review the graphics and audio, as well as other commonly rated aspects of video games, including play control and entertainment value. Other relevant details such as age, gender, experience playing video games, etc., will also be self-reported.
If the experiment yields significant results, we hope to produce evidence for a cross-modal effect on audio and visual stimuli for an interactive realm and to determine a user's ability to distinguish between high and low quality audio given in this environment. Difficulty in conducting the experiment will primarily manifest in the following: finding a game that is not already subject to widespread user bias and is also playable at variable audio and video rates, acquiring computers and sound systems, and pooling an adequate number of participants (100-120 is ideal).
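The pairing of high and low video quality with high and low audio quality, spread over randomly distributed subject pools, can be sketched as a balanced random assignment. This is only one possible way to set up the pools; the condition labels and the function name are assumptions for illustration and do not come from the proposed study.

```python
import random
from itertools import product


def assign_conditions(participant_ids, seed=None):
    """Randomly assign participants to the four video/audio quality pairings.

    Participants are shuffled and then dealt out in turn, so pool sizes
    differ by at most one.
    """
    conditions = list(product(("high_video", "low_video"),
                              ("high_audio", "low_audio")))
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


# With 120 participants, each of the four pools receives 30 subjects.
pools = assign_conditions(range(120), seed=1)
```

Fixing the `seed` makes the assignment reproducible, which can help when the study has to be audited or repeated.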
Copyright 2000-2014, Fat Labs, Inc., ALL RIGHTS RESERVED