Supporting Audio in your EPUB application

1. Getting started.

In July 2013 our colleague @Ching Yue wrote an article about adding audio and video support to our RMSDK for the iOS platform. We noted that a similar set of APIs would be added to provide basic audio support in RMSDK and DL Reader for iOS. Today, I am going to follow up with some audio-related details.
This post will focus on the APIs added in RMSDK for audio.  If you are interested in knowing more about how we added A/V support in general and ideas on how to incorporate these into your own ebook reader applications, please read the first post here.

2. Audio API details

The dpdoc::AudioInfo structure is the object that RMSDK uses to store audio information. It captures the rendering information for an audio object: its location, its size and its data source.

struct AudioInfo
{
    int x;                     /* x-coordinate of audio location on the screen */
    int y;                     /* y-coordinate of audio location on the screen */
    int width;                 /* width of the audio */
    int height;                /* height of the audio */
    dp::String url;            /* path to the source audio file */
};

virtual int getAudioCountForCurrentScreen();

This API function returns the count of audio elements for the current screen being actively rendered by RMSDK.  An application can use this information to iterate through each AudioInfo object.

virtual bool getAudioInfoForAudioOnCurrentScreen(int audioIndex, dpdoc::AudioInfo * info);

An application uses this API function to retrieve the AudioInfo object associated with a specific audioIndex.
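As a rough illustration of how the count and lookup calls fit together, here is a minimal C++ sketch that collects every AudioInfo on the current screen and then finds which audio control, if any, contains a tap point. It is not RMSDK code: AudioProvider, FakeProvider, collectAudio and hitTest are hypothetical names, std::string stands in for dp::String, and the sketch assumes that audioIndex is zero-based and that a false return means the lookup failed.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for dpdoc::AudioInfo; std::string replaces dp::String in this sketch.
struct AudioInfo {
    int x;
    int y;
    int width;
    int height;
    std::string url;
};

// Hypothetical interface exposing only the two calls described above.
class AudioProvider {
public:
    virtual ~AudioProvider() = default;
    virtual int getAudioCountForCurrentScreen() = 0;
    virtual bool getAudioInfoForAudioOnCurrentScreen(int audioIndex, AudioInfo* info) = 0;
};

// In-memory fake so the sketch can run without RMSDK.
class FakeProvider : public AudioProvider {
public:
    explicit FakeProvider(std::vector<AudioInfo> items) : items_(std::move(items)) {}

    int getAudioCountForCurrentScreen() override {
        return static_cast<int>(items_.size());
    }

    bool getAudioInfoForAudioOnCurrentScreen(int audioIndex, AudioInfo* info) override {
        assert(info != nullptr);
        // Assumption: indices are zero-based and out-of-range lookups fail.
        if (audioIndex < 0 || audioIndex >= static_cast<int>(items_.size())) {
            return false;
        }
        *info = items_[audioIndex];
        return true;
    }

private:
    std::vector<AudioInfo> items_;
};

// Collect every AudioInfo on the current screen, skipping failed lookups.
std::vector<AudioInfo> collectAudio(AudioProvider& provider) {
    std::vector<AudioInfo> result;
    const int count = provider.getAudioCountForCurrentScreen();
    for (int i = 0; i < count; ++i) {
        AudioInfo info{};
        if (provider.getAudioInfoForAudioOnCurrentScreen(i, &info)) {
            result.push_back(info);
        }
    }
    return result;
}

// Return the position in `items` of the first audio rectangle containing
// (px, py), or -1 if the point is outside all of them.
int hitTest(const std::vector<AudioInfo>& items, int px, int py) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        const AudioInfo& a = items[i];
        if (px >= a.x && px < a.x + a.width && py >= a.y && py < a.y + a.height) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In a real reader, the provider would be the RMSDK object that exposes these methods, and the hit test would decide whether a tap should start playback of the audio at that location.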

virtual dp::String getAudioInfoForCurrentScreenAsJSON();

Alternatively, the AudioInfo data for every audio element on the current screen can be retrieved as a single JSON string. This comes in handy if your application already exchanges data as JSON. Here’s an example:

audio info:
          "url":"file:///C:/Development/datalogics-rmsdk/rmsdk-master/test/data/audioTest.epub/EPUB/audio/asharedculture_soundtrack.mp3",

virtual dpio::Stream* getAudioStream(dp::String url);

When an application has an audio URL and needs the actual data stream, it calls this API. The URL should be a valid path obtained from either getAudioInfoForAudioOnCurrentScreen or getAudioInfoForCurrentScreenAsJSON. The return value is a pointer to a dpio::Stream object, which supports the methods of that class.
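The lookup-then-fetch flow can be sketched roughly as follows. This is only an illustration under stated assumptions: the Stream class below is a simplified, hypothetical stand-in with a blocking read() and is not the dpio::Stream interface, std::string stands in for dp::String, and the sketch assumes that a null pointer signals an unknown URL and that the caller does not own the returned stream. AudioStreamSource, MemoryStream, FakeSource and readAudio are likewise made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for dpio::Stream. The real class has its own methods;
// a blocking read() is assumed here only to keep the sketch short.
class Stream {
public:
    virtual ~Stream() = default;
    // Copies up to `capacity` bytes into `buffer`; returns 0 at end of data.
    virtual std::size_t read(char* buffer, std::size_t capacity) = 0;
};

// Hypothetical interface mirroring the getAudioStream call described above.
class AudioStreamSource {
public:
    virtual ~AudioStreamSource() = default;
    virtual Stream* getAudioStream(const std::string& url) = 0;
};

// In-memory stream used by the fake source below.
class MemoryStream : public Stream {
public:
    explicit MemoryStream(std::string data) : data_(std::move(data)), pos_(0) {}

    std::size_t read(char* buffer, std::size_t capacity) override {
        assert(buffer != nullptr);
        const std::size_t n = std::min(capacity, data_.size() - pos_);
        std::memcpy(buffer, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }

private:
    std::string data_;
    std::size_t pos_;
};

// Fake source that serves registered URLs from memory.
class FakeSource : public AudioStreamSource {
public:
    void add(const std::string& url, const std::string& bytes) {
        streams_[url].reset(new MemoryStream(bytes));
    }

    Stream* getAudioStream(const std::string& url) override {
        auto it = streams_.find(url);
        // Assumption: an unknown URL yields a null pointer.
        return it == streams_.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<Stream>> streams_;
};

// Fetch the stream for `url` and append all of its bytes to `*out`.
// Returns false if no stream is available for the URL.
bool readAudio(AudioStreamSource& source, const std::string& url, std::string* out) {
    assert(out != nullptr);
    Stream* stream = source.getAudioStream(url);
    if (stream == nullptr) {
        return false;
    }
    char buffer[4096];
    std::size_t n = 0;
    while ((n = stream->read(buffer, sizeof(buffer))) > 0) {
        out->append(buffer, n);
    }
    return true;
}
```

The bytes gathered this way could then be written to a temporary file or handed to the platform's audio player; how the real dpio::Stream delivers its data should be taken from the RMSDK headers rather than from this sketch.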

3. Workflow

The workflow for adding audio support is consistent with the workflow for adding video support, which is a big plus for application developers who have already been working with the video APIs. Please refer to the section “3. General Workflow” in the original post.

4. What is supported

4.1 Audio tags in EPUB

The audio tag is supported by adding XHTML_AUDIO to the existing XHTML node types that RMSDK can parse. Sample markup looks like this:
<audio id="bgsound" src="../audio/soundtrack.mp3" autoplay="" loop="">
<div class="errmsg">
Your Reading System does not support the audio tag.
</div>
</audio>

RMSDK processes the audio tag with certain supported attributes. These attributes are a subset of the attributes defined in HTML 5 and EPUB 3.

Table 1. Attributes in the audio element

Attribute   Description             Supported   Notes
src         URL of the audio file   Yes         URL of the embedded audio; using the child source element is preferred
width       Width                   Yes
height      Height                  Yes

Table 2. Attributes in the source element

Attribute   Description             Supported   Notes
src         URL of the audio file   Yes         URL of the embedded audio
type        Specify MIME types      Yes

4.2 Supported audio formats

Table 3. Supported audio formats and their MIME types

Format   Extension   MIME type
MP3      .mp3        audio/mpeg
AAC      .aac        audio/mpeg

MP3 and AAC are the two audio formats currently supported. They are widely used and, more importantly, in line with what iOS plays back natively. Note: it is possible to enhance your application to support other formats, as long as you extend RMSDK to accept the data and propagate it back.
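An application that wants to check a file against Table 3 before handing it to a player could use a small lookup like the one below. This is a sketch rather than RMSDK code: mimeTypeForAudio and isSupportedAudio are hypothetical helper names, and matching the extension case-insensitively is an assumption made for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns the MIME type listed in Table 3 for a file name or URL,
// or an empty string if the extension is not one of the supported formats.
std::string mimeTypeForAudio(const std::string& path) {
    const std::string::size_type dot = path.find_last_of('.');
    if (dot == std::string::npos) {
        return "";
    }
    std::string ext = path.substr(dot);
    // Assumption: extensions are compared case-insensitively.
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (ext == ".mp3" || ext == ".aac") {
        return "audio/mpeg";  // Both rows of Table 3 list this MIME type.
    }
    return "";
}

// Convenience wrapper: true when the path has a supported audio extension.
bool isSupportedAudio(const std::string& path) {
    return !mimeTypeForAudio(path).empty();
}
```

Supporting another format in your own application would mean adding a row here as well as extending RMSDK as described in the note above.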

4.3 Audio dimensions and rendering

As with video, RMSDK sets default dimensions for audio: 320×88. This size was chosen to fit most iOS devices. Other details, including rendering, are covered in our video article.

5. Testing with book2png

There are two useful book2png commands that you can use to test RMSDK audio support. The first fetches all audio attributes and returns them as a JSON object:
book2png.exe --audio-info-json
The second retrieves all the audio data in an EPUB file and dumps the data stream to a local folder:
book2png.exe --write-audio-to-file

6. Conclusion

That’s all. The audio APIs are pretty simple and consistent with our video APIs. You can find all of this information in your licensed copy of the RMSDK and DL Reader source code as well. If you would like to develop your application using RMSDK with A/V support, the DL Reader source code is a great place to look for how-tos at the code level. We hope that this article gets you started quickly.
If you have any questions or comments, please email us at
