Making the Case for RichMedia in Non-3D PDFs
Does RichMedia belong in standard PDFs? This casual discussion with Patrick Gallot (Senior Solutions Architect) from Datalogics explores that question.
Video Transcript:
Lindsey: Today we're going to chat about embedded audio and video content in PDFs. When we first spoke about this, you came to me with some good ideas and thoughts that you wanted to explore further, so that's basically what we're going to do here today. First of all, before we get started with the questions, do you mind explaining a little bit about your role at Datalogics?
Patrick: I am the technical lead for the support department. Basically, what that means is that if my colleagues come up with some difficult scenarios from their customers, or if my own customers have difficult scenarios, and they need to figure out how to make it work with our products, they'll bump it up to me as sort of a first tier before passing it on to engineering, to see if we can resolve it faster than going through the whole engineering process.
Lindsey: Okay, let's just get into it. So you said you thought that embedding audio and video into PDFs has fallen out of favor. Can you explain a little bit more about why you think that is?
Patrick: Well, I think part of it is that the most common PDF viewers that people use these days are their browsers, and their browsers don't actually support embedded rich media as part of PDFs, which is kind of ironic because they have all the necessary components to do it well. It's just that, because of the way things have been developed, it has kind of fallen into a hole: there's no support for it, so there's no demand for it, and it's not really something that's being actively worked on, it seems, right? And a further irony is that video is becoming ubiquitous. Everywhere you go, you're on camera somewhere, right? I think there's a lot of potential need for taking some of that video and memorializing it in something that's going to have a longer-lasting impact than just having it as a file on a hard drive. I think there are potentially useful applications for it. We just haven't gotten there yet.
Lindsey: You also mentioned that you thought the archivists who were steering the creation of PDF/A, PDF for archiving, kind of shot themselves in the foot by disallowing that material in PDF/A files. Can you explain a little bit more about why you feel that way?
Patrick: I should back up a little bit first and say PDF/A is a subset standard of PDF that basically tries to harden PDF files so that they can be stored for the long term. It was spearheaded, I think, by archivists and the PDF community, because you can do a lot of things in PDFs that sort of degrade the information that you have. For example, if you don't embed your fonts, the experience that you'll have on a machine that doesn't have those fonts can be completely different from a machine that does. You can assume that Windows machines will always have Arial. Maybe, maybe not. I mean, Microsoft most recently changed its default from Arial to Calibri, so things are changing, and we have to anticipate that things will change even more. So how exactly do you make things viable long term, so that they're displayed the way they were intended? It's sort of a hard problem to tackle, and the PDF community did it through the PDF/A standard, in large part through the PDF Association and then working with ISO.
But if you think of PDF as electronic paper, putting audio or video or even 3D into it sort of breaks that organizing metaphor, because you don't have paper with moving images on it. You don't have it talking to you. That's the realm of fantasy or magic, really. So it sort of breaks the idea of what exactly PDFs should be. And then there's another way that rich media sort of breaks the whole PDF concept, which is that it allows you to embed or refer to media that you're not necessarily certain is going to be supported in the viewer. You can pretty much embed any audio and video, and as long as you tell it this is the type of audio that I'm embedding, and as long as it's a recognized internet format, this is perfectly allowed. And it's perfectly allowed for the viewer not to be able to support that, which is kind of the opposite of what you want for PDF, where you want it to look exactly the same on whatever platform you look at it on, no matter what the viewer. So those are a couple of ways that rich media sort of breaks the thing.
But at the same time, you still want those to be part of actual documents, because rather than just having a raw file in some audio or video format, you can add context to it. You can have text. You can add images or other material that explains what exactly is in the video, rather than having to dig through the video and watch it or pull out metadata. You can have it in a more structured format inside the PDF that says this is the encapsulated video part, but you also have all this information that might be useful to somebody who wants to look at this particular video. So I think that's a big part of why having not just the raw audio or video file, but the context of the PDF, would actually be very useful.
Lindsey: Yeah, I think that's a good point. You touched on this just now, but if we go into it a little bit more, what do you think is the biggest potential value of long-term archiving of those A/V materials?
Patrick: I think part of it is the long-term archiving. Actually, this may have been more of a concern a few years ago, but there was a lot of tension: HTML is really popular and PDF is really popular, they're both ways of delivering information to a user, so which one is going to win out? It doesn't seem to have turned out that one will become dominant over the other. It seems as if HTML is preferred for, I think the term was, ephemeral delivery of information, where you assemble it to present to the customer and can't necessarily save it long term in that particular form. There are too many implicit assumptions about how that information is presented, and they'll fall apart if you try. With PDF, archivists have thought about how you preserve information for the long term; they just haven't extended that thinking to audio and video inside PDFs as much. So that's one part of it. The other thing is that once you put it into a PDF, you can do things like add digital signatures, so that you have some reassurance that the document or the video wasn't altered since it was put into the PDF, which can be very helpful, right?
And in terms of where it could add value beyond just the archivist community, this is probably going into politics, I'm not sure, but think about what happened with the January 6 insurrection in DC five, six years ago now, where there was a lot of video that was captured in different portions. The question was how exactly to make it available to all the defendants so that they could defend themselves in court, and I think they eventually put it into a massive database that all the parties could access to see if they could find video that would clear them or that could be used. But I think that having the ability to take the video evidence and put it into a PDF document that you can freeze by adding a digital signature, and say this can't be modified without breaking this digital signature, is something that would be very useful to the legal system. And another thought of mine on that is that when you have PDFs, we often think of them as electronic paper, digital documents, and that's fine. But if you look at the back end of how exactly they're structured, they're really these self-contained databases that have a specific schema for organizing information to make it act like paper. And there are features within PDF that allow you to link to other documents, so that you can basically have sort of a distributed database of these documents that link to each other and reference each other, and that don't necessarily need an outside database system to be a self-contained whole. As long as you have a PDF viewer, you can have that. So I think putting video evidence into PDFs and having cross-links to other documents that have different video evidence would be sort of like having a self-assembling distributed database for whoever needed that evidence to argue their case in court. I don't think the technology for that has been developed or thought of in that way, but I think it could be potentially very valuable.
Lindsey: Yeah, I think that's a really good point, especially given what you said about digital signatures. That proof that nothing has been changed, tampered with, breached, or edited is important not only for the security of the document but also, like you said, as evidence for a case. That makes a lot of sense to me.
Patrick: We could actually go a bit further than merely putting a digital signature into the document. There's a whole standard called C2PA, which would allow sort of a provenance chain, which would also be useful in this particular case, where you're giving the defendant just a small part of the evidence from a larger video file. You can put the chain of evidence into the PDF that says this is the source of the file and these are the edits that were made to create this. So I think it's potentially very useful for the legal industry to have that sort of capability inside PDFs.
Lindsey: Definitely. Yeah, I agree. Okay. So, let's say that you were the one who needed to convince the international standards committee that governs PDF/A. Let's say you were up in front of them and you said, "Okay, I'm going to make the case to reconsider your ban on embedded multimedia in PDF/A documents." Aside from some of the stuff we've already chatted about, what arguments would you make?
Patrick: Well, in one sense, the committee for this pretty much meets during PDF week, in conjunction with the ISO working group and the PDF Association. They sort of combine their meetings so that at least the PDF experts are in the room to go over that. But I think this particular project might want to reach out beyond just the PDF community and go back to the archivists. Not necessarily the people who worked on it 20 years ago or so, but the people who are in the positions of doing this work today: the archivists at, say, the National Archives, not just for the US but for other countries that have an interest in having this sort of information and storing it long term. The Library of Congress, for example. See if this is something where, if we took the arguments we're making now and put them together into logical arguments, we could say, "Hey, this is a project that we think could be valuable and could be valuable to you. Can you help us make this happen?" Right?
I think we'd probably get a warm reception for the idea within the PDF community and potentially among the archivists. We just have to feel out where exactly their interests would be aligned, right? So if I could have a transcript of this conversation afterwards to actually put those arguments together, that might be worthwhile.
Lindsey: Yeah, I think so too. I think this is a really interesting topic, because we don't talk a whole lot here at Datalogics about rich media and PDFs. I know you've spoken on that topic before, and you're making really good arguments for it, especially given the way PDF is structured. I think it's important to consider, I think it's a good topic to be bringing up, and it's a good time to bring it up.
Patrick: There's one more point I do want to make about PDF/A. The way I think about PDF/A is that it makes the implicit assumptions explicit, so that when the current viewers that read PDFs are long obsolete and nobody remembers them, you'll still be able to read these documents and view them the way they were intended to be viewed, because they put in all the information that you need to render them. As long as you have the specs for how exactly to display a PDF, you should be able to do that, because we've done that thinking with PDF/A about what exactly you need to do to make it viable as a long-term format. That's where I think there is this niche, well, it's not really a niche, but this area of need for PDF as a document that you can store long term. You can also redact it, which is another thing that is very helpful for legal-type work. I'm not sure how that would work with redaction of video; that's something that possibly needs to be discussed. It might be more of a C2PA thing.