Tom Crane 0:01 I'm Tom Crane. I'm the technology director of Digirati, and I'm also one of the editors of the IIIF specifications. Today I'll talk about big IIIF.

In the last webinar I used this slide to illustrate what I wasn't talking about then, because there we were looking at ad hoc, handcrafted, artisanal manifests: different tools to make manifests very quickly. In this webinar we're going to look at industrial-scale IIIF production. And it's interesting that, because we now have a standard for this kind of thing, we see some convergence of solutions emerging in this space. So it is a kind of factory process. This is Wellcome's digitisation room, and at full capacity over a million digitised images come out of here every month. Not at the moment, these pictures are pre-pandemic, but it was producing a million a month. So how does it work? They've been doing this for a long time, since 2012, and it needs lots of humans and lots of software infrastructure.

Here's the first version of this infrastructure. I'm looking at it from the right-hand side, the delivery end, which is the bit we did, not the production end, so I'm going to gloss over the complexity and the people involved on the left-hand side of the diagram. But in general: pictures, or audio and video, get captured; QA happens; workflow software, essential to manage the process, sees the assets stored safely away in a preservation system. The workflow process produces METS files to describe the digital objects, and METS-ALTO files to carry their OCR text. We then built a system called the Digital Delivery System, which turned all that information into a nice API, mixing in catalogue metadata from the library catalogue and the archive catalogue, and served it up through the Wellcome Player. This is a pattern that has happened many times before, very familiar, but bespoke.

So then along came IIIF. We added an extra, shiny IIIF layer, which turned that previous bespoke layer into IIIF and, in the process, turned the Wellcome Player into the Universal Viewer, with a bit of help from the British Library. So now we had IIIF layered on top of that system. Again, big scale: 30 million images later, we decided it would be a good idea to extract the asset delivery piece, the image services, the pixel delivery, the AV delivery, into a cloud-based service called the Digital Library Cloud Services (DLCS). We could have a clear dividing line between that and the rest of the system, because the IIIF specifications gave us a clear service boundary: this bit implements those specs, and the other bit is Wellcome-specific. A few years after that, we replaced the commercial preservation management system with an open-source one that Wellcome Collection have developed, and we reworked the Digital Delivery System and the DLCS to work with it. That system is well worth checking out; it's really, really good. And then the final piece of work, as the system was getting on for ten years old: we moved it all into the cloud and made it IIIF 3 first. So we ultimately got rid of that archaeological layer of the old pre-IIIF APIs, and now it's entirely IIIF-based. It still produces IIIF 2, but that's converted back from the IIIF 3. We also stopped talking to the library and archive catalogues directly, and use Wellcome Collection's fantastic Catalogue API instead.
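To make that "clear service boundary" a little more concrete: the IIIF Image API fixes the URL syntax for requesting pixels, so whatever sits behind it is interchangeable. Here is a minimal sketch of that URL pattern in Python; the base URI and the identifier are hypothetical placeholders, not real Wellcome or DLCS endpoints.

BASE = "https://iiif.example.org/image"   # hypothetical image server prefix


def image_request(identifier: str,
                  region: str = "full",    # "full", "square", "x,y,w,h", "pct:..."
                  size: str = "max",       # "max", "w,", ",h", "w,h", "pct:n"
                  rotation: str = "0",
                  quality: str = "default",
                  fmt: str = "jpg") -> str:
    """Build an Image API request: {id}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{BASE}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"


def info_json(identifier: str) -> str:
    """Each image service also describes itself at {id}/info.json."""
    return f"{BASE}/{identifier}/info.json"


# A full-size image, and a deep-zoom style tile, from the same endpoint
# (the identifier is a made-up example):
print(image_request("b12345678_0001"))
print(image_request("b12345678_0001", region="1024,1024,512,512", size="256,"))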
So, just to return to this idea of asset delivery as a commodity service. Briefly, what the DLCS does is this: if you tell it about an image, it will provide an Image API endpoint for it. If you tell it about an AV resource, it will provide a web-friendly derivative, a transcode, of it. It can provide the IIIF Auth API on top of those resources, and it can do this for other types of file too. It's not really about producing manifests, though it can do that; it's about asset delivery as a separate thing. But humans can use it, even though it's designed for large-scale systems integration.

So I'm going to give you a brief demo with some top-quality archive content: the Dixons catalogue from 1985. Here I am in the DLCS. I'm going to create a new space, which is like a folder to put some content in, and I'm going to drag some images from my desktop; these are just scans of pages. What's going on here is that I can flick a switch and turn that space into a IIIF manifest. I've put an image in, and I can view it in the Universal Viewer. I can start adding more images, bringing in the rest of the pages. Two pages in, I've got a two-page manifest; let's bring in a few more. What this is doing is uploading these JPEGs or TIFFs to the service, converting them to JPEG 2000, creating Image API endpoints for them, providing that deep-zoom capability, but also turning the list of images into a manifest. And if I want to change the order, I can drag images around. So I've moved two pages up, and you can see that's in the wrong order, but it allows me to bring a particular page to the front, so I can see just how expensive video cameras were 35 years ago.

So although humans can do this, humans tend not to do it at the kinds of scale that Wellcome needs. If I just move the slides on: isn't that an easier way to make manifests than the huge industrial process we saw earlier? Can't you just take the pictures, drag them in, and then you've got a manifest? Well, there's a lot more going on than that, and it would take hundreds of years to do Wellcome's images like that, because there's a lot of text to process. So far there are 12 billion words of text to run through the system. There's access control, and there's structural information coming from those METS files, like volumes and title pages and sections of manuscripts. All of that really has to happen in an automated way. Just in the minute or so while I was fiddling around making the Dixons catalogue, 10 or 20 books might have run through that automated process. And we built observability into it, so you can look at dashboards and see what's going on.
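For a sense of what "turning the list of images into a manifest" involves, here is a minimal Python sketch that assembles a IIIF Presentation 3 manifest from a list of image services. It is illustrative only, not what the DLCS actually emits, and the URLs and dimensions are hypothetical placeholders.

import json


def make_manifest(manifest_id: str, label: str, images: list[dict]) -> dict:
    """images: [{"service": ..., "width": ..., "height": ...}, ...] in page order."""
    canvases = []
    for i, img in enumerate(images, start=1):
        canvas_id = f"{manifest_id}/canvas/{i}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": img["width"],
            "height": img["height"],
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/image",
                    "type": "Annotation",
                    "motivation": "painting",
                    "target": canvas_id,
                    "body": {
                        "id": f'{img["service"]}/full/max/0/default.jpg',
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": img["width"],
                        "height": img["height"],
                        # The deep-zoom image service behind this page.
                        "service": [{
                            "id": img["service"],
                            "type": "ImageService3",
                            "profile": "level2",
                        }],
                    },
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


pages = [
    {"service": "https://iiif.example.org/image/dixons-1985_0001", "width": 3000, "height": 4000},
    {"service": "https://iiif.example.org/image/dixons-1985_0002", "width": 3000, "height": 4000},
]
print(json.dumps(make_manifest("https://iiif.example.org/manifest/dixons-1985",
                               "Dixons catalogue, 1985", pages), indent=2))

Reordering pages in the demo is essentially reordering that items list. The automated pipeline additionally has to layer in structural ranges from METS, OCR-derived text and annotations, and auth services, which is where the real work is.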
So why are we doing all this, and what's it got to do with a national collection? I think we all know the arguments for IIIF: it's in people's self-interest to publish it so they can use it themselves, but it's also in their interest to publish it so that other people can use it, and maybe even, later, start aggregating it. That already happens with things like Europeana, and DPLA in the United States, and it might happen eventually here with the national collection.

Now, the end result of infrastructure like Wellcome's is a whole set of linked text and annotation resources, with APIs and machine-readable data. That allows aggregators to do more than just put things in a bucket. So here's an example of an aggregator: this is Open Texts, from the National Library of Scotland, and they are one of the harvesters of Wellcome's IIIF. It means I can go somewhere else, not Wellcome, to search for cocktail recipes and find results; here I've got a couple from the Internet Archive and one from Wellcome. That can then take me back into Wellcome, where I can use the IIIF Search API to look for text within that object. Now, the Open Texts example is really meant to get you access to the text, and to the publisher's own context for the object, rather than providing a rich context and discovery environment of its own.

But what does this all mean for a national collection? I think we're making an assumption just by being here: that IIIF is the means by which national collection objects could be distributed to user interfaces, whatever those user interfaces might be. But that means contributors to the national collection have to provide IIIF, and some of that activity must be specific to the publisher. You can't just use off-the-shelf software to do all of this, because some of it will be specific: the metadata you include, the way you arrange things, that's going to be particular to individual contributors. There are a growing number of tools and platforms which help, but there's work required to get this going, and doing it at a big scale costs money. Institutions that already have large digitisation workflows can add IIIF creation to them. If you've only got five objects, or ten, or even a hundred, you can make the manifests by hand, stick them on GitHub Pages and do it at no cost. But what's the middle ground? What do organisations with, say, 50,000 things do?

I'm just going to finish up, not on a downer, but with a wider thing to consider. IIIF is obviously only a part of this whole process. It provides part of what the national collection might do, and for the bit it can do I think it's the right choice. But there's much more. I would say that having deep zoom is nice, but having manifests is more important than deep zoom: having images in manifests, even if they're not deep zoom, matters more than the deep zoom itself. Things like Wikimedia Commons and Flickr just offer you a selection of sizes for paintings and photographs, but the IIIF manifest is the thing that carries those objects out into interoperability. Yet having those manifests is no good if a national collection can't discover them. Even if every museum, library and gallery in the country had rich IIIF for all their things, how does that get into a national collection, and how does it stay up to date? There are answers to that, which we can look at later, but that's the next question. And then, even if you have all that, the national collection still can't do much with those things. Even if it can stay in sync with all those contributors, how does it organise them? IIIF manifests have no descriptive semantics of their own, but they can point at descriptive metadata.
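As a rough illustration of that point: a manifest's display metadata pairs carry no agreed semantics, but a manifest can link out to machine-readable records via seeAlso. The sketch below shows the kind of thing an aggregator might pull out of a harvested manifest; the manifest content here is a made-up example, and the record URLs and profile are hypothetical.

def descriptive_links(manifest: dict) -> list[dict]:
    """Return the machine-readable descriptive records a manifest points at."""
    return [
        {
            "id": link.get("id"),
            "format": link.get("format"),     # e.g. application/xml, application/ld+json
            "profile": link.get("profile"),   # identifies the scheme, if one is declared
        }
        for link in manifest.get("seeAlso", [])
    ]


harvested = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://iiif.example.org/manifest/dixons-1985",
    "type": "Manifest",
    "label": {"en": ["Dixons catalogue, 1985"]},
    # Display-only pairs: fine for a viewer, not enough for aggregation.
    "metadata": [{"label": {"en": ["Date"]}, "value": {"en": ["1985"]}}],
    # Links to descriptive records an aggregator can actually work with.
    "seeAlso": [{
        "id": "https://example.org/records/dixons-1985.json",
        "type": "Dataset",
        "format": "application/ld+json",
        "profile": "https://schema.org/",
    }],
}

for link in descriptive_links(harvested):
    print(link)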
If you harvest a manifest, you can follow links to that metadata. That means the national collection will need to pick a lightweight metadata scheme, or maybe a small family of schemes, that's easy for its contributors to produce and to map their own schemes onto. And the difficulty of doing that, the social and political difficulty of agreeing metadata schemes across all of cultural heritage, shouldn't be underestimated. It doesn't have to be a massive, complex, heavyweight taxonomy, but something needs to underpin some form of knowledge organisation in the national collection, something kind of like schema.org. But even then, even if we have all that, a national collection needs design; something needs to inform what that scheme or schemes would be. And that design needs to be informed by a good understanding of the research, education and entertainment needs of the platform. I've got a vision of what the potential of such a significant piece of public, common infrastructure could be. So yes, IIIF is fantastic for its part, but it's just the beginning of the journey towards the national collection. Those are my slides there; you can have a look at them, if you like, at that link. And I think there are questions after these four sessions.