AI & ML interests
🤗 Hugging Face x 🌸 BigScience initiative to create open source community resources for LAMs.
BigLAM
A community-run home for machine-learning-ready datasets from libraries, archives, and museums.
Most cultural-heritage data wasn't originally prepared with ML workflows in mind. It lives in catalogue systems, IIIF endpoints, METS/MODS records, and idiosyncratic formats that differ from one institution to the next. BigLAM is a place where those datasets get repackaged into formats ML practitioners can actually load and work with, contributed by the people who know the source material best.
The org started as a datasets hackathon inside the BigScience project in 2022 and has grown into a standing community for cultural-heritage ML.
What's here
The org is datasets-first: 46+ image, text, and tabular collections from libraries, archives, and museums, prepared so they load cleanly with the datasets library. A handful of models and spaces live here too, mostly early experiments from the BigScience-era hackathon.
For task-specific, deployable models built on top of these datasets, see the sibling org small-models-for-glam.
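As a rough illustration of what "load cleanly with the datasets library" can look like in practice, here is a minimal sketch. The helper names `repo_id` and `load_biglam` are hypothetical and not part of any library, and the `"train"` split is an assumption; check each dataset card for the splits and configurations it actually provides.

```python
from typing import Optional

ORG = "biglam"  # the organisation namespace on the Hub


def repo_id(name: str, org: str = ORG) -> str:
    """Build a Hub repo id such as 'biglam/bpl-card-catalog'."""
    return name if "/" in name else f"{org}/{name}"


def load_biglam(name: str, split: Optional[str] = "train", streaming: bool = True):
    """Load a BigLAM dataset; streaming avoids downloading everything up front."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id(name), split=split, streaming=streaming)


if __name__ == "__main__":
    # Assumption: the dataset exposes a "train" split; check the dataset card.
    ds = load_biglam("doab-metadata-extraction")
    first = next(iter(ds))
    print(sorted(first.keys()))  # column names of the first record
```

Streaming is used here because some of the collections run to millions of rows; pass `streaming=False` if you would rather download the whole dataset and cache it locally.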
Contributing a dataset
If you've prepared a LAM dataset that other researchers might use, the best home is usually your institution's own Hugging Face organisation (e.g. NationalLibraryOfScotland). Institutional ownership signals authority over the data and makes long-term maintenance easier. Setting up a new org on the Hub is free and quick.
If your institution isn't on the Hub yet, or you'd prefer to host the dataset here, open a discussion and we'll help get it set up under BigLAM. Useful additions are typically datasets where the format conversion (METS/ALTO → Parquet, IIIF manifest → loadable image splits, etc.) has already been done and the licensing is clear enough for open release.
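To make the kind of format conversion mentioned above concrete, here is a small sketch that flattens an ALTO XML page into one row per text line, using only the Python standard library. It is an illustration under stated assumptions, not a description of how any BigLAM dataset was actually built: the function names, the row schema (`page_id`, `line_id`, `text`), and the sample XML are all made up for the example, and real ALTO files need more care (hyphenation, reading order, confidence scores).

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def alto_to_rows(alto_xml: str, page_id: str) -> list:
    """Return one dict per ALTO TextLine, joining the CONTENT of its String children."""
    root = ET.fromstring(alto_xml)
    rows = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Matching on local names keeps this independent of the ALTO version.
        words = [
            child.attrib.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        rows.append(
            {"page_id": page_id, "line_id": elem.attrib.get("ID"), "text": " ".join(words)}
        )
    return rows


# Tiny hand-written sample, not taken from any real collection.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page ID="p1"><PrintSpace><TextBlock ID="b1">
    <TextLine ID="l1"><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
    <TextLine ID="l2"><String CONTENT="Second"/><SP/><String CONTENT="line"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

rows = alto_to_rows(SAMPLE, page_id="p1")
print(rows)
```

Rows in this shape can then be handed to `datasets.Dataset.from_list(rows)` and written out with `to_parquet(...)` or `push_to_hub(...)`, which is one way to arrive at the Parquet-backed layout described above.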
Already have a dataset here that should sit under your institution's org? Open a discussion or issue on the dataset repo; we're happy to transfer ownership.
60+ contributors over the years. Day-to-day maintenance is light-touch; for help with a contribution, open a discussion and someone will see it.
- biglam/doab-metadata-extraction
  Viewer • Updated • 8.09k • 264 • 13
- biglam/rubenstein-manuscript-catalog
  Viewer • Updated • 49.7k • 239 • 3
- biglam/bpl-card-catalog
  Viewer • Updated • 838k • 239 • 5
- biglam/harvard-library-bibliographic-dataset
  Viewer • Updated • 11.1M • 550 • 2