AI & ML interests

🤗 Hugging Face x 🌸 BigScience initiative to create open source community resources for LAMs.

Organization Card

📚 BigLAM

A community-run home for machine-learning-ready datasets from libraries, archives, and museums.

Most cultural-heritage data wasn't originally prepared with ML workflows in mind: it lives in catalogue systems, IIIF endpoints, METS/MODS records, and idiosyncratic formats that differ from one institution to the next. BigLAM is a place where those datasets get repackaged into formats ML practitioners can actually load and work with, contributed by the people who know the source material best.

The org started as a datasets hackathon inside the BigScience project in 2022 and has grown into a standing community for cultural-heritage ML.

What's here

The org is datasets-first: 46+ image, text, and tabular collections from libraries, archives, and museums, prepared so they load cleanly with the datasets library. A handful of models and spaces live here too, mostly early experiments from the BigScience-era hackathon.
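As a rough illustration of what "load cleanly" means in practice, here is a minimal sketch using the datasets library. The helper names biglam_repo_id and load_biglam are made up for this example, the dataset name is a placeholder you would swap for a real repo from the org, and the "train" split is an assumption that will not hold for every dataset.

```python
ORG = "biglam"  # the Hub organisation this card describes


def biglam_repo_id(dataset_name: str) -> str:
    """Build a Hub repo id under the biglam org from a bare dataset name.

    Hypothetical helper for this example, not part of any library.
    """
    if "/" in dataset_name:
        raise ValueError("pass the bare dataset name, without an org prefix")
    return f"{ORG}/{dataset_name}"


def load_biglam(dataset_name: str, split: str = "train", streaming: bool = True):
    """Load one split of a BigLAM dataset.

    Assumes the dataset exposes the requested split; check the dataset card.
    Streaming avoids downloading a large image collection up front.
    """
    # Third-party dependency (pip install datasets), imported lazily so the
    # helper above can be used without it.
    from datasets import load_dataset

    return load_dataset(biglam_repo_id(dataset_name), split=split, streaming=streaming)
```

With a real dataset name in place of the placeholder, something like `next(iter(load_biglam("your-dataset-name")))` should return the first example as a dictionary.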

For task-specific, deployable models built on top of these datasets, see the sibling org small-models-for-glam.

Contributing a dataset

If you've prepared a LAM dataset that other researchers might use, the best home is usually your institution's own Hugging Face organisation (e.g. NationalLibraryOfScotland). Institutional ownership signals authority over the data and makes long-term maintenance easier. Setting up a new org on the Hub is free and quick.

If your institution isn't on the Hub yet, or you'd prefer to host the dataset here, open a discussion and we'll help get it set up under BigLAM. Useful additions are typically datasets where the format conversion (METS/ALTO → parquet, IIIF manifest → loadable image splits, etc.) has already been done and the licensing is clear enough for open release.
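To give a feel for the kind of conversion meant here, the sketch below flattens one ALTO page into one row per text line and, optionally, writes the rows to parquet. It is a simplified illustration, not a BigLAM pipeline: the function names, the column names (page_id, line_no, text), and the tiny sample document are all assumptions, and real ALTO files carry hyphenation, coordinates, and confidence data that this ignores.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO fragment, for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
    <TextLine><String CONTENT="Second"/><SP/><String CONTENT="line"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def alto_to_rows(alto_xml: str, page_id: str) -> list[dict]:
    """Return one row per TextLine, joining the CONTENT of its String children.

    The {*} wildcard (Python 3.8+) matches any ALTO namespace version.
    """
    root = ET.fromstring(alto_xml)
    rows = []
    for line_no, line in enumerate(root.findall(".//{*}TextLine")):
        words = [s.get("CONTENT", "") for s in line.findall("{*}String")]
        rows.append({"page_id": page_id, "line_no": line_no, "text": " ".join(words)})
    return rows


def write_parquet(rows: list[dict], path: str) -> None:
    """Write rows to a parquet file via the datasets library."""
    # Third-party dependency (pip install datasets), imported lazily.
    from datasets import Dataset

    Dataset.from_list(rows).to_parquet(path)


rows = alto_to_rows(SAMPLE_ALTO, page_id="page-0001")
```

Calling `write_parquet(rows, "pages.parquet")` would then produce a file that the datasets library can load directly.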

Already have a dataset here that should sit under your institution's org? Open a discussion or issue on the dataset repo; we're happy to transfer ownership.


60+ contributors over the years. Day-to-day maintenance is light-touch; for help with a contribution, open a discussion and someone will see it.