
Specifying corpus and piece data and metadata

A metadata/corpus.json file gathers all data/metadata information on pieces into a corpus, that is:

You are welcome to discuss, or to propose merge requests (MRs), to improve the definition and the handling of such metadata.

{
  "corpus": {
    // Corpus metadata [mandatory: at least id and title]
    "id": "bach-fugues",
    "title": "Das wohltemperierte Klavier, Buch I",
    (...)
  },

  "pieces": {
    "bwv846": {
      // Piece metadata [mandatory, see below]
      "opus": { ... },

      // Sources (scores, audio, analyses, ...)
      // [mandatory: at least one score/audio source]
      "sources": [ ... ],

      // Piece settings
      "settings": { ... }
    },
    
    "bwv847": { 
      ...
    }
  },

  "settings": {
      // Optional
      "access": "public"
  },

  "template": {
      // Optional
      ...
   }
}

Files should be referred to by a URL (preferably to some git repository) or by a stable external identifier. They can also be referred to by a local path (relative to the corpus.json file).
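As a minimal sketch of the local-path rule above, the following Python helper leaves http(s) URLs untouched and resolves anything else relative to the directory of corpus.json. The function name `resolve_reference` is hypothetical and not part of Dezrann. Stable external identifiers (such as ark identifiers) are assumed to be given under their own keys and are not handled here.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_reference(ref: str, corpus_json: str) -> str:
    """Return `ref` unchanged if it is an http(s) URL, else a path relative to corpus.json."""
    if urlparse(ref).scheme in ("http", "https"):
        return ref
    # Local path: interpret it relative to the directory holding corpus.json
    return (Path(corpus_json).parent / ref).as_posix()


print(resolve_reference("vivaldi-04.mei", "metadata/corpus.json"))
print(resolve_reference("http://bla.net/sonata-12.mscz", "metadata/corpus.json"))
```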

There can be optional settings specific to Dezrann or any other application.

The metadata/corpus.json file can be split in several files, for example when the corpus section is hand-curated whereas the pieces are produced by another script.

Examples

The /metadata directory contains several examples, such as:

For most baroque/classical/romantic music, the score is the reference and there may be recordings. For jazz/pop, a recording is the reference and there may be a score, as in several examples in edmus.json.

corpus Metadata

This section describes the corpus as a whole. This is used both on http://dezrann.net/corpora and on corpus pages such as http://dezrann.net/explore/mozart-piano-sonatas.

To draft a corpus, only id and title are mandatory. The other fields will have to be carefully written/reviewed/translated to prepare the release of a public corpus.
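The draft requirement can be checked with a few lines of Python. This is only a sketch: `missing_corpus_fields` is a hypothetical helper, and it assumes the file has already been loaded into a dictionary with the top-level layout shown earlier.

```python
def missing_corpus_fields(corpus_json: dict) -> list[str]:
    """List the mandatory corpus fields (id, title) that are absent or empty."""
    corpus = corpus_json.get("corpus", {})
    return [field for field in ("id", "title") if not corpus.get(field)]


draft = {"corpus": {"id": "bach-fugues"}, "pieces": {}}
print(missing_corpus_fields(draft))  # ['title']
```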

id: Mandatory, unique identifier, such as "bach-fugues", "corelli-trio-sonatas" or "schubert-winterreise".

🏳️ shorttitle: Title with < 25 characters, to be displayed on http://dezrann.net/corpora and at other places with reduced space

🏳️ motto: Short text (< 120 characters, nominal sentence) advertising the corpus, including some number of the works

image: url of one image (recommended size: XXXpx XXXpx) to illustrate the corpus

🏳️ title: Mandatory, title

🏳️ text: 3-6 lines presenting the corpus, with a historical / musicological perspective (and not referring to Dezrann). Links (in markdown) to Wikipedia or other sites are welcome.

🏳️ availability: (for the general public) 1-4 lines detailing what is actually in Dezrann, including stating which content is there (scores, autographs, analyses, synchronized audio), both qualitatively and quantitatively.

status: (for the technical audience) short message summarizing the main issues with this corpus, that will be displayed on https://doc.dezrann.net/status

🏳️ contributors.*: see metadata.md#the-contributors-bloc. Before a public release, the corpus should contain a contributors.maintainer field.

showcase: a list of 1-4 ids of pieces in the corpus with a particularly high quality (availability of sources, quality of those). They will be randomly displayed/showcased from some places such as http://dezrann.net/corpora

Optional external references ref*: see below

quality:corpus and quality:corpus:metadata: See quality.md. If you just completed a draft of corpus.json, start with both of these fields set to 1.
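Some of the constraints listed above (length limits, showcase size, maintainer before release) lend themselves to an automated check. The sketch below is not Dezrann's own validation: `release_warnings` and `LIMITS` are hypothetical names, and it assumes that contributors.maintainer is stored as a `maintainer` key inside a nested `contributors` object, as in the source examples of this page.

```python
LIMITS = {"shorttitle": 25, "motto": 120}  # strict upper bounds on character counts


def release_warnings(corpus: dict) -> list[str]:
    """Return human-readable warnings for the "corpus" section of corpus.json."""
    warnings = []
    for key, value in corpus.items():
        base = key.split(":")[0]  # a localized "motto:fr" is checked like "motto"
        if base in LIMITS and len(value) >= LIMITS[base]:
            warnings.append(f"{key}: should have fewer than {LIMITS[base]} characters")
    if "showcase" in corpus and not 1 <= len(corpus["showcase"]) <= 4:
        warnings.append("showcase: should list 1-4 piece ids")
    # Assumption: contributors are stored as a nested object
    if "maintainer" not in corpus.get("contributors", {}):
        warnings.append("contributors.maintainer: expected before a public release")
    return warnings


print(release_warnings({"id": "bach-fugues", "title": "Das wohltemperierte Klavier, Buch I"}))
```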

Other fields (optional)

genre

opus: catalogue numbers or opus numbers

piece Metadata and Data

The piece data may be static (fully written by hand), produced by a script, and/or maintained with templates; see templating.md.

sources Data

At least one source has to be provided.

 "sources": [
  // Scores (.MEI (preferred), .musicxml, .mscz, .krn), see scores.md
  {
    "score": "http://gitlab.com/bla/bli/02.krn",
    "license": "CC-BY-SA-4.0",
    "contributors": {
      "encoder": "Jane Foo",
      "editor": "KernScores"
    }
  },
  {
    "score": "http://bla.net/sonata-12.mscz",
    "measure-map": "http://bla.net/sonata-12.mm.json"
  },
  {
    "score": "vivaldi-04.mei"
  },
  // Scores from Neuma pipeline
  {
    "score:neuma": "all:collabscore:saintsaens-ref:C080_0"
  },
  {
    "score:gallica": "ark:/12148/bpt6k1162028r"
  },

  // Audio/video sources
  {
    "audio":  "http://gitlab.com/bla/bli/bla-07.mp3",
    "source":  "http://a-wonderful-open-music-project.org/",
    "license": "CC0-1.0"
  },
  {
    "video":  "http://a-wonderful-open-music-project.org/my-video-07.mp4",
    "source":  "http://a-wonderful-open-music-project.org/",
    "license": "CC0-1.0",
    "contributors": {
      "performer": "Clara Dee"
    },
    "info": "Studio recording"
  },
  {
    "video:yt":  "df6DFfs",
    "contributors": {
      "performer": "Clara Dee"
    },
    "info": "Live recording at the Schnupz Concert Hall"
  },
  {
    "audio:yt":  "df6DFfs"
  },
  {
    "audio:yt":  "df6edfs",
    "synchro": "http://gitlab.com/bla/bli/synchro.json"
  },

  // Analyses that will be displayed in Dezrann
  {
    "analysis": "https://gitlab.com/foo/bar/haydn-symph099-mvt1.dez",
    "contributors": {
      "analyst": "Dinh-Viet-Toan Le, Francesco Maccarini"
    },
    "ref:doi": "10.1145/3543882.3543884"
  },

  // Special sources, with provided image(s) and position files
  // Any scan, with positions
  {
    "images": [ "http://bli/scan-07-page1.jpg", "http://bli/scan-07-page2.jpg" ],
    "positions": "positions-07.json"
  },
  {
    "image": "http://bli/scan-07.jpg",
    "positions": "positions-07.json"
  },

  // Scan of piano rolls (to be better specified)
  {
    "audio": "bla/bli07.mp3",
    "image": "bli/roll-07.jpg",
    "positions": "positions-07.json"
  },
  {
    "grid": "| D A | Bm D/F# | G D | Em7 A7"
  }
  ]
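The requirement of at least one score/audio source can be sketched as a small check over the `sources` list. The helper names are hypothetical. The sketch assumes that suffixed keys such as "score:neuma" or "audio:yt" count as score and audio sources respectively, which is an interpretation of the examples above rather than a documented rule.

```python
REFERENCE_KINDS = {"score", "audio"}


def source_kinds(source: dict) -> set[str]:
    """Reference kinds present in one source entry, e.g. {"audio"} for an "audio:yt" key."""
    # Assumption: "score:neuma" is a score source, "audio:yt" an audio source
    return {key.split(":")[0] for key in source} & REFERENCE_KINDS


def has_reference_source(sources: list[dict]) -> bool:
    """True if at least one entry is a score or audio source."""
    return any(source_kinds(source) for source in sources)


sources = [
    {"score": "vivaldi-04.mei"},
    {"analysis": "https://gitlab.com/foo/bar/haydn-symph099-mvt1.dez"},
]
print(has_reference_source(sources))  # True
```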

Source: Basic information

For each source, one (and only one) of these fields has to be provided:

Source: Metadata

These fields are optional

(Jazz: Concert place / Recording Label, to be detailed)

opus Piece Metadata

opus: Basic information

(when applicable) corpus or collection

(mandatory) id: Unique identifier, such as "bach/bwv847". Should include some opus information and be consistent with other identifiers used in Dezrann. (Note that there is also an id field outside opus; it will be removed at some point.)

opus: Titles

(mandatory) 🏳️ title or piece:title: We follow the Open Opus style guide, except that we do not put the opus information (see opus below).

(when applicable) 🏳️ nickname or piece:nickname

(when applicable) movement:num and movement:title (in this case, we do not use title and nickname but rather piece:title and piece:nickname)

(mandatory) opus: catalogue number or opus number, such as "opus": "K.551". There are sometimes numbers that are outside catalogue/opus numbers and that may be encoded as other fields, such as "symphony:num": "41" for Mozart’s Jupiter.
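The mandatory opus fields described so far (id, a title given either as title or as piece:title, and opus) can be checked as follows. This is a sketch with a hypothetical helper name, and the "BWV 847" value in the example is illustrative only.

```python
def missing_opus_fields(opus: dict) -> list[str]:
    """List the mandatory fields missing from the "opus" object of a piece."""
    missing = []
    if not opus.get("id"):
        missing.append("id")
    # Pieces with movements use piece:title instead of title
    if not (opus.get("title") or opus.get("piece:title")):
        missing.append("title or piece:title")
    if not opus.get("opus"):
        missing.append("opus")
    return missing


print(missing_opus_fields({"id": "bach/bwv847", "opus": "BWV 847"}))
# ['title or piece:title']
```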

opus: Other information

genre

key: key/tonality of this file (movement, not piece), such as D minor or Bb major

meter: such as 4/4 or 3/4, 6/8. More complex meters can be given through measure maps in the score source.

Common fields for corpus, opus, and one of the sources

Basic information

year: year, or range of years

ref: External references

{
  "ref": "https://github.com/DCMLab/mozart_piano_sonatas/tree/main/scores",
  "ref:gallica": "ark:/12148/btv1b55002567w",
  "ref:neuma": "all:collabscore:saintsaens-ref:C080_0",
  "ref:doi": "10.1145/3543882.3543884",
  "ref:wikipedia": "Symphony_No._35_(Mozart)",
  "ref:musicbrainz": "611a55ef-cfb4-3bbf-9d65-7d4af5506093",
  "ref:musicbrainz:recording": "ed80fa40-4871-4669-9509-18bcfee420fd",
  "ref:imslp": "19_Sonatas_for_the_Piano_(Mozart,_Wolfgang_Amadeus)",
  "ref:kernscores": "mozart/sonata/sonata04-1.krn",
  "ref:rism": "990062490",
  "ref:wjd": 70
}

ref is any URL. The other ref:* fields are identifiers on some sites/databases, as defined in RefsBase.ts. When possible, reference the particular movement, but it is also acceptable to put an external reference to a piece or even to the collection.
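To illustrate how such identifiers can be turned into links, here is a minimal Python sketch covering only two sites. The URL patterns are the public ones of doi.org and Wikipedia, not values taken from RefsBase.ts. The function `ref_url` is hypothetical, and defaulting to the English Wikipedia when no language suffix is given is an assumption.

```python
from typing import Optional


def ref_url(key: str, value: str) -> Optional[str]:
    """Build a URL for a few ref fields; return None for keys not handled here."""
    parts = key.split(":")
    if parts[0] != "ref":
        return None
    if len(parts) == 1:
        return value  # a plain "ref" is already a URL
    site = parts[1]
    if site == "doi":
        return "https://doi.org/" + value
    if site == "wikipedia":
        # Assumption: English Wikipedia unless a language suffix is given
        lang = parts[2] if len(parts) > 2 else "en"
        return f"https://{lang}.wikipedia.org/wiki/{value}"
    return None  # other sites: see RefsBase.ts for the actual definitions


print(ref_url("ref:doi", "10.1145/3543882.3543884"))
print(ref_url("ref:wikipedia", "Symphony_No._35_(Mozart)"))
```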

It is strongly recommended to include as much information as possible. Specifically, for scores, it is advisable to include ref:rism to trace the primary source used by the encoder(s).

The contributors bloc

🏳️ One of the three following fields is usually defined:

(for opus)

(for an audio source)

🏳️ The following fields can also be used:

Localization

In either corpus, piece, or source metadata, the 🏳️ fields can be localized, such as:

    "collection": "Le quattro stagioni",
    "collection:en": "The four seasons",
    "collection:fr": "Les quatre saisons",

    (...)

    "ref:wikipedia:fr": "Les_Quatre_Saisons",

Do not put :xx for the title in the original language.

Note that even contributor names such as composers may be known with some variations across languages.

"contributors": {
    "composer": "Johann Sebastian Bach",
    "composer:fr": "Jean-Sébastien Bach"
}

Do not localize numeric or formalized fields such as key, opus, or genre.
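A reader of these files might look up a localized field as sketched below: try the `field:lang` variant first, then fall back to the unsuffixed field, which holds the original language. The function name `localized` is hypothetical and this is not necessarily how Dezrann resolves localized fields.

```python
from typing import Optional


def localized(metadata: dict, field: str, lang: str) -> Optional[str]:
    """Return metadata["field:lang"] if present, else the unsuffixed field, else None."""
    return metadata.get(f"{field}:{lang}", metadata.get(field))


opus = {
    "collection": "Le quattro stagioni",
    "collection:en": "The four seasons",
    "collection:fr": "Les quatre saisons",
}
print(localized(opus, "collection", "fr"))  # Les quatre saisons
print(localized(opus, "collection", "de"))  # Le quattro stagioni
```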