
Specifying corpus and piece data and metadata

You are welcome to discuss or to propose merge requests (MRs) to improve the definition and the handling of such metadata.

A metadata/corpus.json file gathers all data and metadata on the pieces of a corpus, with the following overall structure:

{
  "corpus": {
    // Corpus metadata [mandatory: at least id and title]
    "id": "bach-fugues",
    "title": "Das wohltemperierte Klavier, Buch I",
    (...)
  },

  "pieces": {
    "bwv846": {
      // Piece metadata [mandatory, see below]
      "opus": { ... },

      // Sources (scores, audio, analyses, ...)
      // [mandatory: at least one score/audio source]
      "sources": [ ... ],

      // Piece settings
      "settings": { ... }
    },
    
    "bwv847": { 
      ...
    }
  },

  "settings": {
    // Optional
    "access": "public"
  },

  "template": {
    // Optional
    ...
  }
}
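As an illustration only, the structure above can be mirrored by TypeScript types, together with a check of its mandatory parts. The names CorpusFile, checkCorpusFile and isPrimarySource are hypothetical (this is not Dezrann's own code), and counting video sources as audio sources is an assumption.

```typescript
// Hypothetical types mirroring the corpus.json layout (not Dezrann's own code).
interface CorpusSection {
  id: string;
  title: string;
  [field: string]: unknown; // shorttitle, motto, text, ...
}

interface PieceEntry {
  opus?: Record<string, unknown>;
  sources?: Array<Record<string, unknown>>;
  settings?: Record<string, unknown>;
}

interface CorpusFile {
  corpus: CorpusSection;
  pieces: Record<string, PieceEntry>;
  settings?: Record<string, unknown>;
  template?: Record<string, unknown>;
}

// A "score" or "audio" source, possibly with a suffix such as "score:neuma"
// or "audio:yt". Assumption: video sources, which carry audio, also count.
function isPrimarySource(source: Record<string, unknown>): boolean {
  return Object.keys(source).some(
    (key) => ["score", "audio", "video"].indexOf(key.split(":")[0]) >= 0,
  );
}

// Returns human-readable problems; an empty list means that the mandatory
// parts (corpus id and title, piece opus, one score/audio source) are there.
function checkCorpusFile(data: CorpusFile): string[] {
  const problems: string[] = [];
  if (!data.corpus.id) problems.push("corpus.id is missing");
  if (!data.corpus.title) problems.push("corpus.title is missing");
  for (const pieceId of Object.keys(data.pieces)) {
    const piece = data.pieces[pieceId];
    if (!piece.opus) problems.push(`${pieceId}: opus metadata is missing`);
    if (!(piece.sources || []).some(isPrimarySource)) {
      problems.push(`${pieceId}: at least one score/audio source is required`);
    }
  }
  return problems;
}
```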

Files should be referred to by a URL (preferably pointing to a git repository) or by a stable external identifier. They can also be referred to by a local path, relative to the corpus.json file.
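The following sketch shows one way a tool could resolve such references. The function resolveReference is hypothetical; it assumes that anything starting with a scheme ("https:", "ark:", ...) is a URL or an external identifier to be kept as-is, while anything else is a path relative to corpus.json.

```typescript
// Hypothetical helper, not part of Dezrann.
function resolveReference(ref: string, corpusJsonPath: string): string {
  // Assumption: a leading "scheme:" marks a URL or an external identifier
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(ref)) {
    return ref;
  }
  // Directory of corpus.json, e.g. "metadata" for "metadata/corpus.json"
  const baseParts = corpusJsonPath.split("/").slice(0, -1);
  const resolved: string[] = [];
  for (const part of baseParts.concat(ref.split("/"))) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      resolved.pop(); // go up one directory
    } else {
      resolved.push(part);
    }
  }
  // Keep absolute paths absolute
  const prefix = corpusJsonPath.charAt(0) === "/" ? "/" : "";
  return prefix + resolved.join("/");
}
```

For example, the local score "vivaldi-04.mei" listed in metadata/corpus.json resolves to metadata/vivaldi-04.mei.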

There can be optional settings specific to Dezrann or any other application.

The metadata/corpus.json file can be split into several files, for example when the corpus section is hand-curated whereas the pieces are produced by a script.
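This page does not specify how the parts are combined. A minimal sketch, assuming that each part is a partial corpus.json object, that sections such as pieces are merged one level deep, and that later parts win on duplicate keys (mergeCorpusParts is a hypothetical name):

```typescript
// Hypothetical sketch of merging split corpus.json files.
type Json = { [key: string]: unknown };

function isObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function mergeCorpusParts(parts: Json[]): Json {
  const merged: Json = {};
  for (const part of parts) {
    for (const section of Object.keys(part)) {
      const previous = merged[section];
      const current = part[section];
      if (isObject(previous) && isObject(current)) {
        // One level deep: "pieces" coming from two files are combined,
        // and a later part overrides a key already present
        merged[section] = { ...previous, ...current };
      } else {
        merged[section] = current;
      }
    }
  }
  return merged;
}
```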

Examples

The metadata of all public Dezrann corpora is gathered in https://algomus.fr/dezrann/dezrann-catalog.json.

The /metadata directory contains several examples, such as:

For most baroque, classical, and romantic music, the score is the reference, and recordings may also be provided. For jazz and pop, a recording is the reference, and a score may also be provided, as in several examples in edmus.json.

corpus Metadata

This section describes the corpus as a whole. This is used both on http://dezrann.net/corpora and on corpus pages such as http://dezrann.net/explore/mozart-piano-sonatas.

To draft a corpus, only id and title are mandatory. The other fields will have to be carefully written/reviewed/translated to prepare the release of a public corpus.

id: Mandatory, unique identifier, such as "bach-fugues", "corelli-trio-sonatas" or "schubert-winterreise".

🏳️ shorttitle: Short title (fewer than 25 characters), displayed on http://dezrann.net/corpora and in other places with limited space

🏳️ motto: Short text (fewer than 120 characters, nominal sentence) advertising the corpus, including the number of works

image: URL of one image (recommended size: XXXpx XXXpx) illustrating the corpus

🏳️ title: Mandatory, title

🏳️ text: 3-6 lines presenting the corpus, with a historical / musicological perspective (and not referring to Dezrann). Links (in markdown) to Wikipedia or other sites are welcome.

🏳️ availability: (for the general public) 1-4 lines detailing what is actually available in Dezrann (scores, autographs, analyses, synchronized audio), both qualitatively and quantitatively.

status: (for the technical audience) short message summarizing the main issues with this corpus, which will be displayed on https://doc.dezrann.net/status

🏳️ contributors.*: see metadata.md#the-contributors-bloc. Before a public release, the corpus should contain a contributors.maintainer field.

showcase: a list of 1-4 ids of pieces in the corpus of particularly high quality (in the availability and quality of their sources). These pieces are randomly showcased in places such as http://dezrann.net/corpora

Optional external references ref*: see below

quality:corpus and quality:corpus:metadata: See quality.md. If you have just completed a draft of corpus.json, set both fields to 1.
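The limits above (shorttitle and motto lengths, 1-4 showcase pieces, a maintainer before a public release) lend themselves to automated checks. The sketch below is hypothetical (releaseProblems is not part of Dezrann) and measures lengths in UTF-16 code units, which only approximates "characters".

```typescript
// Hypothetical pre-release checks derived from the field descriptions above.
interface CorpusMetadata {
  id: string;
  title: string;
  shorttitle?: string;
  motto?: string;
  showcase?: string[];
  contributors?: { [role: string]: string };
  [field: string]: unknown;
}

function releaseProblems(corpus: CorpusMetadata, pieceIds: string[]): string[] {
  const problems: string[] = [];
  if (corpus.shorttitle !== undefined && corpus.shorttitle.length >= 25) {
    problems.push("shorttitle should have fewer than 25 characters");
  }
  if (corpus.motto !== undefined && corpus.motto.length >= 120) {
    problems.push("motto should have fewer than 120 characters");
  }
  if (corpus.showcase !== undefined) {
    if (corpus.showcase.length < 1 || corpus.showcase.length > 4) {
      problems.push("showcase should list 1-4 piece ids");
    }
    // Every showcased id must be a piece of this corpus
    for (const id of corpus.showcase) {
      if (pieceIds.indexOf(id) < 0) problems.push(`showcase id not in corpus: ${id}`);
    }
  }
  if (!corpus.contributors || !corpus.contributors["maintainer"]) {
    problems.push("contributors.maintainer is missing");
  }
  return problems;
}
```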

Other fields (optional)

genre

opus: catalogue numbers or opus numbers

piece Metadata and Data

The piece data may be either static (fully hand-written), produced by a script, and/or maintained with templates (see templating.md).

sources Data

At least one source has to be provided.

 "sources": [
  // Scores (.MEI (preferred), .musicxml, .mscz, .krn), see scores.md
  {
    "score": "http://gitlab.com/bla/bli/02.krn",
    "license": "CC-BY-SA-4.0",
    "contributors": {
      "encoder": "Jane Foo",
      "editor": "KernScores"
    }
  },
  {
    "score": "http://bla.net/sonata-12.mscz",
    "measure-map": "http://bla.net/sonata-12.mm.json"
  },
  {
    "id": "score-1",
    "score": "vivaldi-04.mei"
  },
  // Scores from Neuma pipeline
  {
    "score:neuma": "all:collabscore:saintsaens-ref:C080_0"
  },
  {
    "score:gallica": "ark:/12148/bpt6k1162028r"
  },

  // Audio/video sources
  {
    "audio":  "http://gitlab.com/bla/bli/bla-07.mp3",
    "source":  "http://a-wonderful-open-music-project.org/",
    "license": "CC0-1.0"
  },
  {
    "video":  "http://a-wonderful-open-music-project.org/my-video-07.mp4",
    "source":  "http://a-wonderful-open-music-project.org/",
    "license": "CC0-1.0",
    "contributors": {
      "performer": "Clara Dee"
    },
    "info": "Studio recording"
  },
  {
    "video:yt":  "df6DFfs",
    "contributors": {
      "performer": "Clara Dee"
    },
    "info": "Live recording at the Schnupz Concert Hall"
  },
  {
    "audio:yt":  "df6DFfs",
    "id": "audio-07"
  },
  {
    "audio:yt":  "df6edfs",
    "synchro": "http://gitlab.com/bla/bli/synchro.json"
  },

  // Analyses (.dez format)
  {
    "analysis": "https://gitlab.com/foo/bar/haydn-symph099-mvt1.dez",
    "contributors": {
      "analyst": "Dinh-Viet-Toan Le, Francesco Maccarini"
    },
    "ref:doi": "10.1145/3543882.3543884"
  },

  // Special sources, with provided image(s) and position files
  // Any scan (score, piano rolls...), with positions
  {
    "images": [ "http://bli/scan-07-page1.jpg", "http://bli/scan-07-page2.jpg" ],
    "positions": "http://bli/positions-07.json"
  },
  // Image with some reference to audio/video
  {
    "images": [ "http://bla/spectral-analysis-07.jpg" ],
    "positions": "http://bla/positions-07.json",
    "ref-source": "audio-07"
  },
  // Image with some reference to a score
  {
    "images": [ "http://blu/scan-07.jpg" ],
    "positions": "http://blu/positions-07.json",
    "ref-source": "score-1"
  },

  // To be better specified
  {
    "grid": "| D A | Bm D/F# | G D | Em7 A7"
  },
  {
    "karaoke": "http://bli/07.kar"
  },
  {
    "tab": ""
  }
  ]

Source: Basic information

For each source, one (and only one) of these fields has to be provided:

Identifiers. Each source should have an id. If the id is not provided, it will be generated. Dezrann may use the id to select the source, for example in the URL https://www.dezrann.net/~/telemann-flute-fantaisies/twv40-02?audio=porter.

Each source may refer to the id of another source through ref-source.
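A sketch of these rules, for illustration only. The list in KIND_FIELDS is an assumption read off the sources example above, and the generated id form "<kind>-<n>" is hypothetical: ids generated by Dezrann itself may differ.

```typescript
// Assumed "kind" fields, taken from the example above (not an official list).
const KIND_FIELDS = [
  "score", "score:neuma", "score:gallica",
  "audio", "audio:yt", "video", "video:yt",
  "analysis", "images", "grid", "karaoke", "tab",
];

type Source = { id?: string; "ref-source"?: string; [field: string]: unknown };

// Exactly one kind field must be present.
function sourceKind(source: Source): string {
  const kinds = Object.keys(source).filter((key) => KIND_FIELDS.indexOf(key) >= 0);
  if (kinds.length !== 1) {
    throw new Error(`expected exactly one kind field, found: ${kinds.join(", ") || "none"}`);
  }
  return kinds[0];
}

// Gives every source an id, avoiding ids that are already used.
// The "<kind>-<n>" form is hypothetical.
function withIds(sources: Source[]): Source[] {
  const used: string[] = sources.map((s) => s.id).filter((id): id is string => !!id);
  return sources.map((source) => {
    if (source.id) return source;
    const kind = sourceKind(source).split(":")[0]; // "audio:yt" -> "audio"
    let n = 1;
    while (used.indexOf(`${kind}-${n}`) >= 0) n++;
    const id = `${kind}-${n}`;
    used.push(id);
    return { ...source, id };
  });
}

// Lists ref-source values that do not match the id of any source.
function danglingRefs(sources: Source[]): string[] {
  const ids = sources.map((s) => s.id);
  return sources
    .map((s) => s["ref-source"])
    .filter((ref): ref is string => ref !== undefined && ids.indexOf(ref) < 0);
}
```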

Source: Metadata

Common metadata fields (see below) can be used. These fields are optional, but, before a public release, sources should have good metadata, in particular with a clear license.

Moreover, the following field can be used:

opus Piece Metadata

opus: Basic information

(when applicable) corpus or collection

(mandatory) id: Unique identifier, such as "bach/bwv847". It should include some opus information and be consistent with other identifiers used in Dezrann. (Note that there is also an id field outside opus; it will eventually be removed.)

opus: Titles

(mandatory) 🏳️ title or piece:title: We follow the Open Opus style guide, except that we do not include the opus information in the title (see opus below).

(when applicable) 🏳️ nickname or piece:nickname

(when applicable) movement:num and movement:title (in this case, we do not use title and nickname but rather piece:title and piece:nickname)

(mandatory) opus: catalogue number or opus number, such as "opus": "K.551". Some numbers fall outside catalogue/opus numbers and may be encoded as other fields, such as "symphony:num": "41" for Mozart’s Jupiter.
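The rules above (mandatory id, title and opus; piece:title and piece:nickname instead of title and nickname for movements) can be checked mechanically. The function opusProblems below is a hypothetical illustration, not Dezrann's validation code.

```typescript
// Hypothetical check of the opus rules described above.
type Opus = { [field: string]: unknown };

function opusProblems(opus: Opus): string[] {
  const problems: string[] = [];
  const has = (field: string): boolean => opus[field] !== undefined && opus[field] !== "";
  if (!has("id")) problems.push("id is mandatory");
  if (!has("opus")) problems.push("opus is mandatory");
  const isMovement = has("movement:num") || has("movement:title");
  if (isMovement) {
    // Movements use the piece:* variants
    if (!has("piece:title")) problems.push("movements need piece:title");
    if (has("title") || has("nickname")) {
      problems.push("movements use piece:title/piece:nickname, not title/nickname");
    }
  } else if (!has("title") && !has("piece:title")) {
    problems.push("title (or piece:title) is mandatory");
  }
  return problems;
}
```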

opus: Other information

genre

key: key/tonality of this file (movement, not piece), such as D minor or Bb major

meter: such as 4/4, 3/4, or 6/8. More complex meters can be given through measure maps in the score source.

Common metadata fields

These fields may be used for the corpus, for the opus, and for each of the sources. Please try to provide metadata at the most specific level possible.

Basic information

All these fields are optional.

🏳️ contributors (dictionary, see below)

🏳️ text (string). For a source, it could be a short string presenting the context. For a corpus (and possibly for a piece), it should be more detailed (see above).

license (SPDX identifier, or list of identifiers)

ref:* (see below)

year: year, or range of years

album (string)

(Jazz: Concert place / Recording Label, to be detailed)
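Since license may be either a single SPDX identifier or a list of them, a consumer may want to normalize it to a list. The helper below is hypothetical, and its regular expression is only a loose shape check, not a lookup in the official SPDX license list.

```typescript
// Hypothetical helper: normalize "license" to a list of identifiers.
function licenses(value: string | string[] | undefined): string[] {
  if (value === undefined) return [];
  const list = Array.isArray(value) ? value : [value];
  for (const id of list) {
    // Loose shape check, e.g. "CC-BY-SA-4.0" or "CC0-1.0"; no list lookup
    if (!/^[A-Za-z0-9][A-Za-z0-9.+-]*$/.test(id)) {
      throw new Error(`not an SPDX-like identifier: "${id}"`);
    }
  }
  return list;
}
```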

ref: External references

{
  "ref": "https://github.com/DCMLab/mozart_piano_sonatas/tree/main/scores",
  "ref:dezrann": "https://dezrann.net/~/mozart-piano-sonatas/k280.1",
  "ref:corpus": "https://dezrann.net/~/mozart-piano-sonatas",
  "ref:gallica": "ark:/12148/btv1b55002567w",
  "ref:neuma": "all:collabscore:saintsaens-ref:C080_0",
  "ref:doi": "10.1145/3543882.3543884",
  "ref:wikipedia": "Symphony_No._35_(Mozart)",
  "ref:musicbrainz": "611a55ef-cfb4-3bbf-9d65-7d4af5506093",
  "ref:musicbrainz:recording": "ed80fa40-4871-4669-9509-18bcfee420fd",
  "ref:imslp": "19_Sonatas_for_the_Piano_(Mozart,_Wolfgang_Amadeus)",
  "ref:kernscores": "mozart/sonata/sonata04-1.krn",
  "ref:rism": "990062490",
  "ref:wjd": 70
}

ref is any URL. The other ref:* fields are identifiers on some sites/databases, as defined in RefsBase.ts. When possible, give a reference to the particular movement; a reference to the whole piece, or even to the collection, is also acceptable.

It is strongly recommended to include as much information as possible. Specifically, for scores, it is advisable to include ref:rism to trace the primary source used by the encoder(s).

ref:dezrann and ref:corpus are the public Dezrann URLs of the piece and of the corpus, respectively.
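As an illustration, a few of these identifiers can be expanded into browsable URLs. The authoritative definitions live in RefsBase.ts; the hypothetical refToUrl below only reproduces the standard DOI and Wikipedia URL patterns, and assumes that an unsuffixed ref:wikipedia points to the English Wikipedia.

```typescript
// Hypothetical sketch; see RefsBase.ts for the actual reference definitions.
function refToUrl(key: string, value: string | number): string | undefined {
  const id = String(value);
  // These fields already hold URLs
  if (key === "ref" || key === "ref:dezrann" || key === "ref:corpus") return id;
  if (key === "ref:doi") return `https://doi.org/${id}`;
  // "ref:wikipedia" or a localized variant such as "ref:wikipedia:fr"
  const wiki = /^ref:wikipedia(?::([a-z]{2,3}))?$/.exec(key);
  if (wiki) {
    const lang = wiki[1] || "en"; // assumption: unsuffixed means English
    return `https://${lang}.wikipedia.org/wiki/${id}`;
  }
  return undefined; // other databases are not covered by this sketch
}
```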

The contributors bloc

🏳️ One of the three following fields is usually defined:

(for opus)

(for an audio source)

🏳️ The following fields can also be used:

Localization

In corpus, piece, or source metadata, the 🏳️ fields can be localized, for example:

    "collection": "Le quattro stagioni",
    "collection:en": "The four seasons",
    "collection:fr": "Les quatre saisons",

    (...)

    "ref:wikipedia:fr": "Les_Quatre_Saisons",

Do not add a :xx suffix to the title in the original language.

Note that even contributor names, such as those of composers, may vary across languages:

"contributors": {
    "composer": "Johann Sebastian Bach",
    "composer:fr": "Jean-Sébastien Bach"
}

Do not localize numeric or formalized fields such as key, opus, or genre.
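A reader of these files could then look up a 🏳️ field in a given language by trying the "field:xx" variant first and falling back to the unsuffixed value (the original language). The function localized is hypothetical; it skips the localized lookup for key, opus and genre, following the rule above.

```typescript
// Hypothetical lookup of a localized field.
const NEVER_LOCALIZED = ["key", "opus", "genre"];

function localized(
  fields: { [field: string]: unknown },
  field: string,
  lang: string,
): unknown {
  if (NEVER_LOCALIZED.indexOf(field) < 0) {
    const value = fields[`${field}:${lang}`];
    if (value !== undefined) return value;
  }
  // Fall back to the original-language value
  return fields[field];
}
```

For example, localized(contributors, "composer", "fr") returns "Jean-Sébastien Bach", while any language without a :xx variant gets "Johann Sebastian Bach".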