A metadata/corpus.json file gathers all data/metadata information on pieces into a corpus, that is:

- corpus (corpus metadata)
- opus (piece metadata) and sources

You are welcome to discuss or to propose MRs to improve the definition and the handling of such metadata.

{
  "corpus": {
    // Corpus metadata [mandatory: at least id and title]
    "id": "bach-fugues",
    "title": "Das wohltemperierte Klavier, Buch I",
    (...)
  },
  "pieces": {
    "bwv846": {
      // Piece metadata [mandatory, see below]
      "opus": { ... },
      // Sources (scores, audio, analyses, ...)
      // [mandatory: at least one score/audio source]
      "sources": [ ... ],
      // Piece settings
      "settings": { ... }
    },
    "bwv847": {
      ...
    }
  },
  "settings": {
    // Optional
    "access": "public"
  },
  "template": {
    // Optional
    ...
  }
}

Files should be referred to by a URL (preferably to some git repository)
or a stable external identifier. They can also be referred to by a local
path (relative to the corpus.json file).
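
The mandatory parts of this layout can be checked with a few lines of Python. This is only a minimal sketch, not a Dezrann tool: the function name `check_corpus` is hypothetical, the id `bach/bwv846` is made up by analogy, and the sketch reads "at least one score/audio source" literally (any key starting with `score` or `audio`).

```python
def check_corpus(doc):
    """Return a list of problems; an empty list means the draft passes."""
    problems = []
    corpus = doc.get("corpus", {})
    # Corpus metadata: at least id and title are mandatory.
    for field in ("id", "title"):
        if not corpus.get(field):
            problems.append(f"corpus.{field} is missing")
    for piece_id, piece in doc.get("pieces", {}).items():
        if "opus" not in piece:
            problems.append(f"{piece_id}: opus is missing")
        # Keys such as "score:neuma" or "audio:yt" count as score/audio sources.
        kinds = {key.split(":")[0] for src in piece.get("sources", []) for key in src}
        if not kinds & {"score", "audio"}:
            problems.append(f"{piece_id}: no score/audio source")
    return problems


draft = {
    "corpus": {"id": "bach-fugues", "title": "Das wohltemperierte Klavier, Buch I"},
    "pieces": {
        "bwv846": {"opus": {"id": "bach/bwv846"}, "sources": [{"score": "bwv846.mei"}]},
        "bwv847": {"opus": {"id": "bach/bwv847"}, "sources": []},
    },
}
print(check_corpus(draft))  # ['bwv847: no score/audio source']
```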
There can be optional settings specific to Dezrann or
any other application.
The metadata/corpus.json file can be split into several
files, for example when the corpus section is hand-curated
whereas the pieces are produced by another script.
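
How the split files are recombined is not specified here. As a rough illustration only, assuming each file holds some of the top-level sections, they could be merged as follows. The helper `merge_corpus_parts` is hypothetical and is not the mechanism Dezrann actually uses.

```python
def merge_corpus_parts(*parts):
    """Merge several partial corpus dictionaries, section by section."""
    merged = {}
    for part in parts:
        for key, value in part.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Same section in two files: later entries override earlier ones.
                merged[key] = {**merged[key], **value}
            else:
                merged[key] = value
    return merged


# Hand-curated corpus section and script-generated pieces (illustrative data).
curated = {"corpus": {"id": "bach-fugues", "title": "Das wohltemperierte Klavier, Buch I"}}
generated = {"pieces": {"bwv846": {"opus": {}, "sources": []}}}
full = merge_corpus_parts(curated, generated)
print(sorted(full))  # ['corpus', 'pieces']
```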
The /metadata directory contains several examples, such as:

- minimal.json (minimal example, two pieces, scores, audios, analyses)
- openscore-lieder-corpus.json (corpus) and openscore-lieder.json (pieces, generated from data in another git)
- bach-fugues.json (with some templating yielding bach-fugues.full.json)
- mozart-piano-sonatas.json (with some templating yielding mozart-piano-sonatas.full.json)
- supra.json and weimar-jazz.json (only corpus, the upload of the pieces is directly handled by custom scripts)

For most baroque/classical/romantic music, the score is the reference, and there may be recordings. For jazz/pop, a recording is the reference, and there may be a score, as in several examples in edmus.json.

corpus Metadata

This section describes the corpus as a whole. This is used both on http://dezrann.net/corpora and on corpus pages such as http://dezrann.net/explore/mozart-piano-sonatas.

To draft a corpus, only id and title are
mandatory. The other fields will have to be carefully
written/reviewed/translated to prepare the release of a public
corpus.
id: Mandatory, unique identifier, such as
"bach-fugues", "corelli-trio-sonatas" or "schubert-winterreise".
🏳️ shorttitle: Title with < 25 characters, to be
displayed on http://dezrann.net/corpora and at other places with
reduced space
🏳️ motto: Short text (< 120 characters, nominal
sentence) advertising the corpus, including some number of the works
image: url of one image (recommended size: XXXpx XXXpx)
to illustrate the corpus
🏳️ title: Mandatory, title
🏳️ text: 3-6 lines presenting the corpus, with a
historical / musicological perspective (and not referring to Dezrann).
Links (in markdown) to Wikipedia or other sites are welcome.
🏳️ availability: (for the general public) 1-4 lines
detailing what is actually in Dezrann, including stating which content
is there (scores, autographs, analyses, synchronized audio), both
qualitatively and quantitatively.
status: (for the technical audience) short message
summarizing the main issues with this corpus, that will be displayed on
https://doc.dezrann.net/status
🏳️ contributors.*: see metadata.md#the-contributors-bloc.
Before a public release, the corpus should contain a
contributors.maintainer field.
showcase: a list of 1-4 ids of pieces in the corpus with
a particularly high quality (availability of sources, quality of those).
They will be randomly displayed/showcased in some places such as http://dezrann.net/corpora.
Optional external references ref*: see below
quality:corpus and quality:corpus:metadata:
See quality.md. If you just completed a draft
of corpus.json, start with both of these fields set to 1.
genre
opus: catalogue numbers or opus numbers
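
Some of the constraints listed above (length limits, showcase ids, maintainer before release) lend themselves to an automated check. The following is a minimal sketch under the assumption that the corpus section is already loaded as a dictionary. The function name `check_release_fields` and the piece id `bwv999` are hypothetical.

```python
def check_release_fields(corpus, piece_ids):
    """Check a few corpus fields before a public release; return a list of problems."""
    problems = []
    if len(corpus.get("shorttitle", "")) >= 25:
        problems.append("shorttitle should have fewer than 25 characters")
    if len(corpus.get("motto", "")) >= 120:
        problems.append("motto should have fewer than 120 characters")
    if "showcase" in corpus:
        showcase = corpus["showcase"]
        if not 1 <= len(showcase) <= 4:
            problems.append("showcase should list 1-4 piece ids")
        for piece_id in showcase:
            if piece_id not in piece_ids:
                problems.append(f"showcase refers to unknown piece: {piece_id}")
    if "maintainer" not in corpus.get("contributors", {}):
        problems.append("contributors.maintainer is missing")
    return problems


corpus = {
    "id": "bach-fugues",
    "title": "Das wohltemperierte Klavier, Buch I",
    "shorttitle": "Bach fugues",
    "showcase": ["bwv846", "bwv999"],
}
for problem in check_release_fields(corpus, {"bwv846", "bwv847"}):
    print(problem)
```

On this example the sketch reports the unknown showcase id and the missing maintainer.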

piece Metadata and Data

The piece data may be either static / fully written by hand, or produced by a script from some other data, and/or maintained with templates; see templating.md.

sources Data

At least one source has to be provided.

"sources": [
  // Scores (.MEI (preferred), .musicxml, .mscz, .krn), see scores.md
  {
    "score": "http://gitlab.com/bla/bli/02.krn",
    "license": "CC-BY-SA-4.0",
    "contributors": {
      "encoder": "Jane Foo",
      "editor": "KernScores"
    }
  },
  {
    "score": "http://bla.net/sonata-12.mscz",
    "measure-map": "http://bla.net/sonata-12.mm.json"
  },
  {
    "score": "vivaldi-04.mei"
  },
  // Scores from Neuma pipeline
  {
    "score:neuma": "all:collabscore:saintsaens-ref:C080_0"
  },
  {
    "score:gallica": "ark:/12148/bpt6k1162028r"
  },
  // Audio/video sources
  {
    "audio": "http://gitlab.com/bla/bli/bla-07.mp3",
    "source": "http://a-wonderful-open-music-project.org/",
    "license": "CC0-1.0"
  },
  {
    "video": "http://a-wonderful-open-music-project.org/my-video-07.mp4",
    "source": "http://a-wonderful-open-music-project.org/",
    "license": "CC0-1.0",
    "contributors": {
      "performer": "Clara Dee"
    },
    "info": "Studio recording"
  },
  {
    "video:yt": "df6DFfs",
    "contributors": {
      "performer": "Clara Dee"
    },
    "info": "Live recording at the Schnupz Concert Hall"
  },
  {
    "audio:yt": "df6DFfs"
  },
  {
    "audio:yt": "df6edfs",
    "synchro": "http://gitlab.com/bla/bli/synchro.json"
  },
  // Analyses that will be displayed in Dezrann
  {
    "analysis": "https://gitlab.com/foo/bar/haydn-symph099-mvt1.dez",
    "contributors": {
      "analyst": "Dinh-Viet-Toan Le, Francesco Maccarini"
    },
    "ref:doi": "10.1145/3543882.3543884"
  },
  // Special sources, with provided image(s) and position files
  // Any scan, with positions
  {
    "images": [ "http://bli/scan-07-page1.jpg", "http://bli/scan-07-page2.jpg" ],
    "positions": "positions-07.json"
  },
  {
    "images": [ "http://bli/scan-07.jpg" ],
    "positions": "positions-07.json"
  },
  // Scan of piano rolls (to be better specified)
  {
    "audio": "bla/bli07.mp3",
    "image": "bli/roll-07.jpg",
    "positions": "positions-07.json"
  },
  {
    "grid": "| D A | Bm D/F# | G D | Em7 A7"
  }
]

For each source, one (and only one) of these fields has to be provided:

- score (url or file)
- score:neuma or score:gallica
- video (url or file)
- video:yt
- audio (url or file), including videos that are… not real videos
- audio:yt
- analysis (url or file in .dez format)

These fields are optional:

- contributors (dictionary, see below)
- name (short name used to refer to this source and distinguish it from others. For example, for a score, it may correspond to contributors.encoder. For an audio, it is usually the same as contributors.performer or contributors.artist. But it can also be contributors.editor when relevant, for example to acknowledge an open-data project)
- info (short string, < 100 characters)
- license (SPDX identifier)
- source (url)
- album (string)

(Jazz: Concert place / Recording Label, to be detailed)

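
The "one (and only one)" rule can be expressed as a small check. This is a sketch rather than Dezrann's own validation. The name `primary_field` is hypothetical, and accepting `images` and `grid` as extra primary fields is an assumption made only because the sources example uses them without any field from the list.

```python
PRIMARY_FIELDS = {
    "score", "score:neuma", "score:gallica",
    "video", "video:yt", "audio", "audio:yt", "analysis",
}
# Assumption: special sources seen in the example, not in the documented list.
SPECIAL_FIELDS = {"images", "grid"}


def primary_field(source):
    """Return the single primary field of a source, or raise ValueError."""
    found = sorted(key for key in source if key in PRIMARY_FIELDS | SPECIAL_FIELDS)
    if len(found) != 1:
        raise ValueError(f"expected exactly one primary field, found {found}")
    return found[0]


print(primary_field({"audio:yt": "df6edfs", "synchro": "synchro.json"}))  # audio:yt
print(primary_field({"audio": "bli07.mp3", "image": "roll-07.jpg"}))      # audio
```

A source such as `{"score": "a.mei", "audio": "a.mp3"}` raises `ValueError`, since it mixes two primary fields.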
opus Piece Metadata

opus: Basic information

- (when applicable) corpus or collection
  - corpus for generic names such as "Mozart piano sonatas" (they do not bring more information than what is in piece:title and composer)
  - collection for data giving more information ("London symphonies", "Le quattro stagioni").

  The two fields may be different.
- (mandatory) id: Unique identifier, such as "bach/bwv847". Should include some opus information and be consistent with other identifiers used in Dezrann. (Note that there is also an id field outside opus; it will be removed at some point.)


opus: Titles

- (mandatory) 🏳️ title or piece:title: We follow the Open Opus style guide, except that we do not put the opus information (see opus below).
- (when applicable) 🏳️ nickname or piece:nickname
- (when applicable) movement:num and movement:title (in this case, we do not use title and nickname but rather piece:title and piece:nickname)
- (mandatory) opus: catalogue number or opus number, such as "opus": "K.551". There are sometimes numbers that are outside catalogue/opus numbers and that may be encoded as other fields, such as "symphony:num": "41" for Mozart’s Jupiter.


opus: Other information

- genre
- key: key/tonality of this file (movement, not piece), such as D minor or Bb major
- meter: such as 4/4 or 3/4, 6/8. More complex meters can be given through measure maps in the score source.

corpus, opus, and one of the sources

- year: year, or range of years

ref: External references

{
  "ref": "https://github.com/DCMLab/mozart_piano_sonatas/tree/main/scores",
  "ref:gallica": "ark:/12148/btv1b55002567w",
  "ref:neuma": "all:collabscore:saintsaens-ref:C080_0",
  "ref:doi": "10.1145/3543882.3543884",
  "ref:wikipedia": "Symphony_No._35_(Mozart)",
  "ref:musicbrainz": "611a55ef-cfb4-3bbf-9d65-7d4af5506093",
  "ref:musicbrainz:recording": "ed80fa40-4871-4669-9509-18bcfee420fd",
  "ref:imslp": "19_Sonatas_for_the_Piano_(Mozart,_Wolfgang_Amadeus)",
  "ref:kernscores": "mozart/sonata/sonata04-1.krn",
  "ref:rism": "990062490",
  "ref:wjd": 70
}

ref is any URL. The other ref: fields are identifiers on some sites/databases, as defined in RefsBase.ts.
When possible, put a reference to the particular movement. But it’s also
acceptable to put an external reference to a piece or even to the
collection.

It is strongly recommended to include as much information as possible. Specifically, for scores, it is advisable to include ref:rism to trace the primary source used by the encoder(s).
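
To illustrate how such identifiers can be turned into links, here is a small sketch covering only three kinds of references. The URL patterns below are assumptions based on the public sites (doi.org, gallica.bnf.fr, wikipedia.org) and are not taken from RefsBase.ts. The name `ref_links` is hypothetical, as is the choice of English Wikipedia when no language suffix is given.

```python
# Assumed URL patterns, NOT the definitions from RefsBase.ts.
URL_TEMPLATES = {
    "ref:doi": "https://doi.org/{}",
    "ref:gallica": "https://gallica.bnf.fr/{}",
}


def ref_links(meta):
    """Map the supported ref fields of a metadata dictionary to URLs."""
    links = {}
    for key, value in meta.items():
        parts = key.split(":")
        if key == "ref":
            links[key] = value  # already a URL
        elif parts[:2] == ["ref", "wikipedia"]:
            # "ref:wikipedia:fr" selects a language; default to "en" (assumption).
            lang = parts[2] if len(parts) == 3 else "en"
            links[key] = f"https://{lang}.wikipedia.org/wiki/{value}"
        elif key in URL_TEMPLATES:
            links[key] = URL_TEMPLATES[key].format(value)
    return links


refs = {
    "ref:gallica": "ark:/12148/btv1b55002567w",
    "ref:doi": "10.1145/3543882.3543884",
    "ref:wikipedia:fr": "Les_Quatre_Saisons",
    "ref:wjd": 70,
}
for key, url in ref_links(refs).items():
    print(key, url)
```

Fields without a known pattern, such as ref:wjd here, are simply skipped.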

contributors bloc

🏳️ One of the three following fields is usually defined:

- (for opus)
  - composer
  - artist (on pop music, when the music is frequently related to the artist rather than the composer)
  - performer (for transcription of jazz solos)
- (for an audio source)
  - performer

🏳️ The following fields can also be used:

- lyricist
- arranger
- editor (edition, publisher)
- encoder (digital encoding of an existing edition)
- transcriber (transcription of solos)
- analyst (supervision of analyses)
- annotator (annotation, following precise analytical guidelines)
- maintainer (long-term maintenance of the corpus or the piece)

In either corpus, piece, or source metadata, the 🏳️ fields can be localized, such as:

"collection": "Le quattro stagioni",
"collection:en": "The four seasons",
"collection:fr": "Les quatre saisons",
(...)
"ref:wikipedia:fr": "Les_Quatre_Saisons",

Do not put :xx for the title in the original language.

Note that even contributor names such as composers may be known with some variations across languages.
"contributors": {
"composer": "Johann Sebastian Bach",
"composer:fr": "Jean-Sébastien Bach"
}

Do not localize numeric or formalized fields such as
key, opus, or genre.
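
Reading a localized field then amounts to looking for the `:xx` variant first and falling back to the original-language value. A minimal sketch, in which the helper name `localized` is hypothetical:

```python
def localized(meta, field, lang):
    """Return meta["field:lang"] if present, else the unsuffixed original value."""
    return meta.get(f"{field}:{lang}", meta.get(field))


contributors = {
    "composer": "Johann Sebastian Bach",
    "composer:fr": "Jean-Sébastien Bach",
}
print(localized(contributors, "composer", "fr"))  # Jean-Sébastien Bach
print(localized(contributors, "composer", "en"))  # Johann Sebastian Bach
```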