» Dezrann corpus/developer documentation
A metadata/corpus.json file gathers all data/metadata
information on pieces into a corpus, that is:
corpus (corpus metadata)
opus (piece metadata) and
sources
You are welcome to discuss or to propose merge requests (MRs) to improve the definition and the handling of such metadata.
{
"corpus": {
// Corpus metadata [mandatory: at least id and title]
"id": "bach-fugues",
"title": "Das wohltemperierte Klavier, Buch I",
(...)
},
"pieces": {
"bwv846": {
// Piece metadata [mandatory, see below]
"opus": { ... },
// Sources (scores, audio, analyses, ...)
// [mandatory: at least one score/audio source]
"sources": [ ... ],
// Piece settings
"settings": { ... }
},
"bwv847": {
...
}
},
"settings": {
// Optional
"access": "public"
},
"template": {
// Optional
...
}
}
Files should be referenced by a URL (preferably to some git repository)
or a stable external identifier. They can also be referenced by a local
path (relative to the corpus.json file).
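The handling of local paths can be illustrated with a minimal sketch. The function name resolve_reference is hypothetical and not part of Dezrann. It assumes that anything starting with http:// or https:// is a URL and that everything else is a path relative to the directory of corpus.json; stable external identifiers (which appear under dedicated keys such as score:gallica) are outside its scope.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_reference(ref: str, corpus_json_path: str) -> str:
    """Return `ref` unchanged if it is an http(s) URL; otherwise resolve it
    as a path relative to the directory that contains corpus.json.

    Hypothetical helper, for illustration only.
    """
    if urlparse(ref).scheme in ("http", "https"):
        return ref
    base_dir = PurePosixPath(corpus_json_path).parent
    return str(base_dir / ref)
```

For example, with a corpus file at metadata/corpus.json, the local reference "vivaldi-04.mei" resolves to metadata/vivaldi-04.mei, while "http://bla.net/sonata-12.mscz" is returned as is.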
There can be optional settings specific to Dezrann or
any other application.
The metadata/corpus.json file can be split into several
files, for example when the corpus section is hand-curated
whereas the pieces are produced by a script.
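This section does not specify how Dezrann combines the split files. As a rough sketch only, assuming each file holds some of the top-level sections (corpus, pieces, settings, template) and that later files win on conflicts, the parts could be merged as follows. The function merge_corpus_parts is hypothetical.

```python
def merge_corpus_parts(*parts: dict) -> dict:
    """Combine several partial corpus dictionaries into one.

    Assumption: top-level sections that are dictionaries are merged key by
    key, and later parts override earlier ones on conflicts.
    """
    merged: dict = {}
    for part in parts:
        for section, content in part.items():
            if isinstance(content, dict) and isinstance(merged.get(section), dict):
                # Build a new dict so the input parts are never mutated.
                merged[section] = {**merged[section], **content}
            else:
                merged[section] = content
    return merged


# Hand-curated corpus section and script-generated pieces, already loaded.
hand_curated = {"corpus": {"id": "bach-fugues",
                           "title": "Das wohltemperierte Klavier, Buch I"}}
generated = {"pieces": {"bwv846": {}, "bwv847": {}}}
full = merge_corpus_parts(hand_curated, generated)
```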
The metadata on all public Dezrann corpora is gathered on https://algomus.fr/dezrann/dezrann-catalog.json.
The /metadata directory contains several examples, such as:
minimal.json (minimal example, two pieces, scores, audios, analyses)
openscore-lieder-corpus.json (corpus) and openscore-lieder.json (pieces, generated from data in another git repository)
bach-fugues.json (with some templating yielding bach-fugues.full.json)
mozart-piano-sonatas.json (with some templating yielding mozart-piano-sonatas.full.json)
supra.json and weimar-jazz.json (only corpus, the upload of the pieces is directly handled by custom scripts)
For most baroque/classical/romantic music, the score is the reference, and there may be recordings. For jazz/pop, a recording is the reference, and there may be a score, as in several examples in edmus.json.
corpus Metadata
This section describes the corpus as a whole. This is used both on http://dezrann.net/corpora and on corpus pages such as http://dezrann.net/explore/mozart-piano-sonatas.
To draft a corpus, only id and title are
mandatory. The other fields will have to be carefully
written/reviewed/translated to prepare the release of a public
corpus.
id: Mandatory, unique identifier, such as
"bach-fugues", "corelli-trio-sonatas" or "schubert-winterreise".
🏳️ shorttitle: Title with < 25 characters, to be
displayed on http://dezrann.net/corpora and at other places with
reduced space
🏳️ motto: Short text (< 120 characters, nominal
sentence) advertising the corpus, including the number of works
image: url of one image (recommended size: XXXpx XXXpx)
to illustrate the corpus
🏳️ title: Mandatory, title
🏳️ text: 3-6 lines presenting the corpus, with a
historical / musicological perspective (and not referring to Dezrann).
Links (in markdown) to Wikipedia or other sites are welcome.
🏳️ availability: (for the general public) 1-4 lines
detailing what is actually in Dezrann, including stating which content
is there (scores, autographs, analyses, synchronized audio), both
qualitatively and quantitatively.
status: (for the technical audience) short message
summarizing the main issues with this corpus, that will be displayed on
https://doc.dezrann.net/status
🏳️ contributors.*: see metadata.md#the-contributors-bloc.
Before a public release, the corpus should contain a
contributors.maintainer field.
showcase: a list of 1-4 ids of pieces in the corpus with
a particularly high quality (availability of sources, quality of those).
They will be randomly displayed/showcased in some places such as http://dezrann.net/corpora
Optional external references ref*: see below
quality:corpus and quality:corpus:metadata:
See quality.md. If you just completed a draft
of corpus.json, start by setting both of these fields to
1.
genre
opus: catalogue numbers or opus numbers
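The constraints listed above (mandatory id and title, length limits on shorttitle and motto, 1-4 showcase ids taken from the corpus) can be checked with a short sketch. The function check_corpus_metadata is hypothetical and only covers these rules; it is not Dezrann's own validation.

```python
def check_corpus_metadata(corpus: dict, piece_ids: set) -> list:
    """Return a list of human-readable problems found in the corpus section.

    `corpus` is the already-loaded "corpus" dictionary and `piece_ids` the
    keys of the "pieces" dictionary. Hypothetical helper, for illustration.
    """
    problems = []
    # Only id and title are mandatory to draft a corpus.
    for field in ("id", "title"):
        if not corpus.get(field):
            problems.append(f"missing mandatory field: {field}")
    # shorttitle: < 25 characters, motto: < 120 characters.
    for field, limit in {"shorttitle": 25, "motto": 120}.items():
        value = corpus.get(field)
        if value is not None and len(value) >= limit:
            problems.append(f"{field} should have fewer than {limit} characters")
    # showcase: a list of 1-4 ids of pieces in the corpus.
    showcase = corpus.get("showcase")
    if showcase is not None:
        if not 1 <= len(showcase) <= 4:
            problems.append("showcase should list 1-4 piece ids")
        for piece_id in showcase:
            if piece_id not in piece_ids:
                problems.append(f"showcase id not in corpus: {piece_id}")
    return problems
```

An empty list means that none of these particular rules is violated; it says nothing about the fields that need editorial review before a public release.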
piece Metadata and Data
The piece data may be static (fully written by hand), produced by a script from other data, and/or maintained with templates; see templating.md.
sources Data
At least one source has to be provided.
"sources": [
// Scores (.MEI (preferred), .musicxml, .mscz, .krn), see scores.md
{
"score": "http://gitlab.com/bla/bli/02.krn",
"license": "CC-BY-SA-4.0",
"contributors": {
"encoder": "Jane Foo",
"editor": "KernScores"
}
},
{
"score": "http://bla.net/sonata-12.mscz",
"measure-map": "http://bla.net/sonata-12.mm.json"
},
{
"id": "score-1",
"score": "vivaldi-04.mei"
},
// Scores from Neuma pipeline
{
"score:neuma": "all:collabscore:saintsaens-ref:C080_0"
},
{
"score:gallica": "ark:/12148/bpt6k1162028r"
},
// Audio/video sources
{
"audio": "http://gitlab.com/bla/bli/bla-07.mp3",
"source": "http://a-wonderful-open-music-project.org/",
"license": "CC0-1.0"
},
{
"video": "http://a-wonderful-open-music-project.org/my-video-07.mp4",
"source": "http://a-wonderful-open-music-project.org/",
"license": "CC0-1.0",
"contributors": {
"performer": "Clara Dee"
},
"info": "Studio recording"
},
{
"video:yt": "df6DFfs",
"contributors": {
"performer": "Clara Dee"
},
"info": "Live recording at the Schnupz Concert Hall"
},
{
"audio:yt": "df6DFfs",
"id": "audio-07"
},
{
"audio:yt": "df6edfs",
"synchro": "http://gitlab.com/bla/bli/synchro.json"
},
// Analyses (.dez format)
{
"analysis": "https://gitlab.com/foo/bar/haydn-symph099-mvt1.dez",
"contributors": {
"analyst": "Dinh-Viet-Toan Le, Francesco Maccarini"
},
"ref:doi": "10.1145/3543882.3543884"
},
// Special sources, with provided image(s) and position files
// Any scan (score, piano rolls...), with positions
{
"images": [ "http://bli/scan-07-page1.jpg", "http://bli/scan-07-page2.jpg" ],
"positions": "http://bli/positions-07.json"
},
// Image with some reference to audio/video
{
"image": "http://bla/spectral-analysis-07.jpg",
"positions": "http://bla/positions-07.json",
"ref-source": "audio-07"
},
// Image with some reference to a score
{
"images": [ "http://blu/scan-07.jpg" ],
"positions": "http://blu/positions-07.json",
"ref-source": "score-1"
},
// To be better specified
{
"grid": "| D A | Bm D/F# | G D | Em7 A7"
},
{
"karaoke": "http://bli/07.kar"
},
{
"tab": ""
}
]
For each source, one (and only one) of these fields has to be provided:
score (url)
score:neuma or score:gallica
video (url)
video:yt
audio (url), including videos that are… not real videos
audio:yt
image or images (url), together with positions
analysis (url), in .dez format
These fields are optional:
contributors (dictionary, see below)
name (short name used to refer to this source and
distinguish it from others. For example, for a score, it may correspond
to contributors.encoder. For an audio, it is usually the
same as contributors.performer or
contributors.artist. But it can also be
contributors.editor when relevant, for example to
acknowledge an open-data project)
info (short string, < 100 characters)
license (SPDX identifier)
source (url)
album (string)
(Jazz: Concert place / Recording Label, to be detailed)
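The "one and only one" rule can be sketched as a small check on a single source dictionary. The names PRIMARY_FIELDS and check_source are hypothetical. The sketch only knows the fields listed above, so the sources marked "To be better specified" in the example (grid, karaoke, tab) would be reported as having no primary field.

```python
# Fields of which exactly one is expected per source, as listed above.
PRIMARY_FIELDS = (
    "score", "score:neuma", "score:gallica",
    "video", "video:yt",
    "audio", "audio:yt",
    "image", "images",
    "analysis",
)


def check_source(source: dict) -> list:
    """Return a list of problems found in one entry of "sources".

    Hypothetical helper, for illustration only.
    """
    problems = []
    found = [field for field in PRIMARY_FIELDS if field in source]
    if len(found) != 1:
        problems.append(f"expected exactly one primary field, found {found}")
    # image/images come together with positions.
    if ("image" in source or "images" in source) and "positions" not in source:
        problems.append("image/images requires positions")
    # info is a short string of fewer than 100 characters.
    info = source.get("info")
    if info is not None and len(info) >= 100:
        problems.append("info should have fewer than 100 characters")
    return problems
```

For instance, {"id": "score-1", "score": "vivaldi-04.mei"} yields no problem, whereas a source with both "audio:yt" and "video:yt" is reported.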
opus Piece Metadata
opus: Basic information
(when applicable) corpus or collection
corpus for generic names such as
"Mozart piano sonatas" (they do not bring more information
than what is in piece:title and composer)
collection for data giving more information
("London symphonies", "Le quattro stagioni").
The two fields may be different.
(mandatory) id: Unique identifier, such as
"bach/bwv847". Should include some opus
information and be consistent with other identifiers used in Dezrann.
(Note that there is also an id field outside
opus, which will be removed at some point.)
opus: Titles
(mandatory) 🏳️ title or piece:title: We
follow the Open Opus
style guide, except that we do not put the opus information (see
opus below).
(when applicable) 🏳️ nickname or
piece:nickname
(when applicable) movement:num and
movement:title (in this case, we do not use
title and nickname but rather
piece:title and piece:nickname)
(mandatory) opus: catalogue number or opus number, such
as "opus": "K.551". There are sometimes numbers that are
outside catalogue/opus numbers and that may be encoded as other fields,
such as "symphony:num": "41" for Mozart’s Jupiter.
opus: Other information
genre
key: key/tonality of this file (movement, not piece),
such as D minor or Bb major
meter: such as 4/4,
3/4, or 6/8. More complex meters can be given through measure maps in
the score source.
corpus, opus, and one of the sources
year: year, or range of years
ref: External references
{
"ref": "https://github.com/DCMLab/mozart_piano_sonatas/tree/main/scores",
"ref:dezrann": "https://dezrann.net/~/mozart-piano-sonatas/k280.1",
"ref:corpus": "https://dezrann.net/~/mozart-piano-sonatas",
"ref:gallica": "ark:/12148/btv1b55002567w",
"ref:neuma": "all:collabscore:saintsaens-ref:C080_0",
"ref:doi": "10.1145/3543882.3543884",
"ref:wikipedia": "Symphony_No._35_(Mozart)",
"ref:musicbrainz": "611a55ef-cfb4-3bbf-9d65-7d4af5506093",
"ref:musicbrainz:recording": "ed80fa40-4871-4669-9509-18bcfee420fd",
"ref:imslp": "19_Sonatas_for_the_Piano_(Mozart,_Wolfgang_Amadeus)",
"ref:kernscores": "mozart/sonata/sonata04-1.krn",
"ref:rism": "990062490",
"ref:wjd": 70
}
ref is any URL. The other ref: fields are
identifiers on some sites/databases, as defined in RefsBase.ts.
When possible, put a reference to the particular movement. But it’s also
acceptable to put an external reference to a piece or even to the
collection.
It is strongly recommended to include as much information as
possible. Specifically, for scores, it is advisable to include
ref:rism to trace the primary source used by the
encoder(s).
ref:dezrann and ref:corpus are public URLs
to the very piece/corpus on Dezrann
contributors bloc
🏳️ One of the three following fields is usually defined:
(for opus)
composer
artist (on pop music, when the music is frequently
related to the artist rather than the composer)
performer (for transcription of jazz solos)
(for an audio source)
performer
🏳️ The following fields can also be used:
lyricist
arranger
editor (edition, publisher)
encoder (digital encoding of an existing edition)
transcriber (transcription of solos)
analyst (supervision of analyses)
annotator (annotation, following precise analytical guidelines)
maintainer (long-term maintenance of the corpus or the piece)
In either corpus, piece, or source metadata, the 🏳️ fields can be localized, such as:
"collection": "Le quattro stagioni",
"collection:en": "The four seasons",
"collection:fr": "Les quatre saisons",
(...)
"ref:wikipedia:fr": "Les_Quatre_Saisons",
Do not put :xx for the title in the original
language.
Note that even contributor names such as composers may be known with some variations across languages.
"contributors": {
"composer": "Johann Sebastian Bach",
"composer:fr": "Jean-Sébastien Bach"
}
Do not localize numeric or formalized fields such as
key, opus, or genre.
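The :xx suffix convention can be illustrated with a minimal lookup sketch. The function localized is hypothetical, and falling back to the unsuffixed (original-language) field when no translation exists is an assumption, not documented Dezrann behavior.

```python
def localized(metadata: dict, field: str, lang: str):
    """Return the value of `field` localized in `lang` (key "field:lang"),
    or the unsuffixed original-language value when no translation exists.

    Hypothetical helper; the fallback rule is an assumption.
    """
    return metadata.get(f"{field}:{lang}", metadata.get(field))


contributors = {
    "composer": "Johann Sebastian Bach",
    "composer:fr": "Jean-Sébastien Bach",
}
french_name = localized(contributors, "composer", "fr")
default_name = localized(contributors, "composer", "en")
```

Here french_name is the :fr variant, and default_name falls back to the unsuffixed value since no composer:en key is defined.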