» Dezrann corpus/developer documentation
This page provides the status of the Dezrann corpora, including
statistics on sources (scores, measure maps, audio/video content, and
synchronization). There is a particular emphasis on quality ratings for corpora and pieces.
Integrating multi-modal data from various sources is challenging. Rather
than claiming that the integration is perfect, the main objective here
is to accurately describe what is working, what is not, and to document
any issues encountered. There are also links to open issues on the dezrann-corpus
GitLab. While we prefer to have issues resolved, it is far better to
have open issues describing problems than to ignore them! You are
welcome to report issues and/or contribute to fixing them.
2025-05-01 20:42
🌏 Public corpora
bach-fugues » The Well-Tempered Clavier, Book I
- https://www.dezrann.net/explore/bach-fugues
- Bach’s Well-Tempered Clavier has been extensively studied, and
systematic analyses of Bach’s fugues have been published by Prout
(1910), Tovey (1924), Keller (1965), and Bruhn. Giraud et al.
(2015) published an annotation dataset detailing the 24 fugues of
the first book, together with algorithms for fugue analysis. The Dezrann
corpus contains these 24 annotated fugues, with scores synchronized to
open recordings by Kimiko Ishizaka as well as performances recorded by
the Netherlands Bach Society.
- ✅ The corpus is in good condition, with only minor issues. For the
majority of pieces, the scores are well presented, include analyses, and
are synchronized with two videos.
- Content
- 24 pieces
- 24 scores, 24 analyses, 24 recordings, 32 videos, 48
synchronizations
- On server: 48 audio, 48 wave, 24 score, 23 3d
- Quality: corpus: 4.0 [1], corpus:metadata:
4.0 [1], annotation: 0→5 (avg 4.8) [24], audio: 4.0 [24],
audio:synchro: 0→4 (avg
2.9) [24], metadata: 3.0
[24], musical-time: 3.0 [24], score: 0→3 (avg 2.9) [24]
- Metadata
(71 KB)
- License: ? (scores), CC-BY-3.0 (audio), ODbL-1.0 (annotations)
- Maintainers
- Issues: 9
opened, 9 closed
- Rebuilt: 2025-01-08 362M
mozart-piano-sonatas » Mozart Piano Sonatas
- https://www.dezrann.net/explore/mozart-piano-sonatas
- The corpus consists of complete scores of all 18 sonatas with form,
harmony, and cadence annotations (Hentschel et al., 2021).
Sonatas 1 (K279), 2 (K280) and 5 (K283) also have texture annotations
(Couturier et al.,
2022). Some movements also have synchronized audio. The corpus uses
measure maps (Gotham
et al., 2023) to improve annotation interoperability.
- ✅ The corpus is in good condition, with only minor issues. It is
fully reproducible from a long-term archive. For the majority of pieces,
the scores are well presented and include analyses. Some of them have
synchronized audio.
- Content
- 72 pieces
- 54 scores, 54 measure maps, 63 analyses, 56 recordings, 1 video, 56
synchronizations
- On server: 54 audio, 54 wave, 53 score, 53 3d
- Quality: corpus: 5.0 [1], corpus:metadata:
4.0 [1], annotation: 5.0
[54], audio: 4.0 [54], audio:synchro: 0→3 (avg 2.6) [54], metadata: 4.0 [54],
musical-time: 1→5 (avg
4.9) [53], score: 3.0
[54], warning: ? [0]
- Metadata
(132 KB)
- License: CC-BY-NC-SA-4.0 (scores), ODbL (annotations), CC0-1.0,
CC-BY-NC-SA-3.0 (specific recordings)
- Maintainers
- Issues: 10
opened, 3 closed
- Rebuilt: 2025-01-24 356M
mozart-string-quartets » Mozart String Quartets
- https://www.dezrann.net/explore/mozart-string-quartets
- The corpus currently presents 72 of the 86 movements. Cadence, key,
and form annotations are provided for some of these movements (mainly
first movements, in sonata form), as published in (Allegraud et al., 2019)
and (Feisthauer, 2021).
- ✅ The corpus is in good condition, with only minor issues. However,
score quality is uneven across the corpus, and some pieces are
not yet synchronized.
- Content
- 109 pieces
- 83 scores, 83 analyses, 84 recordings, 83 synchronizations
- On server: 72 score, 72 3d, 21 audio, 21 wave
- Quality: corpus: 3.0 [1], corpus:metadata:
4.0 [1], annotation: ??, audio: 0→3 (avg 0.7) [83], audio:synchro: 0→4 (avg 0.3) [83], metadata: 2.0 [83],
musical-time: 0.0 [83], score: ??
- Metadata
(215 KB)
- License: ? (scores), ODbL (annotations), CC-BY-NC-ND-3.0
(audio)
- Maintainers
- Issues: 5
opened, 1 closed
- Rebuilt: 2025-01-08 482M
classical-symphonies » 🚧 Classical and Early-Romantic Symphonies
- https://www.dezrann.net/explore/classical-symphonies
- The corpus includes first movements of 24 symphonies composed
between 1779 and 1824: the last six Haydn Symphonies (99–104), three
Mozart Symphonies (38–40), and all nine Beethoven Symphonies. These
movements are analyzed with textural annotations by (Le et al.,
2022). Audio recordings by the Bamberger Symphoniker (Mozart) and by
the Royal Philharmonic Orchestra (Haydn, Beethoven, 1960-61) will be
added soon.
- ⚠️ The experience with this corpus is not smooth due to numerous
performance issues when displaying the scores. Moreover, many pieces are
not synchronized with the audio recordings. However, the annotation data
of the corpus is in good condition.
- Content
- 48 pieces
- 24 scores, 24 analyses, 22 recordings, 6 synchronizations
- Quality: corpus: 2.0 [1], corpus:metadata:
4.0 [1], annotation: 3→5 (avg 3.5) [24], audio: ??, audio:synchro: 0→3 (avg 0.6) [24], metadata: 4→5 (avg 5.0) [24],
musical-time: ??, score: ??
- Metadata
(54 KB)
- License: CC-BY-NC-SA-4.0 (scores), ODbL (annotations), CC0-1.0,
CC-BY-NC-SA-3.0 (specific recordings)
- Maintainers
- Issues: 10
opened, 1 closed
- Rebuilt: 2025-01-09 1.2G
schubert-winterreise » Winterreise (Winter Journey)
- https://www.dezrann.net/explore/schubert-winterreise
- The Schubert Winterreise Dataset (SWD, Weiß 2021) contains, for
all of the 24 lieder, scores, harmonic and formal analyses, as well as
synchronized recordings. The free recordings with Gerhard Hüsch and
Hanns-Udo Müller (1933) and Randall Scarlata and Jeremy Denk (2006) are
available through Dezrann.
- The corpus is in good condition. Most scores are well presented,
with two synchronized recordings and analyses. ⚠️ However, some issues
remain on specific pieces.
- Content
- 24 pieces
- 24 scores, 24 measure maps, 24 analyses, 48 recordings, 48
synchronizations
- Quality: corpus: 3.0 [1], corpus:metadata:
4.0 [1], audio: 4.0
[24], audio:synchro: 4.0 [24]
- Metadata
(45 KB)
- License: CC-BY-3.0 (scores, annotations), PDM-1.0, CC-BY-NC-ND-3.0
(audio)
- Maintainers
- Issues: 13
opened, 3 closed
- Rebuilt: 2024-03-11 361M
openscore-lieder » 19th Century Lieder from female composers
- https://www.dezrann.net/explore/openscore-lieder
- The OpenScore Lieder corpus consists of over 1,300 songs from the
long nineteenth century. The collection is available to play online at
musescore.com and is also available for download. For more on the score
collection see (Gotham and Jonas 2021) or this magazine piece. This
Dezrann collection presents a subset of the scores by women composers,
including harmonic analyses published on the ‘When in Rome’ meta-corpus
reported in Gotham et al. 2023a. The corpus uses measure maps (Gotham et
al., 2023b) to improve annotation interoperability.
- ✅ The corpus is in good condition, with only minor issues. The
scores are well presented, and some of them include analyses and/or
synchronized recordings. Future plans include adding more analyses and
open recordings.
- Content
- 174 pieces
- 174 scores, 174 measure maps, 53 analyses, 32 recordings, 27
synchronizations
- On server: 170 score, 170 3d, 32 audio, 32 wave
- Quality: corpus: 3.0 [1], corpus:metadata:
4.0 [1], metadata: 3.0
[174], musical-time: 3.0 [174], score: 4.0 [174], annotation: 3.0
[53], audio: 4.0 [32], audio:synchro: 3.0 [27]
- Metadata
(178 KB)
- Issues: 5
opened, 3 closed
- Rebuilt: 2025-01-30 283M
weimar-jazz » 🚧 Weimar Jazz Database
- https://www.dezrann.net/explore/weimar-jazz
- Started at the University of Music in Weimar, the Jazzomat project studied the
jazz repertoire, in particular by transcribing and analyzing 400+ solos
and aligning them to recordings. The Dezrann corpus contains 330+ of
these high-quality jazz transcriptions, with chord, section, and form
annotations, of which 200+ have synchronized audio.
🚧 Note that work on this corpus is still ongoing; in particular, the
display of scores could be improved.
- 🚧 Work on this corpus is still ongoing to improve the integration
into Dezrann. The rendering of scores could be improved with a better
extraction from WJD internal data. The synchronization is sometimes
off.
- Content
- 456 pieces
- 456 scores, 456 measure maps, 456 analyses, 329 videos, 329
synchronizations
- On server: 333 score, 333 3d, 228 audio, 228 wave
- Quality: corpus: 2.0 [1], corpus:metadata:
4.0 [1], annotation: 3.0
[456], audio: 4.0 [456], musical-time: 2.0 [456], score: 3.0
[456]
- Metadata
(623 KB)
- Issues: 9
opened, 3 closed
- Rebuilt: 2025-01-29 1.5G
supra » SUPRA
- https://www.dezrann.net/explore/supra
- The Stanford University Piano Roll Archive (SUPRA) is a research portal
for some of the rolls digitized from the Stanford Libraries’ collection
of 15,000+ piano and organ rolls. SUPRA contains 456 Welte T-100 piano
rolls from the years 1905-1928 with rendered ‘expressive’ audio, taking
into account dynamics and tempo information. The Dezrann corpus presents
these 456 piano rolls aligned to the audio files.
- ✅ The corpus is in good condition, with only minor issues. Most
piano rolls are well presented, with synchronized audio. Future plans
include adding scores and analyses for some pieces.
- Content
- 456 pieces
- ⚠️ No sources ?
- On server: 456 audio, 456 wave
- Quality: corpus: 3.0 [1], corpus:metadata:
3.0 [1], musical-time: 0.0 [456], audio: 3.0 [456],
audio:synchro: 3.0 [456]
- Metadata
(314 KB)
- Issues: 7
opened, 4 closed
- Rebuilt: 2024-03-11 3.1G
slovenian-folk-song-ballads » Slovenian Folk Song Ballads
- https://www.dezrann.net/explore/slovenian-folk-song-ballads
- The Slovenian Folk Song Ballads collection contains transcribed
field material gathered by Slovenian ethnologists, folklorists,
ethnomusicologists, and various collaborators of the Glasbenonarodopisni
inštitut ZRC SAZU between 1819 and 1995. Thematically classified as
family ballads, it comprises 404 monophonic transcriptions of folk songs
and includes the opening line of the lyrics, extensive metadata, and a
musical analysis covering contours, harmony, and song structure (melody
and lyrics) (see Borsan et al., 2023). It also includes a limited number
(23) of available recordings. Editorial board of the collection: Vanessa
Nina Borsan, current members of the Glasbenonarodopisni inštitut ZRC
SAZU (Mojca Kovačič, Marjeta Pisk), and the Algomus research group.
- ✅ The corpus is in good condition, with only minor issues. It is
fully reproducible from a long-term archive. For the majority of pieces,
the scores are well presented and include analyses. A few of them have
synchronized recordings.
- Content
- 404 pieces
- 404 scores, 404 measure maps, 404 analyses, 404 recordings, 404
synchronizations
- On server: 404 score, 404 3d, 23 audio, 23 wave
- Quality: corpus: 5.0 [1], corpus:metadata:
4.0 [1], annotation: 4.0
[404], audio: 5.0 [404], audio:synchro: 3.0 [404], metadata: 4.0
[404], musical-time: 4.0 [404], score: 3.0 [404]
- Metadata
(718 KB)
- Issues: 5
opened, 5 closed
- Rebuilt: 2024-12-19 114M
erkomaishvili » Traditional Georgian Sacred Music sung by Artem Erkomaishvili
- https://www.dezrann.net/explore/erkomaishvili
- The Erkomaishvili dataset consists of historic tape recordings of
three-voice Georgian religious songs performed in 1966 by the master
chanter Artem
Erkomaishvili. Successive overdubbed recordings were made for each
song: top voice, then top and second voice, then the three voices
together. These recordings have been digitized, curated, and analyzed by
(Rosenzweig,
2020) for computational musicology research. The dataset includes
audio material, scores based on the transcriptions by (Shugliashvili,
2014), synchronizations, and F0 annotations. The Dezrann corpus contains
all 101 songs of the Erkomaishvili dataset, with scores
synchronized with the audio files.
- ✅ The corpus is in good condition, with only minor issues. For the
majority of pieces, the scores are well presented, with four
synchronized audio recordings.
- Content
- 101 pieces
- 101 scores, 404 recordings, 404 synchronizations
- On server: 404 audio, 404 wave, 101 score, 101 3d
- Quality: corpus: 3.0 [1], corpus:metadata:
4.0 [1], audio: 4.0
[101], audio:synchro: 4.0 [101]
- Metadata
(211 KB)
- Issues: 7
opened, 7 closed
- Rebuilt: 2025-01-31 341M
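
The Quality lines in the entries above pack several figures into one string. The following is a minimal sketch of how such a line might be turned into structured records. It assumes that `0→5 (avg 4.8) [24]` reads as minimum→maximum, average, and number of rated items, and that `?` or `??` marks a missing rating. These readings are inferred from this page and not from a documented schema, and the names `Rating` and `parse_quality` are hypothetical, not part of any Dezrann tool.

```python
import re
from typing import List, NamedTuple, Optional


class Rating(NamedTuple):
    """One quality entry; all numeric fields are None when the rating is '??'."""
    name: str
    low: Optional[float]
    high: Optional[float]
    avg: Optional[float]
    count: Optional[int]  # assumed: number of rated items (the bracketed value)


# Three assumed shapes: "0→5 (avg 4.8)", a single value "4.0", or "??".
_ENTRY = re.compile(
    r"""
    \b(?P<name>[a-z][a-z:-]*):\s+
    (?:
        (?P<low>\d+(?:\.\d+)?)→(?P<high>\d+(?:\.\d+)?)\s+\(avg\s+(?P<avg>\d+(?:\.\d+)?)\)
      | (?P<single>\d+(?:\.\d+)?)
      | (?P<unknown>\?+)
    )
    (?:\s+\[(?P<count>\d+)\])?
    """,
    re.VERBOSE,
)


def parse_quality(line: str) -> List[Rating]:
    """Parse a 'Quality: ...' line into a list of Rating records."""
    text = " ".join(line.split())  # undo line wrapping
    ratings = []
    for m in _ENTRY.finditer(text):
        if m["unknown"]:
            low = high = avg = None
        elif m["single"]:
            # A single value is treated as min, max, and average at once.
            low = high = avg = float(m["single"])
        else:
            low, high, avg = float(m["low"]), float(m["high"]), float(m["avg"])
        count = int(m["count"]) if m["count"] else None
        ratings.append(Rating(m["name"], low, high, avg, count))
    return ratings


if __name__ == "__main__":
    line = "Quality: corpus: 4.0 [1], annotation: 0→5 (avg 4.8) [24], score: ??"
    for rating in parse_quality(line):
        print(rating)
```

Treating a single value as both minimum and maximum keeps every record in the same shape, which makes it easier to compare corpora or to flag entries whose minimum is 0.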