Daniel de Lafeuille, Nouvelle Carte D’Italie - Nieuwe Kaart van Italien, 1706, Wikimedia Commons, Public Domain
Abstract: This data story presents a digital exploration of research data on the opera holdings of the Doria Pamphilj Archive, compiled by the Partitura project of the German Historical Institute in Rome (DHI Rome). Enriching the Partitura dataset with established authority sources such as Wikidata, RISM, and GeoNames, and transforming it into Linked Open Data (LOD), reveals new analytical insights into the historical and musicological dimensions of the opera collection. Leveraging services provided by NFDI4Culture and EOSC, the study exemplifies how data federation with European infrastructures can significantly enhance the interoperability of research data and create multimodal research perspectives. Methodologically, the story uses examples ranging from genre distribution analyses to geospatial mappings of opera premiere locations and music information retrieval through federated SPARQL queries.
Introduction
The Archivio Doria Pamphilj in Rome houses a significant yet largely unexplored collection of operatic materials dating from the 16th to the 19th centuries. Originally assembled between 1764 and 1777 by Giorgio Andrea IV Doria Landi Pamphilj (1747–1820), this archive includes approximately 300 bibliographic units encompassing sacred music from the 16th and 17th centuries, vocal and instrumental music from the 18th century, and printed music from the 19th century. Notably, the archive contains 27 complete opera scores, 21 collections of varied arias ("Arie diversi"), and 128 individual aria manuscripts from the late 18th and early 19th centuries.
Palazzo Doria Pamphilj, Rome / 1779; Wikimedia Commons, Public Domain
To facilitate scholarly access and promote deeper musicological research, the German Historical Institute in Rome (DHI) initiated the "Partitura Project," supported by the German Research Foundation (DFG) between 2008 and 2015 under the leadership of Dr Roland Pfeiffer. The project accomplished comprehensive digitisation of opera scores from both the Doria Pamphilj and the Massimo collections, resulting in a digital archive comprising approximately 115,000 images and a database of around 30,000 aria incipits.
Research questions
Our data story addresses two research questions:
firstly, whether selected datasets from the DHI can be effectively federated within European data spaces using data-driven methods, thereby generating novel insights that surpass the original data;
and secondly, how these federated data can be employed to unlock new research potentials and concrete scholarly advancements in music-historical studies.
Methodologically, the exploration was designed as a structured experiment, constrained to a 100-hour timeframe spread across four weeks. The investigation exclusively utilised cloud-based European infrastructures, notably those provided by NFDI4Culture and the European Open Science Cloud (EOSC).
A key aspect was the use of artificial intelligence and knowledge graphs, particularly for data curation, semantic enrichment, and assisted programming. Quality control was maintained through a "human-in-the-loop" approach, while data federation took place in real time during the analyses. The experiment aimed to demonstrate measurable improvements in data interoperability, enhanced discoverability and interpretative value, and new multimodal ways of interacting with historical datasets.
Envisaged data federation and methodological approach
Data preparation
Initial Dataset
Although openly accessible via the Partitura website, the initial metadata set presented significant challenges, such as inconsistent spellings, variations in dates, and non-standardised attribution of composers, operas, and arias. Additionally, the metadata utilised RISM identifiers but lacked further authoritative references or standardised interfaces for data exchange.
Challenges in the Partitura source data
Application of 5 Star Linked Open Data Principles
The data preparation followed Sir Tim Berners-Lee’s 5 Star Linked Open Data principles, progressing from basic availability online as openly licensed data, through structured and standardised formats, to the highest level of interlinking with external datasets using W3C standards (RDF and SPARQL). This process ensured robust semantic interoperability and facilitated scholarly reuse and citation of the data.
Step 1: Migration to relational database with REST API
As a preliminary step, the data were systematically transferred into NocoDB, a no-code database environment provided by NFDI4Culture. This enabled initial data disambiguation and created structured access via APIs suitable for automated processing. The resulting structured dataset consisted of clearly disambiguated entities.
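The disambiguation of inconsistent spellings mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual workflow: the function names and the sample spellings are assumptions introduced here, and the grouping only proposes clusters that a curator would still need to confirm.

```python
import unicodedata
from collections import defaultdict


def normalise_name(name: str) -> str:
    """Return a comparison key: diacritics removed, case folded, single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def group_variants(names):
    """Group raw spellings that share the same normalised key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalise_name(raw)].append(raw)
    return dict(groups)


# Hypothetical spellings, invented for illustration only.
raw_composers = [
    "Nicolò Piccinni",
    "nicolo  piccinni",
    "Nicoló Piccinni",
    "Pasquale Anfossi",
]

variants = group_variants(raw_composers)
for key, spellings in variants.items():
    print(key, "<-", spellings)
```

Each resulting key would correspond to one candidate entity record in the relational database, with the raw spellings retained as alternative labels.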
Step 2: Semi-automated enrichment with authority data
Subsequent enrichment involved semi-automated linking of entities with authority data sources. The accuracy achieved and the time required for each linking task were as follows:
| Entity Type | Linked Source | Accuracy (%) | Duration |
| --- | --- | --- | --- |
| Composers | Wikidata | 78.05 | 10 min |
| Composers | RISM | 95.12 | 10 min |
| Locations | Wikidata | 75.00 | 30 min |
| Theatres | Wikidata | 100.00 | 180 min |
| Inventory Entries | RISM | 19.15 | 20 min |
| Operatic Scores | RISM | 27.65 | 0.5 days |
| Additional curation | Manual | - | 1.5 days |
This enrichment process substantially improved the dataset’s scholarly value and interoperability.
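How a semi-automated linking pass and its accuracy figure might be computed is sketched below. This is an illustrative assumption rather than the procedure used in the experiment: the authority identifiers (auth:1 and so on) and labels are placeholders, the string similarity comes from Python's standard difflib module, and accuracy is taken here to mean the share of proposed links that a human reviewer confirms.

```python
from difflib import SequenceMatcher


def best_candidate(label, authority_labels, threshold=0.85):
    """Return (authority_id, score) for the closest label, or (None, score) below threshold."""
    best_id, best_score = None, 0.0
    for authority_id, authority_label in authority_labels.items():
        score = SequenceMatcher(None, label.casefold(), authority_label.casefold()).ratio()
        if score > best_score:
            best_id, best_score = authority_id, score
    if best_score < threshold:
        return None, best_score  # left for manual curation
    return best_id, best_score


def accuracy_percent(proposed, confirmed):
    """Percentage of local entities whose proposed link equals the confirmed link."""
    correct = sum(1 for key, link in proposed.items() if confirmed.get(key) == link)
    return round(100 * correct / len(proposed), 2)


# Placeholder authority records and local labels (hypothetical data).
authority = {
    "auth:1": "Niccolò Piccinni",
    "auth:2": "Pasquale Anfossi",
    "auth:3": "Giovanni Paisiello",
}
local_labels = ["Nicolo Piccinni", "Anfossi, Pasquale", "Giovanni Paisiello"]

proposed = {label: best_candidate(label, authority)[0] for label in local_labels}

# Links as confirmed by a human reviewer (the "human-in-the-loop" step).
confirmed = {
    "Nicolo Piccinni": "auth:1",
    "Anfossi, Pasquale": "auth:2",
    "Giovanni Paisiello": "auth:3",
}

print(proposed)
print(accuracy_percent(proposed, confirmed))
```

In this toy data the inverted form "Anfossi, Pasquale" falls below the threshold and is left unlinked, which is the kind of case that would be passed on to manual curation.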
Step 3: Application of the NFDIcore ontology
The structured data were then annotated using the NFDIcore ontology, which provided a standardised semantic framework specifically designed for representing scholarly data within the NFDI4Culture research data infrastructure. Applying this ontology ensured that metadata about operatic resources and related scholarly information could seamlessly integrate into federated research infrastructures.
NFDIcore and its different modules (Creator: Tabea Tietz)
Step 4: Transformation into Linked Open Data
Finally, the enriched and structured data were transformed into Linked Data (RDF triples) and stored in a triple store to facilitate federation and interoperability with other linked datasets. The resulting dataset was made publicly accessible via a dedicated SPARQL endpoint hosted on the LOD.ACADEMY infrastructure. This transformation allowed live querying and federated analyses, significantly enhancing scholarly accessibility and the potential for novel analytical perspectives.
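As a rough illustration of this transformation step, the sketch below serialises one record as N-Triples using only the Python standard library. The property and class IRIs (cto:Item, rdfs:label, schema:composer, cto:relatedItem) are those visible in the queries of this data story; the record structure, the function names, and all example.org IRIs are hypothetical, and the project's actual mapping may differ.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SCHEMA_COMPOSER = "http://schema.org/composer"
CTO = "https://nfdi4culture.de/ontology#"


def iri(value):
    """Wrap an IRI in angle brackets as required by N-Triples."""
    return f"<{value}>"


def literal(value):
    """Serialise a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_ntriples(record):
    """Turn one database record (a dict) into N-Triples lines."""
    subject = iri(record["iri"])
    triples = [
        (subject, iri(RDF_TYPE), iri(CTO + "Item")),
        (subject, iri(RDFS_LABEL), literal(record["label"])),
        (subject, iri(SCHEMA_COMPOSER), iri(record["composer_iri"])),
    ]
    for related in record.get("related_items", []):
        triples.append((subject, iri(CTO + "relatedItem"), iri(related)))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical record; every example.org IRI is a placeholder.
sample_record = {
    "iri": "https://example.org/partitura/1",
    "label": "Demofoonte",
    "composer_iri": "https://example.org/composer/1",
    "related_items": ["https://example.org/rism-source/1"],
}

ntriples = record_to_ntriples(sample_record)
print(ntriples)
```

Output of this kind could be loaded into any triple store that accepts N-Triples; a dedicated RDF library would be the more robust choice for a full dataset.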
Final structure of the data as Partitura Knowledge Graph
Data analysis
Analysis 01: Distribution of opera seria, opera buffa and oratorio in the collection
PREFIX rism: <http://rism.online/>
PREFIX ct: <http://data.linkedct.org/resource/linkedct/>
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX cto: <https://nfdi4culture.de/ontology#>
PREFIX partitura: <https://nocodb.nfdi4culture.de/dashboard/#/nc/pxjfen9oerev7k9/mj3w842gefzot7u?rowId=>
PREFIX rism: <https://rism.online/sources/>

SELECT (?partituraLabel as ?opera)
       (?composerLabel as ?composer)
       (SAMPLE(?locationLabel) as ?location)
       (SAMPLE(?year) as ?year)
       (SAMPLE(?rismItem) as ?rism)
       (SAMPLE(?partitura) as ?partitura)
WHERE {
  SERVICE <https://lod.academy/dhi-rom/data/partitura/sparql> {
    SELECT ?partitura ?partituraLabel ?composerLabel ?rismItem ?locationLabel ?year {
      ?partitura a cto:Item .
      ?partitura rdfs:label ?partituraLabel .
      ?partitura schema:composer ?composer .
      ?composer rdfs:label ?composerLabel .
      ?partitura cto:relatedLocation ?location .
      ?location rdfs:label ?locationLabel .
      ?partitura cto:relatedEvent ?event .
      ?event nfdicore:startDate ?year .
      ?partitura cto:relatedItem ?rismItem .
    }
  }
  # with a relation to Pietro Metastasio
  ?rismItem cto:relatedPerson <https://rism.online/people/97823> .
}
GROUP BY ?partituraLabel ?composerLabel
ORDER BY ?partituraLabel ?composerLabel
Example 07: All operas of Nicolo Piccinni in Doria Pamphilj
PREFIX sc: <http://purl.org/science/owl/sciencecommons/>
PREFIX ct: <http://data.linkedct.org/resource/linkedct/>
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX cto: <https://nfdi4culture.de/ontology#>
PREFIX partitura: <https://nocodb.nfdi4culture.de/dashboard/#/nc/pxjfen9oerev7k9/mj3w842gefzot7u?rowId=>

SELECT (?partituraLabel as ?opera)
       (?composerLabel as ?composer)
       (SAMPLE(?locationLabel) as ?locationLabel)
       (SAMPLE(?year) as ?year)
       (SAMPLE(?rismItem) as ?rismExample)
       (SAMPLE(?partitura) as ?partituraExample)
WHERE {
  SERVICE <https://lod.academy/dhi-rom/data/partitura/sparql> {
    SELECT ?partitura ?partituraLabel ?composerLabel ?rismItem ?locationLabel ?year {
      ?partitura a cto:Item .
      ?partitura rdfs:label ?partituraLabel .
      ?partitura schema:composer ?composer .
      ?composer rdfs:label ?composerLabel .
      ?partitura cto:relatedPerson <https://rism.online/people/17950> .
      ?partitura cto:relatedLocation ?location .
      ?location rdfs:label ?locationLabel .
      ?partitura cto:relatedEvent ?event .
      ?event nfdicore:startDate ?year .
      ?partitura cto:relatedItem ?rismItem .
    }
  }
  # with a relation to Pietro Metastasio
  ?rismItem cto:relatedPerson <https://rism.online/people/97823> .
}
GROUP BY ?partituraLabel ?composerLabel
ORDER BY ?partituraLabel ?composerLabel
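The two federated queries above differ chiefly in one additional triple pattern inside the SERVICE block, which restricts the results to a single person (presumably the composer named in the heading). A hedged sketch of how such queries could be generated and encoded for an HTTP request from Python follows. It simplifies the projection and omits the grouping, the helper names are assumptions, and https://example.org/sparql stands in for whichever federating endpoint executes the outer query, which the text does not name.

```python
from urllib.parse import urlencode

PARTITURA_ENDPOINT = "https://lod.academy/dhi-rom/data/partitura/sparql"
METASTASIO = "https://rism.online/people/97823"

PREFIXES = (
    "PREFIX schema: <http://schema.org/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX cto: <https://nfdi4culture.de/ontology#>\n"
)


def build_query(librettist_iri, composer_iri=None):
    """Build a simplified federated query; composer_iri adds the extra pattern."""
    inner = [
        "?partitura a cto:Item .",
        "?partitura rdfs:label ?partituraLabel .",
        "?partitura schema:composer ?composer .",
        "?composer rdfs:label ?composerLabel .",
        "?partitura cto:relatedItem ?rismItem .",
    ]
    if composer_iri is not None:
        inner.append(f"?partitura cto:relatedPerson <{composer_iri}> .")
    inner_block = "\n".join("      " + line for line in inner)
    return (
        f"{PREFIXES}"
        "SELECT ?partituraLabel ?composerLabel ?rismItem WHERE {\n"
        f"  SERVICE <{PARTITURA_ENDPOINT}> {{\n"
        "    SELECT ?partitura ?partituraLabel ?composerLabel ?rismItem {\n"
        f"{inner_block}\n"
        "    }\n"
        "  }\n"
        f"  ?rismItem cto:relatedPerson <{librettist_iri}> .\n"
        "}\n"
        "ORDER BY ?partituraLabel ?composerLabel\n"
    )


def request_url(endpoint, query):
    """Encode the query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


all_settings = build_query(METASTASIO)
one_person = build_query(METASTASIO, "https://rism.online/people/17950")

# Hypothetical federating endpoint; replace with a real SPARQL service.
url = request_url("https://example.org/sparql", one_person)
print(one_person)
```

Keeping the shared pattern in one place makes it easier to vary the person IRIs without copying the whole query for every example.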
Example 09: All works of Nicoló Piccinni with places of premiere
The digital data exploration experiment described in this data story has successfully demonstrated the scholarly value and feasibility of federating the operatic holdings of the Doria Pamphilj Archive with broader European data infrastructures. The data preparation process involved extensive semantic enrichment, ultimately achieving an average enrichment quotient of 64.3%. Starting from an original metadata table of 154 rows and 13 columns, the data evolved through two distinct integration stages: firstly, into a relational database comprising 2,849 records, and subsequently into a knowledge graph containing 3,400 semantic statements.
The technical implementation required approximately 1,400 lines of program code, notably supported by AI-assisted programming tools (co-piloting). Data preparation utilised 432 lines of code (around 90% AI-generated), data integration required 211 lines of code (also approximately 90% AI-assisted), and analysis scripts comprised 757 lines of code (around 30% AI-generated). Importantly, the federated data workflows leveraged infrastructures spanning four European countries: Germany, Italy, Poland, and Switzerland.
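The reported figures can be cross-checked with a short calculation. The overall AI-generated share computed below is a derived estimate, not a number stated in the experiment, and it inherits the imprecision of the approximate percentages given for each stage.

```python
# (lines of code, approximate AI-generated share) as reported in the text
stages = {
    "data preparation": (432, 0.90),
    "data integration": (211, 0.90),
    "analysis": (757, 0.30),
}

total_lines = sum(lines for lines, _ in stages.values())
ai_lines = sum(lines * share for lines, share in stages.values())
overall_share = ai_lines / total_lines

print(total_lines)               # 1400
print(f"{overall_share:.1%}")    # 57.6%
```

The three stage counts add up to the stated total of 1,400 lines, and on these figures somewhat more than half of the code would have been AI-generated.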
In broader terms, the experiment achieved its primary objectives by demonstrating the practical benefits of federating cultural data with European data spaces. The use of cloud-based European research infrastructures and linked data services (such as NFDI4Culture, EOSC, and related authority data services) proved highly productive. Artificial Intelligence, especially in data preprocessing and assisted programming tasks, significantly enhanced efficiency, although manual curation efforts remained unexpectedly high. Nevertheless, this manual work yielded additional exploratory analytical insights, suggesting a valuable symbiosis between automated and manual curation processes.
Finally, the project underscored the need for developing new quality assurance methods suitable for federated research data, an ongoing discussion within the research data community. The NFDI framework provides a vital platform for dialogue among researchers, funding bodies, and policymakers. Through federation with external sources like RISM, Corago, and NFDI4Culture, several promising new research perspectives on operatic and music-historical scholarship emerged, offering substantial avenues for future scholarly investigation.
Nicoló Piccinni, Demofoonte; Bibl. del Cons. di Musica S. Pietro a Majella / IMSLP, Public Domain