Best practices for Linked Data
Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Avda. Montepríncipe s/n, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
[email protected]
Phone: 34.91.3367417, Fax: 34.91.3524819
Acknowledgements: M. Poveda, V. Rodríguez-Doncel, D. Vila
BabeLData: TIN2010-17550
W3C @ Spain – 2013, Madrid, 18th December

Linked Data: why is it important?
• Facilitates data integration
§ From heterogeneous sources
§ In different formats
§ At different granularity
§ In different languages
§ From different countries
© Slide adapted from “5min Introduction to Linked Data” - Olaf Hartig
[Figure: databases (BD) AEMET, VIAF, BNE, IGN, Prisa and DBpedia feeding a data integration layer.]

Data Integration
[Figure: graph integrating data from BNE, VIAF and other sources around El Quijote / Don Quixote, M. Cervantes and Alcalá de Henares. Edge labels: author / creator, birthPlace, year of publication, located in, translated into, same as. Other labels: 1605, 1960, Hebrew, guide, Tapas, Siglo de Oro, temperature 20º.]

Foundations
• Unique identifiers: URIs identify or name a resource
• RDF(S) models
• Equivalence links to other datasets (Same As)
• Data navigation
http://www.w3.org/DesignIssues/LinkedData.html
[Figure: a Person "is creator of" a Work; Cervantes "is a" Person and El Quijote "is a" Work; the Cervantes resource is connected by Same As links to its counterparts in other datasets.]
URIs in the figure:
§ http://iflastandards.info/ns/fr/frbr/frbrer/C1001
§ http://iflastandards.info/ns/fr/frbr/frbrer/C1005
§ http://datos.bne.es/resource/XX1718747 (Cervantes)
§ http://datos.bne.es/resource/XX3383563 (El Quijote)
§ http://viaf.org/viaf/17220427 (Cervantes)
§ http://dbpedia.org/resource/Miguel_de_Cervantes (Cervantes)

The model (ontology) and the data for humans
[Figure, ontology level: Person, Work, Place, Library, Year, Language. Relations: is creator of, birthPlace, publication date, located at, has subject, translation.]
[Figure, data level (BNE): Cervantes, El Quijote, Alcalá de Henares, 1960, Catalan, Vida de Cervantes. Relations: is creator of, birthPlace, publication date, has subject, located in, translation.]

The model and the data for machines
[Figure: the same ontology and data, with classes and resources identified by URIs.]
Ontology level:
§ Language: http://iflastandards.info/ns/fr/frbr/frbrer/C1002
§ Work: http://iflastandards.info/ns/fr/frbr/frbrer/C1001
§ Person: http://iflastandards.info/ns/fr/frbr/frbrer/C1005
§ http://geo.linkeddata.es/ontology/Municipio
§ Library: http://xmlns.com/foaf/0.1/Organization
§ Relations: translation, is creator of, year, publication date, birthPlace, has subject, located in
Data level:
§ Catalan: http://datos.bne.es/resource/XX1924295
§ http://geo.linkeddata.es/resource/Alcalá de Henares
§ Don Quijote de la Mancha (1960): http://datos.bne.es/resource/XX3383563
§ Cervantes Saavedra, Miguel de: http://datos.bne.es/resource/XX1718747
§ http://datos.bne.es/resource/bimo0002045496
§ http://datos.bne.es/#
§ Other labels: BNE, Vida de Miguel de Cervantes Saavedra
§ Relations: translation, is author of, birthPlace, publication date, has subject, located in

Linked Data is to be processed by machines

The generation process
[Figure labels: Providers, Domains, Sources, Languages]

The Linked Data Generation Process
[Figure: activities Specification, Modelling, Generation, Linking, Publication, Exploitation and Data Curation.]

There is no One-Size-Fits-All Formula

A lot of data in many domains …
§ Music
§ On-line activities
§ Publications
§ E-Gov
§ Cross-domains
§ Geographic
§ Life Sciences

I want to use Linked Open Data
§ Who generated the LD dataset?
§ When was the LD dataset created?
§ How was the LD dataset created?
§ Is it the latest version of the LD dataset?
§ Is the license information clearly stated in the LD dataset?
§ How are LD licenses offered?
§ Is the LD dataset monolingual or multilingual?

LOD observations
• How does the LD generation process influence the use of the data by third parties?
• Vocabularies
• Licenses
• Language
• Provenance

How to prevent GIGO (garbage in, garbage out)
[Figure labels: GARBAGE, PROCESS]

Vocabularies

Cervantes at the data level
[Figure: five URIs that all name something called "Cervantes", with Same As links among them and different data attached to each.]
§ http://www.server1.org/resource/Cervantes
§ http://d-nb.info/gnd/11851993X
§ http://datos.bne.es/resource/XX1718747
§ http://www.server2.es/resource/Cervantes
§ http://geo.linkeddata.es/page/resource/Municipio/Cervantes
Attached data: Author: D. Quijote; Date of Birth: 1547; Phone: 914 296 093; #People: 1547; Size: 276,4 km²

Cervantes and a bit of semantics
[Figure: the same five URIs, now typed with rdf:type.]
§ http://www.server1.org/resource/Cervantes rdf:type Restaurant
§ http://d-nb.info/gnd/11851993X rdf:type Person
§ http://datos.bne.es/resource/XX1718747 rdf:type Person (Same As the previous resource; Author: D. Quijote; Date of Birth: 1547)
§ http://www.server2.es/resource/Cervantes rdf:type Street
§ http://geo.linkeddata.es/page/resource/Municipio/Cervantes rdf:type Municipality

Cervantes (Person)
[Figure: the FOAF vocabulary and an instance describing Cervantes; instanceOf links connect each instance to its class.]
Classes: owl:Thing, foaf:Agent, foaf:Group, foaf:Organization, foaf:Person, foaf:Document, foaf:Image
Properties: foaf:mbox, foaf:publications, foaf:firstName, foaf:surname, foaf:img, foaf:birthday, foaf:knows, foaf:depiction, foaf:homepage
Instance bibliothek:Cervantes:
§ foaf:firstName “Miguel”
§ foaf:surname “de Cervantes Saavedra”
§ foaf:birthday “29-09”
§ Linked through foaf:img, foaf:depiction and foaf:publications:
  http://www.BibliothekBerlin/…/images/Quixote.tif
  http://.../authors/cervantes.png
  http://www.BibliothekBerlin.com/.../3-538-06892-5

License Information

LOD observations: Licenses
How Open is the Open Linked Data Cloud?
An example: the British National Bibliography
• License information is not up to date
• Metadata without license information
• License information provided as XML

Linked Data Rights pattern
http://oeg-dev.dia.fi.upm.es/licensius/static/ldr/

Language

Rationale: LOD is dominated by the English language
§ 2007
§ 2009
§ 2013
Questions:
1. Searching resources in a particular language
2. Distribution of natural languages across RDF datasets?
3. Usage of language tags to indicate the natural language of RDF tags?
   1. Distribution of usage of language tags
   2. Distribution of literals tagged as English vs other languages
   3. Distribution of literals tagged in languages other than English

Example of multilingual library resource
The dataset publisher does not tag the language of the content of different fields: “Ernest Hemingway” and “El viejo y el mar” (MARC 21 records).

Multilingualism and the Linked Data Process

How to represent language information for datasets?

    # VoID description
    :bne a void:Dataset;
        dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .

    # DCAT description
    :bne a dcat:Dataset;
        dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .

How to represent language information in Linked Data?
§ Traditional annotation properties for most cases

    dbpedia:Miguel_de_Cervantes rdfs:label "Miguel de Cervantes"@es ,
        "ミゲル・デ・セルバンテス"@ja ,
        "미겔 데 세르반테스"@ko .

§ Richer models for more demanding applications

    # LEMON
    isbd:T1001 lemon:isReferenceOf [lemon:isSenseOf :cartographic] .
    :cartographic a lemon:LexicalEntry;
        lemon:form [lemon:writtenRep "cartográfico"@es;
                    isocat:grammaticalGender isocat:masculine];
        lemon:form [lemon:writtenRep "cartográfica"@es;
                    isocat:grammaticalGender isocat:feminine] .
    isocat:grammaticalGender rdfs:subPropertyOf lemon:property .
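The language-tag questions raised on the slides (how many literals carry a tag, and which languages appear) can be explored with a small script. The following is a minimal sketch, not the method used for the figures in this talk: it assumes single-line quoted literals and uses a simplified regular expression rather than a full RDF parser. The `:book` resource, the `UNTAGGED` key and the function name are hypothetical and introduced only for illustration.

```python
import re
from collections import Counter

# Simplified pattern for a quoted RDF literal followed by an optional
# language tag (@es) or datatype (^^xsd:integer).
# Assumption: literals use single-line "..." quoting; triple-quoted
# Turtle strings are not handled by this sketch.
LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"'                    # the quoted lexical form
    r'(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*)'    # optional language tag
    r'|\^\^(\S+))?'                         # or optional datatype
)

UNTAGGED = "(untagged)"  # hypothetical key for plain literals without a tag


def count_language_tags(rdf_text):
    """Count plain literals per language tag in Turtle/N-Triples text."""
    counts = Counter()
    for match in LITERAL.finditer(rdf_text):
        lang, datatype = match.group(1), match.group(2)
        if datatype is not None:
            continue  # typed literals cannot carry a language tag
        counts[lang.lower() if lang else UNTAGGED] += 1
    return counts


# Labels taken from the slide, plus one untagged title on a hypothetical
# :book resource to mirror the untagged library record example.
SAMPLE = '''
dbpedia:Miguel_de_Cervantes rdfs:label "Miguel de Cervantes"@es ,
    "ミゲル・デ・セルバンテス"@ja ,
    "미겔 데 세르반테스"@ko .
:book dcterms:title "El viejo y el mar" .
'''

counts = count_language_tags(SAMPLE)
for tag, number in sorted(counts.items()):
    print(tag, number)
```

For production use, a real RDF parser would be a safer choice than a regular expression, since it also handles multi-line strings and escapes defined by the Turtle grammar.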
Implementation of the recording of data and metadata provenance
• Generation process: PROV-O @W3C
• Resource provenance: DC
[Figure: File.txt with creator John, creationDate 12-2-1900 and rights GPL; a Revision Process linked to File.txt by "used"; Filev1.txt linked to the process by "generatedBy"; a provenance model (RDF(S)) and an RDF store.]

Conclusions
The use of
§ Curated data
§ Widely known vocabularies
§ License metadata in RDF
§ Language metadata in RDF
§ Provenance metadata in RDF
will influence the use of the linked data by third parties.

Thanks for your attention!

Asunción Gómez-Pérez, Guidelines for Multilingual Linked Data. WIMS – 2013, Madrid, 12-14 June

There is no One-Size-Fits-All Formula
[Table: one column per dataset (BNE, PRISA, INE, AEMET, IGN) and one row per phase (Modeling, RDF generation, Links generation, Publication, Exploitation).]
§ Vocabularies named in the table: DC, Wgs84, time, Scovo, SSN ontology, SIOC, Data cube, hydrontology
§ Tools named in the table: geometry2rdf, NOR2O, MARiMbA, CSV parser, Silk, Pubby, sitemap4rdf, SPARQL, map4rdf
§ Datasets named in the table: DNB, VIAF, LIBRIS, DBPEDIA, Geonames, Geolinkeddata.es
http://oa.upm.es/14465/1/2.formulaLD.pdf

The multilingual Web of Data: Current state
1. Number of monolingual and multilingual datasets
[Chart: series "Monolingual datasets" and "Multilingual datasets" for January 2012, June 2012 and December 2012. Values shown: 349; 635; 1,906; 2,201; 676; 1,984.]
2. Current usage of language tagging capabilities in RDF
[Chart: series "RDF literals with language tag" and "RDF literals without language tag" for January 2012, June 2012 and December 2012. Values shown: 2,567,324; 10,250,936; 3,154,779; 3,365,930; 10,594,338; 12,272,806.]
3. English tags versus other languages' tags
[Chart: series "RDF literals with English tag" and "RDF literals with other language tag" for January 2012, June 2012 and December 2012. Values shown: 431,660; 403,714; 557,785; 2,135,664; 2,751,065; 2,808,145.]
4. Evolution of top-10 languages