Best practices for Linked Data

Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Avda. Montepríncipe s/n, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
[email protected]
Phone: 34.91.3367417, Fax: 34.91.3524819
Acknowledgements:
M. Poveda, V. Rodríguez-Doncel, D. Vila
BabeLData: TIN2010-17550
Linked Data: why is it important?
•  Facilitate data integration
§  From heterogeneous sources
§  In different formats
§  Different granularity
§  In different languages
§  From different countries
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Asunción Gómez-Pérez
W3C @ Spain – 2013 Madrid, 18th December
[Figure: separate databases (BD) held by AEMET, VIAF, BNE, IGN, Prisa and DBpedia]
Data Integration
[Figure: graph that integrates BNE and VIAF descriptions of the same entities through "Same as" links, with labels in Spanish and English]
Node labels: El Quijote, Don Quixote, M. Cervantes, Alcalá de Henares, 1605, 1960, Hebrew, guía (guide), Tapas Siglo de Oro
Edge labels: Autor (author), creator, Año de Publicación (year of publication), Year of publication, Ubicado en (located in), located, birthPlace, Translated into, Same as
Other value shown: Temperatura (temperature) 20º
Foundations
Unique identifiers (URIs): identify or name a resource
RDF(S) models
Equivalence links to other datasets: Same As
Data navigation
[Figure: RDF graph about Cervantes and El Quijote]
Work: http://iflastandards.info/ns/fr/frbr/frbrer/C1001
Person: http://iflastandards.info/ns/fr/frbr/frbrer/C1005
Cervantes (http://datos.bne.es/resource/XX1718747) is a Person
El Quijote (http://datos.bne.es/resource/XX3383563) is a Work
Cervantes is creator of El Quijote
Cervantes Same As http://viaf.org/viaf/17220427
Cervantes Same As http://dbpedia.org/resource/Miguel_de_Cervantes
Reference: http://www.w3.org/DesignIssues/LinkedData.html
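The foundations listed on this slide (URIs as names, RDF statements, equivalence links, navigation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how datos.bne.es is implemented: statements are plain tuples, and the predicate names `rdf:type`, `isCreatorOf` and `owl:sameAs` are shorthand chosen here for the slide labels "Is a", "Is creator of" and "Same As".

```python
# Statements as (subject, predicate, object) tuples. The URIs come from the
# slide; the predicate names are shorthand assumed for this sketch.
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
BNE = "http://datos.bne.es/resource/"

CERVANTES = BNE + "XX1718747"
QUIJOTE = BNE + "XX3383563"
VIAF_CERVANTES = "http://viaf.org/viaf/17220427"
DBPEDIA_CERVANTES = "http://dbpedia.org/resource/Miguel_de_Cervantes"

TRIPLES = [
    (CERVANTES, "rdf:type", FRBR + "C1005"),  # Person
    (QUIJOTE, "rdf:type", FRBR + "C1001"),    # Work
    (CERVANTES, "isCreatorOf", QUIJOTE),
    (CERVANTES, "owl:sameAs", VIAF_CERVANTES),
    (CERVANTES, "owl:sameAs", DBPEDIA_CERVANTES),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def equivalents(uri, triples=TRIPLES):
    """Follow owl:sameAs links in both directions, transitively."""
    seen, pending = {uri}, [uri]
    while pending:
        current = pending.pop()
        for s, p, o in triples:
            if p != "owl:sameAs":
                continue
            if s == current:
                neighbour = o
            elif o == current:
                neighbour = s
            else:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                pending.append(neighbour)
    return seen


print(objects(CERVANTES, "isCreatorOf"))
print(sorted(equivalents(VIAF_CERVANTES)))
```

Starting from the VIAF identifier, `equivalents` reaches the BNE and DBpedia identifiers as well, which is the "data navigation" the slide refers to.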
The model (Ontology) and the data for humans
[Figure: an ontology and the data that instantiates it, shown with human-readable labels]
Ontology: Person, Work, Place, Library, Language, Year
Relations: Is creator of, birthPlace, translation, Publication date, Located in, Has subject
Data: Cervantes, El Quijote, Vida de Cervantes, Alcalá de Henares, BNE, Catalan, 1960
The model and the data for Machines
[Figure: the same ontology and data, with the labels backed by URIs]
Ontology:
Work: http://iflastandards.info/ns/fr/frbr/frbrer/C1001
Person: http://iflastandards.info/ns/fr/frbr/frbrer/C1005
Place (municipality): http://geo.linkeddata.es/ontology/Municipio
Library: http://xmlns.com/foaf/0.1/Organization
Also shown: Language, Year, http://iflastandards.info/ns/fr/frbr/frbrer/C1002
Relations: Is creator of, birthPlace, translation, Publication date, Located in, Has subject
Data:
Cervantes Saavedra, Miguel de: http://datos.bne.es/resource/XX1718747
Don Quijote de la Mancha: http://datos.bne.es/resource/XX3383563
Vida de Miguel de Cervantes Saavedra: http://datos.bne.es/resource/bimo0002045496
Alcalá de Henares: http://geo.linkeddata.es/resource/Alcalá de Henares
Also shown: BNE, Catalan, 1960, http://datos.bne.es/resource/XX1924295, http://datos.bne.es/#
Linked Data is to be processed by machines
The generation process
Providers
Domains
Sources
Languages
The Linked Data Generation Process
[Figure: the phases of the process]
Specification
Modelling
Generation
Linking
Publication
Exploitation
Data Curation
There is no One-Size-Fits-All Formula
Lots of data in many domains …
Music
On-line activities
Publications
E-Gov
Cross-domains
Geographic
Life Sciences
I want to use Linked Open Data
§  Who generated the LD dataset?
§  When was the LD dataset created?
§  How was the LD dataset created?
§  Is this the latest version of the LD dataset?
§  Is the license information clearly stated in the LD dataset?
§  How are LD licenses offered?
§  Is the LD dataset monolingual or multilingual?
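Most of these questions can be asked of a dataset description mechanically. The sketch below pairs each question with one metadata property and reports which ones a description leaves unanswered. The pairing of questions to properties is an assumption of this example, and the `example` description is invented for illustration.

```python
# Each question is paired with a metadata property that could answer it.
# The pairing is an assumption made for this sketch.
CHECKLIST = {
    "Who generated the dataset?": "dcterms:creator",
    "When was it created?": "dcterms:created",
    "How was it created?": "prov:wasGeneratedBy",
    "Is this the latest version?": "dcterms:modified",
    "Is the license stated?": "dcterms:license",
    "Which languages does it use?": "dcterms:language",
}


def unanswered(description):
    """Return the questions whose property is missing or empty."""
    return [q for q, prop in CHECKLIST.items() if not description.get(prop)]


# Hypothetical description of a dataset; every value here is made up.
example = {
    "dcterms:creator": "http://example.org/org/library",
    "dcterms:created": "2013-12-18",
    "dcterms:language": ["http://id.loc.gov/vocabulary/iso639-1/es"],
}

for question in unanswered(example):
    print("missing:", question)
```

A check like this only shows that a property is present; whether its value is current or correct still needs human review.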
LOD observations
•  How does the LD generation process influence the use of the data by third parties?
•  Vocabularies
•  Licenses
•  Language
•  Provenance
How to prevent GIGO (garbage in, garbage out)
[Figure labels: GARBAGE, PROCESS]
Vocabularies
Cervantes at the data level
[Figure: five resources labelled "Cervantes", each with its own URI, connected by "Same as" links]
http://www.server1.org/resource/Cervantes
http://d-nb.info/gnd/11851993X
http://datos.bne.es/resource/XX1718747
http://www.server2.es/resource/Cervantes
http://geo.linkeddata.es/page/resource/Municipio/Cervantes
Properties and values shown: Author: D. Quijote; Date of Birth: 1547; Phone: 914 296 093; #People: 1547; Size: 276,4 km²
Cervantes and a bit of semantics
[Figure: the same five URIs, each now given an rdf:type]
Types shown: Restaurant, Street, Person, Municipality
http://www.server1.org/resource/Cervantes
http://www.server2.es/resource/Cervantes
http://d-nb.info/gnd/11851993X rdf:type Person
http://datos.bne.es/resource/XX1718747 rdf:type Person (Author: D. Quijote; Date of Birth: 1547)
http://geo.linkeddata.es/page/resource/Municipio/Cervantes rdf:type Municipality
Only the two Person resources remain linked by Same as
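The contrast between the last two slides can be sketched as a filter on candidate links: a shared label alone pairs every "Cervantes" with every other, while also requiring the same rdf:type leaves only the two Person resources. This is a minimal sketch, not a linking tool. The URIs come from the slides, but which of the two example servers describes the restaurant and which the street is assumed here.

```python
from itertools import combinations

# (URI, label, type). Which example server hosts the restaurant and which
# the street is an assumption of this sketch.
RESOURCES = [
    ("http://www.server1.org/resource/Cervantes", "Cervantes", "Restaurant"),
    ("http://d-nb.info/gnd/11851993X", "Cervantes", "Person"),
    ("http://datos.bne.es/resource/XX1718747", "Cervantes", "Person"),
    ("http://www.server2.es/resource/Cervantes", "Cervantes", "Street"),
    ("http://geo.linkeddata.es/page/resource/Municipio/Cervantes",
     "Cervantes", "Municipality"),
]


def candidate_links(resources, use_types):
    """Pair up resources that share a label and, optionally, a type."""
    links = []
    for first, second in combinations(resources, 2):
        uri_a, label_a, type_a = first
        uri_b, label_b, type_b = second
        if label_a != label_b:
            continue
        if use_types and type_a != type_b:
            continue
        links.append((uri_a, uri_b))
    return links


label_only = candidate_links(RESOURCES, use_types=False)
with_types = candidate_links(RESOURCES, use_types=True)
print(len(label_only), "candidate links by label alone")
print(len(with_types), "candidate links once rdf:type is checked")
```

Agreement on type is a necessary filter rather than proof of identity: two different people can share a name, so further properties such as date of birth would still need comparing.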
Cervantes (Person): FOAF
[Figure: the FOAF vocabulary and an instance that uses it]
Classes: owl:Thing, foaf:Agent, foaf:Person, foaf:Group, foaf:Organization, foaf:Document, foaf:Image
Properties: foaf:firstName, foaf:surname, foaf:birthday, foaf:mbox, foaf:homepage, foaf:knows, foaf:img, foaf:depiction, foaf:publications
Instance bibliothek:Cervantes:
foaf:firstName "Miguel"
foaf:surname "de Cervantes Saavedra"
foaf:birthday "29-09"
Other instances, linked through foaf:img, foaf:depiction and foaf:publications:
http://www.BibliothekBerlin/…/images/Quixote.tif
http://.../authors/cervantes.png
http://www.BibliothekBerlin.com/.../3-538-06892-5
License Information
LOD observations: Licenses
How Open is the Open Linked Data Cloud?
An example: the British National Bibliography
License Information is not up to date
Metadata information without license information
License information provided as XML
Linked Data Rights pattern
http://oeg-dev.dia.fi.upm.es/licensius/static/ldr/
Language
Rationale: LOD is dominated by the English Language
§  2007
§  2009
§  2013
Questions:
1.  Searching resources in a particular language
2.  Distribution of natural languages across RDF datasets?
3.  Usage of language tags to indicate the natural language of RDF literals?
    1.  Distribution of usage of language tags
    2.  Distribution of literals tagged as English vs other languages
    3.  Distribution of literals tagged in languages other than English
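The three distributions under question 3 amount to classifying each literal by its language tag. A minimal sketch of that count follows, assuming literals arrive as strings in Turtle or N-Triples style (`"text"@lang`); typed literals are out of scope and rejected. The sample literals are illustrative only.

```python
import re
from collections import Counter

# A quoted string, optionally followed by @ and a language tag.
LITERAL = re.compile(
    r'^"(?P<text>.*)"(?:@(?P<lang>[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*))?$',
    re.DOTALL,
)


def language_of(literal):
    """Return the lower-cased language tag, or None if the literal is untagged."""
    match = LITERAL.match(literal)
    if match is None:
        raise ValueError(f"not a plain or language-tagged literal: {literal!r}")
    lang = match.group("lang")
    return lang.lower() if lang else None


def tag_statistics(literals):
    """Count untagged, English-tagged and other-tagged literals."""
    counts = Counter()
    for literal in literals:
        lang = language_of(literal)
        if lang is None:
            counts["untagged"] += 1
        elif lang == "en" or lang.startswith("en-"):
            counts["english"] += 1
        else:
            counts["other"] += 1
    return counts


sample = [
    '"Miguel de Cervantes"@es',
    '"ミゲル・デ・セルバンテス"@ja',
    '"Don Quixote"@en',
    '"Alcalá de Henares"',  # no language tag
]
print(tag_statistics(sample))
```

Regional variants such as `en-GB` are counted as English here; a study that separates them would need a finer key.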
Example of multilingual library resource
The dataset publisher does not tag the language of the content of different fields
“Ernest Hemingway” and “El viejo y el mar” MARC 21 records
Multilingualism and the Linked Data Process
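•  How to represent language information for datasets?
# Prefixes (the namespace bound to ":" is a placeholder)
@prefix : <http://example.org/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
# VoID description
:bne a void:Dataset ;
    dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .
# DCAT description
:bne a dcat:Dataset ;
    dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .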
How to represent language information in Linked Data?
§  Traditional annotation properties for most cases
dbpedia:Miguel_de_Cervantes
    rdfs:label "Miguel de Cervantes"@es ,
               "ミゲル・デ・セルバンテス"@ja ,
               "미겔 데 세르반테스"@ko .
§  Richer models for more demanding applications
# LEMON
isbd:T1001 lemon:isReferenceOf [ lemon:isSenseOf :cartographic ] .
:cartographic a lemon:LexicalEntry ;
    lemon:form [ lemon:writtenRep "cartográfico"@es ;
                 isocat:grammaticalGender isocat:masculine ] ;
    lemon:form [ lemon:writtenRep "cartográfica"@es ;
                 isocat:grammaticalGender isocat:feminine ] .
isocat:grammaticalGender rdfs:subPropertyOf lemon:property .
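Language-tagged labels pay off when a consumer has to choose one label for display. The sketch below shows one possible selection rule, an ordered list of preferred languages with prefix matching on the tag. The rule and the function name `pick_label` are assumptions of this example, not part of any vocabulary mentioned on the slide.

```python
# Labels from the slide, stored as (text, language tag) pairs.
LABELS = [
    ("Miguel de Cervantes", "es"),
    ("ミゲル・デ・セルバンテス", "ja"),
    ("미겔 데 세르반테스", "ko"),
]


def pick_label(labels, preferred):
    """Return the first label whose tag matches a preferred language.

    `preferred` is an ordered list such as ["en", "es"]; a tag like "es-ES"
    matches the preference "es". Returns None when nothing matches.
    """
    for wanted in preferred:
        wanted = wanted.lower()
        for text, tag in labels:
            tag = (tag or "").lower()
            if tag == wanted or tag.startswith(wanted + "-"):
                return text
    return None


print(pick_label(LABELS, ["en", "es"]))
print(pick_label(LABELS, ["ko"]))
```

With untagged literals this selection is impossible, which is why the missing tags in the library example matter to applications.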
Implementation of the recording of data and metadata provenance
Generation process
•  PROV-O @W3C
Resource provenance
•  DC
[Figure: PROVENANCE Model (RDF(S)) kept in an RDF Store. File.txt: creator John, creationDate 12-2-1900, rights GPL. Links between File.txt, Filev1.txt and the Revision Process: used, generatedBy.]
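The two kinds of provenance on this slide can be sketched as statements about a file and about the process that produced it. The file names and values are taken from the figure; the property names (`dc:creator`, `dc:date`, `dc:rights`, `prov:wasGeneratedBy`, `prov:used`) and the direction of the links (File.txt generated by a revision that used Filev1.txt) are assumptions of this example.

```python
# Provenance statements as triples. Names and values are from the slide;
# property names and link directions are assumed for this sketch.
PROVENANCE = [
    ("File.txt", "dc:creator", "John"),
    ("File.txt", "dc:date", "12-2-1900"),
    ("File.txt", "dc:rights", "GPL"),
    ("File.txt", "prov:wasGeneratedBy", "RevisionProcess"),
    ("RevisionProcess", "prov:used", "Filev1.txt"),
]


def sources_of(entity, triples=PROVENANCE):
    """Return the entities that `entity` was derived from, nearest first."""
    lineage, pending = [], [entity]
    while pending:
        current = pending.pop(0)
        for s, p, activity in triples:
            if s != current or p != "prov:wasGeneratedBy":
                continue
            for s2, p2, used in triples:
                if s2 == activity and p2 == "prov:used" and used not in lineage:
                    lineage.append(used)
                    pending.append(used)
    return lineage


print(sources_of("File.txt"))
```

Keeping these statements in RDF alongside the data lets a third party ask where a resource came from with the same query machinery used for the data itself.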
Conclusions
The use of
§  Curated data
§  Widely known vocabularies
§  License metadata in RDF
§  Language metadata in RDF
§  Provenance metadata in RDF
will influence the use of the linked data by third parties
Thanks for your attention!
Asuncion Gomez-Perez
Guidelines for Multilingual Linked Data. WIMS – 2013 Madrid, 12-14 June
There is no One-Size-Fits-All Formula
[Table: phases of the process against the vocabularies and tools used for each data provider]
Phases: Modeling, RDF generation, Links generation, Publication, Exploitation
Data providers: BNE, AEMET, IGN, INE, PRISA
Vocabularies: DC, Wgs84, time, Scovo, SSN ontology, SIOC, Data cube, hydrontology
RDF generation tools: geometry2rdf, NOR2O, MARiMbA, CSV parser
Links generation: Silk, with links to DNB, VIAF, LIBRIS, DBPEDIA, Geonames, Geolinkeddata.es
Publication and exploitation: Pubby, sitemap4rdf, SPARQL, map4rdf
Source: http://oa.upm.es/14465/1/2.formulaLD.pdf
The multilingual Web of Data: Current state
1. Number of monolingual and multilingual datasets
                                        January 2012    June 2012     December 2012
Monolingual datasets                    1,906           2,201         1,984
Multilingual datasets                   349             635           676

2. Current usage of language tagging capabilities in RDF
                                        January 2012    June 2012     December 2012
RDF literals with language tag          2,567,324       3,154,779     3,365,930
RDF literals without language tag       10,250,936      10,594,338    12,272,806

3. English tags versus other languages' tags
                                        January 2012    June 2012     December 2012
RDF literals with English tag           2,135,664       2,751,065     2,808,145
RDF literals with other language tag    431,660         403,714       557,785

4. Evolution of top-10 languages
[Figure]
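The literal counts on this slide can be cross-checked with a little arithmetic. In the sketch below, assigning each number to a series and a date is an inference of this example rather than something the slide text states outright; it is supported by the fact that, for each date, the English-tagged and other-tagged counts add up exactly to the count of tagged literals.

```python
# Literal counts from the slide, per snapshot. Which number belongs to which
# series is inferred here and verified by the sum check in the loop below.
SNAPSHOTS = {
    "January 2012": {
        "english": 2_135_664, "other": 431_660,
        "tagged": 2_567_324, "untagged": 10_250_936,
    },
    "June 2012": {
        "english": 2_751_065, "other": 403_714,
        "tagged": 3_154_779, "untagged": 10_594_338,
    },
    "December 2012": {
        "english": 2_808_145, "other": 557_785,
        "tagged": 3_365_930, "untagged": 12_272_806,
    },
}


def tagged_share(counts):
    """Fraction of all literals that carry any language tag."""
    return counts["tagged"] / (counts["tagged"] + counts["untagged"])


def english_share(counts):
    """Fraction of tagged literals whose tag is English."""
    return counts["english"] / counts["tagged"]


for date, counts in SNAPSHOTS.items():
    # English-tagged plus other-tagged literals must equal all tagged literals.
    assert counts["english"] + counts["other"] == counts["tagged"]
    print(f"{date}: {tagged_share(counts):.1%} of literals tagged, "
          f"{english_share(counts):.1%} of those in English")
```

Under this reading, untagged literals outnumber tagged ones at every date, and English accounts for most of the tagged ones, which matches the rationale that LOD is dominated by the English language.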
