GeoSPARQL Endpoint
A spatial linked data endpoint. 21,900 datasets from data.gov.ie, enriched and queryable.
The Map That Answers Questions
Ireland’s national open data portal, data.gov.ie, contains over 21,000 datasets. Many are spatial — flood zones, transport routes, environmental monitoring stations, planning boundaries. But the portal treats them as flat catalogue entries. You can search by keyword and download files. You cannot ask: “show me every dataset whose coverage intersects County Cork.”
The GeoSPARQL Endpoint changes that. It harvests the portal’s DCAT-AP metadata, enriches it with geometries extracted from GeoJSON resources, and loads everything into a triplestore. The result: a SPARQL endpoint where you can query datasets by what they describe and where they describe it. Spatial joins, distance queries, intersection tests — the vocabulary of GIS — applied to the entire national catalogue.
A dataset that the portal returns as a search result becomes a graph node in the endpoint, connected to its publisher, its format, its spatial extent, and every other dataset that covers the same area.
A Map That Knows What’s On It
Imagine a map of Ireland where every government dataset has been pinned to the area it covers. Flood maps pinned to rivers. Transport maps pinned to roads. Environmental data pinned to monitoring stations. Now imagine you can point at any spot on the map and ask: “what data covers this place?” and it tells you, listing not just the datasets you know about but all of them, from every agency, even ones you’ve never heard of. That’s what the GeoSPARQL Endpoint does with the more than 21,000 datasets on data.gov.ie.
From Portal to Triplestore
1. Harvest
The pipeline queries the CKAN API, fetching DCAT-AP metadata for every dataset that has GeoJSON resources. Pagination walks the full catalogue. Each dataset’s metadata — title, publisher, theme, format, access URL — is captured as RDF triples.
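The pagination walk might look like the sketch below. It assumes the portal exposes CKAN's standard package_search action at the URL shown and that GeoJSON resources can be selected with the filter fq=res_format:GeoJSON; neither detail is confirmed here. The names fetch_page and iter_datasets are illustrative, and the mapping from CKAN JSON to DCAT-AP triples is left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed URL: CKAN's standard action API path on the portal.
CKAN_SEARCH_URL = "https://data.gov.ie/api/3/action/package_search"


def fetch_page(start: int, rows: int) -> dict:
    """Fetch one page of search results. This makes a network call."""
    params = urllib.parse.urlencode(
        # Assumed filter: the resource format is indexed as "GeoJSON".
        {"fq": "res_format:GeoJSON", "rows": rows, "start": start}
    )
    with urllib.request.urlopen(f"{CKAN_SEARCH_URL}?{params}", timeout=30) as resp:
        return json.load(resp)["result"]


def iter_datasets(
    fetch: Callable[[int, int], dict] = fetch_page, rows: int = 100
) -> Iterator[dict]:
    """Walk the catalogue page by page until every dataset has been yielded."""
    start = 0
    while True:
        result = fetch(start, rows)
        batch = result["results"]
        if not batch:
            break
        yield from batch
        start += len(batch)
        # "count" is the total number of matches reported by CKAN.
        if start >= result["count"]:
            break
```

Passing the page fetcher in as a parameter keeps the loop testable without touching the live portal.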
2. Enrich
For datasets with GeoJSON resources, the pipeline downloads the geometry and extracts bounding boxes. Irish coordinate systems (Irish Transverse Mercator, EPSG:2157; Irish Grid, EPSG:29903) are automatically detected and transformed to WGS84. Each bounding box becomes a GeoSPARQL geometry linked to the dataset.
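As a rough illustration of the bounding-box step, the sketch below walks a GeoJSON document and turns its extent into a WKT polygon. It uses plain Python in place of shapely to stay dependency-free, assumes the coordinates are already in WGS84 (longitude, latitude), and uses illustrative function names.

```python
from typing import Iterator, Tuple

BBox = Tuple[float, float, float, float]


def iter_geometries(obj: dict) -> Iterator[dict]:
    """Yield every geometry object nested in a GeoJSON document."""
    kind = obj.get("type")
    if kind == "FeatureCollection":
        for feature in obj.get("features", []):
            yield from iter_geometries(feature)
    elif kind == "Feature":
        if obj.get("geometry"):  # a Feature may carry a null geometry
            yield from iter_geometries(obj["geometry"])
    elif kind == "GeometryCollection":
        for geom in obj.get("geometries", []):
            yield from iter_geometries(geom)
    else:
        yield obj


def iter_positions(coords) -> Iterator[Tuple[float, float]]:
    """Flatten nested coordinate arrays into (x, y) pairs."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bbox(doc: dict) -> BBox:
    """Return (min_x, min_y, max_x, max_y) over all geometries in the document."""
    xs, ys = [], []
    for geom in iter_geometries(doc):
        for x, y in iter_positions(geom.get("coordinates", [])):
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError("GeoJSON document contains no coordinates")
    return min(xs), min(ys), max(xs), max(ys)


def bbox_to_wkt(bbox: BBox) -> str:
    """Render the box as a closed, counter-clockwise WKT polygon ring."""
    min_x, min_y, max_x, max_y = bbox
    return (
        f"POLYGON(({min_x} {min_y}, {max_x} {min_y}, "
        f"{max_x} {max_y}, {min_x} {max_y}, {min_x} {min_y}))"
    )
```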
3. Load
Enriched RDF is uploaded to Apache Jena Fuseki via the Graph Store Protocol. Metadata and spatial data load into separate named graphs. The union default graph is enabled, so queries work without explicit GRAPH clauses.
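A Graph Store Protocol upload is an HTTP request against the dataset's data endpoint with the target graph named in the query string. The sketch below only builds such a request. The dataset name, graph URI, and credentials are hypothetical, and the /data path is Fuseki's usual default rather than a confirmed detail of this deployment.

```python
import base64
import urllib.parse
import urllib.request


def build_gsp_request(
    turtle: str,
    graph_uri: str,
    user: str,
    password: str,
    base_url: str = "http://localhost:3030",
    dataset: str = "catalogue",  # hypothetical Fuseki dataset name
) -> urllib.request.Request:
    """Build a PUT request that replaces one named graph with the given Turtle."""
    query = urllib.parse.urlencode({"graph": graph_uri})
    url = f"{base_url}/{dataset}/data?{query}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        method="PUT",  # PUT replaces the graph; POST would append to it
        headers={
            "Content-Type": "text/turtle; charset=utf-8",
            "Authorization": f"Basic {token}",
        },
    )
```

Sending it is then a matter of passing the request to urllib.request.urlopen, once for the metadata graph and once for the spatial graph.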
4. Query
The Fuseki SPARQL endpoint supports GeoSPARQL functions: geof:sfIntersects for spatial intersection, geof:distance for proximity, geof:buffer for area expansion. Standard SPARQL queries over DCAT-AP metadata work alongside spatial queries in the same request.
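To make the mix of metadata and spatial terms concrete, here is a hedged sketch of a proximity query and a small client for it. The endpoint URL and dataset name are hypothetical, and the query assumes each dataset is linked to its geometry with geo:hasGeometry, which is a guess at the data model rather than a documented fact.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: Fuseki on port 3030 with a dataset named "catalogue".
ENDPOINT = "http://localhost:3030/catalogue/sparql"

# Datasets whose extent lies within 20 km of a point in Cork city.
NEARBY_QUERY = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>

SELECT ?dataset ?title ?metres WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           geo:hasGeometry/geo:asWKT ?wkt .
  BIND(geof:distance(?wkt, "POINT(-8.47 51.90)"^^geo:wktLiteral, uom:metre) AS ?metres)
  FILTER(?metres < 20000)
}
ORDER BY ?metres
"""


def flatten_bindings(results: dict) -> list:
    """Reduce SPARQL JSON results to a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_select(query: str, endpoint: str = ENDPOINT) -> list:
    """POST a SELECT query using the SPARQL 1.1 Protocol form encoding."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=body, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return flatten_bindings(json.load(resp))
```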
Asking the Catalogue Where
Spatial dataset discovery
A researcher studying coastal erosion needs every dataset that covers Ireland’s coastline. Instead of keyword-searching the portal, they run a GeoSPARQL query: “datasets whose extent intersects this coastal geometry.” The endpoint returns datasets from the EPA, Marine Institute, OPW, and Geological Survey — agencies the researcher didn’t know had relevant data.
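The researcher's query could be generated along these lines. This is a sketch only: the geo:hasGeometry link between dataset and geometry is assumed, the function name is illustrative, and the coastal polygon in the usage note is a made-up rectangle, not a real coastline geometry.

```python
from string import Template

_INTERSECTS = Template("""\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT DISTINCT ?dataset ?title ?publisher WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dct:publisher ?publisher ;
           geo:hasGeometry/geo:asWKT ?extent .
  FILTER(geof:sfIntersects(?extent, "$wkt"^^geo:wktLiteral))
}
LIMIT $limit
""")


def intersects_query(wkt: str, limit: int = 100) -> str:
    """Build a query for datasets whose extent intersects the given WKT shape."""
    # Reject characters that could break out of the quoted literal.
    if any(ch in wkt for ch in '"\\\n\r'):
        raise ValueError("WKT must not contain quotes, backslashes, or newlines")
    return _INTERSECTS.substitute(wkt=wkt, limit=int(limit))
```

For example, intersects_query("POLYGON((-10.2 51.4, -9.2 51.4, -9.2 52.0, -10.2 52.0, -10.2 51.4))") asks for everything touching a box over the south-west coast.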
Cross-domain spatial overlap
An environmental agency wants to know which flood-risk datasets overlap with water quality monitoring datasets. A single SPARQL query joins the two themes on their spatial extents and returns the intersecting pairs. What would otherwise require GIS software and manual work becomes a single query.
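One possible shape for that join is sketched below. The theme URIs are parameters because the catalogue's actual theme vocabulary is not stated here, and the geo:hasGeometry link is again an assumption. Note that the filter compares every pair of extents across the two themes, so cost grows with the product of the two group sizes.

```python
from string import Template

_OVERLAP = Template("""\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?a ?b WHERE {
  ?a dcat:theme <$theme_a> ;
     geo:hasGeometry/geo:asWKT ?wktA .
  ?b dcat:theme <$theme_b> ;
     geo:hasGeometry/geo:asWKT ?wktB .
  FILTER(?a != ?b)
  FILTER(geof:sfIntersects(?wktA, ?wktB))
}
""")


def _check_iri(iri: str) -> str:
    """Refuse values that could not sit safely between angle brackets."""
    if not iri or any(ch in iri for ch in '<>"{}|\\^` \n\r\t'):
        raise ValueError(f"not a usable IRI: {iri!r}")
    return iri


def overlap_query(theme_a: str, theme_b: str) -> str:
    """Build a query returning dataset pairs from two themes whose extents intersect."""
    return _OVERLAP.substitute(theme_a=_check_iri(theme_a), theme_b=_check_iri(theme_b))
```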
Federated queries
A European research project combines Irish data with Wikidata. A federated SPARQL query asks: “give me all Irish datasets published by organisations whose Wikidata entry shows them as government agencies.” The endpoint handles the Irish side; Wikidata handles the organisational classification.
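A federated query of this kind relies on the SPARQL 1.1 SERVICE keyword. The sketch below rests on two loud assumptions: that publishers in the catalogue are linked to Wikidata items with owl:sameAs (nothing here confirms how, or whether, that link is stored), and that the class item passed in is the right one. Q327333 is believed to be Wikidata's item for a government agency and should be verified before use.

```python
import re
from string import Template

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

_FEDERATED = Template("""\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>

SELECT ?dataset ?title ?org WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dct:publisher ?publisher .
  ?publisher owl:sameAs ?org .          # assumed link to the Wikidata item
  SERVICE <$endpoint> {
    ?org wdt:P31/wdt:P279* wd:$qid .    # instance of the class or any subclass
  }
}
""")


def federated_query(class_qid: str, endpoint: str = WIKIDATA_ENDPOINT) -> str:
    """Build a query that classifies publishers using a remote Wikidata lookup."""
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError(f"not a Wikidata item id: {class_qid!r}")
    return _FEDERATED.substitute(endpoint=endpoint, qid=class_qid)
```

The local patterns come first so that the remote service is asked about publishers already found in the catalogue, although the query engine decides the actual evaluation order.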
Under the Bonnet
Triplestore
Apache Jena Fuseki 6.0+ with GeoSPARQL 1.0 support. Deployed via Docker. SPARQL 1.1 query and update endpoints. Union default graph enabled.
Pipeline
Python: rdflib for RDF generation, pyproj + shapely for coordinate transformation and geometry processing, requests for CKAN API access. Four stages: harvest, enrich, load, demo queries.
Coordinate systems
Automatic detection and transformation of Irish Transverse Mercator (EPSG:2157) and Irish Grid (EPSG:29903) to WGS84 (EPSG:4326). Heuristic detection based on coordinate magnitude.
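A magnitude-based heuristic could work roughly as sketched here. The thresholds are assumptions, not the pipeline's actual values: they rely on Irish Transverse Mercator coordinates for the island falling above roughly 400,000 m east and 500,000 m north, while Irish Grid values fall below those figures. The function names are illustrative.

```python
from typing import Tuple


def guess_epsg(x: float, y: float) -> int:
    """Guess the CRS of a coordinate pair from its magnitude alone."""
    # Values that fit in degree ranges are taken to be longitude/latitude.
    if abs(x) <= 180 and abs(y) <= 90:
        return 4326
    # Assumed thresholds: ITM eastings and northings for Ireland sit above these.
    if x >= 400_000 and y >= 500_000:
        return 2157
    # Anything else in metres is treated as Irish Grid.
    return 29903


def to_wgs84(x: float, y: float) -> Tuple[float, float]:
    """Return (longitude, latitude), transforming projected input when needed."""
    epsg = guess_epsg(x, y)
    if epsg == 4326:
        return x, y
    # Imported lazily so the detection logic works without pyproj installed.
    from pyproj import Transformer

    transformer = Transformer.from_crs(f"EPSG:{epsg}", "EPSG:4326", always_xy=True)
    return transformer.transform(x, y)
```

A heuristic like this cannot tell apart coordinate systems outside the three listed, so a CRS declared in the file, where present, is a safer source than the guess.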
Data model
DCAT-AP metadata in named graph :metadata. GeoSPARQL geometries in named graph :spatial. Linked via dataset URI. Geometry encoded as WKT (Well-Known Text) literals with geo:asWKT.
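To show the shape of the two graphs, the sketch below renders one dataset as TriG text. It is hand-built for illustration only, whereas the pipeline generates its RDF with rdflib. The graph namespace, the geo:hasGeometry link, and the geometry URI pattern are all assumptions.

```python
import json

PREFIXES = """\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix g:    <http://example.org/graph/> .
"""


def _literal(text: str) -> str:
    """Quote a string; JSON escapes for quotes, backslashes, and control
    characters are also valid escapes in Turtle and TriG."""
    return json.dumps(text, ensure_ascii=False)


def dataset_trig(dataset_uri: str, title: str, wkt: str) -> str:
    """Render metadata and geometry for one dataset into two named graphs."""
    geom_uri = f"{dataset_uri}#geometry"  # hypothetical URI pattern
    return f"""{PREFIXES}
g:metadata {{
  <{dataset_uri}> a dcat:Dataset ;
    dct:title {_literal(title)} .
}}

g:spatial {{
  <{dataset_uri}> geo:hasGeometry <{geom_uri}> .
  <{geom_uri}> a geo:Geometry ;
    geo:asWKT {_literal(wkt)}^^geo:wktLiteral .
}}
"""
```

Because both graphs describe the same dataset URI, a query over the union default graph can follow the link from title to geometry without naming either graph.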
GeoSPARQL functions
geof:sfIntersects, geof:sfContains, geof:sfWithin, geof:distance, geof:buffer, geof:boundary. All provided by Jena’s GeoSPARQL extension, with its spatial index used to speed up spatial lookups.
Deployment
Docker Compose. Fuseki exposed on port 3030. Write endpoints protected by HTTP Basic authentication. Pipeline runs as standalone Python scripts or via run_pipeline.py orchestrator.