MQA
Score your data catalogue against European metadata quality standards. Know what to fix first.
From League Table to Action Plan
You manage a national open data portal. Four thousand datasets, contributed by dozens of agencies, maintained by people who left years ago. The European Union harvests your catalogue every week, scores it against their Metadata Quality Assessment methodology, and publishes the results. Your country’s score: Sufficient. Which is a polite way of saying mediocre. Your minister has seen the league table. Denmark is rated Excellent. Germany is Good. You are Sufficient, sandwiched between countries you’d rather not be compared to. Something needs to improve.
But improve what?
The EU’s MQA methodology is a 40-page PDF. It scores metadata across five dimensions: Findability, Accessibility, Interoperability, Reusability, Contextuality. Each dimension has multiple indicators. Each indicator has specific rules about which properties must be present, which controlled vocabularies must be used, whether URLs actually resolve. Multiply that by four thousand datasets, each with multiple distributions, and you’re looking at tens of thousands of individual checks. You could read the methodology, build a spreadsheet, audit manually. It would take months. The next harvesting cycle is in six weeks.
MQA takes your catalogue and does the audit for you. Every dataset, every distribution, every check. In seconds. It tells you your overall score, breaks it down by dimension, and shows you exactly which checks are failing on which datasets. More importantly, it prioritises. Not “your metadata needs improvement” but “these 847 datasets are missing keywords, and fixing that alone is worth 30 points — enough to move you from Sufficient to Good.”
It’s the difference between knowing you have a problem and knowing the solution.
The gap analysis shows what’s missing. The recommendations tell you what to fix first, ranked by impact across the whole catalogue. The country comparison shows where you stand and what’s achievable. The export gives you a report for your minister and a CSV for your technical team.
Six weeks. Four thousand datasets. One URL. Now you have a plan.
Labels on Tins
Think of metadata as the label on a tin of food. The contents might be perfectly good — nutritious, delicious, exactly what someone needs — but without a proper label, nobody knows what’s inside. No ingredients list, no allergy warnings, no use-by date. Would you eat it? Would you feed it to someone else?
Data works the same way. You might have the best air quality measurements in Europe, but if your dataset doesn’t say what it contains, when it was updated, how to access it, or who to contact with questions, nobody will find it and nobody will trust it. It just sits there, unlabelled, unused.
The European Union has standards for data labels — a set of rules called DCAT-AP that says what information every dataset should have. And they score every data portal in Europe against those rules, publicly, on a league table. That score is your MQA score: Metadata Quality Assessment.
MQA (the tool) checks your catalogue against those rules and tells you exactly what’s missing. Not vague advice like “improve your metadata” — specific findings like “412 datasets don’t have keywords” or “your distributions are missing file format declarations.” It shows you the gaps, ranks them by importance, and tells you what to fix first.
It’s a quality audit for your data labels. Run it before the EU does.
From URL to Action Plan
1. Enter your catalogue
Paste your catalogue URL into the input field. MQA accepts any DCAT endpoint: a CKAN catalogue, a DCAT-AP RDF feed, a static Turtle file. You can also paste catalogue content directly or drag and drop a file. The tool auto-detects the format.
2. Score your metadata
MQA parses every dataset and distribution in your catalogue and scores them against the EU’s five-dimension framework: Findability, Accessibility, Interoperability, Reusability, Contextuality. This scoring is deterministic — it examines only the metadata you provided, no network requests required. You get up to 325 points from metadata alone. Results appear in seconds, even for large catalogues.
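In essence, the deterministic pass walks the graph and checks which properties are present on each dataset. A minimal sketch of that idea using rdflib follows; the property list and point values here are illustrative only, not the tool's actual checks or weights.

```python
# Illustrative sketch of a metadata-only presence check using rdflib.
# The properties and point values below are examples, not MQA's real weights.
from rdflib import Graph, Namespace, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")

CHECKS = [
    ("keywords", DCAT.keyword, 30),        # Findability (illustrative points)
    ("themes", DCAT.theme, 30),
    ("spatial coverage", DCT.spatial, 20),
    ("temporal coverage", DCT.temporal, 20),
]

def score_dataset(g: Graph, dataset) -> tuple[int, list[str]]:
    """Return (points earned, list of failed checks) for one dcat:Dataset."""
    points, failures = 0, []
    for label, prop, weight in CHECKS:
        if (dataset, prop, None) in g:
            points += weight
        else:
            failures.append(label)
    return points, failures

g = Graph().parse("catalogue.ttl", format="turtle")
for ds in g.subjects(RDF.type, DCAT.Dataset):
    pts, missing = score_dataset(g, ds)
    print(ds, pts, "missing:", ", ".join(missing) or "nothing")
```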
3. Check URL reachability (optional)
Click “Check URLs” to verify that your access and download URLs actually respond. MQA sends HTTP requests to each URL and records whether it returns a successful status code. Reachable URLs add up to 80 additional points. This step takes longer — a few seconds per URL — but reveals broken links that hurt your score.
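At its core each check is an ordinary HTTP round trip. A minimal sketch, assuming a HEAD request with a GET fallback (the tool's exact request strategy, redirect handling, and retries may differ):

```python
# Minimal sketch of a single reachability check: HEAD first, fall back to GET.
# The exact request strategy (methods, redirects, retries) is an assumption.
import requests

def is_reachable(url: str, timeout: float = 10.0) -> bool:
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        if resp.status_code >= 400:  # some servers reject HEAD; retry with GET
            resp = requests.get(url, timeout=timeout, stream=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False

print(is_reachable("https://example.org/data/air-quality.csv"))
```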
4. Review the results
The dashboard shows your overall score and rating (Excellent, Good, Sufficient, or Bad). Dimension cards break down performance across the five areas. The dataset table lists every dataset with its individual score; expand any row to see which specific checks failed. The recommendations panel ranks fixes by catalogue-wide impact — what to tackle first for maximum score improvement. The country comparison shows how you stack up against other EU portals.
5. Export your report
Download results as PDF for stakeholder presentations, CSV for spreadsheet analysis, or JSON for pipeline integration. Each format includes the full breakdown: scores, gaps, recommendations, and per-dataset details.
Real Problems, Measured Solutions
National portal improvement
A country’s open data office is rated “Sufficient” on data.europa.eu — politically embarrassing when neighbours score higher. The team runs MQA against their 5,000-dataset catalogue. The gap analysis reveals that 2,100 datasets are missing dcat:keyword and 1,800 lack dct:spatial declarations. The recommendations rank these by impact: adding keywords alone is worth 30 points catalogue-wide. The team prioritises, implements a bulk metadata update through their CKAN API, and re-runs MQA to verify. Six weeks later, the EU harvests again. Rating: Good. The minister is pleased.
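A bulk fix like that can be scripted against CKAN's action API. A sketch using the package_patch action; the portal URL, API key, and keyword mapping are placeholders, not values from the scenario above.

```python
# Sketch of a bulk keyword fix via CKAN's action API (package_patch).
# The portal URL, API key, and keyword mapping are placeholders.
import requests

CKAN_URL = "https://data.example.gov/api/3/action/package_patch"
API_KEY = "xxxx"  # a real deployment would read this from the environment

keywords_by_dataset = {
    "air-quality-2023": ["air quality", "environment", "measurements"],
    # ... one entry per dataset flagged in the MQA gap analysis
}

for name, keywords in keywords_by_dataset.items():
    resp = requests.post(
        CKAN_URL,
        headers={"Authorization": API_KEY},
        json={"id": name, "tags": [{"name": k} for k in keywords]},
    )
    resp.raise_for_status()
    print(f"updated {name}")
```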
Pre-publication quality gate
A regional government is publishing environmental monitoring data for the first time. Before the datasets go live, the data steward exports the draft catalogue as Turtle and runs it through MQA. The report flags missing publisher information, absent licence declarations, and distributions without format specifications. She fixes each issue in the source system, re-exports, re-scores — zero violations. The datasets publish clean, avoiding the embarrassment of appearing on a remediation list later and the extra work of fixing metadata after it’s already been harvested.
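A gate like this is straightforward to automate against the scoring endpoint described under "Under the Bonnet". In the sketch below the base URL, the "turtle" format value, and the response fields ("score", "rating") are assumptions about the deployment and report structure.

```python
# Sketch of a pre-publication quality gate using the /api/mqa/score/ endpoint.
# The base URL, format value, and response fields ("score", "rating") are assumptions.
import sys
import requests

MQA_URL = "https://mqa.example.org/api/mqa/score/"
THRESHOLD = 221  # minimum score for a "Good" rating

content = open("draft-catalogue.ttl", encoding="utf-8").read()
resp = requests.post(MQA_URL, json={"content": content, "format": "turtle"})
resp.raise_for_status()
report = resp.json()

print(f"score: {report['score']}  rating: {report['rating']}")
if report["score"] < THRESHOLD:
    sys.exit("catalogue below publication threshold - fix metadata before go-live")
```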
Procurement evaluation
A national statistics office is evaluating vendors for a new open data platform. Three vendors submit proposals, each claiming DCAT-AP compliance. Rather than relying on promises, the evaluation team asks each vendor to provide a demo catalogue URL. They run MQA against all three. Vendor A scores 280 (Good). Vendor B scores 185 (Sufficient). Vendor C scores 95 (Bad) — their demo catalogue is missing mandatory properties across most datasets. The numbers cut through the sales decks. Vendor A wins the contract. The scoring methodology is documented in the procurement record.
Continuous monitoring
An open data team sets up a monthly MQA check as part of their operations. They score the catalogue on the first Monday of each month, archive the JSON report, and track the trend. When a major data provider starts submitting datasets with missing contact points, the next month’s report shows a score drop and identifies exactly which datasets changed. The team catches the regression, contacts the provider, and restores quality before it affects their public rating. Metadata quality becomes a managed metric, not a surprise at audit time.
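A monitoring loop of this kind can be a short script run from cron: score, archive the JSON, compare with the previous run. A sketch, where the endpoint URL, catalogue URL, and response fields are again assumptions:

```python
# Sketch of a monthly monitoring run: score, archive, compare with the last report.
# The endpoint URL, catalogue URL, and response fields are assumptions.
import json
import pathlib
import datetime
import requests

MQA_URL = "https://mqa.example.org/api/mqa/score/"
ARCHIVE = pathlib.Path("mqa-reports")
ARCHIVE.mkdir(exist_ok=True)

content = requests.get("https://data.example.gov/catalog.ttl").text
report = requests.post(MQA_URL, json={"content": content, "format": "turtle"}).json()

today = datetime.date.today().isoformat()
(ARCHIVE / f"{today}.json").write_text(json.dumps(report, indent=2))

previous = sorted(ARCHIVE.glob("*.json"))[:-1]
if previous:
    last = json.loads(previous[-1].read_text())
    delta = report["score"] - last["score"]
    print(f"score {report['score']} ({delta:+} vs previous run)")
```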
Under the Bonnet
Scoring methodology
Aligned with the data.europa.eu MQA methodology. Five dimensions, 405 points maximum: Findability (100), Accessibility (100), Interoperability (110), Reusability (75), Contextuality (20). Rating thresholds: Excellent (351+), Good (221–350), Sufficient (121–220), Bad (0–120).
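The rating bands translate directly into a lookup, for example:

```python
# The published MQA rating bands as a simple lookup.
def rating(score: int) -> str:
    if score >= 351:
        return "Excellent"
    if score >= 221:
        return "Good"
    if score >= 121:
        return "Sufficient"
    return "Bad"

assert rating(280) == "Good"  # e.g. the winning vendor in the procurement example
assert rating(95) == "Bad"
```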
Input formats
Any DCAT catalogue: Turtle (.ttl), RDF/XML (.rdf), JSON-LD (.jsonld), N-Triples (.nt). Accepts URLs (fetched server-side), pasted content, or file uploads. CKAN catalogues work via their /catalog.ttl endpoint.
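Format auto-detection can be as simple as mapping the file extension to an rdflib parser name; the sketch below shows that mapping, though the tool's actual detection logic (e.g. content sniffing for pasted text) is not documented here.

```python
# Sketch of extension-based format detection for the accepted RDF serialisations.
# The tool's actual auto-detection may differ (e.g. content sniffing).
from rdflib import Graph

RDF_FORMATS = {
    ".ttl": "turtle",
    ".rdf": "xml",
    ".jsonld": "json-ld",
    ".nt": "nt",
}

def parse_catalogue(path: str) -> Graph:
    suffix = "." + path.rsplit(".", 1)[-1].lower()
    return Graph().parse(path, format=RDF_FORMATS.get(suffix, "turtle"))

g = parse_catalogue("catalogue.jsonld")
print(len(g), "triples")
```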
Scoring modes
Deterministic (325 points): Metadata-only analysis. Checks property presence, controlled vocabulary usage, format declarations. No network requests. Fast.
Reachability (80 points): HTTP verification of dcat:accessURL and dcat:downloadURL. 5 concurrent workers, 10-second timeout per URL. SSRF protection blocks private IP ranges.
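The SSRF guard amounts to resolving each hostname and refusing anything that lands in a private, loopback, or link-local range before any request is sent. A sketch using the standard library; the tool's exact rules may be stricter.

```python
# Sketch of the SSRF guard: resolve the host and refuse private/loopback/link-local
# addresses before any reachability request is made. The exact rules are an assumption.
import socket
import ipaddress
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            return False
    return True

print(is_safe_url("https://data.example.gov/file.csv"))          # True if it resolves publicly
print(is_safe_url("http://169.254.169.254/latest/meta-data/"))   # False: link-local
```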
SHACL validation
Detects DCAT-AP 3.0 and DCAT-AP-SE 3.0.1 profiles. Validates against bundled shapes via pyshacl. 30 interoperability points for full conformance.
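The conformance check follows pyshacl's standard calling convention; a sketch, with the shapes file path standing in for the bundled DCAT-AP 3.0 shapes:

```python
# Sketch of a DCAT-AP conformance check with pyshacl.
# The shapes file path is a placeholder for the bundled DCAT-AP 3.0 shapes.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse("catalogue.ttl", format="turtle")
shapes = Graph().parse("dcat-ap-3.0-shapes.ttl", format="turtle")

conforms, results_graph, results_text = validate(data, shacl_graph=shapes)
print("conforms:", conforms)
if not conforms:
    print(results_text)
```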
API endpoints
POST /api/mqa/score/ — score catalogue content (JSON body: {content, format}).
POST /api/mqa/reachability/ — check a URL list (JSON body: {urls}).
GET /api/mqa/comparison/ — EU country benchmark data.
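As a short usage example for the reachability endpoint (the base URL and the shape of the response are assumptions):

```python
# Sketch of calling the reachability endpoint for a handful of distribution URLs.
# The base URL and the shape of the response are assumptions.
import requests

resp = requests.post(
    "https://mqa.example.org/api/mqa/reachability/",
    json={"urls": [
        "https://data.example.gov/air-quality/2023.csv",
        "https://data.example.gov/air-quality/2022.csv",
    ]},
)
resp.raise_for_status()
print(resp.json())
```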
Export formats
PDF (HTML-rendered), CSV (tabular), JSON (structured). All include full dimension breakdown, per-dataset scores, gaps, and recommendations.
Architecture
Django backend, stateless computation. No database required for scoring — catalogue parsed in memory. Self-hostable via Docker or WSGI. Open source, licence TBC.