MQA
Score your data catalogue against European metadata quality standards. Know what to fix first.
From League Table to Action Plan
You manage a national open data portal. Four thousand datasets, contributed by dozens of agencies, maintained by people who left years ago. The European Union harvests your catalogue every week, scores it against their Metadata Quality Assessment methodology, and publishes the results. Your country’s score: Sufficient. Which is a polite way of saying mediocre. Your minister has seen the league table. Denmark is rated Excellent. Germany is Good. You are Sufficient, sandwiched between countries you’d rather not be compared to. Something needs to improve.
But improve what?
The EU’s MQA methodology is a 40-page PDF. It scores metadata across five dimensions: Findability, Accessibility, Interoperability, Reusability, Contextuality. Each dimension has multiple indicators. Each indicator has specific rules about which properties must be present, which controlled vocabularies must be used, whether URLs actually resolve. Multiply that by four thousand datasets, each with multiple distributions, and you’re looking at tens of thousands of individual checks. You could read the methodology, build a spreadsheet, audit manually. It would take months. The next harvesting cycle is in six weeks.
MQA takes your catalogue and does the audit for you. Every dataset, every distribution, every check. In seconds. It tells you your overall score, breaks it down by dimension, and shows you exactly which checks are failing on which datasets. More importantly, it prioritises. Not “your metadata needs improvement” but “these 847 datasets are missing keywords, and fixing that alone is worth 30 points — enough to move you from Sufficient to Good.”
It’s the difference between knowing you have a problem and knowing the solution.
The gap analysis shows what’s missing. The recommendations tell you what to fix first, ranked by impact across the whole catalogue. The country comparison shows where you stand and what’s achievable. The export gives you a report for your minister and a CSV for your technical team.
Six weeks. Four thousand datasets. One URL. Now you have a plan.
Labels on Tins
Think of metadata as the label on a tin of food. The contents might be perfectly good — nutritious, delicious, exactly what someone needs — but without a proper label, nobody knows what’s inside. No ingredients list, no allergy warnings, no use-by date. Would you eat it? Would you feed it to someone else?
Data works the same way. You might have the best air quality measurements in Europe, but if your dataset doesn’t say what it contains, when it was updated, how to access it, or who to contact with questions, nobody will find it and nobody will trust it. It just sits there, unlabelled, unused.
The European Union has standards for data labels — a set of rules called DCAT-AP that says what information every dataset should have. And they score every data portal in Europe against those rules, publicly, on a league table. That score is your MQA score: Metadata Quality Assessment.
MQA (the tool) checks your catalogue against those rules and tells you exactly what’s missing. Not vague advice like “improve your metadata” — specific findings like “412 datasets don’t have keywords” or “your distributions are missing file format declarations.” It shows you the gaps, ranks them by importance, and tells you what to fix first.
It’s a quality audit for your data labels. Run it before the EU does.
From URL to Action Plan
1. Enter your catalogue
Paste your catalogue URL into the input field. MQA accepts any DCAT endpoint: a CKAN catalogue, a DCAT-AP RDF feed, a static Turtle file. You can also paste catalogue content directly or drag and drop a file. The tool auto-detects the format.
2. Score your metadata
MQA parses every dataset and distribution in your catalogue and scores them against the EU’s five-dimension framework: Findability, Accessibility, Interoperability, Reusability, Contextuality. This scoring is deterministic — it examines only the metadata you provided, no network requests required. You get up to 325 points from metadata alone. Results appear in seconds, even for large catalogues.
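In essence, the deterministic pass walks the graph and checks which properties are present on each dataset. A minimal sketch of that idea using rdflib follows; the property list and point values here are illustrative only, not the tool's actual checks or weights.

```python
# Illustrative sketch of a metadata-only presence check using rdflib.
# The properties and point values below are examples, not MQA's real weights.
from rdflib import Graph, Namespace, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")

CHECKS = [
    ("keywords", DCAT.keyword, 30),        # Findability (illustrative points)
    ("themes", DCAT.theme, 30),
    ("spatial coverage", DCT.spatial, 20),
    ("temporal coverage", DCT.temporal, 20),
]

def score_dataset(g: Graph, dataset) -> tuple[int, list[str]]:
    """Return (points earned, list of failed checks) for one dcat:Dataset."""
    points, failures = 0, []
    for label, prop, weight in CHECKS:
        if (dataset, prop, None) in g:
            points += weight
        else:
            failures.append(label)
    return points, failures

g = Graph().parse("catalogue.ttl", format="turtle")
for ds in g.subjects(RDF.type, DCAT.Dataset):
    pts, missing = score_dataset(g, ds)
    print(ds, pts, "missing:", ", ".join(missing) or "nothing")
```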
3. Check URL reachability (optional)
Click “Check URLs” to verify that your access and download URLs actually respond. MQA sends HTTP requests to each URL and records whether it returns a successful status code. Reachable URLs add up to 80 additional points. This step takes longer — a few seconds per URL — but reveals broken links that hurt your score.
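At its core each check is an ordinary HTTP round trip. A minimal sketch, assuming a HEAD request with a GET fallback (the tool's exact request strategy, redirect handling, and retries may differ):

```python
# Minimal sketch of a single reachability check: HEAD first, fall back to GET.
# The exact request strategy (methods, redirects, retries) is an assumption.
import requests

def is_reachable(url: str, timeout: float = 10.0) -> bool:
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        if resp.status_code >= 400:  # some servers reject HEAD; retry with GET
            resp = requests.get(url, timeout=timeout, stream=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False

print(is_reachable("https://example.org/data/air-quality.csv"))
```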
4. Review the results
The dashboard shows your overall score and rating (Excellent, Good, Sufficient, or Bad). Dimension cards break down performance across the five areas. The dataset table lists every dataset with its individual score; expand any row to see which specific checks failed. The recommendations panel ranks fixes by catalogue-wide impact — what to tackle first for maximum score improvement. The country comparison shows how you stack up against other EU portals.
5. Export your report
Download results as PDF for stakeholder presentations, CSV for spreadsheet analysis, or JSON for pipeline integration. Each format includes the full breakdown: scores, gaps, recommendations, and per-dataset details.
Real Problems, Measured Solutions
National portal improvement
A country’s open data office is rated “Sufficient” on data.europa.eu — politically embarrassing when neighbours score higher. The team runs MQA against their 5,000-dataset catalogue. The gap analysis reveals that 2,100 datasets are missing dcat:keyword and 1,800 lack dct:spatial declarations. The recommendations rank these by impact: adding keywords alone is worth 30 points catalogue-wide. The team prioritises, implements a bulk metadata update through their CKAN API, and re-runs MQA to verify. Six weeks later, the EU harvests again. Rating: Good. The minister is pleased.
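A bulk fix like that can be scripted against CKAN's action API. A sketch using the package_patch action; the portal URL, API key, and keyword mapping are placeholders, not values from the scenario above.

```python
# Sketch of a bulk keyword fix via CKAN's action API (package_patch).
# The portal URL, API key, and keyword mapping are placeholders.
import requests

CKAN_URL = "https://data.example.gov/api/3/action/package_patch"
API_KEY = "xxxx"  # a real deployment would read this from the environment

keywords_by_dataset = {
    "air-quality-2023": ["air quality", "environment", "measurements"],
    # ... one entry per dataset flagged in the MQA gap analysis
}

for name, keywords in keywords_by_dataset.items():
    resp = requests.post(
        CKAN_URL,
        headers={"Authorization": API_KEY},
        json={"id": name, "tags": [{"name": k} for k in keywords]},
    )
    resp.raise_for_status()
    print(f"updated {name}")
```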
Pre-publication quality gate
A regional government is publishing environmental monitoring data for the first time. Before the datasets go live, the data steward exports the draft catalogue as Turtle and runs it through MQA. The report flags missing publisher information, absent licence declarations, and distributions without format specifications. She fixes each issue in the source system, re-exports, re-scores — zero violations. The datasets publish clean, avoiding the embarrassment of appearing on a remediation list later and the extra work of fixing metadata after it’s already been harvested.
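A gate like this is straightforward to automate against the scoring endpoint described under "Under the Bonnet". In the sketch below the base URL, the "turtle" format value, and the response fields ("score", "rating") are assumptions about the deployment and report structure.

```python
# Sketch of a pre-publication quality gate using the /api/mqa/score/ endpoint.
# The base URL, format value, and response fields ("score", "rating") are assumptions.
import sys
import requests

MQA_URL = "https://mqa.example.org/api/mqa/score/"
THRESHOLD = 221  # minimum score for a "Good" rating

content = open("draft-catalogue.ttl", encoding="utf-8").read()
resp = requests.post(MQA_URL, json={"content": content, "format": "turtle"})
resp.raise_for_status()
report = resp.json()

print(f"score: {report['score']}  rating: {report['rating']}")
if report["score"] < THRESHOLD:
    sys.exit("catalogue below publication threshold - fix metadata before go-live")
```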
Procurement evaluation
A national statistics office is evaluating vendors for a new open data platform. Three vendors submit proposals, each claiming DCAT-AP compliance. Rather than relying on promises, the evaluation team asks each vendor to provide a demo catalogue URL. They run MQA against all three. Vendor A scores 280 (Good). Vendor B scores 185 (Sufficient). Vendor C scores 95 (Bad) — their demo catalogue is missing mandatory properties across most datasets. The numbers cut through the sales decks. Vendor A wins the contract. The scoring methodology is documented in the procurement record.
Continuous monitoring
An open data team sets up a monthly MQA check as part of their operations. They score the catalogue on the first Monday of each month, archive the JSON report, and track the trend. When a major data provider starts submitting datasets with missing contact points, the next month’s report shows a score drop and identifies exactly which datasets changed. The team catches the regression, contacts the provider, and restores quality before it affects their public rating. Metadata quality becomes a managed metric, not a surprise at audit time.
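A monitoring loop of this kind can be a short script run from cron: score, archive the JSON, compare with the previous run. A sketch, where the endpoint URL, catalogue URL, and response fields are again assumptions:

```python
# Sketch of a monthly monitoring run: score, archive, compare with the last report.
# The endpoint URL, catalogue URL, and response fields are assumptions.
import json
import pathlib
import datetime
import requests

MQA_URL = "https://mqa.example.org/api/mqa/score/"
ARCHIVE = pathlib.Path("mqa-reports")
ARCHIVE.mkdir(exist_ok=True)

content = requests.get("https://data.example.gov/catalog.ttl").text
report = requests.post(MQA_URL, json={"content": content, "format": "turtle"}).json()

today = datetime.date.today().isoformat()
(ARCHIVE / f"{today}.json").write_text(json.dumps(report, indent=2))

previous = sorted(ARCHIVE.glob("*.json"))[:-1]
if previous:
    last = json.loads(previous[-1].read_text())
    delta = report["score"] - last["score"]
    print(f"score {report['score']} ({delta:+} vs previous run)")
```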
Under the Bonnet
Scoring methodology
Aligned with the data.europa.eu MQA methodology. Five dimensions, 405 points maximum: Findability (100), Accessibility (100), Interoperability (110), Reusability (75), Contextuality (20). Rating thresholds: Excellent (351+), Good (221–350), Sufficient (121–220), Bad (0–120).
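The rating bands translate directly into a lookup, for example:

```python
# The published MQA rating bands as a simple lookup.
def rating(score: int) -> str:
    if score >= 351:
        return "Excellent"
    if score >= 221:
        return "Good"
    if score >= 121:
        return "Sufficient"
    return "Bad"

assert rating(280) == "Good"  # e.g. the winning vendor in the procurement example
assert rating(95) == "Bad"
```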
Input formats
Any DCAT catalogue: Turtle (.ttl), RDF/XML (.rdf), JSON-LD (.jsonld), N-Triples (.nt). Accepts URLs (fetched server-side), pasted content, or file uploads. CKAN catalogues work via their /catalog.ttl endpoint.
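Format auto-detection can be as simple as mapping the file extension to an rdflib parser name; the sketch below shows that mapping, though the tool's actual detection logic (e.g. content sniffing for pasted text) is not documented here.

```python
# Sketch of extension-based format detection for the accepted RDF serialisations.
# The tool's actual auto-detection may differ (e.g. content sniffing).
from rdflib import Graph

RDF_FORMATS = {
    ".ttl": "turtle",
    ".rdf": "xml",
    ".jsonld": "json-ld",
    ".nt": "nt",
}

def parse_catalogue(path: str) -> Graph:
    suffix = "." + path.rsplit(".", 1)[-1].lower()
    return Graph().parse(path, format=RDF_FORMATS.get(suffix, "turtle"))

g = parse_catalogue("catalogue.jsonld")
print(len(g), "triples")
```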
Scoring modes
Deterministic (325 points): Metadata-only analysis. Checks property presence, controlled vocabulary usage, format declarations. No network requests. Fast.
Reachability (80 points): HTTP verification of dcat:accessURL and dcat:downloadURL. 5 concurrent workers, 10-second timeout per URL. SSRF protection blocks private IP ranges.
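The SSRF guard amounts to resolving each hostname and refusing anything that lands in a private, loopback, or link-local range before any request is sent. A sketch using the standard library; the tool's exact rules may be stricter.

```python
# Sketch of the SSRF guard: resolve the host and refuse private/loopback/link-local
# addresses before any reachability request is made. The exact rules are an assumption.
import socket
import ipaddress
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            return False
    return True

print(is_safe_url("https://data.example.gov/file.csv"))          # True if it resolves publicly
print(is_safe_url("http://169.254.169.254/latest/meta-data/"))   # False: link-local
```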
SHACL validation
Detects DCAT-AP 3.0 and DCAT-AP-SE 3.0.1 profiles. Validates against bundled shapes via pyshacl. 30 interoperability points for full conformance.
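The conformance check follows pyshacl's standard calling convention; a sketch, with the shapes file path standing in for the bundled DCAT-AP 3.0 shapes:

```python
# Sketch of a DCAT-AP conformance check with pyshacl.
# The shapes file path is a placeholder for the bundled DCAT-AP 3.0 shapes.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse("catalogue.ttl", format="turtle")
shapes = Graph().parse("dcat-ap-3.0-shapes.ttl", format="turtle")

conforms, results_graph, results_text = validate(data, shacl_graph=shapes)
print("conforms:", conforms)
if not conforms:
    print(results_text)
```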
API endpoints
POST /api/mqa/score/ — score catalogue content (JSON body: {content, format}).
POST /api/mqa/reachability/ — check a URL list (JSON body: {urls}).
GET /api/mqa/comparison/ — EU country benchmark data.
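As a short usage example for the reachability endpoint (the base URL and the shape of the response are assumptions):

```python
# Sketch of calling the reachability endpoint for a handful of distribution URLs.
# The base URL and the shape of the response are assumptions.
import requests

resp = requests.post(
    "https://mqa.example.org/api/mqa/reachability/",
    json={"urls": [
        "https://data.example.gov/air-quality/2023.csv",
        "https://data.example.gov/air-quality/2022.csv",
    ]},
)
resp.raise_for_status()
print(resp.json())
```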
Export formats
PDF (HTML-rendered), CSV (tabular), JSON (structured). All include full dimension breakdown, per-dataset scores, gaps, and recommendations.
Architecture
Django backend, stateless computation. No database required for scoring — catalogue parsed in memory. Self-hostable via Docker or WSGI. Open source, licence TBC.