Software Database

Status: Active (started December 2024)

Find replacements for foreign-controlled tools: European and open-source alternatives, mapped to common proprietary tools.

What It Does

The Software Database aggregates software from multiple government-vetted and expert-curated sources into a unified, searchable catalog. Each product is deduplicated, categorized across four taxonomies, and enriched with vendor information, mappings to alternatives, and sovereignty assessments.

Data Sources

We prioritize government-vetted catalogs for quality and legal compliance:

| Source | Products | Publisher | Trust Level |
|---|---|---|---|
| SILL France | 626 | DINUM - French Government | Government-vetted |
| CNCF Cloud Native Landscape | 1,358 | Cloud Native Computing Foundation | Foundation-backed |
| Euro Stack | 1,103 | Abilian SAS / EuroStack Initiative (FR) | Expert-curated |
| OpenAlternative | 649 | OpenAlternative | Community-curated |
| Cloud Service Map | 582 | Google Cloud | Expert-curated |
| switching.software | 130 | Datenpunks (DE) | Expert-curated |

Planned Sources

| Source | Publisher | Trust Level |
|---|---|---|
| openCode / ZenDiS | German Government (ZenDiS) | Government-vetted |
| Developers Italia | Italian Government | Government-vetted |
| EU OSS Catalogue | European Commission | Government-vetted |

The Challenges

Deduplication: One Product, Many Identities

The same software appears across catalogs with different identifiers. For example, Ansible appears as ansible, red-hat-ansible, and ansible-automation-platform depending on the source. We maintain a canonical registry with 942 product mappings that resolve duplicates across the 4,400+ raw entries from all sources.
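A minimal sketch of how a canonical registry lookup like this might work. The three Ansible identifiers come from the example above; which source uses which identifier, the source keys (`cncf`, `eurostack`, `sill`), and the function names are assumptions made for illustration, not the project's actual schema.

```python
# Hypothetical registry: (source, source-specific ID) -> canonical ID.
# The assignment of identifiers to sources is assumed for illustration.
CANONICAL_REGISTRY = {
    ("cncf", "ansible"): "ansible",
    ("eurostack", "red-hat-ansible"): "ansible",
    ("sill", "ansible-automation-platform"): "ansible",
}


def resolve(source, raw_id):
    """Return the canonical ID, or the normalized raw ID if no mapping exists."""
    key = (source, raw_id.strip().lower())
    return CANONICAL_REGISTRY.get(key, key[1])


def deduplicate(entries):
    """Group raw (source, raw_id) entries under their canonical ID."""
    products = {}
    for source, raw_id in entries:
        products.setdefault(resolve(source, raw_id), []).append((source, raw_id))
    return products


raw = [
    ("cncf", "ansible"),
    ("eurostack", "red-hat-ansible"),
    ("sill", "ansible-automation-platform"),
    ("cncf", "Kubernetes"),  # unmapped: falls back to its normalized ID
]
merged = deduplicate(raw)
```

Keying the registry on the source as well as the identifier avoids collisions when two catalogs happen to use the same slug for different products.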

Category Harmonization

Each catalog uses its own category system: CNCF uses technical hierarchies, SILL France uses publiccode.yml categories, and Euro Stack groups products by business focus. We map all source categories to four standardized taxonomies:

  • Technical (43 categories): Infrastructure capabilities
  • Developer (26 categories): Development tools and frameworks
  • Business (28 categories): Use cases and departments
  • Platforms (8 categories): Operating systems and clouds

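Together, these taxonomies define 105 standardized categories, and mapping every source category onto them takes more than 10,415 lines of category mappings.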
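The mapping step could be sketched as a lookup table from each source category to one or more standardized categories. Every category name below, the source keys, and the `harmonize` function are hypothetical placeholders; only the four taxonomy names come from the list above.

```python
TAXONOMIES = ("technical", "developer", "business", "platforms")

# Hypothetical mapping rows:
# (source, source category) -> list of (taxonomy, standardized category)
CATEGORY_MAP = {
    ("cncf", "Automation & Configuration"): [
        ("technical", "configuration-management"),
    ],
    ("sill", "it-management"): [
        ("technical", "configuration-management"),
        ("business", "it-operations"),
    ],
    ("eurostack", "DevOps"): [
        ("developer", "ci-cd"),
    ],
}


def harmonize(source, source_categories):
    """Map one product's source categories onto the four taxonomies.

    Returns the standardized categories per taxonomy, plus the list of
    source categories that have no mapping yet.
    """
    result = {taxonomy: set() for taxonomy in TAXONOMIES}
    unmapped = []
    for category in source_categories:
        targets = CATEGORY_MAP.get((source, category))
        if targets is None:
            unmapped.append(category)
            continue
        for taxonomy, standardized in targets:
            result[taxonomy].add(standardized)
    return result, unmapped


categories, missing = harmonize("sill", ["it-management", "unknown-category"])
```

Returning the unmapped categories separately makes gaps in the mapping table visible instead of silently dropping them.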

Data Enrichment

Raw catalog data lacks context. We enrich products using Wikidata for vendor information (headquarters, founding date) and web search to validate descriptions and find missing homepages.
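As an illustration of the Wikidata step, the sketch below builds a SPARQL query for a vendor's headquarters and founding date and the corresponding request URL for the public Wikidata Query Service. It assumes the vendor's Wikidata item ID is already known and uses the Wikidata properties P159 (headquarters location) and P571 (inception). The function name and the item ID are placeholders, and nothing here is confirmed to match the project's actual enrichment code.

```python
import re
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_vendor_query(qid):
    """Build a SPARQL query for headquarters (P159) and inception (P571)."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"""
SELECT ?hqLabel ?inception WHERE {{
  OPTIONAL {{ wd:{qid} wdt:P159 ?hq. }}
  OPTIONAL {{ wd:{qid} wdt:P571 ?inception. }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


# Placeholder item ID for illustration only, not a real vendor lookup.
query = build_vendor_query("Q12345")
request_url = f"{WIKIDATA_SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"
```

The sketch stops at building the request, so it runs without network access. Fetching `request_url` and parsing the JSON response would be the next step.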

Goal: Fully Automated Pipeline

The target is a weekly automated pipeline that scrapes all sources, auto-suggests canonical ID mappings using fuzzy matching, and publishes the updated catalog. Conflicts are flagged for human review without blocking the pipeline.
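One way the auto-suggestion step might look, sketched with the standard library's `difflib.SequenceMatcher`. The thresholds, status labels, and function names are assumptions chosen for illustration. Uncertain matches go to a review queue while the run continues, which mirrors the non-blocking behavior described above.

```python
import re
from difflib import SequenceMatcher

# Hypothetical thresholds; real values would need tuning on actual data.
AUTO_ACCEPT = 0.9
NEEDS_REVIEW = 0.6


def normalize(identifier):
    """Lowercase and collapse punctuation so 'Red-Hat_Ansible' ~ 'red hat ansible'."""
    return re.sub(r"[^a-z0-9]+", " ", identifier.lower()).strip()


def suggest(raw_id, canonical_ids):
    """Return (best canonical ID, similarity score, status) for a raw identifier."""
    best_id, best_score = None, 0.0
    for canonical in canonical_ids:
        score = SequenceMatcher(None, normalize(raw_id), normalize(canonical)).ratio()
        if score > best_score:
            best_id, best_score = canonical, score
    if best_score >= AUTO_ACCEPT:
        status = "auto"
    elif best_score >= NEEDS_REVIEW:
        status = "review"
    else:
        best_id, status = None, "new"
    return best_id, best_score, status


def run(raw_ids, canonical_ids):
    """Apply confident matches, queue uncertain ones, and never stop the run."""
    accepted, review_queue = {}, []
    for raw_id in raw_ids:
        match, score, status = suggest(raw_id, canonical_ids)
        if status == "auto":
            accepted[raw_id] = match
        elif status == "review":
            review_queue.append((raw_id, match, score))
        else:
            accepted[raw_id] = normalize(raw_id).replace(" ", "-")  # treat as new product
    return accepted, review_queue


accepted, review_queue = run(
    ["Ansible", "red-hat-ansible", "zzz-qqq"], ["ansible", "kubernetes"]
)
```

In this example, `Ansible` is accepted automatically, `red-hat-ansible` lands in the review queue with `ansible` as the suggested match, and `zzz-qqq` is treated as a new product.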