SatcomLLM

Future Preparation

Status

Ongoing
Status date

2025-10-24
Activity Code

1A.128

Objectives

The SatcomLLM project investigates how open-source Large Language Models can be adapted to satellite communications through the development of the satellite communications Expert Virtual Assistant (SCEVA). SCEVA serves as a demonstrator system, integrating instruction fine-tuning with retrieval-augmented generation (RAG) to provide document-grounded responses for technical and operational tasks.

The objectives are to identify key satellite communication applications where LLMs can add value, such as engineering support, mission documentation, and anomaly analysis, and to prioritise these use cases in collaboration with ESA and industry experts. The project builds a curated domain-specific corpus and tailored evaluation benchmarks to guide fine-tuning and testing.

SCEVA and its RAG-enabled variants will be developed and assessed to showcase practical interaction with satellite communications datasets and to evaluate performance, reliability, and usability for engineers, mission planners, and SMEs. Comparative evaluation with commercial LLMs will provide further insights into strengths, limitations, and opportunities for future development.

By releasing models, datasets, and software components under open licenses, SatcomLLM contributes to European digital sovereignty and supports the Advanced Rsearch TES programme in advancing innovative tools for the satellite communications sector.

Challenges

Key challenges include keeping pace with fast-evolving AI models and methods, while ensuring their effective adaptation to satellite communications domain. The project must design evaluation benchmarks specific to satcom tasks and assemble high-quality datasets validated by domain experts. Another challenge is developing SCEVA with retrieval-augmented generation to deliver accurate, document-grounded answers, while maintaining usability, reliability, and deployability for ESA and European stakeholders.

Benefits

SatcomLLM delivers a tailored Large Language Model for satellite communications, addressing limitations of existing general-purpose or proprietary AI systems. While tools like ChatGPT or Gemini offer broad capabilities, they are not adapted to satcom workflows and raise concerns over data confidentiality, licensing, and reliability in specialised domains.

The SatCom Expert Virtual Assistant (SCEVA), integrates instruction fine-tuning on curated satcom datasets with retrieval-augmented generation (RAG), enabling verifiable, document-grounded responses. This ensures higher accuracy and relevance when supporting tasks such as link budget evaluation, engineering analysis, mission documentation, or regulatory checks. Unlike proprietary systems, SCEVA can be deployed in controlled environments, whether on cloud, local servers, or edge platforms, giving users full control over data handling.

An additional advantage is openness: models, benchmarks, and software developed within SatcomLLM will be released under open licenses where possible. This strengthens European digital sovereignty, reduces dependence on non-European providers, and supports transparent validation by the wider satcom community. By combining domain adaptation, technical reliability, and open accessibility, SatcomLLM provides a specialised alternative to competitor systems, offering both practical utility for stakeholders and a foundation for future innovation within ESA’s Advanced Research Telecommunications Systems (ARTES) programme.

Features

The SatcomLLM project delivers the SatCom Expert Virtual Assistant (SCEVA), a domain-specific system built on an open-source LLM adapted to satellite communications. At its foundation is SatcomLLM, a ~70B parameter model fine-tuned with a curated satcom dataset of around 160,000 documents, openly released to benefit both the satellite communication and AI research communities.

SCEVA provides access to the model through an API and a web-based interface, supporting tasks such as question answering, technical summarisation, and knowledge exploration. Expert users can further fine-tune the system with proprietary datasets, enabling custom adaptations while retaining control over data.

To enhance factual reliability, SCEVA-RAG integrates retrieval-augmented generation, linking responses to a dedicated document store that can be populated with user-provided materials. For ESA, SCEVA-RAG-ARTES extends this capability by including a version pre-populated with ARTES 4.0 documentation, ensuring responses are grounded in authoritative programme materials.

Together, these components form a versatile tool for communication, research, and presentation of satcom advances. By combining openness, domain-specific fine-tuning, and extensibility through RAG, SatcomLLM equips European stakeholders with a robust foundation to explore and operationalise LLM technology in satellite communications.

System Architecture

Our system centres on two instruction fine-tuning tracks of Llama models, 8B and 70B.

Training happens in two steps. First, we fine-tune on hundreds of thousands of satcom related question answer pairs drawn from broad domain content to teach solid terminology and workflows. Second, we fine tune on a smaller, carefully reviewed set that focuses on reasoning and multi step problem solving, so the models learn to plan, justify, and compute.

A data pipeline handles ingestion, cleaning, deduplication, and labelling, then formats examples for training. Retrieval augmented generation is available at inference time, so answers can cite trusted documents when needed. We evaluate with automatic metrics and expert review on satellite communication tasks like link budgets, protocol choices, and mission planning.

Deployment supports API and web app access, with logging, guardrails, and continuous evaluation. The objective is clear, these models should be the strongest satcom specialists in their parameter class, 8B and 70B.

Plan

The project begins with use-case identification, gathering input from ESA, RINA, and external stakeholders to prioritise applications. Next, benchmarking evaluates state-of-the-art open-source LLMs against satcom-specific tasks. In development, the SatcomLLM model is fine-tuned on a curated dataset of 160,000 documents and integrated into SCEVA, with RAG variants prepared. Validation (T4) includes demonstrations, user testing, and comparative evaluation.

Key Milestones (MS):

MS1 - Use cases and benchmarks;
MS2 - Model adaptation and prototype;
Final Review - SCEVA and SCEVA-RAG demonstration with ESA feedback.

Current status

The project is now in the iterative training phase, where models are retrained multiple times with different hyperparameter settings. Each iteration includes improved data curation and evaluation to refine model quality. A vector database of over 100,000 satcom documents has been created to support retrieval and grounding. The frontend and backend systems are being finalised to ensure stable internal use by the ESA technical officer for testing and evaluation.