Institut für Intelligente Systeme (IIS)
The trajectory prediction task in autonomous driving (AD) involves forecasting the future movements
and behaviors of road users, or agents, based on their current and past states, as well as
their surrounding environment. Recent advancements in Large Language Models (LLMs) have
demonstrated impressive capabilities in generalizing across various tasks, presenting a potential
opportunity to apply them in AD. This thesis explores the feasibility and effectiveness of using
LLMs for trajectory prediction, presenting and evaluating multiple approaches. One method
employs a text-only strategy, where all relevant information, including the states of agents and the
environment, is encoded as text input for the LLM. Another approach transforms the LLM into a
Vision Language Model (VLM) that consumes images encoding the environment, particularly lanes.
The evaluation is conducted using the nuScenes dataset and associated metrics. Results
reveal that the text-only LLM surpasses the CoverNet baseline model by up to 2% in
multimodal prediction. However, effectively encoding complex environmental factors, such as lanes,
remains a challenge. To address this, we experiment with a custom VLM. Despite our efforts, this
does not lead to improved performance, indicating that further refinement is required. Additionally,
we explore an alternative strategy where the LLM is tasked with ranking or selecting the most
plausible trajectories from a set of predictions generated by existing state-of-the-art trajectory
prediction models. However, this approach also does not yield an improvement in performance and
requires further investigation to be successful.
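To make the text-only strategy concrete, the sketch below shows one plausible way to serialize agent states and lane polylines into a prompt for an LLM. It is an illustrative Python sketch, not the thesis's actual implementation; the names (AgentState, build_prompt) and the 0.5 s step with a 6 s horizon (matching nuScenes prediction conventions) are assumptions.

```python
# Illustrative sketch of a text-only encoding for trajectory prediction.
# All names and formats are assumptions, not the thesis's actual code.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AgentState:
    x: float   # position in metres, ego-centric frame
    y: float
    vx: float  # velocity components in m/s
    vy: float

def encode_history(history: List[AgentState]) -> str:
    """One line per past timestep; assumes 2 Hz sampling (0.5 s steps)."""
    return "\n".join(
        f"t={0.5 * (i + 1 - len(history)):+.1f}s: "
        f"pos=({s.x:.1f}, {s.y:.1f}) vel=({s.vx:.1f}, {s.vy:.1f})"
        for i, s in enumerate(history)
    )

def build_prompt(history: List[AgentState],
                 lanes: List[List[Tuple[float, float]]]) -> str:
    """Combine the target agent's history and lane polylines into one prompt."""
    lane_txt = "\n".join(
        "lane: " + " ".join(f"({x:.1f},{y:.1f})" for x, y in lane)
        for lane in lanes
    )
    return (
        "You are a trajectory prediction model for autonomous driving.\n"
        "Past states of the target agent (ego-centric coordinates):\n"
        f"{encode_history(history)}\n"
        "Nearby lane centerlines as polylines:\n"
        f"{lane_txt}\n"
        "Predict the agent's (x, y) positions for the next 6 seconds "
        "at 0.5 s intervals, one waypoint per line."
    )
```

The model's textual reply would then be parsed back into numeric waypoints and scored with the standard nuScenes prediction metrics.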
This master’s thesis tackles the challenging task of indoor localization using high-resolution
synthetic aperture radar images. It evaluates different approaches to localization built
around the detector-descriptor architecture.
The proposed pipeline acts as a flexible backbone for implementing different detection,
description, and matching algorithms. By leveraging typical characteristics of radar images
through dedicated, radar-specific keypoint detectors, the pipeline detects reliable
keypoints for tracking. Through the use of a state-of-the-art standalone machine-learning
descriptor model, Detect, Don't Describe - Describe, Don't Detect (DeDoDe), the pipeline
reliably matches keypoints across different radar images. Since DeDoDe can operate on
any kind of image, the pipeline is not limited to radar images and can be used with
other types of inputs as well.
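As context for how such a backbone might be organized, the following is a minimal structural sketch of a pluggable detector-descriptor-matcher pipeline. The Protocol names and the run_pipeline function are illustrative assumptions, not the thesis's actual interfaces.

```python
# Minimal structural sketch (illustrative names) of a pluggable
# detector-descriptor-matcher pipeline for radar images.

from typing import Protocol, Tuple
import numpy as np

class Detector(Protocol):
    def detect(self, image: np.ndarray) -> np.ndarray:
        """Return keypoints as an (N, 2) array of (row, col) coordinates."""

class Descriptor(Protocol):
    def describe(self, image: np.ndarray, keypoints: np.ndarray) -> np.ndarray:
        """Return an (N, D) array of descriptors, one per keypoint."""

class Matcher(Protocol):
    def match(self, desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
        """Return an (M, 2) array of index pairs (i_a, i_b)."""

def run_pipeline(img_a: np.ndarray, img_b: np.ndarray, detector: Detector,
                 descriptor: Descriptor, matcher: Matcher
                 ) -> Tuple[np.ndarray, np.ndarray]:
    """Detect, describe, and match keypoints across two radar images."""
    kps_a = detector.detect(img_a)
    kps_b = detector.detect(img_b)
    desc_a = descriptor.describe(img_a, kps_a)
    desc_b = descriptor.describe(img_b, kps_b)
    pairs = matcher.match(desc_a, desc_b)
    # Matched coordinates, ready e.g. for RANSAC-based pose estimation.
    return kps_a[pairs[:, 0]], kps_b[pairs[:, 1]]
```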
The pipeline parameters are adapted and evaluated on a custom dataset of Synthetic Aperture
Radar (SAR) images generated by four synchronized radar sensors mounted on an Unmanned
Ground Vehicle (UGV) in an indoor environment. By combining the Constant False
Alarm Rate (CFAR) algorithm with the DeDoDe descriptor, this thesis proposes a
radar-visual hybrid approach to localization based on radar images.
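As a rough illustration of the detector side, here is a minimal cell-averaging CFAR (CA-CFAR) sketch that flags cells whose magnitude exceeds a scaled estimate of the local noise level. The specific CFAR variant, window sizes, and threshold scale used in the thesis are not stated here, so all parameters below are assumptions.

```python
# Illustrative cell-averaging CFAR sketch; a naive O(H*W*window) reference
# implementation for clarity, not speed. All parameters are assumptions.

import numpy as np

def ca_cfar(magnitude: np.ndarray, guard: int = 2, train: int = 8,
            scale: float = 3.0) -> np.ndarray:
    """Flag cells exceeding scale * mean magnitude of the training cells."""
    h, w = magnitude.shape
    r = guard + train          # half-width of the full CFAR window
    peaks = []
    for i in range(r, h - r):
        for j in range(r, w - r):
            window = magnitude[i - r:i + r + 1, j - r:j + r + 1].astype(float)
            # Mask out the guard region and the cell under test.
            window[train:train + 2 * guard + 1,
                   train:train + 2 * guard + 1] = np.nan
            noise = np.nanmean(window)   # local noise estimate
            if magnitude[i, j] > scale * noise:
                peaks.append((i, j))
    return np.array(peaks)
```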
On the custom dataset, the pipeline acts as a black box with a Random Sample Consensus
(RANSAC) threshold as its sole tunable parameter. The architecture is designed to
provide a solid base for loop-closure detection in radar Simultaneous
Localization And Mapping (SLAM) systems.
Since the DeDoDe descriptors already perform strongly without any retraining or
fine-tuning, the pipeline works well out of the box. When estimating the relative
transformation between two correlated SAR images, a median Euclidean translational
error of less than 1 cm was achieved across all scenarios.
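For the relative-estimation step, a minimal RANSAC sketch for a pure 2D translation model is shown below. The threshold default and the one-point translation model are illustrative assumptions; the thesis exposes the RANSAC threshold as the pipeline's only tunable parameter, but its actual motion model may also include rotation.

```python
# Illustrative RANSAC sketch for estimating a 2D translation between
# matched keypoint sets; parameter values are assumptions.

import numpy as np

def ransac_translation(pts_a: np.ndarray, pts_b: np.ndarray,
                       threshold: float = 0.05, iters: int = 500,
                       seed: int = 0) -> np.ndarray:
    """Estimate a 2D translation from matched points.

    pts_a, pts_b: (N, 2) arrays of matched coordinates in metres.
    threshold: inlier distance in metres (the pipeline's tunable parameter).
    """
    rng = np.random.default_rng(seed)
    best_t, best_count = np.zeros(2), 0
    for _ in range(iters):
        k = rng.integers(len(pts_a))     # one match fixes a pure translation
        t = pts_b[k] - pts_a[k]
        inliers = np.linalg.norm(pts_a + t - pts_b, axis=1) < threshold
        count = int(inliers.sum())
        if count > best_count:
            best_count = count
            # Refit on all inliers for a less noisy estimate.
            best_t = (pts_b[inliers] - pts_a[inliers]).mean(axis=0)
    return best_t
```

Under this reading, the reported median Euclidean translational error would be the median, over all evaluated image pairs, of the distance between the estimated and ground-truth translations.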