The Genshin Impact Dataset (GID) is collected from the Genshin Impact game[1] for visual SLAM. It currently consists of 60 individual sequences (over 3 hours in total) and covers a wide range of scenes that are rare, hard, or dangerous to capture by field collection in the real world (such as dull deserts, dim caves, and lush jungles). It provides great opportunities for SLAM evaluation and benchmarking. Moreover, it includes a large number of visual challenges (such as low-illumination and low-texture scenes) for testing the robustness of SLAM algorithms. It is part of our work How Challenging is a Challenge? CEMS: a Challenge Evaluation Module for SLAM Visual Perception.
If you use any resource from this dataset, please cite the paper as:
BibTeX
@article{Zhao2024CEMS,
title={How Challenging is a Challenge? CEMS: a Challenge Evaluation Module for SLAM Visual Perception},
author={Xuhui Zhao and Zhi Gao and Hao Li and Hong Ji and Hong Yang and Chenyang Li and Hao Fang and Ben M. Chen},
journal={Journal of Intelligent \& Robotic Systems},
year={2024},
volume={110},
number={42},
pages={1--19},
doi={10.1007/s10846-024-02077-4}
}
APA
Zhao, X., Gao, Z., Li, H., Ji, H., Yang, H., Li, C., Fang, H., & Chen, B. M. (2024). How Challenging is a Challenge? CEMS: a Challenge Evaluation Module for SLAM Visual Perception. Journal of Intelligent & Robotic Systems, 110(42), 1–19. https://doi.org/10.1007/s10846-024-02077-4
The dataset is generally composed of two parts: sequences (blue part) and support files (orange part), as the following figure shows.
In the sequences part, each sequence contains several files for convenience of use. We take Seq-001 as an example and elaborate below.
- Seq-001.mp4: The recorded video from the Genshin Impact game, which can be further processed according to different needs. It has a resolution of 1436 (width) × 996 (height) at 30 FPS.
- Seq-001.png: A content preview of the recorded video for a quick overview without playing it. It summarizes the resolution (width × height), duration (sec), FPS, and total number of frames.
- Frames-Sparse: A folder storing frames split from the recorded video. For the convenience of end users, we split the whole video in advance with a frame interval of 10 (extracting 1 frame every 10 frames).
- Groundtruth-EuRoC.txt: For the convenience of users, we provide groundtruth poses of the split frames in both EuRoC and TUM formats. This file records poses in the EuRoC[2] format: timestamp[ns], pos_x[m], pos_y[m], pos_z[m], quat_w, quat_x, quat_y, quat_z
- Groundtruth-TUM.txt: This file records poses in the TUM[3] format: timestamp[s] pos_x[m] pos_y[m] pos_z[m] quat_x quat_y quat_z quat_w
- Timestamps.txt: This file stores the corresponding timestamps of the split frames in the Frames-Sparse folder. The time unit is nanoseconds (10⁻⁹ s); see the loading sketch after this list.
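Both groundtruth files are plain text and straightforward to parse. The following is a minimal loading sketch with NumPy (the file paths are illustrative, following the Seq-001 example above):

```python
# Minimal sketch: load the TUM-format groundtruth and the frame timestamps with NumPy.
import numpy as np

# Each row: timestamp[s] pos_x pos_y pos_z quat_x quat_y quat_z quat_w
gt = np.loadtxt("Seq-001/Groundtruth-TUM.txt", comments="#")
timestamps_s = gt[:, 0]     # frame timestamps [s]
positions = gt[:, 1:4]      # camera positions [m]
quaternions = gt[:, 4:8]    # orientations as (qx, qy, qz, qw)

# Timestamps.txt stores nanoseconds; convert to seconds to match the TUM file.
frame_times_s = np.loadtxt("Seq-001/Timestamps.txt") * 1e-9
```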
The support files part contains the camera intrinsics and tool scripts.
- Intrinsics.yaml: This file records the focal lengths (fx and fy) and the principal point (cx and cy) of the pinhole camera model we use. It is organized in the standard YAML format, which makes data input and output easy.
- tool-splitVideo.py: This Python script splits the original video into separate frames according to user settings. Its only launch parameter is the path of the video you want to process; the other parameters are set interactively. All interactive parameters are summarized below (an illustrative sketch of the same functionality follows this list):
  - Clipping start time: start timestamp of clipping, unit: second, default: 0 s
  - Clipping end time: end timestamp of clipping, unit: second, default: the end of the whole video
  - Sampling interval N: sample one frame every N frames, default: output every frame
  - Scale for output frames: scale factor for output frame images, default: 1 (original size)
  - Type for output frames: file type for output frame images, default: .jpg
  - Name format for frames: naming format for output frame images, chosen from Timestamp format (12 digits representing the timestamp in nanoseconds) and Frame index format (4 digits representing the frame index in the original video), default: Timestamp format
- tool-resizeFrames.py: This Python script resizes existing frame images. It requires three launch parameters:
  - Search folder: the folder containing the frames to be processed
  - Image type: the file type of the images in the folder
  - Scale: the scale factor for resizing
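To make the splitting behavior concrete, here is an independent sketch of the same idea with OpenCV. It is not the bundled tool-splitVideo.py; the function name, defaults, and paths are chosen purely for illustration.

```python
# Illustrative sketch (not the bundled tool-splitVideo.py): sample one frame every
# N frames, optionally rescale it, and name it with a 12-digit nanosecond timestamp.
import os
import cv2

def split_video(video_path, out_dir, interval=10, scale=1.0, ext=".jpg"):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % interval == 0:
            if scale != 1.0:
                frame = cv2.resize(frame, None, fx=scale, fy=scale)
            t_ns = int(round(index / fps * 1e9))  # frame timestamp in nanoseconds
            cv2.imwrite(os.path.join(out_dir, f"{t_ns:012d}{ext}"), frame)
        index += 1
    cap.release()

split_video("Seq-001.mp4", "Frames-Sparse", interval=10)
```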
We collect sequences at different places in the Genshin Impact game to cover as wide a range of scenes as possible. Generally, each country in the game (Mondstadt, Liyue, Inazuma, and Sumeru) has 15 sequences to reflect its unique features. More specifically, the sequences are distributed as follows:
- Sequences 1–15 are collected in Mondstadt
- Sequences 16–30 are collected in Liyue
- Sequences 31–45 are collected in Inazuma
- Sequences 46–60 are collected in Sumeru
The following figure shows the distribution of sequences in different regions. You may click the figure and zoom in to see the details since the world map is very large.
Benefiting from the large and diverse game world, the sequences in GID also have great diversity, which we summarize in the following aspects.
Scene The dataset involves a wide range of scenes, including deserts, caves, jungles, and so on. The following figure shows some types of scenes. For example, users can test the robustness of their SLAM systems to low-light conditions in the dim cave scenes.
Time The sequences in GID generally cover a whole day, from morning to afternoon and night. This potentially enables experiments for SLAM in changing illumination conditions. The following figure shows the coverage of a whole day.
Weather The dataset includes various weather conditions, such as clear, cloudy, and rainy scenes. The following figure shows some examples of different weather conditions.
Visual Challenges for SLAM The dataset contains various visual challenges for SLAM algorithms, such as low-light and low-texture scenes. Sequences with these challenges may boost the development and benchmarking of visual SLAM in challenging environments. The following figure shows some representative challenges in the dataset.
Duration The sequences cover a wide range of durations, from 59 seconds (Seq-042) to 333 seconds (Seq-049 & Seq-058), which makes it possible to test the scalability of SLAM. The following figure shows the distribution of sequence durations.
We upload all 60 sequences and provide two ways to download the dataset: Google Drive and Baidu Netdisk. You can click Google Drive or Baidu Netdisk to download the whole dataset (about 22 GB in total), depending on your network environment. Alternatively, you can download individual sequences by clicking the corresponding links in the following table.
Seq. No | Region | Duration (sec) | Preview | Google Drive | Baidu Netdisk |
---|---|---|---|---|---|
Seq-001 | Mondstadt | 102 | Link | Link | |
Seq-002 | Mondstadt | 280 | Link | Link | |
Seq-003 | Mondstadt | 170 | Link | Link | |
Seq-004 | Mondstadt | 120 | Link | Link | |
Seq-005 | Mondstadt | 177 | Link | Link | |
Seq-006 | Mondstadt | 142 | Link | Link | |
Seq-007 | Mondstadt | 140 | Link | Link | |
Seq-008 | Mondstadt | 130 | Link | Link | |
Seq-009 | Mondstadt | 129 | Link | Link | |
Seq-010 | Mondstadt | 182 | Link | Link | |
Seq-011 | Mondstadt | 209 | Link | Link | |
Seq-012 | Mondstadt | 231 | Link | Link | |
Seq-013 | Mondstadt | 123 | Link | Link | |
Seq-014 | Mondstadt | 150 | Link | Link | |
Seq-015 | Mondstadt | 293 | Link | Link | |
Seq-016 | Liyue | 294 | Link | Link | |
Seq-017 | Liyue | 191 | Link | Link | |
Seq-018 | Liyue | 288 | Link | Link | |
Seq-019 | Liyue | 175 | Link | Link | |
Seq-020 | Liyue | 177 | Link | Link | |
Seq-021 | Liyue | 322 | Link | Link | |
Seq-022 | Liyue | 238 | Link | Link | |
Seq-023 | Liyue | 158 | Link | Link | |
Seq-024 | Liyue | 163 | Link | Link | |
Seq-025 | Liyue | 241 | Link | Link | |
Seq-026 | Liyue | 326 | Link | Link | |
Seq-027 | Liyue | 257 | Link | Link | |
Seq-028 | Liyue | 104 | Link | Link | |
Seq-029 | Liyue | 286 | Link | Link | |
Seq-030 | Liyue | 269 | Link | Link | |
Seq-031 | Inazuma | 172 | Link | Link | |
Seq-032 | Inazuma | 110 | Link | Link | |
Seq-033 | Inazuma | 249 | Link | Link | |
Seq-034 | Inazuma | 77 | Link | Link | |
Seq-035 | Inazuma | 268 | Link | Link | |
Seq-036 | Inazuma | 235 | Link | Link | |
Seq-037 | Inazuma | 152 | Link | Link | |
Seq-038 | Inazuma | 252 | Link | Link | |
Seq-039 | Inazuma | 231 | Link | Link | |
Seq-040 | Inazuma | 98 | Link | Link | |
Seq-041 | Inazuma | 129 | Link | Link | |
Seq-042 | Inazuma | 59 | Link | Link | |
Seq-043 | Inazuma | 133 | Link | Link | |
Seq-044 | Inazuma | 155 | Link | Link | |
Seq-045 | Inazuma | 64 | Link | Link | |
Seq-046 | Sumeru | 72 | Link | Link | |
Seq-047 | Sumeru | 191 | Link | Link | |
Seq-048 | Sumeru | 208 | Link | Link | |
Seq-049 | Sumeru | 333 | Link | Link | |
Seq-050 | Sumeru | 219 | Link | Link | |
Seq-051 | Sumeru | 146 | Link | Link | |
Seq-052 | Sumeru | 237 | Link | Link | |
Seq-053 | Sumeru | 147 | Link | Link | |
Seq-054 | Sumeru | 213 | Link | Link | |
Seq-055 | Sumeru | 79 | Link | Link | |
Seq-056 | Sumeru | 186 | Link | Link | |
Seq-057 | Sumeru | 150 | Link | Link | |
Seq-058 | Sumeru | 333 | Link | Link | |
Seq-059 | Sumeru | 200 | Link | Link | |
Seq-060 | Sumeru | 190 | Link | Link |
All the sequences are collected with fixed and consistent camera settings. The computer used for data collection is equipped with an Intel Core i9-9900K CPU, 64GB RAM, and an NVIDIA Titan RTX GPU. We first record videos from the Genshin Impact game, where the videos are saved in .mkv
format. The original resolution of the recorded video is 1920 (width) × 1200 (height) at 30 FPS, as the following figure shows.
Then, we write Python scripts to split the recorded videos into frames and save them in .jpg
format, where we sample 1 frame every 10 frames. Moreover, we simultaneously crop the frame images to 1436 × 996 to remove unrelated parts of the original videos. The following figure shows the cropped output frames of the Seq-046 sequence.
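As a rough illustration of the crop step, the sketch below assumes a centered 1436 × 996 window; the exact crop offsets used for the dataset are not specified here, so treat them as placeholders.

```python
# Hypothetical crop sketch: a centered 1436 x 996 window from a 1920 x 1200 frame.
import cv2

frame = cv2.imread("raw_frame.jpg")    # a 1920 x 1200 frame from the .mkv recording
h, w = 996, 1436
y0 = (frame.shape[0] - h) // 2         # 102 for a centered crop (assumed offset)
x0 = (frame.shape[1] - w) // 2         # 242 for a centered crop (assumed offset)
cv2.imwrite("cropped_frame.jpg", frame[y0:y0 + h, x0:x0 + w])
```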
To obtain precise camera poses, we use the ColMap software[4] for groundtruth estimation and 3D reconstruction. We input all the frames of a sequence into ColMap and obtain the camera poses and 3D points. We use the "automatic reconstruction" mode with the following parameters:
- Data type: Video frames
- Quality: Medium
- Shared intrinsics: Yes
- Sparse model: Yes
- Dense model: Yes
For the other parameters, we keep ColMap's defaults. The following figure shows the estimated camera poses and point cloud of the Seq-046 sequence in ColMap.
We can also visualize reconstructed 3D meshes with MeshLab[5] software, as the following figure shows.
After reconstruction, we export the estimated poses and trajectory from ColMap to an images.txt file, which contains the estimated camera poses. We then write Python scripts to convert the images.txt file into the aforementioned standard TUM and EuRoC formats. Moreover, we export the estimated camera intrinsics from ColMap to a cameras.txt file.
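Such a conversion can be sketched as follows (an independent illustration, not our exact script). It assumes frames are named by their 12-digit nanosecond timestamps and uses SciPy for the quaternion math; note that images.txt stores world-to-camera poses, which must be inverted into the camera-to-world poses used by the TUM format.

```python
# Illustrative converter: COLMAP-style images.txt -> TUM trajectory
# (not the exact script used for the dataset).
import numpy as np
from scipy.spatial.transform import Rotation

def colmap_images_to_tum(images_txt, out_txt):
    with open(images_txt) as f:
        lines = [l.strip() for l in f if l.strip() and not l.startswith("#")]
    rows = []
    # images.txt alternates a pose line with a 2D-point line; keep only the pose lines.
    for pose_line in lines[::2]:
        elems = pose_line.split()
        qw, qx, qy, qz = map(float, elems[1:5])        # world-to-camera rotation
        t = np.array([float(v) for v in elems[5:8]])   # world-to-camera translation
        name = elems[9]                                # e.g. 003333333333.jpg (assumed naming)
        R_wc = Rotation.from_quat([qx, qy, qz, qw]).inv()  # camera-to-world rotation
        center = -R_wc.apply(t)                            # camera position in world frame
        t_sec = int(name.split(".")[0]) * 1e-9             # timestamp from the file name
        ox, oy, oz, ow = R_wc.as_quat()
        rows.append((t_sec, *center, ox, oy, oz, ow))
    rows.sort()
    with open(out_txt, "w") as f:
        for r in rows:
            f.write(" ".join(f"{v:.9f}" for v in r) + "\n")

colmap_images_to_tum("images.txt", "Groundtruth-TUM.txt")
```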
The following figure briefly demonstrates the performance of ORB-SLAM2[6] (monocular), a classic and mature visual SLAM system, on our dataset. For the best understanding, you may click here to download and view the whole test video (50 s).
Generally, ORB-SLAM2 performs well in various scenes, even in some challenging ones, demonstrating that our dataset is suitable for running SLAM algorithms. For example, we compare the trajectory estimated for Seq-060 with the groundtruth poses using the EVO tool[7], as the following figure shows.
After scale and trajectory alignment, it can be seen that the estimated poses are generally consistent with the groundtruth. On the one hand, this demonstrates the feasibility of our dataset; on the other hand, it shows the high accuracy of the groundtruth estimated by ColMap.
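For reference, this scale-aware alignment and ATE computation can be reproduced with the EVO tool's Python API, as in the minimal sketch below (file names are placeholders; the equivalent evo_ape command with the -as flags performs the same alignment).

```python
# Minimal ATE evaluation sketch with the evo package (pip install evo);
# file names are placeholders.
from evo.core import metrics, sync
from evo.tools import file_interface

traj_ref = file_interface.read_tum_trajectory_file("Groundtruth-TUM.txt")
traj_est = file_interface.read_tum_trajectory_file("orbslam2_trajectory_tum.txt")

# Associate poses by timestamp, then align with scale correction (Sim(3)),
# since the groundtruth carries no absolute scale.
traj_ref, traj_est = sync.associate_trajectories(traj_ref, traj_est, max_diff=0.02)
traj_est.align(traj_ref, correct_scale=True)

ape = metrics.APE(metrics.PoseRelation.translation_part)
ape.process_data((traj_ref, traj_est))
print("ATE RMSE [m]:", ape.get_statistic(metrics.StatisticsType.rmse))
```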
Q1: What are the advantages of this dataset compared with field-collected sequences and sequences from simulation platforms?
Answer:
- Compared with field-collected sequences, our dataset contains more diverse scenes for SLAM tests. Moreover, many scenes in the dataset would be difficult or dangerous to capture in the real world, such as the deserts, caves, and snowy mountains.
- Compared with sequences collected in simulation environments, the proposed dataset has the following advantages.
  - The scenes in the Genshin Impact game are exquisite and beautiful. Few simulation platforms (such as Gazebo[8] and XTDrone[9]) provide such visual quality. Some sophisticated platforms (such as AirSim[10] and NVIDIA Omniverse[11]) may provide high quality, but they are usually difficult to get started with and to build your own world in.
  - It is time-consuming and laborious to build a high-quality scene in simulation software from scratch, especially a large one. In contrast, we can directly use the scenes already built in the game and collect sequences there, which is more efficient.
  - Existing simulation platforms struggle to simulate the photorealistic visual challenges we want for SLAM tests. For example, XTDrone typically cannot simulate different weather conditions, whereas we can easily record sequences containing photorealistic weather changes in the game, such as sunny, rainy, snowy, and foggy conditions.
Q2: How are the groundtruth poses estimated? What about their accuracy? How do you guarantee their reliability?
Answer:
- As mentioned before, we use the ColMap software for groundtruth pose estimation; it is a popular and mature tool for 3D reconstruction. We use the "automatic reconstruction" mode with medium quality to obtain the groundtruth poses. The estimated poses are generally accurate.
- Since we do not have the true camera poses, we evaluate the accuracy of the estimated groundtruth with the reprojection error, which is automatically calculated by ColMap. The reprojection error indicates the average distance between reprojected 3D points and the corresponding 2D points in the image. The following figure shows the reprojection error of each sequence in the dataset. The overall average over all sequences is 0.88 pixels (less than 1 pixel), which is very small.
- Since we cannot obtain the true groundtruth poses, we focus more on the consistency between the estimated trajectory and the reconstructed 3D points. We consider that if this consistency is high, the estimated trajectory is accurate. Of course, this is not absolute, and the estimated groundtruth may still contain errors. We will continue to explore and adopt more accurate methods for groundtruth estimation.
- Moreover, note that the estimated trajectory has no absolute scale due to scale ambiguity, so the groundtruth trajectory does not carry absolute scale information. Therefore, remember to perform scale alignment before evaluating the trajectories estimated by your SLAM. The scales of different sequences are not comparable.
- Step 1: Download the sequences and tools you need using the provided links.
- Step 2 (optional): Resample the downloaded video with the provided Python script according to your needs.
- Step 3: Run the visual odometry or SLAM algorithm of interest and save the estimated trajectory to a file.
- Step 4: Evaluate the performance of your algorithm against the provided groundtruth poses with tools such as EVO.
- [1] https://genshin.hoyoverse.com
- [2] https://projects.asl.ethz.ch/datasets/doku.php?id=kmavvisualinertialdatasets
- [3] https://cvg.cit.tum.de/data/datasets/rgbd-dataset
- [4] https://colmap.github.io
- [5] https://www.meshlab.net
- [6] https://github.com/raulmur/ORB_SLAM2
- [7] https://github.com/MichaelGrupp/evo
- [8] https://gazebosim.org
- [9] https://github.com/robin-shaun/XTDrone
- [10] https://microsoft.github.io/AirSim
- [11] https://developer.nvidia.com/omniverse