Competition Overview (GUIDANCE)
INTRODUCTION
Motivation
Following the success of AI commanders in StarCraft and DOTA2, enhancing gameplay experience with AI in various genres of games has been viewed as the next grand challenge. With the rise of open-world games in recent years, learning intelligent agents that have general task-solving capabilities in open-world environments has attracted increasing attention. However, the lack of satisfactory training and testing benchmarks remains an obstacle to research in this field. In this context, the goal of this competition is to advance research in the field of open-world intelligent agent learning. As a stepping stone to the ultimate goal of learning highly intelligent agents "living" in virtual worlds, we decided to focus first on open-world FPS games, taking into account the recent popularity of battle royale games.
Related Work
In general, modern 3D FPS games are inherently incomplete-information games in which learning winning strategies in multi-player scenarios is extremely hard and no optimal policy is known to exist. Despite the difficulties, there have been attempts over the past decade to apply reinforcement learning to FPS games. To the best of our knowledge, the most influential work is the Augmented DRQN model proposed by Lample & Chaplot (2017), which leverages both visual input and game feature information (e.g. the presence of enemies or items) and modularizes the model architecture into independent networks that handle different phases of the game. Their approach successfully learned a competitive FPS agent by minimizing a Q-learning objective and showed better performance than average human players. Following this success, more work on learning FPS game agents has been proposed, such as Arnold by Chaplot & Lample (2017), which benefits from the Action-Navigation architecture, and Divide&Conquer by Papoudakis et al. (2018), which further refined the idea of separating the control strategies for map exploration from those for enemy combat. Although these methods have shown promising results, their training and evaluation context is largely limited to old-fashioned video games with relatively small world sizes and low visual resolution, such as VizDoom (based on Doom, 1993) and Quake III Arena (1999). Recently, Pearce & Zhu (2021) trained an FPS agent to play CSGO, a popular modern 3D FPS game with high-resolution visual rendering. This newer game environment not only introduces more computational burden (mostly due to extracting visual features) but also makes it more difficult for the agent to explore and adapt to the game world efficiently. Their approach addressed the challenge primarily through behavioral cloning, and the learned agents showed reasonably good performance compared to normal human players in the Deathmatch mode. Building upon these works, in this challenge we seek to further expand the frontiers of AI in playing large-scale open-world modern FPS games.
Highlights
The WildScav challenge aims to advance research on learning agents with universal abilities in open-world gaming environments.
The new challenge will provide an FPS game environment similar to current popular battle royale games (e.g. PUBG), allowing multiple players to compete against each other with diverse tactics. Compared to previous work, the new game environment features:
- a larger world size (over 50,000 m²)
- a higher diversity of PCG-based game battlefields (over 100 different maps with randomly spawned buildings, plants, obstacles, and ground areas with damping effects)
- highly customizable game mechanics (e.g. random generation of bonus items with user-specified distributions, time periods, and quantities)
With this high degree of freedom in the game environment, we expect participants to fully exploit the rich input information from both visual perception (e.g. the depth map computed from the 3D scene mesh) and task-related game features (e.g. target item locations) to learn agents that perform well across generalized world scenarios and tasks.
CHALLENGE STRUCTURE
Infrastructure
This challenge is built upon an open-world FPS game environment. The infrastructure of this challenge mainly consists of the following parts:
Components
- Runtime: The backend runtime environment for game logic simulation developed with the Unity3D game engine.
- Gameplay API: The programming interface that allows users to build their own training environments and control agents to perform tasks in the game environment. The API not only provides the communication channel to the backend game runtime to fetch observation data and send back action commands, but also allows users to set certain game configuration parameters as they wish. While we will use a fixed set of game configurations for the online evaluation, we hope participants make good use of these features to customize their training environments and learn agents with more generalized and robust task-solving abilities.
- Replayer: A GUI application powered by the Unity3D game engine, developed for visualizing game replays. This is similar to the spectator mode common in multiplayer FPS games, which allows users to watch the game history interactively. Users can view the actions of an agent from different perspectives and also switch between multiple agents or different observation modes (e.g. first person, third person, free) to see the whole game in a more immersive way. Participants can find the download links for this tool in our GitHub repository. To use this tool, follow the instructions below (assuming the game engine runs on Linux and the replay is watched on Windows):
- Decompress the downloaded file to anywhere you like.
- Turn on recording when running the game (check details in the Python API docs). One record file will be saved at the end of each game episode.
- Copy the record file (e.g. xxx.bin) from "fps_Data/StreamingAssets/Replay" (under the root of Game Runtime) to "FPSGameUnity_Data/StreamingAssets/Replay" (under the root of Game Replayer).
- Run the executable entry FPSGameUnity.exe to start the application.
- Select the record you want to watch from the drop-down menu and click "Play" to start playing the record.
- During playback, users can perform the following operations:
- Press "Tab": pause or resume
- Press "E": switch observation mode (between first person, third person, free)
- Press "Q": switch between multiple agents
- Press "ECS": stop replay and return to the main menu
Data & Resources
World Mesh: We provide mesh data for each game map in the form of a .obj file.
Location List: Since the game is based on open-world settings, players can (in theory) be spawned at arbitrary places in the game world. To help players avoid potential dead ends, we additionally provide the recommended spawn locations for each map in the form of a .json file. Although you may also set the spawn location manually, we suggest selecting locations from the provided candidates to avoid potential bad cases.
In general, we hope participants make good use of the provided tools and resources to build their own training environments (e.g. a Gym environment with customized observation and action spaces and reward functions) to learn agents with generalizable abilities to solve the tasks in the different tracks.
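As a minimal illustration, the sketch below outlines a Gym-style wrapper for the navigation task. It is a sketch under assumptions, not the official interface: the Game class methods (set_map_id, set_start_location, new_episode, make_action, get_state, is_episode_finished), the state attribute names, and the layout of the location .json file are placeholders; consult the inspirai_fps Python API docs for the real names and signatures.

```python
# A sketch of a Gym-style training environment for the navigation task.
# NOTE: all inspirai_fps method/attribute names and the .json layout below are
# assumptions for illustration only; check the official Python API docs.
import json
import random

import gym
import numpy as np
from inspirai_fps import Game  # hypothetical import path


class NavigationEnv(gym.Env):
    def __init__(self, map_id: int, location_file: str):
        self.game = Game()                      # assumed constructor
        self.game.set_map_id(map_id)            # assumed setter
        with open(location_file) as f:
            self.locations = json.load(f)       # assumed: list of [x, y, z]
        # action: [walk_dir, walk_speed, turn_lr_delta, look_ud_delta, jump]
        self.action_space = gym.spaces.Box(
            low=np.array([0, 0, -1, -1, 0], dtype=np.float32),
            high=np.array([360, 10, 1, 1, 1], dtype=np.float32),
        )
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(6,))

    def reset(self):
        start, target = random.sample(self.locations, 2)
        self.target = np.asarray(target, dtype=np.float32)
        self.game.set_start_location(start)     # assumed setter
        self.game.new_episode()                 # assumed episode control
        return self._observe()

    def step(self, action):
        self.game.make_action(action)           # assumed action interface
        obs = self._observe()
        dist = np.linalg.norm(obs[:3] - self.target)
        reward = -dist                          # simple distance-based shaping
        done = self.game.is_episode_finished() or dist < 1.0
        return obs, reward, done, {}

    def _observe(self):
        state = self.game.get_state()           # assumed state accessor
        pos = np.array([state.position_x, state.position_y, state.position_z])
        return np.concatenate([pos, self.target]).astype(np.float32)
```

The reward here is only a simple distance-based shaping signal; participants are free to design richer rewards, e.g. progress toward the target or penalties for wasted time.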
Game Configuration
We provide a high degree of freedom for participants to control every single game. Specifically, for each new game, one can control settings such as the timeout, game mode, map id, and supply refresh time and distribution (for a detailed description of all configuration parameters, check out our GitHub repository) to customize the agent's learning environment. Note, however, that in the final evaluation stage we will use a fixed set of game configurations to maximize the fairness of the competition. Since this is a challenging open-world environment, we expect participants to fully exploit the large space of possible game configurations and learn an agent that performs reasonably well across various environments, instead of naively overfitting to one specific game configuration.
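For concreteness, here is a small configuration sketch. The setter names below (set_game_mode, set_map_id, set_episode_timeout, set_supply_refresh_time, turn_on_record) are assumptions for illustration; the actual parameter names and value ranges are documented in the GitHub repository.

```python
# A sketch of customizing a training game; all setter names are assumptions
# for illustration only -- see the repository docs for the real parameters.
from inspirai_fps import Game  # hypothetical import path

game = Game()
game.set_game_mode("supply_gathering")   # assumed mode identifier
game.set_map_id(3)                       # pick one of the 100+ PCG maps
game.set_episode_timeout(600)            # seconds before the game ends
game.set_supply_refresh_time(60)         # assumed: how often supplies respawn
game.turn_on_record()                    # save a replay file for the Replayer
game.new_episode()                       # start a game with this configuration
```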
Agent Observation
The gameplay interface provides multiple sources of information about the agent's self-condition as well as its surrounding environment. The observation mainly consists of two parts, visual perception data, and game variables.
Visual Perception: Unlike previous similar competitions, we do not provide the screen buffer, to avoid the high computational overhead of rendering the game scenes and extracting latent features from images (e.g. using a CNN). Instead, we implement an efficient way to compute a low-resolution depth map from the agent's camera using only the agent's location and orientation and the mesh data of the static scene.
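The snippet below sketches how the depth map might be enabled and consumed. The method and attribute names (turn_on_depth_map, set_depth_map_size, depth_map) are assumptions for illustration and should be checked against the Python API docs.

```python
# Enabling and reading the depth map -- names are assumptions for illustration.
import numpy as np
from inspirai_fps import Game  # hypothetical import path

game = Game()
game.turn_on_depth_map()             # assumed switch for depth-map computation
game.set_depth_map_size(64, 36)      # assumed low-resolution width x height
game.new_episode()

state = game.get_state()             # assumed per-step state accessor
depth = np.asarray(state.depth_map)  # 2D array of distances from the camera
print(depth.shape, depth.min(), depth.max())
```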
Game Variables: We also provide access to multiple classes of game-related variables to allow participants the freedom to construct their own observational features. These variables include location and orientation, state of motion, health, state of combat, and task-related metrics.
In our experience, the above observations are reasonably informative for an agent to make good action decisions. For a detailed description of these variables, check out our Python API docs.
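As an illustration of combining game variables into an observation, the sketch below flattens a few assumed state attributes into a single feature vector. The attribute names are placeholders; the real names are listed in the Python API docs.

```python
# Building a simple observation vector from game variables.
# The state attribute names below are placeholders, not the official API.
import numpy as np


def build_observation(state, target_location):
    """Flatten a few assumed state fields into one feature vector."""
    return np.array([
        state.position_x, state.position_y, state.position_z,   # location
        state.pitch, state.yaw,                                  # orientation
        state.health,                                            # self condition
        *target_location,                                        # task-related info
    ], dtype=np.float32)
```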
Agent Control
Track 1-1 (Navigation): The first track requires only some basic actions to control movement and orientation, including:
- WALK_DIR: a number in [0, 360] that determines the direction (angle in degrees) the agent walks towards.
- WALK_SPEED: a number in [0, 10] that determines how fast the agent walks (in m/s).
- TURN_LR_DELTA: the change in the horizontal camera angle (yaw) between two frames. A negative value means a left turn, and a positive value means a right turn.
- LOOK_UD_DELTA: the change in the vertical camera angle (pitch) between two frames. A negative value means looking down, a positive value means looking up.
- JUMP: a bool variable determining whether to jump at the current time step.
Track 1-2 (Supply Gathering): The second track requires players to collect supplies randomly distributed on the world map. Extending Track 1-1, it needs only one more action:
- PICKUP: a bool variable determining whether to pick up a nearby supply at the current time step.
Track 2 (Supply Battle): In the last track, we introduce the combat system that is very common in FPS games. There are two more actions for players to take:
- ATTACK: a bool variable that determines whether to fire the weapon (costing one bullet) at the current time step.
- RELOAD: a bool variable that determines whether to refill the weapon's clip using spare ammo.
Other Details
About actions:
- All actions can be set and executed simultaneously within one time step, except that ATTACK and RELOAD cannot be executed together.
- PICKUP only takes effect if the supply is within the trigger range (fixed at 1m). If multiple supplies are within the trigger area, the closest one is picked up.
- The targeting direction of ATTACK is determined by the agent's current orientation. Therefore, agents may need to adjust TURN_LR_DELTA and LOOK_UD_DELTA in coordination with ATTACK to improve their chances of hitting enemies.
- The RELOAD action takes some time to complete. Additional ATTACK and RELOAD commands have no effect during this time.
- If the weapon clip is empty and there is still ammunition, a RELOAD is automatically triggered.
- The following table gives a brief overview of the valid actions for each track.

| Action | Track 1-1 | Track 1-2 | Track 2 |
| --- | --- | --- | --- |
| WALK_DIR | ✓ | ✓ | ✓ |
| WALK_SPEED | ✓ | ✓ | ✓ |
| TURN_LR_DELTA | ✓ | ✓ | ✓ |
| LOOK_UD_DELTA | ✓ | ✓ | ✓ |
| JUMP | ✓ | ✓ | ✓ |
| PICKUP | | ✓ | ✓ |
| ATTACK | | | ✓ |
| RELOAD | | | ✓ |
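For illustration, here is a hedged sketch of issuing one step of actions through the gameplay API. The ActionVariable names and the make_action signature are assumptions and may differ from the actual inspirai_fps interface.

```python
# One control step for a Track 2 agent -- the ActionVariable names and the
# make_action signature are assumptions for illustration only.
from inspirai_fps import Game, ActionVariable  # hypothetical import path

game = Game()
game.new_episode()

action = {
    ActionVariable.WALK_DIR: 90,        # heading angle in degrees (see WALK_DIR)
    ActionVariable.WALK_SPEED: 5,       # half of the maximum speed (m/s)
    ActionVariable.TURN_LR_DELTA: 2,    # small right turn this frame
    ActionVariable.LOOK_UD_DELTA: 0,    # keep the pitch unchanged
    ActionVariable.JUMP: False,
    ActionVariable.PICKUP: True,        # grab a supply if within 1 m
    ActionVariable.ATTACK: False,
    ActionVariable.RELOAD: False,
}
game.make_action(action)                # assumed per-step action interface
```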
About environment:
The value range for yaw is (-180, 180] and for pitch is [-90, +90] (where -90 means looking straight up at the sky).
All walls, trees, rocks, and furnishings are indestructible in our environment and the agent can only fire one bullet at a time.
The agent does not need to consider the effects of wind speed and gravity on bullet trajectories, although the bullet itself has a fixed velocity.
The actual movement speed of the agent depends on the combined effect of WALK_SPEED and the terrain. For example, stones in the way may slow down or even block movement, and uphill terrain also reduces speed. Some special areas reduce movement speed by a certain decay factor (for example, a factor of 0.5 on ice).
GETTING STARTED
Our project is hosted in the GitHub repository Wilderness-Scavenger.
- Follow the installation guide to set up your working environment.
- Familiarize yourself with the framework and usage of our gameplay API inspirai_fps.
- Familiarize yourself with the use of the replay tool.
- Check out the examples and baselines to learn more about the API, and start from there to build your own environments.
EVALUATION
Submission
Note: The detailed submission instructions are yet to be determined. We will release them before the submission system opens.
In brief, a participant should submit an agent controller that takes in the state data (e.g. location, orientation, health, etc.) at each time step and outputs a control action that tells the backend engine how to operate the player (e.g. move forward/backward). Specifically, the formats of state and action are defined in our Python gameplay interface, and participants need to wrap their controller into a Python class that can be imported from an external module or package. To minimize potential problems during evaluation, we will provide a Docker image that sets up a Python runtime environment with a compatibility test template. Before submission, participants can test their controller class in this template to make sure the name of the class and all input and output parameters conform to the given specifications. They should also check that the results are consistent with their expectations. Finally, the controller class and any dependent libraries or data files should be packaged into a single Docker image (which can simply be adapted from ours) and sent to us.
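To make the expected shape concrete, here is a minimal controller sketch. The class name, method signature, and state/action formats are assumptions for illustration; the official specification in the compatibility test template takes precedence.

```python
# A minimal controller sketch -- the class name, method signature, and the
# state/action formats are assumptions; follow the official template instead.
class AgentController:
    def __init__(self):
        # Load model weights or other resources packaged in the Docker image here.
        pass

    def act(self, state):
        """Map one time step of state data to one control action."""
        # Placeholder policy: walk straight ahead at a moderate speed.
        return {
            "walk_dir": state.get("yaw", 0),  # follow the current facing direction
            "walk_speed": 5,
            "turn_lr_delta": 0,
            "look_ud_delta": 0,
            "jump": False,
        }
```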
Evaluation Rule
Track 1-1 (Navigation): In general, an agent is measured by how fast it can navigate to target locations. The performance is evaluated mainly based on the time taken by the agent to reach the target location. All submitted agents will be evaluated on 10 new maps (unseen in training) for 100 games (10 games per map). We set the timeout of a game to 5 minutes. At the start of each game, a random start location and a random target location are sampled from our manually selected candidate location pool. To ensure fairness, all agents will be tested with the same start and target locations in each game. Finally, the score of an agent is its average time cost over the 100 games. Note that an agent may fail in a game, i.e. it does not reach the target location before the timeout. In this case, a penalty is added to the actual time cost (the timeout limit); the penalty is computed from the 3D spatial distance between the target location and the agent's final location. The score will be used for the final ranking, the lower the better.
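A possible formalization of this scoring rule is sketched below. The penalty's functional form and coefficient are our own assumptions for illustration; the official penalty formula is defined by the organizers.

```python
# A sketch of the Track 1-1 scoring rule. The penalty form and coefficient
# are assumptions for illustration; the official formula may differ.
def game_score(time_cost, reached, end_to_target_dist,
               timeout=300.0, penalty_per_meter=1.0):
    """Lower is better: time taken, plus a distance-based penalty on failure."""
    if reached:
        return time_cost
    return timeout + penalty_per_meter * end_to_target_dist


def final_score(game_scores):
    """Average score over the 100 evaluation games."""
    return sum(game_scores) / len(game_scores)
```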
Track 1-2 (Supply Gathering): The task for agents is to collect as many supplies as possible before the timeout. Similarly, we will evaluate the submitted agents on multiple maps for multiple games: agents are tested across 10 maps for 100 games (10 games per map). The timeout of a game is set to 10 minutes. In each game, a start location is randomly selected from the candidate location pool, and all submitted agents will have the same start location to ensure fairness.
Track 2 (Supply Battle): The goal in this track is the same as in Track 1-2, although the advantageous strategies may be quite different due to the introduction of the combat system. We run the evaluation across 10 maps for 10 games (one game per map). Again, the agent start location is randomly picked from the candidate location pool. However, in a multiplayer game the agents cannot all be spawned at exactly the same location. To address this, we set up a small spawning area in each game, and all agents are randomly dropped in this area at the beginning with a short period of invincibility to prevent early fights. The score of an agent is calculated as the average number of supplies it collected across all games. The score will be used for the final ranking, the higher the better.
Evaluation Process
Singleplayer (Track 1): Each submission for Track 1-1 and Track 1-2 is independently evaluated. A team can submit its solution multiple times (up to a limit), and only the latest submission will count towards its final ranking. The top team on each task will be awarded the prize.
Multiplayer (Track 2): We apply a two-stage workflow to simplify multiplayer evaluation. The first stage is a typical qualifier evaluation stage that selects the top 10 teams to enter the second stage. The second stage is the final contest, where agents from different teams directly compete against each other in a game.
- Stage 1 (qualifier evaluation): In the first stage, all teams are independently tested by competing against our baseline agents. We use the same game configuration for all submissions. In each game, the agents are spawned randomly near the edge of the world. Then, within a timeout limit of 15 minutes, 10 agents (9 baseline agents and the submitted agent) compete against each other.
- Stage 2 (final contest): The top 10 teams from the first stage will enter the second stage. At this stage, each team controls an agent to compete in the game. We use different game configurations to run multiple games to evaluate all agents. The final score of a team will be the average of all scores it receives in all games.