Ippo 到 mappo

约 1083 字大约 4 分钟

2025-02-07

环境安装

gym-ma 的安装对 gym 有一定的版本要求 gym>=0.19.0,<0.20.0 同时你的pip 也不能太高，否则会报错

error in gym setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.
[end of output]
  
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

pip install --root-user-action=ignore 'pip<=23.0.1'
pip install --root-user-action=ignore 'setuptools<=66'
pip install --root-user-action=ignore 'wheel<=0.38.4'
pip install --root-user-action=ignore ma-gym

Combat 环境讲解

观察空间是一个 150 是数组，是一个 6x5x5 的数组拉平的，每个元素是一个 6 个元素的数组，代表一个 agent 的状态,他的观察范围是 5x5 的视野范围

动作空间是 5+n, n 是敌人的数量, 在 hands on rl 中是7

观察空间/observation space

 def get_agent_obs(self):
        """
        When input to a model, each agent is represented by a set of one-hot binary vectors {i, t, l, h, c}
        encoding its team ID, unique ID, location, health points and cooldown.
        A model controlling an agent also sees other agents in its visual range (5 × 5 surrounding area).
        :return:
        """
        _obs = []
        for agent_i in range(self.n_agents):
            # team id , unique id, location, health, cooldown
            _agent_i_obs = np.zeros((6, 5, 5))
            hp = self.agent_health[agent_i]

            # If agent is alive
            if hp > 0:
                pos = self.agent_pos[agent_i]
                for row in range(0, 5):
                    for col in range(0, 5):
                        if self.is_valid([row + (pos[0] - 2), col + (pos[1] - 2)]) and (
                                PRE_IDS['empty'] not in self._full_obs[row + (pos[0] - 2)][col + (pos[1] - 2)]):
                            x = self._full_obs[row + pos[0] - 2][col + pos[1] - 2]
                            _type = 1 if PRE_IDS['agent'] in x else -1
                            _id = int(x[1:]) - 1  # id
                            _agent_i_obs[0][row][col] = _type
                            _agent_i_obs[1][row][col] = _id
                            _agent_i_obs[2][row][col] = self.agent_health[_id] if _type == 1 else self.opp_health[_id]
                            _agent_i_obs[3][row][col] = self._agent_cool[_id] if _type == 1 else self._opp_cool[_id]
                            _agent_i_obs[3][row][col] = 1 if _agent_i_obs[3][row][col] else -1  # cool/uncool
                            entity_position = self.agent_pos[_id] if _type == 1 else self.opp_pos[_id]
                            _agent_i_obs[4][row][col] = entity_position[0] / self._grid_shape[0]  # x-coordinate
                            _agent_i_obs[5][row][col] = entity_position[1] / self._grid_shape[1]  # y-coordinate

            _agent_i_obs = _agent_i_obs.flatten().tolist()
            _obs.append(_agent_i_obs)
        return _obs

前面看错了, 以为是 3x3 的视野范围, 但 3x3 是攻击范围, 视野范围是 5x5

if self.is_valid([row + (pos[0] - 2), col + (pos[1] - 2)]) and (
PRE_IDS['empty'] not in self._full_obs[row + (pos[0] - 2)][col + (pos[1] - 2)]):

下面来分部份讲解

self.is_valid([row + (pos[0] - 2), col + (pos[1] - 2)])
def is_valid(self, pos):
  return (0 <= pos[0] < self._grid_shape[0]) and (0 <= pos[1] < self._grid_shape[1])

首先把相对坐标换成绝对的，然后判断是否是在地图内

PRE_IDS['empty'] not in self._full_obs[row + (pos[0] - 2)][col + (pos[1] - 2)]):

PRE_IDS = { 'wall': 'W', 'empty': 'E', 'agent': 'A', 'opponent': 'X', }

self._full_obs内是一个二维数组，每个元素是一个字符串，代表该格的状态

[['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E'],
 ['E', 'E', 'E', 'E', 'E', 'A2', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'A1', 'E', 'E', 'E', 'E', 'E', 'E', 'E' ],
['E', 'E', 'E', 'E', 'E', 'X1', 'X2', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', ' E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', ' E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', ' E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', ' E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', ' E'],
['E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E', 'E']]

这主要是判断该格有没有友方或敌方单位,有的话才会记录信息,不然就是 0

每个格子有5个信息，分别是 {i, t, l, h, c} team ID: 1, -1 unique ID: 1, 2, 3 ... health points: 0 - 3 cooldown: True, False locationx: 个 0-1 的值，代表 x 的位置 locationy: 个 0-1 的值，代表 y 的位置

What's the confusion? What if agents attack each other at the same time? Should both of them be effected? Ans: I guess, yes What if other agent moves before the attack is performed in the same time-step. Ans: May be, I can process all the attack actions before move directions to ensure attacks have their effect.

Ippo 到 mappo

环境安装

Combat 环境讲解

观察空间/observation space

动作空间/action space