PettingZoo Multi-Agent Reinforcement Learning Environment Documentation Explained (Part 3)


2023-03-27 17:46 | Source: compiled from the web

1. Landmarks: Landmarks are static circular features of the environment that cannot be controlled. In some environments, such as Simple, they are destinations, and an agent's reward depends on how close it is to them. In other environments, they are obstacles that block the agents' motion. The documentation for each environment describes them in more detail.

2. Visibility: When an agent is visible to another agent, that other agent's observation contains the first agent's relative position (and, in Simple World Comm and Simple Tag, the first agent's velocity). If an agent is temporarily hidden (only possible in Simple World Comm), its position and velocity are set to zero in other agents' observations.

3. Communication: In some environments, some agents can broadcast a message as part of their action (see the action space for details). The message is transmitted to every agent that is allowed to see it. In Simple Crypto, this message is used to signal that Bob and Eve have reconstructed the message.

4. Color: All agents are rendered as circles, so a human can tell them apart only by color; most environment descriptions therefore state each agent's color. The agents themselves do not observe color.

5. Distances: Landmarks and agents typically start at positions drawn uniformly at random from -1 to 1 on the map, so they are typically around 1-2 units apart. Keep this in mind when reasoning about the scale of the rewards (which often depend on distance) and of the observation space, which contains relative and absolute positions.
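
The scale mentioned under Distances can be checked numerically. The following is a minimal sketch, not part of PettingZoo: it assumes a two-dimensional map with each coordinate drawn independently and uniformly from [-1, 1], and the function name `mean_pairwise_distance` is hypothetical. It estimates the average separation between two randomly placed entities by Monte Carlo sampling.

```python
import math
import random


def mean_pairwise_distance(num_samples: int = 100_000, seed: int = 0) -> float:
    """Estimate the mean distance between two points drawn uniformly from [-1, 1]^2.

    Assumption: a 2D map with independent uniform coordinates (illustrative only).
    """
    rng = random.Random(seed)  # local generator so the estimate is reproducible
    total = 0.0
    for _ in range(num_samples):
        ax, ay = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        bx, by = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        total += math.hypot(ax - bx, ay - by)  # Euclidean distance of this pair
    return total / num_samples


if __name__ == "__main__":
    print(f"estimated mean distance: {mean_pairwise_distance():.3f}")
    print(f"largest possible distance (map diagonal): {2 * math.sqrt(2):.3f}")
```

Under these assumptions the estimate should land a little above 1, with the map diagonal (about 2.83) as the upper bound, which is a useful yardstick when judging the magnitude of distance-based rewards.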

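
To make the Landmarks item concrete, here is a hedged sketch of a reward that depends on how close an agent is to a landmark. The text above does not give the formula, so the negative squared Euclidean distance used here is an assumption for illustration, and `distance_reward` is a hypothetical helper rather than a PettingZoo function.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def distance_reward(agent_pos: Vec2, landmark_pos: Vec2) -> float:
    """Hypothetical reward: negative squared Euclidean distance to the landmark.

    The reward is 0 when the agent sits on the landmark and becomes more
    negative as the agent moves away (assumed form, not taken from the text).
    """
    dx = agent_pos[0] - landmark_pos[0]
    dy = agent_pos[1] - landmark_pos[1]
    return -(dx * dx + dy * dy)


if __name__ == "__main__":
    landmark = (0.0, 0.0)
    for pos in [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)]:
        print(pos, distance_reward(pos, landmark))
```

With positions confined to roughly [-1, 1], a reward of this assumed form stays within a few units of zero, which is why the starting distances matter when interpreting reward scale.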

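
The Visibility rule can be sketched as a small function that builds the part of one agent's observation describing another agent. This is an illustration under assumptions: the function `observe_other`, its parameters, and the layout of the returned list are all hypothetical, and the real observation layout differs per environment.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]


def observe_other(
    observer_pos: Vec2,
    other_pos: Vec2,
    other_vel: Vec2,
    visible: bool,
    include_velocity: bool = False,
) -> List[float]:
    """Return the observation entries describing another agent (hypothetical layout).

    A visible agent contributes its position relative to the observer and,
    if requested, its velocity. A hidden agent contributes zeros instead.
    """
    if visible:
        rel = (other_pos[0] - observer_pos[0], other_pos[1] - observer_pos[1])
        vel = other_vel
    else:
        rel = (0.0, 0.0)  # hidden: position entries are zeroed
        vel = (0.0, 0.0)  # hidden: velocity entries are zeroed
    obs = [rel[0], rel[1]]
    if include_velocity:  # e.g. environments whose observations carry velocity
        obs.extend([vel[0], vel[1]])
    return obs


if __name__ == "__main__":
    print(observe_other((1.0, 1.0), (0.5, 2.0), (0.1, 0.0), visible=True))
    print(observe_other((1.0, 1.0), (0.5, 2.0), (0.1, 0.0), visible=False, include_velocity=True))
```

One consequence worth noting: in a layout like this, a hidden agent is indistinguishable from a visible, stationary agent located exactly at the observer's position, since both produce all-zero entries.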
