Type of Q-Learning Model: Table-Based 

Left Side: ML-Agents

Right Side: Q-Learning 

Black Arrow Agent: The agent is controlled by ML-Agents on the left side and by Q-Learning on the right side. It succeeds if it reaches the white triangle goal and fails if it touches a wall. On success or failure, the black arrow agent and the white triangle goal respawn at new points within the bounds.

White Triangle Goal: The goal to reach. It moves to a random point within the bounds if the black arrow agent touches a wall, reaches the goal, or runs out of time. On Map 2, if the goal itself touches a wall, it respawns at a random point within the bounds.

Pink Square: On Map 2, the pink square moves to a random point on a circle offset from the goal. The goal moves toward this square, which gives it random movement.

Blue Oval: On Map 2, if the white triangle goal detects a wall, it moves toward the blue oval instead of the pink square to avoid colliding with the wall.

Distance State: The state is determined by the distance from the agent to the target, rounded to the nearest integer. There are 11 states in total: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. For example, if the agent is 3.4 units away from its target, it will be in state 3.
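The distance state can be sketched in a few lines of Python. This is an illustration only, not the project's actual code: the function name and the 2D tuple positions are hypothetical, and clamping distances above 10 into state 10 is an assumption made here so that exactly 11 states exist.

```python
import math

MAX_DISTANCE_STATE = 10  # states run from 0 to 10, eleven in total


def distance_state(agent_pos, target_pos):
    """Map the agent-to-target distance to one of the 11 distance states."""
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    distance = math.hypot(dx, dy)
    # Round half up; Python's built-in round() sends halves to the even integer.
    nearest = math.floor(distance + 0.5)
    # Assumption: any distance beyond 10 units shares state 10.
    return min(nearest, MAX_DISTANCE_STATE)
```

With this sketch, an agent at (0, 0) and a target at (3.4, 0) yields state 3, matching the example above.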

Direction State: The state is determined by the direction of the target from the agent. There are 8 states: Up, Down, Left, Right, Up Left, Up Right, Down Left, Down Right. For example, if the goal is to the right of the agent, the state will be Right. 
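One possible way to compute the direction state is to split the plane around the agent into eight 45-degree sectors. The sketch below assumes a 2D coordinate system with +x as Right and +y as Up, and sector boundaries halfway between neighbouring directions; the function name and these conventions are assumptions, not details taken from the project.

```python
import math

# Counter-clockwise from Right, one name per 45-degree sector.
DIRECTIONS = ["Right", "Up Right", "Up", "Up Left",
              "Left", "Down Left", "Down", "Down Right"]


def direction_state(agent_pos, target_pos):
    """Return which of the 8 directions the target lies in, seen from the agent."""
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    # atan2 gives an angle in -180..180 degrees, where 0 is Right and 90 is Up.
    angle = math.degrees(math.atan2(dy, dx))
    # Snap to the nearest multiple of 45 degrees and wrap into 0..7.
    sector = round(angle / 45.0) % 8
    return DIRECTIONS[sector]
```

For a goal directly to the right of the agent, this returns "Right", as in the example above.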

Actions: There are 8 actions: Move Up, Move Down, Move Left, Move Right, Move Up Left, Move Up Right, Move Down Left, Move Down Right. The agent will move in the specified direction. 
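To show how a table-based model can tie the states and actions together, here is a minimal Q-table sketch. It assumes the distance and direction states are combined into a single index (11 x 8 = 88 rows), which the description above does not confirm. The learning rate, discount factor, exploration rate, and all function names are hypothetical, and the reward values are left to the caller because they are not specified here.

```python
import random

N_DISTANCE_STATES = 11
N_DIRECTION_STATES = 8

# Each action name maps to a movement direction (x, y), with +y as Up.
ACTIONS = {
    "Move Up": (0, 1), "Move Down": (0, -1),
    "Move Left": (-1, 0), "Move Right": (1, 0),
    "Move Up Left": (-1, 1), "Move Up Right": (1, 1),
    "Move Down Left": (-1, -1), "Move Down Right": (1, -1),
}
ACTION_NAMES = list(ACTIONS)

# Hypothetical hyperparameters, chosen only for illustration.
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # probability of taking a random action


def make_q_table():
    """One row per combined state, one column per action, all zeros."""
    n_states = N_DISTANCE_STATES * N_DIRECTION_STATES
    return [[0.0] * len(ACTION_NAMES) for _ in range(n_states)]


def state_index(distance_state, direction_index):
    """Assumption: both state components are flattened into one row index."""
    return distance_state * N_DIRECTION_STATES + direction_index


def choose_action(q_table, state, rng=random):
    """Epsilon-greedy: usually the best known action, sometimes a random one."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTION_NAMES))
    row = q_table[state]
    return row.index(max(row))


def update(q_table, state, action, reward, next_state, done):
    """Standard Q-learning update; no future value when the episode ends."""
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (target - q_table[state][action])
```

A training loop would call `choose_action`, move the agent along `ACTIONS[ACTION_NAMES[action]]`, observe the new state and reward, and then call `update`, passing `done=True` on a pass or a fail.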

Map Rules: Each map is bordered by a box and has a white triangle as the goal. A pass or a fail ends the episode and randomizes the positions of the agent and the goal. On a pass, the background turns green; on a fail, it turns red.

Map 1: On Map 1, the goal will not move until the agent passes or fails by touching the goal or walls respectively.

Map 2: On Map 2, the goal actively wanders around the area. It resets to a random position within the bounds if it touches a wall. It has active collision avoidance and will attempt to maneuver away from walls. The pink square marks the point on the circle that the goal wanders toward, and the blue oval marks where the goal moves if it detects a wall.
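The wandering behaviour on Map 2 resembles a common "wander" steering pattern, which can be sketched roughly as follows. Everything here is an assumption made for illustration: placing the circle ahead of the goal along its heading, the `offset`, `radius`, and `speed` parameters, and the function names. How the project positions the blue oval is not described, so the avoidance point is simply passed in.

```python
import math
import random


def wander_point(goal_pos, heading, offset=2.0, radius=1.0, rng=random):
    """Pick a random point on a circle centred `offset` units ahead of the goal."""
    centre_x = goal_pos[0] + math.cos(heading) * offset
    centre_y = goal_pos[1] + math.sin(heading) * offset
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (centre_x + math.cos(theta) * radius,
            centre_y + math.sin(theta) * radius)


def choose_target(wall_detected, wander_pt, avoid_pt):
    """Head for the avoidance point near a wall, otherwise the wander point."""
    return avoid_pt if wall_detected else wander_pt


def step_towards(pos, target, speed, dt):
    """Move from pos toward target by at most speed * dt, without overshooting."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return pos
    step = min(speed * dt, dist)
    return (pos[0] + dx / dist * step, pos[1] + dy / dist * step)
```

In this sketch, `wander_point` plays the role of the pink square and `avoid_pt` the role of the blue oval; each frame the goal would call `choose_target` and then `step_towards`.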