You are given code that implements an evaluate_agent() function, which estimates the performance of a given agent on an environment. random_agent is a completely random baseline. random_nn returns a randomly initialized neural network. random_policy_search tries n randomly initialized neural networks and keeps the best, plotting peak performance over time.
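To make the roles of these four functions concrete, here is a minimal sketch of how they could fit together. It is not the provided course code: the signatures, the ToyEnv class (a hypothetical stand-in for a real control environment), the network shape, and the choice of mean episode return as the performance estimate are all assumptions made for illustration.

```python
import math
import random


class ToyEnv:
    """Hypothetical stand-in for a control task (not a real Gym environment).

    State: a position x in [-1, 1]. Actions: 0 (move left) or 1 (move right).
    The reward at each step is -|x|, so a good agent steers toward x = 0.
    """
    n_obs, n_actions, max_steps = 1, 2, 50

    def __init__(self, rng):
        self.rng = rng

    def reset(self):
        self.x = self.rng.uniform(-1.0, 1.0)
        self.t = 0
        return [self.x]

    def step(self, action):
        self.x += 0.1 if action == 1 else -0.1
        self.x = max(-1.0, min(1.0, self.x))
        self.t += 1
        return [self.x], -abs(self.x), self.t >= self.max_steps


def random_agent(rng, n_actions):
    """Baseline: ignore the observation and pick a uniformly random action."""
    return lambda obs: rng.randrange(n_actions)


def random_nn(rng, n_obs, n_actions, num_hidden=8):
    """One-hidden-layer tanh network with Gaussian weights; acts greedily."""
    w1 = [[rng.gauss(0, 1) for _ in range(n_obs + 1)] for _ in range(num_hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(num_hidden + 1)] for _ in range(n_actions)]

    def act(obs):
        x = list(obs) + [1.0]  # constant input that plays the role of a bias
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w1] + [1.0]
        scores = [sum(w * v for w, v in zip(row, h)) for row in w2]
        return scores.index(max(scores))

    return act


def evaluate_agent(agent, env, n_episodes=20):
    """Estimate performance as the mean undiscounted return over episodes."""
    total = 0.0
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done = env.step(agent(obs))
            total += reward
    return total / n_episodes


def random_policy_search(env, rng, n=30, num_hidden=8):
    """Sample n random networks; return the best agent and the running best score."""
    best_agent, best_score, history = None, -math.inf, []
    for _ in range(n):
        agent = random_nn(rng, env.n_obs, env.n_actions, num_hidden)
        score = evaluate_agent(agent, env)
        if score > best_score:
            best_agent, best_score = agent, score
        history.append(best_score)  # peak performance so far, ready for plotting
    return best_agent, history


rng = random.Random(0)
env = ToyEnv(rng)
baseline = evaluate_agent(random_agent(rng, env.n_actions), env)
_, history = random_policy_search(env, rng)
print(f"random agent: {baseline:.2f}, best of {len(history)} networks: {history[-1]:.2f}")
```

The history list is the quantity the assignment describes as "peak performance over time". Because each score is a noisy estimate, the best score found is an optimistic estimate of that network's true performance; re-evaluating the winner on fresh episodes gives a fairer number.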
For this homework:
1. Choose an environment from the list of classic control environments besides CartPole.
   a. What are its input (state space) and output (actions)?
   b. How does the random agent perform?
   c. How does random_policy_search perform?
2. Choose one parameter to vary and investigate how it affects performance. For example, you could change num_hidden or any other arbitrarily chosen parameter. Does it matter for performance, and what are its effects?
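One way to structure the parameter study is to repeat the search several times per setting and compare the mean and spread of the final scores, since a single run of a random search is noisy. The sketch below assumes this setup: run_search is a hypothetical placeholder to be replaced with a call to the provided random_policy_search, and the parameter values and number of seeds are arbitrary.

```python
import random
import statistics


def run_search(num_hidden, seed):
    """Hypothetical placeholder for one full run of the search.

    Replace the body with a call to the provided random_policy_search and
    return its final best score. As written it only returns seeded noise so
    that the sketch runs on its own; the numbers carry no meaning.
    """
    rng = random.Random(seed * 1000 + num_hidden)
    return rng.gauss(0.0, 1.0)


def sweep(values, n_seeds=5):
    """Run the search n_seeds times per setting; return (mean, std) per value."""
    results = {}
    for value in values:
        scores = [run_search(value, seed) for seed in range(n_seeds)]
        results[value] = (statistics.mean(scores), statistics.stdev(scores))
    return results


results = sweep([2, 8, 32, 128])
for value, (mean, std) in results.items():
    print(f"num_hidden={value:4d}  mean={mean:7.2f}  std={std:5.2f}")
```

If the difference between two settings is smaller than the spread across seeds, the parameter probably does not matter much for that environment.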
Sample Solution