For an episodic or continuing task, we use the following formulation to update the value function.
We express the sequence of rewards from timestep $t$ onward as the "return":
$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{T-t} r_{T}$
where $T$ is the terminal timestep and $\gamma \in [0, 1]$ is the discount factor.
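
As a quick numerical check of the return, here is a minimal Python sketch. The function name `discounted_return` and the sample rewards are assumptions made for illustration, not part of the original notes. It relies on the recursion $R_t = r_t + \gamma R_{t+1}$, which follows directly from the definition of the return.

```python
def discounted_return(rewards, gamma):
    """Compute R_t = sum_k gamma**k * rewards[k] for rewards r_t, ..., r_T."""
    g = 0.0
    # Work backwards from the terminal reward using R_t = r_t + gamma * R_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode: r_t, r_{t+1}, r_T are all 1, with gamma = 0.5.
rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```
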
Using this, we can write the update to the value function as