The application of reinforcement learning (RL) in artificial intelligence has become increasingly widespread. However, its drawbacks are also apparent: RL requires a large number of samples, which makes improving sample efficiency a research focus. To address this issue, we propose a novel N-step method. This method extends the horizon of the agent, enabling it to acquire more long-term effective information and thereby alleviating the data inefficiency of RL. In addition, the N-step method reduces the estimation variance of the Q-function, which is one of the factors contributing to errors in Q-function estimation.
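For context, N-step methods generally replace the one-step bootstrapped target with a target that accumulates N rewards before bootstrapping. The abstract does not give the paper's specific variant; a minimal sketch of the standard N-step target for a Q-learning-style update, with discount factor $\gamma$ and reward $r_{t+k}$, is

$$
G_t^{(N)} \;=\; \sum_{k=0}^{N-1} \gamma^{k}\, r_{t+k} \;+\; \gamma^{N} \max_{a'} Q\!\left(s_{t+N}, a'\right),
$$

where looking N steps ahead lets the agent incorporate longer-horizon reward information directly, which is the source of the extended horizon described above.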
Apart from high variance, estimation bias is another source of error in Q-function estimation. To mitigate this bias, we design a regularization method based on the V-function, an approach that has so far been underexplored. Together, these two methods address the problems of low sample efficiency and inaccurate Q-function estimation in RL.
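As background (this identity is standard and not the proposed regularizer itself), the state-value function and the Q-function are related by

$$
V^{\pi}(s) \;=\; \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\left[\, Q^{\pi}(s, a) \,\right],
$$

so a regularizer built on V can, in principle, constrain the learned Q-values toward estimates consistent with this relation and thereby damp estimation bias; the specific penalty is a design choice of the proposed method.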
Finally, extensive experiments in discrete and continuous action spaces demonstrate that the proposed N-step method is effective when combined with the classical deep Q-network (DQN), deep deterministic policy gradient (DDPG), and TD3 algorithms, consistently outperforming these classical baselines.