This paper proposes an energy management strategy that controls Heating, Ventilation, and Air Conditioning (HVAC) systems with a constrained deep Q-network (DQN) algorithm. The problem is formulated as a Markov Decision Process (MDP) whose transition kernel is learned from data: an artificial neural network (ANN) is trained to predict PM 2.5 and PM 10 levels from the current conditions and control actions. With this predictive model serving as the transition kernel, the control policy of the energy management agent is optimized via constrained DQN, which restricts the agent to actions that satisfy the imposed constraints. We validate the approach through numerical experiments on real data collected at Namgwangju Station. The experiments show that the imposed constraints keep PM 2.5 and PM 10 levels below the predefined thresholds at the station, and that lowering the threshold yields larger reductions in PM levels, achieved by allocating additional power to the blowers and air conditioners.
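As a minimal sketch of one plausible realization of the constraint mechanism described above, the code below pairs an ANN transition model with a Q-network and greedily selects the highest-value action whose predicted PM 2.5 and PM 10 levels stay under the thresholds. The state layout, action set, network sizes, and threshold values are all hypothetical assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical setup (not from the paper): state = [PM2.5, PM10, temperature,
# occupancy]; 9 discrete actions = 3 blower levels x 3 air-conditioner levels.
STATE_DIM, N_ACTIONS = 4, 9
PM_THRESHOLD = torch.tensor([35.0, 50.0])  # assumed PM 2.5 / PM 10 limits (ug/m^3)

class TransitionANN(nn.Module):
    """ANN transition kernel: predicts next-step [PM2.5, PM10] from (state, action)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_ACTIONS, 64), nn.ReLU(),
            nn.Linear(64, 2))
    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))

class QNetwork(nn.Module):
    """DQN value head over the discrete HVAC action set."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS))
    def forward(self, state):
        return self.net(state)

def constrained_greedy_action(q_net, trans_net, state):
    """Pick the highest-Q action whose *predicted* PM levels meet the thresholds."""
    q_values = q_net(state)                                # (N_ACTIONS,)
    actions = torch.eye(N_ACTIONS)                         # one-hot for each action
    pm_pred = trans_net(state.expand(N_ACTIONS, -1), actions)  # (N_ACTIONS, 2)
    feasible = (pm_pred <= PM_THRESHOLD).all(dim=-1)       # constraint mask
    masked_q = q_values.masked_fill(~feasible, float("-inf"))
    if torch.isinf(masked_q).all():                        # no feasible action:
        return q_values.argmax().item()                    # fall back to plain greedy
    return masked_q.argmax().item()

# Usage with a random state standing in for real station measurements:
with torch.no_grad():
    q_net, trans_net = QNetwork(), TransitionANN()
    state = torch.rand(STATE_DIM)
    print("chosen HVAC action:", constrained_greedy_action(q_net, trans_net, state))
```

Here the learned transition model acts as a one-step lookahead that prunes actions predicted to violate the PM thresholds before the usual DQN argmax; both networks would of course be trained (the Q-network on rewards, the ANN on logged station data) before deployment.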