Question:
Problem Statement: The objective is to implement an Actor-Critic reinforcement learning
algorithm to optimize energy consumption in a building. The agent should learn to adjust
the temperature settings dynamically to minimize energy usage while maintaining
comfortable indoor conditions.
This dataset contains energy consumption data for a residential building, along
with various environmental and operational factors.
Data Dictionary:
o Appliances: Energy use in Wh
o lights: Energy use of light fixtures in the house in Wh
o T T: Temperatures in various rooms and outside
o RH RH: Humidity measurements in various rooms and outside
o Visibility: Visibility in km
o Tdewpoint: Dew point temperature
o Pressmmhg: Pressure in mm Hg
o Windspeed: Wind speed in m/s
State Space:
The state space consists of various features from the dataset that impact energy
consumption and comfort levels.
Current Temperature (T to T): Temperatures in various rooms and outside.
Current Humidity (RH to RH): Humidity measurements in different locations.
Visibility (Visibility): Visibility in km.
Dew Point (Tdewpoint): Dew point temperature.
Pressure (Pressmmhg): Atmospheric pressure in mm Hg.
Windspeed (Windspeed): Wind speed in m/s.
Total State Vector Dimension: Number of features = temperature + humidity +
visibility + dew point + pressure + windspeed features.
Target Variable: Appliances energy consumption in Wh
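The state construction described above can be sketched roughly as follows. This is a minimal illustration, not the intended solution: the room indices are not given in the problem text, so the column names `T1`..`T3` and `RH_1`..`RH_3` are hypothetical placeholders, and `build_state` is an assumed helper name. The target column (Appliances) is deliberately kept out of the state.

```python
# Hypothetical column names: the problem text does not give the room indices,
# so these three temperature/humidity columns are placeholders only.
TEMP_COLS = ["T1", "T2", "T3"]
HUM_COLS = ["RH_1", "RH_2", "RH_3"]
OTHER_COLS = ["Visibility", "Tdewpoint", "Pressmmhg", "Windspeed"]

# Temperature features come first so later code can find them by position.
STATE_COLS = TEMP_COLS + HUM_COLS + OTHER_COLS


def build_state(row):
    """Turn one dataset row (a dict of column -> value) into a state vector."""
    return [float(row[col]) for col in STATE_COLS]


# One made-up row, only to show the shape of the result.
example_row = {
    "Appliances": 60, "lights": 30,
    "T1": 19.5, "T2": 19.0, "T3": 20.5,
    "RH_1": 47.0, "RH_2": 44.5, "RH_3": 45.0,
    "Visibility": 63.0, "Tdewpoint": 5.5, "Pressmmhg": 733.5, "Windspeed": 7.0,
}
state = build_state(example_row)
state_dim = len(state)
```

In practice the features would usually be normalized before being fed to a network, since pressure and humidity sit on very different scales from temperature.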
Action Space:
The action space consists of discrete temperature adjustments:
Action : Decrease temperature by deg C
Action : Maintain current temperature
Action : Increase temperature by deg C
Adjustments are clamped within the defined temperature limits deg C to deg C.
If the action is to decrease the temperature by deg C, you'll adjust each temperature
feature T to T down by deg C. If the action is to increase the temperature by deg C,
you'll adjust each temperature feature T to T up by deg C. Other features remain
unchanged.
The action space is limited to discrete temperature adjustments (± deg C) within a
defined range deg C to deg C.
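A minimal sketch of how such an action could be applied to the state vector is shown below. The step size, the temperature limits, the action indexing (0, 1, 2) and the number of temperature features are all assumptions made for illustration, since the problem text does not state them; `apply_action` is a hypothetical helper name.

```python
# All constants below are assumptions for illustration; the problem text
# does not state the step size, the limits, or the action numbering.
N_TEMP = 3                      # assumed: first N_TEMP state entries are temperatures
STEP_C = 1.0                    # assumed adjustment size in deg C
T_MIN_C, T_MAX_C = 18.0, 26.0   # assumed temperature limits in deg C
ACTION_DELTAS = {0: -STEP_C, 1: 0.0, 2: +STEP_C}  # decrease, maintain, increase


def apply_action(state, action):
    """Return a new state with every temperature feature shifted and clamped."""
    delta = ACTION_DELTAS[action]
    new_state = list(state)
    if delta == 0.0:
        return new_state  # "maintain" leaves every feature untouched
    for i in range(N_TEMP):
        adjusted = state[i] + delta
        new_state[i] = min(T_MAX_C, max(T_MIN_C, adjusted))
    return new_state


# Three temperatures followed by one non-temperature feature.
state = [18.5, 20.0, 25.5, 40.0]
decreased = apply_action(state, 0)
increased = apply_action(state, 2)
```

Returning a copy rather than mutating the input keeps the "before" state available, which the reward calculation needs.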
Policy (Actor): A neural network that outputs a probability distribution over possible
temperature adjustments.
Value Function (Critic): A neural network that estimates the expected cumulative
reward (energy savings) from a given state.
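To make the two roles concrete, here is a forward-pass-only sketch of a small actor and critic written with the Python standard library. The hidden size, the initialization, and the class name `ActorCritic` are assumptions; a real solution would more likely use a deep learning framework, and this block does not train anything.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return {"W": weights, "b": [0.0] * n_out}


def dense(layer, x):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(layer["W"], layer["b"])]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ActorCritic:
    """Two separate one-hidden-layer networks (sizes are assumptions)."""

    def __init__(self, state_dim, n_actions=3, hidden=16, seed=0):
        rng = random.Random(seed)
        self.actor_hidden = init_layer(state_dim, hidden, rng)
        self.actor_out = init_layer(hidden, n_actions, rng)
        self.critic_hidden = init_layer(state_dim, hidden, rng)
        self.critic_out = init_layer(hidden, 1, rng)

    def policy(self, state):
        """Actor: probability of each temperature adjustment."""
        h = relu(dense(self.actor_hidden, state))
        return softmax(dense(self.actor_out, h))

    def value(self, state):
        """Critic: scalar estimate of expected cumulative reward."""
        h = relu(dense(self.critic_hidden, state))
        return dense(self.critic_out, h)[0]


model = ActorCritic(state_dim=10)
example_state = [0.1] * 10
probs = model.policy(example_state)
v = model.value(example_state)
action = random.Random(1).choices(range(3), weights=probs)[0]  # sample an action
```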
Reward function:
The reward function should reflect the overall comfort and energy efficiency based
on all temperature readings, i.e., a balance between minimizing temperature
deviations and minimizing energy consumption.
o Calculate the penalty based on the deviation of each temperature from the
target temperature and then aggregate these penalties.
o Measure the change in energy consumption before and after applying the
RL action.
o Combine the comfort penalty and energy savings to get the final reward.
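The three steps above could be combined along these lines. The target temperature, the two weights, and the choice of mean absolute deviation as the aggregate are assumptions, not values from the problem. The sketch also assumes that some estimate of energy use after the action is available (for example from a separate predictive model), since the dataset itself only records observed consumption.

```python
# Assumed constants; the problem text does not specify any of them.
TARGET_TEMP_C = 21.0    # assumed comfort target in deg C
COMFORT_WEIGHT = 1.0    # assumed weight on the comfort penalty
ENERGY_WEIGHT = 0.1     # assumed weight on the energy saving (per Wh)


def comfort_penalty(temps, target=TARGET_TEMP_C):
    """Mean absolute deviation of all temperature readings from the target."""
    return sum(abs(t - target) for t in temps) / len(temps)


def compute_reward(temps_after, energy_before_wh, energy_after_wh):
    """Reward = weighted energy saving minus weighted comfort penalty."""
    energy_saving = energy_before_wh - energy_after_wh  # positive when use drops
    penalty = comfort_penalty(temps_after)
    return ENERGY_WEIGHT * energy_saving - COMFORT_WEIGHT * penalty


# Saving 10 Wh while every room sits exactly on target: no comfort penalty.
r_comfortable = compute_reward([21.0, 21.0], 60.0, 50.0)
# Same saving, but rooms are 3 deg C away from target on average.
r_uncomfortable = compute_reward([18.0, 24.0], 60.0, 50.0)
```

The relative size of the two weights decides how much discomfort the agent will trade for a given energy saving, so they usually need tuning.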
The RL framework integrates these adjustments by modifying the temperature
features in the state vector, computing rewards based on energy savings and comfort
penalties, and training the Actor-Critic model to find an optimal policy.
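One common way to train such a model is the one-step temporal-difference actor-critic update, sketched below. To keep the gradients short enough to write by hand, the sketch swaps the neural networks for linear function approximators; the learning rates, discount factor, and class name `LinearActorCritic` are assumptions, and the problem statement does not prescribe this particular variant.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearActorCritic:
    """Linear stand-in for the actor and critic networks (an assumption)."""

    def __init__(self, state_dim, n_actions=3, actor_lr=0.01, critic_lr=0.05, gamma=0.99):
        self.theta = [[0.0] * state_dim for _ in range(n_actions)]  # actor weights
        self.w = [0.0] * state_dim                                  # critic weights
        self.actor_lr, self.critic_lr, self.gamma = actor_lr, critic_lr, gamma

    def policy(self, s):
        return softmax([sum(t * x for t, x in zip(row, s)) for row in self.theta])

    def value(self, s):
        return sum(w * x for w, x in zip(self.w, s))

    def update(self, s, a, r, s_next, done):
        """Apply one TD(0) actor-critic step and return the TD error."""
        bootstrap = 0.0 if done else self.gamma * self.value(s_next)
        td_error = r + bootstrap - self.value(s)
        probs = self.policy(s)  # evaluated before any weights change
        # Critic: move V(s) toward the TD target.
        for i, x in enumerate(s):
            self.w[i] += self.critic_lr * td_error * x
        # Actor: grad of log pi(a|s) w.r.t. logit k is (1[k == a] - pi(k|s)).
        for k, row in enumerate(self.theta):
            coeff = (1.0 if k == a else 0.0) - probs[k]
            for i, x in enumerate(s):
                row[i] += self.actor_lr * td_error * coeff * x
        return td_error


agent = LinearActorCritic(state_dim=2)
s = [1.0, 0.5]
first_td = agent.update(s, a=2, r=1.0, s_next=s, done=True)
for _ in range(50):  # keep rewarding action 2 in the same state
    agent.update(s, a=2, r=1.0, s_next=s, done=True)
```

After repeated positive rewards for one action, the policy shifts probability toward that action while the critic's estimate for the state rises toward the observed reward.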
