Problem Statement: The objective of the problem is to implement an Actor-Critic
reinforcement learning algorithm to optimize energy consumption in a building. The agent should learn to adjust the temperature settings dynamically to minimize energy
usage while maintaining comfortable indoor conditions.
This dataset contains energy consumption data for a residential building, along
with various environmental and operational factors.
Data Dictionary:
o Appliances: Energy use in Wh
o lights: Energy use of light fixtures in the house in Wh
o T1-T9: Temperatures in various rooms and outside
o RH_1-RH_9: Humidity measurements in various rooms and outside
o Visibility: Visibility in km
o Tdewpoint: Dew point temperature
o Press_mm_hg: Pressure in mm Hg
o Windspeed: Wind speed in m/s
State Space:
The state space consists of various features from the dataset that impact energy
consumption and comfort levels.
Current Temperature (T1 to T9): Temperatures in various rooms and
outside.
Current Humidity (RH_1 to RH_9): Humidity measurements in different
locations.
Visibility (Visibility): Visibility in km.
Dew Point (Tdewpoint): Dew point temperature.
Pressure (Press_mm_hg): Atmospheric pressure in mm Hg.
Windspeed (Windspeed): Wind speed in m/s.
Total State Vector Dimension: Number of features = 9 (temperature) + 9 (humidity) + 1 (visibility) + 1 (dew point) + 1 (pressure) + 1 (wind speed) = 22 features
Target Variable: Appliances (energy consumption in Wh).
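As a sketch, the state vector described above can be assembled from one dataset row. Column names follow the data dictionary; the hypothetical `make_state` helper and the example readings are illustrative, and the listed features sum to 9 + 9 + 4 = 22 dimensions:

```python
import numpy as np

# Column layout taken from the data dictionary; the real CSV may order
# columns differently, so adjust STATE_COLS to match your file.
TEMP_COLS = [f"T{i}" for i in range(1, 10)]        # T1..T9
HUM_COLS = [f"RH_{i}" for i in range(1, 10)]       # RH_1..RH_9
EXTRA_COLS = ["Visibility", "Tdewpoint", "Press_mm_hg", "Windspeed"]
STATE_COLS = TEMP_COLS + HUM_COLS + EXTRA_COLS     # 9 + 9 + 4 = 22 features

def make_state(row: dict) -> np.ndarray:
    """Build the flat state vector from one dataset row (column -> value)."""
    return np.array([row[c] for c in STATE_COLS], dtype=np.float32)

# Example with made-up readings:
row = {c: 20.0 for c in TEMP_COLS}
row.update({c: 45.0 for c in HUM_COLS})
row.update({"Visibility": 40.0, "Tdewpoint": 5.0,
            "Press_mm_hg": 755.0, "Windspeed": 4.0})
state = make_state(row)
print(state.shape)  # (22,)
```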
Action Space:
The action space consists of discrete temperature adjustments:
Action 0: Decrease temperature by 1 °C
Action 1: Maintain the current temperature
Action 2: Increase temperature by 1 °C
Adjustments are clamped within the defined temperature limits (-10 °C to 30 °C).
If the action is to decrease the temperature by 1 °C, adjust each temperature
feature (T1 to T9) down by 1 °C; if the action is to increase the temperature by
1 °C, adjust each temperature feature up by 1 °C. All other features remain
unchanged.
The action space is thus limited to discrete temperature adjustments (±1 °C)
within a defined range (-10 °C to 30 °C).
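The action rules above can be sketched as follows, assuming the nine temperature features occupy the first nine slots of the 22-dimensional state vector (an ordering assumption, not something the problem fixes):

```python
import numpy as np

T_MIN, T_MAX = -10.0, 30.0                   # limits from the problem statement
ACTION_DELTA = {0: -1.0, 1: 0.0, 2: +1.0}    # decrease / maintain / increase

def apply_action(state: np.ndarray, action: int) -> np.ndarray:
    """Shift T1..T9 (assumed to be the first 9 entries) and clamp to limits."""
    next_state = state.copy()
    next_state[:9] = np.clip(next_state[:9] + ACTION_DELTA[action], T_MIN, T_MAX)
    return next_state

s = np.full(22, 20.0, dtype=np.float32)
print(apply_action(s, 2)[:9])   # all nine temperatures shift from 20 to 21
print(apply_action(s, 2)[9:])   # humidity and weather features unchanged
```

Clamping means that repeatedly choosing action 2 at 30 °C simply leaves the temperatures at the upper limit.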
Policy (Actor): A neural network that outputs a probability distribution over the
possible temperature adjustments.
Value function (Critic): A neural network that estimates the expected cumulative
reward (energy savings) from a given state.
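A minimal sketch of the two networks, using plain NumPy forward passes with randomly initialised one-hidden-layer weights; dimensions are taken from the problem statement, and a real implementation would use an autograd framework such as PyTorch so the weights can actually be trained:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, HIDDEN = 22, 3, 32   # dims assumed from the problem

# Randomly initialised weights for illustration only (untrained).
Wa1 = rng.normal(0, 0.1, (HIDDEN, STATE_DIM))
Wa2 = rng.normal(0, 0.1, (N_ACTIONS, HIDDEN))
Wc1 = rng.normal(0, 0.1, (HIDDEN, STATE_DIM))
Wc2 = rng.normal(0, 0.1, (1, HIDDEN))

def actor(state):
    """Policy network: state -> probability distribution over the 3 actions."""
    h = np.tanh(Wa1 @ state)
    logits = Wa2 @ h
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

def critic(state):
    """Value network: state -> scalar estimate of expected cumulative reward."""
    return float(Wc2 @ np.tanh(Wc1 @ state))

probs = actor(np.zeros(STATE_DIM))
print(probs.sum())   # ≈ 1.0, a valid distribution
```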
Reward function:
The reward function should reflect overall comfort and energy efficiency based
on all temperature readings; that is, it should balance minimizing temperature
deviations against minimizing energy consumption.
Calculate the penalty based on the deviation of each temperature from the
target temperature and then aggregate these penalties.
Measure the change in energy consumption before and after applying the
RL action.
Combine the comfort penalty and energy savings to get the final reward.
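The three reward steps above can be sketched as follows; the target temperature and the two trade-off weights are assumptions to be tuned, not values the problem prescribes:

```python
import numpy as np

TARGET_TEMP = 21.0       # assumed comfort setpoint
COMFORT_WEIGHT = 1.0     # assumed trade-off weights, tune to your scenario
ENERGY_WEIGHT = 0.1

def reward(temps, energy_before, energy_after):
    """Combine the comfort penalty (mean deviation of T1..T9 from the
    target) with the energy savings measured across the RL action."""
    comfort_penalty = np.mean(np.abs(np.asarray(temps) - TARGET_TEMP))
    energy_savings = energy_before - energy_after   # positive if usage dropped
    return -COMFORT_WEIGHT * comfort_penalty + ENERGY_WEIGHT * energy_savings

# Perfect comfort and a 10 Wh saving:
print(reward([21.0] * 9, 100.0, 90.0))   # 1.0
```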
The RL framework integrates these adjustments by modifying the temperature
features in the state vector, computing rewards based on energy savings and comfort
penalties, and training the Actor-Critic model to find an optimal policy.
