Institut Riset dan Publikasi Indonesia (IRPI)
MALCOM: Indonesian Journal of Machine Learning and Computer Science
Journal Homepage: https://journal. id/index. php/malcom
Vol. 3 Iss. 2 October 2023, pp: 281-292
ISSN(P): 2797-2313 | ISSN(E): 2775-8575

Optimization of Energy Consumption in 5G Networks Using Learning Algorithms in Reinforcement Learning

Daffa Dean Naufal1, Harry Ramza2, Emilia Roza3*
1,2,3 Department of Electrical Engineering, Faculty of Industrial Technology and Informatics, Universitas Muhammadiyah Prof. DR. HAMKA, Indonesia
E-Mail: 1daffadean22@gmail.com, 2hramza@uhamka.id, 3emilia_roza@uhamka.

Received Sep 5th 2023, Revised Oct 15th 2023, Accepted Oct 22nd 2023
Corresponding Author: Emilia Roza

Abstract
The 5G network is an evolution of the 4G Long Term Evolution (LTE) fast internet network that is widely adopted in smartphones and gadgets. 5G networks offer faster wireless internet for various purposes. This research is a literature review of several articles related to machine learning, specifically energy consumption optimization in 5G networks with reinforcement learning algorithms. The results show that various techniques have been developed by many researchers to manage the complexity of large energy consumption, including integration with 5G networks and learning algorithms. Regarding power consumption, it was found that across 5G use cases, in low traffic load scenarios and when reducing power consumption takes precedence over QoS, power savings of up to 80% can be achieved with 50 ms latency, 75% with 20 ms and 10 ms latency, and 20% with 1 ms latency. If QoS is prioritized, power savings reach a maximum of 5% with minimal impact on latency. Moreover, with regard to energy efficiency, it has been observed that DQN-assisted actions can offer improvements.

Keywords: Algorithms, Energy, Long Term Evolution,
Reinforcement Learning, 5G

INTRODUCTION
Before the 5G generation of advanced technology appeared, there were several preceding generations. The first generation (1G), developed in 1980, provided voice service over an analog system. The 2G generation, in the late 1980s, introduced digital signals to transmit voice. The 3G generation added email access and multimedia services. The 4G generation introduced high-speed services over the Internet Protocol (IP). 5G, or the 5th generation, can operate with larger bandwidth using BDMA, CDMA and millimeter waves. 5G is an internet network connection designed with various significant advantages, including faster upload and download processes with a stable and wide connection range. In short, the 5G network is an evolution of the fast 4G Long Term Evolution (LTE) internet network, which is widely adopted in smartphones and gadgets. The 5G network offers faster wireless internet for a variety of purposes. When a 5G network is used, the higher internet speed shortens the time needed, for example, to download 4K-resolution films, games, software and various other content. The growing need for fast and reliable connectivity has driven the development of fifth-generation (5G) mobile network technology. While 5G offers better performance and higher speeds, there are challenges associated with increased energy consumption. More advanced 5G networks require more complicated infrastructure and more powerful devices, which often consume more energy than previous generations. Therefore, it is important to optimize energy consumption in 5G networks to keep a balance between network performance and environmental impact. One way to address this issue is by applying Reinforcement Learning (RL) algorithms. With RL, 5G networks can automatically and dynamically optimize resource usage and energy consumption based on changing network conditions.
The most important aspects of 5G connectivity are its ability to deliver better Virtual Reality and Artificial Intelligence (AI), connect machine to machine (M2M), and bring the Internet of Things (IoT), objects connected to the internet, to the next stage. According to Lestari, 5G technology, as the fifth generation in the development of cellular telecommunications networks, has superior bargaining power in terms of network speed, which is 10-100x faster than 4G technology. It therefore has the potential to realize long-distance interactions at the same time without interference or obstacles. The speed of this technology is even described as downloading more than 30 films in high quality or resolution in just a few seconds, and it can provide access to various applications through one universal device and interconnection with existing telecommunication infrastructure. Based on research results released by the International Telecommunication Union regarding International Mobile Telecommunications 2020 (IMT-2020), there are 8 main capabilities possessed by 5G technology, and it is projected that this technology will represent a far leap compared to previous technologies, namely:
1. Maximum data speed, reaching a peak of 20 Gbit/s; this technology is described as having speeds of up to 10-100x faster than previous technologies.
2. Wide range of user-experienced data rates (in Mbit/s or Gbit/s), available to users/mobile devices throughout the coverage area.
3. Very low latency.
4. Very high mobility.
5. Connection density that can accommodate a large number of connected users.
6. Energy efficiency.
7. Spectrum efficiency.
8. Good and high area traffic capacity in every geographic range.
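As a quick sanity check of the headline peak rate above, transfer time scales as size times eight (bits per byte) divided by rate; the 10 GB film size is an assumed figure for illustration:

```python
# Back-of-envelope check (illustrative): how long a 10 GB 4K film takes
# at the IMT-2020 peak rate of 20 Gbit/s versus a typical 4G peak of
# 1 Gbit/s, assuming an ideal link with no protocol overhead.

def download_seconds(size_gigabytes: float, rate_gbit_per_s: float) -> float:
    """Time to transfer `size_gigabytes` at `rate_gbit_per_s`."""
    size_gigabits = size_gigabytes * 8   # 1 byte = 8 bits
    return size_gigabits / rate_gbit_per_s

print(download_seconds(10, 20))   # 5G peak: 4.0 seconds
print(download_seconds(10, 1))    # 4G peak: 80.0 seconds
```

At the 20 Gbit/s peak, a 10 GB film indeed takes only a few seconds, consistent with the "30 films in a few seconds" description only as an aggregate over much smaller or heavily compressed files.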
Through the main capabilities offered by 5G technology, it is quite possible for this technology to become a solution to human activities hampered by weak internet connections, or to innovation in digital technology hampered by inadequate telecommunications networks. Thus, through maximal improvement of 5G infrastructure and optimization of its use, this technology will continue to support significant advances in digital transformation across various sectors. One sector that can benefit from 5G technology is energy. The energy sector has a very important role in realizing sustainable industrial development. The industrial world is closely tied to problems of resource and energy optimization. These greatly influence product selling prices and the company's image as one that is healthy and able to manage resources well, so that a sustainable industry can be realized. From a physical point of view, energy use drives economic productivity and industrial growth and is at the center of modern economic operations. Apart from that, energy also drives household consumption and can thus boost the economy. Optimizing energy consumption is necessary given the importance of energy in various aspects of life. Energy use in the industrial sector differs from energy use in the household or commercial sectors, in that industrial energy is used to produce a product. Energy-using equipment in industry is designed to support production patterns that are relatively efficient at the initial design conditions, but becomes ineffective when production patterns change. The use of efficient equipment in industrial systems, although important, does not guarantee that energy savings will be achieved if the equipment is not operated according to its initial operating function.
System optimization in industry aims to make energy use in production processes more efficient and optimal. Currently, artificial intelligence is broadly used to solve numerous problems in business, robotics, natural language, mathematics, games, perception, medical diagnosis, engineering, financial analysis and scientific analysis. Machine learning can be defined as the application of computers and mathematical algorithms that learn from data and produce predictions for the future. The field of machine learning is concerned with the question of how to build computer programs that improve automatically through experience. Machine learning is split into three categories: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning generally falls between supervised and unsupervised learning, in that this technique works in a dynamic environment where the agent must complete the goal without any explicit notification from the computer that the goal has been accomplished. Artificial intelligence is a big part of machine learning, computer vision and Natural Language Processing. Through virtual assistants such as Siri, for example, or in e-commerce when chatbots carry out customer service, whenever there is technology with the capability to make independent decisions, artificial intelligence will certainly have a critical role to play. However, there is much more potential in the relationship between artificial intelligence and our everyday activities. It can be said that facilitating human activities, including optimization, currently requires several components: 5G, a fast network that helps apps work; big data, large volumes of data to process; and artificial intelligence, the algorithms behind smart devices. Reinforcement learning comes from animal learning theory.
Since this learning does not require prior knowledge, it is possible to independently acquire optimal policies with knowledge gained through trial and error while continuously interacting with a dynamic environment. Reinforcement learning problems are solved by learning from new experiences through trial and error. Reinforcement learning algorithms are related to the dynamic programming algorithms commonly used to solve optimization problems. In particular, reinforcement learning methods based on the Markov decision process model fall into two types. The first is model-based methods, where reinforcement learning first learns model knowledge and then obtains an optimal strategy from that model knowledge. The second is model-free methods such as the Temporal Difference algorithm, the Q-learning algorithm and the SARSA algorithm, where reinforcement learning directly computes the optimal strategy without knowledge of the model. In this paper, studies related to optimizing energy consumption with 5G networks and reinforcement learning algorithms are discussed.

MATERIALS AND METHOD
This research is a literature review of several articles related to machine learning, especially on optimizing energy consumption with 5G networks and reinforcement learning algorithms. The review covers some of the latest research efforts that utilize machine learning. The data collection process was carried out by examining relevant previous literature. Supporting theories, data and information are used as references in the research. The research steps are as follows: start; collect data related to 5G networks and learning algorithms in reinforcement learning; analyze the data and determine the parameters that need to be optimized.
Create a reinforcement learning model to optimize energy consumption in 5G networks; conduct simulation and testing of the model to ensure its effectiveness; evaluate and improve the model if necessary; implement the model on the 5G network; finish.

Figure 1. Research Flowchart

The literature study method is used to find information related to energy consumption optimization in 5G networks using learning algorithms in reinforcement learning. In this method, references from journals, articles and related documents relevant to the topic are searched, then read and understood. The information found is then organized into a conclusion. The flowchart in Figure 1 describes the steps taken in optimizing energy consumption on 5G networks using learning algorithms in reinforcement learning. The first step is to collect data related to 5G networks and learning algorithms in reinforcement learning. Next, the data is analyzed and the parameters that need to be optimized are determined. After that, a reinforcement learning model is created to optimize energy consumption on 5G networks. The model is then tested through simulation to ensure its effectiveness. If needed, the model is evaluated and improved. Finally, the model is implemented on a 5G network.

RESULTS AND DISCUSSION
The rapid expansion of mobile networks, coupled with the emergence of 5G networks and the demand for densification, has led to a significant increase in energy consumption. By utilizing machine learning, telecommunications companies can achieve informed automation and decision making, effectively reduce energy use, and pave the way for more sustainable networks. Reinforcement learning algorithms have been applied in various fields. A deep reinforcement learning model has been proposed in smart city applications for solving indoor localization problems by utilizing Bluetooth Low Energy signal strength.
The algorithm used in that study falls under reinforcement learning. The Q-Learning algorithm is used to learn the optimal action in each state adopted by the system through trial and error. This solves many of the problems that occur in Natural Language Processing: the hidden states between input words and output vectors form an intensive network for comprehensive and efficient learning. In another study, also based on reinforcement learning, the Markov decision process (MDP) algorithm is used for image object detection. The proposed system uses an active agent that explores the scene to identify the target object location, and learns policies to improve the geometry of the agent by taking simple actions in a space that integrates discrete actions with corresponding continuous parameters. In Wang's research (2021), computer vision and Deep Reinforcement Learning were combined to optimize energy consumption by increasing the fuel economy of hybrid electric vehicles. The proposed method is capable of autonomously learning optimal control policies from visual input, with smart convolutional neural network-based object detection to extract the available visual information on board. The researchers constructed real 100-kilometer city and highway driving cycles in which visual information was incorporated. The results show that systems based on deep reinforcement learning with visual information consume only slightly (4.3% - 8.8%) more fuel compared to those without visual information, and the proposed method achieves 96.5% of the fuel savings of global-optimization dynamic programming. The researchers also used other sensors, including radar, lidar, ultrasonics and the Global Positioning System (GPS).
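The trial-and-error Q-learning scheme mentioned above can be sketched on a toy Markov decision process. The 3-state chain environment and all parameter values here are assumptions for illustration, not the setup of any cited study:

```python
import random

# Tabular Q-learning sketch: the agent learns Q(s, a) purely from
# experienced rewards, without a model of the environment. Toy MDP:
# a 3-state chain where action 1 moves right toward a terminal reward.

random.seed(1)
n_states, n_actions = 3, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def env_step(s, a):
    """Returns (next_state, reward, done). Reaching state 2 pays 1.0."""
    if a == 1:
        s2 = s + 1
        return (s2, 1.0, True) if s2 == n_states - 1 else (s2, 0.0, False)
    return (max(s - 1, 0), 0.0, False)   # action 0 drifts left

for _ in range(200):                     # episodes of trial and error
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current estimates, sometimes explore
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = Q[s].index(max(Q[s]))
        s2, r, done = env_step(s, a)
        # Q-learning: off-policy update toward the greedy successor value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, Q(1, right) approaches the terminal reward 1.0 and Q(0, right) approaches its discounted value 0.9, so the greedy policy moves right from every state; the same update shape underlies the network-control applications surveyed here.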
Related to technological development, the ability of 5G to connect a large number of devices should also be a highlight for the energy industry. As is known, the energy sector requires many devices, such as sensors, drones and grids, to distribute energy well. IoT technology is able to connect everything in real time so as to create good synergy. With the help of 5G, the ability of IoT technology to create these synergies will be significantly enhanced, resulting in efficiencies in asset management and energy distribution that have never existed before in the energy sector. In this research, two research articles were selected for further discussion, namely Malta's research and Giannopoulos' research. According to Malta's research, in mobile networks, 5G Ultra-Dense Networks (UDNs) have emerged as they efficiently increase network capacity through cell splitting and densification. A Base Station (BS) is a fixed transceiver that is the main communication point for one or more wireless mobile user devices. As UDNs are densely deployed, the number of BSs and communication links is large, raising concerns about resource management with regard to energy efficiency, since BSs consume much of the total power in a mobile network. It is predicted that 6G next-generation mobile networks will include technologies such as artificial intelligence as a service and a focus on energy efficiency. Using machine learning, it is possible to optimize power consumption with cognitive management of the dormant, inactive and active states of network elements. Reinforcement learning enables policies that allow sleep mode techniques to gradually deactivate or activate components of BSs and reduce BS power consumption.
In this work, a sleep mode control based on State Action Reward State Action (SARSA) is proposed, which allows the use of specific metrics to find the best tradeoff between power reduction and Quality of Service (QoS) constraints. In the 5G network, the BS has auxiliary elements that contribute to power consumption: the energy consumption of nodes and the energy consumption of communications. The node's power consumption includes the power consumed by signal processing, cooling and battery backup. The communication component includes the power consumed to transmit a signal with a certain coverage range, which depends on the distance of each User Equipment (UE): a distant UE consumes more power in transmitting data than a nearer one. The sleep mode method assumes the deactivation or activation of the BS hardware components. These components can be grouped by similar deactivation/activation times and assigned to distinct sleep mode levels. The following sleep mode levels can be assumed:
- Sleep Mode (SM) 1: the power amplifier and some components that handle the analog front end and digital baseband are disabled. This is the fastest mode, with an on or off time of 0.071 ms (one orthogonal frequency division multiplexing (OFDM) symbol).
- SM2: needs 1 ms (one sub-frame of the Transmission Time Interval (TTI)) to deactivate or activate additional components of the analog front end.
- SM3: the power amplifier, all the components of the digital baseband, and almost all the components of the analog front end (besides the clock generator) are switched off. The deactivation or activation time is 10 ms.
- SM4: the standby mode, in which a large part of the components of the BS is deactivated. The wake-up will take 1 s, as this is the minimum sleep period.
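A compact sketch of this SARSA-driven sleep-mode selection follows, using the transition times quoted above for SM1-SM4 and a 50 ms buffering limit like the Buflim threshold used later in the paper. The two-level load state, the power-draw figures and the reward shaping are simplifying assumptions, not the paper's exact model:

```python
import random

# SARSA sketch for sleep-mode control: the agent picks a sleep level
# (0 = awake, 1-4 = SM1-SM4), earns energy savings, and is penalized
# when the wake-up delay under the current load exceeds a buffer limit.

random.seed(2)
TRANSITION_MS = [0.0, 0.071, 1.0, 10.0, 1000.0]  # (de)activation time per level
POWER = [1.0, 0.5, 0.3, 0.15, 0.05]              # assumed relative power draw

def reward(level, load, buf_limit_ms):
    """Hypothetical shaping: energy saving minus a QoS penalty when the
    traffic-weighted wake-up delay exceeds the buffering limit."""
    saving = 1.0 - POWER[level]
    delay_ms = TRANSITION_MS[level] * load       # more traffic, more delayed packets
    return saving - (10.0 if delay_ms > buf_limit_ms else 0.0)

alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = [[0.0] * 5 for _ in range(2)]                # state: 0 = low load, 1 = high load

def choose(s):
    if random.random() < epsilon:
        return random.randrange(5)
    return Q[s].index(max(Q[s]))

s, a = 0, choose(0)
for _ in range(5000):
    load = random.random()
    r = reward(a, load, buf_limit_ms=50.0)
    s2 = 1 if load > 0.5 else 0
    a2 = choose(s2)          # SARSA: bootstrap on the action actually taken next
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])
    s, a = s2, a2
```

With these numbers the agent learns to avoid SM4, whose 1 s wake-up violates the 50 ms limit under almost any load, while deeper but faster modes remain attractive, which mirrors the energy/QoS tradeoff the paper studies.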
Table 1 gives the energy consumption measurements for four different types of BSs, in which the authors grouped the power consumption and transition times by sleep mode.

Table 1. BS Power Consumption [W] for 2x2 Macro, 4x4 Macro and LSAS BS types, at full load, zero load and in sleep mode

Table 2 gives the energy consumption, the duration of activation/deactivation, the percentage of reduction in energy consumption, and the energy savings, grouped by sleep mode.

Table 2. Percentage of Reduction in Energy Consumption and Energy Saving per Sleep Mode (Awake, SM1-SM4), with Activation/Deactivation Durations

When the device enters a deeper sleep mode, it consumes much less power; however, the delay to wake the BS becomes longer, reducing the user's QoS. At each sleep mode level, the activation and deactivation durations are the same, and the minimum duration of each sleep mode is the sum of the deactivation and activation times. Table 3 summarizes the 5G use cases, presenting for each application category the general characteristics and the standard user plane latency requirements.

Table 3. 5G Use Case Summarisation
- eMBB (Broadcasting, Media delivery, Online gaming): extreme data rates, large data volume, low latency; standard user plane latency of 4 ms, one way, for both downlink and uplink.
- mMTC (Actuators, Sensors, Trackers, Wearables): low-cost devices, extreme coverage, long device battery life.
- URLLC (Augmented Reality, Mobile robots, Motion control, Remote control): high reliability, ultra-low latency, high availability; standard user plane latency of 1 ms, one way, for both downlink and uplink.

Sub-use case user plane latencies:
- Online gaming: 10 ms
- Downlink video streaming 4K: 20 ms
- Autonomous vehicles, sensor: 5 ms
- Autonomous vehicles, video dynamic: 5 ms
- Autonomous vehicles, video fixed: 50 ms
- Automotive, assisted: 5 ms
- Automotive, cooperative: 10 ms
- Automotive, tele-operated: 20 ms
- Industry 4.0, motion control: 1 ms
- Industry 4.0, factory automation: 10 ms
- Industry 4.0, process automation: 50 ms

The design of the system model considers the tradeoff between energy consumption and end-to-end (E2E) user latency. While in a given sleep mode, the BS does not transmit or receive traffic from the end user but listens to incoming traffic from the core network intended for the end user. If the BS is sleeping, incoming traffic from the core network is stored in a packet buffer and latency therefore increases; however, this enables a reduction in energy consumption. Figure 2 presents the system model.

Figure 2. System Model

The application example is defined in terms of (i) traffic throttling, (ii) power consumption, and (iii) sleep mode policies; each of these definitions is given in the following subsections. The model is coded in Python to simulate the incorporation of the RL agent inside the BS in order to obtain appropriate sleep policies. During simulation, the agent is trained to interact with the environment by observing, performing actions and earning rewards. The observations depend on the amount of traffic received by the BS, which provides packet buffering, and the energy consumption depends on the sleep mode level chosen by the agent. Traffic coming from the core network is modulated using a stochastic process to shape the bursty behavior of the data traffic used in the simulation. A Poisson process is defined to obtain distinct traffic patterns with different arrival rates over time. Table 4 identifies the traffic load variations.

Table 4.
Variation of Traffic Load (#event, traffic load)

The power distribution model is determined by a rough estimate of the power savings at the BS node, which can be defined as the fraction of time the BS spends in each sleep mode or awake mode during a period of time. The sleep mode policy is the outcome of the RL algorithm, which allows the device to learn the best policy to use in each simulated environment.

Algorithm 1: SARSA (on-policy TD control) for estimating Q ≈ q*
Require: step size α ∈ (0, 1], ε > 0
Initialize Q(s, a) for all s ∈ S, a ∈ A(s), arbitrarily except Q(terminal, ·) = 0
for each episode do
    Initialize S
    Choose A from S using the policy derived from Q (ε-greedy)
    for each step of the episode, until S is terminal do
        Take action A, observe R, S'
        Choose A' from S' using the policy derived from Q (ε-greedy)
        Q(S, A) ← Q(S, A) + α[R + γQ(S', A') − Q(S, A)]
        S ← S'; A ← A'
    end for
end for

Figure 3. States and Actions

Figure 4 below shows the convergence obtained with the following parameters: SMweight = 1, Buflim = 50 ms and traffic load = 5%. It can be seen that, as the number of episodes increases, the value of the accumulated instant rewards stabilizes after around 100 iterations.

Figure 4. Convergence of the Total Reward Function per Episode

Figure 5 below presents the distribution of the different sleep mode states.

Figure 5. States of the BS with a Threshold of 50 ms

Figure 6 below presents the energy saving percentage for each Buflim among the simulated traffic loads.

Figure 6.
Energy Saving Percentage per Buflim and Traffic Load

The outcomes of the simulations show that, depending on the goal of the 5G use case, in low traffic load scenarios and when a reduction in energy consumption is preferred over QoS, it is possible to attain energy savings of up to 80% with 50 ms latency, 75% with 20 ms and 10 ms latencies, and 20% with 1 ms latency. If QoS is favored, then the energy savings reach a maximum of 5% with minimal effect in terms of latency.

According to Giannopoulos' research, energy efficiency (EE) constitutes a key target in the deployment of 5G networks, mainly due to the increased densification and heterogeneity. Figure 7 below represents the network model.

Figure 7. Network Model

The system-level EE jointly examines the second-tier transmit power resources of the K MiRUs present in the network area and the requested QoS of the DPs associated with each cell. The DQN schemes are divided into two broad categories, namely the classical centralized/single-agent DQN and the decentralized/multi-agent DQN. The single-agent C-DQN algorithm can be described through the agent-environment interaction in Figure 8 below.

Figure 8. The Single-Agent C-DQN Algorithm

The outline of the MA-DQN scheme may be summarized in the following steps (for a given episode):
Step 1: Each agent observes only its associated users (their associated PRB and experienced throughput).
Step 2: Based on its own policy, each agent selects an action.
Step 3: The individual actions selected by each agent are combined to form the global action vector, i.e. the power value of each PRB and cell.
Step 4: The reward is then defined similarly to the centralized scheme. Specifically, the system-level EE increment takes into account the throughput experienced by all users and the power level of all MiRUs (global reward).
Step 5: If the system-level EE was improved, the agents continue to play in the same episode. Otherwise, the reward is zero and another episode is initiated.
Step 6: Steps 1-5 are repeated until convergence. In this manner, although each individual agent has only partial observability of its environment, it is able to "sense" the system-wide effect of its actions through the global reward.

In the paper, a Deep Q-Network (DQN) based power control scheme is proposed for enhancing the system-level EE of two-tier 5G heterogeneous and multi-channel cells. The algorithm aims to maximize the EE of the system by regulating the transmission power of the downlink channels and reconfiguring the user association scheme. To efficiently solve the EE problem, a DQN-based method is established, suitably modified to ensure adequate QoS for every user (through defining a demand-driven rewarding scheme) and near-optimal power adjustment in each transmission link. To directly compare different DQN-based approaches, a centralized (C-DQN), a multi-agent (MA-DQN) and a transfer learning-based (T-DQN) method are deployed to assess whether their applicability is useful in 5G HetNets. Numerous simulations were conducted to demonstrate the overall performance of the developed methods, both for evaluative and comparative purposes. The simulation parameters are shown below in Table 5.
Table 5. System and DQN Parameters. The system parameters include a carrier frequency of 6 GHz, a noise power density of -174 dBm/Hz, MaRU/MiRU cell ranges of 500/100 m and maximum MaRU/MiRU power of 80/25 W, together with the 5G numerology, the number of PRBs, the PRB bandwidth, the minimum power per PRB and the number of users per MiRU. The DQN parameters include ReLU activation for the input and hidden layers, linear activation for the output layer, the Adam optimizer, the Huber loss function and a linearly decaying exploration rate, together with the number of hidden layers, the memory size, the mini-batch size and the target update frequency.

The results show that DQN-supported actions can provide better overall network-wide EE performance, because they balance the trade-off between energy consumption and throughput (in Mbps/Watt). The MA-DQN technique achieved the highest performance (>5 Mbps/Watt), as decentralized knowledge collection allows low-dimensional agents to be coordinated through the global reward. Figure 9 below depicts the training evaluation of the proposed DQN-based algorithms. In general, 30000 training episodes proved sufficient for the reward convergence of both the C-DQN and MA-DQN schemes.

Figure 9. Training Evaluation of the Proposed DQN-based Algorithms

In further evaluating the T-DQN against the MA-DQN solution, T-DQN provides useful utilization for very low or very high inter-cell distances, while the use of MA-DQN is preferred for intermediate inter-cell distances, where power savings are possible towards achieving improved EE. From a general point of view, the performance deviation among the strategies becomes more apparent as the difficulty of the scenario increases. As can be seen from Figure 10 below, the AVG and Random strategies showed bad EE performance, mainly due to the absence of intelligent power configuration to avoid inter-cell interference.

Figure 10. Bad EE Performance

In Fig.
11 below, the EE degrades with the degree of densification, independently of the method used for EE maximization.

Figure 11. EE Degradation with the Degree of Densification

In Figure 12 below, MA-DQN clearly presents increased EE compared with T-DQN in cases where the inter-cell distance ranges between 200-400 m.

Figure 12. Increased EE of MA-DQN

The DQN-assisted EE solution is independent of the user velocity, showing stability in both the EE and throughput performance, as described in Figure 13 below.

Figure 13. Stability in Both EE and Throughput Performance

From Figure 14 below, low values of the weighting factor correspond to high power consumption and high system throughput satisfaction.

Figure 14. High Power Consumption and High System Throughput Satisfaction

DISCUSSION
Several techniques have been developed to overcome the problem of high energy consumption, such as integration with 5G networks, and algorithms have been implemented by several researchers. In the study performed by Malta, related to energy consumption, it was found that in 5G use cases, in low traffic load scenarios and when decreasing energy consumption is favored over QoS, power savings can be achieved of up to 80% with 50 ms latency, 75% with 20 ms and 10 ms latency, and 20% with 1 ms latency. If QoS is prioritized, then energy savings reach a maximum of 5% with minimal impact in terms of latency. As for the research carried out by Giannopoulos, related to energy efficiency, it was found that DQN-assisted actions can offer improved EE performance throughout the network, as they balance the trade-off between power consumption and achieved throughput (in Mbps/Watt).

CONCLUSION
Optimizing energy consumption with 5G networks and reinforcement learning algorithms has been applied to several applications.
Reinforcement learning, supported by the development of 5G networks, has the potential to solve difficult problems. Reinforcement learning involves several concepts: a decision-making agent, an environment with a space and various objects in it, and a process of making decisions among various possibilities.

REFERENCES