International Journal of Electrical and Computer Engineering (IJECE) Vol. No. February 2019, pp. 171-180. ISSN: 2088-8708. DOI: 10.11591/ijece.

Hardware simulation for exponential blind equal throughput algorithm using system generator

Yusmardiah Yusuf, Darmawaty Mohd Ali, Norsuzila Ya'acob
Wireless Communication Technology Group, Universiti Teknologi MARA, Malaysia

Article Info
Article history: Received Dec 20, 2017. Revised Jul 31, 2018. Accepted Aug 16, 2018.
Keywords: FPGA, Hardware co-simulation, LTE, Scheduling Algorithm, MATLAB Simulink, Xilinx System Generator

ABSTRACT
Scheduling is the process of allocating radio resources to User Equipment (UE) that transmits different flows at the same time. It is performed by the scheduling algorithm implemented in the Long Term Evolution (LTE) base station, the Evolved Node B. Most of the proposed algorithms do not handle real-time and non-real-time traffic simultaneously. As a result, a UE with bad channel quality may starve because no resources are allocated to it for a long time. To solve these problems, the Exponential Blind Equal Throughput (EXP-BET) algorithm is proposed. The user with the highest priority metric, calculated using the EXP-BET metric equation, is allocated resources first. This study investigates the implementation of the EXP-BET scheduling algorithm on an FPGA. The metric equation of EXP-BET is modelled and simulated using System Generator. The design utilizes only 10% of the available resources on the FPGA. Fixed numbers are used for all the inputs. The system is verified by running hardware co-simulation for the metric value of the EXP-BET algorithm. The output of the hardware co-simulation shows that the metric values of EXP-BET are similar to those obtained in the Simulink environment. Thus, the algorithm is ready for prototyping, and the Virtex-6 FPGA is chosen as the prototyping device.

Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author: Yusmardiah Yusuf, Wireless Communication Technology Group (WiCoT), Faculty of Electrical Engineering, Universiti Teknologi MARA, 40450 Shah Alam, Malaysia. Email: ymardiahyusuf@gmail.
Journal homepage: http://iaescore.com/journals/index.php/IJECE

INTRODUCTION
A scheduling algorithm is the method used to allocate radio resources to user equipment (UE). A UE, for example a mobile phone, transmits different flows such as web browsing or video streaming at the same time. The scheduling mechanism is based on scheduling algorithms implemented at the Long Term Evolution (LTE) base station, the Evolved Node B, and is performed in the Medium Access Control (MAC) layer. Since the implementation of the scheduling algorithm is an open issue in LTE, many scheduling algorithms have been proposed by researchers. Previously developed algorithms offer several techniques for distributing resources to users, such as Frame Level Scheduler (FLS), Modified Largest Weighted Delay First (MLWDF) and Proportional Fairness (PF). Many researchers have also suggested packet schedulers that allocate resources to UEs by considering channel quality conditions, such as Best Channel Quality Indicator (BCQI) and Maximum Rate.

In LTE, one of the important features is the scheduling algorithm, which determines which packet has first priority to be scheduled. However, none of the algorithms above considers both real-time flows, such as video streaming and online gaming, and non-real-time flows, such as web browsing and email. An earlier study proposed the EXP-BET algorithm, which considers both real-time and non-real-time flows. Based on the simulation results, EXP-BET performed better than the FLS and EXP-PF algorithms for the real-time services. For the non-real-time services,
EXP-BET has shown a 72% improvement over FLS and a 7.52% improvement over EXP-PF in the fairness index. The authors conclude that scheduling can be recommended as one of the methods to address the problem of cell-edge users, since the EXP-BET algorithm gives a fair share of the system resources to users with multiple services.

The Field Programmable Gate Array (FPGA) was introduced by Xilinx. It is based on the programmable logic device (PLD) and logic cell array (LCA) concepts. By providing a two-dimensional array of configurable logic blocks (CLBs) and programming the interconnections between these configurable resources, an FPGA can implement a wide range of arithmetic and logic functions. The architecture is a reconfigurable logic device made up of an array of small logic blocks and allocated interconnection resources. The FPGA has advantages in terms of performance, cost, reliability, flexibility and time-to-market compared with other popular IC technologies such as application-specific integrated circuits (ASICs) and digital signal processors (DSPs).

In terms of FPGA implementation, no researchers have implemented the EXP-BET scheduling algorithm on a hardware platform. In 2015, one study focused on implementing various algorithms for an arbiter with low port density on an FPGA platform. A round-robin arbiter, which provides strong fairness, was selected. It works on the principle that a request that was just served should have the lowest priority in the next round of arbitration.

Over the past few years, Xilinx has released new software tools for FPGA development. As an add-on to Simulink, System Generator allows hardware circuits to be designed within the Simulink environment.
Furthermore, the combination of Xilinx System Generator and the Simulink environment provides a simple technique for hardware design through the use of existing System Generator blocks and subsystems. This saves both design time and hardware implementation resources. Hence, the proposed algorithm can move towards commercialization more quickly, since an FPGA offers a shorter time-to-market. With an FPGA, no layout, masks or other fabrication steps are needed, and the design is simpler than for an ASIC. Hardware implementation is important for designers of high-performance Digital Signal Processing (DSP) systems such as wireless networks. Hence, verification on hardware is needed to validate the theoretical and simulation work.

Therefore, this study aims to implement and verify the hardware simulation of the EXP-BET algorithm using Xilinx System Generator (XSG). The algorithm is modelled using MATLAB Simulink configured with XSG. The paper is organized as follows: Section 2 describes the research method, Section 3 presents the results and discussion, and Section 4 draws the conclusion.

RESEARCH METHOD
The proposed packet scheduling algorithm for the downlink transmission of LTE combines the Exponential Rule and Blind Equal Throughput algorithms (EXP-BET). The flowchart for the design of the EXP-BET algorithm is presented in Figure 1. The EXP Rule algorithm schedules the real-time services, while the BET algorithm takes care of the non-real-time services. Both serve the users based on their metric equations.

Exponential (EXP) rule
The main idea behind the EXP Rule algorithm is to balance the throughput, fairness and delay requirements of a scheduling algorithm. The EXP Rule gives higher priority to the user with the highest transmission delay, or the user that has more packets in its buffer. It is a channel-aware scheduling algorithm which considers the CQI metric in the scheduling decision
and has been proved to be the most promising approach for delay-sensitive real-time applications such as video and VoIP. Its metric is described by:

$$m_{i,k}^{EXP} = \exp\left(\frac{a_i D_{HoL,i}}{1 + \sqrt{AverageD_{HoL}}}\right) \Gamma_k^i$$

where $a_i$ is the tuneable parameter, equal to $5/(0.99\,\tau_i)$, $\tau_i$ is the tolerable time interval within which the packet must be received, and $D_{HoL,i}$ is the delay of the first packet to be transmitted by the $i$th user. $AverageD_{HoL}$ is equal to $\frac{1}{N_{RT}} \sum_{i=1}^{N_{RT}} D_{HoL,i}$, $N_{RT}$ is the number of active downlink real-time flows, and $\Gamma_k^i$ is the spectral efficiency for the $i$th user over the $k$th resource block.

Blind equal throughput
Fairness can be achieved with Blind Equal Throughput (BET), which stores the past average throughput achieved by each user. The metric for the $i$th user is calculated as:

$$m_i^{BET} = \frac{1}{R_i(t)}$$

where $R_i(t)$ is equal to $\beta R_i(t-1) + (1-\beta)\, r_i(t)$, $\beta$ is the weight factor for the moving average ($0 \le \beta \le 1$), $R_i(t-1)$ is the past average throughput of the user at time $t-1$, and $r_i(t)$ is the achievable data rate for the $i$th user at time $t$.

Figure 1. EXP-BET design flow (flowchart: design the EXP-BET algorithm using Simulink blocks; design the EXP-BET algorithm using the Xilinx Blockset; test the system under Simulink; test the system under System Generator; hardware co-simulation; timing analysis; each test stage is followed by a "verified?" decision before the flow continues)

The EXP-BET algorithm is modelled using the Xilinx Blockset. The Xilinx Blockset library contains all the basic blocks, such as adders, multipliers, registers and memories, for the specific design. Models are created for all the mathematical operations in the EXP-BET metric computation using the library provided by the Xilinx Blockset. To implement the EXP-BET algorithm on the FPGA, the MATLAB Simulink and Xilinx System Generator tools need to be configured.
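As a software reference for the two metric equations above, the following Python sketch may help when checking values produced by the hardware model. It is not the authors' implementation. It assumes the EXP Rule metric has the form exp(a_i * D_HoL,i / (1 + sqrt(AverageD_HoL))) multiplied by the spectral efficiency, and that the BET metric is the reciprocal of the moving-average throughput. All function and argument names are hypothetical.

```python
import math


def exp_rule_metric(d_hol_i, d_hol_all, tau_i, spectral_eff):
    """EXP Rule metric of one real-time flow on one resource block.

    d_hol_i      -- head-of-line delay of this flow
    d_hol_all    -- head-of-line delays of all N_RT active real-time flows
    tau_i        -- tolerable delay of this flow (same unit as the delays)
    spectral_eff -- spectral efficiency of this user on the resource block
    """
    a_i = 5.0 / (0.99 * tau_i)                   # tuneable parameter a_i
    avg_d_hol = sum(d_hol_all) / len(d_hol_all)  # AverageD_HoL over N_RT flows
    exponent = a_i * d_hol_i / (1.0 + math.sqrt(avg_d_hol))
    return math.exp(exponent) * spectral_eff


def bet_average_throughput(r_prev, r_now, beta):
    """Moving average R_i(t) = beta * R_i(t-1) + (1 - beta) * r_i(t)."""
    return beta * r_prev + (1.0 - beta) * r_now


def bet_metric(r_avg):
    """BET metric: reciprocal of the average throughput R_i(t)."""
    return 1.0 / r_avg


def highest_priority(metrics):
    """Return the user id with the largest metric (served first)."""
    return max(metrics, key=metrics.get)
```

With such a reference, a fixed set of inputs can be evaluated in double precision and compared against the fixed-point output of the System Generator model. A flow with a longer head-of-line delay receives a larger EXP Rule metric, and a user with a lower average throughput receives a larger BET metric.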
In the Simulink environment, the FPGA boundary is defined by the Gateway In and Gateway Out blocks: the input to the FPGA is fed into the Gateway In port and the output is produced at the Gateway Out port. These ports interface the Simulink double data type with the FPGA fixed-point environment. In the Gateway In block, the Simulink floating-point input is converted to a fixed-point format. The format, together with the saturation and rounding modes, is defined by the designer. The Gateway Out port converts the FPGA fixed-point output back to the Simulink double-precision floating-point format. The system is then simulated, tested and verified by examining the results shown on a display block from the Simulink library.

To validate the designed model in Simulink, timing analysis is used. Timing analysis is represented by the delay parameter and is used to verify the design in the Simulink environment. This verifies the functionality of the system model generated using XSG and Simulink.

The next step is to set up System Generator for hardware co-simulation. Hardware co-simulation is one of the techniques provided by System Generator to transform the model built in the Simulink environment into hardware. XSG can be used with different types of FPGA boards and provides options for clock speed, compilation type and analysis. The FPGA used for the implementation of the EXP-BET algorithm is the Virtex-6 xc6vlx240t-1ff1156. Lastly, the FPGA is configured using the bitstream programming file (BIT) that is automatically generated by System Generator during hardware co-simulation. After the generated bit file is downloaded onto the FPGA, the input to the device is fed from a Simulink source block and the device output is received back in a Simulink sink block.
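The double-to-fixed-point conversion performed at the Gateway In boundary can be approximated in software to see how much precision a given format loses. The sketch below is a simplified model and not the System Generator implementation. The word length, the number of fractional bits, the round-half-up rule and the function name are assumptions chosen for illustration.

```python
import math


def to_fixed_point(value, word_length=16, frac_bits=8, signed=True):
    """Quantize a float to a fixed-point grid with rounding and saturation.

    Returns the float value that the fixed-point word represents.
    """
    scale = 1 << frac_bits                 # one unit equals 2**-frac_bits
    raw = math.floor(value * scale + 0.5)  # round to nearest, ties upward

    # Representable integer range of the word.
    if signed:
        raw_min = -(1 << (word_length - 1))
        raw_max = (1 << (word_length - 1)) - 1
    else:
        raw_min = 0
        raw_max = (1 << word_length) - 1

    raw = max(raw_min, min(raw_max, raw))  # saturate instead of wrapping
    return raw / scale
```

Comparing a double-precision result with `to_fixed_point(result)` gives a rough idea of the difference to expect between the Simulink output and the hardware output for a chosen format. Values outside the representable range are clamped to the nearest limit in this model.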
Hardware co-simulation enables wide-ranging testing, as the data from the FPGA can be transferred directly to the MATLAB environment. After the hardware co-simulation is completed, the results can be seen on display sink blocks from the Simulink library. If the output is similar to the output of the Simulink environment, the algorithm is confirmed to be successfully prototyped. The Xilinx blockset used in the design is presented in Figure 2.

Figure 2. Xilinx blockset used in the Simulink design

RESULTS AND ANALYSIS
This section discusses the results of simulating the EXP-BET metric equation in System Generator. The results obtained are then verified using hardware co-simulation.

Simulating the EXP-BET algorithm using system generator
Firstly, the design of EXP-BET is verified through rate and type propagation using the System Generator block. If a signal carrying floating-point data is connected to the port of a System Generator block that does not support the floating-point data type, an error is reported. The rate and type propagation for the BET and EXP algorithms are illustrated in Figure 3 and Figure 4.

Figure 3. Rate and type propagation for BET algorithm

Figure 4. Rate and type propagation for EXP rule algorithm

Timing analysis
Timing is very important when the designer is working with a hardware description language. Hardware description involves the simultaneous execution of processes, which means that they run in parallel. System Generator provides a timing analysis tool, the timing analyzer, to assist the timing analysis of the hardware design. The timing analyzer provides a report on slow paths and clearly displays the paths that failed on hardware. The System Generator block gives three options for the clock frequency, 100 MHz, 50 MHz or 33.3 MHz, for the Xilinx ML605 board.
To start off, a clock frequency of 50 MHz is selected, which means that the system should operate within an FPGA clock period of 20 ns. The clock period is calculated as $T = 1/f$, where $f$ is the frequency. It is observed that the EXP system fails to generate the hardware co-simulation: the total path delay is 112.64 ns, which is much higher than the 20 ns clock period, as shown in Figure 5. The timing analyzer in Figure 6 details the failed paths of the EXP system and automatically highlights the corresponding blocks of the EXP system, as shown in Figure 7, when the cursor points to one of the entries in Figure 6. A failing path indicates a timing violation: the output of one synchronous stage does not reach the input of the next stage within the time required by the system design. As observed in Figure 7, the timing failed for the paths through the divide, square root and CORDIC 4 blocks. Hence, the failing paths need to be optimized.

Figure 5. Histogram for EXP system failing path (50 MHz)

Figure 6. EXP rule system timing analyzer

Figure 7. EXP rule algorithm failing path (50 MHz)

The slow path of each block is optimized using pipelining, since the hardware operates in parallel. With pipelining, the calculation is split up into multiple cycles. For example, the addition operation needs to wait for the division operation, which takes many iterations to produce its output. Latency is therefore added to the addition operation so that it waits for the division operation. This is done by adding register or delay stages during synthesis and regenerating the hardware co-simulation until the timing requirement is met.
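The timing check described above is simple arithmetic, sketched below in Python using the figures quoted in the text (50 MHz and a 112.64 ns path delay). The stage estimate is an idealized lower bound that assumes the path can be cut into perfectly balanced stages with no register overhead, which real designs rarely achieve. The function names are hypothetical.

```python
import math


def clock_period_ns(freq_mhz):
    """Clock period T = 1 / f, with f in MHz and T in ns."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz, path_delay_ns):
    """Timing slack: positive means the path meets the clock period."""
    return clock_period_ns(freq_mhz) - path_delay_ns


def min_pipeline_stages(freq_mhz, path_delay_ns):
    """Idealized lower bound on stages needed to fit the path in one period."""
    return math.ceil(path_delay_ns / clock_period_ns(freq_mhz))


# Figures from the text: 50 MHz clock, 112.64 ns unpipelined EXP path.
period = clock_period_ns(50.0)             # 20.0 ns
slack = slack_ns(50.0, 112.64)             # negative, so timing fails
stages = min_pipeline_stages(50.0, 112.64)
```

A negative slack corresponds to a failing path in the timing analyzer. Lowering the clock frequency lengthens the period and therefore reduces the number of stages the same path needs.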
In this research, latency is added to individual blocks throughout the design, as tabulated in Table 1. Latency is the number of clock cycles a value takes to propagate from input to output. For example, if a new input requires 10 cycles to propagate from input to output, the latency is 10. To address the problems shown in Figure 5 to Figure 7, the clock frequency is also set to the minimum available value, 33.333 MHz. At a slower clock frequency, the timing constraint is much easier to meet. Table 2 shows the frequency and FPGA clock period for the EXP-BET system before and after the optimization process.

Table 1. Latency before and after optimization (rows: Divide, Square Root, Multiply; columns: latency before, latency after)

Table 2. Frequency and FPGA clock period before and after optimization (rows: clock period, frequency (MHz); columns: before, after)

The optimized EXP-BET system is simulated once again and meets all the timing constraints. The EXP-BET system is successfully verified in hardware co-simulation when the bitstream is generated after the compilation stage. The hardware co-simulation is considered to have failed when a timing constraint is violated. Figure 8 and Figure 9 illustrate the histograms of the EXP-BET path delay after the system is optimized. The histogram charts show the delay distribution of 150 paths and are generated by the Xilinx timing analyzer targeting the Virtex-6 FPGA board. Each histogram is a useful way to analyze the FPGA implementation of EXP-BET: it groups the 150 paths into clusters that are roughly normally distributed, arising from different portions of the System Generator architecture or from different clock region timing constraints. The numbers at the top of the bins indicate the number of slow paths.
The improved parameterized FPGA implementation can be adjusted so that all signals are completely routed and all timing constraints are met.

Figure 8. Histogram for BET path delay (33.333 MHz)

Figure 9. Histogram for EXP rule path delay (33.333 MHz)

The histograms in Figure 8 and Figure 9 show that the BET and EXP Rule path delays fall within the 30 ns clock period (33.333 MHz) and meet the timing constraints. The majority of the slow paths for BET occur at 25.06 ns, whereas for EXP the slowest path is observed at 29.65 ns. Therefore, it can be concluded that the EXP-BET system is able to run on the FPGA board within a clock period of 30 ns.

Power analysis
FPGA power consumption is one of the biggest concerns of FPGA users, and the Xilinx power tools help to perform power estimation and analysis for a given design. Power estimation and analysis become even more important as FPGAs increase in logic capacity and performance by migrating to smaller process geometries. The Xilinx Power Analyzer (XPA) is used to analyze the power consumption of the design, which depends on the device family and on the clock, logic, signal, I/O and leakage power. Table 3 shows the estimated power consumption for the EXP-BET system. The designed architecture uses a total power of 3.472 W and 3.437 W for the EXP and BET systems, respectively. This indicates low power consumption on the Virtex-6 FPGA, in line with reports that current FPGA technology such as Virtex-6 gives low power consumption while operating at maximum performance.

Table 3. EXP-BET power analysis (rows: Clocks, Logic, Signals, DSPs, IOs, Leakage, Total Power (W); columns: Used, Available and Utilization (%), each for EXP and BET)

Design summary for device utilization
The EXP-BET system was implemented in an XC6VLX240 FPGA. The flexibility of the Virtex-6 FPGA is realized in the slice resources. Each slice is composed of four 6-input look-up tables (LUTs) and associated flip-flops. The slices are laid out in an array-like structure and each can be configured to form larger, more complex functions. FPGA logic design is controlled at the bit level, giving the user the power to decide what resources to use, the placement of the design in hardware and the maximum sustainable clock frequency. Table 4 shows the device utilization summary for the EXP-BET system. The maximum operating frequency and power utilization, along with the resource utilization before and after the optimization stage in the critical path, are also reported.

Table 4. Design utilization summary (columns: FFs, LUTs, Slices, LUT-FF pairs, Number of DSP48E1s, Maximum Operating Frequency (MHz), Clock Period; rows: EXP Rule and BET, before and after optimization)
EXP Rule: FFs 2122, LUTs 6288, Slices 1,967, LUT-FF pairs 1631, DSP48E1s 6 (other surviving entries: 5,931 and 1,765)
BET: FFs 337, LUTs 1030, Slices 309, LUT-FF pairs 255, DSP48E1s 3

The FPGA fabric is the fundamental structure of the logic device and consists of flip-flops (FFs), look-up tables (LUTs) and slices. The hard IP cores used are DSP48E1 slices. Each Virtex-6 FPGA slice contains four LUTs and eight FFs. Only some slices can use their LUTs as distributed RAM. Each slice has one set of clock, clock enable and set/reset signals that is shared within the slice. According to the simulation reports (refer Appendix), the BET system requires just 3% of the logic resources in the FPGA (LUTs, FFs and slices), whereas the EXP Rule system requires 10% of the logic resources in the FPGA (LUTs, FFs and slices). A LUT-FF pair for this architecture represents one LUT paired with one flip-flop within a slice.
The clock rate of the Virtex-6 FPGA family is 600 MHz, which is large enough to drive the whole system. According to the simulation results, the BET system took 0.209 ns to finalize the generation of the output, and the EXP system took 0.246 ns to completely calculate the output. Since the latency is small, the EXP-BET system can generate output continuously because of the pipelined design of the system. Moreover, the pipelined design makes the delay of the clock net very small, about 0.2 ns, which improves the system performance. Using the Xilinx Power Analyzer as a power estimation tool, the total power is estimated depending on the device utilization, clock rate and device data model.

Hardware co-simulation
The final verification was completed by implementing the hardware co-simulation of the system, which allows a system simulation to be run completely on the FPGA while showing the results in Simulink. By selecting the point-to-point Ethernet interface, a new hardware co-simulation block is automatically generated. This is the process of generating the equivalent hardware for EXP-BET. The Virtex-6 (xc6vlx240t-1ff1156) is used and, with the help of XSG and Xilinx XFLOW, the programmable bit file of the equivalent hardware is generated, as shown in Figure 10 and Figure 11. Table 5 shows the metric values of the EXP-BET algorithm generated by the co-simulation method using fixed input values.

Table 5. Output produced by the co-simulation method (columns: algorithm, parameters, input, output; EXP parameters: $\tau_i$, $D_{HoL}$, $AverageD_{HoL}$, $N_{RT}$, $\Gamma_k^i$; BET parameters: $r_i(t)$, $R_i(t-1)$)

Figure 10. BET hardware co-simulation model

Figure 11. EXP rule hardware co-simulation model

The port names on the hardware co-simulation block, Gateway In1 to Gateway In5, are matched to the port names on the original subsystem.
The port types and rates also match the original subsystem. When a value is written to one of the block's Gateway In ports, the block sends the corresponding data to the appropriate location in hardware. The controller output (Gateway Out) from the hardware is read back into the Simulink model over the Ethernet interface, and the output port converts the fixed-point data type into the Simulink format and feeds it into the model. The EXP-BET system has been run in hardware co-simulation and successfully implemented on the FPGA. The output values for the EXP and BET systems are 10.16 and 0.1053 respectively, representing the metric values of the LTE scheduling algorithm. The EXP-BET system is verified, since the metric values calculated in the Simulink environment are similar to those produced by the hardware co-simulation. The chosen device for prototyping is the Virtex-6 FPGA, and the hardware description language is Verilog. A project is then generated for the Xilinx ISE Design Suite, which includes the files for the structural description of the system.

CONCLUSION
The implementation of the EXP-BET scheduling algorithm on an FPGA was presented in this paper. EXP-BET is an algorithm which consists of the Exponential Rule (EXP Rule) and Blind Equal Throughput (BET). The work was designed and simulated using Xilinx System Generator, the Xilinx ISE Design Suite and MATLAB Simulink. This resulted in a mathematical model of the EXP-BET metric equation built from System Generator blocks. The timing requirement for the path delay is 30 ns, which means that the system is expected to run with a clock period of 30 ns (33.333 MHz). Otherwise, the system will not meet the constraint and cannot run on the FPGA. The final verification of the design is conducted using the hardware co-simulation approach. Hardware co-simulation is the process of generating the equivalent hardware, in the form of a bitstream, for the EXP-BET algorithm.
Then, System Generator generates the bit file, which is downloaded to the Virtex-6 FPGA. This study provides the design and implementation process of an FPGA-based system using System Generator for a scheduling algorithm, namely the EXP-BET algorithm. It can be used as a basis for future work towards applications in LTE/LTE-A. In addition, a practical system could be implemented once the whole transmitting and receiving chain of the physical layer is established. The limitation of this research is that no input signal can be injected into the EXP-BET system on the FPGA, since the scheduling algorithm is located at the LTE MAC layer and its input is transmitted from the physical layer. Hence, the implementation must start from the physical layer to generate the input for the system. Further study should therefore concentrate on the hardware implementation of the whole system, starting from the physical layer protocol. The results of the implemented EXP-BET algorithm can then be analysed and validated in terms of QoS requirements such as throughput, delay and packet loss rate.

ACKNOWLEDGEMENTS
We are grateful to Universiti Teknologi MARA (UiTM) for the research grant of Bistari 600IRMI/DANA 5/3/BESTARI . /2. as the financial support during the course of this research.

REFERENCES