====== Rebot : A Reinforcement Learning Robot ======

Rebot is a simple line following robot used to explore reinforcement learning. The discussion of Rebot is broken into three sections: the robot itself, using a PID loop to collect state information, and the implementation of a reinforcement learning algorithm for the robot.

===== Rebot Construction =====

Rebot is a pretty standard line following robot. You can skip this section if you already have a line following robot to use in your reinforcement learning project.

My approach to building Rebot was to build the electrical part first and to then build the mechanical part of the robot around the electronics. My hope was to reduce the amount of redesign and rebuilds needed to get a working robot.

==== Electrical and API Complete Rebot ====

{{ :rebot-elec.jpg?direct&400|}}

This photo shows all of the electrical and electronics components in Rebot. Clockwise from top left are a five volt regulator, a Raspberry Pi Zero-W, an FPGA card, two eight input Pololu line sensors, an Adafruit octal Neopixel display, two Pololu gear motors, a dual H-bridge, a 7.4 volt battery, a power distribution card, and an octal 12 bit ADC card.

The electronics package in this photo is "API complete" in that it is ready for a high level Linux application to read the sensors and control the motors. All of the real-time sensing and control is handled by the FPGA which has a USB-serial interface to the Raspberry Pi. The following table lists the FPGA peripherals selected from the FPGA build page ([[https://demandperipherals.com/support/build_fpga.html]]).
^ Peripheral ^ Description Page ^
| Octal QTR line sensor | [[https://demandperipherals.com/peripherals/qtr.html| qtr.html]] |
| WS2812 (Neopixel) display | [[https://demandperipherals.com/peripherals/ws28.html| ws28.html]] |
| Dual DC motor controller | [[https://demandperipherals.com/peripherals/dc2.html| dc2.html]] |
| Dual quadrature decoder | [[https://demandperipherals.com/peripherals/quad2.html| quad2.html]] |
| Octal ADC | [[https://demandperipherals.com/peripherals/adc812.html| adc812.html]] |

A Linux daemon provides the interface between the low-level packets sent over the USB-serial link and the high-level application. The robot control application opens a TCP connection to the daemon and sends lines of text to control, for example, the DC motors or the Neopixel display. Sensor data is read as lines of ASCII text. The high-level application usually has one TCP connection for sending commands and one TCP connection per sensor. Rebot has four sensors, so the high-level program has to manage five TCP connections using select(). A sixth socket is opened to allow setting of the PID parameters from a host computer.

Although the protocol is TCP, the daemon software comes with Linux command line equivalents for each command. For example, to set the 8-bit RGB values for the first two Neopixels, the application would send the string "dpset ws28 led 1 e0e000400040" down the command TCP link. This sets the first LED string on the ws28 peripheral. The same verb-noun-adjective format can be typed on the command line as ''dpset ws28 led 1 e0e000400040''.

The nice thing about building a robot with an FPGA is that there was no microcontroller code to slow down development, and I could test the entire robot from the Linux command line. The only difference between the TCP version and the command line version is that a backslash character is sent when the command is issued over TCP, whereas the command line equivalent simply terminates.
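As a rough illustration of the command link described above, the following Python sketch formats the Neopixel command from the example and writes it to an already connected socket. The host, the port number, and the newline terminator are assumptions that this page does not state, so check them against the daemon documentation. The helper names ''pixels_to_hex'', ''build_command'', and ''send_command'' are hypothetical.

```python
import socket

DAEMON_HOST = "127.0.0.1"  # assumption: the daemon runs on the same Pi
DAEMON_PORT = 8870         # assumption: verify against your daemon setup


def pixels_to_hex(pixels):
    """Join 3-byte pixel tuples into the hex string passed to 'ws28 led'.

    The byte order inside each tuple is whatever the peripheral expects;
    this helper does not reorder anything.
    """
    parts = []
    for pixel in pixels:
        if len(pixel) != 3 or any(not 0 <= b <= 255 for b in pixel):
            raise ValueError("each pixel must be three 8-bit values")
        parts.append("".join(f"{b:02x}" for b in pixel))
    return "".join(parts)


def build_command(verb, peripheral, resource, *args):
    """Build a verb-noun-adjective command line such as 'dpset ws28 led 1 ...'."""
    return " ".join([verb, peripheral, resource, *map(str, args)])


def send_command(sock, command):
    """Send one command as a line of ASCII text.

    Assumption: commands are newline terminated ("lines of text").
    """
    sock.sendall((command + "\n").encode("ascii"))


# Typical use (not executed here because it needs a running daemon):
#   with socket.create_connection((DAEMON_HOST, DAEMON_PORT)) as cmd_sock:
#       send_command(cmd_sock, build_command(
#           "dpset", "ws28", "led", 1,
#           pixels_to_hex([(0xe0, 0xe0, 0x00), (0x40, 0x00, 0x40)])))
```

Keeping the string building separate from the socket write makes it easy to check a command on the Linux command line before sending the same text over TCP.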
I used the following commands to test the electronics package of Rebot. These are essentially the same commands that appear in the high level application.

  # Set the sensitivity of the QTR sensors
  dpset qtr8 sensitivity 20
  # Set the sample period to 10 milliseconds
  dpset qtr8 update_period 10
  # Start the stream of QTR sensor readings
  dpcat qtr8 qtrval
  ^C
  # Test all 8 LEDs on the Neopixel display
  dpset ws28 led 4 00ff00ff00000000ff00ff00ff00000000ff00ff00ff0000
  # Config the ADC for single-ended inputs and 10 ms sample rate
  dpset adc812 config 100 00
  # Start the stream of ADC readings
  dpcat adc812 samples
  ^C
  # Set the sample period for the quadrature decoder to 10 ms
  dpset quad2 update_period 10
  # Start the stream of quadrature readings
  dpcat quad2 counts
  ^C
  # Set the watchdog timer on the motor controller to 200 ms
  dpset dc2 watchdog 200
  # Set both motors to the forward direction
  dpset dc2 mode0 forward
  dpset dc2 mode1 forward
  # Give both motors 20 percent PWM
  dpset dc2 power0 20.0
  dpset dc2 power1 20.0

Building the API complete electronics package took about 2 days.

==== Rebot Mechanical Components ====

{{ :rebot-mech.jpg?direct&400|}}

The mechanically and electrically complete Rebot is shown in this photo. The bot is a cube about 10 by 10 by 7 cm. One layer in the stack is the FPGA card. The other two layers are 10x10 3D printed plates designed in OpenSCAD and printed on a Prusa MK3i. OpenSCAD files for the various components are linked below. Be aware that almost all of the 3D printed components needed filing or drilling or some other type of post processing. If you use these, use them only as a starting point.

^ Component ^ OpenSCAD File ^
| Bottom plate | {{ :bottomplate.scad |}} |
| Top plate | {{ :mm10bb4plate.scad |}} |
| Motor mount | {{ :pololumicrometal.scad |}} |
| Caster | {{ :caster_075v1.scad |}} |
| Caster post | {{ :casterpost.scad |}} |

===== Rebot PID Programming =====

The first control system for Rebot is a classic PID control loop.
The purpose of this control loop is a little different from most PID loops. In reinforcement learning we want to put the robot into as many "states" as possible, so our goal with this PID loop is to get the robot to go around the track with as many combinations of speed and P, I, and D parameters as possible. We start this discussion with a description of what we mean by the "state" of Rebot.

==== The Rebot Data Model ====

{{ :rebot-bottom.jpg?direct&400|}}

Intuitively you can think of the state of a robot as its velocity in the forward direction, its angular velocity, how much power is being applied to the motors, and the location of the robot relative to the line it is following. Rebot uses a control loop frequency and sensor sample rate of 100 Hertz. Using the quadrature tick rate for speed, the forward velocity is the average of the left and right tick rates and needs about 10 bits of resolution. The angular velocity is the difference between the left and right tick rates and needs about 11 bits. The motors each have 10-bit PWM controllers, and the line sensors are 8 bits each. This gives us a total of about 57 bits of state information (10 + 11 + 10 + 10 + 8 + 8).

In simple terms, reinforcement learning means that for every state the robot can be in, we know what state we want next and how to set the motor PWM to get to that state. There are a couple of obvious problems with this simplified version. First, setting new PWM values for the motors does not guarantee that we will end up in the desired state. Maybe we will, maybe we won't. Adding "probability" to the state transition table makes it what is called a Markov Decision Process. The other problem with our simplified model is that we don't have enough memory for a transition table with 2<sup>57</sup> entries!

There are several ways to reduce the size of our state space for Rebot. First, we can use just the front line sensor. (We will use the back sensor, but only as a measure of how well we are following the line.)
We don't need all 8 bits for the line sensor. A single 4-bit number can give the average of the right and left edges of the tape. The sensors are numbered from 0 to 7, but in our PID program we number them from 1 to 15 in steps of two (sensor n becomes 2n+1). In this diagram, the top line sensor has a left-most set sensor of 5 and a right-most set sensor of 3. The PID program would record a tape location of 4 for this. The PID loop needs an error signal, which is how far the line is from the center. To center the error between the fourth and fifth sensors (position=8), we define the error as 8 minus the 4-bit tape position.

{{ :rebot-layout.jpg?direct&400|}}

We can reduce the resolution of the motor PWM to 5 bits. The motors run at 6 volts, but we limit the applied voltage to 5.17 volts. This means each step in the PWM control is about one-sixth of a volt. There is no inner PID loop on the motor speed, so we really need to apply our 5 bits of voltage control to the motors accurately. The purpose of the ADC is to measure the battery voltage so we can scale the PWM duty cycle to always give steps of one-sixth of a volt.

The forward speed and angular speed are both quantized to 6 bits of resolution.

With all of the above changes our state space is now 25 bits, or about 33.5 million entries. Each entry will specify the current state and how to set the PWM values to, hopefully, get to the next desired state.

==== Rebot PID Program Structure ====

The Rebot PID program is event driven, with the events being new data arriving from the sensors. Recall that the FPGA peripherals are all configured to automatically send samples every 10 milliseconds. The response, setting the motor PWM values, is performed after all four sensors have reported their values. A fifth TCP connection to the FPGA is used to set the PWM values and to set the Neopixel LEDs.

One other TCP connection is part of the system. This connection is accepted from a controlling host and is used to set the P, I, and D parameters without stopping the robot.
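Two pieces of arithmetic from the data model above, the 4-bit tape position with its error signal and the battery-scaled PWM step, can be sketched in Python as follows. This is only an illustration under stated assumptions: bit i of the sensor byte is assumed to correspond to sensor i, the PWM setting is assumed to be a percentage as in the ''dpset dc2 power0 20.0'' test command, and the function names are hypothetical rather than taken from the Rebot source.

```python
CENTER = 8                  # midpoint between the fourth and fifth sensors
VOLTS_PER_STEP = 1.0 / 6.0  # one 5-bit PWM step is about one-sixth of a volt
MAX_STEP = 31               # largest 5-bit step, about 5.17 volts


def tape_position(sensor_byte):
    """Return the 4-bit tape position (1..15), or None if no sensor sees tape.

    Sensor n (0..7) is renumbered 2n+1, so the average of the two edge
    sensors is always a whole number.
    Assumption: bit i of sensor_byte is set when sensor i sees the tape.
    """
    hits = [2 * i + 1 for i in range(8) if sensor_byte & (1 << i)]
    if not hits:
        return None
    return (min(hits) + max(hits)) // 2


def tape_error(sensor_byte):
    """Error signal for the PID loop: 8 minus the tape position."""
    position = tape_position(sensor_byte)
    return None if position is None else CENTER - position


def pwm_percent(step, battery_volts):
    """Duty cycle (percent) that applies step/6 volts from the measured battery."""
    if not 0 <= step <= MAX_STEP:
        raise ValueError("step must fit in 5 bits")
    if battery_volts <= 0:
        raise ValueError("battery voltage must be positive")
    target_volts = step * VOLTS_PER_STEP
    return min(100.0, 100.0 * target_volts / battery_volts)
```

With sensors 1 and 2 set (renumbered 3 and 5), ''tape_position(0b00000110)'' gives 4 and ''tape_error'' gives 4, matching the example in the text.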
At each time tick the current state of the robot, along with a timestamp, is saved to a file. After enough state reports are collected, the reports are used on the host to figure out the best transitions for a given state to complete a loop faster and more accurately. This process is where the "learning" occurs. The speed of the robot is measured as the time it takes to complete one loop around the track. The accuracy of the robot is the percentage of the time that the tape is directly beneath the middle two sensors on the rear line sensor. Speed is measured in seconds and accuracy in percent.

The rc.local script on the RPi0W starts the FPGA daemon, sets the LEDs, and then starts the PID daemon.

  /usr/local/bin/dpdaemon -f /home/pi/rebot/DPCore.bin
  sleep 1
  dpset ws28 led 1 000808000808000808000808000808000808000808000808
  /usr/local/bin/piddaemon

The flow of the PID daemon is as follows:

  // Init
  // Become a realtime daemon
  fork(), close stdio, set working dir to /
  become process and session leader
  invoke realtime scheduling
  
  // Open sockets to communicate with the FPGA and controlling host
  open/bind socket to listen for host set PID parameters
  open file to save state reports
  open socket to FPGA for motor and LED control
  open socket to FPGA for ADC readings
  open socket to FPGA for quadrature readings
  open socket to FPGA for front line sensor readings
  open socket to FPGA for back line sensor readings
  set AllEventsIn flag to zero
  
  // set visible status
  set LEDs to green
  
  // main event loop
  loop forever
      if select() or poll() on open sockets
          if conn request on PID socket
              accept conn and add accepted socket to select() loop
          if data on accepted PID socket
              read and store new PID parameters
          if data on ADC socket
              read and record data
              set ADC bit in AllEventsIn
          if data on quadrature socket
              read and record data
              set quadrature bit in AllEventsIn
          if data on front line sensor socket
              read and record data
              set front sensor bit in AllEventsIn
          if data on back line sensor socket
              read and record data
              set back sensor bit in AllEventsIn
          if all bits set in AllEventsIn
              compute new motor PWM values based on error and PID values
              update FPGA with new PWM values
              clear AllEventsIn
              write current state report to save file
              // did we cross the start/stop marker?
              if front line sensor == FF
                  set LEDs to white
              if back line sensor == FF
                  set LEDs to green
      End of select If
  End of forever loop
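The "all bits set in AllEventsIn" gate and the "compute new motor PWM values" step in the flow above can be sketched in Python as follows. This is a minimal illustration, not the daemon's code: the bit assignments, the textbook PID form, the names ''Pid'' and ''motor_steps'', and the sign convention (a positive correction speeds up the left motor) are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical bit assignments for the AllEventsIn flag
ADC_BIT, QUAD_BIT, FRONT_BIT, BACK_BIT = 1, 2, 4, 8
ALL_EVENTS = ADC_BIT | QUAD_BIT | FRONT_BIT | BACK_BIT


def all_events_in(flags):
    """True once all four sensors have reported for this 10 ms tick."""
    return (flags & ALL_EVENTS) == ALL_EVENTS


@dataclass
class Pid:
    kp: float
    ki: float
    kd: float
    dt: float = 0.01          # 100 Hz control loop
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error):
        """Return the correction for one tick using a textbook PID form."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def motor_steps(base_step, correction, max_step=31):
    """Split a correction across the two motors as clamped 5-bit PWM steps.

    Assumption: a positive correction speeds up the left motor and slows
    the right one; swap the signs if the robot steers the wrong way.
    """
    def clamp(value):
        return max(0, min(max_step, int(round(value))))

    return clamp(base_step + correction), clamp(base_step - correction)
```

In the event loop, each socket handler would OR its bit into the flag, and only when ''all_events_in()'' is true would the loop call ''Pid.step()'' with the tape error, send the resulting steps to the motor controller, and clear the flag.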