Use your ESP32 as a remote web cam viewer
Intro
The ESP32 (there are now quite a few varieties) is an incredibly versatile microcontroller. Its main value proposition has always been that it's a low-cost MCU with WiFi capability - something that was relatively rare when it was first released. The processing speed and internal RAM size prevent it from doing tasks that Linux machines like the Raspberry Pi series can do, but there is some overlap where the ESP32 can still surprise you with its abilities. I've been working on projects which use a variety of ESP32s as video and animation playback devices. I recently released an optimized MPEG-1 player (https://github.com/bitbank2/pl_mpeg) and was looking at other applications for video playback. I've seen multiple projects which use my JPEGDEC jpeg decoder library to play motion-jpeg videos, and I was curious to see if anyone had published a project to play motion-jpeg streams from public IP camera URLs. I didn't find any, so I thought it would be a fun learning experience to write my own.
The first step in my journey was to find some unsecured IP video streams for testing. I found this GitHub repo with a list of URLs:
https://github.com/fury999io/public-ip-cams?tab=readme-ov-file
Some are MPEG video, but a few sites stream motion-jpeg - or, to be more precise, a continuously updated stream of JPEG images. After finding a few URLs to test, I wanted to see what kind of data was coming out of them. There are several types of live video stream protocols; I needed to find a simple one which just sends JPEG frames. I tried opening a few of the links in my browser and saw that there were sometimes multiple options from the same camera source. Apparently there are some popular devices in use to stream video over the internet. They can present a full web page with on-screen controls or they can just stream JPEG images. Here's an example of one that streams JPEG images:
The camera seems to be mounted in a surf shop in Australia. That link generates a 640x480 JPEG image which updates continuously. In my PC's web browser on a fast internet connection, it updates about 5 times per second. To understand what the camera is sending to the browser, I tried to capture the stream using wget. Since the stream starts with an HTTP header that doesn't have a content (payload) size, wget will capture data forever (until you press CTRL-C). Here's what the first few bytes of data look like in a hex editor:
It's easy to see the HTTP content header followed by the JPEG image data (which starts with 0xFF 0xD8 at offset 65). At the end of the JPEG image there's another content header for the next frame, and it goes on forever like that. So if the ESP32 could capture this stream, I should be able to parse the headers and extract and display the JPEG frames. My first attempt at capturing the data worked on the first try! I opened the HTTP connection, saw that the server responded with 200 (success) and that the content length returned by the HTTPClient library was -1 (undefined) - expected for this type of continuously streamed data. I parsed the header data, read the JPEG payload, then closed the connection and displayed the image. The images coming from that camera are 640x480 and about 65K bytes each. I used my JPEGDEC library to decode them at half size to fit the 320x240 LCD. With that all set, I was able to watch the camera stream images at about 1 frame every 2 seconds. The older ESP32 was able to do all of this without needing any PSRAM.

My first thought after seeing it run was "can it run faster?" - being a perpetual optimizer, I'm always looking for more efficient ways of doing things. It's quite wasteful to have the camera send a 640x480 image only to have it displayed at 320x240. I tried guessing at possible parameters to add to the URL to ask the camera to downsize the frame, but they were always wrong. Then a friend pointed out that the axis-cgi output (from Axis Communications) is documented and has several recognized parameters to control the stream (see page 20 of this document):
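To illustrate the per-frame header parsing described above, here's a minimal sketch. The helper name is my own and the exact header fields are an assumption (not the project's actual code), based on each frame's mini-header carrying a Content-Length field like the one visible in the hex dump:

```cpp
#include <cstring>
#include <cstdlib>

// Extract the Content-Length value from a per-frame header block so we
// know how many JPEG payload bytes to read next. Returns -1 if the
// field is missing. (A simplified sketch; the real parser may differ.)
long parseContentLength(const char *hdr) {
    const char *p = strstr(hdr, "Content-Length:");
    if (!p) p = strstr(hdr, "Content-length:");
    if (!p) return -1;
    p += strlen("Content-Length:");
    while (*p == ' ') p++;              // skip spaces after the colon
    return strtol(p, nullptr, 10);      // parse the decimal byte count
}
```

With the payload size in hand, the reader can pull exactly that many bytes and hand them to the decoder.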
https://www.domoticaworld.com/wp-content/uploads//2009/03/Axis-HTTP-API.pdf
Not only does it have a parameter to set the image size, but you can also control the JPEG Q (quality) setting - they denote it as 'compression'. First I asked the camera to send me 320x240 images:
http://71.43.10.26:9080/axis-cgi/mjpg/video.cgi?resolution=320x240
In my browser, the frame now updated at closer to 10 FPS and the JPEG images were around 25-28K each. I tried various compression settings to see how small I could get the data without distorting the image too much. I found that a compression value between 50 and 60 was a good compromise between size and quality. At 60 (what I use in the example code), each 320x240 frame is about 13K of data. I tested this in my ESP32 program and it definitely improved the framerate, but not by a lot. I added diagnostic messages which show the elapsed time (in milliseconds) of each step in the Arduino serial monitor. With this info I could see that establishing the connection with the server was a major portion of the image acquisition time.
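Assembling the request URL with these parameters is simple string work. A quick sketch (the function name is mine for illustration, and the host below is a placeholder - substitute a real camera URL):

```cpp
#include <string>

// Build an Axis-style MJPEG request URL with resolution and compression
// query parameters, per the Axis HTTP API document referenced above.
std::string buildStreamUrl(const std::string &base, int w, int h, int compression) {
    return base + "?resolution=" + std::to_string(w) + "x" + std::to_string(h)
                + "&compression=" + std::to_string(compression);
}

// e.g. buildStreamUrl("http://camera.example/axis-cgi/mjpg/video.cgi", 320, 240, 60)
```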
Restructuring the code
To improve performance, I needed to change the ESP32 code to continuously read and parse the stream instead of opening and closing a connection for each frame. My concern was that if I wrote it as a single thread and made the server wait for me to decode each image before reading more data, it might cause data loss. I don't have deep insight into how this type of data stream is managed by the ESP32, but my concern turned out to be only partly justified - it almost worked on the first try. It would display a bunch of frames and then freeze. The freeze happened because the server keeps sending data without waiting for the receiving end, so my parser got out of sync and was looking for HTTP headers in the middle of JPEG data. I added code to re-sync when data is missed, and now the ESP32 is able to stream 2 frames per second! That doesn't sound terribly exciting, but it's 4x faster than my first attempt and still quite an accomplishment for a "$1 microcontroller". Here's a short video clip of the "Big dogs" stream:
An original ESP32 driving an ILI9341 SPI LCD is certainly capable of decoding JPEGs a lot faster than 2 frames per second. The bottleneck is the reception of data using the HTTPClient class; the same stream plays at about 10 FPS on my PC.
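The re-sync step mentioned above boils down to scanning forward for the JPEG start-of-image marker (0xFF 0xD8) and discarding everything before it. A minimal sketch of the idea, not the project's actual code:

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>

// When the parser loses sync (e.g. after dropped data), find the next
// JPEG start-of-image marker (0xFF 0xD8), shift it to the front of the
// buffer, and return how many valid bytes remain. Returns 0 if no
// marker was found (the whole buffer is junk).
size_t resyncToSOI(uint8_t *buf, size_t len) {
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xFF && buf[i + 1] == 0xD8) {
            memmove(buf, buf + i, len - i);  // discard bytes before the marker
            return len - i;
        }
    }
    return 0;
}
```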
Going Further
The next logical step would be to decode the images incrementally, as they're being received. This is possible with JPEGDEC by providing file system callback functions that block (e.g. in a FreeRTOS task) until enough data has been received to satisfy each read request. That would hide most of the latency of decoding and displaying the JPEG images, but based on the measurements I've made, it won't increase the framerate by a perceptible amount. So the next phase of this project will be to dive into the HTTPClient code and see if anything can be done to reduce the latency there. If you would like to try my current code, I've added it to my collection of "CYD" projects with the name ip_camera_viewer:
https://github.com/bitbank2/CYD_Projects
Please let me know in the comments if you find any ways of speeding up the data streaming on the ESP32.
As always, if you appreciate my work, please consider sponsoring me on Github so that I can continue sharing projects like this.