Recurse SP2'23 #1: Fun with OpenCV
Wow, my batch finally started today! At a glance, this looked like:
- 8-9:45am PST: Initial call plus meet-and-greet
- DM’d lots of folks to set up coffee chats and pairing sessions this week
- 11, 12, and 1: various coffee chats
- One chat turned into a networking study group, which I’m looking forward to!
- I also learned about dn42. It’s a big VPN you can join to play with BGP in a way that won’t let you accidentally tear down the Internet or cost you lots of money.
- The last coffee chat devolved into pair programming on a small OpenCV project I had in mind, which I write about below.
- 3pm: Took a walk, during which I realized what caused a confusing bug (which I'll omit here).
- Came home and met with the same pairing partner to discuss Go network programming.
- Specifically, I was given a crash course through problem 6 of protohackers, a fun sequence of socket programming challenges. I’m now inspired to tackle these myself in Go. :)
- 6pm: Spent forever debugging video codec issues in order to make this blog post work in Firefox.
- Fixed code highlighting on the blog, then spent a while writing and revising this post.
Pairing on NumPy+OpenCV
I took my first stab at OpenCV this weekend by applying Canny edge detection to my webcam feed.
Today, I wanted to try something a friend described: delaying two of the three RGB color channels. He described using it to detect fast-moving objects in video, but I have no idea what the technique is called. I feel like I've also seen it used in video synthesis.
I had a great time pairing with someone from my batch on this! Here’s what our results looked like:
EDIT: Sadly, WebM isn't supported on iPhone browsers. :( You can see an MP4 version here.
Colorful confusion
What’s going on here? We can see:
- A few frames of green: there’s no red or blue channel data, since it’s being delayed.
- A few frames of cyan: the first delayed channel is emerging! (Cyan is green plus blue, so this is actually the blue data; see the note about OpenCV's BGR channel ordering in the code below.)
- A few frames of what looks like a non-delayed video… but that's only because I'm being very still! That lets the "new" green channel match up with the delayed "old" red and blue channels.
- Once I start moving, the illusion breaks, and colorful chaos ensues until I'm once again still for a while.
The cyan/magenta/yellow colors showing up might seem counter-intuitive. After all, my actual image (no delay) is the magenta one, but in theory the non-delayed data is given entirely by the green channel! In my head, I expected to see a green Ben being chased by a red Ben, who was in turn chased by a blue Ben.
This turned out to be the case when I closed my blinds! I’d see RGB traces in a dark room, but as I added more diffuse white light, the traces shifted to CMY.
I have a loose idea in my head of the math that’s responsible for this, but for now I wanna wrap up my day. It’d be cool to write a short post about it when I have more time to be sure I understand what’s going on.
Implementation
In short, the usual way color is represented in digital images and video is with three channels per pixel: red, green, and blue (hence RGB). We have a big matrix (called frame) of pixel data. Because there are three values instead of one at each location in the "matrix," it's really a tensor and not a matrix, but that doesn't really matter here. All we want to do is extract the red channel of each pixel (conceptually a frame of [r,0,0] values; in the code it's just a 2-D array of the r values), append it to the end of a queue (a "delay line") of similar frames, get the "old" red frame by popping from the front of the queue, and replace our current frame's red channel with the old one. (We do the same with the blue channel, whose pixels look like [0,0,b].)
All this happens in just a few lines:
ret, frame = cap.read()
red_queue.append(np.copy(frame[..., R]))
blue_queue.append(np.copy(frame[..., B]))
frame[...,R], frame[...,B] = red_queue.pop(0), blue_queue.pop(0)
# Now we just write the frame
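As an aside: list.pop(0) is O(n), since Python has to shift every remaining element. The idiomatic queue here would be collections.deque, whose popleft() is O(1). Here's a hypothetical drop-in for the red channel, reusing y, x, frame, R, and red_delay_frames from the full code below (the list version is what we actually ran):

from collections import deque
import numpy as np

# Pre-fill with black frames, just like the list version.
red_queue = deque(np.zeros((y, x), dtype=np.uint8)
                  for _ in range(red_delay_frames))

# Inside the capture loop: push the newest red channel, pop the oldest.
red_queue.append(np.copy(frame[..., R]))
frame[..., R] = red_queue.popleft()

With only a few dozen frames of delay, the list version is shifting a handful of array references per frame, so this hardly matters in practice; it's just a nice habit for real-time loops.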
Anyway, here’s the full code:
#!/usr/bin/env python3
import cv2 as cv
import numpy as np
# Mirrors on-screen video when True.
mirror_monitor = True
# Mirrors file output when this is True AND mirror_monitor is also True
mirror_file = False
# Where to write output
#outfile = "redblue_delay.webm" # Sadness
outfile = "out/redblue_delay_2.mp4"
# Select camera 0
cap = cv.VideoCapture(0)
fps = cap.get(cv.CAP_PROP_FPS)
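# NOTE: OpenCV's cap.read() returns frames in BGR order, not RGB, so
# index 0 is actually blue and index 2 is actually red. The names below
# are therefore swapped, but the effect (delaying two of the three
# channels by different amounts) is the same.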
R,G,B = 0,1,2
# Read a frame to get our camera feed's dimensions
_, frame = cap.read()
y,x,_ = frame.shape
# Write at half FPS, since I seem to be dropping frames here
output = cv.VideoWriter(outfile,
                        cv.VideoWriter_fourcc(*'mp4v'), fps / 2, (x, y))
                        #cv.VideoWriter_fourcc(*'VP09'), fps, (x, y)) # :(
delay_ratio = 2
red_delay_frames = 30
blue_delay_frames = delay_ratio * red_delay_frames
# Pre-fill the delay lines with black frames (uint8, matching the camera's frames)
red_queue = [np.zeros((y,x), dtype=np.uint8) for _ in range(red_delay_frames)]
blue_queue = [np.zeros((y,x), dtype=np.uint8) for _ in range(blue_delay_frames)]
while True:
    # Catching Ctrl-C since the waitKey() line doesn't seem to be working on my laptop
    try:
        ret, frame = cap.read()
        # All this for three lines: push to end of each delay line, then pop from the front
        red_queue.append(np.copy(frame[..., R]))
        blue_queue.append(np.copy(frame[..., B]))
        frame[...,R], frame[...,B] = red_queue.pop(0), blue_queue.pop(0)
        # Now mirror the result, show on screen, & write to file
        monitor_out = np.flip(frame, axis=1) if mirror_monitor else frame
        cv.imshow('frame', monitor_out)
        file_out = monitor_out if mirror_file else frame
        output.write(file_out)
        if cv.waitKey(1) & 0xFF == ord('q'):
            break
    except KeyboardInterrupt:
        break
# Clean up
print(f'Cleaning up and writing {outfile}')
output.release()
cap.release()
cv.destroyAllWindows()
Thoughts on the code
We actually tried this two ways:
- Using a queue, as shown above.
- Using a ring buffer of fixed size (treating a Python list as a plain array) and incrementing read and write heads for the delay line. (If these words mean nothing to you, I promise I'll add a follow-up post that clarifies it soon!)
I stuck with the queue implementation for video creation since it's a bit cleaner. However, I'd really like to allow live control of the delay lengths with a slider. The good news is that OpenCV supports this directly via its "trackbar" widget. The bad news is that the queue model doesn't handle this reparameterization well, since resizing the queue means we lose data from at least one end! We can solve this by switching back to the ring buffer and letting the slider adjust the offset between the read and write heads; I sketch this below.
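This is my own rough, untested outline rather than what we wrote while pairing: the max_delay cap and delaying only one channel are simplifications of mine, and (per the BGR note in the code above) index 2 is the red channel:

import cv2 as cv
import numpy as np

max_delay = 90  # Largest delay (in frames) the slider can select
cap = cv.VideoCapture(0)
_, frame = cap.read()
h, w, _ = frame.shape

# Ring buffer big enough for the largest delay; write_head marks where
# the next frame's red channel will be stored.
ring = np.zeros((max_delay, h, w), dtype=frame.dtype)
write_head = 0

cv.namedWindow('frame')
# The callback is required but unused; we poll the slider each frame.
cv.createTrackbar('red delay', 'frame', 30, max_delay - 1, lambda v: None)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    delay = cv.getTrackbarPos('red delay', 'frame')
    # Store the newest red channel, then read from `delay` frames behind
    # the write head, wrapping around the buffer.
    ring[write_head] = frame[..., 2]
    frame[..., 2] = ring[(write_head - delay) % max_delay]
    write_head = (write_head + 1) % max_delay
    cv.imshow('frame', frame)
    if cv.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv.destroyAllWindows()

Shrinking the delay takes effect immediately, and growing it just replays whatever older frames are already sitting in the buffer, so dragging the slider never loses data the way resizing a queue would.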
One thing that did bother me: I could get the FPS from my webcam and tell the VideoWriter to use the same rate, but the recorded video seemed twice as fast! I confirmed that the output file had the correct FPS by checking ffmpeg -i redblue_delay.mp4, so my best guess is that I'm missing frames from the incoming video stream by spending time on the filter. Things look about right if I divide FPS by two, but I'm not sure how I'd address this on an arbitrary user's device that I can't test. So, that's a fun thing I can dig into later.
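If I do, one idea (untested, and purely my own speculation) is to stop trusting CAP_PROP_FPS and instead time the loop itself, then hand the measured rate to the VideoWriter:

import time
import cv2 as cv

cap = cv.VideoCapture(0)

# Time a short burst of reads to estimate the throughput we actually
# achieve on this machine, rather than the rate the camera advertises.
n_frames = 60
start = time.time()
for _ in range(n_frames):
    cap.read()  # In the real loop, the channel-delay filter would run here too
measured_fps = n_frames / (time.time() - start)
print(f'Measured roughly {measured_fps:.1f} fps')
cap.release()

In practice I'd want to time the full filter-and-write loop rather than bare reads, since the filter is the suspected culprit.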
Anyway, this was a great first pairing session!
Video codec rabbit hole
You may have noticed a couple of lines of commented-out code above, which relate to why I wrote out an MP4 but instead served you a WebM file with this webpage.
It turns out I couldn't view the original video in Firefox (on Kubuntu 22.04) despite it working fine in VLC.
Firefox kept showing an error: No video with supported format and MIME type found.
I tried figuring out why it didn't want to play H.264 video, but most of the results I found were from support tickets circa 2015.
I eventually found that AVC/H.264 has weird patent/licensing stuff going on anyway, so I switched to trying WebM. Sadly, this led to an error from OpenCV (with a similar error for the VP8 codec):
OpenCV: FFMPEG: tag 0x39305056/'VP09' is not supported with codec id 167 and format 'webm / WebM'
Assuming this meant I was lacking VP9 support in ffmpeg, I looked around for how to fix that and found this question on Stack Overflow. One answer suggested the vpx-tools package, and sure enough, it looked like exactly what I wanted:
ben@tin-can:~/b/im$ apt-cache search vpx-tools
vpx-tools - VP8 and VP9 video codec encoding/decoding tools
ben@tin-can:~/b/im$ sudo apt install vpx-tools
…but this didn’t fix OpenCV.
However, I was at least able to use ffmpeg to convert my MP4 output (original here) to a WebM version, thanks to the other answer:
ffmpeg -i redblue_delay.mp4 \
-c:v libvpx-vp9 -crf 30 -b:v 0 \
-c:a libopus -b:a 96k \
redblue_delay.webm
Here we're just manually specifying the video codec with -c:v libvpx-vp9; the -crf 30 -b:v 0 pair puts VP9 in constant-quality mode instead of targeting a bitrate, and -c:a libopus re-encodes any audio with Opus.
I suspect I'd need to recompile ffmpeg myself to get that support out of the box, but for now I'm just glad this post can go up tonight. What a fun way to start the batch!
EDIT: I guess WebM isn't supported on iOS. Whoops! I'll try to find a workaround tomorrow, but for now I'll just link to the MP4 as an alternative.