When shooting video using multiple cameras, you must synchronize their output so you can cut back and forth between them, or show a picture in picture. In bicycle videos, you may want to use forward-facing and rearward-facing cameras at the same time, or cameras mounted on different bicycles.
The high-tech way to synchronize video and audio recordings is time code: a clock signal carrying timing information, recorded along with the audio and video. It is used to keep playback devices in sync. Equipment controlled by time code can even fast-forward and rewind tapes to synchronize them with each other.
Time code is traditionally used in studios where all the equipment is connected by cables. It also may be sent and received using wireless devices, but as of now, no consumer-grade helmet camera supports this feature. In any case, a wireless signal only carries over a limited distance, and is vulnerable to interference.
Professional-grade equipment which derives time code from GPS satellite signals is available, but is still expensive and complicated to use. GPS can synchronize any number of cameras, anywhere on planet Earth. It only works where there is a view of the open sky, though short gaps in satellite signals can be bridged using cameras' internal timers. GPS can, of course, also identify the location of the shoot to within a few feet, and the direction in which a camera is pointing.
Timing information may similarly be taken from shortwave radio time reference stations or computer networks.
I expect that a smartphone app will be offering GPS timing sooner rather than later, but smartphone cameras are rather limited, and this feature would be more useful in a dedicated camera.
How do we synchronize when shooting on a budget, lacking time code?
We fall back on the classic synchronizing technique used in film: a marker visible in the image and audible in the soundtrack. Bicycle video takes are generally rather long – you start recording, and then ride – so setting this marker isn't much of an inconvenience.
A traditional wooden slate clapperboard
The clapperboard – with its chalkboard to indicate a take number – is the time-honored time-alignment tool in the motion-picture industry. The clapperboard provides a timing reference at the start of a take, and the speed of all the cameras is synchronized to the power-line frequency or to a pilot tone derived from the film perforations and recorded on a spare audio track. Once clips are aligned on the editing table, they remain in sync.
You don't need to carry a clapperboard around with you to apply this technique. I use a hand clap, visible to every camera and recorded in every soundtrack. If I am using two cameras on my helmet to get front and rear views, I will back up to a window so the rear-facing camera shows the reflection, or tap the side of the helmet to shake both images and make a sound. I also make an announcement to identify the take.
In the short video clip, the main image is from a camera on my helmet. The picture-in-picture is from a camera duct-taped to the rear rack of a friend's bicycle. My camera is looking forward at him and his is looking back at me. You will see and hear the hand clap, and then hear my short announcement identifying the take. Here is a still of the frame of the video where my hands come together:

To keep sync with cameras that are in no way connected to each other, we apply crystal control, as developed by documentary filmmaker Richard Leacock using 8mm film and audio cassette recorders, at the Massachusetts Institute of Technology, in the early 1970s.
There's no need any more to modify equipment to do this. The timing of all modern video cameras is set by quartz-crystal electronic clocks. Video cameras must be able to output video for live display, and timing standards are very precise. Cameras will typically run for many minutes without a noticeable loss of synchronization, avoiding the need for a pilot tone.
Now, for a bit of technical background to help you use this technique when editing video.
A video consists of a series of frames (individual images) shown one after another quickly enough to create the illusion of motion. The rate is nominally 30 frames per second (actually, 29.97 for color signals) in countries which use 60 Hertz (cycles per second) AC power; the rate is 25 frames per second in countries which use 50 Hertz AC power. The frame rate is nominally ½ the power-line frequency so that any irregularity in the image due to imperfect power-supply filtering stays in the same place from one frame to the next or moves very slowly, rather than causing annoying flickering or wobbling.
Video editing is frame by frame: both the video and the audio associated with it can only be trimmed or moved in one-frame, 1/25 or 1/30 second increments. Special software can adjust audio more precisely; and that may be desirable for some purposes. I'll discuss that later.
Some digital video storage formats align the audio to each frame. Other formats use keyframing, where the audio is only tacked to the video once every few frames. Between keyframes, these formats only store the differences between images, resulting in smaller digital files. If a video clip doesn't start with a keyframe, the audio and video may start slightly out of step – and once they start out that way, they stay that way.
All of the better video editing software applications give you a way to slide tracks back or forward against each other to adjust timing. To align tracks easily, you need a clear cue in every track -- your hand clap.
Sometimes, you need to find any cue you can. One of my favorites in my traffic videos is the sight and sound of a car rolling over a manhole cover. Aligning on speech is harder, though there are vocal sounds – P, B, M – where the mouth opens suddenly and the waveform in the soundtrack shows a sudden increase in volume. If you do much aligning on speech, you will find that you are beginning to learn to read lips!
Sound travels only about 1100 feet (about 335 meters) per second, so a distance of only 30 feet or 10 meters will throw synchronization off by about one frame – like when you see a dribbled basketball out of sync with its sound. For good synchronization on an audible cue, all microphones need to be within a few feet of the cameras. Align a distant camera on the visual cue, rather than the audible one. An additional hand clap near the distant camera will align its audio and video tracks.
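To put numbers on this, here is a minimal Python sketch that converts a microphone distance into a delay measured in video frames. The function name and constant are my own, chosen for illustration, and the speed of sound is the round figure of 1100 feet per second (it varies with temperature).

```python
# Approximate speed of sound in air; a round figure, not a precise constant.
SPEED_OF_SOUND_FT_PER_S = 1100.0


def acoustic_delay_frames(distance_ft, fps):
    """Return the sound travel time over distance_ft, expressed in video frames."""
    delay_s = distance_ft / SPEED_OF_SOUND_FT_PER_S
    return delay_s * fps


if __name__ == "__main__":
    for fps in (25.0, 29.97):
        frames = acoustic_delay_frames(30.0, fps)
        print(f"30 feet at {fps} frames per second: {frames:.2f} frames late")
```

At 30 feet, the result comes out somewhat under one full frame at either frame rate, which is already enough to push the nearest-frame alignment off by one.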
When editing, I zoom in on the tracks until I can move back and forth one frame at a time. I turn on the “audio scrubbing” function so I can hear the hand clap or other cue. I set a marker for the hand clap separately for each audio and video track, then slide the tracks until all the markers align.
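The same marker-setting step can be sketched in code. This is only an illustration of the idea, not what any editing application does internally: it assumes the audio is already available as a plain list of sample values, treats the loudest sample as the hand clap, and converts its position to a frame number. The function names `clap_frame` and `frames_to_slide` are hypothetical.

```python
def clap_frame(samples, sample_rate, fps):
    """Return the index of the video frame that contains the loudest sample.

    Assumes the hand clap is the loudest event in the excerpt examined.
    """
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    peak_time_s = peak_index / sample_rate
    return int(peak_time_s * fps)


def frames_to_slide(samples_a, samples_b, sample_rate, fps):
    """Return how many frames later the clap occurs in track B than in track A."""
    return clap_frame(samples_b, sample_rate, fps) - clap_frame(samples_a, sample_rate, fps)


if __name__ == "__main__":
    rate = 48000
    track_a = [0.0] * rate          # one second of silence...
    track_a[12000] = 1.0            # ...with a clap at 0.25 second
    track_b = [0.0] * rate
    track_b[36000] = 1.0            # clap at 0.75 second
    print(frames_to_slide(track_a, track_b, rate, 25.0), "frames")
```

In practice, you would examine only the first few seconds of each track, so that a later loud noise is not mistaken for the clap.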
Moving the video forward one frame at a time in Pinnacle Studio, the editing suite I use, plays the audio for the frame that is being displayed. So, as I reach the frame where the hands come together, I hear the sound of the clap, and see its sharp peak in the audio waveform.
The image below is of the tracks from the video example above, in the video editing application Pinnacle Studio 12. In the thumbnail images, you can see the first frame of each video. The lower thumbnail holds an icon indicating that it contains a picture in picture. The time line (orange line near the top) is expanded so each tick represents an individual frame of video. The cursor is aligned over four markers, which I placed at the hand clap frame in each video and audio track, then slid left or right until they lined up. The video in the lower track is grayed out because it is locked, so I can move the audio independently of it.
Hand clap editing in Pinnacle Studio

I align a take this way, save the resulting file, and then when I am editing it down, I save under a new filename so I still have the original full-length take to re-use.
You might ask: with timing only to the nearest 1/25 or 1/30 second, will there be an echo? No! The sense of hearing and the frame rate of video are nicely matched. Humans can't hear an echo unless it is delayed by 5/100 of a second or more, a phenomenon called the “precedence effect” or the “Haas effect”, after the scientist who first described it. Two clips aligned to the nearest frame will be out of sync by no more than ½ frame, 2/100 of a second. This amount may either increase or decrease slowly over time if the camera speeds are very slightly different -- but I have yet to have to readjust timing in clips up to 15 minutes long. If the timing of two cameras is off by 1/2 frame, you might choose the alignment so drift decreases the error as the take progresses.
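The arithmetic behind this claim is simple enough to check in a few lines of Python. This is a sketch using the figures given above; the 5/100 second echo threshold is the value stated in the text, and the names are mine.

```python
# Echo threshold as stated in the text: 5/100 of a second.
ECHO_THRESHOLD_S = 0.05


def worst_case_error_s(fps):
    """Largest possible timing error when clips are aligned to the nearest frame."""
    return 0.5 / fps


if __name__ == "__main__":
    for fps in (25.0, 29.97):
        error = worst_case_error_s(fps)
        margin = ECHO_THRESHOLD_S - error
        print(f"{fps} frames per second: error up to {error:.4f} s, "
              f"margin before an echo {margin:.4f} s")
```

At both frame rates, the worst-case error is well under half the echo threshold, so a slow drift has to accumulate for some time before it becomes audible.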
Sometimes, synchronization requires conversion of the video format due to peculiarities of the camera, editing software, or both. The Aiptek digital video recorder with my helmetcameras.com camera records at the standard 29.97 frames per second, but Pinnacle Studio reports it as 25. Similarly, in Pinnacle Studio, an Insight POV HD helmet camera reports speeds slightly different from the 29.97 at which it records. Clips from these cameras do not stay in sync with others. Converting the files using AVS4YOU software solves the problem, and then clips will stay in sync. I convert to AVI with the Xvid/DivX MPEG-4 codec, which keeps the file sizes down -- though this codec uses keyframing, so I sometimes have to realign the audio to the video.
There is one important application in which closer synchronization is important. A very small timing error can seriously disturb stereo or surround-sound.
This is not much of a problem if you are feeding the right and left channels from each recorder to the right and left channels of a stereo mix. Then each stereo pair keeps its timing. You need to adjust levels though, to select which stereo pair you will use at a particular time. Even with nicely-synchronized video, two pairs of stereo microphones recording at the same time can create odd stereo perspectives, or echoes if one pair of microphones is much farther from the sound source than the other. If the sound source that is on camera is distant, you may deliberately advance the audio so it appears in sync with the video. That makes sense, for example, for a telephoto shot of a basketball being dribbled. It can be taken too far, though. In many war documentaries, a mortar shell or rocket lands in the distance, and the sound of the explosion occurs at the same instant as the flash. Fake! Fake! It Just Doesn't Happen That Way!!!
There is a special problem if you are using one recorder for the left channel and another for the right, or one for front channels and another for surround channels. In this case, frame-by-frame timing is not accurate enough, and you may need to export one or more soundtrack(s) to an audio editing application so you can align them much more tightly and adjust their speed. The free audio editing application Audacity -- available for Windows, the Macintosh and Linux -- does this nicely, and there are many other applications which also do.
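To illustrate what aligning "much more tightly" means, here is a minimal sketch of cross-correlation, a standard technique for finding the offset between two recordings of the same sound to the nearest audio sample. This is not a description of how Audacity works, where I do the alignment by eye and ear; it is plain Python, far too slow for whole soundtracks, and the function name `best_lag` is hypothetical. It is meant to be run on short excerpts around the hand clap.

```python
def best_lag(reference, other, max_lag):
    """Return the lag, in samples, at which `other` best matches `reference`.

    A positive result means the sound occurs later in `other`.
    Only lags from -max_lag to +max_lag are tried.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += ref_value * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    sample_rate = 48000
    left = [0.0] * 200
    right = [0.0] * 200
    left[50], left[51] = 1.0, -0.5      # a crude "clap" in the left recorder
    right[74], right[75] = 1.0, -0.5    # the same clap, 24 samples later
    lag = best_lag(left, right, 100)
    print(f"right lags left by {lag} samples, "
          f"{1000.0 * lag / sample_rate:.2f} thousandths of a second")
```

One sample at 48,000 samples per second is about 1/50 of a thousandth of a second, far finer than the one-frame steps a video editor allows.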
Surround sound from a stereo microphone on each of two bicycles one behind the other can be very compelling. A front/rear bicycle spacing of 20 feet or so will result in a good surround image in a typical listening room. Shadowing the microphones with the riders' bodies -- front microphone ahead, rear microphone behind -- will increase the front/rear separation. You then of course need to take extra trouble to screen the front microphones from the wind.
Alignment of the front and rear recordings within one or two 1000ths of a second is easy to achieve, and good enough so the delay due to the speed of sound from one bicycle to the other makes sounds from the front bicycle play back first, and so the precedence effect (remember?) places them correctly in the front loudspeakers. (Vice versa for sounds from the rear). If, on the other hand, the timing drifts so far that the front channels lag behind the rear ones, the audio image will drift to the rear, and vice versa.
It is easy to place left and right microphones on the same bicycle, so their output goes to the same recorder. That's good because aligning right and left channels from different recorders is much trickier. Very, very small timing differences cause a right-left shift in a stereo image.
You will want a hand clap or other similar cue at both the beginning and the end of a take, at least the first time you adjust speeds to match one another. Once you have determined the drift between two cameras, you probably can apply the same correction to additional clips from the same cameras, so you then need only a hand clap at the start of each clip.
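The speed correction follows directly from the two cues. Here is a minimal Python sketch, assuming you have noted the time of the first and last hand clap in each recording; the function name and the example times are made up for illustration.

```python
def speed_correction_percent(ref_first_s, ref_last_s, other_first_s, other_last_s):
    """Return the percent speed change to apply to the other recording.

    The result makes the interval between claps in the other recording
    equal to the interval in the reference recording. A positive value
    means the other recording must be played faster.
    """
    ref_duration = ref_last_s - ref_first_s
    other_duration = other_last_s - other_first_s
    return (other_duration / ref_duration - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical example: claps 600.00 s apart in the reference recording,
    # 600.03 s apart in the other one.
    percent = speed_correction_percent(2.0, 602.0, 5.0, 605.03)
    print(f"change speed of the other recording by {percent:+.4f} percent")
```

The same percentage can then be tried on other clips from the same pair of recorders, on the assumption that each recorder's clock error stays roughly constant.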
You need to use manual level control when using multiple recorders, and adjust levels later. Automatic level control that acts differently on different channels of a recording will cause shifts in the auditory image.
You could also record surround sound from one bicycle, placing microphones at its forward and rearward extremes. This way, you can use a single recorder and avoid the timing and level-control issues. Many camcorders can in fact record 4-channel sound.