Monday, May 28, 2007

Building an Ogg Theora camera using an FPGA and embedded Linux

Building an Ogg Theora camera using an FPGA and embedded Linux by Andrey N. Filippov (Mar. 23, 2005)

Foreword: This article introduces a network camera based on embedded Linux, an open FPGA, and a free, open codec called Ogg Theora. Author Andrey Filippov, who designed the camera, says it is the first high-resolution, high frame-rate digital camera to offer a low bit rate. Enjoy . . . !


Most of the Elphel cameras were first introduced on LinuxDevices.com, and the new Model 333 camera will follow this tradition. The first was the Model 303, a high-speed gated intensified camera that used embedded Linux running on an Axis ETRAX 100LX processor. The second -- the Model 313 -- added the high performance of a reconfigurable FPGA (field programmable gate array), a 300K-gate Xilinx Spartan-2e capable of JPEG encoding 1280x1024 @ 15fps. This rate was later increased to 22fps, at the same time that a higher-resolution Micron image sensor added 1600x1200 and 2048x1536 options. FPGA-based hardware solutions are very flexible, enabling the same camera main board to later be used, without any modifications, in a very different product -- the Model 323 camera. The Model 323 has a huge 36 x 24mm Kodak CCD sensor, the KAI-11000, which supports 4004 x 2672 resolution for true snapshot operation.

What should the ideal network camera be able to do?

The most common application for network cameras is video security. In order to take over from legacy analog systems, digital cameras must have high resolution, so that there will be no need to use rotating (PTZ) platforms, which can easily miss an important event by looking in a different direction at the wrong moment. But high resolution combined with high frame rate (which was already available in analog systems) produces huge amounts of video data that can easily saturate the network with just a few cameras. It also requires too much space on the hard drives for archiving. However, this is only true when using JPEG/MJPEG compression -- true video compression can make a big difference.

So, the "ideal" network camera should combine three features: high resolution, high frame rate, and low bit rate.

No currently available camera provides all three of these features at once. Some offer a low bit rate (i.e., MPEG-4) and a high frame rate, but their resolution is low (the same as analog cameras). Others have high resolution and a high frame rate (including the Elphel 313), but the bit rate is high. This is because real-time video encoding of high-resolution data is a challenging task that needs a lot of processing power.

An FPGA that can handle the job

The FPGA in the model 313 has 98 percent utilization, so not much can be added. But since that camera was built, more powerful FPGAs have become available. The new model 333 camera uses a million-gate Xilinx Spartan 3, which has three times the general resources, is faster, and has some useful new features (including embedded multipliers and DDR I/O support) -- all in the same compact FT256 package as the previously used Spartan-2e.

As soon as support for this FPGA was added to Xilinx's free-for-download WebPack tools (I hope that one day they will release really Free software, not just free "as in beer"), I was ready for the new design. Support by free development software is essential for our products, as they come with all the source code (including the routines for the FPGA, written in Verilog HDL) under the GNU GPL. We want our customers to be able to hack into -- and improve -- our designs.

Hardware details

The model 333 electronic design was based on that of the previous model, and has the same CPU and center portion of the rather compact PC board layout, where the data bus connects processor and memory (SDRAM and flash) chips to each other. In addition to the FPGA, all the memory components were upgraded; Flash is now 16MB (instead of 8MB), and system SDRAM is 32MB (instead of 16MB). The dedicated memory that connects directly to the FPGA is also twice as large, and it is also more than twice as fast -- it is DDR. That memory is a very critical component for video compression, since each piece of data goes through this memory three times while being processed by the FPGA.

Despite these upgrades, the size of the board did not increase. Actually, I was even able to shrink the board a little. So, while the outside dimensions remain the same -- 3.5 by 1.5 inches -- the corners of the board near the network connector are cut out, so that the RJ-45 connector fits into a sealed shell, making the camera suitable for outdoor applications without requiring an external protective enclosure.

Model 333 camera main board

The Model 333, in a Model 313 case -- the production version will use a weatherproof case

The right codec for the camera

While working on the hardware design, I didn't get involved in the video compression algorithms -- I just estimated that something like the P-frames of MPEG-2 could make a big difference for fixed-view cameras, where most of the image is usually a constant background. The next level of compression efficiency, motion compensation, needs a higher memory bandwidth than is available in this camera, so I decided to skip it in this design and leave it for future upgrades.

In August 2004, I had the first new camera hardware tested, and I ported the MJPEG functionality from the model 313 camera. It immediately ran faster -- 1280 x 1024 @ 30fps, instead of 22fps -- with frame rates limited by the sensor. I ordered a book about MPEG-2 implementation details. Only then did I discover that use of this standard (in contrast to JPEG/MJPEG) requires licensing. The license fee is reasonable, and rather small compared to the hardware costs, but being an opponent of the idea of software patents, I didn't want to support it financially.

That meant that I had to look for an alternative codec, and it didn't take long to find a better solution -- both technically and from a licensing perspective. It is Theora, developed by the Xiph.org Foundation. The algorithm is based on VP3, by On2 Technologies, which has granted an irrevocable royalty-free license to use VP3 and its derivatives. Theora is an advanced video codec that competes with MPEG-4 and other low bit-rate video compression technologies.

FPGA implementation of the Theora encoder

At that point, I had both hardware to play with, and a codec to implement. As it turned out, the documentation is quite accurate. Being the first one to re-implement the codec from the documentation, I ran into just a single error, where there was a mismatch between the docs and the standard software implementation.

Even with a number of shortcuts I made (unlike a decoder, an encoder need not implement all possible features), it was still not an easy task. I omitted motion vectors and the loop filter, and still the required memory bandwidth turned out to be rather high -- with FPN (fixed-pattern noise) correction enabled, the total data rate was about 95 percent of the theoretical bandwidth of the SDRAM chip I used (500MB/sec @ 125MHz clock). For each pixel encoded, the memory should:
  • Provide FPN correction data (2 bytes)

  • Receive corrected pixel data and store it in scan-line order (1 byte)

  • Provide data to the color converter (Bayer->YCbCr 4:2:0) that is connected to the compressor. Data is sent in 20x20 overlapping tiles for each 16x16 pixels, so each pixel needs (400/256)~=1.56 bytes

  • For the INTER frames, reference frame data (which is subtracted from the current frame) is needed, and each frame produces a new reference frame. That gives 2*1.5=3 bytes more (the 1.5 arises because in YCbCr 4:2:0 encoding, each group of 4 sensor pixels produces 2 luma values (Y) and one of each chroma component (Cb and Cr))
And that is not all. In the Theora format, quantized DCT coefficients are globally reordered before being sent out: first come all the DC components (the average values of the 8x8 blocks), for both luma (Y) and chroma (Cb and Cr); then all the rest, the AC components (from lower to higher spatial frequencies), each in the same order. And, as the 8x8 DCT processes data in 64-pixel blocks, yielding all 64 coefficients (DC and 63 AC) together, the whole frame of coefficient data has to be stored before reaching the compressor output. As this intermediate data needs 12 bits per coefficient, it adds 2*1.5*(12/8) ~= 4.5 bytes per pixel more.
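The global reordering can be sketched in a few lines of Python (a simplified illustration: `blocks` is a hypothetical list of 64-coefficient blocks, and the luma/chroma interleaving and Theora's actual token grouping are ignored):

```python
def coded_order(blocks):
    """Reorder per-block DCT coefficients into 'coded' order: all DC values
    (index 0) for every block first, then all coefficient-1 values, and so
    on up to coefficient 63."""
    return [block[i] for i in range(64) for block in blocks]
```

Coefficients that were produced together by one DCT pass end up 1/64th of a frame apart in the output stream, which is why the whole frame of coefficients must be buffered first.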

The total amount of data transferred to/from the SDRAM comes to 12.11 bytes per pixel; and, as 1280x1024 at 30fps corresponds to an average pixel rate of 39.3MPix/sec, 476MB/sec of bandwidth is needed. So it could fit within the 500MB/sec available. Normally, though, such SDRAM transfer efficiency is achieved only when the data is written and read in long continuous runs, because then it is easy to organize bank interleaving such that the activation/precharge operations on other banks are hidden while the active bank is sending or receiving data.
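The arithmetic can be checked with a quick back-of-the-envelope tally (a sketch; the itemized contributions come to just over 12 bytes per pixel, slightly below the 12.11 bytes/pixel quoted, which evidently includes small overheads beyond the items listed):

```python
# Tally of the per-pixel SDRAM traffic itemized above.
fpn = 2.0                  # FPN correction data read per pixel
store = 1.0                # corrected pixel written in scan-line order
tiles = 400 / 256          # 20x20 overlapping tile read per 16x16 pixels
inter = 2 * 1.5            # reference frame read + new reference written (4:2:0)
tokens = 2 * 1.5 * 12 / 8  # DCT coefficients written and read as 12-bit tokens

bytes_per_pixel = fpn + store + tiles + inter + tokens  # ~12.06 bytes/pixel
pixel_rate_mpix = 1280 * 1024 * 30 / 1e6                # ~39.3 MPix/sec
bandwidth_mb = bytes_per_pixel * pixel_rate_mpix        # ~474 MB/sec
```

Either way, the result lands just under the 500MB/sec theoretical limit of the SDRAM, which is why the access pattern matters so much.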

Here, the task was more complicated, especially when writing and reading the intermediate data (quantized DCT coefficients stored as 12-bit tokens), since the write and read sequences are essentially orthogonal to each other -- tokens that are close together while being written are very far apart while being read out (and vice versa). "Very far" in this case means much farther than can be buffered inside the FPGA -- the chip has 24 embedded memory blocks of 2KB each.

All this made the memory controller design one of the trickiest parts of the system. Yet, it is a job that an FPGA can handle much better than many general purpose SDRAM controllers, since the specially designed data structures can be "compiled" into the hardware.

After the concurrent eight-channel DDR SDRAM controller code was written and simulated, the rest was easier. The Bayer-to-YCbCr 4:2:0 conversion code was reused from the previous design, and DCT and IDCT were designed to follow exactly the Theora documentation. Each stage uses embedded multipliers available in the FPGA that run on twice the YCbCr pixel clock (now 125MHz), so only four multipliers are needed. The quantizer and dequantizer use the embedded memory blocks to store multiplication tables prepared by the software in advance, according to the codec specs.
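For reference, the transform each DCT stage computes is the standard 8-point DCT-II; a floating-point sketch is below. (The FPGA uses a fixed-point factorization following the Theora documentation, so this is only the mathematical reference, not the hardware implementation.)

```python
import math

def dct8(x):
    """Textbook 8-point DCT-II (floating point). Up to scaling conventions,
    this is the transform the FPGA's fixed-point pipeline computes."""
    out = []
    for k in range(8):
        s = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / 16) for n in range(8))
        scale = math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
        out.append(scale * s)
    return out
```

A constant input produces only a DC term, which is the property the whole zero-run/EOB machinery described below exploits.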

The DC prediction module is a mandatory part of the compressor; it uses an additional memory block to store information from the DC components of the blocks in the previous row. With this approach, a single memory block makes it possible to process frames as wide as 4096 pixels. Output from this module is combined with the AC coefficients, and they are all processed in reverse zig-zag order to extract zero runs and prepare tokens to be encoded into the output data. Because this is done in a different order, these 12-bit tokens are first stored in the SDRAM.
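Zero-run extraction of the kind described can be sketched as follows (a deliberately simplified token format -- pairs of (zero-run length, value) -- not Theora's actual token set):

```python
def run_tokens(coeffs):
    """Collapse runs of zeros in a coefficient stream into
    (zero_run, value) tokens; trailing zeros become a (run, None)
    token, loosely analogous to an end-of-block marker."""
    tokens, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            tokens.append((run, c))
            run = 0
    if run:
        tokens.append((run, None))
    return tokens
```

For example, `run_tokens([5, 0, 0, 3, 0])` yields `[(0, 5), (2, 3), (1, None)]`.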

When the complete frame of tokens is stored in SDRAM, the second encoder stage starts, while the first stage is processing the next acquired frame. Tokens are read back to the FPGA in the "coded" order. The outer loop goes through the DCT coefficient indices -- starting from DC, then through the AC coefficients from the lowest to the highest spatial frequencies. For each coefficient index, the color planes (Y, Cb, and Cr) are iterated. Within each plane, superblocks (32x32 pixel squares) are scanned row first, left to right, then bottom to top. And, in each superblock, the 8x8 pixel blocks are scanned in Hilbert order (for the 4x4 grid of blocks, this sequence looks like an upper-case omega with a dip at the very top).
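A generic order-2 Hilbert traversal of a 4x4 grid conveys the idea (Theora's spec fixes the exact variant and orientation of the curve; the generator below is the textbook distance-to-coordinate conversion, not the codec's table):

```python
def hilbert4():
    """Visit order of the 16 cells of a 4x4 grid along an order-2
    Hilbert curve, via the standard distance-to-(x, y) conversion."""
    def d2xy(n, d):
        x = y = 0
        t = d
        s = 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:          # rotate/flip the quadrant as needed
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        return x, y
    return [d2xy(4, d) for d in range(16)]
```

Every step of the traversal moves to an adjacent cell, which keeps neighboring blocks close together in the scan.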

It is very likely that multiple consecutive blocks will have only zeros for all their AC coefficients. These zero runs are combined into special EOB (end of block) tokens -- something that could not be done in the first stage, since at that point the neighbors in the coded order were processed far apart in time.

Now all the tokens -- both the DCT coefficient ones received from the SDRAM, and the newly calculated EOB runs -- are encoded using Huffman tables (individual for color planes and for the groups of coefficient indices). The tables themselves are loaded into the embedded memory block by the software before the compression. The resulting variable-length data is consolidated into 16-bit words, buffered, and later sent out to system memory using 32-bit wide DMA.
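The final packing of variable-length codes into 16-bit words might look like this (the codes here are hypothetical (value, bit-length) pairs, not Theora's real Huffman tables, which the software loads into the FPGA as described):

```python
def pack16(codes):
    """Pack (value, bit_length) variable-length codes into a list of
    16-bit words, zero-padding the final partial word."""
    acc = nbits = 0
    words = []
    for val, length in codes:
        acc = (acc << length) | val
        nbits += length
        while nbits >= 16:       # emit every complete 16-bit word
            nbits -= 16
            words.append((acc >> nbits) & 0xFFFF)
    if nbits:                    # flush remaining bits, zero-padded
        words.append((acc << (16 - nbits)) & 0xFFFF)
    return words
```

The hardware does the same consolidation in a shift register before handing 16-bit words to the 32-bit DMA engine.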

Results, credits and plans

At this point, the very basic software has been developed, and the most obvious bugs in the FPGA implementation of the Theora encoder have been found and fixed. The camera has been successfully tested with a 1280x1024 sensor running at 30 fps (the camera can also run with a 2048x1536 sensor at 12fps, and can accommodate future sensors up to 4.5MPix).

Basically, the current software was developed to serve as a test bench for the FPGA. It does not have any streamer yet -- short hardware-compressed clips (up to 18MB) are stored in the camera memory and later sent out as an Ogg-encapsulated file. I do not think it will take long to implement a streamer -- a team of programmers came together nearly a year ago, when Elphel announced, in the Russian online magazine Computerra, a competition for the best video streamer for the previous JPEG/MJPEG model 313 camera. Thanks to that effort, the model 313 now has seven alternative streamers, some running as fast as 1280x1024 at 22fps (FPGA limited) and sending out up to 70Mbps (a rate needed only for very high JPEG quality settings).

The winner of that competition -- Alexander Melichenko (Kiev, Ukraine) -- was able to create the first version of his streamer before he even got the camera from us. FTP and Telnet access to the camera over the Internet was enough to remotely install, run, and troubleshoot the application on the GNU/Linux system, which ran on a CPU he had never worked with previously (an Axis Communications ETRAX 100LX).

Sergey Khlutchin (Samara, Russia) customized the Knoppix Live CD GNU/Linux distribution, enabling our customers who normally use other operating systems to see the full capabilities of the camera. Apple's QuickTime player does a good job displaying the RTP/RTSP video stream that carries MJPEG from the camera, but we could not figure out how to get rid of the three-second buffering delay of that proprietary product. And MPlayer -- well, it seems to feel better when launched from GNU/Linux.

And this is the way to go for Elphel. We will not wait for the day when most of our customers are using FOSS (free and open source software) operating systems on their desktop. Thanks to Klaus Knopper, we can ship the Knoppix Live CD system with each of our cameras, including the new Model 333, which is the first network camera that combines high resolution, high frame rate, and low bit rate -- and produces Ogg Theora video.


According to Filippov, high-resolution, high-frame-rate, low-bit-rate video does present one challenge, at least for now -- finding a system fast enough to decode the output at full resolution and full frame rate. Filippov has asked LinuxDevices readers with fast systems (such as dual-processor 3.6GHz Xeon systems) to download sample files and email success reports. He hopes to demonstrate the camera at an upcoming trade show, and wants to gauge how fast a system he'll need.

Says Filippov, "The decoders are not optimized enough yet (maybe the camera will somewhat push developers). Just today there was a posting with a patch that gives an 11 percent improvement. And, I hope that cheaper multi-core systems will be available soon. Finally, we could record full speed/full resolution video on the disk (to be able to analyze some videosecurity event later in detail), but render real-time (for the operator watching multiple cameras) with reduced resolution. It is possible to make software that will use abbreviated IDCT with resolution 1/2, 1/4 or 1/8 of the original. In the last case, just DC coefficients are needed -- no DCT at all. For JPEG, such functions are already in libjpeg, and similar things can be done with Theora."

About the author: Andrey N. Filippov has a passion for applying modern technologies to embedded devices, especially in advanced imaging applications. He has over twenty years of experience in R&D, including high-speed, high-resolution mixed-signal design; PLDs and FPGAs; and microprocessor-based embedded system hardware and software design, with a special focus on image acquisition methods for laser physics studies and the computer automation of scientific experiments. Andrey holds a PhD in Physics from the Moscow Institute for Physics and Technology. This photo of the author was made using a Model 303 High Speed Gated Intensified Camera.

A Russian translation of this article is available here.

Debian Linux controls copter-like UAV

Apr. 02, 2007

Trek Aerospace used Debian Linux and open-source flight control software to build an unmanned aerial vehicle (UAV) capable of vertical take-off and landing (VTOL). The Oviwun weighs about six pounds, fits in a backpack, and includes a GPS system that enables autonomous flight and position control.


The Oviwun UAV can fly into tight spaces, hover in one spot in order to capture still or video images, and send data back to the user in real time. Optional night vision cameras allow the device to be flown into caves, dark buildings, and tunnels, Trek said.

Trek Oviwun

Oviwun drivetrain
The Oviwun's lift and propulsion system is based on twin five-blade helicopter rotors housed in ducts. The ducts allow the vehicle to bump into things without destroying them and/or itself. The blades are powered by a rotary engine designed by Trek in partnership with its engine supplier.

Directional control is accomplished via three vanes within the rotor ducts. Additionally, the ducts themselves can be rotated. A gyroscope allows for control on "all three axes," Trek said.

VersaLogic Puma
Underneath the cowl, the Oviwun is controlled by VersaLogic's PC/104-Plus form-factor Puma SBC (single-board computer), which is based on an x86-compatible AMD GX500 processor. The operating system software is built upon Debian Linux, according to board-maker VersaLogic, which supplies Debian Linux BSPs with many of its boards.

Harry Falk, Trek Aerospace president, stated, "We chose VersaLogic embedded computers because they are robust and reliable, and because VersaLogic stays on top of ever-advancing technologies. They are incredibly responsive in supporting our efforts to embed their products into our platforms, and they listen to our feedback."

Trek says it teamed with DARPA (Defense Advanced Research Projects Agency) and NASA to develop and test the Oviwun.


A limited number of Trek Oviwun VTOL UAVs are available for purchase through Trek's beta testing program, priced at $15,000.

Friday, May 25, 2007

The robot market rises: education, entertainment, and household service robots to lead the way



Citing statistics from the International Federation of Robotics' World Robotics report, he noted that 31,600 professional service robots had been installed worldwide as of 2005. Between 2006 and 2009, the market is expected to add another 34,000 installations, with a total value of about US$7.78 billion. "This is a high-variety, low-volume application market with high unit prices," he emphasized.

In the personal/household service robot market, 2.96 million units had been installed by 2005, comprising 1.9 million household robots and 1.02 million entertainment and leisure robots. World Robotics forecasts a total of 5.5 million installations for 2006-2009, worth about US$2.67 billion; household robots will account for 3.9 million units (about US$1.68 billion) and entertainment and leisure robots for 1.6 million units (about US$0.96 billion).


Regarding the toy market, Pai Chung-che (白忠哲) quoted MIT professor Rodney A. Brooks: "Don't overlook the market for robots as gifts." An MIT team developed the 'My Real Baby' toy, which sold more than 100,000 units.


Pai noted that the global toy market is worth about US$70 billion, of which traditional toys account for US$55 billion. In recent years, the consumer age range for toys has steadily broadened, from infants and teenagers to adults and even seniors. Smart electronic toys and edutainment toys that combine technology with play enjoy both high growth rates and longer product life cycles, and many companies in Japan, Korea, and elsewhere have begun to invest in them actively. In addition, niche "therapeutic toys" have emerged, attempting to create a more diverse range of toy market opportunities.

Besides toys, household robots have become another focus of attention, the best-known product being the Roomba robotic vacuum cleaner from iRobot. Pai noted that iRobot got its start developing military robots, but its CEO, Colin Angle, has his own views on how the robot industry should develop. He has said: "Over-focusing on humanoid robot research will slow the industry's progress. In building robots, cost is critical, and the great leaps in robotics will come from inventions that reduce mechanical complexity."

From a market perspective, Colin Angle argues that complete robots must be built to open up new markets for the product, that robot components should be built on a foundation of high-volume devices, and that companies should take part in establishing the robot value chain. Thus, even though a "killer application" has yet to appear, iRobot is no longer waiting for the technology, but is pursuing robot market opportunities along these lines.

With this thinking, iRobot has launched a series of household robots, arguably driving the trend toward "robotized home appliances." Pai said that more than 2.5 million Roomba units have been sold to date. To build on this success, iRobot has since released the Scooba floor-washing robot and the Verro 300 swimming-pool-cleaning robot.




Summing up, Pai pointed out that demographic shifts in advanced countries are steadily increasing demand for education and entertainment, household labor, and personal care, while technological progress is gradually "robotizing" all kinds of equipment -- a new blue ocean for the electronics, electrical, and machinery industries as they diversify. From an industry-development standpoint, robots have a broad range of applications, and every industry can consider how robotics might be applied; security services and medical services, for example, could lead robot R&D and deployment. With demand driven by applications, joint development between professional users and researchers will be the trend going forward.

Wednesday, May 09, 2007

HDMI 1.3 Speeds xvYCC Adoption in Camcorders


Nikkei Electronics Asia -- May 2007

Sony Corp of Japan has released a camcorder (Fig 1) capable of shooting in the extended-gamut YCC (xvYCC) color space, which is considerably wider than that of conventional camcorders. The xvYCC color space is already supported by imaging equipment such as a portable viewer from Seiko Epson Corp of Japan and a liquid crystal display TV from Sony, but there has been very little video content with xvYCC color to utilize the full potential of these display devices. The new camcorder is effectively the first consumer product capable of producing xvYCC video content.

Sony said it adopted xvYCC in its camcorder so that it could express deep greens, brilliant pinks, and other colors unobtainable from International Telecommunication Union Radiocommunication Sector (ITU-R) BT.709 (equivalent to sRGB in still pictures), which is the most commonly used color space (Fig 2). A source at Sony commented, "We can express the colors of the natural world more faithfully than ever." To promote consumer recognition of the xvYCC standard, Sony has proposed to the electronics industry that it be called "x.v.Color" and has already begun using the new name with the new camcorder.

"Our first concern in commercialization was ensuring compatibility," said a source at Sony. xvYCC is defined as an extension to existing color space, so that color information expressed as per ITU-R BT.709 will display the same under xvYCC. When xvYCC video content is played back using a video interface or equipment not designed for xvYCC, it was impossible to rule out unexpected display results, such as no picture at all. To avoid this type of problem, Sony assigned priority to verifying that compatibility was assured. It connected the new camcorder to a range of TVs which did not support xvYCC, and displayed imagery. The results confirmed that no problem exists, said Sony.

The camcorder is the first to come with a High Definition Multimedia Interface (HDMI) 1.3 interface, in addition to the component video and S-Video output interfaces. HDMI 1.3 uses metadata to notify the TV or other equipment that the video signal is xvYCC-compliant, and display devices supporting HDMI 1.3 can check this metadata to enable xvYCC color space display. Enabling xvYCC viewing over HDMI 1.2 or existing analog interfaces -- which cannot identify the signal type without analyzing the signal content itself -- is likely to be fairly complex.

by Chikashi Horikiri

Thursday, May 03, 2007

Three major vendors jointly invest in US research project expected to accelerate robot adoption

Are robots about to invade everyday life in force? Google, Intel, and Microsoft are jointly funding researchers at Carnegie Mellon University in the US to build a series of Internet-connectable robots that almost anyone can assemble from off-the-shelf parts.

The project, called the Telepresence Robot Kit (TeRK), is a collaboration between Carnegie Mellon's Robotics Institute and Charmed Labs of Austin, Texas, announced last summer. Illah Nourbakhsh, an associate professor of robotics, and the members of his Community Robotics, Education, and Technology Empowerment (CREATE) lab have designed a series of "recipes" for robot assembly.


At the heart of the TeRK project is a robot controller called Qwerk, which can be purchased through the Charmed Labs website for US$349. The component serves as an electronic brain, handling the wireless Internet connection and motion control, and providing functions such as sending and receiving photos and video, an RSS reader, and web search.


Rich LeGrand, president of Charmed Labs, said in a statement: "In designing Qwerk, we drew on low-cost, high-performance components originally developed for the consumer electronics field, and ended up with a cost-effective robot controller with excellent performance."

Beyond education and entertainment applications, the project also hopes to bring robots into daily life, for example using them to monitor a home or a pet. Future development directions include environmental sensors for measuring noise and air pollution. Nourbakhsh, however, is reluctant to define "what a robot should be" -- an attitude that may explain one of his team's current research topics: a controllable stuffed teddy bear.


(Original article: Google, Intel, and Microsoft fund robot 'recipes')

(Thomas Claburn)