Custom annotated bibliography: H.264 Video Streaming System on Embedded Platform

H.264 Video Streaming System on Embedded Platform ABSTRACT The adoption of technological products like digital television and video conferencing has made video streaming an active research area. This report presents the integration of a video streamer module into a baseline H.264/AVC encoder running a TMSDM6446EVM embedded platform. The main objective of this project is to achieve real-time streaming of the baseline H.264/AVC video over a local area network (LAN) which is a part of the surveillance video system. The encoding of baseline H.264/AVC and the hardware components of the platform are first discussed. Various streaming protocols are studied in order to implement the video streamer on the DM6446 board. The multi-threaded application encoder program is used to encode raw video frames into H.264/AVC format onto a file. For the video streaming, open source Live555 MediaServer was used to stream video data to a remote VLC client over LAN. Initially, file streaming was implemented from PC to PC. Upon successfully implementation on PC, the video streamer was ported to the board. The steps involved in porting the Live555 application were also described in the report. Both unicast and multicast file streaming were implemented in the video streamer. Due to the problems of file streaming, the live streaming approach was adopted. Several methodologies were discussed in integrating the video streamer and the encoder program. Modification was made both the encoder program and the Live555 application to achieve live streaming of H.264/AVC video. Results of both file and live streaming will be shown in this report. The implemented video streamer module will be used as a base module of the video surveillance system. Chapter 1: Introduction 1.1. Background Significant breakthroughs have been made over the last few years in the area of digital video compression technologies. As such applications making use of these technologies have also become prevalent and continue to be of active research topics today. For example, digital television and video conferencing are some of the applications that are now commonly encountered in our daily lives. One application of interest here is to make use of the technologies to implement a video camera surveillance system which can enhance the security of consumers business and home environment. In typical surveillance systems, the captured video is sent over a cable networks to be monitored and stored at remote stations. As the captured raw video contains large amount of data, it will be of advantage to first compress the data by using a compression technique before it is transferred over the network. One such compression technique that is suitable for this type of application is the H.264 coding standard. H.264 coding is better than the other coding technique for video streaming as it is more robust to data losses and coding efficiency, which are important factors when streaming is performed over a shared Local Area Network. As there is an increasing acceptance of H.264 coding and the availability of high computing power embedded systems, digital video surveillance system based on H.264 on embedded platform is hence a feasible and a potentially more cost-effective system. Implementing a H.264 video streaming system on an embedded platform is a logical extension of video surveillance systems which are still typical implemented using high computing power stations (e.g. PC). In a embedded version, a Digital Signal Processor (DSP) forms the core of the embedded system and executes the intensive signal processing algorithm. Current embedded systems typical also include network features which enable the implementation of data streaming applications. To facilitate data streaming, a number of network protocol standards have also being defined, and are currently used for digital video applications. 1.2. Objective and Scope The objective of this final year project is to implement a video surveillance system based on the H.264 coding standard running on an embedded platform. Such a system contains extensive scopes of functionalities and would require extensive amount of development time if implemented from scratch. Hence this project is to focus on the data streaming aspect of a video surveillance system. After some initial investigation and experimentation, it is decided to confine the main scope of the project to developing a live streaming H.264 based video system running on a DM6446 EVM development platform. The breakdown of the work to be progressive performed are then identified as follows: 1. Familiarization of open source live555 streaming media server Due to the complexity of implementing the various standard protocols needed for multimedia streaming, the live555 media server program is used as a base to implement the streaming of the H.264.based video data. 2. Streaming of stored H.264 file over the network The live555 is then modified to support streaming of raw encoded H.264 file from the DM6446 EVM board over the network. Knowledge of H.264 coding standard is necessary in order to parse the file stream before streaming over the network. 3. Modifying a demo version of an encoder program and integrating it together with live555 to achieve live streaming The demo encoder was modified to send encoded video data to the Live555 program which would do the necessary packetization to be streamed over the network. Since data is passed from one process to another, various inter-process communication techniques were studied and used in this project. 1.3. Resources The resources used for this project are as follows: 1. DM6446 (DaVinciÃ¢â€ž ¢) Evaluation Module 2. SWANN C500 Professional CCTV Camera Solution 400 TV Lines CCD Color Camera 3. LCD Display 4. IR Remote Control 5. TI Davinci demo version of MontaVista Linux Pro v4.0 6. A Personal Workstation with Centos v5.0 7. VLC player v.0.9.8a as client 8. Open source live555 program (downloaded from www.live555.com) The system setup of this project is shown below: 1.4. Report Organization This report consists of 7 chapters. Chapter 1 introduces the motivation behind embedded video streaming system and defines the scope of the project. Chapter 2 illustrates the video literature review of the H.264/AVC video coding technique and the various streaming protocols which are to be implemented in the project. Chapter 3 explains the hardware literature review of the platform being used in the project. The architecture, memory management, inter-process communication and the software tools are also discussed in this chapter. Chapter 4 explains the execution of the encoder program of the DM6446EVM board. The interaction of the various threads in this multi-threaded application is also discussed to fully understand the encoder program. Chapter 5 gives an overview of the Live555 MediaServer which is used as a base to implement the video streamer module on the board. Adding support to unicast and multicast streaming, porting of live555 to the board and receiving video stream on remote VCL client are explained in this chapter. Chapter 6 explains the limitations of file streaming and moving towards live streaming system. Various integration methodologies and modification to both encoder program and live555 program are shown as well. Chapters 7 summarize the implementation results of file and live streaming, analysis the performance of these results. Chapter 8 gives the conclusion by stating the current limitation and problems, scope for future implementation. Chapter 2: Video Literature Review 2.1. H.264/AVC Video Codec Overview H.264 is the most advanced and latest video coding technique. Although there are many video coding schemes like H.26x and MPEG, H.264/AVC made many improvements and tools for coding efficiency and error resiliency. This chapter briefly will discuss the network aspect of the video coding technique. It will also cover error resiliency needed for transmission of video data over the network. For a more detailed explanation of the H.264/AVC, refer to appendix A. 2.1.1. Network Abstraction Layer (NAL) The aim of the NAL is to ensure that the data coming from the VCL layer is Ã¢â‚¬Å"network worthyÃ¢â‚¬ so that the data can be used for numerous systems. NAL facilitates the mapping of H.264/AVC VCL data for different transport layers such as: * RTP/IP real-time streaming over wired and wireless mediums * Different storage file formats such as MP4, MMS, AVI and etc. The concepts of NAL and error robustness techniques of the H.264/AVC will be discussed in the following parts of the report. NAL Units The encoded data from the VCL are packed into NAL units. A NAL unit represents a packet which makes up of a certain number of bytes. The first byte of the NAL unit is called the header byte which indicates the data type of the NAL unit. The remaining bytes make up the payload data of the NAL unit. The NAL unit structure allows provision for different transport systems namely packet-oriented and bit stream-oriented. To cater for bit stream-oriented transport systems like MPEG-2, the NAL units are organized into byte stream format. These units are prefixed by a specific start code prefix of three bytes which is namely 0x000001. The start code prefix indicates and the start of each NAL units and hence defining the boundaries of the units. For packet-oriented transport systems, the encoded video data are transported via packets defined by transport protocols. Hence, the boundaries of the NAL units are known without having to include start code prefix byte. The details of packetization of NAL units will be discussed in later sections of the report. NAL units are further categorized into two types: * VCL unit: comprises of encoded video data Ã‚ · Non-VCL unit: comprises of additional information like parameter sets which is the important header information. Also contains supplementary enhancement information (SEI) which contains the timing information and other data which increases the usability of the decoded video signal. Access units A group of NAL units which adhere to a certain form is called a access unit. When one access unit is decoded, one decoded picture is formed. In the table 1 below, the functions of the NAL units derived from the access units are explained. Data/Error robustness techniques H.264/AVC has several techniques to mitigate error/data loss which is an essential quality when it comes to streaming applications. The techniques are as follows: Ã‚ · Parameter sets: contains information that is being applied to large number of VCL NAL units. It comprises of two kinds of parameter sets: Sequence Parameter set (SPS) : Information pertaining to sequence of encoded picture Picture Parameter Set (PPS) : Information pertaining to one or more individual pictures The above mentioned parameters hardly changes and hence it need not be transmitted repeatedly and saves overhead. The parameter sets can be sent Ã¢â‚¬Å"in-bandÃ¢â‚¬ which is carried in the same channel as the VCL NAL units. It can also be sent Ã¢â‚¬Å"out-of-bandÃ¢â‚¬ using reliable transport protocol. Therefore, it enhances the resiliency towards data and error loss. Ã‚ · Flexible Macroblock Ordering (FMO) FMO maps the macroblocks to different slice groups. In the event of any slice group loss, missing data is masked up by interpolating from the other slice groups. Ã‚ · Redundancy Slices (RS) Redundant representation of the picture can be stored in the redundant slices. If the loss of the original slice occurs, the decoder can make use of the redundant slices to recover the original slice. These techniques introduced in the H.264/AVC makes the codec more robust and resilient towards data and error loss. 2.1.2. Profiles and Levels A profile of a codec is defined as the set of features identified to meet a certain specifications of intended applications For the H.264/AVC codec, it is defined as a set of features identified to generate a conforming bit stream. A level is imposes restrictions on some key parameters of the bit stream. In H.264/AVC, there are three profiles namely: Baseline, Main and Extended. 5 shows the relationship between these profiles. The Baseline profile is most likely to be used by network cameras and encoders as it requires limited computing resources. It is quite ideal to make use of this profile to support real-time streaming applications in a embedded platform. 2.2. Overview of Video Streaming In previous systems, accessing video data across network exploit the Ã¢â‚¬Ëœdownload and play approach. In this approach, the client had to wait until the whole video data is downloaded to the media player before play out begins. To combat the long initial play out delay, the concept of streaming was introduced. Streaming allows the client to play out the earlier part of the video data whilst still transferring the remaining part of the video data. The major advantage of the streaming concept is that the video data need not be stored in the clients computer as compared to the traditional Ã¢â‚¬Ëœdownload and play approach. This reduces the long initial play out delay experienced by the client. Streaming adopts the traditional client/server model. The client connects to the listening server and request for video data. The server sends video data over to the client for play out of video data. 2.2.1. Types of Streaming There are three different types of streaming video data. They are pre-recorded/ file streaming, live/real-time streaming and interactive streaming. * Pre-recorded/live streaming: The encoded video is stored into a file and the system streams the file over the network. A major overhead is that there is a long initial play out delay (10-15s) experienced by the client. * Live/real-time streaming: The encoded video is streamed over the network directly without being stored into a file. The initial play out delay reduces. Consideration must be taken to ensure that play out rate does not exceed sending rate which may result in jerky the picture. On the other hand, if the sending rate is too slow, the packets arriving at the client may be dropped, causing in a freezing the picture. The timing requirement for the end-to-end delay is more stringent in this scenario. * Interactive streaming: Like live streaming, the video is streamed directly over the network. It responds to users control input such as rewind, pause, stop, play and forward the particular video stream. The system should respond in accordance to those inputs by the user. In this project, both pre-recorded and live streaming are implemented. Some functionality of interactive streaming controls like stop and play are also part of the system. 2.2.2. Video Streaming System modules Video Source The intent of the video source is to capture the raw video sequence. The CCTV camera is used as the video source in this project. Most cameras are of analogue inputs and these inputs are connected to the encoding station via video connections. This project makes use of only one video source due to the limitation of the video connections on the encoding station. The raw video sequence is then passed onto the encoding station. Encoding Station The aim of the encoding station digitized and encodes the raw video sequence into the desired format. In the actual system, the encoding is done by the DM6446 board into the H.264/AVC format. Since the hardware encoding is CPU intensive, this forms the bottleneck of the whole streaming system. The H.264 video is passed onto the video streamer server module of the system. Video Streaming and WebServer The role of the video streaming server is to packetize the H.264/AVC to be streamed over the network. It serves the requests from individual clients. It needs to support the total bandwidth requirements of the particular video stream requested by clients. WebServer offers a URL link which connects to the video streaming server. For this project, the video streaming server module is embedded inside DM6446 board and it is serves every individual clients requests. Video Player The video player acts a client connecting to and requesting video data from the video streaming server. Once the video data is received, the video player buffers the data for a while and then begins play out of data. The video player used for this project is the VideoLAN (VLC) Player. It has the relevant H.264/AVC codec so that it can decode and play the H264/AVC video data. 2.2.3. Unicast VS Multicast There are two key delivery techniques employed by streaming media distribution. Unicast transmission is the sending of data to one particular network destination host over a packet switched network. It establishes two way point-to-point connection between client and server. The client communicates directly with the server via this connection. The drawback is that every connection receives a separate video stream which uses up network bandwidth rapidly. Multicast transmission is the sending of only one copy of data via the network so that many clients can receive simultaneously. In video streaming, it is more cost effective to send single copy of video data over the network so as to conserve the network bandwidth. Since multicast is not connection oriented, the clients cannot control the streams that they can receive. In this project, unicast transmission is used to stream encoded video over the network. The client connects directly to the DM6446 board where it gets the encoded video data. The project can easily be extended to multicast transmission. 2.3. Streaming Protocols When streaming video content over a network, a number of network protocols are used. These protocols are well defined by the Internet Engineering Task Force (IETF) and the Internet Society (IS) and documented in Request for Comments (RFC) documents. These standards are adopted by many developers today. In this project, the same standards are also employed in order to successfully stream H.264/AVC content over a simple Local Area Network (LAN). The following sections will discuss about the various protocols that are studied in the course of this project. 2.3.1. Real-Time Streaming Protocol (RTSP) The most commonly used application layer protocol is RTSP. RTSP acts a control protocol to media streaming servers. It establishes connection between two end points of the system and control media sessions. Clients issue VCR-like commands like play and pause to facilitate the control of real-time playback of media streams from the servers. However, this protocol is not involved in the transport of the media stream over the network. For this project, RTSP version 1.0 is used. RTSP States Like the Hyper Text Transfer Protocol (HTTP), it contains several methods. They are OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, RECORD and TEARDOWN. These commands are sent by using the RTSP URL. The default port number used in this protocol is 554. An example of such as URL is: rtsp:// Ã‚ · OPTIONS: An OPTIONS request returns the types of request that the server will accept. An example of the request is: OPTIONS rtsp://155.69.148.136:554/test.264 RTSP/1.0 CSeq: 1rn User-agent: VLC media Player The CSeq parameter keeps track of the number of request send to the server and it is incremented every time a new request is issued. The User-agent refers to the client making the request. * DESCRIBE: This method gets the presentation or the media object identified in the request URL from the server. An example of such a request: DESCRIBE rtsp://155.69.148.138:554/test.264 RTSP/1.0 CSeq: 2rn Accept: application/sdprn User agent: VLC media Player The Accept header is used to describe the formats understood by the client. All the initialization of the media resource must be present in the DESCRIBE method that it describes. Ã‚ · SETUP: This method will specify the mode of transport mechanism to be used for the media stream. A typical example is: SETUP rtsp://155.69.148.138:554/test.264 RTSP/1.0 CSeq: 3rn Transport: RTP/AVP; unicast; client_port = 1200-1201 User agent: VLC media Player The Transport header specifies the transport mechanism to be used. In this case, real-time transport protocol is used in a unicast manner. The relevant client port number is also reflected and it is selected randomly by the server. Since RTSP is a stateful protocol, a session is created upon successful acknowledgement to this method. Ã‚ · PLAY: This method request the server to start sending the data via the transport mechanism stated in the SETUP method. The URL is the same as the other methods except for: Session: 6 Range: npt= 0.000- rn The Session header specifies the unique session id. This is important as server may establish various sessions and this keep tracks of them. The Range header positions play time to the beginning and plays till the end of the range. * PAUSE: This method informs the server to pause sending of the media stream. Once the PAUSE request is sent, the range header will capture the position at which the media stream is paused. When a PLAY request is sent again, the client will resume playing from the current position of the media stream as specified in the range header. RSTP Status Codes Whenever the client sends a request message to the server, the server forms a equivalent response message to be sent to the client. The response codes are similar to HTTP as they are both in ASCII text. They are as follows: 200: OK 301: Redirection 405: Method Not Allowed 451: Parameter Not Understood 454: Session Not Found 457: Invalid Range 461: Unsupported Transport 462: Destination Unreachable These are some of the RTSP status codes. There are many others but the codes mentioned above are of importance in the context of this project. 2.3.2. Real-time Transport Protocol (RTP) RTP is a defined packet structure which is used for transporting media stream over the network. It is a transport layer protocol but developers view it as a application layer protocol stack. This protocol facilitates jitter compensation and detection of incorrect sequence arrival of data which is common for transmission over IP network. For the transmission of media data over the network, it is important that packets arrive in a timely manner as it is loss tolerant but not delay tolerant. Due to the high latency of Transmission Control Protocol in establishing connections, RTP is often built on top of the User Datagram Protocol (UDP). RTP also supports multicast transmission of data. RTP is also a stateful protocol as a session is established before data can be packed into the RTP packet and sent over the network. The session contains the IP address of the destination and port number of the RTP which is usually an even number. The following section will explain about the packet structure of RTP which is used for transmission. RTP Packet Structure The below shows a RTP packet header which is appended in front of the media data.s The minimum size of the RTP header is 12 bytes.. Optional extension information may be present after the header information. The fields of the header are: Ã‚ · V: (2 bits) to indicate the version number of the protocol. Version used in this project is 2. Ã‚ · P (Padding): (1 bit) to indicate if there padding which can be used for encryption algorithm Ã‚ · X (Extension): (1 bit) to indicate if there is extension information between header and payload data. Ã‚ · CC (CSRC Count) : (4 bits) indicates the number of CSRC identifiers Ã‚ · M (Marker): (1 bit) used by application to indicate data has specific relevance in the perspective of the application. The setting for M bit marks the end of video data in this project Ã‚ · PT (Payload Type): (7 bits) to indicate the type of payload data carried by the packet. H.264 is used for this project Ã‚ · Sequence number: (16 bits) incremented by one for every RTP packet. It is used to detect packet loss and out of sequence packet arrival. Based on this information, application can take appropriate action to correct them. Ã‚ · Time Stamp: (32 bits) receivers use this information to play samples at correct intervals of time. Each stream has independent time stamps. Ã‚ · SSRC: (32 bits) it unique identifies source of the stream. Ã‚ · CSRC: sources of a stream from different sources are enumerated according to its source IDs. This project does not involve the use of Extension field in the packet header and hence will not be explained in this report. Once this header information is appended to the payload data, the packet is sent over the network to the client to be played. The table below summarizes the payload types of RTP and highlighted region is of interest in this project. Table 2: Payload Types of RTP Packets 2.3.3. RTP Control Protocol (RTCP) RTCP is a sister protocol which is used in conjunction with the RTP. It provides out-of-band statistical and control information to the RTP session. This provides certain Quality of Service (QoS) for transmission of video data over the network. The primary functions of the RTCP are: * To gather statistical information about the quality aspect of the media stream during a RTP session. This data is sent to the session media source and its participants. The source can exploit this information for adaptive media encoding and detect transmission errors. * It provides canonical end point identifiers (CNAME) to all its session participants. It allows unique identification of end points across different application instances and serves as a third party monitoring tool. * It also sends RTCP reports to all its session participants. By doing so, the traffic bandwidth increases proportionally. In order to avoid congestion, RTCP has bandwidth management techniques to only use 5% of the total session bandwidth. RTCP statistical data is sent odd numbered ports. For instance, if RTP port number is 196, then RTCP will use the 197 as its port number. There is no default port number assigned to RTCP. RTCP Message Types RTCP sends several types of packets different from RTP packets. They are sender report, receiver report, source description and bye. Ã‚ · Sender Report (SR): Sent periodically by senders to report the transmission and reception statistics of RTP packets sent in a period of time. It also includes the senders SSRC and senders packet count information. The timestamp of the RTP packet is also sent to allow the receiver to synchronize the RTP packets. The bandwidth required for SR is 25% of RTCP bandwidth. Ã‚ · Receiver Report (RR): It reports the QoS to other receivers and senders. Information like highest sequence number received, inter arrival jitter of RTP packets and fraction of packets loss further explains the QoS of the transmitted media streams. The bandwidth required for RR is 75% of the RTCP bandwidth. Ã‚ · Source Description (SDES): Sends the CNAME to its session participants. Additional information like name, address of the owner of the source can also be sent. Ã‚ · End of Participation (BYE): The source sends a BYE message to indicate that it is shutting down the stream. It serves as an announcement that a particular end point is leaving the conference. Further RTCP Consideration This protocol is important to ensure that QoS standards are achieved. The acceptable frequencies of these reports are less than one minute. In major application, the frequency may increase as RTCP bandwidth control mechanism. Then, the statistical reporting on the quality of the media stream becomes inaccurate. Since there are no long delays introduced between the reports in this project, the RTCP is adopted to incorporate a certain level of QoS on streaming H.264/AVC video over embedded platform. 2.3.4. Session Description Protocol (SDP) The Session Description Protocol is a standard to describe streaming media initialization parameters. These initializations describe the sessions for session announcement, session invitation and parameter negotiation. This protocol can be used together with RTSP. In the previous sections of this chapter, SDP is used in the DESCRIBE state of RTSP to get sessions media initialization parameters. SDP is scalable to include different media types and formats. SDP Syntax The session is described by attribute/value pairs. The syntax of SDP are summarized in the below. In this project, the use of SDP is important in streaming as the client is VLC Media Player. If the streaming is done via RTSP, then VLC expects a sdp description from the server in order to setup the session and facilitate the playback of the streaming media. Chapter 3: Hardware Literature Review 3.1. Introduction to Texas Instrument DM6446EVM DavinciTM The development of this project based on the DM6446EVM board. It is necessary to understand the hardware and software aspects of this board. The DM6446 board has a ARM processor operating at a clock speed up to 300MHz and a C64x Digital Signal Processor operating at a clock speed of up to 600MHz. 3.1.1. Key Features of DM6446 The key features that are shown in the above are: * 1 video port which supports composite of S video * 4 video DAC outputs: component, RGB, composite * 256 MB of DDR2 DRAM * UART, Media Card interface (SD, xD, SM, MS ,MMC Cards) * 16 MB of non-volatile Flash Memory, 64 MB NAND Flash, 4 MB SRAM * USB2 interface * 10/100 MBS Ethernet interface * Configurable boot load options * IR Remote Interface, real time clock via MSP430 3.1.2. DM6446EVM Architecture The architecture of the DM6446 board is organized into several subsystems. By knowing the architecture of the DM6446, the developer can then design and built his application module on the boards underlining architecture. The shows that DM6446 has three subsystems which are connected to the underlying hardware peripherals. This provides a decoupled architecture which allows the developers to implement his applications on a particular subsystem without having to modify the other subsystems. Some of subsystems are discussed in the next sections. ARM Subsystem The ARM subsystem is responsible for the master control of the DM6446 board. It handles the system-level initializations, configurations, user interface, connectivity functions and control of DSP subsystems. The ARM has a larger program memory space and better context switching capabilities and hence it is more suited to handle complex and multi tasks of the system. DSP Subsystem The DSP subsystem is mainly the encoding the raw captured video frames into the desired format. It performs several number crunching operations in order to achieve the desired compression technique. It works together with the Video Imaging Coprocessor to compress the video frames. Video Imaging Coprocessor (VICP) The VICP is a signal processing library which contains various software algorithms that execute on VICP hardware accelerator. It helps the DSP by taking over computation of varied intensive tasks. Since hardware implementation of number cru H.264 Video Streaming System on Embedded Platform H.264 Video Streaming System on Embedded Platform ABSTRACT The adoption of technological products like digital television and video conferencing has made video streaming an active research area. This report presents the integration of a video streamer module into a baseline H.264/AVC encoder running a TMSDM6446EVM embedded platform. The main objective of this project is to achieve real-time streaming of the baseline H.264/AVC video over a local area network (LAN) which is a part of the surveillance video system. The encoding of baseline H.264/AVC and the hardware components of the platform are first discussed. Various streaming protocols are studied in order to implement the video streamer on the DM6446 board. The multi-threaded application encoder program is used to encode raw video frames into H.264/AVC format onto a file. For the video streaming, open source Live555 MediaServer was used to stream video data to a remote VLC client over LAN. Initially, file streaming was implemented from PC to PC. Upon successfully implementation on PC, the video streamer was ported to the board. The steps involved in porting the Live555 application were also described in the report. Both unicast and multicast file streaming were implemented in the video streamer. Due to the problems of file streaming, the live streaming approach was adopted. Several methodologies were discussed in integrating the video streamer and the encoder program. Modification was made both the encoder program and the Live555 application to achieve live streaming of H.264/AVC video. Results of both file and live streaming will be shown in this report. The implemented video streamer module will be used as a base module of the video surveillance system. Chapter 1: Introduction 1.1. Background Significant breakthroughs have been made over the last few years in the area of digital video compression technologies. As such applications making use of these technologies have also become prevalent and continue to be of active research topics today. For example, digital television and video conferencing are some of the applications that are now commonly encountered in our daily lives. One application of interest here is to make use of the technologies to implement a video camera surveillance system which can enhance the security of consumers business and home environment. In typical surveillance systems, the captured video is sent over a cable networks to be monitored and stored at remote stations. As the captured raw video contains large amount of data, it will be of advantage to first compress the data by using a compression technique before it is transferred over the network. One such compression technique that is suitable for this type of application is the H.264 coding standard. H.264 coding is better than the other coding technique for video streaming as it is more robust to data losses and coding efficiency, which are important factors when streaming is performed over a shared Local Area Network. As there is an increasing acceptance of H.264 coding and the availability of high computing power embedded systems, digital video surveillance system based on H.264 on embedded platform is hence a feasible and a potentially more cost-effective system. Implementing a H.264 video streaming system on an embedded platform is a logical extension of video surveillance systems which are still typical implemented using high computing power stations (e.g. PC). In a embedded version, a Digital Signal Processor (DSP) forms the core of the embedded system and executes the intensive signal processing algorithm. Current embedded systems typical also include network features which enable the implementation of data streaming applications. To facilitate data streaming, a number of network protocol standards have also being defined, and are currently used for digital video applications. 1.2. Objective and Scope The objective of this final year project is to implement a video surveillance system based on the H.264 coding standard running on an embedded platform. Such a system contains extensive scopes of functionalities and would require extensive amount of development time if implemented from scratch. Hence this project is to focus on the data streaming aspect of a video surveillance system. After some initial investigation and experimentation, it is decided to confine the main scope of the project to developing a live streaming H.264 based video system running on a DM6446 EVM development platform. The breakdown of the work to be progressive performed are then identified as follows: 1. Familiarization of open source live555 streaming media server Due to the complexity of implementing the various standard protocols needed for multimedia streaming, the live555 media server program is used as a base to implement the streaming of the H.264.based video data. 2. Streaming of stored H.264 file over the network The live555 is then modified to support streaming of raw encoded H.264 file from the DM6446 EVM board over the network. Knowledge of H.264 coding standard is necessary in order to parse the file stream before streaming over the network. 3. Modifying a demo version of an encoder program and integrating it together with live555 to achieve live streaming The demo encoder was modified to send encoded video data to the Live555 program which would do the necessary packetization to be streamed over the network. Since data is passed from one process to another, various inter-process communication techniques were studied and used in this project. 1.3. Resources The resources used for this project are as follows: 1. DM6446 (DaVinciÃ¢â€ž ¢) Evaluation Module 2. SWANN C500 Professional CCTV Camera Solution 400 TV Lines CCD Color Camera 3. LCD Display 4. IR Remote Control 5. TI Davinci demo version of MontaVista Linux Pro v4.0 6. A Personal Workstation with Centos v5.0 7. VLC player v.0.9.8a as client 8. Open source live555 program (downloaded from www.live555.com) The system setup of this project is shown below: 1.4. Report Organization This report consists of 7 chapters. Chapter 1 introduces the motivation behind embedded video streaming system and defines the scope of the project. Chapter 2 illustrates the video literature review of the H.264/AVC video coding technique and the various streaming protocols which are to be implemented in the project. Chapter 3 explains the hardware literature review of the platform being used in the project. The architecture, memory management, inter-process communication and the software tools are also discussed in this chapter. Chapter 4 explains the execution of the encoder program of the DM6446EVM board. The interaction of the various threads in this multi-threaded application is also discussed to fully understand the encoder program. Chapter 5 gives an overview of the Live555 MediaServer which is used as a base to implement the video streamer module on the board. Adding support to unicast and multicast streaming, porting of live555 to the board and receiving video stream on remote VCL client are explained in this chapter. Chapter 6 explains the limitations of file streaming and moving towards live streaming system. Various integration methodologies and modification to both encoder program and live555 program are shown as well. Chapters 7 summarize the implementation results of file and live streaming, analysis the performance of these results. Chapter 8 gives the conclusion by stating the current limitation and problems, scope for future implementation. Chapter 2: Video Literature Review 2.1. H.264/AVC Video Codec Overview H.264 is the most advanced and latest video coding technique. Although there are many video coding schemes like H.26x and MPEG, H.264/AVC made many improvements and tools for coding efficiency and error resiliency. This chapter briefly will discuss the network aspect of the video coding technique. It will also cover error resiliency needed for transmission of video data over the network. For a more detailed explanation of the H.264/AVC, refer to appendix A. 2.1.1. Network Abstraction Layer (NAL) The aim of the NAL is to ensure that the data coming from the VCL layer is Ã¢â‚¬Å"network worthyÃ¢â‚¬ so that the data can be used for numerous systems. NAL facilitates the mapping of H.264/AVC VCL data for different transport layers such as: * RTP/IP real-time streaming over wired and wireless mediums * Different storage file formats such as MP4, MMS, AVI and etc. The concepts of NAL and error robustness techniques of the H.264/AVC will be discussed in the following parts of the report. NAL Units The encoded data from the VCL are packed into NAL units. A NAL unit represents a packet which makes up of a certain number of bytes. The first byte of the NAL unit is called the header byte which indicates the data type of the NAL unit. The remaining bytes make up the payload data of the NAL unit. The NAL unit structure allows provision for different transport systems namely packet-oriented and bit stream-oriented. To cater for bit stream-oriented transport systems like MPEG-2, the NAL units are organized into byte stream format. These units are prefixed by a specific start code prefix of three bytes which is namely 0x000001. The start code prefix indicates and the start of each NAL units and hence defining the boundaries of the units. For packet-oriented transport systems, the encoded video data are transported via packets defined by transport protocols. Hence, the boundaries of the NAL units are known without having to include start code prefix byte. The details of packetization of NAL units will be discussed in later sections of the report. NAL units are further categorized into two types: * VCL unit: comprises of encoded video data Ã‚ · Non-VCL unit: comprises of additional information like parameter sets which is the important header information. Also contains supplementary enhancement information (SEI) which contains the timing information and other data which increases the usability of the decoded video signal. Access units A group of NAL units which adhere to a certain form is called a access unit. When one access unit is decoded, one decoded picture is formed. In the table 1 below, the functions of the NAL units derived from the access units are explained. Data/Error robustness techniques H.264/AVC has several techniques to mitigate error/data loss which is an essential quality when it comes to streaming applications. The techniques are as follows: Ã‚ · Parameter sets: contains information that is being applied to large number of VCL NAL units. It comprises of two kinds of parameter sets: Sequence Parameter set (SPS) : Information pertaining to sequence of encoded picture Picture Parameter Set (PPS) : Information pertaining to one or more individual pictures The above mentioned parameters hardly changes and hence it need not be transmitted repeatedly and saves overhead. The parameter sets can be sent Ã¢â‚¬Å"in-bandÃ¢â‚¬ which is carried in the same channel as the VCL NAL units. It can also be sent Ã¢â‚¬Å"out-of-bandÃ¢â‚¬ using reliable transport protocol. Therefore, it enhances the resiliency towards data and error loss. Ã‚ · Flexible Macroblock Ordering (FMO) FMO maps the macroblocks to different slice groups. In the event of any slice group loss, missing data is masked up by interpolating from the other slice groups. Ã‚ · Redundancy Slices (RS) Redundant representation of the picture can be stored in the redundant slices. If the loss of the original slice occurs, the decoder can make use of the redundant slices to recover the original slice. These techniques introduced in the H.264/AVC makes the codec more robust and resilient towards data and error loss. 2.1.2. Profiles and Levels A profile of a codec is defined as the set of features identified to meet a certain specifications of intended applications For the H.264/AVC codec, it is defined as a set of features identified to generate a conforming bit stream. A level is imposes restrictions on some key parameters of the bit stream. In H.264/AVC, there are three profiles namely: Baseline, Main and Extended. 5 shows the relationship between these profiles. The Baseline profile is most likely to be used by network cameras and encoders as it requires limited computing resources. It is quite ideal to make use of this profile to support real-time streaming applications in a embedded platform. 2.2. Overview of Video Streaming In previous systems, accessing video data across network exploit the Ã¢â‚¬Ëœdownload and play approach. In this approach, the client had to wait until the whole video data is downloaded to the media player before play out begins. To combat the long initial play out delay, the concept of streaming was introduced. Streaming allows the client to play out the earlier part of the video data whilst still transferring the remaining part of the video data. The major advantage of the streaming concept is that the video data need not be stored in the clients computer as compared to the traditional Ã¢â‚¬Ëœdownload and play approach. This reduces the long initial play out delay experienced by the client. Streaming adopts the traditional client/server model. The client connects to the listening server and request for video data. The server sends video data over to the client for play out of video data. 2.2.1. Types of Streaming There are three different types of streaming video data. They are pre-recorded/ file streaming, live/real-time streaming and interactive streaming. * Pre-recorded/live streaming: The encoded video is stored into a file and the system streams the file over the network. A major overhead is that there is a long initial play out delay (10-15s) experienced by the client. * Live/real-time streaming: The encoded video is streamed over the network directly without being stored into a file. The initial play out delay reduces. Consideration must be taken to ensure that play out rate does not exceed sending rate which may result in jerky the picture. On the other hand, if the sending rate is too slow, the packets arriving at the client may be dropped, causing in a freezing the picture. The timing requirement for the end-to-end delay is more stringent in this scenario. * Interactive streaming: Like live streaming, the video is streamed directly over the network. It responds to users control input such as rewind, pause, stop, play and forward the particular video stream. The system should respond in accordance to those inputs by the user. In this project, both pre-recorded and live streaming are implemented. Some functionality of interactive streaming controls like stop and play are also part of the system. 2.2.2. Video Streaming System modules Video Source The intent of the video source is to capture the raw video sequence. The CCTV camera is used as the video source in this project. Most cameras are of analogue inputs and these inputs are connected to the encoding station via video connections. This project makes use of only one video source due to the limitation of the video connections on the encoding station. The raw video sequence is then passed onto the encoding station. Encoding Station The aim of the encoding station digitized and encodes the raw video sequence into the desired format. In the actual system, the encoding is done by the DM6446 board into the H.264/AVC format. Since the hardware encoding is CPU intensive, this forms the bottleneck of the whole streaming system. The H.264 video is passed onto the video streamer server module of the system. Video Streaming and WebServer The role of the video streaming server is to packetize the H.264/AVC to be streamed over the network. It serves the requests from individual clients. It needs to support the total bandwidth requirements of the particular video stream requested by clients. WebServer offers a URL link which connects to the video streaming server. For this project, the video streaming server module is embedded inside DM6446 board and it is serves every individual clients requests. Video Player The video player acts a client connecting to and requesting video data from the video streaming server. Once the video data is received, the video player buffers the data for a while and then begins play out of data. The video player used for this project is the VideoLAN (VLC) Player. It has the relevant H.264/AVC codec so that it can decode and play the H264/AVC video data. 2.2.3. Unicast VS Multicast There are two key delivery techniques employed by streaming media distribution. Unicast transmission is the sending of data to one particular network destination host over a packet switched network. It establishes two way point-to-point connection between client and server. The client communicates directly with the server via this connection. The drawback is that every connection receives a separate video stream which uses up network bandwidth rapidly. Multicast transmission is the sending of only one copy of data via the network so that many clients can receive simultaneously. In video streaming, it is more cost effective to send single copy of video data over the network so as to conserve the network bandwidth. Since multicast is not connection oriented, the clients cannot control the streams that they can receive. In this project, unicast transmission is used to stream encoded video over the network. The client connects directly to the DM6446 board where it gets the encoded video data. The project can easily be extended to multicast transmission. 2.3. Streaming Protocols When streaming video content over a network, a number of network protocols are used. These protocols are well defined by the Internet Engineering Task Force (IETF) and the Internet Society (IS) and documented in Request for Comments (RFC) documents. These standards are adopted by many developers today. In this project, the same standards are also employed in order to successfully stream H.264/AVC content over a simple Local Area Network (LAN). The following sections will discuss about the various protocols that are studied in the course of this project. 2.3.1. Real-Time Streaming Protocol (RTSP) The most commonly used application layer protocol is RTSP. RTSP acts a control protocol to media streaming servers. It establishes connection between two end points of the system and control media sessions. Clients issue VCR-like commands like play and pause to facilitate the control of real-time playback of media streams from the servers. However, this protocol is not involved in the transport of the media stream over the network. For this project, RTSP version 1.0 is used. RTSP States Like the Hyper Text Transfer Protocol (HTTP), it contains several methods. They are OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, RECORD and TEARDOWN. These commands are sent by using the RTSP URL. The default port number used in this protocol is 554. An example of such as URL is: rtsp:// Ã‚ · OPTIONS: An OPTIONS request returns the types of request that the server will accept. An example of the request is: OPTIONS rtsp://155.69.148.136:554/test.264 RTSP/1.0 CSeq: 1rn User-agent: VLC media Player The CSeq parameter keeps track of the number of request send to the server and it is incremented every time a new request is issued. The User-agent refers to the client making the request. * DESCRIBE: This method gets the presentation or the media object identified in the request URL from the server. An example of such a request: DESCRIBE rtsp://155.69.148.138:554/test.264 RTSP/1.0 CSeq: 2rn Accept: application/sdprn User agent: VLC media Player The Accept header is used to describe the formats understood by the client. All the initialization of the media resource must be present in the DESCRIBE method that it describes. Ã‚ · SETUP: This method will specify the mode of transport mechanism to be used for the media stream. A typical example is: SETUP rtsp://155.69.148.138:554/test.264 RTSP/1.0 CSeq: 3rn Transport: RTP/AVP; unicast; client_port = 1200-1201 User agent: VLC media Player The Transport header specifies the transport mechanism to be used. In this case, real-time transport protocol is used in a unicast manner. The relevant client port number is also reflected and it is selected randomly by the server. Since RTSP is a stateful protocol, a session is created upon successful acknowledgement to this method. Ã‚ · PLAY: This method request the server to start sending the data via the transport mechanism stated in the SETUP method. The URL is the same as the other methods except for: Session: 6 Range: npt= 0.000- rn The Session header specifies the unique session id. This is important as server may establish various sessions and this keep tracks of them. The Range header positions play time to the beginning and plays till the end of the range. * PAUSE: This method informs the server to pause sending of the media stream. Once the PAUSE request is sent, the range header will capture the position at which the media stream is paused. When a PLAY request is sent again, the client will resume playing from the current position of the media stream as specified in the range header. RSTP Status Codes Whenever the client sends a request message to the server, the server forms a equivalent response message to be sent to the client. The response codes are similar to HTTP as they are both in ASCII text. They are as follows: 200: OK 301: Redirection 405: Method Not Allowed 451: Parameter Not Understood 454: Session Not Found 457: Invalid Range 461: Unsupported Transport 462: Destination Unreachable These are some of the RTSP status codes. There are many others but the codes mentioned above are of importance in the context of this project. 2.3.2. Real-time Transport Protocol (RTP) RTP is a defined packet structure which is used for transporting media stream over the network. It is a transport layer protocol but developers view it as a application layer protocol stack. This protocol facilitates jitter compensation and detection of incorrect sequence arrival of data which is common for transmission over IP network. For the transmission of media data over the network, it is important that packets arrive in a timely manner as it is loss tolerant but not delay tolerant. Due to the high latency of Transmission Control Protocol in establishing connections, RTP is often built on top of the User Datagram Protocol (UDP). RTP also supports multicast transmission of data. RTP is also a stateful protocol as a session is established before data can be packed into the RTP packet and sent over the network. The session contains the IP address of the destination and port number of the RTP which is usually an even number. The following section will explain about the packet structure of RTP which is used for transmission. RTP Packet Structure The below shows a RTP packet header which is appended in front of the media data.s The minimum size of the RTP header is 12 bytes.. Optional extension information may be present after the header information. The fields of the header are: Ã‚ · V: (2 bits) to indicate the version number of the protocol. Version used in this project is 2. Ã‚ · P (Padding): (1 bit) to indicate if there padding which can be used for encryption algorithm Ã‚ · X (Extension): (1 bit) to indicate if there is extension information between header and payload data. Ã‚ · CC (CSRC Count) : (4 bits) indicates the number of CSRC identifiers Ã‚ · M (Marker): (1 bit) used by application to indicate data has specific relevance in the perspective of the application. The setting for M bit marks the end of video data in this project Ã‚ · PT (Payload Type): (7 bits) to indicate the type of payload data carried by the packet. H.264 is used for this project Ã‚ · Sequence number: (16 bits) incremented by one for every RTP packet. It is used to detect packet loss and out of sequence packet arrival. Based on this information, application can take appropriate action to correct them. Ã‚ · Time Stamp: (32 bits) receivers use this information to play samples at correct intervals of time. Each stream has independent time stamps. Ã‚ · SSRC: (32 bits) it unique identifies source of the stream. Ã‚ · CSRC: sources of a stream from different sources are enumerated according to its source IDs. This project does not involve the use of Extension field in the packet header and hence will not be explained in this report. Once this header information is appended to the payload data, the packet is sent over the network to the client to be played. The table below summarizes the payload types of RTP and highlighted region is of interest in this project. Table 2: Payload Types of RTP Packets 2.3.3. RTP Control Protocol (RTCP) RTCP is a sister protocol which is used in conjunction with the RTP. It provides out-of-band statistical and control information to the RTP session. This provides certain Quality of Service (QoS) for transmission of video data over the network. The primary functions of the RTCP are: * To gather statistical information about the quality aspect of the media stream during a RTP session. This data is sent to the session media source and its participants. The source can exploit this information for adaptive media encoding and detect transmission errors. * It provides canonical end point identifiers (CNAME) to all its session participants. It allows unique identification of end points across different application instances and serves as a third party monitoring tool. * It also sends RTCP reports to all its session participants. By doing so, the traffic bandwidth increases proportionally. In order to avoid congestion, RTCP has bandwidth management techniques to only use 5% of the total session bandwidth. RTCP statistical data is sent odd numbered ports. For instance, if RTP port number is 196, then RTCP will use the 197 as its port number. There is no default port number assigned to RTCP. RTCP Message Types RTCP sends several types of packets different from RTP packets. They are sender report, receiver report, source description and bye. Ã‚ · Sender Report (SR): Sent periodically by senders to report the transmission and reception statistics of RTP packets sent in a period of time. It also includes the senders SSRC and senders packet count information. The timestamp of the RTP packet is also sent to allow the receiver to synchronize the RTP packets. The bandwidth required for SR is 25% of RTCP bandwidth. Ã‚ · Receiver Report (RR): It reports the QoS to other receivers and senders. Information like highest sequence number received, inter arrival jitter of RTP packets and fraction of packets loss further explains the QoS of the transmitted media streams. The bandwidth required for RR is 75% of the RTCP bandwidth. Ã‚ · Source Description (SDES): Sends the CNAME to its session participants. Additional information like name, address of the owner of the source can also be sent. Ã‚ · End of Participation (BYE): The source sends a BYE message to indicate that it is shutting down the stream. It serves as an announcement that a particular end point is leaving the conference. Further RTCP Consideration This protocol is important to ensure that QoS standards are achieved. The acceptable frequencies of these reports are less than one minute. In major application, the frequency may increase as RTCP bandwidth control mechanism. Then, the statistical reporting on the quality of the media stream becomes inaccurate. Since there are no long delays introduced between the reports in this project, the RTCP is adopted to incorporate a certain level of QoS on streaming H.264/AVC video over embedded platform. 2.3.4. Session Description Protocol (SDP) The Session Description Protocol is a standard to describe streaming media initialization parameters. These initializations describe the sessions for session announcement, session invitation and parameter negotiation. This protocol can be used together with RTSP. In the previous sections of this chapter, SDP is used in the DESCRIBE state of RTSP to get sessions media initialization parameters. SDP is scalable to include different media types and formats. SDP Syntax The session is described by attribute/value pairs. The syntax of SDP are summarized in the below. In this project, the use of SDP is important in streaming as the client is VLC Media Player. If the streaming is done via RTSP, then VLC expects a sdp description from the server in order to setup the session and facilitate the playback of the streaming media. Chapter 3: Hardware Literature Review 3.1. Introduction to Texas Instrument DM6446EVM DavinciTM The development of this project based on the DM6446EVM board. It is necessary to understand the hardware and software aspects of this board. The DM6446 board has a ARM processor operating at a clock speed up to 300MHz and a C64x Digital Signal Processor operating at a clock speed of up to 600MHz. 3.1.1. Key Features of DM6446 The key features that are shown in the above are: * 1 video port which supports composite of S video * 4 video DAC outputs: component, RGB, composite * 256 MB of DDR2 DRAM * UART, Media Card interface (SD, xD, SM, MS ,MMC Cards) * 16 MB of non-volatile Flash Memory, 64 MB NAND Flash, 4 MB SRAM * USB2 interface * 10/100 MBS Ethernet interface * Configurable boot load options * IR Remote Interface, real time clock via MSP430 3.1.2. DM6446EVM Architecture The architecture of the DM6446 board is organized into several subsystems. By knowing the architecture of the DM6446, the developer can then design and built his application module on the boards underlining architecture. The shows that DM6446 has three subsystems which are connected to the underlying hardware peripherals. This provides a decoupled architecture which allows the developers to implement his applications on a particular subsystem without having to modify the other subsystems. Some of subsystems are discussed in the next sections. ARM Subsystem The ARM subsystem is responsible for the master control of the DM6446 board. It handles the system-level initializations, configurations, user interface, connectivity functions and control of DSP subsystems. The ARM has a larger program memory space and better context switching capabilities and hence it is more suited to handle complex and multi tasks of the system. DSP Subsystem The DSP subsystem is mainly the encoding the raw captured video frames into the desired format. It performs several number crunching operations in order to achieve the desired compression technique. It works together with the Video Imaging Coprocessor to compress the video frames. Video Imaging Coprocessor (VICP) The VICP is a signal processing library which contains various software algorithms that execute on VICP hardware accelerator. It helps the DSP by taking over computation of varied intensive tasks. Since hardware implementation of number cru

Custom annotated bibliography

Sunday, July 21, 2019

H.264 Video Streaming System on Embedded Platform

No comments:

Post a Comment