CASE STUDY
Mobile Internet Big Data Platform in China Unicom
ABSTRACT
China Unicom, the largest WCDMA 3G operator in China, faces the demands of the historic Mobile Internet Explosion, the surge of Mobile Internet Traffic from mobile terminals. According to China Unicom's internal statistics, mobile user traffic has grown rapidly, with a Compound Annual Growth Rate (CAGR) of 135%. China Unicom currently stores more than 2 trillion records per month; the data volume exceeds 525 TB, and it has peaked at 5 PB. Because it has a long-term strategy to make full use of this Big Data, China Unicom has been developing an in-house big data storage and analysis platform based on the open source Hadoop Distributed File System (HDFS) since October 2009. All Mobile Internet Traffic is served well by this platform. Currently, the write speed has reached 1,390,000 records per second, and retrieving a record from a table that contains trillions of records takes less than 100 ms. To seize the opportunity to become a Big Data Operator, China Unicom has developed new functions and introduced multiple innovations to overcome the space and time constraints of data processing. In this paper, we introduce our big data platform in detail. Building on this platform, China Unicom is creating an industry ecosystem around Mobile Internet Big Data and believes that a telecom operator-centric ecosystem can be formed, one that is critical to prosperity in the modern communications business.
The objective of this case:
China Unicom wants to lead in embracing the Mobile Internet Explosion and is building a big data platform to address the challenges of data acquisition, data analysis, and value-added data services.
The problem in this case:
- Mobile user traffic at China Unicom is increasing rapidly, with a Compound Annual Growth Rate (CAGR) of 135%.
- China Unicom's big data platform, under development since October 2009, stores more than 2 trillion records per month; the monthly data volume exceeds 525 TB, and the data volume has peaked at 5 PB. Overall write speed has reached 1,390,000 records per second, and retrieving a record from a table that contains trillions of records takes less than 100 ms.
- For any mobile network operator, even recording only network flow data can easily produce a data repository at the terabyte level each year. If all mobile traffic data is recorded for forensic analysis, the volume can easily reach the petabyte level.
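As a rough sanity check on the figures above, the following sketch derives the average ingest rate and average record size they imply, and shows how traffic compounds at the stated CAGR. It assumes a 30-day month and decimal units (1 TB = 10^12 bytes); neither is stated in the case, and all names are illustrative.

```python
# Back-of-the-envelope check of the reported figures.
# Assumptions (not from the case): 30-day month, 1 TB = 10**12 bytes.
RECORDS_PER_MONTH = 2_000_000_000_000    # "more than 2 trillion records"
BYTES_PER_MONTH = 525 * 10**12           # "over 525 TB"
REPORTED_WRITES_PER_SECOND = 1_390_000   # reported write speed
CAGR = 1.35                              # 135% compound annual growth

SECONDS_PER_MONTH = 30 * 24 * 60 * 60

# Average rate needed to absorb one month of records.
avg_records_per_second = RECORDS_PER_MONTH / SECONDS_PER_MONTH
# Average stored size of one record.
avg_bytes_per_record = BYTES_PER_MONTH / RECORDS_PER_MONTH
# Ratio of the reported write speed to the average rate.
write_speed_ratio = REPORTED_WRITES_PER_SECOND / avg_records_per_second


def projected_traffic(current, years, cagr=CAGR):
    """Traffic after `years` of compound growth at rate `cagr`."""
    return current * (1 + cagr) ** years


if __name__ == "__main__":
    print(f"average ingest rate: {avg_records_per_second:,.0f} records/s")
    print(f"average record size: {avg_bytes_per_record:.1f} bytes")
    print(f"reported speed / average rate: {write_speed_ratio:.2f}")
    print(f"traffic multiple after 3 years: {projected_traffic(1.0, 3):.1f}x")
```

Under these assumptions the monthly totals correspond to an average of roughly 770,000 records per second and about 262 bytes per record, so the reported write speed is somewhat under twice the average rate. A CAGR of 135% means traffic multiplies by 2.35 each year.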
The solution to this case:
- Use the principle of aggregation: a user's valid behaviour data must not be lost, while invalid data should be reduced efficiently. Each file is produced in less than five minutes, and every file is smaller than 200 MB. If a single file would exceed 200 MB, multiple files are produced so that each one stays below the threshold, and the additional related files are identified by an appended hexadecimal number.
- Transmit the files over FTP to the twenty-four FTP servers located in Beijing. To reduce transmission bandwidth, all files are compressed with the bzip2 compression algorithm before they are uploaded to Beijing from every province. After being decompressed, the files are written into HBase through the native Java API that HBase provides. In HBase, an online record table is generated for each month.
- China Unicom compared the performance of Oracle and HBase by querying the records of a specified telephone number. The results are:
  - Compared with the Oracle database, China Unicom's HBase shows very consistent performance.
  - With the Oracle database, the more concurrent query transactions are run, the slower the average response time becomes, and the volume of records already in the database degrades query performance. For our optimized HBase system, by contrast, the latency of most responses is in milliseconds, and the impact of the records already in the database is quite low.
  - Because HBase is open source, China Unicom can optimize it for its own workload, whereas the Oracle database is proprietary, so China Unicom cannot optimize its code to speed up transactions in the traffic records repository.
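The file-splitting and compression steps described in the solution can be sketched as follows. This is a minimal illustration, not China Unicom's implementation: the function names, the `base_name.<hex>` naming pattern, and the reading of 200 MB as 200 × 1024 × 1024 bytes are all assumptions, since the case only says that extra files carry an appended hexadecimal number. The five-minute rollover, the FTP upload, and the HBase write are omitted.

```python
import bz2

# 200 MB cap from the case; interpreting MB as 1024 * 1024 bytes is an assumption.
MAX_FILE_BYTES = 200 * 1024 * 1024


def split_records(records, max_bytes=MAX_FILE_BYTES):
    """Group byte-string records into chunks whose total size stays below max_bytes.

    A single record that is itself >= max_bytes still gets its own chunk.
    """
    chunks, current, size = [], [], 0
    for record in records:
        # Start a new chunk when adding this record would reach the cap.
        if current and size + len(record) >= max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(record)
        size += len(record)
    if current:
        chunks.append(current)
    return chunks


def chunk_name(base_name, index):
    """First file keeps the base name; later files get a hex suffix (format assumed)."""
    return base_name if index == 0 else f"{base_name}.{index:x}"


def package(records, base_name, max_bytes=MAX_FILE_BYTES):
    """Return {file name: bzip2-compressed payload}, one entry per split file."""
    return {
        chunk_name(base_name, i) + ".bz2": bz2.compress(b"".join(chunk))
        for i, chunk in enumerate(split_records(records, max_bytes))
    }


def unpack(payload):
    """Receiver side: decompress a payload before loading its records."""
    return bz2.decompress(payload)
```

For example, calling `package(records, "traffic_0001", max_bytes=100)` on twenty-four 41-byte records yields twelve compressed files, named `traffic_0001.bz2`, `traffic_0001.1.bz2`, and so on up to `traffic_0001.b.bz2`. Splitting on record boundaries keeps every record whole, which matches the requirement that valid user data must not be lost.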