What is Open Data?

Significance and Purposes of Open Data

Background:
> Spread of broadband, together with improved performance and a wider choice of devices (an environment that allows corporations and people to handle large amounts of data).
> ICT policy planning is shifting from "sectional" efforts to stronger "cross-sectional" efforts (the importance of cross-sectional information sharing emerged in the aftermath of the Great East Japan Earthquake).
> Rising expectations for business use and other utilization of public data owned by the national government, local public bodies, independent administrative agencies, utilities, etc.
>> It is necessary to develop an environment (open data circulation environment) where data, which is currently used only within an organization or an industry, can be used effectively in society.

Significance and purposes:
   The Open Government Data Strategy (decided by the Strategic Headquarters for the Promotion of an Advanced Information and Telecommunications Network Society on July 4, 2012) lists the following three points regarding the significance and purposes of open data.

> Improvement of transparency and reliability:
   When public data is provided in formats that allow secondary use, citizens can adequately analyze and make decisions on policies and other functions of the government, either on their own or through services provided by the private sector. This improves the transparency of the government, leading to more trust among citizens.

> Facilitation of public participation as well as cooperation between public and private sectors:
   As a wide range of entities make use of public data and more information is shared between the public and private sectors, two things are facilitated: the provision of public services through public-private cooperation, and the creation of services by the private sector based on information provided by the government. This leads to a variety of original and ingenious public services being provided quickly and efficiently, allowing the government to cope appropriately with the circumstances surrounding Japan, such as limited budgets, diversifying demands and values, and advances in information and communication technology.

> Stimulation of economy as well as improved efficiency of the government:
   Providing public data in formats that allow secondary use facilitates the creation of various new businesses and makes business activities more efficient at each stage of data handling in the market, such as editing, processing, and analyzing. This in turn stimulates the entire Japanese economy. Furthermore, national and local governments can analyze or otherwise utilize public data on occasions such as public policy decisions to improve and advance their operations.

Requirements of Being Called "Open Data"

   Ministries and agencies have already begun publishing various data on their websites. However, to be called "open data" that matches the previously mentioned significance and purposes, the following must apply:

   (1) the data format must be machine-readable; and
   (2) the data should be published under a usage rule that allows secondary use.

   Meeting these two conditions enables secondary use of data without requiring much manual labor.

   (1) Machine-readable data format
   For a computer to reuse data automatically, it must be able to identify (read) the logical structure of the data and process the values in that structure (numerical values, text, and the like in a table). Machine-readable data formats can be classified into several stages. Formats such as image files and PDF files make it difficult for a computer program to identify the data in them. In these cases, data must be manually re-entered for secondary use. When the Great East Japan Earthquake occurred, national and local governments tried to make the disaster-related information they held, such as evacuation center information, known to the public. However, aggregation and secondary use of that information often consumed much time and manpower because of data format issues. In addition, when data formats are not easily machine-readable, automatic processing by applications for devices such as smartphones is extremely difficult. Under such conditions, spontaneous creation of public services by the private sector cannot be expected to any great extent.
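
   As an illustration of the point above, the following minimal Python sketch shows what "identifying the logical structure and processing the values" can look like when data is published as CSV. The sample records, the column names (name, municipality, capacity), and the function name are hypothetical and serve only as an example; they are not taken from any actual government dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: column names and values are illustrative only.
SAMPLE_CSV = """name,municipality,capacity
Center A,City X,120
Center B,City X,80
Center C,City Y,200
"""


def total_capacity_by_municipality(csv_text):
    """Sum the 'capacity' column for each municipality in a CSV document."""
    totals = defaultdict(int)
    # DictReader uses the header row to expose each record by column name,
    # which is the "logical structure" a program can rely on.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["municipality"]] += int(row["capacity"])
    return dict(totals)


if __name__ == "__main__":
    print(total_capacity_by_municipality(SAMPLE_CSV))
```

   Because the header row names each column, the program can aggregate the values with no manual re-entry. The same information published as an image or a PDF would first have to be transcribed by hand.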

   (2) Usage rule that allows secondary use:
   To establish a usage rule that allows secondary use, the data owner must explicitly state in advance that third parties may use the data, including by partially modifying it (in other words, that secondary use is permitted). For example, copyrighted works are protected by copyright. Consequently, for secondary use to become widely available, the rights holder must declare in advance that the copyright will not be exercised. A look at the current terms and conditions of each ministry's website reveals that most of them prohibit modification without permission, so they are not presented in a way that allows broad secondary use. In some cases, the terms are so sweeping that they give the impression that data that is not copyrightable material, such as numerical data, is also covered by copyright.

Data Formats and Five Stages of Open Data

   Open data differs in its degree of openness depending on conditions such as machine readability and the handling of copyright and other rights. This can be roughly categorized into five stages, as shown in the following figure.

Five stages of open data and data formats