National data access policy – open issues

The National Data Sharing and Accessibility Policy 2012 was drafted to address a major gap in the current information flow between Government departments and citizens. This is a 4-year-old policy, still in draft state, on which the Government is now asking for public opinion.

You can find the complete policy document at NDSAP policy draft.

Based on my understanding of this policy, below are some areas that need further attention from the policymakers.

  • #Clause2: In addition to metadata, all data sets should be accompanied by an appropriate number of “tags”. The data sets themselves will be very large and hence difficult to search, so the producer of each data set should provide tags that make search easier. This is a common practice in many web applications these days.
  • #Clause4: The objective should include one key point: we are looking not only at large data sets generated over long durations, but also at daily, weekly and monthly data sets. In essence, the ultimate goal of this policy is to help reduce the dependence on the RTI Act to get information from the respective departments.
  • #Clause8: An additional type of access has to be called out here. As data set volumes grow into terabytes and their usefulness increases, it will be best to allow “intermediary access” to these data sets. This enables third-party companies to create commercial and non-commercial solutions for general public use. As the intermediaries themselves are not consumers of the data, it is important that we legalize such an access type.
  • #Clause9: OLAP and data warehouses are concepts that work well for structured data. The data sets received from various departments will have different structures and will not easily fit into a traditional OLAP model. Hence it is better to simply call this a large data store. Technology plays a very important role here. A web-based UI to access this data is one part of the solution. Other key areas to be mentioned are:
    • Creation of a National Data Cloud (NDC) with file access APIs. Think of this as similar to Amazon S3. The NDC should be capable of handling petabytes of data hosted within India’s land borders.
    • Another key technical feature to be called out is the Search capability based on data set Metadata and Tags
  • #Clause13: Project implementation and maintenance costs can be met partially by commercializing these data sets for third-party organizations. As much of this information will help companies decide on their product roadmaps, understand customer behavior, etc., it is possible to offset some of the expenditure by charging such companies a fee.
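
To make the tagging and search suggestions under #Clause2 and #Clause9 a little more concrete, here is a rough sketch of how a catalogue might look up data sets by tags and metadata. Everything in it is my own assumption for illustration: the `DataSet` fields, the `Catalogue` class and the sample entries are hypothetical and do not come from the policy draft.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """One catalogue entry. Field names are hypothetical, not from the policy."""
    name: str
    department: str
    frequency: str  # e.g. "daily", "weekly", "monthly", "annual"
    tags: set = field(default_factory=set)


class Catalogue:
    """In-memory index of data sets, searchable by tags and metadata fields."""

    def __init__(self):
        self._datasets = []
        self._by_tag = {}  # lower-cased tag -> list of data sets carrying it

    def add(self, dataset):
        self._datasets.append(dataset)
        for tag in dataset.tags:
            self._by_tag.setdefault(tag.lower(), []).append(dataset)

    def search(self, *tags, **metadata):
        """Return data sets that carry ALL given tags and match ALL metadata values."""
        wanted = {tag.lower() for tag in tags}
        if wanted:
            # Narrow the candidates using the tag index before filtering further.
            candidates = self._by_tag.get(next(iter(wanted)), [])
        else:
            candidates = self._datasets
        return [
            ds for ds in candidates
            if wanted <= {tag.lower() for tag in ds.tags}
            and all(getattr(ds, key, None) == value for key, value in metadata.items())
        ]


# Made-up sample entries, purely for illustration.
catalogue = Catalogue()
catalogue.add(DataSet("Rainfall by district", "Department A", "daily", {"weather", "rainfall"}))
catalogue.add(DataSet("Crop yield by state", "Department B", "annual", {"agriculture", "rainfall"}))

for match in catalogue.search("rainfall", frequency="daily"):
    print(match.name)
```

The point is only that a handful of producer-supplied tags, kept in a small index next to the metadata, lets a user find a data set without scanning the data itself. A real NDC would need a proper search service behind this, but the idea stays the same.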
