
Inside the World's Largest Biometric Database: India's Aadhar Project


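India's Unique Identification project, also known as Aadhar, finished capturing demographic and biometric data for more than half a billion residents earlier this week, making it the largest biometric project in the world today.

Aadhar has been under way for several years and has drawn steady criticism from privacy and security advocates. This week's developments raised fresh concerns about how the project captures, stores and manages data, and about the involvement of the U.S. company MongoDB.

MongoDB is a NoSQL database startup that last year raised funding from In-Q-Tel, an independent non-profit venture firm backed by the CIA and other U.S. intelligence agencies.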

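Over the past few days, several Indian media outlets have quoted political parties and activists questioning whether sensitive data is being compromised by Aadhar, which is headed by Infosys co-founder Nandan Nilekani. Some reports have linked the controversy to MongoDB.

Governments around the world have raised concerns about the PRISM program run by the U.S. National Security Agency (NSA), and anything even remotely connected to U.S. intelligence agencies is enough to cause an uproar. Moreover, with a general election due in India next year, political rhetoric has reached an all-time high.

The timing of these allegations could hardly be worse, at least for the ambitious identification project: a related bill is awaiting approval by India's parliament and, once passed, would establish the project as a fully constitutional authority.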

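I recently visited Aadhar's offices in Bangalore. According to the officials I spoke to, the truth is this: although some have alleged that contracts include provisions for sharing data with MongoDB, Aadhar only uses MongoDB's open source code, which does not touch sensitive data. The visit also gave me a chance to see how the largest biometric database on earth operates and how it addresses security and privacy concerns.

In addition, the Unique Identification Authority of India (UIDAI) denied allegations that Aadhar shares Indian residents' data with U.S. intelligence agencies.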

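What does Aadhar mean for India?

To understand why Aadhar exists and what it means for a country like India, some background helps. More than half a billion people in India have no official ID, which leaves them unable to receive government aid, open a bank account, apply for a loan or get a driving license. The project is currently enrolling about one million residents a day and expects to have enrolled about 1.2 billion people by the end of next year, making it the largest biometric database on earth.

One of the biggest advantages of a 12-digit Aadhar number is that the government can link it to the bank accounts of the poor and transfer cash benefits and other subsidies directly into them. Nearly 40 million bank accounts in India have already been linked to Aadhar.

According to research firm CLSA, the Indian government has earmarked $250 billion in subsidies and other benefits for the poor, and more than 40% of that will be lost to corruption over the next few years. Aadhar aims to cut out the middlemen by sending cash directly to the bank accounts of those entitled to subsidies, thereby curbing corruption.

But several think tanks and activists, including the Bangalore-based Centre for Internet & Society, continue to raise concerns about privacy and even question the effectiveness of the entire project.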

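Inside the world's largest biometric database

I had tried several times to meet Aadhar officials to understand the project's security, its progress so far and their response to the MongoDB data-sharing controversy.

On Friday they finally agreed to meet me at Aadhar's headquarters in Bangalore's southern suburbs, an area that is also home to the India headquarters of Intel and Cisco. From the outside, Aadhar's technology center, which stores all residents' data (currently 5 PB in total), does not look like a government building at all; it could pass for one of the nearby Intel or Cisco office buildings.

Inside, I walked into a room with about a dozen television screens at its center. Some twenty young engineers sat in front of them, typing feverishly on their keyboards and tracking the movement of the data packets that hold the stored information. The scene looked like a sophisticated command center.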

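The screens showed the journey of these data packets (each around 5 MB), from the moment they are logged at one of the 30,000 enrollment centers around the country through at least three stages of validation. Validation includes running a duplication check on every profile to ensure that no person ends up with more than one Aadhar number.

So for every new enrollment, a "de-duplication" check is run against all existing profiles, which currently number more than half a billion.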
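The check described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the article does not say how UIDAI represents or compares biometric data, so the feature vectors, the cosine-similarity measure, the `find_duplicate` helper and the 0.99 threshold are all hypothetical. What the sketch does show is why the cost grows with the database: a naive check compares each new enrollment against every existing profile.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate(new_template, enrolled, threshold=0.99):
    """Return the ID of the first enrolled profile that matches, or None.

    `enrolled` maps a profile ID to a hypothetical feature vector.
    The threshold is an illustrative value, not one taken from the article.
    """
    for profile_id, template in enrolled.items():
        if cosine_similarity(new_template, template) >= threshold:
            return profile_id
    return None


# Toy data standing in for existing profiles (purely illustrative).
enrolled = {
    "profile-1": [0.9, 0.1, 0.3],
    "profile-2": [0.2, 0.8, 0.5],
}

print(find_duplicate([0.9, 0.1, 0.3], enrolled))  # matches an existing profile
print(find_duplicate([0.1, 0.1, 0.9], enrolled))  # no match, so enrollment can proceed
```

A linear scan like this is only workable for toy data. At half a billion profiles a real system would need indexing or parallel matching, details the article does not cover.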

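Srikanth Nadhamuni, a former Intel engineer who helped set up Aadhar's technology platform in September 2010 and now runs Khosla Labs in Bangalore, told me that these data packets are stored with 2048-bit encryption and can self-destruct if anyone attempts unauthorized access.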

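Dealing with the MongoDB controversy

Why did Aadhar work with MongoDB in the first place, and will that continue? Sudhir Narayana, assistant director general at Aadhar's technology center, told me that MongoDB was just one of several products originally chosen to handle database search, alongside MySQL, Hadoop and HBase. Unlike MySQL, which could store only demographic data, MongoDB could also store images.

However, after realizing that MongoDB could not cope with millions of data packets, Aadhar gradually moved most of its database work to MySQL. It now uses "database sharding": data packets are stored across different machines so that the system does not crash as data volumes grow.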
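As a rough illustration of the sharding idea, the sketch below spreads packets across machines by hashing an identifier. The article only says that packets are stored on different machines; the hash-based placement, the `shard_for` helper, the packet IDs and the choice of four shards are all assumptions, not a description of Aadhar's actual setup.

```python
import hashlib


def shard_for(packet_id: str, num_shards: int) -> int:
    """Map a packet ID to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(packet_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4  # hypothetical number of machines

# Each list stands in for the storage of one machine.
shards = {index: [] for index in range(NUM_SHARDS)}
for n in range(1000):
    packet_id = f"packet-{n}"  # hypothetical ID format
    shards[shard_for(packet_id, NUM_SHARDS)].append(packet_id)

for index, packets in shards.items():
    print(f"shard {index}: {len(packets)} packets")
```

Because the hash is deterministic, the same packet ID always resolves to the same shard, so a lookup only has to touch one machine. Simple modulo placement does force data to move when the number of shards changes, which is one reason production systems often use other schemes.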

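This has reduced Aadhar's dependence on MongoDB, with MySQL now storing most of the data. Ashok Dalwai, deputy director general of the technology center, told me that MongoDB has no access to any biometric data. "We believe in using open source technologies to avoid vendor lock-in, but that doesn't mean we are compromising on security in any way," Dalwai said.

When we contacted MongoDB, a company spokesperson pointed us to this announcement about its relationship with In-Q-Tel.

More importantly, UIDAI began using MongoDB's open source software well before the company received funding from In-Q-Tel. According to Crunchbase, MongoDB raised a total of $7.7 million in venture funding from Red Hat, Intel Capital and In-Q-Tel in 2012.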

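What lies ahead for Aadhar?

The officials said that despite the controversies surrounding the project, Aadhar remains on track to enroll more than 1.2 billion Indian residents by the end of 2014, an effort that will ultimately create a database of about 15 PB.

Currently the project is enrolling about one million people a day. Narayana told me he is confident the rate will rise to about two million a day from next year, which will help bring the remaining 700 million people into the database more quickly.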
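The figures quoted in the article allow a quick back-of-the-envelope check. The sketch below uses only the rounded numbers reported above and makes two assumptions: "more than half a billion" is treated as exactly 500 million, and petabytes and megabytes are taken as decimal units. The helper names are hypothetical.

```python
def days_to_enroll(remaining: int, per_day: int) -> int:
    """Whole days needed to enroll `remaining` people at `per_day` (rounded up)."""
    return -(-remaining // per_day)


def mb_per_resident(total_pb: float, residents: int) -> float:
    """Average storage per resident in MB, using decimal units (1 PB = 10**15 bytes)."""
    return total_pb * 10**15 / residents / 10**6


REMAINING = 700_000_000  # people still to be enrolled, per the article

print(days_to_enroll(REMAINING, 1_000_000))  # 700 days at the current rate
print(days_to_enroll(REMAINING, 2_000_000))  # 350 days at the hoped-for rate

print(mb_per_resident(5, 500_000_000))       # 10.0 MB per resident today
print(mb_per_resident(15, 1_200_000_000))    # 12.5 MB per resident at full size
```

At the current pace the remaining enrollments would take nearly two years, which is why doubling the daily rate matters for the stated target.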

English original:

India’s Unique Identification project, also known as Aadhar, earlier this week finished capturing demographic and biometric data of over half a billion residents–the largest biometric project of its kind currently in the world.

It’s been a multi-year effort not without its critics among privacy and security advocates and others. The latest development this week concerned the method that Aadhar is using to capture, store and manage the data, and the role a startup from the U.S. called MongoDB may be playing in it.

MongoDB, a NoSQL database startup, last year raised funding from the CIA-backed In-Q-Tel, an independent non-profit venture backed by the CIA and other U.S. intelligence agencies.

During the past few days, several reports in the Indian media have quoted political parties and activists raising questions about whether sensitive data is being compromised by Aadhar, which is headed by Infosys co-founder Nandan Nilekani.

Some of the reports have linked the controversy with MongoDB.

Governments across the world are raising concerns over spying by the National Security Agency, and anything even remotely associated with U.S. government intelligence agencies is enough to cause uproar. Moreover, with general elections set to be held next year, political rhetoric is at an all time high in India.

Still, the timing of these allegations couldn’t have been worse, at least for the ambitious identification project, which is waiting for a parliament bill to be passed this year to be established as a fully constitutional authority.

I took a tour of Aadhar’s offices in Bangalore, and the truth of the matter, according to officials I spoke to, is that while some have alleged large contracts that include sharing data with MongoDB, the reality is that Aadhar is using MongoDB open source code that doesn’t touch sensitive data. The meeting also offered an opportunity to understand how the biggest biometrics database on earth is functioning, and dealing with concerns of security and privacy.

Moreover, the Unique Identification Authority of India (UIDAI), refuted allegations of sharing Indian residents’ data with any U.S. agencies.

What Aadhar means for India

To put Aadhar in context, and to show what it means for a country like India: more than half a billion people have no official ID of any kind, which makes it impossible for them to receive government aid, open a bank account, get a loan, get a driving license, and so on. The database project, which is now enrolling over one million Indian residents a day, is scheduled to sign up about 1.2 billion people by the end of next year, making it the biggest biometrics database on earth.

One of the biggest advantages of having a 12-digit Aadhar number is that the government can link bank accounts of the country’s poor with it, and directly transfer cash benefits and other subsidies. Already, nearly 40 million bank accounts in India have been linked with Aadhar.

According to research firm CLSA, more than 40% of the Indian government's $250 billion worth of subsidies and other benefits meant for the poor will be lost to corruption over the next few years. Aadhar aims to remove the middlemen and curb corruption by enabling direct cash transfers to those who need government subsidies.

But several think tanks and activists, including the Bangalore-based Centre for Internet & Society, have been raising concerns about privacy issues and even questioning the effectiveness of the entire project.

Inside the biggest biometrics database on earth

I have been trying to get meetings with the officials at Aadhar to understand security aspects, progress so far and their reaction to the MongoDB allegations.

They finally agreed to meet on Friday in their headquarters in one of Bangalore's southern suburbs, where both Intel's and Cisco's India headquarters are located. From outside, Aadhar's technology center, which stores all residents' data (now totalling 5 petabytes in size), does not look like a government building at all; it could pass for one of the buildings housing Intel or Cisco nearby.

Inside, I walked into a room with about a dozen television screens in the center of it. Some twenty young engineers looked ahead, typing feverishly on their computer keyboards and checking the movement of data packets storing information. The setting looked like a very sophisticated command center. The television screens they were looking at showed the journey of these data packets (each sized at around 5MB) from the time they are logged at one of the 30,000 enrollment centers around the country, through at least three stages of validation. Validation includes running duplication checks for each of the profiles to ensure there is not more than one Aadhar number for the same person.

So, for every new enrollment, a ‘de-duplication’ check is done against all existing profiles, which is over half a billion currently.

Srikanth Nadhamuni, a former Intel engineer who helped set up Aadhar’s technology platform in September 2010, and is now running Khosla Labs in Bangalore, tells me that these data packets are stored behind 2048-bit encryption and capable of self-destruction if any unauthorized access is attempted.

Dealing with MongoDB controversy

So why did Aadhar engage with MongoDB in the first place and will it continue working with the startup?

Sudhir Narayana, assistant director general at Aadhar’s technology center, told me that MongoDB was among several database products, apart from MySQL, Hadoop and HBase, originally procured for running the database search. Unlike MySQL, which could only store demographic data, MongoDB was able to store pictures.

However, Aadhar has been slowly shifting most of its database-related work to MySQL, after realizing that MongoDB was not able to cope with massive volumes of data running to millions of packets.

They have already started using ‘database sharding’: a process where data packets are stored across different machines to ensure the system does not crash as volumes rise.

This has helped Aadhar reduce its dependency on MongoDB and instead use MySQL for storing most of the data.

Ashok Dalwai, deputy director general of the tech center, told me that MongoDB has no access to any biometric data.

“We believe in using open source technologies to avoid any vendor lock-in, but that doesn’t mean we are in any way, compromising security,” Dalwai said.

When contacted, a MongoDB spokesperson redirected us to this announcement about the company's funding involving In-Q-Tel.

More importantly, UIDAI started using MongoDB's open source software well before the startup received any funding from In-Q-Tel. As this Crunchbase entry shows, MongoDB received venture round funding of $7.7 million from Red Hat, Intel Capital and In-Q-Tel only in 2012.

So what lies ahead for Aadhar?

Despite all the controversies surrounding it, Aadhar is on track to enroll over 1.2 billion Indian residents by the end of 2014, the officials added. This will create a database of about 15 petabytes in size.

Currently, the project is enrolling around one million residents in the country a day. Narayana told me that he’s confident of achieving around two million enrollments a day from next year, and that will help bring the remaining 700 million people into the database.

via: TechCrunch


If you repost this article, please credit 36大数据 (36dsj.com): 36大数据 » Inside the World's Largest Biometric Database: India's Aadhar Project
