Today, big data has implications across all industries, from healthcare, automotive, and telecom to IoT and security. As the data deluge continues, we are finding newer ways of managing and analyzing it to gather actionable insights, and grappling with the challenges of security and privacy.
The Association for Computing Machinery (ACM) just concluded a celebration of 50 years of the ACM A.M. Turing Award (commonly known as the “Nobel Prize of computing”) with a two-day conference in San Francisco. The conference brought together some of the brightest minds in computing to explore how computing has evolved and where the field is headed. Big data was the focus of a number of panels and discussions at the conference. The following is a discussion with Vipin Kumar, Regents Professor and William Norris Chair in Large Scale Computing at the University of Minnesota; ACM Fellow 2015.
Question: Gartner estimates that there are currently about 4.9 billion connected devices (cars, homes, appliances, industrial equipment, etc.) generating data. This is expected to reach 25 billion by 2020. What do you see as some of the primary challenges and opportunities this wave of data will create?
Vipin Kumar: This explosion of data from interconnected devices provides information about ourselves, our surroundings, and the gadgets we use in our daily life at an unprecedented level of detail. It also offers a huge opportunity to improve our everyday lives—in terms of safety, health, and many kinds of efficiencies.
One of the major challenges we’re going to see is that the data being gathered from these connected devices and sensors is very different from other datasets that our Big Data community has had to deal with in the past.
For example, the biggest recent successes we’ve seen in Big Data are in applications such as Internet search, e-commerce, placement of online ads, language translation, image processing, and autonomous driving. These successes have been enabled, to a great extent, by the availability of large, relatively structured data sets that can be used to train a broad range of machine learning algorithms. But the data from multitudes of interconnected devices, in its raw state, can be highly fragmented, disparate in space and time, and very heterogeneous. Analyzing such data will be a big and new technological challenge for the machine learning and data mining communities.
Question: As more data is collected from a growing pool of devices, has the individual lost the right to information privacy?
Vipin Kumar: As the technology keeps advancing, there is a need to redefine and understand what we mean by privacy. For example, airlines, hotels and credit card companies all keep track of where we are, places we travel to, and what we buy as customers. We are happy to have them collect that information in exchange for some benefits, such as frequent flyer rewards. On the other hand, when we visit Internet sites or use mobile devices, a good deal of information is automatically collected about our location, what we access, and so forth. We are not always aware of what information is being collected about us and how that information is being used or misused.
Now, in the context of interconnected devices and appliances, the information gathered grows exponentially. Technology is advancing so rapidly in this area that there is an urgent need to build safeguards and better policies. In particular, it won’t be appropriate for us to say that no one should keep track of any data about us, given the huge value that can be gained from shared information, but we have to be careful.
One of the biggest examples of the possible benefits and pitfalls of Big Data can be in the healthcare sector. Healthcare data about the population at large can be analyzed to create individualized treatments, an area also known as precision medicine. However, there are huge concerns about possible misuse of these kinds of information, such as discrimination in hiring or in purchasing health insurance, if this information is not handled properly. The healthcare community is on the front lines in this area, but, given the complexity of issues involved, progress in addressing these concerns is very slow.
I wouldn’t say that individuals have lost the right to privacy in the age of data being collected all around us, but we do need to evolve our understanding of what kind of privacy we want, and protect it through policies and technological solutions.
Question: What moral quandaries do you see arising from the increasing use of predictive data analytics? How do we overcome these challenges?
Vipin Kumar: Predictive data analytics is a key technology for harnessing the power of Big Data. But its focus on optimizing gains at an aggregate level can lead to unintended consequences in a heterogeneous population that contains a long tail of smaller subgroups. Since Big Data analytics may be able to optimize the overall performance by focusing only on the larger groups, it is possible for some groups to be left out completely. For example, a method for scoring credit worthiness of customers may maximize overall profits for a mortgage lender but could unfairly downgrade members of a certain demographic subgroup. It’s important to make a conscious effort to avoid such situations, especially when they impact people from disadvantaged groups.
Technically speaking, what we need are metrics for profit or loss that pay special attention to this long tail. That is, you’re not just maximizing the overall profit, but you’re also considering if different sub-populations are being treated fairly.
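As a rough illustration of the kind of metric described above, the following Python sketch scores a set of lending decisions by total profit minus a penalty for unequal approval rates across subgroups. It is only one possible formulation under stated assumptions, not a method proposed in the interview: the function names, the `penalty_weight` parameter, the use of the approval-rate gap as the fairness measure, and the example data are all hypothetical.

```python
from collections import defaultdict

def group_approval_rates(decisions):
    """Return the approval rate for each subgroup.

    `decisions` is a list of (group, approved, profit) tuples.
    This record layout is an assumption made for the sketch.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok, _profit in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}

def fairness_adjusted_profit(decisions, penalty_weight=100.0):
    """Total profit minus a penalty for the gap in approval rates.

    The gap is the difference between the highest and lowest
    subgroup approval rates; `penalty_weight` (hypothetical) sets
    how much profit one is willing to trade for a smaller gap.
    """
    total_profit = sum(profit for _g, ok, profit in decisions if ok)
    rates = group_approval_rates(decisions)
    gap = max(rates.values()) - min(rates.values())
    return total_profit - penalty_weight * gap, total_profit, gap

# Made-up data: group "B" is a small subgroup in the long tail.
example_decisions = [
    ("A", True, 10.0), ("A", True, 12.0), ("A", False, 0.0), ("A", True, 8.0),
    ("B", False, 0.0), ("B", True, 5.0),
]

score, profit, gap = fairness_adjusted_profit(example_decisions)
print(f"profit={profit}, approval gap={gap:.2f}, adjusted score={score}")
```

A scoring method that raises total profit while widening the approval gap would be marked down by this adjusted score, which is the trade-off the answer points to. Other fairness measures, such as per-group error rates, could be substituted for the approval gap.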
Question: Security is a hot topic regarding Big Data. To what extent will Big Data be responsible for new security problems and challenges?
Vipin Kumar: Actually, it is already responsible for some spectacular failures, and you’ve probably heard about many of them in the news, such as the recently announced hacking of over 1 billion accounts at Yahoo! and other similar incidents at companies such as Sony and Target. Even government systems that are meant to be highly secure are not immune from hacking. For example, during the summer of 2015, it was discovered that the entire database of the US Office of Personnel Management got hacked. Since it contained detailed background information on people who were considered for national security clearance, it impacted not only those who applied for clearance but everyone closely related to them.
When you have lots of data available in one place, it offers huge opportunities for doing something really good, but it also becomes prone to being hacked. These problems become even more serious in the context of interconnected devices, as they can often be hacked much more easily than large data centers. And the stakes can be much greater. For example, hacking of a self-driving car could turn it into a highly dangerous weapon under the control of a terrorist. Security-related concerns will probably be the biggest impediment to the wide adoption of the IoT.
Question: Are there potential technological breakthroughs on the horizon that you think could transform this area again in the near future?
Vipin Kumar: New types of sensors and communication technologies can be quite transformational. The kinds of sensors that we see today could not even have been imagined just a few decades ago. Mobile health sensors such as Fitbit and Apple Watches that can record our physiological parameters at unprecedented detail have been around only for the past decade or so. New types of sensors based on advances in electronics, nanotechnology and biomedical sciences are already enabling deployment of a whole bunch of small and inexpensive satellites that can monitor the Earth and its environment at spatial and temporal resolutions that would not have been possible just a few years ago. Without technologies such as RFID, it would be very hard for someone to imagine that you could walk into a store and purchase something just by looking at it or by being close to it. This is now possible at Amazon Go, a grocery store in Seattle that has no checkout counter. New sensors based on quantum technology may open up entirely new applications that we are not even considering today.
Breakthroughs are also needed in our ability to analyze new kinds of data from interconnected but disparate sensors that are distributed in space and time. The data we have been able to collect through these methods has often been incorrect, incomplete, unreliable, and untrustworthy.
Question: In what ways can Big Data be better utilized for greater public benefit?
Vipin Kumar: I believe that Big Data and associated technologies, if used effectively, can have a huge impact on just about every major problem affecting our society today. It could improve the efficiency of food production, reduce waste of critical resources such as water and energy, and help us live healthier and more fulfilling lives.
For example, Big Data from agricultural machinery and environmental sensors could enable farmers to give just the right amount of water and fertilizers to the right seed at the right time. Smart and interconnected devices at home and workplaces can configure themselves to use energy efficiently while adapting to the erratic nature of supply from the renewable sources such as wind and solar.
This will be absolutely critical as we try to feed the world’s growing population with a shrinking amount of land available for producing food. The challenges include competition from energy crops and urbanization, as well as stagnant growth in crop yields, which may come under further stress due to changing climate.
If we keep doing things the way we have been doing today, the burden on the environment is going to be far greater than it can handle. So, we have to look toward Big Data and predictive data analytics to help us live in the world sustainably.
I’m very positive that Big Data will be a force for the greater public good.