Challenges to AI-ML-based search engines

Do privacy laws have an impact?


The power of search engines is astonishing. You go to your favorite search engine, type a few terms, and it seems to magically scour the entire internet for the most relevant results for your query; in reality, it consults an index built by crawling billions of pages ahead of time.

With Google Search, Google has achieved something rare for a tech business: dominion over the very word for a whole category of digital experience. A trademarked name becoming so ubiquitous that "to google" enters common usage is not something one sees every day.

How does Google produce such accurate results?

The answer lies in the algorithms and models in its arsenal, which are fed the data Google gathers from its users and used to personalize the results. These editorialized results are informed by the personal information Google holds (such as search, browsing, and purchase history), and they put the user in a bubble of whatever Google's algorithms predict they are most likely to click on.
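
Google's actual personalization pipeline is proprietary, but the "bubble" effect is easy to illustrate. The toy Python sketch below (all names, topics, and scores are invented for illustration) re-ranks results by boosting pages whose topics overlap with a hypothetical user history:

```python
# Purely illustrative: Google's real personalization pipeline is not public.
# A toy re-ranker that boosts results overlapping with a hypothetical user history.

def personalize(results, user_history, boost=0.2):
    """results: list of (url, base_score, topics); user_history: set of topics
    the user has previously engaged with (e.g. 'running', 'cameras')."""
    reranked = []
    for url, base_score, topics in results:
        overlap = len(set(topics) & user_history)
        reranked.append((base_score + boost * overlap, url))
    # Highest personalized score first -- this is the "bubble" effect.
    return [url for score, url in sorted(reranked, reverse=True)]

results = [
    ("example.com/marathon-training", 0.71, ["running", "fitness"]),
    ("example.com/camera-reviews",    0.74, ["cameras", "shopping"]),
]
# A runner sees the training page first, even though its base score is lower.
print(personalize(results, user_history={"running", "fitness"}))
```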

Far from handling mere data packets that may even be encrypted, Google directly sees the personal interests, plans, purchases, fears, hopes, fantasies, and secrets of its users. Google users routinely conduct searches that reveal their fitness or medical condition (e.g. by searching for medications, specialists, and terminology), financial health, shopping and purchases, and so on.

Yet the sheer volume and diversity of the data Google collects is not, on its own, what distinguishes it from so many other Internet services and applications. What does is the intimacy of this data and the powerful inferences that can be drawn from it when it is combined with large-scale artificial intelligence and machine learning algorithms and models.

While the inner workings of these algorithms are not public, Google has acknowledged that something known as RankBrain is a key component of its core search algorithm.

Pre-RankBrain, Google utilized its basic algorithm to determine which results to show for a given query. Post-RankBrain, it is believed that the query now goes through an interpretation model that can apply possible factors like the location of the searcher, personalization, and the words of the query to determine the searcher’s true intent. By discerning this true intent, Google can deliver more relevant results.
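
RankBrain itself is not public, so the following is only a toy sketch of the general idea: combining the words of a query with context such as the searcher's location to guess the underlying intent. The rules and examples are entirely hypothetical.

```python
# Toy sketch only: RankBrain is proprietary. This just illustrates the idea of
# combining the query's words with context (here, location) to guess intent.

def interpret(query, location):
    words = query.lower().split()
    if "football" in words:
        # The same word implies different intents in different places.
        return "nfl" if location == "US" else "soccer"
    if "jaguar" in words and ("price" in words or "dealer" in words):
        return "car"          # commercial intent, not the animal
    return "general"          # fall back to a plain keyword interpretation

print(interpret("football scores today", location="UK"))   # soccer
print(interpret("jaguar dealer near me", location="US"))   # car
```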

Recently, Google also announced another technology, termed DeepRank, which uses deep learning to understand each sentence and paragraph and the meaning behind it. Once Google understands the meaning of each paragraph on the Internet, it can match the meaning of a search query with the paragraph that gives the best answer.
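
Google has not published DeepRank's internals, but "matching by meaning" can be illustrated with open-source tools. The sketch below uses the sentence-transformers library (an assumption chosen for illustration, not what Google uses) to rank passages by semantic similarity to a query, even when they share no keywords:

```python
# Not Google's DeepRank -- just an illustration of matching by meaning using the
# open-source sentence-transformers library (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose encoder

passages = [
    "Paracetamol is commonly used to relieve mild pain and reduce fever.",
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
]
query = "what can I take for a headache"

# Encode the query and passages into vectors, then rank passages by cosine similarity.
passage_vecs = model.encode(passages, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_vec, passage_vecs)[0]

best = int(scores.argmax())
print(passages[best])   # the paracetamol passage wins despite sharing no keywords
```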

Impact of privacy laws

Artificial intelligence is a data hog: effectively building and deploying AI and machine learning systems requires large data sets, which Google has aplenty. Moreover, the amount of data in the world is estimated to double roughly every two years, which should mean that AI-ML-based search engines (not limited to Google) flourish and improve as time goes on.

But there is another side to the coin. Awareness of digital privacy and data ownership is rising, and people have begun to realize how much data digital services collect in exchange for what they offer. In response, lawmakers have enacted a number of data protection laws to protect consumers' data.

For example, the European Union (EU) General Data Protection Regulation (GDPR), which took effect in May 2018, imposes an onerous burden on any organization that handles data about any European individual. Businesses that fail to comply with the GDPR's core principles and requirements may face substantial fines.

Simply put, AI systems rely on data, and any attempt to prohibit or limit access to that data through data protection rules may have a chilling effect on the speed of AI innovation.

What Do We Understand About Our Data?

The topic of digital privacy has hardly been out of the news. There is significant debate in governments, boardrooms, and the media about how data is gathered, stored, and used, and what level of ownership the public should have over their data.

People realize that there are trade-offs when they use digital services, but many are unaware of the amount of information collected, how it is used, and with whom it is shared. It's easy to think of an email address or a birth date as a single, separate puzzle piece, but when small amounts of data are constantly fed into an ever-consuming, ever-calculating algorithm, they add up to a startlingly complete picture.

Challenges to AI-ML-based search engines

One of the most revolutionary aspects of the GDPR was the “right to be forgotten” — a debated right that is sometimes interpreted as empowering individuals to request the erasure of their information on the internet, most commonly from search engines or social networks.

This data privacy principle simply means that any European consumer can request that a corporation erase and permanently remove any data that has been saved about them.

But the same principle, when applied to artificial intelligence, creates several problems for companies. First, data is used to "train" machine learning models so that they get smarter over time. Algorithms are designed to ingest as much data as possible and then use it to make informed decisions. The "right to be forgotten", however, seems to require that the machine "unlearn" what it has learned from the erased data.
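
There is no agreed-upon way to make a trained model "unlearn". The bluntest option, sketched below with scikit-learn on invented toy data, is to drop the individual's records and retrain from scratch; trivial here, but prohibitively expensive for a web-scale model, which is exactly the problem:

```python
# The bluntest form of "unlearning": drop the user's rows and retrain from scratch.
# Fine for a toy model; prohibitively expensive at web scale. (Illustrative sketch.)
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                    # features contributed by many users
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # toy labels
user_ids = rng.integers(0, 100, size=1000)        # which user each row came from

model = LogisticRegression().fit(X, y)            # original model, trained on everyone

# User 42 invokes the right to be forgotten: remove their data and retrain.
keep = user_ids != 42
model_forgotten = LogisticRegression().fit(X[keep], y[keep])
```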

Another key issue facing the artificial intelligence industry is "algorithm transparency": the idea that consumers have a right to an explanation of automated decisions that directly affect them. From the legislator's perspective this makes sense, because it helps ensure that biases and prejudices cannot creep into the decision-making process.

However, as anyone who has ever studied machine learning knows, the algorithms used by intelligent machines are quite complex. In many cases, machine learning algorithms are a "black box" that is difficult to explain to consumers.
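
For simple models, an explanation is still tractable. The toy scikit-learn sketch below (the loan-style features and data are invented for illustration) reads off each feature's contribution to a single decision of a linear model; deep neural networks offer no such direct read-out, which is what makes the transparency requirement hard:

```python
# For a linear model, an "explanation" is easy: each feature's contribution to one
# decision is just coefficient * feature value. Deep models offer no such shortcut.
# (Illustrative sketch on made-up data.)
import numpy as np
from sklearn.linear_model import LogisticRegression

features = ["income", "debt", "years_at_job"]
X = np.array([[55.0, 5.0, 6.0], [20.0, 30.0, 0.5], [40.0, 10.0, 3.0]])
y = np.array([1, 0, 1])                     # 1 = loan approved (toy labels)

model = LogisticRegression().fit(X, y)

applicant = np.array([30.0, 25.0, 1.0])
contributions = model.coef_[0] * applicant  # per-feature contribution to the score
for name, c in zip(features, contributions):
    print(f"{name:>12}: {c:+.2f}")
print("decision:", "approved" if model.predict([applicant])[0] else "rejected")
```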

Regulations will encourage innovation

As one can see, the purpose of data protection laws is not to kill AI; it is to legally protect personal data. The GDPR is not about eliminating innovation in artificial intelligence; it is a way to embed data protection standards and protect people's privacy.

Here’s the silver lining: Far from stifling innovation, privacy requirements and regulations aim to promote the data economy and provide organizations with innovative ways to engage with customers and build a competitive advantage. Successful organizations will be those that can go beyond regulatory constraints and use them as a catalyst for digital transformation.

Companies such as Google have already started to develop privacy-oriented machine learning techniques, such as:

  • Federated learning - A technique that trains an AI algorithm across decentralized devices or servers (i.e., nodes) that hold data samples locally, without exchanging those samples, enabling multiple parties to build a common machine learning model without pooling the raw data (a minimal sketch follows this list).

  • Differential privacy - A system for publicly sharing information about a data set by describing the patterns of groups within it while withholding data about individuals (also sketched below).
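
As a rough illustration (not Google's production implementation), the sketch below shows the core idea of federated averaging with NumPy: each node fits a model on data that never leaves it, and only the model weights are aggregated:

```python
# Minimal sketch of federated averaging: each node fits a model on its own private
# data; only the learned weights travel to the aggregator. (Illustrative only.)
import numpy as np

def local_fit(X, y):
    """Least-squares weights computed locally on one node's private data."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Three devices, each holding private data that never leaves the device.
nodes = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    nodes.append((X, y))

local_weights = [local_fit(X, y) for X, y in nodes]   # computed on-device
global_weights = np.mean(local_weights, axis=0)        # only weights are aggregated
print(global_weights)   # close to [2, -1] without pooling any raw data
```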
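
And a minimal sketch of the Laplace mechanism, a textbook way to make an aggregate query differentially private: publish a count with calibrated noise so that no single individual's presence can be inferred (the epsilon value and the data are illustrative):

```python
# Minimal sketch of differential privacy via the Laplace mechanism: release an
# aggregate (a count) with calibrated noise so no individual is revealed.
import numpy as np

def dp_count(values, predicate, epsilon=0.5):
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1   # adding or removing one person changes the count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = [23, 37, 41, 29, 52, 33, 47, 61]
print(dp_count(ages, lambda a: a > 40))   # noisy count of people over 40
```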

Moreover, regulation also paves the way for new companies to showcase privacy-oriented services, encouraging competition in an industry otherwise dominated by a few incumbents.

For example, DuckDuckGo is a search engine that is rapidly gaining traction; it claims to collect no personal information, yet is nearly as effective as Google. It also avoids the "filter bubble" of personalized results: the search results are effectively the same regardless of who is searching.

With a privacy-first approach to development and a better understanding of how data travels inside an organization, data laws and AI can coexist. Increased regulation of AI and ML will likely have a positive impact on AI's future, pushing us to be more innovative and to adopt a perspective that balances data access with privacy.