The age-old adage "you are what you eat" takes on new meaning in the world of artificial intelligence. Just as our bodies reflect the quality of the food we consume, AI systems mirror the quality of the data they are trained on. This principle is known as GIGO (Garbage In, Garbage Out): an AI system can only perform as well as the data it is fed. But what does this really mean, and why is it an important concept for everyone to understand as AI becomes more prevalent in our lives?
Data is the heart of AI.
What exactly does Garbage In, Garbage Out mean? It is like trying to cook a fancy dinner with rotten ingredients. You may be a very good cook, but your dish will almost certainly not come out well. The same goes for AI: any decisions, predictions, or actions an AI makes will be skewed if it was trained on flawed data. This is more than a minor bug; the consequences can be dire, particularly when AI is used in mission-critical areas such as medicine and finance, or where public safety is concerned, as in the justice system.
A Story of AI and Loan Approvals:
To see the GIGO concept in action, consider a real-life example: a bank uses an AI system to decide whether to approve a loan application. The algorithm reviews prior loan applications to learn the profiles of successful and unsuccessful applicants. But if the historical data is biased — say it contains more cases from some locations than others — the AI may wrongly favor applicants from those areas. Similar applicants from different zip codes could then have their loans denied. Beyond being unfair, this is simply bad business for the bank.
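To make the bias concrete, here is a minimal sketch. The zip codes, incomes, and outcomes below are all made up for illustration; the point is that skewed historical records teach a model to treat location as a predictor:

```python
# Hypothetical historical loan data: (zip_code, monthly_income, approved)
history = [
    ("10001", 4000, True),  ("10001", 2500, True),  ("10001", 1800, True),
    ("60629", 4000, False), ("60629", 2600, False), ("60629", 1900, True),
]

def approval_rate(rows, zip_code):
    """Fraction of historical applications approved in a given zip code."""
    outcomes = [approved for z, _, approved in rows if z == zip_code]
    return sum(outcomes) / len(outcomes)

# A naive model trained on this history would "learn" that zip code
# predicts approval, even for applicants with identical incomes.
print(approval_rate(history, "10001"))  # 1.0  -> location bias baked in
print(approval_rate(history, "60629"))  # ~0.33
```

Notice that the $4,000 earner in 60629 was rejected while the $1,800 earner in 10001 was approved; a model fitted to this data inherits exactly that skew.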
In the same scenario, suppose the training data is simply too old: the model was trained on data from 2010, but it is now 2024. One criterion for loan approval is the applicant's ability to repay. From its own computations, the model concludes, "Hmm, if this person has an average income of $1,500 per month, they can repay the loan, since they will have enough left over after monthly utilities." In 2010 that was true, but in 2024 the same income barely makes ends meet. The trained model now approves everyone earning an average of $1,500, which leads to defaults that burden the bank and disrupt its whole process.
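The stale-threshold problem can be sketched in a few lines. The $1,500 cutoff comes from the story above; the inflation factor is an assumed round number for illustration, not a real statistic:

```python
# A disposable-income rule "learned" from 2010 training data.
INCOME_THRESHOLD_2010 = 1500   # the model's old "can repay" cutoff ($/month)
INFLATION_FACTOR = 1.4         # assumed cumulative 2010 -> 2024 factor

def can_repay_stale(monthly_income):
    # The stale model applies the 2010 threshold unchanged.
    return monthly_income >= INCOME_THRESHOLD_2010

def can_repay_refreshed(monthly_income):
    # A refreshed model adjusts the cutoff for today's cost of living.
    return monthly_income >= INCOME_THRESHOLD_2010 * INFLATION_FACTOR

applicant_income = 1500
print(can_repay_stale(applicant_income))      # True  (stale data approves)
print(can_repay_refreshed(applicant_income))  # False (updated data declines)
```

Same applicant, same income; only the freshness of the data behind the rule changed the decision.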
How Can We Fix This???
- Clean Up Your Act — Just as we wash our fruits and veggies before eating them, data needs to be cleaned. This involves correcting errors, filling in missing values, and ensuring consistency.
- A Well-rounded Data Diet — Just as a balanced diet includes fruits, veggies, proteins, and carbohydrates, AI systems need varied data. A narrow diet causes "nutritional deficiencies" (biases, in AI-speak), while gorging on one kind of data causes "overfeeding" (overfitting). A diverse, balanced dataset helps prevent both.
- Expiry Date — Just as food expires, data gets old. Regular monitoring stops your AI from making decisions based on stale information, as the 2010 income example above showed.
- All Served Equally — As we design norms for implementing AI, it is our responsibility to make sure they are inclusive and benefit everyone. It is akin to ensuring that everyone at the table gets an equal portion of the meal.
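The first three fixes above can be sketched together as a simple cleaning pass. The records, the three-year freshness window, and the median imputation are all hypothetical choices for illustration; real pipelines would tune these to the domain:

```python
from datetime import date

# Hypothetical raw records; None marks missing income, dates show age.
raw = [
    {"income": 3200, "collected": date(2024, 3, 1)},
    {"income": None, "collected": date(2023, 11, 5)},  # missing value
    {"income": -500, "collected": date(2024, 1, 20)},  # obvious error
    {"income": 2800, "collected": date(2010, 6, 15)},  # stale record
]

def clean(records, max_age_days=365 * 3, today=date(2024, 6, 1)):
    """Drop expired rows, then repair missing or impossible incomes."""
    known = [r["income"] for r in records if r["income"] and r["income"] > 0]
    median = sorted(known)[len(known) // 2]  # simple imputation value
    out = []
    for r in records:
        if (today - r["collected"]).days > max_age_days:
            continue  # past its "expiry date": discard
        income = r["income"]
        if income is None or income <= 0:
            income = median  # fill missing / fix bad entries
        out.append({"income": income, "collected": r["collected"]})
    return out

cleaned = clean(raw)
print(len(cleaned))  # 3 -> the 2010 record was dropped, errors repaired
```

The order matters: stale rows are removed first so they cannot contaminate the value used to fill in the gaps.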
What Lies Ahead
As AI becomes more pervasive — from managing our transportation to diagnosing medical conditions — it becomes increasingly important that the data it learns from is as clean, diverse, and well-curated as possible. Advances in technology can help manage and improve data quality, but we all have a big role to play in staying informed and engaged.
Conclusion
Also take note: AI is a tool, and like any other tool, its value depends on how we use it. We can only make AI work for us if we take GIGO seriously, with all its implications for efficacy, fairness, and reliability. So the next time we feed a model a dataset, let's remember to control what we allow these AI systems to consume. Better data means better AI, which is ultimately good for everyone.
If you liked the content go ahead and give a 👏.
If you have fascinating ideas about AI and would like to work on them together, connect with me on LinkedIn.
https://www.linkedin.com/in/anudev-manju-satheesh-218b71175/