Could data storage capacity become a bottleneck for generative AI?
While I can assure you ChatGPT was not the author of this article, generative AI has rightly dominated media headlines over the last few months for its potential to transform all kinds of industries.
Major tech companies have built AI into the core of their operational plans. Microsoft has stated generative AI could add $40 billion to its top line. According to Goldman Sachs, generative AI could increase global GDP by almost $7 trillion. Around three-quarters of companies expect to adopt AI technologies over the next five years. ChatGPT acquired over 100 million users in its first two months alone, becoming the fastest-growing consumer application ever.
However, the best AI models would be unworkable without one key element: data.
Data is critical to train AI models to uncover insights and deliver value from previously untapped information. Since tomorrow’s AI tools will be able to derive currently unimaginable insights from yesterday’s data, organisations should retain as much data as possible.
Chatbots and AI image and video generators will also produce more data for companies to manage, and companies will need to keep those inferences to inform future algorithms.
By 2025, Gartner expects generative AI to account for 10% of all data produced worldwide, up from less than 1% today. Combining this forecast with IDC's Global DataSphere study suggests that generative AI tools such as ChatGPT, DALL-E, Bard, and DeepBrain AI will create zettabytes of data over the next five years.
Organisations can only take advantage of AI applications if their data storage strategy allows them to train and deploy these tools easily, cost-efficiently, and at scale. Massive data sets need mass-capacity storage. If the time to start saving data wasn't yesterday, it is now.
Why AI needs data
According to IDC, 84% of enterprise data generated in 2022 was useful for analysis, but barely a quarter (24%) of it was actually analysed or fed into AI or ML algorithms. This means companies are failing to take full advantage of available data, and business value is being lost as a result. Compare this to owning an electric car: if the battery isn't charged, the car won't take you to your destination. If data isn't stored, not even the smartest AI tools will do what you want them to.
As businesses look to train AI models, mass-capacity storage will need to hold both raw and generated data, and companies will need robust data storage strategies to make this happen. They should look to the cloud for some of their AI data, workloads and storage, while also storing and processing some data on premises. Hard drives, which make up approximately 90% of public cloud storage, are a reliable and cost-effective solution designed for massive data sets, and they can store the vast volumes of data required to feed AI models for ongoing training.
Keeping raw data even after processing is vital too. Intellectual property disputes will occur around some content generated by AI. Industry inquiries or litigation will include questions surrounding the basis for AI insights. Demonstrating your work through stored data will help show ownership and the soundness of your conclusions.
Data quality also has an impact on the reliability of insights. To help ensure better quality data, organisations should use methods that include data preprocessing, data labelling, data augmentation, monitoring data quality metrics, data governance, and subject matter expert review.
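As a minimal illustration of the "monitoring data quality metrics" step, the Python sketch below checks a tabular training set for missing values, duplicates, and unlabelled rows before it is fed to a model. The file name, label column, and tolerances are illustrative assumptions, not part of any particular toolchain.

import pandas as pd

# A minimal sketch of automated data-quality checks ahead of AI training.
# The file name, label column, and tolerances below are assumptions for illustration.
def check_training_data(df: pd.DataFrame, label_column: str = "label") -> dict:
    report = {
        "rows": len(df),
        "missing_ratio": float(df.isna().mean().mean()),    # share of empty cells
        "duplicate_rows": int(df.duplicated().sum()),        # exact duplicate records
        "unlabelled_rows": int(df[label_column].isna().sum()),
    }
    # Flag the data set if any metric crosses an (assumed) tolerance.
    report["passes"] = (
        report["missing_ratio"] < 0.05
        and report["duplicate_rows"] == 0
        and report["unlabelled_rows"] == 0
    )
    return report

if __name__ == "__main__":
    df = pd.read_csv("training_data.csv")   # hypothetical input file
    print(check_training_data(df))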
How organisations can prepare
Companies sometimes delete data because of retention costs, but going forward they will need to balance those costs against the AI insights that retained data can unlock to help drive business value.
To reduce data costs, leading organisations deploy cloud cost comparison and estimation tools. For on-premises storage, they should explore TCO-optimising storage systems built with hard drives. In addition, they should prioritise monitoring data and workload patterns over time, and automate workflows wherever possible.
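To make that trade-off concrete, here is a back-of-the-envelope sketch, in Python, of the kind of comparison a cost-estimation exercise might run: recurring cloud storage fees versus an upfront hard-drive purchase plus ongoing overhead. All prices, capacities, and timeframes are assumptions chosen purely for illustration, not vendor quotes.

# A back-of-the-envelope storage cost comparison. Every figure below is an
# assumption for illustration only.
def cloud_cost(tb: float, price_per_tb_month: float, months: int) -> float:
    # Recurring object-storage fees over the retention period.
    return tb * price_per_tb_month * months

def on_prem_cost(tb: float, drive_tb: float, drive_price: float,
                 opex_per_tb_month: float, months: int) -> float:
    # Upfront hard-drive purchase plus assumed power/admin overhead.
    drives = -(-tb // drive_tb)          # ceiling division: number of drives
    return drives * drive_price + tb * opex_per_tb_month * months

if __name__ == "__main__":
    tb, months = 500, 36                 # e.g. 500 TB retained for three years
    print("cloud  :", cloud_cost(tb, price_per_tb_month=20, months=months))
    print("on-prem:", on_prem_cost(tb, drive_tb=20, drive_price=400,
                                   opex_per_tb_month=5, months=months))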
Comprehensive data classification is key to identifying data needed to train AI models. Part of this means making sure sensitive data — such as personally identifiable or financial data — is handled in compliance with regulations. Data security must be robust. Many organisations encrypt data for safekeeping, but in general AI algorithms can’t learn from encrypted data. Companies need to put in place a process that securely decrypts data for training and re-encrypts it for storage.
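As one way such a decrypt-for-training workflow could look, the sketch below uses symmetric encryption via Python's cryptography library to decrypt a stored data set in memory for training and re-encrypt it before it returns to storage. The key handling and file names are deliberately simplified assumptions, not a production design.

from cryptography.fernet import Fernet

# A simplified sketch of decrypt-for-training / re-encrypt-for-storage.
# Key management is naive here on purpose; a real system would fetch keys
# from a dedicated key management service. File names are illustrative.
def load_for_training(encrypted_path: str, key: bytes) -> bytes:
    # Decrypt the stored data set in memory so the training pipeline can read it.
    cipher = Fernet(key)
    with open(encrypted_path, "rb") as f:
        return cipher.decrypt(f.read())

def store_after_training(plaintext: bytes, encrypted_path: str, key: bytes) -> None:
    # Re-encrypt the data set before it goes back into long-term storage.
    cipher = Fernet(key)
    with open(encrypted_path, "wb") as f:
        f.write(cipher.encrypt(plaintext))

if __name__ == "__main__":
    key = Fernet.generate_key()          # in practice, retrieved from a KMS
    store_after_training(b"raw training records", "dataset.enc", key)
    data = load_for_training("dataset.enc", key)
    # ...feed `data` to training, then keep only the encrypted copy on disk.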
For AI analysis to succeed, businesses should:
– Become accustomed to storing more data, because data is becoming more and more valuable. Keep both your raw data and the insights derived from it; rather than limiting what can be stored, limit what can be deleted.
– Put in place processes that enhance data quality.
– Implement proven methods for reducing data costs.
– Deploy robust data classification and compliance.
– Ensure data is secure.
Without these actions, even the best generative AI models will deliver little value.
Even before the emergence of generative AI, data was critical to unlocking innovation. Companies that are most adept at managing their multicloud storage are more than five times (5.3x) as likely as their peers to exceed revenue goals. Generative AI could substantially widen the innovation gap between industry competitors.
The innovative potential of generative AI has rightly been the focus of industry and media buzz. However, business leaders will soon recognise that their data storage and management strategies will make or break their AI success.