Artificial intelligence (AI) is one of the fastest-growing enterprise technologies.
According to IBM, 42% of firms with more than 1,000 employees now use AI in their business. A further 40% are testing or experimenting with it.
Much of that innovation is being driven by generative AI (GenAI) built on large language models (LLMs), such as ChatGPT. Increasingly, these forms of AI are being used in enterprise applications, or via chatbots that interact with customers.
Most GenAI systems are, for now, cloud-based, but suppliers are working to make it easier to integrate LLMs with enterprise data.
LLMs, and more “conventional” forms of AI and machine learning, need significant compute and data storage resources, either on-premises or in the cloud.
Here, we look at some of the pressure points around data storage, as well as the need for compliance, during the training and operational phases of AI.
AI training puts big demands on storage I/O
AI models need to be trained before use. The better the training, the more reliable the model – and when it comes to model training, the more data the better.
“The critical aspect of any model is how good it is,” says Roy Illsley, chief analyst in the cloud and datacentre practice at Omdia. “This is an adaptation of the saying, ‘Poor data plus a perfect model equals poor prediction,’ which says it all. The data must be clean, reliable and accessible.”
As a result, the training phase is where AI projects put the most demand on IT infrastructure, including storage.
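One way to gauge that demand is to measure how fast storage can actually feed training data to the compute layer. Below is a minimal Python sketch that estimates sequential read throughput for a data directory, a first-order proxy for the I/O one training epoch will place on storage; the directory and file names are illustrative only, not from any particular AI stack.

```python
# Rough sketch: estimate sequential read throughput of a training
# data directory. All names here are illustrative placeholders.
import tempfile
import time
from pathlib import Path

def read_throughput_mb_s(data_dir: Path) -> float:
    """Read every file in data_dir once and return observed MB/s."""
    start = time.perf_counter()
    total_bytes = 0
    for path in data_dir.rglob("*"):
        if path.is_file():
            total_bytes += len(path.read_bytes())
    elapsed = time.perf_counter() - start
    return (total_bytes / 1e6) / max(elapsed, 1e-9)

# Demo against a throwaway directory standing in for a dataset shard.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "shard-000.txt"
    sample.write_text("x" * 1_000_000)  # ~1 MB of dummy training data
    mb_s = read_throughput_mb_s(Path(tmp))
```

In practice, a figure like this would be compared against what the GPUs can consume per second; if storage reads lag behind, the training phase stalls on I/O rather than compute.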
But there is no single storage architecture that supports AI. The type of storage will depend on the type of data.
For large language models, most training is done with unstructured data. This will usually be on file or object storage.
Meanwhile, financial models use structured data, where block storage is more common, and there will be AI projects that use all three types of storage.
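The split described above can be sketched in a few lines of Python. This is an illustrative stand-in only: plain files stand in for unstructured data on file or object storage, and an in-memory SQLite database stands in for structured data that would typically sit in a database on block storage; the file names, table and values are invented for the example.

```python
# Illustrative sketch of the two data shapes named in the text.
# All names, paths and values below are made up for the example.
import sqlite3
import tempfile
from pathlib import Path

# Unstructured data (e.g. text for LLM training) usually lives as
# files on file or object storage; plain local files stand in here.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "doc1.txt").write_text("sample training text")
    documents = [p.read_text() for p in Path(tmp).glob("*.txt")]

# Structured data (e.g. financial records) typically sits in a
# database whose volumes run on block storage; SQLite stands in.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trades (symbol TEXT, price REAL)")
db.execute("INSERT INTO trades VALUES ('ACME', 101.5)")
rows = db.execute("SELECT symbol, price FROM trades").fetchall()
```

A single AI project might draw on both: the unstructured documents feed model training directly, while the structured records are queried and transformed into features first.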
Another factor is where the model training takes place. Ideally, data needs to be as close to the compute resources as possible.
For a cloud-based model, this makes cloud storage the typical choice. I/O bottlenecks within a cloud infrastructure are less of a problem than the latency incurred moving data to or from the cloud, and the