Fast-scaling GenAI users can be 'corner cases' in testing

Interview: When OpenAI launched GPT-4 in March last year, it was coy about the model's size and what went into making it. Nonetheless, the current focus of AI-obsessed media and investors is understood to have employed a diverse dataset of around 1 petabyte. Aside from the challenge of getting that data to provide meaningful output, the company was tasked with getting the data in the right place.

Step forward Fivetran, an automated data integration outfit that isn't shy about discussing its partnership with OpenAI, the company that – for good or for ill – has come to symbolize the tidal wave of interest in GenAI.

Speaking to The Register, CEO George Fraser said OpenAI represented one extreme among its customers, while long-established global businesses such as consumer goods giant Procter & Gamble represented the other.

"You look at a company like OpenAI or other startups; they have infrastructure that looks like a small company infrastructure, except for scale. It's like a baby that's like 100 stories tall. You encounter unexpected and different problems," he said.

Fraser explained that companies like P&G will generally have a lot of data that is spread out across enterprise systems such as SAP, which, while complex, is known to the user.

"The company that's had large data volumes for a long time, like Procter & Gamble – you come in, there are challenges, but you tend to work them out in the proof-of-concept phase," he said.

However, a user that has grown with Fivetran, such as OpenAI, presents different challenges in terms of data integration, he said.

"The scale of data brings serious challenges, but not the ones that people think. People tend to think the problem of scale is spinning up lots of machines… lots of CPUs, and crunching numbers really hard, but that's not really it: that part is easy.

"The hard part is that you hit corner cases of the APIs that no one ever really thought about. You find you cannot pull an endpoint as frequently as you want. Or you have weird, like, n-squared behavior when you try to update data.

"It's more like problems with the design of all the other systems you have to work around. Whoever designed this system and designed the APIs didn't anticipate this extreme scenario or new problems appearing in these extreme scenarios. It's not like the sort of big iron number-crunching, supercomputer-type stuff that people want it to be."

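To make the "n-squared" remark concrete, here is a minimal Python sketch of how an update path can turn quadratic at scale. It is an illustration only, not Fivetran's code or any vendor's API; the function names `upsert_naive` and `upsert_indexed` and the list-of-tuples table are hypothetical.

```python
def upsert_naive(table, updates):
    """Apply each update by scanning the table from the start.

    Returns the number of key comparisons made. With n existing rows and
    n updates this grows roughly as n squared.
    """
    comparisons = 0
    for key, value in updates:
        for i, (existing_key, _) in enumerate(table):
            comparisons += 1
            if existing_key == key:
                table[i] = (key, value)  # overwrite the matching row
                break
        else:
            table.append((key, value))  # no match found: insert a new row
    return comparisons


def upsert_indexed(table, updates):
    """Build a key -> position index once, then apply each update directly.

    Returns the number of index lookups made, which is one per update.
    """
    index = {key: i for i, (key, _) in enumerate(table)}
    lookups = 0
    for key, value in updates:
        lookups += 1
        if key in index:
            table[index[key]] = (key, value)
        else:
            index[key] = len(table)
            table.append((key, value))
    return lookups
```

With 1,000 existing rows and an update for each of them, the naive version performs 500,500 key comparisons, while the indexed version performs 1,000 lookups. At small scale nobody notices the difference, which is why this kind of behavior tends to surface only with unusually large users.
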
In September, Fivetran announced it had surpassed $300 million in annual recurring revenue, up from $200 million in 2023, although these figures have not been audited according to the rules of public companies.

The company says its aim is to help organizations move data securely and efficiently, supporting GenAI, real-time decision-making, and optimized business operations. Recent wins included UK-based retail group Kingfisher, which owns the B&Q and Screwfix brands.

Fivetran remains VC-funded. Its most recent funding round was in 2021, when it announced a Series D round of $565 million, valuing the company at $5.6 billion. At the same time, it used some of its startup capital to buy HVR, a data pipeline company that specializes in replicating data from commonly used mission-critical databases.

Despite its popularity, Fivetran has attracted criticism for being slow to support data lakes, especially those built on AWS S3 storage, support for which it launched last year. It has since introduced a managed data lakes service.

It promised the new service would remove the repetitive work of managing data lakes by automating and streamlining the process for clients. The service currently supports Amazon S3, Azure Data Lake Storage (ADLS), and Microsoft OneLake, with support for Google Cloud on the horizon.

Fraser explained that Fivetran could not support data lakes until table formats – particularly Apache Iceberg – had become sufficiently mature.

"It also took time for us to develop a good implementation," he said. "The key thing we needed was Iceberg, and then there was a bunch of work that we had to do downstream of that. That took a long time. It took a couple of tries, and two years of development."

Despite the significant engineering investment, Fraser said Fivetran was not desperate to raise more capital. "We haven't raised money in years: we are a pretty mature business and our cash flows are pretty predictable. Like a lot of people, after COVID, we rediscovered how to be efficient. We operate basically on a break-even cash flow basis."

Nonetheless, he said the long-term plan was to take the company public, just like data lake and analytics company Databricks, which started talking about its long-delayed IPO about four years ago.

Fraser said: "We will go public but I'm not sure exactly, I joke it will be six months after Databricks." ®

Source: theregister.com
