Alibaba Cloud finds log timestamps improve fault detection

Alibaba Cloud has revealed homebrew tech it used to improve server fault prediction and detection, and claims the tool beats comparable tech at spotting problems by ten percent.

The Chinese cloud champ's claims emerged last week in a paper presented at the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.

The document points out that reliability is a major selling point for public clouds, making failure prediction an important ability. Log files, the authors observe, contain plenty of info on "exceptions" to normal performance that indicate potential problems. The authors opine that tools which use logs to predict failures lean on machine learning and deep learning, while a more obvious indicator – timestamps – isn't paid the attention it is due.

Here's the thinking, in a nutshell:

Alibaba Cloud therefore created its own tool called Time-Aware Attention-Based Transformer (TAAT) to analyze timestamp info.

TAAT doesn't entirely ignore ML tools. Instead, it uses Bidirectional Encoder Representations from Transformers (BERT) – a language model developed by Google that represents text as vectors and has been used to predict server failures. The paper asserts, however, that BERT hasn't been tuned to make full use of log timestamps.

Alibaba's tool therefore relies on BERT for some failure analysis and compares that with TAAT's analysis of logfile timestamps. The paper contains a lot of math describing exactly how Alibaba analyzes log info, but the bottom line was apparently a ten percent improvement in fault predictions – and presumably slightly more reliable cloudy IaaS.
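To make the timestamp idea concrete, here is a minimal Python sketch of the kind of signal that log timing can carry: the gaps between successive log entries, and whether those gaps are shrinking. This is not TAAT and not the paper's method. The log lines, the ISO timestamp format, and the helper names `parse_timestamp`, `inter_arrival_seconds`, and `is_accelerating` are all made up for illustration.

```python
from datetime import datetime

# Hypothetical syslog-style lines: timestamps and messages are invented
# for illustration and do not come from Alibaba Cloud's dataset.
LOG_LINES = [
    "2024-08-01T10:00:00 kernel: EDAC MC0: corrected memory error",
    "2024-08-01T10:10:00 kernel: EDAC MC0: corrected memory error",
    "2024-08-01T10:12:00 kernel: EDAC MC0: corrected memory error",
    "2024-08-01T10:12:30 kernel: EDAC MC0: corrected memory error",
]


def parse_timestamp(line):
    """Read the leading ISO 8601 timestamp (assumed format) from a log line."""
    return datetime.fromisoformat(line.split(" ", 1)[0])


def inter_arrival_seconds(lines):
    """Return the gaps, in seconds, between consecutive log entries."""
    stamps = sorted(parse_timestamp(line) for line in lines)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


def is_accelerating(gaps):
    """True if each gap is shorter than the one before it (needs >= 2 gaps)."""
    return len(gaps) >= 2 and all(
        later < earlier for earlier, later in zip(gaps, gaps[1:])
    )


gaps = inter_arrival_seconds(LOG_LINES)
print(gaps)                   # [600.0, 120.0, 30.0]
print(is_accelerating(gaps))  # True
```

The message text in all four lines is identical, so a model that only reads the text sees the same event four times. The shrinking gaps are visible only in the timestamps, which is the sort of information the paper argues deserves more attention.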

Alibaba's boffins think TAAT's output is also useful because it doesn't need expert analysis – meaning folks familiar with cloudy crashes aren't needed to help as often. It's already in production at Alibaba Cloud.

TAAT appears not to be available for download. But Alibaba Cloud has posted a colossal dataset comprising "∼2.7 billion syslogs from ∼300,000 servers in a four-month period of the real productional system of Alibaba Cloud" to help researchers consider how to develop log sampling strategies of their own to inform future failure prediction efforts.

The authors have also posted a video outlining TAAT's operation. ®

Source: theregister.com
