2020 has been a topsy-turvy year for a whole lot of reasons.

But even then, an Intel graphics card?

Raja Koduri’s “baap of all” isn’t actually Intel’s first attempt at a desktop-grade discrete GPU.


Larrabee terrified the competition to the point that AMD and Nvidia briefly considered a merger.

Three years of development, an infamous IDF demo, and then… nothing.

Intel went back to integrating small, low-profile iGPUs with its processors.

What happened to Larrabee?

And why haven’t we seen a competitive Intel GPU in over a decade?

Intel Larrabee: What Was It?

A highly parallel, programmable architecture, capable of scaling to teraflops of performance.

Larrabee rendered graphics, but it wasn’t a GPU in the conventional sense.

A GPU, a CPU, or Something Else?

GPUs are very good at rendering graphics.

But the fixed-function nature of their shader cores made it difficult to run non-gaming workloads on them.

This meant that game technology innovations were often held in check by graphics hardware capabilities.

New DirectX or OpenGL graphics feature sets required entirely new hardware designs.

Tessellation, for instance, is a DirectX 11 feature that dynamically increases the geometric complexity of on-screen objects.

Unlike GPUs, CPUs feature programmable, general-purpose logic.

They can handle just about any kind of workload with ease.

This was meant to be Larrabee’s big trump card.

A general-purpose, programmable solution like Larrabee could do tessellation (or any other graphics workload) in software.

Larrabee lacked fixed-function hardware for rasterization, interpolation, and pixel blending.

In theory, this would incur a performance penalty, one that sheer compute power was supposed to offset.
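
To make that tradeoff concrete, here is a minimal software-rasterization sketch in C, purely illustrative and not Larrabee code: edge functions decide which pixels a triangle covers, and barycentric weights interpolate a per-vertex attribute across it, the kind of work a conventional GPU handles in dedicated silicon.

```c
#include <stdio.h>

/* Signed area of the parallelogram spanned by (b-a) and (c-a).
 * Its sign tells us which side of edge a->b the point c lies on. */
static float edge(float ax, float ay, float bx, float by, float cx, float cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}

int main(void) {
    /* One triangle in screen space, with a per-vertex attribute (e.g. a red channel). */
    float x0 = 2,  y0 = 1,  r0 = 1.0f;
    float x1 = 14, y1 = 3,  r1 = 0.0f;
    float x2 = 6,  y2 = 12, r2 = 0.5f;
    float area = edge(x0, y0, x1, y1, x2, y2);

    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            float px = x + 0.5f, py = y + 0.5f;
            /* Edge tests: the pixel is inside if it sits on the same side of all three edges. */
            float w0 = edge(x1, y1, x2, y2, px, py);
            float w1 = edge(x2, y2, x0, y0, px, py);
            float w2 = edge(x0, y0, x1, y1, px, py);
            if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
                /* Barycentric interpolation of the vertex attribute -- the
                 * "interpolation" step GPUs also do in fixed hardware. */
                float r = (w0 * r0 + w1 * r1 + w2 * r2) / area;
                putchar(r > 0.5f ? '#' : '+');
            } else {
                putchar('.');
            }
        }
        putchar('\n');
    }
    return 0;
}
```

A GPU repeats this for millions of triangles per frame in fixed-function silicon; a purely programmable design has to spend general-purpose instructions, and power, on every one of these steps.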

In practice, however, Larrabee failed to deliver.

Higher core-count configurations scaled poorly.

They struggled to match, let alone beat, AMD and Nvidia GPUs in conventional raster workloads.

What exactly led Intel down this technological dead-end?

By the mid-2000s, GPUs were gradually becoming more flexible.

ATI debuted a unified shader architecture with the Xbox 360’s Xenos GPU.

ATI’s Terascale and Nvidia’s Tesla (the GeForce 8 series) brought unified shaders to the PC space.

GPUs were getting better at generalized compute workloads, and this concerned Intel and other chip makers.

Were GPUs about to make CPUs redundant?

What could be done to stem the tide?

Many chipmakers looked to multi-core, simplified CPUs as a way forward.

The PlayStation 3’s Cell processor is the best-known outcome of this line of thought.

Cell certainly helped, but it clearly wasn’t fast enough to handle graphics rendering on its own.

Intel thought along similar lines with Larrabee.

Unlike Cell, Larrabee could scale to 24- or 32-core designs.

Intel believed that the raw amount of processing grunt would let Larrabee compete effectively with fixed-function GPU hardware.

Intel’s graphics philosophy wasn’t the deciding factor, though.

Designing a GPU from scratch is extremely complicated, time-consuming, and expensive.

An all-new GPU would take years to design and cost Intel several billion dollars.

Worse, there was no guarantee that it’d beat or even match upcoming Nvidia and AMD GPU designs.

Larrabee, in contrast, repurposed Intel’s existing Pentium MMX architecture, shrunk down to the 45nm process node.

Reusing a known quantity would also make it easier to set and monitor performance expectations.

Larrabee ended up burning a several-billion-dollar hole in Intel’s pocket.

However, cost-effectiveness was, ironically, one of its initial selling points.

Larrabee looked revolutionary on paper.

Why did it never take off?

What Went Wrong with Larrabee?

Larrabee was a great idea.

But execution matters just as much as innovation.

This is where Intel failed.

In retrospect, there were red flags right from the beginning.

Gaming wasn’t even mentioned as a use case in Larrabee’s initial announcement.

However, almost immediately after, Intel started talking about Larrabee’s gaming capabilities, setting expectations sky-high.

In 2007, Intel was several times larger than Nvidia and AMD put together.

Intel even bought a game studio, Offset Software, to show off what Larrabee could do.

The studio’s first game, Project Offset, was demoed in 2007 and showcased unprecedented visuals.

Unfortunately, nothing came out of the Offset Software purchase.

Intel shuttered the studio in 2010, around the time it put Larrabee on hold.

Intel’s gaming performance estimates ran counter to the hype.

A 1GHz Larrabee design, with somewhere between 8 and 25 cores, could run 2005’s F.E.A.R. at 1600x1200 and 60 FPS.

This wasn’t impressive, even by 2007 standards.

Who was Larrabee for?

What was it good at doing?

How did it stack up to the competition?

The lack of clarity on Intel’s part meant that none of these questions were ever answered.

Communication wasn’t the only issue, however.

As development was underway, Intel engineers discovered that Larrabee had serious architectural and design issues.

Per-core performance was a fraction of what Intel’s Core 2 parts delivered.

However, Larrabee was supposed to make up for this by scaling to 32 cores or more.

It was these big Larrabee implementations with 24 and 32 cores that Intel compared to Nvidia and AMD GPUs.

The problem here was getting the cores to talk to each other and work efficiently together.

Intel’s answer was a ring bus: a dual 512-bit interconnect with over 1 TB/s of bandwidth.

The more cores sitting on that ring, the further data has to travel, and the greater the delay.
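
As a rough back-of-the-envelope sketch, assuming an idealized bidirectional ring where every core-to-core hop costs the same fixed latency (an assumption for illustration, not a published Larrabee figure), the average distance a transfer travels climbs steadily as cores are added:

```c
#include <stdio.h>

/* Average hop count between two distinct cores on a bidirectional ring of n
 * stops, where traffic always takes the shorter way around. */
static double avg_hops(int n) {
    long total = 0;
    for (int d = 1; d < n; d++) {
        int hops = d < n - d ? d : n - d;   /* shorter direction */
        total += hops;
    }
    return (double)total / (n - 1);
}

int main(void) {
    int cores[] = { 8, 16, 24, 32 };
    for (int i = 0; i < 4; i++) {
        int n = cores[i];
        printf("%2d cores: average %.2f hops per core-to-core transfer\n",
               n, avg_hops(n));
    }
    return 0;
}
```

The average hop count grows roughly linearly with the number of ring stops, which is exactly the scaling pressure the 24- and 32-core configurations ran into.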

Caching can alleviate the issue, but only to a certain extent.

Unfortunately, the extra cache added complexity to the design and did little to alleviate the scaling issues.

IDF 2009 was supposed to be Larrabee’s watershed moment: in-development silicon running Quake 4 with ray-tracing enabled.

The Quake 4 ray-tracing demo wasn’t a conventional DirectX or OpenGL raster workload.

It was based on a software renderer that Intel had earlier showcased running on a Tigerton Xeon setup.
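
That distinction matters, because a ray tracer is mostly ordinary arithmetic-and-branching CPU code rather than a fixed-function raster pipeline. A minimal, purely illustrative ray-sphere intersection test (not Intel’s renderer) looks like this:

```c
#include <math.h>
#include <stdio.h>

/* Returns 1 and writes the hit distance to *t if the ray (origin o, unit
 * direction d) hits a sphere of radius r centred at c; returns 0 otherwise. */
static int ray_sphere(const float o[3], const float d[3],
                      const float c[3], float r, float *t) {
    float oc[3] = { o[0] - c[0], o[1] - c[1], o[2] - c[2] };
    float b  = oc[0] * d[0] + oc[1] * d[1] + oc[2] * d[2];      /* dot(oc, d) */
    float cc = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - r * r;
    float disc = b * b - cc;                                    /* discriminant */
    if (disc < 0.0f)
        return 0;                                               /* ray misses */
    *t = -b - sqrtf(disc);                                      /* nearest hit */
    return *t > 0.0f;
}

int main(void) {
    float origin[3] = { 0.0f, 0.0f, 0.0f };
    float dir[3]    = { 0.0f, 0.0f, 1.0f };        /* looking down +Z */
    float centre[3] = { 0.0f, 0.0f, 5.0f };
    float t;
    if (ray_sphere(origin, dir, centre, 1.0f, &t))
        printf("hit at distance %.2f\n", t);       /* prints 4.00 */
    return 0;
}
```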

The IDF 2009 demo showed that Larrabee could run a complex piece of CPU code fairly well.

But it did nothing to answer the questions hanging over Larrabee’s raster performance.

This was the end of Larrabee as a consumer graphics product, but the architecture found a second life as a many-core accelerator for high-performance computing.

In 2012, Intel announced its Xeon Phi coprocessor, designed to do exactly that.

Learning from Larrabee: Where Does Xe Go From Here?

Intel Xe is fundamentally different from Larrabee.

For starters, Intel now has a lot of experience building and supporting modern GPUs.

Xe builds on more than a decade of that experience, and it shows.

The Intel Xe-LP GPU in top-end Tiger Lake configurations matches or beats entry-level discrete GPUs from AMD and Nvidia.

Xe manages this even when it has to share a 28W power budget with four Tiger Lake CPU cores.

Inconsistent performance across games indicates that Intel’s driver stack still needs some work.

But by and large, Xe-LP holds its own against entry-level AMD and Nvidia offerings.

Efficient, low-power GPUs don’t always scale up into 4K gaming flagships.

Intel, however, has said its upcoming gaming-focused Xe parts are being designed to do just that, with hardware ray-tracing and full support for the rest of the DirectX 12 Ultimate feature set.

Intel has also talked about using multi-chip module (MCM) designs to scale performance on upcoming Xe parts.

However, the competition isn’t standing still.

Intel has lessons to learn here from Larrabee: clear communication and expectation management are critical.

Moreover, Intel needs to set realistic development timelines and stick to them.

A two-year wait could set Xe back a generation or more.

Is Xe going to usher in a new era of Intel graphics dominance?

Or will it go the way of Larrabee?