Microsoft is first to get HBM-juiced AMD CPUs

(nextplatform.com)

53 points | by rbanffy 5 days ago

61 comments

  • _bare_metal a day ago

    HBM or not, those latest server chips are crazy fast and efficient. You can probably condense 8 servers from just a few years ago into one latest-gen Epyc.

    I run BareMetalSavings.com[0], a toy for ballpark-estimating bare-metal/cloud savings, and the things you can do with just a few servers today are pretty crazy.

    [0]: https://www.BareMetalSavings.com
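
    To give a sense of the ballpark math involved (a minimal sketch with made-up prices, not the site's actual model):

        # Rough cloud-vs-colo comparison. All prices are illustrative
        # assumptions, not real quotes.
        CLOUD_PER_VCPU_HOUR = 0.04   # assumed on-demand $/vCPU-hour
        SERVER_PRICE = 12_000        # assumed 192-core server, bought outright
        AMORTIZE_MONTHS = 36         # write the server off over 3 years
        COLO_PER_MONTH = 350         # assumed rack space + power + bandwidth

        vcpus = 384                  # 192 cores x 2 threads
        cloud_monthly = vcpus * CLOUD_PER_VCPU_HOUR * 730   # ~730 h/month
        metal_monthly = SERVER_PRICE / AMORTIZE_MONTHS + COLO_PER_MONTH
        print(f"cloud ~${cloud_monthly:,.0f}/mo, bare metal ~${metal_monthly:,.0f}/mo")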

    • tame3902 a day ago

      Core counts have increased dramatically. The latest AMD server CPUs have up to 192 cores. The Zen1 top model had only 32 cores and that was already a lot compared to Intel. However, the power consumption has also increased: the current top model has a TDP of 500W.

      • Guzba a day ago

        Does absolute power consumption matter, or would it not be better to focus on per-core power consumption? E.g. running six 32-core CPUs seems unlikely to be better than one 192-core CPU.

        • tame3902 a day ago

          Yes, per-core power consumption (or better, performance per watt) is usually more relevant than total power consumption. And one high-core-count CPU is usually better than the same number of cores spread across multiple CPUs. (That is, unless you are trying to maximize memory bandwidth per watt.)

          What I wanted to get at is that the pure core count can be misleading if you care about power consumption. If you don't and just look at performance, the current CPU generations are monsters. But if you care about performance/Watt, the improvement isn't that large. The Zen1 CPU I was talking about had a TDP of 180 W. So you get 6x as many cores, but the power consumption increases by 2.7x.
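
          Back of the envelope (a sketch using TDP as a crude proxy for real power draw, and ignoring that a Zen5 core is far faster than a Zen1 core):

              # Cores per watt of TDP, Zen1 top model vs. current top model
              zen1_cores, zen1_tdp = 32, 180
              zen5_cores, zen5_tdp = 192, 500
              ratio = (zen5_cores / zen5_tdp) / (zen1_cores / zen1_tdp)
              print(f"{ratio:.1f}x more cores per TDP watt")  # ~2.2x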

          • Guzba a day ago

            Makes sense, thanks for the good reply.

    • 1oooqooq a day ago

      A graph showing this against cloud instance costs and AWS profits would be funny.

    • phodge a day ago

      That could be an interesting site when it's done, but I couldn't see where you factor in the price of electricity for running bare metal in a 24/7 climate-controlled environment, which I would expect is the biggest expense by far.

      • _bare_metal a day ago

        The first FAQ question addresses exactly that: colocation costs are added to every bare metal item (even storage drives).

        Note that this isn't intended for accounting, but for estimating, and it's good at that. If anything, it's more favorable to the cloud (e.g., no egress costs).

        If you're on the cloud right now and BMS shows you can save a lot of money, that's a good indicator to carefully research the subject.

  • whatever1 a day ago

    So currently our consumer-grade CPUs with DDR5 are limited to less than 100GB/s. Meanwhile Apple is shipping computers with multiples of that.
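
    The arithmetic behind that limit, assuming dual-channel DDR5-5600, a typical consumer configuration:

        # Peak theoretical bandwidth = transfer rate x bytes/transfer x channels
        transfers_per_s = 5600e6   # DDR5-5600
        bytes_per_transfer = 8     # one 64-bit channel
        channels = 2
        print(transfers_per_s * bytes_per_transfer * channels / 1e9, "GB/s")
        # 89.6 GB/s, vs. 400 GB/s on an M1 Max and 800 GB/s on an M1 Ultra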

    • mananaysiempre a day ago

      On the other hand, I bought an 8C/16T Zen 4 laptop with 64GB RAM and a 4TB SSD for less than $2000 total including tax. I’ll take that trade.

      • sroussey a day ago

        How are 70b LLMs running on that?

        • cma a day ago

          Qwen Coder 32B Instruct is the state of the art for local LLM coding and will run with a smallish context on a 64GB laptop with partial GPU offload. Probably around 0.8 tok/sec.

          With a quantized version you can run larger contexts and go a bit faster: 1.4 tok/sec at 8-bit quant with offload to a 6GB laptop GPU.

          Speculative decoding has been landing in lots of runtimes recently and can give a 20-30% boost, with a 1-billion-weight model running the speculative token stream.
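
          As a sanity check on those numbers: decoding is roughly memory-bandwidth-bound, so a crude ceiling is bandwidth divided by the bytes of weights read per token (the bandwidth figure below is an assumed blend for a CPU plus a 6GB GPU):

              # Decode ceiling ~= effective bandwidth / bytes read per token
              params = 32e9              # Qwen 32B
              bytes_per_param = 2        # fp16; ~1 with an 8-bit quant
              effective_bw = 50e9        # assumed blended CPU+GPU bytes/sec
              tok_per_s = effective_bw / (params * bytes_per_param)
              print(f"~{tok_per_s:.1f} tok/s")  # ~0.8, matching the above
              # An 8-bit quant halves the bytes per token (~1.6 tok/s ceiling),
              # and speculative decoding adds roughly another 20-30%.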

        • jocaal a day ago

          The free version of ChatGPT is better than your 70B LLM, so what's the point?

      • YetAnotherNick a day ago

        Why do you need 64GB RAM?

        • mananaysiempre a day ago

          Partly because I can: unless you go absolutely wild with excess, it’s the RAM equivalent of fuck-you money. (Note it’s unified, though, so in some situations a desktop with 48GB main RAM and 16GB VRAM can be comparable, and from what I know about today’s desktops that would be a good machine but not a lavish one.) Partly because I need to do exploratory statistics on, say, ten- or twenty-gigabyte I/O traces, and being able to chuck the whole thing into Pandas and not agonize over cleaning up every temporary is just comfy.

        • shakabrah a day ago

          I have 128GB in my PC (largely because I can), and Android Studio, a few containers, and running emulators will take a sizable bite out of that. My 18GB MacBook would be digging into swap and compressing memory to get there.

        • scheme271 a day ago

          Memory can get eaten up pretty quickly between IDEs, containers, and other dev tools. I have had the combination of a fairly small C++ application, CLion, and a container use up more than 32GB alongside my typical applications.

          • evoke4908 20 hours ago

            I just built a new PC with 64GB for just this reason. With my workloads, the 32GB in my work laptop is getting cramped. For an extra $150 I can double that and not worry about memory for the next several years.

        • jchw a day ago

          Running 128 GiB of RAM on the box I am typing on. I could list a lot of things, but if you really want a quick demonstration, compiling Chromium will happily eat 128 GiB of RAM.

        • whatever1 a day ago

          Nobody ever regretted having extra memory on their computer.

        • criticalfault 16 hours ago

          Maybe because of Electron apps.

        • theandrewbailey a day ago

          Several Electron apps and 1000+ Chrome tabs. (just guessing)

    • oDot a day ago

      Strix Halo is rumored to have about twice that bandwidth, but unfortunately still nowhere near Apple's.

  • sroussey a day ago

    This is very unlikely, but it would be interesting if Apple included HBM interfaces in the Max series of Apple Silicon, to be used in the Mac Pro (and maybe the Studio, but the Pro needs some more differentiation, like HBM or a NUMA layout).

    • throwaway48476 a day ago

      They'd have to redesign the on-die memory controller and tape out a new die, all of which is expensive. Apple is a consumer technology company, not a cutting-edge tech company making high-cost products for niche markets. There's just no way to make HBM work in the consumer space at its current price.

      • sroussey a day ago

        Well, they could put in a memory controller for both DDR5 and HBM on the die, so they would only have one die to tape out.

        The Max variant is something they are using in their own datacenters. It's possible they would use HBM solely for themselves, but it would be cheaper overall if they shipped the same thing in workstations.

        • nsteel a day ago

          HBM has a very wide, relatively slow interface. An HBM PHY is physically large and takes up a lot of beachfront (die-edge area), a massive waste of area (money) if you're not going to use it. It also (currently) requires you to use a silicon interposer, another huge extra expense in your design.

          • sroussey a day ago

            > An HBM PHY is physically large and takes up a lot of beachfront (die-edge area), a massive waste of area (money) if you're not going to use it.

            The M3 Max dropped the area for the interposer to connect two chips, and there was no resulting Ultra chip.

            But the M1 Max and M2 Max both included it.

            I have yet to see an X-ray of the M4 Max to see whether they've built in support for combining two dies, set aside area for HBM, or done anything exotic, but they have done it before.

            Could you recognize HBM support in an X-ray?

            As for the Ultra, the M1-based one already had 2.5 TB/s of interprocessor bandwidth years ago, so I hope they would step that up a notch.

            I don’t put much stock in the idea of a 4- or 8-way hydra. I think HBM would be more useful, but I’m just a rando on the interwebs.

          • sroussey a day ago

            > It also (currently) requires you to use a silicon interposer, another huge extra expense in your design.

            Guess what the Ultra chips use? That’s right, a silicon interposer. :)

            • nsteel 9 hours ago

              OK, but that's an ultra expensive chip, pretty much by definition. The suggestion was to burden other products with that big expense, and that doesn't make sense to me.

              • sroussey 5 hours ago

                I guess I was pointing out that they already do that with the UltraFusion interconnect that’s on the Max chip found in notebooks but never used there.

                But the more I think about it, the more I bet they are creating a native Ultra chip that is not a combo of two Max chips.

                I bet the Ultra will have the interconnect so you can put two together and get the often rumored Extreme chip.

                They will have enough volume from their own datacenters that the Mac Studio and Mac Pro will simply be consumer beneficiaries.

                It makes more sense in this framing to put HBM on these chips. And no DDR5.

                In this case, the M4 Max has neither HBM nor the interconnect. I’d love to see someone de-lid and get an X-ray die shot.

      • 7e a day ago

        The Mac Pro is not a consumer device. It is very much a high-cost, niche (professional) product.

        • throwaway48476 a day ago

          It may be priced like one but the technology inside it isn't.

  • Tepix a day ago

    People are buying dual-socket Epyc Zen5 systems to get the bandwidth of 24 DDR5-6000 memory channels for inferencing large LLMs on CPU. Clearly there is a demand for very fast memory.
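
    The appeal in numbers (peak theoretical bandwidth, ignoring NUMA effects):

        # Dual-socket Zen5 Epyc: 12 channels of DDR5-6000 per socket
        channels = 24
        peak_gb_s = channels * 6000e6 * 8 / 1e9   # 8 bytes per transfer
        print(peak_gb_s, "GB/s")                  # 1152 GB/s across both sockets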

    • bobim a day ago

      Sure; implicit finite element analysis scales up to about two cores per DDR4 channel. Core density just grew faster than bandwidth, which makes all those high-core-count CPUs a waste for these kinds of workloads.
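
      Applying that rule of thumb to a current part (a sketch; the two-cores-per-channel figure is the DDR4 heuristic above, so treat it as rough for DDR5):

          # Cores an implicit FEA solver can keep fed, per the rule of thumb
          channels_per_socket = 12   # current-gen Epyc
          cores_per_channel = 2      # heuristic from the comment above
          useful = channels_per_socket * cores_per_channel
          print(f"~{useful} of up to 192 cores usefully busy per socket")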

  • AnotherGoodName a day ago

    I’m having trouble parsing the article even though I know full well what the MI300 is and what HBM is.

    I’m not alone, right? This article seems to be complete AI nonsense, at various points confusing the GPU and CPU portions of the product and never making clear which parts of the product have HBM.

    • kristianp a day ago

      I agree, the article lacks clarity, jumping between three different AMD models and an Intel one. I'd suggest its flaws hint at a human writer more than an AI.

  • gessha a day ago

    Does it make sense to put HBM on mobile computing devices like laptops and smartphones?

    • nsteel a day ago

      It would be very expensive. Is there really a consumer market for large amounts (tens of GBs) of RAM with super high (800+GB/s) bandwidth? I guess you'll say AI applications, but doing that amount of work on a mobile device seems mad.

      • gessha a day ago

        Yeah, I feel similarly about the development of NPUs. I guess it might be useful if we find more (maybe non-AI) uses for high-bandwidth memory at the edge rather than in centralized servers.

  • pixelpoet a day ago

    This website, with its spend-128-hours-disabling-1024-separate-cookies-and-vendors consent flow, is pure cancer. I wish HN would just ban all these disgusting data-hoovering leeches.

    By now I get that no one else cares and I should just stop coming here.

    • AnotherGoodName a day ago

      That entire site is 100% AI-generated click farming. The fact that the top comments here are not even talking about the content of the article, but instead about more general ‘HBM is great’ topics, worries me.

      For anyone who read the article: which product has HBM attached? The CPU or the GPU? What is the name of this product?

      There’s literally nothing specific in here, and the article is rambling AI nonsense. The whole site is a machine gun of such articles.

      • pixelpoet a day ago

        1000% agreed, especially the silent carrying on with the topic without acknowledging that the actual link is cancer. I know e.g. dang is a real person who works hard to make HN decent, but wtf is this? Are we just going to accept Daily Mail links too?

        I weep for the internet we had as children.

      • mitjam a day ago

        These kinds of articles are like denial-of-service attacks on human attention. If I read just a few of those, I would be confused for the rest of the day.

      • imtringued 16 hours ago

        Epyc 9V64H

    • scouw a day ago

      It's far from an ideal solution, but I've started to just have JS disabled by default in uBlock Origin and then enable it manually on a per-site basis. Bit of a hassle, but many sites, including this one, render just fine without JS and are arguably better without it.
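
      For reference, the equivalent in uBlock Origin's "My rules" pane looks something like this (the no-scripting switch is dynamic-filtering syntax; double-check against the uBlock docs):

          no-scripting: * true
          no-scripting: news.ycombinator.com false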

    • akimbostrawman 14 hours ago

      That describes almost all news sites even if most of them don't make it that obvious. Just open them in web.archive.org and avoid all of that.

    • zelphirkalt a day ago

      Nope, you are not alone. Without uBlock Origin I wouldn't go much anywhere these days.

    • codr7 a day ago

      Agreed, I wouldn't mind banning paywall crap while we're at it.

  • alias_neo a day ago

    Another article source that uses an initialism in its headline, "HBM"[0] in this case, and almost 30 times in the body at that, yet doesn't spell out what it stands for even once. I will point this out every time I see it, and continue to refuse to read from places that don't follow this simple etiquette.

    Be better.

    [0] High Bandwidth Memory

    • dang a day ago

      "Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead."

      https://news.ycombinator.com/newsguidelines.html

      • alias_neo 14 hours ago

        While I understand, I can't find something interesting to respond to in this case because I refused to read it.

        I'd argue that I also provided value by resolving my own complaint: spelling out what it stands for, for those who might not know.

        • dang 3 hours ago

          I hear you and agree there's benefit in that; it's just that the cost (what it does to the thread) is a lot larger than the benefit.

      • Dylan16807 a day ago

        I don't feel like that rule works here? If you cut out part of the second sentence to get "Find something interesting to respond to", that's a good point, but the full context is "instead [of the most provocative thing in the article]" and that doesn't fit a complaint about acronyms.

        • dang a day ago

          To paraphrase McLuhan, you don't like that guideline? We got others:

          "Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."

          The point, in any case, is to avoid off-topic indignation about tangential things, even annoying ones.

    • sroussey a day ago

      They don't define HPC either, but I think the audience of this site knows these acronyms.

      • alias_neo 14 hours ago

        There were two reasons it was drilled into me in engineering school. The first is that it provides context and avoids doubt about the topic, particularly when there are so many overlapping initialisms these days, often in the same space.

        The second is that you should never make assumptions about the audience of your writing and their understanding of the topic; provide any and all information that might be pertinent for a non-subject-matter specialist to understand it, or at least to find the information they need.

    • switchbak a day ago

      > Be better

      They're almost certainly not on this forum, and they're not reading your post. So who is that quip directed at?

      • alias_neo 14 hours ago

        > They're almost certainly not on this forum, and they're not reading your post

        I don't know much about the site in the OP, but I work on the assumption that almost anyone could be reading comments on links to their site on this forum.

        It's directed at them, you and even myself.

      • rcthompson a day ago

        Presumably it's directed at anyone writing an article for public consumption.