A question for the author(s) since they seem to be very responsive to this thread :).
1. How fine-grained is each task? In a traditional matrix multiplication kernel, for example, each thread block is responsible for a small output tile of the resulting matrix. In Mirage's mega kernel, would there correspondingly be a task for each small output tile?
2. How does the Mirage compiler form the task graph? Does it have domain knowledge of every operator's data flow at the granularity of individual elements? Again taking matmul as an example: a given output tile requires the corresponding M_BLOCK rows of the A matrix. If the A matrix was itself an output of a prior matmul (+ nonlinearity), the dependees would be all of the output tile tasks corresponding to those M_BLOCK rows of the operator that produced A?
1. In MPK, each task is mapped to an individual SM. The amount of work handled by a task is similar to that of a thread block in the traditional kernel-per-operator approach.
2. TL;DR: MPK automatically analyzes inter-task dependencies by tracking the input and output tensors associated with each task. Longer version: MPK uses imap, omap, and fmap (see Section 2 of the Mirage paper) to determine each task’s input and output tensors. A dependency is introduced between task A and task B if A produces any tensor elements that B consumes, that is, if A's outputs overlap with B's inputs.
> Again taking matmul as an example: a given output tile requires the corresponding M_BLOCK rows of the A matrix. If the A matrix was itself an output of a prior matmul (+ nonlinearity), the dependees would be all of the output tile tasks corresponding to those M_BLOCK rows of the operator that produced A?
Exactly. In this case, all output tile tasks that consume those M_BLOCK rows of A will depend on all tasks responsible for producing the corresponding parts of A in the previous operator.
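To make the overlap rule concrete, here is a rough sketch of how such a dependency analysis could look. This is not MPK's actual implementation: the `Task` class, the rectangular `(tensor, row_lo, row_hi, col_lo, col_hi)` regions, and the helper names are all made up for illustration, and the explicit regions stand in for what imap/omap/fmap would describe. It assumes two chained 4x4 matmuls (A = X @ W1, then C = A @ W2) with 2x2 output tiles; an elementwise nonlinearity fused into the first matmul would not change the regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; regions are half-open rectangles
    # (tensor_name, row_lo, row_hi, col_lo, col_hi).
    name: str
    reads: tuple
    writes: tuple


def overlaps(a, b):
    """True if two regions touch the same tensor and their rectangles intersect."""
    return (a[0] == b[0]
            and a[1] < b[2] and b[1] < a[2]    # row ranges intersect
            and a[3] < b[4] and b[3] < a[4])   # column ranges intersect


def tiled_matmul_tasks(op, lhs, rhs, out, m, k, n, tile):
    """One task per output tile of out = lhs @ rhs (lhs is m x k, rhs is k x n)."""
    tasks = []
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            tasks.append(Task(
                name=f"{op}[{i // tile},{j // tile}]",
                # An output tile needs a full row band of lhs and a full column band of rhs.
                reads=((lhs, i, i + tile, 0, k), (rhs, 0, k, j, j + tile)),
                writes=((out, i, i + tile, j, j + tile),),
            ))
    return tasks


def build_deps(tasks):
    """consumer name -> list of producer names whose writes overlap the consumer's reads."""
    deps = {t.name: [] for t in tasks}
    for consumer in tasks:
        for producer in tasks:
            if producer is consumer:
                continue
            if any(overlaps(w, r) for w in producer.writes for r in consumer.reads):
                deps[consumer.name].append(producer.name)
    return deps


tasks = (tiled_matmul_tasks("mm1", "X", "W1", "A", 4, 4, 4, 2)
         + tiled_matmul_tasks("mm2", "A", "W2", "C", 4, 4, 4, 2))
deps = build_deps(tasks)
print(deps["mm2[0,1]"])  # the mm1 tiles covering rows 0..1 of A
```

Every `mm2` tile in row band 0 ends up depending on both `mm1` tiles in row band 0 and on nothing in row band 1, which is the pattern described above. The all-pairs loop is quadratic and only meant to show the rule; a real compiler would presumably index tasks by tensor first.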
AI agents excel in tasks like automating workflows, handling complex decision trees, and managing multi-step processes across APIs. For example, an agent can monitor a sales pipeline, send follow-ups, update CRMs, or manage logistics autonomously. See our demo here: https://github.com/picahq/onetool-demo.
I believe this post is referring to device-scoped memory barriers - also sometimes called fences - as opposed to execution barriers.
The former being a mechanism to ensure memory accesses follow a well defined order (e.g. it'd be bad if the memory accesses executed inside a critical section could be reordered before or after the lock and unlock calls).
The latter being a mechanism that ensures all threads (within some scope, perhaps all threads running on the "device") reach the same point in the program before any are allowed to proceed.
That's correct, it's the memory scope that I expect to be device-scoped. GPUs tend not to have execution barriers in the shader language beyond workgroup scope; generally the next coarser granularity for synchronization is a separate dispatch. However, single-pass prefix sum algorithms, including decoupled look-back, can function just fine with device-scoped memory barriers, and do not require execution barriers with coarser scope than workgroup.
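For anyone who hasn't seen decoupled look-back, here is a small CPU-side simulation of the idea, using Python threads in place of workgroups. It is only a sketch under loud assumptions: the function name and the descriptor layout are mine, and replacing a whole `(flag, value)` tuple in one list store stands in for "write the value, issue a device-scoped fence, then write the flag" on a GPU. The point it illustrates is that partitions never meet at a common barrier; each one only spins on flags published by its predecessors.

```python
import threading
import time

# Status flags used by decoupled look-back:
# X = nothing published yet, A = partition aggregate only, P = inclusive prefix.
X, A, P = 0, 1, 2


def single_pass_scan(data, tile):
    """Inclusive prefix sum with one thread per partition and no global barrier."""
    n_tiles = (len(data) + tile - 1) // tile
    # One descriptor per partition. Storing a fresh tuple is a single list
    # assignment, which plays the role of the fenced flag/value publish.
    desc = [(X, 0)] * n_tiles
    out = [0] * len(data)

    def worker(t):
        lo, hi = t * tile, min((t + 1) * tile, len(data))
        aggregate = sum(data[lo:hi])
        # Partition 0 already knows its inclusive prefix; the others publish
        # their aggregate so that successors can make progress.
        desc[t] = (P, aggregate) if t == 0 else (A, aggregate)

        # Look back over predecessors until one with a full prefix is found.
        exclusive = 0
        p = t - 1
        while p >= 0:
            flag, value = desc[p]
            if flag == X:
                time.sleep(0)  # predecessor not ready: yield and retry
                continue
            exclusive += value
            if flag == P:
                break
            p -= 1

        desc[t] = (P, exclusive + aggregate)

        # Local scan of this partition, offset by the exclusive prefix.
        running = exclusive
        for i in range(lo, hi):
            running += data[i]
            out[i] = running

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_tiles)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # host-side wait for completion, not a barrier between partitions
    return out


print(single_pass_scan([1, 2, 3, 4, 5, 6, 7], tile=3))
```

On real hardware the publish and the look-back read need the ordering guarantees discussed above, and forward progress depends on partitions being claimed in order, neither of which this simulation has to worry about.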
The post also mentions unspecified behavior (mixing atomic and non-atomic memory accesses) where everybody has to cross their fingers and hope that the hardware designers had the same idea about how it should work. Which is almost fine with enough test coverage, but a shader translation layer adds uncomfortable complexity on top of it.
The author states it wasn't actually work-life balance that was making him unhappy and tired. Rather, he discovered:
> By mid-2021 I was tired all the time. I know I wasn’t alone, because it was an ongoing meme inside Google. It’s only now that I realize what was wrong: I missed the satisfaction of building things and finishing projects.
This is the classic justification that leads people to self-defeating workaholism: The idea that you can fill the voids in your life by just working harder.
The false dichotomy is the idea that the alternative to Google is to work more hours + evenings + weekends at a startup. He's replacing one problem with another, but this new problem feels fresh and new and like turning over a new leaf. At least for now.
I get what he’s saying though. There can be great joy and a positive feeling of “losing yourself” in your work when you actually get to create. I think his role and the internal bureaucracy prevented him from using that creative energy.
I don’t think it makes you a workaholic to observe that a shit work environment drains your energy and burns you out, whereas a good one can leave you feeling energized.
They weren’t saying that they needed to work harder at Google to be happy, they were saying they needed to move somewhere else where they could get job satisfaction from completing projects.
I mean, I'm the same way. I look back at my life and the times I didn't create things of value seem so meaningless. I don't want to go back to creating meaningless things. Even if I'm working harder, I'm enjoying what I'm doing.
This is the same for any specialized software engineering role. Compilers, GPGPU, embedded systems, computer graphics, image processing, etc. In an interview panel for any of these roles, you will be expected to be a competent software engineer and have domain knowledge about the sub-field.
> The players are traded freely among teams
Yes, this happens more often than in the past, but there are still many players and staff that stay with an organization for an extended period of time (e.g. Tom Brady and Bill Belichick).
> all teams are owned by the same business
What do you mean? Each team has a different owner.
> So for example the San Francisco 49ers have very little to do with San Francisco, except for the name
In many geographical regions, you grow up watching the team that is closest in proximity. e.g. I grew up in rural western New York State and I grew up watching the Buffalo Bills because they were the closest franchise despite being 100 miles away.
The name is just semantics. Would you be happier if the 49ers were called the "Northern California 49ers"?
There is an American baseball team that started life as the Florida Marlins. The thinking was that they would establish this tribe mentality for all of Florida. After years of not-great results in that regard, they changed their name to the Miami Marlins for the same reason: to better establish a tribe. They haven’t moved locations; they are just re-targeting their brand.
My impression was that the iTunes stunt was not received well because they had already lost popularity. The iTunes stunt didn't help, but I doubt it materially hurt their popularity either.
To me, it exposed U2 to a bunch of people that had never heard of them, in a very bad light. Killing off any chance of new fans. Not that they would have grown in popularity, but that the trail off got much steeper.
I agree. Other things are going on. Rock has long been in decline, the social messages and trends U2 focused on 30-40 years ago have passed, and U2 is no longer breaking new musical ground like they were from the late 70s until Pop.
As a lifelong rock fan, it's really been sad to live through the long slow inexorable decline of the genre. The vast majority of current popular music doesn't appeal to me at all, which simply wasn't true even as recently as the 90s.
I think the machine that leveraged interest and controlled attention for so long through radio, magazines, videos, and festivals has basically failed, and young people have turned away from guitar-based rock.
My teenaged kids have no interest in what I listen to, with the possible exception of Queen, and I think that has more to do with the biopic than organic discovery.
I won't argue that young people are turning away from guitar-based rock, but I'm not seeing the connection between that and the changes in the popular music machine. Are you implying that people would not have liked guitar-based rock this entire time if not for said machine? That the natural state of things, which we're now reverting to, is to prefer other types of music? I think it's more just that tastes change and go through cycles, and guitar-based rock is on the decline (and might or might not recover; there are so many dead dead dead musical genres and instruments out there).