Will Squires is the founder of Macrocosmos. He and his co-founder, Steffen Cruz, have built a 24-person team that has become a powerhouse within the Bittensor ecosystem. Starting out at the OpenTensor Foundation, the pair earned their stripes early, helping pioneer some of Bittensor’s biggest technical milestones and collecting more battle scars than just about anyone in the Subnet ecosystem along the way.

In this interview, we focus on Will’s main project, IOTA (Subnet 9). The team is tackling one of the hardest problems in AI today — distributed training at scale — and the opportunity is enormous. It’s one of the most ambitious deep-tech efforts happening not just in Bittensor, but across the broader crypto-AI space.

We’ve known Will for nearly two years, and it’s hard to find someone more knowledgeable or driven. His energy comes through in every part of this conversation.

We hope you enjoy it.

Oh, and if you’re looking for a great Bittensor-native wallet, check out Crucible Labs. It’s got full Ledger support, makes it easy to buy and manage your Subnet positions, and even lets you auto-stake your TAO yield into top Subnets for effortless exposure.

Macrocosmos is one of the most veteran and experienced teams in Bittensor. What is it that your team does better than anyone else?

I think we’ve mastered a lot of the art of building robust, well-designed incentives that scale well. We’ve also gone through more trial and error than anyone, and I say that with a positive attitude. We’ve been exploited left, right and centre, and know very well how to build systems that are defensible and give you what you need from your miners. It’s not exactly an intuitive skill that people can simply arrive in the ecosystem with. We see quite a few builders express their ideas through suboptimal incentive mechanism designs, whereas game-theoretic AI is something of an obsession for us. We spend a lot of time discussing and debating the best ways to get the most out of our Subnets, and we’ve learned to red-team everything we build. We’re also very scientific in our approach: we run lots of experiments and have sophisticated monitoring, benchmarking and testing infrastructure. It’s a big part of our culture, which means that when we say something, we stand behind it.

Macrocosmos started out as a broad decentralized AI research lab, but your recent work with IOTA (Subnet 9) signals a sharper focus on decentralized training. What made you double down here?

Our first year was about validating and prototyping core ideas across the AI stack, from data collection and indexing to pre- and post-training and agentic inference. In our journey, we learned where strategic advantages (cost, speed, scale) could be created by leveraging Bittensor. We’ve also learned to iterate through ideas quickly and discard the ones which don’t work, because this ecosystem moves lightning fast.

In the last 6 months we’ve really been trying to evolve from experimenting with what Bittensor CAN do (proof of concept) to what Bittensor SHOULD do (where its unique and powerful qualities represent real-world scaling potential and competitive advantage).

Prior to IOTA, we spent around a year refining a pretraining approach on SN9 which was based on model training competitions. We trained some respectable models; in fact, some of the earliest decentralized 7B and 14B models ever made were ours. However, we were eventually confronted by a rather stark reality: our design had reached full maturity but fell short of our ambitious goals to compete with centralized labs. There was an economic barrier to entry, along with other inefficiencies, which stood between us and our goal of training the first large decentralized models. In other words, the scaling laws that govern model training were not in our favour.

When we re-conceived IOTA, we tried to think two or three steps ahead. How could we create a truly collaborative system for training? How could we create a system that aligned its strengths with Bittensor’s (the ability to organise thousands of nodes and co-train a model) and gave us a path to overcome the scaling laws? Working on this has been challenging, but really rewarding. It feels like we’ve come back to our roots, armed to the teeth with more mature ideas and a much deeper understanding of what is technically possible. Our other Subnets (1 & 13) are now oriented towards supporting our pretraining efforts so there is much greater cohesion and focus in our work.

If we project IOTA forward, we end up with a system on which we could actually train a frontier model, one that supplies compute for the world’s most valuable task at a fraction of the cost.

How do you picture the end state of IOTA? Is it as a Bitcoin for training, operating autonomously, or as something closer to Together AI, where clients work directly with Macrocosmos to train models on the Subnet?

Our current north star for IOTA is to make our system indistinguishable, for training purposes, from centralised alternatives. That means just as fast (hence our focus on throughput and algorithmic design), just as large (hence our push to scale the network from 256 nodes to something much, much larger - more on that very soon), and just as stable, which is why we have a lot of research ongoing into numerical instability.

Ultimately, we should just be able to monetise training FLOPS. This could be done by the network itself, combining intelligence from model designers, data from other systems, and training compute from IOTA to create phenomenal models; or we could lease the network and swarm to other participants to train their own models. We are doing a lot of work on making the model weights inherently inconstructible to help unlock this, as privacy preservation is critical for enterprise customers.

We want to be the first and only distributed compute layer for model training that is just as good as centralised infrastructure, unlocking the path for organisations, whether decentralised or otherwise, to train brilliant models cheaply with distributed resources.

There are a handful of other strong crypto-AI teams pursuing decentralized training, teams we both know and respect. So this isn’t shade, but what advantage do you have by building on Bittensor that they might not?

Firstly, none of these teams have a live mechanism or a live token. Coming to Bittensor, you are forced to comprehend both from Day One, and we have more war wounds than anyone to learn from. We view ourselves as experts in game-theoretic AI, and these battle scars help us to build better, more defensible, and more performant systems.

Secondly, the miner community on Bittensor is second to none. Most competitor teams have effectively done permissioned, friends-and-family runs - this limits the available compute at max scale and means, in effect, hand-offs between friends. Without unlocking the permissionless scaling that Bittensor provides, you cannot achieve the upsides that make decentralised training economically competitive, which is a critical proof point in the whole thesis. Our vision is that anyone, anywhere, can train at any time. This means the system must be adversarially resistant - if you don’t solve this issue there will always be a ceiling and you’ll always be fighting gravity.

Finally, working on Bittensor can feel like being part of a group of investors and other teams, something like a start-up incubator. We get live feedback on our design, our systems and our metrics; we have advisors with deep experience in AI, like Jacob Steeves, supporting us; and we have comrades-in-arms like Templar. We have our own bubble of talent to help us succeed.

What’s the current state of IOTA? Are we in a proof-of-concept stage or are we training AGI yet?

IOTA is currently running in production. We’re iterating relentlessly to drive the speed of the network up. We now believe we have the world’s fastest implementation of pipeline parallelism (speed being the crucial metric); we are the only team in the world that has built a ground-up orchestration layer and scheduling system for training multiple models at once (all other teams build on Hivemind, an open-source framework); and we have a huge release coming this month that we think will excite the community and really change the perception of decentralised training in the market.

Who’s actually mining and training models on IOTA? Are we talking about individuals with a few GPUs, small-scale research labs, or full data centers joining the network?

It’s a diverse group. We have professional Bittensor mining teams with relationships with the large-scale data centre providers funding the UK Sovereign AI Stargate programme, we have nameless crypto pirates in shades, we have AI engineers moonlighting in their spare time, and we have dedicated experts. Compute and talent come from all places, and it’s what makes IOTA so strong.

What was the key technical insight or breakthrough that made IOTA’s methodology of decentralized training possible? What specific problem did you solve that others hadn't?

Current SOTA methods in decentralised pretraining use a methodology called data parallelism, where each node hosts a full copy of the model weights. This means that as model size scales, the requirement on any individual node scales with it, to the point where it becomes just as expensive as centralised training. Our research into compression allowed us to create a novel bottleneck architecture that lets us “fracture” the model, splitting it up into much smaller blocks while reducing the amount of data that must be communicated between them by orders of magnitude. For participants, this means the hardware requirements and the economic barrier to entry are low and independent of model size. In other words, our system is able to perform global cost arbitrage in order to train large models cheaply. We also developed a suite of techniques which make the system adversarially robust, a key factor for hitting critical scale thresholds.
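To make the contrast concrete, here is a minimal sketch, not the IOTA codebase, of the general idea described above: a model split into small pipeline stages, with a learned compression bottleneck so that only a narrow activation tensor crosses each stage boundary. All names, layer shapes and dimensions are hypothetical illustrations, not Macrocosmos’s actual architecture.

```python
# Minimal sketch (hypothetical, not the IOTA implementation): a model
# "fractured" into small stages, with a learned bottleneck so only a
# low-dimensional tensor needs to cross each stage boundary.

import torch
import torch.nn as nn

HIDDEN = 1024        # full model width (illustrative)
BOTTLENECK = 64      # compressed width sent between stages (illustrative)

class Stage(nn.Module):
    """One block of the fractured model, small enough for a single node."""
    def __init__(self, first: bool = False, last: bool = False):
        super().__init__()
        # Expand the compressed wire format back to full width,
        # unless this is the first stage receiving raw inputs.
        self.expand = nn.Identity() if first else nn.Linear(BOTTLENECK, HIDDEN)
        self.body = nn.Sequential(
            nn.Linear(HIDDEN, HIDDEN), nn.GELU(), nn.Linear(HIDDEN, HIDDEN)
        )
        # Compress before handing activations to the next stage,
        # unless this is the last stage producing the output.
        self.compress = nn.Identity() if last else nn.Linear(HIDDEN, BOTTLENECK)

    def forward(self, x):
        return self.compress(self.body(self.expand(x)))

# In a real deployment each stage would live on a different miner and the
# tensors below would travel over the network; here they share one process.
stages = [Stage(first=True)] + [Stage() for _ in range(2)] + [Stage(last=True)]

x = torch.randn(8, HIDDEN)   # a dummy micro-batch
for stage in stages:
    x = stage(x)             # only a BOTTLENECK-wide tensor crosses each
                             # boundary (16x smaller than HIDDEN here)
loss = x.pow(2).mean()       # stand-in objective
loss.backward()              # gradients flow back through the chain
```

The point of the sketch is the scaling behaviour: each participant only needs enough hardware for one small block, and the inter-node traffic is set by the bottleneck width rather than the full model width, which is what keeps the barrier to entry independent of model size.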

The only other team working in this space is Pluralis Research, who we have immense respect for, but we believe our orchestration system, among other things, gives us an edge, and building on Bittensor on top of that means we will win.

What broader market or technological tailwinds point to decentralized training becoming increasingly necessary and ultimately winning out over centralized approaches?

You can’t look anywhere in the AI sector without looking at compute build-out, the insane GPU deals, or the CAPEX associated with them. This reflects two key theses of ours: one, that compute will continue to be constrained throughout the 2020s; and two, that the economic barrier to participating in the AI race will increase as long as that CAPEX is necessary. Keith Rush and Arthur Douillard of DeepMind are working on distributed training because they believe it will become a critical sustaining innovation, allowing Google to keep training bigger and better models.

We view it as a disruptive innovation: while many initially thought it impossible, it will come to be seen as inevitable. Analogous to cloud computing, we think that in the 2030s many, many more models will be trained in a fully distributed way.

Looking for a Bittensor-native wallet? Visit Crucible Labs

This content is provided for informational purposes only and does not constitute investment advice or a recommendation to buy or sell any security. Unsupervised Capital holds a position in TAO and may hold positions in the subnet tokens or other digital assets discussed herein and may buy, sell, or change positions at any time. Past performance is not indicative of future results. Digital assets involve substantial risk, including potential total loss of capital. Consult your own advisers regarding any investment decisions.
