Open Sourcing Science

There was a time when I wanted to dedicate my life to Science. The process of publishing my work was the main reason I changed my path. Let's see what is wrong with it and how to fix it.

Aug 22, 2023

Hello, Hypers!

When I was finishing my major in Computer Science, I intended to be a scientist. Yes, like those who write papers and go to conferences. I wanted to pursue a Ph.D. and dedicate my life to pushing the limits of knowledge.

Right now, I’m a Software Developer who has never published a paper in a journal. What happened? I lost the desire to dedicate my life to Science. Among the factors that demotivated me, the most important one was the publishing and reviewing process. I don’t understand why the relationship between journals and scientists is so unfair. Also, the way scientists are ranked is misaligned with the purpose of Science.

But you don’t have to believe me. The best thing about trying to do Science is that I met some amazing scientists on my way. One of the best is Alejandro Piad Morffis. A couple of years ago he wrote a great article called What Academia Can Learn from Open Source. Sometime later he added it to his Substack. I strongly recommend that you read it, especially if you care about Science. The main reason to read it is that it is the foundation of this article :). But don’t worry, I’ll do a quick recap.

The purpose of this article is not to tell why I’m not a scientist or to summarize Alejandro’s essay. I want to complement Alejandro’s ideas with my vision of how they should be implemented. Thus, it is important to know first what Alejandro’s proposal is.

The Open Source Idea

The above article resonated with me completely. I was thinking of a similar process to “fix Science”, but Alejandro summarized the problems and showed a way out of them better than I could have.

The first big point is to show how misaligned scientific work and scientists’ incentives are. There are millions of scientists around the world competing to publish in the most prestigious journals and conferences. The incentive to publish there is more attention, which translates to more citations, which is what a scientist’s prestige is all about (simplification intended).

The competition is fierce. So fierce that it has deviated from the purpose of the scientific work. It influences what papers scientists write and how they write them. For example, scientists are incentivized “to write many low-effort papers instead of a few better ones”.

On the other hand, we have the big winners: publishers. Journals and conferences gatekeep knowledge that is created by society as a whole. Scientific research is often funded with public money, but once the results arrive, they are put behind a paywalled subscription. This is not just unfair, but also harmful to scientific progress in general.

But how do we fix this status quo?

The proposal is to borrow some concepts and dynamics from the Open Source movement. Doing science in an Open Source-like environment would solve many of the aforementioned problems. Making public reviews, freely sharing the knowledge with everybody, and making continuous improvements to the work from the reviews it receives are ways to straighten the path of Science.

Other actors like conferences and journals would become services around the Open ecosystem, but the core of scientific work stays public and open to everyone. Then, how to implement such an ecosystem?

Alejandro proposes to build a platform like GitHub with specific features for scientific work. This implementation is classified in the article as a minor problem compared to the big one: making a significant number of scientists adopt the new paradigm.

While I agree with the claim that adoption is the biggest obstacle to a paradigm shift, I think the implementation details are very important to make a fair and robust ecosystem.

We have summarized the main ideas of Alejandro’s proposal. Now it’s time to say something new. The focus of this article is to unveil some more subtle problems concerning the implementation of this platform. We’ll end up with some proposals for this implementation and a discussion of their pros and cons. But first, let’s see what subtle problems could arise from a careless implementation.

The centralized evil

First of all, let’s be clear: this proposal is not meant to solve the adoption problem. The big obstacle will remain intact. From now on, we assume this obstacle doesn’t exist and thus, there is a significant number of scientists willing to adopt the new paradigm.

The main problem I see with the platform proposed by Alejandro (although the platform itself is not the center of the argument in his article) is that it centralizes the whole process of doing Science. A single entity would be in full control of all the scientific work. Let’s agree on one claim:

Science is (one of) the most valuable products of humankind

Having a single entity controlling all the scientific work could be extremely harmful in the long term. How do we ensure that the people in charge of this entity don’t abuse their power?

Yes, laws can protect scientists and other users of the platform. Also, this platform can be a non-profit organization and evolve to minimize any conflict of interest with scientific work. But still, we need humans to create and maintain this platform. This organization needs to pay for servers and other technology, and the companies providing those services will be in control of Science. Laws cannot anticipate every possible human behavior, and the expenses of such a large platform must be covered sustainably.

For example, GitHub is not a non-profit and certainly not a decentralized organization. Yet, it hosts the biggest collection of code in the world. Many of the biggest software projects of our time live on GitHub, either publicly available or private. Some laws protect us in certain ways. The platform has been a success not just from a business point of view but also because of all the support it has provided to software developers.

But this doesn’t mean there are no latent dangers with this level of centralization. For example, a couple of years ago we saw how Copilot was trained on code hosted on the platform without asking its authors. From the Open Source point of view, being consistent with Open Source principles would mean that Copilot should be Open Source too. On the other hand, if my code is private, I would not want it used as a training example for an LLM. Anyway, Copilot is still here, and it will be trained on more and more of our code.

I’m not against Copilot, but this kind of unilateral decision shows us that we need to be careful with centralized organizations. If we want to build a platform that is responsible for all the scientific work, that is the heart of Science (with a capital S), then we must consider the most decentralized alternative. We need this solution to be not just socially open, but also technologically and economically open.

I think the best way of returning to humanity the knowledge it has created is through a blockchain.

A blockchain-based platform will open two possibilities:

A decentralized ecosystem to share and participate
An economic system that makes the incentives more concrete and sustainable

From now on, we are going to discuss these two points.

The decentralized ecosystem

Everybody, anytime, anywhere can access the platform and participate in any of the possible ways: publishing a draft, making a revision, reading a paper, and so on. A blockchain-based platform ensures that a small group of people can’t make unilateral decisions.

I can come up with two possible ways to implement that.

Creating a new domain-specific blockchain
Creating a Decentralized Autonomous Organization (DAO) in an existing blockchain

Creating a new blockchain has the advantage that it can be optimized for Science purposes. This would mean a better user experience and a lower cost for participation. The system would be composed of the following actors:

Validators: Nodes of the blockchain that make consensus in the chain possible and receive rewards in the form of the native token. (The rewards are analyzed later.)
Scientists: Anyone who publishes papers and writes reviews.
Investors: We’ll talk about these later.
Users: Anyone who reads papers and reviews on the platform.
Nodes: Computers that run the blockchain software around the world. Anyone can set up a node to contribute to the decentralization and security of the network.

The platform would run on this domain-specific blockchain, so no centralized servers or organizations would control it. Improvements to the platform and the blockchain would be proposed, discussed, and implemented as they are on Bitcoin or Ethereum, for example. Anyone could write an Improvement Proposal (like the BIPs and EIPs on Bitcoin and Ethereum respectively) and share it with the community. A voting system would let scientists and validators (the main supporters of the platform) democratically approve or reject the proposal.

The problem with creating a brand-new blockchain for the platform is that it is a bigger engineering challenge. There are many details that we’d need to take care of. The main one would be security. A plausible alternative is to build the platform on top of a reliable public blockchain like Ethereum (just to mention an example).

In this case, all the security, decentralization, and scaling rely on an established public blockchain. The platform would be a protocol on this blockchain, and the protocol would be controlled by a Decentralized Autonomous Organization (DAO). The actors would be:

DAO token holders: Token holders can vote on critical decisions.
Scientists: Anyone who publishes papers and writes reviews. Ideally, the ecosystem ensures scientists have voting power (that is, they hold tokens to vote with).
Users: Anyone who reads papers and reviews on the platform.

DAOs create what is known as governance tokens. Anyone who owns some of the governance tokens has a vote on the future of the DAO. The voting power of a person is determined by the number of governance tokens in their wallet.

This way, when we refer to token holders we mean any investor, scientist, or user concerned about the future of the DAO and willing to shape that future. Of course, there could also be speculators who only seek to profit from the price of the token in the market, but they too will play an important role in the economic ecosystem. More on this soon.
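
To make token-weighted voting more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the names (Proposal, cast_vote, is_approved), the balances, and the simple-majority rule are hypothetical, and a real DAO would implement this as a smart contract on the underlying blockchain.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """A hypothetical improvement proposal that token holders vote on."""
    title: str
    votes_for: int = 0
    votes_against: int = 0
    voters: set = field(default_factory=set)


def cast_vote(proposal, balances, holder, approve):
    """Record one vote whose weight equals the holder's token balance."""
    if holder in proposal.voters:
        raise ValueError(f"{holder} has already voted")
    weight = balances.get(holder, 0)
    if weight <= 0:
        raise ValueError(f"{holder} holds no governance tokens")
    proposal.voters.add(holder)
    if approve:
        proposal.votes_for += weight
    else:
        proposal.votes_against += weight


def is_approved(proposal):
    """Assumed rule: simple majority of the tokens that actually voted."""
    return proposal.votes_for > proposal.votes_against


# Hypothetical balances: two scientists and one investor.
balances = {"alice": 40, "bob": 25, "investor": 30}
proposal = Proposal("Lower the publishing fee")
cast_vote(proposal, balances, "alice", approve=True)
cast_vote(proposal, balances, "bob", approve=True)
cast_vote(proposal, balances, "investor", approve=False)
print(is_approved(proposal))  # 65 tokens for, 30 against
```

Note how the outcome depends on balances rather than on the number of voters, which is exactly why the distribution of the token matters so much.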

While a DAO on a public blockchain removes the concerns about network security (as long as we trust the underlying blockchain), it adds an external dependency. Also, the underlying blockchain won’t be optimized for the specific case of publishing, reviewing, and reading papers. But the main concern is that the platform is tied to this blockchain and heavily depends on it.

Both alternatives, a domain-specific blockchain and a DAO on a reliable public blockchain, seem feasible and seem to solve the centralization problem. They remove unilateral decisions and centralized control, replacing them with a public ecosystem supported by a peer-to-peer public database (a.k.a. a blockchain) and a democratic system to make decisions. This is the closest society could be to owning its precious Science.

But a blockchain-based solution by itself doesn’t solve all the problems. As we have seen, there are many actors involved in the ecosystem, and if we are not cautious enough, this ecosystem can become chaotic and harmful to Science. In the following section, I sketch some of the main points to look after and some proposals to create a healthy economic ecosystem.

… and Science for all

We have defined the technological grounds of the platform, but now we need to define how to make the whole thing sustainable. We need to define an economic system that doesn’t end up with another misalignment between Science and scientists’ incentives.

For those of you who are not familiar with the dynamics of public blockchains, this proposal could be harder to understand. But I’ll do my best to explain myself as clearly as possible.

A blockchain such as Ethereum can be seen as a computer distributed all over the world. It is not a powerful computer, it is just a decentralized and secure computer that is owned by all of its nodes and validators (there are thousands of them). When we interact with any program that is executed in this blockchain, we need to pay for the computational resources we consume. This payment is done with the native token of the blockchain and is the key to the sustainability of the network.

Validators ensure consensus within the blockchain. That is, they make sure the records stay consistent and all the nodes agree on the state of those records. Validators receive a reward in the form of an amount of native token for their work. They also receive the fees we pay for interacting with the blockchain. This way, the entire system is sustainable economically speaking: users are willing to pay to use the decentralized computer and the maintainers of this computer receive a reward.

If we create a domain-specific blockchain for our platform, we would have a native token to pay for interactions. For example, when a scientist publishes a paper, she must pay a fee. The native token is produced whenever a validator earns some reward. This native token is what lets us take part in this scientific platform.
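
The flow of fees and rewards described above can be sketched with a toy ledger. This is a simplification under assumed values: BLOCK_REWARD, PUBLISH_FEE, and the Ledger class are hypothetical names I made up for illustration, not part of any real blockchain.

```python
BLOCK_REWARD = 10  # newly minted tokens per block (assumed value)
PUBLISH_FEE = 2    # fee a scientist pays to publish a paper (assumed value)


class Ledger:
    """A toy in-memory ledger standing in for the blockchain state."""

    def __init__(self):
        self.balances = {}
        self.total_supply = 0

    def mint(self, account, amount):
        """Create new tokens and credit them to an account."""
        self.balances[account] = self.balances.get(account, 0) + amount
        self.total_supply += amount

    def transfer(self, sender, receiver, amount):
        """Move existing tokens between accounts."""
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount


def produce_block(ledger, validator, publishers):
    """The validator earns a freshly minted reward plus the fees in the block."""
    ledger.mint(validator, BLOCK_REWARD)
    for scientist in publishers:
        ledger.transfer(scientist, validator, PUBLISH_FEE)


ledger = Ledger()
ledger.mint("scientist", 5)  # assume she already obtained some tokens
produce_block(ledger, "validator", ["scientist"])
print(ledger.balances)  # {'scientist': 3, 'validator': 12}
```

New tokens only enter the system through validator rewards, while fees just move existing tokens around.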

In the case of the DAO, the fees are paid in the native token of the underlying blockchain. The governance token would allow for voting and proposing changes in the platform only.

In both cases, the first problem is how to ensure a fair distribution of the token.

Token Distribution

A fair initial distribution of the token (either native or governance) is crucial for decentralization and sustainability. If a small group of people accumulates most of the available tokens, then they can take full control of the platform.

We should write a piece of software on the blockchain with specific rules to distribute the tokens among scientists. For example, in a domain-specific blockchain, we can make a part of the rewards and/or the fees go to the funds of this software. Then, these funds are distributed among scientists following some criteria we won’t define here.

The main point is that all scientists (people who create new knowledge) have a way to publish their work on the platform and have a voice on that platform. The token should be an enabler, not an obstacle.
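
As a rough illustration of such distribution software, the sketch below sends a percentage of each reward to a fund and then splits the fund among scientists. The equal split is purely my assumption for the example, since the actual criteria are left undefined here, and the function names and numbers are hypothetical.

```python
def split_reward(reward, fund_percent):
    """Split a block reward between the validator and the distribution fund."""
    to_fund = reward * fund_percent // 100
    return reward - to_fund, to_fund


def distribute_equally(fund, scientists):
    """Give every scientist the same whole number of tokens.

    Returns the payouts and whatever remainder stays in the fund.
    """
    if not scientists:
        return {}, fund
    share = fund // len(scientists)
    payouts = {name: share for name in scientists}
    return payouts, fund - share * len(scientists)


validator_part, fund = split_reward(reward=100, fund_percent=20)
payouts, leftover = distribute_equally(fund, ["ana", "luis", "mei"])
print(validator_part, payouts, leftover)
```

With these assumed numbers, the validator keeps 80 tokens, each scientist receives 6, and 2 remain in the fund for the next round.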

Now all scientists can publish and vote on the platform. But, what are the incentives to maintain this protocol?

Incentives

We need not only scientists to adopt this new platform but also maintainers for it: people around the world who are willing to propose improvements and implement them.

I think the early adopters would be scientists who will also play the role of maintainers. But what is the point of all this? Besides publishing, these scientists would pay fees and maintain a protocol. Is this decentralized and open protocol a sufficient reason by itself?

It doesn’t have to be. Have you ever wondered how to invest in Science? Let’s imagine we are investors looking for investment opportunities. For example, we can invest in the S&P 500 index, which tracks roughly 500 of the largest publicly traded companies in the U.S. But what if we want to invest in Science?

Donating to a University is not investing. Buying shares in some institutions is just investing in the scope, vision, and principles of those specific institutions. The concept of Science is so abstract and wide that it is hard to think of a way to invest in it.

Now let’s turn our heads and look again at our token, the support of an economic ecosystem that keeps the open scientific world alive (at least in the imaginary world of these lines). This would be the best approximation to a representation of Science. Buying this token is the closest we’ll be to investing in Science.

Investors will provide the last ingredient of the formula: liquidity. Now, scientists can receive other rewards besides prestige or a digital token. Investors will buy tokens from scientists and maintainers as a way to invest in Science. Of course, that means investors will have a vote on the future of the platform but that’s okay. They are a key part of this economic system since the demand for the token in the market is necessary to incentivize scientists and maintainers.

We should also consider a reward system where scientists obtain more tokens according to the quality of their work. The metrics to measure the quality of a piece of work will always be a matter of discussion, but I think we would have all the data to implement good metrics.

This opens the possibility for the rise of one new character in our story: the freelancer scientist. A scientist who does not depend on a wage, who is not tied to an institution, and who is not desperately looking for funding. But maybe we are trying to see too far into the future.
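
One simple way such a reward system could work is to split a pool of tokens in proportion to a quality score. The sketch below assumes the scores already exist; how to compute them is the open question, and the function name and numbers are hypothetical.

```python
def reward_by_quality(pool, scores):
    """Split a token pool in proportion to each work's quality score.

    Integer division is used, so a few tokens may remain undistributed.
    """
    total = sum(scores.values())
    if total == 0:
        return {name: 0 for name in scores}
    return {name: pool * score // total for name, score in scores.items()}


# Hypothetical scores produced by some agreed-upon metric.
scores = {"paper_a": 5, "paper_b": 3, "paper_c": 2}
print(reward_by_quality(100, scores))  # {'paper_a': 50, 'paper_b': 30, 'paper_c': 20}
```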

Pros and cons

What is good about our proposal?

All the features proposed by Alejandro can be implemented in this blockchain-based platform
No small group of people controls how Science is done
Anyone willing to participate in shaping the future of the platform can do so
The proposed incentive system can potentially decentralize the way science is done (the freelancer scientist). Of course, this opportunity might not be available for some scientific fields that require expensive equipment and large funding.
Investing in Science is possible for all people and not just for rich investors

There are more benefits I won’t discuss in this quite extensive article. For example, in this ecosystem papers and reviews could be NFTs. This would give authors total ownership over their creations, which would become assets that are easily interchangeable and portable.

But of course, this proposal also has a dark side. As Alejandro said, scientists who are not so familiar with technology and Computer Science are less likely to adopt this new paradigm. Implementing it on a blockchain will increase this entry cost.

Someone could say that requiring a digital token to participate could leave out scientists from some parts of the world, mainly from underdeveloped countries. I think getting this token would be easier for them than getting accepted at a prestigious conference. Even scientists from countries under economic sanctions (like the one I come from) could access the blockchain and participate. So, I don’t think the token is a problem. But there are other obstacles.

No matter whether the platform is implemented on a domain-specific blockchain or on an already established one, security will always be a concern. For example, the software that coordinates the token issuance and distribution is prone to hacks. These hacks can be especially disastrous. All the knowledge in the platform could disappear or the hackers could take over the entire system.

The implementation of the economics should be carried out carefully because it could end up being more harmful than the current status quo. There are too many details to take care of. But on the bright side, there are similar systems that have stood the test of time. They are the result of many iterations and improvements performed by their communities.

Of course, this is not an exhaustive list of pros and cons. Maybe you can come up with other advantages and disadvantages. I would love to hear them! So please, don’t hesitate to reply.

Conclusions

Since my disappointment with the publishing and reviewing process, the unfair relationships, and the incentive system in Science, I have been thinking about how this could change.

Alejandro’s article What Academia Can Learn from Open Source summarizes and proposes some of the ideas I had (and a lot more that I didn’t think of). I think the decentralized nature of public blockchains makes them perfect for implementing this proposal.

In this article, I tried to sketch what this implementation would look like. I’m sure there is a lot to improve in this sketch. I’m just trying to bring this topic back to the table. This is not just one more topic to talk about; it is about giving back to all humans what belongs to them, our most precious product.
