A few charts on the biotech opportunity, and some problems startups can address (part 2/2)

Richard Murphey, 5/1/2018

For context on the industry, please see part 1 of this two-part post. This post will reference many of the ideas in part 1.

There is a ton of amazing new technology being developed, and there are a ton of problems in drug development and public health. The job of the bio entrepreneur is to find the optimal "binding affinity" between a technology and a problem, and to guide that solution to the market.

Below are some "target" problems to screen solutions against, as well as some potential leads.

Most of these ideas have a drug development (rather than diagnostics, or outsourced research service provider) business model -- it is hard to build a big business in this space if you don't develop your own drugs.

Problem 1: too many expensive Phase 2 and 3 failures

This is the biggest problem for the drug industry, and the biggest bottleneck for developing new medicines.

As Figure 7 in part 1 shows, this is by far the largest contributor to the cost of drug development. Reducing the cost of failure is the most direct way to reduce the cost of drug development, and thus the price of drugs. Cheaper drug development also enables companies to profitably pursue research in disease areas that are not profitable today (see Figure 5).
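
To make the arithmetic behind this concrete, here is a minimal sketch of how late-stage failure drives the expected cost per approved drug. The phase costs and success probabilities are made-up illustrative numbers (not figures from part 1), and the model ignores preclinical spend, regulatory costs and the time value of money.

```python
# Hypothetical (phase name, cost in $M, probability of advancing); illustrative only.
BASELINE = [
    ("Phase 1", 5.0, 0.6),
    ("Phase 2", 20.0, 0.3),
    ("Phase 3", 100.0, 0.6),
]
# Same costs, but assume Phase 2 success is doubled (e.g. via better patient selection).
BETTER_PHASE_2 = [
    ("Phase 1", 5.0, 0.6),
    ("Phase 2", 20.0, 0.6),
    ("Phase 3", 100.0, 0.6),
]


def cost_per_approval(phases):
    """Expected spend per approved drug, including money spent on failed programs."""
    expected_cost = 0.0  # expected spend per program that enters Phase 1
    p_reach = 1.0        # probability a program reaches the current phase
    for _name, cost, p_success in phases:
        expected_cost += p_reach * cost
        p_reach *= p_success
    return expected_cost / p_reach  # p_reach now equals P(approval)


base = cost_per_approval(BASELINE)
improved = cost_per_approval(BETTER_PHASE_2)
print(f"baseline: ${base:.0f}M per approval")
print(f"Phase 2 success doubled: ${improved:.0f}M per approval")
```

Under these assumed inputs, most of the cost per approval is money spent on programs that never reach the market, which is why improving Phase 2 odds moves the total so much.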

What can we do about this?

Succeed (or fail) faster / cheaper

One obvious solution is to determine whether your drug will work faster / cheaper (ie in Phase 1/2 rather than Phase 2/3). This seems rather crude, but actually implementing this requires being very thoughtful about trial design, patient populations, endpoints and translational research (translational research is roughly defined as the science of making sure what you do at the bench / in animals works in humans). This has become a very common strategy in venture-backed biopharma startups with experienced management teams, primarily in severe cancer and rare disease, where FDA has made it easier to get approval based on early data from small studies. It is possible to get to human proof of concept (which generally opens the IPO and large M&A window) on a modest venture budget, and then to get to approval with IPO proceeds.

The advent of genomic medicine has been an important enabler of this approach, and FDA incentives have been crucial as well. Failing (or succeeding) earlier generally requires:

A good objective biomarker or surrogate marker for effectiveness: Traditionally, effectiveness is measured based on clinical outcomes: how long do cancer patients live, what is the rate of heart attack among patients with cardiovascular disease, etc. These endpoints take a long time to measure (often years), and many occur at low frequency, so you need a large sample to detect a significant signal. Finding "biomarkers" (easily detectable, objective biological signals that correlate with clinical outcomes) can enable researchers to get an earlier read on whether a drug is working and detect a signal with fewer patients. This can dramatically shorten the length of studies. In cancer research, Phase 1/2 studies often measure progression free survival or tumor response -- objective surrogate markers of effectiveness -- and FDA's Accelerated Approval program can allow drugs to be approved based on this data rather than the traditional overall survival outcomes endpoint.
A homogenous, molecularly defined patient population: Historically, drug companies have studied their products in the largest possible populations so that FDA will approve the drug to treat the largest number of patients. This strategy is costly (as clinical studies must be large) and can be risky (as each patient has a unique biology so many patients do not respond to particular therapies). Having a homogenous population can enable researchers to detect signal with smaller studies, and if that homogenous population is defined by a molecular biomarker related to the drug's biologic activity, researchers can have more confidence that their patient population will be rich with potential responders. In fact, using the right biomarker to select a study's patient population can almost double the chances of success. However, many times the correct subpopulation is only discovered after a large study with a heterogenous population fails. Incorporating a smart biomarker-driven patient selection strategy as a core pillar of company strategy is very valuable.
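
As a rough illustration of why both points above shrink studies, the sketch below uses the standard normal-approximation sample size formula for comparing two proportions. The event and response rates are hypothetical, chosen only to contrast a rare clinical outcome with a frequent biomarker response in an enriched population.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.8):
    """Approximate patients per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical outcomes endpoint: a rare event (4% vs 3% event rate).
outcome_n = patients_per_arm(0.04, 0.03)
# Hypothetical biomarker endpoint in a molecularly selected population
# (10% vs 40% response rate).
biomarker_n = patients_per_arm(0.10, 0.40)
print(f"outcomes study: ~{outcome_n} patients per arm")
print(f"biomarker study: ~{biomarker_n} patients per arm")
```

The required sample size scales with the inverse square of the difference between arms, so a frequent, strongly separated signal needs orders of magnitude fewer patients than a rare outcome with a small absolute difference.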

FDA incentives like Breakthrough Designation, Fast Track, Priority Review and Accelerated Approval are very important as well. These incentives typically focus on novel, high impact drugs to treat severe disease, and can provide companies with better feedback on how FDA will view any creative study designs or endpoints as well as even enabling approval based on surrogate endpoints or uncontrolled studies.

Most of the drugs developed under this "playbook" are for cancer or rare disease, but there is potential to apply this to other diseases as well. There are a few startups breaking heterogenous cardiovascular disease, neurodegenerative and psychiatric disease into more homogenous subgroups where human proof of concept can be achieved much more quickly, and with much higher rates of success. There's probably room for further innovation in this area, especially with the emergence of new tools to measure the brain. In addition to cardiovascular, neurodegenerative and psychiatric disease, addiction, respiratory disease, diabetes and stroke are all areas with significant unmet need and few promising drug development candidates on the horizon. Breaking these into biomarker-defined subgroups can potentially make finding drugs against these diseases tractable.

I described this strategy as "failing" rather than "succeeding" faster; in practice, of course, companies try to succeed rather than fail. However, one should be careful that "failing fast" does not merely push the risk from Phase 2 to Phase 3. While a negative Phase 1/2 result certainly will save investors money, a false positive Phase 1/2 result is a dangerous outcome, as it can drive tons of Phase 3 investment based on false confidence in the Phase 1/2 results. Incyte's IDO inhibitor's recent Phase 3 failure is a great example of this risk, although the fallout for the rest of the field is still uncertain.
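
The false-positive risk can be framed with Bayes' rule. The sketch below is only an illustration: the prior, the detection rate and the false-positive rates are assumed numbers, not estimates for any real program.

```python
def prob_drug_works_given_positive(prior, power, false_positive_rate):
    """Bayes' rule: P(drug truly works | early study reads out positive)."""
    true_positive = prior * power
    false_positive = (1 - prior) * false_positive_rate
    return true_positive / (true_positive + false_positive)


# Assumed inputs: 20% of candidates truly work; a small uncontrolled study
# detects 80% of real drugs but also flags 30% of inactive ones.
small_study = prob_drug_works_given_positive(0.20, 0.80, 0.30)
# Same prior and detection rate, but a more rigorous design with a 5%
# false-positive rate.
rigorous_study = prob_drug_works_given_positive(0.20, 0.80, 0.05)
print(f"small study: {small_study:.0%} chance the drug really works")
print(f"rigorous study: {rigorous_study:.0%} chance the drug really works")
```

With a low prior and a noisy early readout, a "positive" Phase 1/2 result can still leave the odds of a real effect below even, which is the situation in which Phase 3 money gets committed on false confidence.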

Advances in data analytics, imaging, microfluidics, gene sequencing and other biomarker-discovery tools have been applied to developing diagnostics or decision-support tools, but it is difficult to make this business model work (reimbursement is a huge issue). The highest value use of these biomarkers is to facilitate more efficient and targeted drug development, but many biomarker companies are intimidated by the prospect of developing drugs. Ultimately, however, many companies realize that this is the best -- and sometimes only -- way to build a business with their platform.

Opportunity: Apply the cancer and rare disease playbook to other major diseases: identify molecularly distinct subpopulations and develop drugs for them

Fail less

Failing more cheaply certainly decreases the cost of R&D, but if you are starting a company you'd probably prefer to reduce your chances of failure rather than just failing more quickly. Strategies for "failing less" range from focusing on biomarker-defined patient populations (same ideas as above), which can double the chances of success, to developing better preclinical models, to finding better ways to drug validated targets, to using bioinformatics to identify targets that are more likely to work in humans.

It would be wonderful if there were an AI capable of predicting likelihood of Phase 2 success given a molecule. That doesn't exist yet, and it isn't clear whether it could given the current data we have and our understanding of biology. But any incremental progress towards this would be incredibly valuable -- remember the idea (detailed under "Better preclinical models" below) that a 0.1 absolute improvement in the correlation between a preclinical model's predictions and clinical outcomes equates to a 10-100x improvement in brute-force efficiency.

Though the future of AI is promising, other bioinformatics approaches are uncovering high-quality targets today. The most promising initiatives combine a targeted biological hypothesis and focused data collection strategy with analytics capabilities.

Example: Regeneron Genetics Center

One of the most interesting approaches I've seen is the Regeneron Genetics Center. To put it simply, the idea is to find "human knockouts". "Knockout mice" are a very common research tool for studying links between a specific molecule and disease: delete a gene of interest from a mouse genome, and see if it makes the mouse healthier or sicker. The issue with this is that mouse biology is very different from human biology, so what works in mice most often doesn't work in humans. We cannot do knockout studies in humans for obvious reasons.

However, mother nature has created many human knockouts -- we just have to find them. That's what the Regeneron Genetics Center ("RGC") is doing. Inspired by the discovery of the link between PCSK9 mutations and cholesterol levels, RGC has created a large scale, automated pipeline to search the world for more human knockouts to inspire drug discovery efforts ¹.

Specifically, they are searching for rare single-gene mutations with large effect sizes (since you can only target one protein with a drug, you want to look for monogenic mutations with a strong association with disease) linked to extreme, disease-relevant phenotypes (in the case of PCSK9, the goal is not to treat people with rare PCSK9 mutations and high / low cholesterol, but to generalize this genotype:phenotype relationship to treat high cholesterol in all people). They have built a platform to cost-effectively execute on this thesis:

Leverage partnerships to quickly get large volumes of high-quality, longitudinal clinically annotated data
Focused strategy for data collection: large-scale population studies, Mendelian population studies (to observe the role genes play within multiple generations of a family), and founder population studies (populations that have less "background" genetic variation)
Low-cost whole exome sequencing rather than higher-cost whole genome sequencing, because the focus is only on protein-coding genes
Automated, high-throughput sequencing and analytics platform to lower costs of large-scale analysis

This strategy has worked well so far, with over 200K exomes sequenced (compared to the UK's nationwide sequencing program targeting 100K sequences), a recent partnership funded by AbbVie, Alnylam, AstraZeneca and Pfizer to sequence 500K exomes for the UK Biobank, and several interesting targets, including one for severe liver disease. It is an illustration of the fact that having a strong, clearly defined therapeutic hypothesis can be a much stronger starting point than a new analytics technology for finding targets with higher probability of clinical success. That is not to say that new tools like deep learning don't have potential, but their potential is best exploited when paired with a strong understanding of biology, an understanding of the strengths and limitations of various data sources, and a targeted strategy for efficiently getting the data you need.
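
The core of a "human knockout" search, comparing a disease-relevant phenotype between carriers of a rare single-gene variant and everyone else, can be sketched in a few lines. This is not RGC's actual pipeline: the gene names and LDL values are fabricated toy data, and a real analysis would use proper association tests with covariates and far larger cohorts.

```python
from statistics import mean, stdev

# Toy records, fabricated for illustration: (gene carrying a rare
# loss-of-function variant, or None for non-carriers; LDL cholesterol in mg/dL).
RECORDS = [
    ("GENE_A", 45), ("GENE_A", 52), ("GENE_A", 38),
    ("GENE_B", 118), ("GENE_B", 131), ("GENE_B", 109),
    (None, 120), (None, 135), (None, 110),
    (None, 128), (None, 115), (None, 142),
]


def effect_sizes(records):
    """Per-gene standardized mean difference, carriers vs non-carriers."""
    controls = [value for gene, value in records if gene is None]
    control_mean, control_sd = mean(controls), stdev(controls)
    by_gene = {}
    for gene, value in records:
        if gene is not None:
            by_gene.setdefault(gene, []).append(value)
    return {
        gene: (mean(values) - control_mean) / control_sd
        for gene, values in by_gene.items()
    }


# Rank genes by absolute effect size: large effects are candidate targets.
ranked = sorted(effect_sizes(RECORDS).items(), key=lambda kv: abs(kv[1]), reverse=True)
for gene, effect in ranked:
    print(f"{gene}: {effect:+.1f} SD vs non-carriers")
```

In this toy data, carriers of the hypothetical GENE_A variant have far lower LDL than non-carriers, which is the PCSK9-style pattern such a screen is looking for; GENE_B carriers look like everyone else.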

I should note that you don't need a high-throughput platform for identifying genetic targets to support a company. Many startups are founded on the basis of just one genetic finding -- if this translates into a successful drug, that can be a billion-dollar company right there. But if you can do this at scale, maybe that billion-dollar company becomes a ten-billion-dollar company.

Other bioinformatics approaches

While RGC is a great example of using big data and genomics to reduce the cost of drug discovery, it is only one example. There is significant opportunity to create novel data pipelines to explore biologic hypotheses and find new therapies. One recent example is exploring the link between the microbiome and cancer. A few recent studies have shown links between gut microbe populations and response to cancer immunotherapy. Companies like Second Genome are analyzing the gut microbiome in cancer patients before, during and after treatment with immuno-oncology drugs to predict which patients will respond to therapy and to identify potential therapeutic candidates. The amount of genomic and other data we have will continue to skyrocket, and there will be plenty of opportunities to wrangle new data sets to try and identify new, high quality targets.

This paper describes a similar model to RGC, but for predicting side effects rather than effectiveness.

Opportunity: "Biology first" data platforms: start with a biologic hypothesis, figure out what data you need to test it, get that data as cost-efficiently as possible, and analyze it with the right tools

Generating new data sets gives you the ability to tightly control the type of data you collect, but it can be expensive. There has been an explosion (see section starting on slide 288 of linked presentation) of clinical and genomic data in recent years, and we are in the early stages of deriving insight from that data. There is a huge amount of genomic data that has identified associations between genes and disease, but uncovering which of these associations are most meaningful and causative, and figuring out the mechanisms by which they cause disease, is a huge challenge. This paper is an example of how one can get more signal out of existing data by incorporating additional information from relevant functional studies or changing the weights of variants based on findings from other data sources. This "augmentation" of typical analyses of existing data sets can give researchers more confidence that a target is linked to disease, and also identify promising leads that actually don't hold up under further scrutiny.

The amount of data that we have is constantly growing, and I am sure that some of this data would be well suited to deep learning analyses. One interesting use of deep learning is to "connect the dots" between genomic and phenotypic associations. A recent Nature Methods paper trains a visible neural network on millions of genotypes to simulate cellular growth. Tools like this, with the right datasets, can potentially enable identification of new disease-associated pathways. If the data used to generate these models is sufficiently representative of human biology, this can potentially identify many new validated targets. Again, the data is not the limiting factor: understanding the biology and developing good hypotheses that the data can test is the real bottleneck. Biology skills come first, data analysis second.

Opportunity: Combine tools and insights from multiple existing data sets to discover, validate and triage interesting targets and pathways, and develop drugs against them

Better ways to drug validated targets

If Phase 2 failure is caused by going after targets that aren't really important for disease, then it makes a lot of sense to focus on validated targets. A lot of validated targets are already adequately covered by existing drugs, but there are many "undruggable" proteins that we are fairly confident are involved in disease, but that we cannot target with traditional small molecules or biologics. Classic examples include Kras, Myc and Wnt.

Figuring out how to drug these targets is an area that traditional biotech VCs and entrepreneurs spend a lot of time on. With these targets, the biggest risk -- Phase 2 failure -- is greatly minimized, but actually hitting the targets requires a lot of creativity. Some groups focus on specific targets like Ras / Kras, some focus upstream / downstream of an "undruggable" target in known disease-related pathways, while others focus on new platforms for targeting proteins like protein-protein interactions or targeted protein degradation via the ubiquitin-proteasome system (or even targets upstream of ubiquitination). There are other new chemistry platforms that potentially enable "drugging of the undruggable". Finding other such platforms that are "ready for prime time" can certainly form the basis of a company.

New biology platforms can unveil new ways to drug the undruggable, but clever biochemistry work can lead to solutions as well. Delinia is a great example of this. Delinia's founders recognized that a cytokine called IL-2 plays conflicting roles in guiding immune response. Low doses of IL-2 have been shown to treat several autoimmune diseases, but high doses lead to a toxic level of immune stimulation. Delinia realized that there are two types of IL-2 receptors: one regulates the immune system, treating autoimmune disease; the other stimulates the immune system, leading to toxic immune activation. Based on this insight, the team developed a drug to hit just the autoimmune-treating form of the receptor. Because the role of low dose IL-2 in treating disease was already established in other human studies (thus "Phase 2 risk" was low), once the team showed their drug was specific to the right receptor subtype, Celgene jumped in and bought them for $775M just three months after their $35M Series A, before the company had initiated any human studies. Another company is using synthetic biology to introduce synthetic amino acids to isolate the immune-stimulating properties of IL-2 to treat cancer.

Better preclinical models

One of the main reasons that so many drugs fail in Phase 2 is because the preclinical models we use to test whether drugs are effective are very poor predictors of human activity. Many drugs that cure cancer in mice don't work in humans. Hundreds of drugs have cured Type 1 Diabetes in mice, but no disease modifying drugs have worked in humans. For diseases of the brain like Alzheimer's, animal models are particularly poor predictors of human activity.

Even incremental improvements in the predictive power of preclinical models can yield significant reductions in the cost of failure. A 2016 study (commentary here) calculates that a 0.1 absolute increase in the correlation coefficient of a predictive model versus clinical outcome equates to a 10-100x increase in efficiency. This report shows success rates by disease area; diseases with more predictive preclinical models (eg infectious disease) are more successful in clinical studies than diseases with poorly predictive animal models (neurology, psychiatry, oncology). A dramatically better preclinical model for a given disease can unlock a tremendous amount of value.
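
A toy simulation gives some intuition for why small gains in predictive validity matter so much. This is not the cited study's method: it assumes each candidate's true clinical effect and its preclinical score are standard normal variables with correlation rho, and that a "hit" is a candidate in the top 5% of true effect, so a random pick succeeds about 5% of the time.

```python
import random


def hit_rate(rho, n_candidates=200, n_trials=1000, threshold=1.645, seed=0):
    """Fraction of screens whose top-scoring candidate is a true clinical hit.

    Toy model (an assumption): true effect y ~ N(0, 1); the preclinical score
    is rho * y plus independent noise, so corr(score, y) = rho.
    """
    rng = random.Random(seed)
    noise_scale = (1 - rho ** 2) ** 0.5
    hits = 0
    for _ in range(n_trials):
        best_score, best_truth = float("-inf"), 0.0
        for _ in range(n_candidates):
            truth = rng.gauss(0, 1)
            score = rho * truth + noise_scale * rng.gauss(0, 1)
            if score > best_score:
                best_score, best_truth = score, truth
        hits += best_truth > threshold  # True counts as 1
    return hits / n_trials


for rho in (0.4, 0.5, 0.6):
    print(f"rho={rho}: top pick is a real hit in {hit_rate(rho):.0%} of screens")
```

The point of the sketch is the direction and steepness of the effect, not the specific percentages: the candidate a screen selects is only as good as the model's correlation with the clinic, so screening more compounds against a weak model helps far less than modestly improving the model.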

Currently, the most common preclinical models are animal models (often mice or rats), whose biology is vastly different than that of humans, or cell-based functional assays, which can be done in human cells but lack the biological complexity of most diseases. The development of organs on a chip promises to enable researchers to study organ-level complexity (rather than just cellular-level complexity) in human organ systems. This will certainly not be perfect, but as stated above, even marginal improvements in predictive power can be highly valuable. To date, it seems that most organ-on-chip companies are outlicensing their assays to pharma companies rather than developing their own drugs. Integrating an in-house organ-on-a-chip platform with proprietary drug development efforts could enable companies to capture more value and to more rapidly generate proprietary insight on how these models compare to standard animal models, cell-based assays, and clinical studies.

One promising new source of "next-gen" preclinical models is induced pluripotent stem cells (iPSCs). These cells are derived from skin or blood cells and can be programmed to develop into almost any cell type, which enables the creation of cell-based assays that more closely resemble human physiology. These advances are relatively recent (winning the Nobel Prize in Medicine in 2012) and are in the very early stages of being used to develop new drugs. These models are particularly useful in cardiovascular and neurological disorders, where our current predictive models are pretty terrible. As an example, a recent paper (summary here) used an iPSC-derived neuronal cell model to identify the biologic effect of a certain protein implicated in Alzheimer's. There was strong genetic evidence that this protein was associated with Alzheimer's, but in mouse models of disease, this protein did not have the expected negative effect. In the human neuron model, however, the protein's role in disease was confirmed. As these iPSC-derived models are still new, most applications of them have been to try to repurpose existing drugs. There is significant potential to build new drug discovery efforts using these models.

These models will not be perfect predictors of human biology. However, incremental improvements can be powerful. There is a certain threshold where improvement in a disease model's predictive power makes previously intractable diseases tractable: that threshold is higher than what current models offer, but lower than a perfect recapitulation of human biology.

Opportunity: Develop novel iPSC-derived cell-based assays to fill in the blanks between genetically identified targets and observed phenotypes, focused on cardiovascular and neurological / psychiatric disease

Phenotypic screening is often discussed as an alternative to the currently dominant target-based paradigm. This is a good summary of the potential and challenges of phenotypic screening. Basically, phenotypic screens, when done well, are better predictors of human activity than target-based models. The downside is that you don't always know what target your drug is interacting with, so it is hard to rationally optimize the chemical structure of the drug to interact with the target in the best possible way. The value of the phenotypic screen depends very highly on the quality of the assay -- if you have a bad phenotypic assay, not only will you not know exactly what molecules your drug is interacting with, but the activity you are seeing is probably meaningless. Phenotypic screening programs often require more investment upfront and a "front loading of risk". With disciplined go / no-go decision making based on early data, this approach can enable faster failing than target-based screening programs, which often don't fail until Phase 2.

There is probably an opportunity for more startups based on novel phenotypic screening strategies combined with advanced analytics and a platform for deconvoluting targets and mechanisms of action (Recursion Pharmaceuticals is an example of a company of this phenotype). As most established biopharma companies are firmly entrenched culturally and infrastructurally in the target-based discovery camp, there are probably a lot of phenotypic screening approaches that have been underinvested in or underexplored.

Finding promising new phenotypic assays, whether they be traditional cell-based, organs-on-chips, or stem-cell based, and intelligently building a drug discovery and development platform around the assays could be interesting if done right. Bioinformatics and computational biology can be very helpful, especially in analyzing genetic, transcriptomic and proteomic data (molecular phenotyping) to link phenotypic observations with specific molecular pathways. This could be an area where deep learning is useful, especially if the assays have an imaging component. Once the mechanism of action is identified, additional assays can be developed to do more traditional target-based screens. As Derek Lowe suggests, however, anyone embarking on this path should be very wary of the pitfalls of this approach and recruit a team or advisors with some battle scars.

Opportunity: Hybrid phenotypic / target-based screening platforms leveraging new assays (iPSC, organs-on-chips, etc.), high-throughput and high-content image-based screening platforms, machine learning and strong understanding of disease biology and genetics to discover new drugs with higher probabilities of clinical success

Problem 2: drug development toolkit is too limited

It is estimated that most drug development efforts focus on only 1-6% of potential targets. There are many diseases for which we don't have great targets or models, and many targets that are biologically important but difficult to drug. There are several ways to go about expanding this universe of targets, from advances in biology (identifying new disease-relevant targets) and chemistry (finding new ways to hit validated targets) to developing new therapeutic modalities like gene and cell therapy. We'll discuss some of those below. If you want a primer on the topic of drugging the undruggable, this post by Michael Gilman, CEO of Arrakis Therapeutics and EIR at Atlas Venture, is a good overview (it describes his company's strategy of using established small molecule chemistry to target a whole new class of targets, RNA, but the way he positions the limits of medicinal chemistry is very nice).

Create or improve new therapeutic platforms

With the recent approvals of the first gene and genetically modified cell therapies in the United States, and the recent massive M&A deals for novel cell and gene therapy startups, gene and cell therapy have become very hot.

Gene therapy and cell therapy have been around for decades, but as completely new and very complex therapeutic modalities, there have been a lot of kinks to work out. Part of the reason Kite, Juno and AveXis garnered such significant premiums is because these companies had worked through a lot of these complexities related to safety, quality, and manufacturing (not to mention effectiveness). However, there is still a lot left to figure out. For example, the approved cell therapies are all CAR-T therapies targeting CD19 to treat B cell leukemias and lymphomas. Designing cell therapies that work for other cancers, especially solid tumors, is still a big challenge, not to mention cell therapies for other diseases. Gene therapy is limited by the "one shot" nature of the products -- genes are delivered using viral vectors, and the immune system develops resistance to these vectors after initial administration, so you just get one shot.

There has been a lot of investment in gene and T cell therapy 2.0, but there may still be opportunities for targeted startups with new technological solutions. Looking beyond T cells, allogeneic (off-the-shelf rather than custom for each patient) cell therapy, using machine learning to identify new candidate antigens, expanding beyond cancer, and next-gen programmed T cells are some of the areas the field is currently focusing on.

Delivery of oligonucleotide therapy is also a huge challenge. Using oligonucleotides (including DNA and RNA) as a therapy has the potential to be as revolutionary as antibodies, but the field has been hindered for various reasons for the last 20 years or so. One of the challenges is getting oligonucleotides to the right cells, and then to the right place in cells to do their work. The current hack is pulling the cells of interest out of the body, adding the oligos, then readministering them (this is cell therapy) or delivering the DNA with viral vectors (gene therapy does this). There are emerging non-viral vector approaches, but these only work in specific situations.

Another opportunity relates to the supply chain and infrastructure for gene and cell therapy. Manufacturing and logistics for cell and gene therapies are incredibly labor-intensive and challenging, and small deviations from standard processes (for example, using a similar piece of equipment from a different manufacturer) can result in a product with different biological activity. The big companies that have put the time and money behind scaling their operations are the clear leaders here, but there is an opportunity for advanced technology to automate or simplify the process.

Other novel modalities include microbiome therapies and next-gen bioelectronic medicines.

Opportunity: Ride the gene and cell therapy wave by exploring new disease areas, vectors, and cells, or integrate new technologies from other disciplines (deep learning, imaging, synthetic biology, etc.) to expand potential applications of existing CAR-T, TCR or AAV vector technology; or create software to support therapeutics companies

Explore new biology

This is perhaps the bread and butter of traditional biopharma VCs and entrepreneurs. The immuno-oncology revolution of the last few years resulted from exploration of new biology (tumors' ability to evade the immune system), and exploring therapeutics applications of new biological findings is a big source of new ideas for VCs and entrepreneurs. This is why most biotech VCs have PhDs in molecular biology.

Some areas of interest these days include improving response to immune checkpoint inhibitors in cancer, cancer treatments based on metabolism or targeting aspects of the tumor microenvironment, better targeting of tumors, new mechanisms for treating Alzheimer's (including inflammation), treatments for liver disease like NASH, exploring new rare disease biology, antigen-specific tolerance induction, exploring RNA's role in disease and potential as a small molecule target, looking for therapeutic targets related to epigenetics, exploring immune-cell specific metabolic pathways, non-addictive / non-opioid pain medications, and many many others.

There are opportunities for those of us who are not cell biologists in this area as well. There are a massive number of underexplored but potentially druggable targets. It seems like there would be an opportunity to mine the literature for information on these "dark" proteins (collectively referred to in the paper as the "ignorome"), triage those which seem most interesting, and then design proof-of-concept assays for some promising targets based on their associations with known active targets.

One major biological system that is still largely mysterious to us is the brain. It is certainly possible that major advances in this area will come in our lifetimes. There are many new tools, such as optogenetics, opening new windows into the brain, but there is still a long way to go until we can really get a good, high resolution, molecular picture of what happens in a living human brain. Current tools, like fMRI, EEG and genetics, have led to a lot of great insights, but this has yet to translate into a wave of new treatments.

Problem 3: many burdensome diseases neglected by pharma

As Figure 5 in the prior post shows, most major diseases (cardiovascular disease, stroke, neurodegenerative disease, psychiatric disease, respiratory disease in particular) don't get a lot of R&D investment. In many cases, the biology doesn't lend itself to being easily druggable. A few companies are expanding the therapeutic arsenal using cell or gene therapy to treat heart disease, neurodegenerative disease, or neurological disease. Other companies are combining drugs with software-based, hardware-based or tech-supported behavioral interventions to augment (or replace) pharmaceutical therapy. Other companies are exploring how the microbiome influences the brain and other disease (especially gastrointestinal (GI) disease, inflammatory disease, and cancer).

While there has been a significant amount of investment in rare disease, advances in cell and gene therapy can potentially make more rare disease tractable. Particularly interesting, but also challenging, is using gene therapy to treat genetic disease: matching the therapy to the molecular cause of disease. Roughly 80% of rare diseases are genetic in origin, so there should still be plenty of rare genetic diseases in need of treatment.

Opportunity: Create treatments for under-invested disease areas leveraging new therapeutic modalities to match disease biology with treatment modality (gene therapy, cell therapy, microbiome therapy or bioelectronic medicines, or by combining drugs with behavioral interventions)

An aside: AI

There is a lot of talk about AI in bio these days, and it is a polarizing topic. I'm far from an expert on the subject, but I'll highlight a few articles outlining specific promising current use cases, as well as approaches that may not be as value-added. With a field like AI where the state of the art advances so quickly, however, it probably makes sense to look at least a bit into the future when trying to understand the technology's potential. As I am not able to do that, I'll leave that as an exercise for the reader. If you have thoughts about near-term applications of AI in bio, I'd love to hear about them!

Derek Lowe, a very well respected medicinal chemist and biotech veteran (and a prolific blogger), discusses a few potentially interesting applications of AI / computational tools in automating some parts of chemical synthesis. Derek takes a balanced view on these topics in a series of articles (I'll just link one here, you can follow the links to see the rest), although he is not shy about calling out overhyped tech when he sees it. One of the major uses of AI in drug discovery thus far is virtual screening, which is probably the "easiest" application of AI in drug discovery, but is not that useful, as Derek suggests (for the reasons described in Figure 7 in part 1 of this post). Another application of AI is finding new indications for existing drugs, which can sometimes find things that people can't, but sometimes it can't.

This paper seems interesting and describes a model that could potentially benefit from AI, although I'm not knowledgeable enough in the area to know for sure. The authors demonstrated that they were able to engineer organisms to produce a given set of molecules in a short period of time. Synthetic biology seems to be coming of age in bio, and I'd be interested to learn more about how these techniques can improve production of difficult to synthesize chemicals or open new areas of chemistry, and how AI might facilitate this process. I know basically nothing about this field, however, so I may be totally off base. Would love to learn more -- if you know of any good papers, let me know!

The holy grail would be an AI that can predict whether a drug will be safe and effective in humans given a molecular structure. That certainly doesn't seem to be a near-term possibility, and probably not even likely in the next few decades (think about what kind of data you'd need to do this, and how hard it might be to get that data), but advances in this field seem to come faster than expected these days. However, even an incremental improvement in predictive power vs. what we currently can accomplish would be incredibly valuable. If you are an AI researcher and want to make a lot of people healthier, work on this.

Beyond the above technical limitations of AI, there are some business-related issues as well. The biggest bottleneck for AI in bio is getting enough good data, and in most cases, the only groups that have the good data are pharma companies (and potentially providers and payers, though that's a different story). If they own the data, they'll own most of the economics.

Despite these limitations, there are some companies, notably Recursion Pharmaceuticals, targeting higher-value uses of AI, and building their own platforms for data generation and drug development to capture that value (insitro, recently launched by leading AI researcher Daphne Koller, seems to be doing something similar). The rate-limiting step to formation of more companies like this is talent -- there just aren't many people in the world who are highly skilled in biology, biochem and machine learning. If you're one of those people, there are lots of investors who'd fund you!
