To Digitize and Decipher: An introduction to Viome’s Research Blog
Wishing you a healthy New Year in 2023! At Viome, we re-dedicate ourselves every year to our mission of addressing the epidemic of chronic disease globally. And this year, we are excited to start a new research blog to share our learnings and findings from the unique data we’ve been collecting at Viome!
Six years ago, when I embarked on this adventure called Viome, I never thought I would learn so much so quickly about our nature as humans. And I never imagined that we would find so many opportunities to make a difference to the health and wellness of humanity.
From the beginning, our goal at Viome Life Sciences has been to ‘digitize and decipher the human body at a molecular level to prevent and reverse chronic disease’. It is an ambitious goal, but our Viome founders and I have always believed that in order to address the epidemic of chronic disease in the 21st century, we need to get to the underlying biological processes that drive entire spectrums of disease, by collecting and analyzing vast amounts of biological data.
As of this writing, we have collected almost 450,000 biological samples from about 300,000 individuals from more than 100 countries around the world. From those samples, we have extracted and sequenced more than 700 trillion RNA nucleotides, the building blocks of our genome activity, which gives us an unprecedented view into the biological activity of microbial and host human cells within these samples. We also have an amazingly rich set of phenotype metadata from these individuals, such as their symptoms, diseases, medications, lifestyles, and demographics.
This is an absolutely groundbreaking dataset in terms of quality and quantity, and it continues to grow every single day! This is a goldmine of potential new insights that could answer so many questions, or at least help us ask the right questions to lead us to our goal of addressing a wide range of chronic diseases.
Which brings us to the purpose of this Research Blog. Thousands upon thousands of Viome customers and research participants are really citizen scientists who have joined our mission to uncover the mysteries behind chronic disease. We want to thank you sincerely for joining us on this journey, and we want to share with you some of the exciting insights we uncover as we explore questions and analyze this unique primary dataset from a very large and diverse population. We will only present summaries of de-identified information, with the intent of sharing with you the key insights that will hopefully inform, educate, and inspire. Your contributions could ultimately help us help you and your fellow citizens, with new and improved products.
Every story I create, creates me
Before jumping in, let’s take a step back. In 1958, Francis Crick stated a ‘central dogma of molecular biology’: DNA (deoxyribonucleic acid) makes RNA (ribonucleic acid), which makes proteins, implying that genetic information travels in this one direction. Well, scientific dogmas are only accurate until they are disproven, and this one is no exception: scientists found that RNA can sometimes rewrite the information in genes, that RNA can produce very different forms of proteins, that there are many different forms of RNA, and so on. In fact, RNA is widely thought to have been a precursor of DNA in the evolution of life on our planet. Most importantly, what became clear is that RNA has a central role in molecular biology, and that is a good conceptual starting point for our exploration of the molecular root causes of chronic disease.
Each of our cells has a nucleus which contains DNA molecules, organized into genes, which contain instructions for cell functioning and the characteristics that make you unique. But your individual DNA is a rather static genetic code throughout your life. Yes, some of this DNA randomly mutates during cell division and other processes, but there is a lot of built-in cellular machinery that immediately tries to “repair” these mutations and restore the original sequence. The mutations that survive may lead to certain genetic disorders, but these account for a small percentage (probably <20%) of diseases, and certainly not the majority of what we call chronic diseases.
So your DNA genetic code is something like a static dictionary of words and phrases for the language of your life. The actual story of your life, on the other hand, is expressed dynamically in sentences, paragraphs, and chapters captured by RNA, by selecting words and phrases that are relevant, and stringing them together to form the story. This dynamic expression of words and phrases, the genetic expression, is much more interesting and unique to you, and turns out to reveal the secrets of your illness and your health.
In other words, your biological story – the work that your cells perform to generate energy, divide, repair, signal, fight infections, etc., – is unique because a specific set of your genetic words are “expressed” in response to what you decide to do every day. These things you do give rise to your environmental triggers, specifically the food you eat, the activities you do, and very importantly, the trillions of microorganisms that live inside and around you.
When genes are expressed in response to these triggers, DNA molecules are copied, or “transcribed”, to become RNA molecules. These in turn are the starting point of most biological activity: they get translated into a variety of proteins that bind, activate, inhibit, and perform various actions “on the streets of life”. And the concrete products of this activity, as well as the debris (byproducts) left behind, are the thousands of metabolites in our system.
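For readers who like to see ideas in code, here is a toy sketch of those two steps, transcription and translation. It is only an illustration: the function names and the short example sequence are made up for this post, and the codon table lists just five of the 64 codons in the real genetic code, so any other codon would raise an error.

```python
# Toy illustration of transcription (DNA -> RNA) and translation (RNA -> protein).
# Assumption: only five codons are listed here; the real genetic code has 64.
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": None,   # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: the protein is finished
            break
        protein.append(amino_acid)
    return protein


# Hypothetical 15-letter gene, invented for this example.
mrna = transcribe("ATGTTTGGCGCTTAA")
protein = translate(mrna)
print(mrna)     # AUGUUUGGCGCUUAA
print(protein)  # ['Met', 'Phe', 'Gly', 'Ala']
```

Real cells are far messier than this (splicing, regulation, and many kinds of RNA that are never translated), but the sketch captures the basic flow from a static DNA "word" to an expressed RNA message to a protein.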
Now hopefully it’s easier to see why Viome collects information about RNA rather than DNA or proteins or metabolites. Rather than collecting information about the static DNA genetic words and phrases in your cells, or the very complex protein actions in the streets of life, let alone the products and byproducts of these actions, we decided at Viome to collect information about your expressed RNA genetic story. This story, also known technically as metatranscriptomics, is an intertwined plot of many known and unknown characters, with layers upon layers of sub-plots and motivations, enacted sometimes with poor lighting, but always moving forward with intention and purpose. This RNA story, if it can be captured, rather elegantly describes your biology better than anything else.
Data to information to insights
With the laboratory technologies we have today, such as shotgun sequencing, the biological story we want to capture is unfortunately all scrambled up when your biological sample is turned into digital data by a sequencing machine. Fortunately, we have developed some excellent bioinformatics algorithms that can reverse engineer and re-assemble huge amounts of these scrambled words (sequencing reads), identify the phrases (genes), and turn them into sentences (genomes) that make sense – from raw data to understandable information.
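To give a feel for what "re-assembling scrambled words" means, here is a deliberately tiny sketch of one classic textbook idea, greedy overlap assembly: repeatedly glue together the two fragments that overlap the most. This is not Viome's pipeline, and production assemblers use far more sophisticated methods and must cope with sequencing errors; the function names and the three example reads are invented for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str]) -> str:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    fragments = list(reads)
    while len(fragments) > 1:
        best_len, best_i, best_j = 0, 0, 1
        for i, a in enumerate(fragments):
            for j, b in enumerate(fragments):
                if i != j and overlap(a, b) > best_len:
                    best_len, best_i, best_j = overlap(a, b), i, j
        # Join the best pair, keeping the shared letters only once.
        merged = fragments[best_i] + fragments[best_j][best_len:]
        fragments = [f for k, f in enumerate(fragments)
                     if k not in (best_i, best_j)]
        fragments.append(merged)
    return fragments[0]


# Three hypothetical, error-free reads taken from one short RNA message.
reads = ["GGCGCUUAA", "AUGUUUGG", "UUUGGCGC"]
print(greedy_assemble(reads))  # AUGUUUGGCGCUUAA
```

Even in this toy, the scrambled order of the reads does not matter; the overlaps carry enough information to recover the original message. With billions of reads from thousands of organisms in one sample, the same puzzle becomes a serious computational challenge.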
We then use a suite of advanced algorithms from the world of artificial intelligence (AI) and machine learning (ML) to turn that information into insights that can be interpreted and used. For example, we combine the assembled genes from your sample with the information you provided in your questionnaires (aka phenotype metadata such as your symptoms, diseases, medications, lifestyles, and demographics), to turn it into health scores and recommendations in the Viome App that you can see, as well as diagnostic biomarkers that underlie our discovery products such as CancerDetect. All of this is at the cutting edge of technology, and we will write more blog articles about it.
Now we can look again at the numbers at the beginning of this article. Viome has so far processed almost 450,000 samples from customers and study participants, sequencing a total of about 700 trillion nucleotides. The raw data produced by the sequencer is processed by our bioinformatics algorithms to produce more than 40 billion molecular features (genes and genomes). From that information, our AI/ML algorithms have produced upwards of 30 million health scores to date. On top of that, there are an untold number of biological insights that are waiting to be discovered, and that is the topic of our research blog series.
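To put those totals in perspective, a quick back-of-the-envelope calculation gives rough per-sample averages. The inputs are the rounded figures quoted above ("almost", "about", "more than"), so the results are order-of-magnitude estimates only, and the variable names are just for this sketch.

```python
# Rounded figures quoted in this article; the true values are approximate.
samples = 450_000        # "almost 450,000 samples"
nucleotides = 700e12     # "about 700 trillion nucleotides"
features = 40e9          # "more than 40 billion molecular features"
health_scores = 30e6     # "upwards of 30 million health scores"

nucleotides_per_sample = nucleotides / samples
features_per_sample = features / samples
scores_per_sample = health_scores / samples

print(f"{nucleotides_per_sample:.2e} nucleotides per sample")
print(f"{features_per_sample:,.0f} molecular features per sample")
print(f"{scores_per_sample:.0f} health scores per sample")
```

In round numbers, that works out to roughly 1.6 billion sequenced nucleotides, about 89,000 molecular features, and about 67 health scores for an average sample.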
Clearly, the bioinformatics and AI/ML algorithms shown in the figure process huge amounts of molecular and phenotype data. It is impossible to make sense out of the massive data from molecular biology without the use of modern computing and algorithms. Any way you look at it, modern molecular biology is as much an information science as it is a natural science.
Cheers to your favorite cocktail!
Every single day I’m fascinated by what we find, a signal in the noise, a pattern in the chaos, an insight that might help us ask the next question. We want to convey this excitement and joy of discovery, with a sense of wonder, a sense of humility, and hopefully also a sense of humor.
Making sense of these complex phenomena and findings is one of the hardest parts of our endeavor. And while figuring this out, it’s easy to get carried away with jargon and details that take away from the main point. It’s also easy to oversimplify important ideas and potentially mislead. Just when I think that I have recognized a primary color in the spectrum of biology, I’m shown that the spectrum of colors is huge, like visible light being a part of the vast electromagnetic spectrum, with properties unknown and effects to be discovered.
We don’t want to describe these findings like the dry gin of scientific manuscripts, nor the syrupy sweetness of tabloid stories, but more like a gourmet cocktail that mixes the right amount of kick from scientific rigor, an exciting taste of the ‘aha!’ moments of discovery, and the irresistible colors of what it might mean to you personally. And to top it off, we will add a small beach umbrella or a tropical flower for you to take home and maybe flaunt at your next cocktail party!
Happy New Year!