mjpdejong
Shared posts
Blockchain, Data Storage: The Future Is Decentralized
Bioinformatics vs Computational Biology
First Human Embryos Edited in U.S.
Could Hermione Tackle MinION Yield Variability?
HiSeq move over, here comes Nova! A first look at Illumina NovaSeq
Illumina have announced NovaSeq, an entirely new sequencing system that completely disrupts their existing HiSeq user base. In my opinion, if you have a HiSeq and you are NOT currently planning to migrate to NovaSeq, then you will be out of business in 1-2 years’ time. It’s not quite the death knell for HiSeqs, but it’s pretty close, and moving to NovaSeq over the next couple of years is now the only viable option if you see Illumina as an important part of your offering.
Illumina have done this before; it’s what they do, so no-one should be surprised.
The stats
I’ve taken the stats from the spec sheet linked above and produced the following. If there are any mistakes, let me know.
There are two machines, the NovaSeq 5000 and 6000, and four flowcell types: S1, S2, S3 and S4. The 6000 will run all four flowcell types; the 5000 will only run the first two. Not all flowcell types are immediately available, with S4 scheduled for 2018 (see below).
| | NovaSeq S1 | NovaSeq S2 | NovaSeq S3 | NovaSeq S4 | HiSeq 2500 HO | HiSeq 4000 | HiSeq X |
|---|---|---|---|---|---|---|---|
| Reads per flowcell (billion) | 1.6 | 3.3 | 6.6 | 10 | 2 | 2.8 | 3.44 |
| Lanes per flowcell | 2 | 2 | 4 | 4 | 8 | 8 | 8 |
| Reads per lane (million) | 800 | 1650 | 1650 | 2500 | 250 | 350 | 430 |
| Throughput per lane (Gb) | 240 | 495 | 495 | 750 | 62.5 | 105 | 129 |
| Throughput per flowcell (Gb) | 480 | 990 | 1980 | 3000 | 500 | 840 | 1032 |
| Total lanes per run | 4 | 4 | 8 | 8 | 16 | 16 | 16 |
| Flowcells per run | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Run throughput (Gb) | 960 | 1980 | 3960 | 6000 | 1000 | 1680 | 2064 |
| Run time (days) | 2-2.5 | 2-2.5 | 2-2.5 | 2-2.5 | 6 | 3.5 | 3 |
For the X Ten, simply multiply the X figures by 10. These are maximum figures and assume maximum read lengths.
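The throughput rows in the table follow from simple arithmetic: reads per lane × bases per read pair × lanes. A minimal sketch that reproduces them, assuming each "read" in the table is one read pair at the maximum read length (2×150bp = 300 bases; 2×125bp = 250 bases for the 2500 HO, which is the length that yields the 62.5Gb per-lane figure). The dictionary layout and function name are illustrative only.

```python
# (reads per lane in millions, lanes per flowcell, bases per read pair)
# Reads and lanes are from the table above. ASSUMPTION: reads are read pairs
# at maximum length, 2x150bp everywhere except 2x125bp on the HiSeq 2500 HO.
PLATFORMS = {
    "S1": (800, 2, 300),
    "S2": (1650, 2, 300),
    "S3": (1650, 4, 300),
    "S4": (2500, 4, 300),
    "2500 HO": (250, 8, 250),
    "4000": (350, 8, 300),
    "X": (430, 8, 300),
}

FLOWCELLS_PER_RUN = 2


def throughput_gb(reads_per_lane_m: float, lanes: int, bases_per_pair: int) -> dict:
    """Return per-lane, per-flowcell and per-run throughput in Gb."""
    per_lane = reads_per_lane_m * 1e6 * bases_per_pair / 1e9
    per_flowcell = per_lane * lanes
    return {
        "lane": per_lane,
        "flowcell": per_flowcell,
        "run": per_flowcell * FLOWCELLS_PER_RUN,
    }


if __name__ == "__main__":
    for name, spec in PLATFORMS.items():
        t = throughput_gb(*spec)
        print(f"{name:8s} lane={t['lane']:7.1f} flowcell={t['flowcell']:7.1f} run={t['run']:7.1f} Gb")
```

Every per-lane, per-flowcell and per-run value printed matches the table, which is a quick way to sanity-check the spec-sheet numbers.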
Read lengths available on NovaSeq are 2×50, 2×100 and 2×150bp. This is unfortunate, as the sweet spot for RNA-Seq and exomes is 2×75bp.
As you can see from the stats, the massive innovation here is the cluster density, which has hugely increased. We also have shorter run times.
So what does this all mean?
Well, let’s put this to bed straight away: HiSeq X installations are still viable. This from an Illumina tech on Twitter:
@biomickwatson HiSeqX will still be cheaper per genome until the S4 flow cell is launched. S4 currently scheduled for 2018
— Neil Ward (@GenomicsUK) January 9, 2017
We learn two things from this: first, that HiSeq X is still going to be cheaper for human genomes until S4 comes out; and second, that S4 won’t be out until 2018.
So Illumina won’t sell any more HiSeq X, but current installations are still viable and still the cheapest way to sequence genomes.
I also have this from an unnamed source:
Speculation from an Illumina rep: “X’s will be king for awhile. Cost per GB on those will likely be adjusted to keep them competitive for a long time.”
So X is OK, for a while.
What about HiSeq 4000? Well, to understand this, you need to understand the relationship between the 4000 and the X.
The HiSeq 4000 and HiSeq X
First off, the HiSeq X IS NOT a human-genome-only machine. It is a genome-only machine. You have been able to do non-human genomes for about a year now: anything you like, as long as it’s a whole genome and it’s 30X or above. The 4000 is reserved for everything else, because you cannot do exomes, RNA-Seq, ChIP-Seq etc. on the HiSeq X. HiSeq 4000 reagents are more expensive, which means that, per Gb, every other assay is more expensive on Illumina than genome sequencing.
However, no such restrictions exist on the NovaSeq, which means that every assay will now cost the same per Gb on NovaSeq. This is what led me to say this on Twitter:
NovaSeq kills 4000 not X
— Mick Watson (@BioMickWatson) January 9, 2017
At Edinburgh Genomics, we charge roughly 2x as much for a 4000 lane as we do for an X lane. Therefore, per Gb, RNA-Seq is approximately twice as expensive as genome sequencing. NovaSeq promises to make this per-Gb cost the same, so does that mean RNA-Seq will be half price? Not quite. Of course, no-one does a whole lane of RNA-Seq; we multiplex multiple samples in one lane. When you do this, library prep costs begin to dominate, and for most of my own RNA-Seq samples, library prep is about 50% of the per-sample cost and sequencing is the other 50%. NovaSeq promises to halve the sequencing costs, which means the per-sample cost will come down by 25%.
These are really rough numbers, but they will do for now. To be honest, I think this will make a huge difference to some facilities, but not to others. Larger centres will absolutely need to grab that 25% reduction to remain competitive, but smaller, boutique facilities may be able to ignore it for a while.
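The 25% figure comes from weighting the saving by the sequencing share of the per-sample cost. A minimal sketch, using the rough 50/50 library prep/sequencing split above as the example; the function and parameter names are hypothetical.

```python
def per_sample_cost_reduction(sequencing_share: float, sequencing_cost_factor: float) -> float:
    """Fractional drop in per-sample cost when only the sequencing part changes.

    sequencing_share: fraction of the per-sample cost that is sequencing (0-1);
        the remainder is treated as library prep and left unchanged.
    sequencing_cost_factor: new sequencing cost relative to the old one
        (0.5 means sequencing costs are halved).
    """
    if not 0.0 <= sequencing_share <= 1.0:
        raise ValueError("sequencing_share must be between 0 and 1")
    new_cost = (1.0 - sequencing_share) + sequencing_share * sequencing_cost_factor
    return 1.0 - new_cost


# ~50% library prep, ~50% sequencing, sequencing cost halved.
print(per_sample_cost_reduction(0.5, 0.5))
```

With the 50/50 split this gives 0.25. If library prep were a larger share, say 70%, halving sequencing would cut the per-sample cost by only 15%, which is why the saving matters less for heavily multiplexed assays.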
Capital outlay
Expect to pay $985k for a NovaSeq 6000 and $850k for a 5000.
Time issues
One supposedly big advantage is that NovaSeq takes 40 hours to run, compared to the existing 3 days for a HiSeq X. Comparing like with like, that’s 40 hours vs 72 hours. This might be important in the clinical space, but not for much else.
Putting this in context, when you send your samples to a facility, they will be QC-ed first, then put in the library prep queue, then put in the sequencing queue, then QC-ed bioinformatically before finally being delivered. Let’s be generous and say this takes 2 weeks. Of that, sequencing takes 3 days. So instead of waiting 14 days, you’re waiting 13 days. Who cares?
Clinically, having the answer 1 day earlier may be important, but let’s not forget that, even on our £1M cluster, the BWA+GATK pipeline itself takes 3 days at scale. So again you’re looking at 5 days vs 6 days. Is that a massive advantage? I’m not sure. Of course, you could buy one of the super-fast bioinformatics solutions, and maybe then the 40-hour run time will count.
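The turnaround argument is just addition: only the sequencing step shrinks, so the rest of the pipeline dominates. A rough sketch using the figures above (a 14-day end-to-end turnaround that includes 72 hours of HiSeq X sequencing, a 40-hour NovaSeq run, and a 3-day BWA+GATK analysis); the function and variable names are illustrative.

```python
HOURS_PER_DAY = 24


def turnaround_days(total_days: float, old_seq_hours: float, new_seq_hours: float) -> float:
    """End-to-end turnaround when only the sequencing step changes length."""
    return total_days - (old_seq_hours - new_seq_hours) / HOURS_PER_DAY


# Facility view: 14 days end to end, of which 72 h is sequencing on a HiSeq X.
facility = turnaround_days(14, old_seq_hours=72, new_seq_hours=40)

# Clinical view: sequencing followed by a 3-day BWA+GATK analysis.
clinical_x = 72 / HOURS_PER_DAY + 3
clinical_nova = 40 / HOURS_PER_DAY + 3

print(f"facility: 14.0 -> {facility:.1f} days")
print(f"clinical: {clinical_x:.1f} -> {clinical_nova:.1f} days")
```

This gives about 12.7 days instead of 14, and about 4.7 days instead of 6; the 13-day and 5-day figures in the text are these numbers rounded to whole days.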
Colours and quality
NovaSeq marks a switch from the traditional HiSeq 4-colour chemistry to the quicker NextSeq 2-colour chemistry. As Brian Bushnell has noted on this blog, NextSeq data quality is quite a lot worse than HiSeq 2500, so we may see a dip in data quality, though Illumina claim that 85% of bases are above Q30.
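For reference, Q30 is a Phred score: Q = -10·log10(p), so Q30 corresponds to an estimated error probability of 1 in 1,000. A minimal sketch of how a "% of bases at or above Q30" figure can be computed from FASTQ quality strings, assuming the Phred+33 encoding used by current Illumina software; the function names are illustrative.

```python
from typing import Iterable


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call is wrong, from its Phred score."""
    return 10 ** (-q / 10)


def fraction_at_or_above(qualities: Iterable[str], threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases with Phred score >= threshold.

    qualities: FASTQ quality strings, one per read.
    offset: ASCII offset of the quality encoding (33 for Phred+33).
    """
    total = passing = 0
    for qual in qualities:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# In Phred+33: 'I' encodes Q40, '?' Q30, '5' Q20, '+' Q10, '#' Q2.
print(phred_to_error_prob(30))
print(fraction_at_or_above(["IIII#", "??5+"]))
```

The toy quality strings have 6 of 9 bases at or above Q30, so the second call returns about 0.67.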
Illumina Unveils HiSeq Successor NovaSeq
BioTeam’s Berman Charts 2017 HPC Trends in Life Sciences
A Genetically Modified Malaria Vaccine Has Passed an Important Hurdle
University of California Cries "Thief!" on Genia Patents
Sequencing Technology Outlook, January 2017
University of California makes legal move against Roger Chen (and Genia)
The relationship between sequencing companies is frosty and beset by legal issues, which I’ve covered before here and here. Keith Robison tends to cover them in more detail.
Most recently, PacBio moved against Oxford Nanopore, claiming, we think, that ONT’s 2D technology violated their patent on CCR (link).
Well now the absolute latest is a filing by the University of California against Roger Chen and therefore Genia. If you click through to the documents (requires registration) you’ll see that UC claim Chen, with others, produced key inventions whilst at UC that he later assigned to Genia, but which should have automatically been assigned to UC according to UC’s “oath of allegiance”, which Chen signed as a UC employee.
It remains to be seen how important this is, and no doubt Chen/Genia/Roche will fight tooth and nail; however, if the courts decide in UC’s favour, it could spell the end of Genia, or at the very least lead to a large cash settlement with UC.
Fascinating times!
The smartphone steals the silence
This young scientist retracted a paper. And it didn’t hurt his career
Nathan Georgette experienced the peaks and troughs of a life in science, all before he was old enough to buy a beer.
And despite the typical stigma of retracting a scientific paper, Georgette, a fourth-year student at Harvard Medical School, is doing just fine — serving as a model to those many decades his senior.
Can Reproduction Be Ageless?
Roche Abruptly Breaks Off PacBio Partnership
New Funds for ONT, Self-Sequencing Challenge from Clive Brown
‘Dear plagiarist’: A scientist calls out his double-crosser
It’s a researcher’s worst nightmare: Pour five years, and at least 4,000 hours, of sweat and tears into a study, only to have the work stolen from you — by someone who was entrusted to confidentially review the manuscript.
But unlike many sordid tales of academia, this one is being made public. Dr. Michael Dansinger, of Tufts Medical Center, has taken to print to excoriate a group of researchers in Italy who stole his data and published it as their own.
Is the long read sequencing war already over?
My enthusiasm for nanopore sequencing is well known; we have some awesome software for working with the data; we won a grant to support this work; and we successfully assembled a tricky bacterial genome. This all led to Nick and me writing an editorial for Nature Methods.
So, clearly some bias towards ONT from me.
Having said all of that, when PacBio announced the Sequel, I was genuinely excited. Why? Well, revolutionary and wonderful as the MinION was at the time, we were getting ~100Mb runs. Amazing technology, mobile sequencer, tri-corder, just incredible engineering – but 100Mb was never going to change the world. Some uses, yes; but for other uses we need more data. Enter Sequel.
However, it turns out Sequel isn’t really delivering on promises. Rather than 10Gb runs, folk are getting between 3 and 5Gb from the Sequel:
@AW_NGS @BioMickWatson @flashton2003 TBH we’ve not switched new projects from RSII yet. But anecdotally other users told me 3-5Gb at UGM.
— John Kenny (@jgskenny) December 8, 2016
At the same time, MinION has been going great guns:
Yesterday we announced 5-10Gb, high accuracy MinION Flow Cell release, PromethION Early Access, more. Missed it? https://t.co/8GTto8W9Ur
— Oxford Nanopore (@nanopore) September 30, 2016
Whilst we are right to be skeptical about ONT’s claims about their own sequencer, other people who use the MinION have backed up these claims and say they regularly get figures similar to this. If you don’t believe me, go get some of the world’s first nanopore human data here.
PacBio also released some data for Sequel here.
So how do they stack up against one another? I won’t deal with accuracy here, but we can look at the number of reads, read length and throughput.
To be clear, we are comparing “rel2-nanopore-wgs-216722908-FAB42316.fastq.gz”, a fairly middling run from the NA12878 release, with m54113_160913_184949.subreads.bam, one of the Sequel SMRT cell datasets released.
(Figure: read length histograms for the MinION and Sequel datasets.)
As you can see, the longer reads are roughly equivalent in length, but MinION has far more reads at shorter read lengths. I know the PacBio samples were size-selected on a BluePippin, but I’m unsure about the MinION data.
The MinION dataset includes 466,325 reads, over twice as many as the Sequel dataset at 208,573 reads.
In terms of throughput, MinION again came out on top, with 2.4Gbases of data compared to just 2Gbases for the Sequel.
We can limit to reads >1000bp, and see a bit more detail:
- The MinION data has 326,466 reads greater than 1000bp summing to 2.37Gb.
- The Sequel data has 192,718 reads greater than 1000bp, summing to 2Gb.
Finally, for reads over 10,000bp:
- The MinION data has 84,803 reads greater than 10000bp summing to 1.36Gb.
- The Sequel data has 83,771 reads greater than 10000bp, summing to 1.48Gb.
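Summary numbers like these are straightforward to compute. A minimal sketch for a plain or gzipped FASTQ file such as the MinION one above, assuming standard four-line FASTQ records; it is not the tooling used to produce the figures above. The Sequel subreads are in BAM format and would need a BAM reader instead, which is not shown. Function names and the `read_stats.py` script name are hypothetical.

```python
import gzip
import sys
from typing import Dict, Iterable, Iterator, List, Tuple


def fastq_read_lengths(path: str) -> Iterator[int]:
    """Yield the sequence length of each record in a plain or gzipped FASTQ file.

    Assumes standard four-line records (header, sequence, '+', qualities).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                yield len(line.rstrip("\r\n"))


def summarise(lengths: Iterable[int], thresholds: Tuple[int, ...] = (0, 1000, 10000)) -> Dict[int, List[int]]:
    """For each threshold, count reads strictly longer than it and sum their bases."""
    stats = {t: [0, 0] for t in thresholds}
    for n in lengths:
        for t in thresholds:
            if n > t:
                stats[t][0] += 1
                stats[t][1] += n
    return stats


if __name__ == "__main__" and len(sys.argv) > 1:
    for t, (reads, bases) in summarise(fastq_read_lengths(sys.argv[1])).items():
        print(f">{t}bp: {reads:,} reads, {bases / 1e9:.2f} Gb")
```

Run as, for example, `python read_stats.py rel2-nanopore-wgs-216722908-FAB42316.fastq.gz`. The lengths are streamed rather than held in memory, which matters for multi-gigabase runs.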
These are very interesting stats!
This is pretty bad news for PacBio. If you factor in the low cost of entry for MinION and the £300k cost of the Sequel, the fact that MinION is performing as well as, if not better than, Sequel is incredible. Both machines have a long way to go. PacBio will point to their roadmap, with longer reads scheduled and improvements in chemistry and flowcells. In response, ONT will point to the incredible development path of MinION, increased sequencing speeds and bigger flowcells. And then there is PromethION.
So is the war already over? Not quite yet. But PacBio are fighting for their lives.
People are wrong about sequencing costs on the internet again
People are wrong about sequencing costs on the internet again and it hurts my face, so I had to write a blog post.
Phil Ashton, whom I like very much, posted this blog:
New blog post – how much would it cost to sequence a bacterial genome on the promethion? https://t.co/jjskQCrKUP
— Phil Ashton (@flashton2003) November 25, 2016
But the words are all wrong. I’ll keep this short:
- COST is what it COSTS to do something. It includes all COSTS. The clue is in the name. COST. It’s right there.
- PRICE is what a consumer pays for something.
These are not the same thing.
As a service provider, if the PRICE you charge to users is lower than your COST, then you are either SUBSIDISED or LOSING MONEY, and are probably NOT SUSTAINABLE.
COST, amongst other things, includes:
- Reagents
- Staff time
- Capital cost or replacement cost (sequencer and compute)
- Service and maintenance fees
- Overheads
- Rent
Someone is paying these, even if it’s not the consumer. So please – when discussing sequencing – distinguish between PRICE and COST.
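To make the COST/PRICE distinction concrete, a minimal sketch of a full-cost-per-run calculation. Every figure and name here is a hypothetical placeholder, not a real facility number; the point is only that capital, service, staff and overheads belong in the COST alongside reagents.

```python
from dataclasses import dataclass


@dataclass
class RunCosts:
    """Hypothetical cost components for one sequencing run (any currency)."""
    reagents: float
    staff_time: float
    capital_per_run: float    # sequencer + compute, amortised over expected runs
    service_per_run: float    # share of service/maintenance contract
    overheads_per_run: float  # share of overheads and rent

    def total(self) -> float:
        return (self.reagents + self.staff_time + self.capital_per_run
                + self.service_per_run + self.overheads_per_run)


def amortise(capital: float, lifetime_years: float, runs_per_year: int) -> float:
    """Spread a capital purchase evenly over the runs it is expected to do."""
    return capital / (lifetime_years * runs_per_year)


def margin(price: float, cost: float) -> float:
    """Positive: surplus. Negative: subsidised or losing money."""
    return price - cost


# All numbers below are made up for illustration.
costs = RunCosts(
    reagents=10_000,
    staff_time=1_000,
    capital_per_run=amortise(1_000_000, lifetime_years=4, runs_per_year=100),
    service_per_run=1_000,
    overheads_per_run=1_500,
)
print(costs.total())
print(margin(price=12_000, cost=costs.total()))
```

With these made-up numbers the full cost is 16,000 per run, so a PRICE of 12,000 comfortably covers reagents but is 4,000 short of the COST.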
Thank you
Microsoft Spends Big to Build a Computer Out of Science Fiction
How high-protein diets cause weight loss
Can CRISPR Save Ben Dupree?
72 million profit for Rijk Zwaan
The Discovery of DNA Structure – Who Stayed in the Shadows of a Nobel?
Pray, L. (2008) Discovery of DNA structure and function: Watson and Crick. Nature Education, 1(1).
Vitamins A and C help erase cell memory
Hore, T., von Meyenn, F., Ravichandran, M., Bachman, M., Ficz, G., Oxley, D., Santos, F., Balasubramanian, S., Jurkowski, T., & Reik, W. (2016) Retinol and ascorbate drive erasure of epigenetic memory and enhance reprogramming to naïve pluripotency by complementary mechanisms. Proceedings of the National Academy of Sciences, 201608679. DOI: 10.1073/pnas.1608679113
As DNA reveals its secrets, scientists are assembling a new picture of humanity
When Benedict Paten stares at his computer monitor, he sometimes gazes at what looks like a map of the worst subway system in the world. The screen is sprinkled with little circles that look like stations. Some are joined by straight lines — sometimes a single path from one circle to the next, sometimes a burst of spokes radiating out in many directions. And sometimes the lines bend into sweeping curves that soar off on express routes to distant stations.
A rainbow palette of colors makes it a little easier to digest the complexity. But if you stare a little too long, vertigo sets in.
Oxford Nanopore Announces New Pores, Kits and Updates on Projects
The curious case of the $9,500 skin gel
Even in an age when prescription drugs are increasingly expensive, a $9,500 tube of gel to combat scaly skin can gain notice — especially when the price spikes 128 percent overnight.
That’s what happened earlier this month when a little-known company called Novum Pharma suddenly hiked wholesale prices for all three of its dermatology products by whopping amounts.
Largest-ever study reveals environmental impact of genetically modified crops
Perry, E., Ciliberto, F., Hennessy, D., & Moschini, G. (2016) Genetically engineered crops and pesticide use in U.S. maize and soybeans. Science Advances, 2(8). DOI: 10.1126/sciadv.1600850