Shared posts

06 Aug 09:57

Facilitating Genomics Research with Google Cloud Platform

by Research @ Google
Posted by Paul C. Boutros, Ontario Institute for Cancer Research; Josh Stuart, UC Santa Cruz; Adam Margolin, Oregon Health & Science University; Nicole Deflaux and Jonathan Bingham, Google Cloud Platform and Google Genomics

The understanding of the origin and progression of cancer remains in its infancy. However, due to rapid advances in the ability to accurately read and identify (i.e. sequence) the DNA of cancerous cells, the knowledge in this field is growing rapidly. Several comprehensive sequencing studies have shown that alterations of single base pairs within the DNA, known as Single Nucleotide Variants (SNVs), or duplications, deletions and rearrangements of larger segments of the genome, known as Structural Variations (SVs), are the primary causes of cancer and can influence what drugs will be effective against an individual tumor.

However, one of the major roadblocks hampering progress is the availability of accurate methods for interpreting genome sequence data. Due to the sheer volume of genomics data (the entire genome of just one person produces more than 100 gigabytes of raw data!), the ability to precisely localize a genomic alteration (SNV or SV) and resolve its association with cancer remains a considerable research challenge. Furthermore, preliminary benchmark studies conducted by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) have discovered that different mutation calling software run on the same data can result in detection of different sets of mutations. Clearly, optimization and standardization of mutation detection methods is a prerequisite for realizing personalized medicine applications based on a patient’s own genome.

The ICGC and TCGA are working to address this issue through an open community-based collaborative competition, run in conjunction with leading research institutions: the Ontario Institute for Cancer Research, University of California Santa Cruz, Sage Bionetworks, IBM-DREAM, and Oregon Health & Science University. Together, they are running the DREAM Somatic Mutation Calling Challenge, in which researchers from across the world “compete” to find the most accurate SNV and SV detection algorithms. By creating a living benchmark for mutation detection, the DREAM Challenge aims to improve standard methods for identifying cancer-associated mutations and rearrangements in tumor and normal samples from whole-genome sequencing data.

Given Google’s recent partnership with the Global Alliance for Genomics and Health, we are excited to provide cloud computing resources on Google Cloud Platform for competitors in the DREAM Challenge. Scientists who do not have ready access to large local computer clusters can participate with open access to contest data, as well as credits that can be used for Google Compute Engine virtual machines. By leveraging the power of cloud technologies for genomics computing, contestants have access to powerful computational resources and a platform that allows the sharing of data. We hope to democratize research, foster open access to data, and spur collaboration.

In addition to the core Google Cloud Platform infrastructure, the Google Genomics team has implemented a simple web-based API to store, process, explore, and share genomic data at scale. We have made the Challenge datasets available through the Google Genomics API. The challenge includes both simulated tumor data for which the correct answers are known and real tumor data for which the correct answers are not known.

Genomics API Browser showing a particular cancer variant position (highlighted) in the "in silico #1" dataset that was missed by many challenge participants.

Although submissions for the simulated data can be scored immediately, the winners on the real tumor data will not immediately be known when the challenge closes. This is a consequence of the fact that current DNA sequencing technology does not provide 100% accurate data, which adds to the complexity of the problem these algorithms are attempting to tackle. Therefore, to identify the winners, researchers must turn to alternative laboratory technologies to verify whether a particular mutation that was found in sequencing data is actually (or likely to be) true. As such, additional data will be collected after the Challenge is complete in order to determine the winner. The organizers will re-sequence DNA from the cells of the real tumor using an independent sequencing technology (Ion Torrent), specifically examining regions overlapping the positions of the cancer mutations submitted by the contest participants.

As an analogy, a "scratched magnifying glass" is used to examine the genome the first time around. The second time around, a "stronger magnifying glass with scratches in different places" is used to look at the specific locations in the genome reported by the challenge participants. By combining the data collected by those two different "magnifying glasses", and then comparing that against the cancer mutations submitted by the contest participants, the winner will then be determined.

We believe we are at the beginning of a transformation in medicine and basic research, driven by advances in genome sequencing and computing at scale. With the DREAM Challenge, we are all excited to be part of bringing together researchers from around the world to focus on this particular cancer research problem. To learn more about how to participate in the challenge, register here.
04 Aug 13:18

Venter hires Google machine learning expert to build Translate for genomics

by Nick Paul Taylor

Venter has gone to Mountain View to make his latest hire, nabbing Franz Och to build a Google Translate for genomics.

04 Aug 07:20

At risk for Cancer? Try Spicy Food!

by Gabriel, Lunatic Laboratories
Like spicy food? Well, next time you eat some you should hug that chili pepper, because it might just save your life. This is an interesting find: researchers have found that […]...

de Jong PR, Takahashi N, Harris AR, Lee J, Bertin S, Jeffries J, Jung M, Duong J, Triano AI, Lee J.... (2014) Ion channel TRPV1-dependent activation of PTP1B suppresses EGFR-associated intestinal tumorigenesis. The Journal of clinical investigation. PMID: 25083990  

01 Aug 07:57

Key to aging immune system: Discovery of DNA replication problem

The immune system ages and weakens with time, making the elderly prone to life-threatening infection and other maladies, and scientists have now discovered a reason why.
23 Jul 07:53

U.K. Cabinet Office Adopts ODF as Exclusive Standard for Sharable Documents

The U.K. Cabinet Office accomplished today what the Commonwealth of Massachusetts set out (unsuccessfully) to achieve ten years ago

23 Jul 07:22

Release of Battlefield Hardline postponed to early 2015

by Sander Woestenburg

Whether or not you are a big Battlefield enthusiast, it can be stated objectively that the release of Battlefield 4 was not much of a success. There were a few too many errors, that is bugs, in the game, and that made it unpleasant to play. Naturally, history must not repeat itself, and that is one of the reasons the release of Battlefield Hardline has been postponed to early 2015. Developers Visceral Games and DICE announced this on the official Battlefield Blog.

In the post that Karl Magnus Troedsson of DICE published on the official Battlefield Blog, he announces that they need more time to make Battlefield Hardline the game they want it to be. A beta version of the game had been available to all Battlefield enthusiasts since E3 2014, and the data collected showed that many points still need improvement. That is fairly normal in a beta phase, but DICE and Visceral Games need more time to actually get everything done.

Bugs are not the only reason Battlefield Hardline has been pushed back to next year: the beta also led DICE to conclude that they want to expand the game. They looked at how multiplayer was played during the test phase and want to respond to that. Karl Magnus Troedsson talks about an improved ‘cops and criminals fantasy’ and ideas from the community that should enrich the Battlefield experience. Attention is also being paid to the single-player mode, since according to Troedsson it needs more depth. They want the experience to meet the fans' expectations, which is why they need a little more time.

Although a lot of extra time is going into the development of Battlefield Hardline, Troedsson wants to stress that Battlefield 4 is not being forgotten. That title will continue to receive bug fixes and new content for quite some time. ‘Early 2015’ is all we have to go on for Battlefield Hardline, however, because no specific release date has been announced.

The post "Release of Battlefield Hardline postponed to early 2015" appeared first on Gamekings.

21 Jul 08:53

Mantle for Thief: an interview with the developers

After Battlefield 4, Thief is the second game that can make use of AMD’s Mantle API. The Dutch game studio Nixxes was responsible for the Mantle implementation. In late March, Hardware.Info conducted an extensive interview with Nixxes founder Jurjen Katsman and Mantle developers Frank de Bresser and Tim van Klooster. This interview was previously published in Hardware.Info Magazine #2/2014. Mantle is a new 3D API that, according to AMD, can drive the graphics card much more efficiently than DirectX, “c...
10 Jul 12:53

Genetic Genealogy and the Single Segment

by Steve On Genetics
Last year, my wife Janet and I sent our DNA off to 23andMe for analysis. Among the tools that they provide is a "Relative Finder," which lists other people on the site who share regions of DNA that appear to be identical by descent. In my case, there are 476 people listed, each sharing between 0.07% and 0.46% of my genome, almost always as a single segment (there are 18 people with whom I share two segments). These people are generally anonymous, but you have an opportunity to make contact and invite them to "share genomes," which means only that you can see which regions are shared.

There are a lot of people on 23andMe who are quite interested in this tool, and who use it for genetic genealogy. Many of these same people also use Family Tree DNA and ancestry.com. As a result of my interactions with these 23andMe relatives, and following the discussions on the 23andMe community forums, I have been thinking about, and researching, what it means to share one segment of DNA by descent with someone. In the process, I have realized some things that are not fully appreciated by most of the genealogy buffs on 23andMe.

I am presenting these insights here, and will consider them one at a time.
  • Distant relatives often share no genetic material at all.
  • It is possible to share a segment with very distant relatives.
  • Sometimes, more distant relationships are more likely.
  • Most of your relatives may be descended from a small fraction of your ancestors.

Distant relatives (fourth cousins and beyond) often share no genetic material.
The chance of not sharing any DNA at all becomes appreciable with fourth cousins and rises to approximately half with fifth cousins. This is based on my own simplified calculations and those of Donnelly (1983), who opines that "proof of descent from William Shakespeare does little to increase the probability that the claimant has genes in common with him." There are limits to what can be accomplished by genetic genealogy that are imposed by the real chance that you simply do not share any DNA at all with distant relatives. The more distant the relationship, the more likely it is that no DNA is shared.

On the other hand, you have to inherit your DNA from somebody, so there are some blocks of identity by descent that have been transmitted many generations.

It is possible to share a segment with very distant relatives.
"The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM." Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.). What this means for genealogy on 23andMe is that for two people sharing one segment identical by descent there is no way to reliably estimate how far back the common ancestor was. Furthermore, no improvement in software can possibly change that, because the limitation is imposed by the genetics itself.

No matter how far back you go, every nucleotide of one's genome is derived from some ancestor, and even going back 20 generations, the chance that the bit which has been inherited is part of a block 5 cM. or greater is still appreciable. In fact, even for 19th cousins, there is a real chance (13%) that any segment of DNA they have inherited in common will be 5 cM. or greater. This number is based on the term (1 - P(rec))^n, where P(rec) is the probability that the segment will be broken up by recombination in one generation (size/100, where size is in cM.) and n is the number of transmissions linking the two cousins through their common ancestor. For 19th cousins sharing a single ancestor, n is 40.
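
As a rough check of the 13% figure, the term can be evaluated directly. This is only a sketch of the simplified calculation described above; the function name and its arguments are made up for illustration, and the per-generation break probability of size/100 is the approximation used in this post, not a full genetic model.

```python
def prob_segment_intact(size_cm: float, transmissions: int) -> float:
    """Chance that a segment survives every transmission unbroken.

    Assumes the simplified per-generation break probability size_cm / 100.
    """
    p_rec = size_cm / 100.0
    return (1.0 - p_rec) ** transmissions


# 19th cousins through a single ancestor: 2 * 19 + 2 = 40 transmissions.
print(f"{prob_segment_intact(5, 40):.2f}")  # about 0.13
```

The same function shows how quickly larger segments decay: a 10 cM. segment is far less likely than a 5 cM. one to survive 40 transmissions intact.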

Of course, as mentioned above, there is very little chance that two 19th cousins will share any IBD segments at all, but this is offset if one has many 19th cousins, which is often the case.

Sometimes, more distant relationships are more likely.
23andMe reports a "predicted relationship" (e.g. "4th cousin") and a "relationship range" (e.g. "3rd to 7th cousin"). However, these ranges are likely to be wildly inaccurate, because the likely distance to a common ancestor, given only the information that two people share a single IBD segment, can vary enormously, based largely on how many relatives one has.

Here is my estimate of these values. You can skip this paragraph if you're not interested in the details.
The probability that a segment, if transmitted, will not be broken up by recombination is 1 minus the probability of recombination, P(rec), which is 5% for a 5 cM. segment, 10% for a 10 cM. segment and so on. (If you are moving up a pedigree, this is the probability the segment was transmitted rather than created by recombination, but the value is the same.)
The probability that a segment will be transmitted at all is one-half per generation.
Thus, for an nth cousin sharing a single ancestor, the probability is ((1-P(rec))/2)^(2n+2).
For an nth cousin sharing two ancestors (the usual case), the probability is 2((1-P(rec))/2)^(2n+2). For example, the probability of two 4th cousins sharing a specific 5 cM. segment is 2(0.95/2)^10 = 0.00117. If one has more than 855 4th cousins, then the expected number of 4th cousins sharing this segment will be greater than 1. Because every 4th cousin has the same chance of inheriting the segment, the expected number of 4th cousins who do share the segment will be directly proportional to the number of 4th cousins one has. In the case of 5th cousins, the probability of sharing a specific segment is 2(0.95/2)^12 = 0.00026, which would require 3,790 cousins for the expected number sharing the segment to exceed 1.0. In general, the number of cousins of a specific degree who should be expected to share a segment is given by

2((1-P(rec))/2)^(2n+2) x N

where N is the number of relatives of that degree. For a 5 cM. segment, if the number of cousins of degree n+1 that you have is 4.43 times the number of cousins of degree n that you have, then you expect more cousins of degree n+1 than cousins of degree n to share the segment. For a 10 cM. segment, this ratio is 4.94.
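
For readers who would rather run the numbers than follow the algebra, here is a minimal sketch of the formula above. The function names are invented for this example, and P(rec) is approximated as size/100, as in the text.

```python
def prob_share_segment(size_cm: float, degree: int, ancestors: int = 2) -> float:
    """Probability that an nth cousin shares one specific segment.

    Implements ancestors * ((1 - P(rec)) / 2) ** (2n + 2),
    with P(rec) approximated as size_cm / 100.
    """
    p_rec = size_cm / 100.0
    return ancestors * ((1.0 - p_rec) / 2.0) ** (2 * degree + 2)


def cousins_needed(size_cm: float, degree: int) -> float:
    """Number of cousins at which the expected number of sharers reaches 1."""
    return 1.0 / prob_share_segment(size_cm, degree)


def break_even_ratio(size_cm: float) -> float:
    """Growth in cousin count per degree that offsets the loss of sharing."""
    return prob_share_segment(size_cm, 4) / prob_share_segment(size_cm, 5)


for degree in (4, 5):
    p = prob_share_segment(5, degree)
    n = cousins_needed(5, degree)
    print(f"{degree}th cousins: p = {p:.5f}, cousins needed = {n:.0f}")

for size in (5, 10):
    print(f"{size} cM. break-even ratio: {break_even_ratio(size):.2f}")
```

The break-even ratio does not depend on the cousin degree, because each extra degree always adds exactly two more transmissions.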

Thus, if you have many more distant cousins, as would be expected if your ancestors had large families, then someone who shares a single IBD segment is more likely to be a distant cousin, because you have so many more distant cousins. The point where the increase in the number of cousins outweighs the loss of shared segments is five children per family. This is not extremely uncommon.

As an alternative to the math, consider the case of my (hypothetical) great-great-great-grandfather Joe. Let’s say that I have inherited a 5 cM. segment of DNA from him. (It’s likely that I have inherited at least one segment from him.) Our concern is whether a distant relative that shares this segment is more likely to be a fourth cousin also descended from Joe or a fifth cousin descended from Joe’s father Jacob. The chance that the 5 cM. segment was inherited by Joe, from Jacob, is slightly less than half (because of the possibility of recombination in that generation). Jacob had 12 children, so I can expect to have 12 times as many fifth cousins descended from Jacob as fourth cousins descended from Joe. That fact ends up being more significant than the chance of recombination, so I will share the segment in question with more fifth cousins than fourth cousins. This same logic applies to fifth vs. sixth cousins and so on.

Thus, my 23andMe relatives sharing one IBD segment might be fourth cousins, as predicted, or they might be distant cousins connected by prolific ancestors. There is no way to know.

The world population has increased perhaps 20-fold in the last millennium, but that works out to significantly less growth than the sustained doubling required to predict distant ancestry for people who share one IBD segment. Nevertheless, there are well-documented cases of rapid demographic expansion.

Most of your relatives may be descended from a small fraction of your ancestors.
Given that family size varies a great deal, it is no doubt common to have some ancestors who have left many more descendants than others. We all have 32 great-great-great-grandparents, typically in 16 couples. If one family among the 16 had five children and their descendants did as well, while the other families reproduced at replacement rates (two children per family), then your more prolific ancestors (the parents of just one of your 16 great-great-grandparents) would account for over 3/4 of your fourth cousins.
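
The "over 3/4" claim can be sanity-checked with a toy model. The sketch below is an assumption-laden illustration, not the calculation behind the figure above: it supposes that every family in a line has the same number of children, ignores intermarriage, and counts (k - 1) x k^4 fourth cousins per ancestral couple with k children per family. The couple counts tried (16 and 32) are illustrative inputs.

```python
def fourth_cousins_via_couple(children_per_family: int) -> int:
    """Fourth cousins descended from one ancestral couple (toy model).

    The couple's other (k - 1) children each leave k ** 4 descendants
    four generations later, assuming a constant family size k.
    """
    k = children_per_family
    return (k - 1) * k ** 4


def prolific_share(couples: int, prolific_k: int = 5, other_k: int = 2) -> float:
    """Fraction of fourth cousins who descend from the one prolific couple."""
    prolific = fourth_cousins_via_couple(prolific_k)
    others = (couples - 1) * fourth_cousins_via_couple(other_k)
    return prolific / (prolific + others)


for couples in (16, 32):
    print(couples, f"{prolific_share(couples):.2f}")
```

Under these assumptions the prolific couple accounts for well over three quarters of fourth cousins for either couple count.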

In summary, it is impossible to know the relationship one has to relatives who are discovered by virtue of their sharing a single autosomal segment of DNA. The "predicted relationship" is uncertain, and even the range is hard to be sure of. The extensive information provided by 23andMe is a very useful tool for genealogy, but it cannot tell you about relatives with whom you do not share any genetic material by descent. On the other hand, relatives with whom you do share genetic material by descent can be quite distant.
10 Jul 12:17

Unsuck your writing

by Stephen Turner
I recently found this little gem of a web app that analyzes the clarity of your writing. Hemingway highlights long, complex, and hard to read sentences. It also highlights complex words where a simple one would do, and highlights adverbs, suggesting you use a stronger verb instead. It highlights passive voice (bad!), and tells you the minimum reading grade level necessary to understand your writing.

When I pasted in some text from an abstract I submitted to ASHG years ago it showed me just how terrible and difficult to understand my scientific writing really is. My abstract text, which should have been hard-hitting and easy to understand at a glance, required a minimum grade 20 reading level. The majority of my 14 sentences were very hard to read and littered with too many adverbs, complicated words, and several uses of passive voice. (I still got a talk out of the submission, so maybe we as scientists enjoy reading tortuous verbiage...).



It looks like a desktop version is in the works, but the web app seemed to work fine, even for a 100,000-word manuscript I tried.

http://www.hemingwayapp.com/
Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
10 Jul 12:11

Collaborative lesson development with GitHub

by Stephen Turner
If you're doing any kind of scientific computing and not using version control, you're doing it wrong. The git version control system and GitHub, a web-based service for hosting and collaborating on git-controlled projects, have both become wildly popular over the last few years. Late last year GitHub announced that the 10-millionth repository had been created, and Wired recently ran an article reporting on how git and GitHub were being used to version control everything from wedding invitations to Gregorian chants to legal documents. Version control and GitHub-enabled collaboration aren't just for software development anymore.

We recently held our second Software Carpentry bootcamp at UVA where I taught the UNIX shell and version control with git. Software Carpentry keeps all its bootcamp lesson material on GitHub, where anyone is free to use these materials and encouraged to contribute back new material. The typical way to contribute to any open-source project being hosted on GitHub is the fork and pull model. That is, if I wanted to contribute to the "bc" repository developed by user "swcarpentry" (swcarpentry/bc), I would first fork the project, which creates a copy for myself that I can work on. I make some changes and additions to my fork, then submit a pull request to the developer of the original "bc" repository, requesting that they review and merge in my changes.
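
The fork and pull steps just described can be sketched as a short command list. The helper below is purely illustrative: the function name, the fork owner `yourname`, the branch name, and the commit message are placeholders, and the forking and the pull request themselves happen on the GitHub website rather than through git. It only builds the commands as strings and does not run them.

```python
from typing import List


def fork_and_pull_commands(upstream_owner: str, repo: str,
                           fork_owner: str, branch: str) -> List[str]:
    """Return the git commands for a typical fork-and-pull contribution."""
    fork_url = f"https://github.com/{fork_owner}/{repo}.git"
    upstream_url = f"https://github.com/{upstream_owner}/{repo}.git"
    return [
        # 1. After forking on GitHub, clone your own copy.
        f"git clone {fork_url}",
        f"cd {repo}",
        # 2. Keep a reference to the original repository for later updates.
        f"git remote add upstream {upstream_url}",
        # 3. Do the work on a topic branch and commit it.
        f"git checkout -b {branch}",
        "git add .",
        'git commit -m "Describe the change"',
        # 4. Publish the branch to your fork, then open a pull request on GitHub.
        f"git push origin {branch}",
    ]


# Hypothetical contributor "yourname" working on swcarpentry/bc.
for command in fork_and_pull_commands("swcarpentry", "bc", "yourname", "ggplot2-lesson"):
    print(command)
```

Working on a topic branch rather than directly on the default branch keeps the fork easy to update from the original repository while the pull request is under review.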

GitHub makes this process extremely simple and effective, and preserves the entire history of changes that were submitted and the conversation that resulted from the pull request. I recently contributed a lesson on visualization with ggplot2 to the Software Carpentry bootcamp material repository. Take a look at this pull request and all the conversation that went with it here:

https://github.com/swcarpentry/bc/pull/395

On March 27 I forked swcarpentry/bc and started making a bunch of changes and additions, creating a new ggplot2 lesson. After submitting the pull request, I instantly received tons of helpful feedback from others reviewing my lesson material. This development-review cycle went back and forth a few times, and finally, when the Software Carpentry team was satisfied with all the changes to the lesson material, those changes were merged into the official bootcamp repository (the rendered lesson can be viewed here).

Git and GitHub are excellent tools for managing the conflict resolution that inevitably results from merging work done asynchronously by both small and very large teams of contributors. As of this writing, the swcarpentry/bc repository has been forked 178 times, with pull requests merged from 71 different contributors, for a total of 1,464 committed changes and counting. Next time you try reconciling "tracked changes" and comments from 71 contributors in a M$ Word or PowerPoint file, please let me know how that goes.

In the meantime, if you're collaboratively developing code, lesson material, chord progressions, song lyrics, or anything else that involves text, consider using something like git and GitHub to make your life a bit easier. There are tons of resources for learning git. I'd start with Software Carpentry's material (or better yet, find an upcoming bootcamp near you). GitHub also offers online courses and in-person training classes, both free and for-fee (cheap). You can also learn git right now by trying git commands in the browser at https://try.github.io.
01 Jul 07:43

Learn Dutch in your sleep: Listening to lessons while sleeping reinforces memory

When you have learned words in another language, it may be worth listening to them again in your sleep. A study has now shown that this method reinforces memory. "Our method is easy to use in daily life and can be adopted by anyone," says the study director.
20 May 07:29

Test: the benefit of more cores

Quad-core processors are now mainstream, and there are already processors with six cores or even more. But are all these cores actually used by today's common software? Hardware.Info finds out. In early January, AMD introduced its new APUs, codenamed Kaveri: chips with four CPU cores of the Steamroller generation plus 512 GPU cores. Click here for our review. Although CPU performance at the same clock frequency has increased by a good 10% compared to...
19 May 07:31

The World Class Guide To Body Fat Percentage

by Siddharth Saini, Workout Trends
It seems like body fat percentage is a topic most of us know well or, at the very least, have a passing acquaintance with. It is a notoriously tricky subject: how to calculate it, how to track it over time, or worse, how […] The post The World Class Guide To Body Fat Percentage appeared first on ....

Blaak E. (2001) Gender differences in fat metabolism. Current opinion in clinical nutrition and metabolic care, 4(6), 499-502. PMID: 11706283  

Adam-Perrot A, Clifton P, & Brouns F. (2006) Low-carbohydrate diets: nutritional and physiological aspects. Obesity reviews : an official journal of the International Association for the Study of Obesity, 7(1), 49-58. PMID: 16436102  

19 May 07:09

Parrot Flower Power: Plant care 2.0

Do the plants in your home also always die within a few months? Then Parrot may have a solution in the form of the ‘Flower Power’. This small device looks like a little plant with two truncated twigs, but is in fact a combination of sensors that keep an eye on the well-being of your plants. The Flower Power logs temperature, light exposure, soil moisture and the amount of nutrients in the soil. This data is stored and can be wirel...
16 May 07:44

Is Desktop Linux Secure?

 Datamation: I'm asked this all the time: is using Linux on the desktop more secure than Windows?

30 Apr 08:51

The Language of DNA

by sedeer, Inspiring Science
One of the striking things about the genetic code is the remarkable way it twists back on itself, combining redundancy …

Goodman DB, Church GM, & Kosuri S. (2013) Causes and effects of N-terminal codon bias in bacterial genes. Science (New York, N.Y.), 342(6157), 475-9. PMID: 24072823  

29 Apr 07:27

Dutch police also advise against using Internet Explorer

by the editors
mjpdejong

Might be handy to include in the constitution!

The Dutch police also advise against using Internet Explorer. 'Use it only if there is no alternative, and in enhanced protected ...
29 Apr 07:26

Details and Perspectives as Illumina Announces their Newest DNA Sequencing Machines and the $1,000 Human Genome

by Geoffrey Hannigan, Prophage
A couple of days ago, at the JP Morgan Healthcare Conference (a healthcare investment conference), the CEO of Illumina (one of the major DNA sequencing technology companies) announced the company's newest line of sequencing machines. The two new DNA sequencers are the NextSeq 500 and the HiSeq X10, with the NextSeq 500 being marketed for everyday laboratory use and the HiSeq X10 being marketed as a factory-level population sequencer (this is the higher-power model)...

Erika Check Hayden. (2014) Is the $1,000 genome for real?. Nature. DOI: 10.1038/nature.2014.14530  

Liu L, Li Y, Li S, Hu N, He Y, Pong R, Lin D, Lu L, & Law M. (2012) Comparison of next-generation sequencing systems. J Biomed Biotechnol. DOI: 10.1155/2012/251364  

29 Apr 07:26

Hacking Illumina GA IIx to Study RNA-Protein Interactions

by nextgenseek

Although RNA-protein binding is very important in many biological processes, such as splicing and many post-transcriptional processes, there are not many high-throughput assays for studying protein-RNA interactions. Contrast this with the advances in studying DNA-protein interactions. Nature Biotechnology has an interesting paper from the Greenleaf group at Stanford, which hacked an Illumina GA IIx to study RNA binding experimentally.

By hacking, or repurposing (as the authors call it), the Illumina sequencer, they were able to design an assay that quantitatively measures protein binding to over 10 million RNA targets on the Illumina flow cell surface. This makes it possible to perform ultra-high-throughput, quantitative measurements of RNA-protein interactions. The team also developed methods to analyze the images from sequencing reactions to measure equilibrium binding constants and dissociation kinetics.

The team, led by William Greenleaf, cleverly used the hacked Illumina sequencer to create transcribed RNA pieces that are still attached to the DNA on the flow cell. These DNA-tethered RNA molecules were then used as the substrate for a fluorescently labeled RNA-binding protein. Since the protein is fluorescently labeled, the RNA binding event can be quantitatively measured using the imaging setup that is part of the sequencer.

Hacking Illumina GA IIx to Study RNA-Protein Interactions (Source: Greenleaf Lab)

Tethering Millions of RNA Pieces to DNA on Illumina Flow Cell

Briefly, instead of sequencing regular DNA samples, the team designed RNA target sequences that can be transcribed by E. coli RNA polymerase (RNAP) in the Illumina flow cell. The designed RNA target sequences contained an RNAP initiation-and-stall sequence and a region coding for diverse sequence variants of the MS2 RNA. They were also barcoded to identify different RNA variants.

The really clever experimental tricks used to quantify RNA-protein binding are as follows. After sequencing the designed RNA targets, they removed the sequenced DNA strand, generated dsDNA, and created a terminal biotin-streptavidin roadblock on the dsDNA fragments. Then they used E. coli RNAP to generate 26 bases of RNA, just before the RNA polymerase stalls. After removing any excess RNA polymerase with a wash step, they provided all four nucleotides to allow RNAP to transcribe the variable region and stall at the biotin-streptavidin roadblock.

Essentially, this sequence of complicated procedures gives rise to “transcribed RNA” that is still tethered to its parent DNA by RNA polymerase. This way they could create an RNA array containing over 12 million distinct clonal RNA populations comprising 1.48 × 10^5 unique sequences in a single sequencing lane.

Quantitative Measurement of RNA-Protein Binding

With RNA tethered to DNA on the Illumina flow cell, the team flowed the fluorescently labeled RNA-binding protein (MS2 coat protein) over the flow cell so that it could bind to the RNA. They performed the experiment at 10 different concentrations of the RNA-binding protein. Then the team used the image analysis tools they had developed to analyze the fluorescence decay and measure binding and dissociation constants. Pretty neat hack.

How Did the Stanford Team Hack the Illumina GA IIx?

The methods section of the RNA MaP paper says that sequencing was done at California-based ELIM Biopharmaceuticals. It is not clear whether all of the hacking was also done by the company. [See the comment from one of the authors of the paper.] If you are wondering how they hacked the Illumina GA IIx, here is the part of the paper describing it.

To improve the optics and allow for equilibrium measurements on an Illumina sequencer, we modified the sequencer in several ways. First, we exchanged the standard Illumina fluorescence filter to a filter optimized for SNAP-Surface 549 fluorescence emission (Semrock FF01-562/40-25). Second, we eliminated unwanted wash steps after imaging and during the ‘safe state’ mode by changing the default SCS files. C:\Illumina\SCS2.10\DataCollection\bin\Config\HCMConfig.xml was modified to: , and C:\Illumina\SCS2.10\DataCollection\bin\Config\ImageCyclePump.xml was modified to . We also shortened all the fluidics lines of the GAIIx and the associated paired-end module.

16 Apr 07:27

Lens turns any smartphone into a portable microscope

The Micro Phone Lens can turn any smartphone or tablet computer into a hand-held microscope. The soft, pliable lens sticks to a device's camera without any adhesive or glue and makes it possible to see things magnified dozens of times on the screen.
04 Apr 07:43

Russian army lags far behind NATO's high-tech army

by Stieven Ramdharie
T-72 tanks without proper armor. An air force general who runs all his operations from a mobile phone. Soldiers without night-vision goggles. ...
31 Mar 07:29

Supercomputers, The Human Brain and the Advent of Computational Biology

by JB Sheppard, Antisense Science
What makes a supercomputer different from a human brain, and how is this leading to a better understanding of ourselves? ...


28 Mar 13:11

Switch from Photoshop to Gimp: Tips From a Pro (rileybrandt.com)

Sjon shared this story from Hacker News 100.

28 Mar 08:35

Online gaming augments players' social lives, study shows

Online social behavior isn’t replacing offline social behavior in the gaming community, new research shows. Instead, online gaming is expanding players’ social lives. "Gamers aren't the antisocial basement-dwellers we see in pop culture stereotypes, they're highly social people," says the lead author of a paper. "This won't be a surprise to the gaming community, but it's worth telling everyone else. Loners are the outliers in gaming, not the norm."
27 Mar 16:15

The Dead Sea Scrolls and an Open Marine Transcriptome Project

by C. Titus Brown

In 1947 a Bedouin shepherd found a bunch of ancient scrolls in a cave near the Dead Sea. These scrolls, now known as the Dead Sea Scrolls, included some of the oldest known Biblical texts as well as other Jewish religious writing. Over the next few decades, these scrolls, of immense historical importance, remained in the possession of a small team of scholars, largely unpublished.

In 1991, the Huntington Library made available a complete microfilm copy of the Scrolls, thus dramatically opening up research in this area. Cool! A story of open data and open research!

What the heck does this have to do with transcriptomes?

With the advent of ridiculously inexpensive deep sequencing, many labs, big and small, have been sequencing lots and lots of transcriptomes. Transcriptomes are, generally speaking, fairly inexpensive to sequence ($1000/sample, using a HiSeq); much easier to assemble than genomes; and quite useful in their own right, in terms of enabling downstream research.

Unfortunately, many of these transcriptomes remain immured behind lab walls, often for lack of bioinformatics (human) resources. Even worse, for non-model organisms, the transcriptomes are most useful in context -- as we discussed in the Cephalopod Sequencing Consortium white paper, an isolated transcriptome from a deeply divergent critter is only useful inasmuch as you can annotate the transcripts by homology. So these transcriptomes are subject to a classic network effect where individually they are not as useful as they would be collectively.

In other words, there's a lot of potential for accelerating biology if we can only figure out how to get people to open up their data. (Just like the Dead Sea Scrolls! Sort of.)

Hmm. I wonder if offering to do the analysis for them would help?

So let's try that, shall we?

The basic idea

A while back I suggested crowdsourcing -omic analysis. I think we are going to try out a related idea on marine transcriptomes.

The bare-bones details go something like this:

  1. We would solicit (say) 100 marine animal mRNAseq data sets, ~50-100m reads each (for two+ conditions, if you have 'em), from anyone.
  2. We would take each data set and pass it through our transcriptomics pipeline (open, versioned, etc). Estimated cost to run on Amazon? ~$100-200.
  3. We would then provide an annotated transcriptome for download, a BLAST server, spreadsheets of the annotations, and spreadsheets of differential expression information, to the owners of the data.
  4. One year after the data was given to us, we would make the data and analysis publicly available under a CC-BY or CC0 license (on figshare? SRA? Amazon?) and provide a citation handle for the data+analysis (e.g. on figshare).
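For a rough sense of scale, the numbers in the list above can be turned into a back-of-the-envelope calculation. This is only a sketch: it assumes the ~$100-200 Amazon estimate is per data set (the list does not say so explicitly), the function names are hypothetical, and the dates are made up for illustration.

```python
from datetime import date


def compute_cost_range(n_datasets, low_per_set=100, high_per_set=200):
    """Total compute cost range in dollars, assuming a per-data-set cost."""
    return n_datasets * low_per_set, n_datasets * high_per_set


def public_release_date(received, embargo_years=1):
    """Date on which data received on `received` would become public."""
    try:
        return received.replace(year=received.year + embargo_years)
    except ValueError:
        # Received on Feb 29 and the target year is not a leap year:
        # fall back to Feb 28.
        return received.replace(year=received.year + embargo_years, day=28)


# 100 data sets at an assumed $100-200 each.
low, high = compute_cost_range(100)

# Hypothetical submission date; the release follows one year later.
release = public_release_date(date(2021, 6, 1))
```

Under that per-data-set assumption, 100 data sets would come to roughly $10,000-20,000 of compute.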

Potential embellishments include the idea of finding money to sequence ~20 or more samples as part of this.

A collaborator and I are planning to post ~5-10 such data sets already, and the protocols for doing the analysis are getting closer to complete (see them here).

What's in it for you?

If you're the proud provider of an mRNAseq data set, what's in it for you?

  1. You'll get an initial transcriptome analysis that will help you drive your biology.
  2. We'll give you some tools to explore the transcriptome data (although they might be just a BLAST server, assembly download, and spreadsheet download at first).
  3. We'll also automatically compute appropriate diagnostic outputs and cross-checks to reassure you that your results are OK.
  4. We'll manage submission of your raw data to an archival server, sufficient for publication purposes.
  5. There will be an easy citation for the first part of the methods section of your mRNAseq analysis.
  6. Free and decent bioinformatics analysis. Consulting fees for this kind of thing range from $50/hr to $200/hr; if someone hired me to run this kind of transcriptome assembly for them, I'd charge ~$1000 per 100m reads. But you can't beat free :)

What's in it for us?

Why on earth would we do this?

  1. I did my graduate work in evo devo & have a lot of friends in this field; I can't think of many cheaper ways to accelerate invertebrate research within the Bilateria than to help people analyze their data.
  2. We really do believe in open source, open science, and open data, and think this would be a great demo project.
  3. We get to study the computation as well as the bioinformatics of many different mRNAseq assemblies (which has got to be fascinating!) and thoroughly test our assembly pipeline on a lot of data. Having a chance to assemble many different transcriptomes would help us understand when it does and doesn't work, and help us (and, later on, others) improve assembly.
  4. We expect to develop lots of new assembly comparison tools as part of figuring out whether or not our approaches worked!
  5. Citations of our work, increased impact, and collaborations on hard research problems are sure to follow.
  6. I think this would upset people who are currently hoarding genomic and transcriptomic data of great interest, and I like being disruptive.

Tough decisions and our proposals to address them

What license would we publicly release the data and analysis under? Probably either CC-BY or CC0, which would adhere to normal academic requirements of "I used this so I should cite it."

For the data & analysis citation handle, what would be the authorship? My proposal is 1/2 and 1/2: my lab gets either first or last, your lab gets the other one, and we do the rest alphabetically.

Who would have access to the data during the embargo period? We're thinking of offering three options: (1) the data becomes available immediately upon the analysis being completed; (2) the data can be used for aggregate analysis by my lab and specifically named collaborators during the 1-yr embargo period; or (3) the data can only be used by my lab to do the analysis, and not for anything else by anyone other than the original authors, until the 1-yr embargo is up.

The last option, (3), would potentially limit our ability to improve your assembly and would certainly reduce the quality of the annotations, but would be the most conservative and acceptable for some.
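The three embargo options can be written down as a small access rule, which makes the differences between them easier to compare. The sketch below is one possible reading of the options, not an actual policy implementation: the names `EmbargoOption` and `may_use`, the role labels ("owner", "lab", "collaborator", "public") and the purpose labels ("analysis", "aggregate", "other") are all hypothetical.

```python
from enum import Enum


class EmbargoOption(Enum):
    IMMEDIATE = 1      # (1) public as soon as the analysis is completed
    AGGREGATE = 2      # (2) lab + named collaborators may do aggregate analysis
    ANALYSIS_ONLY = 3  # (3) lab may only run the analysis itself


def may_use(option, who, purpose, embargo_over=False, analysis_done=True):
    """Return True if `who` may use the data for `purpose`."""
    # After the embargo everything is public; owners can always use their data.
    if embargo_over or who == "owner":
        return True
    # The lab must be able to run the pipeline under every option.
    if who == "lab" and purpose == "analysis":
        return True
    if option is EmbargoOption.IMMEDIATE:
        return analysis_done
    if option is EmbargoOption.AGGREGATE:
        return who in ("lab", "collaborator") and purpose == "aggregate"
    # ANALYSIS_ONLY: nothing beyond the cases handled above.
    return False
```

For example, `may_use(EmbargoOption.ANALYSIS_ONLY, "lab", "aggregate")` is `False` during the embargo, which is the restriction that would limit cross-data-set improvements to assembly and annotation.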

What license would the software and pipeline be under? Oh, that's actually easy. BSD/CC0: free for all to use, reuse, and abuse.

Where can you sign up?

If you're potentially interested, fill out this short form and we'll let you know when we have all our ducks in a row.

Anything else you want to say, Titus?

Yep, two things:

First, we're looking for partners in crime. If anyone wants to work with us on this, we're game. Drop me a note.

Second, this would be a sunlight operation: everything we did would be freely and openly visible, excepting only the specifics of the data where appropriate. We have a really annoying culture of inside dealing and data hoarding in evo devo, and I don't want to play that game.

I'd love your comments and thoughts.

--titus

p.s. Thanks to Andy Cameron for the Dead Sea Scrolls story!

27 Mar 15:49

Play Store apps mine litecoins and dogecoins while the device charges

by Arnoud Wokke
Several apps from the Play Store use users' devices to mine cryptocurrencies such as litecoins and dogecoins. This happens while the user is charging the device, so that the power consumption is less noticeable.
27 Mar 15:47

Respawn makes Titanfall cheaters play only against each other

by Mark Hendrikman
Cheaters in Titanfall may keep playing the game, but only against other cheaters. Respawn describes it as the 'Wimbledon of aimbot contests' and says: "Hopefully you have the best aimbot there is, otherwise it could get frustrating for you. Good luck."