Thank you for the question. One of the techniques I use to determine the genetic code of the hepatitis C virus is called Sanger sequencing, which was invented in 1977 by a guy called Frederick Sanger.
During Sanger sequencing a copy of the hepatitis C virus genes is made from building blocks called nucleotides. The four different nucleotides (adenine, guanine, thymine and cytosine) make up the genetic code for all genes, even our own. Some of the nucleotides used in Sanger sequencing have a fluorescent colour attached to them, and when one of these nucleotides is incorporated into the copy gene, the copying process stops and a camera registers the fluorescent colour. Several gene-copying processes are carried out for the same gene, and because the fluorescent nucleotides stop the process at different places in the gene, the copy genes will have different lengths. The camera registers the fluorescent nucleotide at the end of each of these gene copies, and a computer program then puts the copies in order of length, starting with the shortest, and makes a note of the fluorescent colour at the end of each one. This way it slowly builds up a long line of gene code consisting of the four nucleotides.
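As a rough illustration of the ordering step described above, here is a minimal Python sketch. The dye colours, the ten-letter example sequence and the function names are all made up for illustration; it only mimics the idea of sorting copies by length and reading the colour at the end of each one, not how real sequencing software works.

```python
import random

# Hypothetical dye colours for the four stopping nucleotides (assumed for illustration).
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {colour: base for base, colour in DYE.items()}

def terminated_copies(copy_strand):
    """One stopped copy per position: (length, colour of its last nucleotide)."""
    return [(n, DYE[copy_strand[n - 1]]) for n in range(1, len(copy_strand) + 1)]

def read_sequence(fragments):
    """Order the copies from shortest to longest and read off the end colours."""
    return "".join(BASE_FOR_DYE[colour] for _, colour in sorted(fragments))

fragments = terminated_copies("ATGGACTGCG")
random.shuffle(fragments)        # the copies come out of the reaction in no particular order
print(read_sequence(fragments))  # prints ATGGACTGCG
```

Because every stopped copy has a different length, sorting by length alone is enough to put the colours back in the right order.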
I also use a technique called next-generation sequencing, which is the latest form of sequencing. With this process I can find the genetic code for 1 million genes in one reaction!
Comments
Ben commented on :
One of the most used methods currently is called massively parallel sequencing, which comes in many flavors.
This works sort of like what Frederick Sanger invented: adding labeled nucleotides, shooting them with lasers, and using a camera to see what the nucleotide was.
But instead of doing this to one strand of DNA, millions are built up on a plate at once and the very sensitive camera takes a picture of all of these at the same time.
Doing a million side by side means we can now get A LOT of data very quickly compared to 20 years ago.
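To show why taking one picture of everything at once is so fast, here is a small Python sketch. The three short strands stand in for the millions on the plate, and the function names are hypothetical; the point is only that the number of photos depends on how long the strands are, not on how many strands there are.

```python
def take_photos(strands):
    """One photo per cycle; each photo records the newest letter on every strand."""
    return [list(column) for column in zip(*strands)]

def reads_from_photos(photos):
    """Follow each spot through the stack of photos to spell out its strand."""
    return ["".join(spot) for spot in zip(*photos)]

strands = ["ATGGA", "TGCGT", "GCCGT"]  # stand-ins for millions of strands on the plate
photos = take_photos(strands)
print(len(photos))                # 5 photos, one per letter position
print(reads_from_photos(photos))  # ['ATGGA', 'TGCGT', 'GCCGT']
```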
Rebecca commented on :
I also use next generation sequencing. I like to think of reading a genome as being like reading a book, except it’s too difficult for us to read it from start to finish at the moment (scientists are trying to read DNA directly but we can’t yet).
So instead we take lots of copies of the same book and chop up the sentences (DNA) into short bits randomly so that each copy is cut up differently. Then we can read these small pieces of around 100-500 letters (DNA bases) using next generation sequencing machines.
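The chopping step can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the function name `shred`, the piece sizes (4 to 8 letters rather than 100 to 500) and the fixed random seed are all chosen to keep the example small.

```python
import random

def shred(text, copies=3, min_len=4, max_len=8, seed=0):
    """Cut several copies of the same text into short pieces at random places."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    reads = []
    for _ in range(copies):
        pos = 0
        while pos < len(text):
            size = rng.randint(min_len, max_len)
            reads.append(text[pos:pos + size])  # the last piece of a copy may be shorter
            pos += size
    return reads

book = "THIS-IS-THE-WHOLE-SENTENCE"
reads = shred(book)
print(reads)
```

Each copy is cut in different places, so pieces from different copies overlap with each other, and that overlap is what makes it possible to line them up again later.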
But how do we put it all back together again to get the original book (genome)?! We need computers to do this part, which is called assembly. Essentially we line up the short reads using the bits that overlap, like bricks in the wall of a house, until enough of the short reads overlap to re-create longer pieces of the original. This can get us from millions of short reads to fewer than 100 pieces of the genome, and sometimes with a little extra work we can get the whole genome into one single piece.
Using short pieces to work out the full sequence
Book example:
       -THE-       ENTENCE
THIS-IS   E-WHO  -SEN  NCE
     IS-TH   HOLE-SE
THIS-IS-THE-WHOLE-SENTENCE
DNA example:
          TAGCG   TAGG
ATGGA TGCGT   GCCGT  GCTTCA
   GACTG GTAGC   GTA
ATGGACTGCGTAGCGCCGTAGGCTTCA
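
The lining-up idea can be sketched in Python as a simple "greedy" merge: keep joining the two pieces with the biggest overlap until nothing overlaps any more. This is a minimal sketch, not how real assembly programs necessarily work. The four reads below are hypothetical ones cut from the book sentence with longer overlaps than in the example above, and `min_len` (the smallest overlap that counts) is an assumed setting.

```python
def overlap(a, b, min_len=3):
    """Length of the longest end of `a` that matches the start of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of pieces with the largest overlap."""
    pieces = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; several pieces remain
        merged = pieces[best_i] + pieces[best_j][best_k:]
        pieces = [p for n, p in enumerate(pieces) if n not in (best_i, best_j)]
        pieces.append(merged)
    return pieces

# Hypothetical reads, chosen so that neighbours share at least five letters.
reads = ["E-WHOLE-SE", "THIS-IS-TH", "OLE-SENTENCE", "IS-THE-WHO"]
print(greedy_assemble(reads))  # ['THIS-IS-THE-WHOLE-SENTENCE']
```

With very short overlaps, or with text that repeats itself a lot, this simple approach can join the wrong pieces, which is one reason real genomes often end up in several pieces rather than one.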