
INTERPIN Help Page


We designed the INTERPIN (INtrinsic TERmination hairPIN) program to predict intrinsic transcription terminators in bacterial genomes. The INTERPIN database contains predictions for 12745 bacterial genomes. With approximately 25 million hairpins, it is the largest collection of predicted intrinsic terminators to date.

Using the INTERPIN algorithm, we found effective transcription termination units in which contiguous hairpins (within 14 bases of each other) group together to form cluster hairpins. In a cluster, two or more hairpins work in tandem to cause termination, and their combined efficiency might even exceed the sum of the individual termination efficiencies.

In the database, bacterial genomes are organized by phylum and can also be searched by name or NCBI ID. For each bacterium, users can obtain information on the predicted operons, the frequency of cluster and single hairpins, and their distance from the stop codon. Raw information about all hairpin predictions, including their locations, energies, etc., can also be downloaded.

To aid the visualization of hairpins, an IGV visualizer has been added for each genome. Users can view and interact with the predicted hairpins, operons and corresponding gene annotations here. Hairpin secondary and tertiary structures can also be viewed, along with other features.

The figure shows how our algorithm classifies hairpins as cluster or single. Hairpins less than 15 bases apart are grouped into a cluster; these hairpins are expected to work together to cause intrinsic termination. Hairpins lying at larger distances from each other are called single. In the figure, c1 and c2 form one cluster while c3, c4 and c5 form another; s1 is a single hairpin.
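The grouping rule can be illustrated with a minimal Python sketch. It is not the INTERPIN implementation: the `(start, end)` tuple representation, the function names, and the choice to measure distance from the end of one hairpin to the start of the next are all assumptions made for illustration, and the coordinates are invented.

```python
from typing import List, Tuple

# Hypothetical representation: a hairpin is (start, end) in genome coordinates.
Hairpin = Tuple[int, int]

# "Less than 15 bases" apart, i.e. a gap of at most 14 bases.
MAX_GAP = 14


def group_hairpins(hairpins: List[Hairpin], max_gap: int = MAX_GAP) -> List[List[Hairpin]]:
    """Group hairpins whose gap to the previous hairpin is at most max_gap bases.

    Assumption: the gap is the start of a hairpin minus the end of the
    previous one, after sorting by start position.
    """
    groups: List[List[Hairpin]] = []
    for hp in sorted(hairpins):
        if groups and hp[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(hp)  # close enough: extend the current group
        else:
            groups.append([hp])    # too far: start a new group
    return groups


def classify(hairpins: List[Hairpin]) -> List[Tuple[str, List[Hairpin]]]:
    """Label each group 'cluster' (two or more hairpins) or 'single'."""
    return [("cluster" if len(g) > 1 else "single", g) for g in group_hairpins(hairpins)]


# Invented coordinates: two clusters (of 2 and 3 hairpins) and one single hairpin.
example = [(100, 120), (125, 140), (300, 320), (330, 350), (360, 380), (600, 620)]
labels = [kind for kind, _ in classify(example)]
```

Here `labels` lists one label per group in genome order, so a cluster of several hairpins contributes a single entry.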

INTERPIN webserver

The webserver is divided into four hyperlinked tabs (see the navigation bar below):

NAVIGATION BAR



HOME PAGE

RESULTS


Obtaining hairpin terminator predictions

NOTE: Prediction results for a bacterium can be viewed in two ways:

  1. Choose the phylum of the bacterium of interest (PHYLA page)
  2. Enter the NCBI ID of the bacterium of interest (NCBI page)

PHYLA PAGE


Classification is a way to group similar organisms and describe the diversity of bacterial species. Bacteria have been classified on different parameters including cell structure, metabolism, shape etc. Phyla-based classification is a taxonomic classification.

We have grouped all analyzed bacteria into their respective phyla. The Phyla page displays phylum/class names as buttons; selecting a button shows the bacterial species contained in that phylum so that one can be selected. A phylogenetic tree is also shown below the buttons.

A phylogenetic tree, or evolutionary tree, shows the evolutionary relationships among biological species or other entities based on similarities and differences in their genetic or physical characters. It helps us see how species or groups evolved from a common ancestor and in what order they diverged. One can estimate how closely related two species are using distances from the tree.

The phylogenetic tree displayed here covers the phyla included in our study: 9 phyla and 6 classes belonging to the phylum Proteobacteria. These are Firmicutes, Chlamydiae, Actinobacteria, Planctomycetes, Spirochaetes, Fusobacteria, Cyanobacteria, Thermodesulfobacteria, Acidobacteria, and Proteobacteria (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Other proteobacteria). Under each phylum, the number of its bacterial species currently stored in our webserver is shown.

Given below are the steps to retrieve terminators for a bacterium if its phylum is known.



NCBI PAGE


NCBI stands for the National Center for Biotechnology Information, which advances science and health by providing access to biomedical and genomic information. NCBI houses a series of databases relevant to biotechnology and biomedicine and is an important resource for bioinformatics tools and services (https://www.ncbi.nlm.nih.gov/).

All sequences deposited in the NCBI database have a unique sequence identifier. Users can use this unique sequence identifier to retrieve terminators for a bacterium, as given below.



PREDICTIONS


In this section, we explain the format of the results displayed on selecting a bacterium. The page shows a ‘PREDICTION RESULTS’ table, which summarizes the terminator hairpins predicted in the selected bacterium. Shown below are results from ‘Bordetella pertussis I669’.



  1. The tabular output above shows the following data:
    1. Bacterium name and NCBI ID (linked to the corresponding page on the NCBI website)
    2. Phylum
    3. Genome size (in nucleotides, nt)
    4. % GC content
    5. Number of operons in the genome
    6. Number of predicted single hairpins
    7. Number of predicted cluster hairpins

    Note: Operons were identified using the Molquest program (http://molquest.com/), with missing regions added from NCBI. For more information, see our publication [ref].

  2. Pie chart
    1. The figure is an interactive pie chart displaying the distribution of predicted hairpins in Bordetella pertussis I669 (NZ_CP010265) across distance ranges from the stop codon (i.e., the operon end). It shows where the majority of the predictions lie relative to the operon end.


    2. In the pie chart above, 36.4% of hairpins lie at 0-5 nt distance from the stop codon, 23.5% of hairpins lie at 6-10 nt distance and so on.

    3. Hover the mouse pointer over each sector to see its label (the hairpin distance, or hp-distance, from the operon end and the total number of predictions falling in that distance range).
    4. The distance bins shown in the pie chart are: 0-5 nt, 6-10 nt, 11-15 nt, 16-25 nt, 26-35 nt, 36-50 nt, above 50 nt (nt = nucleotides).
    5. The plot can be downloaded as a PNG: hover the cursor over the pie or its title, and a camera icon appears in the top right corner. Click on it and save the file in the desired location.
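The distance bins used in the pie chart can be reproduced offline from the hairpin-to-stop-codon distances. The following is a minimal sketch, not the webserver's own code; the function names are hypothetical, and it assumes distances are non-negative integers in nucleotides.

```python
from collections import Counter
from typing import Dict, Iterable

# Inclusive upper bounds (nt) of the bins listed for the pie chart.
# Anything larger than the last bound falls into "above 50 nt".
BINS = [
    (5, "0-5 nt"),
    (10, "6-10 nt"),
    (15, "11-15 nt"),
    (25, "16-25 nt"),
    (35, "26-35 nt"),
    (50, "36-50 nt"),
]


def distance_bin(distance: int) -> str:
    """Return the pie-chart bin label for a distance from the stop codon."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    for upper, label in BINS:
        if distance <= upper:
            return label
    return "above 50 nt"


def bin_percentages(distances: Iterable[int]) -> Dict[str, float]:
    """Percentage of hairpins per bin, rounded to one decimal place."""
    distances = list(distances)
    counts = Counter(distance_bin(d) for d in distances)
    return {label: round(100 * n / len(distances), 1) for label, n in counts.items()}


# Invented distances, for illustration only.
shares = bin_percentages([1, 2, 7, 60])
```

Only bins that contain at least one hairpin appear in the returned dictionary.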

    NOTE:



  3. Hyperlink to view prediction file
    1. Prediction results can be viewed by clicking on the button available on the results page ('Click here to view prediction file')
    2. A new page appears with the prediction results data in a tabular format
    3. The columns of the prediction table are summarized below in the form of Column number – Content type
      • 1 – Strand on which the hairpin is predicted (forward, reverse)
      • 2 – Operon start position in the genome
      • 3 – Operon end position in the genome
      • 4 – Hairpin start position in the genome
      • 5 – Hairpin end position in the genome
      • 6 – Hairpin associated energy
      • 7 – Type of hairpin (Cluster, Single)
      • 8 – Number of hairpins (1 for single; otherwise the number is specified)

    4. Users can also download the prediction file by clicking the button available at the top of the page. The displayed prediction table will be downloaded in CSV format.
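Once downloaded, the prediction CSV can be read positionally using the eight columns described above. This is a hedged sketch: the field names are hypothetical, whether the file begins with a header row is an assumption (controlled by `has_header`), column 8 is interpreted as a hairpin count, and the sample rows are invented rather than taken from a real INTERPIN file.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Prediction(NamedTuple):
    """One row of the prediction table; field names are illustrative only."""
    strand: str          # column 1: forward / reverse
    operon_start: int    # column 2
    operon_end: int      # column 3
    hairpin_start: int   # column 4
    hairpin_end: int     # column 5
    energy: float        # column 6
    hairpin_type: str    # column 7: Cluster / Single
    hairpin_count: int   # column 8: 1 for single (interpretation assumed)


def read_predictions(handle: TextIO, has_header: bool = True) -> List[Prediction]:
    """Parse prediction rows by column position, skipping blank lines."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: the first row holds column titles
    rows = []
    for row in reader:
        if not row:
            continue
        rows.append(Prediction(
            row[0].strip(), int(row[1]), int(row[2]), int(row[3]),
            int(row[4]), float(row[5]), row[6].strip(), int(row[7]),
        ))
    return rows


# Invented sample data standing in for a downloaded file.
sample = io.StringIO(
    "strand,operon_start,operon_end,hairpin_start,hairpin_end,energy,type,n\n"
    "forward,1000,2500,2504,2520,-12.5,Single,1\n"
    "reverse,4000,5200,3960,3995,-9.8,Cluster,2\n"
)
sample_predictions = read_predictions(sample)
```

For a real download, `open(path, newline="")` can be passed in place of the `io.StringIO` object.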

  4. Hyperlink to view alternate prediction file
    1. Alternative termination site prediction results can be viewed by clicking on the button available on the results page ('Click here to view alternative termination sites prediction file')
    2. A new page appears with the prediction results data in a tabular format
    3. The columns of the prediction table are summarized below in the form of Column number – Content type
      • 1 – Operon boundary (operon start, operon non-coding region end, operon end); for operons with more than one alternate termination site, separate rows show the predicted sites individually
      • 2 – Alternate termination site start
      • 3 – Alternate termination site end
      • 4 – Hairpin length: stem length (one side) followed by loop length, e.g. 5s4l, so the total hairpin length is 5*2+4=14. For a cluster hairpin, the lengths of all constituent hairpins are given and the reported energy is the average energy of the constituent hairpins.
      • 5 – Hairpin energy (in kcal/mol)
      • 6 – Hairpin type (cluster, single)
      • 7 – Strand on which the hairpin is predicted (forward, reverse)

    4. Users can also download the prediction file by clicking the button available at the top of the page. The displayed prediction table will be downloaded in CSV format.
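The hairpin length notation in column 4 (stem length, `s`, loop length, `l`) can be decoded as in the sketch below, which reproduces the 5*2+4=14 arithmetic. The function name is hypothetical. How the constituent hairpins of a cluster are separated in the file is not stated here, so the comma in the second example is an assumption; the regular expression simply picks up every `<stem>s<loop>l` token it finds. The energies are invented.

```python
import re
from statistics import mean
from typing import List

# Matches one "<stem>s<loop>l" token, e.g. "5s4l".
_TOKEN = re.compile(r"(\d+)s(\d+)l")


def hairpin_lengths(code: str) -> List[int]:
    """Total length (2 * stem + loop) of every hairpin encoded in `code`."""
    tokens = _TOKEN.findall(code)
    if not tokens:
        raise ValueError(f"no hairpin length token found in {code!r}")
    return [2 * int(stem) + int(loop) for stem, loop in tokens]


single = hairpin_lengths("5s4l")        # one hairpin: 5*2 + 4
cluster = hairpin_lengths("5s4l,7s3l")  # separator is an assumption

# For a cluster, the reported energy is the average over its hairpins.
cluster_energy = mean([-10.0, -14.0])   # invented energies in kcal/mol
```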

  5. Link to genome Browser (IGV)

    IGV, the Integrative Genomics Viewer, is a widely used interactive tool for the visualization of genomic data in the context of a reference genome. One can view sequences, alignments and annotations inside the IGV tool. More information can be found on its webpage (https://software.broadinstitute.org/software/igv/).

    We have integrated the tool into the INTERPIN webserver to enable users to view the hairpin predictions, operons and gene annotations of all analyzed bacterial genomes with respect to their location on the genome. Below is an example of the features available in the viewer:

  6. Integrative Genomics Viewer (IGV) loaded with 3 tracks displayed - predicted hairpin track, operon track and genome annotation track


    IGV searched by gene name (metW); fc_15 and fc_16 are clicked in the track for further analysis and appear under Selected Hairpins


    The Integrative Genomics Viewer (IGV) is displayed with three loaded tracks. The track names can be toggled on/off with the "Track Label" button at the top right of the window.
    1. Track information
      1. Hairpin track (all hp) - This track contains the predicted terminator hairpins at their predicted locations in the genome. It is shown in green and maroon.
      2. Operon track (operon) - This track contains the operons at their locations in the genome. It helps in correlating the position of a hairpin with the operon end. It is shown in yellow and red.
      3. Annotation track (Bordetella_pertussis_I669NZ_CP010265) - This track displays gene annotations and other genomic features available in the GFF files of the bacterium. It helps in understanding the terminator hairpin locations with respect to the genes. It is shown in blue and red.
    2. The color coding used in the tracks is described in the legend below.


    3. Tracks can be zoomed using the + / - symbols or by clicking and dragging across the reference line that shows the size.
    4. Clicking any element in any track shows its name, location and gene information (for the annotation track only).
    5. There are two ways to search for a location of choice using the search box below.
      1. Search using location - Enter location coordinates in the box to reload the tracks at the desired location within the genome. The location should be entered in this format: Id (already displayed in the search box): location range, e.g. CP010265: 1-20000; the viewer zooms into the selected range.
      2. Search using gene of interest - Enter the gene of interest (e.g. metW) and click the search icon. The genome browser reloads and zooms into the track at the gene location.
    6. To get more detailed information, such as the 2D and 3D structure of a hairpin, click the hairpin in the ‘allhp’ track. All selected hairpins appear with their ID (e.g. fc_12) in the right pane under ‘Selected Hairpins’.
    7. Click on the hyperlink to see more details pertaining to the 2D and 3D structures of the hairpin.
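When preparing many location searches for the viewer, the `Id: start-end` format described for the search box can be built and checked with a small helper. This sketch only mirrors the format as written in this help page (including the space after the colon); the function names are hypothetical and nothing here calls IGV itself.

```python
import re
from typing import Tuple

# "Id: start-end", tolerant of optional whitespace around ":" and "-".
_LOCUS = re.compile(r"^\s*([^:\s]+)\s*:\s*(\d+)\s*-\s*(\d+)\s*$")


def format_locus(seq_id: str, start: int, end: int) -> str:
    """Build a search string such as 'CP010265: 1-20000'."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{seq_id}: {start}-{end}"


def parse_locus(text: str) -> Tuple[str, int, int]:
    """Split a search string back into (id, start, end)."""
    match = _LOCUS.match(text)
    if match is None:
        raise ValueError(f"not in 'Id: start-end' format: {text!r}")
    seq_id, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return seq_id, start, end


locus = format_locus("CP010265", 1, 20000)
parsed = parse_locus(locus)
```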

HAIRPIN information obtained upon clicking hairpin link on IGV page