RNA Probe Design

Overview of RNA probe design

The RNA Probe Design tab allows you to enter RefSeq annotations in order to retrieve the probes for your target or list of targets. You can either enter the annotations manually into the text box provided, or upload a file with one annotation per line. If entering annotations manually, please separate each entry with a comma and a space, as shown in the example entry. Once you have entered your annotations, select which input option you used, and press the Submit button to retrieve your probes.
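The two input styles described above can be sketched in a few lines of Python. This is only an illustration of the expected input shape, not PaintSHOP's own parsing code; the function names are hypothetical, and the second ID in the usage example is a made-up placeholder.

```python
def parse_manual_entry(text):
    """Split a manually typed string into a list of annotations.

    Entries are expected to be separated by a comma and a space;
    stray whitespace and empty entries are tolerated here.
    """
    return [entry.strip() for entry in text.split(",") if entry.strip()]


def parse_file_lines(lines):
    """Read annotations from a file-like iterable, one annotation per line."""
    return [line.strip() for line in lines if line.strip()]


# NM_001180043 appears in this guide; NM_000000 is a hypothetical placeholder.
manual = parse_manual_entry("NM_001180043, NM_000000")
from_file = parse_file_lines(["NM_001180043\n", "NM_000000\n", "\n"])
print(manual == from_file)
```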

Screenshot of the PaintSHOP target input interface.

PaintSHOP will return a density plot of probe coverage and a dynamic table with your probes. The density plot is intended to provide a visual estimate of how well covered your annotations are. RNA probe tables have the following columns:

Isoform-resolved RNA probe sets:

  • refseq: The transcript ID of the transcript that the probe targets, stripped of any version suffixes e.g. NM_001180043
  • chrom: the chromosome of the probe sequence
  • start: the start coordinate of the probe sequence
  • stop: the stop coordinate of the probe sequence
  • sequence: The DNA sequence of the oligo probe
  • Tm: The melting temperature of the probe sequence
  • on_target: The on-target score generated by the Homology Optimization Pipeline
  • off_target: The off-target score generated by the Homology Optimization Pipeline
  • repeat_seq: Whether or not the probe sequence contains bases flagged as repetitive by RepeatMasker. 0 = False, 1 = True
  • max_kmer: The maximum number of occurrences of any 18-mer in the probe sequence in the genome it targets
  • probe_strand: The strand orientation of the probe sequence. Plus (+) or minus (-)
  • transcript_id: The unmodified transcript ID of the transcript that the probe targets e.g. NM_001180043.1
  • gene_id: The gene ID of the gene whose transcript the probe targets e.g. PAU8

Isoform-flattened RNA probe sets:

  • refseq: The gene ID of the gene whose transcript the probe targets e.g. PAU8
  • chrom: the chromosome of the probe sequence
  • start: the start coordinate of the probe sequence
  • stop: the stop coordinate of the probe sequence
  • sequence: The DNA sequence of the oligo probe
  • Tm: The melting temperature of the probe sequence
  • on_target: The on-target score generated by the Homology Optimization Pipeline
  • off_target: The off-target score generated by the Homology Optimization Pipeline
  • repeat_seq: Whether or not the probe sequence contains bases flagged as repetitive by RepeatMasker. 0 = False, 1 = True
  • max_kmer: The maximum number of occurrences of any 18-mer in the probe sequence in the genome it targets
  • probe_strand: The strand orientation of the probe sequence. Plus (+) or minus (-)
  • transcripts: The number of isoforms that this probe targets

Note: If your target is on the + strand, PaintSHOP will automatically return the - strand probe sequence. This ensures that the probe is complementary to the transcript and can hybridize to it in your FISH experiment.
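The strand relationship in the note above can be illustrated with a reverse complement. This is a minimal sketch, assuming the - strand probe is simply the reverse complement of the + strand sequence; the function name is hypothetical and this is not PaintSHOP's own code.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# A + strand target sequence and the - strand sequence that pairs with it.
plus_strand = "ATGC"
minus_strand = reverse_complement(plus_strand)
print(minus_strand)
```

Applying the function twice returns the original sequence, which is a quick sanity check when converting between strands.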

Also note that the table can be searched, resized, and paged through. For more information about the advanced settings and set balancing features, please see the Advanced Probe Settings and Optimizing a Probe Set sections.

PaintSHOP RNA probe sets

newBalance (RNA)

probe sets for the hg38, hg19, mm10, mm9, dm6, ce11, danRer11, TAIR10, and sacCer3 (new: rn6, galGal5, galGal6) genomes with a length window of 30-37 nucleotides and a Tm window of 42-47 degrees. These parameters were selected to optimize probe coverage and hybridization. For more information on these new probe sets, please refer to the PaintSHOP manuscript.

OligoMiner (RNA)

the ‘balance’ probe sets generated by OligoMiner for the hg38 and hg19 reference genomes. These probes have a length window of 35-41 nucleotides, and a Tm window of 42-47 degrees. For more information on these probes, please refer to the OligoMiner manuscript.

2012 Oligopaints (RNA)

the original Oligopaints genome-scale probe set from the Beliveau et al. 2012 PNAS publication. The probes have a length of 32 bases, and have an approximate Tm window of 34-42 degrees. For more information on this probe set, please refer to the 2012 Oligopaints publication.

iFISH4U (RNA)

the full 40-mer probe set from iFISH4U.

All probe sets in the RNA Probe Design tab have been intersected with the RefSeq annotations for their respective reference genomes in order to provide quick retrieval of probes for annotations. To retrieve probes for target regions outside of these annotations, please use the DNA Probe Design tab.

We have now added a new set of ‘isoform flattened’ probe sets for RNA FISH probe design. These ‘isoform flattened’ annotation sets prioritize shared exonic sequence between isoforms (Methods) in order to maximize the chance of detection and only modestly reduce the coverage of the transcriptome when used for probe intersects. These sets exist with the newBalance probes for the hg38, hg19, mm10, dm6, ce11, danRer11, TAIR10, and sacCer3 (new: rn6, galGal5, galGal6) reference genomes.

DNA Probe Design

Overview of DNA probe design

The DNA Probe Design tab allows you to enter genomic coordinates in order to retrieve the probes for your target or list of targets. You can either enter the coordinates manually into the text box provided, or upload a file with the format shown in the example. If entering coordinates manually, please separate each entry with a comma and a space, as shown in the example entry. Once you have entered your coordinates, select which input option you used, and press the Submit button to retrieve your probes.

PaintSHOP will return a density plot of probe coverage and a dynamic table with your probes. The density plot is intended to provide a visual estimate of how well covered your targets are. The table has the following columns:

  • chrom: the chromosome of the probe sequence
  • start: the start coordinate of the probe sequence
  • stop: the stop coordinate of the probe sequence
  • sequence: The DNA sequence of the oligo probe
  • Tm: The melting temperature of the probe sequence
  • on_target: The on-target score generated by the Homology Optimization Pipeline
  • off_target: The off-target score generated by the Homology Optimization Pipeline
  • repeat_seq: Whether or not the probe sequence contains bases flagged as repetitive by RepeatMasker. 0 = False, 1 = True
  • max_kmer: The maximum number of occurrences of any 18-mer in the probe sequence in the genome it targets
  • probe_strand: The strand orientation of the probe sequence. Plus (+) or minus (-)

Note: If you specify a probe strand (+ or -) in either the manual or file entry, PaintSHOP will return the probe in the orientation you entered. For example, specifying + will return + strand probes. You are specifying the probe strand, not the target strand.

Also note that the table can be searched, resized, and paged through. For more information about the advanced settings and set balancing features, please see the Advanced Probe Settings and Optimizing a Probe Set sections.

PaintSHOP DNA probe sets

newBalance (DNA)

probe sets for the hg38, hg19, mm10, mm9, dm6, ce11, danRer11, TAIR10, and sacCer3 (new: rn6, galGal5, galGal6) genomes with a length window of 30-37 nucleotides and a Tm window of 42-47 degrees. These parameters were selected to optimize probe coverage and hybridization. For more information on these new probe sets, please refer to the PaintSHOP manuscript.

OligoMiner (DNA)

the ‘balance’ probe sets generated by OligoMiner for the hg38 and hg19 reference genomes. These probes have a length window of 35-41 nucleotides, and a Tm window of 42-47 degrees. For more information on these probes, please refer to the OligoMiner manuscript.

2012 Oligopaints (DNA)

the original Oligopaints genome-scale probe set from the Beliveau et al. 2012 PNAS publication. The probes have a length of 32 bases, and have an approximate Tm window of 34-42 degrees. For more information on this probe set, please refer to the 2012 Oligopaints publication.

iFISH4U (DNA)

the full 40-mer probe set from iFISH4U.

All probe sets in the DNA Probe Design tab include every probe in the set across the entire reference genome. It takes longer to load and search these sets, but probes can be designed against any coordinates in a given genome. You can use the RNA Probe Design tab for faster retrieval of probes targeting specific RefSeq annotations.

Advanced Probe Settings

In some cases, you may find that you want to increase the number of probes returned for your targets. PaintSHOP provides a set of advanced features to enable this flexibility. In computational probe design there is an inherent trade-off between probe coverage and specificity. Probe specificity is the likelihood that a probe hybridizes to its intended target rather than to another site in the genome. A greater emphasis on probe specificity inevitably filters out probes, reducing coverage. In order to give you more control over this design choice, PaintSHOP provides three parameters: 1) repeat inclusion, 2) off-target score, and 3) the maximum k-mer count. These three parameters are described below.

Repeat: RepeatMasker is a program that identifies the presence of repetitive elements in a given DNA sequence. The human genome has been annotated by RepeatMasker. Previous probe design tools have excluded repetitive sequences. PaintSHOP provides the option to allow for probes which contain bases that have been flagged as repetitive, if it is necessary to have enough probes for a target that is challenging to cover. By default, repeat mode is set to off.

Off-Target Score: One important component of PaintSHOP is the Homology Optimization Pipeline (HOP). The pipeline is used when creating new probe sets to generate an on-target and an off-target score for every probe identified. We have developed a machine learning model to approximate nucleic acid thermodynamics. For the on-target score, the model scores the likelihood (0-100) that a probe hybridizes at its intended target. For the off-target score, we start by searching for up to 100 possible alignments for each candidate probe. Any candidate with more than 100 possible alignments is discarded. Next, we use our model to generate a score for the likelihood of hybridization at each possible site. We sum these scores, generating an off-target score between 0 and 10,000. By default, PaintSHOP sets the maximum off-target score to 200. The off-target score slider can be used to make this value more or less stringent, depending on the experiment. The probe table and density plot will dynamically update, showing how the chosen parameters affect the probe set.
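The arithmetic behind the off-target score can be sketched as follows. This is a simplified illustration, not the HOP implementation: the per-site scores are assumed to be already computed by the model, the function name is hypothetical, and whether the intended target site is excluded from the list before summing is an assumption made here.

```python
MAX_ALIGNMENTS = 100  # candidates with more alignments than this are discarded


def off_target_score(site_scores):
    """Sum per-site hybridization scores (each 0-100) into an off-target score.

    site_scores: model scores for each possible off-target alignment
    (assumed here to exclude the intended target site).
    Returns None when the candidate would be discarded.
    """
    if len(site_scores) > MAX_ALIGNMENTS:
        return None
    return sum(site_scores)


# Three possible off-target sites with hypothetical model scores.
score = off_target_score([50.0, 25.0, 10.0])
print(score, score <= 200)
```

With at most 100 sites and at most 100 points per site, the sum cannot exceed 10,000, which matches the range given above.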

Max K-mer count: A k-mer is a substring of a DNA sequence of length k. We use JELLYFISH to count how many times each 18-mer substring in a given probe occurs in the genome it targets. The maximum value of all 18-mer counts is another way to control probe specificity, and can identify problematic substrings that other alignment approaches may miss. By default, PaintSHOP uses a maximum k-mer count of 5. This can be changed using the slider, and the probe table and density plot will dynamically update.
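To make the max k-mer value concrete, here is a toy sketch in plain Python. It does not use JELLYFISH, counts only the forward strand of a made-up genome string, and uses k = 4 instead of 18 so the example stays readable; the function names are hypothetical.

```python
from collections import Counter


def count_kmers(genome, k):
    """Count every k-mer in a genome string (forward strand only)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def max_kmer(probe, genome_counts, k):
    """Return the highest genome count among all k-mers in the probe."""
    counts = (genome_counts[probe[i:i + k]] for i in range(len(probe) - k + 1))
    return max(counts, default=0)  # 0 if the probe is shorter than k


toy_genome = "ACGTACGTAC"  # made-up sequence for illustration
genome_counts = count_kmers(toy_genome, k=4)
print(max_kmer("TACGT", genome_counts, k=4))
```

A probe whose value exceeds the chosen maximum (5 by default) would be filtered out, even if only one of its k-mers is highly repeated.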

Note: At any time, the Restore Default Parameters button can be used to reset the default values if you want to remove changes you have made.
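Taken together, the three advanced parameters act as a filter on the probe table. The sketch below shows one way to express that filter using the column names from the probe tables and the default values stated above. The function name is hypothetical, and treating the thresholds as inclusive is an assumption.

```python
def passes_filters(probe, allow_repeats=False, max_off_target=200, max_kmer=5):
    """Return True if a probe row satisfies the three advanced settings.

    probe: a dict with the repeat_seq, off_target, and max_kmer columns.
    Thresholds are treated as inclusive (an assumption).
    """
    if probe["repeat_seq"] == 1 and not allow_repeats:
        return False
    return probe["off_target"] <= max_off_target and probe["max_kmer"] <= max_kmer


# Hypothetical probe rows for illustration.
probes = [
    {"repeat_seq": 0, "off_target": 12.5, "max_kmer": 2},
    {"repeat_seq": 1, "off_target": 0.0, "max_kmer": 1},
    {"repeat_seq": 0, "off_target": 350.0, "max_kmer": 3},
]
default_set = [p for p in probes if passes_filters(p)]
relaxed_set = [p for p in probes if passes_filters(p, allow_repeats=True, max_off_target=500)]
print(len(default_set), len(relaxed_set))
```

Relaxing any one parameter can only keep the same probes or add more, which is why loosening the settings increases coverage at the cost of specificity.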

Optimizing a Probe Set

In some instances, you may want to even out your probe set by selecting a certain number of probes per target. PaintSHOP offers two features for optimizing a probe set: 1) trim, and 2) unify number. For either option, use the slider to choose how many probes you want per target. The trim option simply ranks the probes for each target by off-target score and keeps however many you need, removing the rest. The unify number option behaves the same way for targets that have enough probes. For targets that do not have enough probes under the currently selected parameters, it automatically relaxes the stringency parameters until enough probes are returned. Importantly, this means that changes from the Advanced Probe Settings section will be overridden when using the unify number feature.
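The trim option can be sketched as a group-and-rank step. This is an illustration rather than PaintSHOP's code: it assumes that probes are grouped by the refseq column, that a lower off-target score ranks better, and that ties keep their original order. The function name is hypothetical.

```python
from collections import defaultdict


def trim(probes, n):
    """Keep at most n probes per target, preferring low off-target scores."""
    by_target = defaultdict(list)
    for probe in probes:
        by_target[probe["refseq"]].append(probe)

    kept = []
    for group in by_target.values():
        # sorted() is stable, so ties keep their original relative order.
        ranked = sorted(group, key=lambda p: p["off_target"])
        kept.extend(ranked[:n])
    return kept


# Hypothetical probe rows; TARGET_A and TARGET_B are placeholder IDs.
probes = [
    {"refseq": "TARGET_A", "off_target": 30.0},
    {"refseq": "TARGET_A", "off_target": 5.0},
    {"refseq": "TARGET_A", "off_target": 12.0},
    {"refseq": "TARGET_B", "off_target": 80.0},
]
trimmed = trim(probes, n=2)
print([(p["refseq"], p["off_target"]) for p in trimmed])
```

Note that a target with fewer than n probes, such as TARGET_B here, simply keeps everything it has; filling that gap is what the unify number option is for.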

Appending Sequences

Overview of appending feature

PaintSHOP provides a suite of functionality for appending the necessary sequences to your probes to carry out your experiment. The following diagram shows where you can append sequences to your probes:

Schematic of a PaintSHOP probe showing the required homology domain (H) as well as the optional inner (I), bridge (B), and outer (O) domains where sequences can be appended to facilitate design of complex targeting and readout schemes.

The probe sequence itself is the Homology Region (H) in the center. If you are going to amplify your probes from an oligo pool using PCR, you can add forward primers to the 5’ I location, and reverse primers to the 3’ I location (I stands for Inner Primer). The next regions where sequences can be added are the bridge regions (B) on both the 5’ and 3’ sides of the probe. Other names for these regions include readout, ear, and barcode. A bridge sequence can serve as a binding site for secondary oligos. Sequences can also be appended to the outer region (O) on the 5’ and 3’ sides. One useful application of this is to include region- or target-specific primer pairs so that you can amplify only specific portions of your oligo pool at certain times in your experimental workflow. You can also choose to add SABER concatemer sequences to your probes.

Every appending location is optional.
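The appending scheme can be pictured as string concatenation around the homology region. The sketch below assumes a 5’ to 3’ layout of O, B, I, H, I, B, O, inferred from the domain names; the schematic is the authoritative reference for the actual order. It also assumes sequences are appended exactly as given, and the function name and example sequences are hypothetical.

```python
def assemble_probe(homology, five_outer="", five_bridge="", five_inner="",
                   three_inner="", three_bridge="", three_outer=""):
    """Concatenate optional appended sequences around the homology region.

    Assumed 5' to 3' order: O, B, I, H, I, B, O. Any location left as an
    empty string is skipped, since every appending location is optional.
    """
    return (five_outer + five_bridge + five_inner
            + homology
            + three_inner + three_bridge + three_outer)


# Made-up short sequences; lowercase marks the appended parts for readability.
full = assemble_probe("ACGTACGT", five_inner="ggg", three_inner="ccc",
                      three_bridge="tttt")
print(full)
```

Calling the function with only the homology region returns the probe unchanged.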

To append sequences, first select whether you used the RNA Probe Design or DNA Probe Design tab. If you want to append sequences to a specific region, switch the radio button for the region from None to Append. Once you choose to append a sequence, a set of options will appear. For example, once the 5’ Outer Primer Sequence is selected, the appending menu will look like this: