
1. What CLASH data is currently available in CLASHub?

CLASHub hosts data from four species: Human, Mouse, Drosophila melanogaster, and Caenorhabditis elegans. Below is the summary of available datasets:

Human

Sample Name wild type (#) Non-targeting sgRNA Control (#) ZSWIM8 Knockout (#) BioProject Number SRR Number
A549 6 6 PRJNA1166120 SRR34738798, SRR34738799, SRR34738800, SRR34738801, SRR34738802, SRR34738803, SRR34738804, SRR34738805, SRR34738790, SRR34738791, SRR34738792, SRR34738793
Colorectal tissue 2 PRJNA1166120 SRR37216684, SRR37216685
D425 3 PRJNA1166120 SRR34757946, SRR34757949, SRR34757950
ES2 3 3 PRJNA1166120 SRR34757940, SRR34757941, SRR34757942, SRR34757943, SRR34757944, SRR34757945
HCT116 5 3 GSE164634, PRJNA1166120 SRR13415087, SRR13415088, SRR13415089, SRR13415090, SRR13415091, SRR34757939, SRR34757947, SRR34757948
HEK293T 8 GSE198250, PRJNA1166120 SRR18281055, SRR18281057, SRR18281067, SRR18281068, SRR34761041, SRR34761042, SRR34761043, SRR34761044
HepG2 3 PRJNA1166120 SRR34783077, SRR34783079, SRR34783080
H1299 3 3 PRJNA1166120 SRR34768260, SRR34768261, SRR34768262, SRR34768263, SRR34768274, SRR34768275
MB002 4 4 PRJNA1166120 SRR34783070, SRR34783071, SRR34783072, SRR34783073, SRR34783074, SRR34783075, SRR34783076, SRR34783078
MDA-MB-231 6 6 PRJNA1166120 SRR30817646, SRR30817647, SRR30817648, SRR30817649, SRR30817650, SRR30817651, SRR34738794, SRR34738795, SRR34738796, SRR34738797, SRR34738806, SRR34738807
OVCAR8 3 3 PRJNA1166120 SRR34768264, SRR34768265, SRR34768266, SRR34768267, SRR34768276, SRR34768277
TIVE-EX-LTC 3 GSE101978 SRR5876947, SRR5876948, SRR5876949
T98G 3 3 PRJNA1166120 SRR34743309, SRR34743310, SRR34743311, SRR34743312, SRR34743317, SRR34743318
U87MG 3 3 PRJNA1166120 SRR34743313, SRR34743314, SRR34743315, SRR34743316, SRR34743319, SRR34743320
501Mel 3 3 PRJNA1166120 SRR34768268, SRR34768269, SRR34768270, SRR34768271, SRR34768272, SRR34768273

Mouse

Sample Name wild type (#) Non-targeting sgRNA Control (#) Zswim8 Knockout (#) BioProject Number SRR Number
HE2.1B 6 GSE124687 SRR8395242, SRR8395243, SRR8395244, SRR8395245, SRR8395246, SRR8395247
MEF 2 2 PRJNA1166120 SRR34793109, SRR34793110, SRR34793111, SRR34793112
Striatal cell 4 4 PRJNA1093144 SRR28497185, SRR28497186, SRR28497189, SRR28497190, SRR28497197, SRR28497198, SRR28497199, SRR28497200
3T12 3 GSE124687 SRR8395248, SRR8395249, SRR8395250
Cortex tissue 8 GSE73058 SRR2413277, SRR2413278, SRR2413282, SRR2413289, SRR2413290, SRR2413300, SRR2413301, SRR2413302
Heart tissue 2 PRJNA1166120 SRR34793107, SRR34793108
Kidney tissue 2 PRJNA1166120 SRR34793105, SRR34793106

Drosophila melanogaster

Sample Name wild type (#) Non-targeting sgRNA Control (#) Dora Knockout (#) BioProject Number SRR Number
S2 cells 3 3 PRJNA896239 SRR22129325, SRR22129327, SRR22129328, SRR22129284, SRR22129287, SRR22129298

Caenorhabditis elegans

Sample Name wild type (#) Non-targeting sgRNA Control (#) Ebax Knockout (#) BioProject Number SRR Number
Embryo 4 4 GSE303817
mid-L4 stage 4 PRJNA328816 SRR3882724, SRR3882949, SRR3882950, SRR3882951

2. What Gene Expression Profile data is available in CLASHub?

CLASHub hosts gene expression profiles from four species: Human, Mouse, Drosophila melanogaster, and Caenorhabditis elegans. Below is the summary of available datasets:

Human

Sample Name wild type (#) Non-targeting sgRNA Control (#) ZSWIM8 Knockout (#) BioProject Number SRR Number
A549 7 GSE263036, GSE212057, GSE199309 SRR28535493, SRR28535494, SRR28535495, SRR21237863, SRR21237869, SRR21237879, SRR18462418
D425 5 GSE151810, GSE185024, GSE123760 SRR11924485, SRR11924486, SRR16119415, SRR16119416, SRR8315029
ES2 6 GSE218794, GSE245778 SRR22410790, SRR22410791, SRR22410792, SRR26439462, SRR26439463, SRR26439464
HEK293T 7 GSE231583, GSE196043 SRR24421974, SRR24421975, SRR24421976, SRR18074813, SRR18074814, SRR18074815, SRR18074816
HeLa 7 GSE273634, GSE218727, GSE199309 SRR30058518, SRR30058519, SRR30058520, SRR22407570, SRR22407571, SRR22407572, SRR18462415
HepG2 5 GSE224980, GSE264010 SRR28685775, SRR28685776, SRR28685777, SRR23387178, SRR23387179
H1299 4 GSE212057, GSE199309 SRR21237865, SRR21237873, SRR21237881, SRR18462412
K562 6 GSE199309, GSE167869 SRR18462409, SRR13800753, SRR13800754, SRR13800737, SRR13800738, SRR13800739
MB002 5 GSE229150, GSE261568 SRR28341540, SRR28341541, SRR28341542, SRR28341543
MCF7 7 GSE195761, GSE178905, GSE163791 SRR17944548, SRR17944549, SRR14915857, SRR14915858, SRR13296901, SRR13296902, SRR13296903
MDA-MB-231 6 GSE178532 SRR11544576, SRR11544577, SRR11544578, SRR14870088, SRR14870089, SRR14870090
OVCAR8 4 GSE246325 SRR26536798, SRR26536799, SRR26536802, SRR26536803
T98G 5 GSE112241, PRJNA580150 SRR10358029, SRR10358030, SRR10358031, SRR6881782, SRR6881783
U87MG 6 GSE147626, GSE235568 SRR11433766, SRR11433767, SRR11433768, SRR24991947, SRR24991948, SRR24991949
501Mel 7 PRJNA515302, GSE104869 SRR8473015, SRR8473019, SRR8473020, SRR6163777, SRR6163778, SRR6163779, SRR6163780

Mouse

Sample Name wild type (#) Non-targeting sgRNA Control (#) Zswim8 Knockout (#) BioProject Number SRR Number
Eye 3 3 GSE231447 SRR24391488, SRR24391489, SRR24391526, SRR24391480, SRR24391481, SRR24391536
Forebrain 3 3 GSE231447 SRR24391522, SRR24391523, SRR24391534, SRR24391514, SRR24391515, SRR24391547
Heart 3 3 GSE231447 SRR24391502, SRR24391503, SRR24391533, SRR24391510, SRR24391511, SRR24391543
Hindbrain 3 3 GSE231447 SRR24391520, SRR24391521, SRR24391538, SRR24391512, SRR24391513, SRR24391546
Intestine 3 3 GSE231447 SRR24391494, SRR24391495, SRR24391530, SRR24391486, SRR24391487, SRR24391545
Kidney 3 3 GSE231447 SRR24391490, SRR24391491, SRR24391531, SRR24391482, SRR24391483, SRR24391539
Liver 3 3 GSE231447 SRR24391492, SRR24391493, SRR24391527, SRR24391484, SRR24391485, SRR24391540
Lung 3 3 GSE231447 SRR24391500, SRR24391501, SRR24391532, SRR24391508, SRR24391509, SRR24391542
Muscle 3 3 GSE231447 SRR24391518, SRR24391519, SRR24391525, SRR24391478, SRR24391479, SRR24391535
Placenta 3 3 GSE231447 SRR24391516, SRR24391517, SRR24391524, SRR24391476, SRR24391477, SRR24391537
Skin 3 3 GSE231447 SRR24391496, SRR24391497, SRR24391528, SRR24391504, SRR24391505, SRR24391541
Stomach 3 3 GSE231447 SRR24391498, SRR24391499, SRR24391529, SRR24391506, SRR24391507, SRR24391544
Embryonic Stem Cell 2 PRJEB27315 ERR2640636, ERR2640637
iNeuron 3 PRJEB27315 ERR2640652, ERR2640653, ERR2640654
MEF 3 GSE239373 SRR25443485, SRR25443484, SRR25443483
Neural Precursor 2 PRJEB27315 ERR2640640, ERR2640641
Striatal cell 4 4 PRJNA1093144 SRR34804890, SRR34804891, SRR34804892, SRR34804893, SRR34804894, SRR34804895, SRR34804896, SRR34804897

Drosophila melanogaster

Sample Name wild type (#) Non-targeting sgRNA Control (#) Dora Knockout (#) BioProject Number SRR Number
S2 cells 5 3 GSE196837 SRR18048483, SRR18048484, SRR18048425, SRR18048423, SRR18048424, SRR18048427, SRR18048468, SRR18048426
0–4 h Embryos 4 GSE196837 SRR18048437, SRR18048436, SRR18048435, SRR18048446
8–12 h Embryos 6 4 GSE196837 SRR18048461, SRR18048433, SRR18048512, SRR18048481, SRR18048482, SRR18048434, SRR18048499, SRR18048531, SRR18048442, SRR18048532
12–16 h Embryos 6 4 GSE196837 SRR18048539, SRR18048525, SRR18048508, SRR18048459, SRR18048432, SRR18048465, SRR18048448, SRR18048497, SRR18048529, SRR18048516
16–20 h Embryos wild type 5 4 GSE196837 SRR18048421, SRR18048538, SRR18048479, SRR18048463, SRR18048527, SRR18048542, SRR18048443, SRR18048495, SRR18048501
Fly Non-targeting Control 3 PRJNA896239 SRR22129292, SRR22129294, SRR22129296

Caenorhabditis elegans

Sample Name wild type (#) Non-targeting sgRNA Control (#) Ebax Knockout (#) BioProject Number SRR Number
Embryos 4 PRJNA922944 SRR23049957, SRR23049959, SRR23049928, SRR23049954
L1 5 2 GSE68588, GSE262626, GSE267368 SRR2010468, SRR2010469, SRR28479534, SRR29013568, SRR29013569, SRR29013570, SRR29013571
L2 3 GSE266398 SRR28868053, SRR28868054, SRR28868055
L3 3 PRJNA684142 SRR13238604, SRR13238605, SRR13238606
L4 3 PRJNA922944 SRR23049963, SRR23049955, SRR23049961
Adult 4 PRJNA922944, GSE267368 SRR23049965, SRR23049966, SRR23049906, SRR23049937

3. What miRNA Expression Profile data is available in CLASHub?

CLASHub hosts miRNA expression profiles from four species: Human, Mouse, Drosophila melanogaster, and Caenorhabditis elegans. Below is the summary of available datasets:

Human

Sample Name Wild Type (#) Non-targeting sgRNA Control (#) ZSWIM8 Knockout (#) BioProject Number SRR Number
A549 3 3 GSE163387 SRR13264637, SRR13264638, SRR13264639, SRR13264640, SRR13264641, SRR13264642
HEK293T 3 3 GSE123627, GSE158025 SRR12650650, SRR12650651, SRR12650652, SRR12650653, SRR12650654, SRR12650655
HeLa 3 3 GSE123627, GSE163387 SRR13264643, SRR13264644, SRR13264645, SRR13264646, SRR13264647, SRR13264648
K562 6 6 GSE158025, GSE163388 SRR12650656, SRR12650657, SRR12650658, SRR13264707, SRR13264708, SRR13264709, SRR12650659, SRR12650660, SRR12650661, SRR13264710, SRR13264711, SRR13264712
MCF7 2 3 GSE163388 SRR13264649, SRR13264650, SRR13264651, SRR13264652, SRR13264653

Mouse

Sample Name wild type (#) Non-targeting sgRNA Control (#) Zswim8 Knockout (#) BioProject Number SRR Number
Brain 3 3 GSE235065 SRR24941005, SRR24941026, SRR24940996, SRR24941021, SRR24941036, SRR24941000
Heart 3 3 GSE235065 SRR24941003, SRR24941027, SRR24940995, SRR24941022, SRR24941035, SRR24940999
Kidney 3 3 GSE235065 SRR24941001, SRR24940993, SRR24941029, SRR24941011, SRR24941033, SRR24941017
Liver 3 3 GSE235065 SRR24941004, SRR24940989, SRR24941030, SRR24941010, SRR24941032, SRR24941016
Lung 3 3 GSE235065 SRR24940992, SRR24940998, SRR24941018, SRR24941008, SRR24941031, SRR24941015
Intestine 3 3 GSE235065 SRR24941002, SRR24940994, SRR24941028, SRR24941023, SRR24941012, SRR24941034
Neuron 3 2 GSE163387 SRR13264632, SRR13264633, SRR13264634, SRR13264635, SRR13264636
MEF 6 6 GSE163387, GSE158025 SRR13264626, SRR13264627, SRR13264628, SRR12650662, SRR12650663, SRR12650664, SRR13264629, SRR13264630, SRR13264631, SRR12650665, SRR12650666, SRR12650667
Stomach 3 3 GSE235065 SRR24941020, SRR24941009, SRR24940990, SRR24941006, SRR24941025, SRR24941013
Skin 3 3 GSE235065 SRR24941019, SRR24940991, SRR24940997, SRR24941024, SRR24941007, SRR24941014
Striatal cell 4 4 PRJNA1093144 SRR28497187, SRR28497188, SRR28497191, SRR28497192, SRR28497193, SRR28497194, SRR28497195, SRR28497196

Drosophila melanogaster

Sample Name Wild Type (#) Non-targeting sgRNA Control (#) Dora Knockout (#) BioProject Number SRR Number
S2 cells 3 3 GSE163388 SRR13264713, SRR13264714, SRR13264715, SRR13264716, SRR13264717, SRR13264718

Caenorhabditis elegans

Sample Name Wild Type (#) Non-targeting sgRNA Control (#) Ebax Knockout (#) BioProject Number Data Source
Early Embryo 2 2 GSE267367 SRR29013903, SRR29013904, SRR29013905, SRR29013906
Late Embryo 2 2 GSE267367 SRR29013899, SRR29013900, SRR29013901, SRR29013902
L1 4 4 GSE267367 SRR29013871, SRR29013872, SRR29013873, SRR29013874, SRR29013895, SRR29013896, SRR29013897, SRR29013898
L2 2 2 GSE267367 SRR29013891, SRR29013892, SRR29013893, SRR29013894
L3 2 2 GSE267367 SRR29013887, SRR29013888, SRR29013889, SRR29013890
L4 5 4 GSE267367 SRR29013866, SRR29013867, SRR29013868, SRR29013869, SRR29013870, SRR29013883, SRR29013884, SRR29013885, SRR29013886
Gravid adult 2 2 GSE267367 SRR29013879, SRR29013880, SRR29013881, SRR29013882
Glp-4 2 2 GSE267367 SRR29013875, SRR29013876, SRR29013877, SRR29013878

4. How is CLASH data analyzed in CLASHub?

Step 1: Data Upload and Input
CLASHub accepts paired-end FASTQ files or clean single-end FASTA files. Users provide the following information to start the analysis:
1.1 Paired-end Adapter Sequences:
5′ Adapter Sequence (default): GATCGTCGGACTGTAGAACT
3′ Adapter Sequence (default): TGGAATTCTCGGGTGCCAAG
1.2 UMI Configuration: Users specify 5′ and 3′ Unique Molecular Identifier (UMI) lengths. Setting both to 0 automatically skips deduplication and UMI-trimming.
1.3 Target species: (e.g., Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans)
1.4 Output file names & Email address

Step 2: Data Preprocessing
CLASHub automatically processes the uploaded data. For paired-end FASTQ files, the preprocessing pipeline includes:
2.1 Adapter Trimming: Adapter sequences are removed using cutadapt (v2.10).
2.2 Read Merging: Overlapping paired-end reads are merged using PEAR (v0.9.6).
2.3 Redundancy Collapse & UMI Trimming: If UMIs are present, redundant reads are collapsed using fastx_collapser, and UMIs are trimmed. If UMIs are absent (lengths = 0), this step is bypassed.
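
The collapse-then-trim logic of step 2.3 can be illustrated with a minimal Python sketch. This is not CLASHub's code: the pipeline uses fastx_collapser, and the function name `collapse_and_trim` and its parameters are hypothetical. It only assumes that reads identical over their full length (UMIs included) are treated as copies of one molecule.

```python
from collections import Counter


def collapse_and_trim(reads, umi5=0, umi3=0):
    """Collapse identical reads (UMIs included), then strip the UMIs.

    Returns a list of (trimmed_sequence, copies) tuples, one per unique
    UMI-tagged read. If both UMI lengths are 0, the step is bypassed and
    every read is returned unchanged with a copy count of 1.
    """
    if umi5 == 0 and umi3 == 0:
        return [(seq, 1) for seq in reads]

    # Same insert + same UMIs = one original molecule (PCR duplicates).
    counts = Counter(reads)

    collapsed = []
    for seq, copies in counts.items():
        trimmed = seq[umi5:len(seq) - umi3]
        if trimmed:  # drop reads that contain nothing but UMI bases
            collapsed.append((trimmed, copies))
    return collapsed


# Hypothetical reads with 2-nt UMIs on each side.
example = collapse_and_trim(
    ["AACGTACGTTT", "AACGTACGTTT", "GGCGTACGTCC"], umi5=2, umi3=2
)
```

In this example the first two reads collapse into one entry, while the third keeps its own entry because its UMIs differ, even though the trimmed insert is the same.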

Step 3: Genome Mapping & Peak Calling
Cleaned sequences are aligned to the reference genome.
3.1 Downsampling: To prevent memory overload, files exceeding 20 million reads are downsampled prior to mapping.
3.2 Alignment: Reads are aligned using HISAT2, sorted with SAMtools, and converted to BED format.
3.3 Peak Calling: Piranha assesses target site confidence via peak-calling to identify high-confidence binding sites.
3.4 Visualization: BigWig (bw) files are automatically generated for direct inspection of read coverage in genome browsers like IGV.
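
How the downsampling in step 3.1 is performed is not described here. One common way to cap a read set at a fixed size without loading the whole file into memory is reservoir sampling, sketched below. The function name `downsample`, the fixed seed, and the choice of reservoir sampling itself are assumptions for illustration only.

```python
import random


def downsample(reads, max_reads=20_000_000, seed=0):
    """Return at most max_reads reads, chosen uniformly at random.

    Uses reservoir sampling (Algorithm R), so `reads` can be any iterable
    and only max_reads items are held in memory at once.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    reservoir = []
    for i, read in enumerate(reads):
        if i < max_reads:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < max_reads:
                reservoir[j] = read
    return reservoir


# Inputs below the cap pass through unchanged; larger inputs are capped.
small = downsample(["r1", "r2", "r3"], max_reads=10)
capped = downsample(range(1000), max_reads=100)
```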

Step 4: Hybrid Identification
The cleaned data is processed to identify miRNA-target hybrids using:
4.1 hyb: Aligns reads to the reference transcript database using bowtie2.
4.2 Reference Database: Includes Ensembl genome assemblies and mature miRNAs from miRBase.
4.3 Binding Stability: Free energy (ΔG) and pairing patterns are calculated using UNAFold (v3.8).

Step 5: Conservation Score Calculation
Conservation scores assess evolutionary conservation of miRNA binding sites using phyloP tracks from the UCSC Genome Browser (e.g., hg38.phyloP100way for human, mm39.phyloP35way for mouse).
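
As a rough illustration, a per-site conservation score can be computed by averaging per-base phyloP values over the binding-site interval. Whether CLASHub averages over the whole site, the seed match only, or uses another summary is not stated, so the sketch below is an assumption. The track is represented by a plain dictionary rather than a bigWig file, and `mean_conservation` is a hypothetical name.

```python
def mean_conservation(phylop, chrom, start, end):
    """Average per-base phyloP score over the half-open interval [start, end).

    `phylop` maps (chrom, position) -> score. Positions missing from the
    track are skipped; returns None if no position in the interval is covered.
    """
    scores = [
        phylop[(chrom, pos)]
        for pos in range(start, end)
        if (chrom, pos) in phylop
    ]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Toy track with one uncovered position (chr1:102).
track = {("chr1", 100): 2.0, ("chr1", 101): 4.0, ("chr1", 103): 0.0}
site_score = mean_conservation(track, "chr1", 100, 104)
```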

Step 6: Output Results
The final output includes an HTML summary report and a detailed results table featuring miRNA Name, Pairing Pattern, Gene Info, Conservation Score, Free Energy, Transcript Annotation, Piranha Peak p-values, and Normalized Hybrid Abundance.

5. How is miRNA AQ-seq data analyzed in CLASHub?

Step 1: Data Upload and Input
Users upload miRNA sequencing data in one of three supported formats:
1.1 Paired-End FASTQ (.gz) or Single-End FASTQ (.gz): Requires adapter sequences.
1.2 Cleaned Single-End FASTA (.gz): Does not require adapter sequences.
1.3 UMI Configuration: For libraries with UMIs (e.g., AQ-seq), specify the UMI length. For standard small RNA-seq libraries (e.g., Illumina TruSeq or NEBNext) lacking UMIs, set lengths to 0.

Step 2: Data Preprocessing
CLASHub processes uploaded data to produce clean FASTA files:
2.1 Adapter Trimming: Adapters are removed using cutadapt.
2.2 Read Merging: For paired-end files, reads are merged using PEAR.
2.3 Redundancy Collapse & UMI Trimming: If UMIs are specified (>0), PCR duplicates are collapsed via fastx_collapser and UMIs trimmed. If UMI lengths are 0, these steps are automatically skipped.

Step 3: miRNA Identification and Quantification
The cleaned data is analyzed for miRNA quantification using CLASHub.py.
3.1 miRNA Mapping: The first 18 nucleotides of each trimmed read are perfectly matched to mature miRNA sequences from miRBase (Release 22.1).
3.2 Quantification: Both total miRNA expression levels and isoform-specific abundances (capturing 3′ end variation) are estimated.
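
The prefix-matching idea in step 3 can be sketched as follows. This is a simplified illustration, not the CLASHub.py implementation: the function name and data structures are hypothetical, sequences are written in the DNA alphabet, and miRNAs that share the same first 18 nucleotides are not disambiguated.

```python
from collections import Counter, defaultdict

PREFIX_LEN = 18  # leading nucleotides that must match exactly


def quantify_mirnas(reads, mature_mirnas):
    """Count reads per miRNA and per isoform via an exact 18-nt prefix match.

    `mature_mirnas` maps miRNA name -> mature sequence. Returns
    (totals, isoforms): totals[name] is the read count per miRNA and
    isoforms[name][read_sequence] is the count per distinct read, which
    separates 3' end variants of the same miRNA.
    """
    # Simplification: if two miRNAs share a prefix, the later one wins.
    prefix_index = {seq[:PREFIX_LEN]: name for name, seq in mature_mirnas.items()}

    totals = Counter()
    isoforms = defaultdict(Counter)
    for read in reads:
        name = prefix_index.get(read[:PREFIX_LEN])
        if name is None:
            continue  # read does not start with any known miRNA prefix
        totals[name] += 1
        isoforms[name][read] += 1
    return totals, isoforms


reference = {
    "hsa-let-7a-5p": "TGAGGTAGTAGGTTGTATAGTT",
    "hsa-miR-21-5p": "TAGCTTATCAGACTGATGTTGA",
}
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",   # canonical let-7a
    "TGAGGTAGTAGGTTGTATAGT",    # 3' trimmed isoform
    "TGAGGTAGTAGGTTGTATAGTTA",  # 3' extended isoform
    "TAGCTTATCAGACTGATGTTGA",   # canonical miR-21
    "ACGTACGTACGTACGTACGTAC",   # unmatched read
]
totals, isoforms = quantify_mirnas(reads, reference)
```

Because only the first 18 nucleotides are compared, reads that differ at the 3′ end are still assigned to the same miRNA and are then tallied separately as isoforms.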

Step 4: Output Results
The analysis generates a Total miRNA Table, an Isoform Expression Table, and a Summary HTML Report with key preprocessing and alignment metrics.

6. How is RNA-seq data analyzed in CLASHub?

The RNA-seq pipeline integrates HISAT2, StringTie, and DESeq2, with automated QC, optional Exon-Intron Split Analysis (EISA), and auto-repair mechanisms.

Step 1: Data Upload and Configuration
Users configure Adapter Sequences, UMI lengths (if applicable), Library Type (Stranded vs. Unstranded), and optionally enable EISA to distinguish post-transcriptional regulation.

Step 2: Preprocessing, Alignment & QC
2.1 Auto-Repair: Broken paired-end reads are automatically checked and repaired using repair.sh to maintain read integrity.
2.2 Trimming: Adapters and specified UMIs are removed using Cutadapt.
2.3 Alignment: Reads are aligned to the reference genome using HISAT2. Strand-specific flags are applied based on the library configuration.
2.4 Quality Check: RSeQC calculates read distribution across genomic features to verify library quality.
2.5 Sorting: SAM files are sorted to BAM using SAMtools.

Step 3: Standard Quantification
3.1 Abundance Estimation: StringTie quantifies gene expression using full Ensembl annotations to generate Transcripts Per Million (TPM).
3.2 Count Generation: The prepDE.py3 script extracts raw read counts for differential analysis.
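
For reference, the TPM unit reported in step 3.1 normalizes for both transcript length and sequencing depth. StringTie derives TPM internally from read coverage; the sketch below only shows the general definition of the unit, using hypothetical read counts and lengths.

```python
def tpm(counts, lengths_bp):
    """Convert read counts to Transcripts Per Million.

    counts[g] is the number of reads assigned to feature g and
    lengths_bp[g] is its length in base pairs.
    """
    # Length-normalized rate per feature.
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Scale so that all values sum to one million.
    return {g: rate / total * 1e6 for g, rate in rates.items()}


# Same read count, but geneB is three times longer, so its TPM is lower.
values = tpm({"geneA": 100, "geneB": 100}, {"geneA": 1000, "geneB": 3000})
```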

Step 4: EISA Quantification (Optional Add-on)
If enabled, CLASHub performs parallel quantification using custom Exon-only and Intron-only GTF files (with overlapping genes excluded and boundaries masked) to generate separate count matrices for intronic and exonic reads.

Step 5: Differential Expression & Classification
5.1 Standard DE: DESeq2 calculates differential expression.
5.2 EISA Classification: If EISA is selected, each change is classified as Post-transcriptional (exonic and intronic changes diverge), Transcriptional (exonic and intronic changes track together), or Ambiguous.
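
The classification idea can be sketched with simple fold-change cutoffs. The thresholds `min_change` and `max_gap` and the function `classify_eisa` are hypothetical. CLASHub's actual decision criteria are not given here and may rely on statistical testing rather than fixed cutoffs.

```python
def classify_eisa(d_exon, d_intron, min_change=1.0, max_gap=0.5):
    """Classify a gene from its exonic and intronic log2 fold changes.

    d_exon / d_intron: log2 fold change of exonic / intronic counts.
    min_change: minimum absolute change considered a real shift (assumed).
    max_gap: largest exon-intron difference still counted as "tracking
             together" (assumed).
    """
    gap = d_exon - d_intron
    if abs(d_exon) >= min_change and abs(gap) >= min_change:
        # Exons change without a matching intronic change.
        return "Post-transcriptional"
    if (
        abs(d_exon) >= min_change
        and abs(d_intron) >= min_change
        and abs(gap) <= max_gap
    ):
        # Exons and introns move together.
        return "Transcriptional"
    return "Ambiguous"


labels = [
    classify_eisa(1.5, 0.1),  # exon up, intron flat
    classify_eisa(1.5, 1.4),  # both up by a similar amount
    classify_eisa(1.5, 0.8),  # in between
]
```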

Step 6: Output Files
Outputs include QC reports (HTML), standard DE tables (DESeq2 output), TPM/Count matrices, and—if EISA is enabled—classification tables isolating regulatory mechanisms.

7. How is cumulative fraction curve analysis performed in CLASHub?

Step 1: Data Upload and Input
Users upload a differential gene expression CSV file containing GeneName, BaseMean, and log2FoldChange. A BaseMean threshold (default: 100) filters out low-expression transcripts to ensure robust results.
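
A minimal sketch of reading such a file and applying the BaseMean filter is shown below. The column names follow the description above. The function name, the use of an inclusive threshold (BaseMean ≥ 100), and the handling of missing values are assumptions.

```python
import csv
import io
import math


def load_expressed_genes(csv_text, base_mean_min=100.0):
    """Parse a DE table and keep genes whose BaseMean passes the threshold.

    Returns a dict mapping GeneName -> log2FoldChange. Rows with missing or
    non-numeric values (for example "NA") are skipped.
    """
    genes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            base_mean = float(row["BaseMean"])
            lfc = float(row["log2FoldChange"])
        except (TypeError, ValueError):
            continue
        if math.isnan(base_mean) or math.isnan(lfc):
            continue
        if base_mean >= base_mean_min:  # inclusive cutoff is an assumption
            genes[row["GeneName"]] = lfc
    return genes


table = """GeneName,BaseMean,log2FoldChange
GENE1,250.0,-0.8
GENE2,12.5,1.9
GENE3,100.0,0.1
GENE4,900.0,NA
"""
kept = load_expressed_genes(table)
```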

Step 2: Target Identification
Target genes are classified into two groups:
2.1 CLASH-Derived Targets: Identified via experimental CLASH data (Conserved and All targets).
2.2 TargetScan-Derived Targets: Predicted interactions extracted from TargetScan databases.

Step 3: Curve Generation and Analysis Modes
The tool compares fold change distributions between miRNA targets and non-target genes using two available modes:
3.1 Standard Analysis: Groups targets by broad conservation status.
3.2 Stringent Filtering: Narrows the analysis specifically to the top 25% of high-efficacy targets based on TargetScan Context++ scores, revealing more pronounced repression patterns.
Statistical differences between target groups and background non-targets are quantified via Mann–Whitney U tests.
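
The two computations behind the plot, the cumulative fraction curve and the Mann–Whitney U test, can be sketched with the Python standard library alone. This is an illustration rather than CLASHub's code: the p-value uses a normal approximation without tie or continuity correction, so it will differ slightly from exact implementations such as `scipy.stats.mannwhitneyu`.

```python
import math
from bisect import bisect_left, bisect_right


def cumulative_fraction(values):
    """Return (sorted_values, fractions) describing the cumulative curve."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (u, p) where u counts the pairs (xi, yj) with xi > yj,
    with ties contributing 0.5 each.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    sorted_y = sorted(y)
    u = 0.0
    for xi in x:
        less = bisect_left(sorted_y, xi)
        ties = bisect_right(sorted_y, xi) - less
        u += less + 0.5 * ties
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Hypothetical log2 fold changes: targets shifted left of non-targets.
targets = [-1.2, -0.8, -0.5]
non_targets = [0.1, 0.2, 0.3, 0.4]
xs, fractions = cumulative_fraction(targets)
u_stat, p_value = mann_whitney_u(targets, non_targets)
```

A leftward shift of the target curve relative to the non-target curve indicates repression, and the test quantifies whether that shift is larger than expected by chance.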

Step 4: Output Results
Outputs include SVG files of the Cumulative Fraction Curves visually plotting the repression shifts, alongside a comprehensive merged CSV dataset that annotates each gene with its specific target classification (e.g., top 25% Context++, high-confidence CLASH overlaps, or non-targets).