I think we’ve discussed this paper before in lab meetings, but just thought I’d pop it here for posterity.
The authors performed an RNA-seq experiment with 48 replicates per condition, identified the significantly differentially expressed genes, and then examined how many of these could still be identified using subsets of the data, ranging from 3 to 40 randomly selected replicates per condition. They also compared 11 different differential expression tools, including cuffdiff, edgeR and DESeq2, using this approach. The abstract sums the results up nicely:
With three biological replicates, nine of the 11 tools evaluated found only 20%–40% of the significantly differentially expressed (SDE) genes identified with the full set of 42 clean replicates. This rises to >85% for the subset of SDE genes changing in expression by more than fourfold. To achieve >85% for all SDE genes regardless of fold change requires more than 20 biological replicates. The same nine tools successfully control their false discovery rate at ≲5% for all numbers of replicates, while the remaining two tools fail to control their FDR adequately, particularly for low numbers of replicates. For future RNA-seq experiments, these results suggest that at least six biological replicates should be used, rising to at least 12 when it is important to identify SDE genes for all fold changes. If fewer than 12 replicates are used, a superior combination of true positive and false positive performances makes edgeR and DESeq2 the leading tools.
Long story short: if you only care about the big changes in expression (more than fourfold), 3 replicates should be ok. If you’re looking for anything more subtle or want to capture as many results as possible, you’ll need more; the authors suggest at least 12.
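For anyone who wants to play with the idea, here is a rough sketch of the subsampling approach in Python. It is a toy under loud assumptions, not the authors' pipeline: the data are simulated Gaussian values rather than real read counts, the "tool" is just a Welch t statistic with a fixed cutoff standing in for something like edgeR or DESeq2, and all the function names (`simulate`, `call_sde`, `recovered_fraction`) are made up for illustration. The calls from the full replicate set are treated as the reference, and we ask what fraction of them is recovered from random subsets.

```python
import random
import statistics


def simulate(n_genes=100, n_reps=42, seed=1):
    """Simulated log-scale expression for two conditions (assumption: Gaussian noise)."""
    rng = random.Random(seed)
    cond_a, cond_b = {}, {}
    for gene in range(n_genes):
        # Hypothetical mix of effect sizes: no change, a subtle shift, a large shift.
        shift = rng.choice([0.0, 0.0, 0.5, 2.0])
        cond_a[gene] = [rng.gauss(0.0, 1.0) for _ in range(n_reps)]
        cond_b[gene] = [rng.gauss(shift, 1.0) for _ in range(n_reps)]
    return cond_a, cond_b


def welch_t(a, b):
    """Welch's t statistic for two samples."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def call_sde(cond_a, cond_b, threshold=4.0):
    """Toy caller: flag genes with |t| above a fixed cutoff (stand-in for a real DE tool)."""
    return {g for g in cond_a if abs(welch_t(cond_a[g], cond_b[g])) > threshold}


def subsample(cond, idx):
    """Keep only the replicates at the given indices for every gene."""
    return {g: [values[i] for i in idx] for g, values in cond.items()}


def recovered_fraction(cond_a, cond_b, n_sub, n_boot=10, seed=0):
    """Mean fraction of full-data calls that are also called from n_sub replicates."""
    rng = random.Random(seed)
    n_full = len(next(iter(cond_a.values())))
    reference = call_sde(cond_a, cond_b)  # calls made with every replicate
    if not reference:
        return 0.0
    fractions = []
    for _ in range(n_boot):
        idx_a = rng.sample(range(n_full), n_sub)
        idx_b = rng.sample(range(n_full), n_sub)
        calls = call_sde(subsample(cond_a, idx_a), subsample(cond_b, idx_b))
        fractions.append(len(calls & reference) / len(reference))
    return statistics.mean(fractions)


if __name__ == "__main__":
    a, b = simulate()
    for n in (3, 6, 12, 20):
        print(n, round(recovered_fraction(a, b, n), 2))
```

The numbers this prints say nothing about real RNA-seq data; the point is only the shape of the procedure. To try it on a real count matrix, you would swap `call_sde` for a call out to an actual differential expression tool.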