
Rosalind #1: Counting DNA Nucleotides

A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.

An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG."
Given: A DNA string s of length at most 1000 nt.
Return: Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in s.

Sample Dataset

AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC

Sample Output

20 12 17 21 

Solution

We can verify if two characters are equal with 𝕨 = 𝕩: Equal To.
   'Q' = 'X'
0
   'T' = 'T'
1
Since 𝕨 = 𝕩: Equal To is Pervasive, we can apply it to a list.
   "ACGT" = 'A'
 1 0 0 0 
   "ACGT" = 'G'
 0 0 1 0 
   "ACGT" = 'Q'
 0 0 0 0 
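A small check of how pervasion behaves may help here. The sketch below uses Assert (!) and Match (≡), two primitives not otherwise used in this post, so that each line fails loudly if the claim in its comment is wrong. The string "AGGA" is just an illustrative value.

```bqn
# Pervasion applies whichever argument is the list.
! 1‿0‿0‿0 ≡ "ACGT" = 'A'
! 1‿0‿0‿0 ≡ 'A' = "ACGT"

# With a list on both sides, elements are compared position by position,
# so the two lists must have the same length.
! 1‿0‿1‿0 ≡ "ACGT" = "AGGA"
```

The last case is why comparing "ACGT" directly against a longer DNA string would not give per-character results: each character of the DNA string has to be compared against all of "ACGT" separately.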
We can wrap the partial expression "ACGT" = in a Block, a function whose argument is 𝕩.
   { "ACGT" = 𝕩 } 'A'
 1 0 0 0 
   { "ACGT" = 𝕩 } 'C'
 0 1 0 0 
   { "ACGT" = 𝕩 } 'Q'
 0 0 0 0 
Or trim off a few characters with 𝕗⊸𝔾 𝕩: Bind Left.
   "ACGT"⊸= 'A'
 1 0 0 0 
   "ACGT"⊸= 'C'
 0 1 0 0 
   "ACGT"⊸= 'Q'
 0 0 0 0 
Given a DNA string like TTC, we can apply our “ACGT-equals” operation to each character with 𝔽¨ 𝕩, 𝕨 𝔽¨ 𝕩: Each.
   "ACGT"⊸=¨ "TTC"
  0 0 0 1   0 0 0 1   0 1 0 0  
   "ACGT"⊸=¨ "AAA"
  1 0 0 0   1 0 0 0   1 0 0 0  
   "ACGT"⊸=¨ "QQQ"
  0 0 0 0   0 0 0 0   0 0 0 0  
We can sum these arrays by applying 𝔽´ 𝕩: Fold to 𝕨 + 𝕩: Add.
   +´ "ACGT"⊸=¨ "TTC"
 0 1 0 2 
   +´ "ACGT"⊸=¨ "AAA"
 3 0 0 0 
   +´ "ACGT"⊸=¨ "QQQ"
 0 0 0 0 
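The reason Fold with Add yields one count per symbol is that Add, like Equal To, is pervasive: it adds lists position by position. A minimal sketch of that step, again using Assert (!) and Match (≡) as checks rather than anything from the original walkthrough:

```bqn
# Add pairs up corresponding elements of two equal-length lists.
! 0‿1‿0‿1 ≡ 0‿0‿0‿1 + 0‿1‿0‿0

# Folding Add over the three indicator lists for "TTC" sums them
# position by position, giving one total per symbol of "ACGT".
! 0‿1‿0‿2 ≡ +´ ⟨0‿0‿0‿1, 0‿0‿0‿1, 0‿1‿0‿0⟩
```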
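We can now name the whole operation and count the instances of A, C, G, and T. The definition fixes "ACGT" on the right of Equal To with 𝔽⟜𝕘 𝕩: Bind, which gives the same result as Bind Left here because Equal To is symmetric.
   Rosalind1 ← +´=⟜"ACGT"¨
+´=⟜"ACGT"¨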
   Rosalind1 "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
 20 12 17 21
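As a sanity check, the four counts should add up to the length of the input, and the problem asks for them as a space-separated line. The sketch below restates the definition so it runs on its own; the names s, counts, and Format are hypothetical helpers, and it assumes the system functions •Fmt and •Out are available, as they are in CBQN.

```bqn
# Restated so the block runs on its own;
# =⟜"ACGT" fixes "ACGT" as the right argument of Equal To.
Rosalind1 ← +´=⟜"ACGT"¨
s ← "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
counts ← Rosalind1 s

# Every symbol in s is one of A, C, G, T, so the counts sum to the length of s.
! (≠s) = +´counts

# Format (hypothetical helper): turn each count into a string, put a space
# in front of each, join them, and drop the leading space.
Format ← {1↓∾' '⊸∾¨•Fmt¨𝕩}
! "20 12 17 21" ≡ Format counts
•Out Format counts
```

The length check only holds for clean input: any other character, such as a trailing newline, contributes an all-zero list (as the 'Q' examples show), so it leaves the counts unchanged but still adds to the length.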