Rosalind #1: Counting DNA Nucleotides

A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.

An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG".

Given: A DNA string $s$ of length at most 1000 nt.

Return: Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in $s$.

Sample Dataset

AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC

Sample Output

20 12 17 21

Solution

We can check whether two characters are equal with 𝕨 = 𝕩: Equal To, which returns 1 for a match and 0 otherwise.

   'Q' = 'X'
0
   'T' = 'T'
1

Since 𝕨 = 𝕩: Equal To is Pervasive, we can compare a whole list against a single character at once.

   "ACGT" = 'A'
⟨ 1 0 0 0 ⟩
   "ACGT" = 'G'
⟨ 0 0 1 0 ⟩
   "ACGT" = 'Q'
⟨ 0 0 0 0 ⟩
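
Pervasion also pairs up two lists of the same length element by element. A small illustration of that behaviour (the string "AGGT" is made up for this example and is not part of the problem):

   "ACGT" = "AGGT"
⟨ 1 0 1 1 ⟩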

We can wrap the "ACGT" = comparison in a Block.

   { "ACGT" = 𝕩 } 'A'
⟨ 1 0 0 0 ⟩
   { "ACGT" = 𝕩 } 'C'
⟨ 0 1 0 0 ⟩
   { "ACGT" = 𝕩 } 'Q'
⟨ 0 0 0 0 ⟩

Or trim off a few characters with 𝕗⊸𝔾 𝕩: Bind Left.

   "ACGT"⊸= 'A'
⟨ 1 0 0 0 ⟩
   "ACGT"⊸= 'C'
⟨ 0 1 0 0 ⟩
   "ACGT"⊸= 'Q'
⟨ 0 0 0 0 ⟩

Given a DNA string like "TTC", we can apply our "ACGT-equals" operation to each character with 𝔽¨ 𝕩, 𝕨 𝔽¨ 𝕩: Each, which yields one indicator list per character.

   "ACGT"⊸=¨ "TTC"
⟨ ⟨ 0 0 0 1 ⟩ ⟨ 0 0 0 1 ⟩ ⟨ 0 1 0 0 ⟩ ⟩
   "ACGT"⊸=¨ "AAA"
⟨ ⟨ 1 0 0 0 ⟩ ⟨ 1 0 0 0 ⟩ ⟨ 1 0 0 0 ⟩ ⟩
   "ACGT"⊸=¨ "QQQ"
⟨ ⟨ 0 0 0 0 ⟩ ⟨ 0 0 0 0 ⟩ ⟨ 0 0 0 0 ⟩ ⟩

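We can sum these lists position by position by applying 𝔽´ 𝕩: Fold to 𝕨 + 𝕩: Add. Because Add is also pervasive, each position of the result is the count for one nucleotide.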

   +´ "ACGT"⊸=¨ "TTC"
⟨ 0 1 0 2 ⟩
   +´ "ACGT"⊸=¨ "AAA"
⟨ 3 0 0 0 ⟩
   +´ "ACGT"⊸=¨ "QQQ"
⟨ 0 0 0 0 ⟩
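
As a possible alternative to Each followed by Fold, the same counts can be sketched with Table (⌜) and Cells (˘). This is not the approach the walkthrough builds on, only a comparison: "ACGT" =⌜ "TTC" builds a 4×3 comparison table with one row per nucleotide, and +´˘ sums each row.

   +´˘ "ACGT" =⌜ "TTC"
⟨ 0 1 0 2 ⟩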

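We can now count the instances of A, C, G, and T. The definition below writes the comparison as =⟜"ACGT", which fixes "ACGT" as the right argument rather than the left; since Equal To is symmetric, it gives the same result as "ACGT"⊸=.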

   Rosalind1 ← +´=⟜"ACGT"¨
+´=⟜"ACGT"¨
   Rosalind1 "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
⟨ 20 12 17 21 ⟩
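
The problem statement asks for four integers separated by spaces rather than a BQN list. One way to produce that string, assuming the •Fmt system function is available (as it is in CBQN), is to format each count, prepend a space to each, join the pieces, and drop the leading space. The list literal here is simply the sample result from above.

   1↓∾' '⊸∾¨•Fmt¨ ⟨20, 12, 17, 21⟩
"20 12 17 21"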