Rosalind #1: Counting DNA Nucleotides
A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.
An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG".
Given: A DNA string of length at most 1000.
Return: Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in the string.
Sample Dataset
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
Sample Output
20 12 17 21
Solution
We can test whether two characters are equal with 𝕨 = 𝕩: Equal To, which returns 1 for a match and 0 otherwise.
'Q' = 'X'
0
'T' = 'T'
1
Since 𝕨 = 𝕩: Equal To is Pervasive, we can compare every element of a list with a single character in one step.
"ACGT" = 'A'
⟨ 1 0 0 0 ⟩
"ACGT" = 'G'
⟨ 0 0 1 0 ⟩
"ACGT" = 'Q'
⟨ 0 0 0 0 ⟩
We can abstract "ACGT" = as a Block, where 𝕩 stands for the argument.
{ "ACGT" = 𝕩 } 'A'
⟨ 1 0 0 0 ⟩
{ "ACGT" = 𝕩 } 'C'
⟨ 0 1 0 0 ⟩
{ "ACGT" = 𝕩 } 'Q'
⟨ 0 0 0 0 ⟩
Or save a few characters by fixing the left argument with 𝕗⊸𝔾 𝕩: Bind Left.
"ACGT"⊸= 'A'
⟨ 1 0 0 0 ⟩
"ACGT"⊸= 'C'
⟨ 0 1 0 0 ⟩
"ACGT"⊸= 'Q'
⟨ 0 0 0 0 ⟩
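Given a DNA string like "TTC", comparing it with "ACGT" directly would try to pair up the elements of two lists of different lengths. Instead, we apply our "ACGT-equals" operation to each character separately with 𝔽¨ 𝕩, 𝕨 𝔽¨ 𝕩: Each.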
"ACGT"⊸=¨ "TTC"
⟨ ⟨ 0 0 0 1 ⟩ ⟨ 0 0 0 1 ⟩ ⟨ 0 1 0 0 ⟩ ⟩
"ACGT"⊸=¨ "AAA"
⟨ ⟨ 1 0 0 0 ⟩ ⟨ 1 0 0 0 ⟩ ⟨ 1 0 0 0 ⟩ ⟩
"ACGT"⊸=¨ "QQQ"
⟨ ⟨ 0 0 0 0 ⟩ ⟨ 0 0 0 0 ⟩ ⟨ 0 0 0 0 ⟩ ⟩
We can sum these lists position by position by applying 𝔽´ 𝕩: Fold to 𝕨 + 𝕩: Add, so that each position of the result counts one of A, C, G, and T.
+´ "ACGT"⊸=¨ "TTC"
⟨ 0 1 0 2 ⟩
+´ "ACGT"⊸=¨ "AAA"
⟨ 3 0 0 0 ⟩
+´ "ACGT"⊸=¨ "QQQ"
⟨ 0 0 0 0 ⟩
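To make the fold step concrete, here is a small sketch of what +´ works out to for "TTC". The three lists are the per-character results shown earlier, typed out by hand for illustration.

```bqn
# Fold places + between the elements of the list. Add is pervasive,
# so the three lists for "TTC" are added position by position.
⟨0,0,0,1⟩ + ⟨0,0,0,1⟩ + ⟨0,1,0,0⟩   # ⟨ 0 1 0 2 ⟩
```

The result matches +´ "ACGT"⊸=¨ "TTC" above: no A, one C, no G, two T.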
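We can now count the instances of A, C, G, and T in any DNA string. The definition below binds "ACGT" as the right argument with ⟜ rather than as the left argument with ⊸; because Equal To is symmetric, the result is the same.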
Rosalind1 ← +´=⟜"ACGT"¨
+´=⟜"ACGT"¨
Rosalind1 "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
⟨ 20 12 17 21 ⟩
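Two quick sanity checks on the result, offered as a sketch rather than as part of the solution. The name s is introduced here only for illustration. The first check confirms that the ⟜ form used in Rosalind1 agrees with the ⊸ form used in the earlier steps. The second relies on every symbol of the sample being one of A, C, G, or T, so the four counts should add up to the length of the string.

```bqn
Rosalind1 ← +´=⟜"ACGT"¨
s ← "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"

# =⟜"ACGT" and "ACGT"⊸= compare the same pairs, so the results match
("ACGT"⊸= 'G') ≡ (=⟜"ACGT" 'G')   # 1

# The four counts account for every symbol in s
≠ s               # 70
+´ Rosalind1 s    # 70
```

If the input contained any other symbol, the sum would fall short of the length, which makes this a cheap way to spot malformed data.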