[CS] CS50 Week 6: Python

Key Points

  • Python basics
  • Demonstration of converting previous assignments from C to Python
  • Files
  • Features that C does not have

Assignment: DNA

I broke down this programme into a few small functions, dealing with the DNA database, the input DNA sequences, and the computation of DNA results respectively.

The programme first checks the number of command-line arguments. If the user does not pass exactly 3 arguments (the script name, the database file, and the sequence file), the programme exits.
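The argument check could be sketched roughly as follows. This is a minimal sketch, not the original code: the helper name parse_args(), the script name dna.py, and the usage message are assumptions made for illustration.

```python
import sys

# Assumed usage message; the wording is illustrative only.
USAGE = "Usage: python dna.py data.csv sequence.txt"


def parse_args(argv):
    """Return (database_file, sequence_file), or exit if the count is wrong.

    argv[0] is the script name, so a valid call has exactly 3 items.
    """
    if len(argv) != 3:
        raise SystemExit(USAGE)
    return argv[1], argv[2]


# Typical call at the start of the programme:
# database_file, sequence_file = parse_args(sys.argv)
```

Raising SystemExit with a string prints the message and exits with a non-zero status, which is the same thing sys.exit(USAGE) does.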

If 3 command-line arguments are passed, the programme reads the database by calling the function read_db(). The second command-line argument, expected to be the file name of a database, is passed to this function. read_db() reads the database and saves all the data to a global variable, db_dict.

Then, it gets a list of DNA results by calling the function find_dna(). The third command-line argument, expected to be the file name of a DNA sequence, is passed to the function along with a list of STRs.

With the DNA results, it checks whether any entry in the DNA database matches them.
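The matching step might look something like the sketch below. It assumes the DNA result is a list of integers in the same order as the STR columns, and that the database is a dictionary shaped like db_dict; the function name find_match() and the "No match" fallback are assumptions for illustration.

```python
def find_match(db_dict, dna_result):
    """Return the first name whose STR counts equal dna_result."""
    for name, counts in db_dict.items():
        if name == "name":
            # Skip the header entry, which holds the STR names, not counts.
            continue
        # The CSV values are strings, so convert them before comparing.
        if [int(count) for count in counts] == dna_result:
            return name
    return "No match"


example_db = {
    "name": ["AGATC", "AATG", "TATC"],
    "Alice": ["2", "8", "3"],
    "Bob": ["4", "1", "5"],
    "Charlie": ["3", "2", "5"],
}
print(find_match(example_db, [4, 1, 5]))
```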

The function read_db() takes one argument: the filename of the database file obtained from the command line.

After opening the CSV file, it reads data from it into a global variable db_dict.

db_dict is a dictionary with people’s names as keys and their corresponding lists of DNA data as values. The exception is the first key-value pair, which holds the column titles of the raw data.

Take the small database file as an example. The dictionary should look like this after reading data into it.

{'name': ['AGATC', 'AATG', 'TATC'],
'Alice': ['2', '8', '3'],
'Bob': ['4', '1', '5'],
'Charlie': ['3', '2', '5']}
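A function that builds a dictionary of this shape could be sketched as below. This is only a sketch under assumptions: it returns the dictionary instead of writing to a global variable, which keeps the example self-contained, and it uses csv.reader so that the header row is stored like any other row.

```python
import csv


def read_db(filename):
    """Read a CSV database into a dict: first column -> remaining columns."""
    db_dict = {}
    with open(filename, newline="") as csv_file:
        for row in csv.reader(csv_file):
            if row:  # ignore blank lines
                # The header row ends up under the key "name".
                db_dict[row[0]] = row[1:]
    return db_dict


# Example call, assuming the file exists:
# db_dict = read_db("databases/small.csv")
```

Because every value comes from the CSV as text, the counts are strings ('2', '8', ...) and need converting before any numeric comparison.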

find_dna() takes two arguments: the original DNA sequence (given as a file name on the command line) and a list of STRs.

The main idea of computing STRs in this programme is to convert each occurrence of an STR into a number (or any single character that DNA sequences do not contain). The DNA sequence is much easier to process after conversion.

For example, each AGATC is converted into 0. Sequence 1 looks like this after conversion:

AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGG0000TATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG
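A conversion like this can be done with Python's built-in str.replace(). The helper name convert_str() below is made up for illustration, and the short sequence is a toy example, not data from the assignment.

```python
def convert_str(sequence, str_pattern, number):
    """Replace every occurrence of str_pattern with a single digit."""
    return sequence.replace(str_pattern, str(number))


# Two back-to-back AGATC repeats become two zeros.
print(convert_str("GGAGATCAGATCTT", "AGATC", 0))  # prints GG00TT
```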

There are 8 STRs in large.csv, each of which corresponds to a number. I chose 0 to 7 to represent the 8 STRs, as that makes the data easier to manipulate later on.

The list of STRs is retrieved from db_dict with the key “name”.

A for-loop iterates from 0 to 7. In each iteration, one STR is converted to its number and count_max() is called to compute the maximum number of consecutive repeats.
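Put together, the loop might be sketched as follows. This is not the original code: longest_run() is a stand-in for count_max() written with itertools.groupby to keep the example short, and enumerate() replaces the fixed 0 to 7 range so the sketch also works with the three STRs of the small database.

```python
from itertools import groupby


def longest_run(text, char):
    """Stand-in for count_max(): length of the longest run of char in text."""
    return max((len(list(group)) for key, group in groupby(text) if key == char),
               default=0)


def find_dna(sequence_filename, strs):
    """Return the longest consecutive repeat count for each STR, in order."""
    with open(sequence_filename) as sequence_file:
        sequence = sequence_file.read().strip()

    result = []
    for number, str_pattern in enumerate(strs):
        # Convert only this STR, always starting from the untouched sequence.
        formatted = sequence.replace(str_pattern, str(number))
        result.append(longest_run(formatted, str(number)))
    return result


# Example call, assuming the file exists:
# dna_result = find_dna("sequences/1.txt", ["AGATC", "AATG", "TATC"])
```

Using a single digit per STR only works while there are at most 10 STRs; with more, each STR would need some other single character that never appears in a DNA sequence.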

I first tried converting all the STRs into numbers at once and computing their maximum consecutive repeat counts together, but the counts came out wrong in some cases. I reckon a single character can belong to more than one STR, so if I convert all STRs at once, some counts come out lower than they should be because some characters have already been converted as part of another STR. So I only convert one STR at a time and compute its count.
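One way this kind of overlap can show up is sketched below with a made-up toy sequence. AGATC and GATA are both STRs in large.csv, and the last GATA here shares its final A with a following AGATC.

```python
sequence = "GATAGATAGATC"

# All at once: AGATC is converted first and swallows the A that the
# second GATA needs, so only one GATA is left to convert.
all_at_once = sequence.replace("AGATC", "0").replace("GATA", "1")
print(all_at_once)  # prints 1GAT0

# One at a time: GATA is converted against the untouched sequence,
# so both consecutive repeats are found.
gata_only = sequence.replace("GATA", "1")
print(gata_only)  # prints 11GATC
```

In the first result GATA appears to repeat once, in the second twice, which is the kind of undercount described above.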

count_max() takes two arguments: the formatted DNA sequence in which a specific STR has been converted, and the number representing that STR.

A for-loop iterates through each character of the formatted DNA sequence. I didn’t find any Python syntax similar to Swift’s if let after a quick Google search, so isnumeric() is used here to check whether a character is numeric. If it is, it can be safely converted to type int and used in the computation.
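A counting loop along these lines might look like the sketch below. The variable names and the exact structure are assumptions; only the isnumeric() check and the two arguments come from the description above.

```python
def count_max(formatted_sequence, number):
    """Return the longest run of the digit `number` in formatted_sequence."""
    longest = 0
    current = 0
    for char in formatted_sequence:
        # Letters such as A, C, G, T fail isnumeric(), so int() is only
        # ever called on characters that are safe to convert.
        if char.isnumeric() and int(char) == number:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the run is broken, start counting again
    return longest


print(count_max("AAGG0000TATC0", 0))  # prints 4
```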


Written by

iOS developer/ Swift/ Objective-C
