Question:
You are tasked with implementing the Fitch-Margoliash algorithm to generate an unrooted phylogenetic tree as discussed in class. This algorithm will make use of your Needleman-Wunsch global alignment algorithm as well as much of the Fitch-Margoliash code.

Your solution should read through a FASTA file containing N sequences taken as input (-i), calculate a distance matrix, and then use the Fitch-Margoliash algorithm to determine the phylogenetic tree. The resulting tree will be written in Newick format (see below) to the output file given by the output argument (-o). Your program must be compatible with Python 3.11.7.

Your program must be capable of accepting command line arguments as described below:

-i
   o Purpose: This option retrieves the file path to the starting FASTA file.
   o Input Type: String
   o Validation: Entire file path must exist, otherwise error.
   o Required: Yes
-o
   o Purpose: This option retrieves the file path for the output NEWICK file.
   o Input Type: String
   o Validation: The directories in the file path must exist, otherwise error.
   o Required: Yes
-s
   o Purpose: This option retrieves the file path for the scoring matrix.
   o Input Type: String
   o Validation: Entire file path must exist, otherwise error.
   o Required: Yes

Example execution:

   python3 Amruth_Poodipeddi.py -i in.fna -o out.nwk -s BLOSUM50.mtx

   o Instructs python to run Amruth_Poodipeddi.py
   o Sets the input file option to in.fna
   o Sets the output file option to out.nwk
   o Sets the score matrix file option to BLOSUM50.mtx

The scoring matrix file contains the nucleotide or amino acid scores for each possible pair of bases.

Newick File Specifications:

Basic Structure
1. Tree Representation
   a. Trees are represented in a nested set of parentheses, where each pair of parentheses represents a subtree.
2. Leaf Nodes
   a. Leaf nodes (tips of the tree) are represented by their names.
3. Internal Nodes
   a. Internal nodes should be named using the original name given in the FASTA file.
   b. The name appears immediately after the closing parenthesis of the subtree.
4. Branch Lengths
   a. Branch lengths are denoted by a colon : followed by a number, placed after the name.
5. Semicolon
   a. Each tree ends with a semicolon ;.

Detailed Rules
1. Rooted Trees
   a. The root is represented by the outermost set of parentheses.
2. Unrooted Trees
   a. The tree typically starts with a trifurcation (three branches coming out of a single node).
3. Branch Order
   a. The order of branches within a tree is not significant.
4. Whitespace
   a. Spaces, tabs, and newlines are ignored and can be used for formatting.
5. Commas
   a. Commas separate sibling groups within the same set of parentheses.
6. Internal Node Labels
   a. Labels for internal nodes are optional but, if present, follow the closing parenthesis of the group.
7. Node Names
   a. Any string without spaces, parentheses, semicolons, or colons can be a node name.
8. Escaping Special Characters
   a. Special characters in names can be escaped by enclosing the name in single quotes (e.g., 'species one').
9. Tree Labels
   a. Trees can optionally have a name or label, which precedes the tree and is followed by an equals sign (e.g., tree1 = (A,B,(C,D));).

Example:
   Rooted tree: ((A:0.1,B:0.2)90:0.3,C:0.4);
   Unrooted tree: (A:0.1,B:0.2,C:0.3);

Begin by writing small trees and gradually increase complexity. Use existing software to load and visualize your Newick-formatted trees to ensure they are correctly formatted.
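
The command line requirements above (-i, -o, -s, each required and each validated) could be handled with Python's standard argparse module. The following is only a sketch: the helper names existing_file, writable_location, and build_parser are made up for illustration and are not part of the assignment.

```python
import argparse
import os


def existing_file(path: str) -> str:
    """argparse type: accept the path only if it points to an existing file."""
    if not os.path.isfile(path):
        raise argparse.ArgumentTypeError(f"file does not exist: {path}")
    return path


def writable_location(path: str) -> str:
    """argparse type: accept the path only if its parent directory exists."""
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        raise argparse.ArgumentTypeError(f"directory does not exist: {parent}")
    return path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three required options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Build an unrooted tree with Fitch-Margoliash."
    )
    parser.add_argument("-i", dest="input", type=existing_file, required=True,
                        help="path to the input FASTA file")
    parser.add_argument("-o", dest="output", type=writable_location, required=True,
                        help="path for the output Newick file")
    parser.add_argument("-s", dest="scores", type=existing_file, required=True,
                        help="path to the scoring matrix file")
    return parser
```

A script would typically call build_parser().parse_args() once at startup. When a type function raises ArgumentTypeError, argparse prints a usage message and exits with status 2, which covers the "otherwise error" requirement.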
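
Reading the N input sequences might look like the sketch below. It assumes the usual FASTA layout (a header line starting with ">" followed by one or more sequence lines) and, as a further assumption, takes the first whitespace-delimited token of each header as the sequence name. The function name read_fasta is hypothetical.

```python
def read_fasta(path):
    """Return a dict mapping sequence name -> sequence string.

    Assumption: the name is the first whitespace-delimited token of the
    header line; the assignment does not specify this.
    """
    parts = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                parts[name] = []
            elif name is not None:
                parts[name].append(line)  # sequences may span several lines
    return {n: "".join(chunks) for n, chunks in parts.items()}
```

Since Python dictionaries preserve insertion order, the names come back in file order, which keeps the rows and columns of the distance matrix predictable.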
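
For the tree-building step, the core of Fitch-Margoliash as it is commonly described is the three-point formula: the closest pair A, B is joined, all remaining sequences are treated as one composite group, and the branch lengths follow from the three pairwise distances. The sketch below assumes the distance matrix is a nested dict (dist[x][y]) and uses hypothetical function names; the variant discussed in class may differ in its details.

```python
def three_point_lengths(d_ab, d_ac, d_bc):
    """Branch lengths from A, B, C to their shared internal node."""
    len_a = (d_ab + d_ac - d_bc) / 2
    len_b = (d_ab + d_bc - d_ac) / 2
    len_c = (d_ac + d_bc - d_ab) / 2
    return len_a, len_b, len_c


def closest_pair(dist):
    """Return the two names with the smallest pairwise distance."""
    names = sorted(dist)
    pairs = [(dist[x][y], x, y)
             for i, x in enumerate(names)
             for y in names[i + 1:]]
    _, x, y = min(pairs)
    return x, y


def pair_branch_lengths(dist, a, b):
    """Branch lengths for a and b, treating every other taxon as one group.

    The distance from a (or b) to the group is the plain average of its
    distances to the group members.
    """
    rest = [n for n in dist if n not in (a, b)]
    if not rest:
        raise ValueError("need at least one taxon besides a and b")
    d_a_rest = sum(dist[a][n] for n in rest) / len(rest)
    d_b_rest = sum(dist[b][n] for n in rest) / len(rest)
    len_a, len_b, _ = three_point_lengths(dist[a][b], d_a_rest, d_b_rest)
    return len_a, len_b
```

In a full implementation this step would be repeated: the joined pair is replaced by a single cluster, the matrix is updated with averaged distances, and the loop continues until three groups remain, which form the final trifurcation.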
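
The Newick rules above map naturally onto a small recursive writer. The sketch below assumes a made-up tree representation, a tuple (name, branch_length, children), where the root has branch_length None and leaves have an empty children list. The names quote_name and to_newick are illustrative only.

```python
# Characters that force a name to be quoted. The comma is added here as an
# assumption, because commas separate siblings in the format.
SPECIAL_CHARS = set(" ():;,")


def quote_name(name):
    """Wrap a name in single quotes if it contains special characters.

    Names that themselves contain single quotes are not handled here.
    """
    if any(ch in SPECIAL_CHARS for ch in name):
        return "'" + name + "'"
    return name


def to_newick(node, is_root=True):
    """Serialize a (name, branch_length, children) tuple to Newick text."""
    name, length, children = node
    text = ""
    if children:
        inner = ",".join(to_newick(child, False) for child in children)
        text = "(" + inner + ")"
    text += quote_name(name)        # label follows the closing parenthesis
    if length is not None:
        text += f":{length:g}"      # :g keeps at most 6 significant digits
    return text + ";" if is_root else text
```

With this representation, the unrooted example from the specification is the tuple ("", None, [("A", 0.1, []), ("B", 0.2, []), ("C", 0.3, [])]), and the three children of the root form the trifurcation.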
