Homework for Introduction to Unix

1.  Copy the file saccharomyces_cerevisiae.gff from the directory /projects/sreadgrp/homeworkfiles/genomes/S288c
	into your Work directory.
	This file has two sections: annotations and sequence data.  We need to split it into two
	files.  Find the line number of the first line of the FASTA data. (HINT: you did this already in
	class using grep.)  Place the lines before this line into the file Scer.gff, and place this line
	and all lines after it into Scer.fasta, within your Work directory.
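
The split described in problem 1 can be sketched on a toy file. This is only an illustration under assumptions: the file toy.gff and its contents are made up, and the first FASTA line is assumed to be the first line beginning with ">". The real file may also contain a directive line just before the sequence data, so check which line you want to treat as the split point.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Toy stand-in for the real file (made-up contents, tab-separated columns).
printf '##gff-version 3\n' > toy.gff
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=g1\n' >> toy.gff
printf 'chrI\tSGD\tgene\t1807\t2169\t.\t-\t.\tID=g2\n' >> toy.gff
printf '>chrI\nCCACACCACACC\n' >> toy.gff

# Line number of the first line that starts with ">" (assumed start of the FASTA data).
first=$(grep -n '^>' toy.gff | head -n 1 | cut -d: -f1)

# Everything before that line goes to the annotation file ...
head -n $((first - 1)) toy.gff > toy_annot.gff
# ... and that line plus everything after it goes to the sequence file.
tail -n +"$first" toy.gff > toy_seq.fasta

echo "split at line $first"
```

Checking the two outputs with wc -l and head is a quick way to confirm that no line was lost or duplicated at the split point.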

2.  Count the number of genes in the GFF file.
	Find out which chromosome has the most genes.
	How many of the genes on that chromosome are on the + strand?

	HINT (grep, wc)
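
One possible way to approach the counting in problem 2, shown on made-up data. The sketch assumes that a gene is a row whose third column is exactly "gene" and that the strand is in column 7. The file toy.gff and the variable names are hypothetical, and cut, sort, uniq and awk are used in addition to the hinted commands.

```shell
cd "$(mktemp -d)"

# Toy annotation rows (made-up): three genes and one CDS row.
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=g1\n' > toy.gff
printf 'chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=g1\n' >> toy.gff
printf 'chrI\tSGD\tgene\t1807\t2169\t.\t-\t.\tID=g2\n' >> toy.gff
printf 'chrII\tSGD\tgene\t280\t2658\t.\t+\t.\tID=g3\n' >> toy.gff

# A literal tab, so the pattern matches "gene" only as a whole column.
tab=$(printf '\t')

# Total number of gene rows.
total_genes=$(grep "${tab}gene${tab}" toy.gff | wc -l)

# Genes per chromosome, most frequent first; keep the name of the top one.
top_chrom=$(grep "${tab}gene${tab}" toy.gff | cut -f1 | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')

# Genes on that chromosome whose strand column is exactly "+".
plus_genes=$(grep "^${top_chrom}${tab}" toy.gff | grep "${tab}gene${tab}" | cut -f7 | grep -cFx '+')

echo "genes: $total_genes, top chromosome: $top_chrom, + strand genes there: $plus_genes"
```

Anchoring the chromosome name with "^" and a trailing tab keeps chrI from also matching chrII.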

3.  Count the number of chromosomes in the FASTA file.

	HINT (grep)
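
As a small illustration of problem 3, each sequence in a FASTA file starts with a header line beginning with ">", so counting those lines counts the sequences. The file toy.fasta is a made-up example, not the course data.

```shell
cd "$(mktemp -d)"

# Toy FASTA with three sequences (made-up contents).
printf '>chrI\nACGT\nACGT\n>chrII\nGGCC\n>chrmt\nTTAA\n' > toy.fasta

# -c prints the number of matching lines instead of the lines themselves.
n_seqs=$(grep -c '^>' toy.fasta)
echo "$n_seqs"
```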

4.  The FASTA file contains all the chromosomes.  Split the combined FASTA file into separate files,
	one for each chromosome.

	HINT (grep with line-number option, head, tail)
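
A minimal sketch of the approach hinted at in problem 4, run on a made-up file. It assumes that each header is just ">" followed by a name with no spaces, so the name can be reused as a file name. Real headers may carry extra text and need more care. The loop and variable names are illustrative only.

```shell
cd "$(mktemp -d)"

# Toy combined FASTA (made-up contents).
printf '>chrI\nACGT\nACGT\n>chrII\nGGCC\n>chrmt\nTTAA\n' > toy.fasta

total=$(wc -l < toy.fasta)

# Line numbers of all header lines, followed by a sentinel one past the last line.
starts=$(grep -n '^>' toy.fasta | cut -d: -f1)
set -- $starts $((total + 1))

# Each record runs from its header line to the line before the next header.
while [ $# -gt 1 ]; do
  start=$1
  end=$(( $2 - 1 ))
  # Header text without the leading ">" becomes the output file name.
  name=$(head -n "$start" toy.fasta | tail -n 1 | cut -c2-)
  head -n "$end" toy.fasta | tail -n +"$start" > "${name}.fasta"
  shift
done

ls chr*.fasta
```

With only a handful of chromosomes the same head and tail pair can also be typed once per chromosome by hand, using the line numbers that grep -n reports.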

5.  The GFF file contains multiple columns of data.  Cut the chromosome, start position, and stop position
	columns out of the GFF file to produce a new file, Scer.bed, in the BED file format.
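
The column extraction in problem 5 might look like the following on made-up data. It assumes the usual GFF layout, with the chromosome in column 1, the start in column 4 and the end in column 5, and it skips comment lines that begin with "#". Note that cut copies the coordinates unchanged. Strict BED uses 0-based start positions while GFF is 1-based, so this is a simplification.

```shell
cd "$(mktemp -d)"

# Toy annotation file (made-up contents).
printf '##gff-version 3\n' > toy.gff
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=g1\n' >> toy.gff
printf 'chrII\tSGD\tgene\t280\t2658\t.\t+\t.\tID=g3\n' >> toy.gff

# Drop comment lines, then keep columns 1, 4 and 5 (cut splits on tabs by default).
grep -v '^#' toy.gff | cut -f1,4,5 > toy.bed
cat toy.bed
```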


6.  Change the permissions on each of the new chromosome FASTA files so that everyone in your
	group can write to them.
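
For problem 6, a short sketch with placeholder files. The file names are made up. The symbolic mode g+w adds write permission for the group and leaves the other permission bits as they are.

```shell
cd "$(mktemp -d)"

# Placeholder files standing in for the per-chromosome FASTA files.
touch chrI.fasta chrII.fasta

# Add group write permission to every matching file.
chmod g+w chr*.fasta

# The sixth character of the mode string is the group write bit.
ls -l chr*.fasta
```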


7.  We created a BED file (Scer.bed) in problem 5.  Sort this file in descending order.
	Also sort the file in ascending order by end position.  Store the ascending
	output in Scer_end.bed for use in problem 8.
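
A sketch of the two sorts in problem 7 on made-up data. The problem does not say which column the descending sort should use. This example assumes the start position (column 2), which is only one reading. The file names are hypothetical.

```shell
cd "$(mktemp -d)"

# Toy BED rows (made-up): chromosome, start, end.
printf 'chrI\t335\t649\nchrI\t1807\t2169\nchrI\t538\t600\n' > toy.bed

# Descending numeric sort on column 2 (assumed key: start position).
sort -k2,2nr toy.bed

# Ascending numeric sort on column 3 (end position), saved for later use.
sort -k3,3n toy.bed > toy_end.bed
cat toy_end.bed
```

The n in the key makes sort compare the values as numbers. Without it, 1807 would sort before 335 because the comparison would be character by character.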


8.  Find the differences between the Scer.bed and Scer_end.bed files.
	HINT (look up the diff command)
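
A minimal illustration of diff for problem 8, using two made-up files that hold the same rows in a different order. diff reports line-by-line changes and exits with status 1 when the files differ, so the sketch records that status instead of treating it as an error.

```shell
cd "$(mktemp -d)"

# Two toy files with the same rows in different order (made-up contents).
printf 'chrI\t335\t649\nchrI\t538\t600\n' > toy.bed
sort -k3,3n toy.bed > toy_end.bed

# Lines marked "<" come from the first file, lines marked ">" from the second.
status=0
diff toy.bed toy_end.bed || status=$?
echo "diff exit status: $status"
```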

9.  Create a file called README.txt that includes all of the commands you used for each of these problems.

	HINT (history, nano/vim/emacs)
