Programming

MATH 230 : Fall 2023

Department of Mathematics - SUNY Geneseo

Homework 10 - More with Files

Due Date: November 29, 2023

Write your Python script so that, when it is executed, the answers to all the questions below are printed to the screen. Recall that each solution should begin with the characters #%% on its first line. These three characters create a code cell in Spyder. The assignment is due by 11:59 pm on the due date.

When opening a text file with the open() function, it is best to include the encoding='utf-8' option, for example:
with open('us-city-coords.txt', 'r', encoding='utf-8') as us_cities_file:
     # do something with us_cities_file

Download the files us-city-coords.txt and us-city-populations.txt, and save them in the same directory where you save the Python script for this homework assignment. The files contain population data on cities across the USA. Below is a description of the contents of each file.

Structure of file us-city-coords.txt

Each line of the file has the following format:

x,y:city:state

Each line contains the name of a city, the state the city is in, and the \((x,y)\) latitude and longitude coordinates of the city. Notice that the fields on each line are separated by colons, while the two coordinates are separated by a comma.
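
As a rough illustration of this format, the sketch below splits one line into its three fields. The sample line is made up for illustration (it is not copied from the data file), and the helper name parse_coords_line is hypothetical.

```python
def parse_coords_line(line):
    """Split one 'x,y:city:state' line into (coords, city, state)."""
    # strip() removes the trailing newline; split(':') separates the three fields
    coords, city, state = line.strip().split(':')
    return coords, city, state


# Made-up sample line, for illustration only
sample = '42.80,-77.82:Geneseo:New York\n'
coords, city, state = parse_coords_line(sample)
print(coords)  # 42.80,-77.82
print(city)    # Geneseo
print(state)   # New York
```

Keeping the coordinates as a single string, rather than converting them to numbers, makes them convenient to use later as a lookup key.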

Structure of file us-city-populations.txt

Each line of the file has the following format:

x,y:number1:number2:number3:number4

Each line contains the 2019 population (number1), 2010 population (number2), land area in square miles (number3), and population density (number4) of the city with coordinates \((x,y)\). Each city's coordinates appear once in each file. Notice that number1 and number2 contain commas, and number3 and number4 contain units.
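
Because of the commas and units, the fields cannot be passed directly to int() or float(). The sketch below shows one possible way to clean them. The sample line is made up, the unit spellings in it ('sq mi' and '/sq mi') are an assumption about the file rather than something stated above, and the helper name leading_number is hypothetical.

```python
import re


def leading_number(text):
    """Return the number at the start of text as a float, ignoring commas."""
    # Remove thousands separators, then keep only the leading digits and decimal point
    match = re.match(r'\d+(?:\.\d+)?', text.strip().replace(',', ''))
    return float(match.group())


# Made-up line; the unit spellings ('sq mi', '/sq mi') are an assumption
sample = '1.0,2.0:1,234,567:1,200,000:100.5 sq mi:12,284/sq mi\n'
coords, pop_2019, pop_2010, area, density = sample.strip().split(':')

population = int(pop_2019.replace(',', ''))
census = int(pop_2010.replace(',', ''))
print(population, census)        # 1234567 1200000
print(leading_number(area))      # 100.5
print(leading_number(density))   # 12284.0
```

If the units in the actual file are written differently, inspect a few lines first and adjust the cleaning step accordingly.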

Problems

  1. Using the open() function in Python, open and read the data in the files and create a list called us_cities in which each element is a dictionary containing all the data for one city. The key-value pairs of each dictionary should be:

    Key          Value
    name         Name of the city
    state        State the city is in
    population   City's 2019 population (as an int)
    census       City's 2010 census population (as an int)
    area         City's land area (as a float)
    density      City's population density (as a float)

    IMPORTANT: Each file contains the same number of lines (and thus the same number of cities), but line \(k\) of the first file does not describe the same city as line \(k\) of the second file. Thus, part of the problem is to match the coordinate data in the two files in order to create each dictionary.
  2. Using the us_cities list, find the total population and average population of all the cities. Find these values for both the 2019 population and the 2010 census population.
  3. Using the us_cities list, find the total land area and average population density of all the cities.
  4. How many of the cities in the data are in Florida? To find out, create a new dictionary called by_state whose keys are the states in the data and whose values are the number of cities in each state. Then write code that creates a file called by_state.txt in which each line contains a state name and the number of cities in that state, separated by a colon; that is, each line is of the form

    state_name:number_of_cities

    The lines should be in descending order by number_of_cities; that is, the first line should list the state with the most cities in the data and the last line the state with the fewest. Your script should create the file by_state.txt as described above; however, upload only your Python script.

  5. Find the 2019 population distribution of all the cities using the following bins: \begin{align*} I_1&=[100000,199999]\\ I_2&=[200000,299999]\\ I_3&=[300000,499999]\\ I_4&=[500000,999999]\\ I_5&=[1000000,10000000]. \end{align*} Save the distribution in a list \(d = [d_1,d_2,d_3,d_4,d_5]\), where \(d_k\) is the percentage of cities in the data whose 2019 population is in the interval \(I_k\).
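
The problems above lean on three small techniques: matching records by a shared key, sorting a dictionary by its values, and counting values into bins. The sketch below shows each one on tiny made-up data. Every name and number in it is hypothetical, and it is one possible approach rather than the expected solution.

```python
# Made-up records standing in for the two files, keyed by the coordinate string
coords_info = {'1.0,2.0': ('Alpha', 'Florida'), '3.0,4.0': ('Beta', 'Ohio'),
               '5.0,6.0': ('Gamma', 'Florida')}
pop_info = {'5.0,6.0': 650000, '1.0,2.0': 150000, '3.0,4.0': 250000}

# 1) Match by key: the coordinate string looks up the same city in both dictionaries
cities = []
for key, (name, state) in coords_info.items():
    cities.append({'name': name, 'state': state, 'population': pop_info[key]})

# 2) Count cities per state, then sort the (state, count) pairs by count, descending
by_state = {}
for city in cities:
    by_state[city['state']] = by_state.get(city['state'], 0) + 1
ranked = sorted(by_state.items(), key=lambda pair: pair[1], reverse=True)
lines = [f'{state}:{count}' for state, count in ranked]
print(lines)  # ['Florida:2', 'Ohio:1']

# 3) Percentage of cities in each closed interval [low, high]
bins = [(100000, 199999), (200000, 299999), (300000, 499999),
        (500000, 999999), (1000000, 10000000)]
d = []
for low, high in bins:
    count = sum(1 for city in cities if low <= city['population'] <= high)
    d.append(100 * count / len(cities))
print(d)
```

Writing the strings in lines to by_state.txt would then be a short loop inside a with open('by_state.txt', 'w', encoding='utf-8') block, adding a newline after each string.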